OASIS Open Document Format for Office Applications (OpenDocument) TC

xml:lang settings. Confused.

  • 1.  xml:lang settings. Confused.

    Posted 06-01-2007 15:14
    I'm processing a file in an I18N scenario.
    
    Which is the 'definitive' language please? (inaccurate definition)
    
    In ooo, styles.xml shows quite a few.
    
    


  • 2.  Re: [office] xml:lang settings. Confused.

    Posted 06-01-2007 17:21
    On Friday 01 June 2007, Dave Pawson wrote:
    > fo:language="sv"
    > fo:country="SE"
    Obviously this is a document in Swedish (sv), from Sweden (SE).
    Like a glibc locale would say: sv or sv_SE.
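
As a small illustration of the locale analogy, the sketch below joins an fo:language / fo:country pair into a glibc-style locale name. The function name `posix_locale` is made up for this example and is not part of any ODF toolkit.

```python
def posix_locale(language, country=None):
    """Join an fo:language / fo:country pair into a glibc-style locale name."""
    language = language.strip().lower()
    if not country:
        # Language only, e.g. "sv"
        return language
    return f"{language}_{country.strip().upper()}"


print(posix_locale("sv", "SE"))  # sv_SE
print(posix_locale("sv"))        # sv
```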
    
    -asian and -complex are a special case (see also font-name-asian font-family-asian etc.)
    I forgot again why they are separated, we don't use that in KOffice.
    
    -- 
    David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
    Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
    


  • 3.  Re: [office] xml:lang settings. Confused.

    Posted 06-02-2007 07:58
    On 01/06/07, David Faure 


  • 4.  Re: [office] xml:lang settings. Confused.

    Posted 06-07-2007 13:14
    Hi Dave,
    
    On Friday, 2007-06-01 16:13:36 +0100, Dave Pawson wrote:
    
    > Which is the 'definitive' language please? (inaccurate definition)
    
    For mixed content it depends on the Unicode script type.
    If there are CJK sections they are associated with
    > style:language-asian="en"
    > style:country-asian="GB"
    (having set that to en-GB is pretty useless anyway, probably no en-GB
    spell checker will recognize CJK words, and also an en-GB breakiterator
    will not work correctly in that scenario)
    
    If there are CTL script type sections they are associated with
    > style:language-complex="en"
    > style:country-complex="GB"
    (here too, setting that to en-GB is pretty useless)
    
    For all other character based script types (including Latin, Cyrillic, ...)
    > fo:language="sv"
    > fo:country="SE"
    is assigned.
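
The mapping described above can be sketched roughly as follows. The code point ranges are a coarse assumption chosen for illustration (a few CJK and CTL blocks only); they are not the classification used by OpenOffice.org or defined by the ODF specification, and the function names are hypothetical.

```python
# Rough script-type ranges. ASSUMPTION: a handful of Unicode blocks only,
# picked for illustration; real applications use a fuller script classification.
CJK_RANGES = [(0x3040, 0x30FF), (0x4E00, 0x9FFF), (0xAC00, 0xD7AF)]
CTL_RANGES = [(0x0590, 0x05FF), (0x0600, 0x06FF), (0x0900, 0x097F), (0x0E00, 0x0E7F)]

# Attribute pairs named in the thread, keyed by script type.
ATTRS = {
    "western": ("fo:language", "fo:country"),
    "asian": ("style:language-asian", "style:country-asian"),
    "complex": ("style:language-complex", "style:country-complex"),
}


def script_type(ch):
    """Classify one character as 'asian', 'complex' or 'western'."""
    cp = ord(ch)
    if any(lo <= cp <= hi for lo, hi in CJK_RANGES):
        return "asian"
    if any(lo <= cp <= hi for lo, hi in CTL_RANGES):
        return "complex"
    return "western"  # everything else, including Latin and Cyrillic


def language_attributes(text):
    """Return the attribute pairs that apply to the text, in first-seen order."""
    seen = []
    for ch in text:
        if not ch.isalpha():
            continue  # spaces, digits and punctuation carry no script type here
        kind = script_type(ch)
        if kind not in seen:
            seen.append(kind)
    return [ATTRS[kind] for kind in seen]


print(language_attributes("Hej \u65e5\u672c\u8a9e"))
```

For a string mixing Latin and Japanese text, this returns the fo:* pair followed by the style:*-asian pair, which is the sense in which the "definitive" language depends on the content.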
    
      Eike
    
    -- 
     OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
    


  • 5.  Re: [office] xml:lang settings. Confused.

    Posted 06-07-2007 13:48
    On 07/06/07, Eike Rathke 


  • 6.  Re: [office] xml:lang settings. Confused.

    Posted 06-15-2007 17:03
    Hi Dave,
    
    On Thursday, 2007-06-07 14:47:39 +0100, Dave Pawson wrote:
    
    > I'm assuming for mixed docs, somewhere within the body the change in
    > language would be signalled by an xml:lang attribute on the paragraph?
    
    Only if it differs from the inherited setting, and then by using fo:*
    attributes, not xml:lang. However, a paragraph style may have one Western,
    one CJK and one CTL language assigned. These are not repeated when script types
    change.
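
To make "inherited setting" concrete, here is a minimal sketch that follows style:parent-style-name links until it finds a style that sets fo:language. The sample XML is hand-written for this example, the function name `western_language` is hypothetical, and the lookup is simplified: it ignores style:default-style and automatic styles, which a real consumer would also have to consult.

```python
import xml.etree.ElementTree as ET

STYLE = "{urn:oasis:names:tc:opendocument:xmlns:style:1.0}"
FO = "{urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0}"

# Hand-written sample, not taken from a real styles.xml.
SAMPLE = """<office:styles
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:style="urn:oasis:names:tc:opendocument:xmlns:style:1.0"
    xmlns:fo="urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0">
  <style:style style:name="Standard" style:family="paragraph">
    <style:text-properties fo:language="sv" fo:country="SE"/>
  </style:style>
  <style:style style:name="Quote" style:family="paragraph"
               style:parent-style-name="Standard"/>
  <style:style style:name="EnglishQuote" style:family="paragraph"
               style:parent-style-name="Quote">
    <style:text-properties fo:language="en" fo:country="GB"/>
  </style:style>
</office:styles>"""


def western_language(styles_root, style_name):
    """Return (language, country) for a style, following parent styles."""
    by_name = {s.get(STYLE + "name"): s for s in styles_root.iter(STYLE + "style")}
    visited = set()
    while style_name in by_name and style_name not in visited:
        visited.add(style_name)  # guard against circular parent links
        style = by_name[style_name]
        props = style.find(STYLE + "text-properties")
        if props is not None and props.get(FO + "language"):
            return props.get(FO + "language"), props.get(FO + "country")
        style_name = style.get(STYLE + "parent-style-name")
    return None


root = ET.fromstring(SAMPLE)
print(western_language(root, "Quote"))         # ('sv', 'SE')
print(western_language(root, "EnglishQuote"))  # ('en', 'GB')
```

"Quote" sets nothing itself and so picks up sv/SE from "Standard", while "EnglishQuote" overrides it, which matches the "only if it differs" rule above.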
    
    > My question related to (mainly) single language documents,
    > where I need the primary language of the document.
    
    The primary aka default language should be the 


  • 7.  Re: [office] xml:lang settings. Confused.

    Posted 06-16-2007 06:56
    On 15/06/07, Eike Rathke 


  • 8.  Re: [office] xml:lang settings. Confused.

    Posted 06-19-2007 17:37
    Hi Dave,
    
    On Saturday, 2007-06-16 07:56:00 +0100, Dave Pawson wrote:
    
    > I'm OK with that, though why use fo: rather than xml:lang seems
    > a bit NIH?
    
    No idea, I wasn't involved with the original decision.
    
    Btw: NIH? Not Invented Here? (please bear in mind that this is an
    international list and not all participants have a full understanding of
    every English acronym and abbreviation)
    
    
    > >I wish xml:lang was used, would have made the latest adaptation to be able
    > >to support RFC 4646 moot, as xml:lang already says "The values of the
    > >attribute are language identifiers as defined by [IETF RFC 3066], Tags
    > >for the Identification of Languages, or its successor". Which RFC 4646
    > >is.
    > >
    > >Does anyone happen to know why xml:lang exactly was not used?
    > 
    > Even stronger, please can we change to xml:lang then standard XML
    > processors can do what they should do?
    
    What exactly should a "change to" look like?
    
    
    > >> I'm curious. When I initially open a document authored in Japanese or
    > >> Chinese,
    > >> how would I know whether to look at style:language-asian or fo:language?
    > >
    > >I guess you don't without actually looking at the script type of the
    > >textual content.
    > 
    > Which, IMHO, is a big hole in the ODF spec.
    
    Seconded. On the other hand, having two attributes, for example Western
    and CTL, for an entire paragraph gets rid of the need to define
    alternating xml:lang (or whatever) attributes whenever the script type
    changes. And having alternating CTL and Roman scripts is quite common.
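
The trade-off can be seen by counting how many separately tagged runs a mixed paragraph would need if every script change required its own language attribute. This is a sketch only: `is_ctl` assumes that just the Hebrew and Arabic blocks count as CTL, and both function names are invented for the example.

```python
def is_ctl(ch):
    # ASSUMPTION: only the Hebrew and Arabic blocks are treated as CTL here.
    return 0x0590 <= ord(ch) <= 0x06FF


def script_runs(text):
    """Split text into (script_type, run) pairs."""
    runs = []
    for ch in text:
        kind = "complex" if is_ctl(ch) else "western"
        if not ch.isalpha() and runs:
            kind = runs[-1][0]  # spaces and punctuation stay with the current run
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + ch)
        else:
            runs.append((kind, ch))
    return runs


paragraph = "ODF \u05e9\u05dc\u05d5\u05dd and XML \u05e2\u05d5\u05dc\u05dd"
runs = script_runs(paragraph)
print(len(runs))  # 4
```

With per-run tagging this short paragraph already needs four spans, whereas one paragraph style carrying a Western and a CTL language covers all of it.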
    
    
    > >> I guess that defines what I meant by 'primary language' of the document?
    > >
    > >The 


  • 9.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 07:12
    On 19/06/07, Eike Rathke 


  • 10.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 08:51
    Dave,
    
    Dave Pawson wrote:
    > On 19/06/07, Eike Rathke 


  • 11.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 09:08
    On 20/06/07, Michael Brauer 


  • 12.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 13:08
    Hi Dave,
    
    Dave Pawson wrote:
    > On 20/06/07, Michael Brauer 


  • 13.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 14:06
    On 20/06/07, Michael Brauer 


  • 14.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 16:00
    Hi Dave,
    
    On Wednesday, 2007-06-20 15:05:38 +0100, Dave Pawson wrote:
    
    > > [...] the same
    > >way as you can select the font you want to use for the ASCII characters,
    > >you may want to be able to do so for your Asian or CTL characters.
    > 
    > I've not heard of CTL characters. I doubt you mean control characters.
    
    Characters of scripts of CTL type, where CTL == Complex Text Layout, see
    also http://en.wikipedia.org/wiki/Complex_text_layout
    
      Eike
    
    -- 
     OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
    


  • 15.  Re: [office] xml:lang settings. Confused.

    Posted 06-25-2007 13:31
    On Wednesday 20 June 2007, Michael Brauer wrote:
    > > 
    > > Can we just discuss xml:lang please, fonts may be associated
    > > but should be a separable problem.
    > 
    >  From the user perspective it is the same problem. In mixed documents,
    > you not only have to switch the language, but also the fonts and other
    > settings.
    
    But we already have the properties for "fonts and other settings" that can
    be associated with a run of text. And some implementations (e.g. Qt) are able
    to find the font with a given character without the need for the user to explicitly
    specify a font that has that character....
    But let's assume an implementation that doesn't do that. In that case (and only in that case),
    I understand that the reason for this latin/CTL/asian split is that you don't want to have to 
    specify the font of each run of text independently. However this assumes that the installed 
    fonts handle all of the latin characters, all of the CTL characters and all of the asian characters.... 
    which is not the case.
    And even then the split only makes sense as a document-global setting (for newly written text).
    For existing text it would be much simpler to just associate the correct font with each run of text
    rather than having all three; creating new spans where needed, in mixed content.
    
    So I think the latin/CTL/asian split is a hack -- a half-solution which doesn't fully solve the
    problem and only makes it more complicated. I would wish it out of the ODF spec
    (KOffice certainly doesn't have such a weird split), but then people will say again
    that we are against interoperability with Microsoft....
    
    -- 
    David Faure, faure@kde.org, sponsored by Trolltech to work on KDE,
    Konqueror (http://www.konqueror.org), and KOffice (http://www.koffice.org).
    


  • 16.  Re: [office] xml:lang settings. Confused.

    Posted 06-25-2007 13:45
    On Jun 25, 2007, at 9:31 AM, David Faure wrote:
    
    > So I think the latin/CTL/asian split is a hack -- a half-solution which
    > doesn't fully solve the problem and only makes it more complicated. I
    > would wish it out of the ODF spec (KOffice certainly doesn't have such a
    > weird split), but then people will say again that we are against
    > interoperability with Microsoft....
    
    :-)
    
    So what would you propose as a better solution?
    
    Also, how well does the current "hack" handle the issue this user  
    outlines:
    
    


  • 17.  Re: [office] xml:lang settings. Confused.

    Posted 06-25-2007 14:14
    See ...
    
    


  • 18.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 12:03
    Hi Dave,
    
    On Wednesday, 2007-06-20 08:11:54 +0100, Dave Pawson wrote:
    
    > >> Even stronger, please can we change to xml:lang then standard XML
    > >> processors can do what they should do?
    > >
    > >What exactly should a "change to" look like?
    > 
    > That the specification states how xml:lang is set, and that the attribute
    > is used as per http://www.w3.org/TR/xml11/#sec-lang-tag .
    > It would need to address the relationship to the various other metadata
    > usages (or remove them which sounds simpler).
    
    No, I was referring to the change process itself; IF we introduced
    xml:lang we'd have to provide a migration path from fo:* attributes to
    xml:lang and vice versa. Additionally all applications would have to
    adapt their load/store procedures quite significantly. I doubt we'd get
    much acceptance.
    
    > >Well, the 


  • 19.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 12:27
    On 20/06/07, Eike Rathke 


  • 20.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 16:28
    Hi Dave,
    
    On Wednesday, 2007-06-20 13:27:17 +0100, Dave Pawson wrote:
    
    > >No, I was referring to the change process itself; IF we introduced
    > >xml:lang we'd have to provide a migration path from fo:* attributes to
    > >xml:lang and vice versa.
    > 
    > I thought forward compatibility was the issue? rev 1.0 should be readable
    > by 2.0, not the other way round?
    
    Sure. Nevertheless it's nice if we provide a way where 1.2 aware
    applications can write a format that still can be read by 1.1-only
    applications, if possible.
    
    > For forward, it shouldn't be much of a transform above the identity 
    > transform?
    > 
    > >Additionally all applications would have to
    > >adapt their load/store procedures quite significantly.
    > 
    > Why?
    
    Huh? If they didn't they'd write 1.1 fo:* format, not xml:lang, and
    would not be able to read xml:lang sequences.
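
For readers wondering what the attribute mapping itself might involve, here is a minimal sketch that derives an xml:lang value from an existing fo:language / fo:country pair. It shows only the mechanical step; where xml:lang would actually live in ODF, and how the -asian and -complex variants would be handled, is exactly what this thread leaves open. The function name `add_xml_lang` and the `keep_fo` flag are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

FO_NS = "urn:oasis:names:tc:opendocument:xmlns:xsl-fo-compatible:1.0"
FO_LANGUAGE = "{%s}language" % FO_NS
FO_COUNTRY = "{%s}country" % FO_NS
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def add_xml_lang(root, keep_fo=True):
    """Add xml:lang wherever fo:language is set; optionally drop the fo:* pair."""
    for el in root.iter():
        language = el.get(FO_LANGUAGE)
        if not language:
            continue
        country = el.get(FO_COUNTRY)
        el.set(XML_LANG, f"{language}-{country}" if country else language)
        if not keep_fo:
            # A reader that only understands fo:* would no longer see a language.
            el.attrib.pop(FO_LANGUAGE, None)
            el.attrib.pop(FO_COUNTRY, None)
    return root


doc = ET.fromstring(
    '<p xmlns:fo="%s" fo:language="sv" fo:country="SE"/>' % FO_NS
)
add_xml_lang(doc)
print(doc.get(XML_LANG))  # sv-SE
```

Keeping the fo:* attributes alongside xml:lang (`keep_fo=True`) is one way a newer writer could stay readable by older applications, at the cost of storing the same information twice.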
    
    > Note I have no knowledge of the application level.
    > To me the value of odf is the xml on disk. The apps are temporal.
    
    Please note that this TC works on something called
    
    "OASIS Open Document Format for Office Applications (OpenDocument)"
    
    I guess the wording "for Office Applications" was intentional. We're not
    constructing some theoretical storage format. We try to adapt to the
    needs of applications. You do want to process the ODF on disk, not just
    lock it away.
    
      Eike
    
    -- 
     OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
    


  • 21.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 16:33
    On 20/06/07, Eike Rathke 


  • 22.  Re: [office] xml:lang settings. Confused.

    Posted 06-20-2007 08:52
    Hello,
    
    
    I'd like to bring in my two cents of Euro here.
    While I'm in favor of using a standard way to define the linguistic
    settings inside ODF (and thus to use xml:lang), I'd like to point out
    that when it comes to i18n and l10n the format cannot do everything and
    has to be versatile enough to allow extremely various use cases. The use
    of multiple languages and thus encodings inside one document as
    described by Eike is a good example of that.
    
    That requirement of "versatility" and flexibility put aside, we have to
    include that xml:lang inside the 1.2 spec if possible. I guess this
    should be a quite consensual decision so maybe we could do it "fast".
    
    
    Cheers,
    
    Charles.
    
    Eike Rathke a écrit :
    > Hi Dave,
    >
    > On Saturday, 2007-06-16 07:56:00 +0100, Dave Pawson wrote:
    >
    >   
    >> I'm OK with that, though why use fo: rather than xml:lang seems
    >> a bit NIH?
    >>     
    >
    > No idea, I wasn't involved with the original decision.
    >
    > Btw: NIH? Not Invented Here? (please bear in mind that this is an
    > international list and not all participants have a full understanding of
    > every English acronym and abbreviation)
    >
    >
    >   
    >>> I wish xml:lang was used, would have made the latest adaptation to be able
    >>> to support RFC 4646 moot, as xml:lang already says "The values of the
    >>> attribute are language identifiers as defined by [IETF RFC 3066], Tags
    >>> for the Identification of Languages, or its successor". Which RFC 4646
    >>> is.
    >>>
    >>> Does anyone happen to know why xml:lang exactly was not used?
    >>>       
    >> Even stronger, please can we change to xml:lang then standard XML
    >> processors can do what they should do?
    >>     
    >
    > What exactly should a "change to" look like?
    >
    >
    >   
    >>>> I'm curious. When I initially open a document authored in Japanese or
    >>>> Chinese,
    >>>> how would I know whether to look at style:language-asian or fo:language?
    >>>>         
    >>> I guess you don't without actually looking at the script type of the
    >>> textual content.
    >>>       
    >> Which, IMHO, is a big hole in the ODF spec.
    >>     
    >
    > Seconded. On the other hand, having two attributes, for example Western
    > and CTL, for an entire paragraph gets rid of the need to define
    > alternating xml:lang (or whatever) attributes whenever the script type
    > changes. And having alternating CTL and Roman scripts is quite common.
    >
    >
    >   
    >>>> I guess that defines what I meant by 'primary language' of the document?
    >>>>         
    >>> The