OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  Proposal for language tags according to RFC 4646

    Posted 11-28-2006 13:18
    Hi,
    
    Michael Brauer already mentioned there are proposals to come, so here is
    my proposal to enable support of any language/dialect/variant in any
    script type in any region.
    
    Currently, ODF sections 15.4.23 and 15.4.24 specify the fo:language and
    fo:country attributes for style-text-properties-attlist. They refer to
    the W3C Extensible Stylesheet Language (XSL) specification,
    http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#country, which in
    turn also separates language and country but refers to RFC 3066 [see
    http://tools.ietf.org/html/rfc3066.html ] language tags. The subtags of
    such a language tag don't necessarily form a language-country pair; a
    tag is a primary subtag followed by zero or more further subtags.
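
To illustrate, a minimal Java sketch of the RFC 3066 syntax (Language-Tag = Primary-subtag *( "-" Subtag ), with a primary subtag of 1 to 8 letters and subtags of 1 to 8 letters or digits). The class `Rfc3066Tag` and its two-letter "country" heuristic are hypothetical helpers, not part of ODF or XSL; the sketch only shows that the second subtag of tags such as en-scouse or i-klingon is not a country code.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper illustrating the RFC 3066 tag syntax. */
class Rfc3066Tag {
    // Primary-subtag = 1*8ALPHA; Subtag = 1*8(ALPHA / DIGIT)
    private static final Pattern SYNTAX =
            Pattern.compile("[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*");

    /** Splits a syntactically valid RFC 3066 tag into its subtags. */
    static List<String> subtags(String tag) {
        if (!SYNTAX.matcher(tag).matches()) {
            throw new IllegalArgumentException("Not an RFC 3066 tag: " + tag);
        }
        return Arrays.asList(tag.split("-"));
    }

    /**
     * Naive check for a language-country pair: a second subtag must exist
     * and consist of exactly two letters (ISO 3166 alpha-2 shape).
     */
    static boolean secondSubtagLooksLikeCountry(String tag) {
        List<String> parts = subtags(tag);
        return parts.size() >= 2 && parts.get(1).matches("[A-Za-z]{2}");
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"en-US", "en-scouse", "i-klingon", "de"}) {
            System.out.println(tag + " -> " + subtags(tag)
                    + ", country-like second subtag: " + secondSubtagLooksLikeCountry(tag));
        }
    }
}
```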
    
    In addition to the style's fo:language and fo:country, at least the
    following are affected:
    
    - "3.1.15 Language", the metadata 


  • 2.  Re: [office] Proposal for language tags according to RFC 4646

    Posted 11-28-2006 13:33
    On 28/11/06, Eike Rathke 


  • 3.  Re: [office] Proposal for language tags according to RFC 4646

    Posted 11-29-2006 12:14
    Hi Dave,
    
    On Tuesday, 2006-11-28 13:32:15 +0000, Dave Pawson wrote:
    
    > >The question now is how to do it in the schema.
    > 
    >  public static String langtag="("+language+
    >    "(-"+script+")?"+
    >    "(-"+region+")?"+
    >    "(-"+variant +")?"+
    >    "(-"+extension + ")?"+
    >    "(-"+privateuse+")?)|"+irregulars;
    
    Well, yes, that almost does it for a language tag string. Note, though,
    that variant and extension in RFC 4646 can occur zero or more times, not
    only zero or one time. IMHO that makes it difficult (or impossible? I'm
    not an expert on that) to express them as attributes in RELAX NG.
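
For comparison with the quoted fragment, a sketch in the same Java style that builds a well-formedness check from the RFC 4646 ABNF, with variant and extension allowed to repeat ('*' instead of '?'). It checks syntax only: the grandfathered production is broad enough to accept many strings, and whether subtags are actually registered would need the IANA registry. The class name `Rfc4646Syntax` is hypothetical, and nothing here says how such a pattern would be expressed in RELAX NG.

```java
import java.util.regex.Pattern;

/** Hypothetical syntax-only check for RFC 4646 language tags. */
class Rfc4646Syntax {
    private static final String ALNUM = "[A-Za-z0-9]";
    private static final String LANGUAGE =
            "(?:[A-Za-z]{2,3}(?:-[A-Za-z]{3}){0,3}"  // 2*3ALPHA [extlang]
            + "|[A-Za-z]{4}"                        // reserved for future use
            + "|[A-Za-z]{5,8})";                    // registered language subtag
    private static final String SCRIPT = "[A-Za-z]{4}";               // ISO 15924
    private static final String REGION = "(?:[A-Za-z]{2}|[0-9]{3})"; // ISO 3166 / UN M.49
    private static final String VARIANT =
            "(?:" + ALNUM + "{5,8}|[0-9]" + ALNUM + "{3})";
    // singleton is any letter except x/X, or a digit
    private static final String EXTENSION =
            "[A-WY-Za-wy-z0-9](?:-" + ALNUM + "{2,8})+";
    private static final String PRIVATEUSE = "[xX](?:-" + ALNUM + "{1,8})+";
    private static final String GRANDFATHERED =
            "[A-Za-z]{1,3}(?:-" + ALNUM + "{2,8}){1,2}";

    // Variant and extension may occur zero or more times.
    private static final String LANGTAG = LANGUAGE
            + "(?:-" + SCRIPT + ")?"
            + "(?:-" + REGION + ")?"
            + "(?:-" + VARIANT + ")*"
            + "(?:-" + EXTENSION + ")*"
            + "(?:-" + PRIVATEUSE + ")?";

    private static final Pattern LANGUAGE_TAG = Pattern.compile(
            "(?:" + LANGTAG + "|" + PRIVATEUSE + "|" + GRANDFATHERED + ")");

    /** True if the whole string matches the Language-Tag production. */
    static boolean isWellFormed(String tag) {
        return LANGUAGE_TAG.matcher(tag).matches();
    }

    public static void main(String[] args) {
        String[] tags = {"de-CH-1901", "sl-rozaj-biske", "sr-Latn-RS", "x-private", "en_US"};
        for (String t : tags) {
            System.out.println(t + " -> " + isWellFormed(t));
        }
    }
}
```

Note that "sl-rozaj-biske" carries two variant subtags, which a single optional variant group cannot match.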
    
      Eike
    
    -- 
     OpenOffice.org Engineering at Sun: http://blogs.sun.com/GullFOSS
    


  • 4.  Re: [office] Proposal for language tags according to RFC 4646

    Posted 12-04-2006 14:13
    Hi,
    
    Eike Rathke wrote:
    > Hi,
    > 
    > Michael Brauer already mentioned there are proposals to come, so here is
    > my proposal to enable support of any language/dialect/variant in any
    > script type in any region.
    > 
    > Currently, ODF sections 15.4.23 and 15.4.24 specify the fo:language and
    > fo:country attributes for style-text-properties-attlist. They refer to
    > the W3C Extensible Stylesheet Language (XSL) specification,
    > http://www.w3.org/TR/2001/REC-xsl-20011015/slice7.html#country, which in
    > turn also separates language and country but refers to RFC 3066 [see
    > http://tools.ietf.org/html/rfc3066.html ] language tags. The subtags of
    > such a language tag don't necessarily form a language-country pair; a
    > tag is a primary subtag followed by zero or more further subtags.
    > 
    > In addition to the style's fo:language and fo:country, at least the
    > following are affected:
    > 
    > - "3.1.15 Language", the metadata 


  • 5.  Re: [office] Proposal for language tags according to RFC 4646

    Posted 12-04-2006 15:01
    Hi Michael,
    
    On Monday, 2006-12-04 15:13:04 +0100, Michael Brauer wrote:
    
    > Since we started using the XSL-FO concept, I think it's reasonable to
    > continue with that. This also ensures compatibility with ODF 1.1
    > applications, which would lose all language/country information when
    > reading an ODF 1.2 file if we switched to a single RFC 4646 attribute now.
    > 
    > I've checked XSL 1.1, which is a proposed recommendation, and it only has
    > language, country and script, too. We could take the script attribute
    > from there,
    
    XSL 1.1 narrows language/country to RFC 3066, which formally covers only
    ISO 639-1, ISO 639-2 and IANA-registered languages. It does not cover
    the upcoming ISO/FDIS 639-3, nor other language designators that might
    be valid according to RFC 4646. I don't know whether that is really an
    obstacle that would affect us; I just wanted to mention it.
    
    > and could add a single attribute from our own namespaces 
    > that contains the
    > 
    > [region] *("-" variant) *("-" extension) ["-" privateuse]
    > fragment of RFC 4646.
    
    Note that the region is (most of the time? always? I need to reread
    RFC 4646) the country.
    
      Eike
    
    


  • 6.  Re: [office] Proposal for language tags according to RFC 4646

    Posted 12-13-2006 19:31
    Hi,
    
    On Monday, 2006-12-04 15:59:55 +0100, Eike Rathke wrote:
    
    > XSL 1.1 narrows language/country to RFC 3066, which formally covers only
    > ISO 639-1, ISO 639-2 and IANA-registered languages. It does not cover
    > the upcoming ISO/FDIS 639-3, nor other language designators that might
    > be valid according to RFC 4646. I don't know whether that is really an
    > obstacle that would affect us; I just wanted to mention it.
    > 
    > > and could add a single attribute from our own namespaces 
    > > that contains the
    > > 
    > > [region] *("-" variant) *("-" extension) ["-" privateuse]
    > > fragment of RFC 4646.
    > 
    > Note that the region is (most of the time? always? I need to reread
    > RFC 4646) the country.
    
    Not always; it can also be a 3-digit UN M.49 code, which formally also
    covers regions such as South America (005).
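
A small sketch of that distinction, assuming (as the discussion implies) that fo:country only takes ISO 3166 alpha-2 codes: a region subtag is either two letters or three digits, and only the former has a place in fo:country. The `RegionSubtag` class is hypothetical.

```java
/** Hypothetical helper classifying an RFC 4646 region subtag. */
class RegionSubtag {
    enum Kind { ISO_3166_ALPHA2, UN_M49_NUMERIC, INVALID }

    static Kind classify(String region) {
        if (region.matches("[A-Za-z]{2}")) {
            return Kind.ISO_3166_ALPHA2; // e.g. "BR": fits fo:country
        }
        if (region.matches("[0-9]{3}")) {
            return Kind.UN_M49_NUMERIC;  // e.g. "005": no fo:country equivalent
        }
        return Kind.INVALID;
    }

    public static void main(String[] args) {
        System.out.println("BR -> " + classify("BR"));
        System.out.println("005 -> " + classify("005"));
    }
}
```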
    
    In fact, with the *:language definitions we are also unable to model
    the 'extlang' (extended language) part of the RFC 4646 'language'
    production (see the ABNF below), nor the 'privateuse' or 'grandfathered'
    tags. So Michael's suggestion to keep using *:language and to place only
    the remainder in an attribute of our own doesn't work.
    
    I think for backwards compatibility we should continue to place the
    2*3ALPHA ISO 639 code in *:language and the 2ALPHA ISO 3166 code in
    *:country, and create a *:script attribute to hold the 4ALPHA ISO 15924
    code. Doing so makes it easy for existing implementations to use the
    information. However, the script attribute may be lost during round
    trips.
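
A minimal sketch of the writer side of this idea, in Java like the snippet quoted earlier in the thread. It assumes the tag has already been checked for well-formedness; the class `LanguageAttributes` and its fields are hypothetical and are not ODF attribute names (the namespace prefix is left open here as *). It fills in the ISO 639, ISO 15924 and ISO 3166 codes where the tag starts with them, and flags when the three codes are not sufficient on their own.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical writer-side sketch, not an ODF API. */
class LanguageAttributes {
    final String language;      // 2*3ALPHA ISO 639 code, or null
    final String script;        // 4ALPHA ISO 15924 code, or null
    final String country;       // 2ALPHA ISO 3166 code, or null
    final boolean needsFullTag; // true if the three codes are not sufficient

    private LanguageAttributes(String language, String script,
                               String country, boolean needsFullTag) {
        this.language = language;
        this.script = script;
        this.country = country;
        this.needsFullTag = needsFullTag;
    }

    static LanguageAttributes fromTag(String tag) {
        List<String> parts = Arrays.asList(tag.split("-"));
        int i = 0;
        String language = null, script = null, country = null;
        if (parts.get(i).matches("[A-Za-z]{2,3}")) {
            language = parts.get(i++);
            if (i < parts.size() && parts.get(i).matches("[A-Za-z]{4}")) {
                script = parts.get(i++);
            }
            if (i < parts.size() && parts.get(i).matches("[A-Za-z]{2}")) {
                country = parts.get(i++);
            }
        }
        // No ISO 639 code at all (private use, most grandfathered tags), or
        // leftover subtags (extlang, UN M.49 region, variants, extensions,
        // private use): the three codes alone cannot represent the tag.
        boolean needsFullTag = language == null || i < parts.size();
        return new LanguageAttributes(language, script, country, needsFullTag);
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"sr-Latn-RS", "de-CH-1901", "es-419", "i-klingon"}) {
            LanguageAttributes a = fromTag(tag);
            System.out.println(tag + " -> language=" + a.language + ", script=" + a.script
                    + ", country=" + a.country + ", needsFullTag=" + a.needsFullTag);
        }
    }
}
```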
    
    In all cases where a combination of those three ISO codes is not
    sufficient, an additional single attribute should be added that
    contains the entire RFC 4646 tag, including the information repeated
    from the *:language, *:country and *:script attributes. The reason for
    replicating that information is simply that otherwise parsing the
    attribute could be a nightmare, whereas existing RFC 4646 parsers know
    how to handle a complete tag. Effectively, a language-tag attribute,
    where present, would override the *:language, *:country and *:script
    attributes.
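
The reading side could then look like the following sketch: the full-tag attribute, if present, wins; otherwise a tag is composed from the separate codes in RFC 4646 order (language, script, region). The method `resolve` and its parameters are hypothetical stand-ins for attribute values read from a document, with null meaning the attribute is absent.

```java
/** Hypothetical reader-side sketch: the full tag overrides separate codes. */
class EffectiveLanguageTag {
    static String resolve(String fullTag, String language, String script, String country) {
        if (fullTag != null && !fullTag.isEmpty()) {
            return fullTag; // complete RFC 4646 tag takes precedence
        }
        if (language == null || language.isEmpty()) {
            return null;    // no primary language to build a tag from
        }
        StringBuilder sb = new StringBuilder(language);
        if (script != null && !script.isEmpty()) {
            sb.append('-').append(script);
        }
        if (country != null && !country.isEmpty()) {
            sb.append('-').append(country);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(resolve("de-CH-1901", "de", null, "CH"));
        System.out.println(resolve(null, "sr", "Latn", "RS"));
    }
}
```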
    
    I'll try to mock up some specification text for this.
    
    
    Here is the RFC 4646 syntax of the language tag in ABNF [RFC4234]:
    
       Language-Tag  = langtag
                     / privateuse             ; private use tag
                     / grandfathered          ; grandfathered registrations
    
       langtag       = (language
                        ["-" script]
                        ["-" region]
                        *("-" variant)
                        *("-" extension)
                        ["-" privateuse])
    
       language      = (2*3ALPHA [ extlang ]) ; shortest ISO 639 code
                     / 4ALPHA                 ; reserved for future use
                     / 5*8ALPHA               ; registered language subtag
    
       extlang       = *3("-" 3ALPHA)         ; reserved for future use
    
                                              ; NOTE: RFC 4646bis, waiting
                                              ; for ISO 639-3 to be
                                              ; finalized, will replace this
                                              ; comment with:
                                              ; specific ISO 639-3 codes
    
       script        = 4ALPHA                 ; ISO 15924 code
    
       region        = 2ALPHA                 ; ISO 3166 code
                     / 3DIGIT                 ; UN M.49 code
    
       variant       = 5*8alphanum            ; registered variants
                     / (DIGIT 3alphanum)
    
       extension     = singleton 1*("-" (2*8alphanum))
    
       singleton     = %x41-57 / %x59-5A / %x61-77 / %x79-7A / DIGIT
                     ; "a"-"w" / "y"-"z" / "A"-"W" / "Y"-"Z" / "0"-"9"
                     ; Single letters: x/X is reserved for private use
    
       privateuse    = ("x"/"X") 1*("-" (1*8alphanum))
    
       grandfathered = 1*3ALPHA 1*2("-" (2*8alphanum))
                       ; grandfathered registration
                       ; Note: i is the only singleton
                       ; that starts a grandfathered tag
    
       alphanum      = (ALPHA / DIGIT)       ; letters and numbers
    
                            Figure 1: Language Tag ABNF
    
       Note: There is a subtlety in the ABNF for 'variant': variants
       starting with a digit MAY be four characters long, while those
       starting with a letter MUST be at least five characters long.
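
A one-method sketch of that rule for a single variant subtag; the `VariantSubtag` class is hypothetical, and its two regular expressions mirror the two alternatives of the 'variant' production.

```java
/** Hypothetical helper checking one RFC 4646 variant subtag. */
class VariantSubtag {
    // variant = 5*8alphanum / (DIGIT 3alphanum)
    static boolean isValid(String s) {
        return s.matches("[A-Za-z0-9]{5,8}")       // five to eight characters
                || s.matches("[0-9][A-Za-z0-9]{3}"); // digit-initial, four characters
    }

    public static void main(String[] args) {
        for (String s : new String[] {"1901", "rozaj", "abcd", "1a2b"}) {
            System.out.println(s + " -> " + isValid(s));
        }
    }
}
```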
    
    