OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  Preserve Space / Language Info at the inline level

    Posted 11-18-2014 15:20
    Hi David, all,

    Looking at: http://tools.oasis-open.org/version-control/browse/wsvn/xliff/trunk/xliff-21/xliff-core-v2.1.pdf
    Section "5.9.4.1 ITS Preserve Space Annotation"

    I'm not sure this is the best way to define the annotation for Preserve Space and, I assume, for Language Information later.

    There are two options:

    a) We have two attributes, itsm:space and itsm:lang, that can be set on any <mrk>/<sm> element, regardless of the type (just like translate). In that case we get this type of annotation:

    <mrk id='m1' translate='no' itsm:space='preserve' itsm:lang='zxx'>3x + 5y = 2</mrk>
    <mrk id='m2' type='term' itsm:lang='fr-CA'>poutine</mrk>
    <mrk id='m3' type='itsm:any' itsm:space='preserve'>[ ]=2s</mrk>
    Etc.

    Or b) we decide to force a specific annotation for Preserve Space and for Language Information that is not mixed with others. In that second case, the simplest way to define them would be:

    <mrk id='1' type='itsm:space' value='preserve'>...</mrk>
    <mrk id='2' type='itsm:lang' value='fr-CA'>...</mrk>

    Also, it seems to me that it would be a lot clearer for the reader to have just one ITS Module section (no appendix) and have each data category defined there, regardless of how they are mapped.

    Cheers,
    -yves
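
    As a rough illustration of how the two options differ for a consumer, here is a minimal Python sketch that reads Preserve Space and Language Information from either style. The itsm namespace URI and the helper name read_annotations are assumptions made for this example only; the draft under discussion may define things differently.

```python
import xml.etree.ElementTree as ET

XLIFF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"  # assumed namespace URI for the itsm prefix

# Option (a): itsm:space / itsm:lang attributes on any <mrk>, whatever its type.
OPTION_A = f"""<source xmlns="{XLIFF}" xmlns:itsm="{ITSM}">
<mrk id="m1" translate="no" itsm:space="preserve" itsm:lang="zxx">3x + 5y = 2</mrk>
<mrk id="m2" type="term" itsm:lang="fr-CA">poutine</mrk></source>"""

# Option (b): one dedicated annotation type per data category, using `value`.
OPTION_B = f"""<source xmlns="{XLIFF}" xmlns:itsm="{ITSM}">
<mrk id="1" type="itsm:space" value="preserve">3x + 5y = 2</mrk>
<mrk id="2" type="itsm:lang" value="fr-CA">poutine</mrk></source>"""


def read_annotations(xml_text):
    """Hypothetical helper: map each <mrk> id to the space/lang info it carries."""
    result = {}
    for mrk in ET.fromstring(xml_text).iter(f"{{{XLIFF}}}mrk"):
        info = {}
        # Option (a): look for the namespaced attributes directly.
        for name in ("space", "lang"):
            value = mrk.get(f"{{{ITSM}}}{name}")
            if value is not None:
                info[name] = value
        # Option (b): the annotation type names the category, `value` holds the data.
        mrk_type = mrk.get("type", "")
        if mrk_type in ("itsm:space", "itsm:lang"):
            info[mrk_type.split(":")[1]] = mrk.get("value")
        result[mrk.get("id")] = info
    return result
```

    With option (a) a single marker can carry both categories alongside another annotation such as translate or term; with option (b) each marker carries exactly one category, so combining them means nesting markers.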


  • 2.  Re: Preserve Space / Language Info at the inline level

    Posted 11-18-2014 19:12
    On 18.11.2014 at 16:20, Yves Savourel <ysavourel@enlaso.com> wrote:
    > Also, it seems to me that it would be a lot more clear for the reader to have just one ITS Module section (no appendix) and have each data category defined there, regardless how they are mapped.

    +1.

    Cheers,
    - Felix


  • 3.  RE: [xliff] Preserve Space / Language Info at the inline level

    Posted 11-18-2014 19:14
    Hi Yves, all,

    I thought a bit more about this after today's XLIFF call, and I'm starting to think that we cannot do this properly without a change to the core specification if we want it as a generic feature and not an ITS module feature. Long discussion below.

    Tl;dr: Put the ITS space and language mapping in a separate namespace from the rest of the ITS mapping module, and add a PR (processing requirement) to set "xml:space" to "preserve" on a higher-level element if any span needs it.

    As it currently stands, the validity of an XLIFF document (or its adherence to our processing requirements) is not changed if the document is passed through an XML pretty-printing application that respects the "xml:space" attribute and uses the schema. The default is "default" on everything except the <data> element, where it is restricted to "preserve" and defaults to that.

    Sometimes translatable content needs to include leading or trailing whitespace that must be preserved. That can be done by setting "xml:space" to "preserve" on an ancestor of such content. That would also handle <ignorable> elements, which often contain nothing but whitespace, quite possibly spaces and newlines mixed together.

    If we want to allow an inline notation to control the treatment of whitespace on the kind of logical spans we allow across segments, there is no way we can tell an arbitrary XML processor how they work. So a pretty printer (or storage system, or some other kind of processor) would never see that we want spaces preserved on a particular span, which could lead to, for example, pretty printing violating our processing requirements. I doubt you will find any XML pretty printer or formatter that will not make some change to a bunch of whitespace sitting alone in an <ignorable> but protected by a <sm/><em/> pair in other sibling trees.

    To make the space handling safe with respect to non-XML spans and generic XML processors, we should at a minimum make <source> and <target> use "xml:space" set to "preserve" as their default and only possible value. Treating content that has "xml:space" set to "default" as if it were set to "preserve" is not an error; the opposite is an error. So using standard XML constructs to enforce the more restrictive mode, and then allowing a nonstandard mechanism to relax to a less restrictive mode, should always be safe. If we change the default value and add a constraint that it must be set explicitly if set to "preserve", a 2.x document would still be 100% compatible with a 2.0 document. A 2.0 document could not make use of the new non-XML span space handling, but a 2.0 processor would not cause harm to a 2.x document.

    The same goes for ITS-specified space handling in XLIFF. A non-ITS-aware 2.1 processor (or a 2.0 processor) would have no way to know that whitespace in a particular span must be preserved. So the ITS module must either allow processors that are not aware of it to change "preserve"-tagged whitespace, which seems bad, or require that "xml:space" be set to "preserve" on a higher level whenever a span containing space that should be preserved is encountered.

    I'd like us to eventually have a solution for space handling that is not tied to ITS, that the ITS module would make use of, and under which things like pretty printing would not violate XLIFF processing requirements. If that can't be done right now, we need to at least add the higher-level setting of "xml:space" to "preserve" to the ITS mapping module to make it safe. Perhaps we could break this (and xml:lang) out into a namespace of its own. That way a later version of core could adopt it from the ITS module without semantic changes, thus avoiding the situation where we would have one core and one module feature doing exactly the same thing, or needing to make an incompatible change.

    Regards,
    Fredrik Estreen
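
    The pretty-printing concern can be sketched in a few lines of Python. The normalize function below is a hypothetical stand-in for a generic XML processor that honours "xml:space" but knows nothing about XLIFF or ITS; it is not modelled on any particular tool, and the itsm namespace URI is an assumption. The sketch only shows that whitespace in an <ignorable> survives such a processor when an ancestor sets "xml:space" to "preserve", and not otherwise.

```python
import re
import xml.etree.ElementTree as ET

XLIFF = "urn:oasis:names:tc:xliff:document:2.0"
ITSM = "urn:oasis:names:tc:xliff:itsm:2.1"  # assumed namespace URI
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def collapse(text):
    """Collapse every whitespace run to one space, as a formatter might."""
    return re.sub(r"\s+", " ", text) if text else text


def normalize(elem, inherited="default"):
    """Hypothetical generic processor: respects xml:space, ignores itsm:space."""
    mode = elem.get(XML_SPACE, inherited)
    if mode != "preserve":
        elem.text = collapse(elem.text)
    for child in elem:
        normalize(child, mode)
        # A child's tail is content of this element, so this element's mode applies.
        if mode != "preserve":
            child.tail = collapse(child.tail)


def make_unit(unit_attrs=""):
    """Build a unit whose preserved span starts in one segment and ends in another."""
    return (
        f'<unit xmlns="{XLIFF}" xmlns:itsm="{ITSM}" id="u1"{unit_attrs}>'
        '<segment><source><sm id="m1" type="itsm:any" itsm:space="preserve"/>a</source></segment>'
        '<ignorable><source>  \n  </source></ignorable>'
        '<segment><source>b<em startRef="m1"/></source></segment>'
        '</unit>'
    )


def ignorable_after_processing(xml_text):
    """Run the generic processor and return the <ignorable> source text."""
    root = ET.fromstring(xml_text)
    normalize(root)
    return next(root.iter(f"{{{XLIFF}}}ignorable"))[0].text
```

    Calling ignorable_after_processing(make_unit()) returns a single space, because the inline marker is invisible to the generic processor, while ignorable_after_processing(make_unit(' xml:space="preserve"')) returns the original whitespace untouched.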


  • 4.  Re: [xliff] Preserve Space / Language Info at the inline level

    Posted 11-20-2014 11:17
    Fredrik, thanks for this detailed explanation. IMHO, we cannot change the core, and that is the baseline for the following.

    1) The point about pretty printing is well made; still, it does not warrant a core change AFAIK. We can add a warning explaining that generic XML processors will not understand inline XLIFF/ITS notations for whitespace handling, so people who want to make their XLIFF files safe for such processors should make the set or inherited value of xml:space on <source> and <target> "preserve". That should do the trick, no matter what the other decisions under 2).

    2) I now understand what you mean by an ITS dependency. IMHO this is low risk. As in W3C ITS, our conformance clause for the module should say that it is enough to support one data category. [Similarly, slr says it is enough to support just the predefined profiles.] This category can just as well be the Preserve Space category. That said, ITS categories can be thematically grouped and modules made smaller. This has been done informatively to ease adoption, as 20 categories is a lot.

    Or we can decide to make a module for inline handling of xml namespace attributes. This would have its own namespace, and the ITS mapping would use it to express the Preserve Space and Language Information categories. Possible names for the module could be: Inline Handling of Space and Language OR Core Supplement [;-)]

    Finally, in the light of the above discussion, I think it was a good 2.0 decision to disallow the xml namespace on inlines. Having it on <mrk> and the specialized XLIFF handling only on <sm> would further complicate the issue of generic XML processors, which would interpret only part of the inline set and inherited values, i.e. the part on <mrk> elements, ignoring the values set on <sm> elements that are likely to override the inherited <mrk> values. It also shows there is value in Preserve Space strategies that use core-only means. Hiding the whitespace to be preserved in the original data, where xml:space is restricted to "preserve", seems the only fully expressive and foolproof way.

    Cheers
    dF

    Dr. David Filip
    =======================
    OASIS XLIFF TC Secretary, Editor, and Liaison Officer
    LRC CNGL CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    http://www.cngl.ie/profile/?i=452
    mailto: david.filip@ul.ie
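
    The warning proposed under 1) could be backed by a simple check. The following Python sketch, with the hypothetical function name unsafe_spans and an assumed itsm namespace URI, lists the inline markers that ask for preserved space while the set or inherited "xml:space" value around them is not "preserve". It is only one possible way to detect the situation the warning describes.

```python
import xml.etree.ElementTree as ET

XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ITSM_SPACE = "{urn:oasis:names:tc:xliff:itsm:2.1}space"  # assumed namespace URI

DOC = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0"
       xmlns:itsm="urn:oasis:names:tc:xliff:itsm:2.1" version="2.1" srcLang="en">
<file id="f1">
<unit id="u1"><segment><source>x <mrk id="m1" type="itsm:any" itsm:space="preserve">[ ]=2s</mrk></source></segment></unit>
<unit id="u2" xml:space="preserve"><segment><source><mrk id="m2" type="itsm:any" itsm:space="preserve">a  b</mrk></source></segment></unit>
</file></xliff>"""


def unsafe_spans(elem, inherited="default"):
    """Return ids of markers with itsm:space='preserve' whose effective
    xml:space (set on the element or inherited from an ancestor) is not 'preserve'."""
    mode = elem.get(XML_SPACE, inherited)
    found = []
    if elem.get(ITSM_SPACE) == "preserve" and mode != "preserve":
        found.append(elem.get("id"))
    for child in elem:
        found.extend(unsafe_spans(child, mode))
    return found
```

    For the sample document, unsafe_spans(ET.fromstring(DOC)) reports only m1: the marker in unit u2 is already covered by the inherited "xml:space" value, so a generic XML processor would leave its whitespace alone.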