OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  mrk translate outside the content but in scope

    Posted 10-28-2013 15:37
    Hi all,

    As I was trying to implement a simple pseudo-translation command in our library, I ran into a problem with how to handle the translate feature in <mrk>. For the <segment> element we have the following definition for the translate attribute:

    [[ When used in any other admissible structural element: The value of the translate attribute of its parent element. ]]

    But annotations can be placed in <ignorable> elements. One fairly common use of <ignorable> is to store things such as opening and closing codes that enclose the whole content of the segment, which allows for "cleaner" segments. Annotations may get the same treatment. Another way to get such a unit is when a segmenter puts breaks as close as possible to the text. The bottom line is that nothing prevents you from getting a unit like this:

        <unit id="1">
          <segment id="s1">
            <source>T-Sentence 1.</source>
          </segment>
          <ignorable>
            <source> <sm id="m1" translate="no"/></source>
          </ignorable>
          <segment id="s2">
            <source>NT-Sentence 2.</source>
          </segment>
          <ignorable>
            <source><em startRef="m1"/></source>
          </ignorable>
          <segment id="s3">
            <source>T-Sentence 3.</source>
          </segment>
        </unit>

    In such a case a tool would get a default of translate='yes' for segment s2, even though its content is clearly intended (and coded) to be non-translatable.

    The solution would be to change the definition so that the default first takes into account the translate state at the end of the previous <segment> or <ignorable> element. Note that this is potentially difficult to implement: you may have to look inside all previous siblings of a <segment> to get its default translate value.

    I do realize this comment/question is outside the comment period and (I think) is not related to any of our open comments, so technically it should not be addressed until the next round of comments. But hopefully the OASIS process has a better way to handle cases like this.

    Cheers,
    -yves
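    The gap can be reproduced with a minimal Python sketch that uses only the standard library. It parses the example unit and computes each segment's default twice. The first pass follows the current wording, where the value is inherited from the parent unit only. The second follows the proposed change, which carries open translate markers over from the previous <segment> or <ignorable>. Several simplifying assumptions apply. Element names are unprefixed, whereas real XLIFF 2.0 uses the urn:oasis:names:tc:xliff:document:2.0 namespace. The unit falls back to 'yes' because no <group> or <file> value is modelled. An explicit translate attribute on a segment still wins over the carried value.

```python
import xml.etree.ElementTree as ET

# The unit from the message above (namespace omitted for brevity).
EXAMPLE = """<unit id="1">
  <segment id="s1"><source>T-Sentence 1.</source></segment>
  <ignorable><source> <sm id="m1" translate="no"/></source></ignorable>
  <segment id="s2"><source>NT-Sentence 2.</source></segment>
  <ignorable><source><em startRef="m1"/></source></ignorable>
  <segment id="s3"><source>T-Sentence 3.</source></segment>
</unit>"""

PARTS = ("segment", "ignorable")


def naive_defaults(unit):
    """Current wording: a segment's default comes only from its parent unit."""
    unit_value = unit.get("translate", "yes")  # assumption: no <group>/<file> value
    return {part.get("id"): part.get("translate", unit_value)
            for part in unit if part.tag == "segment"}


def carried_defaults(unit):
    """Proposed wording: the default is the translate state left open by the
    previous <segment> or <ignorable> (an explicit attribute still wins)."""
    unit_value = unit.get("translate", "yes")
    open_markers = []  # (id, value) of translate markers not yet closed
    defaults = {}
    for part in unit:
        if part.tag not in PARTS:
            continue
        if part.tag == "segment":
            carried = open_markers[-1][1] if open_markers else unit_value
            defaults[part.get("id")] = part.get("translate", carried)
        # Update the open markers with whatever this part opens or closes.
        for node in part.find("source").iter():
            if node.tag == "sm" and node.get("translate") is not None:
                open_markers.append((node.get("id"), node.get("translate")))
            elif node.tag == "em":
                ref = node.get("startRef")
                open_markers = [m for m in open_markers if m[0] != ref]
    return defaults


unit = ET.fromstring(EXAMPLE)
print(naive_defaults(unit))    # {'s1': 'yes', 's2': 'yes', 's3': 'yes'}
print(carried_defaults(unit))  # {'s1': 'yes', 's2': 'no', 's3': 'yes'}
```

    As the second line of output shows, only the carried version marks s2 as non-translatable. It gets there by reading the content of earlier siblings, which is the implementation cost mentioned above.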


  • 2.  Re: [xliff] mrk translate outside the content but in scope

    Posted 11-04-2013 12:55
    Thanks, Yves, I can see the issue and have suggested to Bryan that he put it on the agenda tomorrow. Leaving the administrative issue of a late comment aside for now:

    The issue you outline is more general, in the sense that you do not even need an <ignorable> to run into it. You can have a non-translatable span that starts with an <sm> in the 1st segment, runs through the 2nd segment, and terminates within the 3rd segment with an <em> tied to the <sm>. This is again a situation where the default translate value of the segment should be overridden by the value from the (non-well-formed) marked span.

    So I believe that the general solution is to say that recursively inherited defaults on <segment> and <mrk> need to be checked against possible <sm>/<em> overrides within the unit.

    Rgds
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    ---------------------------------------------------------------------
    To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at: https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
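    David's more general case needs no <ignorable> at all. The Python sketch below works on a hypothetical unit; its ids and text are invented for illustration. For each segment it reports the structurally inherited default together with any <sm>/<em> translate overrides still open when the segment starts. That is the check he proposes adding on top of recursive inheritance. As a simplifying assumption, the unit falls back to 'yes' because no enclosing <group> or <file> is modelled.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit: a non-translatable span opened in s1 and closed in s3.
SPANNING = """<unit id="u1">
  <segment id="s1"><source>Alpha. <sm id="m1" translate="no"/>Beta.</source></segment>
  <segment id="s2"><source>Gamma.</source></segment>
  <segment id="s3"><source>Delta.<em startRef="m1"/> Epsilon.</source></segment>
</unit>"""


def overrides_at_segment_start(unit):
    """Map segment id -> (inherited default, translate overrides open at its start)."""
    unit_value = unit.get("translate", "yes")  # assumption: no enclosing <group>/<file>
    still_open = {}  # annotation id -> translate value, in opening order
    report = {}
    for part in unit:
        if part.tag not in ("segment", "ignorable"):
            continue
        if part.tag == "segment":
            report[part.get("id")] = (part.get("translate", unit_value), dict(still_open))
        for node in part.find("source").iter():
            if node.tag == "sm" and node.get("translate") is not None:
                still_open[node.get("id")] = node.get("translate")
            elif node.tag == "em":
                still_open.pop(node.get("startRef"), None)
    return report


for seg_id, (inherited, overrides) in overrides_at_segment_start(ET.fromstring(SPANNING)).items():
    print(seg_id, inherited, overrides)
```

    Here s2 inherits 'yes' but starts inside m1, so the inherited default has to give way to the open 'no' override. s3 also starts inside m1, but only up to its <em/>.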


  • 3.  RE: [xliff] mrk translate outside the content but in scope

    Posted 11-04-2013 15:36
    Hi David, all,

    > The issue you outline is more general in the sense
    > that you do not even need to use an ignorable to get into
    > the described issue.

    Yes, that was just a concrete example.

    > So I believe that the general solution is to say that recursively
    > inherited defaults on <segment> and <mrk> need to be checked
    > against possible <sm>/<em> overrides within the unit.

    I think we need to be a little more specific, because it is quite complicated. The current text:

    [[ "When used in any other admissible structural element: The value of the translate attribute is first set to the translate of its parent element" ]]

    should be split into two parts, something like this:

    [[
    When used in <group> or <unit>:
        The value of the translate attribute of its parent element.
    When used in <segment>:
        - When applied to the source content:
            - If the segment is the first among all <segment> or <ignorable> elements in this unit:
              the default value of translate is the same as that of the parent unit.
            - Otherwise:
              the default value is the same as the translate value at the end of the processing of the previous <segment> or <ignorable> element.
        - When applied to the target content:
            - If the segment is the first in the target order in this unit:
              the default value of translate is the same as that of the parent unit.
            - Otherwise:
              the default value is the same as the translate value at the end of the processing of the previous <segment> or <ignorable> element in the target order.
    ]]

    The requirement of dealing with the target order has an important implication. Potentially you have to look even at <segment> or <ignorable> elements that come (physically in the file) after the one you are processing before you can determine the correct translate default.

    I think we should have some examples to help users visualize this. I'm also looking forward to more than one SOU (Statement of Use) stating that this is implemented.

    Cheers,
    -yves

    PS: On a side note: now, after all the tweaks done to <segment>, it looks like segment markers really are better treated as annotations, as they were in 1.2. The problem with segment markers in 1.2 was really that backward compatibility had to be maintained, so one could not have a special element for them. But using an annotation-like mechanism wasn't dumb at all. Having a "source/target within a segment" structure like now is fine for simple cases. As soon as you have to work with re-segmentation or different ordering, though, that structure shows its limitations, while the "segment within source/target" structure has none of these issues.
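    The target-order implication can be illustrated with a short Python sketch. It does not implement the full proposed wording, because what "the end of the processing" means is not pinned down. It only shows which translate markers are already open when each segment's target starts. Assumptions: a part's target position is its <target order="n"> value when present, otherwise its natural 1-based position among the unit's parts, and the unit below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical unit whose target swaps the two segments.
REORDERED = """<unit id="u2">
  <segment id="s1">
    <source>One <sm id="m1" translate="no"/>two.</source>
    <target order="2">ONE TWO.<em startRef="m1"/></target>
  </segment>
  <segment id="s2">
    <source>Three<em startRef="m1"/> four.</source>
    <target order="1"><sm id="m1" translate="no"/>THREE FOUR.</target>
  </segment>
</unit>"""


def target_sequence(unit):
    """Parts sorted by target position (explicit order, else natural position)."""
    parts = [p for p in unit if p.tag in ("segment", "ignorable")]

    def position(item):
        natural, part = item
        target = part.find("target")
        order = target.get("order") if target is not None else None
        return int(order) if order is not None else natural

    return [part for _, part in sorted(enumerate(parts, start=1), key=position)]


def open_at_target_start(unit):
    """Map segment id -> translate markers already open when its target begins."""
    still_open = {}
    report = {}
    for part in target_sequence(unit):
        if part.tag == "segment":
            report[part.get("id")] = dict(still_open)
        target = part.find("target")
        if target is None:
            continue
        for node in target.iter():
            if node.tag == "sm" and node.get("translate") is not None:
                still_open[node.get("id")] = node.get("translate")
            elif node.tag == "em":
                still_open.pop(node.get("startRef"), None)
    return report


unit = ET.fromstring(REORDERED)
print([p.get("id") for p in target_sequence(unit)])  # ['s2', 's1']
print(open_at_target_start(unit))                    # {'s2': {}, 's1': {'m1': 'no'}}
```

    The target of s1 starts inside a span opened by s2, which comes physically after s1 in the file. This is the look-ahead described above.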


  • 4.  Re: [xliff] mrk translate outside the content but in scope

    Posted 11-04-2013 18:10
    Yves, inline.

    On Mon, Nov 4, 2013 at 3:36 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

    > Hi David, all,
    >
    > > The issue you outline is more general in the sense
    > > that you do not even need to use an ignorable to get into
    > > the described issue.
    >
    > Yes, that was just a concrete example.
    >
    > > So I believe that the general solution is to say that recursively
    > > inherited defaults on <segment> and <mrk> need to be checked
    > > against possible <sm>/<em> overrides within the unit.
    >
    > I think we need to be a little bit more specific because it is quite complicated.

    Of course we need a specific solution for the spec.

    > [[ "When used in any other admissible structural element: The value of the translate attribute is first set to the translate of its parent element" ]]
    > Should split into two parts and be something like this:
    > [[ When used in <group> or <unit>: The value of the translate attribute of its parent element.

    Yes, this one is simple.

    > When used in <segment>:
    > - When applied to the source content:
    >     - If the segment is the first among all <segment> or <ignorable> element in this unit:
    >       The default value of translate is the same as the one of the parent unit.
    >     - otherwise:
    >       The default value is the same as the translate value at the end of the processing of the previous <segment> or <ignorable> element.
    > - When applied to the target content:
    >     - If the segment is the first in the target order in this unit:
    >       The default value of translate is the same as the one of the parent unit.
    >     - Otherwise:
    >       The default value is the same as the translate value at the end of the processing of the previous <segment> or <ignorable> element in the target order.
    > ]]

    This is a possible approach, although I do not know what you mean by "at the end of the processing". It also misses the fact that the same issue that affects <segment> affects <mrk> as well.

    Also, I'd prefer to determine the default in two steps:
    1. Look at what default value got inherited recursively.
    2. Check whether it is overridden locally by an <sm>/<em> pair.

    Please note that you are only looking for the closest enclosing occurrence, and that you are done if you do not find any <sm> within the given <unit>. This is just to explain the approach, not to flesh out the exact wording yet.

    Also please note that we should avoid prescribing the implementation. An implementer can still implement either style of defining the default, including the one described by Yves, as they appear to be logically equivalent.

    > The requirement of dealing with the target order has some important implication: Potentially you have to look even at the <segment> or <ignorable> elements that comes (physically in the file) after the one you are processing before you can guess the correct translate default.

    I believe this would also be simplified if the default were determined as a local override of the structurally inherited value.

    > I think we should have some examples for the user to visualize this.

    I agree.

    > I'm also looking forward to more than one SOU stating this is implemented.
    > Cheers,
    > -yves
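    David's two-step reading can also be sketched in Python. Step 1 takes the structurally inherited value, which is the segment's own value or else the unit's. Step 2 scans backwards through the unit for the closest <sm> whose <em> has not yet been seen. If there is none, the inherited value stands. Assumptions:

    - Only source content is handled.
    - Element names are unprefixed.
    - There is no <group>/<file> inheritance.
    - Text inside <ignorable> is not reported.
    - <mrk> translate values are not handled; other inline elements are simply descended into.

    The helper names are illustrative only.

```python
import xml.etree.ElementTree as ET


def walk(elem):
    """Yield inline tokens in document order: text, <sm/> and <em/>."""
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        if child.tag == "sm":
            yield ("sm", child.get("id"), child.get("translate"))
        elif child.tag == "em":
            yield ("em", child.get("startRef"))
        else:
            yield from walk(child)  # e.g. <pc>; <mrk> translate is not handled here
        if child.tail:
            yield ("text", child.tail)


def resolve_two_step(unit):
    """Return (text, translate) for every non-blank text run in the unit's segments."""
    unit_value = unit.get("translate", "yes")  # assumption: no <group>/<file> value
    stream = []
    for part in unit:
        if part.tag not in ("segment", "ignorable"):
            continue
        # Step 1: the structurally inherited value (None marks ignorable content).
        inherited = part.get("translate", unit_value) if part.tag == "segment" else None
        for token in walk(part.find("source")):
            stream.append(("text", token[1], inherited) if token[0] == "text" else token)

    results = []
    for i, token in enumerate(stream):
        if token[0] != "text" or token[2] is None or not token[1].strip():
            continue
        value = token[2]
        closed = set()
        # Step 2: the closest enclosing <sm>, i.e. the nearest earlier one not yet closed.
        for earlier in reversed(stream[:i]):
            if earlier[0] == "em":
                closed.add(earlier[1])
            elif earlier[0] == "sm" and earlier[2] is not None and earlier[1] not in closed:
                value = earlier[2]
                break
        results.append((token[1].strip(), value))
    return results


# The unit from Yves's first message (using startRef on <em>).
EXAMPLE = """<unit id="1">
  <segment id="s1"><source>T-Sentence 1.</source></segment>
  <ignorable><source> <sm id="m1" translate="no"/></source></ignorable>
  <segment id="s2"><source>NT-Sentence 2.</source></segment>
  <ignorable><source><em startRef="m1"/></source></ignorable>
  <segment id="s3"><source>T-Sentence 3.</source></segment>
</unit>"""

print(resolve_two_step(ET.fromstring(EXAMPLE)))
# [('T-Sentence 1.', 'yes'), ('NT-Sentence 2.', 'no'), ('T-Sentence 3.', 'yes')]
```

    The backward scan stops at the first open override, so a unit without any <sm> costs only the inherited lookup. This matches the "you are done if you do not find any <sm>" remark.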


  • 5.  RE: [xliff] mrk translate outside the content but in scope

    Posted 11-04-2013 19:43
    Maybe some use cases will help to clarify things.

    -- case 1)
        <unit id='1'>
          <segment translate='no'>
            <source>t1 <sm id='1' translate='yes'/></source>
          </segment>
          <segment translate='no'>
            <source>t2 <em startRef='1'/>t3</source>
          </segment>
        </unit>
        -> t2 is translatable
        -> t3 is not translatable

    -- case 2)
        <unit id='1' translate='no'>
          <segment translate='no'>
            <source>t1 <sm id='1' translate='yes'/></source>
          </segment>
          <segment>
            <source>t2 <em startRef='1'/>t3</source>
          </segment>
        </unit>
        -> t2 is translatable
        -> t3 is not translatable

    -- case 3)
        <unit id='1' translate='yes'>
          <segment translate='no'>
            <source>t1 <sm id='1' translate='yes'/></source>
          </segment>
          <segment>
            <source>t2 <em startRef='1'/>t3</source>
          </segment>
        </unit>
        -> t2 is translatable
        -> t3 is translatable or not???

    -- case 4)
        <unit id='1' translate='no'>
          <segment translate='yes'>
            <source>t1 <sm id='1' translate='no'/>t2 <sm id='2' translate='yes'/></source>
          </segment>
          <segment>
            <source>t3 <em startRef='2'/>t4 <em startRef='1'/>t5</source>
          </segment>
        </unit>
        -> t3 is translatable
        -> t4 is not translatable
        -> t5 is translatable or not???

    -- case 5)
        <unit id='1' translate='no'>
          <segment translate='yes'>
            <source>t1 <sm id='1' translate='no'/>t2 <sm id='2' translate='yes'/></source>
          </segment>
          <segment translate='yes'>
            <source>t3 <em startRef='2'/>t4 </source>
          </segment>
          <segment>
            <source><em startRef='1'/>t5</source>
          </segment>
        </unit>
        -> t3 is translatable
        -> t4 is not translatable (the translate='yes' in that <segment> is non-applicable)
        -> t5 is translatable or not???

    -ys


  • 6.  Re: [xliff] mrk translate outside the content but in scope

    Posted 11-04-2013 21:57
    Yves, I am reacting to the use cases inline, using my proposed algorithm. I first thought that yours should produce the same results, but probably not, as your question marks do not seem puzzling to me.

    Cheers
    dF

    On Mon, Nov 4, 2013 at 7:43 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

    > Maybe some use cases will help to clarify things.
    >
    > -- case 1)
    >   <unit id='1'>
    >     <segment translate='no'>
    >       <source>t1 <sm id='1' translate='yes'/></source>
    >     </segment>
    >     <segment translate='no'>
    >       <source>t2 <em startRef='1'/>t3</source>
    >     </segment>
    >   </unit>
    > -> t2 is translatable
    > -> t3 is not translatable

    Agreed.

    > -- case 2)
    >   <unit id='1' translate='no'>
    >     <segment translate='no'>
    >       <source>t1 <sm id='1' translate='yes'/></source>
    >     </segment>
    >     <segment>
    >       <source>t2 <em startRef='1'/>t3</source>
    >     </segment>
    >   </unit>
    > -> t2 is translatable
    > -> t3 is not translatable

    Agreed.

    > -- case 3)
    >   <unit id='1' translate='yes'>
    >     <segment translate='no'>
    >       <source>t1 <sm id='1' translate='yes'/></source>
    >     </segment>
    >     <segment>
    >       <source>t2 <em startRef='1'/>t3</source>
    >     </segment>
    >   </unit>
    > -> t2 is translatable

    Agreed.

    > -> t3 is translatable or not???

    IMHO t3 is translatable as well. t1 is in the scope of its <segment translate="no">, which is not overridden, hence it is not translatable. t3 is in the scope of its <unit translate="yes">, inherited through its parent segment and not overridden by an <sm/>/<em/> pair, hence it is translatable. I do not see why it should even be considered not translatable. Where is the puzzle?

    BTW, t2 is set as translatable twice (redundantly) in this example.

    > -- case 4)
    >   <unit id='1' translate='no'>
    >     <segment translate='yes'>
    >       <source>t1 <sm id='1' translate='no'/>t2 <sm id='2' translate='yes'/></source>
    >     </segment>
    >     <segment>
    >       <source>t3 <em startRef='2'/>t4 <em startRef='1'/>t5</source>
    >     </segment>
    >   </unit>
    > -> t3 is translatable

    Yes, the unit default is overridden by the <sm/>/<em/> pair #2.

    > -> t4 is not translatable

    Yes, the unit default is redundantly confirmed by the <sm/>/<em/> pair #1.

    > -> t5 is translatable or not???

    It is not translatable, by its unit default, which has not been overridden by anything. Again, where is the puzzle?

    > -- case 5)
    >   <unit id='1' translate='no'>
    >     <segment translate='yes'>
    >       <source>t1 <sm id='1' translate='no'/>t2 <sm id='2' translate='yes'/></source>
    >     </segment>
    >     <segment translate='yes'>
    >       <source>t3 <em startRef='2'/>t4 </source>
    >     </segment>
    >     <segment>
    >       <source><em startRef='1'/>t5</source>
    >     </segment>
    >   </unit>
    > -> t3 is translatable

    I agree: the inner <sm/>/<em/> pair wins over the outer one.

    > -> t4 is not translatable (the translate='yes' in that <segment> is non-applicable)

    It applies, but t4 is enclosed in the overriding <sm/>/<em/> pair #1.

    > -> t5 is translatable or not???

    t5 is not translatable, by its unit default, which has been neither flipped on the segment nor overridden by an <sm/>/<em/> pair.
    >
    > -ys


  • 7.  RE: [xliff] mrk translate outside the content but in scope

    Posted 11-05-2013 01:02
    Hi David, all,

    > Yves, I am reacting to the use cases inline using
    > my proposed algorithm..
    > I first thought that yours should produce the same
    > results but probably not, as your question marks do not
    > seem puzzling to me..

    I put the question marks because it wasn't clear to me what the result of your algorithm was. Now I know.

    I've tried to implement this, and here are my conclusions:

    -- We actually do not want to change the current definition of the defaults: they are fine. It's not the default value of the translate attribute of <segment> that depends on the overrides. What depends on them is how the value (set or inherited) is applied to the content.

    -- What we are missing is a description of how an agent determines the translate state of the text at any point in the source and the target content. I'm not sure where that should be added; the translate attribute section is a possible place.

    Here is the algorithm I've tried:

    Resolved value = value directly set or obtained by inheritance
    Part = a <segment> or an <ignorable> element

    - Create a stack of translate values.
    - Push a single value onto the stack (the actual value does not matter).
    - Get the list of the parts in the unit:
        - for the source content, use the parts in the order they appear in the document;
        - for the target content, use the parts in the order set by the target order attributes.
    - Starting at the first part of the list, for each part:
        - Set the bottom of the stack to the translate value of the part:
            - for a segment: the resolved translate value of the segment;
            - for an ignorable: the resolved translate value of the parent unit.
        - Iterate through the content:
            - If an opening marker for a Translate Annotation is found:
                - push the translate value of the annotation onto the top of the stack.
            - If a closing marker for a Translate Annotation is found:
                - remove the corresponding item from the stack.

    At any time, the top of the stack holds the translate value to apply to the current content.

    This method has one caveat: the closing marker of each Translate Annotation must come after its corresponding opening marker. Technically I don't think this is enforced by a constraint or a PR (Processing Requirement). It should be OK for the source content, but the target ordering may introduce some problems if it's not done properly.

    There are probably other ways to do it. Hopefully Bryan will come up with a nice way to do it in XSLT.

    Cheers,
    -yves
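    The stack algorithm above translates fairly directly into Python. This is a sketch, not a reference implementation, and it rests on several assumptions:

    - Element names are unprefixed.
    - There is no <group>/<file> inheritance, so the unit falls back to 'yes'.
    - Only <sm>/<em> markers that carry a translate attribute are treated as Translate Annotations.
    - A part's target position is its order attribute when present, otherwise its natural position.
    - <ignorable> parts take the unit's value, as in the post.

    Because <sm>/<em> pairs need not nest, a closing marker removes its own entry wherever it sits in the stack, not just at the top.

```python
import xml.etree.ElementTree as ET


def walk(elem):
    """Yield inline tokens in document order: text, <sm/> and <em/>."""
    if elem.text:
        yield ("text", elem.text)
    for child in elem:
        if child.tag == "sm":
            yield ("sm", child.get("id"), child.get("translate"))
        elif child.tag == "em":
            yield ("em", child.get("startRef"))
        else:
            yield from walk(child)
        if child.tail:
            yield ("text", child.tail)


def parts_in_order(unit, side):
    """Source: document order. Target: order attribute, else natural position."""
    parts = [p for p in unit if p.tag in ("segment", "ignorable")]
    if side == "source":
        return parts

    def position(item):
        natural, part = item
        target = part.find("target")
        order = target.get("order") if target is not None else None
        return int(order) if order is not None else natural

    return [part for _, part in sorted(enumerate(parts, start=1), key=position)]


def translate_states(unit, side="source"):
    """Return (text, translate) for each non-blank text run, per the stack algorithm."""
    unit_value = unit.get("translate", "yes")  # assumption: no <group>/<file> value
    stack = [(None, "yes")]  # bottom entry; its initial value does not matter
    states = []
    for part in parts_in_order(unit, side):
        content = part.find(side)
        if content is None:
            continue
        # Bottom of the stack = resolved value of the part.
        if part.tag == "segment":
            stack[0] = (None, part.get("translate", unit_value))
        else:
            stack[0] = (None, unit_value)
        for token in walk(content):
            if token[0] == "sm" and token[2] is not None:
                stack.append((token[1], token[2]))
            elif token[0] == "em":
                # Remove the matching entry, which is not necessarily the top one.
                for i in range(len(stack) - 1, 0, -1):
                    if stack[i][0] == token[1]:
                        del stack[i]
                        break
            elif token[0] == "text" and token[1].strip():
                states.append((token[1].strip(), stack[-1][1]))
    return states


CASE_5 = """<unit id='1' translate='no'>
  <segment translate='yes'>
    <source>t1 <sm id='1' translate='no'/>t2 <sm id='2' translate='yes'/></source>
  </segment>
  <segment translate='yes'>
    <source>t3 <em startRef='2'/>t4 </source>
  </segment>
  <segment>
    <source><em startRef='1'/>t5</source>
  </segment>
</unit>"""

print(translate_states(ET.fromstring(CASE_5)))
# [('t1', 'yes'), ('t2', 'no'), ('t3', 'yes'), ('t4', 'no'), ('t5', 'no')]
```

    Applied to the five use cases posted earlier in this thread, this sketch reproduces David's answers. In particular, t3 is translatable in case 3, and t5 is not translatable in cases 4 and 5. So, at least on these examples, the stack algorithm and the inherit-then-override reading agree. Note that non-blank text inside an <ignorable> would also be reported here with the unit's value. That is the point David questions in his reply.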


  • 8.  Re: [xliff] mrk translate outside the content but in scope

    Posted 11-05-2013 10:36
    Thanks, Yves. Most of it makes sense to me, and IMHO, as a bottom line, it actually does not require a normative change, except possibly adding a constraint requiring each <em/> to come logically after its <sm/>. Now that I think of it, we always assume that <sm/> and <em/> come in pairs (e.g. we do not allow <em/> to have its own id), but I am not sure we say that explicitly anywhere in the spec.

    Detailed answers inline:

    On Tue, Nov 5, 2013 at 1:01 AM, Yves Savourel <ysavourel@enlaso.com> wrote:

    > Hi David, all,
    >
    > > Yves, I am reacting to the use cases inline using
    > > my proposed algorithm..
    > > I first thought that yours should produce the same
    > > results but probably not, as your question marks do not
    > > seem puzzling to me..
    >
    > I put the question marks because it wasn't clear to me what was the result of your algorithm. Now I know.

    Good, and is it the same as yours?

    > I've tried to implement this and here are my conclusions:
    > -- We actually do not want to change the current definition with the defaults: they are fine.

    I agree.

    > It's not the default value of the translate attribute of <segment> that depends on the overrides, it's how the value (set or inherited) is applied to the content.

    I agree.

    > -- What we are missing is a description of how an agent determines the translate state of a text is at any point in the source and the target content.

    IMHO this description should be added as a non-normative example, as otherwise we would be pushing an implementation.

    > I'm not sure where that should be added. The translate attribute section is a possible place.

    Sounds plausible; I will shout if I find a better place. We could place a warning in the attribute description, under the description of the default values. It would say that these defaults may be overridden by values set in <sm/>/<em/> pairs throughout the enclosing unit, and that implementers should be careful about that. Your stack algorithm description could come as a possible implementation example, and I would try to flesh out my abstract algorithm description. If Bryan or Tom can add an XSLT-based example, that would be very good.

    > Here is the algorithm I've tried:
    > Resolved value = value directly set or obtained by inheritance
    > Part = a segment or an ignorable element
    > - Create a stack of translate values
    > - Push a single value in the stack (the actual value does not matter).
    > - Get the list of the parts in the unit:
    >   - for the source content use the parts in the order they are in the document
    >   - for the target content use the parts in the order set by the target order attributes
    > - Starting at the first part of the list, for each part:
    >   - Set the bottom of the stack to the translate value of the part:
    >     - For a segment: the resolved translate value of the segment
    >     - For an ignorable: the resolved translate value of the parent unit.

    I don't think so. An ignorable is not translatable no matter what; it can just hold pieces of annotations that can flip parts outside of itself.

    >   - Iterate through the content
    >     - If an opening marker for a Translate Annotation is found:
    >       - Push the translate value of the annotation at the top of the stack.
    >     - If a closing marker for a Translate Annotation is found:
    >       - Remove the corresponding item in the stack
    > At any time, the top of the stack has the translate value to apply to the current content.
    > This method has one caveat: The closing markers of each translation annotations must be after its corresponding opening marker. Technically I don't think this is enforced by a constraint or a PR. It should be OK for the source content. But the target ordering may introduce some problems if it's not done properly.

    I think we do not have this. I propose to add it to the <sm/> and <em/> descriptions. We should say for <sm/> that it MUST have a closing <em/> within the enclosing unit (and vice versa), AND that <em/> must come logically after its corresponding <sm/>. I'd also add a warning that the order attribute has to be considered in the target when determining the logical order.

    > There are probably other ways to do it. Hopefully Bryan will come up with a nice way to do it in XSLT.
    > Cheers,
    > -yves
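    The proposed constraint could be checked along the lines below. It requires that every <sm/> have an <em/> in the same unit and vice versa, and that each <em/> come logically after its <sm/>, taking the target's order attribute into account. This is a minimal Python sketch with these assumptions:

    - Element names are unprefixed.
    - A part's target position is its order value when present, otherwise its natural position among the unit's parts.
    - The unit is hypothetical.

    The function names and messages are illustrative, not spec wording.

```python
import xml.etree.ElementTree as ET


def logical_parts(unit, side):
    """Parts in logical order: document order for source, order attribute for target."""
    parts = [p for p in unit if p.tag in ("segment", "ignorable")]
    if side == "source":
        return parts

    def position(item):
        natural, part = item
        target = part.find("target")
        order = target.get("order") if target is not None else None
        return int(order) if order is not None else natural

    return [part for _, part in sorted(enumerate(parts, start=1), key=position)]


def check_marker_pairs(unit, side="source"):
    """Return a list of <sm/>/<em/> pairing problems for one side of a unit."""
    problems = []
    opened, closed = set(), set()
    for part in logical_parts(unit, side):
        content = part.find(side)
        if content is None:
            continue
        for node in content.iter():
            if node.tag == "sm":
                opened.add(node.get("id"))
            elif node.tag == "em":
                ref = node.get("startRef")
                if ref not in opened:  # no <sm/> seen yet in logical order
                    problems.append(f"em '{ref}' has no earlier sm")
                closed.add(ref)
    for ref in sorted(opened - closed):
        problems.append(f"sm '{ref}' has no em")
    return problems


# Hypothetical unit: fine in the source, but the target order puts <em/> first.
SWAPPED = """<unit id="u3">
  <segment id="s1">
    <source>One <sm id="m1" translate="no"/>two.</source>
    <target order="2">ONE <sm id="m1" translate="no"/>TWO.</target>
  </segment>
  <segment id="s2">
    <source>Three<em startRef="m1"/> four.</source>
    <target order="1">THREE<em startRef="m1"/> FOUR.</target>
  </segment>
</unit>"""

unit = ET.fromstring(SWAPPED)
print(check_marker_pairs(unit, "source"))  # []
print(check_marker_pairs(unit, "target"))  # ["em 'm1' has no earlier sm"]
```

    The same markers pass on the source side and fail on the target side. This is why the check has to use the target order rather than document order.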


  • 9.  RE: [xliff] mrk translate outside the content but in scope

    Posted 11-05-2013 12:13
    Hi David, all,

    > possibly adding a constraint enforcing the <em/>
    > to come logically after their <sm/>.

    Possibly. That may be the case for <sc> and <ec> too.

    >> I put the question marks because it wasn't clear
    >> to me what was the result of your algorithm. Now I know.
    >
    > Good, and is it the same as yours?

    I didn't have a set expectation.

    >>  - Set the bottom of the stack to the translate
    >>    value of the part:
    >>    - For a segment: the resolved translate value of the segment
    >>    - For an ignorable: the resolved translate value of the parent unit.
    >
    > I don't think so, ignorable is not translatable no matter what,
    > it can just hold pieces of annotations that can flip parts
    > outside of itself.

    When computing the translate state in a segment, one must take both the segment and the ignorable elements into account as parts of a whole; otherwise you simply cannot end up with the correct result.

    Note also that the ignorable element contains whatever the Extractors or Modifiers deem appropriate (whitespace and inline codes most of the time). While its content is not expected to be translatable text, it may need to be adapted in some cases. Examples include removing whitespace between segments in some languages, or adjusting whitespace when re-ordering the target.

    >> Technically I don't think this is enforced by a constraint or a PR.
    >
    > I think we do not have this, I propose to add this to the <sm/>
    > and <em/> descriptions.
    > I think we should say for <sm/> that it MUST have a closing <em/>
    > within the enclosing unit (and vice versa)
    > AND
    > that <em/> must come logically after its corresponding <sm/>

    I tend to agree. But others really need to look at all this and give their opinions.

    > I'd also add a warning that the order attribute has to be considered
    > in target for determining the logical order..

    Possibly. But let's make sure we don't add repeated warnings, notes and PRs all over the place. As a general guideline we should be concise. In my experience, overwhelming the reader with repeated information often leads to confusion and makes things look more complex than they are. The Segmentation Modification section is an example of that; I'll get to it at some point.

    Cheers,
    -yves