OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  RE: Inline attributes and canCopy

    Posted 06-16-2015 22:37
    Hi Fredrik, all,

    I’m resurrecting this thread. It seems to me that the only way to tell whether a target <pc> corresponds to a source <pc> is to check that their attributes, with the exception of id, are identical, and then validate the constraint. However, as Fredrik mentions below, this constraint exists to help mergers when <originalData> is not present, so that they know which <pc> tags correspond to which original codes (which may be stored outside of the XLIFF). But suppose I have:

    xml
    <p><b><i>text</i></b></p>

    xlf
    <source><pc id="1"><pc id="2">text</pc></pc></source>
    <target><pc id="?">text</pc></target>

    With no attributes in <pc> other than id, how do I know which id to match in source? Is the tag in target <b> or <i>? How can I apply the constraint without knowing the original data?

    Thanks,
    Ryan

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
    Sent: Thursday, January 29, 2015 9:02 PM
    To: Estreen, Fredrik; 'xliff@lists.oasis-open.org'
    Subject: [xliff] RE: Inline attributes and canCopy

    Thanks for the detailed explanation, Fredrik! Somehow I missed the copyOf processing requirement. With your rationale, as long as we have <originalData>, we don’t really need ids for merge, which is our case. Additionally, we decode inline tags back to native codes when we store data in our TMS, so we perform normalization for matching, etc. in a way that does not depend on XLIFF.

    Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.

    Thanks for the help,
    Ryan

    From: Estreen, Fredrik [mailto:Fredrik.Estreen@lionbridge.com]
    Sent: Thursday, January 29, 2015 6:28 PM
    To: Ryan King; 'xliff@lists.oasis-open.org'
    Subject: RE: Inline attributes and canCopy

    Hi Ryan,

    I interpret the specification the same way as you do with respect to the IDs and agree with your added sentence clarifying it.

    The rule that an inline element representing the exact same element in both source and target uses the same ID in both locations is there to facilitate merging to the native format for agents that do not put native data in the XLIFF document. Without it, an agent would not be able to detect reordering or addition of codes. Safe substitution of tags in matches, in systems that allow it, also needs this. It also enables the storage of inline elements in TMs without the actual native code. Storing the native code in the TM offers more options for validation and match transformation, so it may be a good thing anyway.

    “copyOf” is not really optional. It is required if the copied inline element does not have associated original data.

    In “4.7.2.4.1 Duplicating an existing code”:

    Processing Requirements
    • Modifiers MUST NOT clone a code that has its canCopy attribute set to no.
    • The copyOf attribute MUST be used when, and only when, the base code has no associated original data.

    This requirement makes sure that a merger can always know what an inline element in target means, as long as it knows the meaning of the inline codes in source. The expectation is that a merger not storing original data would be able to learn the meaning of the source inline elements at merge time through some method external to XLIFF (database, original file, etc.).

    If the inline elements have original data associated, a comparison of that data will allow re-associating the copies with the originals, at least to the degree needed by mergers.

    Following the rules for inline IDs and copyOf also allows more tag substitution to happen in matches. I personally believe that using “copyOf” even for codes that have original data would allow a few more known-safe substitutions than relying on comparison of original data. Unfortunately, we don’t allow that behavior. The case where it makes a difference is when you have two identical inline codes in source and three in target: which of the source ones is the third target one a copy of? In most situations that will not be important to know; I have so far found only one TM-related situation where it would help. Making it always required would also solve your use case.

    The only XLIFF solution to the problem you present is the one you include: perform processing at modification time to make sure that the PRs and co-constraints are met. Alternatively, operate on a slightly modified model internally where you require the use of copyOf regardless of whether the code has native data or not. Then have an export/cleanup step that uses copyOf to make sure that one tag has the source ID and finally removes “copyOf” information that is in violation of the spec.

    Regards,
    Fredrik Estreen

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
    Sent: January 30, 2015 09:18
    To: 'xliff@lists.oasis-open.org'
    Subject: [xliff] Inline attributes and canCopy

    Hi TC,

    We’ve run into a dilemma and require some expert guidance. The XLIFF 2.0 spec says this about id:

    • When used in <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> elements:
        o The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.
        o Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element.

    So, when an inline element is copied, I might get this example from the spec:

    <unit id="1">
      <segment>
        <source>Äter <pc id="1">katter möss</pc>?</source>
        <target>Do <pc id="1">cats</pc> eat <pc id="2" copyOf="1">mice</pc>?
        </target>
      </segment>
    </unit>

    In order for this to meet the above constraint, the intended meaning is probably something like:

    • Inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist. *Copies of inline elements are not considered to be corresponding to the original elements enclosed within the sibling <source> element and do not need to have the same id.*

    And if that is true, when we validate the constraint to make sure the *original* source and target inline element ids match, how do we know which one is the original if copyOf is not required and the elements can be reordered? If I just rely on making sure at least one of the ids matches regardless of position, what happens if I delete the original elements in <target> and add back two new ones? Do I have to make sure at least one of the elements has an id that matches? It seems like a lot of processing just to satisfy the constraint.

    What is the logical reason for this constraint?

    Thanks,
    Ryan
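
The two copyOf processing requirements quoted in this thread could be checked mechanically along the following lines. This is only a rough sketch, not something taken from the spec or from any tool mentioned here: the function name `check_copy_of` is made up for illustration, "has associated original data" is assumed to mean "carries a dataRef, dataRefStart, or dataRefEnd attribute", and the base code is looked up by id anywhere in the same unit.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:2.0}"
INLINE_CODES = {NS + name for name in ("pc", "sc", "ec", "ph")}
# Assumption: a code "has original data" when it carries one of these attributes.
DATA_REF_ATTRS = ("dataRef", "dataRefStart", "dataRefEnd")


def check_copy_of(unit):
    """Return a list of messages about questionable copyOf usage in one <unit>."""
    # Index inline codes by id; source precedes target, so the source code wins.
    codes = {}
    for el in unit.iter():
        if el.tag in INLINE_CODES and el.get("id") is not None:
            codes.setdefault(el.get("id"), el)

    problems = []
    for target in unit.iter(NS + "target"):
        for el in target.iter():
            base_id = el.get("copyOf")
            if el.tag not in INLINE_CODES or base_id is None:
                continue
            base = codes.get(base_id)
            if base is None:
                problems.append(f"code {el.get('id')}: copyOf={base_id} points at no code")
                continue
            if base.get("canCopy", "yes") == "no":
                problems.append(f"code {el.get('id')}: base {base_id} has canCopy=no")
            if any(attr in base.attrib for attr in DATA_REF_ATTRS):
                problems.append(f"code {el.get('id')}: base {base_id} has original data")
    return problems


# The copyOf example from the spec, as quoted in this thread.
SPEC_EXAMPLE = """
<unit xmlns="urn:oasis:names:tc:xliff:document:2.0" id="1">
  <segment>
    <source>Äter <pc id="1">katter möss</pc>?</source>
    <target>Do <pc id="1">cats</pc> eat <pc id="2" copyOf="1">mice</pc>?
    </target>
  </segment>
</unit>
"""

if __name__ == "__main__":
    print(check_copy_of(ET.fromstring(SPEC_EXAMPLE)))
```

The sketch only covers codes that already carry copyOf. Detecting a clone that lacks copyOf when it should have one would need a comparison against native data, which is exactly the information a merger without <originalData> keeps outside the XLIFF file.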


  • 2.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-17-2015 03:26
    Hi Ryan,

    In your example (and in all cases, really), you know the relationship between source and target inline codes by their id values: a code with id='1' in the source corresponds to the code with id='1' in the target.

    See the id definition in the specification:

    • When used in <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> elements:
        o The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.
        o Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element.

    Also, I’m not sure I understand the following text in your old message:

    “Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.”

    The “arbitrarily” part seems wrong: you should always know which source code corresponds to which target code because they have identical ids. And copyOf is to be used when a new code is introduced in the target and has no associated originalData; in that case copyOf points to an existing code for which the merger knows how to get the original data.

    Cheers,
    -yves
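
As a rough illustration of the id-based correspondence described in this reply, the sketch below maps each inline code in a target to a source code, first by identical id and then through copyOf. It is a minimal sketch under stated assumptions, not part of the spec: the function name `correspondences` and the result labels are invented for illustration, it looks only at the sibling <source> of one <segment>, and it does not follow chains of copies.

```python
import xml.etree.ElementTree as ET

NS = "{urn:oasis:names:tc:xliff:document:2.0}"
INLINE = {NS + name for name in ("pc", "sc", "ec", "ph")}


def correspondences(segment):
    """Map each target inline id to a (source id, reason) pair."""
    source = segment.find(NS + "source")
    target = segment.find(NS + "target")
    if source is None or target is None:
        return {}

    # Ids of the inline codes in the sibling <source>.
    source_ids = {
        el.get("id") for el in source.iter()
        if el.tag in INLINE and el.get("id") is not None
    }

    result = {}
    for el in target.iter():
        code_id = el.get("id")
        if el.tag not in INLINE or code_id is None:
            continue
        if code_id in source_ids:
            # Same id in source and target: the codes correspond.
            result[code_id] = (code_id, "same id")
        elif el.get("copyOf") in source_ids:
            # New code in target whose meaning comes from an existing code.
            result[code_id] = (el.get("copyOf"), "copyOf")
        else:
            result[code_id] = (None, "unresolved")
    return result


# Ryan's nested example, with the target code keeping the inner id.
NESTED_EXAMPLE = """
<segment xmlns="urn:oasis:names:tc:xliff:document:2.0">
  <source><pc id="1"><pc id="2">text</pc></pc></source>
  <target><pc id="2">text</pc></target>
</segment>
"""

if __name__ == "__main__":
    print(correspondences(ET.fromstring(NESTED_EXAMPLE)))
```

With the nested example, the target code resolves to source id 2, that is, the inner <i> rather than the outer <b>. No attribute comparison is involved; the id alone carries the relationship.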