OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 00:09
    Hi Yves,

Please tell me if I have the correct understanding then.

Valid  Source        Target
Yes    (none)        <pc id=1 x>
Yes    <pc id=1 x>   (none)
Yes    <pc id=1 x>   <pc id=1 x><pc id=2 x>
No     <pc id=1 x>   <pc id=1 y>
No     <pc id=1 x>   <pc id=1 y><pc id=2 x>
Yes    <pc id=1 x>   <pc id=1 x><pc id=2 y>
No     <pc id=1 x>   <pc id=2 x>
Yes    <pc id=1 x>   <pc id=2 y>
No     <pc id=1 x>   <pc id=1 x><pc id=1 x>
Yes    <pc id=1 x>   <pc id=1 x><pc id=2 x>

Here x and y stand for all attributes other than id, and x and y have different values.

Thanks,
Ryan

From: Yves Savourel [mailto:ysavourel@enlaso.com] Sent: Tuesday, June 16, 2015 8:26 PM To: Ryan King; 'Estreen, Fredrik'; xliff@lists.oasis-open.org Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan,

In your example (and in all cases, really), you know the relationship between source and target inline codes by their id values: a code with id='1' in the source corresponds to the code with id='1' in the target.

See the id definition in the specification:

When used in <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> elements:
· The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.
· Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element.

Also, I'm not sure I understand the following text in your old message:

"Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won't really know which was the original, because our tools can't use copyOf, because we always carry <originalData>."

The "arbitrarily" part seems wrong: you should always know which source code corresponds to which target code, because they have identical ids.
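Yves's point is that correspondence between source and target codes is carried entirely by the id value, and the quoted definition adds that ids must not otherwise collide. The following is a minimal Python sketch of that check, not an official validator. It assumes an un-namespaced <segment> fragment (real XLIFF 2.0 files put elements in the urn:oasis:names:tc:xliff:document:2.0 namespace) and looks at one segment rather than the whole <unit>. The function check_ids and its return shape are hypothetical. The "copy reuses a source id" check follows the reading Ryan proposes in his January message (copies are not corresponding elements).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Inline elements named in the id definition (<segment>/<ignorable> are containers, skipped here).
INLINE_TAGS = {"pc", "ph", "sc", "ec", "mrk", "sm"}


def inline_elements(parent):
    """All inline elements nested anywhere under parent, in document order."""
    return [el for el in parent.iter() if el.tag in INLINE_TAGS]


def check_ids(segment_xml):
    """Pair target inline codes with source codes by id and report id-rule problems.

    Returns (pairs, problems): pairs maps each target id to True when a source code
    with the same id exists (that source code IS its counterpart), else False.
    """
    segment = ET.fromstring(segment_xml)
    source = inline_elements(segment.find("source"))
    target = inline_elements(segment.find("target"))
    source_ids = {el.get("id") for el in source}

    problems = []
    # Two target codes may never share an id.
    counts = Counter(el.get("id") for el in target)
    problems += [f"id {i} used {n} times in target" for i, n in counts.items() if n > 1]
    # A copy is not the corresponding element, so it must not reuse a source id.
    for el in target:
        if el.get("copyOf") is not None and el.get("id") in source_ids:
            problems.append(f"copy {el.get('id')} reuses a source id")

    pairs = {el.get("id"): el.get("id") in source_ids for el in target}
    return pairs, problems


spec = ('<segment><source>Äter <pc id="1">katter möss</pc>?</source>'
        '<target>Do <pc id="1">cats</pc> eat <pc id="2" copyOf="1">mice</pc>?</target></segment>')
print(check_ids(spec))  # ({'1': True, '2': False}, [])
```

For Ryan's <b><i> example with a target <pc id="1">, the sketch pairs the target code with source code 1, whatever native tag that code stands for; the attributes play no part in the match.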
copyOf is to be used when a new code is introduced in the target and has no associated originalData; in that case copyOf points to an existing code for which the merger knows how to get the original data.

Cheers,
-yves

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King Sent: Wednesday, June 17, 2015 12:36 AM To: Estreen, Fredrik (Fredrik.Estreen@lionbridge.com); 'xliff@lists.oasis-open.org' Subject: [xliff] RE: Inline attributes and canCopy

Hi Fredrik, all,

I'm resurrecting this thread. It seems to me that the only way to tell whether a target <pc> corresponds to a source <pc> is to make sure their attributes, except id, are identical, and then validate the constraint. However, as Fredrik mentions below, this constraint exists to help mergers when <originalData> is not present, so that they know which <pc> tags correspond to which original codes (which may be stored outside of the XLIFF). But what if I have:

xml:
    <p><b><i>text</i></b></p>

xlf:
    <source><pc id="1"><pc id="2">text</pc></pc></source>
    <target><pc id="?">text</pc></target>

With no attributes on <pc> other than id, how do I know which id to match in the source? Is the tag in the target <b> or <i>? How can I apply the constraint without knowing the original data?

Thanks,
Ryan

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King Sent: Thursday, January 29, 2015 9:02 PM To: Estreen, Fredrik; 'xliff@lists.oasis-open.org' Subject: [xliff] RE: Inline attributes and canCopy

Thanks for the detailed explanation, Fredrik! Somehow I missed the copyOf processing requirement. With your rationale, as long as we have <originalData> we don't really need ids for merge, which is our case. Additionally, we decode inline tags back to native codes when we store data in our TMS, so we perform normalization for matching, etc. in a way that does not depend on XLIFF.
Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won't really know which was the original, because our tools can't use copyOf, because we always carry <originalData>.

Thanks for the help,
Ryan

From: Estreen, Fredrik [mailto:Fredrik.Estreen@lionbridge.com] Sent: Thursday, January 29, 2015 6:28 PM To: Ryan King; 'xliff@lists.oasis-open.org' Subject: RE: Inline attributes and canCopy

Hi Ryan,

I interpret the specification the same way you do with respect to the IDs, and I agree with your added sentence clarifying it.

The rule that an inline element representing the exact same element in both source and target uses the same ID in both locations is there to facilitate merging back to the native format for agents that do not put native data in the XLIFF document. Without it, an agent would not be able to detect reordering or addition of codes. Safe substitution of tags in matches, in systems that allow it, also needs this. It also enables storing inline elements in TMs without the actual native code. Storing the native code in the TM offers more options for validation and match transformation, so it may be a good thing anyway.

"copyOf" is not really optional. It is required if the copied inline element does not have associated original data.

In "4.7.2.4.1 Duplicating an existing code", the Processing Requirements say:
· Modifiers MUST NOT clone a code that has its canCopy attribute set to no.
· The copyOf attribute MUST be used when, and only when, the base code has no associated original data.

This requirement makes sure that a merger can always know what an inline element in the target means, as long as it knows what the inline codes in the source mean.
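The "only when" half of the copyOf requirement can be checked mechanically, which is exactly why Ryan's tools, which always carry <originalData>, can never emit copyOf. A minimal Python sketch, assuming codes are given as {id: attribute dict} and that original data is attached through the dataRef, dataRefStart, or dataRefEnd attributes. It also assumes the base code sits in <source>, which is a simplification. The function names are hypothetical.

```python
DATA_REF_ATTRS = ("dataRef", "dataRefStart", "dataRefEnd")


def has_original_data(attrs):
    """True if a code points into <originalData> through any dataRef* attribute."""
    return any(name in attrs for name in DATA_REF_ATTRS)


def copyof_problems(source_codes, target_codes):
    """Check the 'only when' half of the copyOf requirement.

    source_codes, target_codes: {id: attribute dict}.
    The 'when' half (a clone of a code without original data must carry copyOf)
    is not checked: nothing in the markup says which uncopied new codes are clones.
    """
    problems = []
    for tid, attrs in target_codes.items():
        base_id = attrs.get("copyOf")
        if base_id is None:
            continue  # not declared as a copy
        base = source_codes.get(base_id)
        if base is None:
            problems.append(f"{tid}: copyOf points to unknown code {base_id}")
        elif has_original_data(base):
            problems.append(f"{tid}: copyOf used although base {base_id} has original data")
    return problems


src = {"1": {"id": "1", "dataRefStart": "d1", "dataRefEnd": "d2"}}
tgt = {"1": {"id": "1", "dataRefStart": "d1", "dataRefEnd": "d2"},
       "2": {"id": "2", "copyOf": "1"}}
print(copyof_problems(src, tgt))
# ['2: copyOf used although base 1 has original data']
```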
The expectation is that a merger not storing original data would be able to learn the meaning of the source inline elements at merge time through some method external to XLIFF (database, original file, etc.).

If the inline elements have associated original data, comparing that data will allow re-associating the copies with the originals, at least to the degree mergers need.

Following the rules for inline IDs and copyOf also allows more tag substitution to happen in matches. I personally believe that using "copyOf" even for codes that have original data would allow slightly more known-safe substitutions than relying on comparison of original data. Unfortunately, we don't allow that behavior. The case where it makes a difference is when you have two identical inline codes in the source and three in the target: which of the source codes is the third target code a copy of? In most situations that will not be important to know; so far I have found only one TM-related situation where it would help. Making copyOf always required would also solve your use case.

The only XLIFF solution to the problem you present is the one you describe: perform processing at modification time to make sure that the processing requirements (PRs) and co-constraints are met. Alternatively, operate internally on a slightly modified model where you require copyOf regardless of whether the code has native data or not. Then have an export/cleanup step that uses copyOf to make sure that one tag has the source ID, and finally removes the copyOf information that would violate the spec.

Regards,
Fredrik Estreen

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King Sent: 30 January 2015 09:18 To: 'xliff@lists.oasis-open.org' Subject: [xliff] Inline attributes and canCopy

Hi TC, we've run into a dilemma and require some expert guidance.
The XLIFF 2.0 spec says this about id:

· When used in <segment>, <ignorable>, <mrk>, <sm>, <pc>, <sc>, <ec>, or <ph> elements:
  o The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.
  o Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element.

So, when an inline element is copied, I might get this example from the spec:

    <unit id="1">
      <segment>
        <source>Äter <pc id="1">katter möss</pc>?</source>
        <target>Do <pc id="1">cats</pc> eat <pc id="2" copyOf="1">mice</pc>?</target>
      </segment>
    </unit>

For this to meet the above constraint, the intended meaning is probably something like:

· Inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist. *Copies of inline elements are not considered to correspond to the original elements enclosed within the sibling <source> element and do not need to have the same id.*

If that is true, when we validate the constraint to make sure the *original* source and target inline element ids match, how do we know which one is the original if copyOf is not required and the elements can be reordered? If I just make sure that at least one of the ids matches regardless of position, what happens if I delete the original elements in <target> and add back two new ones? Do I have to make sure that at least one of the elements has a matching id? It seems like a lot of processing just to satisfy the constraint.

What is the logical reason for this constraint?

Thanks,
Ryan
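Fredrik's second workaround (require copyOf on every copy internally, then clean up on export) can be sketched in Python as below. This is an illustration under stated assumptions, not his implementation. The internal convention that every copy carries copyOf and the function export_cleanup are hypothetical. Codes are plain attribute dicts, target codes are listed in document order, and original data is detected through dataRef* attributes.

```python
DATA_REF_ATTRS = ("dataRef", "dataRefStart", "dataRefEnd")


def export_cleanup(source_codes, target_codes):
    """Turn the internal 'copyOf everywhere' model into spec-conforming target codes.

    source_codes: {id: attrs}; target_codes: list of attr dicts in document order.
    Returns a new list; the input dicts are left untouched.
    """
    result = [dict(attrs) for attrs in target_codes]
    present = {attrs["id"] for attrs in result}

    # Step 1: a source code with no same-id counterpart gets its first copy promoted,
    # so that one target tag carries the source id again.
    for sid in source_codes:
        if sid in present:
            continue
        for attrs in result:
            if attrs.get("copyOf") == sid:
                attrs["id"] = sid
                del attrs["copyOf"]
                present.add(sid)
                break

    # Step 2: copyOf is allowed only when the base code has no original data.
    for attrs in result:
        base = source_codes.get(attrs.get("copyOf"))
        if base is not None and any(k in base for k in DATA_REF_ATTRS):
            del attrs["copyOf"]
    return result


src = {"1": {"id": "1", "dataRefStart": "d1", "dataRefEnd": "d2"}}
# The translator removed the original code 1 and kept two copies of it.
tgt = [{"id": "2", "copyOf": "1"}, {"id": "3", "copyOf": "1"}]
print(export_cleanup(src, tgt))  # [{'id': '1'}, {'id': '3'}]
```

Promoting the first copy in document order is an arbitrary choice, which mirrors Ryan's "arbitrarily" remark: once copyOf is dropped for codes with original data, the exported file no longer records which copy was promoted.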


  • 2.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 02:56
    Hi Ryan,

Here is my take, but others should check too.

One note: some attributes (e.g. dataRef*) can have different values between source and target.

Valid  Source        Target                    Comment
Yes    (none)        <pc id=1 x>               OK (added target code)
Yes    <pc id=1 x>   (none)                    OK (if canDelete='yes' for <pc id=1>)
Yes    <pc id=1 x>   <pc id=1 x><pc id=2 x>    OK (added target code)
No     <pc id=1 x>   <pc id=1 y>               Correct: this is not valid.
No     <pc id=1 x>   <pc id=1 y><pc id=2 x>    Correct: this is not valid, because some attributes in <pc id=1> must be the same in source and target.
Yes    <pc id=1 x>   <pc id=1 x><pc id=2 y>    OK
No     <pc id=1 x>   <pc id=2 x>               "No" is wrong: this is valid. If canDelete='yes' for <pc id=1>, then in the target you can delete it and add a new code that has the same attributes (except id). But it will be seen as an added code. It is probably not something one wants to do, but it is difficult to prevent.
Yes    <pc id=1 x>   <pc id=2 y>               OK (if canDelete='yes' for <pc id=1>)
No     <pc id=1 x>   <pc id=1 x><pc id=1 x>    Correct: this is not valid.
Yes    <pc id=1 x>   <pc id=1 x><pc id=2 x>    OK

Cheers,
-yves

From: Ryan King [mailto:ryanki@microsoft.com] Sent: Friday, June 19, 2015 2:09 AM To: Yves Savourel; Estreen, Fredrik (Fredrik.Estreen@lionbridge.com); 'xliff@lists.oasis-open.org' Subject: RE: [xliff] RE: Inline attributes and canCopy
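Yves's verdicts follow from three rules: no duplicate ids in the target, a target code that reuses a source id must keep that code's attributes, and a source code may be missing from the target only when canDelete='yes'. A minimal Python sketch that reproduces his "Valid" column, using Ryan's abstraction where "x" and "y" stand for every attribute that must match. Attributes allowed to differ, such as dataRef*, are assumed to be left out of the label. The function is_valid and the single canDelete flag for all source codes are simplifying assumptions.

```python
from collections import Counter


def is_valid(source, target, can_delete=True):
    """Decide validity following Yves's annotated table.

    source, target: lists of (id, attrs) pairs in document order, where attrs is a
    label such as "x" covering every attribute that must match.
    can_delete: canDelete value assumed for every source code.
    """
    # Two target codes may not share an id.
    if any(n > 1 for n in Counter(code_id for code_id, _ in target).values()):
        return False
    src, tgt = dict(source), dict(target)
    # A target code reusing a source id IS that code, so its attributes must match.
    if any(code_id in src and src[code_id] != attrs for code_id, attrs in tgt.items()):
        return False
    # Dropping a source code from the target requires canDelete='yes'.
    if not can_delete and any(code_id not in tgt for code_id in src):
        return False
    # Codes with ids not found in the source are simply added codes.
    return True


# Yves's verdicts for Ryan's rows, with canDelete='yes' where he makes it a condition.
rows = [
    ([], [("1", "x")], True),
    ([("1", "x")], [], True),
    ([("1", "x")], [("1", "x"), ("2", "x")], True),
    ([("1", "x")], [("1", "y")], False),
    ([("1", "x")], [("1", "y"), ("2", "x")], False),
    ([("1", "x")], [("1", "x"), ("2", "y")], True),
    ([("1", "x")], [("2", "x")], True),
    ([("1", "x")], [("2", "y")], True),
    ([("1", "x")], [("1", "x"), ("1", "x")], False),
    ([("1", "x")], [("1", "x"), ("2", "x")], True),
]
for source, target, expected in rows:
    assert is_valid(source, target) == expected
```

With can_delete=False, the rows whose target drops <pc id=1> become invalid, which matches the conditions Yves attaches to them.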


  • 3.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 03:32
    Regarding the highlighted one (source <pc id=1 x>, target <pc id=2 x>): assuming canDelete is yes, I understand that they can delete and re-add, but since all the attributes except id are the same, I would argue it is the same tag, and therefore we could prevent them from using a different id by failing validation. If it is valid, then how can I ever validate this

From: Yves Savourel Sent: 6/18/2015 7:56 PM To: Ryan King; 'Estreen, Fredrik'; xliff@lists.oasis-open.org Subject: RE: [xliff] RE: Inline attributes and canCopy
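Ryan's heuristic and Yves's reading can coexist if tooling reports the "same attributes, new id" pattern as a warning rather than a validation failure. A minimal Python sketch of such a lint, assuming codes are given as {id: attributes-without-id} dicts. The function renumbering_warnings is hypothetical, and per Yves the flagged targets are still valid XLIFF.

```python
def renumbering_warnings(source, target):
    """Flag new target codes whose attributes equal those of a dropped source code.

    source, target: {id: attrs} where attrs excludes the id itself.
    Returns warning strings; an empty list means nothing looked renumbered.
    """
    dropped = {sid: attrs for sid, attrs in source.items() if sid not in target}
    warnings = []
    for tid, attrs in target.items():
        if tid in source:
            continue  # same id: this is the corresponding code, nothing to flag
        for sid, src_attrs in dropped.items():
            if attrs == src_attrs:
                warnings.append(f"target {tid} looks like source {sid} under a new id")
    return warnings


print(renumbering_warnings({"1": {"type": "fmt"}}, {"2": {"type": "fmt"}}))
# ['target 2 looks like source 1 under a new id']
```

Codes with no distinguishing attributes (like the <b>/<i> pair in Ryan's June 17 example, where only id is present) match every dropped code. This is one reason the check can only warn and not decide which source code was meant.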


  • 4.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 03:33
    Sorry, please ignore. I accidentally hit send while writing this. I think I understand something different. I will resend my answer in a bit.

From: Ryan King Sent: 6/18/2015 8:31 PM To: Yves Savourel; 'Estreen, Fredrik'; xliff@lists.oasis-open.org Subject: RE: [xliff] RE: Inline attributes and canCopy
Yes <pc Id=1 x> <pc Id=2 y> OK (if canDelete='yes' for <pc id=1>) No <pc Id=1 x> <pc Id=1 x><pc Id=1 x> Correct: This is not valid Yes <pc Id=1 x> <pc Id=1 x><pc Id=2 x> OK   Cheers, -yves     From: Ryan King [mailto:ryanki@microsoft.com] Sent: Friday, June 19, 2015 2:09 AM To: Yves Savourel; Estreen, Fredrik (Fredrik.Estreen@lionbridge.com); 'xliff@lists.oasis-open.org' Subject: RE: [xliff] RE: Inline attributes and canCopy   Hi Yves,   Please tell me if I have the correct understanding then.   Valid Source Target Yes <pc Id=1 x> Yes <pc Id=1 x> Yes <pc Id=1 x> <pc Id=1 x><pc Id=2 x> No <pc Id=1 x> <pc Id=1 y> No <pc Id=1 x> <pc Id=1 y><pc Id=2 x> Yes <pc Id=1 x> <pc Id=1 x><pc Id=2 y> No <pc Id=1 x> <pc Id=2 x> Yes <pc Id=1 x> <pc Id=2 y> No <pc Id=1 x> <pc Id=1 x><pc Id=1 x> Yes <pc Id=1 x> <pc Id=1 x><pc Id=2 x>   x and y stand for all attributes other than id, where x and y are different values.   Thanks, Ryan   From: Yves Savourel [ mailto:ysavourel@enlaso.com ] Sent: Tuesday, June 16, 2015 8:26 PM To: Ryan King; 'Estreen, Fredrik'; xliff@lists.oasis-open.org Subject: RE: [xliff] RE: Inline attributes and canCopy   Hi Ryan,   In you example (and in all cases really), you know the relationship between source and target inline codes by their id values: A code with id=’1’ in the source corresponds to the code with id=’1’ in the target.   See the id definition in the specification: When used in  <segment> ,  <ignorable> ,  <mrk> ,  <sm> ,  <pc> ,  <sc> ,  <ec> , or  <ph>  elements: The inline elements enclosed by a  <target>  element MUST use the duplicate  id  values of their corresponding inline elements enclosed within the sibling  <source>  element if and only if those corresponding elements exist. Except for the above exception, the value MUST be unique among all of the above within the enclosing  <unit>  element.   
Also, I’m not sure I understand the following text in your old message:   “Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.”   The “arbitrarily” part seems wrong: You should always know which source code corresponds to which target code because they have identical ids. And copyOf is to use when a new code is introduced in the target and has no associated originalData, it that case copyOf points to an existing code for which the merger knows how to get the original data.   Cheers, -yves       From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: Wednesday, June 17, 2015 12:36 AM To: Estreen, Fredrik ( Fredrik.Estreen@lionbridge.com ); 'xliff@lists.oasis-open.org' Subject: [xliff] RE: Inline attributes and canCopy   Hi Frederik, all,   I’m resurrecting this thread. It seems to me that the only way to tell if a target <pc> corresponds to source <pc> is to make sure their attributes, with the exception of id, are identical, then validate the constraint. However, as Frederik mentions below, this constraint is to help mergers when <originalData> is not present so that they know which <pc> tags correspond to which original codes (which may be stored outside of the xliff). BUT if I have:   xml <p><b><i>text</i></b></p>   xlf <source><pc id=”1”><pc id=”2”>text</<pc></pc></source> <target><pc id=”?”>text</pc></source>   With no other attributes in <pc> other than id, how do I know which id to match in source? Is the tag in target <b> or <i>? How can I apply the constraint without knowing the original data?   
Thanks, Ryan     From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: Thursday, January 29, 2015 9:02 PM To: Estreen, Fredrik; 'xliff@lists.oasis-open.org' Subject: [xliff] RE: Inline attributes and canCopy   Thanks for the detailed explanation, Frederik! Somehow I missed the copyOf processing requirement. With your rationale, as long as we have <originalData>, we don’t really need ids for merge, which is our case. Additionally, we decode inline tags back to native codes when we store data in our TMS, so we perform normalization for matching, etc. in a non-XLIFF dependent way.   Bottom line seems to be that to satisfy both constraints, we just need to make sure arbitrarily that one of the ids in target match source each time we process and validate, since we won’t really know which was the original, because our tools can’t use copyOf, because we always carry <originalData>.   Thanks for the help, Ryan   From: Estreen, Fredrik [ mailto:Fredrik.Estreen@lionbridge.com ] Sent: Thursday, January 29, 2015 6:28 PM To: Ryan King; 'xliff@lists.oasis-open.org' Subject: RE: Inline attributes and canCopy   Hi Ryan,   I interpret the specification the same way as you do with respect to the IDs and agree with your added sentence clarifying it.   The rule that a an inline element that represent the exact same element in both source and target use the same ID in both locations is there to facilitate merge to native format for agents that do not put native data in the XLIFF document. Without it an agent would not be able to detect reordering or addition of codes. Safe substitution of tags in matches, in system that allow that, also need this. And it and also enables the storage of inline elements in TMs without the actual native code. Storing the native code in the TM offers more options for validation and match transformation so it may be a good thing anyway.   “copyOf” is not really optional. 
It is required if the copied inline element does not have associated original data.   In “4.7.2.4.1 Duplicating an existing code”: Processing Requirements Modifiers MUST NOT clone a code that has its canCopy attribute is set to no . The copyOf attribute MUST be used when, and only when, the base code has no associated original data. This requirements makes sure that a merger can always know what an inline element in target means as long as it knows what the meaning of the inline codes in source is. The expectation is that a merger not storing original data would be able to learn the meaning of the source inline elements at merge time through some to XLIFF external method (database, original file, etc..)   If the inline elements have original data associated a comparison of that data will allow re-associating the copies with the originals at least to a degree needed by mergers.   Following the rules of inline IDs and  copyOf also allows more tag substitution to happen in matches. I personally believe that use of “copyOf” even for codes that have original data will allow a little bit more known safe substitutions than relying on comparison of original data. Unfortunately we don’t allow that behavior. The case where it makes a difference is if you have two identical inline codes in source and three in target. Which of the source ones is the third target one a copy of? But in most situations that will not be important to know, I have so far only found one TM related situation where it would help. Making it always required would also solve your use case.   The only XLIFF solution to the problem you present is the solution you include. Perform processing at modification time to make sure that the PRs and co-constraints are met. Or operate on a slightly modified model internally where you require the use of copyOf regardless of if the code has native data or not. 
The have an export / cleanup step that uses copyOf to make sure that one tag has the source ID and finally removes “copyOf” information that is in violation of the spec.   Regards, Fredrik Estreen From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: den 30 januari 2015 09:18 To: 'xliff@lists.oasis-open.org' Subject: [xliff] Inline attributes and canCopy   Hi TC, we’ve run into a dilemma and require some expert guidance. The XLIFF 2.0 spec says this about Id: ·          When used in <segment> , <ignorable> , <mrk> , <sm> , <pc> , <sc> , <ec> , or <ph> elements: o     The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist. o     Except for the above exception, the value MUST be unique among all of the above within the enclosing <unit> element. So, when an inline element is copied, I might get this example from the spec:   <unit id="1"> <segment>   <source>Äter <pc id="1">katter möss</pc>?</source>   <target>Do <pc id="1">cats</pc> eat <pc id="2" copyOf="1">mice</pc>?   </target> </segment> </unit>   In order for this to meet the above constraint, the intended meaning is probably something like:   ·          inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist. * Copies of inline elements are not considered to be corresponding to the original elements enclosed within the sibling <source> element and do not need to have the same id. *   And if that is true, when we validate the constraint to make sure the * original * source and target inline element ids match, how do we know which one is the original one if copyOf is not required and they can be reordered? 
If I just rely on making sure at least one of the Ids match regardless of position, what happens if I deleted the original elements in <target> and add back in two new ones? Do I have to make sure at least one of the elements has an id that matches? It seems like a lot of processing just satisfy the constraint.   What is the logical reason for this constraint?   Thanks, Ryan  
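Yves's reading of the table boils down to three rules: target ids are unique, a target code that reuses a source id must keep the same attributes (dataRef* excepted), and a source code may disappear from the target only if its canDelete is not 'no' (canDelete defaults to 'yes'). The Python sketch below encodes those rules and replays the ten rows. It illustrates this thread, not a spec-defined check: the attribute dicts, the `type` values standing in for x and y, and the set of exempt attributes are all assumptions.

```python
# Codes are modelled as plain dicts of attributes (hypothetical representation).
# Per Yves's note, dataRef* values may differ between source and target, so they
# are left out of the comparison; the spec does not list which others are exempt.
IGNORED = {"id", "dataRef", "dataRefStart", "dataRefEnd"}


def compared(code):
    """Attributes that must match between corresponding source/target codes."""
    return {k: v for k, v in code.items() if k not in IGNORED}


def validate(source, target):
    """Return a list of problems; an empty list means the pair is valid."""
    problems = []
    tgt_ids = [c["id"] for c in target]
    if len(tgt_ids) != len(set(tgt_ids)):
        problems.append("duplicate id in target")
    src_by_id = {c["id"]: c for c in source}
    for code in target:
        base = src_by_id.get(code["id"])
        # Same id means same code, so the remaining attributes must agree.
        if base is not None and compared(base) != compared(code):
            problems.append(f"id={code['id']!r}: attributes differ from source")
    for sid, code in src_by_id.items():
        # A source code missing from the target was deleted.
        if sid not in tgt_ids and code.get("canDelete", "yes") == "no":
            problems.append(f"id={sid!r}: deleted although canDelete='no'")
    # Target codes with new ids are treated as added codes and accepted.
    return problems


def pc(code_id, attrs):
    return {"id": code_id, **attrs}


X, Y = {"type": "fmt"}, {"type": "ui"}  # stand-ins for Ryan's "x" and "y"

# (valid according to Yves, source codes, target codes), in table order
ROWS = [
    (True, [], [pc("1", X)]),
    (True, [pc("1", X)], []),
    (True, [pc("1", X)], [pc("1", X), pc("2", X)]),
    (False, [pc("1", X)], [pc("1", Y)]),
    (False, [pc("1", X)], [pc("1", Y), pc("2", X)]),
    (True, [pc("1", X)], [pc("1", X), pc("2", Y)]),
    (True, [pc("1", X)], [pc("2", X)]),
    (True, [pc("1", X)], [pc("2", Y)]),
    (False, [pc("1", X)], [pc("1", X), pc("1", X)]),
    (True, [pc("1", X)], [pc("1", X), pc("2", X)]),
]

if __name__ == "__main__":
    for expected, src, tgt in ROWS:
        print(expected, validate(src, tgt))
```

As Yves notes, the sketch happily accepts <pc id=1 x> becoming <pc id=2 x>: with id as the only link, a delete-and-add is indistinguishable from a renamed code.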


  • 5.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 03:40
    I think my confusion is around the definition of "corresponding". I was under the impression it meant if all attributes are the same between the tags, then the ids must also be the same. So validation would check all other attributes first, and if they were the same, enforce id to be the same. However, Yves' highlighted text below leads me to believe that "corresponding" means if the ids are the same, all other attributes must also be the same. I'm


  • 6.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 03:43
    Sorry, did it again! Fat fingers, small phone! So... I think my confusion is around the definition of "corresponding". I was under the impression it meant if all attributes are the same between the tags, then the ids must also be the same. So validation would check all other attributes first, and if they were the same, enforce id to be the same. However, Yves' highlighted text below leads me to believe that "corresponding" means if the ids are the same, all other attributes must also be the same. So validation would check ids first, and if they were the same, enforce that all other attributes were the same. Is that correct?
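The two readings of "corresponding" in this message can be compared directly. In this hypothetical Python sketch (codes as attribute dicts, with made-up `type` values standing in for the thread's x and y), `attrs_first_ok` implements the first reading and `id_first_ok` the second; they give opposite verdicts on <pc id=1 x> to <pc id=2 x> and on <pc id=1 x> to <pc id=1 y>.

```python
def others(code):
    """All attributes of a code except id."""
    return {k: v for k, v in code.items() if k != "id"}


def attrs_first_ok(source, target):
    """Reading 1: codes whose other attributes all match must share an id."""
    return all(s["id"] == t["id"]
               for s in source for t in target
               if others(s) == others(t))


def id_first_ok(source, target):
    """Reading 2: codes that share an id must match on all other attributes."""
    return all(others(s) == others(t)
               for s in source for t in target
               if s["id"] == t["id"])


X, Y = {"type": "fmt"}, {"type": "ui"}  # stand-ins for "x" and "y"
renamed = ([{"id": "1", **X}], [{"id": "2", **X}])  # <pc id=1 x> -> <pc id=2 x>
changed = ([{"id": "1", **X}], [{"id": "1", **Y}])  # <pc id=1 x> -> <pc id=1 y>

if __name__ == "__main__":
    for name, case in (("renamed", renamed), ("changed", changed)):
        print(name, attrs_first_ok(*case), id_first_ok(*case))
```

The id-first reading matches Yves's table: a new id is accepted as a delete-and-add, while different attributes under the same id are rejected. The attributes-first reading would also reject adding a second identical code (<pc id=1 x><pc id=2 x>), which the table accepts.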


  • 7.  RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-19-2015 04:15
    Hum… It’s true that we do not define what 'corresponding' means, and there is actually no explicit PR or constraint saying which attributes, if any, should be the same when an inline code in the source and one in the target have the same id value. Because there is no PR or constraint, Lynx currently does not check any 'correspondence', so it lets something like <pc id='1' canCopy='yes'> in the source and <pc id='1' canCopy='no'> in the target pass.

I think you are right, Ryan: when a source and a target code have the same id, a validator should check that their attributes are the same. But it's probably more complex than that: e.g. dataRef values could be different while the data they point to should be the same, etc. Also, a source <pc> may be mapped to a target <sc>/<ec>, and therefore some attributes may be different. So validating this correspondence is not a trivial task, and without a PR or constraint it will be harder to get validators to do it the same way.

-ys

From: Ryan King [ mailto:ryanki@microsoft.com ]
Sent: Friday, June 19, 2015 5:43 AM
To: Ryan King; Yves Savourel; 'Estreen, Fredrik'; xliff@lists.oasis-open.org
Subject: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

Sorry, did it again! Fat fingers, small phone!

So... I think my confusion is around the definition of "corresponding". I was under the impression it meant that if all attributes are the same between the tags, then the ids must also be the same. So validation would check all other attributes first and, if they were the same, enforce the ids to be the same. However, Yves' highlighted text below leads me to believe that "corresponding" means that if the ids are the same, all other attributes must also be the same. So validation would check ids first and, if they were the same, enforce that all other attributes were the same. Is that correct?
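As a rough illustration of the check Yves describes, the Python sketch below pairs source and target inline codes by id and reports attribute differences. It is a hypothetical helper, not how Lynx or any other validator works: it skips dataRef* attributes entirely (it does not verify that the referenced data is the same), only flags element-type mismatches such as <pc> vs <sc> for manual review, and does not account for default attribute values (an omitted canCopy vs canCopy='yes' would be reported as a difference).

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

INLINE_TAGS = {"pc", "sc", "ec", "ph", "mrk", "sm"}


def local_name(tag: str) -> str:
    """Drop an XML namespace prefix such as '{urn:...}pc' -> 'pc'."""
    return tag.rsplit("}", 1)[-1]


def codes_by_id(container: ET.Element) -> Dict[str, ET.Element]:
    """Map id -> inline element inside a <source> or <target>."""
    return {e.get("id"): e for e in container.iter()
            if e is not container and local_name(e.tag) in INLINE_TAGS
            and e.get("id") is not None}


def comparable_attrs(elem: ET.Element) -> Dict[str, str]:
    """All attributes except id and dataRef* (which may legitimately differ)."""
    return {k: v for k, v in elem.attrib.items()
            if k != "id" and not k.startswith("dataRef")}


def attribute_mismatches(source_xml: str, target_xml: str) -> List[str]:
    src = codes_by_id(ET.fromstring(source_xml))
    tgt = codes_by_id(ET.fromstring(target_xml))
    problems = []
    for code_id in sorted(src.keys() & tgt.keys()):
        s, t = src[code_id], tgt[code_id]
        if local_name(s.tag) != local_name(t.tag):
            # e.g. a source <pc> mapped to a target <sc>/<ec>: the attribute
            # sets differ by design, so only flag the pair for review.
            problems.append(f"id={code_id}: {local_name(s.tag)} vs "
                            f"{local_name(t.tag)} (review manually)")
        elif comparable_attrs(s) != comparable_attrs(t):
            problems.append(f"id={code_id}: attributes differ")
    return problems
```

With Yves's example, `attribute_mismatches('<source><pc id="1" canCopy="yes">a</pc></source>', '<target><pc id="1" canCopy="no">b</pc></target>')` reports `id=1: attributes differ`, while a pair that differs only in dataRefStart passes.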
________________________________________
From: Ryan King
Sent: 6/18/2015 8:39 PM
To: Yves Savourel; 'Estreen, Fredrik'; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

I think my confusion is around the definition of "corresponding". I was under the impression it meant that if all attributes are the same between the tags, then the ids must also be the same. So validation would check all other attributes first and, if they were the same, enforce the ids to be the same. However, Yves' highlighted text below leads me to believe that "corresponding" means that if the ids are the same, all other attributes must also be the same. I'm

________________________________________
From: Yves Savourel
Sent: 6/18/2015 7:56 PM
To: Ryan King; 'Estreen, Fredrik'; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan,

Here is my take, but others should check too. One note: some attributes (e.g. dataRef*) can have different values between source and target.

Valid  Source        Target                    Comment
Yes    (empty)       <pc Id=1 x>               OK (added target code)
Yes    <pc Id=1 x>   (empty)                   OK (if canDelete='yes' for <pc id=1>)
Yes    <pc Id=1 x>   <pc Id=1 x><pc Id=2 x>    OK (added target code)
No     <pc Id=1 x>   <pc Id=1 y>               Correct: this is not valid.
No     <pc Id=1 x>   <pc Id=1 y><pc Id=2 x>    Correct: this is not valid, because some attributes in <pc id=1> must be the same in source and target.
Yes    <pc Id=1 x>   <pc Id=1 x><pc Id=2 y>    OK
No     <pc Id=1 x>   <pc Id=2 x>               "No" is wrong: this is valid (see note below).
Yes    <pc Id=1 x>   <pc Id=2 y>               OK (if canDelete='yes' for <pc id=1>)
No     <pc Id=1 x>   <pc Id=1 x><pc Id=1 x>    Correct: this is not valid.
Yes    <pc Id=1 x>   <pc Id=1 x><pc Id=2 x>    OK

Note on <pc Id=1 x> / <pc Id=2 x>: if canDelete='yes' for <pc id=1>, then in the target you can delete it and add a new code that has the same attributes (except id). But it will be seen as an added code. It is probably not something one wants to do, but it is difficult to prevent.

Cheers,
-yves
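Yves's verdicts in his 18 June reply can be encoded as a small rule set. The Python sketch below is one interpretation, not a normative validator: it models x and y as two different values of a hypothetical `type` attribute, assumes canDelete defaults to 'yes', ignores dataRef* attributes as Yves notes, and does not handle <pc> to <sc>/<ec> mappings.

```python
from typing import Dict, List, Tuple

InlineCode = Tuple[str, Dict[str, str]]  # (id, every other attribute)


def comparable(attrs: Dict[str, str]) -> Dict[str, str]:
    # Per Yves's note, dataRef* values may differ between source and target.
    return {k: v for k, v in attrs.items() if not k.startswith("dataRef")}


def is_valid(source: List[InlineCode], target: List[InlineCode]) -> bool:
    target_ids = [cid for cid, _ in target]
    if len(target_ids) != len(set(target_ids)):
        return False  # the same id twice in the target is never valid
    src, tgt = dict(source), dict(target)
    for cid, attrs in tgt.items():
        # Same id means "corresponding", so the attributes must match.
        # A target-only id is simply an added code and needs no check here.
        if cid in src and comparable(src[cid]) != comparable(attrs):
            return False
    for cid, attrs in src.items():
        # A source code may be dropped unless canDelete='no' (default assumed 'yes').
        if cid not in tgt and attrs.get("canDelete", "yes") == "no":
            return False
    return True
```

With x = {"type": "fmt"} and y = {"type": "ui"}, `is_valid` returns True for the "Yes" rows and for the <pc Id=1 x> / <pc Id=2 x> row (valid per Yves's correction), and False for the remaining "No" rows.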


  • 8.  RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy