OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-28-2015 04:20
    Hi everyone,

    After discussing with others how to determine what constitutes a ‘corresponding code’, I think it is probably too restrictive to expect even most of the attributes listed below to be the same.

    Leveraging from TM is a frequent scenario, and one cannot expect to get similar attributes for things like canDelete or subFlow. For example, the extractor that created the leveraged content may have had different rules on what can be deleted, or a previous version of the same code may have had no sub-flow (like an <a> without alt in version 1 and an <a> with alt in version 2).

    Such differences are frequent, but they don’t mean the target code does not correspond to the source code.

    Maybe only two pieces of information need to be identical in both codes: the id value and the kind of code (spanning or placeholder).

    Ultimately it is the merger agent that should decide what to do with the target codes. As long as it can associate a target code with a source one (and only the id and the kind of code are needed for this), it should be fine. Otherwise we may end up prohibiting the processing of files that are being translated and have leveraged target content.

    Thoughts?
    -yves

    _____________________________________________
    From: Yves Savourel [mailto:ysavourel@enlaso.com]
    Sent: Sunday, June 21, 2015 7:20 AM
    To: 'xliff@lists.oasis-open.org'
    Subject: [xliff] RE: Inline attributes and canCopy

    Hi Ryan, all,

    > Thanks Yves, when you have something working in Lynx, can you share
    > the full implementation with me and we'll look at what we can do to
    > at least have parity.

    I've implemented a verification that tries to detect when two source/target codes with the same ID are not "corresponding". It’s just my best solution so far, but I’m obviously open to adjustments. I have not yet made a formal release with the change: it would be nice to get feedback from other implementers and TC members.

    Two such codes are seen as corresponding if:
    - Both are spanning (<pc>/<sc><ec>) or both are standalone (<ph>)

    And if they have, at least, the following properties identical:
    - type
    - canOverlap
    - canDelete
    - canRemove
    - canReorder
    - subFlows/subFlowStart/subFlowEnd
    - canCopy
    - copyOf
    - disp/dispStart/dispEnd
    - equiv/equivStart/equivEnd

    They may have different values for all the other properties: the rationale for not including the others (like data, dir, subType, etc.) is that they might be changed when they are in the target.

    For the annotations, I check for identical values on:
    - type
    - translate

    You can see the source code here:
    https://bitbucket.org/okapiframework/xliff-toolkit/src/1349e1470f2422752f675a98cf3bcff433569e6d/okapi/libraries/lib-xliff/src/main/java/net/sf/okapi/lib/xliff2/reader/XLIFFReader.java?at=master#cl-1627

    Note that the code performs the verification on the parsed elements, so, for example, there is no distinction between the attributes disp, dispStart and dispEnd: in the object model it's just the disp property of that tag object.

    You can play with the new behavior in the online validator:
    http://okapi-lynx.appspot.com/validation
    (The issue Nesho found with <cp hex='7FFFFFFF'/> is also fixed.)

    There are a few test files (bad_NotCorrespondingCode*.xlf) here:
    https://bitbucket.org/okapiframework/xliff-toolkit/src/master/okapi/libraries/lib-xliff/src/test/resources/invalid/

    But we should also add valid test files.

    Thanks,
    -yves
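
The attribute-based check described in the 21 June message could be sketched roughly as follows. This is only an illustration and not the Okapi implementation: the `Code` class, the `COMPARED` tuple and the function name are hypothetical, and the property names are simply copied from the list in the message, with each start/end variant folded into a single property as the message describes for the object model.

```python
from dataclasses import dataclass, field

# Property names copied from the list in the message; the start/end
# variants (dispStart, dispEnd, ...) are folded into one property each.
COMPARED = ("type", "canOverlap", "canDelete", "canRemove", "canReorder",
            "subFlows", "canCopy", "copyOf", "disp", "equiv")


@dataclass
class Code:
    """Hypothetical parsed inline code."""
    id: str
    kind: str                      # "spanning" or "placeholder"
    props: dict = field(default_factory=dict)


def strictly_corresponding(src: Code, tgt: Code) -> bool:
    """Same id, same kind, and identical values for every compared property."""
    if src.id != tgt.id or src.kind != tgt.kind:
        return False
    return all(src.props.get(name) == tgt.props.get(name) for name in COMPARED)


src = Code("1", "spanning", {"canDelete": "yes", "dir": "ltr"})
same = Code("1", "spanning", {"canDelete": "yes", "dir": "rtl"})
leveraged = Code("1", "spanning", {"canDelete": "no"})

print(strictly_corresponding(src, same))       # dir is not compared
print(strictly_corresponding(src, leveraged))  # canDelete differs
```

The second call shows the TM-leverage problem raised at the top of this post: a code that differs only in canDelete would be rejected under the stricter rule.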


  • 2.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-28-2015 05:35
    Hi Yves, all,

    It is interesting, Yves, that you bring up the TM and merger scenarios together, because in my mind there isn't then any reason why I couldn't populate my target with the following from a TM:

        <source>Hello <pc id="1">World</pc></source>
        <target>Hola <pc id="2">Mundo</pc></target>

    Especially since the merger could resolve the original data in the source and then use the same original data in the target, i.e. treat them as corresponding. Or would you argue I shouldn't be able to leverage this high-similarity match from my TM just because of the ID difference?

    The trouble would be where I might have more than one tag:

        <source><pc id="1">Hello</pc> <pc id="2">World</pc></source>
        <target><pc id="3">Hello</pc> <pc id="4">World</pc></target>

    Now how does the merger determine which codes correspond? Which brings me back to my original thought that you can only really tell whether two codes correspond if they both have the same original data reference.

    So the text "corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist" seems to indicate to me that once original data references are resolved to find correspondences, IDs must match... but again, what is the purpose of the matching IDs since I've already resolved the original data?

    Ryan
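
To make the two-tag example in Ryan's message concrete, here is a small sketch that collects the code IDs on each side and looks for shared ones. The helper `code_ids` is hypothetical, and the fragments are parsed without namespaces purely for brevity.

```python
import xml.etree.ElementTree as ET


def code_ids(fragment):
    """Return the id values of the inline codes found in an XML fragment."""
    root = ET.fromstring(fragment)
    return [el.get("id") for el in root.iter() if el.tag in ("pc", "sc", "ph")]


source = '<source><pc id="1">Hello</pc> <pc id="2">World</pc></source>'
target = '<target><pc id="3">Hello</pc> <pc id="4">World</pc></target>'

shared = set(code_ids(source)) & set(code_ids(target))
print(shared)  # set()
```

With no ID in common, an ID-based match finds no pairs at all, so a merger would have to fall back on something else, such as position or original data, to decide which target code goes with which source code.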


  • 3.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-29-2015 04:35
    Hi Ryan,

    I don’t think the merger could necessarily resolve the correspondence, because it would need the <data> for the target, which the extractor may or may not provide. I think having mismatched IDs between source and target would also make life a lot more difficult in general. We also wouldn’t be able to tell which codes are new and which exist in the source.

    So if two codes have the same ID and are of the same kind (spanning or placeholder), that should be enough to indicate that they are corresponding codes. With that, the merger can choose what it does with the data: use the data from the source, or from the target, or even from its own internal mechanism.

    Cheers,
    -ys
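
The rule proposed in this reply (same ID and same kind) also separates new codes from existing ones. A minimal sketch, assuming each side has already been reduced to a mapping from code ID to kind; the function name and the three labels are hypothetical:

```python
def classify_target_codes(source_codes, target_codes):
    """Label each target code, given dicts mapping id -> kind.

    kind is 'spanning' or 'placeholder'.
    """
    result = {}
    for code_id, kind in target_codes.items():
        if code_id not in source_codes:
            result[code_id] = "added"          # no source code has this id
        elif source_codes[code_id] == kind:
            result[code_id] = "corresponding"  # same id and same kind
        else:
            result[code_id] = "mismatch"       # same id but different kind
    return result


labels = classify_target_codes(
    {"1": "spanning", "2": "placeholder"},
    {"1": "spanning", "2": "spanning", "3": "placeholder"},
)
print(labels)
```

What the merger then does with the data of a corresponding code (source, target, or its own store) is left open, as the reply says.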


  • 4.  RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-29-2015 05:44
    Hi Yves,

    OK, if we go with correspondence being defined as the same kind of code and the same ID, then what is the purpose of validating that the corresponding codes have the same ID if that was already a criterion for correspondence? It makes even less sense to me now how to validate this constraint.

    Ryan


  • 5.  RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-29-2015 08:08
    Hi Ryan, all,

    I guess it is like the translation itself: the specification says the <target> contains the translated text of the <source>, but there is no real way to validate this. Those inline codes are the same: if they have the same ID, we can at least verify that they are also of the same kind (spanning/placeholder), and that is it: that is what defines them as ‘corresponding’.

    As for the text “The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.”, it just says that an ID value can be duplicated in the source and target only when the two codes are corresponding codes.

    There is probably a better way to express that.

    But at least it says that if a code has an ID value that does not exist in the source, it is not a “corresponding code”, and therefore it’s an added one.

    Cheers,
    -ys


  • 6.  RE: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-29-2015 17:22
    Hi Yves, all,   Thanks Yves, for the conversation. So, if there are matching IDs between source and target inline codes, then the type of code must match. This makes sense and is something that can be easily validated. We are in agreement then on this constraint. However, I think it would create less confusion for other implementers if the meaning of correspondence was clarified in the spec to reflect this.   Thanks, Ryan   From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel Sent: Monday, June 29, 2015 1:08 AM To: XLIFF Main List Subject: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy   Hi Ryan, all,   I guess it is like the translation itself: The specification says the <target> contains the translated text of the <source>, but there is no real way to valid this. Those inline codes are the same: If they have the same ID we can at least verify they are also of the same kind (spanning/placeholder) and that is it: that is what define them as ‘corresponding’.   As for the text “The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.” It just says that an ID value can be duplicated in the source and target only when the two codes are corresponding codes.   There is probably a better way to express that.   But at least it says that if a code has an ID value that does not exists in the source it is not a “corresponding code”, and therefore it’s an added one.   
Cheers, -ys

From: Ryan King [mailto:ryanki@microsoft.com]
Sent: Monday, June 29, 2015 7:44 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

Hi Yves,

OK, if we go with correspondence being defined as the same kind of code and the same ID, then what is the purpose of validating that the corresponding codes have the same ID if that was already a criterion for correspondence? It makes even less sense to me now how to validate this constraint.

Ryan

From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Sunday, June 28, 2015 9:35 PM
To: Ryan King; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Ryan,

I don’t think the merger could necessarily resolve the correspondence because it would need the <data> for the target, which the extractor may or may not provide. I think it would also make life a lot more difficult in general to have mismatched IDs between source and target. Also we wouldn’t be able to tell which codes are new and which already exist in the source.

So if two codes have the same ID and are of the same kind (spanning or placeholder) that should be enough to indicate that they are corresponding codes. With that the merger can choose what it does with the data: use the one from the source, or the target, or even from its own internal mechanism.

Cheers, -ys

From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
Sent: Sunday, June 28, 2015 7:35 AM
To: Yves Savourel; xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi Yves, all,

It is interesting, Yves, that you bring up the TM and merger scenario together because in my mind then there isn't any reason why I couldn't populate my target with the following from a TM:

  <source>Hello <pc id="1">World</pc></source>
  <target>Hola <pc id="2">Mundo</pc></target>

Especially since the merger could resolve the original data in source and then use the same original data in the target, i.e. treat them as corresponding. Or would you argue I shouldn't be able to leverage this high similarity match from my TM just because of the ID difference?

The trouble would be where I might have more than one tag:

  <source><pc id="1">Hello</pc> <pc id="2">World</pc></source>
  <target><pc id="3">Hello</pc> <pc id="4">World</pc></target>

Now how does the merger determine which codes correspond? Which brings me back to my original thought that you can only really tell if a code is corresponding if they both have the same original data reference.

So the text "corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist" seems to indicate to me that once original data references are resolved to find correspondences, IDs must match... but again, what is the purpose of the matching IDs since I've already resolved the original data...

Ryan

From: Yves Savourel
Sent: 6/27/2015 9:20 PM
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: Inline attributes and canCopy

Hi everyone,

After discussing the issue of how to determine what constitutes a ‘corresponding code’ with others I think it is probably too restrictive to even expect most of the attributes listed below to be the same.

Leveraging from TM is a frequent scenario and one cannot expect to get similar attributes for things like canDelete or subFlow. For example, the extractor that created the leveraged content may have had different rules on what can be deleted or not, or a previous version of the same code may have no sub-flow (like <a> without alt in version 1, an <a> with alt in version 2).

Such differences are frequent but they don’t mean the target code is not corresponding to the source code.

Maybe only two pieces of information need to be identical in both codes: the id value and the kind of code (spanning or placeholder).

Ultimately it is the merger agent that should decide what to do with the target codes. As long as it can associate a target code to a source one (and only id/kind-of-code are needed for this) it should be fine.

Otherwise we may end up prohibiting the processing of files that are being translated and have leveraged target content.

Thoughts?
-yves

_____________________________________________
From: Yves Savourel [mailto:ysavourel@enlaso.com]
Sent: Sunday, June 21, 2015 7:20 AM
To: 'xliff@lists.oasis-open.org'
Subject: [xliff] RE: Inline attributes and canCopy

Hi Ryan, all,

> Thanks Yves, when you have something working in Lynx, can you share
> the full implementation with me and we'll look at what we can do to
> at least have parity.

I've implemented a verification that tries to detect when two source/target codes with the same ID are not "corresponding". It’s just my best solution so far, but I’m obviously open to adjustments. I have not yet made a formal release with the change: It would be nice to get the feedback from other implementers and TC members.

Two such codes are seen as corresponding if:

- Both are either spanning (<pc>/<sc><ec>) or both standalone (<ph>)

And if they have, at least, the following properties identical:

- type
- canOverlap
- canDelete
- canRemove
- canReorder
- subFlows/subFlowStart/subFlowEnd
- canCopy
- copyOf
- disp/dispStart/dispEnd
- equiv/equivStart/equivEnd

They may have different values for all the other properties: the rationale for not including the others (like data, dir, subType, etc.) is that they might be changed when they are in the target.

For the annotations, I check for identical values on:

- type
- translate

You can see the source code here:
https://bitbucket.org/okapiframework/xliff-toolkit/src/1349e1470f2422752f675a98cf3bcff433569e6d/okapi/libraries/lib-xliff/src/main/java/net/sf/okapi/lib/xliff2/reader/XLIFFReader.java?at=master#cl-1627

Note that the code performs the verification on the parsed elements, so, for example, there is no distinction between the attributes disp, dispStart and dispEnd: In the object model it's just the disp property of that tag object.

You can play with the new behavior in the online validator: http://okapi-lynx.appspot.com/validation
(The issue Nesho found with <cp hex='7FFFFFFF'/> is also fixed)

There are a few test files (bad_NotCorrespondingCode*.xlf) here:
https://bitbucket.org/okapiframework/xliff-toolkit/src/master/okapi/libraries/lib-xliff/src/test/resources/invalid/
But we should also add valid test files.

Thanks, -yves
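
For readers who want to experiment with the stricter check described in the message above, here is a minimal Python sketch. It is not the Okapi implementation (that one is in Java at the URL above); the dictionary layout, the function names `kind_of` and `differing_properties`, and the choice to compare raw values without resolving attribute defaults are all assumptions made for illustration. The property names are copied as listed in the message, with the start/end variants folded into a single property as the message says the object model does.

```python
# Properties that the message lists as needing identical values.
# Names are taken verbatim from the message; disp/dispStart/dispEnd and
# similar groups are represented by one property each.
STRICT_PROPERTIES = (
    "type", "canOverlap", "canDelete", "canRemove", "canReorder",
    "subFlows", "canCopy", "copyOf", "disp", "equiv",
)


def kind_of(element_name):
    """Map an inline element name to its kind: 'spanning' or 'placeholder'."""
    if element_name in ("pc", "sc", "ec"):
        return "spanning"
    if element_name == "ph":
        return "placeholder"
    raise ValueError("not an inline code element: %r" % element_name)


def differing_properties(source_code, target_code):
    """Return what prevents two same-ID codes from being 'corresponding'.

    Each code is a plain dict (hypothetical layout) with an "element" key
    plus any properties. An empty list means the codes correspond under the
    strict rule. Defaults are NOT resolved: a missing property only equals
    another missing property.
    """
    diffs = []
    if kind_of(source_code["element"]) != kind_of(target_code["element"]):
        diffs.append("kind")
    for name in STRICT_PROPERTIES:
        if source_code.get(name) != target_code.get(name):
            diffs.append(name)
    return diffs


if __name__ == "__main__":
    src = {"element": "pc", "id": "1", "type": "fmt", "canDelete": "no"}
    tgt = {"element": "pc", "id": "1", "type": "fmt", "canDelete": "yes", "dir": "rtl"}
    # dir is ignored by the strict list; canDelete is not.
    print(differing_properties(src, tgt))
```

A difference in canDelete alone is enough to fail this check, which is the situation the later messages in the thread describe as too restrictive for content leveraged from a TM.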


  • 7.  RE: [QUAR] RE: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-29-2015 17:57
    I agree, it’d be nice to have a clarification of what “corresponding code” means.

    From: Ryan King [mailto:ryanki@microsoft.com]
    Sent: Monday, June 29, 2015 7:22 PM
    To: Yves Savourel; XLIFF Main List
    Subject: [QUAR] RE: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Hi Yves, all,

    Thanks Yves, for the conversation. So, if there are matching IDs between source and target inline codes, then the kind of code (spanning or placeholder) must match. This makes sense and is something that can be easily validated. We are in agreement then on this constraint. However, I think it would create less confusion for other implementers if the meaning of correspondence was clarified in the spec to reflect this.

    Thanks, Ryan

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
    Sent: Monday, June 29, 2015 1:08 AM
    To: XLIFF Main List
    Subject: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Hi Ryan, all,

    I guess it is like the translation itself: The specification says the <target> contains the translated text of the <source>, but there is no real way to validate this. Those inline codes are the same: If they have the same ID we can at least verify they are also of the same kind (spanning/placeholder) and that is it: that is what defines them as ‘corresponding’.

    As for the text “The inline elements enclosed by a <target> element MUST use the duplicate id values of their corresponding inline elements enclosed within the sibling <source> element if and only if those corresponding elements exist.” It just says that an ID value can be duplicated in the source and target only when the two codes are corresponding codes.

    There is probably a better way to express that.

    But at least it says that if a code has an ID value that does not exist in the source it is not a “corresponding code”, and therefore it’s an added one.
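
As an illustration of the relaxed rule discussed in this thread (a target code that reuses a source ID must be of the same kind, and a target ID that is absent from the source marks an added code), here is a minimal Python sketch. It is only a sketch under assumptions: the function names `codes_by_id` and `check_segment` are hypothetical, it looks only at <pc>, <sc>, <ec> and <ph> elements that carry an id attribute, and it ignores annotations and everything else a real validator would need to handle.

```python
import xml.etree.ElementTree as ET

SPANNING = {"pc", "sc", "ec"}
PLACEHOLDER = {"ph"}


def local_name(tag):
    """Drop a '{namespace}' prefix, if any, from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def codes_by_id(container):
    """Map id -> kind for every inline code with an id under container."""
    found = {}
    for elem in container.iter():
        code_id = elem.get("id")
        if code_id is None:
            continue
        name = local_name(elem.tag)
        if name in SPANNING:
            found[code_id] = "spanning"
        elif name in PLACEHOLDER:
            found[code_id] = "placeholder"
    return found


def check_segment(segment_xml):
    """Return (mismatched_ids, added_ids) for one <segment> given as a string.

    mismatched_ids: IDs used in both source and target with different kinds.
    added_ids: IDs that appear only in the target (codes added in the target).
    """
    segment = ET.fromstring(segment_xml)
    source = target = None
    for child in segment:
        name = local_name(child.tag)
        if name == "source":
            source = child
        elif name == "target":
            target = child
    if source is None:
        raise ValueError("segment has no <source>")
    src_codes = codes_by_id(source)
    tgt_codes = codes_by_id(target) if target is not None else {}
    mismatched = sorted(i for i, kind in tgt_codes.items()
                        if i in src_codes and src_codes[i] != kind)
    added = sorted(i for i in tgt_codes if i not in src_codes)
    return mismatched, added


if __name__ == "__main__":
    # The TM example from the thread: the target code uses a different ID,
    # so it is reported as an added code rather than a corresponding one.
    print(check_segment(
        '<segment><source>Hello <pc id="1">World</pc></source>'
        '<target>Hola <pc id="2">Mundo</pc></target></segment>'))
```

Note that nothing here compares canDelete, subFlows, or the original data: under the relaxed rule those decisions are left to the merger.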


  • 8.  RE: [QUAR] RE: [xliff] RE: [QUAR] RE: [xliff] RE: Inline attributes and canCopy

    Posted 06-29-2015 18:01
    Hi David, is this something that can be clarified for 2.1?

    Thanks, Ryan