OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  Segmentation Modifications

    Posted 11-26-2013 05:28
    Hi all, I'm working on implementing the segmentation modification, in this case un-segmenting. I've attached several files before (_in) and after (_out) processing. Could you tell me if any of the results is not what you would expect? The modification: the tool joins all segments/ignorable elements in each unit as long as the unit has no segment with canResegment='no'. Thanks, -yves

    Attachments: toJoin1_in.xlf, toJoin1_out.xlf, toJoin2_in.xlf, toJoin2_out.xlf, toJoin3_in.xlf, toJoin3_out.xlf
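The joining rule described above (merge every <segment> and <ignorable> in a unit unless some segment has canResegment='no') can be sketched with Python's standard xml.etree.ElementTree. This is a hypothetical illustration, not lynx's code: join_unit and append_content are names made up here, the joined segment simply inherits the attributes of the first segment (an assumption), and units that already contain <target> elements are left alone to keep the sketch short.

```python
import xml.etree.ElementTree as ET

NS = "urn:oasis:names:tc:xliff:document:2.0"

def q(tag: str) -> str:
    """Qualify a tag name with the XLIFF 2.0 namespace."""
    return f"{{{NS}}}{tag}"

def append_content(dest: ET.Element, src: ET.Element) -> None:
    """Append the mixed content (text plus inline children) of src to dest."""
    if src.text:
        if len(dest):
            last = dest[-1]
            last.tail = (last.tail or "") + src.text
        else:
            dest.text = (dest.text or "") + src.text
    for child in list(src):
        dest.append(child)  # the child's tail text travels with it

def join_unit(unit: ET.Element) -> bool:
    """Join all <segment>/<ignorable> children of unit into one <segment>.

    All or nothing: returns False and changes nothing if any segment has
    canResegment='no'. Hypothetical simplifications: attributes come from
    the first segment, and units containing <target> are skipped.
    """
    parts = [e for e in unit if e.tag in (q("segment"), q("ignorable"))]
    segments = [e for e in parts if e.tag == q("segment")]
    if len(parts) < 2 or not segments:
        return False
    if any(s.get("canResegment") == "no" for s in segments):
        return False
    if any(p.find(q("target")) is not None for p in parts):
        return False
    merged = ET.Element(q("segment"), dict(segments[0].attrib))
    source = ET.SubElement(merged, q("source"))
    for part in parts:
        append_content(source, part.find(q("source")))
    position = list(unit).index(parts[0])
    for part in parts:
        unit.remove(part)
    unit.insert(position, merged)
    return True
```

On a unit where one segment is locked, join_unit returns False and leaves the unit untouched, which is the all-or-nothing behavior of the command used to produce the _out files.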


  • 2.  RE: [xliff] Segmentation Modifications

    Posted 11-26-2013 05:49
    I've also updated the tool here: http://okapi.opentag.com/snapshots/okapi-xliffLib_all-platforms_0.21-SNAPSHOT.zip in case you want to check whether my assumptions about splitting/joining are correct with your own input files.

    To segment (after ". "), use:
        lynx -rw -seg myFile.xlf
    To un-segment as a Modifier, use:
        lynx -rw -join1 myFile.xlf
    You can also un-segment and re-segment in one pass. The order of the parameters doesn't matter: un-segmenting is always done first, then segmentation is performed:
        lynx -rw -seg -join1 myFile.xlf
    To un-segment as a Merger, use:
        lynx -rw -join2 myFile.xlf
    The -join2 option is there to test the PRs for a Merger vs. a Modifier (a Merger should be able to join all segments, even the ones with canResegment='no'). Obviously, here the tool just generates an XLIFF output, not a merged file.

    Note that for now lynx accepts the translate attribute in <segment> but just ignores it when re-segmenting (since we decided to remove it).

    I'm still testing, but hopefully I'll have a set of comments on the Segmentation Modification section soon.

    Cheers, -yves
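The option semantics described above (joining always happens before segmenting, whatever the parameter order; a Modifier respects canResegment='no', a Merger does not) can be modeled in a few lines of plain Python. This is a toy model, not lynx: units are lists of hypothetical Part records holding plain text only, and putting the space after ". " into its own <ignorable> is an assumption about how the split is represented.

```python
import re
from dataclasses import dataclass

@dataclass
class Part:
    kind: str                   # "segment" or "ignorable"
    text: str                   # plain text only; no inline codes in this model
    can_resegment: bool = True  # canResegment (only meaningful on segments)

def join(parts: list[Part], merger: bool = False) -> list[Part]:
    """Join all parts into a single segment.

    A Modifier (merger=False) leaves the unit unchanged if any segment has
    canResegment='no'; a Merger (merger=True) joins everything.
    """
    locked = any(p.kind == "segment" and not p.can_resegment for p in parts)
    if locked and not merger:
        return parts
    return [Part("segment", "".join(p.text for p in parts))]

def segment(parts: list[Part]) -> list[Part]:
    """Split re-segmentable segments after '. ' (assumed: the space becomes an ignorable)."""
    out: list[Part] = []
    for p in parts:
        if p.kind != "segment" or not p.can_resegment:
            out.append(p)
            continue
        # The capturing group keeps the separator: even indices are text, odd ones are spaces.
        for i, piece in enumerate(re.split(r"(?<=\.)( )", p.text)):
            if piece:
                out.append(Part("ignorable" if i % 2 else "segment", piece))
    return out

def run(parts: list[Part], seg: bool = False, join1: bool = False,
        join2: bool = False) -> list[Part]:
    """Mimic the -seg, -join1 and -join2 options: joining is always done first."""
    if join1 or join2:
        parts = join(parts, merger=join2)
    if seg:
        parts = segment(parts)
    return parts
```

For example, run(unit, seg=True, join1=True) gives the same result as run(unit, join1=True, seg=True), mirroring the note that parameter order doesn't matter.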


  • 3.  RE: [xliff] Segmentation Modifications

    Posted 11-30-2013 13:56
    Hi all,

    As mentioned here: https://lists.oasis-open.org/archives/xliff/201311/msg00138.html, I've been trying to implement segmentation modification for XLIFF 2.0 for a while now, and I have a few comments.

    For reference, the csprd02 section for this is here: http://docs.oasis-open.org/xliff/xliff-core/v2.0/csprd02/xliff-core-v2.0-csprd02.html#d0e9317

    --- The section (starting with its new title) keeps talking about both "segmentation modification" and "resegmentation". Could we just say "segmentation modification" everywhere? The two terms refer to the same thing.

    --- That section has many constraints and processing requirements (PRs). It was quite difficult to follow when I tried to implement it.

    For example (take a deep breath): "Modifiers MUST copy all attributes including values, except for the id and order attributes, from their original instances on or within the original <segment> element onto both instances on and within the resulting two <segment> or <ignorable> elements, except for attributes that do not have valid instances on the eventually resulting <ignorable> element."

    To make a long story short: I think that section should be reworded to be simpler, organized by action (split or join), and completed with a few things (some subState PRs, explicit directionality conversion, etc.).

    The proposed modified text is in the attached document.

    I believe it covers what is needed, but it's a complex set of PRs and it should be carefully checked by everyone. In particular, I'd like a confirmation on the Unicode control characters used for the directionality conversion.

    Thanks, -yves

    Attachment: reseg.docx (17 KB)
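The quoted csprd02 requirement boils down to this: when a segment is split, copy its attributes, except id and order, onto each resulting segment, and drop whatever is not valid on a resulting <ignorable>. A minimal sketch on plain attribute dictionaries follows. It only looks at attributes on the <segment> element itself, assumes an <ignorable> carries nothing but id, and the optional downgrade_to parameter (with subState dropped when state changes) is a hypothetical rendering of the state-downgrade idea discussed in this thread, not the text of the proposal.

```python
from typing import Optional

EXCLUDED = {"id", "order"}  # never copied, per the quoted csprd02 requirement

def split_attributes(attrs: dict[str, str],
                     results: list[tuple[str, str]],
                     downgrade_to: Optional[str] = None) -> list[dict[str, str]]:
    """Compute attributes for the elements produced by splitting one <segment>.

    attrs        -- attributes of the original <segment>
    results      -- (kind, new_id) for each resulting element, where kind is
                    "segment" or "ignorable"
    downgrade_to -- hypothetical option: new state value for the split segments
    """
    copied = {k: v for k, v in attrs.items() if k not in EXCLUDED}
    if downgrade_to is not None and "state" in copied:
        copied["state"] = downgrade_to
        copied.pop("subState", None)  # assumption: a stale subState is dropped
    out = []
    for kind, new_id in results:
        if kind == "ignorable":
            out.append({"id": new_id})  # assumption: only id is valid on <ignorable>
        else:
            out.append({**copied, "id": new_id})
    return out
```

The caller supplies the new ids, since the requirement explicitly excludes id from copying and leaves id assignment to the Modifier.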


  • 4.  Re: [xliff] Segmentation Modifications

    Posted 12-12-2013 12:48
    Yves, all,

    I did not hear any dissent on that.

    As far as I checked, your proposal is equivalent to what was there for csprd02, with two small exceptions that add clarity:
    1) You use an explicit bidi provision, so that people do not need to research the Unicode BiDi algorithm when merging segments with different dir values.
    2) You also proposed an option to downgrade the state of split segments, which makes sense to me.
    Otherwise it is just a reorganization of the PRs by the type of modification performed, which seems fine; I have no preference regarding how the provisions are presented.

    @Yves, do you want to implement this proposal in the spec or should I? Please let me know.

    Thanks, dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    http://www.cngl.ie/profile/?i=452
    mailto: david.filip@ul.ie
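To illustrate what an explicit bidi provision can look like when joining segments whose dir values differ, here is a sketch that wraps each differing piece in a Unicode directional isolate (LRI U+2066, RLI U+2067, FSI U+2068, each closed by PDI U+2069). The choice of isolates is an assumption made for this example only: the control characters actually listed in Yves's proposal are exactly what he asks the TC to confirm, and they may differ (for instance, embedding controls instead of isolates). join_with_dir is a hypothetical name.

```python
# Assumed mapping from an XLIFF dir value to the opening isolate character.
OPENERS = {"ltr": "\u2066", "rtl": "\u2067", "auto": "\u2068"}  # LRI, RLI, FSI
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE closes any of the openers above

def join_with_dir(pieces: list[tuple[str, str]], joined_dir: str) -> str:
    """Concatenate (text, dir) pieces into one segment whose direction is joined_dir.

    Pieces that already share joined_dir are copied as-is; the others are
    wrapped in an isolate so that their original direction is preserved.
    """
    out = []
    for text, direction in pieces:
        if direction == joined_dir:
            out.append(text)
        else:
            out.append(OPENERS[direction] + text + PDI)
    return "".join(out)
```

Isolates are used here because, unlike embeddings, they do not let the wrapped text affect the ordering of the surrounding text; whether that matches the TC's intent is part of the bidi check requested in this thread.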


  • 5.  RE: [xliff] Segmentation Modifications

    Posted 12-12-2013 13:04
    Hi David,

    I can make the change; that will free up your time for other ones.

    Did you double-check the bidi mapping? I'm not an expert on bidi, so it would be good to have more than my input on that part.

    Cheers, -yves


  • 6.  Re: [xliff] Segmentation Modifications

    Posted 12-12-2013 14:15
    Thanks Yves,

    I think the best person to double-check the BiDi part is Fredrik, but I am afraid he is travelling and not able to chime in. @Fredrik?

    I will double-check against the official BiDi algorithm by the end of this week. It won't be an in-depth check, but it should catch typos or unintended reversals of values. We could also ask Richard Ishida to check it, but the response time of the W3C I18n group has been slow.

    Rgds, dF


  • 7.  Re: [xliff] Segmentation Modifications

    Posted 11-26-2013 09:39
    Yves,

    I quickly checked the files. toJoin2_out seems not to be joined: it looks the same as toJoin2_in. Other than that, the joining behavior seems OK.

    A few other comments. Based on the fragment identifier discussion, we have seen that file ids need to be REQUIRED. It remains to be seen whether they can stay NMTOKENs; BTW, I now think they can. Also, translate should not be on <segment> any longer, as confirmed a few minutes ago.

    I am in touch with Asanka about publishing notes, so that the important consensus resolutions are visible on the mailing list.

    Cheers, dF


  • 8.  RE: [xliff] Segmentation Modifications

    Posted 11-26-2013 10:02
    Hi David, all,

    > toJoin2_out seems not to be joined,
    > the files look the same as the *in file

    Yes. That is because at least one segment has canResegment='no'. The command used here joins all segments in a unit, except if one or more segments have canResegment='no' (all or nothing; that's just the way the command works for now).

    > based on the fragment identifier discussion, we have seen
    > that file ids need to be REQUIRED
    > It remains to be seen if they can stay NMTOKENs
    > BTW, I know think they can..

    Very possible. But we don't yet have a TC decision on this overall issue. I try to avoid making changes until it's official at the TC level.

    > Also translate should not be on segment any longer,
    > as confirmed several minutes ago..

    Yes. That's part of the changes. I've just pushed a new version of the tool and test files that expect translate not to be in <segment>.

    Thanks, -ys