OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  Section 2.6.7 "Target Content Modification" - Split/Merge

    Posted 12-06-2012 15:43
    Our Translation Infrastructure team has some concerns about Section 2.6.7 "Target Content Modification". This mail is about one of those concerns.

    Our main comments are in blue, and questions in red. (For non-HTML mail readers, I used "==>" to start a comment, and brackets to mark a question.)

    The specification says the following about split/merge actions in the processing requirements.

    2.6.7.1 Without an Existing Target

    User agents may leave the existing target unchanged.
    ==> This contradicts the title: there is no existing target.
    User agents may split the segment into two segments.
    User agents may join the segment with the following one.

    2.6.7.2 With an Existing Target

    User agents may join the segment with the following segment.
    ==> There is no PR about splitting segments. Does this mean it is not allowed? If a <unit> has one <segment>, are agents prohibited from splitting that segment?
    [We would like to prevent any split/merge actions by agents. How?]
    User agents may delete the existing target and start over as if working without an existing target.
    ==> If an existing target can be deleted, the processing requirements in 2.6.7.2 can be ignored completely.
    [How can we prevent it?]

    Content in our translation kits is pre-segmented, and an agreed processing requirement is that segments should not be altered (further split or merged). As it stands, there is no mechanism at unit level in XLIFF 2 that supports our use case/requirement, so we would have to create a proprietary extension to ensure our translation vendors support that PR, which implies losing standardisation, interoperability, and so on.

    [Suggestions]

    Attributes (canJoin, canMerge) at <segment> level to prevent changes in the number of segments inside a <unit>. By default such changes can be allowed.

    Or

    Alternatively, a validation can be designed, but I think this approach requires a more sophisticated design.

    Thanks
    Oracle
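
The "validation" alternative suggested above could look roughly like the following. This is only a minimal sketch, not something the specification defines: it assumes namespace-free sample markup, and the helper names `segment_counts` and `resegmented_units` are hypothetical. It compares the number of <segment> elements per <unit> between the file that was sent out and the file that came back.

```python
import xml.etree.ElementTree as ET


def segment_counts(xliff_text):
    """Map each <unit> id to the number of <segment> children it contains."""
    root = ET.fromstring(xliff_text)
    return {u.get("id"): len(u.findall("segment")) for u in root.iter("unit")}


def resegmented_units(sent_text, returned_text):
    """Return ids of units whose segment count changed (or that disappeared)."""
    sent = segment_counts(sent_text)
    returned = segment_counts(returned_text)
    return sorted(uid for uid, n in sent.items() if returned.get(uid) != n)


# Hypothetical, namespace-free sample data for illustration only.
SENT = (
    '<xliff><file id="f1">'
    '<unit id="u1"><segment><source>One. Two.</source></segment></unit>'
    '<unit id="u2"><segment><source>Three.</source></segment></unit>'
    '</file></xliff>'
)
RETURNED = (
    '<xliff><file id="f1">'
    '<unit id="u1"><segment><source>One. </source></segment>'
    '<segment><source>Two.</source></segment></unit>'
    '<unit id="u2"><segment><source>Three.</source></segment></unit>'
    '</file></xliff>'
)

print(resegmented_units(SENT, RETURNED))  # u1 was split by the downstream tool
```

A count comparison like this would miss a split followed by a join elsewhere in the same unit, which may be part of why the validation route is described as needing a more sophisticated design.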


  • 2.  RE: [xliff] Section 2.6.7 "Target Content Modification" - Split/Merge

    Posted 12-06-2012 17:04




    Hi Jung,
     
    The sentence in 2.6.7.1 should be removed, or reworded to say that the user agent may leave the segment without a target.
     
    On 2.6.7.2, regarding the current rules on splitting and joining: when there is no target, it is safe to split the source string into two strings. If there is a target, language analysis would be needed to find the right point (if one exists) at which to split both source and target so that the two new pairs still contain source and target that are linguistically connected. Since this is very hard to do in general, that case is forbidden. On the other hand, there is no risk of ending up with linguistically non-matching source and target nodes if you completely remove the target. Once you have removed it, you can go on and split the source as you want. There is one other subtle point here: if a tool with proprietary knowledge created the target (2.6.7: “The extraction tool can create the initial target content as it sees fit.”), it could have placed inline tags in it that a generic tool would not be allowed to place. These would be lost if the target is removed.
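
The rule described above can be sketched in code. This is an illustrative sketch only, assuming namespace-free markup and plain-text sources without inline tags; `split_segment` and its `discard_target` parameter are hypothetical names, not part of any XLIFF API. A split is refused while a target exists, unless the caller explicitly asks for the target to be deleted first.

```python
import xml.etree.ElementTree as ET


def split_segment(unit, index, offset, discard_target=False):
    """Split the index-th <segment> of `unit` at character `offset` of its source.

    A segment that has a <target> is only split if the caller agrees to
    delete that target first (any inline tags it contained are lost).
    """
    seg = unit.findall("segment")[index]
    target = seg.find("target")
    if target is not None:
        if not discard_target:
            raise ValueError("segment has a target; splitting is not permitted")
        seg.remove(target)  # start over as if there were no existing target

    source = seg.find("source")
    text = source.text or ""
    source.text = text[:offset]

    # Build the second half and place it right after the original segment.
    new_seg = ET.Element("segment")
    ET.SubElement(new_seg, "source").text = text[offset:]
    unit.insert(list(unit).index(seg) + 1, new_seg)
    return new_seg


# Hypothetical sample unit for illustration.
unit = ET.fromstring(
    '<unit id="u1"><segment><source>One. Two.</source>'
    '<target>Un. Deux.</target></segment></unit>'
)
try:
    split_segment(unit, 0, 5)
except ValueError as err:
    print("refused:", err)

split_segment(unit, 0, 5, discard_target=True)
print([s.findtext("source") for s in unit.findall("segment")])
```

Making the target deletion an explicit opt-in keeps the loss of tool-created inline tags a deliberate decision rather than a side effect.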
     
    I think the general idea on sub-segmentation so far is that tools are allowed to do it most of the time in order to fit the requirements of their process, and that this was seen as desirable in past discussions. I can see why you might want to restrict this ability in order to match your process, and I do not think you are alone in wanting to do so. This all comes down to the dynamic vs. static behavior of the <segment> and <ignorable> elements.
     
    An alternative to the attributes on segment, which I feel is cleaner, is to leverage the static structure property of <unit>. If you have pieces that must not be subdivided and that you want to preserve, put each of them in its own <unit>, with a single <segment> in that <unit>. If logical grouping is needed, group with the <group> element instead of using segmentation. This way, at the end of the processing chain, you can go over the whole document and merge all <segment> and <ignorable> nodes in each <unit> back into a single <segment>. As a result, you get back the structure you initially created, regardless of what other tools did along the processing path. The advantage here is that it needs no extra processing requirements and still allows a certain amount of flexibility to the downstream tools, while the segmentation remains as the extraction tool wanted it.
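
The end-of-chain merge described above could be sketched as follows. This is a simplified illustration under stated assumptions: namespace-free markup, plain-text content without inline tags, and a hypothetical helper name `merge_unit`. A merged target is only produced when every <segment> has one, and an <ignorable> without a target contributes its source text.

```python
import xml.etree.ElementTree as ET


def merge_unit(unit):
    """Collapse all <segment>/<ignorable> children of `unit` into one <segment>."""
    parts = [el for el in unit if el.tag in ("segment", "ignorable")]
    if not parts:
        return None

    merged = ET.Element("segment")
    ET.SubElement(merged, "source").text = "".join(
        p.findtext("source", default="") for p in parts
    )

    segments = [p for p in parts if p.tag == "segment"]
    if segments and all(s.find("target") is not None for s in segments):
        pieces = []
        for p in parts:
            target = p.find("target")
            if target is not None:
                pieces.append(target.text or "")
            else:
                # <ignorable> without a target: reuse its source (e.g. whitespace)
                pieces.append(p.findtext("source", default=""))
        ET.SubElement(merged, "target").text = "".join(pieces)

    # Put the merged segment where the first original part was.
    position = list(unit).index(parts[0])
    for p in parts:
        unit.remove(p)
    unit.insert(position, merged)
    return merged


# Hypothetical sample: a unit that a downstream tool split into two segments.
unit = ET.fromstring(
    '<unit id="u1">'
    '<segment><source>One.</source><target>Un.</target></segment>'
    '<ignorable><source> </source></ignorable>'
    '<segment><source>Two.</source><target>Deux.</target></segment>'
    '</unit>'
)
merge_unit(unit)
print(ET.tostring(unit, encoding="unicode"))
```

Running this over every <unit> restores one <segment> per unit, which is the structure the extraction tool started with in this approach.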
     
    Regards,
    Fredrik Estreen
     



    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Jung Nicholas Ryoo
    Sent: den 6 december 2012 16:41
    To: xliff@lists.oasis-open.org
    Subject: [xliff] Section 2.6.7 "Target Content Modification" - Split/Merge


     
    Our Translation Infrastructure team has some concerns on the Section 2.6.7 "Target Content Modification".
    This mail is about one of the concerns.

    Our main comments in blue, and questions in red. (For non-HTML mail readers, I used "==>" to start a comment, and brackets to mark a question).

    The specification says about split/merge actions in the processing requirements.

    2.6.7.1 Without an Existing Target


    User agents may leave the existing target unchanged.  ==> This is contradictory to the title. No existing target.
    User agents may split the segment into two segments.
    User agents may join the segment with the following one.
    2.6.7.2 With an Existing Target


    User agents may join the segment with the following segment
    ==> No PR about splitting segments. Does this mean it is not allowed? If <unit> has one <segment>, then will it be prohibited that agents split the segment?
    [We would like to prevent any split/merge actions by agent. How?]
    User agents may delete the existing target and start over as if working without an existing target 
    ==> If an existing target can be deleted, this implies the processing requirements in 2.6.7.2 can be completely ignored.
    [ How can we prevent it? ]
    Content in our translation kits is pre-segmented and an agreed processing requirement is that segments should not be altered (further split or merged). As it stands there is no mechanism
    at unit level in XLIFF 2 that would support our use case/requirement, and we would have to create a proprietary extension to ensure our translation vendors would support that PR, which implies losing xyz (standardisation, interoperability, etc, you name it)

    [Suggestions]


    Attributes (canJoin, canMerge) at <segment> level to prevent changes in the number of segments inside a <unit>. By default such changes can be allowed.

    Or
    Alternatively a validation can be designed, but I think this approach requires more sophisticated design.

    Thanks
    Oracle

  • 3.  RE: [xliff] Section 2.6.7 "Target Content Modification" - Split/Merge

    Posted 12-10-2012 13:07
    Hi Jung, Fredrik, all,

    > The sentence in 2.6.7.1 should be removed. Or reworded to
    > say that the user agent may leave the segment without a target.

    +1.

    > On 2.6.7.2: Regarding the current rules on splitting and joining.
    > ...
    > And I do not think you are alone in wanting to do so. This all
    > comes down to the dynamic vs. static behavior of the <segment>
    > and <ignorable> elements.

    I think having some kind of mechanism that indicates a unit should not be re-segmented would be perfectly acceptable. This type of requirement exists, for example, in XLIFF:doc from Interoperability Now. This notion is also listed in our wiki (see the section "Segment modification" in https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Segmentation ). We have had no discussion on this until now, but that doesn't mean we should not.

    > An alternative to the attributes on segment that I feel is
    > cleaner is to leverage the static structure property of <unit>.
    > If you have non sub dividable pieces that you want to preserve
    > they should be put in a <unit> each and in the <unit> you put
    > a single <segment>.

    I would prefer to use some unit-level attribute rather than re-purpose the group/unit. Hinting that the segment representation may be done that way would somewhat go against the idea of having only one way to do one thing. I'd rather have a clear attribute that indicates what re-segmentation operations can be done on a unit.

    An extra note for the Inline-SC members: this discussion is related to text in a sub-section that lives in the "Inline Content" section, but it concerns the general structure more than the inline markup. I think any potential change here would not affect the overall inline markup proposals. So I think we can proceed in the SC with moving our proposals to the TC.

    cheers,
    -yves
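
A tool could honour a unit-level attribute of the kind preferred above along these lines. This is a sketch under assumptions: the attribute name `canResegment`, its "yes"/"no" values, and the default of "yes" are chosen for illustration only, since the thread does not settle on a name, and the helpers `may_resegment` and `locked_units` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed attribute name and values; the thread does not fix either.
RESEGMENT_ATTR = "canResegment"


def may_resegment(unit):
    """True unless the unit explicitly opts out; absence means allowed."""
    return unit.get(RESEGMENT_ATTR, "yes") != "no"


def locked_units(xliff_text):
    """Return the ids of units a tool must not split or join."""
    root = ET.fromstring(xliff_text)
    return [u.get("id") for u in root.iter("unit") if not may_resegment(u)]


# Hypothetical, namespace-free sample data for illustration only.
SAMPLE = (
    '<xliff><file id="f1">'
    '<unit id="u1" canResegment="no">'
    '<segment><source>Do not split. Really.</source></segment></unit>'
    '<unit id="u2"><segment><source>Free to split. Or join.</source></segment></unit>'
    '</file></xliff>'
)

print(locked_units(SAMPLE))  # only u1 opts out of re-segmentation
```

Defaulting to "allowed" when the attribute is absent matches the suggestion earlier in the thread that such changes can be permitted by default.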


  • 4.  Re: [xliff] Section 2.6.7 "Target Content Modification" - Split/Merge

    Posted 12-10-2012 17:33
    Thank you very much, Fredrik and Yves, for your feedback.

    I had to wait for the opinions of the developers here regarding your feedback. We agree with Yves's approach of having a single attribute (or two) to prevent re-segmentation by other tools. We will wait for further feedback from Fredrik and others.

    Regards
    Jung

    On 10/12/2012 13:06, Yves Savourel wrote:

    > Hi Jung, Fredrik, all,
    >
    > > The sentence in 2.6.7.1 should be removed. Or reworded to say that the user agent may leave the segment without a target.
    >
    > +1.
    >
    > > On 2.6.7.2: Regarding the current rules on splitting and joining. ... And I do not think you are alone in wanting to do so. This all comes down to the dynamic vs. static behavior of the <segment> and <ignorable> elements.
    >
    > I think having some kind of mechanism that indicates a unit should not be re-segmented would be perfectly acceptable. This type of requirement exists for example in XLIFF:doc from Interoperability Now. this notion is also listed in our wiki (See the section Segment modification in https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Segmentation ). We had no discussion on this until now, but that doesn't mean we should not.
    >
    > > An alternative to the attributes on segment that I feel is cleaner is to leverage the static structure property of <unit>. If you have non sub dividable pieces that you want to preserve they should be put in a <unit> each and in the <unit> you put a single <segment>.
    >
    > I would prefer to use some unit-level attribute than re-purpose the group/unit. Hinting that putting the segment representation may be done that way would somewhat go against the idea of having only one way to do one thing. I'd rather have a clear attribute that indicate what re-segmentation operation can be done on a unit.
    >
    > An extra note for the Inline-SC members: This discussion is related to text in a sub-section that lives in the Inline Content section, but it concerns more the general structure than the inline markup. I think any potential change here would not affect the overall inline markup proposals. So I think we can proceed in the SC with moving our proposals to the TC.
    >
    > cheers,
    > -yves