OASIS XML Localisation Interchange File Format (XLIFF) TC


Source read only or modifiable?

  • 1.  Source read only or modifiable?

    Posted 05-22-2012 20:45
    Hi,

    I started a thread some time ago about making source read-only or not and would like to revisit the topic.

    In the last meeting we had some kind of agreement on considering that segmentation does not change source text. Segmentation only distributes source text in one or more segments without altering it.

    We also agreed that replacing span-like inline tags with equivalents in the form of start/end markers may not constitute a modification of source text as long as the replacing operation doesn’t lose data.

    If we consider source text as read-only in the specification, the two operations mentioned above can be declared as allowed in the relevant processing expectation section.

    Keeping in mind that segmentation and certain tag replacements are not to be considered modifications of source, should we consider source text read-only or should we allow modifications during the translation process?

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com
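
    The claim that segmentation only distributes source text can be stated as a simple invariant: concatenating the segments must give back the original string. The following is a minimal sketch of that check, not anything defined by XLIFF; the function names `segment` and `is_lossless` and the character-offset boundaries are assumptions made for illustration.

```python
def segment(text, boundaries):
    """Split text at the given character offsets without altering it."""
    cuts = [0, *sorted(boundaries), len(text)]
    return [text[a:b] for a, b in zip(cuts, cuts[1:])]


def is_lossless(original, segments):
    """True if the segments, joined in order, reproduce the original text."""
    return "".join(segments) == original


source = "First sentence. Second sentence. Third one."
segments = segment(source, [16, 33])

print(segments)
print(is_lossless(source, segments))                       # segmentation only
print(is_lossless(source, [s.strip() for s in segments]))  # trimming loses data
```

    Under this reading, any operation that keeps the invariant (splitting or joining segments) would be allowed on a read-only source, while trimming or rewriting text would not.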


  • 2.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 00:37
    Hi Rodolfo,

    If we were to make the source content “read-only” I think it would also be important to allow the addition/removal of annotations in the source (<mrk>). There are many use cases where multi-step processes enrich the source with such metadata, and like with segmentation, it does not change the “content”.

    Also: would content *not* marked with xml:space='preserve' be considered modified if it gets normalized? For example go from:

    <source>Some text and more text.</source>

    to:

    <source>Some text and more text.</source>

    Cheers,
    -yves
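
    The whitespace question can be phrased as a comparison rule. The sketch below assumes that white space is insignificant unless xml:space is set to "preserve", which is one possible answer to the question and not a rule taken from the specification; `normalize_space` and `same_content` are hypothetical helper names.

```python
import re


def normalize_space(text):
    """Collapse each run of spaces, tabs and line breaks into one space."""
    return re.sub(r"[ \t\r\n]+", " ", text)


def same_content(before, after, xml_space="default"):
    """Compare two versions of a source string.

    Assumption: with xml:space="preserve" every character counts;
    otherwise only the whitespace-normalized text is compared.
    """
    if xml_space == "preserve":
        return before == after
    return normalize_space(before) == normalize_space(after)


raw = "Some  text   and\n more text."
clean = "Some text and more text."

print(same_content(raw, clean))                        # normalization tolerated
print(same_content(raw, clean, xml_space="preserve"))  # counts as a modification
```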


  • 3.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 02:27
    Hi Yves,

    Annotations using <mrk> alter source text.

    If the tool that creates the XLIFF decorates <source> with <mrk>, fine. All other tools used later in the translation chain must preserve them, even if they cannot take advantage of those elements.

    If the tool that creates the XLIFF doesn’t add <mrk> elements, then <mrk> elements should not be present when the XLIFF returns to the original tool for generating the translated file.

    As we discussed the other day during the meeting, changes done by a second tool should be temporary and the original <source> restored before moving to the next tool/stage.

    I proposed a solution for annotations some time ago that used offsets stored as attributes in an element outside <source> or <target>. Annotations don’t need to alter the original source. They don’t need to be inline elements at all, they can be optional elements that live in a module.

    A tool that creates an XLIFF file should be able to receive a translated file with the same structure it generated. There should not be any extra element added during the translation process. Neither in <source> nor in <target>.

    Normalizing spaces means modifying spaces. It is a change that should not be allowed in any stage if the xml:space attribute is set to “preserve”.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com


  • 4.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 04:00
    Hi Rodolfo, all,

    > Normalizing spaces means modifying spaces.
    > It is a change that should not be allowed in
    > any stage if the xml:space attribute is set
    > to “preserve”.

    The question was: "Would a content *not* marked with xml:space='preserve' be considered modified if it gets normalized?" (in the context of having a "read-only" source).

    > Annotations using <mrk> alter source text.

    No more than <segment> elements. And we said re-segmenting was not considered a modification of the content.

    > If the tool that creates the XLIFF doesn’t add
    > <mrk> elements, then <mrk> elements should not be
    > present when the XLIFF returns to the original
    > tool for generating the translated file.

    I disagree. Like with <segment>, the merging tool can strip the <mrk>. By definition they are not part of the content to merge. This is how <mrk> is defined in 1.2 ("The <mrk> element is usually not generated by the extraction tool and it is not part of the tags used to merge the XLIFF file back into its original format.") and I certainly hope that the same principle will carry over in 2.0.

    > As we discussed the other day during the meeting,
    > changes done by a second tool should be temporary and
    > the original <source> restored before moving to the
    > next tool/stage.

    We discussed this when talking about inline codes (e.g. converting text to inline codes, i.e. adding codes). We were not talking about <mrk> at that time.

    > I proposed a solution for annotations some time ago
    > that used offsets stored as attributes in an element
    > outside <source> or <target>. Annotations don’t need
    > to alter the original source. They don’t need to be
    > inline elements at all, they can be optional elements
    > that live in a module.

    Your idea was discussed in the Inline Markup SC and unanimously found impractical for an XML exchange format. If XLIFF were a binary format, on the other hand... But it's not.

    > There should not be any extra element added
    > during the translation process. Neither in <source>
    > nor in <target>.

    I disagree. We must be able to add/remove inline codes in the target. It's a basic requirement.

    Cheers,
    -yves
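
    The point that a merging tool can strip <mrk> and get back the content it extracted can be sketched as follows. This is only an illustration under stated assumptions: element names are used without a namespace, the <pc> inline code in the sample is made up for the example, and `strip_mrk` is a hypothetical helper, not part of any XLIFF toolkit.

```python
import xml.etree.ElementTree as ET


def strip_mrk(elem):
    """Unwrap every <mrk> below elem in place, keeping text and other codes."""
    for child in elem:
        strip_mrk(child)  # after this, no child contains a nested <mrk>

    kept = []
    head = elem.text or ""

    def append_text(text):
        nonlocal head
        if not text:
            return
        if kept:
            kept[-1].tail = (kept[-1].tail or "") + text
        else:
            head += text

    for child in list(elem):
        if child.tag == "mrk":
            append_text(child.text)
            kept.extend(list(child))  # promote the annotation's own children
            append_text(child.tail)
        else:
            kept.append(child)

    for child in list(elem):
        elem.remove(child)
    elem.extend(kept)
    elem.text = head or None
    return elem


annotated = ET.fromstring(
    '<source>Click <mrk type="term"><mrk translate="no">OK</mrk> button</mrk>'
    ' to <pc id="1">save</pc>.</source>'
)
extracted = ET.fromstring('<source>Click OK button to <pc id="1">save</pc>.</source>')

print(ET.tostring(strip_mrk(annotated), encoding="unicode"))
```

    If the stripped result serializes identically to what the extraction tool produced, the annotations were only a layer on top of the content, which is the argument made in this message.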


  • 5.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 10:35
    >


  • 6.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 11:20
    Hi Rodolfo, all,

    >> The question was: "Would a content *not* marked with
    >> xml:space='preserve' be considered modified if it gets normalized?"
    >> (in the context of having a "read-only" source).
    >
    > If xml:space is not set to "preserve", then spaces are meaningless
    > from XML point of view and normalization does not affect content.
    > That's something defined in a level above XLIFF.

    Good. So normalizing source (if xml:space != 'preserve') would not be a modification of the content.

    >>> Annotations using <mrk> alter source text.
    >>
    >> No more than <segment> elements.
    >
    > An annotation does not represent anything included in the original
    > document. It is neither original text nor formatting information
    > that needs to be preserved.

    Exactly. So we agree that adding/removing annotations does not modify the content.

    >>> If the tool that creates the XLIFF doesn’t add <mrk> elements, then
    >>> <mrk> elements should not be present when the XLIFF returns to the
    >>> original tool for generating the translated file.
    >>
    >> I disagree. Like with <segment>, the merging tool can strip the <mrk>.
    >> By definition they are not part of the content to merge.
    >
    > If that's not part of the original content to merge, then
    > it is not an essential inline element.

    Exactly. It can be ignored by the merger tool.

    >> This is how <mrk> is defined in 1.2 ("The <mrk> element
    >> is usually not generated by the extraction tool and it
    >> is not part of the tags used to merge the XLIFF file
    >> back into its original format.") and I certainly hope that
    >> the same principle will carry over in 2.0.
    >
    > If it is not generated by the extraction tool, then it
    > should not be in core. It must live in an optional module.

    The SC hasn't yet discussed whether the different elements of the inline markup should be part of the core or have their own namespace(s). So I don't have an opinion on this yet.

    Note that annotations can be generated by the extraction tool, and some can be important to the translation process (e.g. <mrk translate='no'>). So there might be a case for <mrk> to be part of the core if the other inline elements are. I hope the SC will be able to address that question during the face-to-face meeting.

    >>> There should not be any extra element added during the translation
    >>> process. Neither in <source> nor in <target>.
    >>
    >> I disagree. We must be able to add/remove inline codes in the target.
    >> It's a basic requirement.
    >
    > Then the tool creating the XLIFF file should be able to
    > indicate whether those additions are acceptable or not.
    > This can be indicated with an attribute at <file> level. The
    > original tool should have a way to express that <target> must
    > have the same structure as the original <source> (that should
    > be the default case IMO). It also should be able to express
    > that alterations to <source> are not allowed. And, for
    > completeness sake, it should be able to indicate whether
    > re-segmentation is allowed.

    I agree that being able to set different permissions at the file level may be useful. Permission for adding/removing inline codes can be set at the code level, but a file-level flag could complement (or override) that. Permission to re-segment could be useful too. I think there is an item in the features list ( http://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/PermissionControl ) that may be related to this.

    Cheers,
    -yves


  • 7.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 11:53
    >


  • 8.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 14:25
    Hi,

    >>>>> Annotations using <mrk> alter source text.
    >>>>
    >>>> No more than <segment> elements.
    >>>
    >>> An annotation does not represent anything included in the original
    >>> document. It is neither original text nor formatting information
    >>> that needs to be preserved.
    >>
    >> Exactly. So we agree that adding/removing annotations does not modify
    >> the content.
    >
    > No, we don't agree. Adding an annotation means altering
    > the original. The key is the verb: "add".

    When we re-segment we also "add" <segment>/<source> tags, and that (as we agreed) does not "modify" the original extracted content. So how can adding <mrk> be any different? If we strip out the <segment>/<source> and the <mrk> tags we get the same original content we had before adding them. The annotations are just a layer on top of the original extracted content, just like <segment>/<source>. Please point out the difference to me.

    In any case, it seems that we do agree on one thing: ignoring the <mrk> is fine for the merger tool. So what is the problem with having them at that stage?

    > The Inline SC should propose whether to have inline
    > elements as part of the core or in a separate schema,
    > we know that. Nevertheless, if inline elements live
    > in their own schema, annotations should live in a
    > different one.

    Maybe, maybe not. If the elements for inline codes have their own namespace, they don't have to follow the core prescriptions. We'll see what the SC comes up with. It may think having all the inline markup in a single namespace is simpler. I have no opinion on that yet.

    >> Note that annotations can be generated by the
    >> extraction tool, and some can be important to the
    >> translation process (e.g. <mrk translate='no'>).
    >> So there might be a case for <mrk> to be part of
    >> the core if the other inline elements are.
    >
    > That would be only possible if the original document includes
    > something that says a given span of text should not be translated.

    Not necessarily. An extraction tool can do many things: some can pre-segment, some can process the content to add <mrk translate='no'> based on a black-list of do-not-translate texts. There is no original code to anchor the translate attribute in that case.

    Cheers,
    -yves
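
    The black-list case mentioned at the end of this message could look roughly like the sketch below. It assumes a plain-text source with no existing inline codes and a literal, case-sensitive term list; `annotate_do_not_translate` and the sample product name are hypothetical.

```python
import re
from xml.sax.saxutils import escape


def annotate_do_not_translate(text, blacklist):
    """Wrap each black-listed term in <mrk translate="no">, escaping the rest."""
    if not blacklist:
        return escape(text)
    # Longest terms first so "Acme Studio" wins over "Acme".
    terms = sorted(blacklist, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(term) for term in terms))

    out, pos = [], 0
    for match in pattern.finditer(text):
        out.append(escape(text[pos:match.start()]))
        out.append('<mrk translate="no">' + escape(match.group()) + "</mrk>")
        pos = match.end()
    out.append(escape(text[pos:]))
    return "".join(out)


content = annotate_do_not_translate("Open Acme Studio & save.", ["Acme", "Acme Studio"])
print("<source>" + content + "</source>")
```

    Nothing in the original document anchors these annotations; they exist only because the extraction step consulted the list.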


  • 9.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 15:00
    >


  • 10.  RE: [xliff] Source read only or modifiable?

    Posted 05-24-2012 02:48
    Hi,

    > When you do segmentation, you add elements to <unit>.
    > When you add <mrk> elements, you add them to <source>.

    Adding a <segment> to <unit> is the same as inserting "</source></segment><segment><source>" into the initial <source> content, just like you would insert "<mrk>". In both cases you break the content into parts. In both cases you can strip the added tags and get back the original content of the initial <source>.

    If anything, re-segmenting has a lot more impact on the unit than doing annotation (e.g. if you join two segments you may have to remove <matches> and other children of <segment> elements, update approved and other attributes, etc).

    So, I still think adding <mrk> elements in content doesn’t "modify" the content any more than adding <segment> does. And therefore if it’s ok to re-segment, it stands to reason that annotations are ok too.

    > To make things worse, the description of <mrk>
    > in the specification draft says it could have
    > multiple nesting levels. That means extra levels
    > of processing to get rid of undesired markup.

    I’m afraid user requirements trump developers’ comfort. Not allowing things like "<mrk><mrk>Some</mrk> text</mrk>" would be like saying to HTML users: you can have text in bold or italics but not both at the same time.

    >> Maybe, maybe not. If the elements for inline codes have their
    >> own namespace, they don't have to follow the core prescriptions.
    >> We'll see what the SC come up with. It may think having all the
    >> inline markup in a single namespace is simpler. I have no opinion
    >> on that yet.
    >
    > An important thing to keep in mind: elements from a module are
    > optional and disposable.
    > ...
    > What would a tool that only supports "core" do with inline
    > elements if they are in a module?
    > ...
    > That's too risky and justifies having inline elements in core.

    I tend to agree with you. We’ll see what the SC thinks as a group.

    > As said before, that information doesn't have to be
    > in a <mrk> element. It could be stored outside <source>
    > so it is not *absolutely* essential to consider
    > <mrk> as an inline element.

    The <mrk> is needed to delimit the span of annotated content. The idea of using offsets didn’t find much support, I’m afraid.

    That said, XLIFF is just an exchange format: one can use whatever representation one likes in the internal object model. One simple mapping from markup to offsets and you have unencumbered content.

    Cheers,
    -yves
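
    The "simple mapping from markup to offsets" could be sketched like this for a tool's internal model. It is an illustration only: element names are used without a namespace, only <mrk> is turned into a span (text inside any other inline element is simply counted), and `to_offsets` is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def to_offsets(source_xml):
    """Return (plain_text, spans); each span is (start, end, attributes)."""
    root = ET.fromstring(source_xml)
    parts, spans = [], []
    length = 0

    def walk(elem):
        nonlocal length
        start = length
        if elem.text:
            parts.append(elem.text)
            length += len(elem.text)
        for child in elem:
            walk(child)
            if child.tail:
                parts.append(child.tail)
                length += len(child.tail)
        if elem.tag == "mrk":
            spans.append((start, length, dict(elem.attrib)))

    walk(root)
    spans.sort(key=lambda span: (span[0], -span[1]))  # outer spans first
    return "".join(parts), spans


text, spans = to_offsets(
    '<source><mrk type="term"><mrk translate="no">Some</mrk> text</mrk> here.</source>'
)
print(text)
for start, end, attrs in spans:
    print(start, end, attrs, repr(text[start:end]))
```

    The nested case from the message above maps to two overlapping spans over the same plain string, so a tool that prefers offsets internally can still read and write inline <mrk> in the exchange file.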


  • 11.  RE: [xliff] Source read only or modifiable?

    Posted 05-28-2012 14:57
    Hi, I disagree with an originating tool being able to express whether alterations to <source> are allowed or not allowed. If it's a part of the agreed-upon standard, it should be allowed with no discussion; if it's not allowed, it should be forbidden. The same holds for re-segmentation - at least a significant part of any leverage lost between tools is because of re-segmentation. If a tool routinely generates XLIFF files forbidding re-segmentation, it's a good way to force users to only use the tools/translation memories/etc provided by that original tool. I would personally prefer to see file permissions restricted to optional modules / attributes / elements. Shirley


  • 12.  Re: [xliff] Source read only or modifiable?

    Posted 05-28-2012 16:30
    Hi Shirley, all,

    Looking at the discussion of source as read-only or not, I think that clearly at least two different usage scenarios have been uncovered. [1][2]

    I fully agree with Shirley that originating tools should not be able to prevent re-segmentation. We cannot prescribe business behavior, but on the contrary we are obliged to enable valid use cases.

    [The re-importing receiving tool might however set some segmentation expectations and/or enforce engineering and quality checks on re-import. And often the receiving tool will be the same as the originating tool in two different roles in the workflow. I believe that we should allow for a reasonable descriptive set of re-import checks.]

    [1] Re-segmentation and source enrichment (annotations as mentioned by Yves) should be considered as valid and potentially desirable operations that should be allowed throughout the process despite their affecting source.

    [2] On the other hand, changing source in the sense of e.g. removing typos and/or otherwise improving readability/translatability should NOT be allowed *during translation*. Translation should for that purpose be defined as *target editing*, ergo excluding editing changes of source [it should not exclude changes of type one, however].

    Nevertheless, the originating tool should be able to mark a state of a segment. For the described scenarios the required states are only two:
    1) preprocessing
    2) translation

    Of course in general the segment state machine should be more complex and have dependencies with child and parent state machines. E.g. in a segment you should be able to mark separately the status of source and target.

    (At least manual) re-segmentation should be possible even during the translation proper, to allow for translators' handling of erroneous sentence breaks and non-breaks. It is clear that segmentation rules are an open-ended system that can never be completed via any regular expression engine (because the abbreviations causing exceptions are a productive area in most affected languages).

    Thanks for your attention
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie
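
    The remark about abbreviations can be illustrated with a toy rule-based splitter. This is a sketch under simple assumptions (break after ".", "!" or "?" followed by white space, with a fixed exception list); `ABBREVIATIONS` and `split_sentences` are hypothetical and far cruder than real segmentation rules.

```python
import re

# Illustrative exception list; by the argument above it can never be complete.
ABBREVIATIONS = {"Dr.", "e.g.", "i.e.", "etc."}

BREAK = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split after sentence-final punctuation unless the last word is a known abbreviation."""
    segments, start = [], 0
    for match in BREAK.finditer(text):
        words = text[start:match.start()].split()
        if words and words[-1] in abbreviations:
            continue  # known exception: no break here
        segments.append(text[start:match.end()])
        start = match.end()
    if start < len(text):
        segments.append(text[start:])
    return segments


known = split_sentences("Ask Dr. Smith about it. He knows, e.g. the rules.")
unknown = split_sentences("Wait approx. five minutes.")
print(known)
print(unknown)  # "approx." is not in the list, so the rule breaks wrongly
```

    The second call shows the kind of erroneous break that a translator would need to repair by manual re-segmentation.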


  • 13.  RE: [xliff] Source read only or modifiable?

    Posted 05-23-2012 02:41
    Hi,

    I just realized that <mrk> should not be part of the core. It is not essential for creating an XLIFF file, translating it and generating a translated document.

    There could be an optional module for handling annotations.

    As I said before, annotations can be handled outside source or target using offsets. If offsets are not good for a given tool, the optional element used for the annotation may contain a copy of the text being annotated.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com