OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  Source read-only or not?

    Posted 04-03-2012 02:35
    Hi all,

    While going through last meeting's minutes I noticed this request from Rodolfo:

    > === 6/ New Business
    > Rodolfo: Request to know if source text should
    > be read-only or not.

    I would tend to say that the source text should be modifiable by any XLIFF user agent. In addition to applying annotations (which would probably not be considered a modification of the content), some steps of the process may want to change the source text itself before it goes to translation: for example, running a spell/grammar checker to help the MT pre-translation later, or doing some form of editing based on a source TM.

    The bottom line is that we probably don't want to restrict what tools can do without very good reasons.

    Cheers,
    -yves


  • 2.  RE: [xliff] Source read-only or not?

    Posted 04-03-2012 08:43
    Hi,

    I would agree that source should be modifiable within limits. One more case where the source would need to be modified is if it needs to be segmented. In that operation it would be preferable not only that the content is divided up between multiple segments, but also that spanning tags can be converted to equivalent non-spanning tags. I also agree that it would be good to allow annotations to be added to the source text. Both of these operations preserve the original source content semantically. They change the representation, not the meaning.

    I'm less sure that spell-correcting or editing the content is a good idea, since in effect it changes the meaning of the source and its correspondence with the original document. It brings XLIFF closer to an authoring format.

    All of these features could be implemented with a read-only source, though I'm not sure I would like the technical solutions. The solution would be to use external elements and attributes that point to portions of the source content and add annotations, segmentation points or even edit operations.

    Regards,
    Fredrik Estreen


  • 3.  RE: [xliff] Source read-only or not?

    Posted 04-03-2012 09:35
    Hi Yves,

    If source is not read-only, then it would be fine to throw away all annotations and change the elements used for inline markup. Right?

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya rmraya@maxprograms.com
    Maxprograms http://www.maxprograms.com


  • 4.  Re: [xliff] Source read-only or not?

    Posted 04-03-2012 10:04
    Rodolfo,

    > If source is not read-only, then it would be fine to throw away all annotations and change the elements used for inline markup. Right?

    I bet there are cases where it would be fine. As a general rule, no, but it's not hard to think of a case where multiple tools are involved and someone could pre-process the XLIFF spit out from one tool to ensure it will work in another tool.

    We may want to provide guidelines that apply for 99% of the cases, but the moment we make a blanket statement like saying source must be read-only because we're worried about someone doing something we don't want, we will limit the usefulness of the specification for those who don't fit into what we anticipated. That said, those who don't fit into what we anticipate will probably do what they're going to do anyway and a rule will not deter them one way or another.

    -Arle


  • 5.  RE: [xliff] Source read-only or not?

    Posted 04-03-2012 12:06
    Hi Fredrik, Rodolfo, Arle, all,

    > If source is not read-only, then it would be fine
    > to throw away all annotations and change the
    > elements used for inline markup. Right?

    Yes, it would be doable. Like Fredrik, I think the different forms of inline codes that are equivalent should be switchable (e.g. a <pc> to a <sc>/<ec>, or vice-versa). Changing a <sc> or a <ec> to a <ph> would not be allowed (since that would lose information). Maybe we need to expand the section "Usage of <pc> and <sc>/<ec>" a bit to include all inline codes.

    Annotations are normally extra information rather than existing code (although the nature of translate and possible directionality markers may need to be clarified: some of those may come from the original format). So, removing or adding annotations in the source should be ok.

    As for whether changing the source text is a good idea or not, I don't think XLIFF can be the judge of that. It may be bad to run a spell-checking step on a source text if you need to refer that content back to the original document at some point later, but it may be just fine (and useful) in other cases. People using tools need to decide those things, not the storage format.

    The only control on text change I would see XLIFF trying to enforce would be the white spaces, using xml:space. (I'm not sure xml:space is perfectly adequate actually. Currently it's available on <source> and <target>, but it would be better to have such information at the <unit> level. But then xml:space would apply to all child elements (not just <source> and <target>). But that's a different topic.)

    Cheers,
    -yves
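
The switch between equivalent inline-code forms mentioned in this message (a <pc> to a <sc>/<ec> pair) can be illustrated with a minimal Python sketch. It assumes un-namespaced <pc>, <sc> and <ec> elements, an `id` attribute on <pc>, and a `startRef` attribute linking each <ec> back to its <sc>. Those attribute names and the helper `pc_to_sc_ec` are assumptions made for illustration; nothing in this thread fixes them.

```python
import xml.etree.ElementTree as ET


def pc_to_sc_ec(parent):
    """Replace every <pc> below `parent` with an <sc/> ... <ec/> pair, in place.

    Text and nested inline codes are kept in the same order, so only the
    representation changes, not the content.
    """
    i = 0
    while i < len(parent):
        child = parent[i]
        pc_to_sc_ec(child)  # convert nested spans first
        if child.tag != "pc":
            i += 1
            continue
        span_id = child.get("id", "")
        sc = ET.Element("sc", {"id": span_id})
        # "startRef" is an assumed name for the attribute pointing back to <sc>.
        ec = ET.Element("ec", {"startRef": span_id})
        sc.tail = child.text  # text that opened the span now follows <sc/>
        ec.tail = child.tail  # text after </pc> now follows <ec/>
        new_nodes = [sc] + list(child) + [ec]
        parent.remove(child)
        for offset, node in enumerate(new_nodes):
            parent.insert(i + offset, node)
        i += len(new_nodes)  # skip past the nodes just inserted


if __name__ == "__main__":
    source = ET.fromstring(
        '<source>Click <pc id="1">the <pc id="2">bold</pc> link</pc> now.</source>'
    )
    pc_to_sc_ec(source)
    print(ET.tostring(source, encoding="unicode"))
```

Going the other way (<sc>/<ec> back to <pc>) is only possible when both markers sit in the same segment and the spans nest properly, which is why the two forms are equivalent in some cases but not all.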


  • 6.  RE: [xliff] Source read-only or not?

    Posted 04-03-2012 13:01
    >


  • 7.  RE: [xliff] Source read-only or not?

    Posted 04-03-2012 14:38
    > Why put a restriction for converting codes to <ph>
    > if we are allowing changes? As it would be allowed
    > to add inline codes, removing existing ones should
    > also be allowed, then replacing <pc> or <sc>/<ec>
    > with <ph> would be possible.

    Good point: I suppose some steps may want to redo the whole inline code tagging.

    > Considering that <source> content would be modifiable,
    > we cannot rely on it anymore for generating a partially
    > translated document from an incomplete XLIFF that lacks
    > some <target> elements.

    Not sure why.

    > If annotations end inside <source> instead of staying
    > outside using attributes to indicate start and end,
    > I would make them completely ignorable, treating them
    > in the same way XML comments are treated.
    > Anything that doesn't have origin in the source document
    > and is not required for generating a translated
    > document should be ignorable.

    That's why I said annotations can be deleted. But I wouldn't treat a marker and a comment (or a PI) the same way.

    a) The comment (or the PI) is not an XLIFF data structure. It interferes with the content. It must not be present in the parsed content. It MAY be preserved if the tool wants to do so (but making sure it does not interfere with the content).

    b) The annotation is a valid XLIFF structure carrying interoperable information. It does not interfere with the content, as we have processing expectations associated with it. One of these should be to ignore annotations when you need to see the content with just text and inline codes. The annotation SHOULD be preserved on output.

    In summary: comments/PIs MAY be preserved, annotations SHOULD be preserved, neither MUST be preserved.

    The Inline Markup SC still has to resolve the case of translate and possibly direction information: currently it is not clear how to represent such information if it was in the original document (as an annotation, as an inline code, or both).

    >> ...But then xml:space would apply to all child
    >> elements (not just <source> and <target>).
    >
    > Having the "xml:space" attribute at <unit> level will
    > affect matches coming from TM that may need to be
    > preserved as retrieved. That seems a bad idea.

    Exactly. But then we need to make sure an xml:space set in the <source> gets copied over to the <target> if the target element didn't exist. I don't think we have a processing expectation stating this explicitly. So technically one could have a <source xml:space='preserve'> and a <target> with no xml:space.

    Cheers,
    -yves
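
The missing processing expectation described in this message (copy xml:space from <source> to a <target> that did not exist yet) could look roughly like the Python sketch below. The <segment> wrapper, the un-namespaced element names and the helper `ensure_target` are assumptions made for illustration; only the xml:space behaviour comes from the message itself.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def ensure_target(segment):
    """Return the <target> of `segment`, creating it if it is missing.

    A newly created <target> inherits xml:space from <source>.
    An existing <target> is returned untouched.
    """
    target = segment.find("target")
    if target is not None:
        return target
    source = segment.find("source")
    target = ET.SubElement(segment, "target")
    if source is not None and source.get(XML_SPACE) is not None:
        target.set(XML_SPACE, source.get(XML_SPACE))
    return target


if __name__ == "__main__":
    segment = ET.fromstring(
        "<segment><source xml:space='preserve'>Two  spaces</source></segment>"
    )
    ensure_target(segment).text = "Deux  espaces"
    print(ET.tostring(segment, encoding="unicode"))
```

Leaving an existing <target> alone is a deliberate choice in this sketch: a match retrieved from TM may need to be preserved as retrieved, which is the concern raised against setting xml:space at the <unit> level.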