OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  RE: [xliff] XLIFF 2.0 example files for segmentation

    Posted 11-09-2011 22:42
    Hi Arle,

    The idea of using <segment> and <ignorable> is to allow freedom for splitting and merging segments that belong to the same <unit>. A tool that creates a <unit> with one or more <segment> elements should not care how many <segment> elements the <unit> has when the time comes to create the translated document.

    Having <segment> in core will make it possible to create an XLIFF file with <unit> elements containing paragraphs, use a tool that allows translating at sentence level, and finally get a translated file. It would also solve the problem of sentence reordering, because users would be able to merge all <segment> elements from a <unit> and translate the paragraph as one piece of text, obtaining a better translation.

    In essence, using <segment> moves segmentation from the text extraction domain to the translation domain. Tools that create XLIFF files don't need to worry about segmentation, as that is something translators would be able to adjust at translation time.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya
    Maxprograms http://www.maxprograms.com
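
    A minimal sketch of the merge behaviour described above: the merger concatenates whatever <segment> and <ignorable> parts a <unit> holds, so the result does not depend on how many segments the translation tool produced. The element names come from the thread; the un-namespaced markup, the sample text, and the function name merged_text are assumptions made for illustration, not the TC's agreed format.

```python
import xml.etree.ElementTree as ET


def merged_text(unit, child="target"):
    """Concatenate the text of every <segment>/<ignorable> part, in document order."""
    parts = []
    for part in unit:
        if part.tag not in ("segment", "ignorable"):
            continue
        node = part.find(child)
        if node is None:
            # Assumption: parts without a target (e.g. <ignorable>) reuse their source text.
            node = part.find("source")
        if node is not None:
            parts.append("".join(node.itertext()))
    return "".join(parts)


# Hypothetical unit as an extraction tool might write it: one segment per unit.
ONE_SEGMENT = """<unit id="1">
  <segment><source>First sentence. Second sentence.</source><target>Primera frase. Segunda frase.</target></segment>
</unit>"""

# The same unit after a translation tool split it at sentence level.
SPLIT = """<unit id="1">
  <segment><source>First sentence.</source><target>Primera frase.</target></segment>
  <ignorable><source> </source></ignorable>
  <segment><source>Second sentence.</source><target>Segunda frase.</target></segment>
</unit>"""

single = merged_text(ET.fromstring(ONE_SEGMENT))
split = merged_text(ET.fromstring(SPLIT))
print(single == split)
```

    Both variants yield the same merged string, which is why the merger never needs to know the segment count.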


  • 2.  Re: [xliff] XLIFF 2.0 example files for segmentation

    Posted 11-09-2011 22:58
    OK. So the scenario I pulled from Dave's example (a file with <segment>s is refactored as a file with different <segment>s but has to be restored to the original <segment>s) is a chimera. There would be no processing expectation that <segment>s are preserved or reconstructable, and a tool MUST NOT rely on the <segment> element to reconstruct the source file. If that is correct, then Dave's example would need to lose snippets two and five. Is that right?

    > In essence, using <segment> moves segmentation from text extraction domain to translation domain.

    In most cases, I think that makes sense. The tool needs the freedom to segment in the manner that it sees as appropriate.

    > Tools that create XLIFF files don't need to worry about segmentation as that is something that translators would be able to adjust at translation time.

    If XLIFF files “don’t need to worry about segmentation…”, doesn't that mean that this is not a core feature? If segmentation is moved to the translation domain and adjusted at translation time, aren't we saying that segmentation is really a function of the import filter (and hence beyond the scope of XLIFF), just as it would be for any other file format? I don't need to indicate segmentation in Word or IDML, so what makes XLIFF different in this regard? What is the case where you would need <segment> in the XLIFF file and that need wouldn't be met by a non-XLIFF process inside the tool? I still don't see that clearly enough.

    -Arle

    Arle Lommel
    Standards Coordinator, GALA Standards Initiative


  • 3.  RE: [xliff] XLIFF 2.0 example files for segmentation

    Posted 11-09-2011 23:34
    > So the scenario I pulled from Dave's example,
    > file with <segment>s is refactored as file
    > with different <segment>s but has to be
    > restored to original <segment>s is a chimera.

    A bad dream indeed.

    > ..then Dave's example would need to lose
    > snippets two and five. Is that right?

    1 and 4, and 5 is not needed. But the merger should work with either 3 or 5.

    > If segmentation is moved to the translation domain
    > and adjusted at translation time, aren't we saying
    > that segmentation is really a function of the import
    > filter (and hence beyond the scope of XLIFF),
    > just as it would be for any other file format?
    > I don't need to indicate segmentation in Word or
    > IDML, so what makes XLIFF different in this regard?

    Well, the "re-"segmentation is moved out of the extraction tool domain. But there is a block-level segmentation too. You do use it in Word and IDML: paragraphs, footnotes, table cells, etc. Likewise, that "block"-level segmentation is done by the extraction tool: a <unit> holds a "block", which initially is stored in a single <segment>.

    Why use both <unit> and <segment> initially? Why not just <unit>, like in David's snippets 1 and 4? Because it's just more efficient. Maybe forget about block/sentence and see it this way: until it (optionally) gets segmented further, that content is the "segment" for that unit. So why shouldn't it be in <segment>?

    Think about Word and pages: your content is in a single page or several, and you decide where the page breaks are. Even when there are no page breaks there is a page. A <segment> is similar to a page.

    Cheers,
    -ys
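
    To make the "single <segment> until segmented further" idea concrete, here is a rough sketch of a translation tool re-segmenting a unit. It assumes un-namespaced element names as used in the thread, plain-text <source> content without inline markup, and a deliberately naive sentence rule; the function name resegment is hypothetical.

```python
import re
import xml.etree.ElementTree as ET


def resegment(unit):
    """Split each untranslated <segment> into sentence-level parts, in place."""
    new_children = []
    for child in list(unit):
        if child.tag != "segment" or child.find("target") is not None:
            # Leave <ignorable> parts and already translated segments untouched.
            new_children.append(child)
            continue
        text = child.findtext("source", default="")
        # Naive rule (an assumption): a sentence ends at . ! or ? followed by whitespace.
        # The capturing group keeps the whitespace so it can become an <ignorable>.
        for piece in re.split(r"(?<=[.!?])(\s+)", text):
            if not piece:
                continue
            part = ET.Element("ignorable" if piece.isspace() else "segment")
            ET.SubElement(part, "source").text = piece
            new_children.append(part)
    unit[:] = new_children
    return unit


# Hypothetical unit as written by the extraction tool: one block in one segment.
unit = ET.fromstring(
    '<unit id="1"><segment><source>First sentence. Second sentence.</source></segment></unit>'
)
resegment(unit)
print([part.tag for part in unit])
print("".join(part.findtext("source") for part in unit))
```

    The block's text is unchanged after the split; only the boundaries between parts moved, which is the "page break" analogy in code form.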


  • 4.  RE: [xliff] XLIFF 2.0 example files for segmentation

    Posted 11-10-2011 15:49
    I'm sorry my first note was so confusing, but I think that the discussion has been good.

    In my original note, there were 5 different XLIFF examples shown:

    XLIFF created by an extraction tool (tool A). Two variations were provided, depending on whether <segment> was required (core) or not (module).
      File 1. <segment> element was not included.
      File 2. <segment> was included. The translatable content of <unit> is the same as the content of <segment>.

    XLIFF file created and used in a translation tool (tool B), created from the tool A XLIFF file.
      File 3. Contains sentence-segmented text, translated into Spanish.

    XLIFF file returned to the product after translation.
      File 4. Expected output if file 1 was used as input.
      File 5. Expected output if file 2 was used as input.

    Rodolfo: <source> can be a child of both <unit> and <segment>.

    David: Yes, if <segment> were not core. <unit> would have as children either one <source> element, or one or more <segment> and <ignorable> elements. I was primarily thinking about simplicity for the creator of XLIFF rather than the complexity of implementing the function in a tool.

    Yves: A realistic scenario, in my opinion, would be for the third file (translated and sentence-segmented) to come back to the product developer. And the merging tool should be able to work with it. [XLIFF examples] 1 and 4, and 5 is not needed. But the merger should work with either 3 or 5.

    David: The extraction tool (tool A) and the merge tool are probably the same tool. They have to have the same processing rules in order to extract and replace the text the same way. What you are implying is that the extraction tool could create the XLIFF file using only the "core" elements, but the associated merge tool would have to be aware of all of the core and module elements in order to accurately process the translated XLIFF file. That seems like an unrealistic expectation. Can a merge tool be expected to handle every possible change that any translation tool makes to an XLIFF file? That would be difficult to develop and thoroughly test.

    Rodolfo: In essence, using <segment> moves segmentation from text extraction domain to translation domain. Tools that create XLIFF files don't need to worry about segmentation as that is something that translators would be able to adjust at translation time.

    David: If tools that create XLIFF files don't need to worry about segmentation, then why is the <segment> element required? <segment> seems to be required because it will make later processing (further segmentation) easier for other tools used later in the process. Will there be other "module" features which would benefit from having a "stub" element added to the core?

    Rodolfo: The elements that contain segment text and allow freedom in the segmentation process should be part of the core. Infrastructure is a must-have; the process is something optional.

    David: Yes, I can agree that infrastructure is critical. So you are implying that anything which may be essential for possible future processing must be part of the core? That may be difficult to use as a criterion for core versus module.

    David: A couple of general comments:

    1. All tools processing an XLIFF file have to assume that all core and module features are completely supported. Is this really the case? I had not been thinking along this line. If so, what is the value of having a core and modules if you always have to support all of the modules too?

    2. Core. One of the most basic XLIFF functions is creating the XLIFF file from a non-XLIFF file. So an idea for how to determine what is "core" or "module" could be based on this: the XML elements and attributes required to define the translatable text extracted from a non-XLIFF file so that the translated content can be integrated back into that original file format.

    David
    Corporate Globalization Tool Development


  • 5.  RE: [xliff] XLIFF 2.0 example files for segmentation

    Posted 11-10-2011 16:33
    Hi David, all,

    > What you are implying is that the extraction tool
    > could create the XLIFF file using only the "core"
    > elements, but the associated merge tool would have
    > to be aware of all of the core and module elements in
    > order to accurately process the translated XLIFF file.

    No, like you, I certainly think that is unrealistic. I'm simply saying that the merging process should be able to merge the <unit> regardless of how many segments it contains.

    And that is one of the reasons I think <segment> should be part of the core: the segmentation representation is just too important to the localization process to be an optional module. The cost of treating sentence-segmented and unsegmented files differently because of different representations would be much higher than the cost of forcing all tools to work with <segment>.

    This is about setting the lowest common set of supported features for the tools: if we set it too low (i.e. not including the segmentation representation) we will have many interoperability problems.

    Besides, <segment> is justifiable even in the non-sentence-segmented file: it's the element that holds the source and optionally the target of the content. In the initial XLIFF there just happens to be a single one for each <unit>, and that may change during the process. It's one specific case of a more generic pattern.

    Again, maybe the element name causes the confusion. Just replace <segment> with something like <part>.

    > 1. All tools processing an XLIFF file have to assume
    > that all core and module features are completely
    > supported.
    > Is this really the case?

    I think not. Like you said, there would be no point in having core and modules then. All tools should support all features of the core, obviously. And none of the optional modules should prevent a tool that supports only the core from working.

    Cheers,
    -yves
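
    The cost of two representations can be sketched as follows. If <segment> were an optional module, a reader would have to branch between a <unit> that holds <source> directly and one that holds <segment>/<ignorable> parts; with <segment> in core only the second branch exists. Both content models below are assumptions based on the alternatives discussed in this thread, with un-namespaced tags and invented sample text; the function name unit_source_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def unit_source_text(unit):
    """Return a unit's source text under either content model discussed in the thread."""
    direct = unit.find("source")  # find() with a bare tag looks at direct children only
    if direct is not None:
        # Model A (assumed: <segment> is a module): text sits directly in <unit>.
        return "".join(direct.itertext())
    # Model B (assumed: <segment> is core): text is spread over the unit's parts.
    parts = []
    for part in unit:
        if part.tag in ("segment", "ignorable"):
            source = part.find("source")
            if source is not None:
                parts.append("".join(source.itertext()))
    return "".join(parts)


WITHOUT_SEGMENT = '<unit id="1"><source>Hello world. Goodbye.</source></unit>'
WITH_SEGMENTS = (
    '<unit id="1">'
    "<segment><source>Hello world.</source></segment>"
    "<ignorable><source> </source></ignorable>"
    "<segment><source>Goodbye.</source></segment>"
    "</unit>"
)

for xml_text in (WITHOUT_SEGMENT, WITH_SEGMENTS):
    print(unit_source_text(ET.fromstring(xml_text)))
```

    Every tool that reads units would need the extra Model A branch (and a matching one when writing) if both forms were legal, which is the interoperability cost argued above.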


  • 6.  RE: [xliff] XLIFF 2.0 example files for segmentation

    Posted 11-09-2011 23:06
    > In essence, using <segment> moves segmentation from
    > text extraction domain to translation domain. Tools
    > that create XLIFF files don't need to worry about
    > segmentation as that is something that translators
    > would be able to adjust at translation time.

    Nicely summarized, Rodolfo.

    -ys