OASIS XML Localisation Interchange File Format (XLIFF) TC

[xliff] XLIFF 2.0 Core

  • 1.  [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 03:00
    In all of this discussion about what should be in the Core, I would like to understand what audience the Core is intended for. It is my understanding that XLIFF can be used in these situations:

    1. A format that product development can create to provide translatable text to be translated.
    2. A format that can be used within a tool set to manage the translation of the content.
    3. A format that one tool set can export which would then be imported into another set of tools.

    In my opinion, the Core could be different for each situation.

    1. A development group is going to need the least XLIFF function. All they want to do is extract the text from one file format (this could be a proprietary format) and create an XLIFF file containing that text in such a way that, after translation, they can extract the text from the translated XLIFF file and insert it back into their unique file format. They have neither the need nor the knowledge to define segmentation rules, provide alternate translations, etc. They are basically providing the <source> content and maybe some translation comments. If they have previous translation memory information, that would probably be in a TMX format and would be provided as a separate file.

    2. A set of tools. If XLIFF is going to be used within a closed set of tools and will never be used outside of those tools, then that application can use as much or as little of the XLIFF functions as it wants. If they received XLIFF files for a translation project (like from a development group, item 1 above), would they expect that XLIFF file to contain segmentation rules, alternate translations, etc.? Or is that information they would add within the translation tools they use?

    3. Used between 2 separate sets of tools. This is probably the area where there is the most variation, because each tool uses its own subset of the XLIFF functions. Is this the area where most of the comments about creating a Core came from?

    Thanks for your comments.
    David
    Corporate Globalization Tool Development
    EMail: waltersd@us.ibm.com
    Phone: (507) 253-7278, T/L:553-7278, Fax: (507) 253-1721
    CHKPII: http://w3-03.ibm.com/globalization/page/2011
    TM file formats: http://w3-03.ibm.com/globalization/page/2083
    TM markups: http://w3-03.ibm.com/globalization/page/2071


  • 2.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 12:11
    Hi David,

    > It is my understanding that XLIFF can be used in these situations:
    > 1. A format that product development can create to provide
    > translatable text to be translated.
    > 2. A format that can be used within a tool set to manage the
    > translation of the content.
    > 3. A format that one tool set can export which would then be
    > imported into another set of tools.

    I'm not sure #2 applies. If they keep their data within a single toolset, there is little advantage in using anything but some proprietary format, it would seem. The only reason would be if they want to be ready for #3.

    > In my opinion, the Core could be different for each situation.
    > ...

    It's an interesting way to look at the core. But I think it quickly boils down to sets of functionalities more than users. For example, some users of case #1 may need to provide comments in the extracted documents, while others may not, simply because their file format does not have a comments facility. Same for segmentation: some original formats may favor pre-segmented entries (I know of one such case), while others (most) would simply generate unit-level content. So depending on various factors, the same category of users may need different features.

    I think the core can be defined in relation to the implementations: what is the minimal sub-set of XLIFF features that a tool that reads XLIFF, makes modifications to it, and writes it back should support. The more features we can get away with the better. But we have to remain realistic. The minimal main type of operation a tool is likely to do on an XLIFF file is to change the translation. That means it should probably be able to do something like the following:

    - make the distinction between different segments if the content of a unit is segmented
    - read the source, and detect if it should/can be translated
    - create the target element if it's not present, or detect the state of the existing translation
    - understand inline codes in the content
    - update any status flag related to the translation
    - preserve any construct it does not understand

    Among those actions, already, some may or may not be considered core. For example, some tools simply do not deal with inline codes. Does that mean inline codes should not be part of the core? Or does it mean 2.0 should not recognize such tools as compliant? Personally, I think marked-up formats are so prevalent today that even software-string oriented tools should be able to handle inline codes, but that is something we would need to specify in the conformance clauses. It may also be different depending on the type of tool: for instance, I don't think we can force producer tools to generate inline codes, but we may want to force consumer ones to understand them.

    Another example is the translation status. Should a tool be obligated to update it? If the answer is yes, then such a flag should be part of the core. If not, then it should be outside of the core.

    My current thoughts are that if we can get away with a core that includes the features in the list above, we would already be doing well. Today many tools don't even do that. Then the additional features could be grouped in logical modules. If we manage to make them small and well defined, the tools could implement them step by step.

    In some cases it won't be easy to define what the processing expectations for a module should be. Notes/Comments, for example. Does "supporting the notes/comments" module mean a tool should be able to do all or only some of the following actions:

    - read notes associated to the unit/segment and present them to the user agent
    - allow the user-agent to edit existing notes (or notes belonging to some categories)
    - allow the user-agent to create new notes (or notes belonging to some categories)
    - allow the user-agent to remove existing notes (or notes belonging to some categories)

    My guess is that we will probably end up with mandatory and optional expectations. But I'm getting away from the subject: defining the core.

    Cheers,
    -ys
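
The list of minimal operations above can be sketched in code. The following is only an illustration under stated assumptions, not anything the TC defined: it assumes XLIFF 1.2 files that declare the 1.2 namespace, it takes translations from a plain dictionary, and the function name `fill_targets` is hypothetical. It does not "understand" inline codes; units whose source contains inline elements are simply left untouched. ElementTree keeps elements and attributes it does not know about, but it drops XML comments and may rewrite namespace prefixes, so this is not full preservation either.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace; files without a namespace would need plain tag names.
NS = "urn:oasis:names:tc:xliff:document:1.2"
ET.register_namespace("", NS)


def q(tag):
    """Return the namespace-qualified name ElementTree uses internally."""
    return f"{{{NS}}}{tag}"


def fill_targets(xliff_text, translations):
    """Set <target> text for the unit ids found in `translations` (id -> text)."""
    root = ET.fromstring(xliff_text)
    for unit in root.iter(q("trans-unit")):
        # Read the source and detect whether it should be translated.
        if unit.get("translate", "yes") == "no":
            continue
        source = unit.find(q("source"))
        if source is None or len(source):
            # No source, or inline codes present: leave the unit alone.
            continue
        new_text = translations.get(unit.get("id"))
        if new_text is None:
            continue
        target = unit.find(q("target"))
        if target is None:
            # Create the target element right after the source.
            target = ET.Element(q("target"))
            unit.insert(list(unit).index(source) + 1, target)
        target.text = new_text
        # Update the status flag related to the translation.
        target.set("state", "translated")
    return ET.tostring(root, encoding="unicode")
```

Even this small sketch has to take a position on most of the open questions in the message: whether the status flag is updated, what happens with inline codes, and how much of the unknown content survives a round trip.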


  • 3.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 12:42
    >


  • 4.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 13:55
    And if my memory is correct, I believe Magnus told me Trados uses XLIFF as its internal storage format. Am I remembering correctly, Andrew?

    - Bryan

    ________________________________________
    From: Rodolfo M. Raya [rmraya@maxprograms.com]
    Sent: Friday, April 08, 2011 5:42 AM
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] XLIFF 2.0 Core

    >


  • 5.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 14:29
    >


  • 6.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 16:16
    Yes, our use of XLIFF is even deeper than just the file level. In all places the content needs to be manipulated or persisted (e.g. during editing by the translator or storing in a database) we are manipulating XLIFF-like structures. The same also applies in SDL (nee Idiom) WorldServer from v. 10 onwards. Regards, Andrew


  • 7.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 15:05
    Yves, thanks for your comments. I think groups of users would require similar functionality. I think we are talking about similar objectives.

    For tools which generate XLIFF files from original source files (like a development group creating XLIFF files for translation), I would think that those XLIFF files would need:

    1. Read the original source file format to strip out the translatable text to be added to the XLIFF file.

    - A <trans-unit> would be a block of similar text. I once created some internal guidelines for what the content of a <source> element should be.

    The XLIFF 1.2 specification states, "The <source> element is used to delimit a unit of text that could be a paragraph, a title, a menu item, a caption, etc." What is considered a unit of text will be different based on the non-XLIFF file format being handled. Other ways to think of a unit of text contained in the <source> element of a <trans-unit> element are:

    - A block of text which is formatted the same, like the HTML text associated with a paragraph <p> tag, preformatted text in a <pre> section, or the text for a list item <li> tag.
    - A block of text which the end user will see separately from other text, like a report title, an error message, a menu item or tooltip text.
    - A piece of text which has its own identifier and can be separately retrieved by the program's code.
    - A block of text which is separated by one or more blank lines, which implies the start of a new paragraph.

    2. Creating the <target> section would not make sense (other than copying the <source> text to it), because this environment is only working with the source files.

    3. Understand inline codes in the content. This should be a requirement. I had sent this information to Arle a while back on this topic, with the idea that he would bring this up to the inline tag committee:

    I think the intent of XLIFF is to provide a structured way to represent translatable text from any file format in an easy to process, format-neutral way. A translator is concerned about translating text in a simple-to-understand format. Requiring them to understand multiple file formats creates inefficiencies and reduces quality. The key, in my opinion, is that a translator should be able to translate the content of an XLIFF file without having to understand the syntax or characteristics of the original file format.

    Consider this example. Different ways to represent a replacement variable:

      User %s has been deleted.
      User %1 has been deleted.
      User %1$s has been deleted.
      User {0} has been deleted.
      User &user; has been deleted.
      User #user# has been deleted.
      User %USER has been deleted.
      User :user. has been deleted.
      User [user] has been deleted.

    XLIFF: Should the translator see all of the above as different segments, which requires them to know how a variable is represented in the base file format and would result in different translation memory entries? For example, "[user]" in one file format may be a replacement variable, but it may be translatable text in another file format.

    Or should the XLIFF be something like:

      User <x id="1"/> has been deleted.

    or at least:

      User <ph id="1">%s</ph> has been deleted.

    I think that XLIFF inline elements should be used for:

    1. Non-translatable items which are embedded in the translatable text and are unique to that file format.
       - All inline HTML and XML elements.
       - Replacement variables.
       - Special formatting codes.
    2. Non-translatable items which, if added, removed, or modified, will affect the function of the product.
       - HTML <a> tags for links.
       - Replacement variables.

    David

    From: Yves Savourel <ysavourel@translate.com>
    To: <xliff@lists.oasis-open.org>
    Date: 04/08/2011 07:11 AM
    Subject: RE: [xliff] XLIFF 2.0 Core
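
The replacement-variable example in this message can be sketched as a small filter step. This is a rough illustration only: the pattern table, the format names (`printf`, `java`) and the helpers `to_source` and `tm_key` are all hypothetical, and a real filter would need a proper parser for each file format rather than one regular expression. The sketch wraps each native variable in a `<ph>` element, as in the "at least" variant of the message, and then derives a format-neutral key to show that the two segments would share one translation memory entry.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical per-format patterns, for illustration only.
PLACEHOLDER_PATTERNS = {
    "printf": re.compile(r"%(?:\d+\$)?[sd]"),
    "java": re.compile(r"\{\d+\}"),
}


def to_source(text, fmt):
    """Build a <source> element in which each native variable is a <ph>."""
    source = ET.Element("source")
    last = None  # most recently added <ph>, if any
    pos = 0
    for n, match in enumerate(PLACEHOLDER_PATTERNS[fmt].finditer(text), start=1):
        literal = text[pos:match.start()]
        if last is None:
            source.text = literal
        else:
            last.tail = literal
        last = ET.SubElement(source, "ph", id=str(n))
        last.text = match.group(0)  # keep the native code for merging back
        pos = match.end()
    if last is None:
        source.text = text[pos:]
    else:
        last.tail = text[pos:]
    return source


def tm_key(source):
    """Return the text with each <ph> reduced to its id, ignoring native syntax."""
    parts = [source.text or ""]
    for ph in source:
        parts.append(f"<ph {ph.get('id')}>")
        parts.append(ph.tail or "")
    return "".join(parts)


printf_src = to_source("User %s has been deleted.", "printf")
java_src = to_source("User {0} has been deleted.", "java")
print(ET.tostring(printf_src, encoding="unicode"))
print(tm_key(printf_src) == tm_key(java_src))
```

The design choice worth noting is that the native code is kept inside the `<ph>` so the original file can be rebuilt, while the translator and the translation memory only need the position and id of the placeholder.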


  • 8.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 19:33
    > ... For tools which generate XLIFF files from original source
    > files (like a development group creating XLIFF files for
    > translation), I would think that those XLIFF files would need:
    > 1. Read the original source file format to strip out the
    > translatable text to be added to the XLIFF file.
    > ...

    Yes. And that makes <unit>/<segment>/etc. part of the core.

    > 2. Creating the <target> section would not make sense
    > (other than copying the <source> text to it), because this
    > environment is only working with the source files.

    Sure. But I still think <target> should be part of the core. It may just not be used by the producer.

    > 3. Understand inline codes in the content.
    > This should be a requirement. I had sent this information to
    > Arle a while back on this topic, with the idea that
    > he would bring this up to the inline tag committee.
    > I think the intent of XLIFF is to provide a structured way
    > to represent translatable text from any file format in an
    > easy to process, format-neutral way.

    I agree with you on this. Inline codes support should be a requirement. But some tools, like all the gettext-based tools (which are used for many products), don't take such a view, at least not with the current PO format. I believe this has led some XLIFF 1.2-capable tools to not necessarily support inline codes, or at least not by using an abstract representation for them. They sometimes handle them by directly supporting the specific format's syntax.

    While we can certainly require a consumer tool to handle inline codes, I wonder how we can require a producer tool to generate inline codes. It would require somehow specifying what inline codes are in the different formats. Would that mean a tool that generates something like this: "<source>Text in &lt;b>bold&lt;/b>.</source>" is not compliant?

    A lot of those aspects have to do with the filters. How far should XLIFF try to tell the filters what is code and what is text? I'm interested to see how things like this could be put in the processing expectations or the conformance clauses we attach, for example, to <source>.

    On a side note, it would be nice if you could join the inline markup sub-committee, David. One teleconference a month (same day/time as the TC but the second week of the month). Next one is next week.
    http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff-inline

    -ys
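
To make the producer-side question concrete, here is a rough heuristic sketch that flags units whose source carries escaped markup as plain text, as in the `&lt;b>` example above. It is an assumption-laden illustration, not a proposed conformance test: the function name `units_with_escaped_markup` and the regular expression are hypothetical, only text directly inside `<source>` (and after its direct child elements) is inspected, and ordinary text such as "press <Enter>" would be flagged as well. That last point is the difficulty raised in the message: without knowledge of the original format, a checker cannot really tell code from text.

```python
import re
import xml.etree.ElementTree as ET

# Anything that looks like an opening or closing tag once unescaped.
TAG_LIKE = re.compile(r"</?[A-Za-z][^<>]*>")


def local(el):
    """Element name without any namespace part."""
    return el.tag.rsplit("}", 1)[-1]


def units_with_escaped_markup(xliff_text):
    """Return ids of <trans-unit> elements whose source text looks like markup."""
    root = ET.fromstring(xliff_text)
    flagged = []
    for unit in root.iter():
        if local(unit) != "trans-unit":
            continue
        source = next((c for c in unit if local(c) == "source"), None)
        if source is None:
            continue
        # Text outside inline elements: leading text plus the tail of each child.
        chunks = [source.text or ""] + [c.tail or "" for c in source]
        if any(TAG_LIKE.search(chunk) for chunk in chunks):
            flagged.append(unit.get("id"))
    return flagged
```

A producer that wraps native codes in inline elements passes this check, because the native syntax then sits inside elements such as `<ph>` or is replaced by `<g>`, not in the surrounding text.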


  • 9.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 20:09
    >


  • 10.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 21:49
    > You can't consider PO sources or software document sources
    > as the main source type for XLIFF files.

    I didn't. Sorry if I was unclear. I was just giving examples where dealing with inline codes is done in different ways by translation tools; so for some tool makers, generating abstract inline codes, as most of us in the TC do, is not necessarily obvious. And I was wondering if the conformance clauses of 2.0 should make this (creating inline codes) a requirement; and if they should, how to go about it (because it does not seem possible to me). I think you answered that question.

    > As I see it, inline elements are a core part of XLIFF standard.

    I could not agree more.

    > If a tool generates "<source>Text in &lt;b>bold&lt;/b>.</source>" from
    > a document that clearly requires the use of inline elements, users will
    > hate it and hopefully will not buy that tool.
    > You cannot force a tool vendor to use or support inline elements.
    > Nevertheless, you can make inline elements available to all tools.
    > Implementers and users will decide what they want.

    I'm not sure I get this: the two statements "inline elements are a core part of XLIFF standard" and "You cannot force a tool vendor to use or support inline elements" seem contradictory. If inline codes are part of the core, then they are part of the minimal set of features a compliant XLIFF tool has to support. Or have I misunderstood what the XLIFF core means?

    To be clear on the inline codes aspect: I think we cannot force a producer tool to generate inline codes, but I think there would be advantages in forcing consumer tools to support them.

    -ys


  • 11.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 22:09
    >


  • 12.  RE: [xliff] XLIFF 2.0 Core

    Posted 04-08-2011 22:20
    > You did not misunderstand what XLIFF core means,
    > we agree on that part. The problem is with compliance.
    > We can define compliance of XLIFF files regarding
    > a schema and a specification document but we
    > cannot evaluate the compliance of a tool.

    That's right. The TC had that discussion a while back. And I recall tending to agree with your opinion that it's not really possible to verify how a tool behaves, only whether the file it produces is conformant. That's a topic different from the technical details of the core and the modules. I'll go back to that.

    -ys


  • 13.  [xliff] XLIFF 2.0 Core

    Posted 05-06-2011 15:05
    Defining the set of XLIFF elements and attributes which are a part of the "Core" is an important starting point for determining what other "Modules" are being considered for XLIFF 2.0. In my opinion, the most basic use of XLIFF is when product developers are defining the text to be translated for their product. They have only general globalization knowledge, but they do clearly understand the original file format.

    As a starting point, I would like to suggest that the "Core" consists of the following elements and attributes.

    <xliff> Root element.
      version. It is necessary to know the version of XLIFF which the file is defined with.
    <file>
      datatype. It should be required to define the original file format for the text in this file.
      original. The file name of the original file that this XLIFF file was created from. If this file was not created from another file, then this could have a required value of something like "none".
      source-language. The language of the source text must be defined.
      target-language. If this is a required attribute, then initially it would have the same value as "source-language". The assumption would be that its value would be changed when the file is translated.
    <body>
    <trans-unit> Container for one block of translatable text.
      id. The identifier for this unit of text.
    <source> Source text.
    <target> Translated text. For the non-translated version of this file, we should consider whether:
      a. This element is optional.
      b. This should be an empty element with no content.
      c. This should be the same as the <source> section.
    Inline elements. The set of inline elements that the inline sub-committee comes up with. All inline items must be handled by XLIFF inline elements.

    Example:

    <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="1.2">
      <file original="example.html" datatype="html" source-language="en-US" target-language="en-US">
        <body>
          <trans-unit id="1">
            <source>Sample document</source>
            <target>Sample document</target>
          </trans-unit>
        </body>
      </file>
    </xliff>

    I look forward to your comments.

    David
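
As a rough illustration of checking a file (not a tool) against the proposed Core, the sketch below reports which of the attributes and elements listed in this message are missing. It is only a sketch under stated assumptions: the table `REQUIRED_ATTRS` and the function `core_problems` are hypothetical names, the list reflects this proposal and not any published conformance clause, and `target-language` and `<target>` are left out because the message leaves open whether they are required.

```python
import xml.etree.ElementTree as ET

# Attributes the proposal treats as required, keyed by element name.
REQUIRED_ATTRS = {
    "xliff": ["version"],
    "file": ["datatype", "original", "source-language"],
    "trans-unit": ["id"],
}


def local(el):
    """Element name without any namespace part."""
    return el.tag.rsplit("}", 1)[-1]


def core_problems(xliff_text):
    """Return a list of human-readable problems; an empty list means none found."""
    root = ET.fromstring(xliff_text)
    problems = []
    for el in root.iter():
        name = local(el)
        for attr in REQUIRED_ATTRS.get(name, []):
            if attr not in el.attrib:
                problems.append(f"<{name}> is missing '{attr}'")
        # Every unit needs source text, whatever else it carries.
        if name == "trans-unit" and not any(local(c) == "source" for c in el):
            problems.append(f"<trans-unit id={el.get('id')!r}> has no <source>")
    return problems
```

A check like this catches a misspelled attribute name such as `source-lang`, which an XML parser alone accepts without complaint.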


  • 14.  RE: [xliff] XLIFF 2.0 Core

    Posted 05-06-2011 15:12
    Dear David,

    I think I agree with your idea. In fact, what you propose is almost the same as the “minimal xliff” already defined in XLIFF 1.2 <http://docs.oasis-open.org/xliff/v1.2/os/xliff-core.html>. I quote it:

    A minimal XLIFF document with one entry looks something like this:

    <?xml version="1.0"?>
    <xliff version="1.2">
      <file source-language="EN" datatype="plaintext" original="file.ext">
        <body>
          <trans-unit id="1">
            <source>Hello World!</source>
          </trans-unit>
        </body>
      </file>
    </xliff>

    Lucía

    From: David Walters [mailto:waltersd@us.ibm.com]
    Sent: 06 May 2011 16:00
    To: xliff@lists.oasis-open.org
    Subject: [xliff] XLIFF 2.0 Core