OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  Fragment Identification

    Posted 12-16-2013 13:14
    Title: Fragment Identification

    Hi all,

    I have been looking into the fragment identification proposals made by David, Yves and Fredrik (original subject was "Comments on Fragment Identification"). As I see it, there are two ways to go: we can either adopt David's solution, where references are either local to the current unit or absolute from the file level, or use prefixes as Yves and Fredrik suggested.

    For prefixes I would use the following scheme.

    IRI format (from Fredrik's proposal):

      scope separator  - '/'
      prefix separator - '=' (as Yves suggested)
      prefix           - NMTOKEN
      id               - NMTOKEN
      selector         - prefix=id
      path             - #[/]?selector[/selector]*

    Scopes (again from Fredrik's proposal):

    * <file>, prefix 'f', unique within the document
    * <group>, prefix 'g', unique within <file>
    * <unit>, prefix 'u', unique within <file>
    * <note>, prefix 'n', unique within its parent <file>, <group> or <unit>, i.e. one scope per parent container
    * Inline tags in target, prefix 't', unique within the enclosing <unit>
    * Inline tags in source, no prefix, unique within the enclosing <unit> (as Yves suggested)

    Examples (Fredrik's examples, modified to match the changes above):

    * An absolute reference to note "5" in file "foo.xml" and group "div12": "#/f=foo.xml/g=div12/n=5"
    * A relative reference from an inline element to unit 5 in the same file: "#u=5"
    * A reference from within a unit to note 10 in group 7: "#g=7/n=10"
    * A reference to an inline source <ph> tag with id 1 from the same unit: "#1"
    * A reference to unit p40 in file foo.xml from outside the document: "#/f=foo.xml/u=p40"

    Below are the same examples using David's scheme:

    * An absolute reference to note "5" in file "foo.xml" and group "div12": "#foo.xml~div12~5"
    * A relative reference from an inline element to unit 5 in the same file: not possible, since relative paths are not allowed in David's scheme (unless local to the current unit)
    * A reference from within a unit to note 10 in group 7: "#foo.xml~7~10"
    * A reference to an inline source <ph> tag with id 1 from the same unit: "#1" (local references to source look the same as above)
    * A reference to unit p40 in file foo.xml from outside the document: "#foo.xml~p40"

    The consequences of each proposal with respect to the quality/functional requirements identified in Fredrik's email are as follows. We generally want IRIs:

    * that are short
      - For local referencing there is no difference between the two proposals (except for the prefix on target references).
      - The prefix-based proposal can produce relative paths that are shorter than David's absolute references, but this is not something we would like to encourage (there should not be dependencies between units/files).
      - Due to the lack of prefixes, David's proposal produces the shortest absolute references, but they are less readable as a result.
    * that are descriptive enough to identify what they refer to (hopefully also by humans)
      - As far as I can see, both proposals are expressive enough to uniquely identify any element within an XLIFF document, but David's proposal is less human readable (see above).
    * that limit what parts of a document need to be parsed / checked / remembered when following them
      - As I see it, both proposals require the full XLIFF document to be held in memory while it is being parsed. With either scheme there is no way to know in advance whether an inline element refers to another file in the XLIFF document.
    * that depend on ID scopes that are suitable for stream processing when creating new elements
      - I don't see any difference between the two proposals with respect to stream processing.
    * that are able to refer to all core constructs that make sense
      - Again, both proposals look expressive enough to uniquely identify any core construct.

    I would suggest some changes to David's proposal. For the scope separator I would suggest "/" instead of "~", as it seems more intuitive (it is used in XPath). For prefixes I would change from {prefix} to {prefix}=, as it seems to make more sense (as Yves said: "#u=123" says clearly "the unit with an id equals to 123").

    Using UUIDs to add more file elements to an existing XLIFF document is a processing requirement and seems out of scope for the spec. Having said that, it seems like a small change that enables quite a powerful operation (e.g. building a corpus of XLIFF files in a single XLIFF document). Changes required would include defining a new attribute for file to hold its unique requirements. There is also an issue with generating UUIDs in different programming languages: not all languages support UUID generation, so third-party tools would be required in some cases.

    I may have overlooked some things while writing this up, so if there is anything I missed, feedback would be greatly appreciated.

    Regards,
    Dave
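
    To make the prefix-based grammar above concrete, here is a minimal Python sketch of a parser for it. This is only an illustration of the scheme as described in this message, not an implementation from any of the proposals. The names parse_fragment, Fragment and PREFIXES are hypothetical, and the NMTOKEN check is a rough ASCII approximation rather than the full XML production.

```python
import re
from typing import List, NamedTuple, Tuple

# Prefixes taken from the scheme in this message.
PREFIXES = {"f": "file", "g": "group", "u": "unit", "n": "note",
            "t": "target-inline"}

# Assumption: a rough ASCII approximation of XML NMTOKEN.
NMTOKEN = re.compile(r"[A-Za-z0-9._:\-]+")


class Fragment(NamedTuple):
    absolute: bool                    # True when the path starts with '/'
    selectors: List[Tuple[str, str]]  # (scope name, id) pairs, outermost first


def parse_fragment(ref: str) -> Fragment:
    """Parse '#[/]?selector[/selector]*' where selector is 'prefix=id'."""
    if not ref.startswith("#"):
        raise ValueError("fragment must start with '#'")
    body = ref[1:]
    absolute = body.startswith("/")
    if absolute:
        body = body[1:]
    selectors = []
    for part in body.split("/"):
        prefix, sep, ident = part.partition("=")
        if sep:
            if prefix not in PREFIXES:
                raise ValueError(f"unknown prefix in selector: {part!r}")
            scope = PREFIXES[prefix]
        else:
            # No '=': an un-prefixed id refers to an inline tag in source.
            scope, ident = "source-inline", part
        if not NMTOKEN.fullmatch(ident):
            raise ValueError(f"bad id in selector: {part!r}")
        selectors.append((scope, ident))
    return Fragment(absolute, selectors)


if __name__ == "__main__":
    for example in ("#/f=foo.xml/g=div12/n=5", "#u=5", "#g=7/n=10", "#1"):
        print(example, parse_fragment(example))
```

    Note that the sketch only checks syntax. Resolving a relative path such as "#u=5" against the enclosing <file> would need the document context, which is deliberately left out here.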


  • 2.  RE: [xliff] Fragment Identification

    Posted 12-16-2013 13:50
    Hi Dave,

    Thanks for the thoughts on the different options. A few notes:

    - Any suggestions for modules/extensions?
    - Just a reminder so that we don’t lose track of it: the difference between David’s proposal and the others is not just syntactic. We would also lose the separation of id scope between units and groups, which in my opinion is a bad thing.
    - Identifiers of <file>: we need to decide once and for all if joining XLIFF documents is OK or not (it’s OK, and done, in 1.2). If it is also OK in 2.0 (so far nothing says it is not), then we need to define how it can be done while keeping the <file> identifier unique.

    Cheers,
    -yves


  • 3.  Re: [xliff] Fragment Identification

    Posted 12-16-2013 14:59
    On Mon, Dec 16, 2013 at 1:50 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

    > Identifiers of <file>: we need to decide once and for all if joining XLIFF
    > documents is OK or not (it’s OK (and done) in 1.2). If it is also OK in 2.0
    > (so far nothing says it is not) then we need to define how it can be done
    > while keeping the <file> identifier unique.

    Yves, all allowed modifications are specified in the spec. If someone wants to regroup files between xliff documents, it should be their private concern. AFAIK this is an invalid transformation: <unit> and higher structure is static and set once and forever by the Extractor.

    Would you, e.g., say that Mergers must accept xliff files with a different file structure? I don't think so. I think that would be a big mess.

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    http://www.cngl.ie/profile/?i=452
    mailto: david.filip@ul.ie


  • 4.  RE: [xliff] Fragment Identification

    Posted 12-16-2013 15:55
    Hi David,

    > Yves, all allowed modifications are specified in the spec.

    You can't possibly define all allowed operations in the specification. Besides, there is nothing in the specification that says "All allowed modifications are specified in this document", so I'm not going to argue the point further.

    > If someone wants to regroup files between xliff documents,
    > it should be their private concern.

    So is splitting one huge XLIFF document into several to dispatch it to different translators (also a common operation). That's not explicitly allowed in the specification either, but not being allowed to do it would be laughable.

    My point is: splitting or joining XLIFF documents is a relatively common operation that many tools support in 1.x. Why wouldn't they be able to do the same with 2.0? The requirement is that the merger needs to get back its document.

    The main issue with bundling is the identifier of the <file> element. I'll answer Dave's email on that: UUIDs may not necessarily be the only solution.

    Cheers,
    -yves


  • 5.  Re: [xliff] Fragment Identification

    Posted 12-16-2013 16:22
    Yves,

    people can do to the XLIFF files what the specification allows, so as not to break the guaranteed interchange.

    Of course, all sorts of other private things can be done with the XLIFF files. However, if you do anything to the static structure of the document, you have to be able to roll it back. It is IMHO out of scope to describe this.

    We have the roundtrip requirement in the Conformance section, and it has always been there:

      e. XLIFF is a format explicitly designed for exchanging data among various Agents. Thus, a conformant XLIFF application MUST be able to accept XLIFF Documents it had written after those XLIFF Documents were Modified or Enriched by a different application, provided that:
         i.  The processed files are conformant XLIFF Documents,
         ii. in a state compliant with all relevant Processing Requirements.

    The assumption is that an application must take back what it has previously created. Currently this includes a possibly different dynamic or segment structure, codes or markers in the other equivalent forms, etc.

    Changing file or group structure is not defined, ergo not allowed. It does not mean that you cannot in fact do it, but IF you do, you have to undo it before continuing in the "public" roundtrip governed by the spec.

    Dr. David Filip


  • 6.  RE: [xliff] Fragment Identification

    Posted 12-16-2013 17:28
    Here’s a real world example that I have not seen represented in this discussion so far. It may be pertinent, or it may not. If it is the latter, please disregard.

    My experience with XLIFF, FWIW, is as a buyer of translations. The sources of the XLIFF files I send out for translation are mainly:

    - Web CMS (Drupal source where each discrete node is a <file>)
    - Web CMS (Drupal source where each PO file is a <file>)
    - Component CMS (DITA files where each topic is a <file>)
    - Test and Measurement Equipment (where each discrete instrument’s UI code is an <xliff>)
    - Test and Measurement Equipment (where each discrete instrument’s firmware code is a <file>)

    Unfortunately (or fortunately) I wrote each use case’s extractor/merger code at different points in time, and I’ve used minimalist (using a skeleton) and maximalist (relying on <group>) architecture as it suited me. But the same LSPs leverage and translate all forms.

    We have very strict rules about the state of the translated XLIFF that comes back to us from our LSPs. In short, they must have identical structure, down to the <file>, <group>, <trans-unit>, <source>, and <target> level. My merge code simply cannot tolerate or reconcile diversions. We do allow adding, removing, or reordering inline elements.

    So finally to my point. Does my story have bearing on this discussion? If so, should XLIFF be able to support my local rule set with my LSP?

    If the answer to each is yes, then I tend to think that the spec should not prohibit me from setting up a set of local rules with my LSP as I’ve described. But I do not feel that prescribing the manner in which my rules are set up needs to be explicitly part of the spec.

    FWIW,

    Bryan


  • 7.  Re: [xliff] Fragment Identification

    Posted 12-16-2013 19:07
    On Mon, Dec 16, 2013 at 5:28 PM, Schnabel, Bryan S <bryan.s.schnabel@tektronix.com> wrote:

    > We have very strict rules about the state of the translated XLIFF that
    > comes back to us from our LSPs. In short, they must have identical
    > structure, down to the <file>, <group>, <trans-unit>, <source>, and
    > <target> level. My merge code simply cannot tolerate or reconcile
    > diversions. We do allow adding, removing, or reordering inline elements.

    Bryan, this is fine for 1.2, but for 2.0 you should be able to accept back units that have a different segment and lower-level structure (what Fredrik has called the dynamic structure). IMHO you should not be able to accept back files that have a different unit and higher structure.

    If your LSP needs, for any reason, to change the static structure for their processing (which is quite likely, I do not disagree on that), they must in any case be able to return to you the same static structure. How they achieve that is totally up to them.

    So I argue that we do not comment on this or any similar private transformation, as long as the "perpetrator" is able to return files that are consistent with the transformation never having been performed.

    Does that make sense?

    Cheers
    dF

    Dr. David Filip
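
    The distinction drawn above between static structure (<file>, <group>, <unit>) and dynamic structure (segments and below) can be sketched as a check a merger might run on returned files. This is a hypothetical illustration in Python, not something defined by the spec or by anyone in this thread. The function names are made up, element names follow XLIFF 2.0 (<unit> rather than the 1.2 <trans-unit>), and namespaces are simply stripped, which is an assumption made to keep the sketch short.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Elements treated as "static structure" in this sketch (assumption).
STATIC = {"file", "group", "unit"}


def local_name(tag: str) -> str:
    """Drop a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def static_skeleton(xml_text: str) -> List[Tuple[int, str, str]]:
    """Return (depth, element name, id) for every file/group/unit, in order."""
    skeleton: List[Tuple[int, str, str]] = []

    def walk(element: ET.Element, depth: int) -> None:
        name = local_name(element.tag)
        if name in STATIC:
            skeleton.append((depth, name, element.get("id", "")))
        if name != "unit":
            # Do not descend below <unit>: segments are dynamic structure.
            for child in element:
                walk(child, depth + 1)

    walk(ET.fromstring(xml_text), 0)
    return skeleton


def same_static_structure(sent: str, returned: str) -> bool:
    """True when both documents have the same file/group/unit skeleton."""
    return static_skeleton(sent) == static_skeleton(returned)
```

    Under these assumptions, a returned document whose units were re-segmented would still pass the check, while one with a renamed, moved or missing <unit>, <group> or <file> would not.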