OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  2.0 Binary Data Module Proposal

    Posted 11-16-2012 00:03
    In anticipation of closing down on 2.0, we have two new proposals for modules. In this mail, we are proposing the second of the two, a Binary Data module.

    For those who attended the XLIFF Symposium in Seattle, you were given the opportunity to see a *real world* implementation of the <bin-unit> element in SharePoint using XLIFF 1.2. Bryan Schnabel even advocated giving the SharePoint team an award :-). Since there is no equivalent of <bin-unit> in 2.0, this proposal is to add a Binary Data Module.

    We think that the 2.0 implementation could be essentially the same as 1.2, carrying over *for now* just the elements and attributes used below, so that we essentially get it on the 2.0 radar. We may want to propose additional features after we conduct some reviews with the SharePoint team over the next couple of weeks to get their feedback on any improvements they would like to see.

    Here is SharePoint’s 1.2 implementation:

        <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
          <bin-source>
            <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
          </bin-source>
          <bin-target>
            <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
          </bin-target>
        </bin-unit>

        <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
          <bin-source>
            <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
          </bin-source>
          <bin-target>
            <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
          </bin-target>
        </bin-unit>

    Please let us know your opinions on the proposal.

    Thanks,
    Microsoft Corporation (Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)
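    As a rough illustration of what a consumer of the inline form has to do, the sketch below parses the second <bin-unit> from the post and decodes its base64 payloads. It is a minimal sketch using only the Python standard library; the function name decode_internal_files is hypothetical and not part of any XLIFF tool.

```python
import base64
import xml.etree.ElementTree as ET

# The inline <bin-unit> from the post, reproduced verbatim.
SAMPLE = """
<bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
  <bin-source>
    <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
  </bin-source>
  <bin-target>
    <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
  </bin-target>
</bin-unit>
"""


def decode_internal_files(bin_unit):
    """Return the decoded bytes of each base64 <internal-file>, keyed by side.

    Hypothetical helper: only the 'base64' form is handled, and
    <external-file> references are ignored.
    """
    payloads = {}
    for side in ("bin-source", "bin-target"):
        node = bin_unit.find(f"{side}/internal-file")
        if node is not None and node.get("form") == "base64":
            payloads[side] = base64.b64decode(node.text or "")
    return payloads


payloads = decode_internal_files(ET.fromstring(SAMPLE))
for side, data in payloads.items():
    print(side, data.decode("utf-8"))
```

    For this sample both sides decode to the same short text/plain string, which matches the mime-type declared on the unit.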


  • 2.  RE: 2.0 Binary Data Module Proposal

    Posted 11-29-2012 12:32
    Hi Ryan, All,

    The ballot for this as a feature was already passed, but I’d still like to make some comments and proposals on the implementation.

    I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverses the expectations that I feel are core to the XLIFF spirit. In my opinion the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. This is, at its most basic level, achieved by an initial tool that understands the source format transforming it or extracting translatable content into an XLIFF file. The file can then be further processed (usually translated) by other tools independent of the initial tool and source format. At the end of the processing chain the file is returned to the initial tool (or a closely related tool), which creates a localized version of the source file. By storing the source file in binary format within the XLIFF document, the model is turned around. Now you have an initial tool that has no knowledge of the source format and depends on source format knowledge in the processing chain to get any meaningful work done. This use case would be better served by a translation package format, leaving XLIFF to the actual translation of text.

    Regarding the concrete proposal, I have a few ideas on how to improve it. Binary data will in most cases not be suitable for direct processing by translators; instead it will need a separate extraction step before translation. So I think it would be good to simplify the task of leaving the binary portions out of the file for parts of the processing.

    If the <bin-unit> is changed to a <bin-file> and made a sibling of <file>, it would be a long step in that direction. Different <file>s in an XLIFF document are largely independent, and merging and splitting documents at this level is common. Having the binary data as units, possibly mixed into the sequence of text units, would probably make it ambiguous whether they can safely be removed with the content remaining valid. In addition, an empty <bin-unit> or other placeholder would have to be left so that the content can easily be re-inserted in the right place. Using <bin-file> would also keep the binary data out of the path of other modules such as validation and string length restrictions.

    Storing units smaller than files as binary data would make interoperability even harder, so I do not think adding <bin-file> in addition to <bin-unit> would be a good idea. I’d prefer just <bin-file>. If the textual / markup portion of the data refers to external binary data, some form of reference mechanism might be useful, but I do not see this as a requirement. If it is added, I think the option of having the reference point from the <bin-file> to the <unit> it is associated with, instead of the other way around, would be good. This means that tools that do not make use of binaries would never encounter the reference directly.

    Besides a mime type, I think an original filename would be very helpful, as the file extension is still the most common way to differentiate between formats when dealing with files. I anticipate that most implementations would save the contents of the binary node into a file and process it in one or more separate steps. No application will, or even could, have a complete mapping of all mime-types to extensions.

    Regards,
    Fredrik Estreen
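    The point about file names versus mime types can be sketched in a few lines. The helper below is hypothetical (pick_extension is not part of any XLIFF tool) and assumes a consumer that prefers the extension of an original file name when one is supplied, falls back to Python's standard mimetypes table, and finally to a generic ".bin" suffix when the mime type is unknown to it.

```python
import mimetypes
from pathlib import PurePosixPath


def pick_extension(mime_type, original_name=None):
    """Choose a file extension for a binary payload saved to disk.

    Assumed policy (not from the thread): an original file name wins,
    then the standard mime-type table, then a generic '.bin'.
    """
    if original_name:
        suffix = PurePosixPath(original_name).suffix
        if suffix:
            return suffix
    guessed = mimetypes.guess_extension(mime_type)
    return guessed if guessed else ".bin"


# A file name settles the question immediately.
print(pick_extension("text/plain", "fr-fr-Documents-0002.txt"))
# An application-specific type is not in the standard table.
print(pick_extension("windows-resource-dialog"))
```

    The second call shows the gap being described: a type string such as "windows-resource-dialog" has no entry in a general-purpose table, so without a file name the consumer can only guess.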


  • 3.  RE: 2.0 Binary Data Module Proposal

    Posted 12-03-2012 20:55
    Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation, which supports file-level binaries only, just scratches the surface of how the module would be used. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):

    We need to carry all of the information needed to recreate the localized dialog, not just textual data. You’ll see here that not only have two strings been localized, but the dialog size and two controls contained in the dialog have also been localized (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here is an example of how that XLIFF might look with a binary module:

        <?xml version="1.0" encoding="UTF-8"?>
        <xliff version="2.0" srcLang="en-US" tgtLang="de-DE" xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">
          <file id="158" original="example.exe">
            <!-- external binary reference -->
            <bin:binary id="0" mime-type="image/jpeg">
              <bin:source href= />
              <bin:target href= />
            </bin:binary>
            <group id="158">
              <unit id="158" name="5" state="initial">
                <segment id="158">
                  <source>Load Registry Config</source>
                  <target>Load Registry Config</target>
                </segment>
                <!-- target text was not translated, but dialog size was increased -->
                <bin:binary id="158" state="translated" mime-type="windows-resource-dialog">
                  <bin:source form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAAugBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:source>
                  <bin:target form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAA0gBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:target>
                </bin:binary>
              </unit>
              <group id="1">
                <unit id="1" name="128;WIN_DLG_CTRL_" state="initial">
                  <segment id="1">
                    <source>OK</source>
                    <target>OK</target>
                  </segment>
                  <!-- Neither target text nor control size were localized -->
                  <bin:binary id="1" state="initial" mime-type="windows-resource-control-button">
                    <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:source>
                    <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:target>
                  </bin:binary>
                </unit>
              </group>
              <group id="2">
                <unit id="2" name="128;WIN_DLG_CTRL_" state="translated">
                  <segment id="2">
                    <source>Cancel</source>
                    <target>!!!Cancel!!!</target>
                  </segment>
                  <!-- target text was translated, but control size was not increased -->
                  <bin:binary id="2" state="initial" mime-type="windows-resource-control-button">
                    <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:source>
                    <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:target>
                  </bin:binary>
                </unit>
              </group>
              <group id="1060">
                <unit id="1060" name="130;WIN_DLG_CTRL_" state="translated">
                  <segment id="1060">
                    <source>Please select a configuration to load from the Registry</source>
                    <target>!!!!!!Please select a configuration to load from the Registry:!!!!!!</target>
                  </segment>
                  <!-- both target text and control size were localized -->
                  <bin:binary id="1060" state="translated" mime-type="windows-resource-control-static-text">
                    <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAHAAcArAAIAAAAAAAAAAABggAAAA==]]></bin:source>
                    <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAOAAcAxAAIAAAAAAAAAAABggAAAA==]]></bin:target>
                  </bin:binary>
                </unit>
              </group>
              <group id="1021">
                <unit id="1021" name="133;WIN_DLG_CTRL_" state="initial">
                  <segment id="1021">
                    <source form="base64"><![CDATA[]]></source>
                    <target form="base64"><![CDATA[]]></target>
                  </segment>
                  <!-- target text was not translated, but control size was increased -->
                  <bin:binary id="1021" state="translated" mime-type="windows-resource-control-combo-box">
                    <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAqgCfAAEAAAAAAAABhQAAAA==]]></bin:source>
                    <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAxACfAAEAAAAAAAABhQAAAA==]]></bin:target>
                  </bin:binary>
                </unit>
              </group>
            </group>
          </file>
        </xliff>

    One thing in particular to note: because dialog controls are often hierarchical, representing them as such in XLIFF would be important, so the above example shows a <group> containing both <unit> and <group> elements as siblings. The top-level group <group id="158"> contains a unit <unit id="158">, which holds the dialog title and binary data, but <group id="158"> also contains other groups, which hold the dialog control text and binary data. We’re not 100% sure, but we believe it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.

    Thanks,
    Ryan

    Attachment: dialogs.png
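    To make the "dialog size was increased" comment in the example concrete, the sketch below decodes the source and target payloads of the dialog resource (<bin:binary id="158">) and lists the bytes that differ. It is a minimal sketch: byte_diffs is a hypothetical helper, and no assumption is made here about the internal layout of the resource format.

```python
import base64

# Payloads copied from <bin:binary id="158"> (mime-type "windows-resource-dialog").
SOURCE_B64 = (
    "AQABAP//AAAAAAAAAADAAMiAAAAAAAAAugBEAAgAAAAAAAICAAAAAAACAgAAAAAA"
    "HAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA"
)
TARGET_B64 = (
    "AQABAP//AAAAAAAAAADAAMiAAAAAAAAA0gBEAAgAAAAAAAICAAAAAAACAgAAAAAA"
    "HAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA"
)


def byte_diffs(source_b64, target_b64):
    """Return (offset, source_byte, target_byte) for every differing byte."""
    src = base64.b64decode(source_b64)
    tgt = base64.b64decode(target_b64)
    if len(src) != len(tgt):
        raise ValueError("payload lengths differ; a byte-wise diff is not meaningful")
    return [(i, s, t) for i, (s, t) in enumerate(zip(src, tgt)) if s != t]


diffs = byte_diffs(SOURCE_B64, TARGET_B64)
for offset, before, after in diffs:
    print(f"offset {offset}: {before} -> {after}")
```

    For this payload the diff reports a single changed byte, at offset 24, going from 186 to 210. That is consistent with a resized dialog, although interpreting the offset as a width field is an inference, and a real binary editor would need to understand the format named by the mime-type rather than compare raw bytes.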


  • 4.  RE: [xliff] RE: 2.0 Binary Data Module Proposal

    Posted 12-04-2012 15:07
      |   view attached
    Hi Ryan,   First: Just a note looking at the example (un-related to the module)   You put the state attribute in <unit>, while it should be in ,segment> per the face-2-face agreement ( https://lists.oasis-open.org/archives/xliff/201210/msg00094.html )   It seems those changes in the non-inline parts are not yet in the schema/spec.   Shirley: that should also be the case for the match type. It seems none of the F2F changes have been reflected yet.     Ryan: now a comment of the binary module.   It seems some (at least the first) binary objects are really provided as references, as opposed to resources that need to be modified. I was wondering if there should be a distinction between binary data to edit (like some image) vs binary data as reference.   -yves       From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King Sent: Monday, December 03, 2012 1:55 PM To: Estreen, Fredrik; xliff@lists.oasis-open.org Subject: [xliff] RE: 2.0 Binary Data Module Proposal   Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation of supporting file-level binaries only scratches the surface of how it would be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):     We need to carry all of the information needed to recreate the localized dialog, not just textual data. 
You’ll see here that not only two strings have been localized, but also the dialog size and two controls contained in the dialog have been localized (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here is an example of how that XLIFF might look with a binary module:   <?xml version="1.0" encoding="UTF-8"?> <xliff version="2.0" srcLAng="en-US" tgtLang="de-DE" xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">     <file id="158" original="example.exe">     <!-- external binary reference -->     <bin:binary id="0" mime-type="image/jpeg">       <bin:source href= />      <bin:target href= />     </bin:binary>     <group id="158">       <unit id="158" name="5" state="initial">         <segment id="158">           <source>Load Registry Config</source>           <target>Load Registry Config</target>         </segment>        <!-- target text was not translated, but dialog size was increased -->         <bin:binary id="158" state="translated" mime-type="windows-resource-dialog">           <bin:source form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAAugBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:source>           <bin:target form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAA0gBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:target>         </bin:binary>       </unit>       <group id="1">         <unit id="1" name="128;WIN_DLG_CTRL_" state="initial">           <segment id="1">             <source>OK</source>             <target>OK</target>           </segment>          <!-- Neither target text nor control size were localized -->           <bin:binary id="1" state="initial" mime-type="windows-resource-control-button">              <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:source>      
        <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>       <group id="2">         <unit id="2" name="128;WIN_DLG_CTRL_" state="translated">          <segment id="2">             <source>Cancel</source>             <target>!!!Cancel!!!></target>           </segment>          <!-- target text was translated, but control size was not increased -->           <bin:binary id="2" state="initial" mime-type="windows-resource-control-button">             <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:source>             <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>       <group id="1060">         <unit id="1060" name="130;WIN_DLG_CTRL_" state="translated">           <segment id="1060">             <source>Please select a configuration to load from the Registry</source>             <target>!!!!!!Please select a configuration to load from the Registry:!!!!!!</target>           </segment>          <!-- both target text and control size were localized -->           <bin:binary id="1060" state="translated" mime-type="windows-resource-control-static-text">             <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAHAAcArAAIAAAAAAAAAAABggAAAA==]]></bin:source>             <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAOAAcAxAAIAAAAAAAAAAABggAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>       <group id="1021">         <unit id="1021" name="133;WIN_DLG_CTRL_" state="initial">           <segment id="1021">             <source form="base64"><![CDATA[]]></source>             <target form="base64"><![CDATA[]]></target>           </segment>           <!-- target text was not translated, but control size was increased -->           <bin:binary id="1021" state="translated" 
mime-type="windows-resource-control-combo-box">             <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAqgCfAAEAAAAAAAABhQAAAA==]]></bin:source>             <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAxACfAAEAAAAAAAABhQAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>     </group>   </file> </xliff>   One thing in particular to note: Because dialog controls are often hierarchical, representing them as such in XLIFF would be important so the above example shows a <group> containing both <unit> and <group> elements as siblings. Top-level group <group id=”158”> contains a unit <unit id=”158”> which contains the dialog title and binary data, but <group id=”158”> also other groups, which contain the dialog control text and binary data. We’re not 100% sure, but we believe it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.   Thanks, Ryan   From: Estreen, Fredrik [ mailto:Fredrik.Estreen@lionbridge.com ] Sent: Thursday, November 29, 2012 4:32 AM To: Ryan King; xliff@lists.oasis-open.org Subject: RE: 2.0 Binary Data Module Proposal   Hi Ryan, All,   The ballot for this as a feature was already passed, but I’d still like to make some comments and proposals on the implementation.   I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverse the expectations that I feel is core to the XLIFF spirit. In my opinion the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. This is, at its most basic level, achieved by an initial tool that understand the source format transforming it or extracting translatable content into an XLIFF file. 
The file can then be further processed (usually translated) by other tools independent of the initial tool and source format. At the end of the processing chain the file is returned to the initial tool (or closely related tool) which create a localized version of the source file. By storing the source file in binary format within the XLIFF document the model is turned around. Now you have an initial tool that has no knowledge of the source format and depend on source format knowledge in the processing chain to get any meaningful work done. This use case would be better served by a translation package format leaving XLIFF to the actual translation of text.   Regarding the concrete proposal I have a few ideas on how to improve it. Binary data will in most cases not be suitable for direct processing by translators, instead it will need a separate extraction step before translation. So I think it would be good to simplify the task of leaving the binary portions out of the file for parts of the processing.   If the <bin-unit> is changed to a <bin-file> and made a sibling of <file> it would be a long step in that direction. Different <file>s in an XLIFF document are largely independent and merging and splitting documents at this level is common. Having the binary data as units possibly mixed into the sequence of text units would probably make it ambiguous if they can safely be removed and the content still be valid. In addition an empty <bin-unit> or other placeholder would have to be left so that the content can easily be re-inserted in the right place. It would also keep the binary data out of the path of other modules such as validation and string length restrictions.   Storing units smaller than files as binary data would make interoperability even harder so I do not think adding <bin-file> in addition to <bin-unit> would be a good idea. I’d prefer just <bin-file>. 
If the textual / markup portion of the data refer to external binary data some form of reference mechanism might be useful. But I do not see this as a requirement. If it is added I think the option of having the reference point form the <bin-file> to the <unit> it is associate with instead of the other way around would be good. This means that tools that do not make use of binaries would never encounter the reference directly.   Besides a mime type I think an original filename would be very helpful as the file extension is still the most common way to differentiate between formats when dealing with files. And I anticipate that most implementations would save the contents of the binary node into a file and process it in one or more separate steps. No application will or even could have a complete mapping of all mime-types to extensions.   Regards, Fredrik Estreen   From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: den 16 november 2012 01:02 To: xliff@lists.oasis-open.org Subject: [xliff] 2.0 Binary Data Module Proposal   In anticipation of closing down on 2.0, we have two new proposals for modules. In this mail, we are proposing the second of the two, a Binary Data module.   For those who attended the XLIFF Symposium in Seattle, you were given the opportunity to see a * real world* implementation of the <bin-unit> element in SharePoint using XLIFF 1.2. Bryan Schnabel even advocated giving the SharePoint team an award J . Since there is no equivalent of the <bin-unit> in 2.0, this proposal is to add a Binary Data Module.   We think that the 2.0 implementation could be essentially the same as 1.2 with just the elements and attributes used * for now * so the we essentially get it on the 2.0 radar. We may want to propose additional features after we conduct some reviews with the SharePoint team over the next couple of weeks to get their feedback on any improvements they would like to see.   
Here is SharePoint’s 1.2 implementation:

      <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
        <bin-source>
          <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
        </bin-source>
        <bin-target>
          <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
        </bin-target>
      </bin-unit>

      <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
        <bin-source>
          <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
        </bin-source>
        <bin-target>
          <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
        </bin-target>
      </bin-unit>

Please let us know your opinions on the proposal.

Thanks,
Microsoft Corporation
(Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)


  • 5.  RE: [xliff] RE: 2.0 Binary Data Module Proposal

    Posted 12-04-2012 16:16




    Hi Yves, yes, you are correct about the state attribute, my mistake. I remembered it wrong, and that is why it does need to be added to the spec :-)
     
    The example was really to show how the target XLIFF would look once the source dialog was localized, so the binary data is more than just a reference. A binary editor would need to know not only how to read
    but also how to write the binary, based on the mime-type. As for determining whether a binary should be edited or not, we could follow suit with unit and segment and add an optional translate="yes|no" attribute.
     
    Thanks,
    Ryan
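
    The edit-versus-reference distinction discussed in this exchange could be sketched roughly as below. This is only an illustration in Python: the translate attribute on <bin:binary> is the suggestion made in this mail and is not part of any schema, and the sample fragment and the editable_binaries helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespace URI as used in the XLIFF example quoted in this thread.
BIN_NS = "urn:oasis:names:tc:xliff:binary:2.0"

# Hypothetical fragment: 'translate' on <bin:binary> is only a proposal.
SAMPLE = f"""
<file id="158" xmlns:bin="{BIN_NS}">
  <bin:binary id="0" mime-type="image/jpeg" translate="no"/>
  <bin:binary id="158" mime-type="windows-resource-dialog" translate="yes"/>
  <bin:binary id="1" mime-type="windows-resource-control-button"/>
</file>
"""

def editable_binaries(xml_text, default="yes"):
    """Return ids of binaries a tool may modify; the rest are reference only."""
    root = ET.fromstring(xml_text)
    return [
        b.get("id")
        for b in root.iter(f"{{{BIN_NS}}}binary")
        if b.get("translate", default) == "yes"
    ]

print(editable_binaries(SAMPLE))
```

    Whether a missing attribute should default to "yes" or "no" is an open design question; the default parameter above only makes that choice explicit.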
     


    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Yves Savourel
    Sent: Tuesday, December 4, 2012 7:07 AM
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] RE: 2.0 Binary Data Module Proposal


     
    Hi Ryan,
     
    First: just a note on the example (unrelated to the module).
     
    You put the state attribute in <unit>, while it should be in <segment> per the face-2-face agreement ( https://lists.oasis-open.org/archives/xliff/201210/msg00094.html ).
     
    It seems those changes in the non-inline parts are not yet in the schema/spec.
     
    Shirley: that should also be the case for the match type. It seems none of the F2F changes have been reflected yet.
     
     
    Ryan: now a comment on the binary module.
     
    It seems some (at least the first) binary objects are really provided as references, as opposed to resources that need to be modified. I was wondering if there should
    be a distinction between binary data to edit (like some image) vs binary data as reference.
     
    -yves
     
     
     


    From:
    xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Ryan King
    Sent: Monday, December 03, 2012 1:55 PM
    To: Estreen, Fredrik; xliff@lists.oasis-open.org
    Subject: [xliff] RE: 2.0 Binary Data Module Proposal


     
    Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our
    business model. SharePoint’s implementation of supporting file-level binaries only scratches the surface of how it would be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide
    tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):
     

     
    We need to carry all of the information needed to recreate the localized dialog, not just textual data. You’ll see here that not only two strings have been localized, but also the dialog size and two controls
    contained in the dialog have been localized (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here
    is an example of how that XLIFF might look with a binary module:
     
    <?xml version="1.0" encoding="UTF-8"?>
    <xliff version="2.0" srcLang="en-US" tgtLang="de-DE" xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">

       <file id="158" original="example.exe">
        <!-- external binary reference -->
        <bin:binary id="0" mime-type="image/jpeg">
          <bin:source href= />
         <bin:target href= />
        </bin:binary>
        <group id="158">
          <unit id="158" name="5" state="initial">
            <segment id="158">
              <source>Load Registry Config</source>
              <target>Load Registry Config</target>
            </segment>
           <!-- target text was not translated, but dialog size was increased -->
            <bin:binary id="158" state="translated" mime-type="windows-resource-dialog">
              <bin:source form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAAugBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:source>
              <bin:target form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAA0gBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:target>
            </bin:binary>
          </unit>
          <group id="1">
            <unit id="1" name="128;WIN_DLG_CTRL_" state="initial">
              <segment id="1">
                <source>OK</source>
                <target>OK</target>
              </segment>
             <!-- Neither target text nor control size were localized -->
              <bin:binary id="1" state="initial" mime-type="windows-resource-control-button">
                 <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:source>
                 <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:target>
              </bin:binary>
            </unit>
          </group>
          <group id="2">
            <unit id="2" name="128;WIN_DLG_CTRL_" state="translated">
             <segment id="2">
                <source>Cancel</source>
                <target>!!!Cancel!!!</target>
              </segment>
             <!-- target text was translated, but control size was not increased -->
              <bin:binary id="2" state="initial" mime-type="windows-resource-control-button">
                <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:source>
                <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:target>
              </bin:binary>
            </unit>
          </group>
          <group id="1060">
            <unit id="1060" name="130;WIN_DLG_CTRL_" state="translated">
              <segment id="1060">

                <source>Please select a configuration to load from the Registry</source>
                <target>!!!!!!Please select a configuration to load from the Registry:!!!!!!</target>
              </segment>
             <!-- both target text and control size were localized -->
              <bin:binary id="1060" state="translated" mime-type="windows-resource-control-static-text">
                <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAHAAcArAAIAAAAAAAAAAABggAAAA==]]></bin:source>
                <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAOAAcAxAAIAAAAAAAAAAABggAAAA==]]></bin:target>
              </bin:binary>
            </unit>
          </group>
          <group id="1021">
            <unit id="1021" name="133;WIN_DLG_CTRL_" state="initial">
              <segment id="1021">
                <source form="base64"><![CDATA[]]></source>
                <target form="base64"><![CDATA[]]></target>
              </segment>
              <!-- target text was not translated, but control size was increased -->
              <bin:binary id="1021" state="translated" mime-type="windows-resource-control-combo-box">
                <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAqgCfAAEAAAAAAAABhQAAAA==]]></bin:source>
                <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAxACfAAEAAAAAAAABhQAAAA==]]></bin:target>
              </bin:binary>
            </unit>
          </group>
        </group>
      </file>
    </xliff>
     
    One thing in particular to note: because dialog controls are often hierarchical, representing them as such in XLIFF would be important, so the above example shows a <group> containing both <unit> and <group> elements
    as siblings. The top-level group <group id="158"> contains a unit <unit id="158">, which holds the dialog title and binary data, but <group id="158"> also contains other groups, which hold the dialog control text and binary data. We’re not 100% sure, but we believe
    it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.
     
    Thanks,
    Ryan
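
    A tool consuming such a file would have to walk the nested <group>/<unit> hierarchy and decide which binaries were actually changed. A minimal Python sketch of that walk follows. The 8-byte payload layout (x, y, width, height) and all values are made up for illustration and are not the real Windows resource format; only the bin namespace URI is taken from the example above.

```python
import base64
import struct
import xml.etree.ElementTree as ET

BIN_NS = "urn:oasis:names:tc:xliff:binary:2.0"  # from the example above

def payload(x, y, cx, cy):
    """Build a made-up 8-byte payload: four little-endian 16-bit values."""
    return base64.b64encode(struct.pack("<4H", x, y, cx, cy)).decode("ascii")

# Hypothetical document: a <group> holding a <unit> and a nested <group>.
SAMPLE = f"""
<file id="f1" xmlns:bin="{BIN_NS}">
  <group id="158">
    <unit id="158">
      <bin:binary id="158">
        <bin:source form="base64">{payload(0, 0, 186, 68)}</bin:source>
        <bin:target form="base64">{payload(0, 0, 210, 68)}</bin:target>
      </bin:binary>
    </unit>
    <group id="1">
      <unit id="1">
        <bin:binary id="1">
          <bin:source form="base64">{payload(72, 47, 50, 14)}</bin:source>
          <bin:target form="base64">{payload(72, 47, 50, 14)}</bin:target>
        </bin:binary>
      </unit>
    </group>
  </group>
</file>
"""

def changed_units(node, path=()):
    """Recursively yield (group path, unit id) where source != target."""
    for child in node:
        if child.tag == "group":
            yield from changed_units(child, path + (child.get("id"),))
        elif child.tag == "unit":
            for b in child.iter(f"{{{BIN_NS}}}binary"):
                src = base64.b64decode(b.findtext(f"{{{BIN_NS}}}source"))
                tgt = base64.b64decode(b.findtext(f"{{{BIN_NS}}}target"))
                if src != tgt:
                    yield path, child.get("id")

print(list(changed_units(ET.fromstring(SAMPLE))))
```

    Comparing decoded bytes only shows that a payload changed; interpreting what changed would still require a tool that understands the format named by the mime-type.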
     


    From: Estreen, Fredrik [ mailto:Fredrik.Estreen@lionbridge.com ]

    Sent: Thursday, November 29, 2012 4:32 AM
    To: Ryan King; xliff@lists.oasis-open.org
    Subject: RE: 2.0 Binary Data Module Proposal


     
    Hi Ryan, All,
     
    The ballot for this as a feature was already passed, but I’d still like to make some comments and proposals on the implementation.
     
    I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverses the expectations that I feel are core to the XLIFF spirit. In
    my opinion the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. This is, at its most basic level,
    achieved by an initial tool that understands the source format transforming it or extracting translatable content into an XLIFF file. The file can then be further processed (usually translated) by other tools independent of the initial tool and source format.
    At the end of the processing chain the file is returned to the initial tool (or a closely related tool), which creates a localized version of the source file. By storing the source file in binary format within the XLIFF document the model is turned around. Now
    you have an initial tool that has no knowledge of the source format and depends on source format knowledge in the processing chain to get any meaningful work done. This use case would be better served by a translation package format, leaving XLIFF to the actual
    translation of text.
     
    Regarding the concrete proposal I have a few ideas on how to improve it. Binary data will in most cases not be suitable for direct processing by translators, instead it will need a separate extraction step before
    translation. So I think it would be good to simplify the task of leaving the binary portions out of the file for parts of the processing.
     
    If the <bin-unit> is changed to a <bin-file> and made a sibling of <file> it would be a long step in that direction. Different <file>s in an XLIFF document are largely independent and merging and splitting documents
    at this level is common. Having the binary data as units possibly mixed into the sequence of text units would probably make it ambiguous if they can safely be removed and the content still be valid. In addition an empty <bin-unit> or other placeholder would
    have to be left so that the content can easily be re-inserted in the right place. It would also keep the binary data out of the path of other modules such as validation and string length restrictions.
     
    Storing units smaller than files as binary data would make interoperability even harder, so I do not think adding <bin-file> in addition to <bin-unit> would be a good idea. I’d prefer just <bin-file>. If the textual
    / markup portion of the data refers to external binary data, some form of reference mechanism might be useful. But I do not see this as a requirement. If it is added, I think the option of having the reference point from the <bin-file> to the <unit> it is associated
    with, instead of the other way around, would be good. This means that tools that do not make use of binaries would never encounter the reference directly.
     
    Besides a mime type I think an original filename would be very helpful as the file extension is still the most common way to differentiate between formats when dealing with files. And I anticipate that most implementations
    would save the contents of the binary node into a file and process it in one or more separate steps. No application will or even could have a complete mapping of all mime-types to extensions.
     
    Regards,
    Fredrik Estreen
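
    The point about mime types and file extensions can be illustrated with a small Python sketch. The payload_filename helper and the idea of an original filename carried next to the mime type are hypothetical (the latter being the suggestion made above); mimetypes.guess_extension is the standard library lookup.

```python
import mimetypes
import os

def payload_filename(unit_id, mime_type, original=None):
    """Pick a file name for a binary payload saved to disk.

    Prefers the extension of the original file name (hypothetical
    attribute, as suggested above), then the standard library's MIME
    table, then a generic '.bin'.
    """
    ext = os.path.splitext(original)[1] if original else ""
    if not ext:
        ext = mimetypes.guess_extension(mime_type) or ".bin"
    return f"{unit_id}{ext}"

# A custom type such as the ones used in this thread has no registered
# extension, so only the original name (or the fallback) can help.
print(payload_filename("158", "windows-resource-dialog", "example.exe"))
print(payload_filename("158", "windows-resource-dialog"))
```

    For registered types the standard table may be enough, but its contents vary between platforms and versions, which is one more argument for carrying the original name.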
     



    From:
    xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Ryan King
    Sent: den 16 november 2012 01:02
    To: xliff@lists.oasis-open.org
    Subject: [xliff] 2.0 Binary Data Module Proposal


     
    In anticipation of closing down on 2.0, we have two new proposals for modules. In this mail, we are proposing the second of the two, a Binary Data module.
     
    For those who attended the XLIFF Symposium in Seattle, you were given the opportunity to see a *real world*
    implementation of the <bin-unit> element in SharePoint using XLIFF 1.2. Bryan Schnabel even advocated giving the SharePoint team an award :-).
    Since there is no equivalent of the <bin-unit> in 2.0, this proposal is to add a Binary Data Module.
     
    We think that the 2.0 implementation could be essentially the same as 1.2 with just the elements and attributes used *for now* so that we essentially get it on the 2.0 radar. We may want to propose additional
    features after we conduct some reviews with the SharePoint team over the next couple of weeks to get their feedback on any improvements they would like to see.

     
    Here is SharePoint’s 1.2 implementation:
     
          <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
            <bin-source>
              <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
            </bin-source>
            <bin-target>
              <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
            </bin-target>
          </bin-unit>
     
     
          <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
            <bin-source>
              <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
            </bin-source>
            <bin-target>
              <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
            </bin-target>
          </bin-unit>
     
    Please let us know your opinions on the proposal.
     
    Thanks,
    Microsoft Corporation
    (Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)
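
    For readers who want to inspect the internal-file payload quoted above, here is a minimal Python sketch. It assumes the <bin-unit> is handled as a standalone fragment without a namespace; the internal_payload helper is hypothetical and not part of any XLIFF tool.

```python
import base64
import xml.etree.ElementTree as ET

# Second <bin-unit> from the mail above, reduced to the source side.
BIN_UNIT = """
<bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
  <bin-source>
    <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
  </bin-source>
</bin-unit>
"""

def internal_payload(bin_unit_xml):
    """Return the decoded bytes of the source <internal-file>."""
    unit = ET.fromstring(bin_unit_xml)
    node = unit.find("bin-source/internal-file")
    if node is None or node.get("form") != "base64":
        raise ValueError("expected a base64 <internal-file>")
    return base64.b64decode(node.text)

print(internal_payload(BIN_UNIT))  # b'This is a test.'
```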
     
     







  • 6.  Re: [xliff] RE: 2.0 Binary Data Module Proposal

    Posted 12-15-2012 20:33
    @Everyone, in case you have input for this feature, please discuss it in this thread. Ryan, thanks for being transparent about MS plans with the binary module. Apparently, although this feature has been voted through on the feature freeze meeting, it seems that many are concerned with this module. @Bryan, would you please make this feature an agenda point on Tuesday? While, I do not think we should be ever revisiting ballots that were duly conducted according to TC process. It seems to me that this feature is in need of prime time discussion to make clear what various stakeholders don't like and give Ryan and Microsoft an opportunity to develop a feature that will not be killed upfront while entering the Committee Draft approval process. Now to the technical aspects: I have voted for this module and I reiterate LRC's suport and plans to make a reference implementation. However, as most modules, this module was not specified at the point of the feature approval (and still has not been specified). The use case I envisioned and the one that is inline with LRC's service oriented achitecture platform is solely bringing a file that could not be parsed for various reasons at the point of the content extraction to a point where it could be extracted in a second run by an arbitrary state-of-the-art parsing service. The use case in SharePoint is clear, while it is easy to extract SharePoint content fields (and even most of the UI elements) to XLIFF core elements in the spirit of the standard, parsing content libraries that can contain anything is beyond the scope of XLIFF implementation in SharePoint and in most Document/Content Management systems. I agree with Joachim and others that it must be clear (and made clear if it isn't) that extraction of content as a binary element is not the same as full fledged extraction into XLIFF core elements. No one says it is AFAIK. And I do not agree that by including a binary data module XLIFF is breaking any promises. 
1) Bin-unit was in 1.2 2) There is a clear use case 3) Unlike in 1.2 the binary data is proposed as a module and hence clearly separated from the core functionality [Modules are part of the spec and the only TC warrented way how to address their functionality, but their support is optional if you do not want to process them and ergo support the functionality covered by them.] Opinions will vary what is worse, badly formed html snippets in cdata that appear illegally in XLIFF trans-units (status quo) or having unparsed content clearly separated (and base64 encoded) in a binary module. My firm opinion is that the former is worse and that limited bin-file handling functionality in XLIFF 2.0 can help with the remedy. So while I support the binary module, I support a limited transport capability along the lines of the SharePoint demo given in Seattle and as implemented currently in SOLAS at LRC . [Initial XLIFF file contains a base-64 encoded original file (actually as in internal file reference rather than bin-unit, but having a bin-file would be more handy) until the content is extracted into core elements by a tikal webservice. SOLAS is NOT using XLIFF as a processing format, it is using it as a SOA/ESB message and it is up to any of the (loosely, RESTfully) integrated services whether they will use XLIFF only as a message format or will also directly process it]. I do not support binary data all over the place and I do not support binary dialogs and other UI elements as a regular payload that should make it though the whole roundrtrip. This feels like 90's and I know that Microsoft has up to date XML based methods for automatically resizing UI lelements, plus if there are restrictions needed, the length/storage restriction module (as being worked on by Fredrik) can be used. Other issue with the envisioned extensive usage of binary payload is that this kind of functionality crosses the fine line between a module and a tool specific extension. 
[Similar issue that Yves pointed out with the validation module (will discuss in the appropriate thread),] It would be impossible to specify a standard, IMPLEMENTATION INDEPENDENT (definitory requirement for any standard) behavior and would be well out of scope of XLIFF TC as a TC dealing with an XML interchange format. Thanks for your attention and talk to you on Tuesday dF Dr. David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone: +353-6120-2781 cellphone: +353-86-0222-158 facsimile: +353-6120-2734 mailto: david.filip@ul.ie On Tue, Dec 4, 2012 at 4:15 PM, Ryan King < ryanki@microsoft.com > wrote: Hi Yves, yes, you are correct about the state attribute, my mistake. I remembered it wrong, and that is why it does need to be added to the spec J .   The example was really to show how the target XLIFF would look once the source dialog was localized, so the binary data is more than just reference. A binary editor would not only need to know how to read, but also write to the binary based on the mime-type. As for determining if a binary should be edited or not, we could follow suit with unit and segment and add an optional translate=”yes no”  attribute.   Thanks, Ryan   From: xliff@lists.oasis-open.org [mailto: xliff@lists.oasis-open.org ] On Behalf Of Yves Savourel Sent: Tuesday, December 4, 2012 7:07 AM To: xliff@lists.oasis-open.org Subject: RE: [xliff] RE: 2.0 Binary Data Module Proposal   Hi Ryan,   First: Just a note looking at the example (un-related to the module)   You put the state attribute in <unit>, while it should be in ,segment> per the face-2-face agreement ( https://lists.oasis-open.org/archives/xliff/201210/msg00094.html )   It seems those changes in the non-inline parts are not yet in the schema/spec.   Shirley: that should also be the case for the match type. It seems none of the F2F changes have been reflected yet.     Ryan: now a comment of the binary module.   
It seems some (at least the first) binary objects are really provided as references, as opposed to resources that need to be modified. I was wondering if there should be a distinction between binary data to edit (like some image) vs binary data as reference.   -yves       From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: Monday, December 03, 2012 1:55 PM To: Estreen, Fredrik; xliff@lists.oasis-open.org Subject: [xliff] RE: 2.0 Binary Data Module Proposal   Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation of supporting file-level binaries only scratches the surface of how it would be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):     We need to carry all of the information needed to recreate the localized dialog, not just textual data. You’ll see here that not only two strings have been localized, but also the dialog size and two controls contained in the dialog have been localized (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. 
So, here is an example of how that XLIFF might look with a binary module:   <?xml version="1.0" encoding="UTF-8"?> <xliff version="2.0" srcLAng="en-US" tgtLang="de-DE" xmlns:bin="urn:oasis:names:tc:xliff:binary:2.0">     <file id="158" original="example.exe">     <!-- external binary reference -->     <bin:binary id="0" mime-type="image/jpeg">       <bin:source href= />      <bin:target href= />     </bin:binary>     <group id="158">       <unit id="158" name="5" state="initial">         <segment id="158">           <source>Load Registry Config</source>           <target>Load Registry Config</target>         </segment>        <!-- target text was not translated, but dialog size was increased -->         <bin:binary id="158" state="translated" mime-type="windows-resource-dialog">           <bin:source form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAAugBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:source>           <bin:target form="base64"><![CDATA[AQABAP//AAAAAAAAAADAAMiAAAAAAAAA0gBEAAgAAAAAAAICAAAAAAACAgAAAAAAHAAAAE0AUwAgAFMAYQBuAHMAIABTAGUAcgBpAGYAAAAAAAAA]]></bin:target>         </bin:binary>       </unit>       <group id="1">         <unit id="1" name="128;WIN_DLG_CTRL_" state="initial">           <segment id="1">             <source>OK</source>             <target>OK</target>           </segment>          <!-- Neither target text nor control size were localized -->           <bin:binary id="1" state="initial" mime-type="windows-resource-control-button">              <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:source>              <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAEAAVBIAC8AMgAOAAIAAAAAAAABgAAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>       <group id="2">         <unit id="2" name="128;WIN_DLG_CTRL_" state="translated">          <segment id="2">             <source>Cancel</source>             
<target>!!!Cancel!!!></target>           </segment>          <!-- target text was translated, but control size was not increased -->           <bin:binary id="2" state="initial" mime-type="windows-resource-control-button">             <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:source>             <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAVCBAC8AMgAOAAMAAAAAAAABgAAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>       <group id="1060">         <unit id="1060" name="130;WIN_DLG_CTRL_" state="translated">           <segment id="1060">             <source>Please select a configuration to load from the Registry</source>             <target>!!!!!!Please select a configuration to load from the Registry:!!!!!!</target>           </segment>          <!-- both target text and control size were localized -->           <bin:binary id="1060" state="translated" mime-type="windows-resource-control-static-text">             <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAHAAcArAAIAAAAAAAAAAABggAAAA==]]></bin:source>             <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAAAAlAOAAcAxAAIAAAAAAAAAAABggAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>       <group id="1021">         <unit id="1021" name="133;WIN_DLG_CTRL_" state="initial">           <segment id="1021">             <source form="base64"><![CDATA[]]></source>             <target form="base64"><![CDATA[]]></target>           </segment>           <!-- target text was not translated, but control size was increased -->           <bin:binary id="1021" state="translated" mime-type="windows-resource-control-combo-box">             <bin:source form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAqgCfAAEAAAAAAAABhQAAAA==]]></bin:source>             <bin:target form="base64"><![CDATA[AQAAAAAAAAAAAAMBIVAJABkAxACfAAEAAAAAAAABhQAAAA==]]></bin:target>           </bin:binary>         </unit>       </group>     
</group>   </file> </xliff>   One thing in particular to note: Because dialog controls are often hierarchical, representing them as such in XLIFF would be important so the above example shows a <group> containing both <unit> and <group> elements as siblings. Top-level group <group id=”158”> contains a unit <unit id=”158”> which contains the dialog title and binary data, but <group id=”158”> also other groups, which contain the dialog control text and binary data. We’re not 100% sure, but we believe it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.   Thanks, Ryan   From: Estreen, Fredrik [ mailto:Fredrik.Estreen@lionbridge.com ] Sent: Thursday, November 29, 2012 4:32 AM To: Ryan King; xliff@lists.oasis-open.org Subject: RE: 2.0 Binary Data Module Proposal   Hi Ryan, All,   The ballot for this as a feature was already passed, but I’d still like to make some comments and proposals on the implementation.   I personally do not believe that binary data in XLIFF is a good idea, but I do respect the decision of the majority. My concern is that this reverse the expectations that I feel is core to the XLIFF spirit. In my opinion the core idea is that XLIFF should enable tools from multiple creators to facilitate translation of content regardless of what tool was used to create the XLIFF file or what format the actual source document has. This is, at its most basic level, achieved by an initial tool that understand the source format transforming it or extracting translatable content into an XLIFF file. The file can then be further processed (usually translated) by other tools independent of the initial tool and source format. At the end of the processing chain the file is returned to the initial tool (or closely related tool) which create a localized version of the source file. By storing the source file in binary format within the XLIFF document the model is turned around. 
Now you have an initial tool that has no knowledge of the source format and depend on source format knowledge in the processing chain to get any meaningful work done. This use case would be better served by a translation package format leaving XLIFF to the actual translation of text.   Regarding the concrete proposal I have a few ideas on how to improve it. Binary data will in most cases not be suitable for direct processing by translators, instead it will need a separate extraction step before translation. So I think it would be good to simplify the task of leaving the binary portions out of the file for parts of the processing.   If the <bin-unit> is changed to a <bin-file> and made a sibling of <file> it would be a long step in that direction. Different <file>s in an XLIFF document are largely independent and merging and splitting documents at this level is common. Having the binary data as units possibly mixed into the sequence of text units would probably make it ambiguous if they can safely be removed and the content still be valid. In addition an empty <bin-unit> or other placeholder would have to be left so that the content can easily be re-inserted in the right place. It would also keep the binary data out of the path of other modules such as validation and string length restrictions.   Storing units smaller than files as binary data would make interoperability even harder so I do not think adding <bin-file> in addition to <bin-unit> would be a good idea. I’d prefer just <bin-file>. If the textual / markup portion of the data refer to external binary data some form of reference mechanism might be useful. But I do not see this as a requirement. If it is added I think the option of having the reference point form the <bin-file> to the <unit> it is associate with instead of the other way around would be good. This means that tools that do not make use of binaries would never encounter the reference directly.   
Besides a mime type I think an original filename would be very helpful as the file extension is still the most common way to differentiate between formats when dealing with files. And I anticipate that most implementations would save the contents of the binary node into a file and process it in one or more separate steps. No application will or even could have a complete mapping of all mime-types to extensions.   Regards, Fredrik Estreen   From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: den 16 november 2012 01:02 To: xliff@lists.oasis-open.org Subject: [xliff] 2.0 Binary Data Module Proposal   In anticipation of closing down on 2.0, we have two new proposals for modules. In this mail, we are proposing the second of the two, a Binary Data module.   For those who attended the XLIFF Symposium in Seattle, you were given the opportunity to see a * real world* implementation of the <bin-unit> element in SharePoint using XLIFF 1.2. Bryan Schnabel even advocated giving the SharePoint team an award J . Since there is no equivalent of the <bin-unit> in 2.0, this proposal is to add a Binary Data Module.   We think that the 2.0 implementation could be essentially the same as 1.2 with just the elements and attributes used * for now * so the we essentially get it on the 2.0 radar. We may want to propose additional features after we conduct some reviews with the SharePoint team over the next couple of weeks to get their feedback on any improvements they would like to see.   
Here is SharePoint’s 1.2 implementation:

      <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
        <bin-source>
          <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
        </bin-source>
        <bin-target>
          <external-file href="http://sphvm-33449/sites/pub/Translation Packages/redmond_makoscum/fr-fr-Documents-20121115T0733550000Z-0/fr-fr-Documents-0002.txt" />
        </bin-target>
      </bin-unit>

      <bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
        <bin-source>
          <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
        </bin-source>
        <bin-target>
          <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
        </bin-target>
      </bin-unit>

Please let us know your opinions on the proposal.   Thanks, Microsoft Corporation (Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)
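For readers who want to see what consuming the inline form of a 1.2 <bin-unit> might involve, here is a minimal Python sketch. It is not part of the proposal or of any tool mentioned in this thread. The function name `extract_internal_file` and the `.bin` fallback are assumptions made for illustration, and the snippet is parsed without an XLIFF namespace to match the fragment as posted. It also touches on Fredrik's point about extensions: the standard library's mime-type table is necessarily incomplete, so the lookup can come back empty.

```python
import base64
import mimetypes
import xml.etree.ElementTree as ET

# The inline <bin-unit> from the post, reduced to its source part.
BIN_UNIT = """<bin-unit id="fab82e10-02f0-4325-8390-bb10ec086bcc" mime-type="text/plain">
  <bin-source>
    <internal-file form="base64"><![CDATA[VGhpcyBpcyBhIHRlc3Qu]]></internal-file>
  </bin-source>
</bin-unit>"""


def extract_internal_file(bin_unit_xml):
    """Decode the base64 payload of a bin-unit's source (hypothetical helper).

    Returns (payload_bytes, mime_type, guessed_extension).
    """
    unit = ET.fromstring(bin_unit_xml)
    internal = unit.find("./bin-source/internal-file")
    if internal is None or internal.get("form") != "base64":
        raise ValueError("expected a base64 <internal-file> under <bin-source>")

    payload = base64.b64decode(internal.text.strip())
    mime_type = unit.get("mime-type")

    # No mime-type table is complete, so fall back to a generic
    # extension (an assumption of this sketch) when the lookup fails.
    extension = mimetypes.guess_extension(mime_type or "") or ".bin"
    return payload, mime_type, extension


payload, mime_type, extension = extract_internal_file(BIN_UNIT)
print(payload.decode("utf-8"), mime_type, extension)
```

An original filename attribute, as Fredrik suggests, would remove the need for the `mimetypes` guess altogether.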


  • 7.  RE: 2.0 Binary Data Module Proposal

    Posted 12-15-2012 22:24
    Hello All,   Since there’s been a lot of discussion on the list about the use case for binary data in XLIFF, I’m resurrecting this thread. We feel it important for a localization interchange standard to handle localizable data extracted from all types of content: documentation, web, UI, etc. Interchange of localizable binary data is essential for UI localization. A source dialog may not only need to have its text localized, but potentially the size, positioning, and directionality of the control that contains it. Please take a look at the mail below for an earlier response to Fredrik regarding the Binary Module for additional details.   I have also attached our draft proposal for the Binary Module to this mail for your further review.   Thanks, Ryan   From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King Sent: Monday, December 3, 2012 12:55 PM To: Estreen, Fredrik; xliff@lists.oasis-open.org Subject: [xliff] RE: 2.0 Binary Data Module Proposal   Thanks Fredrik for your suggestions on a binary module. Since Microsoft is both a content provider and a tool implementer dealing in huge amounts of various types of data, this module is very important to our business model. SharePoint’s implementation of supporting file-level binaries only scratches the surface of how it would be implemented. We want to take it to the next level in 2.0 so that we can provide all possible content in XLIFF to our suppliers and provide tools (and allow suppliers to provide tools) that will properly consume binary data, which a good portion of our content contains. Take the following source and target dialogs for example (also attached):     We need to carry all of the information needed to recreate the localized dialog, not just textual data. 
You’ll see here that not only have two strings been localized, but the dialog size and two controls contained in the dialog have also been localized (resized in this case): the label for “Please select a configuration…” and the drop-down box associated with it. Additionally, we might want to carry around a screenshot as reference for the translator. So, here is an example of how that XLIFF might look with a binary module:   [ryanki] Original example removed. Please see the attached draft proposal for example implementation.   One thing in particular to note: Because dialog controls are often hierarchical, representing them as such in XLIFF would be important, so the example shows a <group> containing both <unit> and <group> elements as siblings. Top-level group <group id="158"> contains a unit <unit id="158">, which contains the dialog title and binary data, but <group id="158"> also contains other groups, which contain the dialog control text and binary data. We’re not 100% sure, but we believe it would be a change to the spec to allow both a <unit> and a <group> to be defined as siblings under another group.   Thanks, Ryan

    Attachment(s)

    pdf
    Binary Module.pdf   322 KB 1 version


  • 8.  RE: [xliff] RE: 2.0 Binary Data Module Proposal

    Posted 12-16-2012 21:01
    Hi Ryan, all, Here are some comments about the proposed binary module: === First, a few notes on 1.2 bin-unit, to place this module into perspective: The bin-unit element has been in XLIFF from the first version up through 1.2. It was meant to allow tools to work with objects such as images, icons, cursors, etc. which can be embedded into resource files (remember that the earlier versions of XLIFF were more resource-oriented than 2.0). From my experience it has been used very rarely so far. The idea was that some object could be edited by a dedicated editor (e.g. a bitmap editor), but the text to translate was treated by normal means. After translation a tool could open the binary data and change the binary object to put the translated text there. This, in my opinion, is a bit different from the intended usage of the binary module. === Now commenting on the proposal itself: Like Fredrik, I'm uncomfortable with the proposed binary module. First, I see in the example that the binary element seems to be used for two very distinct purposes: a) providing a screenshot of the dialog box and b) providing the binary data of the controls so they can be modified. I don't see a problem with illustrative screenshots: it's good to have them. But that should be a different module related to reference material, etc. One can't really have the same processing requirements for both types of binaries. Also, a tool might want to support reference screenshots but not 'binary source/target'. For the aspect of carrying the data/code that needs to be modified: It seems to me that the binary element is used here as a way to transport proprietary un-extracted data/code in a big bag that only tools understanding the format could use. It's not necessarily wrong, but I wonder if that belongs to interoperability. I'm not sure how much different the proposed binary feature is compared to a <file>: you have some kind of original format and extracted text that needs to be modified. 
And the original format also needs to be modified in addition to the text. Aside from possibly carrying a target version of the original format, and all things being relative, this proposal does pretty much the same thing as a <file> with skeleton. By carrying the other data needing modification in a binary object, we reduce their accessibility and the potential to do things like leveraging of coordinates in an interoperable way. I do understand the need for extractors to XLIFF and tools editing XLIFF to have a way to carry such non-textual information. I'm simply wondering how long it will take for a binary module like this to conflict with other modules that will want to extend the non-textual information in a more interoperable way (for example a module for the 1.2 coord, font, etc.). === Content of <group>: The current text for <group> says: "One or more <unit> or <group> elements in any order followed by..." And it seems the schema allows <unit> and <group> to be siblings inside a <group>. Tom can confirm or correct this. cheers, -yves
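The sibling arrangement discussed above, a <group> holding both a <unit> and nested <group> elements, can be illustrated with a small Python sketch. This is only an illustration under stated assumptions: the fragment is a simplified, namespace-free approximation of the draft 2.0 structure, not text from the specification or from Ryan's attachment. Only id 158 comes from the thread; id 159, the source strings, and the helper name `outline` are hypothetical.

```python
import xml.etree.ElementTree as ET

# Simplified fragment: a dialog-level group whose children are a unit
# (the dialog title) and a nested group (one control). Hypothetical content.
FRAGMENT = """<group id="158">
  <unit id="158"><segment><source>Dialog title</source></segment></unit>
  <group id="159">
    <unit id="159"><segment><source>Label text</source></segment></unit>
  </group>
</group>"""


def outline(element, depth=0):
    """Return (depth, tag, id) rows for a group and its unit/group descendants."""
    rows = [(depth, element.tag, element.get("id"))]
    for child in element:
        # Units and groups may appear as siblings, in any order.
        if child.tag in ("unit", "group"):
            rows.extend(outline(child, depth + 1))
    return rows


for depth, tag, ident in outline(ET.fromstring(FRAGMENT)):
    print("  " * depth + f"<{tag} id={ident!r}>")
```

A tool that walks the tree this way does not need units and groups to be segregated, which is consistent with the "in any order" wording quoted above.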


  • 9.  RE: [xliff] RE: 2.0 Binary Data Module Proposal

    Posted 12-18-2012 15:49
    Hello, I've received word that Ryan and Uwe are on holiday and will not attend today's meeting. I think it will be best to have a conversation on what to do next with the binary unit when they can attend. So rather than having the next-steps conversation today, I hope we can summarize the many voices we've heard and prepare points of interest so that we can have a very constructive session when all are in attendance. Here's a preliminary framework as I understand it. Perhaps we can flesh this out in order to best come to a conclusion:
    =================================
    Notes on binary unit threads
    - Ryan provided a very comprehensive specification of the proposed module (see the PDF he attached earlier)
    - Helena commented that the bin-unit expands the scope of XLIFF beyond textual, making it unwieldy (echoed later by Joachim: "Besides this operational aspect there is the theoretical aspect that XLIFF should only include properly marked up textual content, as this is part of its original promise.")
    - Fredrik/Joachim raised two concerns: 1) Improper or widespread use of the bin-unit feature by other implementers, with unrealistic processing expectations. 2) Failure by service staff to discover early in the process the presence of translatable, non-tagged text documents of varying format inside bin-units.
    - Yves said: 1) Bin-unit has been part of XLIFF from the beginning, though rarely used. Ryan's proposal changes its traditional use. 2) It seems to be used for two different purposes, and should focus on one or the other (not one module for two purposes). 3) Why not use <file> and <skeleton>?
    =================================
    Personally, I am very excited about the prospect Yves proposed in 3). After the meeting I will take a look at Ryan's specification and see if it could be made to work by adding a bit to <skeleton>, perhaps a bin-idref attribute that points to a traditional <trans-unit> for translatable text? Looking forward to a good conversation!
- Bryan