OASIS XML Localisation Interchange File Format (XLIFF) TC


RE: [xliff] RE: How to translate text within G tags?


    Posted 03-07-2006 14:12


    Subject: RE: [xliff] RE: How to translate text within G tags?



    Fredrik,

    Thank you for your insights. I’m using XSLT to extract & merge XLIFF with a skeleton. In fact, for convenience I use XLIFF 1.1 for the skeleton too. The original content is XHTML and XML. The <g> tag seems natural for inline tags and assures that the merged content will be well-formed. I would be concerned that <bpt> and <ept> might not match correctly. For example,

    ORIGINAL SOURCE

    Italic texts starts <i><b>in the middle of

       first sentence</b>. Italics ends after the second sentence.</i>

     

    XLIFF SOURCE

    <source>Italic texts starts <bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>in the middle of

       first sentence<ept id='b1' ctype='x-html-b'/>. Italics ends after the second sentence.<ept id='i1' ctype='x-html-i'/></source>

    XLIFF TARGET

       <target>Italic texts starts <bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>in the middle of

       first sentence<ept id='i1' ctype='x-html-i'/><ept id='b1' ctype='x-html-b'/>. <bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>Italics ends after the second sentence.<ept id='i1' ctype='x-html-i'/></target>

    MERGED TRANSLATION

    Italic texts starts <i><b>in the middle of

       first sentence</i></b>. <i><b>Italics ends after the second sentence.</i>

    Notice that <i> and <b> overlap and that a closing <b> is missing even though the contents of the <target> tag are well-formed.
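The mismatch can be checked mechanically. The following is a minimal Python sketch, not the XSLT merge described in this message: it takes the target from the example (with every begin tag spelled bpt), substitutes an HTML tag for each placeholder, and tests both strings for well-formedness. The ctype-to-HTML mapping, the wrapping <p> element, and the function names are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the ctype values in the example to HTML tag names.
CTYPE_TO_HTML = {"x-html-i": "i", "x-html-b": "b"}

TARGET = (
    "<target>Italic texts starts <bpt id='i1' ctype='x-html-i'/>"
    "<bpt id='b1' ctype='x-html-b'/>in the middle of first sentence"
    "<ept id='i1' ctype='x-html-i'/><ept id='b1' ctype='x-html-b'/>. "
    "<bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>"
    "Italics ends after the second sentence."
    "<ept id='i1' ctype='x-html-i'/></target>"
)


def merge(target_xml):
    """Replace each bpt/ept placeholder with the HTML tag its ctype names."""
    root = ET.fromstring(target_xml)  # parses: the XLIFF itself is well-formed
    parts = [root.text or ""]
    for child in root:
        name = CTYPE_TO_HTML[child.get("ctype")]
        parts.append(f"<{name}>" if child.tag == "bpt" else f"</{name}>")
        parts.append(child.tail or "")
    # The <p> wrapper is only there to give the merged text a single root.
    return "<p>" + "".join(parts) + "</p>"


def is_well_formed(xml_text):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


merged = merge(TARGET)
print(is_well_formed(TARGET))  # the <target> element
print(is_well_formed(merged))  # the merged HTML
```

Because bpt and ept are independent empty elements as far as the XML parser is concerned, nothing in the XLIFF file forces them to pair up; the problem only surfaces after the merge.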

    I share some of the same feelings about the ‘id’ attribute in <trans-unit>. I was using it to reference back to the skeleton and it was not unique. Then I noticed that the ‘xid’ attribute is used to reference <trans-unit> or <bin-unit> elements, which I thought would be tenuous because of the lack of uniqueness. In conversations with Rodolfo, it came to light that the ‘id’ attribute of TU and BU must, in fact, be unique within the <file> element. It’s implied in the spec, but not explicit. Realizing that, I couldn’t use it to reference the skeleton, so I’m left with assigning an arbitrary ‘id’ value because it’s required and either using a custom attribute (XLIFF 1.1+) or overloading ‘resname’ in XLIFF 1.0.

    Perhaps we should change the spec to not require the ‘id’ attribute, similar to <group>. In XSLT, I’m left with using generate-id(), which produces unseemly values like ‘ID3F2D83’ in most XML processors or using an extension to run JavaScript to assign a sequence number, for example, 1, 2, 3, etc.
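For comparison, outside XSLT the sequence numbering is only a few lines. This is a hedged Python sketch, not part of the XSLT pipeline described here: it assumes an XLIFF document without namespace declarations, and the sample file content and the function name number_units are made up for illustration. It restarts the count in each <file>, so the values are unique within that element.

```python
import xml.etree.ElementTree as ET

# Made-up sample: three units that all carry the same non-unique id.
XLIFF = """<xliff version='1.1'>
<file original='page.html' source-language='en' datatype='html'><body>
<trans-unit id='x'><source>One</source></trans-unit>
<trans-unit id='x'><source>Two</source></trans-unit>
<group><trans-unit id='x'><source>Three</source></trans-unit></group>
</body></file></xliff>"""


def number_units(xliff_xml):
    """Assign 1, 2, 3, ... to every trans-unit/bin-unit, per <file>."""
    root = ET.fromstring(xliff_xml)
    for file_el in root.iter("file"):
        # iter() walks in document order, so nested groups are covered too.
        units = [el for el in file_el.iter()
                 if el.tag in ("trans-unit", "bin-unit")]
        for n, unit in enumerate(units, start=1):
            unit.set("id", str(n))
    return root


root = number_units(XLIFF)
ids = [unit.get("id") for unit in root.iter("trans-unit")]
print(ids)
```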

    I’m using ‘id’ in <g> to reference the skeleton. I’m concerned that segmentation will cause problems with referencing the skeleton. To illustrate, please consider the example from above.

    ORIGINAL SOURCE

    Italic texts starts <i><b>in the middle of

       first sentence</b>. Italics ends after the second sentence.</i>

     

    XLIFF SOURCE

    <source>Italic texts starts <g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>in the middle of

       first sentence</g>. Italics ends after the second sentence.</g></source>

    where ‘1’ references <i> and ‘2’ references <b>.

    XLIFF TARGET SEGMENTED

       <target>Italic texts starts <g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>in the middle of first sentence</g></g>. <g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>Italics ends after the second sentence.</g></g> </target>

    Note that the tags are duplicated so there are two <g> tags with id=’1’ and two with id=’2’. There are two <g> tags that map to one <i> in the skeleton and two to one <b>. This scenario precludes merging the target text with the skeleton for inline tags that have been duplicated as a result of segmentation or reordering. Perhaps the target text should not be merged with the skeleton, but simply reconstructed. This would be a blending of the minimal (with skeleton) and maximal (no skeleton for inline tags) approach.
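To make the reconstruction idea concrete, here is a minimal Python sketch that rebuilds the inline markup of the segmented target from the ctype of each <g> alone, without consulting the skeleton. The ctype-to-HTML mapping, the wrapping <p> element, and the function name rebuild are assumptions for illustration; a real implementation would also need to restore any attributes the original inline tags carried.

```python
import xml.etree.ElementTree as ET

# Assumed mapping from the ctype values in the example to HTML tag names.
CTYPE_TO_HTML = {"x-html-i": "i", "x-html-b": "b"}

TARGET = (
    "<target>Italic texts starts <g id='1' ctype='x-html-i'>"
    "<g id='2' ctype='x-html-b'>in the middle of first sentence</g></g>. "
    "<g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>"
    "Italics ends after the second sentence.</g></g> </target>"
)


def rebuild(node):
    """Copy the tree, turning each <g> into the HTML element its ctype names."""
    if node.tag == "g":
        out = ET.Element(CTYPE_TO_HTML[node.get("ctype")])
    else:
        out = ET.Element("p")  # assumed container for the segment
    out.text = node.text
    for child in node:
        new_child = rebuild(child)
        new_child.tail = child.tail
        out.append(new_child)
    return out


result = ET.tostring(rebuild(ET.fromstring(TARGET)), encoding="unicode")
print(result)
```

Duplicated id values do no harm here because the id is never looked up; the nesting of the <g> elements alone determines the output, so the result is well-formed by construction.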

    The conversion to TMX seems worth considering too.

    Fortunately, none of these issues seem insurmountable. It’s mostly a matter of clearing up ambiguities as we resolve interoperability issues and establish best practices.

    Regards,

     

    Doug Domeny

    Software Analyst

     

    Ektron, Inc.

    +1 603 594-0249 x212

    http://www.ektron.com

     


    From: Corneliusson, Fredrik [mailto:Fredrik.Corneliusson@lionbridge.com]
    Sent: Tuesday, March 07, 2006 5:15 AM
    To: bryan.s.schnabel@exgate.tek.com; ddomeny@ektron.com; rodolfo@heartsome.net
    Cc: xliff@lists.oasis-open.org
    Subject: RE: [xliff] RE: How to translate text within G tags?

    Hello,
    I just joined and this is my first post!


    My XLIFF experience is mostly as an XLIFF editor/filter programmer (Transolution).

    I must say that from my point of view I much prefer the <bpt>/<ept> way of wrapping inline tags, and if the editor has tag checking it's easy to check that they are valid.

    I had the same problem as Rodolfo deciphering the use of the <g> tag from the spec, and until I read the "XLIFF 1.2 Representation Guide for HTML" I was hoping I would never have to deal with <g> tags containing translatable content. XLIFF is quite a lot to digest, and the <g> tag really doubles the effort because it breaks the simple logic that can be used on a flat structure for translatable content. Also, at some point you will need to convert XLIFF to TMX, and then you need to convert to <bpt>/<ept> anyway. Using <ph>/<bpt>/<ept> gives you a very generic and straightforward approach: you preserve the original source format information exactly as it is, and you can treat all formats the same.
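As a rough illustration of that conversion step, the Python sketch below flattens nested <g> elements into a TMX-style sequence of <bpt>/<ept> pairs linked by the "i" attribute. It is a sketch under stated assumptions, not any tool's actual behaviour: the native codes are guessed from ctype through an assumed mapping, since a <g> element does not carry the original code itself, and the function name flatten is hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Assumed mapping from ctype to the native HTML tag name.
CTYPE_TO_HTML = {"x-html-i": "i", "x-html-b": "b"}

SOURCE = (
    "<source>Italic texts starts <g id='1' ctype='x-html-i'>"
    "<g id='2' ctype='x-html-b'>in the middle of first sentence</g>. "
    "Italics ends after the second sentence.</g></source>"
)


def flatten(node, counter=None):
    """Serialise node content, replacing each <g> with a bpt/ept pair."""
    if counter is None:
        counter = [0]  # shared mutable counter for the pairing index
    parts = [escape(node.text or "")]
    for child in node:
        if child.tag == "g":
            counter[0] += 1
            i = counter[0]
            name = CTYPE_TO_HTML[child.get("ctype")]
            parts.append(f'<bpt i="{i}">&lt;{name}&gt;</bpt>')
            parts.append(flatten(child, counter))
            parts.append(f'<ept i="{i}">&lt;/{name}&gt;</ept>')
        parts.append(escape(child.tail or ""))
    return "".join(parts)


seg = "<seg>" + flatten(ET.fromstring(SOURCE)) + "</seg>"
print(seg)
```

Going in this direction is lossless with respect to nesting, because properly nested <g> elements always yield properly paired codes; the reverse conversion is where overlapping pairs can appear.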
     
    That said, I can see why people like the <g> approach. It's easier to wrap in existing translation tools and process with XSLT, it also looks nicer in a text editor, and I suppose it lessens the need for skeleton files.
     
    I have an implementation question regarding the <g> tag. In the XLIFF documentation, the specification of the <g> tag's "id" attribute differs from that of <ph>/<bpt>/<ept>:
    ph-tag:
    The required id attribute is used to identify the <ph> inline code
    g-tag:
    The required id attribute is used to reference the replaced code in the skeleton file.

     

    Does this mean that there can be <g> and <ph> tags with the same id in a segment? And what if there is no skeleton file?
     
    This brings me to a general complaint about the XLIFF spec: it is very vague and leaves a lot of room for personal taste and/or misunderstandings. This makes it hard to create a generic editor that works with XLIFF files in the wild.
    For example, TUs have a required ID attribute, but it can be anything and does not even have to be unique, so why is it required in the first place?

     

    Cheers,

    Fredrik Corneliusson

     


    From: bryan.s.schnabel@exgate.tek.com [mailto:bryan.s.schnabel@exgate.tek.com]
    Sent: den 7 mars 2006 01:26
    To: ddomeny@ektron.com; rodolfo@heartsome.net
    Cc: xliff@lists.oasis-open.org
    Subject: RE: [xliff] RE: How to translate text within G tags?

    Hi Doug,

     

    I thought about this when I wrote that portion of the HTML profile.

     

    From a philosophical view, I strongly think bpt/ept should only be used in XLIFF files that are derived from non-markup formats (RTF, for example).

     

    I really don't like the idea of using bpt/ept on XLIFF files derived from HTML, XHTML, or XML files.  I see "begin paired tag" and "end paired tag" as an artificial device.  It could easily lead to malformed XML on the conversion from XLIFF back to HTML.

     

    Assuming the source file is well formed, it would be a shame to have to delimit inline elements in an artificial way.  If <g> tags are defined in the spec in such a way that they are thought to be for non-translatable text, I would vote to either update the specification, or come up with a new element for identifying translatable inline elements in <target> elements.

     

    Thanks to Doug and Rodolfo for bringing this issue to light,