OASIS XML Localisation Interchange File Format (XLIFF) TC

RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

    Posted 01-23-2003 18:51

    Excellent proposal Doug!
    It doesn't quite solve modifications
    to data in absence of text but definitely provides a suitable compromise
    for 1.1.

    Mark Levins


    IBM Software Group,
    Dublin Software Laboratory,
    Airways Industrial Estate,
    Cloghran,
    Dublin 17,
    Ireland.
    Phone: +353 1 704 6676
    IBM Tie Line 166676






    Doug Domeny <ddomeny@ektron.com>
    23/01/2003 18:05
    Please respond to: ddomeny@ektron.com
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc

    Thank you for the summary, Tony. I agree with the
    options, but I have a few comments about compatibility and the need
    to retool. And I actually have another option too.
     
    I refer to the guideline for minor releases
    (http://lists.oasis-open.org/archives/xliff/200208/msg00005.html):

    "Shall be comprised of small changes that would not require
    re-qualification of supporting tools or technologies"

     
    There are several aspects of compatibility to consider:

    1. An XLIFF 1.0 document validates against the XLIFF 1.1
    schema. Given the flexibility of schemas, it would almost always
    be possible to create a schema that allows both 1.0 and 1.1 structures.

    2. An XLIFF 1.1 tool can process either XLIFF 1.0 or
    1.1 documents without requiring extensive effort to handle
    XLIFF 1.0 documents.

    3. An XLIFF 1.0 tool can process either XLIFF 1.0 or
    1.1 documents without modification (assuming a reasonably careful
    implementation).
     
    Aspects #1 and #2 deal with backward compatibility (from
    the tool's perspective). That is, new tools and new schemas handle
    old data. The issue is not one of possibility, but of practicality.
    Is it easy to create the tools?
     
    Aspect #3 is forward compatibility (from the tool's
    perspective). That is, can the old tool handle the new data? This
    is similar to asking whether MS Word 97 can read an MS Word 2000 document
    (allowing for some loss). Another example is whether an old browser,
    say IE 3, can render a new HTML document, say XHTML 1.0, again allowing
    for some loss for unknown tags. The primary rule for forward compatibility
    in a browser is, "render the contents of an unknown tag". This
    aspect of forward compatibility is crucial to meeting the guideline
    for not re-qualifying supporting tools.
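
The browser rule quoted above can be illustrated with a minimal sketch. It is not taken from any real browser: the names KNOWN_TAGS and render, the sample tags, and the asterisk "formatting" are all assumptions made for the example. The point is only that an unknown tag loses its formatting while its contents survive.

```python
import xml.etree.ElementTree as ET

# Hypothetical set of tags this "old" renderer understands.
KNOWN_TAGS = {"b"}

def render(elem):
    """Return the text of elem, keeping the contents of unknown tags."""
    parts = [elem.text or ""]
    for child in elem:
        inner = render(child)
        if child.tag in KNOWN_TAGS:
            inner = f"*{inner}*"  # known tag: apply (stand-in) formatting
        # Unknown tag: the formatting is lost, but the contents are kept.
        parts.append(inner)
        parts.append(child.tail or "")
    return "".join(parts)

doc = ET.fromstring("<p>Hello <b>bold</b> and <blink>new</blink> world</p>")
print(render(doc))  # Hello *bold* and new world
```

A tool that must modify the contents, rather than only render them, cannot rely on this rule alone, because it has to write the unknown tags back correctly.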
     
    XLIFF tools, however, are not as simple as browsers.
    An XLIFF tool must be able to modify the contents, not just render
    them. Because the contents must be modified, the XLIFF tool requires
    more knowledge of the tags. This is why adding extension points
    (non-XLIFF tags) to content within <source> and <target> has
    been deferred.

    Here are some comments regarding each option listed
    below as they pertain to "re-qualification of supporting tools
    or technologies".
     
    Option 1 (siblings)

    I believe this is forward compatible, provided the
    tool doesn't assume that <target> immediately follows <source>.

    The other concern is how <target-info>
    appears in <alt-trans>, where multiple <target> elements
    are allowed.
     
    I took another look at the XLIFF 1.0 DTD. Here are
    the <trans-unit> and <alt-trans> definitions:

    <!ELEMENT trans-unit     (source,target?,(count-group|note|context-group|prop-group|alt-trans)*) >
    <!ELEMENT alt-trans      (source?,target+,(note|context-group|prop-group)*) >

    The new DTD would be:

    <!ELEMENT trans-unit     (source, source-info?, target?, target-info?, (count-group|note|context-group|prop-group|alt-trans)*) >
    <!ELEMENT alt-trans      (source?, source-info?, (target, target-info?)+, (note|context-group|prop-group)*) >
     
    I think we all have some reservations about this
    approach because it is awkward to have two source elements and, worse yet,
    difficult to match a given <target-info> element with its corresponding
    <target> element.
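
To make the matching concern concrete, here is a minimal sketch of how a tool might pair each <target> with the <target-info> that directly follows it under the proposed sibling content model. The function name pair_targets and the font attribute on <target-info> are assumptions made for the example; the proposal does not define them.

```python
import xml.etree.ElementTree as ET

def pair_targets(parent):
    """Pair each <target> with the <target-info> directly after it, if any."""
    pairs = []
    children = list(parent)
    for i, child in enumerate(children):
        if child.tag != "target":
            continue
        nxt = children[i + 1] if i + 1 < len(children) else None
        info = nxt if nxt is not None and nxt.tag == "target-info" else None
        pairs.append((child, info))
    return pairs

# Hypothetical <alt-trans> with two targets, only the first having info.
alt = ET.fromstring(
    "<alt-trans>"
    "<target>A</target><target-info font='Arial'/>"
    "<target>B</target>"
    "</alt-trans>"
)
for target, info in pair_targets(alt):
    print(target.text, None if info is None else info.attrib)
```

The pairing depends entirely on document order, which is what makes the sibling layout fragile: nothing in the markup itself ties a <target-info> to its <target>.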

     
    Option 2 (restructure)

    We all agree this is a clean structure but not compatible.
     
    Option 3 (embedded)

    Allow me to give a different example using a <font>
    tag and a placeholder tag.

    <trans-unit id="Option 1" translate="yes">
       <source><font face="Arial" size="2"></font><ph/>Source Text</source>
       <target><font face="Arial" size="3"></font><ph/>Translated Text</target>
    </trans-unit>
     
    The inclusion of extension points for <source>
    and <target> is deferred because they introduce unknown tags
    into text that is processed by a TM tool. This option introduces
    unknown tags to the text content. This option isn't fully compatible
    because the TM tool will need to ignore <font> and other unknown
    tags. Granted, the unknown tags should come before the rest of the text
    to be translated, but I still do not believe it is forward compatible.

    Besides, correctly parsing this structure is almost impossible.
    How does the tool know which tag is the last format tag and which
    is the first inline "placeholder" tag? Adding more "placeholder"
    tags to the specification would be impossible because the tool would
    have to assume any unknown tag is a format tag. This does not appear
    to be a viable option.
     
    Option 4 (combined)

    This really isn't technically different from Option
    2 other than to say that the XLIFF 1.1 schema and XLIFF 1.1 tools
    must support the old XLIFF 1.0 structure as well as the new structure.
    I do believe the effort is minimal to have the <source-info>
    and <target-info> tags be optional. However, if they are present, they
    will likely break existing XLIFF 1.0 tools that look for the <source>
    as an immediate child of <trans-unit>. For instance, my existing XSL
    transforms would need to be updated to support XLIFF 1.1 documents.
    Therefore, this option isn't fully compatible with 1.0 even though
    it is backward compatible.
     
     
     
    With all this said, I went back to determine the
    original purpose for proposing elements for reformatting. The issue
    concerns being able to specify which format values may be modified
    during translation. In XLIFF 1.0, as you know, there are several
    attributes to specify formatting for the text, namely coord,
    font, css-style, style, and exstyle. The 'reformat' attribute of
    <trans-unit> is either "yes" or "no", indicating
    whether all or none of the format attribute values can be changed.
    The changed value is stored in the <target> tag.

    The problem is that 'reformat' does not give sufficient
    control to be able to say that some formats may be changed, but others
    cannot. For example, one may want to allow changes to coord-cx, but not
    to coord-x or coord-y. The original proposal was to turn each format
    attribute into an element, and each element would have its own 'reformat'
    attribute. This approach is fine except for the compatibility problems
    that have been discussed at length.
     
    Here's the new option.

    Extend the possible values for the 'reformat' attribute
    to provide sufficient control. XLIFF 1.0 presently uses ";"-delimited
    lists within attribute values to store multiple values. The 'coord'
    attribute is an example. Its value is actually four values: "x;y;cx;cy",
    where "#" can be used for 'don't care'.
     
    So let's extend 'reformat' the same way. Of course,
    we keep "yes" and "no" for compatibility.

    "yes" = all format attributes may be changed
    "no"  = no format attributes may be changed
    ...or a semicolon-delimited list of the following
    in any order. If an attribute is listed, it means it may be reformatted.
    coord = all 4 coords
      coord-x
      coord-y
      coord-cx
      coord-cy
    font = all 3 font values
      font-name
      font-size
      font-weight
    css-style
    style
    exstyle
     
    Example:

    <trans-unit coord="#;#;183;272" font="Arial;2;normal" reformat="coord-cx;font-name" ...>
        <source>...</source>
        <target coord="#;#;181;272" font="System;2;normal">...</target>
        <alt-trans coord="#;#;183;272" font="Arial;2;normal">
            <target coord="#;#;180;272" font="Arial Bold;2;normal">...</target>
            <target coord="#;#;185;272" font="Arial, Helvetica;2;normal">...</target>
        </alt-trans>
    </trans-unit>
     
    Parsing the reformat list is fairly easy, even with XSLT,
    which has a limited set of string functions.
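
As a rough illustration of how little parsing is involved, here is a sketch in Python rather than XSLT. The function names (parse_reformat, changed_fields), the choice to reject unknown tokens, and the mapping of the three font positions to font-name, font-size, and font-weight are assumptions for the example, not part of the proposal.

```python
# Sub-values covered by each group name, in attribute order.
GROUPS = {
    "coord": ("coord-x", "coord-y", "coord-cx", "coord-cy"),
    "font": ("font-name", "font-size", "font-weight"),
}
SIMPLE = ("css-style", "style", "exstyle")
ALL_FIELDS = frozenset(GROUPS["coord"] + GROUPS["font"] + SIMPLE)

def parse_reformat(value="yes"):
    """Return the set of format fields that may be changed."""
    value = value.strip()
    if value == "yes":
        return set(ALL_FIELDS)
    if value == "no":
        return set()
    allowed = set()
    for token in value.split(";"):
        token = token.strip()
        if token in GROUPS:
            allowed.update(GROUPS[token])  # "coord" or "font" expands
        elif token in ALL_FIELDS:
            allowed.add(token)
        else:
            # Assumption: this sketch is strict about unknown tokens.
            raise ValueError(f"unknown reformat token: {token!r}")
    return allowed

def changed_fields(attr, old, new):
    """Names of the sub-values that differ between two ';'-delimited values."""
    names = GROUPS[attr]
    return {n for n, a, b in zip(names, old.split(";"), new.split(";")) if a != b}

# Values taken from the <trans-unit> and its <target> in the example.
allowed = parse_reformat("coord-cx;font-name")
changed = (changed_fields("coord", "#;#;183;272", "#;#;181;272")
           | changed_fields("font", "Arial;2;normal", "System;2;normal"))
print(sorted(changed))     # ['coord-cx', 'font-name']
print(changed <= allowed)  # True: only permitted fields were changed
```

The same set comparison could be used by a tool to reject a <target> that changes a field not listed in 'reformat'.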
     

    This option is 100% compatible, both forward and
    backward. It does not affect the structure at all. The only problem
    I can foresee an XLIFF 1.0 tool having is if an unrecognized value for
    reformat is assumed to be "yes" instead of "no", which allows
    some values to be changed that should not be. That is, an XLIFF 1.0
    tool could interpret a value of "coord-cx;font-name" as
    "yes" and allow all of the format values to change. Of
    course, if it assumed "no" instead of "yes", it would
    not allow any changes. Since the default value for 'reformat' is
    "yes", I don't see either of the possibilities as being
    too harmful.

    Regards,

    Doug Domeny

    Ektron, Inc.
    +1 603 594-0249
    http://www.ektron.com