OASIS XML Localisation Interchange File Format (XLIFF) TC

RE: [xliff] comments on dtd


    Posted 01-24-2002 00:58
    Hi All,
    
    A review of the spec may clear up some of the points better than this
    discussion. I understand it wasn't easily available before our meeting
    and some may not have had a chance to review it. I hope that those who
    haven't already done so will read it. It is now posted on our TC
    website at OASIS.
    
    I would like to elaborate a little on Yves's answers, as follows.
    
    >>> Yves Savourel <ysavourel@translate.com> 1/23/02 4:56:16 PM >>>
    Thanks for posting those comments David.
    
    I'll try to answer a few of them. Since we haven't worked together yet,
    we may not use some terms the same way: if I'm not clear, please let me
    know and I'll try to re-formulate.
    
    
    > 1. document validators - we should have support for W3C Schema,
    > Schematron and RELAX NG, as well as DTD.
    
    I agree that we should have different ways to specify XLIFF so that
    people using different tools can have easy access to it. We can
    probably generate some of those schemas (or at least a base to work
    from) from the DTD using converters, as Christian showed me yesterday.
    I guess we should open the discussion on what schemas to use besides
    the DTD.
    
    <jr>This isn't a weakness in the spec; the spec simply describes the
    dictionary. The DTD and schema are artifacts of the spec. It is in the
    charter to create a schema; the schema type is unspecified. We could 
    become bogged down in a discussion of schemas when the work at hand is
    to approve or improve the current spec; with the multiplicity of schemas
    we could spend months on this topic alone. It is a good sub-committee
    discussion.</jr>
    
    -----
    > 2. Does not have entities for EXTRACT and MERGE.
    -----
    
    I'm not sure I understand the note. Could you explain what you mean by
    'EXTRACT' and 'MERGE'? Maybe the following description of how XLIFF
    handles extraction and merging will help:
    
    An XLIFF document initially stores the result of an extraction. The
    original input is split into two main streams: the localizable data is
    in the content of <source> and in various attributes (coord, etc.).
    Some original code can also be encapsulated within <source> using the
    inline elements <bpt>, <ept>, <it> and <ph>. The rest of the
    non-localizable data is stored in the "skeleton". The skeleton is a
    separate file that can be either referenced from the XLIFF document
    (using the <skl> element with an <external-file> element), or embedded
    in an <internal-file> element (still in the <skl> element).

    The translated file is reconstructed (merged) from the skeleton
    (wherever it is located) and the content of the <target> elements
    (which have been added during the localization process).
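    To make the merge step concrete, here is a minimal Python sketch under
    stated assumptions: the skeleton is embedded as plain text in
    <internal-file> (placed in <header>, as in later XLIFF 1.x versions),
    and it marks each insertion point with a hypothetical {{id}}
    placeholder matching the trans-unit id. XLIFF does not define any
    placeholder syntax for skeletons; that part is entirely tool-specific.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative XLIFF 1.x-style document (no namespace, DTD era).
# The {{id}} placeholders in the skeleton are a HYPOTHETICAL convention.
XLIFF_DOC = """<xliff version="1.0">
  <file original="hello.txt" source-language="en" target-language="fr" datatype="plaintext">
    <header>
      <skl>
        <internal-file>greeting={{1}}&#10;farewell={{2}}&#10;</internal-file>
      </skl>
    </header>
    <body>
      <trans-unit id="1"><source>Hello</source><target>Bonjour</target></trans-unit>
      <trans-unit id="2"><source>Goodbye</source><target>Au revoir</target></trans-unit>
    </body>
  </file>
</xliff>"""

def merge(xliff_text: str) -> str:
    """Rebuild the translated file from the embedded skeleton and <target> texts."""
    root = ET.fromstring(xliff_text)
    file_el = root.find("file")
    skeleton = file_el.find("header/skl/internal-file").text
    # Map trans-unit id -> target text, falling back to the source if untranslated.
    texts = {}
    for tu in file_el.iter("trans-unit"):
        target = tu.find("target")
        chosen = target if target is not None else tu.find("source")
        texts[tu.get("id")] = "".join(chosen.itertext())
    # Substitute each hypothetical {{id}} placeholder with the matching text.
    return re.sub(r"\{\{([^}]+)\}\}", lambda m: texts[m.group(1)], skeleton)

print(merge(XLIFF_DOC))
```

    A real merger would also have to restore the original codes carried by
    <bpt>, <ept>, <it> and <ph>; this sketch only handles plain text.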
    
    <jr>Specific extract and merge entities/elements have purposely been
    left undefined. The method of obtaining localizable data in the XLIFF
    file varies by publisher. Some use databases which contain the
    localizable data and some use skeleton files. Others may use yet
    another system. Because we don't want to impose a process on the
    publisher, we've tried to allow for any process that can produce valid
    XLIFF. This gives the publisher a great deal of flexibility.
    There are elements defined which are available to the publisher for
    these purposes. From the spec, "The <prop-group> element contains
    tool-specific information used in combining the data with the skeleton
    file or storing the data in a repository." The <prop-group> contains the
    <prop> element, which contains the actual tool-specific data. There is
    also the ts attribute of the following elements: <file>, <group>,
    <trans-unit>, <source>, <target>, <bin-unit>, <bin-source>,
    <bin-target>, <alt-trans>, <mrk>, <g>, <x/>, <bx/>, <ex/>, <bpt>, <ept>,
    <ph>, <it>. From the spec, "The ts attribute allows you to include short
    data understood by a specific toolset." In addition, the <context>
    element can also carry information of this nature.</jr>
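    As a hedged illustration of how a tool might read back its own data
    from ts and <prop-group>/<prop>, here is a short Python sketch. The
    values ("row=42", "db-table", "ui_strings") and the placement of
    <prop-group> inside <trans-unit> are assumptions made for the example;
    what a ts value means is up to the toolset that wrote it.

```python
import xml.etree.ElementTree as ET

# Illustrative document; ts and prop values are HYPOTHETICAL tool data.
DOC = """<xliff version="1.0">
  <file original="ui.txt" source-language="en" target-language="es" datatype="plaintext">
    <body>
      <trans-unit id="42" ts="row=42">
        <source>Save</source>
        <prop-group name="repository">
          <prop prop-type="db-table">ui_strings</prop>
        </prop-group>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def tool_data(xliff_text):
    """Return {trans-unit id: {"ts": ts value, "props": {prop-type: text}}}."""
    root = ET.fromstring(xliff_text)
    result = {}
    for tu in root.iter("trans-unit"):
        # Collect every <prop> under this unit, keyed by its prop-type.
        props = {p.get("prop-type"): (p.text or "") for p in tu.iter("prop")}
        result[tu.get("id")] = {"ts": tu.get("ts"), "props": props}
    return result

print(tool_data(DOC))
```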
    
    -----
    > 3. Does not have entities for character map used in saved file (from
    > translation).
    -----
    
    I see two different meanings here, so I'll re-phrase the comment two
    different ways to see which one (if any) is the right one:
    
    a) "XLIFF doesn't have a way to indicate what encoding has been used
    for the translated text."
    That's true: XLIFF uses any appropriate encoding as defined by the XML
    specs. The mechanism to indicate the encoding used in the translated
    XLIFF document is the standard XML encoding declaration.
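    A small Python sketch of that point, using only the standard library:
    an XLIFF fragment saved as ISO-8859-1 is decoded correctly because the
    parser reads the XML encoding declaration, and re-saving it as UTF-8
    only changes that declaration. The fragment itself is illustrative.

```python
import xml.etree.ElementTree as ET

# Illustrative XLIFF fragment serialized as ISO-8859-1 bytes; only the
# XML declaration says how the bytes are encoded.
latin1_bytes = ('<?xml version="1.0" encoding="ISO-8859-1"?>'
                '<xliff version="1.0"><file original="a.txt" source-language="en" '
                'target-language="fr" datatype="plaintext"><body>'
                '<trans-unit id="1"><source>Summer</source><target>Été</target>'
                '</trans-unit></body></file></xliff>').encode("iso-8859-1")

# The parser honours the declared encoding.
root = ET.fromstring(latin1_bytes)
print(root.find(".//target").text)

# Re-serialize the same document as UTF-8 with a matching declaration
# (xml_declaration requires Python 3.8 or later).
utf8_bytes = ET.tostring(root, encoding="UTF-8", xml_declaration=True)
print(utf8_bytes[:40])
```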
    
    b) "XLIFF doesn't have a way to indicate what encoding should be used
    for the translated text when merging the text into the original
    format."
    That's also true: the assumption (maybe incorrect) is that, knowing
    which type of format, which language and which platform the text is
    targeted for, the merger tool is responsible for using the appropriate
    encoding (possibly with the help of the end-user). This is consistent
    with how most current localization tools work. We may need to look at
    this more closely.
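    The responsibility described in b) can be sketched in Python as a
    lookup the merger performs itself. The format keys ("winres") and the
    codepage choices in MERGE_ENCODINGS are hypothetical; real tools apply
    their own rules and often ask the user.

```python
# HYPOTHETICAL merger-side table: (original format, target language) -> codec.
# Nothing like this is stored in the XLIFF document itself.
MERGE_ENCODINGS = {
    ("winres", "ja"): "cp932",   # Windows Japanese codepage
    ("winres", "fr"): "cp1252",  # Windows Western European codepage
}

def encode_for_target(text, original_format, target_language):
    """Pick a codec for the merged file and encode the translated text.

    Falls back to UTF-8 when no rule matches. Raises UnicodeEncodeError if
    the text cannot be represented in the chosen codepage.
    """
    codec = MERGE_ENCODINGS.get((original_format, target_language), "utf-8")
    return codec, text.encode(codec)

print(encode_for_target("Été", "winres", "fr"))
```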
    
    <jr>Do you mean XLIFF does not have a mechanism for encoding the target
    differently from the source? If so, that is true. The assumption is
    that the target and source will both use the same encoding, usually
    UTF-8. However, some mechanism for indicating a different encoding for
    the target may be useful.</jr>
    
    -----
    > 4. Target lang should be target+ in 'ELEMENT trans-unit', unless
    > that's not intended for the whole job. [Inquiry: what is 'ELEMENT
    > trans-unit' intended to handle?]
    -----
    
    The <trans-unit> element is the place where the source and one
    translation of a given localizable item are stored. An 'item' is not
    defined beyond being (most of the time) a run of translatable text. For
    example, it can be a string from a Windows RC stringtable group, the
    value of a key/value pair of a Java properties file, the content of a
    <p> element in HTML, the value of an alt attribute in HTML, etc.
    Actually, a <trans-unit> is allowed to have empty <source> and
    <target>. This is to handle cases where the localizable data is not
    text but other information: the coordinates of a control, for example,
    need to be represented in case some tools provide capabilities such as
    resizing. XLIFF does not explicitly address anything related to
    segmentation.
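    A hedged Python sketch of such a non-text unit: the first <trans-unit>
    below carries only coordinates, with an empty <source>. The ids are
    hypothetical, and the "x;y;cx;cy" coord layout is the one used by later
    XLIFF 1.x releases; this sketch only handles fully numeric values.

```python
import xml.etree.ElementTree as ET

# Illustrative document: one coordinate-only unit, one ordinary text unit.
DOC = """<xliff version="1.0">
  <file original="dialog.rc" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="IDOK" coord="10;20;50;14"><source/><target/></trans-unit>
      <trans-unit id="IDS_TITLE"><source>Options</source><target>Optionen</target></trans-unit>
    </body>
  </file>
</xliff>"""

def control_sizes(xliff_text):
    """Map ids of units with an empty <source> to their (x, y, cx, cy) tuple."""
    root = ET.fromstring(xliff_text)
    sizes = {}
    for tu in root.iter("trans-unit"):
        source_text = "".join(tu.find("source").itertext())
        coord = tu.get("coord")
        if not source_text and coord:
            # Assumes four integers separated by ';' (no '#' placeholders).
            sizes[tu.get("id")] = tuple(int(v) for v in coord.split(";"))
    return sizes

print(control_sizes(DOC))
```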
    XLIFF does not address explicitely anything related to segmentation.
    
    XLIFF is intended to handle a source language and ONE target language
    in each <file> element. This decision was made very early in the design
    of the format, and the structure of XLIFF reflects it (otherwise we
    wouldn't have the <source>/<target> pair, for example). The main reason
    (as far as I can recall) was that the advantages of multilingual files
    were not big enough to be worth the complication. In addition, it seems
    that in some cases multilingual files even cause problems in the
    process: most of the time you have to split the file per translator
    anyway. I'm sure others will be able to elaborate on why a simple
    bilingual architecture was chosen rather than a multilingual one.
    
    The use of "target?" (zero or one target) rather than "target+" (one
    or more targets) is there to allow a <trans-unit> with only a source
    text. I think it was "target?" at the beginning and we changed it to
    "target+". Comments anyone?
    
    <jr>The <trans-unit> properly allows only zero or one target for any
    <source>. The DTD has it as target?. Alternate translations can be
    stored in the <alt-trans> element which contains target+. The targets in
    the alt-trans can come from a variety of places including translator
    versions and TMs. There is only one allowable target in a trans-unit
    because that is considered the current or final version. The strongest
    argument against multilingual XLIFF (more than one target language) was
    the versioning problem. It would be too difficult to keep the languages
    in sync.</jr>
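    The target?/alt-trans split described above can be illustrated with a
    short Python sketch: each unit has at most one current <target>, while
    candidates (from TMs, other translator versions, etc.) sit in
    <alt-trans>. The document and the review_units helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative document: unit 1 is translated, unit 2 has only candidates.
DOC = """<xliff version="1.0">
  <file original="ui.txt" source-language="en" target-language="it" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Yes</source><target>Sì</target></trans-unit>
      <trans-unit id="2">
        <source>Cancel</source>
        <alt-trans><target>Annulla</target></alt-trans>
        <alt-trans><target>Cancella</target></alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""

def review_units(xliff_text):
    """Return {id: (current target or None, [alternative targets])}."""
    root = ET.fromstring(xliff_text)
    report = {}
    for tu in root.iter("trans-unit"):
        # find() only looks at direct children, so this is the unit's own
        # (zero or one) <target>, not the ones nested in <alt-trans>.
        target = tu.find("target")
        current = None if target is None else "".join(target.itertext())
        alternatives = ["".join(t.itertext())
                        for alt in tu.findall("alt-trans")
                        for t in alt.findall("target")]
        report[tu.get("id")] = (current, alternatives)
    return report

print(review_units(DOC))
```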
    
    -----
    > 5. Does not have QC/Proofer captured.
    -----
    
    I think this is captured in the <phase> element. That element is there
    to allow tools to flag the progress of the document through the
    localization process, and even to keep track of the changes through
    links using the phase-name attribute. Maybe someone from the
    "Status-Flags" sub-group can address this and give an example?
    
    <jr>Yves is quite correct about this. Maybe Tony can give you access
    to the DataDefinition Yahoo group so that you can see our discussions
    on that topic.</jr>
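    As a hedged sketch of how a QC/proofing step could be traced through
    phase-name links, the Python below finds units whose <target> points to
    a phase of a given process. The <phase-group>/<phase> layout and the
    process-name attribute follow later XLIFF 1.x releases; the value
    "proofreading" and the helper name are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative document: phase p2 is a (hypothetical) proofreading phase.
DOC = """<xliff version="1.0">
  <file original="ui.txt" source-language="en" target-language="de" datatype="plaintext">
    <header>
      <phase-group>
        <phase phase-name="p1" process-name="translation"/>
        <phase phase-name="p2" process-name="proofreading"/>
      </phase-group>
    </header>
    <body>
      <trans-unit id="a"><source>Open</source><target phase-name="p2">Öffnen</target></trans-unit>
      <trans-unit id="b"><source>Close</source><target phase-name="p1">Schliessen</target></trans-unit>
    </body>
  </file>
</xliff>"""

def units_in_process(xliff_text, process_name):
    """Ids of trans-units whose <target> references a phase of the given process."""
    root = ET.fromstring(xliff_text)
    phase_names = {p.get("phase-name") for p in root.iter("phase")
                   if p.get("process-name") == process_name}
    ids = []
    for tu in root.iter("trans-unit"):
        target = tu.find("target")
        if target is not None and target.get("phase-name") in phase_names:
            ids.append(tu.get("id"))
    return ids

print(units_in_process(DOC, "proofreading"))
```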
    
    -----
    > 6. Will need to support non-UTF-8 imported entities (e.g. SAE Gen,
    > Fordsym, TEI)
    -----
    
    I'm not sure I understand this well. Could you elaborate and maybe
    give an example?
    
    
    
    -----
    > 7. Should support SIO, and have more atts needed for inline
    > elements.
    -----
    
    Same here. You lost me with "SIO" :) Does it stand for "Serial Input
    Output", or "Shift-In (shift)-Out"? Could you elaborate and maybe give
    a few examples?
    
    <jr>Please elaborate points 6 & 7.</jr>
    
    Thanks for taking the time to go through this, David. Hopefully others
    will be able to elaborate on my answers and possibly address the points
    I failed (miserably) to understand.
    
    Kind regards,
    -yves
    
    <jr>Thanks for looking this over. I hope this explains some things. We
    need to get everyone access to the discussions on the DataDefinition
    group site.
    
    Cheers,
    John</jr>
    
    