OASIS XML Localisation Interchange File Format (XLIFF) TC


RE: [xliff] XLIFF 1.0 issues


    Posted 04-16-2002 11:31


    Subject: RE: [xliff] XLIFF 1.0 issues


     
    While the idea of Tool-Company may be worth exploring, this for me is an attribute of the tool itself.
     
    From my own experience with commercial tools such as ****** (don't want to get into trouble!), word counts have differed between revisions of the same major version of a product, and that's why it's important to capture version information in a structured fashion.  This kind of issue is not isolated, as tools evolve and content changes, but it is particularly troublesome when one creates a localisation budget and a partner's budget comes out 10% higher (10% could be a substantial amount of money).
     
    Perhaps we might explore this kind of thing with a view to "XLIFF Certification of Tools" with some kind of logo branding programme. Standardisation on naming would be part of this branding, which is not unusual in IT.   HTML editors, for example, stamp their content in a wildly diverse but readable way (e.g., a JavaScript script can use this stamp to react to the HTML for specific rendering purposes).
     
    Tools that are certified are registered (along with version/service release/service pack information, tools vendor and perhaps some description of the counting algorithm).  Those that are not certified - well, you've just got to take your chances.  This is also a good opportunity to advertise XLIFF...
     
    Just a thought... Might be a little off topic!!
     
    S.
     

     
     
     
     
    S  t  e  p  h  e  n     H  o  l  m  e  s
    Localisation Development Manager
    International Product Development
     
    Voice:  +353 (1) 241 5732
    Fax:     +353 (1) 241 5749
     
    Novell, Inc., THE leading provider of Net business solutions
    http://www.novell.com
    >>> Enda McDonnell <EndaMcD@alchemysoftware.ie> 04/15/02 10:28 >>>
    Hi,

    In the current specification, the tool attribute is free text, the 1.0 spec
    says that it "is used to specify the signature and version of the tool that
    created or modified the document".

    However, this mechanism is a bit loose and open to mis-use. For example, a
    tool may omit the version number. Including tool-name and tool-version
    attributes in the next version would be a better solution.

    Regarding a tools registry, I don't think we could limit the names to a
    standard list. The hope is that as many tools as possible will use this
    xliff format. Is it necessary to have a naming convention for tool names?
    A convention is too easy to ignore; I think the best solution may be to
    introduce another attribute, tool-company. This way a tool can be clearly
    defined as

    tool-company = ACME
    tool-name = Killer App
    tool-version = 4.0

    and not in a confusing manner such as

    tool = ACME Killer
    or
    tool = ACME Ltd. Killer 4
    or
    tool = ACME, Killer App
    or
    tool = ACME Ltd., Killer App 4.0
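
To illustrate why the free-text signature is hard to consume, here is a minimal Python sketch. The `tool-company`, `tool-name` and `tool-version` attributes are the ones proposed above and are not part of XLIFF 1.0; `describe_tool` is a hypothetical helper, and placing the attributes on `<file>` is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

def describe_tool(elem):
    """Return (company, name, version) for an element carrying tool information.

    Prefers the proposed structured attributes. Falls back to the free-text
    'tool' attribute, in which case company and version are left as None
    because they cannot be recovered reliably from the signature.
    """
    name = elem.get("tool-name")
    if name is not None:
        return (elem.get("tool-company"), name, elem.get("tool-version"))
    return (None, elem.get("tool"), None)

# Proposed structured form (hypothetical attributes).
structured = ET.fromstring(
    '<file tool-company="ACME" tool-name="Killer App" tool-version="4.0"/>'
)
# XLIFF 1.0 free-text form.
free_text = ET.fromstring('<file tool="ACME Ltd., Killer App 4.0"/>')

print(describe_tool(structured))  # ('ACME', 'Killer App', '4.0')
print(describe_tool(free_text))   # (None, 'ACME Ltd., Killer App 4.0', None)
```

With the structured form a consumer can compare versions directly; with the free-text form it would have to guess where the company name ends and the version begins.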

    I will write this up in more detail and propose these additions to the TC
    for the next release of xliff.

    Enda









    -----Original Message-----
    From: Stephen Holmes [ mailto:sholmes@novell.com]
    Sent: 14 April 2002 11:54
    To: xliff@lists.oasis-open.org ; John Reid
    Subject: Re: [xliff] XLIFF 1.0 issues


    Thanks for the information. Can I ask then, does the tool information
    capture the version of the tool as well, or is it just a free-text
    attribute?

    Reason: Producing the count in the first instance is the responsibility
    of the content parser. As the parser may be revised X times to address
    defects, add functionality etc., can we look at a standardised way of
    specifying the tool/parser and version?


    Is this XLIFF group, or some subcommittee, looking at a tools registry,
    i.e., agreed and standard names for the tools out there, or some form of
    guidelines for creating these tool names?

    Finally, is there a plan to integrate/leverage the LISA findings on this
    topic?


    Steve.


    S t e p h e n H o l m e s
    Localisation Development Manager
    International Product Development

    Voice: +353 (1) 241 5732
    Fax: +353 (1) 241 5749

    Novell, Inc., THE leading provider of Net business solutions
    http://www.novell.com
    >>> John Reid < JREID@novell.com > 04/12/02 19:05 PM >>>
    On Point 1:
    [Alt-jr1] The purpose for including the tool as an attribute of the
    <count-group> is so that Tool Y will know that the counts it is about to
    use/update are not its own. Thus, Tool Y may want to produce its own.
    However, if Tool Y is compatible with Tool X, then it can use Tool X's
    counts. Meanwhile, Tool X can find its own counts and update them
    accordingly.

    <count-group tool="Tool X" name="example">
    <count count-type="untranslated">132</count>
    </count-group>

    This does become complicated, though, when Tool Z is used. Tool Z may
    have a compatibility issue with Tool Y but not Tool X. If Tool Y updated
    Tool X's counts, Tool Z may use those inaccurately.
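
The decision described in Alt-jr1 could be sketched in Python as follows. The `COMPATIBLE` table and the `reusable_counts` helper are hypothetical, the `tool` attribute on `<count-group>` is the proposal under discussion rather than XLIFF 1.0, and the `<file>` wrapper is a simplification.

```python
import xml.etree.ElementTree as ET

# Hypothetical compatibility table: which tools' counts each tool trusts.
COMPATIBLE = {
    "Tool Y": {"Tool Y", "Tool X"},
    "Tool Z": {"Tool Z", "Tool X"},
}

def reusable_counts(xml_text, current_tool):
    """Return {count-type: value} for count-groups written by a trusted tool."""
    root = ET.fromstring(xml_text)
    # A tool that is not in the table trusts only its own counts.
    trusted = COMPATIBLE.get(current_tool, {current_tool})
    counts = {}
    for group in root.iter("count-group"):
        if group.get("tool") in trusted:
            for count in group.findall("count"):
                counts[count.get("count-type")] = int(count.text)
    return counts

doc = """<file>
  <count-group tool="Tool X" name="example">
    <count count-type="untranslated">132</count>
  </count-group>
</file>"""

print(reusable_counts(doc, "Tool Y"))  # {'untranslated': 132}
print(reusable_counts(doc, "Tool W"))  # {}
```

If Tool Y then rewrote the group with `tool="Tool Y"`, the same lookup run by Tool Z would return nothing, which is the complication raised above.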

    [Alt-jr2] There is another solution to this: We already have a <phase>
    element that stores the tool used in that phase. The phase-name
    attribute could be added to <count>. Thus, any subsequent tool could
    ascertain when that count was produced and by what, and decide whether
    to use the count.

    <phase-group>
    <phase phase-name="create" process-name="Translation" tool="Tool X"
    date="2002-04-10T09:41:02Z"/>
    </phase-group>
    .
    <count-group name="example">
    <count phase-name="create" count-type="untranslated">132</count>
    </count-group>

    Then again, with either method, a tool has to update the attribute of a
    count or count-group element and historical data is lost. Thus, the
    phase-name and tool attribute methods have essentially the same
    consequence: Tool Z knows which tool last touched the named count. Using
    a tool attribute has the advantage of being in the scope of the current
    node. The phase-name has the advantage of carrying additional
    information such as date.

    [Alt-jr3] Another alternative is to add a new element to <count-group>,
    such as <update>, that has attributes of tool and date. Thus, multiple
    updates could be recorded for a <count-group>. This would need to be at
    the <count-group> level since we do want to keep the contents of <count>
    to the actual count.

    <count-group name="example">
    <update tool="Tool X" date="2002-04-10T09:41:02Z"/>
    <count phase-name="create" count-type="untranslated">132</count>
    </count-group>

    This solution has the disadvantage that it implies an update to all the
    counts within a <count-group> of which there may be many and only one
    updated. This is also a weakness for adding the tool attribute to
    <count-group>.

    Alt-jr2 can be used to keep historical data since there is no
    restriction on the number of counts that can be stored within the
    <count-group>. Thus, Tool X can supply a count in phase 1,

    <phase-group>
    <phase phase-name="create" process-name="Translation" tool="Tool X"
    date="2002-04-10T09:41:02Z"/>
    </phase-group>
    .
    <count-group name="example">
    <count phase-name="create" count-type="untranslated">132</count>
    </count-group>

    Tool Y can add an update to it in phase 2,

    <phase-group>
    <phase phase-name="create" process-name="Translation" tool="Tool X"
    date="2002-04-10T09:41:02Z"/>
    <phase phase-name="translate" process-name="Translation" tool="Tool Y"
    date="2002-04-11T10:42:03Z"/>
    </phase-group>
    .
    <count-group name="example">
    <count phase-name="create" count-type="untranslated">132</count>
    <count phase-name="translate" count-type="untranslated">43</count>
    </count-group>

    and Tool Z can update Tool X's count and ignore Tool Y's in phase 3.

    <phase-group>
    <phase phase-name="create" process-name="Translation" tool="Tool X"
    date="2002-04-10T09:41:02Z"/>
    <phase phase-name="translate" process-name="Translation" tool="Tool Y"
    date="2002-04-11T10:42:03Z"/>
    <phase phase-name="review" process-name="Translation" tool="Tool Z"
    date="2002-04-12T11:43:04Z"/>
    </phase-group>
    .
    <count-group name="example">
    <count phase-name="create" count-type="untranslated">132</count>
    <count phase-name="translate" count-type="untranslated">43</count>
    <count phase-name="review" count-type="untranslated">56</count>
    </count-group>
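
A rough Python sketch of how a later tool might consume the Alt-jr2 history: it joins each `<count>` to its `<phase>` by `phase-name` and picks the newest count from a tool it trusts. The `phase-name` attribute on `<count>` is the proposal above, not XLIFF 1.0; the helper names, the set of trusted tools and the flat `<file>` wrapper are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

doc = """<file>
  <phase-group>
    <phase phase-name="create" process-name="Translation" tool="Tool X" date="2002-04-10T09:41:02Z"/>
    <phase phase-name="translate" process-name="Translation" tool="Tool Y" date="2002-04-11T10:42:03Z"/>
    <phase phase-name="review" process-name="Translation" tool="Tool Z" date="2002-04-12T11:43:04Z"/>
  </phase-group>
  <count-group name="example">
    <count phase-name="create" count-type="untranslated">132</count>
    <count phase-name="translate" count-type="untranslated">43</count>
    <count phase-name="review" count-type="untranslated">56</count>
  </count-group>
</file>"""

def count_history(xml_text):
    """Return [(date, tool, count-type, value)] sorted oldest first.

    The dates share one UTC timestamp format, so plain string ordering
    matches chronological ordering.
    """
    root = ET.fromstring(xml_text)
    phases = {p.get("phase-name"): p for p in root.iter("phase")}
    history = []
    for count in root.iter("count"):
        phase = phases.get(count.get("phase-name"))
        if phase is None:
            continue  # count is not tied to a known phase
        history.append((phase.get("date"), phase.get("tool"),
                        count.get("count-type"), int(count.text)))
    return sorted(history)

def latest_trusted(history, trusted_tools):
    """Return the most recent count produced by a tool in trusted_tools."""
    for date, tool, count_type, value in reversed(history):
        if tool in trusted_tools:
            return value
    return None

history = count_history(doc)
print(latest_trusted(history, {"Tool X", "Tool Z"}))  # 56
print(latest_trusted(history, {"Tool X"}))            # 132
```

Because earlier counts are never overwritten, a tool that distrusts Tool Y can still fall back to Tool X's original figure.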


    Thoughts?

    cheers,
    john

    >>> Stephen Holmes < sholmes@novell.com > 4/11/02 4:44:58 PM >>>
    On point 1, I'd just make the comment that the value of adding the tool
    that created the wordcount as an attribute is of relatively little use
    if you take a situation where, for example, "Tool X" generates the data,
    but "Tool Y" reads it for processing and has different ideas about what
    constitutes a word count.
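
As a small Python illustration of how two tools can disagree on the same text, the sketch below compares two plausible counting rules. Both rules and the sample sentence are invented for this example and do not describe any real product.

```python
import re

text = "Click Save-As to store user's file (v2.0) on the e-mail server."

def count_whitespace(s):
    """Assumed 'Tool X' rule: every whitespace-separated token is a word."""
    return len(s.split())

def count_alnum_runs(s):
    """Assumed 'Tool Y' rule: every run of ASCII letters or digits is a word."""
    return len(re.findall(r"[A-Za-z0-9]+", s))

print(count_whitespace(text), count_alnum_runs(text))  # 11 15
```

Hyphens, apostrophes and version numbers account for the whole difference here, which is why knowing only the tool's name says little about how its count was produced.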

    It's an age-old problem in localisation - "Who has the correct word
    count?". As tools may be completely proprietary, even if based on XLIFF
    containers, I see no reason in complicating the attribute qualifiers.
    This may become the topic of a subcommittee...

    On point 3 - bear in mind that localisation/language tools that aspire to
    be network-based will find base64 encoded content to be monumentally
    large to transfer. Europe, remember, is still predominantly 56K and we
    all remember the hassle involved in FedEx'ing CDs to China - business
    reality supersedes specification.
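
To put a rough number on the size concern, the Python sketch below measures the Base64 overhead for an arbitrary 300,000-byte payload and estimates transfer time on a 56 kbit/s link. The payload size and the idealised link (no protocol overhead, no compression) are assumptions chosen for illustration.

```python
import base64
import os

def transfer_seconds(num_bytes, bits_per_second=56_000):
    """Idealised transfer time: payload bits divided by the link rate."""
    return num_bytes * 8 / bits_per_second

raw = os.urandom(300_000)          # stand-in for a binary resource
encoded = base64.b64encode(raw)    # 4 output bytes per 3 input bytes

print(len(raw), len(encoded))      # 300000 400000
print(round(transfer_seconds(len(raw)), 1),
      round(transfer_seconds(len(encoded)), 1))  # 42.9 57.1
```

Base64 adds about one third to the payload, so the cost is proportional to the size of the embedded binary rather than a fixed penalty.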

    Cheers
    Steve.



    S t e p h e n H o l m e s
    Localisation Development Manager
    International Product Development

    Voice: +353 (1) 241 5732
    Fax: +353 (1) 241 5749

    Novell, Inc., THE leading provider of Net business solutions
    http://www.novell.com
    >>> John Reid < JREID@novell.com > 04/11/02 19:02 PM >>>
    Hi All,

    My comments follow Mark's, between <jr>...</jr> tags.

    >>> Mark Levins < mark_levins@ie.ibm.com > 4/5/02 5:59:53 AM >>>

    1. <note> as a child of <count>
    Currently the <count> element is very ambiguous; a note as a child
    element could be used to indicate what was being counted, what was
    considered a word etc.

    <jr>The <count-group> and <count> elements can be very problematic. A
    <note> element within the <count> element may help in the customized
    support required by these elements but that is a human readable approach
    and probably would need to be defined even more to be truly useful. A
    stronger definition of the count element may do more for us.
    <count> has the 'unit' attribute which has recommended values of word,
    page, trans-unit, bin-unit, and item. The latter three are defined
    according to elements within the spec but the former two must be defined
    by the tool creating the count. I suggest that we include the tool as an
    attribute to the count-group. This would be the same attribute used in
    <file>, <phase>, and <alt-trans>. Further refinement of the 'unit'
    attribute may also be necessary.</jr>


    2. The <count-group>, <prop-group> and <context-group> elements can be
    used within a <group> without any other relevant child elements
    The 1.0 specification allows that a <group> element can contain (for
    example) a <count-group> without containing anything to count. I think
    the <group> element should be changed to contain at least one of
    <group>, <trans-unit> or <bin-unit>.

    <jr>Shouldn't this requirement be placed on the <body> also?</jr>
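
Until the content model is tightened, a tool could check the suggested constraint itself. The Python sketch below flags `<group>` elements that contain none of `<group>`, `<trans-unit>` or `<bin-unit>`; the `empty_groups` helper and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Children that give a <group> something to translate or count.
REQUIRED = {"group", "trans-unit", "bin-unit"}

def empty_groups(xml_text):
    """Return document-order indexes of <group> elements lacking a required child."""
    root = ET.fromstring(xml_text)
    return [index for index, group in enumerate(root.iter("group"))
            if not any(child.tag in REQUIRED for child in group)]

doc = """<body>
  <group>
    <count-group name="example">
      <count count-type="untranslated">132</count>
    </count-group>
  </group>
  <group>
    <trans-unit id="1"><source>Hello</source></trans-unit>
  </group>
</body>"""

print(empty_groups(doc))  # [0]
```

The same test could be applied to `<body>` by checking its direct children against `REQUIRED`.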


    3. Binary elements & <internal-file>
    This is kind of a big one. At the moment the specification does not
    define the form of the content of the <internal-file> element (although
    there is an optional 'form' attribute). The problem I see with this is
    that the specification allows users to place binary data directly as
    content - this binary content may contain the reserved XML characters
    < > etc. which will cause parsers to choke.
    The CDATA section approach is also not good enough to provide a
    solution. My suggestion is that the content of the <internal-file> be
    restricted to Base64 or at least stated so.
    Also, the description in the spec for the <internal-file> element reads
    "The <internal-file> element will contain the data for the skeleton
    file." which is technically wrong; it may also contain data for a
    <bin-source> or <bin-target> element.

    <jr>How does CDATA fail this purpose? I wouldn't want to restrict this
    to just Base64, thus requiring a conversion for both the producer and
    any subsequent processor that may be able to handle the original format
    without a problem. Additionally, wouldn't we need an attribute such as
    'original-format' if we forced your conversion?</jr>
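
One known limitation of CDATA is that a section ends at the first `]]>`, so arbitrary binary data containing that byte sequence breaks the document; XML 1.0 also disallows most control characters even inside CDATA. The Python sketch below shows the first failure and the Base64 alternative. The payload bytes and the `form="base64"` value are assumptions for illustration.

```python
import base64
import xml.etree.ElementTree as ET

def parses(xml_text):
    """Return True if xml_text is well-formed according to the stdlib parser."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# Invented binary payload that happens to contain "]]>" followed by markup characters.
payload = b"\x89PNG]]><&"

# Raw bytes dropped into a CDATA section: "]]>" closes the section early.
as_cdata = ("<internal-file><![CDATA[" + payload.decode("latin-1")
            + "]]></internal-file>")

# The same bytes encoded as Base64 text.
as_base64 = ('<internal-file form="base64">'
             + base64.b64encode(payload).decode("ascii")
             + "</internal-file>")

print(parses(as_cdata))   # False
print(parses(as_base64))  # True
print(base64.b64decode(ET.fromstring(as_base64).text) == payload)  # True
```

A producer that knows its data is plain text could still use CDATA safely, which is the flexibility the reply above wants to keep.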


    4. mime-type attribute of <bin-source>
    How come this attribute is omitted from the <bin-source> element? Note
    that it is an attribute of <bin-target>.

    <jr>We generally put attributes for <source> and <bin-source> in the
    parent, <trans-unit> and <bin-unit>, respectively. The 'mime-type'
    attribute of the target allows a different mime-type for the target in
    cases where it differs from that specified on the <bin-unit>.
    Otherwise, the mime-type of the target is unnecessary.</jr>

    Cheers,
    john

    ----------------------------------------------------------------
    To subscribe or unsubscribe from this elist use the subscription
    manager: < http://lists.oasis-open.org/ob/adm.pl >

