OASIS XML Localisation Interchange File Format (XLIFF) TC

[xliff] some comments on xliff names

    Posted 02-19-2002 13:44
    While working through the XLIFF DTD, I noticed some general problems that I
    think we should address as soon as possible.  This is not a comprehensive
    list of all problems; rather, it summarizes problems that occur in more than
    one place.
    
    1. inconsistent hyphenation:  why is content-type hyphenated but datatype is
    not?
    
    2. inconsistent use of generic and qualified names:  several elements have
    generic "name" attributes; others have qualified names, such as "phase-name".
    
    3. failure to exploit ID datatype for unique attribute values: a number of
    elements have "id" attributes which are documented as being unique
    identifiers, but the DTD assigns them either a CDATA or a NMTOKEN datatype
    instead of an ID type (which a validating parser can check to guarantee
    uniqueness). 
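    A small sketch makes the difference concrete. Declaring the attribute as ID
    (for example <!ATTLIST unit id ID #REQUIRED>, where "unit" is a
    hypothetical element name) is enough for a validating parser to reject
    duplicates. Python's standard-library parsers do not validate against a
    DTD, so the Python below only approximates that check by hand with
    xml.etree.ElementTree. The helper name and sample document are made up
    for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def duplicate_ids(xml_text, attr="id"):
    """Return id values that occur more than once anywhere in the document.

    DTD ID semantics are document-wide: a value must be unique across all
    elements, not merely among elements of the same type.
    """
    root = ET.fromstring(xml_text)
    counts = Counter(el.get(attr) for el in root.iter() if el.get(attr) is not None)
    return sorted(value for value, n in counts.items() if n > 1)

# Hypothetical sample: two different element types reuse the id "u1".
sample = """<body>
  <unit id="u1"/>
  <unit id="u2"/>
  <group id="u1"/>
</body>"""

print(duplicate_ids(sample))  # ['u1']
```

    One consequence of switching to ID is worth noting: ID values must be XML
    Names, so a purely numeric identifier such as "1" would no longer be valid.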
    
    4. failure to exploit IDREF datatype for references to unique IDs: a number
    of elements have attributes (like "phase-name") which are documented as
    references to other elements' unique identifiers, but the DTD assigns them
    either a CDATA or a NMTOKEN datatype instead of an IDREF type (which some
    XML toolkits will auto-magically resolve for the application program).
    Personal opinion: the function of attributes of type IDREF is easier to
    understand if "ref" is part of their name.  For example, 
    
    <abc id="unique">
    [...]
    <xyz abc-ref="unique">  <!-- it's very clear that the abc-ref attribute
    refers to an instance of abc -->
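    A rough sketch of the resolution step an IDREF-aware toolkit performs,
    written in Python with xml.etree.ElementTree against the abc/xyz example
    above (the helper name resolve_refs is hypothetical): index every id once,
    then look each reference up in that index.

```python
import xml.etree.ElementTree as ET

def resolve_refs(xml_text, ref_attr, id_attr="id"):
    """Return (referring element, referenced element) pairs.

    Mimics what an IDREF-aware toolkit does for the application: index
    every identifier once, then look each reference up in that index.
    A dangling reference raises KeyError, much as a validating parser
    rejects an IDREF value with no matching ID.
    """
    root = ET.fromstring(xml_text)
    by_id = {el.get(id_attr): el for el in root.iter() if el.get(id_attr) is not None}
    pairs = []
    for el in root.iter():
        target = el.get(ref_attr)
        if target is None:
            continue
        if target not in by_id:
            raise KeyError(f"{ref_attr}={target!r} has no matching {id_attr}")
        pairs.append((el, by_id[target]))
    return pairs

sample = """<doc>
  <abc id="unique"><title>Target</title></abc>
  <xyz abc-ref="unique"/>
</doc>"""

for ref, target in resolve_refs(sample, "abc-ref"):
    print(ref.tag, "->", target.tag, target.findtext("title"))
# prints: xyz -> abc Target
```

    With an "abc-ref" name, the reader of the instance document can see which
    element type the lookup should land on before running any code at all.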
    
    5. attribute names which are unclear, even in the context of their element.
    Example:  the "file" element has an "original" attribute.  It is not at all
    obvious that the value of original is supposed to be "the name of the
    original file from which the contents of the <file> have been extracted."
    Why not "original-name" or even "extracted-from"?  Similarly, the meaning
    of "category" is just as opaque unless you read the associated definition in
    the spec.
    
    6. embedded "little languages": the "coord" attribute defines a little
    language to represent screen coordinates, including a special character for
    null values.  Why foist this on the application programmer when XML can do
    the job for us with attributes like x-coord, y-coord, etc.?
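    To show what the application programmer ends up carrying, here is a Python
    sketch comparing the two approaches. It assumes the coord value has the form
    "x;y;cx;cy", with "#" (or an empty field) standing for an unused value; that
    layout is an assumption for illustration, and the exact syntax in the DTD
    may differ. The attribute names "width" and "height" are hypothetical
    stand-ins for the "etc." above.

```python
import xml.etree.ElementTree as ET

# Assumed layout of the coord mini-language: four semicolon-separated
# fields, with '#' or an empty field meaning "no value".
FIELDS = ("x", "y", "cx", "cy")

def parse_coord(value):
    """Hand-written decoder that each application would have to carry."""
    parts = value.split(";")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}: {value!r}")
    return {name: None if p in ("", "#") else int(p) for name, p in zip(FIELDS, parts)}

# The alternative suggested above: one plain attribute per value.
# 'x-coord' and 'y-coord' come from the post; 'width' and 'height' are
# hypothetical names for the other two values.
ATTRS = {"x": "x-coord", "y": "y-coord", "cx": "width", "cy": "height"}

def read_attrs(el):
    """No splitting or null marker needed: an absent attribute means no value."""
    return {name: None if el.get(attr) is None else int(el.get(attr))
            for name, attr in ATTRS.items()}

print(parse_coord("10;20;#;15"))
print(read_attrs(ET.fromstring('<unit x-coord="10" y-coord="20" height="15"/>')))
# both print: {'x': 10, 'y': 20, 'cx': None, 'cy': 15}
```

    In the second version the XML parser does the tokenizing, and a DTD could
    type or default each attribute individually.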
    
    7. Ambiguous parts-of-speech in naming:  the "clone" attribute has values
    "yes" or "no". There are (at least) three different ways to interpret its
    meaning:
    
    (a) Is it an imperative, as in "yes, this should be cloned"?
    (b) Is it a description of state, as in "yes, this is a clone"?
    (c) Or is it a description of an element's capabilities, as in "yes, this
    element may be cloned"?
    
    Reading the spec reveals that the answer is (c).  Hence, a better name would
    be "cloneable" which cannot be interpreted as either (a) or (b).
    
    8. terseness leading to confusion: "ctype" is unnecessarily opaque.  Would
    "content-type" really be so onerous?
    
    9. redundancy in attribute names.  The <mrk> element has an attribute
    "mtype", which specifies the type of the marker it belongs to.  Why is
    this not simply "type"?  Or, if you don't buy that, why isn't it
    "marker-type", in the same way that the <count> element has "count-type"?
    
    Eric