While working through the XLIFF DTD, I noticed some general problems that I
think we should address as soon as possible. This is not a comprehensive
list of all problems; rather, it's a summary of problems for which more than
one instance can be found.
1. inconsistent hyphenation: why is "content-type" hyphenated but "datatype"
is not?
2. inconsistent use of generic and qualified names: several elements have
generic "name" attributes; others have qualified names, such as "phase-name".
3. failure to exploit ID datatype for unique attribute values: a number of
elements have "id" attributes which are documented as being unique
identifiers, but the DTD assigns them either a CDATA or a NMTOKEN datatype
instead of an ID type (which a validating parser can check to guarantee
uniqueness).
4. failure to exploit IDREF datatype for references to unique IDs: a number
of elements have attributes (like "phase-name") which are documented as
references to other elements' unique identifiers, but the DTD assigns them
either a CDATA or a NMTOKEN datatype instead of an IDREF type (which a
validating parser checks against the IDs in the document, and which some
XML toolkits will auto-magically resolve for the application program).
Personal opinion: the function of attributes of type IDREF is easier to
understand if "ref" is part of their name. For example,
<abc id="unique">
[...]
<xyz abc-ref="unique"> <!-- it's very clear that the abc-ref attribute
refers to an instance of abc -->
5. attribute names which are unclear, even in the context of their element.
Example: the "file" element has an "original" attribute. It is not at all
obvious that the value of original is supposed to be "the name of the
original file from which the contents of the <file> have been extracted."
Why not "original-name" or even "extracted-from"? Similarly, "category" is
opaque unless you read the associated definition in the spec.
6. embedded "little languages": the "coord" attribute defines a little
language to represent screen coordinates, including a special character for
null values, so every application has to write its own parser for it. Why
foist this on the application programmer when XML can do the job for us
with separate attributes like x-coord, y-coord, etc.?
7. ambiguous parts of speech in naming: the "clone" attribute has values
"yes" or "no". There are (at least) three different ways to interpret its
meaning:
(a) Is it an imperative, as in "yes, this should be cloned"?
(b) Is it a description of state, as in "yes, this is a clone"?
(c) Or is it a description of an element's capabilities, as in "yes, this
element may be cloned"?
Reading the spec reveals that the answer is (c). Hence, a better name would
be "cloneable", which cannot be interpreted as either (a) or (b).
8. terseness leading to confusion: "ctype" is unnecessarily opaque. Would
"content-type" really be so onerous?
9. redundancy in attribute names: the <mrk> element has an attribute
"mtype" which specifies the type of the marker it belongs to. Why is this
not simply "type"? Or, if you don't buy that, why isn't it "marker-type",
in the same way that the <count> element has "count-type"?
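For concreteness, here is a rough sketch of what suggestions 3, 4, 6 and 7
could look like as DTD declarations. It reuses the abc/xyz elements from the
example in item 4; the root element "doc", the placement of "cloneable" on
abc, and the "x-coord"/"y-coord" attributes are made up for illustration
and are not taken from the XLIFF DTD. With this internal subset, a
validating parser should reject a duplicate id or an abc-ref that doesn't
match any id in the document.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!-- Hypothetical root element holding the abc/xyz example elements -->
  <!ELEMENT doc (abc | xyz)*>

  <!-- Item 3: "id" declared as ID, so a validating parser checks uniqueness.
       Item 7: "cloneable" instead of "clone". -->
  <!ELEMENT abc (#PCDATA)>
  <!ATTLIST abc
    id        ID        #REQUIRED
    cloneable (yes|no)  #IMPLIED>

  <!-- Item 4: the reference is an IDREF and its name ends in "-ref".
       Item 6: one attribute per coordinate instead of a packed string;
       leaving an attribute out stands in for a null value. -->
  <!ELEMENT xyz EMPTY>
  <!ATTLIST xyz
    abc-ref   IDREF     #REQUIRED
    x-coord   CDATA     #IMPLIED
    y-coord   CDATA     #IMPLIED>
]>
<doc>
  <abc id="unique" cloneable="no">some text</abc>
  <!-- y-coord omitted: no value, and no special null character needed -->
  <xyz abc-ref="unique" x-coord="10"/>
</doc>
```

One catch with switching to ID: ID values must be XML names, so an id like
"1" would no longer be valid, while "u1" would.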
Eric