OASIS Darwin Information Typing Architecture (DITA) TC

  • 1.  Spec Clarification Issue: Characters Allowed in Key and Key Scope Names

    Posted 02-26-2016 14:57
    Jarno Elovirta has raised the question of what specific characters are actually allowed by the DITA specification for key names (and, by extension, key scope names).

    The DITA 1.2 specification says:

    * Key names consist of characters that are legal in a URI. The case of key names is significant.
    * The following characters are prohibited in key names: "{", "}", "[", "]", "/", "#", "?", and whitespace characters.

    This statement is unchanged in DITA 1.3 (but moved to the reference entry for the @keys attribute).

    The problem here is that "characters that are legal in a URI" is not as precise as perhaps we thought it was. In particular, by "legal" do we mean characters that are allowed in the URI *string* before the URI is processed to resolve any escaped non-ASCII characters, or do we mean any character that may be used in a URI, including characters that must be escaped in the ASCII encoding of a URI? I suspect we intended the latter meaning, but Jarno has interpreted it as the former, more restrictive meaning.

    There is definitely value in allowing a wide range of characters in key names, e.g., accented characters, characters from Asian and Middle Eastern writing systems, etc.

    The primary practical concern is string matching: processors have to be able to reliably compare two key names to determine whether or not they are the same. When you allow non-ASCII characters generally, you run into the issue that some characters can be composed in several different ways per the Unicode spec, e.g., characters that include or can be used with diacritical marks. The XPath specification has lots of language and infrastructure around this issue (and might provide a short path to a solution if we need one).

    So we need to clarify what the rules for key names are and publish that clarification in some appropriate way.
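    To make the string-matching concern concrete, here is a minimal Python sketch. It assumes a processor would normalize key names to a single Unicode normalization form before comparing them. The DITA specification does not say this; the helper name keys_match and the choice of NFC are assumptions made only for illustration.

```python
import unicodedata


def keys_match(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two key names after Unicode normalization.

    Hypothetical helper: the normalization step and the NFC default are
    assumptions, not something the DITA specification requires.
    Case is left untouched because key names are case-sensitive.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# The same visible key name, composed in two different ways.
composed = "caf\u00e9"     # "é" as a single precomposed code point
decomposed = "cafe\u0301"  # "e" followed by a combining acute accent

print(composed == decomposed)            # False: the raw code points differ
print(keys_match(composed, decomposed))  # True: equal after normalization
print(keys_match("Key", "key"))          # False: case remains significant
```

    Without some rule of this kind, two processors could disagree about whether these two spellings name the same key.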
    A good option might be to use the XML NMTOKEN definition as the basis for key names, as that already allows pretty much every useful Unicode character and disallows characters we already don't want ( https://www.w3.org/TR/REC-xml/#sec-common-syn ).

    The main problem I see with NMTOKEN is that it disallows characters that are not explicitly disallowed by the current definition and that are allowed by the conservative interpretation of "legal in a URI". For example, "@" and "=" are allowed in URIs but disallowed in NMTOKEN. So that could be a deal breaker. Of course, we could define key names in terms of NMTOKEN plus additional characters.

    The thing Jarno is asking for is a precise definition of the characters allowed, that is, an explicit list of characters and character ranges. It looks to me like NMTOKEN with additions is our fastest route to a precise definition.

    Cheers,
    Eliot
    ----
    Eliot Kimber, Owner
    Contrext, LLC
    http://contrext.com
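    As an illustration of the "NMTOKEN plus additional characters" idea, the following Python sketch checks a candidate key name against the NameChar ranges from the XML 1.0 Recommendation, extended with a small set of extra characters. The extra set ("@" and "=", the two examples mentioned in the post), the function name is_valid_key, and the constant names are assumptions for illustration only; nothing here is a rule the TC has adopted.

```python
import re

# NameChar ranges as given in the XML 1.0 Recommendation (Nmtoken ::= NameChar+).
# Written as the inside of a regular-expression character class.
XML_NAME_CHAR = (
    r":A-Z_a-z"
    r"\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF"
    r"\u0370-\u037D\u037F-\u1FFF\u200C-\u200D"
    r"\u2070-\u218F\u2C00-\u2FEF\u3001-\uD7FF"
    r"\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
    r"\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
)

# Hypothetical additions: the two URI-legal characters named in the post.
EXTRA_KEY_CHARS = "@="


def is_valid_key(name: str, extra: str = EXTRA_KEY_CHARS) -> bool:
    """Return True if name is one or more NameChar or extra characters.

    Illustrative only: the extra set is an assumption, not a DITA rule.
    """
    pattern = re.compile("[" + XML_NAME_CHAR + re.escape(extra) + "]+")
    return pattern.fullmatch(name) is not None


for candidate in ["intro.overview", "user@host=1", "a/b", "my key", "ключ"]:
    print(repr(candidate), is_valid_key(candidate))
```

    The characters the current text prohibits ("{", "}", "[", "]", "/", "#", "?", and whitespace) fall outside NameChar, so they are rejected without any extra rule. Passing extra="" gives plain NMTOKEN behavior, under which "user@host=1" is rejected.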


  • 2.  RE: Spec Clarification Issue: Characters Allowed in Key and Key Scope Names

    Posted 02-26-2016 15:04
    I certainly agree that more precision is preferred. I would vote for NMTOKEN plus additional characters (if we really need them) for key and key scope names.

    Thanks and best regards,
    --Scott

    Scott Hudson
    Manager, Technical Writing
    Product Training & Documentation
    Customer Solutions
    Jeppesen     Digital Aviation     Boeing
    55 Inverness Drive East
    Englewood, CO 80112
    www.jeppesen.com