OASIS Darwin Information Typing Architecture (DITA) TC

Re: [dita] Namespace resolution

    Posted 08-24-2004 16:06

    Subject: Re: [dita] Namespace resolution


    Erik Hennum wrote:
    
    > 2.  Vocabulary processing
    > 
    > Content is validated and processed against the vocabulary as a whole rather
    > than against the individual specialization modules integrated by the
    > vocabulary.
    
    I think that in fact content can be validated both against its direct 
    governing vocabulary and against the individual modules from which the 
    vocabulary is composed. If the vocabulary is a "DITA application" then 
    the most the vocabulary can do is add constraints to the specialization 
    modules, which means that content that is invalid with respect to 
    the overall vocabulary might still be valid with respect to a particular 
    module.
    
    For example, if a processor is a generic DITA processor it will only 
    care about validation against the specialization packages because it has 
    no knowledge of any other rules (knowing only about the DITA-defined 
    rules).
    
    I don't think it's necessary to make absolute statements about 
    validation--validation is a type of processing that serves specific 
    business tasks and requirements. The most we can say with certainty is 
    what can and cannot be validated using a particular technology (e.g., 
    DTDs, schemas, Schematron, one-off validation applications, human 
    inspection, and so on).
    
    It is important to identify the types of validation that are possible 
    and when such validation can or should be done.
    
    > For instance, when resolving a conref, processes are obligated to check
    > whether the vocabulary for the referencing document includes the
    > specialization modules for the elements in the referenced content.
    > Otherwise, the conref could include invalid elements.
    
    I understand the motivation for this statement but it's not that simple, 
    at least in the general case of doing transclusion (the current DITA 
    definition of conref may be more constraining, possibly too constraining).
    
    In the general case, whether a given transcluded element is "valid" or 
    not is entirely a function of the thing that is processing the result. 
    For some applications any combinations will be valid (meaning they can 
    be meaningfully processed). For other applications only exact element 
    type matches are valid. And for others, transclusion of any compatible 
    class of the referencing element is valid (e.g., any class within the 
    same specialization hierarchy).
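
    The three policies described above can be sketched as a single check. 
    This is only an illustration: the policy names, the function names, and 
    the reading of "same specialization hierarchy" as "shares the same base 
    class token" are assumptions, not anything DITA defines.

```python
def class_tokens(class_value):
    """Split a class attribute value into module/name tokens, dropping the marker."""
    return [t for t in class_value.split() if t not in ("-", "+")]


def transclusion_ok(referencing, referenced, policy):
    """Decide whether a transclusion is acceptable under a given policy.

    Each element is a (element_type_name, class_value) pair.
    The policy names are hypothetical labels for the three cases in the text.
    """
    ref_name, ref_class = referencing
    tgt_name, tgt_class = referenced
    if policy == "any":
        # Any combination can be meaningfully processed.
        return True
    if policy == "exact":
        # Only exact element type matches are accepted.
        return ref_name == tgt_name
    if policy == "same-hierarchy":
        # Assumption: two elements are in the same hierarchy when their
        # class lists start from the same base class.
        a, b = class_tokens(ref_class), class_tokens(tgt_class)
        return bool(a) and bool(b) and a[0] == b[0]
    raise ValueError(f"unknown policy: {policy}")


ph = ("ph", "- topic/ph")
specialized_ph = ("specializedPh", "- topic/ph module1/specializedPh")
p = ("p", "- topic/p")
```

    Which policy applies is a decision for the thing processing the result, 
    which is why the policy is a parameter here rather than a fixed rule.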
    
    > If every element can be processed in isolation, the specialization modules
    > can provide complete processing.  If the processing requires contextual
    > sensitivity, however, the vocabulary has to be able to affect the
    > processing. After all, the vocabulary controls the context.
    
    I'm not sure I fully understand the use of "processing" in this case. 
    That is, by "vocabulary" do you mean "application" (as I've been using 
    it)? I'm trying to keep a clear distinction between the "vocabulary", 
    which is the definition of the set of types, and the "application", 
    which is the combination of a vocabulary and a set of business rules 
    that define how elements in the vocabulary must or can be processed. 
    It's a subtle distinction but I think it's important to make in XML 
    because XML content (and by extension, XML document 
    constraint specifications such as DTDs and schemas) is entirely 
    declarative and provides no *processing* specifications. Processing 
    specifications are entirely in the domain of prose and software 
    component implementations (e.g., style sheets, Java objects, etc.).
    
    Or more simply: vocabularies define content constraints, applications 
    define processing for the content. There may be multiple applications 
    associated with a single vocabulary.
    
    > For instance, in one domain, I've specialized section as backgroundSection
    > so my topics can include background content.  In another domain, I've
    > specialized title as safetyInstructionTitle so I can include safety
    > instructions as either a topic or section.  I now create a vocabulary that
    > integrates the two domains, so I can have background sections that provide
    > safety instructions.  In the same way that a term within a dlentry has
    > processing expectations, a backgroundSection that contains a
    > safetyInstructionTitle could have processing expectations (perhaps of
    > isolation, implemented as a sidebar for some outputs).  Only the vocabulary
    > can specify the processing expectations for the combination of the two
    > elements.  After all, the background and safety modules might be supplied
    > by designers who are completely unaware of one another's specializations.
    
    To apply my terminology: I would say "Only the application can specify 
    the processing expectations for the combination...."
    
    > Note that this processing expectation is part of the semantics of the
    > vocabulary.  Different applications may realize those processing
    > expectations in different ways.
    
    And here, just to continue to be pedantic, I would use the term 
    "processors" instead of "applications". That is, I'm trying to use the 
    term "application" in the sense originally used in the SGML standard 
    (the set of rules associated with a document type) not in the sense of 
    "a set of software components that perform a task".
    
    I realize it's hair splitting to a degree but there is so much potential 
    confusion and so much abstraction that without very precise terminology 
    misunderstanding is a certainty.
    
    > 3.  Element polymorphism
    
    > We don't want to limit processing of DITA content, however, to
    > DITA-sensitive applications -- especially where existing vocabularies are
    > being retrofitted as DITA vocabularies.  For DITA-insensitive applications,
    > the declared element type is everything and the class attribute is nothing.
    
    Remember I'm not saying that, for example, the element types in the 
    DITA-supplied reference schemas should be arbitrary--far from it. I'm 
    just pointing out that in DITA-based applications there need be no 
    general constraint on element type names. At a minimum we can say that 
    element type names may or may not be namespace qualified. Or we can say 
    that fully-conforming DITA processors must use DITA class attribute 
    values to apply DITA processing semantics to elements, meaning that 
    element type names are unconstrained.
    
    I don't think as standards writers we need to mandate the 
    interchangeability of document instances--it is sufficient to define a 
    mechanism by which instances can be maximally interchangeable, which 
    would be by having all element type names be the same as DITA-defined 
    class names.
    
    > In addition, the declared element type is displayed to human readers of the
    > content to guide their understanding of the semantics of the content.
    > 
    > Because the actual element name is important for these purposes, the DITA
    > architecture mandates support for generalization and respecialization
    > operations to change the declared element type.
    
    I'm not sure I understand this comment. Element type names are important 
    but they are not important *to the DITA standard*. They are important 
    to designers and implementors of DITA-based applications.
    
    Remember that the DITA-defined specialization packages define element 
    *classes* not element types--element types only exist in DITA-based 
    vocabularies. So while the DITA standard will define a set of classes 
    whose names must be carefully thought out, the element type names used 
    in a given DITA-based application are still arbitrary.
    
    >>
    >>5. DITA applications in which element type names are qualified with
    >>their corresponding package namespaces. This is possible for the same
    >>reason (4) is possible: element type names are arbitrary.
    > 
    > 
    > Would the root element for the DITA content have to declare both the
    > namespace for the vocabulary and the namespace for the element's
    > specialization module?
    
    Yes, assuming the specialization module is not a "magic" DITA-defined 
    core module.
    
    > For instance, how would a specialized topic declare both the namespace for
    > its specialization module and the namespace for the vocabulary that's
    > combining it with other topic types and domains?  As in the illegal:
    > 
    > <specializedTopic
    >     xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
    >     xmlns="http://some.org/dita/module/specializedTopic"
    >    class="- topic/ph
    > http://some.org/dita/module/specializedTopic#specializedTopic ">
    
    The specialization modules must be associated with a prefix, so there 
    can never be a conflict with the document's default namespace. Thus your 
    example should be:
    
     > <specializedPh
     >     xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
     >     xmlns:module1="http://some.org/dita/module/specializedTopic"
     >    class="- topic/ph
     >             module1/specializedPh">
    
    Remember that for the purpose of determining whether a given namespace 
    is "in scope" for an element you only need to examine the declarations 
    and you don't care what the prefixes are. That is, if my application is 
    going to examine the above element to see if the "specializedTopic" 
    namespace is in scope I would simply examine all the namespace 
    declaration attributes to see if any of them contain the expected URI.
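
    A minimal sketch of that check, using Python's standard ElementTree 
    parser. The function name is hypothetical. It tracks the namespace 
    declarations in effect while parsing and compares only URIs, never 
    prefixes; it reports on the first element with the given local name.

```python
import io
import xml.etree.ElementTree as ET


def namespace_in_scope(xml_text, element_name, uri):
    """Return True if `uri` is declared on, or inherited by, the first
    element whose local name is `element_name`. Prefixes are ignored."""
    scope = []  # stack of (prefix, uri) declarations currently in effect
    source = io.BytesIO(xml_text.encode("utf-8"))
    events = ("start-ns", "end-ns", "start")
    for event, payload in ET.iterparse(source, events=events):
        if event == "start-ns":
            scope.append(payload)      # reported before the element's start
        elif event == "end-ns":
            scope.pop()                # declaration goes out of scope
        else:
            local_name = payload.tag.rsplit("}", 1)[-1]
            if local_name == element_name:
                # dict() keeps the innermost binding for each prefix.
                return uri in dict(scope).values()
    return False


EXAMPLE = (
    '<specializedPh'
    ' xmlns="http://some.org/dita/vocabulary/specializedVocabulary"'
    ' xmlns:module1="http://some.org/dita/module/specializedTopic"'
    ' class="- topic/ph module1/specializedPh"/>'
)
found = namespace_in_scope(
    EXAMPLE, "specializedPh", "http://some.org/dita/module/specializedTopic"
)
```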
    
    >>2. The namespace prefixes for the core DITA packages are "magic" and
    >>must be used as-is in class attribute values in DITA 1.0. This
    >>avoids any requirement for DITA 1.0 processors to have to be prepared to
    >>dereference core package names to namespace URIs.
    >>
    >>3. The DITA 1.0 spec can *discuss* the other ways in which namespaces
    >>_can_ be used in conforming DITA applications without actually
    >>requiring it or doing it in the OASIS-provided DTDs and schemas.
    > 
    > 
    > It's an inspired compromise for 1.0 to treat specialization module
    > qualifiers as magically bound to namespaces that aren't actually declared
    > on the element.  I'd like to see it applied to both core DITA and non-core
    > specialization modules so we don't have a two-tier typing scheme.
    
    I assume by "two-tier typing scheme" you mean a typing scheme applied to 
    some modules but not others?
    
    I agree, consistency seems to be paramount here.
    
    I think there is very little risk in having module prefixes bound to 
    namespaces since it doesn't affect existing processors in any way and it 
    doesn't affect element type naming or schema construction, apart from 
    requiring the namespace declarations, which is standard XML syntax and 
    doesn't change the processing any tool would do (that is, 
    namespace-aware processors will handle the declarations as they would 
    anyway and namespace-unaware processors will continue to ignore them).
    
    I don't see how making module prefixes globally unique names can be 
    controversial.
    
    >>>In principle, I agree strongly.  In practice, my concern is that, to
    >>>implement this approach, we have to solve problems like swapping namespaces
    >>>in and out of the class attribute during generalization and
    >>>respecialization.
    >>
    >>I'm not sure I understand this comment: the value of the class attribute
    >>is (conceptually) just a list of namespace prefixes that map to the URIs
    >>for packages. The class attribute value need never change.
    > 
    > 
    > Sorry, I was obscure.  The class attribute doesn't change, but the
    > namespace on the element would have to change during generalization and
    > respecialization.
    > 
    > For instance, here's the element before generalization
    > 
    > <specializedPh
    >     xmlns="http://some.org/dita/module/specializedDomain"
    >    class="- topic/ph
    > http://some.org/dita/module/specializedDomain#specializedPh ">
    > 
    > and after generalization
    > 
    > <ph
    >    class="- topic/ph
    > http://some.org/dita/module/specializedDomain#specializedPh ">
    > 
    > If the namespace isn't changed, the element will be in either no namespace
    > or the wrong namespace and thus won't be valid.
    
    By generalization I assume you mean "generating a new instance whose 
    element types are superclasses of the original input element."
    
    I think there is confusion about where the namespace is applied, as 
    discussed above.
    
    The namespace for the specialization package is *never* the namespace of 
    the element type. Therefore, during generation of a generalized instance 
    you would be rewriting the class attribute and, presumably, changing the 
    element type name. It would not be necessary to remove declarations of 
    namespaces not used in the generalized instance but you could if you 
    wanted to. That is, there's no problem with declaring namespaces that 
    are never used.
    
    So I think your example could be:
    
      <specializedPh
          xmlns:module1="http://some.org/dita/module/specializedDomain"
         class="- topic/ph
                  module1/specializedPh">
    
      and after generalization
    
      <ph
          xmlns:module1="http://some.org/dita/module/specializedDomain"
          class="- topic/ph">
    
    Note that the class= attribute has been rewritten but the namespace 
    declaration has not been removed. But it could be:
    
      <ph
          class="- topic/ph">
    
    Both "ph" instances are equivalent and would be processed the same way.
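
    As an illustration only, one generalization step of this kind might look 
    like the following. The function name is hypothetical, and the sketch 
    assumes the class value has the form shown in the example above (a 
    leading "-" marker followed by module/name tokens, most specialized last).

```python
import xml.etree.ElementTree as ET


def generalize(elem):
    """Rewrite `elem` in place as its immediate superclass, if it has one.

    Drops the most specialized class token and renames the element to the
    name part of the token that is now last.
    """
    tokens = elem.get("class", "").split()
    if len(tokens) < 3:
        # Marker plus a single class: there is no superclass to fall back to.
        return elem
    marker, classes = tokens[0], tokens[1:-1]
    elem.tag = classes[-1].rsplit("/", 1)[-1]      # "topic/ph" -> "ph"
    elem.set("class", " ".join([marker] + classes))
    return elem


SPECIALIZED = (
    '<specializedPh'
    ' xmlns:module1="http://some.org/dita/module/specializedDomain"'
    ' class="- topic/ph module1/specializedPh">text</specializedPh>'
)
generalized = generalize(ET.fromstring(SPECIALIZED))
```

    ElementTree does not retain namespace declarations that are unused, so 
    serializing the result corresponds to the second "ph" form, without the 
    module1 declaration.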
    
    I should note also that this notion of the generation of literal 
    generalized instances is not something that the DITA specification needs 
    to define--this type of processing is simply one of many types of 
    processing that might be applied to DITA documents and the ability to do 
    it is inherent in the nature of the specialization mechanism.
    
    >>
    >>As long as this is always the case then the element type name is simply
    >>irrelevant for the purpose of DITA-based processing. That is, from a
    >>DITA perspective, the element type name is, by definition, a synonym for
    >>the element's class name.
    > 
    > 
    > Agreed, for DITA-sensitive applications, the element name is irrelevant.
    > 
    > DITA content also should be processable, however, by DITA-insensitive
    > applications. For those applications, as well as for human consumption,
    > the DITA architecture needs to support changing the element name --
    > effectively, casting to a different declared type.
    
    I'm still not understanding this comment: the DITA specification can 
    only define processing in terms of class values. The element type name 
    value is simply outside the scope of the DITA architectural mechanism.
    
    We can state that there is a class of simple DITA processors that expect 
    element type names to be the same as DITA-defined class names, and that 
    in order to satisfy such processors one is encouraged to make element 
    type names the same as leaf class names. But I see no reason to require 
    that, so I see no reason for the architecture mechanism to say anything 
    about element type names at all.
    
    > 
    >>...
    > *  By the namespace on the root element if the namespace matches that of a
    > known DITA vocabulary
    
    A document need not be rooted at a DITA element. A document is a DITA 
    document if *any element* is derived from a DITA-defined type. A 
    document is a "DITA-only document" if its root is a DITA-defined type 
    and all elements are likewise derived from DITA-defined types.
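
    The two definitions can be sketched as a simple classification. How an 
    element is recognized as DITA-derived is an assumption here: the 
    hypothetical is_dita_derived() test below just looks for an unqualified 
    class attribute whose value starts with a "-" or "+" marker, which is 
    a heuristic and not a reliable test in general.

```python
import xml.etree.ElementTree as ET


def is_dita_derived(elem):
    """Heuristic (assumption): a class value beginning with '- ' or '+ '."""
    return elem.get("class", "").startswith(("- ", "+ "))


def classify(xml_text):
    """Classify a document by how many of its elements are DITA-derived."""
    root = ET.fromstring(xml_text)
    flags = [is_dita_derived(e) for e in root.iter()]  # root and descendants
    if all(flags):
        return "DITA-only document"
    if any(flags):
        return "DITA document"
    return "not a DITA document"


dita_only = '<topic class="- topic/topic"><title class="- topic/title">T</title></topic>'
mixed = '<report><ph class="- topic/ph">x</ph></report>'
other = '<report><p class="note">x</p></report>'
```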
    
    > Regardless of whether the class attribute is namespaced, wouldn't these
    > tests have to be performed anyway and in the same way?
    > 
    > That is, couldn't a content management system such as XIRUSS-T use the
    > following approach?
    > 
    > 1.  Is a namespace declared on the root element?  If so, match the known
    > namespaced vocabularies including the known DITA vocabularies.
    
    Yes, but not limited to the root element--declared anywhere within the 
    document.
    
    > 2.  Is a DTD declared for the document?  If so, match the known
    > vocabularies with public identifiers including the DITA vocabularies.
    
    XIRUSS-T, like MS Word, doesn't do anything with DTDs. So no, this 
    wouldn't work. In any case, this is not reliable because the external 
    DTD subset is not a 100% reliable way to determine the true document type 
    of a document.
    
    > 3.  Is a Schema declared for the document?  If so, match the known
    > vocabulary declarations including the DITA vocabulary declarations.
    
    If the schema is bound via noNamespaceSchemaLocation then it is no 
    better than an external DTD subset.
    
    If the schema is bound via schemaLocation then there is also a 
    namespace declared and I don't need to look at the schema.
    
    > 4.  Prompt the user for known vocabularies including the DITA vocabularies.
    
    This is reliable to the degree the user can answer the question 
    accurately but is not general in the sense that it does not support the 
    use case of generic processors acting on documents without further input.
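
    The namespace test from item 1, extended to declarations anywhere in the 
    document, could be sketched as below. The registry, its handler label, 
    and the function names are hypothetical; a real system would populate 
    the registry with the vocabularies it actually knows about.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical registry mapping known vocabulary namespace URIs to handlers.
KNOWN_VOCABULARIES = {
    "http://some.org/dita/vocabulary/specializedVocabulary":
        "specialized-vocabulary handler",
}


def declared_namespace_uris(xml_text):
    """Collect every namespace URI declared anywhere in the document."""
    source = io.BytesIO(xml_text.encode("utf-8"))
    return {
        uri
        for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",))
    }


def match_handlers(xml_text, registry=KNOWN_VOCABULARIES):
    """Return the handlers whose namespace is declared in the document."""
    uris = declared_namespace_uris(xml_text)
    return sorted(registry[uri] for uri in uris if uri in registry)


nested = (
    '<wrapper><specializedPh'
    ' xmlns="http://some.org/dita/vocabulary/specializedVocabulary"/></wrapper>'
)
```

    Because the declarations are collected from the whole document, the 
    match succeeds even when the root element itself declares nothing.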
    
    > If a namespace on the class attribute doesn't reduce the number of tests
    > needed to match content with a handler, would it make sense to defer
    > namespacing the class attribute until the full namespace solution is
    > specified?  That way, we keep our options open in case something else in
    > the solution makes it unnecessary to namespace the class attribute?
    
    I don't think so. Part of the point of namespacing the class attribute 
    is to ensure that the DITA class attribute can always be distinguished 
    from other class attributes. For example, I have existing document types 
    that have a class attribute--if I wanted to retrofit those to use DITA 
    as their underlying architecture I would have to change one attribute 
    name or the other. Requiring the DITA class attribute to be 
    qualified avoids that conflict at minimal cost.
    
    Remember too that attributes, by definition, are not in any namespace 
    unless they are qualified. That is, putting an element in a DITA-defined 
    namespace *does not* put the attributes of that element in a 
    DITA-defined namespace. Many specifications ignore this but nevertheless 
    it is the case.
    
    Thus if the class attribute is not qualified it cannot be reliably 
    recognized as being the DITA class attribute.
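
    This behavior can be observed with any namespace-aware parser. In the 
    sketch below the namespace URI and the "dita" prefix are hypothetical 
    placeholders, not values defined by DITA. The element is in the default 
    namespace, yet its unprefixed class attribute is in no namespace, while 
    the prefixed one is a distinct, qualified attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for illustration.
DITA_NS = "http://example.org/hypothetical/dita"

DOC = (
    f'<ph xmlns="{DITA_NS}" xmlns:dita="{DITA_NS}" '
    'class="legacy-style" dita:class="- topic/ph"/>'
)

elem = ET.fromstring(DOC)

# The element type name picks up the default namespace...
element_name = elem.tag

# ...but the unprefixed attribute does not: it is stored without a namespace.
unqualified = elem.get("class")

# Only the prefixed attribute is in the namespace, so both can coexist.
qualified = elem.get(f"{{{DITA_NS}}}class")
```

    An existing document type with its own class attribute could therefore 
    carry both attributes on the same element without renaming either.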
    
    Qualifying the class attribute, and only the class attribute, also 
    ensures that documents are bound to a DITA-defined namespace without 
    constraining or further complicating any other processing or requiring 
    the declaration of namespaces used only within class attribute values 
    (which will usually be limited to "magic" DITA-defined prefixes).
    
    That is, I don't see any great risk to qualifying the class attribute 
    and much benefit from doing it. Doing this would not in any way affect 
    how we might use namespaces in the future for either class attribute 
    values or element type names.
    
    > For instance, if in 2.0, the namespace for the base DITA topic module ends
    > up declared in the class attribute value, would declaring the namespace on
    > the class attribute itself become redundant?
    > 
    > <ph class="- http://dita.oasis-open.org/modules/topic#ph ">
    
    I never intended that module namespaces be declared in the class 
    attribute--there are a number of syntactic reasons why this would be a bad 
    idea and in any case it's not necessary.
    
    >>This again means
    >>that element type names or details such as whether or not applications
    >>use namespace qualification need not be a direct concern to the DITA
    >>specification itself.
    > 
    > 
    > If (as suggested above) vocabularies are a core construct for the DITA
    > architecture, the namespaces used to identify vocabularies are a concern of
    > the DITA architecture.
    > 
    > Also, in the future, there's a strong argument for DITA to incorporate
    > namespaces into the typing system to identify specialization modules so we
    > can have unambiguous element types.
    > 
    > Those reasons suggest that the DITA specification shouldn't leave
    > namespaces entirely to the discretion of the application.
    
    I'm only talking about the namespace qualification of element type 
    names, not the namespaces used for modules or DITA-defined attributes. A 
    namespace-qualified element type name is no different from an 
    unqualified one as far as the DITA specification is concerned: it's an 
    arbitrary name. That means it's up to a given DITA-using vocabulary how 
    to define what and how namespaces are used for the element type names in 
    that vocabulary.
    
    >>But the DITA standard is *not* primarily an authoring support system. It
    >>is a generic standard that defines core types and processing semantics
    >>that in turn provides a solid basis from which task-specific authoring
    >>support systems can be built. That's a key difference and requires a
    >>sometimes subtle shift in emphasis of requirements and features.
    > 
    > 
    > Maybe yes and no?
    > 
    > 1.  As an architecture, DITA is a typing system for specialization of
    > elements, integration of design modules, and so on.
    > 
    > 2.  As a specific type hierarchy, DITA seeds the architecture with a base
    > specialization module, derives core specialization modules, and assembles
    > core vocabularies for the problem space of human-readable content.
    > 
    > The core declaration modules and DTDs are an attempt to conform to the DITA
    > architecture within the limits of DTD syntax.  For instance, the class and
    > domains attributes exist exclusively to support processing.  Similarly, the
    > entity design patterns exist exclusively to support integration of modules
    > as vocabularies.
    > 
    > As a specific type hierarchy, DITA has to be more concerned with
    > authorability and readability than, say, SOAP because DITA content in the
    > core problem space is, fundamentally, a communication from author to
    > reader.
    > 
    > Are concerns with readability and authorability restricted to the
    > declaration level?  Couldn't those concerns be legitimate issues for
    > abstract types?
    
    They are concerns for the abstract types but they are not *primary* 
    concerns. That is, the abstract type design should prefer precision and 
    consistency within the architecture to authorability. By the same token, 
    concrete document types can prefer authorability over precision by 
    taking advantage of the specialization mechanism to map from the 
    abstraction to the concrete.
    
    So I'm not saying that the DITA-defined abstract types should ignore 
    authoring concerns but they should not be driven by them.
    
    That is one of the big advantages of an architecture mechanism--it 
    provides for clear separation of concerns and avoids having 
    implementation details impinge on the core design while providing the 
    freedom for implementors to do what they need to do to meet pragmatic needs.
    
    Cheers,
    
    E.
    -- 
    W. Eliot Kimber
    Professional Services
    Innodata Isogen
    9390 Research Blvd, #410
    Austin, TX 78759
    (512) 372-8122
    
    eliot@innodata-isogen.com
    www.innodata-isogen.com
    

