Subject: Re: [dita] Namespace resolution
Erik Hennum wrote:
> 2. Vocabulary processing
>
> Content is validated and processed against the vocabulary as a whole rather
> than against the individual specialization modules integrated by the
> vocabulary.
I think that in fact content can be validated both against its direct
governing vocabulary and against the individual modules from which the
vocabulary is composed. If the vocabulary is a "DITA application" then
at most the vocabulary can only add constraints to the specialization
modules, which means that content that might be invalid with respect to
the overall vocabulary might still be valid with respect to a particular
module.
For example, if a processor is a generic DITA processor it will only
care about validation against the specialization packages because it has
no knowledge of any other rules (knowing only about the DITA-defined
rules).
I don't think it's necessary to make absolute statements about
validation--validation is a type of processing that serves specific
business tasks and requirements. The most we can say with certainty is
what can and cannot be validated using a particular technology (e.g.,
DTDs, schemas, Schematron, one-off validation applications, human
inspection, and so on).
It is important to identify the types of validation that are possible
and when such validation can or should be done.
> For instance, when resolving a conref, processes are obligated to check
> whether the vocabulary for the referencing document includes the
> specialization modules for the elements in the referenced content.
> Otherwise, the conref could include invalid elements.
I understand the motivation for this statement but it's not that simple,
at least in the general case of doing transclusion (the current DITA
definition of conref may be more constraining, possibly too constraining).
In the general case, whether a given transcluded element is "valid" or
not is entirely a function of the thing that is processing the result.
For some applications any combinations will be valid (meaning they can
be meaningfully processed). For other applications only exact element
type matches are valid. And for others, transclusion of any compatible
class of the referencing element is valid (e.g., any class within the
same specialization hierarchy).
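The three validity policies described above can be sketched as a small check on class attribute values. This is only an illustration: the names `Policy`, `class_chain`, and `transclusion_ok` are hypothetical, and "same specialization hierarchy" is assumed here to mean "shares the same base class", which is one possible reading.

```python
from enum import Enum


class Policy(Enum):
    ANY = "any"              # every combination can be meaningfully processed
    EXACT = "exact"          # only exact element type matches are valid
    SAME_HIERARCHY = "same"  # any class from the same specialization hierarchy


def class_chain(class_value):
    """Split a value such as '- topic/ph module1/specializedPh' into its
    module/name tokens, dropping the leading '-' or '+' marker."""
    return [t for t in class_value.split() if t not in ("-", "+")]


def transclusion_ok(referencing, referenced, policy):
    """Decide whether the referenced element may replace the referencing one."""
    a, b = class_chain(referencing), class_chain(referenced)
    if policy is Policy.ANY:
        return True
    if policy is Policy.EXACT:
        return a == b
    # Assumption: "same hierarchy" means both chains start at the same base class.
    return bool(a) and bool(b) and a[0] == b[0]
```

Which policy applies is a decision of the thing processing the result, not something the class values determine on their own.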
> If every element can be processed in isolation, the specialization modules
> can provide complete processing. If the processing requires contextual
> sensitivity, however, the vocabulary has to be able to affect the
> processing. After all, the vocabulary controls the context.
I'm not sure I fully understand the use of "processing" in this case.
That is, by "vocabulary" do you mean "application" (as I've been using
it)? I'm trying to keep a clear distinction between the "vocabulary",
which is the definition of the set of types, and the "application",
which is the combination of a vocabulary and a set of business rules
that define how elements in the vocabulary must or can be processed.
It's a subtle distinction but I think it's important to make in XML
because XML content (and, by extension, XML document constraint
specifications such as DTDs and schemas) is entirely declarative and
provides no *processing* specifications. Processing specifications are
entirely in the domain of prose and software component implementations
(e.g., style sheets, Java objects, etc.).
Or more simply: vocabularies define content constraints, applications
define processing for the content. There may be multiple applications
associated with a single vocabulary.
> For instance, in one domain, I've specialized section as backgroundSection
> so my topics can include background content. In another domain, I've
> specialized title as safetyInstructionTitle so I can include safety
> instructions as either a topic or section. I now create a vocabulary that
> integrates the two domains, so I can have background sections that provide
> safety instructions. In the same way that a term within a dlentry has
> processing expectations, a backgroundSection that contains a
> safetyInstructionTitle could have processing expectations (perhaps of
> isolation, implemented as a sidebar for some outputs). Only the vocabulary
> can specify the processing expectations for the combination of the two
> elements. After all, the background and safety modules might be supplied
> by designers who are completely unaware of one another's specializations.
To apply my terminology: I would say "Only the application can specify
the processing expectations for the combination...."
> Note that this processing expectation is part of the semantics of the
> vocabulary. Different applications may realize those processing
> expectations in different ways.
And here, just to continue to be pedantic, I would use the term
"processors" instead of "applications". That is, I'm trying to use the
term "application" in the sense originally used in the SGML standard
(the set of rules associated with a document type) not in the sense of
"a set of software components that perform a task".
I realize it's hair splitting to a degree but there is so much potential
confusion and so much abstraction that without very precise terminology
misunderstanding is a certainty.
> 3. Element polymorphism
> We don't want to limit processing of DITA content, however, to
> DITA-sensitive applications -- especially where existing vocabularies are
> being retrofitted as DITA vocabularies. For DITA-insensitive applications,
> the declared element type is everything and the class attribute is nothing.
Remember I'm not saying that, for example, the element types in the
DITA-supplied reference schemas should be arbitrary--far from it. I'm
just pointing out that in DITA-based applications there need be no
general constraint on element type names. At a minimum we can say that
element type names may or may not be namespace qualified. Or we can say
that fully-conforming DITA processors must use DITA class attribute
values to apply DITA processing semantics to elements, meaning that
element type names are unconstrained.
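As a rough sketch of what "use class attribute values rather than element type names" could look like in a processor, the following selects elements by class token and never inspects the tag name. The helper names `dita_classes` and `is_a`, and the element names `para` and `keyword`, are made up for the example.

```python
import xml.etree.ElementTree as ET


def dita_classes(elem):
    """Return the module/name tokens of the class attribute, without the marker."""
    tokens = elem.get("class", "").split()
    return [t for t in tokens if t not in ("-", "+")]


def is_a(elem, dita_class):
    """True if the element is, or specializes, the given class."""
    return dita_class in dita_classes(elem)


# The element type name "keyword" is arbitrary; only the class value matters.
doc = ET.fromstring(
    '<para><keyword class="- topic/ph module1/specializedPh">x</keyword></para>'
)
phrases = [e.tag for e in doc.iter() if is_a(e, "topic/ph")]
print(phrases)
```

Under this approach an element named `keyword` is processed as a `topic/ph` purely because of its class value.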
I don't think as standards writers we need to mandate the
interchangeability of document instances--it is sufficient to define a
mechanism by which instances can be maximally interchangeable, which
would be by having all element type names be the same as DITA-defined
class names.
> In addition, the declared element type is displayed to human readers of the
> content to guide their understanding of the semantics of the content.
>
> Because the actual element name is important for these purposes, the DITA
> architecture mandates support for generalization and respecialization
> operations to change the declared element type.
I'm not sure I understand this comment. Element type names are important
but they are not important *to the DITA standard*. They are important
to designers and implementors of DITA-based applications.
Remember that the DITA-defined specialization packages define element
*classes* not element types--element types only exist in DITA-based
vocabularies. So while the DITA standard will define a set of classes
whose names must be carefully thought out, the element type names used
in a given DITA-based application are still arbitrary.
>>
>>5. DITA applications in which element type names are qualified with
>>their corresponding package namespaces. This is possible for the same
>>reason (4) is possible: element type names are arbitrary.
>
>
> Would the root element for the DITA content have to declare both the
> namespace for the vocabulary and the namespace for the element's
> specialization module?
Yes, assuming the specialization module is not a "magic" DITA-defined
core module.
> For instance, how would a specialized topic declare both the namespace for
> its specialization module and the namespace for the vocabulary that's
> combining it with other topic types and domains? As in the illegal:
>
> <specializedTopic
> xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
> xmlns="http://some.org/dita/module/specializedTopic"
> class="- topic/ph
> http://some.org/dita/module/specializedTopic#specializedTopic ">
The specialization modules must be associated with a prefix, so there
can never be a conflict with the document's default namespace. Thus your
example should be:
> <specializedPh
> xmlns="http://some.org/dita/vocabulary/specializedVocabulary"
> xmlns:module1="http://some.org/dita/module/specializedTopic"
> class="- topic/ph
> module1/specializedPh">
Remember that for the purpose of determining whether a given namespace
is "in scope" for an element you only need to examine the declarations
and you don't care what the prefixes are. That is, if my application is
going to examine the above element to see if the "specializedTopic"
namespace is in scope I would simply examine all the namespace
declaration attributes to see if any of them contain the expected URI.
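A minimal sketch of that check, assuming Python's standard `xml.etree.ElementTree` and a hypothetical helper name `declared_namespaces`. It collects every namespace URI declared anywhere in the document and ignores the prefixes; it does not track per-element scope, which a stricter implementation would need to do.

```python
import io
import xml.etree.ElementTree as ET


def declared_namespaces(xml_text):
    """Return the set of namespace URIs declared anywhere in the document.

    Prefixes are deliberately discarded: only the URIs matter for the test.
    """
    source = io.BytesIO(xml_text.encode("utf-8"))
    return {uri for _event, (_prefix, uri)
            in ET.iterparse(source, events=("start-ns",))}


sample = (
    "<specializedPh"
    ' xmlns="http://some.org/dita/vocabulary/specializedVocabulary"'
    ' xmlns:module1="http://some.org/dita/module/specializedTopic"'
    ' class="- topic/ph module1/specializedPh"/>'
)
wanted = "http://some.org/dita/module/specializedTopic"
print(wanted in declared_namespaces(sample))
```

The same lookup works no matter which prefix the document author chose for the module.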
>>2. The namespace prefixes for the core DITA packages are "magic" and
>>must be used as-is in class attribute values in DITA 1.0. This
>>avoids any requirement for DITA 1.0 processors to have to be prepared to
>>dereference core package names to namespace URIs.
>>
>>3. The DITA 1.0 spec can *discuss* the other ways in which namespaces
>>_can_ be used in conforming DITA applications without actually
>>requiring it or doing it in the OASIS-provided DTDs and schemas.
>
>
> It's an inspired compromise for 1.0 to treat specialization module
> qualifiers as magically bound to namespaces that aren't actually declared
> on the element. I'd like to see it applied to both core DITA and non-core
> specialization modules so we don't have a two-tier typing scheme.
I assume by "two-tier typing scheme" you mean a typing scheme applied to
some modules but not others?
I agree, consistency seems to be paramount here.
I think there is very little risk in having module prefixes bound to
namespaces since it doesn't affect existing processors in any way and it
doesn't affect element type naming or schema construction, apart from
requiring the namespace declarations, which is standard XML syntax and
doesn't change the processing any tool would do (that is,
namespace-aware processors will handle the declarations as they would
anyway and namespace-unaware processors will continue to ignore them).
I don't see how making module prefixes globally unique names can be
controversial.
>>>In principle, I agree strongly. In practice, my concern is that, to
>>>implement this approach, we have to solve problems like swapping namespaces
>>>in and out of the class attribute during generalization and
>>>respecialization.
>>
>>I'm not sure I understand this comment: the value of the class attribute
>>is (conceptually) just a list of namespace prefixes that map to the URIs
>>for packages. The class attribute value need never change.
>
>
> Sorry, I was obscure. The class attribute doesn't change, but the
> namespace on the element would have to change during generalization and
> respecialization.
>
> For instance, here's the element before generalization
>
> <specializedPh
> xmlns="http://some.org/dita/module/specializedDomain"
> class="- topic/ph
> http://some.org/dita/module/specializedDomain#specializedPh ">
>
> and after generalization
>
> <ph
> class="- topic/ph
> http://some.org/dita/module/specializedDomain#specializedPh ">
>
> If the namespace isn't changed, the element will be in either no namespace
> or the wrong namespace and thus won't be valid.
By generalization I assume you mean "generating a new instance whose
element types are superclasses of the original input element."
I think there is confusion about where the namespace is applied, as
discussed above.
The namespace for the specialization package is *never* the namespace of
the element type. Therefore, during generation of a generalized instance
you would be rewriting the class attribute and, presumably, changing the
element type name. It would not be necessary to remove declarations of
namespaces not used in the generalized instance but you could if you
wanted to. That is, there's no problem with declaring namespaces that
are never used.
So I think your example could be:
<specializedPh
xmlns:module1="http://some.org/dita/module/specializedDomain"
class="- topic/ph
module1/specializedPh">
and after generalization
<ph
xmlns:module1="http://some.org/dita/module/specializedDomain"
class="- topic/ph">
Note that the class= attribute has been rewritten but the namespace
declaration has not been removed. But it could be:
<ph
class="- topic/ph">
Both "ph" instances are equivalent and would be processed the same way.
I should note also that this notion of the generation of literal
generalized instances is not something that the DITA specification needs
to define--this type of processing is simply one of many types of
processing that might be applied to DITA documents and the ability to do
it is inherent in the nature of the specialization mechanism.
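The generalization step discussed above (rewrite the class attribute, rename the element to its superclass) can be sketched in a few lines. This is an illustration only, not a definition of DITA generalization: `generalize` is a hypothetical helper, it moves up exactly one level, and it assumes the element is not in a default namespace, as in the example.

```python
import xml.etree.ElementTree as ET


def generalize(elem):
    """Rewrite elem in place as its immediate superclass.

    Assumes a class value of the form '<marker> module/name module/name ...'.
    """
    tokens = elem.get("class", "").split()
    if len(tokens) < 3:
        return elem  # no class value, or already at the base class
    marker, chain = tokens[0], tokens[1:-1]  # drop the most specialized class
    elem.tag = chain[-1].rpartition("/")[2]  # 'topic/ph' -> 'ph'
    elem.set("class", " ".join([marker] + chain))
    return elem


elem = ET.fromstring(
    "<specializedPh"
    ' xmlns:module1="http://some.org/dita/module/specializedDomain"'
    ' class="- topic/ph module1/specializedPh"/>'
)
generalize(elem)
print(elem.tag, elem.get("class"))
```

Whether the now-unused `module1` declaration survives serialization makes no difference to the result, which matches the point that both generalized forms are equivalent.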
>>
>>As long as this is always the case then the element type name is simply
>>irrelevant for the purpose of DITA-based processing. That is, from a
>>DITA perspective, the element type name is, by definition, a synonym for
>>the element's class name.
>
>
> Agreed, for DITA-sensitive applications, the element name is irrelevant.
>
> DITA content also should be processable, however, by DITA-insensitive
> applications. For those applications as well as for human consumption,
> the DITA architecture needs to support changing the element name --
> effectively, casting to a different declared type.
I'm still not understanding this comment: the DITA specification can
only define processing in terms of class values. The element type name
value is simply outside the scope of the DITA architectural mechanism.
We can state that there is a class of simple DITA processors that expect
element type names to be the same as DITA-defined class names and that
in order to satisfy such processors one is encouraged to make element
type names the same as leaf class names but I see no reason to require
that so I see no reason for the architecture mechanism to say anything
about element type names at all.
>
>>...
> * By the namespace on the root element if the namespace matches that of a
> known DITA vocabulary
A document need not be rooted at a DITA element. A document is a DITA
document if *any element* is derived from a DITA-defined type. A
document is a "DITA-only document" if its root is a DITA-defined type
and all elements are likewise derived from DITA-defined types.
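The two definitions can be expressed as a simple classification over all elements in a document. The sketch below rests on a loud assumption: an element counts as "derived from a DITA-defined type" when the first class token names a module listed in `DITA_BASE_MODULES`, a hypothetical stand-in for however a real processor recognizes DITA-defined types.

```python
import xml.etree.ElementTree as ET

# Assumption: these module names stand in for the DITA-defined base modules.
DITA_BASE_MODULES = {"topic", "map"}


def is_dita_derived(elem):
    """True if the element's class chain starts in an assumed DITA base module."""
    tokens = [t for t in elem.get("class", "").split() if t not in ("-", "+")]
    return bool(tokens) and tokens[0].partition("/")[0] in DITA_BASE_MODULES


def classify(root):
    """Classify a parsed document according to the two definitions above."""
    flags = [is_dita_derived(e) for e in root.iter()]  # includes the root itself
    if all(flags):
        return "DITA-only document"
    if any(flags):
        return "DITA document"
    return "not a DITA document"


mixed = ET.fromstring('<report><ph class="- topic/ph">x</ph></report>')
print(classify(mixed))
```

Note that the root element `report` is not DITA-derived here, yet the document still qualifies as a DITA document because one descendant is.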
> Regardless of whether the class attribute is namespaced, wouldn't these
> tests have to be performed anyway and in the same way?
>
> That is, couldn't a content management system such as XIRUSS-T use the
> following approach?
>
> 1. Is a namespace declared on the root element? If so, match the known
> namespaced vocabularies including the known DITA vocabularies.
Yes, but not limited to the root element--declared anywhere within the
document.
> 2. Is a DTD declared for the document? If so, match the known
> vocabularies with public identifiers including the DITA vocabularies.
XIRUSS-T, like MS Word, doesn't do anything with DTDs. So no, this
wouldn't work. In any case, this is not reliable because the external
DTD subset is not a 100% reliable way to determine the true document type
of a document.
> 3. Is a Schema declared for the document? If so, match the known
> vocabulary declarations including the DITA vocabulary declarations.
If the schema is bound via noNamespaceSchemaLocation then it is no
better than an external DTD subset.
If the schema is bound via schemaLocation then there is also a
namespace declared and I don't need to look at the schema.
> 4. Prompt the user for known vocabularies including the DITA vocabularies.
This is reliable to the degree the user can answer the question
accurately but is not general in the sense that it does not support the
use case of generic processors acting on documents without further input.
> If a namespace on the class attribute doesn't reduce the number of tests
> needed to match content with a handler, would it make sense to defer
> namespacing the class attribute until the full namespace solution is
> specified? That way, we keep our options open in case something else in
> the solution makes it unnecessary to namespace the class attribute?
I don't think so. Part of the point of namespacing the class attribute
is to ensure that the DITA class attribute can always be distinguished
from other class attributes. For example, I have existing document types
that have a class attribute--if I wanted to retrofit those to use DITA
as their underlying architecture I would have to change one attribute
name or the other. Requiring the class attribute to be qualified
avoids that conflict at minimal cost.
Remember too that attributes, by definition, are not in any namespace
unless they are qualified. That is, putting an element in a DITA-defined
namespace *does not* put the attributes of that element in a
DITA-defined namespace. Many specifications ignore this but nevertheless
it is the case.
Thus if the class attribute is not qualified it cannot be reliably
recognized as being the DITA class attribute.
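The point that a default namespace does not apply to attributes can be checked with a namespace-aware parser. In this sketch the namespace URI `http://example.org/dita` and the `dita` prefix are placeholders, not values defined by DITA, and `dita_class` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI, used only for illustration.
DITA_NS = "http://example.org/dita"


def dita_class(elem):
    """Return the namespace-qualified class attribute, or None if absent."""
    return elem.get(f"{{{DITA_NS}}}class")


doc = ET.fromstring(
    '<doc xmlns="http://example.org/dita" xmlns:dita="http://example.org/dita">'
    '<ph class="- topic/ph"/>'
    '<ph dita:class="- topic/ph"/>'
    "</doc>"
)
unqualified, qualified = list(doc)

# Both elements are in the namespace, but only one attribute is.
print(unqualified.tag, dita_class(unqualified))
print(qualified.tag, dita_class(qualified))
```

The first `ph` carries a plain `class` attribute in no namespace, so a lookup by qualified name finds nothing even though the element itself is in the namespace.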
Qualifying the class attribute, and only the class attribute, also
ensures that documents are bound to a DITA-defined namespace without
constraining or further complicating any other processing or requiring
the declaration of namespaces used only within class attribute values
(which will usually be limited to "magic" DITA-defined prefixes).
That is, I don't see any great risk to qualifying the class attribute
and much benefit from doing it. Doing this would not in any way affect
how we might use namespaces in the future for either class attribute
values or element type names.
> For instance, if in 2.0, the namespace for the base DITA topic module ends
> up declared in the class attribute value, would declaring the namespace on
> the class attribute itself become redundant?
>
> <ph class="- http://dita.oasis-open.org/modules/topic#ph ">
I never intended that module namespaces be declared in the class
attribute--there are a number of syntactic reasons why this would be a bad
idea and in any case it's not necessary.
>>This again means
>>that element type names or details such as whether or not applications
>>use namespace qualification need not be a direct concern to the DITA
>>specification itself.
>
>
> If (as suggested above) vocabularies are a core construct for the DITA
> architecture, the namespaces used to identify vocabularies are a concern of
> the DITA architecture.
>
> Also, in the future, there's a strong argument for DITA to incorporate
> namespaces into the typing system to identify specialization modules so we
> can have unambiguous element types.
>
> Those reasons suggest that the DITA specification shouldn't leave
> namespaces entirely to the discretion of the application.
I'm only talking about the namespace qualification of element type
names, not the namespaces used for modules or DITA-defined attributes. A
namespace-qualified element type name is no different from an
unqualified one as far as the DITA specification is concerned: it's an
arbitrary name. That means it's up to a given DITA-using vocabulary how
to define what and how namespaces are used for the element type names in
that vocabulary.
>>But the DITA standard is *not* primarily an authoring support system. It
>>is a generic standard that defines core types and processing semantics
>>that in turn provides a solid basis from which task-specific authoring
>>support systems can be built. That's a key difference and requires a
>>sometimes subtle shift in emphasis of requirements and features.
>
>
> Maybe yes and no?
>
> 1. As an architecture, DITA is a typing system for specialization of
> elements, integration of design modules, and so on.
>
> 2. As a specific type hierarchy, DITA seeds the architecture with a base
> specialization module, derives core specialization modules, and assembles
> core vocabularies for the problem space of human-readable content.
>
> The core declaration modules and DTDs are an attempt to conform to the DITA
> architecture within the limits of DTD syntax. For instance, the class and
> domains attributes exist exclusively to support processing. Similarly, the
> entity design patterns exist exclusively to support integration of modules
> as vocabularies.
>
> As a specific type hierarchy, DITA has to be more concerned with
> authorability and readability than, say, SOAP because DITA content in the
> core problem space is, fundamentally, a communication from author to
> reader.
>
> Are concerns with readability and authorability restricted to the
> declaration level? Couldn't those concerns be legitimate issues for
> abstract types?
They are concerns for the abstract types but they are not *primary*
concerns. That is, the abstract type design should prefer precision and
consistency within the architecture to authorability. By the same token,
concrete document types can prefer authorability over precision by
taking advantage of the specialization mechanism to map from the
abstraction to the concrete.
So I'm not saying that the DITA-defined abstract types should ignore
authoring concerns but they should not be driven by them.
That is one of the big advantages of an architecture mechanism--it
provides for clear separation of concerns and avoids having
implementation details impinge on the core design while providing the
freedom for implementors to do what they need to do to meet pragmatic needs.
Cheers,
E.
--
W. Eliot Kimber
Professional Services
Innodata Isogen
9390 Research Blvd, #410
Austin, TX 78759
(512) 372-8122
eliot@innodata-isogen.com
www.innodata-isogen.com