OASIS Darwin Information Typing Architecture (DITA) TC

New Proposal: Vocabulary for Capturing Publishing Process Details to Facilitate Cross-Publication XRefs

    Posted 10-20-2012 14:37
In response to my original proposal 13041, a facility for key-based cross-deliverable referencing, Michael asserted that you could manage cross-deliverable links by using keydefs to define the as-delivered locations of peer resources, and that such an approach would not require any new architectural changes or any change to existing DITA 1.2 processors.

I have worked through a scenario that uses Michael's approach and am convinced that he is generally correct. I will post my exercise as a separate message so as not to overburden this thread.

-----
Aside

The only place where Michael's approach falls down is in knowing, for a given peer resource reference, what root map the referencing document intended the peer to be published in terms of. This is because a peer reference today can only be a direct URI reference to the resource:

  <keydef keys="pubB-topic1" scope="peer" href="../../common/topics/topic-1.dita" />

The link establishes the peer relationship to the topic, but it doesn't establish the publication the referencing map wants that peer to be published in terms of. It is this aspect of the problem that my proposal 13041 addresses, and I will discuss the issue separately there. But this issue does not otherwise affect the soundness and utility of Michael's approach.

End Aside
---------

Given that sets of keydefs can be used to define the as-published locations of peer references, it follows that we should standardize, or at least define clear conventions for, capturing the information needed to make these keydef sets work for this processing use, just as we have with DITAVAL and SubjectScheme for filtering.

I don't think it will be that hard to define an appropriate vocabulary, and we can start testing such a vocabulary with the Open Toolkit and, hopefully, other DITA processors as soon as we have something drafted.

Thus I would like to propose that we define for DITA 1.3 new vocabulary that supports the use of keydef sets in processing that results in deliverables with resolved peer cross references. The rest of this message outlines the general requirements and a suggested approach for such a vocabulary.

NOTE: This proposal requires that all peer resources be bound to keys so that processors can then use key sets to interchange processing details related to peer resource resolution. This requirement is not inherent in my original 13041 proposal, but I do not object to the requirement.

-----------
Terminology

First, some terminology to help keep the discussion clear (because it can get a little twisty):

- "Publication" -- The thing to be delivered, as represented by a root DITA map.
- "Deliverable instance" -- The result of processing a Publication to produce an output reflecting a unique set of input parameters, including the deliverable data type (HTML, PDF, EPUB, etc.), the filtering specs (DITAVAL files), the delivered location (e.g., the URL where the deliverable will be published), and any other process-specific parameters that would result in a different deliverable (in particular, parameters that determine processor behavior where the DITA spec allows different behaviors, such as filtering before or after conref resolution).
- "Publication specification" -- The set of parameters used to produce a deliverable instance.
- "as-referenced keydef set" -- A set of key definitions that reflects the set of peer resources referenced by a given publication for a given processing specification. For example, if Pub A references peer topic B1, then the as-referenced keydef set would include the keydef Pub A used to point to topic B1 as a peer. These keydef sets are used in the processing of the referenced peer publications so they know which of their resources they need to generate as-published keydefs for.
- "as-published keydef set" -- A set of key definitions reflecting the key names as used by a specific publication and the locations of the referenced resources as published in a specific deliverable instance. These keydef sets are used in the processing of the referencing publication to produce the final deliverable with correct peer resource references.

----------------
Processing Model

The general process for producing a deliverable from a given publication with resolved as-published peer references, using keydef sets to communicate among processors, is as follows:

1. Process the publication ("Pub A") to produce its as-referenced keydef sets, one for each of the peer publications it links to. Each as-referenced keydef set reflects the publication specification used to produce it. This is pass 1.

   [NOTE: DITA 1.2 provides no defined way to specify the root map (publication) a given peer reference applies to, so without proposal 13041 there would need to be some processor-specific way to specify, for each peer resource, what publication it applies to. At a minimum you would need metadata on each peer resource's keydef that specifies the publication.]

2. Process each of the referenced peer publications ("Pub B", etc.), specifying the as-referenced keydef set for each publication as a parameter to the process, to produce the as-delivered keydef set for each publication. This process can be repeated for each of the possible deliverables each peer publication is or may be published to. This results in a set of keydef sets, one for each publication/publication specification pair.

3. Process the publication from step 1 (Pub A), replacing the original keydefs for the peer resources with the appropriate keydefs generated in step 2, to produce the final deliverable for the publication. Note that this implies the possibility of manual selection of specific keydefs for specific peer resources, such as choosing the PDF version over the HTML version for a specific resource.
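
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of the proposal: the `Keydef` class, the `publication` field (standing in for the processor-specific metadata mentioned in the note to step 1), and the three function names are all hypothetical, and plain in-memory lists and dicts stand in for real maps and DITA processors.

```python
from collections import defaultdict
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Keydef:
    keys: str
    href: str
    scope: str = "local"
    # Hypothetical metadata naming the peer's root map; DITA 1.2 defines no such thing.
    publication: str = ""


def collect_as_referenced(keydefs):
    """Step 1: group Pub A's peer keydefs by the publication they point into."""
    sets = defaultdict(list)
    for kd in keydefs:
        if kd.scope == "peer":
            sets[kd.publication].append(kd)
    return dict(sets)


def produce_as_delivered(as_referenced, delivered_href):
    """Step 2: the peer publication rewrites each requested keydef to its
    delivered location for one publication specification.
    delivered_href maps source href -> as-delivered href (assumed input)."""
    return [replace(kd, href=delivered_href[kd.href]) for kd in as_referenced]


def resolve_peers(keydefs, as_delivered):
    """Step 3: replace Pub A's original peer keydefs with the as-delivered ones."""
    by_key = {kd.keys: kd for kd in as_delivered}
    return [by_key.get(kd.keys, kd) for kd in keydefs]


pub_a = [
    Keydef("intro", "topics/intro.dita"),
    Keydef("pubB-topic1", "../../pubB/topics/topic1.dita", "peer", "pubB.ditamap"),
]

as_referenced = collect_as_referenced(pub_a)                     # pass 1
pub_b_html = {"../../pubB/topics/topic1.dita": "../../pubB/topics/topic1.html"}
as_delivered = produce_as_delivered(as_referenced["pubB.ditamap"], pub_b_html)
final = resolve_peers(pub_a, as_delivered)                       # pass 2

print(final[1].href)  # ../../pubB/topics/topic1.html
```

Running step 2 once per publication specification (HTML, PDF, and so on) would yield one as-delivered set per publication/publication specification pair, and step 3 would then be given whichever set the business rules or a manual choice select.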

This provides complete control over which delivered version of a given peer resource a given link resolves to.

This is necessarily a two-pass (or 1 1/2 pass) process because you can't finish the processing of the publication in Step 1 until you've determined both the peer resources it points to and the delivered locations of those resources. [I say "1 1/2" pass because the initial processing in Step 1 need only be that necessary to determine the peer references; it doesn't have to actually produce any other output.]

--------------------------
Publication Specifications

Given the above definitions, it should be clear that a given deliverable instance is identified by its publication/publication specification pair, meaning that, for a given processor, a given publication processed with a given publication specification will always produce the same deliverable instance. It also means that two deliverable instances for a given publication are distinguished by their publication specifications.

This is important because you need to have a well-defined and reliable way to communicate *which* deliverable you want when configuring the as-published result of a given peer cross reference.

By formally defining the notion of "publication specification" it follows that publication specifications are objects, which means they have identity, which means they must have identifiers, which means we can use their identifiers to clearly and concisely talk about them. The only open question is what form the identifier takes and what space of names it exists in--this is likely to be processor specific.

In a keydef set that defines the as-published locations of topics from a given publication, you can specify the publication specification ID to which those locations apply. This allows observers to clearly distinguish one such set of keys from other such sets of keys.

For example, say you want to process Pub A, which has references to topics in peer publication Pub B.
As input to the pass-2 processing for Pub A you can specify the process specification for the Pub B deliverable you want your peer links to resolve to. In the case where you don't need to hand-select specific keydefs (e.g., you use the "like links to like" business rule), a processor can automatically select the appropriate as-published keydef set from among those available for a given peer publication. Or, if you do need to hand-select specific keydefs, you can use the process specification ID to find the appropriate keydef set.

To support this process with standard markup, we need three things:

1. Markup for process specifications
2. Markup for as-referenced keydef sets
3. Markup for as-delivered keydef sets

----------------------------
Process Specification Markup

Assuming we want to use DITA-based XML for defining process specifications, there are the following possibilities:

1. Define a new topic type that captures the details. The topic itself provides identity and its title can serve as a display label for the specification, e.g. "PDF for Expert Users on OSX". The topic could be specified as a parameter to processors or referenced as a resource-only resource from maps (in the case where you have a map that is intended for producing exactly one deliverable).

2. Define a new map type that captures the details as metadata within <topicmeta>. As for the topic approach, the map title can provide a display label for the specification. Also as for topics, the map could be referenced as a resource-only resource from maps that are used for exactly one deliverable.

3. Define a new topicref type that captures the details as metadata within <topicmeta>. The navigation title for the topicref can provide a display label for the specification. The topicref could optionally point to a topic that serves as additional documentation for the process specification.

In thinking about it now, I think I like the topicref approach best, because it provides a natural way to hold multiple process specifications in a single XML document, provides (through keys) a way to specify a unique name for each specification separate from the storage location of the specification, and allows for linking to additional documentation when necessary. In the case where you want each process specification to be a separate XML file, you just have a map with one topicref in it, which is minimal extra overhead.

I think the map option (option (2) above) is a non-starter because there is no way to hold multiple maps in a single DITA-conforming XML document.

I think the topic option (option (1) above) is less compelling because it would completely separate the process specification from maps, but all of this markup processing is otherwise entirely in the map domain.

So I think using topicrefs makes the most sense. With topicrefs you can easily have a single map document that collects multiple process specifications together. By requiring key names on the process specification topicrefs you provide a natural DITA-defined identifier. In the case where you want to manage individual process specifications as standalone documents, the overhead is simply the <map> wrapper element, e.g.:

  <map>
    <process-specification>
      ...
    </process-specification>
  </map>

The map can provide a title to give a display label to the process specification set, which is handy.

I'm not going to try to define the details of the markup for process specifications here--that would be an exercise for the stage 2 proposal. I think the general requirements are clear, as outlined above.

-------------------------
As-Referenced Keydef Sets

An as-referenced keydef set would be a map that contains key definitions for each peer resource referenced by a given publication in the context of a given processing specification.
Thus, in addition to simply holding the keydefs, it must capture the following information:

- The root map that the keydefs came from
- The processing specification used to produce the keydefs

My initial proposal would be to define a new map type, "as-referenced-keydef-set", with one new topicref type, <publication-map>, and one new <data> type, <processing-specification-id>:

  <as-referenced-keydef-set>
    <title>As-Referenced Keydefs for Publication PubA.ditamap</title>
    <as-referenced-keydef-set-metadata>
      <processing-specification-id>procspec-one</processing-specification-id>
    </as-referenced-keydef-set-metadata>
    <publication-map href="../../pubA.ditamap" format="ditamap"/>
    <keydefs>
      <keydef keys="pubB-topic1" href="../../pubB/topics/topic1.dita" scope="peer" />
    </keydefs>
  </as-referenced-keydef-set>

Here the value of the <processing-specification-id> element is whatever we decide process specification IDs are (which may be processor specific). Alternatively, it could be a direct reference to the process specification using normal DITA addressing (e.g., a pointer to a topicref within a map document).

Note that this keydef set is not intended to be included in any root map--it is a standalone data set used as input to the processing of the referenced resource in the context of its publication root map (remembering that we currently have no defined way to know, for a given peer resource, what root publication map it is used in the context of).

The use of map markup here is fundamentally just a convenience, but as the ultimate result of all this processing will be a new set of keydefs, it makes sense to use keydef markup for this intermediate data set as well--it keeps things clear to authors and enables use of existing map and key processing infrastructure.

------------------------
As-Delivered Keydef Sets

The as-delivered keydef set is an otherwise normal map containing keydefs intended to be included in the map for a given publication.
At the map level, the only additional details it needs to include are the peer publication map and processing specification (that is, deliverable instance) it reflects. For each topicref, it needs to include the navigation title for the target resource and the title of the publication. This information then enables generation of cross-publication xrefs in the output without additional processing, e.g., "See Topic 1 in Publication B".

As for as-referenced keydef sets, my initial proposal is a new map type, "as-delivered-keydef-set", with the same <publication-map> and <processing-specification-id> elements. Its content would be normal keydefs with the addition of a new <data> specialization, <pubtitle>, that captures the title of the publication the peer resource is in, e.g.:

  <as-delivered-keydef-set>
    <title>As-Delivered Keydefs for Publication PubB.ditamap, HTML for OSX</title>
    <as-referenced-keydef-set-metadata>
      <processing-specification-id>procspec-html-osx</processing-specification-id>
    </as-referenced-keydef-set-metadata>
    <publication-map href="../../pubB.ditamap" format="ditamap"/>
    <keydefs>
      <keydef keys="pubB-topic1" href="../../pubB/topics/topic1.html" scope="peer">
        <topicmeta>
          <navtitle>Topic 1</navtitle>
          <metadata>
            <pubtitle>Publication B</pubtitle>
          </metadata>
        </topicmeta>
      </keydef>
    </keydefs>
  </as-delivered-keydef-set>

Note that this document is very similar to the as-referenced keydef set, but reflects the peer publication, Pub B, not the referencing publication.

When included in Pub A's root map before any other keydefs, the keydefs in this map will take precedence and will therefore determine the address to use in the published deliverable for Pub A.

Note that the value of the @href attribute points to the resource *as delivered*, meaning that the delivery processor should leave it alone except, possibly, to adjust the relative pathing (and if it is an absolute URI, it should always be left alone).
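
To make the precedence behavior concrete, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the element names from the example markup in this message (`<keydef>`, `<navtitle>`, `<pubtitle>`), and it reduces DITA key precedence to "the first definition encountered wins" across an ordered list of map documents, which is a simplification of what a real processor does. The function names `effective_key_space` and `xref_text` are hypothetical.

```python
import xml.etree.ElementTree as ET

# As-delivered keydef set for Pub B, trimmed to the parts the sketch needs.
AS_DELIVERED = """
<as-delivered-keydef-set>
  <keydefs>
    <keydef keys="pubB-topic1" href="../../pubB/topics/topic1.html" scope="peer">
      <topicmeta>
        <navtitle>Topic 1</navtitle>
        <metadata><pubtitle>Publication B</pubtitle></metadata>
      </topicmeta>
    </keydef>
  </keydefs>
</as-delivered-keydef-set>
"""

# Pub A's original peer keydef, which points at the source topic.
PUB_A_ORIGINAL = """
<map>
  <keydef keys="pubB-topic1" href="../../pubB/topics/topic1.dita" scope="peer"/>
</map>
"""


def effective_key_space(*maps_in_order):
    """Build a key -> keydef element table; the first definition of a key wins."""
    space = {}
    for xml_text in maps_in_order:
        root = ET.fromstring(xml_text.strip())
        for kd in root.iter("keydef"):
            for key in kd.get("keys", "").split():
                space.setdefault(key, kd)  # later definitions are ignored
    return space


def xref_text(kd):
    """Generate cross-publication link text from the keydef's own metadata."""
    nav = kd.findtext("topicmeta/navtitle")
    pub = kd.findtext("topicmeta/metadata/pubtitle")
    return f"See {nav} in {pub}"


# The as-delivered set comes first, so its keydef overrides Pub A's original.
space = effective_key_space(AS_DELIVERED, PUB_A_ORIGINAL)
target = space["pubB-topic1"]
print(target.get("href"))  # ../../pubB/topics/topic1.html
print(xref_text(target))   # See Topic 1 in Publication B
```

Swapping the argument order would leave the original .dita address in effect, which is why the as-delivered set has to come before any other keydefs in Pub A's root map.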

-------
Summary

With some relatively simple conventions for generating and manipulating keydefs and capturing definitions of processing specifications, we can enable reliable and practical generation of peer-to-peer references in publications as delivered. This can be done in a way that does not require any magic or processor-specific stuff or changes to current key-based processing (meaning the mechanism can work with existing DITA 1.2 processors).

The approach supports both completely manual manipulation of the keydef sets and automatic manipulation of them. It does not require any architectural change, only the addition of new vocabulary based on existing types.

--
Eliot Kimber
Senior Solutions Architect, RSI Content Solutions
"Bringing Strategy, Content, and Technology Together"
Main: 512.554.9368
www.rsicms.com
www.rsuitecms.com
Book: DITA For Practitioners, from XML Press, http://xmlpress.net/publications/dita/practitioners-1/