OASIS Darwin Information Typing Architecture (DITA) TC

  • 1.  Unique topic ids in the cross publication or global CMS use case

    Posted 06-15-2013 18:09
    Hi Folks

    I have found numerous discussions stating that a topic id is not required to be unique within a publication or collection of topics. None of these discussions is in the current 1.2 specification (that I could find, anyhow), although omission means there is no requirement. One such reference was: http://tech.groups.yahoo.com/group/dita-users/message/14260

    Of course, a topic id does have to be unique within an XML document. That is not what I am talking about here; rather, I am addressing intra-publication uniqueness or even global uniqueness.

    Some PDF processors, however, such as the PDF5 processor for Antenna House, require topic ids to be unique within a publication. At first it seems like this requirement oversteps what OASIS has recommended (or not recommended, through omission). However, one reason for PDF5's unique id requirement is to support the cross-publication linking use case. It just so happens that we dealt with this use case recently in approving proposal 13041 (Facility for key-based, cross-deliverable referencing (Kimber)).

    It seems that if we do not recommend or say anything about unique topic ids, then we leave processors to “twist in the wind”, or to make extra requirements as PDF5 did. On the other hand, if we require unique topic ids, we might be presupposing certain implementations that are in fact not necessary. If we are to add proposals such as 13041, however, then we might want to talk about how cross-publication linking might happen, since proposal 13041 opens the door to some new possibilities.

    For example, if our references were key rooted, we could use key export tables and the processors could do something like the following. I use a PDF example here, but it may have bearing on other cross-publication links such as cross chunked HTML.

    In PDF, suppose processors are allowed to assign their own unique ids to topics for the purposes of a merge (like the PDF2 merge). Linking from PDFB to PDFA would then require PDFA to export its external links to PDFB, because the ids of the topics in the PDF are not known at author time.

    PDFA (export as XML)

    keyname    newMergetopicId    Original fragmentId
    MyKey1     a223345            be3333333

    PDFB then consumes this and has a reference to MyKey1/be3333333.

    When a processor builds PDFB and references PDFA with MyKey1/be3333333, the reference resolves to PDFA.a223345/be3333333.

    In this case, a223345 could be entirely generated by the PDF processor when PDFA is built; be3333333, however, would remain stable, though not unique, as a fragment id.

    My question here is: should we say something about this in the spec, or when we document proposal 13041? Should we have text that says “we DO NOT recommend processors rely on unique topic ids within a publication” or “we DO recommend” the same?

    cheers
    Jim
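
    The export-table lookup described in this post could be sketched roughly as follows. This is only an illustration under stated assumptions: the `ExportEntry` record, the `resolve` function, and the `PDFA.<mergedId>/<fragmentId>` output format are hypothetical names modelled on the example values above, not anything defined by proposal 13041 or by any PDF processor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportEntry:
    """One row of the (hypothetical) export table written when PDFA is built."""
    key_name: str          # key used by authors, stable across builds
    merged_topic_id: str   # id generated by the processor at merge time
    fragment_id: str       # original fragment id, stable but not unique


# Assumed export table for PDFA, using the values from the post.
PDFA_EXPORT = [ExportEntry("MyKey1", "a223345", "be3333333")]


def resolve(reference: str, deliverable: str, export_table) -> str:
    """Resolve an authored 'key/fragment' reference to a deliverable address."""
    key_name, fragment_id = reference.split("/", 1)
    for entry in export_table:
        if entry.key_name == key_name and entry.fragment_id == fragment_id:
            return f"{deliverable}.{entry.merged_topic_id}/{entry.fragment_id}"
    raise KeyError(f"no export entry for {reference!r}")


# Building PDFB: the authored reference is rewritten using PDFA's export table.
print(resolve("MyKey1/be3333333", "PDFA", PDFA_EXPORT))  # PDFA.a223345/be3333333
```

    The point of the sketch is that only the key name and the fragment id appear in authored content; the merged topic id can change on every build of PDFA without touching PDFB's source.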


  • 2.  Re: [dita] Unique topic ids in the cross publication or global CMS use case

    Posted 06-16-2013 13:10
    Jim,

    I think your concern is addressed by the current cross-deliverable addressing proposal: it does in fact propose the use of keys, and mappings from keys to locations as delivered, as the way to ensure reliable cross-deliverable addressing.

    The proposal as documented should make it clear that processors are obligated to manage a mapping from objects as authored to objects as delivered, such that delivery constraints (for example, making topic IDs unique within a publication) are not imposed back onto the authored content. If the proposal is not sufficiently clear on that point then we must correct it. Because I am so deeply into issues of linking and addressing I often forget that what seems obvious to me is in fact not at all obvious.

    Perhaps it's useful to discuss the general issue of topic IDs and their non-requirement for uniqueness in the context of addressing generally. I think there is some general misunderstanding in the community about what is and isn't required, and probably also some poor implementation choices made long ago that still linger in our community. I don't fault implementors for not always understanding the subtleties of addressing; it's a challenging subject.

    -----------------------------------
    Topic ID Uniqueness Is Not Required

    Topic *IDs* are not required to be unique outside the context of their containing XML document, nor do they need to be.

    However, topic document addresses *are* necessarily unique, because the XML documents that contain topics are distinct storage objects. Each has a unique location within the storage system that contains it, and that storage system has a unique location within the set of all possible storage locations. That's how storage systems work. In the world of the Web, every storage system exists on some kind of server with a unique IP address.
    The storage system itself then exists at some unique location within that server, and the resources managed by the storage system then have unique locations, e.g., filenames, object IDs, or what have you. Thus, every *topic* has a unique URL/ID pair that distinguishes it from *all possible other topics* in existence at any moment in time.

    The ID of the <topic> element is therefore necessary *only* to distinguish different topics within the same *XML document*. But that requirement is imposed by XML itself, since DITA defines topic IDs as XML IDs. If an XML document consists of exactly one topic, then addressing the document is sufficient to reliably address the topic (by the rules of DITA addressing). In that case the topic ID is only of interest for addressing elements within the topic, because DITA fragment identifiers are {topicid}/{elementid} pairs. But even there, the value "topicid" for all topic IDs is as good as anything.

    For the purposes of addressing in deliverables, there is no need for topic IDs to be unique, because the processor that generates the deliverable can ensure that the IDs used in the deliverable are unique within that deliverable. The deliverable is itself a storage object (or collection of storage objects) that, like all storage objects, has identity within the set of all possible storage objects.

    In addition, the processor that produces the deliverable must have access to the information required to maintain the mapping from objects as authored (that is, topic IDs, element IDs, and keys) to their locations as delivered. This is true because the processor must have both the original source and the deliverable it generated available to it. This does not mean that all existing processors were implemented in such a way that this information is maintained, only that they all *could have been*.
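
    As a rough illustration of the point above, here is a minimal sketch of a processor-side table that assigns deliverable-unique IDs and keeps the source-to-deliverable mapping. Everything in it is an assumption made for the example: the class name `DeliverableIdMap`, the generated `t1`, `t2`, ... ID scheme, the `{id}/{elementid}` delivered address form, and the file names. No existing DITA processor is being described.

```python
from itertools import count


class DeliverableIdMap:
    """Maps (source document URI, topic ID) pairs to IDs unique in one deliverable."""

    def __init__(self, prefix: str = "t"):
        self._prefix = prefix
        self._counter = count(1)
        self._by_source = {}  # (doc_uri, topic_id) -> generated deliverable ID

    def assign(self, doc_uri: str, topic_id: str) -> str:
        """Return the deliverable ID for a topic, generating one on first use."""
        key = (doc_uri, topic_id)
        if key not in self._by_source:
            self._by_source[key] = f"{self._prefix}{next(self._counter)}"
        return self._by_source[key]

    def resolve(self, doc_uri: str, fragment: str) -> str:
        """Map an authored {topicid}/{elementid} fragment to a delivered address."""
        topic_id, _, element_id = fragment.partition("/")
        target = self._by_source[(doc_uri, topic_id)]  # KeyError if never assigned
        return f"{target}/{element_id}" if element_id else target


ids = DeliverableIdMap()
# Two different documents whose topics both use the ID "topicid".
first = ids.assign("topics/install.dita", "topicid")
second = ids.assign("topics/config.dita", "topicid")
print(first, second)                                         # t1 t2
print(ids.resolve("topics/config.dita", "topicid/step3"))    # t2/step3
```

    The lookup key is the (document address, topic ID) pair, so identical topic IDs in different source documents never collide in the deliverable.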
    So again, addressability is assured as long as the processor generating the output generates unique IDs for any addressable things put into the deliverable and maintains the source-to-deliverable address mapping.

    If you need to do cross-deliverable addressing then you need to have a mapping from the locations (not just IDs) of the things as authored to the locations of the things as delivered. That mapping could be managed in many ways, but the current cross-deliverable proposal does it through the use of keys and intermediate key definition sets that map the keys as used in the content as authored to the locations of the key-bound resources in the deliverable. That is sufficient to support the requirement for addressability.

    In addition, the @copy-to attribute on <topicref> gives authors additional control over deliverable addresses by allowing the assignment of new virtual source storage object locations ("filenames") for distinct references to the same topic or map. That doesn't remove the requirement for source-to-deliverable address mapping, but it means that authors may influence the details of the result.

    The DITA 1.2 spec doesn't say anything about topic ID uniqueness because it doesn't need to. Topic IDs don't need to be unique, except as already required by XML rules.

    It can be a *convenience* to assign unique IDs to the topics under your control, but there is no way that any agency short of the divine can ensure global ID uniqueness unless we mandate the use of a specific UUID generator. By the same token, there's nothing wrong with making your topic IDs globally unique if you want to; it's just not necessary and could be a waste of effort. Or it could be a useful simplifying strategy. A typical use case might be to make topic IDs be object IDs of topics managed in a component content management system.
    That's fine as long as everyone is clear that these IDs can at best be unique within the scope of that one component content management system instance (even if you're using some sort of UUID generator there's always the chance, however remote, that somebody might randomly choose the same ID for one of their topics).

    Cheers,

    Eliot

    --
    Eliot Kimber
    Senior Solutions Architect, RSI Content Solutions
    "Bringing Strategy, Content, and Technology Together"
    Main: 512.554.9368
    www.rsicms.com
    www.rsuitecms.com
    Book: DITA For Practitioners, from XML Press, http://xmlpress.net/publications/dita/practitioners-1/


  • 3.  RE: [dita] Unique topic ids in the cross publication or global CMS use case

    Posted 06-16-2013 17:14
    Thanks Eliot

    That is pretty clear.

    The text below, from your PowerPoint, is crystal clear. In proposal 13041 you have to work harder to get it: you define location as authored and location as delivered, but the PowerPoint is clearer.

    I think DITA 1.3 needs to express something about "no requirement of topicids to have global uniqueness" because, as you say, there is a misconception on this and we are not a strictly normative spec.

    ******************** your text from powerpoint ***************

    Addressing within the content as authored:
        Defined by the source format, e.g., DITA XML
        For XML source, should be independent of any given output format
        DITA defines the rules for addressing within DITA XML

    Addressing from the publication as delivered:
        Defined by the delivery format: PDF, HTML, EPUB, etc.
        No single standard
        Details may be proprietary

    ************************************************************


  • 4.  Re: [dita] Unique topic ids in the cross publication or global CMS use case

    Posted 06-16-2013 18:31
    Thanks, it's a difficult subject to explain clearly. I will definitely see what I can do as I prepare the final Stage 3 version of the proposal.

    Cheers,

    E.