I spent some time over the holiday weekend reviewing chunking. The problem, as I see it, isn’t with the topic, per se; it’s the fact that the chunking attribute itself is extremely problematic. The default tokens are vague and difficult to remember, its functionality is based on assumptions that don’t apply for all processors, and most importantly, it is so overloaded as to be almost indescribable.

Fundamentally, it’s responsible for two use cases:

1. Customizing the behavior of references to subsets of compound topics/ditabases.
2. Combining content referenced by ‘this’ topicref and child topicrefs into a single output chunk.

The spec breaks this down further, into three things:

A. Selecting topics (select-*)
B. Splitting of those topics into chunks (by-*)
C. ‘Rendering’ the map branch (to-*)

I’d argue that (A) and (B) are different aspects of use case (1), and (C) is use case (2), though you wouldn’t know it from the current spec language. It’s not quite that clean, though, because the to-* tokens, as far as I can tell, play double duty: they control the combining of child topics, and they also affect the results of the selection performed by select-*. For example, when I reference ditabase.dita#TopicC:

* By default (with the OTK), the navigation points to TopicC, but all of ditabase.dita is rendered as a single chunk.
* chunk="by-topic" or chunk="select-topic to-content" will extract only TopicC into its own chunk.
* chunk="select-branch to-content" will extract TopicC and its children into a single chunk.
* chunk="select-branch by-topic" will extract TopicC and its children, each to its own chunk (though specifying ‘to-content’ appears to override ‘by-topic’).

And so on. I’ve actually started putting together a cheat-sheet based on trial and error, because there’s no way I can keep all the different combinations in my head.
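As a rough sketch of what such a cheat-sheet might look like, the map fragment below puts the four variants listed above side by side. The comments only restate the behavior reported above from trial and error with the OTK; they are not normative spec statements, and other processors may differ. The four references are stacked in one map purely for comparison, which a real map would probably not do.

```xml
<map>
  <title>Chunk combinations for ditabase.dita#TopicC (sketch)</title>

  <!-- No chunk attribute. Reported OTK default: navigation points to
       TopicC, but all of ditabase.dita is rendered as a single chunk. -->
  <topicref href="ditabase.dita#TopicC"/>

  <!-- Reported result: only TopicC is extracted, into its own chunk.
       chunk="by-topic" was observed to behave the same way. -->
  <topicref href="ditabase.dita#TopicC" chunk="select-topic to-content"/>

  <!-- Reported result: TopicC and its children, as one chunk. -->
  <topicref href="ditabase.dita#TopicC" chunk="select-branch to-content"/>

  <!-- Reported result: TopicC and its children, each in its own chunk. -->
  <topicref href="ditabase.dita#TopicC" chunk="select-branch by-topic"/>
</map>
```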
Meanwhile, the only values that really matter when dealing with a parent trying to combine/split its children are the to-* tokens, to-navigation and to-content. The to-content value combines the topicref and its children into one chunk, to-navigation... doesn’t. I’m frankly mystified as to what to-navigation is supposed to do, and I’ve been at this for hours. The spec isn’t much help. It says something about “navigation chunks” but never really defines what that means, except in a parenthetical that I’m having trouble making sense of.

Re: default behavior. The spec more or less explicitly states that there is no spec-mandated default behavior, so a processor is free to chunk using the select-branch to-content algorithm suggested by Noz, and I know of at least one implementation that does. Sort of. (Arbortext selects the branch and throws the rest away, but from there, chunking/ToC generation is controlled by the stylesheet.)

I think a lot of the complexity/confusion here stems from the fact that the OTK does its best to have one output chunk per input topic/ditabase file, but the ‘chunk’ attribute allows you to tweak that. The spec is operating on the assumption that all DITA processing attempts to optimize along similar lines (there’s a similar issue with @copy-to), but nowhere does the spec (as far as I know) *mandate* this behavior. I’ve always found that optimization problematic because metadata from the topicref can cascade into the topic, and so it’s very difficult to determine equality between two topicrefs, even when they’re to the same URI. As we introduce more features like scoped keys and branch filtering, this problem will continue to get worse.

Post-1.3, I think we need to start moving away from that implicit one-to-one input topic/output chunk assumption in the spec, and move towards a paradigm where each (non-resource-only local dita) topicref represents its own output unit.
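To make the to-content case described above concrete, here is a minimal sketch of a parent combining its children. The file names (parent.dita, child-a.dita, child-b.dita) are hypothetical, and the comments describe the behavior as characterized in this message rather than anything a particular processor guarantees.

```xml
<map>
  <title>Parent/child combining (sketch)</title>

  <!-- to-content on the parent: the parent topicref and its children
       are combined into one output chunk. -->
  <topicref href="parent.dita" chunk="to-content">
    <topicref href="child-a.dita"/>
    <topicref href="child-b.dita"/>
  </topicref>

  <!-- No chunk attribute: a processor that follows the OTK's
       one-chunk-per-input-file tendency would likely emit three chunks,
       but the spec does not mandate a default. -->
  <topicref href="parent.dita">
    <topicref href="child-a.dita"/>
    <topicref href="child-b.dita"/>
  </topicref>
</map>
```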
So for 1.3, I think we need to revisit the language describing to-navigation. The other spec-specified values are pretty good, taken in isolation; the challenge comes when trying to think through how different combinations of values might affect output, and it’s in the combinations that the real value lies. The existing examples are good, though I’d suggest adding a simple ‘select a branch from a ditabase’ example as #3; #1 is simple chunking, #2 is simple bursting, and then #3 jumps into nested chunking, so a simple branch-selection example might help ease people in. Other than that, though, I’m not sure how much we can do.

Post-1.3, I think we should consider deprecating the ‘chunk’ attribute altogether and replacing it with more fine-grained control attributes. Just off the top of my head (pseudo-DTD):

    <!ATTLIST chunk-replacement
      topic-selection (topic | branch | all)      <!-- DEFAULT 'all' -->
      topic-split     (yes | no)                  <!-- DEFAULT 'no' -->
      topic-merge     (yes | no)                  <!-- DEFAULT 'no' -->
      topic-nav       (per-topic | first-topic)   <!-- DEFAULT 'first-topic' -->
    >

* topic-selection controls what amount of the referenced file is considered the content unit. CDATA and extensible.
* topic-split indicates whether to break up the selected content unit into individual output chunks.
* topic-merge specifies whether to combine the referenced content unit (or pseudo-content-unit for topicheads) with content referenced beneath it into a single chunk. Topic-merge takes precedence over topic-split.
* topic-nav controls whether navigation/ToC entries are generated for nested topics in the logical content unit resulting from topic-selection and topic-merge (and possibly to what depth). As an alternative, we could extend @toc. CDATA and extensible.

I think this enables everything currently possible using the ‘chunk’ attribute, and the specified defaults map to the current default OTK behavior.
It also allows something that I couldn’t get working in the OTK without multiple topicrefs, namely, including a compound topic as a single chunk with multiple TOC entries. Splitting the functions of the chunk attribute each into their own more specific, fine-grained attributes would, I think, make life easier for just about everybody.

Chris

Chris Nitchie
(734) 330-2978
chris.nitchie@oberontech.com
www.oberontech.com <http://www.oberontech.com/>
Follow us: <https://www.facebook.com/oberontech> <https://twitter.com/oberontech> <http://www.linkedin.com/company/oberon-technologies>

From: Kristen James Eberlein <kris@eberleinconsulting.com>
Date: Tuesday, November 5, 2013 at 12:37 PM
To: Noz Urbina <noz.urbina@mekon.com>, "dita@lists.oasis-open.org" <dita@lists.oasis-open.org>
Cc: Mark Poston <mark.poston@mekon.com>, Rob Hanna <rob@infoarchitects.ca>
Subject: [dita] Re: Chunking and Composite Topics

Hi, Noz. (And Mark and Rob by cc)

We talked about this briefly at today's TC meeting. While we cannot make any changes to chunking for DITA 1.3 -- the deadline for new proposals is long past -- I asked for volunteers to review the current content in the spec and make suggestions for improvement.

And I got volunteers; Stan Doherty (Mathworks) and Chris Nitchie (Oberon Technologies) are on the hook for that work :)

Best,
Kris

Kristen James Eberlein
Principal consultant, Eberlein Consulting
Co-chair, OASIS DITA Technical Committee
Charter member, OASIS DITA Adoption Committee
www.eberleinconsulting.com <http://www.eberleinconsulting.com>
+1 919 682-2290; kriseberlein (skype)

On 11/1/2013 8:47 AM, Noz Urbina wrote:

Hello All,

Kristen asked me to submit my recent work on the Chunking and Composite topic functions of DITA. With my colleagues Mark Poston and Rob Hanna, I have been experimenting with using maps to leverage content that has been either created in or converted to composite topics. This email contains an almost-copy-and-paste from our report to the client, but I’d also like to add my own (hastily put together) commentary.

<rant>

I find the chunking attribute syntax vastly overcomplicated. Instead of offering a good default that’s simply achieved, it offers something that’s expensive for vendors to implement and/or difficult to edit by hand. I have worked with the usual main players (FrameMaker, XMetaL, oXygen, Arbortext editor) and none offer any help or special functions around chunking. It’s only an advanced-user feature, and so it doesn’t really help move licenses for people getting started, and it requires quite a lot of UI to make usable.

And the documentation in the spec is just a series of examples that don’t have full XML sets shown, just partial ones with prose description of what should happen on output. Training on the functionality is a nightmare, and I have actually had to look up the spec in a course when asked a question, because the various permutations are so many and the tools do nothing to help.

I would suggest that there are two use cases being addressed with the chunking attribute: one is merging files together, the other is reusing them from files that are used together. This may be overloading the attribute. The merging functionality makes sense, but the reuse/splitting options are rather opaque. I’d suggest that changing some of the default behaviours could make this much easier.
My own take would be: From a map, if you specify a child topic of a multi-topic file, then it’s safe to assume that that’s the topic you want, and not anything above (this is how most things in XML work, so it follows logically). So the default meanings could be:

    <topicref href="noz-test.dita"> = “All topics in this file”
    <topicref href="noz-test.dita#id1a"> = “All topics from topic id1a down”
    <topicref href="noz-test.dita#id1a" chunk="select-topic"> = “topic id1a only”

(although it’s highly debatable whether this should be called using an attribute called “chunk” at all).

In a CCMS that uses IDs, there should be no change; you just split on the # like usual.

I’d suggest that simplifying the parameters passed to @chunk would enable more users to take advantage of it. I’m sure many are, but because of the complexity, the lack of tool support, and the resulting difficulty of use for beginners, I believe many aren’t Googling the spec and learning how to use it.

</rant>

<reportextract>

Reusing topics from a ditabase topic

If one uses chunking and conditions on the topicrefs, then one can conditionally filter topics in and out and rearrange their hierarchy, even though they are stored in ditabase topics.

To reuse a topic from a ditabase topic:

1. Specify the topic id in the map and set the chunking attribute to “to-content select-topic” to insert a single topic, or “to-content select-branch” for a topic and its descendants.

An example is supplied below of a DITAbase-based file being split up and reordered.
File noz-test.dita

    <!DOCTYPE dita PUBLIC "-//OASIS//DTD DITA Composite//EN" "ditabase.dtd">
    <dita>
      <topic id="id1">
        <title>Topic 1</title>
        <body>
          <p>Topic 1.</p>
          <p>Topic 1 has a cross reference to <xref href="#id1a">Topic 1a</xref>.</p>
          <p>Topic 1 has a cross reference to <xref href="#id1b">Topic 1b</xref>.</p>
        </body>
        <topic id="id1a">
          <title>Topic 1a</title>
          <body>
            <p>Topic 1a has a cross reference to <xref href="#id1">Topic 1</xref>.</p>
            <p>Topic 1a has a cross reference to <xref href="#id1b">Topic 1b</xref>.</p>
          </body>
          <topic id="id1b">
            <title>Topic 1b</title>
            <body>
              <p>Topic 1b has a cross reference to <xref href="#id1">Topic 1</xref>.</p>
              <p>Topic 1b has a cross reference to <xref href="#id1a">Topic 1a</xref>.</p>
            </body>
          </topic>
        </topic>
      </topic>
    </dita>

Map

    <!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
    <map>
      <title>DITA Topic Map</title>
      <topicref href="noz-test.dita#id1b" chunk="to-content select-topic">
        <topicref href="noz-test.dita#id1" chunk="to-content select-topic" audience="customerABC"/>
      </topicref>
      <topicref href="noz-test.dita#id1a" chunk="to-content select-topic"/>
      <reltable>
        <relrow>
          <relcell>
            <topicref href="noz-test.dita#id1b"/>
          </relcell>
          <relcell collection-type="sequence">
            <topicref href="noz-test.dita#id1"/>
            <topicref href="noz-test.dita#id1a"/>
          </relcell>
        </relrow>
      </reltable>
    </map>

Note:

* There appears to be a bug in the DITA OT that prevents rendering of topics with mixed topic types. All topics must be of the same type or else the transformation fails. The bug is most likely in the Java extensions in the OT, not the XSLT. If this is the only problem, it should not be particularly difficult to debug. Infineon must decide whether to:
  - Fix the bug.
  - Make topics all the same type (most logically this would be all <topic>) within ditabase files. If this is done, then as users and content are migrated to the new, more modular way of working, the topic types can and should be applied on individual topics.
  - Not reuse below the topic level for now.
* The same limitations on xrefs apply with composite as with regular topics, and the same risks of broken links.

Limitations of composite topic type

* Simplified task is not included in the ditabase DTD. The ditabase DTD requires additional specialization to include simplified task.
* Composite files can only be categorised as a whole in the taxonomy. As they are burst, the topics they contain will have to be categorised after they are created.
* All IDs need to be unique across all topics, not just unique within a topic.
* Additional stylesheet work may be required to achieve publishing features such as mini-tables of contents (or forward organizers).
* Whole assemblies must be versioned with any change to a topic, rather than simply versioning a single topic.
* Topic-type OT bug as described above.

</reportextract>

<thanks>
To you all for your attention.
</thanks>

B. Noz Urbina, Business Development Manager
blog http://lessworkmoreflow.blogspot.com ¦ twitter @nozurbina
e noz.urbina@mekon.com ¦ UK mob +44 (0)7739 522 002 ¦ ES mob +34 625 467 866 ¦ skype nozskype

---------------------------------------------------------------------
To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at:
https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php