Subject: RE: [dita] Groups - DITA 1.1 Issue #45: Add See, See Also indexing elements (IssueNumber45.html) uploaded
While I understand most of what Erik is saying on a theoretical level, I
think we need to keep in mind how most users have thought of and used
indexterms in SGML and XML markup for the past 20 years. And that is,
indexterms are points. Changing that paradigm is going to surprise a lot of
users.
And users are more used to giving what Chris calls "sort-as" when and where
they mark up the indexterm, not in some separate sort-map concept.
So while Erik's ideas might be ivory-tower nice, I fear they are too
different from what users and implementors are familiar with. DITA is
already enough of a learning curve for people--too much, I fear, in many
cases--so I'm hesitant to take such a different track in defining indexterms
than is common in other existing XML markup vocabularies.
paul
Hi, Chris:
Interesting issues...
GENERAL.
Fundamentally, what is an index term? In a topic architecture, I'd submit that
we should regard an index term as a semantic label attached to a unit of
content (such as a phrase, paragraph, list, table, section, topic, or
collection of topics). We should not regard an index term as attached to a
point within a discourse flow because a point doesn't have any
meaning.
The following example
<p>...<indexterm>Application servers</indexterm>...</p>
declares "This paragraph is about application servers."
That's true regardless of where the
index term appears within the paragraph. To indicate that the index term
applies only to a sentence, the writer could wrap a <ph> element around
the indexed sentence. That is, the container of the index term defines the
unit of content that's about application servers.
PAGE RANGES. From that perspective, we shouldn't need start and end markers for
a range. By definition, the container specifies the range for the indexed unit
of content. (For an index marker within a prolog, the effective container is
the topic.)
A formatter might apply the rule that, if the container
spans more than one or two pages (or some threshold controlled by a style
policy), the generated index shows a page range. Otherwise, the formatter
emits the start page for the container.
That way, the writer doesn't
have to maintain page ranges depending on the output. If the writer starts
with an index marker on a section but adds content to the section until it
stretches to three pages (shudder), the writer doesn't have to change the
index marker to start and end markers. If the section fits on a single page
when output as 8 1/2 by 11 but flows over three pages when output as A5 (or
whatever), the writer doesn't have to revise the topic depending on the
output.
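The formatter rule described above (a page range only when the container
spans more than some threshold, otherwise the start page) might be sketched
as follows. This is only an illustration: the function name index_locator and
the threshold parameter are hypothetical, and the default of 2 pages is an
assumption standing in for whatever the style policy would set.

```python
def index_locator(start_page, end_page, threshold=2):
    """Return the page reference a formatter might emit for an indexed container.

    threshold is a hypothetical style-policy setting: a container spanning
    more than this many pages gets a range, a shorter one gets only its
    start page.
    """
    if end_page < start_page:
        raise ValueError("end_page must not precede start_page")
    pages_spanned = end_page - start_page + 1
    if pages_spanned > threshold:
        return f"{start_page}-{end_page}"
    return str(start_page)
```

Because the decision is made from the page numbers at format time, the same
index marker yields a single page for one page size and a range for another
without any change to the topic.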
In the implementation, during the topic merge phase, the
preprocessor could insert processing instructions at the start and end of the
container if convenient for easy processing of the range.
If you find
yourself wanting to index a range of content that's a subset of a container,
you should ask yourself whether the content merits a container. That is,
requiring that semantic units have containers is consistent with the
topic-oriented approach of assembling larger structures from small, granular,
typed units of content.
In passing, the same ambiguity that came up
for the <data> element rears its ugly head here. If I put an index
marker within a topicmeta for a topicref, should the range be the referenced
topic or the entire branch of the map? Do we need a systematic way to
distinguish the properties of the referenced topic from the properties of the
referencing collection?
SEE vs SEE ALSO. I'm wondering if we
could produce both outputs correctly from a single element that expresses
synonyms for index terms. As I understand the publishing convention for "see"
and "see also," the correct tag depends on which terms have instances:
- If both the source term and target term for the synonym have instances,
the formatter should generate a "see also" on the source.
- If only the target term for the synonym has instances, the formatter
should generate a "see" on the source.
- If the target term for the synonym doesn't have instances, the formatter
should ignore the synonym (and potentially generate a
warning).
In other words, the same synonym might be a "see" or
"see also" or nothing, depending on whether the aggregating map has assembled
topics that have instances of the source and target term.
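The three cases above could be resolved at output time along these lines.
This is a rough sketch under assumptions: resolve_synonym and its arguments
are made-up names, and terms_with_instances stands for whatever set of index
terms the formatter has collected from the topics assembled by the map.

```python
def resolve_synonym(source, target, terms_with_instances):
    """Decide how a synonym from source to target appears in the generated index.

    terms_with_instances is the set of index terms that actually occur in
    the assembled topics. Returns "see also", "see", or None when the
    synonym should be ignored.
    """
    if target not in terms_with_instances:
        # Nothing to point at; a formatter might also emit a warning here.
        return None
    if source in terms_with_instances:
        return "see also"
    return "see"
```

The same synonym definition can therefore come out differently from one map
to the next, depending only on which terms have instances in that build.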
GLOBAL SORTS AND SYNONYMS. I'd submit that it should be possible to declare sort
keys and see / see also synonyms as global definitions rather than definitions
associated with specific instances.
After all, what if an index term
has a sort key in one instance and either no sort key or a different sort key
in another instance of the index term?
Also, when the output is
generated, a see / see also synonym applies to every instance of the index
term rather than to a specific instance. Finally, the most typical reason for
defining synonyms is to identify related content. Because the map controls the
assembly of content, synonyms would sensibly be an aspect of
assembly.
Perhaps it would make sense to define sorts and synonyms
within the <keywords> element. That way, the common case (global
definitions of sorts and synonyms) is easy, the edge case (content that
requires sorts or synonyms) is awkward but possible, and index terms embedded
within content don't provide a bulky distraction from the discourse
flow.
Maybe something like the following:
<map>
  <topicmeta>
    <keywords>
      ...
      <index-definitions>
        <!-- sort applied at any level -->
        <sort-term>
          <term>The Jabberwocky</term>
          <sort-as>
            <term>Jabberwocky</term>
          </sort-as>
        </sort-term>
        ...
        <index-synonym>
          <indexterm>The Jabberwocky
            <indexterm>habitat of</indexterm>
          </indexterm>
          <!-- maybe an optional attribute to enable a bidirectional synonym? -->
          <index-related>
            <indexterm>Travel destinations</indexterm>
          </index-related>
        </index-synonym>
        ...
      </index-definitions>
    </keywords>
  </topicmeta>
  ...
</map>
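To make the idea of global sort definitions concrete, here is a small sketch
of how a processor might read the sort keys once and apply them to every
instance of a term. It assumes the element names proposed in the markup above
(index-definitions, sort-term, sort-as), which are not part of any DITA DTD;
the helper names load_sort_keys and sort_index are likewise invented for the
example, and only the sort-term part of the proposal is covered.

```python
import xml.etree.ElementTree as ET

# A cut-down copy of the proposed markup; these elements are hypothetical.
DEFINITIONS = """
<index-definitions>
  <sort-term>
    <term>The Jabberwocky</term>
    <sort-as><term>Jabberwocky</term></sort-as>
  </sort-term>
</index-definitions>
"""

def load_sort_keys(xml_text):
    """Map each index term label to its globally declared sort key."""
    root = ET.fromstring(xml_text)
    keys = {}
    for sort_term in root.iter("sort-term"):
        label = sort_term.findtext("term")
        sort_as = sort_term.findtext("sort-as/term")
        if label and sort_as:
            # Collapse whitespace so line-wrapped labels still match.
            keys[" ".join(label.split())] = " ".join(sort_as.split())
    return keys

def sort_index(terms, sort_keys):
    """Order terms by their declared sort key, falling back to the label."""
    return sorted(terms, key=lambda t: sort_keys.get(t, t).casefold())

sort_keys = load_sort_keys(DEFINITIONS)
ordered = sort_index(
    ["Travel destinations", "The Jabberwocky", "Application servers"],
    sort_keys,
)
```

Since the key lives in one place, every instance of "The Jabberwocky" sorts
under J, and the question of conflicting per-instance sort keys doesn't come
up.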
In passing, the keyref proposal
(#40) should make it possible to index the topic content but assign the labels
to those index terms in the map. Producing a good index often requires
adjusting the labels based on the labels of the other indexed content. Having
to go back into the content to align index terms is an enormous pain and an
inhibitor for reuse -- especially if you'd like to freeze the content but
perform final production on the index.
MISCELLANEOUS. I'd
agree with Paul that, with <indexterm> (as with <section>),
there's an implied structure on the content that can only be validated by an XML
parser when the grammar can impose constraints on mixed content models.
Regarding linking, a generated index in HTML or PDF output should have links
to the instances of the index terms. I suppose the instances of a term could
sensibly link to one another as a convenience (if the hotspot isn't too
distracting).
What do you think?
Erik Hennum
ehennum@us.ibm.com
"Chris Wong" <cwong@idiominc.com> wrote on 09/28/2005 09:15:28 AM:
> I'm kind of surprised to see no questions or objections so far to
> this proposal. I hear that people can have strong opinions about
> this subject. I'd like to see any debate get underway so we will
> have time to move this issue forward. Anyone?