OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  Index Marks

    Posted 05-05-2003 17:04
    Hello all,
    
    one of my action items was to look at the different types of index 
    marks, and to see whether they could be combined.
    
    There are four types of index marks in the base spec:
    1) bibliography
    2) toc (table-of-content)
    3) alphabetical-index
    4) user-defined
    
    To make things short, user-defined indices are a kind of souped-up table 
    of content, but the others are different in functionality.
    
    
    Bibliography marks are conceptually different from all other index 
    marks: They contain data for a bibliographic reference. They don't mark 
    arbitrary text, but rather display a suitable (and configurable) 
    identifier for their bibliographic reference. In many ways, they are 
    more like fields, with the specific function that an index over all 
    'fields' of this type can be constructed.
    
    
    A table-of-content lists content elements in document order. This 
    usually contains all headers in the document, but it can also (or 
    instead) include regions of text marked for toc-inclusion. This is what 
    toc-marks are used for. The index contains toc elements in document 
    order, and there is a one-to-one relationship between toc elements and 
    toc entries.
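
    As an illustration of the document-order rule, here is a minimal 
    sketch in Python. It is not taken from the spec: the TocElement class, 
    its fields, and the build_toc function are hypothetical names, and a 
    plain integer position stands in for the element's place in the 
    document.

```python
from dataclasses import dataclass


@dataclass
class TocElement:
    """Hypothetical record for one heading or one toc-marked region."""
    position: int       # assumed: offset of the element in the document
    text: str
    outline_level: int
    is_heading: bool    # True for a heading, False for a toc mark


def build_toc(elements, use_headings=True, use_marks=True):
    """Return one (level, text) entry per selected element, in document order."""
    selected = [
        e for e in elements
        if (e.is_heading and use_headings) or (not e.is_heading and use_marks)
    ]
    # Document order, not alphabetical order, decides where an entry appears.
    selected.sort(key=lambda e: e.position)
    return [(e.outline_level, e.text) for e in selected]
```

    Each selected element yields exactly one entry, which is the 
    one-to-one relationship described above. Switching off use_headings 
    gives the "instead" case, where only marked regions are listed.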
    
    
    An alphabetical index lists content elements in alphabetical order. 
    Several such elements can be grouped together. A group of elements can 
    have a main entry, and group members can be discerned by secondary 
    keywords. To support languages where alphabetical sorting is impractical 
    (due to rather large alphabets), the index values and keywords can be 
    given in a phonetic spelling, which is then used to sort the entries.
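
    The phonetic fallback can be sketched as follows. This is a minimal 
    Python illustration under stated assumptions, not the spec's 
    algorithm: AlphaMark, sort_key and build_index are hypothetical names, 
    only the first keyword is modelled, and plain code-point comparison 
    stands in for real locale-aware collation.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import Optional


@dataclass
class AlphaMark:
    """Hypothetical in-memory form of one alphabetical index mark."""
    string_value: str
    key1: Optional[str] = None
    string_value_phonetic: Optional[str] = None
    key1_phonetic: Optional[str] = None


def sort_key(mark: AlphaMark) -> tuple:
    # Prefer the phonetic spelling when one is given; otherwise use the text.
    group = mark.key1_phonetic or mark.key1 or ""
    entry = mark.string_value_phonetic or mark.string_value
    return (group, entry)


def build_index(marks):
    """Return [(key1, [string_value, ...]), ...] sorted by sort_key."""
    ordered = sorted(marks, key=sort_key)
    # Assumes marks sharing a key1 also share its phonetic spelling,
    # so that they end up adjacent after sorting.
    return [
        (key, [m.string_value for m in members])
        for key, members in groupby(ordered, key=lambda m: m.key1 or "")
    ]
```

    With phonetic values supplied, entries are ordered by their reading 
    rather than by the code points of the displayed text, while the 
    displayed text is still what ends up in the index.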
    
    
    A user-defined index is mostly like a table-of-content, except there can 
    be several of them (with distinct markers), and the indices (not the 
    index marks) have some options that aren't traditionally used with tocs. 
    Essentially, one could consider a toc as a special user-defined index.
    
    
    Common to TOC/user/alphabetical marks:
    - empty stand-alone elements, or -start/-end pairs
    - text:id             [match -start and -end elements]
    - text:string-value   [for empty elements, if index text differs from
                            marked text]
       text:string-value-phonetic  [string-value, for phonetic sorting]
    
    Specific attributes for TOC/user/alphabetical:
    
    toc:
    - text:outline-level  [the level this entry appears in index]
    
    user:
    - text:outline-level  [like toc]
    - text:index-name     [there can be several user-defined indices]
    
    alphabetical:
    - text:key1           [keyword]
       text:key1-phonetic  [key1, for phonetic sorting]
    - text:key2           [supplementary keyword]
       text:key2-phonetic  [key2, for phonetic sorting]
    - text:main-entry     [one of several identical entries can be declared
                            main entry]
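
    The attribute sets listed here can be summarised as a lookup table. 
    The following Python sketch is only an illustration: the names COMMON, 
    SPECIFIC, allowed_attributes and unexpected_attributes are 
    hypothetical, and the short type names "toc", "user" and 
    "alphabetical" are shorthand for the three mark types rather than 
    values taken from the spec.

```python
# Attributes shared by toc, user-defined and alphabetical marks.
COMMON = {"text:id", "text:string-value", "text:string-value-phonetic"}

# Attributes specific to each mark type.
SPECIFIC = {
    "toc": {"text:outline-level"},
    "user": {"text:outline-level", "text:index-name"},
    "alphabetical": {
        "text:key1", "text:key1-phonetic",
        "text:key2", "text:key2-phonetic",
        "text:main-entry",
    },
}


def allowed_attributes(mark_type: str) -> set:
    """Attributes a mark of the given type may carry."""
    return COMMON | SPECIFIC[mark_type]


def unexpected_attributes(mark_type: str, attributes: dict) -> set:
    """Attribute names present on a mark but not listed for its type."""
    return set(attributes) - allowed_attributes(mark_type)
```

    Written this way, allowed_attributes("user") differs from 
    allowed_attributes("toc") only by text:index-name, while the 
    alphabetical set shares nothing with either beyond the common 
    attributes.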
    
    
    So, combine or not combine? Well, one could combine them 'by force', 
    i.e. make a single element with a type attribute and then make the other 
    attributes type-dependent. This doesn't really make sense to me. What 
    could make sense is to combine the toc mark and the user-defined mark. 
    The only problem I have with that is that a table-of-content is 
    something I would consider to have semantics, while a user-defined index 
    doesn't. So the trade-off appears to be a somewhat more elegant format 
    definition vs. somewhat more semantic content.
    
    Opinions?
    
    
    Sincerely,
    Daniel