docbook-apps

  • 1.  Duplicate index entries to the same page

    Posted 09-05-2006 10:24
    With the standard Docbook generated index (for FO), there is no provision for eliminating duplicate identical references to the same page, resulting in index entries which look like this:
    Command Line Interface, 3, 17, 17, 23
    which is not very appealing. The problem is, of course, that there is no absolute way in XSLT to avoid this, since the actual page numbers are generated by the rendering engine after the XSLT processing has already finished. There are solutions which involve a two-pass process, but these may be too involved for many users.

    I have developed a partial solution (not for DocBook, as of yet) in which the @id on which the page numbers in the index are based is not the id of the indexterm element itself, but rather that of the closest ancestor section or subsection. While collecting all the index entries from the document, a check is made whether the same entry is already listed under that section; if so, the new entry is ignored. If the sections are roughly one page in size, this procedure should eliminate most of the duplicate entries (if the sections are too large, then even entries on different pages will be eliminated, which is not the desired behavior). A collateral benefit is that the link from the index leads to the beginning of the section which contains the indexterm, which will often be more useful than being plunked down in the middle of a paragraph.

    Have any other users been bothered by this problem? If there is demand, I can update the templates accordingly. (It would also be possible to introduce a parameter which determines whether the user wants the old kind of index, with no attempt to remove duplicate entries [for example, for a document which has very large sections], or the new kind.)
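
    To make the section-based approach above concrete, here is a rough sketch in Python rather than XSLT. It is only an illustration under stated assumptions: the function name, the set of section element names, and the sample document are hypothetical and are not taken from the actual templates. Each indexterm is keyed on its primary term plus the id of the closest ancestor section, and a repeated key is dropped.

```python
import xml.etree.ElementTree as ET

# Assumption: which elements count as a "section or subsection".
SECTION_TAGS = {"section", "sect1", "sect2", "sect3"}


def collect_index_entries(root):
    """Return (term, section_id) pairs, at most one per term per section."""
    # ElementTree has no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    seen = set()
    entries = []
    for indexterm in root.iter("indexterm"):
        term = (indexterm.findtext("primary") or "").strip()
        # Walk up to the closest ancestor section that carries an id.
        node = parent_of.get(indexterm)
        while node is not None and not (
            node.tag in SECTION_TAGS and node.get("id")
        ):
            node = parent_of.get(node)
        if node is None:
            continue  # no enclosing section with an id; skipped in this sketch
        key = (term, node.get("id"))
        if key not in seen:  # same term already listed under this section?
            seen.add(key)
            entries.append(key)
    return entries


SAMPLE = """
<chapter id="ch1">
  <section id="s1">
    <para><indexterm><primary>Command Line Interface</primary></indexterm>One.</para>
    <para><indexterm><primary>Command Line Interface</primary></indexterm>Two.</para>
  </section>
  <section id="s2">
    <para><indexterm><primary>Command Line Interface</primary></indexterm>Three.</para>
  </section>
</chapter>
"""

entries = collect_index_entries(ET.fromstring(SAMPLE))
print(entries)
# [('Command Line Interface', 's1'), ('Command Line Interface', 's2')]
```

    The two references inside section s1 collapse into one, while the reference in s2 survives. If s1 spanned several pages, the second reference would be lost even if it fell on a different page, which is the limitation noted above.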
    David Zalkin
    Technology Consultant
    Tech-Tav Documentation Ltd.
    +972 (0)57 313 8506


  • 2.  Re: [docbook-apps] Duplicate index entries to the same page

    Posted 09-05-2006 14:13
    Dovid Zalkin wrote:

    > With the standard Docbook generated index (for FO), there is no
    > provision for eliminating duplicate identical references to the same
    > page, resulting in index entries which look like this:
    >
    > Command Line Interface, 3, 17, 17, 23
    >
    > which is not very appealing. The problem is, of course, that there is
    > no absolute way in XSLT to avoid this, since the actual page numbers
    > are generated by the rendering engine, after the XSLT processing is
    > already finished. There are solutions which involve a two-pass
    > process, which may be too involved for many users. I have developed a
    > partial solution (not for Docbook, as of yet) in which the @id on
    > which the page numbers in the index are based is not the id of the
    > indexterm element itself, but rather that of the closest ancestor
    > section or subsection. In the process of collecting all the index
    > entries from the document, a check is made if the same entry is
    > already listed under that section; if so, the new entry is ignored.
    > If the sections are roughly one page in size, this procedure should
    > eliminate most of the duplicate entries (if the sections are too
    > large, then even entries on different pages will be eliminated, which
    > is not the desired behavior). A collateral benefit of this is that
    > the link from the index will lead to the beginning of the section
    > which contains the indexterm, which will often be more useful than
    > being plunked down in the middle of a paragraph. Have any other users
    > been bothered by this problem? If there is a demand, I can update the
    > templates accordingly. (It would also be possible to introduce a
    > parameter which would determine if the user wants the old kind of
    > index, without trying to remove duplicate entries [for example, for a
    > document which has very large sections] or the new kind).

    The question is whether it is worth adding new complexity to the
    indexing stylesheets when there is no problem with duplicates if you are
    using XEP or XSL Formatter
    (http://www.xml.com/pub/a/2004/07/14/dbndx.html?page=2), because those FO
    processors have extensions for dealing with duplicates.

    The only problem is with FOP. But to be honest, the older version 0.20.5
    is largely unusable for high-quality printed output, and the newer
    version 0.90.x tries to implement XSL 1.1. Version 1.1 offers a standard
    way of removing duplicates. I'm not sure whether this is already
    implemented in FOP, but it might be easier to implement the XSL 1.1
    functionality in the stylesheets than to depend on a solution which has
    many unrealistic prerequisites, such as short one-page sections.
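
    For illustration only, the kind of cleanup that has to happen on the formatter side, once real page numbers are known, can be sketched as below. In XSL 1.1 this is expressed with fo:index-page-citation-list and fo:index-key-reference, plus properties such as merge-sequential-page-numbers. The Python function and its parameter name are hypothetical, and the range-merging rule is a simplifying assumption, not the algorithm of any particular FO processor.

```python
def merge_page_numbers(pages, merge_sequential=True):
    """Format resolved page numbers for one index entry without duplicates."""
    unique = sorted(set(pages))  # drop repeated references to the same page
    if not merge_sequential:
        return ", ".join(str(p) for p in unique)

    parts = []
    start = prev = None
    for page in unique:
        if start is None:
            start = prev = page
        elif page == prev + 1:
            prev = page  # extend the current run of consecutive pages
        else:
            # Close the current run; a run of one page is printed alone.
            parts.append(str(start) if start == prev else f"{start}-{prev}")
            start = prev = page
    if start is not None:
        parts.append(str(start) if start == prev else f"{start}-{prev}")
    return ", ".join(parts)


print(merge_page_numbers([3, 17, 17, 23]))  # 3, 17, 23
```

    This only works because the page numbers are already resolved, which is why it belongs in the FO processor (or in a second pass) and not in the XSLT stylesheet itself.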

    Jirka


    --
    ------------------------------------------------------------------
    Jirka Kosek e-mail: jirka@kosek.cz http://www.kosek.cz
    ------------------------------------------------------------------
    Professional training and consulting in XML technologies.
    Have a look at our newly launched website http://DocBook.cz
    Detailed overview of training courses http://xmlguru.cz/skoleni/
    ------------------------------------------------------------------
    Upcoming training dates:
    ** XSLT 23.-26.10.2006 ** XML Schemas 13.-15.11.2006 **
    ** DocBook 11.-13.12.2006 ** XSL-FO 11.-12.12.2006 **
    ------------------------------------------------------------------
    http://xmlguru.cz Blog mostly about XML for English readers
    ------------------------------------------------------------------