docbook-apps

  • 1.  chunking very slow

    Posted 05-28-2013 01:39
    I have embarked on converting the PostgreSQL documentation from DocBook
    SGML + DSSSL to XML + XSLT. The problem is that the XSLT build to chunked
    HTML is very, very slow, even with the fast chunking stylesheet.

    To give you an estimate of the size, the PostgreSQL documentation is
    about 2500 pages in PDF.

    Building with OpenJade and the DocBook DSSSL stylesheets takes about 3
    minutes.

    Building with xsltproc and the xhtml/chunkfast.xsl stylesheet without
    customization takes about 18 minutes. (With customizations to match
    what I had with DSSSL, it takes longer still.)

    But using xhtml/docbook.xsl to make one big HTML output file takes only
    4 minutes. So chunking accounts for roughly 14 of the 18 minutes.
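    To repeat this comparison, or later ones with individual features switched
    off, a minimal Python timing harness may help. It is a sketch that assumes
    xsltproc is on the PATH; the helper names xsltproc_command and time_build
    are hypothetical, and each params entry is passed to xsltproc as
    --stringparam NAME VALUE.

```python
import subprocess
import time
from typing import Dict, List, Optional, Sequence


def xsltproc_command(stylesheet: str, document: str,
                     params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an xsltproc command line; each params entry becomes
    --stringparam NAME VALUE, placed before the stylesheet and document."""
    argv = ["xsltproc"]
    for name, value in (params or {}).items():
        argv += ["--stringparam", name, value]
    argv += [stylesheet, document]
    return argv


def time_build(argv: Sequence[str], runs: int = 1) -> float:
    """Run the command `runs` times and return the fastest wall-clock time
    in seconds; raises CalledProcessError if the command fails."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        # Discard stdout: a single-file build writes its HTML there.
        subprocess.run(list(argv), check=True, stdout=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best
```

    For example, timing xsltproc_command("xhtml/chunkfast.xsl", "postgres.xml")
    against xsltproc_command("xhtml/docbook.xsl", "postgres.xml") repeats the
    chunked vs. single-file comparison. Here postgres.xml is a hypothetical
    stand-in for the real master document, and the stylesheet paths are
    assumed to be relative to the DocBook XSL distribution. Passing
    {"chapter.autolabel": "0"} times the chunked build with chapter numbering
    turned off. Taking the best of several runs reduces noise from disk
    caching.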

    The profile supports this:

    number  match     name                       mode                       Calls  Tot 100us     Avg
         0  *                                    recursive-chunk-filename  110411   67768503     613
         1  footnote                             footnote.number               32    6365266  198914
         2  chapter                              label.markup                4238    4403825    1039
         3  appendix                             label.markup                1103    4345592    3939
         4            gentext.template                                     590443    3319394       5
         5            html.head                                              1115    2676209    2400
         6            l10n.language                                        486089    2070452       4
         7            href.target                                           23893    1255333      52
         8            chunk                                                368522    1119619       3
         9            gentext.template.exists                              450077     957036       2
        10            inherited.table.attribute                            110883     784757       7
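    A short Python sketch that restates the profile rows and checks how the
    columns relate: in every row the printed Avg equals Tot divided by Calls,
    rounded down. Avg therefore ranks per-call cost, while Tot ranks aggregate
    cost. The values are copied from the profile; the helper names are
    hypothetical, and times stay in the profiler's own units rather than
    being converted to seconds.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    """One row of the xsltproc profile; empty strings mark unused columns."""
    match: str   # match pattern ("" for named templates)
    name: str    # template name ("" for match templates)
    mode: str    # mode ("" if none)
    calls: int   # "Calls" column
    total: int   # "Tot 100us" column, in the profiler's time units
    avg: int     # "Avg" column as printed


PROFILE: List[Row] = [
    Row("*", "", "recursive-chunk-filename", 110411, 67768503, 613),
    Row("footnote", "", "footnote.number", 32, 6365266, 198914),
    Row("chapter", "", "label.markup", 4238, 4403825, 1039),
    Row("appendix", "", "label.markup", 1103, 4345592, 3939),
    Row("", "gentext.template", "", 590443, 3319394, 5),
    Row("", "html.head", "", 1115, 2676209, 2400),
    Row("", "l10n.language", "", 486089, 2070452, 4),
    Row("", "href.target", "", 23893, 1255333, 52),
    Row("", "chunk", "", 368522, 1119619, 3),
    Row("", "gentext.template.exists", "", 450077, 957036, 2),
    Row("", "inherited.table.attribute", "", 110883, 784757, 7),
]


def label(row: Row) -> str:
    """Identify a template by its name, or by match pattern plus mode."""
    return row.name or f"{row.match} (mode {row.mode})"


def avg_is_total_over_calls(row: Row) -> bool:
    """True if the printed Avg equals Tot divided by Calls, rounded down."""
    return row.total // row.calls == row.avg


def share_of_listed(row: Row, rows: List[Row]) -> float:
    """Fraction of the combined Tot of `rows` that falls on `row`."""
    return row.total / sum(r.total for r in rows)


# Rank by per-call cost and show each row's share of the listed total.
for row in sorted(PROFILE, key=lambda r: r.avg, reverse=True):
    print(f"{label(row):40} avg={row.avg:>7} "
          f"share={share_of_listed(row, PROFILE):6.1%}")
```

    Ranked this way, recursive-chunk-filename dominates the combined Tot of
    the listed rows (about 71%), mostly through its 110411 calls. By contrast,
    footnote.number is called only 32 times but has by far the highest Avg.
    This fits the idea that numbering features are what make chunking slow.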

    Does anyone have any ideas how to improve this (other than by turning
    off features such as chapter numbering, as suggested by the profile)?




  • 2.  Re: [docbook-apps] chunking very slow

    Posted 05-28-2013 07:06
    On 28.5.2013 3:38, Peter Eisentraut wrote:

    > Building with xsltproc and the xhtml/chunkfast.xsl stylesheet without
    > customization takes about 18 minutes. (With customizations to match
    > what I had with DSSSL, it takes longer still.)

    I'm not sure whether chunkfast.xsl is really that much faster than chunk.xsl.
    But in your case I would try and benchmark Saxon instead of xsltproc. On
    larger documents and complex transforms it's usually much faster.

    Jirka

    --
    ------------------------------------------------------------------
    Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
    ------------------------------------------------------------------
    Professional XML consulting and training services
    DocBook customization, custom XSLT/XSL-FO document processing
    ------------------------------------------------------------------
    OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 rep.
    ------------------------------------------------------------------
    Bringing you XML Prague conference http://xmlprague.cz
    ------------------------------------------------------------------




  • 3.  Re: [docbook-apps] chunking very slow

    Posted 05-29-2013 04:30

    On 05/28/2013 02:06 AM, Jirka Kosek wrote:
    > On 28.5.2013 3:38, Peter Eisentraut wrote:
    >
    >> Building with xsltproc and the xhtml/chunkfast.xsl stylesheet
    >> without customization takes about 18 minutes. (With
    >> customizations to match what I had with DSSSL, it takes longer
    >> still.)
    >
    > I'm not sure whether chunkfast.xsl is really that much faster than
    > chunk.xsl. But in your case I would try and benchmark Saxon instead
    > of xsltproc. On larger documents and complex transforms it's
    > usually much faster.

    Maybe try the DocBook XSLT 2.0 stylesheets [1] with Saxon 9.x for
    comparison as well.

    David

    [1] https://github.com/docbook/xslt20-stylesheets



  • 4.  Re: [docbook-apps] chunking very slow

    Posted 05-30-2013 12:11
    On 5/29/13 12:29 AM, David Cramer wrote:
    > Maybe try the DocBook XSLT 2.0 stylesheets [1] with Saxon 9.x for
    > comparison as well.

    Thank you for that suggestion. I tried for a couple of hours to set
    this up and failed, which means most of my co-developers will fail too.
    It reminds me of the dark ages around 2000, when you had to assemble the
    SGML/DSSSL toolchain yourself. By moving to XML and (probably) xsltproc,
    I'm also hoping for a very simple installation of well-documented
    components, which the above is not (yet). Maybe it's the future, but I
    need to move to the present first. ;-)





  • 5.  Re: [docbook-apps] chunking very slow

    Posted 05-30-2013 12:07
    On 5/28/13 3:06 AM, Jirka Kosek wrote:
    > I'm not sure whether chunkfast.xsl is really that much faster than chunk.xsl.

    It is, but only by 10% or so.

    > But in your case I would try and benchmark Saxon instead of xsltproc. On
    > larger documents and complex transforms it's usually much faster.

    Saxon 6.5 was twice as slow for me.

    After some more digging, I think the problem is not the XSLT processor
    but the stylesheets. If I turn off index generation, preface,
    section, chapter, appendix, and footnote numbering, as well as the
    legalnotice link, I get performance comparable to DSSSL. Apparently,
    the numbers are recomputed for every chunk. It would probably be better
    if they were computed just once and cached, not unlike what chunkfast
    does for the chunks themselves. I might take a stab at implementing
    something like that.
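    The following Python sketch (not XSLT) contrasts recomputing numbers for
    every chunk with computing them once and caching them. It uses a
    deliberately simplified, hypothetical model: the document is a flat list
    of chapter ids, each chapter is one chunk, and each chunk needs the
    numbers of itself and its neighbours, as navigation links might.
    Numbering a chapter from scratch walks the chapters before it, much like
    counting preceding siblings. The cached version builds an id-to-number
    table in one pass. All names and the step counter are illustrative; this
    is not the DocBook XSL label.markup code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cost:
    steps: int = 0  # chapter nodes visited so far


def number_from_scratch(chapters: List[str], target: str, cost: Cost) -> int:
    """Number one chapter by walking the chapters up to it, comparable to
    evaluating count(preceding-sibling::chapter) + 1 for that node."""
    for number, chapter_id in enumerate(chapters, start=1):
        cost.steps += 1
        if chapter_id == target:
            return number
    raise KeyError(target)


def neighbours(chapters: List[str], i: int) -> List[str]:
    """Ids whose numbers chunk i needs: previous, itself, next."""
    return chapters[max(i - 1, 0):i + 2]


def labels_per_chunk(chapters: List[str], cost: Cost) -> List[List[int]]:
    """Uncached: every chunk recomputes every number it needs."""
    return [[number_from_scratch(chapters, cid, cost)
             for cid in neighbours(chapters, i)]
            for i in range(len(chapters))]


def labels_cached(chapters: List[str], cost: Cost) -> List[List[int]]:
    """Cached: one pass builds id -> number, then chunks only look up."""
    table: Dict[str, int] = {}
    for number, chapter_id in enumerate(chapters, start=1):
        cost.steps += 1
        table[chapter_id] = number
    return [[table[cid] for cid in neighbours(chapters, i)]
            for i in range(len(chapters))]


# Hypothetical document: 200 chapters, one chunk each.
chapters = [f"ch{n}" for n in range(1, 201)]
slow, fast = Cost(), Cost()
same = labels_per_chunk(chapters, slow) == labels_cached(chapters, fast)
print(f"same labels: {same}, per-chunk steps: {slow.steps}, "
      f"cached steps: {fast.steps}")
```

    With 200 chapters, both versions produce the same labels. The per-chunk
    version visits 60,099 chapter nodes, against 200 for the table, and the
    gap grows with the square of the chapter count. In XSLT, the analogue
    would presumably be a global variable holding the precomputed numbers,
    evaluated once per transformation rather than once per chunk.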