docbook-apps

  • 1.  Japanese index

    Posted 04-24-2018 18:40
    Dear All,

    Does anybody have experience with generating a Japanese back-of-the-book
    index from DocBook source?

    I am facing the same issue discussed in this old thread (all entries end
    up in the Symbols section):
    https://lists.oasis-open.org/archives/docbook-apps/200605/msg00063.html

    If I understand correctly, indices in Japanese should be grouped
    phonetically:
    https://www.slideshare.net/k16shikano/imybp-light

    I've found the promising Kuromoji library: https://github.com/atilika/kuromoji
    I imagine it could be used to pre-process all index entries and generate
    values for the 'sortas' attribute.

    But it is still unclear to me how to tweak the index code to generate
    groups from non-Latin characters.

    Or are there better ways?

    Thanks,

    Jan






  • 2.  Re: [docbook-apps] Japanese index

    Posted 04-24-2018 19:54
    On 24/04/2018 19:39, Jan Tosovsky wrote:
    > has anybody any experience with generating Japanese back-of-the-book index
    > from DocBook source?

    More than 20 years ago.

    > I am facing same issues discussed in this old thread (all entries end up in
    > the Symbols section):
    > https://lists.oasis-open.org/archives/docbook-apps/200605/msg00063.html
    >
    > If I understand correctly, indices in Japanese should be grouped
    > phonetically:
    > https://www.slideshare.net/k16shikano/imybp-light
    >
    > I've found promising Kuromoji library https://github.com/atilika/kuromoji
    > I can imagine it could somehow pre-process all index entries and generate
    > values for the 'sortas' attribute.

    Slide 35 of those slides shows a corner case that a morphological
    analyzer could get wrong. (I'm not able to test it, myself.)

    If you were using 'kuromoji', you could concatenate the values of the
    'Reading' feature for all of the tokens of an index entry and use that
    as the 'sortas' value.
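
    A minimal sketch of that pre-processing step, written in Python so that
    it runs without Kuromoji installed. The SAMPLE_TOKENS table is a
    hypothetical stand-in for analyzer output (surface form plus katakana
    reading per token); in a real pipeline those pairs would come from the
    analyzer. All function names here are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for morphological analyzer output:
# index term -> list of (surface form, katakana reading) tokens.
SAMPLE_TOKENS = {
    "索引": [("索引", "サクイン")],
    "日本語": [("日本語", "ニホンゴ")],
    "文書処理": [("文書", "ブンショ"), ("処理", "ショリ")],
}


def reading_for(term, tokens_by_term=SAMPLE_TOKENS):
    """Concatenate the readings of all tokens of a term; None if unknown."""
    tokens = tokens_by_term.get(term)
    if tokens is None:
        return None
    return "".join(reading for _surface, reading in tokens)


def add_sortas(root):
    """Set 'sortas' on index term levels that lack it; return the count."""
    changed = 0
    for elem in root.iter():
        if not isinstance(elem.tag, str):
            continue  # skip comments and processing instructions
        local_name = elem.tag.rsplit("}", 1)[-1]  # drop any namespace
        if local_name not in ("primary", "secondary", "tertiary"):
            continue
        if elem.get("sortas"):
            continue  # never overwrite a hand-written value
        reading = reading_for((elem.text or "").strip())
        if reading:
            elem.set("sortas", reading)
            changed += 1
    return changed


doc = ET.fromstring(
    "<para>"
    "<indexterm><primary>索引</primary></indexterm>"
    "<indexterm><primary>文書処理</primary></indexterm>"
    '<indexterm><primary sortas="x">日本語</primary></indexterm>'
    "<indexterm><primary>XML</primary></indexterm>"
    "</para>"
)
count = add_sortas(doc)
print(count, ET.tostring(doc, encoding="unicode"))
```

    Existing 'sortas' values are left alone so that corner cases the
    analyzer gets wrong can still be corrected by hand in the source.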

    > But it is still unclear how to tweak the index code to generate groups from
    > non-latin characters.

    I don't know, either.

    > Or are there better ways?

    It's probably not what you want to hear, but Antenna House does have a
    commercial product for doing DocBook indexes:

    https://www.antennahouse.com/antenna1/i18n-index-library/

    Regards,


    Tony Graham.
    --
    Senior Architect
    XML Division
    Antenna House, Inc.
    ----
    Skerries, Ireland
    tgraham@antenna.co.jp



  • 3.  Re: [docbook-apps] Japanese index

    Posted 04-25-2018 09:50
    On 24.4.2018 21:53, Tony Graham wrote:
    >> But it is still unclear how to tweak the index code to generate groups
    >> from
    >> non-latin characters.
    >
    > I don't know, either.

    The DocBook stylesheets support three indexing methods (basic, kosek,
    and kimber); see:

    http://www.sagehill.net/docbookxsl/IndexIntl.html

    In the "kosek" method you can easily define groups based on the first
    one or two characters of the indexed words. Unfortunately, there is
    currently no suitable definition for Japanese, and my knowledge of
    Japanese is not sufficient to create one.
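
    To illustrate the kind of grouping such a definition would have to
    express, here is a rough Python sketch that assigns an entry to a
    gojūon row based on the first kana of its reading. This is not the
    stylesheet mechanism itself, only the underlying logic; the row table,
    the placement of ン in the ワ row, and the function names are
    assumptions made for the example.

```python
import unicodedata

# Gojūon rows keyed by their first kana. Small kana are filed with their
# full-size counterparts; putting ン in the ワ row is an assumption.
ROWS = [
    ("ア", "アイウエオァィゥェォ"),
    ("カ", "カキクケコ"),
    ("サ", "サシスセソ"),
    ("タ", "タチツテトッ"),
    ("ナ", "ナニヌネノ"),
    ("ハ", "ハヒフヘホ"),
    ("マ", "マミムメモ"),
    ("ヤ", "ヤユヨャュョ"),
    ("ラ", "ラリルレロ"),
    ("ワ", "ワヰヱヲン"),
]
KANA_TO_ROW = {kana: row for row, kanas in ROWS for kana in kanas}


def index_group(reading):
    """Return the gojūon row for a kana reading, or 'Symbols'."""
    if not reading:
        return "Symbols"
    first = reading[0]
    # Map hiragana (U+3041..U+3096) onto the matching katakana.
    if "\u3041" <= first <= "\u3096":
        first = chr(ord(first) + 0x60)
    # NFD splits off voicing marks, so ガ is filed under カ and パ under ハ.
    base = unicodedata.normalize("NFD", first)[0]
    return KANA_TO_ROW.get(base, "Symbols")


def group_entries(entries):
    """Group (term, reading) pairs by row, keeping rows in gojūon order."""
    buckets = {}
    for term, reading in entries:
        buckets.setdefault(index_group(reading), []).append((reading, term))
    order = [row for row, _kanas in ROWS] + ["Symbols"]
    return {
        row: [term for _reading, term in sorted(buckets[row])]
        for row in order
        if row in buckets
    }


groups = group_entries(
    [("索引", "サクイン"), ("概念", "ガイネン"), ("構文", "コウブン")]
)
print(groups)
```

    Within a row the sketch sorts by code point, which only approximates
    proper Japanese collation; a real implementation would use a
    locale-aware collator.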

    The internals of this method are described in the following paper:

    http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131.2069&rep=rep1&type=pdf

    This might give you enough clues to adapt it to Japanese. If you are
    successful, it would be great if you could contribute the definitions
    back to the stylesheets. Feel free to contact me if you need more info.

    > It's probably not what you want to hear, but Antenna House does have a
    > commercial product for doing DocBook indexes:
    >
    > https://www.antennahouse.com/antenna1/i18n-index-library/

    Isn't this a newer version of the library that is needed for the "kimber"
    indexing method? I thought that Eliot intended to convince AH to make
    this library open source, but it seems that my memory is wrong.

    --
    ------------------------------------------------------------------
    Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
    ------------------------------------------------------------------
    Professional XML and Web consulting and training services
    DocBook/DITA customization, custom XSLT/XSL-FO document processing
    ------------------------------------------------------------------
    Bringing you XML Prague conference http://xmlprague.cz
    ------------------------------------------------------------------




  • 4.  Re: [docbook-apps] Japanese index

    Posted 04-25-2018 09:57
    On 25/04/2018 10:50, Jirka Kosek wrote:
    ...
    >> It's probably not what you want to hear, but Antenna House does have a
    >> commercial product for doing DocBook indexes:
    >>
    >> https://www.antennahouse.com/antenna1/i18n-index-library/
    >
    > Isn't this newer version of library that is needed for "kimber" indexing
    > method? I thought that Eliot intended to convince AH to make this
    > library open-source, but it seems that my memory is wrong.

    Sorry, but I've never used it, so all that I know is what's on the website.

    Regards,


    Tony Graham.



  • 5.  Re: [docbook-apps] Japanese index

    Posted 04-25-2018 10:11
    On 25.4.2018 11:57, Tony Graham wrote:
    > Sorry, but I've never used it, so all that I know is what's on the website.

    I see. After digging through some old emails, I was able to find this link:

    https://www.antennahouse.com/i18n-support-library-2/

    It contains the open-source part of the library, which should work with
    the "kimber" method in the stylesheets. This could provide Jan with
    correct Japanese indexing.

    --
    ------------------------------------------------------------------
    Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
    ------------------------------------------------------------------
    Professional XML and Web consulting and training services
    DocBook/DITA customization, custom XSLT/XSL-FO document processing
    ------------------------------------------------------------------
    Bringing you XML Prague conference http://xmlprague.cz
    ------------------------------------------------------------------




  • 6.  Re: [docbook-apps] Japanese index

    Posted 04-25-2018 16:18
    Hello all,

    Thanks for tracking down that package, Jirka. I haven't tested it, but
    it should work with Japanese using the kimber indexing method as
    described in my book:

    http://www.sagehill.net/docbookxsl/IndexIntl.html#KimberIndexMethod

    I contacted Eliot Kimber, the author of the i18n_support library for
    whom the "kimber" method was named. He informed me that the original
    library was under the GNU Lesser General Public License, and Antenna
    House took it
    in 2008, enhanced it by paying for the building of a complete
    Traditional Chinese dictionary, and made it their commercial product.
    After that date, he made further enhancements, including locating an
    open source Chinese dictionary.  I have a copy of a later version, but I
    don't think it has that dictionary.  He says it is still under the GNU
    license and can be distributed.  I will compare the two versions and
    will eventually put a package up on the DocBook Wiki for others to use. 
    But for now, the Antenna House version should work.

    Bob Stayton
    Sagehill Enterprises
    bobs@sagehill.net





  • 7.  Re: [docbook-apps] Japanese index

    Posted 04-25-2018 16:19
    On 25/04/2018 11:10, Jirka Kosek wrote:
    > On 25.4.2018 11:57, Tony Graham wrote:
    >> Sorry, but I've never used it, so all that I know is what's on the website.
    >
    > I see. After digging some old emails I have been able to find this link:
    >
    > https://www.antennahouse.com/i18n-support-library-2/
    >
    > It contains open-source part of library that should be working with
    > "kimber" method in the stylesheets. This could provide Jan with correct
    > Japanese indexing.

    As relayed to me:

    Eliot Kimber originally developed the i18n Library for one of his
    customers and made it open source. Antenna House made some minor
    corrections and improvements and made those available under the open
    source license as the Support Library, with no formal support. At the same
    time, Antenna House added Chinese sorting, both Traditional and
    Simplified, enhanced the library for DocBook, and offered official
    support. Over the years Antenna House has further enhanced the sorting
    module, greatly improved it, added additional languages, and created
    stylesheets (and developed the PDF5-ML DITA plugin).

    Regards,


    Tony Graham.



  • 8.  Re: [docbook-apps] Japanese index

    Posted 04-25-2018 16:22
    I forgot to mention that Eliot pointed me to a new i18n support package
    that he created for DITA. It is packaged for the DITA-OT, but it can be
    adapted for use outside of DITA. Note, however, that it is written for
    Saxon 9. It is available on GitHub at:

    https://github.com/dita-community/org.dita-community.i18n

    Bob Stayton
    Sagehill Enterprises
    bobs@sagehill.net





  • 9.  RE: [docbook-apps] Japanese index

    Posted 04-25-2018 18:31
    On 2018-04-25 Jirka Kosek wrote:
    > On 25.4.2018 11:57, Tony Graham wrote:
    > > Sorry, but I've never used it, so all that I know is what's on the website.
    >
    > I see. After digging some old emails I have been able to find this link:
    >
    > https://www.antennahouse.com/i18n-support-library-2/
    >
    > It contains open-source part of library that should be working with
    > "kimber" method in the stylesheets. This could provide Jan with correct
    > Japanese indexing.

    Oops, I'd forgotten that there are other indexing methods. The 'kimber'
    method looks very promising.

    Thanks a lot for the link to the Saxon extension. The original link at
    http://www.sagehill.net/docbookxsl/IndexIntl.html is broken now.

    If I understand correctly, comparing the open-source version 1 with
    version 2, the latter brings enhancements in Chinese sorting and support
    for additional languages. Neither is directly related to Japanese, so
    I'll start with the open-source version.

    Thanks,

    Jan