docbook-apps

  • 1.  Webhelp search trims certain letters from search terms?

    Posted 03-09-2011 08:07
    Hi.

    I'm producing WebHelp output
    (http://www.thingbag.net/docbook/gsoc2010/doc/content/index.html), and
    I noticed that when I search for the term "nucleus," the WebHelp
    search function removes the final "s" and searches for "nucleu."
    "Nucleus" is a commonly used term in my document. I see the same
    behavior with other terms: "zeus" becomes "zeu" and "tutus" becomes
    "tutu."

    Is this a configurable behavior? Is the search function purposely
    simplifying my terms?

    Thanks.

    Peter Desjardins



  • 2.  Re: [docbook-apps] Webhelp search trims certain letters from search terms?

    Posted 03-09-2011 11:56
    Hi Peter,

    Search works on the stemmed forms of the words in the query; that is,
    WebHelp deliberately reduces each search term to its root so that related
    forms of a word match (for example, "searching" and "searched" both reduce
    to "search"). Link [1] gives a short introduction to what a stemmer does and
    what its limitations are. WebHelp uses the Porter stemmer for English [2] and
    Snowball stemmers for several other languages [3].
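    To illustrate why "nucleus" loses its final "s", here is a minimal Python
    sketch of only Step 1a of the Porter algorithm, the step that handles
    plural-style endings. It is not WebHelp's JavaScript stemmer and it omits
    Steps 1b through 5; for "nucleus", "zeus" and "tutus" the later Porter
    steps leave the result unchanged, so this step alone reproduces the stems
    reported in the first message.

```python
def porter_step_1a(word: str) -> str:
    """Apply only Step 1a of the Porter stemmer to a lowercase word.

    Rules, tried longest suffix first:
      SSES -> SS   (caresses -> caress)
      IES  -> I    (ponies   -> poni)
      SS   -> SS   (caress   -> caress)
      S    -> ''   (cats     -> cat)
    """
    if word.endswith("sses"):
        return word[:-2]      # drop "es" from "sses"
    if word.endswith("ies"):
        return word[:-2]      # drop "es" from "ies"
    if word.endswith("ss"):
        return word           # a double "s" is kept
    if word.endswith("s"):
        return word[:-1]      # any other final "s" is removed
    return word


for term in ["nucleus", "zeus", "tutus", "caresses", "ponies", "caress"]:
    print(term, "->", porter_step_1a(term))
```

    Because the rule only looks at the suffix, it cannot tell a plural "s"
    from one that belongs to the word itself, which is the limitation behind
    "nucleu".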

    Does searching for 'nucleus' (which is stemmed to 'nucleu') actually
    return wrong results? We tested search with stemming, and it worked as
    expected apart from a few glitches, which are minor compared with the
    power it adds!

    [1] http://blog.kasunbg.org/2010/10/javascript-stemmer-for-french-language.html
    [2] http://snowball.tartarus.org/algorithms/porter/stemmer.html
    [3] http://docbook.sourceforge.net/release/xsl/current/webhelp/docs/content/ch03s02.html


    --Kasun


    --
    Kasun Gajasinghe,
    University of Moratuwa,
    Sri Lanka.
    Blog: http://blog.kasunbg.org
    Twitter: http://twitter.com/kasunbg



  • 3.  Re: [docbook-apps] Webhelp search trims certain letters from search terms?

    Posted 03-09-2011 16:18
    On Wed, Mar 9, 2011 at 6:56 AM, Kasun Gajasinghe
    <kasun.gajasinghe@gmail.com> wrote:

    > Does searching for 'nucleus' (which is stemmed to 'nucleu') actually
    > return wrong results? We tested search with stemming, and it worked as
    > expected apart from a few glitches, which are minor compared with the
    > power it adds!

    I have a document that includes the term "nucleus" many times. If I
    enter the term "nucleus" in the search field, the result is:

    Your search returned no results for nucleu

    Is there a way to adjust the stemmer so that it will ignore specific
    words like "nucleus"? This word is extremely important in my case. I'm
    sure there are others.
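    Search engines that stem often handle this with a protected-word list
    that bypasses the stemmer. The Python sketch below is hypothetical:
    PROTECTED_WORDS, toy_stem and normalize are illustrative names, not
    WebHelp settings, and toy_stem is a simplified stand-in for the Porter
    stemmer. Whatever the mechanism, the same normalization has to run both
    when the index is built and when the query is processed.

```python
# Hypothetical protected-word list: terms that must never be stemmed.
PROTECTED_WORDS = frozenset({"nucleus"})


def toy_stem(word: str) -> str:
    """Stand-in stemmer: drops a single trailing 's' (but not 'ss'),
    mimicking the Porter behavior seen in this thread."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def normalize(term: str) -> str:
    """Lowercase a term and stem it unless it is protected.

    Hypothetical helper: the indexer and the query side would both
    need to call it for lookups to match.
    """
    word = term.lower()
    if word in PROTECTED_WORDS:
        return word
    return toy_stem(word)


print(normalize("Nucleus"))  # protected, so left as "nucleus"
print(normalize("Tutus"))    # stemmed to "tutu"
```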

    Thanks.

    Peter



  • 4.  Re: [docbook-apps] Webhelp search trims certain letters from search terms?

    Posted 03-09-2011 16:30
    Hi Peter,
    For your immediate needs, you can turn off stemming by setting the
    webhelp.indexer.language property to a language that is not supported:

    webhelp.indexer.language=xx

    That way you lose stemming, but you also won't be affected by this bug.

    It looks to me like the client-side stemmer is stemming "nucleus" but the
    indexer is not. In a test build, index-2.js contains:

    w["nucleus"]="9";
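    The mismatch described here can be reproduced in miniature. In this
    hypothetical Python sketch, build_index stands in for the indexer that
    writes index-2.js and search stands in for the client-side lookup; both
    are simplified, stem is a toy stand-in for the Porter stemmer, and the
    page id "9" simply mirrors the entry shown above. When only the query is
    stemmed, "nucleus" is looked up as "nucleu" and misses the unstemmed key.

```python
def stem(word: str) -> str:
    """Simplified stand-in for the client-side stemmer:
    drops a single trailing 's' (but not 'ss')."""
    if word.endswith("s") and not word.endswith("ss"):
        return word[:-1]
    return word


def build_index(docs: dict[str, str], stem_terms: bool) -> dict[str, set[str]]:
    """Map each (optionally stemmed) word to the ids of pages containing it."""
    index: dict[str, set[str]] = {}
    for page_id, text in docs.items():
        for word in text.lower().split():
            key = stem(word) if stem_terms else word
            index.setdefault(key, set()).add(page_id)
    return index


def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Look up the stemmed query term, as the client side does."""
    return index.get(stem(query.lower()), set())


docs = {"9": "the nucleus of the cell"}

# Indexer does not stem, query side does: "nucleu" is never a key.
broken = build_index(docs, stem_terms=False)
print(search(broken, "nucleus"))  # set()

# Both sides stem: the lookup succeeds.
fixed = build_index(docs, stem_terms=True)
print(search(fixed, "nucleus"))   # {'9'}
```

    The design point is symmetry: whichever normalization is chosen (full
    stemming, no stemming, or stemming with exceptions), the indexer and the
    search page must apply exactly the same one.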

    Certainly something we'll investigate.

    Thanks,
    David
