docbook-apps

  • 1.  Re: [docbook-apps] Writing mode, xsl-fo output

    Posted 04-01-2011 17:40
    Hi Dave,
    This section has some basic information:

    http://www.sagehill.net/docbookxsl/Localizations.html#WritingMode

    But when you say "some rl-tb" text, do you mean a mixed language document? In that
    case, the writing mode value should be for the dominant language, since the document's
    writing mode determines the page layout. Any inline translated text should get the
    correct text direction based on its Unicode character range.

    Can you describe in more detail what your needs are?

    Bob Stayton
    Sagehill Enterprises
    bobs@sagehill.net





  • 2.  Re: [docbook-apps] Writing mode, xsl-fo output

    Posted 04-01-2011 18:08
    On Fri, 1 Apr 2011 10:40:16 -0700, "Bob Stayton" <bobs@sagehill.net>
    wrote:
    > But when you say "some rl-tb" text, do you mean a mixed language
    > document? In that case, the writing mode value should be for the
    > dominant language, since the document's writing mode determines the
    > page layout. Any inline translated text should get the correct text
    > direction based on its Unicode character range.

    That last sentence--that the writing direction can be determined by
    inspecting the characters--is a common intuition (it was once my own
    intuition). But it isn't quite that simple, since some symmetrical
    punctuation marks belong sometimes to L2R text, and sometimes to R2L text.
    For example, an ASCII period at the end of a run of R2L text might belong
    at the left end of the R2L text, or--if the R2L text is at the end of an
    L2R text--it might belong at the right end of the L2R text (and therefore
    at the right end of the R2L text).

    Asymmetrical punctuation marks sometimes exist as distinct L2R and R2L
    code points in Unicode, like the ASCII comma vs. the Arabic comma U+060C.
    But parentheses (which of course are asymmetrical) are also sometimes used
    inside runs of R2L text--I've seen them in Urdu, for example. Here I
    believe the ASCII open parenthesis is used as an Urdu close paren, and vice
    versa.

    Space characters of course also fall into this category of ambiguous
    direction, although that's generally handled correctly by algorithmic
    methods.

    There's been considerable discussion of this general issue (whether it's
    possible to algorithmically determine the ends of an R2L run inside an L2R
    run, or vice versa) over on the XeTeX mailing list. The opinion of Those
    Who Know seems to be that it is not 100% decidable.

    Mike Maxwell



  • 3.  Re: [docbook-apps] Writing mode, xsl-fo output

    Posted 04-01-2011 19:38
    On Fri, 01 Apr 2011 14:08:07 -0400
    maxwell <maxwell@umiacs.umd.edu> wrote:

    > On Fri, 1 Apr 2011 10:40:16 -0700, "Bob Stayton" <bobs@sagehill.net>
    > wrote:
    > > But when you say "some rl-tb" text, do you mean a mixed language
    > > document? In that case, the writing mode value should be for the
    > > dominant language, since the document's writing mode determines the
    > > page layout. Any inline translated text should get the correct text
    > > direction based on its Unicode character range.
    >
    > That last sentence--that the writing direction can be determined by
    > inspecting the characters--is a common intuition (it was once my own
    > intuition). But it isn't quite that simple, since some symmetrical
    > punctuation marks belong sometimes to L2R text, and sometimes to R2L
    > text. For example, an ASCII period at the end of a run of R2L text
    > might belong at the left end of the R2L text, or--if the R2L text is
    > at the end of an L2R text--it might belong at the right end of the
    > L2R text (and therefore at the right end of the R2L text).
    >
    > Asymmetrical punctuation marks sometimes exist as distinct L2R and
    > R2L code points in Unicode, like the ASCII comma vs. the Arabic comma
    > U+060C. But parentheses (which of course are asymmetrical) are also
    > sometimes used inside runs of R2L text--I've seen them in Urdu, for
    > example. Here I believe the ASCII open parenthesis is used as an
    > Urdu close paren, and vice versa.
    >
    > Space characters of course also fall into this category of ambiguous
    > direction, although that's generally handled correctly by algorithmic
    > methods.
    >
    > There's been considerable discussion of this general issue (whether
    > it's possible to algorithmically determine the ends of an R2L run
    > inside an L2R run, or vice versa) over on the XeTeX mailing list.
    > The opinion of Those Who Know seems to be that it is not 100%
    > decidable.
    >
    > Mike Maxwell


    I think I would rather specify what I'm writing rather than
    leave it to the code point.

    Although I'm unsure who / what would do that? The formatter?




    regards

    --
    Dave Pawson
    XSLT XSL-FO FAQ.
    http://www.dpawson.co.uk



  • 4.  Re: [docbook-apps] Writing mode, xsl-fo output

    Posted 04-01-2011 20:02
    On Fri, 1 Apr 2011 20:38:13 +0100, Dave Pawson <davep@dpawson.co.uk>
    wrote:
    > I think I would rather specify what I'm writing rather than
    > leave it to the code point.
    >
    > Although I'm unsure who / what would do that? The formatter?

    There's a general DocBook attr @dir, see:
    http://www.docbook.org/tdg5/en/html/ref-elements.html
    For spans of text within e.g. a paragraph, you might use this attr on a
    <phrase>.
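
    As a rough illustration of the markup described above, here is a minimal
    Python sketch that builds a paragraph containing a phrase marked with
    dir="rtl". The sample sentence and the Hebrew word are made up for the
    example, and the DocBook 5 namespace is left out to keep it short;
    whether a given formatter honors the attribute is a separate question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample content: an English sentence with one Hebrew word.
# The DocBook 5 namespace is omitted here for brevity (an assumption of
# this sketch, not a recommendation).
para = ET.Element("para")
para.text = "The Hebrew word "

# Mark the right-to-left span explicitly instead of relying on code points.
phrase = ET.SubElement(para, "phrase", {"dir": "rtl"})
phrase.text = "\u05e9\u05dc\u05d5\u05dd"  # Hebrew letters shin, lamed, vav, final mem
phrase.tail = " means peace."

docbook_fragment = ET.tostring(para, encoding="unicode")
print(docbook_fragment)
```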

    That said, I'm not sure how well the formatting tools use this. We use
    dblatex, and I don't recall how well this attr is supported; there's this
    comment in the file dblatex-0.3/xsl/common/l10n.xsl:



    There are also two Unicode chars for this purpose, see:
    http://www.w3.org/TR/WCAG-TECHS/H34.html
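
    The two characters that technique describes are the left-to-right mark
    (U+200E) and the right-to-left mark (U+200F). A small Python sketch,
    using only the standard unicodedata module, shows that they are
    invisible characters with strong direction. The sample strings are
    invented for illustration.

```python
import unicodedata

LRM = "\u200e"  # LEFT-TO-RIGHT MARK
RLM = "\u200f"  # RIGHT-TO-LEFT MARK

for mark in (LRM, RLM):
    # Print code point, official name, and bidirectional class.
    print(f"U+{ord(mark):04X}", unicodedata.name(mark), unicodedata.bidirectional(mark))

# Hypothetical example: an exclamation mark after a Hebrew word at the end
# of an English sentence. Adding RLM after it surrounds the neutral "!"
# with right-to-left characters on both sides, so a bidi-aware renderer
# should treat it as part of the Hebrew run.
hebrew = "\u05e9\u05dc\u05d5\u05dd"
plain = "He said " + hebrew + "!"
marked = "He said " + hebrew + "!" + RLM
print(len(plain), len(marked))
```

    The marks only add a hint for the renderer; the visible text is unchanged.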

    In our own grammar work, we have added a few elements in our DocBook
    localization for text in right-to-left languages, which of course
    necessitated our writing some special XSLT code for the conversion to
    XeTeX.

    Mike Maxwell




  • 5.  Re: [docbook-apps] Writing mode, xsl-fo output

    Posted 04-01-2011 20:32
    On Fri, April 1, 2011 7:08 pm, maxwell wrote:
    > On Fri, 1 Apr 2011 10:40:16 -0700, "Bob Stayton" <bobs@sagehill.net>
    > wrote:
    >> But when you say "some rl-tb" text, do you mean a mixed language

    >> document? In that case, the writing mode value should be for the
    >> dominant language, since the document's writing mode determines the
    >> page layout. Any inline translated text should get the correct text
    >> direction based on its Unicode character range.
    >
    > That last sentence--that the writing direction can be determined by
    > inspecting the characters--is a common intuition (it was once my own
    > intuition). But it isn't quite that simple, since some symmetrical
    > punctuation marks belong sometimes to L2R text, and sometimes to R2L text.

    The conventional approach is to implement the Unicode Bidirectional
    Algorithm [1] (or use a library that already implements it). It may not
    be perfect -- every so often you'll meet people who say it isn't good
    enough -- but since it's up to revision 23 so far, you'll see they're
    still trying to make it as perfect as possible.

    > For example, an ASCII period at the end of a run of R2L text might belong
    > at the left end of the R2L text, or--if the R2L text is at the end of an
    > L2R text--it might belong at the right end of the L2R text (and therefore
    > at the right end of the R2L text).

    The BIDI algorithm has rules about resolving direction among characters
    with strong, weak, and neutral directionality.
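
    To see which characters fall into which group, the bidirectional class
    of each one can be looked up with Python's standard unicodedata module.
    The sketch below groups the classes into strong, weak, and neutral as
    the algorithm does; the function name bidi_category and the sample
    characters are just choices made for this example, and the results
    depend on the Unicode data shipped with the Python in use.

```python
import unicodedata

# Bidirectional classes grouped the way the Unicode Bidirectional
# Algorithm groups them.
STRONG = {"L", "R", "AL"}
WEAK = {"EN", "ES", "ET", "AN", "CS", "NSM", "BN"}
NEUTRAL = {"B", "S", "WS", "ON"}
EXPLICIT = {"LRE", "LRO", "RLE", "RLO", "PDF", "LRI", "RLI", "FSI", "PDI"}


def bidi_category(ch):
    """Return (bidi class, group) for a single character."""
    cls = unicodedata.bidirectional(ch)
    if cls in STRONG:
        group = "strong"
    elif cls in WEAK:
        group = "weak"
    elif cls in NEUTRAL:
        group = "neutral"
    elif cls in EXPLICIT:
        group = "explicit"
    else:
        group = "unknown"  # e.g. unassigned code points report ""
    return cls, group


# Sample characters: Latin letter, Hebrew alef, Arabic alef, digit,
# period, ASCII comma, Arabic comma, space, opening parenthesis.
for ch in ["A", "\u05d0", "\u0627", "1", ".", ",", "\u060c", " ", "("]:
    cls, group = bidi_category(ch)
    print(f"U+{ord(ch):04X} {cls:<3} {group}")
```

    With current Unicode data, letters come out as strong while the period,
    the commas, the space, and the parenthesis do not, which is why their
    direction has to be resolved from context.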

    > Asymmetrical punctuation marks sometimes exist as distinct L2R and R2L
    > code points in Unicode, like the ASCII comma vs. the Arabic comma U+060C.
    > But parentheses (which of course are asymmetrical) are also sometimes used
    > inside runs of R2L text--I've seen them in Urdu, for example. Here I
    > believe the ASCII open parenthesis is used as an Urdu close paren, and
    > vice versa.

    If you're using the BIDI algorithm, you'd always enter the opening
    parenthesis as the '(' character, even when it will be shown with its
    mirrored glyph ')'.
    See http://www.unicode.org/reports/tr9/#Mirroring
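
    Whether a character is subject to mirroring is recorded as a property
    in the Unicode character database, and Python's unicodedata module
    exposes it. A minimal sketch (the helper name is_mirrored and the
    sample characters are chosen for this example):

```python
import unicodedata


def is_mirrored(ch):
    """True if the character has the Unicode 'mirrored' property."""
    return bool(unicodedata.mirrored(ch))


# Brackets carry the property; ordinary punctuation and letters do not.
for ch in "()[]<>.,a":
    print(repr(ch), is_mirrored(ch))
```

    The property only says that a renderer should swap the glyph in
    right-to-left context; the stored character stays the same.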

    > Space characters of course also fall into this category of ambiguous
    > direction, although that's generally handled correctly by algorithmic
    > methods.
    >
    > There's been considerable discussion of this general issue (whether it's
    > possible to algorithmically determine the ends of an R2L run inside an L2R
    > run, or vice versa) over on the XeTeX mailing list. The opinion of Those
    > Who Know seems to be that it is not 100% decidable.

    Which is why there are also characters for explicit overrides.

    XML and other markup languages count as "higher level protocols" for the
    purposes of the BIDI algorithm, and a 'dir' attribute or similar should be
    used instead of the override characters. See
    http://www.unicode.org/reports/tr20/#Bidi

    Regards,


    Tony Graham
    Mentea.

    [1] http://www.unicode.org/reports/tr9/