docbook-apps

Break long programlisting lines without spaces?

  • 1.  Break long programlisting lines without spaces?

    Posted 04-13-2011 16:45
    Hi. I'm producing PDF and XHTML from DocBook using Saxon 6.5.5 and FOP 1.0.

    Some of my programlistings include long lines of code or sample
    output. I configured my customization layer to break long lines at
    spaces as described here. It works wonderfully.

    http://www.sagehill.net/docbookxsl/FittingText.html#BreakLongLines

    The remaining long lines occur at unbroken strings of characters. I
    have some that are long enough to spill off the paper in PDF. Is there
    any way to break lines at spaces where they exist and then set a
    maximum line length at which the line must be forcibly broken?

    Has anyone developed a customization that will do this?

    Thanks for your help.

    Peter Desjardins



  • 2.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-13-2011 21:32
    Peter Desjardins wrote:

    > Has anyone developed a customization that will do this?

    There is a parameter for this:

    http://docbook.sourceforge.net/release/xsl/current/doc/fo/hyphenate.verbatim.characters.html

    If you want to allow breaking after any character, you can put the
    complete alphabet into this parameter.

    However I'm not sure if this will work in FOP. This is known to work in
    XEP, but AFAIK it was not working in FOP 0.20.x/0.9x. I have never
    tested it with FOP 1.0.

    Jirka

    --
    ------------------------------------------------------------------
    Jirka Kosek e-mail: jirka@kosek.cz http://xmlguru.cz
    ------------------------------------------------------------------
    Professional XML consulting and training services
    DocBook customization, custom XSLT/XSL-FO document processing
    ------------------------------------------------------------------
    OASIS DocBook TC member, W3C Invited Expert, ISO JTC1/SC34 member
    ------------------------------------------------------------------




  • 3.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 01:42
    On 04/13/2011 04:32 PM, Jirka Kosek wrote:
    > Peter Desjardins wrote:
    >
    >> Has anyone developed a customization that will do this?
    > There is parameter for this:
    >
    > http://docbook.sourceforge.net/release/xsl/current/doc/fo/hyphenate.verbatim.characters.html
    >
    > If you want to allow breaking after any character you can put complete
    > alphabet inside this parameter.
    >
    > However I'm not sure if this will work in FOP. This is known to work in
    > XEP, but AFAIK it was not working in FOP 0.20.x/0.9x. I have never
    > tested it with FOP 1.0.
    >
    > Jirka
    >

    I've done a little research on this and found some discussion on the
    FOP list (pasted below). The FOP developers feel that using the
    hyphenation character to achieve this is an abuse of the
    hyphenation-character property. However, to my mind it is an obvious and
    important thing to want to do, so I would say that it's an omission from
    the XSL-FO spec. I've meant to post on the FOP list asking for an
    extension feature of some kind to allow for this situation without
    abusing any features, but haven't gotten around to it.

    David

    http://markmail.org/thread/g32fgn6pjxotudlu#query:+page:1+mid:s77ceg3jx2hxtzlc+state:results


    "...Like its name indicates, the hyphenation-character property
    specifies which character should be used when hyphenating a word. So it
    will be used only when a line break occurs within a word.

    The requirements of the present case are different. Stylesheets doing
    things like the above are just abusing the hyphenation-character property."

    http://permalink.gmane.org/gmane.text.xml.fop.user/30480

    "The hyphenation-character property (which FOP does support) is not meant
    to be used for adding continuation characters where long lines are being
    wrapped. It's only used when breaking /inside/ a word, which rarely is
    what you want when you typeset program listings."



  • 4.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 03:06
    It does not seem to work in FOP 1.0. Using hyphenate.verbatim results
    in SEVERE: Exceptions. Strangely, it seems to insert several of the
    hyphenation characters and then fail at some point before completing
    the document. Based on David's research, I may not try to hunt down
    that point.

    Maybe I can add a preprocessing step to find really long strings in
    programlistings, break them at a certain length, and insert a
    line-wrap character and a line-break processing instruction.
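
    A minimal Python sketch of such a preprocessing step, assuming the
    listings are available as plain text. The function name, the
    70-character limit, and the backslash marker are hypothetical choices,
    and a real pipeline would emit a line-break processing instruction
    where this sketch writes a newline.

```python
import re


def break_long_runs(text, max_len=70, marker="\\"):
    """Force breaks inside runs of non-space characters longer than max_len.

    Each forced break appends `marker` and a newline. Existing spaces are
    left untouched so the formatter can still wrap at them.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    # A "run" is max_len + 1 or more consecutive non-whitespace characters.
    long_run = re.compile(r"\S{%d,}" % (max_len + 1))

    def split_run(match):
        run = match.group(0)
        chunks = [run[i:i + max_len] for i in range(0, len(run), max_len)]
        return (marker + "\n").join(chunks)

    return long_run.sub(split_run, text)
```

    Note that max_len limits the length of each unbroken run, not the
    width of the whole output line, and the marker adds one more character
    to every broken line.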

    Thanks for your help.

    Peter




  • 5.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 06:25
    On Wed, 13 Apr 2011 20:41:43 -0500
    David Cramer <david@thingbag.net> wrote:

    > On 04/13/2011 04:32 PM, Jirka Kosek wrote:
    > > Peter Desjardins wrote:
    > >
    > >> Has anyone developed a customization that will do this?
    > > There is parameter for this:
    > >
    > > http://docbook.sourceforge.net/release/xsl/current/doc/fo/hyphenate.verbatim.characters.html
    > >
    > > If you want to allow breaking after any character you can put
    > > complete alphabet inside this parameter.

    > I've done a little research on this and found on the fop list some
    > discussion (pasted below). The fop developers feel that using the
    > hyphenation character to achieve this is an abuse of the
    > hyphenation-character property. However, it is to my mind an obvious
    > and important thing to want to do, so I would say that it's an
    > omission from the FO-spec.


    Could you be a bit more specific, please, David?
    An omission from the spec? Where, and in what regard?
    Whilst I agree a hyphen is not a space, the logic of using
    this seems 'right'?



    --

    regards

    --
    Dave Pawson
    XSLT XSL-FO FAQ.
    http://www.dpawson.co.uk



  • 6.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 08:06
    David Cramer wrote:

    > I've done a little research on this and found on the fop list some
    > discussion (pasted below). The fop developers feel that using the
    > hyphenation character to achieve this is an abuse of the
    > hyphenation-character property.

    I don't think so. Are the FOP developers aware that the DocBook
    stylesheets insert a soft hyphen (U+00AD) character at every possible
    break point? This character marks possible places for hyphenation.
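
    The idea can be illustrated outside XSLT. The following Python sketch
    is not the stylesheets' actual template: the function name and the
    default character set are assumptions. It inserts U+00AD after every
    character from a configurable set, analogous to what
    hyphenate.verbatim.characters configures.

```python
SOFT_HYPHEN = "\u00ad"  # U+00AD SOFT HYPHEN


def add_break_hints(text, break_after="/.,;:"):
    """Insert a soft hyphen after each character of text found in break_after.

    The soft hyphen is only a hint: whether a formatter breaks there, and
    what it prints at the break, depends on the formatter.
    """
    return "".join(
        ch + SOFT_HYPHEN if ch in break_after else ch
        for ch in text
    )
```

    For example, add_break_hints("a/b.c", "/.") returns the input with a
    soft hyphen after the slash and after the dot.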

    > However, it is to my mind an obvious and
    > important thing to want to do, so I would say that it's an omission from
    > the FO-spec.

    Of course U+00AD is defined in Unicode and not in XSL-FO.

    > I've meant to post on the fop list asking for an extension
    > feature of some kind to allow for this situation without abusing any
    > features, but haven't gotten around to it.

    It would be nice to ask the FOP developers why they think it is abuse. I
    don't see any evidence for this in either Unicode or the XSL-FO spec. I
    think this probably lives in an area where the exact behaviour is
    implementation dependent. So the obvious thing to do is to follow the
    useful behaviour of the market leaders (XEP, XSL Formatter).

    Jirka





  • 7.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 12:22
    On 04/14/2011 03:06 AM, Jirka Kosek wrote:
    > David Cramer wrote:
    >
    >> I've done a little research on this and found on the fop list some
    >> discussion (pasted below). The fop developers feel that using the
    >> hyphenation character to achieve this is an abuse of the
    >> hyphenation-character property.
    > I don't think so. Are FOP developers aware of the fact that DocBook
    > stylesheets insert soft-hyphen (U+00AD) character at all places of
    > possible break? This characters marks possible places for hyphenation.
    >
    >> However, it is to my mind an obvious and
    >> important thing to want to do, so I would say that it's an omission from
    >> the FO-spec.
    > Of course U+00AD is defined in Unicode and not in XSL-FO.
    >
    >> I've meant to post on the fop list asking for an extension
    >> feature of some kind to allow for this situation without abusing any
    >> features, but haven't gotten around to it.
    > It would be nice to ask FOP developers why they think it is abuse. I
    > don't see any evidence for this neither in Unicode nor in XSL-FO spec. I
    > think that this probably lives in area where exact behaviour is
    > implementation dependent. So obvious thing to do is to follow useful
    > behavior of market leaders (XEP, XSL Formatter).
    >
    > Jirka
    >
    In that case, we should take this over to the fop list. I think you'll
    do a better job at making the argument than I would, but I'll be there
    to give what you say a +1. Fop has come a long way and produces good
    output, but for many users this one issue will be a deal killer.

    David



  • 8.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 15:12
    On 14/04/11 09:06, Jirka Kosek wrote:
    > David Cramer wrote:
    >
    >> I've done a little research on this and found on the fop list some
    >> discussion (pasted below). The fop developers feel that using the
    >> hyphenation character to achieve this is an abuse of the
    >> hyphenation-character property.
    >
    > I don't think so. Are FOP developers aware of the fact that DocBook
    > stylesheets insert soft-hyphen (U+00AD) character at all places of
    > possible break? This characters marks possible places for hyphenation.

    Those are two different things. Inserting soft hyphens at certain places
    is a ‘manual’ process (in this case automated by the XSLT) that gives
    the XSL-FO processor hints about where text can be broken.

    The hyphenation-character XSL-FO property simply lets you customise the
    character that is used when a word is broken over two lines. That
    character must not and will not be used if the line break occurs between
    two words.

    AFAIU the DocBook stylesheets do the former. But they do it in a way
    that I’m not sure is compatible with the Unicode specification:
    http://www.unicode.org/reports/tr14/#SoftHyphen
    A soft hyphen is supposed to be used inside a word, not after
    whitespace.

    What is needed is a generalisation of the concept of soft hyphen:
    a character that appears only if it is at the end of a line. If
    I remember correctly, LaTeX provides a \discretionary command that does
    this. To my knowledge there is nothing equivalent in XSL-FO.

    I keep thinking that this issue is best resolved manually. The output is
    likely to look much better as text will be broken at sensible places and
    indented appropriately.



    Vincent



  • 9.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 16:00
    Hi Vincent,

    On 04/14/2011 10:12 AM, Vincent Hennebert wrote:
    > I keep thinking that this issue is best resolved manually. The output is
    > likely to look much better as text will be broken at sensible places and
    > indented appropriately.

    Yes, but there are cases where that's impossible. For example, we have
    code samples stored in external files. These files are part of the build
    and test suites, so the same sample code that appears in the doc was
    used in the product, test cases, etc. It's difficult in this case to add
    strange characters or other markup in the code samples.

    >> However, it is to my mind an obvious and
    >> important thing to want to do, so I would say that it's an omission from
    >> the FO-spec.
    >
    > Of course U+00AD is defined in Unicode and not in XSL-FO.

    I was thinking here at a higher level. As a stylesheet developer, I need
    some way to specify that in verbatim environments, lines are broken
    automatically (at a configurable set of characters) when they exceed the
    available space, and that a special character is added to show the user
    that the line was broken for typographical reasons. The hyphenation
    approach does indeed feel like a hack. I think it would be better if I
    could just tell the FO renderer what I want (break-long-lines="yes"
    break-at="/*();. " break-character="⇦") and let it decide how to deal
    with the problem.
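
    The behaviour described above can be sketched as a plain-text
    preprocessor in Python. This is not an FOP or XSL-FO feature: the
    function and its parameters are hypothetical and simply mirror the
    wished-for break-at and break-character settings, assuming a
    single-character marker and a monospace font.

```python
def wrap_line(line, width=70, break_at="/*();. ", marker="\u21e6"):
    """Wrap one line so that every returned piece fits in `width` columns.

    A broken piece ends with `marker`. The break goes after the last
    character from `break_at` that still fits, or at the column limit
    when no such character is available.
    """
    if width < 2:
        raise ValueError("width must allow one character plus the marker")
    limit = width - 1  # reserve the final column for the marker
    pieces = []
    while len(line) > width:
        cut = limit  # fall back to a forced break at the column limit
        for i in range(limit, 0, -1):
            if line[i - 1] in break_at:
                cut = i  # break immediately after this character
                break
        pieces.append(line[:cut] + marker)
        line = line[cut:]
    pieces.append(line)
    return pieces
```

    With width=8, break_at="(;" and marker="~", the input "foo(bar);baz"
    is split after the opening parenthesis, and the rest fits on the
    second line.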

    Another problem that arises with FOP and the hyphenation approach is
    that font-selection-strategy isn't supported, so you are limited in the
    characters you can use as your hyphenation character. You typically use a
    monospace font for your code listings but want to use some kind of fancy
    arrow character to indicate that the line was broken. The monospace fonts
    don't have any suitable characters, so you're stuck.

    Thanks,
    David



  • 10.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 19:51
    David Cramer wrote:

    > I was thinking here at a higher level. As a stylesheet developer, I need
    > some way to solve the problem of specifying that in verbatim
    > environments, the lines be automatically broken (at a configurable set
    > of characters) when they exceed the available space and a special
    > character be added to indicate to the user that the line was broken for
    > typographical reasons. The hyphenation approach does indeed feel like a
    > hack. I think it would be better if I just told the fo renderer what I
    > want (break-long-lines="yes" break-at="/*();. "
    > break-character="⇦") and have it decide how to deal with the problem.

    Something very similar is part of the XSL-FO 2.0 requirements (see
    http://www.w3.org/TR/xslfo20-req/#N66916). But it will be a long time
    before it gets implemented.

    Jirka




  • 11.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-25-2011 16:14
    On 04/14/2011 02:51 PM, Jirka Kosek wrote:
    > David Cramer wrote:
    >
    >> I was thinking here at a higher level. As a stylesheet developer, I need
    >> some way to solve the problem of specifying that in verbatim
    >> environments, the lines be automatically broken (at a configurable set
    >> of characters) when they exceed the available space and a special
    >> character be added to indicate to the user that the line was broken for
    >> typographical reasons. The hyphenation approach does indeed feel like a
    >> hack. I think it would be better if I just told the fo renderer what I
    >> want (break-long-lines="yes" break-at="/*();. "
    >> break-character="⇦") and have it decide how to deal with the problem.
    >
    > Something very similar is part of XSL-FO 2.0 requirements (see
    > http://www.w3.org/TR/xslfo20-req/#N66916). But it will be long way
    > before it got implemented.
    >

    That would be the following:

    "5.7.4 Line breaks without hyphen character

    Allow the specification of a set of characters after which the
    composition process may introduce a line break without inserting a
    hyphen character. For example for '/' characters in URLs. "

    I don't see how that (as written) would help in the programlisting case.
    What I want is the ability to specify a character that indicates the
    line was broken for typographical reasons and I want that character to
    appear at the end of the line so that all these characters are aligned,
    just like in a professionally published book.

    For all use cases, you would want the ability to tell the renderer to
    stop processing if this happens. That way if Dick is preparing a
    manuscript for print publishing and wants to break all the lines
    manually, he can be sure he hasn't missed any.

    Regards,
    David



  • 12.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 16:02
    Vincent Hennebert wrote:

    > The hyphenation-character XSL-FO property simply allows to customise the
    > character that will be used when a word is broken over two lines. That
    > character must not and will not be used if the line break occurs between
    > two words.

    If the hyphenation point is chosen automatically by the formatter, then
    of course there shouldn't be a visible hyphenation character. But
    nothing says that if SHY is used and a line break occurs after it, SHY
    should be discarded. SHY is discarded only if no line break occurs
    after it.

    I understand your reasoning, which is based on the premise that
    hyphenation should occur only inside words. But even in XSL-FO,
    hyphenation is a property not of a word but of a single character; see:

    http://www.w3.org/TR/xsl/#hyphenate

    Especially for CJK languages, the common "Western" concept of words,
    hyphenation, and word/line breaking is completely wrong.

    > AFAIU the DocBook stylesheets do the former. But they do it in a way
    > that I’m not sure is compatible with the Unicode specification:
    > http://www.unicode.org/reports/tr14/#SoftHyphen
    > A soft hyphen is supposed to be used inside a word, not after
    > whitespace.

    I don't see anything there suggesting that SHY can't be used after any
    arbitrary character. This section describes how SHY is supposed to work
    inside words and how it is related to hyphenating.

    > What is needed is a generalisation of the concept of soft hyphen:
    > a character that appears only if it is at the end of a line. If
    > I remember well LaTeX provides a \discretionary command that allows to
    > do that. To my knowledge there is nothing equivalent in XSL-FO.

    Unfortunately there is no \discretionary equivalent in XSL-FO.

    The requirements for XSL-FO 2.0 list "2.2.10 Text before or after a break",
    which might cover some use cases of \discretionary:

    http://www.w3.org/TR/xslfo20-req/#N66479

    > I keep thinking that this issue is best resolved manually. The output is
    > likely to look much better as text will be broken at sensible places and
    > indented appropriately.

    That's true except for cases where you don't have the luxury of fixing it
    manually. Then it is better to have an indication of forced line breaks
    than no such indication or even cropped text.

    Jirka





  • 13.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 16:58
    Regarding this entire thread, I know an automatic method is attractive, and may make sense in some cases, but with programming examples, I think going fully automated is dangerous.

    There are two competing questions:

    1) Does the example scan well to the eye?
    2) If the reader types (or cuts and pastes) the example into a compiler/interpreter verbatim, will it work?

    I don't know of any automated approach that can satisfy these two conditions consistently.

    I think you're better off having the stylesheet warn you when you exceed a maximum line length, and then fixing the offending examples directly. If you do that, you can be assured that examples will work the same for readers who type or cut/paste the example as the original example did.
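
    The same check can also run outside the stylesheet, before the build. A minimal Python sketch, assuming the listing text is available as a string; the function name and the 70-character default are hypothetical:

```python
def find_long_lines(listing, max_len=70):
    """Return (line_number, length) for every line longer than max_len.

    Line numbers start at 1, so they can be reported directly to the author.
    """
    return [
        (number, len(line))
        for number, line in enumerate(listing.splitlines(), start=1)
        if len(line) > max_len
    ]
```

    A build script could print the returned pairs as warnings, or exit with a non-zero status when the list is not empty so that no over-long line goes unnoticed.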

    Jirka does make a good point that you may not have that luxury, but if you don't, and you need to accept un-reviewed line breaks, then at least make files that contain the "true, unbroken" examples available for download.

    Best Regards,
    Dick Hamilton




    -------
    XML Press
    XML for Technical Communicators
    http://xmlpress.net
    hamilton@xmlpress.net
    (970) 231-3624




  • 14.  Re: [docbook-apps] Break long programlisting lines without spaces?

    Posted 04-14-2011 17:11
    On 04/14/2011 11:57 AM, hamilton@xmlpress.net wrote:
    > Regarding this entire thread, I know an automatic method is attractive, and may make sense in some cases, but with programming examples, I think going fully automated is dangerous.
    >
    > There are two competing questions:
    >
    > 1) Does the example scan well to the eye?
    > 2) If the reader types (or cuts and pastes) the example into a compiler/interpreter verbatim, will it work?

    I'm willing to abandon #2 for pdf output. If you're online, you're
    probably (or should be) using the html output we provide. Cutting and
    pasting from html is more reliable/easier anyway. Pdfs are really for
    printing.

    > I don't know of any automated approach that can satisfy these two conditions consistently.
    >
    > I think you're better off having the stylesheet warn you when you exceed a maximum line length, and then fixing the offending examples directly. If you do that, you can be assured that examples will work the same for readers who type or cut/paste the example as the original example did.
    >
    > Jirka does make a good point that you may not have that luxury, but if you don't, and you need to accept un-reviewed line breaks, then at least make files that contain the "true, unbroken" examples available for download.

    Yes, there are cases where this is what you want. You have the option of
    searching for the special autowrap character and manually modifying the
    code sample so that the automatic break isn't necessary.

    Some use cases require the auto-wrapping feature. In other cases,
    killing the build if a line is too long is the right answer.

    David