docbook-apps

Asciidoc -> docbook -> PDF tooling

  • 1.  Asciidoc -> docbook -> PDF tooling

    Posted 11-09-2021 12:02
    I am working on an AsciiDoc -> DocBook -> PDF toolchain for an open source project (so all tooling must be freely available) because the direct AsciiDoc -> PDF toolchain is inadequate for our purposes.

    I currently have a Java/Maven-based AsciiDoc -> DocBook -> FOP -> PDF chain within the docbkx-maven-plugin, but would welcome suggestions for alternatives that appear to be better maintained and are cross-platform.


    Randall Wood


  • 2.  Re: [docbook-apps] Asciidoc -> docbook -> PDF tooling

    Posted 11-10-2021 17:27
    Probably not the solution you're looking for, but we have used dblatex
    http://dblatex.sourceforge.net
    (not to be confused with the older db2latex). For our use
    case--grammars of foreign languages, with mixed languages and scripts
    (Arabic, Bengali, Thaana)--it was probably the only solution. But if
    you're not familiar with LaTeX, tweaking it might be a steep learning curve.

    --
    Mike Maxwell
    "Digital objects last forever--or five years,
    whichever comes first." --Jeff Rothenberg



  • 3.  Language support in XSL-FO Stylesheets

    Posted 06-23-2022 14:07
    Dear List Members,

    I had already sent this to the docbook list, but the docbook-apps list
    is probably a better fit.

    I am using DocBook for a bilingual book. Most of the content is written
    in English, but parts are written in German. The PDF is produced with
    the XSL Stylesheets 1.79.2 shipped with Oxygen 24 and the Antenna House
    Formatter v7.

    Since most of the content is English, /book/@xml:lang is 'en'. Fragments
    in German have @xml:lang='de' at the appropriate level, e.g. on section
    or note elements. Sometimes I have phrase or emphasis elements only
    because of the @xml:lang attribute.

    My observation is that hyphenation is wrong in the PDF document for the
    German fragments. I think I have found the reason, but I am puzzled.
    There are two issues that I can't understand:

    1) There is a template named "language.attribute" in l10n.xsl. It
    calculates the language value by looking at the ancestor axis, and emits
    an attribute named @lang with that value. *First issue:* the name of the
    attribute is wrong; the correct name is @language. See section 7.10.2
    "Language" <https://www.w3.org/TR/xsl11/#language> in Extensible
    Stylesheet Language (XSL) Version 1.1.

    2) The template named "language.attribute" is rarely used. *Second
    issue:* I had to create a customization layer for the templates that
    match d:para or d:simpara, which emit an fo:block element, so that they
    call the language.attribute template. The same goes for d:phrase and
    d:emphasis in inline.xsl.

    Maybe I have missed something obvious. Are there any reasons for this
    lack of language support?

    Sincerely, Frank Steimke
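
The lookup described in issue 1 (take the language from the nearest ancestor-or-self that declares xml:lang) can be illustrated outside XSLT. The following is a minimal Python sketch, not the stylesheet code: the function names are hypothetical, and reducing a tag such as 'de-DE' to its primary subtag is an assumption made for brevity.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_languages(elem, inherited=None):
    """Yield (element, language) pairs in document order.

    The language is the xml:lang of the nearest ancestor-or-self,
    or None when no ancestor declares one.
    """
    lang = elem.get(XML_LANG, inherited)
    yield elem, lang
    for child in elem:
        yield from effective_languages(child, lang)


def fo_language(lang):
    """Return the ('language', value) FO attribute pair, or None.

    XSL 1.1 names the property 'language'. Keeping only the primary
    subtag ('de-DE' -> 'de') is an assumption of this sketch.
    """
    if not lang:
        return None
    return ("language", lang.split("-")[0].lower())


doc = ET.fromstring(
    '<book xmlns="http://docbook.org/ns/docbook" xml:lang="en">'
    '<section xml:lang="de"><para>Donaudampfschifffahrt</para></section>'
    '<para>A <phrase xml:lang="de">Dampfschiff</phrase> is a steamship.</para>'
    "</book>"
)
for elem, lang in effective_languages(doc):
    print(elem.tag.split("}")[1], fo_language(lang))
```

Every element gets a language here, which mirrors the point of issue 2: hyphenation only works if each emitted block or inline actually carries the property.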


  • 4.  Re: [docbook-apps] Language support in XSL-FO Stylesheets

    Posted 06-23-2022 14:35
    Hi Frank,

    I can't add much except to say that I also hit a wall trying to generate a
    bilingual book (English and Japanese), and the index in particular was very
    difficult.

    I got some help from Bob Stayton, but the only solution was a hack to
    generate the index using another application that I wrote, which massaged
    the XSL-FO.

    The problem, including more detail from Bob, is recorded in this GitHub
    issue: https://github.com/docbook/xslt10-stylesheets/issues/238

    All best,

    M. Roberts




  • 5.  Re: [docbook-apps] Language support in XSL-FO Stylesheets

    Posted 06-23-2022 16:22
    This is probably not relevant to your cases, but we typeset and
    published (through Mouton) a number of grammars ten or so years ago,
    where the languages being described had unusual scripts: Western Panjabi
    (Nasta'liq variety of Arabic script, hence right-to-left), Bangla
    (Bengali script), Dhivehi (Thaana script, also right-to-left), and
    Pashto (Naskh variety of Arabic script). But we were using dblatex
    (http://dblatex.sourceforge.net) to convert our DocBook source to
    XeLaTeX (a Unicode-aware version of LaTeX).

    Mike Maxwell
    University of Maryland





  • 6.  Re: [docbook-apps] Language support in XSL-FO Stylesheets

    Posted 06-23-2022 16:24
    Frank, is your solution meeting all your needs (with the corrections
    you've made)?

    If so, perhaps put in a change request to DocBook?

    regards




    --
    Dave Pawson
    XSLT XSL-FO FAQ.
    Docbook FAQ.



  • 7.  Canonical DocBook

    Posted 02-27-2022 11:02
    Hello List,

    I would like to propose a project "canonical DocBook" to the DocBook TC,
    and I am interested in the opinion of this mailing list. I hope this is
    the right list.

    DocBook is a great system for creating technical documents. We use it
    successfully for various purposes, which include transformation to
    formats other than HTML and PDF. For example, we work with stylesheets
    for transformation into ODF and into NISO-STS.

    Here, the flexibility of DocBook schemas is problematic, because it
    increases complexity. To give a very simple example, the title of a
    section is valid both with and without an enclosing info element. A
    template for transforming the title element must account for both
    possibilities.

    Our own stylesheets are therefore divided into at least two phases.
    First, the input document is transformed into a uniform structure. This
    ensures, for example, that each title element is always contained in an
    info element. In a second step, the document is converted into the
    target format. The advantage of this method is that the transformation
    of the second phase can be made much simpler.

    As far as I can see, the XSLT 3.0 stylesheets of xslTNG are similar
    in structure. They are certainly much more professional, comprehensive
    and systematic in design. So there is a point in these stylesheets where
    the input document is in a sort of "canonical DocBook". However, this
    canonical format is not documented.

    My suggestion is that the DocBook TC standardize and document the
    canonical DocBook format. Subsequently, stylesheets for transforming
    valid DocBook 5 documents into the canonical format would be published;
    possibly these already exist as part of the xslTNG stylesheets. The
    advantage would be that other projects could more easily transform
    canonical DocBook to other formats. They would be able to build on a
    standard, documented DocBook format of lower complexity.
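
As an illustration of such a first phase, the title example can be sketched in Python with the standard library. This is only a sketch under assumptions: the function name is hypothetical, and it handles just title, subtitle and titleabbrev in namespaced DocBook 5 input.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
TITLE_TAGS = {DB + name for name in ("title", "subtitle", "titleabbrev")}


def move_titles_into_info(root):
    """Wrap bare title children in an info element, in place."""
    # Take a snapshot first, because the tree is modified while we walk it.
    for parent in list(root.iter()):
        if parent.tag == DB + "info":
            continue
        bare = [child for child in parent if child.tag in TITLE_TAGS]
        if not bare:
            continue
        info = parent.find(DB + "info")
        if info is None:
            # Put the new info where the first bare title used to be.
            info = ET.Element(DB + "info")
            parent.insert(list(parent).index(bare[0]), info)
        for pos, child in enumerate(bare):
            parent.remove(child)
            info.insert(pos, child)
    return root


doc = ET.fromstring(
    '<section xmlns="http://docbook.org/ns/docbook">'
    "<title>Outer</title><para>Text.</para>"
    "</section>"
)
move_titles_into_info(doc)
print([child.tag.split("}")[1] for child in doc])
```

After this step, a second-phase template only needs to look for titles inside info.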

    Besides the simple example of the title elements, canonical DocBook
    would have to consider the following aspects, among others:

    para/simpara: canonical DocBook should only support simpara. para with
    block-content (tables, lists) must be transformed into a sequence of
    simpara and other block-content.

    Tables: In canonical DocBook, each table must have table column
    specifications. Default values are replaced by explicit values. Spanspec
    elements are converted to the corresponding column start and end
    positions. Each cell of a table must carry information about its
    position within the table, so that the columns where it starts and ends
    can be determined without complex calculations. The content of a table
    cell must be element-only.

    Images: Each image must have at least the attributes for image size and
    scaling.

    emphasis: explicit values instead of default values (e. g. role='bold').
    A list of values for role which must be supported (bold, italic,
    underline).

    Lists: explicit values instead of default values (e. g. numeration for
    orderedlist).
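
To make the "explicit values instead of defaults" idea concrete, here is a small Python sketch for orderedlist. The default it fills in is an assumption: it cycles through numeration styles by nesting depth, starting with arabic, which is one common rendering convention rather than something the schema prescribes. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"

# Assumed default per nesting depth; a real canonical format would have
# to specify this rule explicitly.
CYCLE = ["arabic", "loweralpha", "lowerroman", "upperalpha", "upperroman"]


def make_numeration_explicit(elem, depth=0):
    """Set numeration on every orderedlist that lacks it, in place."""
    if elem.tag == DB + "orderedlist":
        if elem.get("numeration") is None:
            elem.set("numeration", CYCLE[depth % len(CYCLE)])
        depth += 1
    for child in elem:
        make_numeration_explicit(child, depth)
    return elem
```

A list that already states its numeration is left alone, so the step is safe to run more than once.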

    Of course, this task could be exceedingly difficult if we were starting
    from scratch. I hope that in reality it will be less difficult if we
    take the xslTNG stylesheets as a basis and accept the format they
    produce as the intermediate result, after simplifying the structure, as
    the starting point for standardizing canonical DocBook.

    I would be very interested in the opinion of the members of this list on
    this proposal.

    Sincerely
    Frank Steimke

    P. S. This text was translated from German with www.DeepL.com/Translator
    (free version).



  • 8.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 11:09
    On Sun, 27 Feb 2022 at 11:02, Frank Steimke
    <f-steimke@berger-und-steimke.de> wrote:

    > My suggestion is that the DocBook TC standardize and document the
    > canonical DocBook format.

    (My view). This is the problem Frank?
    I'm sure, even within the TC, that defining and agreeing which 'form'
    is to be named Canonical
    would be contentious?

    I agree with the motive, I'm less sure about the means of achieving it?

    regards


    --
    Dave Pawson
    XSLT XSL-FO FAQ.
    Docbook FAQ.



  • 9.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 11:21
    What would be a better way to accomplish this?

    Ask Norm to document the internal format of his stylesheets so that a
    particular community can choose it as the "de facto standard" for
    developing its own stylesheets based on it?

    Frank




  • 10.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 11:29
    On Sun, 27 Feb 2022 at 11:20, Frank Steimke
    <f-steimke@berger-und-steimke.de> wrote:
    >
    > What would be a better way to accomplish this?
    >
    > Ask Norm to document its internal format so that a particular community
    > can choose it as the "de facto standard" for developing own stylesheets
    > based on it?

    Not sure at all Frank.
    Democratic solution perhaps? Majority of TC support A vs B?
    Then rely on Norm (mmmm) to write appropriate stylesheets? Seems unfair IMHO

    regards

    --
    Dave Pawson
    XSLT XSL-FO FAQ.
    Docbook FAQ.



  • 11.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 11:52
    OK, I see. Not so easy.

    However, I never wanted to be unfair to anyone. My hope or expectation
    was that the stylesheets are already available, because we have xslTNG.
    We have the stylesheets for preprocessing in xslTNG, but we do not have
    any documentation of the DocBook subset they produce.

    But of course, when someone writes it down, there will be a discussion
    about the target format, someone will suggest a "better" solution which
    leads to a change request for the stylesheets ...

    Frank




  • 12.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 13:42
    > Our own stylesheets are therefore divided into at least phases. First,
    […]
    > As far as I can see, the XSL 3 stylesheets for XslTNG are also similar
    > in structure.

    Yep. The xslTNG stylesheets go through several standard stages:

    1. Normalize the logical structure (get rid of entity refs, basically)
    2. Expand XIncludes
    3. Upgrade from 4 to 5 if the input isn’t in a namespace
    4. Process transclusions
    5. Normalize the markup
    6. Process annotations
    7. Process external link bases

    Plus a couple more that are conditional.

    > So there is a point in these stylesheets where the input document is
    > in a sort of "canonical DocBook". However, this canonical format is
    > not documented.

    That’s true.

    > My suggestion is that the DocBook TC standardize and document the
    > canonical DocBook format. Subsequently, stylesheets for transforming

    The problem with a documented canonical format is that, like a “minimal
    subset”, you could probably get broad agreement on 80% of it, but no two
    people would have the same 80% in mind.

    Another problem is that no one wants to author in the canonical format.
    It’s the format that removes all markup minimization.

    I could spin off the normalizing stylesheets, steps 1 to 5 above,
    optional 6 and 7, into a separate package. And I suppose, that could be
    documented. I don’t know if that’s a TC activity or not though as it’s
    pretty application specific.

    > para/simpara: canonical DocBook should only support simpara. para with
    > block-content (tables, lists) must be transformed into a sequence of
    > simpara and other block-content.

    That’s in your 80% is it :-).

    > Tables: In canonical DocBook, each table must have table column
    > specifications. Default values are replaced by explicit values.
    […]
    > which column it starts and where it ends without complex calculations.
    > Content of table cell must be element only.

    It sounds like what you really want here, isn’t even CALS (or HTML)
    tables. You want the completely explicit internal format that the xslTNG
    stylesheets generate during table processing. They turn the entire table
    into a perfectly rectangular grid, using “ghost” elements for cells that
    are missing.
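
The idea of a perfectly rectangular grid with ghost cells can be shown with plain data structures. The Python sketch below is not the xslTNG implementation: the input shape (text, colspan, rowspan) and the "cell"/"ghost" markers are assumptions chosen for illustration, and overlapping spans are not detected.

```python
def to_grid(rows, ncols):
    """Expand spanning cells into a full len(rows) x ncols grid.

    rows: list of rows; each cell is a (text, colspan, rowspan) tuple.
    Returns a grid whose entries are ("cell", text) at the top-left
    position of a real cell, ("ghost", text) at positions covered by a
    span, and ("ghost", None) where the table has no cell at all.
    """
    grid = [[None] * ncols for _ in rows]
    for r, row in enumerate(rows):
        c = 0
        for text, colspan, rowspan in row:
            # Skip positions already claimed by a row span from above.
            while c < ncols and grid[r][c] is not None:
                c += 1
            if c >= ncols:
                raise ValueError(f"row {r} has more cells than columns")
            for dr in range(rowspan):
                for dc in range(colspan):
                    rr, cc = r + dr, c + dc
                    if rr < len(rows) and cc < ncols:
                        kind = "cell" if (dr, dc) == (0, 0) else "ghost"
                        grid[rr][cc] = (kind, text)
            c += colspan
    # Whatever is still empty is a missing cell.
    return [[e if e is not None else ("ghost", None) for e in row]
            for row in grid]


table = [
    [("A", 2, 1), ("B", 1, 2)],  # A spans two columns, B spans two rows
    [("C", 1, 1)],               # the middle cell of this row is missing
]
for row in to_grid(table, ncols=3):
    print(row)
```

With such a grid, the start and end column of any cell can be read off directly, which is the property asked for in the proposal.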

    That’s kind of true for a few of the other ideas you proposed, like the
    inline markup.

    After a while, this starts to feel less like a canonical DocBook and
    more like a structural interchange format.

    > Images: Each image must have at least the attributes for image size
    > and scaling.

    Getting those, if the author didn’t provide them, requires extensions
    and is even then only speculative. I’m sure there are image formats I
    can’t parse. Authors really should provide them.

    > P. S. This text was translated with www.DeepL.com/Translator (free
    > version) from german language.

    Wow. It did a remarkably good job. I would not, on a casual reading,
    have suspected autotranslation.
    Be seeing you,
    norm

    --
    Norman Tovey-Walsh <ndw@nwalsh.com>
    https://nwalsh.com/

    > Before you criticize someone, walk a mile in his shoes. That way, when
    > you criticize him, you're a mile away and you have his shoes.



  • 13.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 19:18
    Thank you very much for your comments and suggestions, Norm. Please
    allow a few remarks.

    /"After a while, this starts to feel less like a canonical DocBook and
    more like a structural interchange format"./

    Yes, based on DocBook. After all, the result of standard steps 1 to 7 is
    almost a valid DocBook document, isn't it? That is, with the exception
    of a few additional attributes in a separate namespace (e.g. ghost
    attributes in tables). But it's true that this format is not intended
    for authors. They keep writing the way they do today, and the
    interchange format is generated by applying the xslTNG steps.

    /No block elements within para/

    That's in my 80% because neither ODF nor OOXML allows tables or lists
    in paragraphs. I would see a great benefit if the DocBook-based
    structural interchange format allowed easy transformation into
    office standards, especially ODF.

    /Image size and scaling attributes/

    You are right, this is more a question of the application or tool.

    /I could spin off the normalizing stylesheets, steps 1 to 5 above,
    optional 6 and 7, into a separate package. And I suppose, that could be
    documented. /

    That would be great. Is there a way I can help?

    Thanks,

    Frank Steimke




  • 14.  Re: [docbook-apps] Canonical DocBook

    Posted 02-27-2022 22:24
    On 2/27/22 1:18 PM, Frank Steimke wrote:
    >
    > /No block Elements within para/
    >
    > That's in my 80% because neither ODF nor OOXML do allow tables or
    > lists in paragraphs. I would see a great benefit when the DocBook
    > based structural interchange format would allow easy transformation
    > into office Standards, especially ODF.
    >
    You potentially lose information by doing that. For example, when
    normalizing, profiling and other attributes on the enclosing para would
    need to be copied to the paras you create before and after the table or
    list and onto the table or list itself as well. But doing so might
    change the author's intent in subtle ways depending on what job those
    attributes are doing.

    I guess this is Norm's point about everybody having a different 80%.

    Regards,

    David



  • 15.  Canonical DocBook / para vs simpara

    Posted 02-28-2022 04:51
    Yes. If the para element has an xml:id attribute, you will have to
    decide which of the elements in the generated sequence should get that
    id. I would vote for the first element, but any other element would
    also be possible. A reference to the para element in DocBook points to
    a larger area (namely the one including the block elements) than it
    does in the generated document.
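
To see where these decisions show up, here is a minimal Python sketch of splitting one para into a sequence of simpara and block elements. It is an illustration under assumptions, not a proposal for the stylesheets: the function name and the short BLOCKS list are hypothetical, the xml:id goes to the first generated element (and is dropped if that element already has its own), all other attributes are copied to every piece, and nested paras inside the blocks are left untouched.

```python
import copy
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
# Illustrative, incomplete list of block elements.
BLOCKS = {DB + n for n in ("itemizedlist", "orderedlist", "variablelist",
                           "table", "informaltable", "programlisting")}


def split_para(para):
    """Return the list of simpara and block elements that replaces para."""
    shared = {k: v for k, v in para.attrib.items() if k != XML_ID}
    pieces = []
    current = None  # the simpara currently being filled, if any

    def open_simpara():
        nonlocal current
        if current is None:
            current = ET.Element(DB + "simpara", dict(shared))
            pieces.append(current)
        return current

    def add_text(text):
        # Whitespace between two blocks does not start a new simpara.
        if not text or (current is None and not text.strip()):
            return
        sp = open_simpara()
        if len(sp):
            sp[-1].tail = (sp[-1].tail or "") + text
        else:
            sp.text = (sp.text or "") + text

    add_text(para.text)
    for child in para:
        node = copy.deepcopy(child)
        tail, node.tail = node.tail, None
        if child.tag in BLOCKS:
            for key, value in shared.items():
                node.attrib.setdefault(key, value)  # block's own values win
            pieces.append(node)
            current = None  # following text starts a new simpara
        else:
            open_simpara().append(node)
        add_text(tail)
    if pieces and XML_ID in para.attrib:
        pieces[0].attrib.setdefault(XML_ID, para.get(XML_ID))
    return pieces
```

Each of the marked choices (where the id goes, which attributes are copied, whose value wins) is a point where two implementers could reasonably disagree.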

    On the other hand, an author who for some reason needs to publish his
    document in an Office format would have to find a solution to this
    problem anyway. Without an automatic transformation from DocBook to
    Office, it would simply have to be done manually.

    If no agreement can be reached in a discussion on this aspect,
    intermediate solutions might help. For example, it would help to
    transform all para elements that do not contain block elements into
    simpara elements. This would at least make it clear that the remaining
    para elements are always those that contain block elements which may
    require special attention. Their transformation into a sequence of
    simpara and block elements could be done in the first step in the
    transformation to ODF.

    The benefit would be that the challenge is clearly documented in the
    description of the interchange format: /"If you see a para element in
    the output, then you have to take care of the included block
    elements."/ Even with para elements allowed, this would help with the
    development of the transformation into office formats.
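
The intermediate solution is easy to sketch in Python. As before, this is a sketch under assumptions: the function name is hypothetical, the BLOCKS list is illustrative and incomplete, and only direct children of each para are inspected.

```python
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
# Illustrative, incomplete list of block elements.
BLOCKS = {DB + n for n in ("itemizedlist", "orderedlist", "variablelist",
                           "table", "informaltable", "programlisting")}


def simplify_paras(root):
    """Rename every para without block children to simpara, in place."""
    for para in list(root.iter(DB + "para")):
        if not any(child.tag in BLOCKS for child in para):
            para.tag = DB + "simpara"
    return root
```

Any para that survives this step is, by construction, one that needs special handling later.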

    It would be interesting to know how many such controversial proposals
    actually exist. I would also advocate, for example, that the sect1 to
    sect5 elements should be transformed to section elements.
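
The numbered section elements are a comparatively uncontroversial case, because the nesting already carries the level. A minimal Python sketch (hypothetical function name; DocBook defines sect1 through sect5):

```python
import re
import xml.etree.ElementTree as ET

DB = "{http://docbook.org/ns/docbook}"
SECT_N = re.compile(re.escape(DB) + r"sect[1-5]")


def unify_sections(root):
    """Rename sect1..sect5 to section, in place; nesting is unchanged."""
    for elem in root.iter():
        if SECT_N.fullmatch(elem.tag):
            elem.tag = DB + "section"
    return root
```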

    One should probably start with a collection of properties for the
    interchange format (that is, a rough first description of the DocBook
    subset) and check which of them are already supported by the xslTNG
    steps, or can be achieved very easily.

    Regards,
    Frank



  • 16.  Re: [docbook-apps] Canonical DocBook

    Posted 02-28-2022 07:56
    Frank Steimke <f-steimke@berger-und-steimke.de> writes:

    > Thank you very much for your comments and suggestions, Norm. Please allow a few remarks.
    >
    > "After a while, this starts to feel less like a canonical DocBook and
    > more like a structural interchange format".
    >
    > Yes, based on DocBook. After all, the result of standard steps 1 to 7
    > is almost a valid DocBook Document, isn't it? That is, with the
    > exception of a few additional attributes in a separate namespace (e.
    > g. ghost attributes in tables). But it's true that this format is not
    > intended for authors. They keep writing the way they do today, and the
    > interchange format is generated by applying the xslTNG steps.

    For clarity, the output of step 7 in my list is still absolutely valid
    DocBook. The ghost elements and ghost attributes technique (that turns
    up in both tables and callouts) are purely transitory forms used in
    formatting. I was just observing that you might want an even more
    normalized intermediate format…

    > No block Elements within para
    >
    > That's in my 80% because neither ODF nor OOXML do allow tables or
    > lists in paragraphs. I would see a great benefit when the DocBook
    > based structural interchange format would allow easy transformation
    > into office Standards, especially ODF.

    HTML doesn’t allow it either, which is why I’ve mostly trained myself
    not to do it. But it irritates me on a regular basis:

    <para>As you can see:
    <orderedlist>
    <listitem>
    <para>Logically speaking, paragraphs can contain lists and tables.
    </para>
    </listitem>
    <listitem>
    <para>Making “As you can see:” a separate paragraph is just wrong.
    It isn’t even a complete sentence!
    </para>
    </listitem>
    <listitem>
    <para>The same is true of what follows.
    </para>
    </listitem>
    </orderedlist>
    demonstrating that this paragraph logically contains the preceding
    list.</para>

    Nothing we can do about formats that don’t allow it though. Out of
    curiosity, do ODF and OOXML have any kind of neutral wrapper that can
    contain them, like HTML’s div?

    As David pointed out in another follow-up, unwrapping that structure
    changes what an xml:id on the paragraph would identify. Not such a big
    deal for linking into a document, but potentially catastrophic for
    XInclude or transclusion.
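
    A minimal sketch of the unwrapping under discussion, written in Python
    with the standard-library ElementTree rather than XSLT, may make the
    xml:id problem concrete. It is not how xslTNG works; the function name
    split_para, the BLOCKS set, and the choice to keep the xml:id on the
    first fragment only are assumptions made for illustration, and the
    elements are left un-namespaced for brevity.

```python
import copy
import xml.etree.ElementTree as ET

# ElementTree spells xml:id with the XML namespace in Clark notation.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Illustrative, incomplete set of block elements (an assumption).
BLOCKS = {"orderedlist", "itemizedlist", "table", "informaltable"}


def split_para(para):
    """Return the sibling elements that replace `para` once its block
    children are hoisted out of it."""
    pieces = []
    # The first fragment keeps every attribute, including xml:id.
    current = ET.Element("para", dict(para.attrib))
    current.text = para.text
    for child in para:
        if child.tag in BLOCKS:
            pieces.append(current)
            block = copy.deepcopy(child)
            # Text after the block belongs to the next paragraph fragment.
            tail, block.tail = block.tail, None
            pieces.append(block)
            # Later fragments copy the other attributes but must not
            # repeat the ID, or the document would have duplicate IDs.
            attrs = {k: v for k, v in para.attrib.items() if k != XML_ID}
            current = ET.Element("para", attrs)
            current.text = tail
        else:
            # Inline children stay in the current fragment (tail included).
            current.append(copy.deepcopy(child))
    pieces.append(current)
    # Drop paragraph fragments that ended up empty.
    return [p for p in pieces
            if p.tag != "para" or len(p) or (p.text or "").strip()]


src = ('<para xml:id="p1">As you can see:'
       '<orderedlist><listitem><para>One.</para></listitem></orderedlist>'
       'this paragraph contains the list.</para>')

pieces = split_para(ET.fromstring(src))
for piece in pieces:
    print(ET.tostring(piece, encoding="unicode"))
```

    After the split, a link to p1 still lands in roughly the right place,
    but an XInclude of p1 would pull in only the lead-in fragment and
    silently lose the list and the trailing text.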

    Be seeing you,
    norm

    --
    Norman Tovey-Walsh <ndw@nwalsh.com>
    https://nwalsh.com/

    > There is only one difference between a madman and me. I am not
    > mad.--Salvador Dali



  • 17.  Canonical DocBook

    Posted 02-28-2022 10:39
    I agree that "Logically speaking, paragraphs can contain lists and tables (and mathematical formulas and ...)". But some popular document formats do not support that. ODF has no neutral wrapper for this situation. Not sure about OOXML.
     
    This seems to make it quite obvious that the intermediate format must allow para elements that contain block elements, but that, on the other hand, special actions must sometimes be taken for the subsequent transformation into other formats.
     
    If the preprocessing steps transformed paragraphs without block elements into simpara, but left para with block elements untouched, the difference would be visible, and subsequent steps could apply different templates, if necessary.
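
    A rough sketch of that preprocessing step, in Python with the
    standard-library ElementTree instead of XSLT. The function name
    mark_simple_paras and the BLOCKS set are hypothetical and incomplete,
    and the elements are un-namespaced for brevity; a real step would
    have to cover every block element the schema allows inside para.

```python
import xml.etree.ElementTree as ET

# Illustrative, incomplete set of block elements (an assumption).
BLOCKS = {"orderedlist", "itemizedlist", "variablelist",
          "table", "informaltable", "equation", "informalequation"}


def mark_simple_paras(root):
    """Rename every para without block children to simpara, in place.

    Paragraphs that do contain a block element keep the name para, so
    later steps can match the two cases with different templates.
    """
    for para in root.iter("para"):
        if not any(child.tag in BLOCKS for child in para):
            para.tag = "simpara"
    return root


doc = ET.fromstring(
    "<section>"
    "<para>Plain <emphasis>text</emphasis>.</para>"
    "<para>Intro:<itemizedlist><listitem><para>Item</para></listitem>"
    "</itemizedlist>end.</para>"
    "</section>"
)
mark_simple_paras(doc)
print([child.tag for child in doc])
```

    A later transformation to ODF would then only need special handling
    for the elements still named para.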
     
    Norm, how would an (Relax NG | XSD 1.1) Schema for the outcome of Step 7 differ from the DocBook 5.2 Schema? It is a subset, isn't it?
     
    Regards, Frank
     
     


  • 18.  Re: Canonical DocBook

    Posted 02-28-2022 11:03
    > Norm, how would an (Relax NG | XSD 1.1) Schema for the outcome of Step
    > 7 differ from the DocBook 5.2 Schema? It is a subset, isn't it?

    I think so. It’s intended to be, but since it’s never been validated,
    it’s possible I messed up :-)

    The big changes, on a quick skim, are that an info wrapper is added to
    all block elements that have just a title; elements with optional titles
    (glossary, index, admonitions, etc.) all have explicit titles; external
    glossaries, indexes, bibliographies, etc. are inlined (these are
    features of the stylesheets, not DocBook per se, but pulling them back
    inline simplifies formatting them); unparsed text elements from URIs
    are inlined; and HTML tables get a tbody if one is missing.
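
    To illustrate the last of those changes, here is a small sketch in
    Python with the standard-library ElementTree. It is not the xslTNG
    implementation, only an assumption about what such a normalization
    could look like; the function name ensure_tbody is hypothetical and
    the elements are un-namespaced for brevity.

```python
import xml.etree.ElementTree as ET


def ensure_tbody(table):
    """Wrap tr elements that sit directly under an HTML-style table in a
    single tbody, in place. Tables that have no direct tr children (they
    already use tbody, or are CALS tables with tgroup) are left alone."""
    rows = [child for child in table if child.tag == "tr"]
    if not rows:
        return table
    # Remember where the first row was so that caption, colgroup, thead
    # and tfoot stay in front of the new tbody.
    position = list(table).index(rows[0])
    tbody = ET.Element("tbody")
    for row in rows:
        table.remove(row)
        tbody.append(row)
    table.insert(position, tbody)
    return table


table = ET.fromstring(
    "<table><caption>Example</caption>"
    "<tr><td>a</td></tr><tr><td>b</td></tr></table>"
)
ensure_tbody(table)
print(ET.tostring(table, encoding="unicode"))
```

    With the wrapper guaranteed, later templates can match table/tbody/tr
    without a second rule for rows placed directly under table.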

    Be seeing you,
    norm

    --
    Norman Tovey-Walsh <ndw@nwalsh.com>
    https://nwalsh.com/

    > Patriotism is often an arbitrary veneration of real estate above
    > principles.--George Jean Nathan