Re: [docbook-apps] Stripping comments

    Posted 03-30-2007 17:45
    You could use XSLT, but you might not like the results. 8^)
    You start with an identity stylesheet such as the following:


    <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                    version="1.0">

      <xsl:output indent="no"/>

      <xsl:template match="node()|@*">
        <xsl:copy>
          <xsl:apply-templates select="@*"/>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:template>

    </xsl:stylesheet>

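    Then you add a template to strip out comments. Because the patterns
    node() and comment() have the same default priority (-0.5), give this
    template an explicit priority so that it, and not the identity template,
    handles comment nodes:

    <xsl:template match="comment()" priority="1"/>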

    There are several problems with using XSLT, though:

    1. Entity references are expanded, not preserved as entity references.
    You can't hide them in XSLT because the parser expands them before the
    stylesheet sees them.

    2. Any DOCTYPE declaration is removed. You have to copy your DOCTYPE's
    public and system identifiers by hand into the doctype-public and
    doctype-system attributes of the xsl:output element. The stylesheet can't
    copy them automatically, because the DOCTYPE is not accessible to XPath.
    Any internal DTD subset is lost, as there is no way for xsl:output to
    specify it.

    3. Default attribute values declared in the DocBook DTD are added. You will
    end up with a lot of moreinfo="none" attributes on elements like literal.

    4. The output will differ in other ways because the XML is parsed and then
    re-serialized: attribute order may be different, empty elements may be
    expressed differently, and character references will become native UTF-8
    characters (unless you specify a different output encoding). These
    differences will show up in a text diff program, but not in an XML-aware
    differencing program.
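    Points 1 and 4 apply to any parse-and-reserialize round trip, not just to
    XSLT. As a minimal sketch, Python's standard xml.dom.minidom can stand in
    for the XSLT processor here (it is only an illustration of the same
    effect, and the SOURCE document and round_trip helper are made up for this
    example). The entity reference and the character reference come back as
    plain text, and the empty element is rewritten in its short form:

```python
from xml.dom import minidom

# Hypothetical sample: an internal entity, a character reference (the euro
# sign), and an empty element written as a start-tag/end-tag pair.
SOURCE = """<!DOCTYPE article [<!ENTITY product "Widget">]>
<article><para>&product; costs &#8364;5.<br></br></para></article>"""


def round_trip(xml_text):
    """Parse xml_text, then re-serialize only the root element."""
    doc = minidom.parseString(xml_text)
    return doc.documentElement.toxml()


# The parser has already expanded &product; and &#8364; before serialization,
# so neither can be written back out in its original form.
print(round_trip(SOURCE))
```

    A text diff against the original would flag every one of these changes,
    even though the document's XML content is the same.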

    Generally, I use Perl for such filtering. The syntax of an XML comment is
    easily matched by a regular expression, and Perl doesn't mess with any of
    the other XML markup. I read the entire file into a single string, globally
    replace comments with nothing, and then print the string.
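    The text-filter approach above can be sketched as follows. This is a rough
    Python equivalent written for illustration, not the author's Perl script,
    and the names strip_comments and main are hypothetical. It works on raw
    bytes, so entity references, the DOCTYPE, attribute order, and character
    references pass through untouched. It assumes an ASCII-compatible encoding
    such as UTF-8 (not UTF-16), and it does not special-case CDATA sections, so
    a "<!--" inside CDATA would be treated as a comment too:

```python
import re
import sys

# An XML comment runs from "<!--" to the first "-->". The non-greedy ".*?"
# stops at that first "-->", and re.DOTALL lets "." match newlines so that
# multi-line comments are removed as well.
COMMENT_RE = re.compile(rb"<!--.*?-->", re.DOTALL)


def strip_comments(data: bytes) -> bytes:
    """Return data with every comment deleted; all other bytes are unchanged."""
    return COMMENT_RE.sub(b"", data)


def main(path: str) -> None:
    # Read the entire file into a single bytes object.
    with open(path, "rb") as f:
        data = f.read()
    # Replace all comments with nothing and print the result.
    sys.stdout.buffer.write(strip_comments(data))


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

    Usage would be something like "python strip_comments.py book.xml >
    book-nocomments.xml" (file names hypothetical). A rough Perl one-liner
    doing the same substitution is: perl -0777 -pe 's/<!--.*?-->//gs' book.xml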

    Bob Stayton
    Sagehill Enterprises
    DocBook Consulting
    bobs@sagehill.net