OASIS Open Document Format for Office Applications (OpenDocument) TC


ISO 14977 EBNF grammar

  • 1.  ISO 14977 EBNF grammar

    Posted 04-28-2008 14:24
    Greetings!
    
    The ISO EBNF grammar: 
    http://standards.iso.org/ittf/PubliclyAvailableStandards/s026153_ISO_IEC_14977_1996(E).zip
    
    Hope everyone is having a great day!
    
    Patrick
    
    -- 
    Patrick Durusau
    patrick@durusau.net
    Chair, V1 - US TAG to JTC 1/SC 34
    Convener, JTC 1/SC 34/WG 3 (Topic Maps)
    Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
    Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
    
    


  • 2.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-28-2008 16:07
    Patrick Durusau:
    > The ISO EBNF grammar: 
    > http://standards.iso.org/ittf/PubliclyAvailableStandards/s026153_ISO_IEC_14977_1996(E).zip
    
    Currently, in OpenFormula we use W3C's XML spec for BNFs instead of the ISO EBNF
    or IETF BNF.
    
    The ISO spec is not a _bad_ one, but it does have weaknesses for formula purposes.
    
    When comparing ISO's with the W3C XML spec:
    * The ISO spec has no support for character ranges and negated ranges
      (part of regular expressions), while XML's does.
      We use this capability; for some (like SheetName) it's not clear how hard that
      change will be.
    * The ISO spec requires the use of "," for concatenation, instead of that
       being the default.  It also requires ";" to terminate every production.
       As a result, ISO's format is much wordier to express the same thing. E.G.:
       ...  SheetLocator "." Column Row (':' SheetLocator "." Column Row )? ..
      would become:
       ...  SheetLocator, ".", Column, Row, [ ':', SheetLocator, ".", Column, Row ] .. ;
       Not a show-stopper, but I think that's unfortunate.  Concatenation is EXTREMELY
       common in BNF, so failing to have it as a default operator complicates the spec.
    * The ISO spec's definition operator is "=", which is easily confused with the "="
       used inside BNFs themselves.  That's not as big a deal.
    
    Historically, the ISO document was expensive, while the XML specification was
    freely available.  I think it would have been unconscionable to have referred to the
    ISO spec while it was expensive, but now that the ISO document is publicly available
    without fee, I think it _could_ be used.  However, its lack of regular expression
    support, and unnecessary wordiness, do not give any incentives.
    
    It would take a little time to change to the ISO BNF.  To change this in OpenFormula,
    all the productions would have to change, e.g., change "::=" to "=", inserting
    commas everywhere, and trying to figure out how to replace the character ranges.
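The mechanical part of that conversion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it handles quoted literals, names, grouping, "|", and the postfix "?" and "*" operators, and it maps them to what I understand to be the ISO 14977 spellings (commas, "[ ]" for optional, "{ }" for repetition, a closing ";"). The function names and the production name "Ref" are hypothetical, not taken from OpenFormula. Character ranges are rejected on purpose, since that is the part with no direct ISO equivalent.

```python
import re

# Tokens handled by this sketch: quoted literals, names, and ( ) | ? *
TOKEN = re.compile(r"""\s*("[^"]*"|'[^']*'|[A-Za-z_][A-Za-z0-9_]*|[()|?*])""")


def tokenize(text):
    """Split a W3C-style right-hand side into tokens; reject anything else."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unsupported syntax at {text[pos:]!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def w3c_to_iso(production):
    """Rewrite 'Name ::= ...' as 'Name = ... ;' with explicit commas."""
    name, rhs = production.split("::=", 1)
    tokens = tokenize(rhs)

    def alternatives(i):
        parts = []
        while True:
            part, i = sequence(i)
            parts.append(part)
            if i < len(tokens) and tokens[i] == "|":
                i += 1
            else:
                return " | ".join(parts), i

    def sequence(i):
        items = []
        while i < len(tokens) and tokens[i] not in ("|", ")"):
            item, i = term(i)
            items.append(item)
        return ", ".join(items), i  # concatenation must be spelled out

    def term(i):
        token = tokens[i]
        if token in ("?", "*"):
            raise ValueError(f"{token!r} has nothing to apply to")
        if token == "(":
            inner, i = alternatives(i + 1)
            if i >= len(tokens) or tokens[i] != ")":
                raise ValueError("missing ')'")
            plain = f"( {inner} )"
        else:
            inner = plain = token
        i += 1  # step past the name, literal, or closing parenthesis
        suffix = tokens[i] if i < len(tokens) else ""
        if suffix == "?":
            return f"[ {inner} ]", i + 1  # optional
        if suffix == "*":
            return f"{{ {inner} }}", i + 1  # zero or more
        return plain, i

    body, end = alternatives(0)
    if end != len(tokens):
        raise ValueError("unbalanced ')'")
    return f"{name.strip()} = {body} ;"


if __name__ == "__main__":
    print(w3c_to_iso("""Ref ::= SheetLocator "." Column Row (':' SheetLocator "." Column Row)?"""))
```

A production that uses a character range, such as `digits1to9 ::= [1-9]`, makes `tokenize` raise `ValueError`, which is where a real conversion would need a decision rather than a rewrite rule.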
    
    Another BNF format is IETF RFC 4234.  It's kind of ugly; alternatives use "/" instead
    of the more common "|", and you HAVE to group them (which is a pain).  Even weirder,
    to indicate repetition you PRECEDE the item with "*" instead of follow it.
    Every book I've seen has "*" FOLLOW the item to be repeated.
    In my mind, IETF's is the worst of the three in terms of clarity.
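To make the three spellings easier to compare side by side, here is a small Python sketch that renders one grammar tree in each notation. It is an illustration only: the tree encoding, the function names, and the rule "List" are made up for this example, and only literals, references, sequences, alternatives, and zero-or-more repetition are covered.

```python
def render(node, style):
    """Render a tiny grammar tree in 'w3c', 'iso', or 'abnf' notation."""
    kind, value = node
    if kind == "lit":
        return f'"{value}"'
    if kind == "ref":
        return value
    if kind == "seq":
        # ISO spells concatenation with a comma; the other two juxtapose.
        joiner = ", " if style == "iso" else " "
        return joiner.join(render(child, style) for child in value)
    if kind == "alt":
        # ABNF separates alternatives with "/"; W3C and ISO use "|".
        joiner = " / " if style == "abnf" else " | "
        return "(" + joiner.join(render(child, style) for child in value) + ")"
    if kind == "star":
        inner = render(value, style)
        if style == "iso":
            return "{ " + inner + " }"  # ISO: braces mean zero or more
        if value[0] == "seq":
            inner = "(" + inner + ")"
        # W3C puts "*" after the item, ABNF puts it in front.
        return inner + "*" if style == "w3c" else "*" + inner
    raise ValueError(f"unknown node kind: {kind!r}")


def render_rule(name, node, style):
    """Render a whole production, including the definition symbol."""
    if style not in ("w3c", "iso", "abnf"):
        raise ValueError(f"unknown style: {style!r}")
    body = render(node, style)
    if style == "w3c":
        return f"{name} ::= {body}"
    if style == "abnf":
        return f"{name} = {body}"
    return f"{name} = {body} ;"  # ISO terminates each rule


# Hypothetical rule: an Item followed by any number of ("," Item) pairs.
LIST = ("seq", [("ref", "Item"),
                ("star", ("seq", [("lit", ","), ("ref", "Item")]))])

if __name__ == "__main__":
    for style in ("w3c", "iso", "abnf"):
        print(f"{style:5}", render_rule("List", LIST, style))
```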
    
    After looking at the W3C (XML), ISO, and IETF formats for BNFs, we chose the
    W3C's XML format. Reasons:
    * W3C's format produces the clearest, simplest specs with the same meaning.
      Concatenation is the default, alternatives are "|", the "*" is AFTER the repeated item.
      The resulting spec, with the same meaning, is simpler than with ISO or IETF.
    * W3C's format includes character range support.  ISO's does not.
    * OpenDocument is itself based on XML, so it made sense to use the same format
       used to spec XML.
    * XML's format is publicly available at no charge. At the time I believe that was
       not true of ISO's format.  It appears this point, at least, is moot (hooray!).
    
    XML is a standard, in every reasonable sense of the word, so using its BNF format
    is (in my opinion) very defensible.
    
    --- David A. Wheeler
    


  • 3.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-28-2008 17:43
    David,
    
    So, I take it that the technical issue (as opposed to aesthetics, etc.) 
    is the lack of support for character and negated ranges?
    
    When you say "lack of support" I assume you mean that character and 
    negative ranges are not predefined? Yes?
    
    Which is different than saying ISO/IEC 14977 cannot define character and negated 
    ranges. Yes?
    
    Err, then you say it lacks "regular expression" support? But as above, 
    that is simply a question of defining the support that we want/need. Yes?
    
    I note that the use of the XML BNF starts with Chapter 5 of the formula 
    work. I would think it would be better to use a BNF to define the 
    primitives up to that point so as to avoid ambiguity running up to 
    Chapter 5. I don't doubt that the formula SC was far more consistent 
    than previous versions of ODF but it would not hurt to use a notation 
    that binds all of it together.
    
    Hope you are having a great day!
    
    Patrick
    
    
    


  • 4.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-28-2008 18:02

    Thanks for looking into this, David.

    Although we are not a W3C standard, ODF certainly has a "family resemblance" to them, based on our use of so many other W3C standards, such as XML, XLink, MathML, XForms, etc.  So defining our syntax using their conventions is a reasonable thing.  

    On the other hand, in other cases we have not used W3C standards and instead used standards from ISO.  For example, our use of RELAX NG rather than XML Schema.

    Either choice is defensible, I believe, and can lead to a clear, unambiguous syntactic definition.  

    I wonder if there is any emerging agreement within OASIS on which to use?  Certainly we can't be the only ones running into this.

    -Rob
    ___________________________

    Rob Weir
    Software Architect
    Workplace, Portal and Collaboration Software
    IBM Software Group

    email: robert_weir@us.ibm.com
    phone: 1-978-399-7122
    blog:
    http://www.robweir.com/blog/


  • 5.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 01:14
    Patrick Durusau:
    > So, I take it that the technical issue (as opposed to aesthetics, etc.)
    > is the lack of support for character and negated ranges?
    
    Correct.  Quick clarification: It's negated _character_ ranges that
    ISO doesn't support.  Other kinds of negation work, I believe.
    
    > When you say "lack of support" I assume you mean that character and
    > negative ranges are not predefined? Yes?
    
    No.  It doesn't have a range operator at all; all it allows is listing alternatives.
    
    >Which is different than saying ISO/IEC 14977 cannot define character and negated
    >ranges. Yes?
    
    It's not that it CAN'T do it, the problem is that there is no built-in
    range operator, and thus you must enumerate every instance.
    That becomes insane when you have to support international use via
    Unicode/ISO 10646 characters; there is NO way we'll enumerate them all.
    
    I think an example will clarify why the ISO spec doesn't work well for
    defining certain kinds of data formats with international characters.
    
    Here's how you can define "digits 1 through 9" in W3C's notation:
     digits1to9 ::= [1-9]
    
    Here's how you have to do it in ISO - by explicit enumeration of
    each possibility, one by one:
     digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
    
    Notice how painful it is when there are only 9 characters.
    Because we want to support internationalized characters (such as
    sheet names in ANY language), enumerating all international characters except
    a few is, um, absurd.  If you go beyond the BMP (plane 0), you're talking hundreds of
    thousands of characters in the enumeration.
    
    W3C's notation, in contrast, can say things like [^$] to say "any character but a $".
    Nice and clean.
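For readers who want to try the two constructs, the same bracket syntax works in Python's `re` module, so a rough sketch can show both the range and the negated class, plus a back-of-the-envelope count of how long an explicit list would be. The helper names are hypothetical, and the count is an upper bound over Unicode code points (minus the surrogate block), not a count of assigned characters.

```python
import re
import sys

DIGITS_1_TO_9 = re.compile(r"[1-9]")  # character range
NOT_DOLLAR = re.compile(r"[^$]")      # negated character class


def is_digit_1_to_9(ch):
    """True if ch is a single character in the range 1 through 9."""
    return DIGITS_1_TO_9.fullmatch(ch) is not None


def is_not_dollar(ch):
    """True if ch is any single character other than '$'."""
    return NOT_DOLLAR.fullmatch(ch) is not None


def enumerated_alternatives(excluded=""):
    """Upper bound on the alternatives an explicit list would need.

    Assumes the excluded characters are not surrogate code points.
    """
    code_points = sys.maxunicode + 1  # 0x110000 on Python 3.3 and later
    surrogates = 0xE000 - 0xD800      # U+D800..U+DFFF are not characters
    return code_points - surrogates - len(set(excluded))


if __name__ == "__main__":
    print(is_digit_1_to_9("7"), is_digit_1_to_9("0"))
    print(is_not_dollar("a"), is_not_dollar("$"))
    print(enumerated_alternatives("$"))
```

The last call gives the size of the list that "any character but a $" would turn into if every remaining code point had to be written out as its own alternative.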
    
    > Err, then you say it lacks "regular expression" support? But as above,
    > that is simply a question of defining the support that we want/need. Yes?
    
    Um, sorry, what I mean by "regular expression" is specifically the usual
    "character range operator" built into all regex languages that I know of.
    There is no range operator mechanism, as far as I can tell, in ISO's format.
    If I missed it, please let me know.
    
    We could extend ISO's BNF in a nonstandard way, but then what's the
    point of using the standard? Better to use a standard that has the
    needed capabilities built in.
    
    ISO's BNF format isn't horrific, and it's quite suitable for a lot of constrained
    languages.  Many formats FORBID arbitrary international characters, and for
    them it's probably okay.  But for the formula spec, where international
    characters ARE allowed, that was a problem.  Besides, it's a little ugly :-).
    
    >I note that the use of the XML BNF starts with Chapter 5 of the formula
    >work. I would think it would be better to use a BNF to define the
    >primitives up to that point so as to avoid ambiguity running up to
    >Chapter 5.
    
    That's not a bad idea.  I don't know how long that will take to do.
    
    
    Robert Weir:
    >Although we are not a W3C standard, ODF certainly has a "family
    >resemblance" to them, based on our use of so many other W3C standards,
    >such as XML, XLink, MathML, XForms, etc.  So defining our syntax using
    >their conventions is a reasonable thing.
    >On the other hand, in other cases we have not used W3C standards and instead
    >used standards from ISO.  For example, our use of RELAX NG rather than
    >XML Schema.
    
    >Either choice is defensible, I believe, and can lead to a clear,
    >unambiguous syntactic definition.
    
    Agree.  In the OpenFormula case I still think the W3C format is the best
    choice, and the ISO format is suboptimal.  We could switch to the ISO BNF for
    OpenFormula if it was desperately necessary. We could work around ISO's
    lack of a range operator by removing the formal specification of characters in the spec
    and using informal text instead.  The spec would be less clear because of all
    the unnecessary punctuation required by ISO's format, and what's worse, we
    would change something formally specified into something only specified by prose.
    I don't like that trade at all; I prefer that specifications be specified using formal
    (machine-processable) languages as much as possible unless it just can't
    be made clear that way.  There's less chance of mis-interpretation
    when it's spec'ed in a formal language.
    
    --- David A. Wheeler
    


  • 6.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 13:32
    David,
    
    Sigh, I don't know how I get involved in this sort of issue. ;-)
    
    OK, the author of ISO 14977 stopped publishing several years ago and I 
    wasn't able to quickly find an email contact for him.
    
    He is R. S. Scowen, if you are curious about that sort of thing, and he wrote a 
    paper on EBNF that may be helpful:
    
    Extended BNF - A Generic Base Standard
    http://www.cl.cam.ac.uk/~mgk25/iso-14977-paper.pdf
    
    I have written to one of his younger colleagues who has a webpage 
    talking about the EBNF grammar, 
    http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html, but he is on paternity 
    leave so it may be a bit before we hear from him, assuming my post gets 
    past his spam filter. I tried not to say anything about $millions of 
    dollars, various African countries, etc., in the first few lines of my 
    post anyway. ;-)
    
    After reading ISO 14977 again (actually more than once), it seems to me 
    that we are straining without just cause. True, I think it requires us 
    to define what we mean by Unicode character, but having done so (I 
    suggest we simply copy the Unicode definition), all we need do is supply 
    a defined start and end for a sequence. I think that works, at least for 
    the range issue.
    
    As far as negation, isn't that the same as excluding a range? I realize 
    there may be deeper issues about negation but as far as parsing, doesn't 
    exclusion fit the bill?
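The idea of "a defined start and end" plus exclusion can be sketched as a membership test in Python. This is only a model of the semantics being proposed, not ISO 14977 syntax: the class and function names are hypothetical, and the code point bounds for "any Unicode character" are an assumption. As far as I can tell, ISO 14977 itself offers an except symbol ("-") and special sequences ("? ... ?") whose meaning is defined outside the notation, so whether that counts as "defined in the EBNF" is the open question here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CodePointRange:
    """An inclusive range of code points given by its start and end."""
    start: int
    end: int

    def __contains__(self, ch):
        return self.start <= ord(ch) <= self.end


def excluding(base, *excluded):
    """Build a test for 'in base, but in none of the excluded ranges'."""
    def matches(ch):
        return ch in base and not any(ch in rng for rng in excluded)
    return matches


# Assumed universe: every code point from U+0000 to U+10FFFF.
ANY_CHARACTER = CodePointRange(0x0000, 0x10FFFF)
DOLLAR = CodePointRange(ord("$"), ord("$"))

digits1to9 = CodePointRange(ord("1"), ord("9"))
not_dollar = excluding(ANY_CHARACTER, DOLLAR)

if __name__ == "__main__":
    print("5" in digits1to9, "0" in digits1to9)
    print(not_dollar("a"), not_dollar("$"))
```

Modelled this way, negating a range is the same operation as excluding it from a larger range, which is the equivalence the question above is asking about.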
    
    Just some quick thoughts. I will try to return to the issue later this 
    week and/or get guidance from real experts on it.
    
    Hope you are having a great day!
    
    Patrick
    
    
    


  • 7.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 17:32
    Patrick Durusau:
    > After reading ISO 14977 again (actually more than once), it seems to me 
    > that we are straining without just cause.
    
    What's the strain?  There are several viable alternatives.
    I've provided evidence that the W3C alternative is the best for
    OpenFormula's purposes, and that may be true for others as well.
    W3C's is designed for XML, and OpenDocument is XML-based, so
    it's quite defensible in use.  The ISO format COULD be used, sure.
    The, excess, commas, required, by, ISO, are, quite, annoying,
    when, you, have, a, number, of, productions.
    
    I also think the W3C's format is MUCH more commonly used
    than ISO's format; I present as evidence:
    * http://en.wikipedia.org/wiki/Backus-Naur_form
       which uses "::=" for defining and does NOT use "," for concatenation
    * http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html
       a big database about programming languages using BNF.
       Again, uses "::=" for defining and does NOT use "," for concatenation
    I would rather use the standard format for BNF instead of ISO's format :-).
    
    The paper you pointed me to is unconvincing.  The ISO format
    is carefully designed to add lots of extra gunk so that indentation need
    not be meaningful.  Yet increasingly meaningful indentation is being seen
    as a good thing (e.g., see Python and Haskell); it appears they're working
    from an obsolete spec.
    
    > True, I think it requires us 
    > to define what we mean by Unicode character, but having done so (I 
    > suggest we simply copy the Unicode definition), all we need do is supply 
    > a defined start and end for a sequence.
    
    Yes, the lack of range operators can be worked around by using prose
    to describe start/end sequences.
    
    --- David A. Wheeler
    


  • 8.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 19:05
    David,
    
    David A. Wheeler wrote:
    > Patrick Durusau:
    >   
    >> After reading ISO 14977 again (actually more than once), it seems to me 
    >> that we are straining without just cause.
    >>     
    >
    > What's the strain?  There are several viable alternatives.
    > I've provided evidence that the W3C alternative is the best for
    > OpenFormula's purposes, and that may be true for others as well.
    > W3C's is designed for XML, and OpenDocument is XML-based, so
    > it's quite defensible in use.  The ISO format COULD be used, sure.
    > The, excess, commas, required, by, ISO, are, quite, annoying,
    > when, you, have, a, number, of, productions.
    >
    >   
    By strain I meant that we are making more out of the difficulty of using 
    the ISO EBNF syntax than it need be.
    
    OK, it uses a lot of commas, hardly seems like a crime to me.
    
    The distinction we are dancing around is the "pragmatic good enough" of 
    the W3C and a really formal definition, which is what I think you have with 
    ISO 14977.
    
    Sure, I don't deny that you can write really useful grammars using the 
    W3C style and lots of people have done it. That doesn't mean it avoids 
    the rule in ISO that we should use ISO standards absent compelling 
    justification to the contrary.
    > I also think the W3C's format is also MUCH more commonly used
    > than ISO's format; I present as evidence:
    > * http://en.wikipedia.org/wiki/Backus-Naur_form
    >    which uses "::=" for defining and does NOT use "," for concatenation
    > * http://cui.unige.ch/db-research/Enseignement/analyseinfo/BNFweb.html
    >    a big database about programming languages using BNF.
    >    Again, uses "::=" for defining and does NOT use "," for concatenation
    > I would rather use the standard format for BNF instead of ISO's format :-).
    >
    >   
    Oh my! Dueling Wiki entries:
    
    http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
    
    ;-)
    
    > The paper you pointed me to is unconvincing.  The ISO format
    > is carefully designed to add lots of extra gunk so that indentation need
    > not be meaningful.  Yet increasingly meaningful indentation is being seen
    > as a good thing (e.g., see Python and Haskell); it appears they're working
    > from an obsolete spec.
    >
    >   
    ;-) I don't think that Python and Haskell (although I am studying the 
    latter for unrelated reasons) really qualify as world-wide movements.
    >> True, I think it requires us 
    >> to define what we mean by Unicode character, but having done so (I 
    >> suggest we simply copy the Unicode definition), all we need do is supply 
    >> a defined start and end for a sequence.
    >>     
    >
    > Yes, the lack of range operators can be worked around by using prose
    > to describe start/end sequences.
    >
    >   
    Err, no, can't we define the start and end sequences in the EBNF?
    
    I don't really care that much except that I don't want to be met with a 
    legitimate objection (non-use of ISO 14977) when my only defense is that 
    "everybody else is doing W3C." I last heard that "reason" when I was in 
    about the 2nd or 3rd grade.
    
    Give me something more than aesthetics (too many commas) or everybody 
    likes W3C, or indentation is good, etc., and if the TC backs it I will 
    find a way to get ISO to accept it. But it has to be something 
    convincing to people who don't already agree. That isn't a very good 
    test for an argument. ;-)
    
    Hope you are having a great day!
    
    Patrick
    
    PS: Any more basis stuff? Not that I have time this week, I am trying to 
    finish inserting references in the draft so testing of adding 
    auto-generated content can begin.
    
    
    


  • 9.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 20:24
    I said:
    > > Yes, the lack of range operators can be worked around by using prose
    > > to describe start/end sequences.
    
    Patrick Durusau:
    > Err, no, can't we define the start and end sequences in the EBNF?
    
    No. ISO's BNF format CANNOT do that.
    
    ISO 14977 has no range operators. That's what I tried to explain
    in my first message, sorry if that wasn't clear.
    Yes, you can easily specify the start and stop values in the ISO BNF format,
    but since ISO BNF has no range operator, it doesn't do any good.
    
    > Give me something more than aesthetics...
    
    Okay, we're back to my first email.
    There is no range operator in ISO's BNF format, there _is_ one
    in W3C's format, and range operators are valuable
    for this kind of spec.
    
    Look at Wikipedia's example for ISO BNF:
    alphabetic character = "A" | "B" | "C" | "D" | "E" | "F" | "G"
                         | "H" | "I" | "J" | "K" | "L" | "M" | "N"
                         | "O" | "P" | "Q" | "R" | "S" | "T" | "U"
                         | "V" | "W" | "X" | "Y" | "Z" ;
    Contrast this with W3C/XML BNF:
      alphabetic_character ::= [A-Z]
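
    To make the contrast concrete, here is a minimal Python sketch that expands
    a W3C-style range into the enumeration ISO 14977 would need. The function
    name iso_alternation and its parameters are hypothetical, chosen only for
    this illustration; they are not part of either notation.

```python
def iso_alternation(name: str, first: str, last: str, per_line: int = 7) -> str:
    """Render the W3C-style range [first-last] as an ISO 14977 rule.

    ISO 14977 has no range operator, so every character in the range
    has to be written out as its own terminal.
    """
    if len(first) != 1 or len(last) != 1 or ord(first) > ord(last):
        raise ValueError("first and last must be single characters, in order")

    def quote(ch: str) -> str:
        # Use single quotes only when the terminal is itself a double quote.
        return f"'{ch}'" if ch == '"' else f'"{ch}"'

    # One quoted terminal per code point in the inclusive range.
    terminals = [quote(chr(cp)) for cp in range(ord(first), ord(last) + 1)]
    # Group terminals into rows so the generated rule stays readable.
    rows = [" | ".join(terminals[i:i + per_line])
            for i in range(0, len(terminals), per_line)]
    head = f"{name} = "
    # Continuation rows start with "|" aligned under the "=" sign.
    indent = " " * (len(head) - 2) + "| "
    return head + ("\n" + indent).join(rows) + " ;"


# The single W3C token [A-Z] becomes a four-line ISO 14977 rule.
print(iso_alternation("alphabetic character", "A", "Z"))
```

    The same call with a large Unicode range would produce thousands of
    alternatives, which is the practical cost being argued about here.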
    
    There are lesser reasons.  ISO's BNF spec also lacks a character
    negation operator (though there's an ugly workaround for that),
    and its comma-notation makes this kind of spec unnecessarily
    wordy (never mind aesthetics; wordiness is not desirable for any spec).
    
    But I think the lack of a range operator is a simple and concise rationale.
    
    Certainly, we should use standards where we can, but the XML spec
    is a standard.  And in particular, we're specifying an XML format, so using
    the XML spec's notation for an XML-based format is sensible.
    
    >Oh my! Dueling Wiki entries:
    >http://en.wikipedia.org/wiki/Extended_Backus%E2%80%93Naur_form
    >;-)
    
    Actually, the "ISO" examples are not compliant with ISO's spec; they
    are a mix of W3C and ISO formats.
    I.e., they use "::=" to define their productions, even though
    the ISO symbol is "=" instead. It's not a good sign when even the people writing
    about the ISO format use W3C-format examples.
    
    Anyway...
    
    > PS: Any more basis stuff? Not that I have time this week, I am trying to 
    > finish inserting references in the draft so testing of adding 
    > auto-generated content can begin.
    
    I'm waiting for more test cases.  Once I get them, I think we can clearly
    document what's going on.
    
    --- David A. Wheeler
    


  • 10.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 23:26
    David,
    
    David A. Wheeler wrote:
    > I said:
    >   
    >>> Yes, the lack of range operators can be worked around by using prose
    >>> to describe start/end sequences.
    >>>       
    >
    > Patrick Durusau:
    >   
    >> Err, no, can't we define the start and end sequences in the EBNF?
    >>     
    >
    > No. ISO's BNF format CANNOT do that.
    >
    > ISO 14977 has no range operators. That's what I tried to explain
    > in my first message, sorry if that wasn't clear.
    > Yes, you can easily specify the start and stop values in the ISO BNF format,
    > but since ISO BNF has no range operator, it doesn't do any good.
    >
    >   
    No, you were clear.
    
    My point is that you can use ISO 14977 to define a range operator and 
    the Unicode characters that you want to use with it.
    
    See: http://www.open-std.org/jtc1/sc22/wg11/docs/n506.pdf, for example. 
    (It uses ISO 14977 to define a range operator.)
    
    I was *hearing* you as saying that ISO 14977 could not support range 
    operators on Unicode characters, whether that was what you intended or not.
    
    What I am suggesting is that ISO 14977 could define uChar 
    someRangeOperator uChar by defining a regex for uChar. It isn't 
    necessary to enumerate the Unicode characters. That is done in EBNF 
    because it is defining the characters to be used within the grammar. 
    Once you have that set, well, the rest is up to your imagination.
    
    Better?
    
    Hope you are having a great day!
    
    Patrick
    
    
    
    
    
    
    


  • 11.  Re: [office] ISO 14977 EBNF grammar

    Posted 05-01-2008 00:47
    > My point is that you can use ISO 14977 to define a range operator and 
    > the Unicode characters that you want to use with it.
    
    Sure.
    
    But then you're not using the standard as it is; you're using a nonstandard extension.
    Better to use a standard notation that _has_ a range operator.
    
    > See: http://www.open-std.org/jtc1/sc22/wg11/docs/n506.pdf, for example. 
    > (It uses ISO 14977 to define a range operator.)
    
    No it doesn't.  I looked at that spec.
    
    Section 5.1 describes the EBNF; it has NO range operator in the EBNF
    metalanguage.  Which is expected, because it's using ISO's EBNF.
    It also doesn't use ranges where it'd be obvious to do so, e.g.,
    note that 7.1 (which uses the EBNF metalanguage) has to list each character,
    instead of using a range, because the EBNF has no range operator.
    
    Now it's true that 8.2.1 talks about a "range" operator, but this is
    NOT a range operator in the EBNF notation.
    That section is using EBNF to define a range operator in the language
    the spec is defining. That does NOT give a "range" operator
    to the EBNF itself; the EBNF continues to lack a range operator.
    
    Yes, this could be worked around by using prose definitions, set unions
    and differences, and defining many more nonterminals.
    But I see no need to use those hacks.
    ISO has accepted other standards that use the W3C/XML BNF notation, and
    I think we have a good technical reason for using W3C/XML BNF: ISO 14977's lack of ranges.
    The much shorter/simpler resulting spec is in my mind a good justification too.
    Clearly W3C/XML is itself defined in a standard, one which is available for
    all to use and was developed by a wide consensus. Heck, I suspect ISO
    has ratified it (if so, you'd think it'd count as an ISO standard too).
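
    As a rough illustration of the set-difference workaround mentioned above:
    ISO 14977 does provide an except-symbol ("-"), so a W3C negated class such
    as [^"] can be approximated as a difference from some catch-all
    nonterminal. The sketch below is Python; the function name
    iso_negated_class and the nonterminal name "character" are assumptions
    made for illustration, and a real grammar would still have to define that
    nonterminal itself.

```python
def iso_negated_class(excluded: str, universe: str = "character") -> str:
    """Render the W3C-style negated class [^...] with the ISO 14977 except-symbol.

    `universe` is a hypothetical nonterminal standing for "any character";
    ISO 14977 does not predefine it, so the grammar must supply it.
    """
    if not excluded:
        raise ValueError("excluded must contain at least one character")

    def quote(ch: str) -> str:
        # Use single quotes only when the terminal is itself a double quote.
        return f"'{ch}'" if ch == '"' else f'"{ch}"'

    alternatives = " | ".join(quote(ch) for ch in excluded)
    # Several excluded characters need grouping so "-" applies to all of them.
    if len(excluded) > 1:
        alternatives = f"( {alternatives} )"
    return f"{universe} - {alternatives}"


# W3C: [^"]   becomes   ISO 14977: character - '"'
print(iso_negated_class('"'))
# W3C: [^ab]  becomes   ISO 14977: character - ( "a" | "b" )
print(iso_negated_class("ab"))
```

    The output is workable, but it depends on the extra nonterminal and gets
    long quickly once the excluded set is itself a range.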
    
    {Heck, ISO is willing to ratify a specification that is explicitly incompatible with
    the Gregorian/ISO 8601 calendar, so using ISO standards even when they
    clearly DO apply is obviously not THAT important ;-)  ;-)  ;-).  }
    
    --- David A. Wheeler
    


  • 12.  Re: [office] ISO 14977 EBNF grammar

    Posted 05-01-2008 17:19
    I have to say that I am with David on this one. The lack of a range
    operator and the wide use of W3C EBNF make it very justifiable that we
    use that standard instead.
    
    wt
    
    On Wed, Apr 30, 2008 at 5:47 PM, David A. Wheeler 


  • 13.  Re: [office] ISO 14977 EBNF grammar

    Posted 04-29-2008 21:24

    Patrick Durusau <patrick@durusau.net> wrote on 04/29/2008 03:04:49 PM:

    > Sure, I don't deny that you can write really useful grammars using the
    > W3C style and lots of people have done it. That doesn't mean that avoids
    > the rule in ISO that we should use ISO standards without compelling
    > justification to the contrary.

    I think we're talking different levels.  No one is talking about using any EBNF in ODF document instances.  All we're talking about is how we formally describe the syntax of formulas in the text of the specification.  This is the use of standards at a different level than when we talk about using ISO 8601 to represent dates, etc.  And there is no decision we can make here that has any influence on conformance or implementations.

    To follow your logic, OOXML would not have been allowed, since its definitive schema was in W3C XML Schema format rather than ISO's Relax NG.

    I agree that ISO might sensibly have required a specific BNF format for defining syntax, but if it did, I'd expect to read it in the ISO Directives, Part 2, and I don't see that there.  So we should be able to describe our grammar any way we want, so long as we can describe or reference it precisely.  This could include defining our own syntax language, or referring to the W3C's conventions, to IETF's or to ISO 14977.  

    Remember, the W3C is an Approved Referenced Specification Originator Organization (ARO), so a reference to a W3C Recommendation is essentially pre-approved.

    -Rob