David,
Sigh, I don't know how I get involved in this sort of issue. ;-)
OK, the author of ISO 14977 stopped publishing several years ago and I
wasn't able to quickly find an email contact for him.
He is R. S. Scowen, if you are curious about that sort of thing, and he
wrote a paper on EBNF that may be helpful:
Extended BNF - A Generic Base Standard
http://www.cl.cam.ac.uk/~mgk25/iso-14977-paper.pdf
I have written to one of his younger colleagues who has a webpage
talking about the EBNF grammar,
http://www.cl.cam.ac.uk/~mgk25/iso-ebnf.html, but he is on paternity
leave so it may be a bit before we hear from him, assuming my post gets
past his spam filter. I tried not to say anything about millions of
dollars, various African countries, etc., in the first few lines of my
post anyway. ;-)
After reading ISO 14977 again (actually more than once), it seems to me
that we are straining without just cause. True, I think it requires us
to define what we mean by Unicode character, but having done so (I
suggest we simply copy the Unicode definition), all we need do is supply
a defined start and end for a sequence. I think that works, at least for
the range issue.
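
To make the "defined start and end" idea concrete, here is a rough
Python sketch. It is only an illustration under my own assumptions, not
anything taken from ISO 14977: the names CharRange, contains and
as_alternatives are made up, and I am treating a "character" as a single
Unicode code point. It shows that a range given by its two end points
carries the same information as the explicit list of alternatives.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CharRange:
    """Hypothetical character range: inclusive start and end code points."""
    start: int
    end: int

    def contains(self, ch: str) -> bool:
        # True if the single character ch falls inside the range.
        return self.start <= ord(ch) <= self.end

    def as_alternatives(self) -> str:
        # Expand the range into an explicit list of quoted alternatives.
        return " | ".join(f'"{chr(cp)}"' for cp in range(self.start, self.end + 1))


digits1to9 = CharRange(ord("1"), ord("9"))
print("digits1to9 =", digits1to9.as_alternatives())
```

For a nine-character range the expansion is merely tedious; the point of
keeping only the start and end is that nothing has to be expanded at all.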
As far as negation goes, isn't that the same as excluding a range? I
realize there may be deeper issues about negation, but as far as parsing
is concerned, doesn't exclusion fit the bill?
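
What I have in mind by exclusion can be sketched in a few lines of
Python. Again, this is only a sketch under my own assumptions (the helper
name any_char_except is invented, and "character" means one Unicode code
point); it is not a claim about what either notation actually provides.

```python
def any_char_except(excluded: str):
    """Return a predicate accepting any single character not in `excluded`."""
    excluded_set = frozenset(excluded)

    def matches(ch: str) -> bool:
        # Accept exactly one character, provided it is not excluded.
        return len(ch) == 1 and ch not in excluded_set

    return matches


# "Any character but a $", expressed as exclusion rather than enumeration.
not_dollar = any_char_except("$")
print(not_dollar("a"), not_dollar("$"))
```

The set of accepted characters is never listed; only the excluded ones
are, which is why exclusion seems sufficient for parsing purposes.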
Just some quick thoughts. I will try to return to the issue later this
week and/or get guidance from real experts on it.
Hope you are having a great day!
Patrick
David A. Wheeler wrote:
> Patrick Durusau:
>
>> So, I take it that the technical issue (as opposed to aesthetics, etc.)
>> is the lack of support for character and negated ranges?
>>
>
> Correct. Quick clarification: It's negated _character_ ranges that
> ISO doesn't support. Other kinds of negation work, I believe.
>
>
>> When you say "lack of support" I assume you mean that character and
>> negative ranges are not predefined? Yes?
>>
>
> No. It doesn't have a range operator at all; all it allows is listing alternatives.
>
>
>> Which is different than saying ISO/IEC 14977 cannot define character and negated
>> ranges. Yes?
>>
>
> It's not that it CAN'T do it, the problem is that there is no built-in
> range operator, and thus you must enumerate every instance.
> That becomes insane when you have to support international use via
> Unicode/ISO 10646 characters; there is NO way we'll enumerate them all.
>
> I think an example will clarify why the ISO spec doesn't work well for
> defining certain kinds of data formats with international characters.
>
> Here's how you can define "digits 1 through 9" in W3C's notation:
> digits1to9 ::= [1-9]
>
> Here's how you have to do it in ISO - by explicit enumeration of
> each possibility, one by one:
> digits1to9 = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
>
> Notice how painful it is when there are only 9 characters.
> Because we want to support internationalized characters (such as
> sheet names in ANY language), enumerating all international characters except
> a few is, um, absurd. If you go beyond the BMP (plane 0), you're talking hundreds of
> thousands of characters in the enumeration.
>
> W3C's notation, in contrast, can say things like [^$] to say "any character but a $".
> Nice and clean.
>
>
>> Err, then you say it lacks "regular expression" support? But as above,
>> that is simply a question of defining the support that we want/need. Yes?
>>
>
> Um, sorry, what I mean by "regular expression" is specifically the usual
> "character range operator" built into all regex languages that I know of.
> There is no range operator mechanism, as far as I can tell, in ISO's format.
> If I missed it, please let me know.
>
> We could extend ISO's BNF in a nonstandard way, but then what's the
> point of using the standard? Better to use a standard that has the
> needed capabilities built in.
>
> ISO's BNF format isn't horrific, and it's quite suitable for a lot of constrained
> languages. Many formats FORBID arbitrary international characters, and for
> them it's probably okay. But for the formula spec, where international
> characters ARE allowed, that was a problem. Besides, it's a little ugly :-).
>
>
>> I note that the use of the XML BNF starts with Chapter 5 of the formula
>> work. I would think it would be better to use a BNF to define the
>> primitives up to that point so as to avoid ambiguity running up to
>> Chapter 5.
>>
>
> That's not a bad idea. I don't know how long that will take to do.
>
>
> Robert Weir:
>
>> Although we are not a W3C standard, ODF certainly has a "family
>> resemblance" to them, based on our use of so many other W3C standards,
>> such as XML, XLink, MathML, XForms, etc. So defining our syntax using
>> their conventions is a reasonable thing.
>> On the other hand, in other cases we have not used W3C standards and instead
>> used standards from ISO. For example, our use of RELAX NG rather than
>> XML Schema.
>>
>
>
>> Either choice is defensible, I believe, and can lead to a clear,
>> unambiguous syntactic definition.
>>
>
> Agree. In the OpenFormula case I still think the W3C format is the best
> choice, and the ISO format is suboptimal. We could switch to the ISO BNF for
> OpenFormula if it was desperately necessary. We could work around ISO's
> lack of a range operator by removing the formal specification of characters in the spec
> and using informal text instead. The spec would be less clear because of all
> the unnecessary punctuation required by ISO's format, and what's worse, we
> would change something formally specified into something only specified by prose.
> I don't like that trade at all; I prefer that specifications be specified using formal
> (machine-processable) languages as much as possible unless it just can't
> be made clear that way. There's less chance of mis-interpretation
> when it's spec'ed in a formal language.
>
> --- David A. Wheeler
>
--
Patrick Durusau
patrick@durusau.net
Chair, V1 - US TAG to JTC 1/SC 34
Convener, JTC 1/SC 34/WG 3 (Topic Maps)
Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)