OASIS Open Document Format for Office Applications (OpenDocument) TC

Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

  • 1.  Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-11-2009 14:54
    I've been going through the current draft of OpenFormula, looking for 
    areas that are specifically called out as "implementation-defined", 
    "unspecified" or "undefined".  I did not find as many as I thought I would 
    find.
    
    I created a spreadsheet that illustrated each one of these cases, which I 
    am attaching.
    
    My aim here is several-fold:
    
    First, since implementation-defined means that we want implementations to 
    actually define these behaviors, in their documentation, it will be good 
    if we enumerate these cases, so it is clearer what we want them to define. 
     It might be good, for example, to have an annex in the specification that 
    presents a form that could be filled out listing these answers to these 
    questions.
    
    Then I wonder if we truly need to have all of these items be 
    implementation-defined?  Or to ask the question differently, would there 
    be tangible user benefit, in terms of increased interoperability, if some 
    of these items were fully specified, knowing that some implementations 
    would then need to change their code in order to conform, and that they 
    would need to deal (perhaps with version-conditional logic) with legacy 
    documents?
    
    In some cases, the implementation-defined features are "hard" problems, 
    like the various text to number paths, which are at the very least 
    locale-dependent, not portable and hard to pin down.  This might be an 
    area where there needs to be user awareness that these constructs are not 
    portable and should be avoided in areas where portability is desired. 
    
    However, in other areas, like what SUM() does with an empty argument list, 
    or what VARP() does with only two values, or what 0^0 or ATAN(0;0) is, 
    these seem to be things where we might be able to come to some agreement 
    on.
    
    What do people think?  Is this an area that is worth cleaning up rather 
    than trying to standardize a snapshot of the legacy application mess?  It 
    seems to me that if people want a legacy monstrosity, they all have 
    another standard they can use for that.  We have the opportunity with 
    OpenFormula to be a bit tidier.  But only if implementations are willing to 
    conform to the standard, even if it varies in some details from what they 
    have today in their apps.
    
    -Rob
    
    
    
    
    
    


  • 2.  RE: [oic] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 01:03
    Rob,
    
    I'm not sure how useful this exercise would be.  It sounds like quite a bit of work, and would certainly delay publication of ODF 1.2.  It would also increase the scope of work that implementers would need to do in order to implement OpenFormula, which would delay broad adoption of the standard.
    
    Consider the example you've given below, where SUM() has no parameters.  If we dictate the runtime behavior for that situation, we may make it harder for implementers on certain platforms, or those working in certain programming languages, to implement OpenFormula.  We would also be introducing inconsistencies between the behavior of "SUM nothing" in other contexts, and its behavior in OpenFormula.
    
    Regarding the specific example of 0^0, if mathematicians don't agree on what that means (and apparently they don't), it seems to me that we're creating an issue if the ODF TC takes a stand and attempts to stipulate the meaning/interpretation of 0^0.  Is that really a debate we want to get into?  And regardless of the position we take, aren't we risking being at odds with mathematicians who disagree?  I'd hate to see OpenFormula's future adoption limited by these sorts of debates, and I think we should be careful not to create too many of them.
    
    It's not clear to me how this would affect type coercion in OpenFormula.  Currently there are two classes of conformance, and type coercion is only defined in one of them.  Are you proposing that these two conformance classes be merged into a single class that would standardize all type-coercion issues?
    
    I'd rather see us focus on getting the current OpenFormula draft ready for public review, so that we can get ODF 1.2 approved sooner rather than later.  And I think that ease of implementation of OpenFormula should be a priority.  The existing approach seems to be well-designed from that perspective, so I think we should polish it up and get it done.
    
    I have one question about this matter, which perhaps David Wheeler or somebody involved early on can answer: how were the existing implementation-defined items determined?  I'm assuming that some thought has already been given to what should be in the standard and what should be implementation-defined, and that is reflected in the current content of the OpenFormula draft.  Is that a fair assumption?  And if so, it would be useful to hear some of the rationale for how these decisions were originally made.
    
    Regards,
    Doug
    
    
    


  • 4.  Re: [office-formula] RE: [oic] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 01:39
    Doug Mahugh wrote:
    "I have one question about this matter, which perhaps David Wheeler or
    somebody involved early on can answer: how were the existing
    implementation-defined items determined? I'm assuming that some thought
    has already been given to what should be in the standard and what should
    be implementation-defined, and that is reflected in the current content
    of the OpenFormula draft. Is that a fair assumption? And if so, it would
    be useful to hear some of the rationale for how these decisions were
    originally made."
    
    Sure.  Quite a lot of thought and time went into determining what is 
    implementation-defined; a little history might help.
    
    The OpenFormula specification was developed by examining a large number 
    of actual spreadsheet applications, including Excel, OpenOffice.org, 
    Lotus', Word Perfect/Quattro Pro, Gnumeric, KOffice, Palm DocumentsToGo, 
    and many others.  This reflects a belief of mine: I believe standards 
    should reflect _actual_ _practice_, instead of some academic notion of 
    what such applications MIGHT do.  I think this is a belief that most 
    others in this group share; it's certainly not unique to me.  We created 
    a number of tables comparing functions and operators in each 
    application, for example, as well as many test cases to look for "edge 
    cases" or undocumented functionality.
    
    The specification is in some sense a "union" of the applications above, 
    because we want to be able to represent the data generated by any of 
    those applications.  It's not quite a union; if the capability was 
    considered extremely exotic and unlikely to be useful to more than a few 
    users, it was omitted. For example, Gnumeric and Quattro Pro have an 
    enormous number of predefined functions, and not all of them are in 
    OpenFormula.  Nevertheless, OpenFormula can represent even very exotic 
    Excel or OpenOffice.org spreadsheets without difficulty, and since 
    extensions are easily supported, it can even represent those well. 
    Syntactically, OpenFormula looks like the XML format used by 
    OpenOffice.org, though it's not identical; we added syntactic extensions 
    as necessary to support other implementations.
    
    When existing applications _differed_ in what they produced, we (the 
    group) tried to gain agreement on what they "should do".  In some cases, 
    we could agree that an application was simply buggy, and the spec should 
    say something else.  In other cases, we agreed that there were actually 
    different functions, that happened to have colliding names.  In those 
    cases, we defined both functions with different names. In general Excel, 
    OpenOffice.org, and Gnumeric all agree on function names, so we followed 
    their names where they existed.
    
    But in some cases, applications do things differently, and there were 
    good arguments for each option.  In those cases, we tried to determine 
    what it "should" be.  But if we could not agree on a single result, we 
    labelled it as "implementation-defined".  Even when it's labelled as 
    implementation-defined, we tried to limit the possibilities.  A good 
    example is 0^0: The only plausible values are 0, 1, or Error, so we can 
    at least spec that it must be one of those 3.
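
    A minimal sketch of what "limiting the possibilities" could look like as
    a check, assuming Python. The names ALLOWED_ZERO_POW_ZERO and
    is_conforming_zero_pow_zero are hypothetical, and the string "Error" is
    only a stand-in for whatever Error value an implementation reports:

```python
# Hypothetical check: an implementation-defined result for 0^0 must still
# be one of the three values named above (0, 1, or an Error).
ALLOWED_ZERO_POW_ZERO = (0, 1, "Error")  # "Error" is a placeholder value


def is_conforming_zero_pow_zero(result):
    """Return True if `result` is one of the allowed values for 0^0."""
    # Exclude bool explicitly, because True == 1 and False == 0 in Python.
    if isinstance(result, bool):
        return False
    return result in ALLOWED_ZERO_POW_ZERO
```

    Python itself evaluates 0 ** 0 as 1, which is one of the three allowed
    values; an implementation returning 0 or an Error would pass the same
    check.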
    
    As Rob noted, the number of "implementation-defined" items is really 
    very, very small.  The C and C++ specifications are _already_ 
    international standards, and they have FAR more implementation-defined 
    items.  Even the Ada specification has many implementation-defined items, and 
    that is one of the most rigorous specifications for a language with the 
    ability to do numerical calculations.  I don't see the few 
    implementation-defined items as a real problem.  You can certainly 
    exchange a vast number of spreadsheets, even given these 
    implementation-defined areas.
    
    That's the history in a nutshell.  Does that help?
    
    --- David A. Wheeler
    


  • 5.  RE: [oic] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 14:50
    Rob,
    
    
    Great work, it sure comes in handy to have a list like that.
    
    
    
    Browsing through the list, I noticed that most items fit into these categories:
    
    a) Boolean not being a data type of its own
    b) Aggregation functions without parameter
    c) Hex/Bin/Dec conversions with overly large input
    d) Mathematical puzzles (0^0)
    
    
    IMHO, fixing or aligning b) and c) doesn't seem to be that hard (and relatively speaking, c) really isn't used that much in spreadsheets).
    
    Maybe this can be done for ODF 1.2.
    
    
    
    But a) on the other hand, seems harder since implementations without a Boolean data type have to 1) implement it as a separate datatype, 2) change and recheck all formulas that can take or produce Booleans...
    
    So that seems to be a candidate for OpenFormula-Next ?
    
    
    
    And d), well, http://www.physicsforums.com/showthread.php?t=71521 tells me that sometimes it appears to be useful to let 0^0 be 1. Three different search engines calculated it as being 1 by the way, while Wolfram said undefined :-) But I Am Not A Math Professor, so I really wouldn't know. 
    
    Perhaps we could check what specialized applications like Mathematica or Matlab are doing ?
    
    Anyway, since d) seems to be hard to solve and of little value for 99% of the users, I wouldn't mind leaving it implementation defined...
    
    
    
    Best regards,
    
    Bart
    
    


  • 6.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 01:12
    robert_weir@us.ibm.com wrote:
    > I've been going through the current draft of OpenFormula, looking for 
    > areas that are specifically called out as "implementation-defined", 
    > "unspecified" or "undefined".  I did not find as many as I thought I would 
    > find.
    
    I think that's a good thing!
    
    > I created a spreadsheet that illustrated each one of these cases, which I 
    > am attaching.
    
    Thanks for doing that.  And I agree, a little form to fill these out 
    (for example) would be great.
    
    > Then I wonder if we truly need to have all of these items be 
    > implementation-defined?  Or to ask the question differently, would there 
    > be tangible user benefit, in terms of increased interoperability, if some 
    > of these items were fully specified, knowing that some implementations 
    > would then need to change their code in order to conform, and that they 
    > would need to deal (perhaps with version-conditional logic) with legacy 
    > documents?
    
    Re-examining these is a good idea.
    
    However, I think that expecting "0 implementation-defined values" is 
    both unrealistic and undesirable in most real standards, including this 
    one.  In all non-trivial standards there are areas where there are 
    legitimate differences, and trying to prematurely force a specific 
    answer is simply undesirable.  Simply identifying those areas, so users 
    know what to avoid, is a major benefit, even when we don't specify the 
    specifics.
    
    Regarding the specifics, I have comments on two:
    * I have to admit, I'm tired of the 0^0 discussions, but we can have 
    another one.  There are good arguments for 1, or 0, or an Error, and 
    actual implementations DO vary, so it's hard to pin that down.  I don't 
    think this has a massive impact on interoperability; it'd be NICE to pin 
    down, but it can be managed.
    * In practice, I don't know why anyone would CARE what SUM() does with 
    an empty argument list.  This is not a REAL interoperability issue; it's 
    hard to imagine a normal user even DOING that.  We could leave that 
    completely *undefined*, and not impact real world interoperability.
    
    If we can nail down a few more specifics, that'd be great.  But we 
    needn't get hung up on this.
    
    --- David A. Wheeler
    
    


  • 7.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 04:17
    "David A. Wheeler" 


  • 8.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 05:08
    On Fri, 2009-06-12 at 00:19 -0400, robert_weir@us.ibm.com wrote:
    
    > 
    > Let me know if there are any missing that jump out in your mind.  If there 
    > are, then they may not have been consistently called out with the term 
    > "implementation-defined" in the specification, and we should fix that.
    > 
    > 
    > > > Then I wonder if we truly need to have all of these items be 
    > > > implementation-defined?  Or to ask the question differently, would 
    > there 
    > > > be tangible user benefit, in terms of increased interoperability, if 
    > some 
    > > > of these items were fully specified, knowing that some implementations 
    > 
    > > > would then need to change their code in order to conform, and that 
    > they 
    > > > would need to deal (perhaps with version-conditional logic) with 
    > legacy 
    > > > documents?
    > > 
    > > Re-examining these is a good idea.
    > > 
    > > However, I think that expecting "0 implementation-defined values" is 
    > > both unrealistic and undesirable in most real standards, including this 
    > > one.  In all non-trivial standards there are areas where there are 
    > > legitimate differences, and trying to prematurely force a specific 
    > > answer is simply undesirable.  Simply identifying those areas, so users 
    > > know what to avoid, is a major benefit, even when we don't specify the 
    > > specifics.
    > > 
    > 
    > I'd like to see us come up with a good reason why it is a good thing to 
    > have a feature be implementation-defined.  Saying "mathematicians 
    > disagree" or "different implementations do different things" doesn't sound 
    > like a particularly good reason.  I think it is expected that 
    > implementations will need to change their code to implement OpenFormula. 
    > I'd be astonished if they did not.
    
    You may be in for surprises. Whatever the OpenFormula standard says,
    mathematical correctness is for some of us simply more important. Just
    because everybody else agrees to calculate a wrong answer does not
    mean that we have to follow. 
    
    Having said that, this doesn't necessarily mean that the file format cannot
    be accommodated: for example in Gnumeric typing 2^3^2 will always be 512
    not 64, but of course as we write to ODF files or read from there we may
    include parentheses to fix what we consider simply to be wrong. 
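
    The two readings of 2^3^2 can be shown with a short Python sketch. This
    is an illustration only, not Gnumeric's code: Python's ** operator
    happens to be right-associative, and parenthesize_right_assoc is a
    hypothetical helper showing how a writer might emit explicit parentheses
    so that a reader cannot regroup the chain:

```python
# Python's ** is right-associative, so it stands in for the "512" reading;
# explicit parentheses produce the other reading.
right_assoc = 2 ** (3 ** 2)   # 2^(3^2) = 2^9
left_assoc = (2 ** 3) ** 2    # (2^3)^2 = 8^2


def parenthesize_right_assoc(operands):
    """Hypothetical helper: render a ^ chain with explicit parentheses so
    the right-associative grouping survives in the written formula."""
    expr = str(operands[-1])
    # Wrap from the right-most operand outwards.
    for operand in reversed(operands[:-1]):
        expr = f"{operand}^({expr})"
    return expr


print(right_assoc, left_assoc)
print(parenthesize_right_assoc([2, 3, 2]))
```

    With the parentheses written out, the stored formula evaluates the same
    way whichever associativity the reading application uses.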
    
    > That's the analysis I'd like to see:  What is the user benefit if we 
    > eliminated these differences versus what would be the downside.
    
    
    > The nice thing here is that any choice is defensible.  It is not like any 
    > of them are wrong.
    
    Well, sorry, but two of them are wrong. Once you say what ^ means, two
    of them have to be wrong.
    
    >   We're not redefining the Gregorian calendar or 
    > anything.  So I'd just pick one, based on the majority (or plurality if 
    > that is the case) behavior. Or do whatever Excel does here, if that makes 
    > Doug happy.  I'd certainly have no hesitancy to add an "if" statement to 
    > the Symphony code if it were necessary for us to accommodate this.
    
    You may not have that hesitancy, but I for example would hesitate. You
    would need a really good reason to agree on the wrong answer. 
    > 
    
    Andreas
    -- 
    Andreas J. Guelzow
    Concordia University College of Alberta
    
    


  • 9.  RE: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 05:52
    I recall "discretionary" being used in cases such as arithmetic precision,
    maximum dimensions on tables, and other cases where an implementation may
    place a limitation (e.g., on the maximum length of the dc:creator string, in
    presented characters, that will be put into some user-readable
    presentation). 
    
    Although some implementation-defined aspects may be of this nature, it might
    be useful to single them out.  I certainly agree that there should be a
    specification of how each conforming implementation handles
    implementation-defined features.
    
     - Dennis
    
    


  • 10.  RE: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 12:24
    The language on precision is:
    
    "This specification does not, by itself, specify a numerical 
    implementation model, though it does imply some minimal levels of accuracy 
    for most functions. For example, an application cannot say that it 
    implements the infix operator “/” as specified in this document if it 
    implements integer-only arithmetic.
    
    In practice, applications tend to use at least one IEEE 754-1985 binary 
    floating-point representation, using at least the 64-bit representation 
    and possibly larger widths for intermediate results. When IEEE 754 
    representations are used, results such as Inf (infinity) and NaN (not a 
    number) are considered an Error value. Applications may use IEEE 854-1987 
    (which covers decimal arithmetic). In general, applications are encouraged 
    to use appropriate standards for their numerical models. This means that 
    applications will often not produce “exact” results, but only approximate 
    results for a large number of places."
    
    I think I would call this "implementation-defined" and we probably want to 
    say that explicitly in cases like this. We might also want to rephrase 
    this from a statement about the specification, "the specification does 
    not...", to a statement about the implementation, e.g., "The numerical 
    implementation model used by a conforming OpenFormula expression evaluator 
    is implementation-defined", or something similar.
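
    As a rough illustration of the quoted rule that Inf and NaN are
    considered an Error value, here is a minimal Python sketch. Python floats
    are IEEE 754 64-bit binary values on common platforms; the ERROR
    placeholder and the to_formula_result name are assumptions made for this
    example, not anything defined by the draft:

```python
import math

ERROR = "Error"  # hypothetical placeholder for a formula Error value


def to_formula_result(value):
    """Map IEEE 754 special values (Inf, NaN) to an Error; pass other
    values through unchanged."""
    if isinstance(value, float) and (math.isinf(value) or math.isnan(value)):
        return ERROR
    return value


print(to_formula_result(1e308 * 10))    # overflow to Inf is reported as Error
print(to_formula_result(float("nan")))
print(to_formula_result(0.1 + 0.2))     # approximate, not "exact"
```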
    
    -Rob
    
    
    "Dennis E. Hamilton" 


  • 11.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 13:07
    robert_weir@us.ibm.com wrote:
    > The language on precision is:
    > 
    > "This specification does not, by itself, specify a numerical 
    > implementation model,....
    > I think I would call this "implementation-defined" and we probably want to 
    > say that explicitly in cases like this.
    
    Okay, that's a good idea.
    
    Can I talk you into proposing some alternative text?
    
    --- David A. Wheeler
    
    
    


  • 12.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 13:32
    Yes, when I get back from the Hague plugfest I plan on taking an editorial 
    pass at the draft, making good on a promise I made long ago when I 
    volunteered to be co-editor of OpenFormula.
    
    -Rob
    
    "David A. Wheeler" 


  • 13.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 13:47
    robert_weir@us.ibm.com wrote:
     > Then I wonder if we truly need to have all of these items be
     > implementation-defined?  Or to ask the question differently, would there
     > be tangible user benefit, in terms of increased interoperability, if 
    some
     > of these items were fully specified, knowing that some implementations
     > would then need to change their code in order to conform, and that they
     > would need to deal (perhaps with version-conditional logic) with legacy
     > documents?
    
    "Aye, there's the rub".  If the cost of forcing exact equivalence 
    exceeds the benefits, then I believe that we should NOT force 
    unnecessary equivalence, and that "implementation-defined" is what we 
    SHOULD say.   For years we've been trying to eliminate 
    "implementation-defined" areas, so I'd be surprised if we could now come 
    to some agreement on many of these, but by all means let's talk.
    
    There are many areas where we DO force specific interpretations, even 
    though spreadsheets differ, because they DO impact interoperability. 
    For example, several spreadsheet implementations' ordinary string 
    operations start counting positions at 0; others start counting at 1. 
    Neither is the "right answer", but failing to agree on a convention 
    would make nearly all the string operations non-interoperable.  So we 
    settled on using 1 as the starting number.  And so on.
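
    To make the position convention concrete, here is a small Python sketch
    of a 1-based lookup layered over Python's 0-based str.find. The function
    name find_position and its error handling are hypothetical and are not
    taken from the OpenFormula text:

```python
def find_position(text, search, start=1):
    """Return the 1-based position of `search` in `text`, beginning the
    search at the 1-based position `start`."""
    if start < 1:
        raise ValueError("start must be 1 or greater")
    index = text.find(search, start - 1)  # convert to Python's 0-based index
    if index == -1:
        raise ValueError("search text not found")
    return index + 1                      # convert back to 1-based


print(find_position("spreadsheet", "s"))
print(find_position("spreadsheet", "s", 2))
```

    An implementation that counts from 0 internally only needs the two
    conversions shown here, at the boundary where formulas are evaluated.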
    
    But there are diminishing returns; at some point, it's better to give up 
    and leave some things implementation-defined.
    
     > However, in other areas, like what SUM() does with an empty argument list,
    
    Here I think there is legitimate disagreement.  Some believe that this 
    should be Error.  On the other hand, it's perfectly reasonable to argue 
    that "0" should be the result; it's even mathematically clean.  I see no 
    benefit to pressing this issue; this construct simply doesn't happen in 
    normal spreadsheets.  Why do we want implementors to make changes to 
    their implementations if there would be no improvement in 
    interoperability?  I think here, the costs to change clearly exceed any 
    benefit to interoperability.
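
    The two positions on SUM() with no arguments can be sketched side by
    side in Python. Both function names and the ERROR placeholder are
    hypothetical; neither function is claimed to match any particular
    application:

```python
ERROR = "Error"  # hypothetical placeholder for a formula Error value


def sum_empty_is_zero(*args):
    """The 'mathematically clean' reading: the sum of nothing is 0."""
    return sum(args)


def sum_empty_is_error(*args):
    """The other reading: calling SUM with no arguments is an Error."""
    if not args:
        return ERROR
    return sum(args)


print(sum_empty_is_zero(), sum_empty_is_error())
print(sum_empty_is_zero(1, 2, 3), sum_empty_is_error(1, 2, 3))
```

    The two readings differ only for the empty call; with one or more
    arguments they return the same total.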
    
     > I'd like to see us come up with a good reason why it is a good thing to
     > have a feature be implementation-defined.  Saying "mathematicians
     > disagree" or "different implementations do different things" doesn't 
    sound
     > like a particularly good reason.
    
    Why not?  If there's no "right" answer, and disagreement does not 
    significantly impact interoperability, then there's no benefit to trying 
    to find a right answer.  It all comes down to cost vs. benefit.
    
     > That's the analysis I'd like to see:  What is the user benefit if we
     > eliminated these differences versus what would be the downside.
     >
     > The nice thing here is that any choice is defensible.  It is not like 
    any
     > of them are wrong.  We're not redefining the Gregorian calendar or
     > anything.  So I'd just pick one, based on the majority (or plurality if
     > that is the case) behavior. Or do whatever Excel does here, if that 
    makes
     > Doug happy.  I'd certainly have no hesitancy to add an "if" statement to
     > the Symphony code if it were necessary for us to accommodate this.
    
     > What do people think?  Is this an area that is worth cleaning up rather
     > than trying to standardize a snapshot of the legacy application mess?
    
    I don't see it as a mess.  There are few examples, as you noted.
    
    By the way, I skimmed through your spreadsheet.  "=3>=TRUE()" is NOT 
    necessarily an operation mixing types.  TRUE() may be a Number (it _IS_ 
    a Number on OpenOffice.org, Lotus, Quattro Pro, and many others); when 
    Logical isn't a distinct type, they're the same.
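
    A minimal Python sketch of why "=3>=TRUE()" may or may not mix types.
    The Logical class and the helper names are hypothetical; they model the
    two kinds of implementation described above rather than any real
    application:

```python
class Logical:
    """Hypothetical distinct Logical type, for implementations that have one."""

    def __init__(self, value):
        self.value = bool(value)


def true_as_number():
    """TRUE() where Logical is not a distinct type: it is the Number 1."""
    return 1


def true_as_logical():
    """TRUE() where Logical is a distinct type."""
    return Logical(True)


def mixes_types(left, right):
    """Return True if the two operands have different types."""
    return type(left) is not type(right)


print(mixes_types(3, true_as_number()))   # Number compared with Number
print(mixes_types(3, true_as_logical()))  # Number compared with Logical
print(3 >= true_as_number())              # an ordinary numeric comparison
```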
    
    
    --- David A. Wheeler
    
    


  • 14.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 15:51
    > 
    >  > However, in other areas, like what SUM() does with an empty argument 
    list,
    > 
    > Here I think there is legitimate disagreement.  Some believe that this 
    > should be Error.  On the other hand, it's perfectly reasonable to argue 
    > that "0" should be the result; it's even mathematically clean.  I see no 
    
    > benefit to pressing this issue; this construct simply doesn't happen in 
    > normal spreadsheets.  Why do we want implementors to make changes to 
    > their implementations if there would be no improvement in 
    > interoperability?  I think here, the costs to change clearly exceed any 
    > benefit to interoperability.
    > 
    
    What's the cost in this case?  I think this is something that could be 
    handled in a single if statement, along the lines of:
    
    if (param_count == 0)
    {
            // no arguments supplied: report an error instead of computing
            return ERROR_CODES.ARGUMENT_ERROR;
    }
    else
    {
            // do whatever you did before
    }
    
    So I think the costs are minimal.
    
    My bias is this, from the user's perspective, and I'd recommend this as a 
    principle:  If something should not be done, then doing it should be an 
    error.  Only if detecting the error is too costly should we say it is 
    unspecified.  That is the argument, for example, why C does not define the 
    value of an uninitialized variable.  It is not because having these is 
    good.  It is that detecting them was considered to be prohibitive for 
    runtime performance.
    
    >  > I'd like to see us come up with a good reason why it is a good thing 
    to
    >  > have a feature be implementation-defined.  Saying "mathematicians
    >  > disagree" or "different implementations do different things" doesn't 
    > sound
    >  > like a particularly good reason.
    > 
    > Why not?  If there's no "right" answer, and disagreement does not 
    > significantly impact interoperability, then there's no benefit to trying 
    
    > to find a right answer.  It all comes down to cost vs. benefit.
    > 
    
    There may not be any agreed "right" answer but that does not mean that 
    anyone agrees that there should be three "right answers".  It is very 
    possible that if we ranked our preferences here in the TC, they would be 
    as follows:
    
    1) Everyone does exactly what my implementation does
    2) We all do the same thing, even if that means my implementation must 
    change to follow the standard
    3) No one changes their code and interoperability is reduced
    
    I have no doubt that all implementors would state that their first 
    preference is #1.  That is natural and it is obvious that this is not a 
    viable solution, since we have conflicting behaviors today.  So the 
    question is what is our 2nd preference?
    
    
    >  > That's the analysis I'd like to see:  What is the user benefit if we
    >  > eliminated these differences versus what would be the downside.
    >  >
    >  > The nice thing here is that any choice is defensible.  It is not like 
    
    > any
    >  > of them are wrong.  We're not redefining the Gregorian calendar or
    >  > anything.  So I'd just pick one, based on the majority (or plurality 
    if
    >  > that is the case) behavior. Or do whatever Excel does here, if that 
    > makes
    >  > Doug happy.  I'd certainly have no hesitancy to add an "if" statement 
    to
    >  > the Symphony code if it were necessary for us to accommodate this.
    > 
    >  > What do people think?  Is this an area that is worth cleaning up 
    rather
    >  > than trying to standardize a snapshot of the legacy application mess?
    > 
    > I don't see it as a mess.  There are few examples, as you noted.
    > 
    > By the way, I skimmed through your spreadsheet.  "=3>=TRUE()" is NOT 
    > necessarily an operation mixing types.  TRUE() may be a Number (it _IS_ 
    > a Number on OpenOffice.org, Lotus, Quattro Pro, and many others); when 
    > Logical isn't a distinct type, they're the same.
    > 
    
    Can you suggest a different example?
    
    -Rob
    


  • 15.  Re: [office-formula] Implementation-defined, Unspecified, and Undefinedbehaviors in OpenFormula

    Posted 06-12-2009 15:51
    > 
    >  > However, in other areas, like what SUM() does with an empty argument
    >  > list,
    >
    > Here I think there is legitimate disagreement.  Some believe that this
    > should be Error.  On the other hand, it's perfectly reasonable to argue
    > that "0" should be the result; it's even mathematically clean.  I see no
    > benefit to pressing this issue; this construct simply doesn't happen in
    > normal spreadsheets.  Why do we want implementors to make changes to
    > their implementations if there would be no improvement in
    > interoperability?  I think here, the costs to change clearly exceed any
    > benefit to interoperability.
    > 
    
    What's the cost in this case?  I think this is something that could be 
    handled in a single if statement, along the lines of:
    
    if (param_count == 0)
            // SUM() was called with no arguments: report an error
            return ERROR_CODES.ARGUMENT_ERROR;
    else
    {
            // do whatever you did before
    }
    
    So I think the costs are minimal.
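
    A slightly fuller, runnable version of the same idea, as a minimal
    Python sketch.  It shows the two candidate behaviors discussed in this
    thread (Error versus 0) side by side.  The names ArgumentError,
    EmptySumPolicy and sum_function are hypothetical and are not taken from
    any implementation or from the OpenFormula draft.

```python
from enum import Enum


class ArgumentError(Exception):
    """Hypothetical error raised when SUM() receives no arguments."""


class EmptySumPolicy(Enum):
    """The two behaviors discussed in the thread for an empty argument list."""
    ERROR = "error"
    ZERO = "zero"


def sum_function(args, policy=EmptySumPolicy.ERROR):
    """Sum a list of numbers, handling the empty list according to `policy`."""
    if len(args) == 0:
        if policy is EmptySumPolicy.ERROR:
            raise ArgumentError("SUM() requires at least one argument")
        return 0  # the "mathematically clean" alternative
    # Non-empty case: do whatever the implementation did before.
    return sum(args)
```

    Either branch is one conditional; the difference between the policies
    is only in what the empty case returns.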
    
    My bias is this, from the user's perspective, and I'd recommend it as a 
    principle:  If something should not be done, then doing it should be an 
    error.  Only if detecting the error is too costly should we say it is 
    unspecified.  That is the argument, for example, why C does not define the 
    value of an uninitialized variable.  It is not because having these is 
    good.  It is that detecting them was considered to be prohibitive for 
    runtime performance.
    
    >  > I'd like to see us come up with a good reason why it is a good thing to
    >  > have a feature be implementation-defined.  Saying "mathematicians
    >  > disagree" or "different implementations do different things" doesn't
    >  > sound like a particularly good reason.
    >
    > Why not?  If there's no "right" answer, and disagreement does not
    > significantly impact interoperability, then there's no benefit to trying
    > to find a right answer.  It all comes down to cost vs. benefit.
    >
    
    There may not be any agreed "right" answer but that does not mean that 
    anyone agrees that there should be three "right answers".  It is very 
    possible that if we ranked our preferences here in the TC, they would be 
    as follows:
    
    1) Everyone does exactly what my implementation does
    2) We all do the same thing, even if that means my implementation must 
    change to follow the standard
    3) No one changes their code and interoperability is reduced
    
    I have no doubt that all implementors would state that their first 
    preference is #1.  That is natural, but it is obvious that this is not a 
    viable solution, since we have conflicting behaviors today.  So the 
    question is what is our 2nd preference?
    
    
    >  > That's the analysis I'd like to see:  What is the user benefit if we
    >  > eliminated these differences versus what would be the downside.
    >  >
    >  > The nice thing here is that any choice is defensible.  It is not like
    >  > any of them are wrong.  We're not redefining the Gregorian calendar or
    >  > anything.  So I'd just pick one, based on the majority (or plurality if
    >  > that is the case) behavior. Or do whatever Excel does here, if that
    >  > makes Doug happy.  I'd certainly have no hesitancy to add an "if"
    >  > statement to the Symphony code if it were necessary for us to
    >  > accommodate this.
    >
    >  > What do people think?  Is this an area that is worth cleaning up rather
    >  > than trying to standardize a snapshot of the legacy application mess?
    > 
    > I don't see it as a mess.  There are few examples, as you noted.
    > 
    > By the way, I skimmed through your spreadsheet.  "=3>=TRUE()" is NOT 
    > necessarily an operation mixing types.  TRUE() may be a Number (it _IS_ 
    > a Number on OpenOffice.org, Lotus, Quattro Pro, and many others); when 
    > Logical isn't a distinct type, they're the same.
    > 
    
    Can you suggest a different example?
    
    -Rob
    


  • 16.  RE: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 15:55
    The current draft of OpenFormula describes what I think is a very powerful and useful concept, one that lets ODF support great interoperability across vendors AND lets customers bring their legacy spreadsheets forward if they want to standardize on ODF within their organization.  David Wheeler's explanation in response to my earlier question shows that he and others have already put a lot of great work into this idea, and I don't think we want to lose that.
    
    In Section 2 (Conformance),  the current draft says: "... this specification discusses what is required for a document to assert that it is a portable document. A portable document shall only depend on the capabilities defined in this specification, and shall not depend on undefined or implementation-defined behavior."
    
    In an earlier mail to the list, Eric Patterson proposed strengthening this language a bit to say:
    
    "A spreadsheet document (as opposed to a spreadsheet application) is defined as "portable" when it only depends on the capabilities defined in this specification, and does not depend on undefined or implementation-defined behavior."
    
    and adding:
    
    "Applications may provide users with assistance (in the form of warning messages or other features) for the creation and editing of portable documents."
    
    If OpenFormula switches to mandating a certain behavior in the cases where current implementations differ, then a user will not be able to re-save an existing legacy spreadsheet as an ODF 1.2 file and expect the results to be the same as they were before within the application that originally created the file.  Some may argue that users should not want re-saved spreadsheets to preserve the implementation-specific behaviors they had before, but all of our feedback from real customers tells us that this is exactly what they want.  If we remove this support from OpenFormula, I think it will limit its adoption and usefulness.  To answer one of Rob's questions earlier in the discussion, that's a good reason to continue to allow implementation-defined behavior.
    
    At the same time, for newly created spreadsheets customers can use whatever features or assistance their application provides to make sure they avoid using non-portable features in their documents.  I can imagine an implementer creating a "strictly portable" mode where it simply becomes an error to write =2^3^2 rather than =2^(3^2).  But the decisions on exactly how to help users create portable documents depend very much on the application's UI design and philosophy, and are out of scope for a file format spec.  If applications do a good job of this, then over time the world's collection of spreadsheet files will become more and more portable.
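
    To make the =2^3^2 case concrete: read left to right it is (2^3)^2 = 64, while read right to left it is 2^(3^2) = 512, so the result depends on which associativity an application uses.  The Python sketch below is one hypothetical way a "strictly portable" check could flag such formulas.  The function name has_chained_power and the scanning approach are assumptions made for illustration; they do not describe any application's actual behavior.

```python
def has_chained_power(formula: str) -> bool:
    """Return True if two '^' operators are chained without parentheses.

    Simplified scan: it tracks parenthesis depth and string literals, and
    treats every other operator as ending the current power chain.
    Limitation: a unary minus between the operators (2^-3^2) is not flagged.
    """
    seen_power = [False]  # one flag per open parenthesis level
    in_string = False
    for ch in formula:
        if ch == '"':
            in_string = not in_string  # skip text inside string literals
            continue
        if in_string:
            continue
        if ch == "(":
            seen_power.append(False)
        elif ch == ")":
            if len(seen_power) > 1:
                seen_power.pop()
        elif ch == "^":
            if seen_power[-1]:
                return True  # second '^' at the same level: ambiguous
            seen_power[-1] = True
        elif ch in "+-*/&=<>,;":
            seen_power[-1] = False  # another operator ends the chain
    return False


# The two readings of 2^3^2 give different results.
left_to_right = (2 ** 3) ** 2   # 64
right_to_left = 2 ** (3 ** 2)   # 512
```

    With this sketch, has_chained_power("=2^3^2") is True, while the parenthesized forms =2^(3^2) and =(2^3)^2 pass.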
    
    FYI, I'll be traveling to The Hague over the weekend for the ODF  plugfest, so may be unresponsive on email for a couple of days.  Looking forward to seeing the other TC members who will be there.
    
    Regards,
    Doug
    
    


  • 17.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 15:44
    robert_weir@us.ibm.com wrote:
    > I've been going through the current draft of OpenFormula, looking for 
    > areas that are specifically called out as "implementation-defined", 
    > "unspecified" or "undefined".  I did not find as many as I thought I would 
    > find.
    > I created a spreadsheet that illustrated each one of these cases, which I 
    > am attaching.
    
    I took your spreadsheet and added a new column, "Why Undefined in 
    OpenFormula?".  I then filled this column in, in an attempt to document 
    in more detail the rationale for WHY those items are undefined.
    
    So, attached, is my attempt to answer those issues.  Of course, we don't 
    have to agree with my attempt to answer these items. For example, maybe 
    we've learned more since.  And in two cases (VARP and TEXT), I think we 
    should change the spec as described below.  So this analysis by Weir was 
    worthwhile for at LEAST that reason.
    
    As you can tell by the attached spreadsheet, it turns out that there are 
    actually very FEW causes for something to be undefined in OpenFormula. 
    A few items that were SPECIFICALLY agreed to be undefined end up causing 
    "undefined" in several places in the spec.  So, I named those reasons, 
    which end up getting reused all over. See the attached spreadsheet for 
    details; they are:
    * DISTINCT_LOGICAL_TYPE
    * AUTO_CONVERT_TEXT_TO_NUMBER
    * ZERO_PARAMETERS_IN_LIST
    * OLD_BASE_CONVERTERS.  Basically, we want people to avoid these older 
    functions in the future... we specify just enough so that they can 
    continue to be used in older sheets.
    * PERMIT_EXTENSIONS.  A desire to permit extensions in certain very 
    limited areas, without requiring universal support.
    
    These 5 issues explain 85% ((61-9)/61) of the implementation-defined 
    areas; the other 9 are very specific cases.  That means we have 
    REMARKABLY few implementation-defined areas.  That's great!  "Few but 
    non-zero" is, to me, a sign of a GOOD specification.  "Non-zero" 
    suggests that we've been practical; "few" suggests that interoperability 
    with this specification is really good.
    
    I added two more items not on Weir's list, at the end. (Not counted in 
    the discussion above.)
    
    This analysis brings out two points:
    * What should VARP() do when passed one value?  Both Excel and 
    OpenOffice.org return 0.  I think that makes sense - if given only one 
    value, its variance is zero.  I think we should SPECIFY this, removing 
    one of our implementation-defined items.  Comments?
    * TEXT() is vacuous.  We should at least define SOME format string 
    commands.  Can anyone take this on?
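
    On the VARP() point, a quick worked check: population variance divides
    the sum of squared deviations from the mean by N, and with a single
    value the only deviation is 0, so the result is 0.  A minimal Python
    sketch follows; the function name varp is just a stand-in for
    illustration, not any implementation's code.

```python
def varp(values):
    """Population variance: mean of squared deviations from the mean."""
    n = len(values)
    if n == 0:
        # The empty case is a separate question; this sketch rejects it.
        raise ValueError("varp() needs at least one value")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


single = varp([7.5])                       # one value: deviation is 0
several = varp([2, 4, 4, 4, 5, 5, 7, 9])   # mean 5, squared deviations sum to 32
```

    Here single is 0.0 and several is 32 / 8 = 4.0.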
    
    
    --- David A. Wheeler
    


  • 18.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 15:58
    "David A. Wheeler" 


  • 19.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 16:26
    robert_weir@us.ibm.com wrote:
    > Would it be worth taking a declarative approach on some of these?
    
    I doubt it, but I'd like to know what others think.
    
    I believe the goal is NOT to have zero implementation-defined items.  I 
    believe, and I suspect you agree, that the goal is interoperability.  If 
    some decision doesn't harm interoperability in real life, then having it 
    implementation-defined is a non-issue. In fact, prematurely specifying 
    something where there is disagreement can make things worse.
    
    --- David A. Wheeler
    
    


  • 20.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 16:53
    "David A. Wheeler" 


  • 21.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 21:45
    robert_weir@us.ibm.com wrote:
    > Maybe I wasn't clear enough....
    > 
    > A declarative approach means that implementation behavior can still vary, 
    > but instead of documenting this behavior only in written documentation, 
    > the behavior is documented in the XML itself.
    
    Ah!  You may have been clear, but I certainly didn't appreciate what you 
    had in mind.  I was concerned that you expected readers to be required 
    to adjust their OWN behavior, based on this information.  Which is not 
    what you were saying.
    
    As long as a specific behavior to respond to this isn't mandated by the 
    spec, but it's basically a comment in the XML file, I don't have an 
    objection in principle.  Clearly, implementations would have to modify 
    their implementations to add that information, but that sounds like 
    something they can just insert into their static headers anyway, so it'd 
    be relatively trivial to do.  We could even phase it in.
    
    > This would apply in 
    > specific cases where there are only a small choice of allowed behaviors, 
    > such as 0^0, whether boolean and number are distinct, SUM(), etc.  We're 
    > not mandating a specific evaluation behavior, but mandating that the 
    > behavior used by the application that writes the document be declared in 
    > the document's XML.
    
    I think the main issue is really, "how important is this?".  If it's 
    really important, then maybe we should mandate a specific answer.  If 
    it's not important, then how far should we go?!?  Documenting the 
    assumptions of a particular writer inside the XML file is an interesting 
    alternative.
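
    To picture what such a declaration might look like, here is a minimal
    Python sketch.  Everything in it is hypothetical: the namespace
    "urn:example:formula-behavior", the element and attribute names, and
    the setting names are invented for illustration and are not part of
    ODF or OpenFormula.  In line with the discussion above, the reader
    does not change its own behavior; it only reports where the writer's
    declared behavior differs from its own.

```python
import xml.etree.ElementTree as ET

NS = "urn:example:formula-behavior"  # hypothetical namespace, not part of ODF

# Hypothetical declaration a writing application might embed in a document.
SAMPLE = f"""<behaviors xmlns="{NS}">
  <behavior name="zero-to-power-zero" value="1"/>
  <behavior name="logical-is-number" value="true"/>
  <behavior name="sum-with-no-arguments" value="error"/>
</behaviors>"""


def read_declared_behaviors(xml_text):
    """Parse the declaration into a {name: value} dictionary."""
    root = ET.fromstring(xml_text)
    return {el.get("name"): el.get("value")
            for el in root.findall(f"{{{NS}}}behavior")}


def differences(declared, local):
    """Return settings where the writer's declaration differs from ours."""
    return {name: (declared.get(name), mine)
            for name, mine in local.items()
            if declared.get(name) != mine}


declared = read_declared_behaviors(SAMPLE)
local = {"zero-to-power-zero": "1",
         "logical-is-number": "false",
         "sum-with-no-arguments": "error"}
mismatches = differences(declared, local)
```

    A reader could use mismatches to warn the user that results may
    differ, without being required to emulate the writer.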
    
    Anyone else: Pro? Con?  Another alternative?
    
    --- David A. Wheeler
    


  • 22.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-12-2009 22:42
    
    
    On 6/12/2009, "David A. Wheeler" 


  • 23.  Re: [office-formula] Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 07-01-2009 12:31
    Hi Andreas,
    
    On Friday, 2009-06-12 16:42:04 -0600, Andreas J. Guelzow wrote:
    
    > I think it is a really good idea to mandate in the file what
    > "implementation defined" behaviour was used in the creation of this
    > file.
    
    What specific settings come to mind?
    
    Boolean is a distinct type and not included in NumberSequence and
    behaves differently in some other operations, such as sorting and
    [HV]LOOKUP range-lookup.
    vs.
    Boolean is a number.
    => 


  • 24.  Agenda Request: Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-22-2009 13:52
    Rob,
    
    I notice in reviewing JIRA issues that there are many places where 
    
    (1) implementation-defined is important in being able to say something
    definitive about an ODF feature; and,
    
    (2) implementation-defined is very valuable in sorting out the relationship
    of a feature to the different conformance classes.
    
    PROPOSAL: That we introduce the implementation-defined condition in the
    conformance section of all parts of the ODF specification.
    
    I request that we discuss this on the next available agenda (06-29 or
    07-06).
    
     - Dennis
    
    PS: I don't think declarative is as passive (done as comments) as David
    Wheeler remarked.  It seems to me that such declarative information is about
    what is required to be supported for the document to be consumed properly.
    It could be ignored, of course, but there is the opportunity for an
    implementation to indicate that it cannot honor the declarative requirements
    and also obtain agreement to do what it is able to do, if it has a fallback.
    Some declarative requirements around discretionary provisions (size of a
    table's "used area" for example) might actually be ones that an
    implementation, in checking them, might end up declining to process because
    there is no reasonable fall-back in the implementation.
        The request for agenda discussion is with regard to whether the TC
    considers it worthwhile to introduce an implementation-defined condition on
    required documentation.  We certainly need it, either way, to characterize
    certain features in a clear way.
    
    
    
    


  • 26.  Agenda Request: Implementation-defined, Unspecified, and Undefined behaviors in OpenFormula

    Posted 06-22-2009 16:49
    My apologies:  This request is directed to the ODF TC, not the OIC TC.
    
     - Dennis
    
    

