OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  Proposal for Spreadsheets: New sort option "natural sort"

    Posted 01-19-2007 10:26
    Dear TC members,
    
    this is a proposal for a new attribute of the 


  • 2.  Re: [office] Proposal for Spreadsheets: New sort option "naturalsort"

    Posted 01-19-2007 15:46
    > Dear TC members,
    > 
    > this is a proposal for a new attribute of the 


  • 3.  Re: [office] Proposal for Spreadsheets: New sort option "natural sort"

    Posted 01-19-2007 20:58

    Interesting idea.

    How far do we take it?  

    For example do we allow multiple levels, as in:

    A1.1, A1.2, A1.10, ... , A19.1, A20.3, etc.

    -Rob

    Michael.Brauer@Sun.COM wrote on 01/19/2007 05:26:03 AM:

    > Dear TC members,
    >
    > this is a proposal for a new attribute of the <table:sort> element:
    >
    > The attribute "table:natural-sort" specifies how string values are sorted.
    > If the attribute's value is "true", string-prefixed numbers will be sorted
    > in a "natural", number-aware way, i.e. A1, A2, A3, ... , A19, A20,
    > instead of the normal, alpha-numeric behavior, i.e.
    > A1, A10, A11, A12, ... ,A19, A2, A20, A3, A4, ... , A8, A9.
    >
    > <define name="table-sort-attlist" combine="interleave">
    >      <optional>
    >          <attribute name="table:natural-sort" a:defaultValue="false">
    >              <ref name="boolean"/>
    >          </attribute>
    >      </optional>
    > </define>
    >
    > Best regards
    >
    > Michael
    > --
    > Michael Brauer, Technical Architect Software Engineering
    > StarOffice/OpenOffice.org
    > Sun Microsystems GmbH             Nagelsweg 55
    > D-20097 Hamburg, Germany          michael.brauer@sun.com
    > http://sun.com/staroffice         +49 40 23646 500
    > http://blogs.sun.com/GullFOSS
    >


  • 4.  Re: [office] Proposal for Spreadsheets: New sort option "naturalsort"

    Posted 01-19-2007 22:33
    Robert Weir asked:
    >How far do we take it?
    >For example do we allow multiple levels, as in:
    >A1.1, A1.2, A1.10, ... , A19.1, A20.3, etc.
    
    All definitions that I know of handle an arbitrary number of levels.
    
    I like the idea of it, it's VERY user-friendly, but there should be TWO natural sorts: case-sensitive and case-insensitive.
    
    The trick is to make sure we carefully define "natural sort".  There seems to be several definitions floating around.  Of the ones around, Martin Pool's implementation seems promising:
    * It seems to be the one most referred to, and the one most copied (e.g., by PHP)
    * He provides an implementation in C with a zlib-style (open source BSD-style license), so many implementors could probably start with that code easily (easing implementation)
    * It handles decimal numbers (with leading zeros) cleanly; not all other definitions do
    * He defines BOTH case-sensitive and case-insensitive natural sorts.  That makes sense.
    
    Pool's works like this.  To compare left, right, repeatedly do the following:
    * Skip all leading whitespace of left and right
    * If either begin with zero, compare the digits
       as left-aligned numbers  (consuming all consecutive digits of left and right)
       and return if not equal.
       This is basically the same as char-by-char comparisons of the digits starting from the left.
    * If both begin with non-zero digits, compare the
       digits as right-aligned numbers (consuming all consecutive digits of left and right)
       and return if not equal.
       This means the longest number of digits wins, else digits are compared char-by-char
    * Otherwise, compare the current character of left and right lexicographically
       (consuming one character of each) - this may be case-sensitive or not.
       Return if not equal.
    * If we haven't hit a difference, go to next unconsumed chars of left and right,
       and repeat until all characters consumed.
       If all characters consumed in both left and right, return equal.
    
    This works with the hidden assumption that an end-of-string is marked with a 0-byte (true for C).
    
    Unfortunately, there are lots of variations.  How do you handle whitespace? Non-alpha characters? Leading zeros?  They seem to vary :-(.
    
    Further info:
    * Martin Pool has an implementation and links to other info:
    http://sourcefrog.net/projects/natsort/
    I particularly like this one because he carefully deals with leading zeros,
    so that decimal numbers sort nicely. It looks like most other implementations are
    based on this implementation/definition.
    
    * PHP has one built-in, strnatcmp function, based on Martin Pool's:
    http://us3.php.net/manual/en/function.strnatcmp.php
    
    * Perl's CPAN has two implementations:
     http://search.cpan.org/~salva/Sort-Key/lib/Sort/Key/Natural.pm
     http://search.cpan.org/~sburke/Sort-Naturally-1.02/lib/Sort/Naturally.pm
    
    * OOo implementation:
    http://codesnippets.services.openoffice.org/Office/Office.SimpleNaturalSortAlgorithm.snip
    http://kohei.us/ooo/nsort/index.html
    
    * Stuart Cheshire's Mac Finder version: http://www.naturalordersort.org/
    This appears to be the "grandaddy" of the idea, but it's not clear that this
    handles leading 0's in a way that makes decimal numbers sort nicely.
    Pool re-implemented the idea as a function usable by others.
    
    --- David A. Wheeler