UBL Naming and Design Rules SC

RE: [ubl-ndrsc] Containership issue with current LCSC samples

    Posted 03-07-2003 15:32
    Point taken: performance is always an issue for someone.  Just so everyone
    understands though -- the specific proposals we're discussing:
     (a) extra container elements for each element with numOccurs "large", and
     (b) extra container elements to group elements (that are somehow not
         already grouped as an ABIE, yet for performance reasons should be
         shoved down one level in the tree representation)
    ... neither of these materially affects document size.  The issue of session
    timeouts is driven by document size and processing time.  If anything, the
    extra container elements will make documents slightly _bigger_ -- but not
    enough to worry about.  So we're left with processing time as the only
    issue.  How much processing time does it take to process a 60MB PO?  What
    sort of difference do (a) and (b) make on important scenarios for most
    customers?
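
    To put a rough number on the size point, here is a minimal sketch.  The
    element names (Order, OrderLine, OrderLineList, ID) are placeholders I
    made up for illustration, not names from the LCSC samples.  It builds the
    same toy order with and without a proposal-(a) style container and
    compares the serialized sizes.

```python
import xml.etree.ElementTree as ET

def build_order(n_lines, use_container):
    """Build a toy order with n_lines repeated children.

    All element names here are hypothetical placeholders.
    """
    order = ET.Element("Order")
    # Proposal (a): wrap the repeated element in one extra container.
    parent = ET.SubElement(order, "OrderLineList") if use_container else order
    for i in range(n_lines):
        line = ET.SubElement(parent, "OrderLine")
        ET.SubElement(line, "ID").text = str(i)
    return ET.tostring(order)  # bytes, no pretty-printing

flat = build_order(10000, use_container=False)
wrapped = build_order(10000, use_container=True)
overhead = len(wrapped) - len(flat)
print(len(flat), len(wrapped), overhead)
```

    In this toy case the overhead is just the container's open and close
    tags, independent of how many lines the order has.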
    
    I believe that with the current structure, most users will be able to do all
    their processing, efficiently, with reasonably simple code.  For customers
    that do high volume (lots of documents per tick, very large documents, or
    both), optimizations will be necessary.
    
    What kind of user handles high volume and large documents?  The _large_
    company.  Does the little guy have any problem here?  No.
    
    Now, how does it go?  We should strive to make the typical case easy and
    the difficult case possible.  Will it be _possible_ for the big guy to
    process UBL?  Worst case, the big guy can do an O(N) single pass over a document,
    converting it into whatever structure he needs to optimize his hot access
    paths.  He might transform a UBL doc into another XML doc and then work on
    that one.  Alternatively he might build indexed in-memory structures, or
    write things off to a modern RDBMS.  Whatever.
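
    As a rough sketch of the "indexed in-memory structures" option: the code
    below streams a document once with Python's standard iterparse and builds
    a dictionary keyed on a line identifier.  The tag names (OrderLine, ID,
    Quantity) and the index_lines helper are assumptions made up for this
    example; real documents would carry namespace-qualified tags, which this
    sketch ignores.

```python
import io
import xml.etree.ElementTree as ET

def index_lines(source, line_tag="OrderLine", key_tag="ID"):
    """Stream the document once and index each line element by its key.

    Tag names are hypothetical; adjust them to the real schema.
    """
    index = {}
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == line_tag:
            key = elem.findtext(key_tag)
            # Copy out only the leaf fields the hot access path needs.
            index[key] = {child.tag: child.text for child in elem}
            # Discard the element's children now that they are indexed.
            elem.clear()
    return index

sample = io.BytesIO(
    b"<Order>"
    b"<OrderLine><ID>1</ID><Quantity>5</Quantity></OrderLine>"
    b"<OrderLine><ID>2</ID><Quantity>7</Quantity></OrderLine>"
    b"</Order>"
)
index = index_lines(sample)
print(index["2"]["Quantity"])
```

    After that single pass, each lookup is a dictionary access, whatever the
    nesting of the original document happened to be.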
    
    UBL is an interoperability format -- not a database format.  No matter how
    we structure it, _someone's_ processing will be inefficient.  A central
    lesson of 40 years of database practice is that it's a bad idea to contort
    your data model with too many processing concerns up-front because you'll
    just get it wrong for half the people anyway.  That's why DBMSs have
    indexing, and statistics-based query optimizers -- so that you can come in
    _after_ the fact and without changing your _design_ (analogous to our UBL
    schema) you can make new applications efficient.  
    
    So if the argument isn't about better understandability, then I don't buy it.
    Show me the performance problem, and I'll show you the solution.
    
    The NDR around "containers" as it stands makes the typical case easy and the
    difficult case possible.  It seems out of balance to me to spend more
    resources on this non-issue now, when much more urgent work needs attention.