Point taken: performance is always an issue for someone. Just so everyone
understands though -- the specific proposals we're discussing:
(a) extra container elements for each element that can occur a large
number of times, and
(b) extra container elements to group elements (elements that are somehow
not already grouped as an ABIE, yet for performance reasons should be
shoved down one level in the tree representation)
... neither of these materially affects document size. The issue of session
timeouts is driven by document size and processing time. If anything, the
extra container elements will make documents slightly _bigger_ -- but not
enough to worry about. So we're left with processing time as the only
issue. How much processing time does it take to process a 60MB PO? What
sort of difference do (a) and (b) make in important scenarios for most
customers?
I believe that with the current structure, most users will be able to do all
their processing efficiently, with reasonably simple code. For customers
that do high volume (lots of documents per tick, very large documents, or
both), optimizations will be necessary.
What kind of user handles high volumes of large documents? The _large_ company.
Does the little guy have any problem here? No.
Now, as the saying goes, we should strive to make the typical case easy and
the difficult case possible. Will it be _possible_ for the big guy to process
UBL? Worst case, the big guy can do an O(N) single pass over a document,
converting it into whatever structure he needs to optimize his hot access
paths. He might transform a UBL doc into another XML doc and then work on
that one. Alternatively, he might build indexed in-memory structures, or
write things out to a modern RDBMS. Whatever.
UBL is an interoperability format -- not a database format. No matter how
we structure it, _someone's_ processing will be inefficient. A central
lesson of 40 years of database practice is that it's a bad idea to contort
your data model with too many processing concerns up front, because you'll
just get it wrong for half the people anyway. That's why DBMSs have
indexing and statistics-based query optimizers: so that you can come in
_after_ the fact and, without changing your _design_ (analogous to our UBL
schema), make new applications efficient.
So if the argument isn't about better understandability, then I don't buy it.
Show me the performance problem, and I'll show you the solution.
The NDR on "containers", as it stands, makes the typical case easy and the
difficult case possible. It seems out of balance to me to spend more
resources on this non-issue now, when much more urgent work needs attention.