UBL Naming and Design Rules SC

Re: [ubl-ndrsc] Containership Proposal

    Posted 03-07-2003 19:17
    I quite agree w/Arofan. 
    
    On Fri, 2003-03-07 at 10:58, A Gregory wrote:
    > Folks:
    > 
    > As per the discussion last Wednesday, here is a brief write-up of my
    > arguments regarding containership.
    > 
    > Cheers,
    > 
    > Arofan
    > 
    > _____________
    > 
    > UBL Release Op70 and Containership
    > ______________________________
    > 
    > Overview:
    > 
    > In the discussions about containership, a decision had been made to wait
    > until the Op70 release to see how the "normalization" of the LCSC modelling
    > activities would translate into XML structures, before making any decision
    > about containership. Generally speaking, the resulting XML has produced a
    > satisfactory level of containership. There are two areas where there are
    > problems, however: at the very top level, looking at the children of
    > document elements (Order, etc.); and in those cases where a child element
    > could be repeated many times, producing a "list" of like elements.
    > 
    > These two cases are examined primarily in terms of their effect on XML
    > processing, and whether they will prove to be sub-optimal from the point
    > of view of XML processing with common tools/technologies. This argument also
    > looks at how easily the XML structures can be understood in these cases,
    > however, and whether the usability of the XML structures might be enhanced
    > by the existence of additional containers in these two cases.
    > 
    > The issue of whether these containers represent semantic constructs is left
    > open for discussion, as it seems there may be some disagreement on this
    > point. It is assumed that this discussion will take place as the arguments
    > presented here are considered.
    > 
    > Issues:
    > 
    > As currently structured, the immediate child elements of a UBL document are
    > of two types - the "header" elements, appearing first in the document, as a
    > set of immediate children, and then a set of "item" elements, which in other
    > vocabularies typically make up the "body" section of a document. This
    > structuring is problematic for a number of reasons:
    > 
    > (1) Usability:
    > 
    > It is easier to see the distinction between these two types of child
    > elements if they are organized into two groups - a "header" and a "body".
    > Even if this is merely the result of traditional, presentation-based
    > structuring of vocabularies, it is still the case that many developers (and
    > other users) will find having the document-level element broken out into two
    > sections - header and body - easier to work with. This is not our primary
    > argument here, but, as we will see below, it becomes more important when we
    > look at the use of extensions.
    > 
    > (2) DOM Processing Efficiency:
    > 
    > Because many common XML tools use DOM or similar tree structures to
    > represent XML in memory - notably XSLT and XSL-FO processors - we need to
    > look at how well optimized the
    > existing structures are for this type of processing. When a specific element
    > is selected from a DOM representation, the nodes of the DOM tree must be
    > examined to find the desired node or nodes, often without recourse to the
    > XML schema itself. This means that the processor must examine each immediate
    > child of the root node, select those that match the selection criteria, and
    > then examine the immediate children of the matching nodes, and so on down
    > the tree, until the matching nodes have been found.
    > 
    > With the existing Op70, this is potentially a problem, particularly with
    > large documents, or with some large stylesheets. If I want to select an
    > item-type element from the body, I will have to examine a handful of
    > "header" elements before finding the matches in the "body" section below.
    > This is not ideal, but is not necessarily a problem, because there are not a
    > large number of header-type elements. The reverse case, however, is more
    > problematic. If I wish to select a header-type element from a document with
    > 200 items, then I will need to examine not only each of the relatively few
    > header elements, but also each of the 200 item elements. When the number of
    > potential selects in an XSLT stylesheet is considered, for example, then we
    > will see that we may have a problem.
    > 
    > By comparison, the existence of containers for the header and body elements
    > would allow the processor to examine many fewer children (two at the
    > document level, and then at most the handful of header elements at that
    > level). To briefly look at the way the numbers work: in the existing
    > structures, in an instance with 7 header elements and 200 items, to select a
    > header element I would need to examine the 207 immediate children of the
    > document element (and then however many nodes existed as children of each
    > matched node); with header and body container elements, the first selection makes
    > me examine 2 nodes, and then the 7 different nodes inside the header element
    > (total nodes examined = 9).
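
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes a naive processor that examines every immediate child of a node and never consults the schema. The element names (Order, OrderHeader, OrderBody, OrderLine, Header0 to Header6) are hypothetical placeholders, not names taken from Op70.

```python
import xml.etree.ElementTree as ET


def build_flat(n_headers, n_items):
    """Document element with header and item elements as direct children."""
    order = ET.Element("Order")
    for i in range(n_headers):
        ET.SubElement(order, f"Header{i}")
    for _ in range(n_items):
        ET.SubElement(order, "OrderLine")
    return order


def build_contained(n_headers, n_items):
    """Same content, but grouped under a header and a body container."""
    order = ET.Element("Order")
    header = ET.SubElement(order, "OrderHeader")
    body = ET.SubElement(order, "OrderBody")
    for i in range(n_headers):
        ET.SubElement(header, f"Header{i}")
    for _ in range(n_items):
        ET.SubElement(body, "OrderLine")
    return order


def select_children(parent, tag):
    """Scan all immediate children; return (matches, nodes examined)."""
    examined = 0
    matches = []
    for child in parent:
        examined += 1
        if child.tag == tag:
            matches.append(child)
    return matches, examined


# Flat structure: one scan over every child of the document element.
flat = build_flat(7, 200)
_, flat_cost = select_children(flat, "Header3")

# Contained structure: scan the two containers, then the header's children.
contained = build_contained(7, 200)
headers, top_cost = select_children(contained, "OrderHeader")
_, inner_cost = select_children(headers[0], "Header3")
contained_cost = top_cost + inner_cost

print(flat_cost, contained_cost)
```

A real XSLT or DOM implementation may index or short-circuit its searches, so these counts illustrate the worst case described above rather than measure any particular tool.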
    > 
    > While this will clearly vary with the number of items in the document
    > instance, do we really want to design document structures that perform well
    > only with small instances? There is no performance down-side to adding a
    > level of containership here, and only a very minor impact on the amount of
    > memory required to store the DOM tree being processed.
    > 
    > These same processing inefficiencies will exist with any element structure
    > any of whose immediate children have cardinalities such as 1..n or 0..n.
    > From a processing perspective, "list containers" would make the selection of
    > these children - and of other, non-repeating children with the same parent -
    > much more efficient. When every element of this type in a message is
    > considered, the cumulative loss of processing efficiency could be significant.
    > 
    > (3) Encapsulation and Java Binding:
    > 
    > Many tools for working with XML use a Java binding that equates elements
    > with Java classes, which are then provided with "get" and "set" calls for
    > things like child elements or attribute values. This is true of such tools
    > as JAXB from Sun, and many other, similar technologies. (If you think about
    > it, this is very much the way we have done our data modelling, but in
    > reverse! Each class in the data model becomes an element/type in the XML
    > structures.)
    > 
    > In object-oriented programming languages, encapsulation works to simplify
    > and make more readable the code that is created. In our case, if I want all
    > the "header" information in a business document, and I am using a Java
    > binding as characterized above, I will need to deal with a set of a
    > half-dozen or more objects in order to construct or read from the document
    > object, as opposed to having a single object that encapsulates these. When I
    > want to get all of the body-type elements - items - from my order, I would
    > like to have a single object (a "body" object) that represented an array of
    > like objects (items), as this simplifies the code that reads or creates
    > these items in the document. In processing terms, the header and body
    > information is quite different - often, the "header" information provides
    > the context in which the items in the "body" are processed, so a division at
    > this level makes a great deal of sense from the point of view of object
    > encapsulation.
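
The encapsulation argument can be sketched as follows. Python dataclasses stand in here for the Java classes a binding tool would generate. All class and field names (Order, OrderHeader, OrderLine, buyer, currency and so on) are assumptions made for illustration and do not come from Op70 or from any actual JAXB output.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OrderHeader:
    """One object that encapsulates all header-type information."""
    buyer: str
    seller: str
    currency: str


@dataclass
class OrderLine:
    """One item in the body of the document."""
    item_id: str
    quantity: int


@dataclass
class Order:
    """Document object: a single header plus a list of like items."""
    header: OrderHeader
    lines: List[OrderLine] = field(default_factory=list)


def describe(order):
    """Process each item in the context supplied by the header."""
    h = order.header
    return [
        f"{h.buyer} orders {line.quantity} x {line.item_id} ({h.currency})"
        for line in order.lines
    ]


order = Order(
    header=OrderHeader(buyer="Acme", seller="Widgets Ltd", currency="USD"),
    lines=[OrderLine("A1", 2), OrderLine("B7", 5)],
)
for text in describe(order):
    print(text)
```

Without the header object, the code reading the document would have to fetch each of the half-dozen header values from the document object separately.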
    > 
    > (4) Extension Methodology:
    > 
    > The current Op70 release is fairly adequate from this perspective, with the
    > exception of the lack of division of the document into "header" and "body"
    > elements. (This argument does not apply to "lists" elsewhere in the document
    > structure.) Because XSD extension only allows us to add elements at the
    > *end* of existing structures, any additions of header-type information I
    > make at the document level will have to appear after all of the items in the
    > document. This exacerbates all of the problems stated above: because I can't
    > add an element to the header information (there not being a containing
    > element for header information), I have some header information before the
    > items, and some afterwards. This is extremely confusing to users, and
    > suffers from all of the processing inefficiencies stated before. It is also
    > susceptible to the same solution - the addition of a containing element
    > around the header information, so that header extensions could be added
    > there.
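
The ordering effect can be shown with a toy model. This sketch treats a content model as a plain list of element names and models XSD extension as appending to that list. It is a simplification, not an XSD processor, and the element names (BuyerParty, SellerParty, OrderLine, ProjectCode, OrderHeader, OrderBody) are hypothetical.

```python
def extend(base_sequence, additions):
    """Model XSD extension: the base content comes first, additions last."""
    return list(base_sequence) + list(additions)


# Flat document type: header elements followed by the repeating item.
flat_order = ["BuyerParty", "SellerParty", "OrderLine"]
flat_extended = extend(flat_order, ["ProjectCode"])

# With a header container, the extension is applied to the header type,
# and the document type itself is left unchanged.
order_header = ["BuyerParty", "SellerParty"]
header_extended = extend(order_header, ["ProjectCode"])
contained_order = ["OrderHeader", "OrderBody"]

print(flat_extended)    # new header element lands after the items
print(header_extended)  # new header element stays with the other headers
```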
    > 
    > Note that this effect also complicates SAX-type processing. Often,
    > header-type information sets the stage for item processing, by establishing
    > who is placing an order, for example. When processing without benefit of a
    > DOM, added header information that appears _after_ the items of the document
    > would require a second pass through the XML instance, to determine what to
    > do with the items in the instance, assuming this header information is
    > needed to understand how to process the items. This negates much of the
    > efficiency advantage of SAX processing over DOM processing, which comes from
    > not using memory to record the contents of the document while processing.
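
A small sketch of the SAX effect, using Python's standard xml.sax module. Instead of a second pass, this handler buffers items until the header value it depends on has been seen, which is the memory cost mentioned above. The element names (Order, Currency, Item) and the idea that items need the currency are assumptions chosen for illustration.

```python
import xml.sax


class OrderHandler(xml.sax.ContentHandler):
    """Processes items as soon as the header context (currency) is known."""

    def __init__(self):
        super().__init__()
        self.currency = None
        self.buffered = []       # items seen before the currency was known
        self.peak_buffered = 0   # largest number of items held in memory
        self.processed = []      # (item, currency) pairs
        self._text = []

    def startElement(self, name, attrs):
        self._text = []

    def characters(self, content):
        self._text.append(content)

    def endElement(self, name):
        text = "".join(self._text).strip()
        if name == "Currency":
            self.currency = text
            # Context has arrived: flush everything that was waiting for it.
            for item in self.buffered:
                self.processed.append((item, self.currency))
            self.buffered = []
        elif name == "Item":
            if self.currency is None:
                self.buffered.append(text)
                self.peak_buffered = max(self.peak_buffered, len(self.buffered))
            else:
                self.processed.append((text, self.currency))
        self._text = []


def run(xml_text):
    handler = OrderHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler


header_first = ("<Order><Currency>USD</Currency>"
                "<Item>A1</Item><Item>B7</Item><Item>C3</Item></Order>")
header_last = ("<Order><Item>A1</Item><Item>B7</Item><Item>C3</Item>"
               "<Currency>USD</Currency></Order>")

print(run(header_first).peak_buffered)
print(run(header_last).peak_buffered)
```

With the header first, nothing needs to be held back. With the header after the items, every item has to be kept in memory until the end of the document.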
    > 
    > 
    > Recommendations:
    > 
    > The recommendations here are simple, and easily expressed as rules that
    > could be automatically enforced with the scripts that generate the XML
    > structures:
    > 
    > (1) All documents are divided into a "header" section and a "body" section
    > (which division is already implicit in the contents of the messages in Op70,
    > a suspicious fact when you consider the usability arguments above...), using
    > some simple naming rules based on the name of the document-level element.
    > These constructs, if deemed to have semantic content, could appear in the
    > business models; alternatively, they could appear only in the XML and the
    > implementation models. (On this point, I am agnostic, but I would like to
    > point out that there is no requirement here to impact the work of LCSC at
    > all!)
    > 
    > (2) All elements that have a cardinality of 1..n or 0..n should have a "list
    > container", the name of which is created by adding an "s" to the end of the
    > name of the child element contained (or otherwise pluralizing it). Again, this need not
    > be a semantic construct appearing in the business model, but could simply be
    > a construct in the implementation model. There is no need for the work of
    > LCSC to be altered.
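
Recommendation (2) is mechanical enough to sketch. The function below is a hypothetical illustration, not the actual generation scripts. It works on an instance rather than a schema, takes the set of repeatable element names as an argument (in practice this would come from the 1..n and 0..n cardinalities in the model), and only handles the simple "add an s" naming case.

```python
import xml.etree.ElementTree as ET


def wrap_repeatable(parent, repeatable):
    """Move repeatable children into a list container named tag + 's'."""
    containers = {}
    for child in list(parent):  # iterate over a snapshot while mutating
        if child.tag not in repeatable:
            continue
        container = containers.get(child.tag)
        if container is None:
            # Create the container where the first repeated child was found.
            container = ET.Element(child.tag + "s")
            parent.insert(list(parent).index(child), container)
            containers[child.tag] = container
        parent.remove(child)
        container.append(child)
    return parent


order = ET.fromstring(
    "<Order><BuyerParty/><OrderLine/><OrderLine/><OrderLine/></Order>"
)
wrap_repeatable(order, {"OrderLine"})
print(ET.tostring(order, encoding="unicode"))
```

Names that do not pluralize with a plain "s" would need a lookup table or a naming rule of their own.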
    > 
    > 
    > 
    > ----------------------------------------------------------------
    > To subscribe or unsubscribe from this elist use the subscription
    > manager: <http://lists.oasis-open.org/ob/adm.pl>
    -- 
    Eduardo Gutentag               |         e-mail: eduardo.gutentag@Sun.COM
    Web Technologies and Standards |         Phone:  +1 510 550 4616 x31442
    Sun Microsystems Inc.          |         1800 Harrison St. Oakland, CA 94612
    W3C AC Rep / OASIS TAB Chair
    
    