OASIS LegalDocumentML (LegalDocML) TC

  • 1.  On ids and numbering

    Posted 12-03-2013 10:14
    Dear all,

    let me make a proposal on numbering and ids based on the discussions we had in the past teleconfs. Most of the actual abbreviations I used are invented on the spot, so please don't look at them critically.

    The generic syntax for an id is the following:

    [prefix "__"] abbr ["_" num]

    * prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, based on the context in which the element appears.
    * abbr is an abbreviation describing the element, drawn from the list of abbreviations that Veronique first created and Grant improved.
    * num is a (possibly empty) representation of the numbering of the element within its context. If the element is necessarily unique within its context, no numbering is used.

    This is how to use this syntax:

    * An explicitly numbered element is an element that is numbered by the author of the expression, so that it is not the task of the author of the markup to establish such number, but only to recognize it in the text. Such number is most frequently placed in a <num> element inside the element's structure. An implicitly numbered element, on the other hand, is an element that was not numbered by the author of the expression, and therefore must be numbered in some way by the author of the manifestation; in particular, it has no <num> element relating to itself anywhere.
    * The contexts for the numbering of elements <X> are the containing elements <Y> that suggest, imply or force a restart of the numbering of all internal <X>s. This can be either explicit (when the <X>s are explicitly numbered) or implicit (otherwise). Different contexts imply that elements with the same name may end up having the same abbr and the same number, and must therefore be disambiguated through the use of a prefix. The best option for such prefix is the id of the context element <Y>. For instance, in many traditions chapters restart numbering within every title, so "chp_2" for Chapter 2 could be ambiguous. Therefore, in these cases the id for Chapter 2 of Title I could be "ttl_I_chp_2" (assuming that "ttl_I" unambiguously identifies Title I).
    * All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except in particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). Similarly, <quotedStructure> and <extractStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. Finally, plain inline elements are NEVER contexts. For instance, the id for article 12 in the first document of a composite document will be "doc_1__art_12", while in the second document it will be "doc_2__art_12".
    * Elements that are necessarily unique within a given context will require NO numbering. For instance, there is exactly ONE <body> in acts and bills, and therefore its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, there is at most ONE <content> element inside articles or sections, and therefore the id of the <content> element of article 12 will be simply "art_12__cnt".
    * What constitutes a context is a tradition-dependent issue for explicitly numbered elements. That is to say, when the element is explicitly numbered we need to check whether the numbering starts multiple times in the same document. If this is the case, then the identification of the correct prefix requires the identification of the element causing the restart, and ITS id is used as prefix. For instance, while in many traditions articles are always globally numbered, in Latin America a special structure of transitional articles is added at the end of the document, whose numbering restarts. In this case, article 12 of the main part of the document will have id "main__art_12" while article 12 of the transitional part will be "transitional__art_12".
    * For non-explicitly numbered elements, on the other hand, it is a manifestation-level decision to determine an element smaller than the document class (if any) that would constitute the context for the element. Obvious choices would be:
      - none (i.e., all non-numbered elements are numbered from the beginning of the relevant document class).
      - the closest hcontainer (i.e., all elements within a hierarchy would be numbered after the id of the containing hierarchical elements).
      - the closest containing element (ignoring containing inlines): block, container, container.

    Suppose for instance that we are considering the numbering of the third <ref> element within the second <p> element of article 12. In turn, by counting from the smallest enclosing document class, article 12 is explicitly numbered, and the local tradition has no context except for the document, thus its id is always "art_12". The second <p> of the article is actually the 14th <p> of the whole document, and its third ref is actually the 9th of the whole document and the fifth of the article (because the first <p> contains two refs). Thus we have the following options:

      - Case "none": 'art_12' for the <article>, 'p_14' for the <p> and 'ref_9' for the <ref>.
      - Case "hcontainer": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__ref_5' for the <ref>.
      - Case "closest containing element": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__p_2__ref_3' for the <ref>.

    Of course additional complexities may arise from mixing policies depending on the elements (e.g., making akoma-specific elements hcontainer-driven and html-specific elements container-driven), but I would argue against this policy. My vote is cast for the case "hcontainer", but I will not cut my wrists in case another option is voted. I would strongly defend all the other proposals made in this message.

    * One last discussion item regards abundant or incomplete references. An abundant reference is a reference (in particular the fragment part of an IRI) that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete **ids** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference cannot know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information.

    In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user, similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element.

    Let me know what you think.

    Ciao

    Fabio

    --
    Fabio Vitali                      Tiger got to hunt, bird got to fly,
    Dept. of Computer Science         Man got to sit and wonder 'Why, why, why?'
    Univ. of Bologna  ITALY           Tiger got to sleep, bird got to land,
    phone:  +39 051 2094872           Man got to tell himself he understand.
    e-mail: fabio@cs.unibo.it         Kurt Vonnegut (1922-2007), "Cat's cradle"
    http://vitali.web.cs.unibo.it/


  • 2.  Re: [akomantoso-xml] On ids and numbering

    Posted 12-03-2013 18:45
    My comments are below at GCV:

    On Tue, Dec 3, 2013 at 2:13 AM, Fabio Vitali <fabio@cs.unibo.it> wrote:

    Dear all,

    let me make a proposal on numbering and ids based on the discussions we had in the past teleconfs. Most of the actual abbreviations I used are invented on the spot, so please don't look at them critically.

    The generic syntax for an id is the following:

    [prefix "__"] abbr ["_" num]

    * prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, based on the context in which the element appears.
    * abbr is an abbreviation describing the element, drawn from the list of abbreviations that Veronique first created and Grant improved.
    * num is a (possibly empty) representation of the numbering of the element within its context. If the element is necessarily unique within its context, no numbering is used.

    GCV: I agree. I would add, though, that the "num" part must be case-sensitive. I have cases, here in California, where sec_1__p_iv and sec_1__p_IV both exist and identify different paragraphs. It is the practice here to use numbering a..z, A..Z, lowercase roman numerals, and uppercase roman numerals in the same sequence. (And yes, there is an unfortunate overlapping problem, but that can't be changed.)

    This is how to use this syntax:

    * An explicitly numbered element is an element that is numbered by the author of the expression, so that it is not the task of the author of the markup to establish such number, but only to recognize it in the text. Such number is most frequently placed in a <num> element inside the element's structure. An implicitly numbered element, on the other hand, is an element that was not numbered by the author of the expression, and therefore must be numbered in some way by the author of the manifestation; in particular, it has no <num> element relating to itself anywhere.
    GCV: I agree

    * The contexts for the numbering of elements <X> are the containing elements <Y> that suggest, imply or force a restart of the numbering of all internal <X>s. This can be either explicit (when the <X>s are explicitly numbered) or implicit (otherwise). Different contexts imply that elements with the same name may end up having the same abbr and the same number, and must therefore be disambiguated through the use of a prefix. The best option for such prefix is the id of the context element <Y>. For instance, in many traditions chapters restart numbering within every title, so "chp_2" for Chapter 2 could be ambiguous. Therefore, in these cases the id for Chapter 2 of Title I could be "ttl_I_chp_2" (assuming that "ttl_I" unambiguously identifies Title I).

    GCV: (assuming you mean "ttl_1__chp_2") I agree

    * All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except in particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). Similarly, <quotedStructure> and <extractStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. Finally, plain inline elements are NEVER contexts. For instance, the id for article 12 in the first document of a composite document will be "doc_1__art_12", while in the second document it will be "doc_2__art_12".

    GCV: I agree

    * Elements that are necessarily unique within a given context will require NO numbering. For instance, there is exactly ONE <body> in acts and bills, and therefore its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, there is at most ONE <content> element inside articles or sections, and therefore the id of the <content> element of article 12 will be simply "art_12__cnt".
    GCV: I agree

    * What constitutes a context is a tradition-dependent issue for explicitly numbered elements. That is to say, when the element is explicitly numbered we need to check whether the numbering starts multiple times in the same document. If this is the case, then the identification of the correct prefix requires the identification of the element causing the restart, and ITS id is used as prefix. For instance, while in many traditions articles are always globally numbered, in Latin America a special structure of transitional articles is added at the end of the document, whose numbering restarts. In this case, article 12 of the main part of the document will have id "main__art_12" while article 12 of the transitional part will be "transitional__art_12".

    GCV: I think that some elements (notes, refs, and figures in particular) may use a numbering context that is different from the surrounding context. Also, in the English tradition it is generally sections, rather than articles, that are globally numbered.

    * For non-explicitly numbered elements, on the other hand, it is a manifestation-level decision to determine an element smaller than the document class (if any) that would constitute the context for the element. Obvious choices would be:
      - none (i.e., all non-numbered elements are numbered from the beginning of the relevant document class).
      - the closest hcontainer (i.e., all elements within a hierarchy would be numbered after the id of the containing hierarchical elements).
      - the closest containing element (ignoring containing inlines): block, container, container.

    Suppose for instance that we are considering the numbering of the third <ref> element within the second <p> element of article 12. In turn, by counting from the smallest enclosing document class, article 12 is explicitly numbered, and the local tradition has no context except for the document, thus its id is always "art_12". The second <p> of the article is actually the 14th <p> of the whole document, and its third ref is actually the 9th of the whole document and the fifth of the article (because the first <p> contains two refs). Thus we have the following options:

      - Case "none": 'art_12' for the <article>, 'p_14' for the <p> and 'ref_9' for the <ref>.
      - Case "hcontainer": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__ref_5' for the <ref>.
      - Case "closest containing element": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__p_2__ref_3' for the <ref>.

    Of course additional complexities may arise from mixing policies depending on the elements (e.g., making akoma-specific elements hcontainer-driven and html-specific elements container-driven), but I would argue against this policy. My vote is cast for the case "hcontainer", but I will not cut my wrists in case another option is voted. I would strongly defend all the other proposals made in this message.

    GCV: I think that how unnumbered elements are assigned an id will be a jurisdiction-specific decision, and it will vary based on the case. I would prefer that unnumbered paragraphs ("non-designated", to use LRC jargon) be numbered within their immediate surrounding context, but that refs and notes be numbered based on a higher context.

    * One last discussion item regards abundant or incomplete references. An abundant reference is a reference (in particular the fragment part of an IRI) that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete **ids** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference cannot know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information.

    In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user, similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element.

    GCV: I feel quite strongly that ids should contain a minimal (rather than maximal) set of information required to identify an item. That way, if the reference is abundant, you can always resolve it by throwing away information that you don't need. If the id were maximal and the reference were incomplete, it would not be possible to resolve it.

    Let me know what you think.

    Ciao

    Fabio

    --
    You received this message because you are subscribed to the Google Groups "akomantoso-xml" group.
    To unsubscribe from this group and stop receiving emails from it, send an email to akomantoso-xml+unsubscribe@googlegroups.com.
    To post to this group, send email to akomantoso-xml@googlegroups.com.
    Visit this group at http://groups.google.com/group/akomantoso-xml.
    For more options, visit https://groups.google.com/groups/opt_out.

    --
    ____________________________________________________________________
    Grant Vergottini
    Xcential Group, LLC.
    email: grant.vergottini@xcential.com
    phone: 858.361.6738


  • 3.  RE:[legaldocml] Re: [akomantoso-xml] On ids and numbering

    Posted 12-09-2013 12:37



    Hello,

    Please find my comments below (marked "VPA:").

    Kind regards


    Véronique Parisse
    AUBAY Luxembourg

    Orco House
    38, Parc d’activités - L-8308 Capellen
    Switchboard: +352 2992501
    Fax: +352 299251
    www.aubay.com

    From: legaldocml@lists.oasis-open.org [legaldocml@lists.oasis-open.org] on behalf of Grant Vergottini [grant.vergottini@xcential.com]
    Sent: Tuesday, 3 December 2013 19:45
    To: akomantoso-xml@googlegroups.com
    Cc: legaldocml@lists.oasis-open.org
    Subject: [legaldocml] Re: [akomantoso-xml] On ids and numbering

    My comments are below at GCV:


    On Tue, Dec 3, 2013 at 2:13 AM, Fabio Vitali
    < fabio@cs.unibo.it > wrote:

    Dear all,

    let me make a proposal on numbering and ids based on the discussions we had in the past teleconfs. Most of the actual abbreviations I used are invented on the spot, so please don't look at them critically.

    The generic syntax for an id is the following:

    [prefix "__"] abbr ["_" num]

    * prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, based on the context in which the element appears.
    * abbr is an abbreviation describing the element, drawn from the list of abbreviations that Veronique first created and Grant improved.
    * num is a (possibly empty) representation of the numbering of the element within its context. If the element is necessarily unique within its context, no numbering is used.


    GCV: I agree. I would add, though, that the "num" part must be case-sensitive. I have cases, here in California, where sec_1__p_iv and sec_1__p_IV both exist and identify different paragraphs. It is the practice here to use numbering a..z, A..Z, lowercase roman numerals, and uppercase roman numerals in the same sequence. (And yes, there is an unfortunate overlapping problem, but that can't be changed.)


    VPA: I agree with Grant's remark: num must be case-sensitive.
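The id syntax [prefix "__"] abbr ["_" num], with a case-sensitive num as Grant and Véronique ask, can be sketched in Python as below. This is only an illustration: the function names build_id and parse_id are hypothetical, and the sketch assumes that abbreviations never contain "_" and that "__" appears only as the prefix separator, which the proposal does not state explicitly.

```python
from typing import Optional, Tuple


def build_id(abbr: str, num: Optional[str] = None, prefix: Optional[str] = None) -> str:
    """Compose [prefix "__"] abbr ["_" num]; num is kept exactly as written."""
    local = f"{abbr}_{num}" if num else abbr          # unique elements carry no num
    return f"{prefix}__{local}" if prefix else local  # empty prefix: no context needed


def parse_id(ident: str) -> Tuple[Optional[str], str, Optional[str]]:
    """Split an id back into (prefix, abbr, num), using None for missing parts."""
    prefix, sep, local = ident.rpartition("__")       # the last "__" closes the prefix
    abbr, _, num = local.partition("_")               # the first "_" closes the abbr
    return (prefix if sep else None), abbr, (num or None)


# Case-sensitive nums give two distinct ids, as in Grant's California example.
print(build_id("p", "iv", prefix="sec_1"))   # sec_1__p_iv
print(build_id("p", "IV", prefix="sec_1"))   # sec_1__p_IV
print(parse_id("ttl_1__chp_2"))              # ('ttl_1', 'chp', '2')
print(parse_id("art_12__cnt"))               # ('art_12', 'cnt', None)
```

Under these assumptions a single underscore between context and element, as in "ttl_I_chp_2", does not round-trip: parse_id reads it as abbr "ttl" with num "I_chp_2", whereas Grant's "ttl_1__chp_2" splits cleanly.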





    This is how to use this syntax:

    * An explicitly numbered element is an element that is numbered by the author of the expression, so that it is not the task of the author of the markup to establish such number, but only to recognize it in the text. Such number is most frequently placed in a <num> element inside the element's structure. An implicitly numbered element, on the other hand, is an element that was not numbered by the author of the expression, and therefore must be numbered in some way by the author of the manifestation; in particular, it has no <num> element relating to itself anywhere.


    GCV: I agree


    VPA: The number in the id attribute is based on the numbering of the expression.

    So, is the numbering language-specific? How do we establish the correspondence between translations of the same structure?




    For example, if we add two new articles after article 1 of an act in force, without renumbering, the numbers will be:

    * "1 bis" and "1 ter" for Spanish, French, Italian and Dutch
    * "1 a" and "1 b" for Czech, Danish, German, Estonian, English, Croatian, Latvian, Lithuanian, Hungarian, Maltese, Polish, Romanian, Slovak, Slovenian, Finnish and Swedish
    * "1 A" and "1 B" for Portuguese
    * "1 α" and "1 β" for Greek
    * "1 а" and "1 б" for Bulgarian





    Also, what is the number when the text contains "Sole Article"? Do we transform it to "1"?




    * The contexts for the numbering of elements <X> are the containing elements <Y> that suggest, imply or force a restart of the numbering of all internal <X>s. This can be either explicit (when the <X>s are explicitly numbered) or implicit (otherwise). Different contexts imply that elements with the same name may end up having the same abbr and the same number, and must therefore be disambiguated through the use of a prefix. The best option for such prefix is the id of the context element <Y>. For instance, in many traditions chapters restart numbering within every title, so "chp_2" for Chapter 2 could be ambiguous. Therefore, in these cases the id for Chapter 2 of Title I could be "ttl_I_chp_2" (assuming that "ttl_I" unambiguously identifies Title I).



    GCV: (assuming you mean "ttl_1__chp_2") I agree


    VPA: When footnotes are never renumbered, do we use the id of the containing element as a prefix for the footnotes?




    * All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except in particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). Similarly, <quotedStructure> and <extractStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. Finally, plain inline elements are NEVER contexts. For instance, the id for article 12 in the first document of a composite document will be "doc_1__art_12", while in the second document it will be "doc_2__art_12".


    GCV: I agree


    VPA: When the document is an attachment of another one, the context must be composed with the prefix of the containing element; for example doc_1__annex_1__...
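A minimal sketch of how prefixes could be derived from contexts, covering document classes in a composite document, an attachment inside a document (as Véronique suggests), and chapters that restart within titles. Everything here is an illustrative assumption rather than part of the proposal: the Element class, the "collection" wrapper, the ABBR table, and the RESTARTS table, which in a real system would be tradition-dependent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical abbreviations, for illustration only (not the agreed list).
ABBR = {"doc": "doc", "attachment": "annex", "title": "ttl", "chapter": "chp", "article": "art"}
# Document classes (and, per Véronique's suggestion, attachments) are always contexts.
ALWAYS_CONTEXTS = {"doc", "attachment"}
# Tradition-dependent restarts (assumed here): chapters restart within each title.
RESTARTS = {"chapter": {"title"}}


@dataclass
class Element:
    tag: str
    num: Optional[str] = None            # explicit number; None for unique elements
    children: List["Element"] = field(default_factory=list)


def assign_ids(node: Element,
               ancestors: Tuple[Tuple[str, str], ...] = (),
               ids: Optional[List[str]] = None) -> List[str]:
    """Return the ids of node's descendants in document order."""
    ids = [] if ids is None else ids
    for child in node.children:
        contexts = ALWAYS_CONTEXTS | RESTARTS.get(child.tag, set())
        # Prefix = id of the nearest ancestor acting as a context for this element type.
        prefix = next((aid for tag, aid in reversed(ancestors) if tag in contexts), "")
        local = f"{ABBR[child.tag]}_{child.num}" if child.num else ABBR[child.tag]
        cid = f"{prefix}__{local}" if prefix else local
        ids.append(cid)
        assign_ids(child, ancestors + ((child.tag, cid),), ids)
    return ids


composite = Element("collection", children=[
    Element("doc", "1", [
        Element("title", "1", [Element("chapter", "1"),
                               Element("chapter", "2", [Element("article", "12")])]),
        Element("attachment", "1", [Element("article", "3")]),
    ]),
    Element("doc", "2", [Element("article", "12")]),
])
for ident in assign_ids(composite):
    print(ident)
```

Article 12 inside Chapter 2 still comes out as "doc_1__art_12", because in this assumed tradition titles restart only chapter numbering, while the article inside the attachment becomes "doc_1__annex_1__art_3".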






    * Elements that are necessarily unique within a given context will require NO numbering. For instance, there is exactly ONE <body> in acts and bills, and therefore its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, there is at most ONE <content> element inside articles or sections, and therefore the id of the <content> element of article 12 will be simply "art_12__cnt".


    GCV: I agree


    VPA: ok




    * What constitutes a context is a tradition-dependent issue for explicitly numbered elements. That is to say, when the element is explicitly numbered we need to check whether the numbering starts multiple times in the same document. If this is the case, then the identification of the correct prefix requires the identification of the element causing the restart, and ITS id is used as prefix. For instance, while in many traditions articles are always globally numbered, in Latin America a special structure of transitional articles is added at the end of the document, whose numbering restarts. In this case, article 12 of the main part of the document will have id "main__art_12" while article 12 of the transitional part will be "transitional__art_12".


    GCV: I think that some elements (notes, refs, and figures in particular) may use a numbering context that is different from the surrounding context. Also, in the English tradition it is generally sections, rather than articles, that are globally numbered.


    * For non-explicitly numbered elements, on the other hand, it is a manifestation-level decision to determine an element smaller than the document class (if any) that would constitute the context for the element. Obvious choices would be:
     - none (i.e., all non-numbered elements are numbered from the beginning of the relevant document class).
     - the closest hcontainer (i.e., all elements within a hierarchy would be numbered after the id of the containing hierarchical elements).
     - the closest containing element (ignoring containing inlines): block, container, container.

    Suppose for instance that we are considering the numbering of the third <ref> element within the second <p> element of article 12. In turn, by counting from the smallest enclosing document class, article 12 is explicitly numbered, and the local tradition has no context except for the document, thus its id is always "art_12". The second <p> of the article is actually the 14th <p> of the whole document, and its third ref is actually the 9th of the whole document and the fifth of the article (because the first <p> contains two refs). Thus we have the following options:

      - Case "none": 'art_12' for the <article>, 'p_14' for the <p> and 'ref_9' for the <ref>.
      - Case "hcontainer": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__ref_5' for the <ref>.
      - Case "closest containing element": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__p_2__ref_3' for the <ref>.

    Of course additional complexities may arise from mixing policies depending on the elements (e.g., making akoma-specific elements hcontainer-driven and html-specific elements container-driven), but I would argue against this policy. My vote is cast for the case "hcontainer", but I will not cut my wrists in case another option is voted. I would strongly defend all the other proposals made in this message.
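The three options can be checked against the worked example with a short Python sketch. It builds a hypothetical single-document act in which twelve <p>s and four <ref>s precede article 12, whose first <p> holds two refs and whose second holds three, and then numbers the implicit elements under each policy. The sets of hcontainers and inlines are reduced to the three element types in the example, and explicitly numbered articles are assumed to be globally numbered, so their ids carry no prefix; the Node class and helper names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

ABBR = {"article": "art", "p": "p", "ref": "ref"}
HCONTAINERS = {"article"}  # assumption: the only hierarchical element in this example
INLINES = {"ref"}          # assumption: inlines are never contexts


@dataclass
class Node:
    tag: str
    num: Optional[str] = None          # explicit number taken from <num>, if any
    children: List["Node"] = field(default_factory=list)
    id: str = ""


def assign(root: Node, policy: str) -> None:
    """Number implicit elements under policy 'none', 'hcontainer' or 'closest'."""
    counters: Dict[Tuple[str, str], int] = {}

    def walk(node: Node, ancestors: List[Node]) -> None:
        for child in node.children:
            if child.num is not None:  # explicitly numbered, global in this tradition
                child.id = f"{ABBR[child.tag]}_{child.num}"
            else:
                if policy == "none":
                    ctx = ""
                elif policy == "hcontainer":
                    ctx = next((a.id for a in reversed(ancestors) if a.tag in HCONTAINERS), "")
                else:  # "closest": nearest non-inline ancestor
                    ctx = next((a.id for a in reversed(ancestors) if a.tag not in INLINES), "")
                key = (ctx, child.tag)
                counters[key] = counters.get(key, 0) + 1
                local = f"{ABBR[child.tag]}_{counters[key]}"
                child.id = f"{ctx}__{local}" if ctx else local
            walk(child, ancestors + [child])  # parent id is set before its children

    walk(root, [])


def article(num: str, refs_per_p: List[int]) -> Node:
    return Node("article", num, [Node("p", children=[Node("ref") for _ in range(n)])
                                 for n in refs_per_p])


def example_ids(policy: str) -> Tuple[str, str, str]:
    # Articles 1-11 hold 12 <p>s and 4 <ref>s; article 12 has p1 (2 refs) and p2 (3 refs).
    before = [article(str(n), [1, 0] if n == 1 else [1 if n <= 4 else 0]) for n in range(1, 12)]
    act = Node("act", children=before + [article("12", [2, 3])])
    assign(act, policy)
    art12 = act.children[11]
    p2 = art12.children[1]
    return art12.id, p2.id, p2.children[2].id


for policy in ("none", "hcontainer", "closest"):
    print(policy, example_ids(policy))
```

Running it reproduces the three cases listed above: ('art_12', 'p_14', 'ref_9'), ('art_12', 'art_12__p_2', 'art_12__ref_5') and ('art_12', 'art_12__p_2', 'art_12__p_2__ref_3').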



    GCV: I think that how unnumbered elements are assigned an id will be a jurisdiction-specific decision, and it will vary based on the case. I would prefer that unnumbered paragraphs ("non-designated", to use LRC jargon) be numbered within their immediate surrounding context, but that refs and notes be numbered based on a higher context.



    VPA: Footnotes are generally numbered by the author. But the number of a ref is assigned implicitly and is a technical number. Maybe it is better to have a number relative to the surrounding element structure: if you work with documents that are progressively improved, the number needs to be recalculated each time there is a new ref inside this structure (because the author adds one, or because a ref is detected in an existing document through more precise markup).
    Also, if an amendment adds a new ref, the impact of this insertion depends on the size of the context.



    VPA: How do you disambiguate the reference 'art_12__p_2__ref_3' without knowing how the ids of this document were calculated (relative to the <p> or relative to the article), and also how the reference itself was calculated?
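Véronique's point about amendments can be made concrete: the smaller the numbering context, the fewer existing ref ids change when a new ref is inserted. The sketch below is hypothetical; it models each <ref> only by a label, the number of its article and the index of its <p> within the article, which is enough to number refs under the three policies discussed above (the function names are illustrative).

```python
from collections import Counter
from typing import Dict, List, Tuple

Ref = Tuple[str, int, int]  # (label, article number, index of the <p> in the article)


def ref_ids(refs: List[Ref], policy: str) -> Dict[str, str]:
    """Number refs in document order, restarting the count in each context."""
    counters: Counter = Counter()
    ids: Dict[str, str] = {}
    for label, art, p in refs:
        ctx = {"none": "", "hcontainer": f"art_{art}", "closest": f"art_{art}__p_{p}"}[policy]
        counters[ctx] += 1
        local = f"ref_{counters[ctx]}"
        ids[label] = f"{ctx}__{local}" if ctx else local
    return ids


def changed_by_insert(refs: List[Ref], position: int, new: Ref, policy: str) -> List[str]:
    """Labels of existing refs whose id changes when `new` is inserted at `position`."""
    before = ref_ids(refs, policy)
    after = ref_ids(refs[:position] + [new] + refs[position:], policy)
    return sorted(label for label in before if before[label] != after[label])


refs = [("a", 11, 1), ("b", 11, 1), ("c", 12, 1), ("d", 12, 1),
        ("e", 12, 2), ("f", 12, 2), ("g", 12, 2), ("h", 13, 1)]
# An amendment adds a ref at the start of the first <p> of article 12.
for policy in ("none", "hcontainer", "closest"):
    print(policy, changed_by_insert(refs, 2, ("new", 12, 1), policy))
```

In this example six existing ids change under "none", five under "hcontainer" and only two under "closest containing element", which is the trade-off Véronique describes.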




    * One last discussion item regards abundant or incomplete references. An abundant reference is a reference (in particular the fragment part of an IRI) that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete **ids** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference cannot know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information.


    In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user, similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element.



    GCV: I feel quite strongly that ids should contain a minimal (rather than maximal) set of information required to identify an item. That way, if the reference is abundant, you can always resolve it by throwing away information that you don't need. If the id were maximal and the reference were incomplete, it would not be possible to resolve it.



    VPA: When searching for "chap_1__sect_2__art_12__par_1", how can the resolver work without knowing anything about the tradition (are articles numbered continuously or not?). If a system is multi-tradition, isn't it necessary to have a property of the document that states the algorithm used to build the ids?





    Let me know what you think.

    Ciao

    Fabio






















  • 4.  Re: [legaldocml] [legaldocml] Re: [akomantoso-xml] On ids and numbering

    Posted 12-11-2013 17:54
    Dear Veronique, Grant, all:

    > On Tue, Dec 3, 2013 at 2:13 AM, Fabio Vitali <fabio@cs.unibo.it> wrote:
    > Dear all,
    >
    > let me make a proposal on numbering and ids from the discussions we had on the past teleconfs. Most of the actual abbreviations I used are invented on the spot, so please don't look at them critically.
    >
    > The generic syntax for an id is the following:
    >
    > [prefix "__"] abbr ["_" num]
    >
    > * prefix is a (possibly empty) string providing uniqueness to the remaining part of the id, and based on the context in which the element appears.
    > * abbr is an abbreviation describing the element, and it is drawn from the list of abbreviations that Veronique first created and Grant improved.
    > * num is a (possibly empty) representation of the numbering of the element within its context. If the element is necessarily unique within its context, no numbering is used.
    >
    > GCV: I agree. I would add, though, that the "num" part must be case-sensitive. I have cases, here in California, where sec_1__p_iv and sec_1__p_IV will both exist and will identify different paragraphs. It is the practice here to use numbering a...z, A...Z, lowercase roman numerals, and uppercase roman numerals in the same sequence. (And yes, there is an unfortunate overlapping problem, but that can't be changed.)
    >
    > VPA: I agree with Grant's remark: num must be case-sensitive.

    I agree with both of you.

    > This is how to use this syntax:
    >
    > * An explicitly numbered element is an element that is numbered by the author of the expression, so that it is not the task of the author of the markup to establish such number, but only to recognize it in the text. Such number is most frequently placed in a <num> element inside the element's structure. An implicitly numbered element, on the other hand, is an element that was not numbered by the author of the expression, and therefore must be numbered in some way by the author of the manifestation, and in particular has no <num> element relating to itself anywhere.
    >
    > GCV: I agree
    >
    > VPA: The numbering of the attribute is based on the numbering of the expression.
    > So, is the numbering language-specific? How do we establish the correspondence between translations of the same structure?
    >
    > For example, if we add two new articles after, for example, "article 1" of an act in force, without renumbering, the numbers will be:
    > • "1 bis" and "1 ter" for Spanish, French, Italian and Dutch
    > • "1 a" and "1 b" for Czech, Danish, German, Estonian, English, Croatian, Latvian, Lithuanian, Hungarian, Maltese, Polish, Romanian, Slovak, Slovenian, Finnish and Swedish
    > • "1 A" and "1 B" for Portuguese
    > • "1 α" and "1 β" for Greek
    > • "1 а" and "1 б" for Bulgarian.

    My opinion is that the currentId of an explicitly numbered element needs to be established syntactically, given the num element in the document, and not semantically, given its position in the document. This implies that corresponding structures in different languages will have different ids, and correspondences are maintained through different means (e.g., GUID).

    > Also, what is the number when, in the text, we find "Sole Article"? Do we transform it to "1"?

    The number is some rendering of the <num> element. If Sole Article is rendered as

        <article>
          <num>Sole Article</num>
          <content>
            <p>Blah blah</p>
          </content>
        </article>

    then the id should be "sole": "doc_1__art_sole".

    > * The context of the numbering of elements <X> are the containing elements <Y> that suggest, imply or force a re-start of the numbering of all internal <X>s. This can be either explicit (when the <X>s are explicitly numbered) or implicit (otherwise). Different contexts imply that elements with the same name may end up having the same abbr and the same number, and must therefore be disambiguated through the use of a prefix. The best option for such prefix is the id of the context element <Y>. For instance, in many traditions chapters restart numbering within every title, so "chp_2" for Chapter 2 could be ambiguous. Therefore, in these cases the id for Chapter 2 of Title I could be "ttl_I_chp_2" (assuming that "ttl_I" unambiguously identifies Title I).
    >
    > GCV: (assuming you mean "ttl_1__chp_2") I agree

    Of course I do.

    > VPA: When the footnotes are never renumbered, do we use the id of the containing element to prefix the footnotes?

    Editorial (manifestation-level) <note>s are numbered at the whim of the author of the manifestation. Authorial <authorialNote>s are explicitly numbered through a marker attribute. I suggest using the content of the marker attribute as the number in the corresponding id.

    > * All document classes (act, bill, doc, etc.) are ALWAYS contexts. This means that, except in particular cases, all numbers restart whenever a new document class is started (e.g., in a composite document each document component has its own local numbering). Similarly, <quotedStructure> and <extractStructure> are always contexts, EVEN IF they do not force a **restart** of the numbering, but just a different numbering context within themselves. Finally, plain inline elements are NEVER contexts. For instance, the id for article 12 in the first document of a composite document will be "doc_1__art_12", while in the second document it will be "doc_2__art_12".
    >
    > GCV: I agree
    >
    > VPA: When the document is an attachment of another one, the context must be composed with the prefix of the containing element; for example doc_1__annex_1__...

    I agree with Veronique. Attachments need contexts.

    > * Elements that are necessarily unique within a given context will require NO numbering. For instance, there is exactly ONE <body> in acts and bills, and therefore its id can be simply "body" (or "doc_1__body" in case of a composite document, of course). Analogously, there is at most ONE <content> element inside articles or sections, and therefore the id of the <content> element of article 12 will be simply "art_12__cnt".
    >
    > GCV: I agree
    >
    > VPA: ok

    Ok.

    > * What constitutes a context is a tradition-dependent issue for explicitly numbered elements. That is to say, when the element is explicitly numbered we need to determine whether the numbering starts multiple times in the same document. If this is the case, then the identification of the correct prefix requires the identification of the element causing the restart, and ITS id is used as prefix. For instance, while in many traditions articles are always globally numbered, in Latin America a special structure of transitional articles is added at the end of the document, whose numbering restarts. In this case, article 12 of the main part of the document will have id "main__art_12" while article 12 of the transitional part will be "transitional__art_12".
    >
    > GCV: I think that some elements (notes, refs, and figures, in particular) may use a numbering context that is different from the surrounding context. Also, in the English tradition it is generally sections, rather than articles, that are numbered globally. If we end up with different contexts depending on the element, then we MUST issue a document detailing which elements are numbered globally and which are numbered according to which context.
    >
    > * For non-explicitly numbered elements, on the other hand, it is a manifestation-level decision to determine an element smaller than the document class (if any) that would constitute the context for the element. Obvious choices would be:
    > - none (i.e., all non-numbered elements are numbered from the beginning of the relevant document class).
    > - the closest hcontainer (i.e., all elements within a hierarchy would be numbered after the id of the containing hierarchical elements).
    > - the closest containing element (ignoring containing inlines): block, container, container.
    >
    > Suppose for instance that we are considering the numbering of the third <ref> element within the second <p> element of article 12. In turn, by counting from the smallest enclosing document class, article 12 is explicitly numbered, and the local tradition has no context except for the document, thus its id is always "art_12". The second <p> of the article is actually the 14th <p> of the whole document, and its third ref is actually the 9th of the whole document and the fifth of the article (because the first <p> contains two refs). Thus we have the following options:
    >
    > - Case "none": 'art_12' for the <article>, 'p_14' for the <p> and 'ref_9' for the <ref>.
    > - Case "hcontainer": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__ref_5' for the <ref>.
    > - Case "closest containing element": 'art_12' for the <article>, 'art_12__p_2' for the <p> and 'art_12__p_2__ref_3' for the <ref>.
    >
    > Of course additional complexities may arise from mixing up policies depending on the elements (e.g. making akoma-specific elements become hcontainer-driven and html-specific elements become container-driven), but I would argue against this policy. My vote is cast for the case "hcontainer" but I will not cut my wrists in case another option is voted. I would strongly defend all the other proposals made in this message.
    >
    > GCV: I think that how unnumbered elements are assigned an id will be a jurisdiction-specific decision, and it will vary based on the case. I would prefer that unnumbered paragraphs (non-designated, to use LRC jargon) be numbered within their immediate surrounding context, but that refs and notes be numbered based on a higher context. If we end up with different contexts depending on the element, then we MUST issue a document detailing which elements are numbered globally and which are numbered according to which context.
    >
    > VPA: For footnotes, the numbering is generally done by the author. But the number of a ref is assigned implicitly and is a technical number. Maybe it is better to have a number relative to the surrounding element structure: if you work with documents that are progressively improved, the number needs to be recalculated each time there is a new ref inside this structure (because the author adds one, or because a ref is detected in an existing document through more precise markup).
    > Also, if an amendment adds a new ref, the impact of this insertion depends on the size of the context. If we end up with different contexts depending on the element, then we MUST issue a document detailing which elements are numbered globally and which are numbered according to which context.
    >
    > VPA: How do you disambiguate the reference 'art_12__p_2__ref_3' without knowing how the id is calculated for this document (relative to the "p" or relative to the "article"), and also how the reference is calculated?

    If the fragment art_12__p_2 is present in the id, then ref_3 is local to the element whose id is art_12__p_2. Thus it is necessarily the third ref of the second p.

    > * One last discussion item regards abundant or incomplete references. An abundant reference is a reference, in particular the fragment part of an IRI, that contains more information than needed to match it to the id of an element. An incomplete reference, on the other hand, contains less information than necessary and therefore may point to more than one possible destination. BTW, we must never deal with abundant or incomplete ***ids*** in the id attributes of elements, since ids are created by the author of a manifestation, and therefore we should expect him/her to know what is needed to establish the minimum complete set of information to create an unambiguous id. We should only deal with abundant or incomplete references, since the author of a reference could not know everything about the document being mentioned in the text of the reference, and therefore he/she might create an incorrect reference that has too much or too little information.
    >
    > In case of an abundant reference, the resolver should identify the relevant minimal id (if it exists) by removing prefixes until a perfect match is found; in case of an incomplete reference, on the other hand, the resolver must establish an interactive session with the user, similar to the process of resolving work-level IRIs, and determine the missing information necessary to identify the id of a unique element.
    >
    > GCV: I feel quite strongly that ids should contain a minimal (rather than maximal) set of information required to identify an item. That way, if the reference is abundant, you can always resolve it by throwing away information that you don't need. If the id was maximal and the reference was incomplete, it would not be possible to resolve it.
    >
    > VPA: Searching for "chap_1__sect_2__art_12__par_1", how can the resolver work without knowing anything of the tradition (are articles numbered continuously or not?). If a system is multi-tradition, isn't it necessary for the document to carry a property stating the algorithm used to build the id?

    Discussions are pointless until we try the algorithm concretely. For instance, given "chap_1__sect_2__art_12__par_1":

    1) Is there an element whose id is "chap_1__sect_2__art_12__par_1"?
       - If yes, you've found it.
       - If no, then:
    2) Is there an element whose id is "par_1" within an element that can be recursively identified through this method with "chap_1__sect_2__art_12"?
       - If yes, you've found it.
       - If no, then:
    3) Is there an element whose id is "art_12__par_1" within an element that can be recursively identified through this method with "chap_1__sect_2"?
       - If yes, you've found it.
       - If no, then:
    4) Is there an element whose id is "sect_2__art_12__par_1" within an element whose id is "chap_1"?
       - If yes, you've found it.
       - If no, then you have an error.

    Ciao
    Fabio

    --
    Fabio Vitali                   Tiger got to hunt, bird got to fly,
    Dept. of Computer Science      Man got to sit and wonder "Why, why, why?'
    Univ. of Bologna ITALY         Tiger got to sleep, bird got to land,
    phone: +39 051 2094872         Man got to tell himself he understand.
    e-mail: fabio@cs.unibo.it      Kurt Vonnegut (1922-2007), "Cat's cradle"
    http://vitali.web.cs.unibo.it/
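    The id syntax, the three numbering policies for implicitly numbered elements and the resolution procedure discussed in this message can be tried out on a toy model. The following Python sketch is illustrative only and rests on assumptions: the Node class, the abbreviation table, the set of elements treated as hcontainers and all function names are hypothetical, explicitly numbered elements are assumed to restart only at the document level (as in the article 12 example), and the resolver follows the four-step procedure above rather than any existing Akoma Ntoso tool.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional

# Illustrative abbreviations taken from examples in this thread, not an agreed list.
ABBR = {"chapter": "chp", "article": "art", "p": "p", "ref": "ref"}
# Hypothetical choice of which elements count as hierarchical containers.
HCONTAINERS = {"chapter", "article"}


@dataclass
class Node:
    tag: str
    num: Optional[str] = None      # explicit number from <num>, kept verbatim (case-sensitive)
    children: List["Node"] = field(default_factory=list)
    eid: str = ""                  # id filled in by assign_ids()


def pick_context(ancestors: List[Node], policy: str) -> Node:
    """Choose the element whose id prefixes an implicitly numbered element."""
    root = ancestors[0]
    if policy == "none":
        return root
    if policy == "hcontainer":
        return next((a for a in reversed(ancestors) if a.tag in HCONTAINERS), root)
    if policy == "closest":
        return ancestors[-1]       # the toy tree has no inline wrappers to skip
    raise ValueError(f"unknown policy: {policy}")


def assign_ids(root: Node, policy: str) -> None:
    counters = {}                  # (id of context node, tag) -> elements counted so far

    def walk(node: Node, ancestors: List[Node]) -> None:
        if ancestors:              # the document root itself gets no id here
            if node.num is not None:
                # Assumption: only the document restarts explicit numbering.
                ctx, n = ancestors[0], node.num
            else:
                ctx = pick_context(ancestors, policy)
                key = (id(ctx), node.tag)
                counters[key] = counters.get(key, 0) + 1
                n = str(counters[key])
            prefix = "" if ctx is ancestors[0] else ctx.eid + "__"
            node.eid = f"{prefix}{ABBR[node.tag]}_{n}"
        for child in node.children:  # pre-order traversal = document order
            walk(child, ancestors + [node])

    walk(root, [])


def descendants(node: Node) -> Iterator[Node]:
    for child in node.children:
        yield child
        yield from descendants(child)


def resolve(root: Node, ref: str) -> Optional[Node]:
    """Exact match first; otherwise split ref into a container part and a local part."""
    exact = next((n for n in descendants(root) if n.eid == ref), None)
    if exact is not None:
        return exact
    segs = ref.split("__")
    for i in range(len(segs) - 1, 0, -1):    # longest container part first
        container = resolve(root, "__".join(segs[:i]))
        if container is None:
            continue
        local = "__".join(segs[i:])
        match = next((n for n in descendants(container) if n.eid == local), None)
        if match is not None:
            return match
    return None                              # "then you have an error"


def build_example() -> Node:
    """12 <p>s and 4 <ref>s precede article 12, as in the worked example."""
    def p(refs: int) -> Node:
        return Node("p", children=[Node("ref") for _ in range(refs)])
    art11 = Node("article", "11", [p(1 if i < 4 else 0) for i in range(12)])
    art12 = Node("article", "12", [p(2), p(3)])
    return Node("doc", children=[Node("chapter", "1", [art11, art12])])


if __name__ == "__main__":
    for policy in ("none", "hcontainer", "closest"):
        doc = build_example()
        assign_ids(doc, policy)
        art12 = doc.children[0].children[1]
        p2 = art12.children[1]
        print(policy, art12.eid, p2.eid, p2.children[2].eid)
    # An abundant reference (extra "chp_1__" prefix) still reaches the same <ref>.
    print(resolve(doc, "chp_1__art_12__p_2__ref_3").eid)
```

    Under these assumptions the three policies reproduce the id triples listed in the message (p_14 / ref_9, art_12__p_2 / art_12__ref_5, art_12__p_2 / art_12__p_2__ref_3), and the abundant reference resolves because art_12__p_2__ref_3 is found inside the element identified by chp_1. The recursive resolver tries every split point and is unoptimised, which is acceptable for a sketch but not for large documents.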


  • 5.  Re: [legaldocml] [legaldocml] Re: [akomantoso-xml] On ids and numbering

    Posted 12-11-2013 18:02


  • 6.  RE: [legaldocml] On ids and numbering

    Posted 12-10-2013 15:59
    Link to my response with full text below: https://docs.google.com/document/d/1rGvVnvg4PaWD_ql2QjK1nqfq7WtPUJlCnOV0dUoDU9E/pub

    First, I am very excited about the potential for using the id within legal documents. This will make electronic citation stronger and easier to integrate with other document types. For example, opinion papers, journal articles and other documents that refer to multiple fields can be better structured and more useful. However, some issues with the current recommendation seem apparent to me.

    I would start out with what I believe are the requirements for a well-constructed protocol for assigning id attribute values. Most importantly, the rules must never allow collisions; that is, it must be impossible for two id attributes to have equal values. Currently, the recommended rules do not seem to rule out such collisions.

    I think that it is good practice to use either human-understandable or clearly computer-friendly systems. Some systems use both, for instance by including in a URL both the title and a serial number, often divided by a slash, as a reasonable compromise. But I think mixing the two is a mistake, and abbreviations that do not systematically correspond with the other portions are a problem. I will go through the problems and solutions for this below.

    There must be an automated method for generating the ids, at least for most jurisdictions. I have researched other systems that effectively use XSL to auto-generate ids. The current recommendation, I believe, makes this impossible for many AKN developers. A system's design should not be dictated by a specific tool's current capabilities; Y2K was an example of a bad design decision based on current capabilities. If this is an overwhelming current need, then I would use an alternative system for id values. I found a few that would meet the need while still meeting the other requirements I listed.

    I realize that id attribute values in use are often globally unique values within a greater set of documents, or are semantic in quality, and/or endeavor to match an XPath-like value. I see value in any of these methods of determining the id values. For legal documents, going with the XPath-like value has great utility and logic. Also, the recommendation includes a semantic-like system in that it mirrors the semantically named elements. My issues are with the specific methods of attaining these goals.

    Using abbreviations for the elements creates problems:
    - It diminishes the semantic nature of the ids.
    - It may diminish the ability to automatically generate id values that are unique.
    - It will not satisfy the 16-character length limit for indexing imposed by a specific tool being used.
    - There is currently no method for embedding or attaching a list of the abbreviations, which is necessary if abbreviations end up in the final recommendation.
    - The current recommendation, by not differentiating the element-name part of the id from potential user-determined attribute values for the hcontainer, will lead to collisions.

    My suggested fixes for dealing with the abbreviations:
    - Either stop abbreviating the element-equivalent id values, or
    - Rename the elements to match the abbreviations, or
    - Include a table, or a method of embedding the element-name-to-abbreviation mapping, that can be accessed by automated processes, or
    - Change completely to a system that is less semantic and is XPath-like, with child/level abbreviations. I have seen examples of this method where each nested level is about four characters or less, meeting the indexing requirement.

    My suggestions for dealing with potential collisions between element names and hcontainer name attribute values:
    - Never allow hcontainer name attribute values to be abbreviated for use in the id attribute, otherwise this may allow yet more collisions (pippo and pippopippo might get the same five-character abbreviation). Or include a rigorous abbreviation system for user-defined hcontainer name attribute values.
    - There must be a method, special character or something within an id attribute to differentiate an element name from an hcontainer name attribute value. This could be a tilde or another URI/IRI-friendly character (one that does not need to be escaped) not already specified for the id values.

    Building an XSL auto-generator or an end-user parser of the id values will be more difficult, or impossible, without dealing with the two issues I listed.

    Final word: I would find acceptable any id value system that could be autogenerated using XSL for most jurisdictions. That might allow for end-user tools as well. I do not believe that the current recommendation will allow for this. Perhaps each jurisdiction will need its own XSL generator, but that would be a poor outcome. I would prefer a system that avoided Y2K-style abbreviation, even with safeguards. Either implement a rigorous, simple XPath-like level/nesting system or fully implement the semantic equivalent names.
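    Several of the requirements in this message can be checked mechanically: no collisions, an abbreviation table that automated processes can read, hcontainer names kept whole and marked with a tilde, and the 16-character indexing limit. The Python sketch below illustrates those checks under assumptions: the abbreviation values, the function names and the treatment of 16 characters as a hard limit are hypothetical, and it is neither an agreed Akoma Ntoso rule nor an XSL implementation.

```python
from collections import Counter

# Hypothetical machine-readable table of element abbreviations (illustrative
# values only), the kind of list the message asks to make available to tools.
ABBREVIATIONS = {"article": "art", "chapter": "chp", "paragraph": "para", "content": "cnt"}

# Assumed indexing limit, taken from the 16-character figure mentioned above.
MAX_INDEXED_LENGTH = 16


def segment(tag: str, num: str = "", name: str = "") -> str:
    """Build one id segment: table lookup for core elements, '~' + name for hcontainers."""
    if tag == "hcontainer":
        # hcontainer names are never abbreviated; the tilde (an RFC 3986
        # unreserved character) keeps them apart from element abbreviations.
        base = "~" + name
    else:
        base = ABBREVIATIONS[tag]  # raises KeyError if the table has no entry
    return f"{base}_{num}" if num else base


def join_ids(*segments: str) -> str:
    """Join segments with the '__' separator used in the proposal."""
    return "__".join(segments)


def check_ids(ids: list) -> dict:
    """Report duplicated ids and ids longer than the assumed indexing limit."""
    counts = Counter(ids)
    return {
        "collisions": sorted(i for i, c in counts.items() if c > 1),
        "too_long": sorted({i for i in ids if len(i) > MAX_INDEXED_LENGTH}),
    }


if __name__ == "__main__":
    # A naive five-character truncation makes "pippo" and "pippopippo" collide.
    naive = [name[:5] + "_1" for name in ("pippo", "pippopippo")]
    print(check_ids(naive))
    # Full, tilde-marked names stay distinct and cannot clash with "art_1".
    marked = [segment("hcontainer", "1", "pippo"),
              segment("hcontainer", "1", "pippopippo"),
              segment("hcontainer", "1", "art"),
              segment("article", "1")]
    print(marked, check_ids(marked))
    nested = join_ids(segment("chapter", "1"), segment("article", "12"),
                      segment("paragraph", "3"), segment("content"))
    print(check_ids([nested]))
```

    With naive truncation both names become pippo_1 and the collision is reported; with tilde-marked full names, ~pippo_1, ~pippopippo_1 and ~art_1 stay distinct from each other and from the article segment art_1. The nested id chp_1__art_12__para_3__cnt is 26 characters long and is flagged, which illustrates the point that abbreviation alone does not guarantee the indexing limit is met.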


  • 7.  RE: [legaldocml] On ids and numbering

    Posted 12-10-2013 19:12
    Thanks to help from Monica, I better understand the hcontainer name= value issue. Although it does allow for consistency, it both breaks the semantic nature of the id values and reinforces the general AKN preference for core elements over extensibility. I might still allow jurisdictions whose own names are used in the hcontainer name attribute to adopt an alternative convention of prefixing the name attribute value with a tilde.

    However, abbreviating the element names is still very troublesome, and no one has explained why it is absolutely necessary. Even to the extent that abbreviating helps with shorter ids, it does not fully solve the issue of the sixteen-character limit that eXist apparently has.

    Daniel Bennett