OASIS Open Document Format for Office Applications (OpenDocument) TC


The desirability of xml:id stability

  • 1.  The desirability of xml:id stability

    Posted 02-04-2013 17:01
    In today's call, there was interesting discussion about producers preserving xml:id attributes on elements that are preserved from a document that is being consumed. This is in reference to the proposal of OFFICE-3788: < https://tools.oasis-open.org/issues/browse/OFFICE-3788 >.

    I believe that is a valuable feature for complex document cases, but that it is not a good idea for a .x release of the ODF specification.

    The ODF 1.0/1.1/1.2 line does not require any such preservation. There is also nothing to prevent an implementation from doing it. So there is room for implementations to determine whether it is important for their use cases. There might be guidance about that, but I don't believe there should be any requirement about it. Absent implementation differentiation becoming a factor in interoperability, it is perhaps not a good idea to suddenly impose this requirement on implementations.

    It is not clear that the benefit is such that all implementations would be required to preserve xml:id attribute ID values so long as the element having the xml:id occurrence persists. As desirable as this might be from a puristic position, it does damage to many implementations that have never found a use case sufficient to implement this already-allowed capability.

    For calibration and added perspective, here are three use cases for the preservation of xml:ids. All have problems. These are all for preserving xml:ids for referential integrity of references from outside the document that refer to internal elements of a document (derivative). Accommodating any of them in ODF 1.3 might be a bridge too far.

    CASE 1: [X]HTML Production.

    When a document is saved as HTML, the xml:ids are presumably turned into identified anchors. This is necessary simply to allow for internal cross-references by IDREF attribute values that target an xml:id ID value.
    Changing those ID and IDREF values on editing of a replacement for an existing HTML document will break any deep links into the updated HTML export from anywhere else in the World Wide Web. That may not be acceptable for some usage of ODF implementations as tools for maintaining and producing an HTML rendition. (The same problem arises for user-created bookmarks and the identifiers that are generated for them and cross-references to them.)

    CASE 2: RDF in the same package and elsewhere. (Not just the RDFa in content.xml itself)

    ODF 1.2 permits RDF parts to be included in a document that refer into elements of the document structure. These RDF parts need a way to identify the elements being referenced, and fragment IDs in URIs of the RDF terms are the common means.

    Likewise, when the RDF is extracted from the document (e.g., via a GRDDL procedure) or is otherwise external from a document, that RDF can make use of the ODF Package and OWL Document OWL classes to continue to refer to specific elements internal to the ODF package. To the extent that a revision of the document is logically the same work with respect to the nature of the RDF about it, not preserving fragment IDs becomes a problem.

    (It is also a challenge to deal with the fact that ODF currently lacks a means for creating a location-independent entity identification of a document. Something is needed for where different occurrences of instances are to be taken as logically the same document. This requires something that can work as a persistent URI or URN for a document that is relatively instance-independent and where the document is not necessarily found only at a unique URL location on the Web.)

    Finally, it is not to be expected that all implementations will be in a position to adjust RDF within packages to align with changed xml:id ID values in order to preserve the referential integrity from such metadata.
    Some implementations will simply not deal with such RDF and they may but need not preserve that RDF within the package. (There are pros and cons about this. Having mystery material can be a problem for document safety/security and also for documents that are digitally signed when there is implementation-unknown material.)

    ODF 1.2 doesn't constrain this and it is difficult to see what ODF 1.3 can do beyond adding some guidance. It is perhaps better for guidance to be worked out and demonstrated at OIC first. That's certainly the case for RDF that is not in the package at all.

    CASE 3: ODF 1.2 CHANGE TRACKING

    Depending on how references to portions of documents involving tracked changes happen, there can be a problem with the preservation of xml:id attributes.

    In ODF 1.0/1.1/1.2 the connection of change information with the places in the document where the change applies is accomplished by the xml:id ID value on a <text:changed-region> element. It is also the case that element start tags with xml:id attributes can be swept up into <text:deletion> elements that carry removed material. Those xml:ids would need to be preserved, since the deletion can be rejected in a later edit. (This situation has remarkable consequences for RDF now referencing an element that is (partially) deleted.)

    I don't know whether this is comprehended as an edge case for the MCT-based change-tracking for ODF 1.3.

    AND EDGE CASES

    There are many edge cases to all of this. There is the interaction with change-tracking (and whether that can synchronize with arbitrary RDF in the package), accessibility (also impacted by change tracking), and probably other provisions, including concerns about covert content and digital signatures.

    It is also important to note that the xml:id attribute ID values in ODF 1.2 documents are generally not thought to be user-specifiable. Where there are user-specified names, these are in other attributes that are usually not used as attribute values of type ID and IDREF.
    (Note that this xml:id case should actually be about all ODF 1.x attributes having values of type ID, since uniqueness must be preserved across all of them. The xml:id ones are the only ones automatically accessible via fragment values in URI references.)

     - Dennis

    PS: Another cat picture: < http://www.flickr.com/photos/orcmid/1502722674/in/set-72157600230263578 >.
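
The referential-integrity concern in CASE 1 and CASE 2 can be illustrated with a minimal Python sketch. It assumes that external references take the form `content.xml#<xml:id value>` and that both revisions are available as XML strings. The sample markup, the ID values, and the helper names `collect_xml_ids` and `dangling_fragments` are hypothetical and are not taken from any ODF implementation.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def collect_xml_ids(xml_text):
    """Return the set of xml:id values found anywhere in the document."""
    root = ET.fromstring(xml_text)
    return {el.attrib[XML_ID] for el in root.iter() if XML_ID in el.attrib}


def dangling_fragments(references, revised_xml):
    """Return the references whose fragment no longer matches any xml:id."""
    surviving = collect_xml_ids(revised_xml)
    dangling = []
    for ref in references:
        _, sep, fragment = ref.partition("#")
        if sep and fragment not in surviving:
            dangling.append(ref)
    return dangling


# Hypothetical revision in which a producer kept id1 but regenerated id2 as id7.
REVISED = (
    f'<text:section xmlns:text="{TEXT_NS}">'
    '<text:p xml:id="id1">Intro</text:p>'
    '<text:p xml:id="id7">Body</text:p>'
    "</text:section>"
)

# Hypothetical deep links or RDF subjects recorded against the earlier revision.
REFERENCES = ["content.xml#id1", "content.xml#id2"]

print(dangling_fragments(REFERENCES, REVISED))  # ['content.xml#id2']
```

The paragraph itself survives in the revision; only its identifier changed, which is enough to break the outside reference.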


  • 2.  Re: [office] The desirability of xml:id stability

    Posted 02-04-2013 19:56
    On 04/02/13 18:00, Dennis E. Hamilton wrote:
    > CASE 2: RDF in the same package and elsewhere. (Not just the RDFa in content.xml itself)
    >
    > ODF 1.2 permits RDF parts to be included in a document that refer into elements of the document structure. These RDF parts need a way to identify the elements being referenced, and fragment IDs in URIs of the RDF terms are the common means.
    >
    > Likewise, when the RDF is extracted from the document (e.g., via a GRDDL procedure) or is otherwise external from a document, that RDF can make use of the ODF Package and OWL Document OWL classes to continue to refer to specific elements internal to the ODF package. To the extent that a revision of the document is logically the same work with respect to the nature of the RDF about it, not preserving fragment IDs becomes a problem.

    yes. (though OWL classes are not actually necessary for that, just referencing the xml:id of an element is enough.)

    > (It is also a challenge to deal with the fact that ODF currently lacks a means for creating a location-independent entity identification of a document. Something is needed for where different occurrences of instances are to be taken as logically the same document. This requires something that can work as a persistent URI or URN for a document that is relatively instance-independent and where the document is not necessarily found only at a unique URL location on the Web.)

    is this "something" in any way specific to ODF? i'd assume this is a more general problem. although it's only a problem for documents on the local file system; if the document is _actually_ on the web, it's a non-issue because if your web URIs aren't stable then you're doing it wrong anyway.

    > Finally, it is not to be expected that all implementations will be in a position to adjust RDF within packages to align with changed xml:id ID values in order to preserve the referential integrity from such metadata.
    > Some implementations will simply not deal with such RDF and they may but need not preserve that RDF within the package. (There are pros and cons about this. Having mystery material can be a problem for document safety/security and also for documents that are digitally signed when there is implementation-unknown material.)

    from my experience the main problem is that implementing the preservation of xml:id is actually surprisingly difficult, due to the uniqueness constraint that has to be maintained at all times; at least in OpenOffice.org Writer, whose internals differ significantly from what a naive reader of the ODF specification would expect, operations such as Splitting/Merging, Undo, Change Tracking and Copy/Paste are just the most obvious ways in which things can go wrong...

    > CASE 3: ODF 1.2 CHANGE TRACKING
    >
    > Depending on how references to portions of documents involving tracked changes happen, there can be a problem with the preservation of xml:id attributes.
    >
    > In ODF 1.0/1.1/1.2 the connection of change information with the places in the document where the change applies is accomplished by the xml:id ID value on a <text:changed-region> element. It is also the case that element start tags with xml:id attributes can be swept up into <text:deletion> elements that carry removed material. Those xml:ids would need to be preserved, since the deletion can be rejected in a later edit. (This situation has remarkable consequences for RDF now referencing an element that is (partially) deleted.)

    ah yes... i once spent a day thinking about how to represent xml:ids of merged paragraphs in the change tracking info such that both accepting and rejecting the tracked change yields good results and no additional ODF attributes are necessary due to the uniqueness constraint on xml:id.
    i don't remember what my preferred solution was, but Oliver-Rainer didn't like it at the time so maybe it wasn't a good idea :)

    > (Note that this xml:id case should actually be about all ODF 1.x attributes having values of type ID, since uniqueness must be preserved across all of them. The xml:id ones are the only ones automatically accessible via fragment values in URI references.)

    the ODF 1.2 schema only has xml:id as an ID type attribute; all ID attributes in older ODF versions were deprecated in favour of xml:id with ODF 1.2, and their types were changed to "string" so they could have the same value as xml:id (for the benefit of ODF 1.1 consumers).

    regards,
     michael

    --
    Michael Stahl Software Engineer
    Platform Engineering - Desktop Team
    Red Hat

    Better technology. Faster innovation. Powered by community collaboration.
    See how it works at redhat.com


  • 3.  Re: [office] The desirability of xml:id stability

    Posted 02-25-2013 15:01
    Hi,

    On 04/02/13 20:55, Michael Stahl wrote:
    > On 04/02/13 18:00, Dennis E. Hamilton wrote:
    >> CASE 3: ODF 1.2 CHANGE TRACKING
    >>
    >> Depending on how references to portions of documents involving tracked changes happen, there can be a problem with the preservation of xml:id attributes.
    >>
    >> In ODF 1.0/1.1/1.2 the connection of change information with the places in the document where the change applies is accomplished by the xml:id ID value on a <text:changed-region> element. It is also the case that element start tags with xml:id attributes can be swept up into <text:deletion> elements that carry removed material. Those xml:ids would need to be preserved, since the deletion can be rejected in a later edit. (This situation has remarkable consequences for RDF now referencing an element that is (partially) deleted.)
    >
    > ah yes... i once spent a day thinking about how to represent xml:ids of merged paragraphs in the change tracking info such that both accepting and rejecting the tracked change yields good results and no additional ODF attributes are necessary due to the uniqueness constraint on xml:id.
    > i don't remember what my preferred solution was, but Oliver-Rainer didn't like it at the time so maybe it wasn't a good idea :)

    Unfortunately, I did not even remember the discussion on it.

    Just for my interest: Do you remember the discussed solutions?

    > ---------------------------------------------------------------------
    > To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at:
    > https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

    Best regards
    Oliver-Rainer Wittmann
    --
    Advisory Software Engineer
    IBM Deutschland
    Beim Strohhause 17
    20097 Hamburg
    Phone: +49-40-6389-1415
    E-Mail: orwitt@de.ibm.com

    IBM Deutschland Research & Development GmbH / Chair of the Supervisory Board: Martina Koederitz
    Management: Dirk Wittkopp
    Registered office: Böblingen / Commercial register: Amtsgericht Stuttgart, HRB 243294


  • 4.  Re: [office] The desirability of xml:id stability

    Posted 03-01-2013 21:02
    On 25/02/13 16:01, Oliver-Rainer Wittmann wrote:
    > Hi,
    >
    > On 04/02/13 20:55, Michael Stahl wrote:
    >> On 04/02/13 18:00, Dennis E. Hamilton wrote:
    >>> CASE 3: ODF 1.2 CHANGE TRACKING
    >>>
    >>> Depending on how references to portions of documents involving tracked changes happen, there can be a problem with the preservation of xml:id attributes.
    >>>
    >>> In ODF 1.0/1.1/1.2 the connection of change information with the places in the document where the change applies is accomplished by the xml:id ID value on a <text:changed-region> element. It is also the case that element start tags with xml:id attributes can be swept up into <text:deletion> elements that carry removed material. Those xml:ids would need to be preserved, since the deletion can be rejected in a later edit. (This situation has remarkable consequences for RDF now referencing an element that is (partially) deleted.)
    >>
    >> ah yes... i once spent a day thinking about how to represent xml:ids of merged paragraphs in the change tracking info such that both accepting and rejecting the tracked change yields good results and no additional ODF attributes are necessary due to the uniqueness constraint on xml:id.
    >> i don't remember what my preferred solution was, but Oliver-Rainer didn't like it at the time so maybe it wasn't a good idea :)
    >
    > Unfortunately, I did not even remember the discussion on it.
    >
    > Just for my interest: Do you remember the discussed solutions?

    not exactly... but i've found an old mail containing some Wiki dump which i'll forward you privately...

    apparently the "solution" was to lazily join the xml:id of the paragraphs not when a tracked change is created but when it is accepted, which you and/or AMA didn't like because it violates the usual principle that the content of the document (ignoring tracked changes) represents the state in which all tracked changes are accepted.

    regards,
     michael

    --
    Michael Stahl Software Engineer
    Platform Engineering - Desktop Team
    Red Hat


  • 5.  Re: [office] The desirability of xml:id stability

    Posted 03-04-2013 09:02
    Hi,

    Michael: Is it possible to share the content of your mediawiki dump with the ODF TC?

    I have to admit that I still do not remember the mentioned discussion with you and AMA (Andreas Martens). It seems that my brain is losing certain stuff from the past as I am getting older and older.

    I found a wiki page in the OpenOffice wiki which is discussing the handling of meta data references on certain editing actions [1].

    Michael: Do you remember whether the implementation of meta data in OpenOffice Writer follows the discussion in [1]?

    [1] http://wiki.openoffice.org/wiki/Writer/Metadata_Support

    Best regards
    Oliver-Rainer Wittmann
    --
    Advisory Software Engineer
    IBM Deutschland


  • 6.  Re: [office] The desirability of xml:id stability

    Posted 03-16-2013 18:34
    On 04/03/13 10:01, Oliver-Rainer Wittmann wrote:
    > Hi,

    sorry for late reply, was on vacation and then the wiki didn't want to talk to me...

    > Michael: Is it possible to share the content of your mediawiki dump with the ODF TC?

    not sure if anybody here is interested in this much incoherent rambling :) but feel free to put it up on the AOO wiki and send a link here if you like.

    > I have to admit that I still do not remember the mentioned discussion with you and AMA (Andreas Martens). It seems that my brain is losing certain stuff from the past as I am getting older and older.

    hmmm... same here.

    > I found a wiki page in the OpenOffice wiki which is discussing the handling of meta data references on certain editing actions [1].
    >
    > Michael: Do you remember whether the implementation of meta data in OpenOffice Writer follows the discussion in [1]?
    >
    > [1] http://wiki.openoffice.org/wiki/Writer/Metadata_Support

    - the list of elements in first paragraph is a goal statement, not an implementation status

    - "Deletion while change tracking is activated:" - i don't remember what actually happens here

    - "If another entity has the same metadata reference, the to be pasted entity will lose its metadata reference and thus has no metadata any more."
      the implementation is actually slightly more clever than this: you can copy something, paste it, the inserted copy appears to have no xml:id; now delete the source and the inserted copy takes over the xml:id the source had. (the mechanism for that was required for Writer's insane Undo implementation anyway...)

    - "Embedded Objects" - i don't remember what actually happens with these

    other than that it looks accurate as far as i can remember.

    regards,
     michael

    --
    Michael Stahl Software Engineer
    Platform Engineering - Desktop Team
    Red Hat
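
The copy/paste behaviour described in this message can be modelled roughly as follows. This is only an illustrative Python sketch of the idea (one exported holder per xml:id, with pasted copies waiting to take over), not Writer's actual code. The class `XmlIdRegistry`, its methods, and the node names are all hypothetical.

```python
class XmlIdRegistry:
    """Keeps every xml:id unique while letting a pasted copy inherit it later."""

    def __init__(self):
        self._holder = {}   # xml:id -> node that currently exports the id
        self._waiting = {}  # xml:id -> copies queued to take the id over

    def register(self, node, xml_id):
        """Claim xml_id for node; return True if the node exports it right away."""
        if xml_id in self._holder:
            # Uniqueness constraint: the copy must not export the same id yet.
            self._waiting.setdefault(xml_id, []).append(node)
            return False
        self._holder[xml_id] = node
        return True

    def delete(self, node):
        """Remove node; if it held an id, hand the id to the oldest waiting copy."""
        for waiters in self._waiting.values():
            if node in waiters:
                waiters.remove(node)
        for xml_id, holder in list(self._holder.items()):
            if holder != node:
                continue
            waiters = self._waiting.get(xml_id, [])
            if waiters:
                self._holder[xml_id] = waiters.pop(0)
            else:
                del self._holder[xml_id]

    def exported_id(self, node):
        """Return the xml:id the node would be saved with, or None."""
        for xml_id, holder in self._holder.items():
            if holder == node:
                return xml_id
        return None


registry = XmlIdRegistry()
registry.register("source-paragraph", "id1")
registry.register("pasted-copy", "id1")
print(registry.exported_id("pasted-copy"))  # None
registry.delete("source-paragraph")
print(registry.exported_id("pasted-copy"))  # id1
```

A real editor would also have to cover undo, split/merge and tracked changes, which is where the thread says most of the difficulty lies.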


  • 7.  Re: [office] The desirability of xml:id stability

    Posted 02-04-2013 21:31
    Dennis, On 02/04/2013 12:00 PM, Dennis E. Hamilton wrote: In today's call, there was interesting discussion about producers preserving xml:id attributes on elements that are preserved from a document that is being consumed. This is in reference to the proposal of OFFICE-3788: < https://tools.oasis-open.org/issues/browse/OFFICE-3788 >. I believe that is a valuable feature for complex document cases, but that it is not a good idea for a .x release of the ODF specification. The ODF 1.0/1.1/1.2 line does not require any such preservation. There is also nothing to prevent an implementation from doing it. So there is room for implementations to determine whether it is important for their use cases. There might be guidance about that, but I don't believe there should be any requirement about it. Absent implementation differentiation becoming a factor in interoperability, it is perhaps not a good idea to suddenly impose this requirement on implementations. I don't think there is any question of "sudden imposition" of this as a requirement on implementations. If it bothers you that it might appear in ODF 1.3, we can always change the number to 2.0. When new versions of standards appear, software always takes some time to catch up. That has always been the case with new and exciting features. Not to mention that the discussion overlooked the fact that ODF applications currently preserve attribute values for elements, lots of them. I don't have numbers on that so do need to find an archive of realistic ODF documents to see how often applications currently save/re-write ids. It is not clear that the benefit is such that all implementations would be required to preserve xml:id attribute ID values so long as the element having the xml:id occurrence persists. As desirable as this might be from a puristic position, it does damage to many implementations that have never found an use case sufficient to implement this already-allowed capability. How so? 
If an implementation choose to not support persistent IDs, it can always be an ODF 1.2 implementation that implements some extra features defined in ODF 1.3 (or some other numbering). If "being allowed" were the test for interoperability, there is very reason to specify values in ODF. Applications could mimic each other if they really wanted to. Nothing stops them from doing just that. For calibration and added perspective, here are three use cases for the preservation of xml:ids. All have problems. These are all for preserving xml:ids for referential integrity of references from outside the document that refer to internal elements of a document (derivative). Accomodating any of them in ODF 1.3 might be a bridge too far. And the basis for "...might be a bridge too far" is? I understand that: 1) Present implementation don't preserve xml:ids 2) To require preservation of xml:ids would make existing applications non-conformant to some future version of ODF 1.3. (I assume if we change anything in ODF 1.3, they are not going to be fully conformant with ODF 1.3.) 3) What I am missing is some evidence, other than your saying it, that preservation of xml:ids is any harder than preserving any other attribute value. I understand implementations don't do the preservation now, but at one time implementations didn't use XML either. Non-use doesn't mean that a proposal is too difficult or unworkable. Being able to reliably point into documents would be the next step towards not simply getting document level pointers from a search engine. I would rather get a <text:p> level pointer than a point to 800+ pages of text. You? Hope you are having a great day! Patrick PS: Having said all that, I welcome pointers to archives of ODF documents so I can investigate current usage on ids. CASE 1: [X]HTML Production. When a document is saved as HTML, the xml:ids are presumably turned into identified anchors. 
This is necessary simply to allow for internal cross-references by IDREF attribute values that target an xml:id ID value. Changing those ID and IDREF values on editing of a replacement for an existing HTML document will break any deep links into the updated HTML export from anywhere else in the World Wide Web. That may not be acceptable for some usage of ODF implementations as tools for maintaining and producing an HTML rendition. (The same problem arises for user-created bookmarks and the identifiers that are generated for them and cross-references to them.) CASE 2: RDF in the same package and elsewhere. (Not just the RDFa in content.xml itself) ODF 1.2 permits RDF parts to be included in a document that refer into elements of the document structure. These RDF parts need a way to identify the elements being referenced, and fragment IDs in URIs of the RDF terms are the common means. Likewise, when the RDF is extracted from the document (e.g., via a GRDDL procedure) or is otherwise external from a document, that RDF can make use of the ODF Package and OWL Document OWL classes to continue to refer to specific elements internal to the ODF package. To the extent that a revision of the document is logically the same work with respect to the nature of the RDF about it, not preserving fragment IDs becomes a problem. (It is also a challenge to deal with the fact that ODF currently lacks a means for creating a location-independent entity identification of a document. Something is needed for where different occurrences of instances are to be taken as logically the same document. This requires something that can work as a persistent URI or URN for a document that is relatively instance-independent and where the document is not necessarily found only at a unique URL location on the Web.) 
Finally, it is not to be expected that all implementations will be in a position to adjust RDF within packages to align with changed xml:id ID values in order to perserve the referential integrity from such metadata. Some implementations will simply not deal with such RDF and they may but need not preserve that RDF within the package. (There are pros and cons about this. Having mystery material can be a problem for document safety/security and also for documents that are digitally signed when there is implementation-unknown material.) ODF 1.2 doesn't constrain this and it is difficult to see what ODF 1.3 can do beyond adding some guidance. It is perhaps better for guidance to be worked out and demonstrated at OIC first. That's certainly the case for RDF that is not in the package at all. CASE 3: ODF 1.2 CHANGE TRACKING Depending on how references to portions of documents involving tracked changes happens, there can be a problem with the preservation of xml:id attributes. In ODF 1.0/1.1/1.2 the connection of change information with the places in the document where the change applies is accomplished by the xml:id ID value on a <text:changed-region> element. It is also the case that element start tags with xml:id attributes can be swept up into <text:deletion> elements that carry removed material. Those xml:ids would need to be preserved, since the deletion can be rejected in a later edit. (This situation has remarkable consequences for RDF now referencing an element that is (partially) deleted.) I don't know whether this is comprehended as an edge case for the MCT-based change-tracking for ODF 1.3. AND EDGE CASES There are many edge cases to all of this. There is the interaction with change-tracking (and whether that can synchronize with arbitrary RDF in the package), accessibility (also impacted by change tracking), and probably other provisions, including concerns about covert content and digital signatures. 
It is also important to note that the xml:id attribute ID values in ODF 1.2 documents are generally not thought to be user-specifiable. Where there are user-specified names, these are in other attributes that are usually not used as attribute values of type ID and IDREF. (Note that this xml:id case should actually be about all ODF 1.x attributes having values of type ID, since uniqueness must be preserved across all of them. The xml:id ones are the only ones automatically accessible via fragment values in URI references.) - Dennis PS: Another cat picture: < http://www.flickr.com/photos/orcmid/1502722674/in/set-72157600230263578 >. -- Patrick Durusau patrick@durusau.net Technical Advisory Board, OASIS (TAB) Former Chair, V1 - US TAG to JTC 1/SC 34 Convener, JTC 1/SC 34/WG 3 (Topic Maps) Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300 Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) Another Word For It (blog): http://tm.durusau.net Homepage: http://www.durusau.net Twitter: patrickDurusau
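The uniqueness point above (ID values must stay unique across the document) can be checked mechanically. The following is only a minimal sketch, not anything taken from the ODF specification: the function name `duplicate_xml_ids` and the `SAMPLE` document are hypothetical, and the check covers xml:id only. Other attributes of type ID would have to be added to make it complete.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# xml:id lives in the reserved XML namespace, so ElementTree exposes it
# under this expanded attribute name.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def duplicate_xml_ids(xml_text):
    """Return the sorted xml:id values that occur more than once."""
    root = ET.fromstring(xml_text)
    counts = Counter(
        el.get(XML_ID) for el in root.iter() if XML_ID in el.attrib
    )
    return sorted(value for value, n in counts.items() if n > 1)


# Hypothetical fragment, not a complete ODF content.xml.
SAMPLE = """<doc xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p xml:id="a1">one</text:p>
  <text:p xml:id="a2">two</text:p>
  <text:p xml:id="a1">three</text:p>
</doc>"""

print(duplicate_xml_ids(SAMPLE))  # ['a1']
```

ElementTree does not itself enforce ID uniqueness, which is why the duplicate can be parsed and reported here rather than rejected.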


  • 8.  Re: [office] The desirability of xml:id stability

    Posted 02-04-2013 21:36
    On Mon, 2013-02-04 at 14:30 -0700, Patrick Durusau wrote:
    > 3) What I am missing is some evidence, other than your saying it, that
    > preservation of xml:ids is any harder than preserving any other
    > attribute value.
    >
    > I understand implementations don't do the preservation now, but at one
    > time implementations didn't use XML either. Non-use doesn't mean that a
    > proposal is too difficult or unworkable.
    >
    > Being able to reliably point into documents would be the next step
    > towards not simply getting document level pointers from a search engine.

    Patrick, what would be an example of an arbitrary string attribute value that is currently being preserved by at least some ODF consumer/producers?

    Andreas

    --
    Andreas J. Guelzow, PhD FTICA
    Professor of Mathematical & Computing Sciences
    Concordia University College of Alberta


  • 9.  Re: [office] The desirability of xml:id stability

    Posted 02-04-2013 22:34
    Andreas,

    On 02/04/2013 04:36 PM, Andreas J Guelzow wrote:
    > On Mon, 2013-02-04 at 14:30 -0700, Patrick Durusau wrote:
    >> 3) What I am missing is some evidence, other than your saying it, that preservation of xml:ids is any harder than preserving any other attribute value.
    >>
    >> I understand implementations don't do the preservation now, but at one time implementations didn't use XML either. Non-use doesn't mean that a proposal is too difficult or unworkable.
    >>
    >> Being able to reliably point into documents would be the next step towards not simply getting document level pointers from a search engine.
    >
    > Patrick, what would be an example of an arbitrary string attribute value that is currently being preserved by at least some ODF consumer/producers?

    Is the "arbitrary" part a requirement? That is, there are plenty of specified attribute values that are persisted. See any stylesheet attribute whose value is chosen by a user. I don't know how that preservation is different from preserving an "arbitrary" string.

    But to amuse you, consider the <text:user-index-mark> which has the attribute: text:string-value. These are specified in 8.17 and 19.871.6, respectively. 19.871.6 reads: "The text:string-value attribute specifies text to be displayed in an index."

    So, I created a file, inserted a user-defined string in the index, and created an index that shows the user-defined value. Please open, edit (somewhere other than my index entry), save and re-open. Does my arbitrary text still appear? I would say it was preserved by the application. Yes?

    Try it with OpenOffice first, but I suspect the same will be true for some other implementations.

    Hope you are having a great day!
Patrick Andreas -- Patrick Durusau patrick@durusau.net Technical Advisory Board, OASIS (TAB) Former Chair, V1 - US TAG to JTC 1/SC 34 Convener, JTC 1/SC 34/WG 3 (Topic Maps) Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300 Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps) Another Word For It (blog): http://tm.durusau.net Homepage: http://www.durusau.net Twitter: patrickDurusau Attachment: ODF-arbitrary-string-test-01.odt Description: application/vnd.oasis.opendocument.text


  • 10.  RE: [office] The desirability of xml:id stability

    Posted 02-05-2013 03:38
    @Patrick Andreas has already reported that Gnumeric preserves structure but it does not retain the original xml:ids in its internal structure. You've already heard from Oliver (orw) on the call that *none* of the OpenOffice-legacy implementations preserve xml:ids in their internal structure and the ids are synthesized on making a document persistent. Michael Stahl has reported on this thread that "implementing the preservation of xml:id is actually surprisingly difficult." It seems inappropriate to second-guess implementers about this and I am going to refrain from that. It seems to me that there is a lot more to clean up before xml:id stability becomes important. It is my considered opinion that any such stability should be left to implementations and only if there is some sort of outcry for the need for interoperable implementation in a standard-specified way should anything be done about it at the ODF TC. And yes, an ODF 2.0 would likely be the place to address significant breaking differences with respect to ODF 1.x. - Dennis MORE ANALYSIS I have discovered that it is relatively difficult to create an ODF 1.2 document that has xml:id attributes on any of its elements. Many of the permitted occurrences are optional and it is not clear what would have them be there. The elements that *require* an xml:id break down as follows: * The <text:changed-region> case is known. It is also known that those xml:id attribute values routinely change from consumption to production (although the IDREF valued attributes on associated elements still refer to the correct ones on <text:changed-region> elements.). * The <text:meta-field> case is one where the persistence of the xml:id ID value is definitely important, since the field needs to be found from an external source by a fragment identifier matching the ID. Since the arrangement is merely implementation-dependent, it is not possible to say more. 
I don't know what implementations, if any, support this element and derive the field value in any way. * The remainder are all <form:*> elements. I managed to demonstrate those by creating a <form:text> and a <form:fixed-text> that was a label for the <form:text> field. There were xml:id attributes whose ID value was lexically identical to the value of the form:id attribute on the same element. There were cross-references using IDREF values too. (The two ID values are "control1" and "control2".) The ID values are not visible to the user, but the form:name values are. I created this in LibreOffice 3.3.2 and opened and resaved it in Apache OpenOffice 3.4.1. The ID values and the form names were unchanged. I tried the document in Microsoft Office 2013 Word and the form did not survive the trip, so there is nothing to learn there about the unlikely preservation of xml:id values at this point.
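A round-trip experiment like the one described above can be partly automated by comparing the xml:id values found in `content.xml` before and after a resave. This is a rough sketch under stated assumptions: `xml_ids`, `compare_round_trip` and `fake_package` are hypothetical helper names, and the two in-memory packages stand in for real files saved by two applications. Note that matching values show only that the same strings were written out, not whether the application kept them internally or regenerated them in a consistent way.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xml_ids(package, member="content.xml"):
    """Collect the xml:id values in one XML part of a zip-based package.

    `package` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(package) as zf:
        root = ET.fromstring(zf.read(member))
    return {el.get(XML_ID) for el in root.iter() if XML_ID in el.attrib}


def compare_round_trip(before, after):
    """Report which xml:id values were kept, lost, or newly introduced."""
    a, b = xml_ids(before), xml_ids(after)
    return {"kept": sorted(a & b), "lost": sorted(a - b), "new": sorted(b - a)}


def fake_package(content_xml):
    """Build an in-memory zip holding only content.xml (illustration only;
    a real ODF package also carries a mimetype entry, a manifest, etc.)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("content.xml", content_xml)
    buf.seek(0)
    return buf


FORM_NS = 'xmlns:form="urn:oasis:names:tc:opendocument:xmlns:form:1.0"'
BEFORE = fake_package(
    f'<f {FORM_NS}><form:text xml:id="control1"/>'
    '<form:fixed-text xml:id="control2"/></f>'
)
AFTER = fake_package(
    f'<f {FORM_NS}><form:text xml:id="control1"/>'
    '<form:fixed-text xml:id="id7"/></f>'
)

RESULT = compare_round_trip(BEFORE, AFTER)
print(RESULT)  # {'kept': ['control1'], 'lost': ['control2'], 'new': ['id7']}
```

With real documents, `compare_round_trip("original.odt", "resaved.odt")` would take file paths instead of the in-memory buffers; both file names are placeholders.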


  • 11.  Re: [office] The desirability of xml:id stability

    Posted 02-05-2013 11:17
    Dennis, On 02/04/2013 10:38 PM, Dennis E. Hamilton wrote: @Patrick Andreas has already reported that Gnumeric preserves structure but it does not retain the original xml:ids in its internal structure. You've already heard from Oliver (orw) on the call that *none* of the OpenOffice-legacy implementations preserve xml:ids in their internal structure and the ids are synthesized on making a document persistent. Michael Stahl has reported on this thread that "implementing the preservation of xml:id is actually surprisingly difficult." As statements of what implementations currently *do*, all those statements are correct. That has no bearing on the question of the advantages of xml:id preservation or its difficulty. Michael Stahl's comments, which you need to re-read, ventured a general opinion on the preservation of xml:ids for merging paragraphs. You really should quote in context or not at all. None of which addresses my example of the re-use of data from spreadsheets precisely because addresses to the data are preserved. It seems inappropriate to second-guess implementers about this and I am going to refrain from that. What's inappropriate is to echo any implementer's current code base as the default for any standardization effort. You have not entertained any discussion of the benefits of xml:id preservation but are eager to end the discussion before it can get to that point. It seems to me that there is a lot more to clean up before xml:id stability becomes important. It is my considered opinion that any such stability should be left to implementations and only if there is some sort of outcry for the need for interoperable implementation in a standard-specified way should anything be done about it at the ODF TC. And yes, an ODF 2.0 would likely be the place to address significant breaking differences with respect to ODF 1.x. 
Leaving stability up to implementations means that some implementations, not naming names, can claim conformance without offering advantages defined by a future version of ODF. I don't see any advantage to ODF in taking such a position. There are advantages to implementers but those are not my concerns. - Dennis MORE ANALYSIS I have discovered that it is relatively difficult to create an ODF 1.2 document that has xml:id attributes on any of its elements. Many of the permitted occurrences are optional and it is not clear what would have them be there. For a group that prides itself on abstraction, implementers are surprisingly concrete. I posted a file yesterday where an arbitrary text string is preserved by an OpenOffice application. Call such a string an xml:id; other than its uniqueness feature, it is and remains just a string. There isn't anything magical about it or different from any other unique string. True, there is the merging case but that is a solvable one if you think about it. The elements that *require* an xml:id break down as follows: * The <text:changed-region> case is known. It is also known that those xml:id attribute values routinely change from consumption to production (although the IDREF valued attributes on associated elements still refer to the correct ones on <text:changed-region> elements.). Rather fanciful "analysis" if you want to call it that. How do you get "require[d]" change from consumption to production? ODF 5.5.2 says nothing of the sort. Do you disagree? * The <text:meta-field> case is one where the persistence of the xml:id ID value is definitely important, since the field needs to be found from an external source by a fragment identifier matching the ID. Since the arrangement is merely implementation-dependent, it is not possible to say more. I don't know what implementations, if any, support this element and derive the field value in any way. And your point would be? * The remainder are all <form:*> elements. 
I managed to demonstrate those by creating a <form:text> and a <form:fixed-text> that was a label for the <form:text> field. There were xml:id attributes whose ID value was lexically identical to the value of the form:id attribute on the same element. There were cross-references using IDREF values too. (The two ID values are "control1" and "control2".) The ID values are not visible to the user, but the form:name values are. I created this in LibreOffice 3.3.2 and opened and resaved it in Apache OpenOffice 3.4.1. The ID values and the form names were unchanged. I tried the document in Microsoft Office 2013 Word and the form did not survive the trip, so there is nothing to learn there about the unlikely preservation of xml:id values at this point. Again, you are keying off the term "xml:id" as though it is a talisman of some sort. Last night I posted to Andreas an example where user-supplied text strings are persisted. Yes, current implementations appear to not have a slot for writing down xml:ids or xml:idrefs where they are found. Ho-hum. Yet, current implementations write down a large number of other values and references between those values, such as style sheets. We could call them "h" for Hamilton ids and define the same characteristics as xml:ids (except for the requirement of start name characters) and have the same capabilities. I am going to start posting on the advantages of stable xml:ids and related issues. What current implementations do is a starting point, not a boundary for standards development. Hope you are at the start of a great day! Patrick


  • 12.  Re: [office] The desirability of xml:id stability

    Posted 02-05-2013 13:27
    On 05/02/13 04:38, Dennis E. Hamilton wrote: > Andreas has already reported that Gnumeric preserves structure but it > does not retain the original xml:ids in its internal structure. > You've already heard from Oliver (orw) on the call that *none* of the > OpenOffice-legacy implementations preserve xml:ids in their internal > structure and the ids are synthesized on making a document > persistent. Michael Stahl has reported on this thread that > "implementing the preservation of xml:id is actually surprisingly > difficult." starting with version 3.2, OpenOffice.org Writer has this _partially_ implemented, i.e. it will preserve xml:id on some but not all elements. for a complete list see: http://wiki.openoffice.org/wiki/Documentation/DevGuide/OfficeDev/RDF_metadata#Status > It seems to me that there is a lot more to clean up before xml:id > stability becomes important. It is my considered opinion that any > such stability should be left to implementations and only if there is > some sort of outcry for the need for interoperable implementation in > a standard-specified way should anything be done about it at the ODF > TC. And yes, an ODF 2.0 would likely be the place to address > significant breaking differences with respect to ODF 1.x. as an aside, from an RDF metadata perspective a more serious open question is that when some content element is copied, the copy/paste of the RDF metadata referencing that element via its xml:id is left completely unspecified by ODF, which is unlikely to yield similar results in different implementations. (the problem is that the RDF graphs have no "natural hierarchy" (they are not trees), and of course by the very nature of the feature ODF implementations cannot assume particular semantics of the RDF properties that happen to be used in some file.) > MORE ANALYSIS > > I have discovered that it is relatively difficult to create an ODF > 1.2 document that has xml:id attributes on any of its elements. 
Many > of the permitted occurrences are optional and it is not clear what > would have them be there. The elements that *require* an xml:id > break down as follows: in many cases optional xml:id attributes were added to elements in ODF 1.2 for the RDF metadata feature. > * The <text:changed-region> case is known. It is also known that > those xml:id attribute values routinely change from consumption to > production (although the IDREF valued attributes on associated > elements still refer to the correct ones on <text:changed-region> > elements.). in OpenOffice.org these xml:ids are newly generated on export. > * The <text:meta-field> case is one where the persistence of the > xml:id ID value is definitely important, since the field needs to be > found from an external source by a fragment identifier matching the > ID. Since the arrangement is merely implementation-dependent, it is > not possible to say more. I don't know what implementations, if any, > support this element and derive the field value in any way. the element is supported in OpenOffice.org, but there is no UI for it, so you have to use the API to insert it (or load an ODF file that contains it). > * The remainder are all <form:*> elements. I managed to demonstrate > those by creating a <form:text> and a <form:fixed-text> that was a > label for the <form:text> field. There were xml:id attributes whose > ID value was lexically identical to the value of the form:id > attribute on the same element. There were cross-references using > IDREF values too. (The two ID values are "control1" and "control2".) > The ID values are not visible to the user, but the form:name values > are. I created this in LibreOffice 3.3.2 and opened and resaved it > in Apache OpenOffice 3.4.1. The ID values and the form names were > unchanged. 
I tried the document in Microsoft Office 2013 Word and > the form did not survive the trip, so there is nothing to learn there > about the unlikely preservation of xml:id values at this point. i believe in OpenOffice.org the xml:id implementation for these elements is actually distinct from the one i've cited above, and so it is possible in principle to get duplicate xml:id attribute values for these; but i don't know if the xml:id values are _actually_ preserved or if they merely happen to be generated in a superficially consistent manner on export. another case where OpenOffice.org may generate xml:id but does not preserve it is text:list. regards, michael -- Michael Stahl Software Engineer Platform Engineering - Desktop Team Red Hat Better technology. Faster innovation. Powered by community collaboration. See how it works at redhat.com


  • 13.  Re: [office] The desirability of xml:id stability

    Posted 02-26-2013 15:37
    On 05.02.2013 14:26, Michael Stahl wrote: On 05/02/13 04:38, Dennis E. Hamilton wrote: ... It seems to me that there is a lot more to clean up before xml:id stability becomes important. It is my considered opinion that any such stability should be left to implementations and only if there is some sort of outcry for the need for interoperable implementation in a standard-specified way should anything be done about it at the ODF TC. And yes, an ODF 2.0 would likely be the place to address significant breaking differences with respect to ODF 1.x. as an aside, from an RDF metadata perspective a more serious open question is that when some content element is copied, the copy/paste of the RDF metadata referencing that element via its xml:id is left completely unspecified by ODF, which is unlikely to yield similar results in different implementation. (the problem is that the RDF graphs have no natural hierarchy (they are not trees), and of course by the very nature of the feature ODF implementations cannot assume particular semantics of the RDF properties that happen to be used in some file.) The copy/paste of RDF metadata is unspecified because the right behavior depends on the type of metadata (=> meta-metadata?). For instance: if I was the original author of a paragraph and the paragraph is moved, I am still the original author, and the metadata should be kept. If the metadata instead represents the position of the paragraph, it has to be adapted or removed when the paragraph itself is moved (copy/pasted). Either only the module/plugin that knows about the metadata can handle it correctly or, as I suggest, we require some meta-metadata, such as "moveable", that makes the metadata understandable to third-party software, including the office application itself. Ceterum censeo, the xml:id is like a public API for a document; removing or changing it during a load/save roundtrip is similar to changing the anchor IDs of an HTML page, breaking all references into the document. 
Best regards, Svante
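The "public API" comparison above can be made concrete: if external references point at xml:id values through URI fragments, a resave that changes those values leaves the references dangling. The sketch below assumes, purely for illustration, that an external link addresses an element as `document#id`; that fragment syntax, the name `broken_links`, and the sample data are hypothetical and not defined by ODF 1.2.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urldefrag

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def broken_links(links, xml_text):
    """Return the links whose fragment no longer matches any xml:id."""
    root = ET.fromstring(xml_text)
    ids = {el.get(XML_ID) for el in root.iter() if XML_ID in el.attrib}
    broken = []
    for link in links:
        fragment = urldefrag(link).fragment
        # Links without a fragment address the whole document and
        # cannot be broken by an ID change.
        if fragment and fragment not in ids:
            broken.append(link)
    return broken


# Hypothetical external references collected before the document was resaved.
LINKS = ["report.xml#intro", "report.xml#sec2", "report.xml"]

# Hypothetical content after a resave that regenerated one of the IDs.
RESAVED = '<doc><p xml:id="intro">Hello</p><p xml:id="id99">World</p></doc>'

print(broken_links(LINKS, RESAVED))  # ['report.xml#sec2']
```

The same check applies unchanged to anchor IDs in an HTML export, which is the case the comparison is drawn from.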


  • 14.  RE: [office] The desirability of xml:id stability

    Posted 02-26-2013 19:04
    I agree about xml:id being a public API in some cases. However, that is not how xml:id is profiled in ODF. ODF limits xml:id to very specific cases and does not consider its arbitrary use to establish anchors. It's also nothing the [xml-id] specification requires. In fact, it strikes me that [xml-id] was not originally intended for that purpose. If it was left to me, I would go the other direction: 1. xml:id would be allowed on any and all elements and it would be for fragment location. There would be *no* ODF-constrained use for it. Preservation would be implementation-defined. A conforming implementation could completely ignore them and validation would not be required. (I believe duplicates are not allowed to be fatal in [xml-id].) 2. All use of cross-references as part of representing the structural model of an ODF document would rely on identifiers and references that might use NCNAME-valued key sets but they would not be by usage of ID and IDREF, avoiding all collision issues with xml:id. 3. This is not a proposal. This is how I would have avoided commingling the use of identifiers that are essential to the structure and the arbitrary use of xml:id for some purpose independent of the ODF document structure. This would have an impact on interior xlink usage, too. Since this is not a proposal, there's no need to investigate whether there's an ODF-employed xlink case that would fail were an xml:id not available for a target. (I can imagine that happening in the single-file XML version of an ODF document.) The RDFa and [embedded] RDF cases would still rely on xml:id, but that's an overlay on the essential structure and a problem confined to those implementations that recognize and preserve RDFa and <rdf:RDF> annotations. 
Since this approach would completely rewind what was done from 1.1 to 1.2 with regard to some structure-required cross-referencing, I would not ever consider this (apart from maybe allowing xml:id anywhere and [optionally] preserving them in an implementation, independent of what the standard says). I favor one of the alternatives that Rob has listed as an interesting way forward for ODF 1.3. - Dennis


  • 15.  Re: [office] The desirability of xml:id stability

    Posted 03-04-2013 11:30
    On 26.02.2013 20:04, Dennis E. Hamilton wrote:
    > I agree about xml:id being a public API in some cases. However, that is not how xml:id is profiled in ODF. ODF limits xml:id to very specific cases and does not consider its arbitrary use to establish anchors. It's also nothing [xml-id] specification requires. In fact, it strikes me that [xml-id] was not originally intended for that purpose.

    xml:id was introduced with ODF 1.2 to make it possible to refer to elements from RDF resources within the package and, more generally, to replace the multiple existing IDs that lacked an ID schema type with a single unique one. Referring from outside the package is very desirable and already on the proposal queue, see https://wiki.oasis-open.org/office/Change_Proposal_for_ODF_1.2_using_URL_fragment_identifiers_for_ODF_media_types

    Mathias Bauer (former OOo Writer lead) even stated that opening a document at a certain spot would be low-hanging fruit.

    Regarding changeability: neither ODF 1.2 nor the HTML specification says that these IDs for anchors, the access points within a document, must not change, even though external links break otherwise. That is as obvious as the label "your coffee might be hot", but much more difficult to specify. The user is allowed to change the xml:id, but it should not change without a reason.

    > If it was left to me, I would go the other direction: 1. xml:id would be allowed on any and all elements and it would be for fragment location. There would be *no* ODF-constrained use for it. Preservation would be implementation-defined. A conforming implementation could completely ignore them and validation would not be required. (I believe duplicates are not allowed to be fatal in [xml-id].)

    This simplification of design would unfortunately create complexity on the abstraction layer above the XML, as all boilerplate XML elements (e.g. office:body) might possess IDs, making it possible to have multiple IDs (= URIs) on the same component (= data), which web design principles say should be avoided. In ODF 1.2 we therefore focused on those elements that were able to contain (text) data.

    > 2. All use of cross-references as part of representing the structural model of an ODF document would rely on identifiers and references that might use NCNAME-valued key sets but they would not be by usage of ID and IDREF, avoiding all collision issues with xml:id.

    What is the problem you are trying to solve here, Dennis?

    > 3. This is not a proposal. This is how I would have avoided comingling the use of identifiers that are essential to the structure and the arbitrary use of xml:id for some purpose independent of the ODF document structure. This would have an impact on interior xlink usage, too. Since this is not a proposal, there's no need to investigate whether there's an ODF-employed xlink case that would fail were an xml:id not available for a target. (I can imagine that happening in the single-file XML version of an ODF document.) The RDFa and [embedded] RDF cases would still rely on xml:id, but that's an overlay on the essential structure and a problem confined to those implementations that recognize and preserve RDFa and <rdf:RDF> annotations. Since this approach would completely rewind what was done from 1.1 to 1.2 with regard to some structure-required cross-referencing, I would not ever consider this (apart from maybe allowing xml:id anywhere and [optionally] preserving them in an implementation, independent of what the standard says). I favor one of the alternatives that Rob has listed as an interesting way forward for ODF 1.3.

    If someone has a bug in their implementation, such as randomizing external identifiers (xml:id), it has to be fixed; otherwise the overall value of ODF is lowered, as that application would be the weakest link in the chain, destroying information in a document exchange. Using the same data format across applications requires responsibility, even if the application currently gains no advantage from this feature, especially when the issue is as easy to fix as xml:ids.

    Thanks,
    Svante

    > - Dennis


  • 16.  RE: [office] The desirability of xml:id stability