OpenDocument - Adv Document Collab SC

Using RDF for Change Tracking serialization?

  • 1.  Using RDF for Change Tracking serialization?

    Posted 05-12-2011 01:20
    Hi,

    Since I know of a few office applications which support RDF, I thought of perhaps leveraging it for change tracking. Apologies if this has been thought of and dismissed in the past. As far as I know, OpenOffice, Calligra/KOffice, and abiword all have support for at least preserving RDF across load/save cycles.

    As an aside, since I have been involved in making some of that happen, the challenges I see when making an implementation support RDF in ODT are the following:

    (1) handling the RDF/XML to internal model transition (handling the RDF file itself),
    (2) updating references from the RDF to document nodes when xml:id values change in the document,
    (3) handling copy and paste, again because the xml:id needs to be unique; when a new xml:id is assigned, references from the RDF to the pasted document content have to be added.

    Anyway, back to the point. As there is some implementation support for RDF, I was considering how one might serialize change tracking into RDF instead of inline in content.xml. I've kept things very simple in this email; if there is interest I'm happy to expand and explore using other constructs with RDF too.

    Consider this example from the GCT (a fragment generated with abiword). The word "text" in a four word paragraph undergoes a number of style changes which are represented using ac:changeXXX attributes:

        <text:p text:style-name="Normal"
                delta:insertion-type="insert-with-content"
                delta:insertion-change-idref="1">This is the <text:span text:style-name="T1"
                delta:insertion-change-idref="4"
                delta:insertion-type="insert-around-content"
                ac:change1="2,insert,text:style-name,"
                ac:change2="3,modify,text:style-name,T2"
                ac:change3="4,modify,text:style-name,T4">text</text:span> here.</text:p>

    Considering only the ac:change attributes, in RDF one might instead see the change tracking coalesced into an xml:id:

        <text:p text:style-name="Normal"
                delta:insertion-type="insert-with-content"
                delta:insertion-change-idref="1">This is the <text:span xml:id="f009"
                text:style-name="T1"
                delta:insertion-change-idref="4"
                delta:insertion-type="insert-around-content">text</text:span> here.</text:p>

    The ac:changes are then serialized as RDF. It might also be advantageous to split the attribute value into its constituents. Without the formalities of namespaces, and in a more abstract triple format, this might lead to something like:

        bnodeA revision 2
        bnodeA type insert
        bnodeA attribute text:style-name

        bnodeB revision 3
        bnodeB type modify
        bnodeB attribute text:style-name
        bnodeB oldvalue T2

        bnodeC revision 4
        bnodeC type modify
        bnodeC attribute text:style-name
        bnodeC oldvalue T4

    Of course this would also need to link the bnodeA, B, C subjects back to the xml:id of f009 in the core document.

    There are many advantages that I see to exploring RDF for this purpose:

    (1) The change tracking information can have annotations and digital signatures applied. It would be quite simple for bnodeA to also include a signature for the subgraph ( bnodeA ? ? - bnodeA signature ? ), i.e. all RDF with a subject of bnodeA, minus any existing digital signature (on the off chance one exists) to avoid ambiguity.
    (2) Implementations which do not support change tracking have a simpler and smaller document to load.
    (3) Queries can be run on the change tracking information using SPARQL.
    (4) The RDF affords applications a scratch space to associate any other semantics with changes that might be desired. Since the association can be to other RDF, it should be resilient to implementations which do not know of the additional custom RDF.

    One major downside to this approach is that it requires an implementation to get its hands dirty with some RDF support in order to support change tracking. There is also the issue that an application not supporting RDF might break the xml:id links from the RDF to the document. Though if the change tracking specification does not use RDF, and an application which doesn't support change tracking is used to load an ODF file, it too will probably not save the CT information if/when the document is saved.
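
    The splitting of ac:change values into triples described above can be sketched in a few lines. This is only an illustration under stated assumptions: the blank-node labels (`_:change0`, ...), the `appliesTo` predicate that links a change back to the xml:id, and the `#f009` reference form are all hypothetical names chosen here, not part of the GCT proposal or of ODF.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]


def ac_change_to_triples(xml_id: str, values: List[str]) -> List[Triple]:
    """Split ac:change attribute values into triples, one blank node per change."""
    triples: List[Triple] = []
    for index, value in enumerate(values):
        # Each value has the form "revision,type,attribute,oldvalue";
        # the oldvalue field is empty for an insert.
        revision, change_type, attribute, old_value = value.split(",", 3)
        node = f"_:change{index}"  # hypothetical blank-node label
        # Hypothetical predicate linking the change back to the element.
        triples.append((node, "appliesTo", f"#{xml_id}"))
        triples.append((node, "revision", revision))
        triples.append((node, "type", change_type))
        triples.append((node, "attribute", attribute))
        if old_value:
            triples.append((node, "oldvalue", old_value))
    return triples


# The three ac:change values from the abiword fragment above.
changes = [
    "2,insert,text:style-name,",
    "3,modify,text:style-name,T2",
    "4,modify,text:style-name,T4",
]
triples = ac_change_to_triples("f009", changes)
for triple in triples:
    print(" ".join(triple))
```

    A real implementation would presumably use namespaced predicates and an RDF library rather than bare tuples; tuples are used here only to keep the sketch self-contained.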


  • 2.  Re: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-12-2011 09:40
    I do not think this has been considered. Interesting idea. Please clarify:

    - You have shown how it applies to ac:change attributes, but presumably it could also be applied to the other GCT attributes as well? Then there would just be an ID, and RDF referencing this ID and containing all the CT information.
    - Presumably RDF could also be used to represent CT Sets and Stacks?
    - You gain the ability to query with SPARQL, but the original XML could be queried with XQuery and XPath. I do not know the relative merits of these in this situation. Any comments?
    - If we want to define constraints, e.g. what constitutes a valid delete column change, would this be easier with CT in RDF or as XML?
    - Presumably some XML infrastructure in content.xml is still needed, for example markers for deleted items, with the deleted item itself somewhere else in the document.

    Regarding your first aside about xml:id attributes: this is a big problem, and the only practical solution I have seen is the simple one that requires applications to keep the IDs where possible (cut and paste does, as you say, require new IDs to be generated). Applications don't want to do that, but the problem of matching up changed IDs is very complex and computationally expensive, so IMHO it is best to require that they are preserved. After all, the rest of the XML needs to be retained, so why not the ID values? Perhaps the RDF itself could be used to preserve them?

    Robin

    --
    Robin La Fontaine, Director, DeltaXML Ltd


  • 3.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-12-2011 17:33
    SHORT ANSWER

    It is unacceptable to use the existing RDF provisions of ODF to incorporate the data structures involved in change tracking.

    - Dennis

    LONG ANSWER

    There are those of us who believe that RDF is meant to be descriptive and to provide metadata that can be used with regard to the *semantics* of documents. In that case, RDF is unidirectional. It expresses things about what the document (not so much the XML or the format) expresses, but it does not impact what that expression is in any way.

    Since the way RDF is expressed requires pointing at aspects of a document, this does mean, for mechanical reasons, that it does so by pointing at fragments in the document format. Presumably these fragments serve as sufficient connection to the document content (when rendered for human consumption) that is what the RDF is interpreted as carrying assertions about.

    This does leave a rather interesting conundrum.

    (1) It should be possible to consume and present a document without consideration of the RDF whatsoever, whether in the package-carried RDF/XML files and some files called manifest.rdf or embedded in the content.xml file as RDFa. (There is an ODF element that breaks this case, but that is a minor blemish so far.)

    (2) Producing an ODF document is complicated by the fact that there is no way of determining, in a more-or-less pure producer, whether changes to the document's content as represented in ODF format cause the RDFa or the separately packaged RDF/XML to now be inconsistent. (Think maintenance of software that does not change the relevant comments.)

    While it is recommended that RDF be retained, there is no reliable way of knowing whether or not there is now an inconsistency. All one can really do is attempt to preserve any xml:id already in the content.xml and not generate any duplicates. (This preserves reference to fragments of the document by RDF IRIs but won't assure much if the access is by XPATH or some other means of reference.) If some action has the effect of deleting an element having an xml:id, that's just unfortunate. Clearly, RDFa can be deleted along with the elements to which it is attached. It is less clear what can be done if the element is modified and there is no means to determine whether the result remains consistent with any associated RDFa.

    So there is no good way to assure consistency, nor is there any provision whatsoever in the ODF specification for how consistency is maintainable. The automatic retention of RDF found in the consumption of a document leads to some serious consequences when we consider digital signatures, if the RDF is not disclosed to and considered by the signer. I have observed no such provision of any current implementation of ODF digital signatures. Furthermore, the blind retention of material that is not understood is not particularly welcome in the context of document security and privacy. It also provides one rather easy means for establishing a covert or private channel in an otherwise innocent-seeming document, and that is also frowned upon in some circles.

    Notice that I haven't even mentioned change tracking yet. However, it seems to me that assuring RDF consistency is considerably more difficult than change tracking. To presume this problem solved in order to have a platform for change-tracking seems like a trip down the wrong tunnel.

    Furthermore, RDF is a poor vehicle for implementation of change-tracking because it breaks the abstraction that the RDF is presumably dealing with. Using RDF to inject behavior into the document format is a sin because now some RDF has an active role in consumption and presentation of the document -- it has become essential to the interpretation of the format. This now makes ignoring the RDF a difficult problem, and it is more difficult because there is no good way to know which RDF to pay attention to. (There is essentially no constraint on the RDF that is carried with an ODF document.)

    Secondly, we have the problem, already, of how to track changed elements to which RDF relates and, for that matter, does that mean RDF is change-tracked too? (I say it is madness to have that answer be anything but "no.")

    The problem of maintaining consistency is also acute. If an element having an xml:id is swept up into a tracked deletion, we have to presume that the xml:id value is potentially referred to by RDF that we might have no easy means to identify and we certainly might have no idea how to adjust it. We might track the impact on RDFa, but basically there is no good mechanical procedure for comprehending what modifications of any RDF/XML part are related to what the deleted/modified text expresses.

    Finally, because there is RDFa in content.xml and RDF/XML in the package, that does not mean we have before us all of the RDF that refers to the document we have the file for. (Technically, there should be a way to transform any of the RDF in an OpenDocument file or package into a free-standing RDF collection that exists apart from the document itself. That is, after all, part of how the Semantic Web is intended to work. This extracted RDF and other RDF of any origin whatsoever can be dispersed around the Worldwide Web and compiled into RDF collections in arbitrary places.)

    If one were to make a custom use of RDF, in its own parts and under its own root element, to implement change tracking, this seems like a waste of time. Making a custom XML component for that purpose is more direct and does not require extraordinary tooling. Also, once RDF is used, there is always the problem of the admission of RDF which is not understood by the consumer. So we end up reverting to the previously unsolved problem. Not to mention the prolonged learning curve that one will have had to follow in order to understand that the trip is not worth it.


  • 4.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-15-2011 04:33
    Thanks for your detailed reply! I tend to disagree with many of your points though :(

    On Thu, 2011-05-12 at 10:33 -0700, Dennis E. Hamilton wrote:
    > SHORT ANSWER
    >
    > It is unacceptable to use the existing RDF provisions of ODF to
    > incorporate the data structures involved in change tracking.

    Does everybody on the list agree on this point? What is the procedure for ruling it out or not?

    > [...]
    > Secondly, we have the problem, already, of how to track
    > changed elements to which RDF relates and, for that matter, does that
    > mean RDF is change-tracked too? (I say it is madness to have that
    > answer be anything but "no.")

    I say it is madness to say anything other than absolutely yes. It seems like a weakness to only track the content and style of a document but not its semantics. I had in mind to add provisions for RDF change tracking as an unofficial extension in one or two implementations at some point. But since I'm in this group now, it seems like a wise move to try to table it here. Perhaps I'm wrong or just an optimist to consider it.

    Thinking a bit laterally, it seems that fields like deductive databases already have to deal with things that feel much like revisioning of triples. For example, a ddb performing forward chaining must track something relating to the assertion and inference rule that led to triple generation. If it doesn't do this then retraction becomes an almost intractable problem (save for wiping all inference and starting forward inference from base triples).

    A few obvious ideas come to mind to implement CT of RDF:

    (a) using the context node,
    (b) using specific RDF/XML files per revision as incremental files,
    (c) reifying document RDF and associating triples with their introduction / retraction revisions.

    For (b) one might have, for a revision 7, a file rdf7.rdf, where to load revision n one would call

        load( n ) = if (!n) return {}
                    else read-triples(rdfN.rdf) union-with-retraction-handling load(n-1)

    Of course this is just a rough high level thought.

    The downside to (a) is that you then remove the ability for applications to use the RDF context node, which is far from optimal IMHO. The downside to (c) is that it is a bit bloated. Note that for (c) I wouldn't expect RDF producers / consumers to reify everything. I have in mind for the RDF handling code in the application itself to perform this behind the scenes. The graph offered by the application doesn't have to be exactly the graph on disk.

    Obviously it would be nice to be able to quickly deserialize the RDF graph (with retractions applied) in bulk for the last X versions. For that to work, (b) would need non-incremental as well as incremental rdfN files.

    Does anyone else think CT on the RDF is an interesting and valuable idea?
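
    Option (b), the per-revision incremental files, can be sketched as a replay of deltas. This is a rough illustration only: the thread does not say how retractions would be encoded in an rdfN.rdf file, so the `("add" | "retract", triple)` delta shape and the in-memory `revisions` dictionary standing in for the files are assumptions made here.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]
# One delta per revision: a list of (operation, triple) pairs, where the
# operation is "add" or "retract". This stands in for the contents of a
# hypothetical rdfN.rdf file; the on-disk encoding is not specified.
Delta = List[Tuple[str, Triple]]


def load(n: int, revisions: Dict[int, Delta]) -> Set[Triple]:
    """Return the graph as of revision n by replaying deltas 1..n in order."""
    if n == 0:
        return set()
    graph = load(n - 1, revisions)  # state of the previous revision
    for operation, triple in revisions.get(n, []):
        if operation == "add":
            graph.add(triple)
        elif operation == "retract":
            graph.discard(triple)  # retracting an absent triple is a no-op
    return graph


author = ("#f009", "dc:creator", "monkeyiq")
topic = ("#f009", "dc:subject", "change tracking")
revisions: Dict[int, Delta] = {
    1: [("add", author)],
    2: [("add", topic)],
    3: [("retract", author)],
}
graph_at_2 = load(2, revisions)
graph_at_3 = load(3, revisions)
```

    Because every load replays from revision 1, the cost grows with the revision count, which is one way to see why non-incremental snapshot files are suggested above for bulk access to recent versions.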


  • 5.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-16-2011 15:40
    monkeyiq <monkeyiq@gmail.com> wrote on 05/15/2011 12:32:24 AM:
    > Does anyone else think CT on the RDF is an interesting and valuable
    > idea?

    I think it is interesting. But I wonder whether the same applications that find it challenging to support the flexible and dynamic GCT proposal would also find it equally challenging to support ODF 1.2's flexible and dynamic RDF support?

    In other words, I don't think RDF necessarily reduces the complexity of supporting change tracking. It replaces it with another, perhaps equally complex task.

    But that doesn't necessarily mean it is a bad idea.

    -Rob


  • 6.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-21-2011 02:30
    On Mon, 2011-05-16 at 11:39 -0400, robert_weir@us.ibm.com wrote:
    > I think it is interesting. But I wonder whether the same applications
    > that find it challenging to support the flexible and dynamic GCT proposal
    > would also find it equally challenging to support ODF 1.2's flexible and
    > dynamic RDF support?

    One huge plus here is that the xml:id referencing isn't needed, as the tracked changes have their own identifier system. So if, for example, changetracking.rdf was nominated to contain the <delta:tracked-changes> block, then it could be preserved as is in an output document. Of course, unless the ECT/GCT markup in content.xml was preserved, the references from changetracking.rdf would be invalidated, but that would be the same problem if the <delta:tracked-changes> was in XML inline in the content.xml.

    If a current fragment

        <delta:change-transaction delta:change-id="ct1">

    is converted to

        nodeX rdf:type delta:change-transaction
        nodeX delta:change-id "ct1"

    then the content.xml could go on using "ct1" as its identifier:

        <text:span delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1' text:style-name="bold-style">bold</text:span>

    > In other words, I don't think RDF necessarily reduces the complexity of
    > support changing tracking. It replaces it with another, perhaps equally
    > complex task.

    It would seem from initial inspection that RDF might be useful and simple to use for <delta:tracked-changes> and perhaps ac:change attributes. Though deletion of content in content.xml seems to offer significant complexity if one tries to express those changes in RDF.

    > But that doesn't necessarily mean it is a bad idea.
    >
    > -Rob
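
    How a consumer might resolve literal change ids such as "ct1" against triples like the ones above can be sketched as follows. The function names and the idea of collecting the delta:insertion-change-idref values from content.xml into a set are assumptions for illustration; changetracking.rdf itself is only a suggested file name in this thread.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def transactions_by_id(triples: Iterable[Triple]) -> Dict[str, str]:
    """Map each literal delta:change-id value to its change-transaction node."""
    triples = list(triples)
    typed = {
        s for s, p, o in triples
        if p == "rdf:type" and o == "delta:change-transaction"
    }
    return {o: s for s, p, o in triples if p == "delta:change-id" and s in typed}


def dangling_idrefs(idrefs: Iterable[str], triples: Iterable[Triple]) -> List[str]:
    """Return the idrefs used in content.xml that no transaction declares."""
    known = transactions_by_id(triples)
    return sorted(ref for ref in idrefs if ref not in known)


# Triples as they might appear after converting <delta:change-transaction>.
rdf = [
    ("nodeX", "rdf:type", "delta:change-transaction"),
    ("nodeX", "delta:change-id", "ct1"),
]
# Values of delta:insertion-change-idref found in content.xml (assumed input).
idrefs_in_content = {"ct1", "ct2"}
missing = dangling_idrefs(idrefs_in_content, rdf)
```

    The check for dangling references corresponds to the caveat above: if the markup or the RDF is not preserved by an application, the literal ids stop matching up, whichever side holds the change transactions.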


  • 7.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-21-2011 03:11
    So how do you propose to refer to tracked changes from RDF if the places you are referencing don't have URI references?

    If you are using literals, no generic RDF system will know what you are doing at all. And I don't see you doing anything with private identifiers that you can't do within the XML itself, since ODF already does that with other private (non-ID-type) identifiers for other purposes.

    - Dennis

    PS: Actually, I'm sorry I am asking, since I don't think this avenue is worth pursuing in the first place.


  • 8.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-21-2011 07:21
    On Fri, 2011-05-20 at 20:10 -0700, Dennis E. Hamilton wrote:
    > So how do you propose to refer to tracked changes from RDF if the
    > places you are referencing don't have URI references?
    >
    > If you are using literals, no generic RDF system will know what you
    > are doing at all. And I don't see you doing anything with private
    > identifiers that you can't do within the XML itself, since ODF already
    > does that with other private (non-ID-type) identifiers for other
    > purposes.

    Indeed, literals. No generic RDF system needs to know what I am doing at all. These triples are for the change tracking to see. The use of literals means that the generic system can be fine with just preserving them.

    > - Dennis
    >
    > PS: Actually, I'm sorry I am asking, since I don't think this avenue
    > is worth pursuing in the first place.

    I wonder if your comment implies that I should be sorry for answering too.


  • 9.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-21-2011 18:43
    Well, I can't speak to degrees of sorrow [;<). My lingering question is, what does RDF provide for a custom change-tracking structure that can't be done directly and more straightforwardly in XML? Then we can stick to the schema approach and tooling we already have in order to use XML in conveying an ODF document structure. I can't figure out what a diversion through RDF technology and back again accomplishes. - Dennis


  • 10.  Re: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-23-2011 11:42
    The question Dennis asks is the key question. From the discussion it seems RDF may be well suited for some areas of change tracking but definitely not well suited for others (deletion, for example). But to use it for some change tracking and not all would seem very strange and difficult to justify. I am not convinced that the RDF representations are any better or easier to process than the original XML (worse in some cases, IMHO).

    So I am, at the moment, not persuaded that it makes sense to use RDF for change tracking.

    Robin


  • 11.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-17-2011 19:27
    On May 14, I noticed some material on RDF/XML and RDFa related to a completely different discussion than the one here. However, it reminded me of the key accomplishment of the World Wide Web and Tim Berners-Lee: It is not necessary to maintain consistency of links nor for the targets of links to be aware of those links whatsoever. (There are others, such as Ted Nelson, who claim this is a gigantic failure of the web.) I take that principle to mean that an ODF document should definitely be concerned about referential integrity within its ODF-specified structure (e.g., references to style names, references to bookmarks and bibliography entries in the content, references to other sheets from a spreadsheet formula, references from fields to sources of their content in metadata, etc.). I see no reason that the ODF document have any obligation with regard to the referential integrity of references into the document structure from separate RDF/XML or even embedded RDFa, since there is no way to mechanically determine what the dependency is, at a semantic level, and ODF 1.2 specifies no specific semantics for specialized use that an ODF consumer would be expected to interpret. I agree that xml:id attributes and there values should be preserved when the element having them is preserved in the document structure. (If the element is swept up in a deletion, it seems to me that so is the xml:id, and it is not available for reuse until the deletion is accepted and no longer tracked.) [This creates a problem that has to be resolved if moves are treated as a combination of a deletion and an insertion.] I'm not sure that *any* implementation of ODF as a native format preserves xml:id, and we should check for that. I would definitely be surprised that xml:id is preserved by import into a non-ODF document processor where the manipulated document is then exported back to ODF format. (I don't expect that RDF/XML and embedded RDFa will survive such a journey either.) 
    - Dennis

    SIDE NOTES

    Here is the material that had me remember Sir Tim's achievement and the principle that has worked so well for the web:

    > http://dev.iptc.org/rNews
    >
    > http://www.iptc.org/site/Home/
    >
    > http://www.beet.tv/2011/05/the-semantic-web-is-coming-to-newsrooms-this-summer-hearsts-cto-michael-dunn.html
    >
    > Via Tim Berners-Lee via Shelly Powers

    I also notice that the IPTC folk are clear that RDFa is for HTML and web pages and RDF/XML is for XML documents, a distinction that is not maintained in ODF 1.2 for our own idiosyncratic reasons.

    It is useful that xml:id should be preserved when the element is preserved. We must also keep in mind that no element can have more than one attribute with a value of type ID (an odd but very specific limitation of XML), so we have to be careful about how xml:id is chosen for internal ODF document structure reasons and how it might be injected/borrowed as a fragment ID that is referenced by a URI somewhere, whether in some RDF/XML elsewhere in a package or somewhere else. Unfortunately, the limited cases where xml:id is permitted for arbitrary purposes and the cases where xml:id is part of a specific ODF document structure provision have been commingled in a way that is difficult to untangle. (The law of unintended consequences has stepped in.)
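
One rough way to run the check suggested above (does an implementation preserve xml:id across a load/save cycle?) is to compare the xml:id values in content.xml before and after the round trip. The following is only a sketch under stated assumptions: the function names `xml_ids` and `compare_ids` are made up for illustration, the two documents are passed in as strings already extracted from the package, and nothing here is part of ODF 1.2 or of any office application.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Set

# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def xml_ids(content_xml: str) -> Dict[str, str]:
    """Map each xml:id value to the tag of the element that carries it."""
    root = ET.fromstring(content_xml)
    return {el.get(XML_ID): el.tag
            for el in root.iter() if el.get(XML_ID) is not None}


def compare_ids(before: str, after: str) -> Dict[str, Set[str]]:
    """Report which xml:id values survived, vanished, or newly appeared.

    Hypothetical helper: 'before' and 'after' are the content.xml text
    taken before and after a load/save cycle in the application under test.
    """
    old, new = xml_ids(before), xml_ids(after)
    return {
        "preserved": set(old) & set(new),
        "lost": set(old) - set(new),
        "introduced": set(new) - set(old),
    }
```

A non-empty "lost" set would mean that any RDF pointing at those values no longer has a target in the saved document. The sketch does not say whether the element itself survived under a different identifier.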


  • 12.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-21-2011 02:51
    On Tue, 2011-05-17 at 12:26 -0700, Dennis E. Hamilton wrote:
    > ...
    > I agree that xml:id attributes and their values should be preserved
    > when the element having them is preserved in the document structure.
    > (If the element is swept up in a deletion, it seems to me that so is
    > the xml:id, and it is not available for reuse until the deletion is
    > accepted and no longer tracked.) [This creates a problem that has to
    > be resolved if moves are treated as a combination of a deletion and an
    > insertion.]

    I don't see that the value of an xml:id need be special. If an application wants to change it, that's fine IMHO. But it needs to do something like changeid( oldid, newid ), where references in RDF to oldid are changed to newid so that links from RDF to content.xml are preserved. Of course this breaks RDF outside the ODF file, but that RDF could link to a triple inside the ODF file instead of the xml:id if it wanted stability. KOffice changes xml:id values because it was felt that the system should be able to reassign xml:id values of its choosing, to avoid having to select new ones which do not clash with existing ones.

    > I'm not sure that *any* implementation of ODF as a native format
    > preserves xml:id, and we should check for that. I would definitely be
    > surprised that xml:id is preserved by import into a non-ODF document
    > processor where the manipulated document is then exported back to ODF
    > format. (I don't expect that RDF/XML and embedded RDFa will survive
    > such a journey either.)

    This is exactly what happens in recent builds of abiword. For example, during a copy and paste operation non-ODF content is available on the clipboard. During the paste, the encoded xml:id values are sought from the clipboard content and updated, and the RDF (if any) associated with the source copy/cut is relinked to the pasted content. Links and RDF are also preserved across non-ODF save/load cycles.

    For example, loading an ODT with RDF, saving it to abw format, closing abiword, opening abiword, loading the abw and saving back to ODT does what one would hope. That is, the RDF is preserved, and so are any links from the RDF to xml:id values. I don't find this so surprising; it is more a matter of wanting it to happen.
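
As an illustration of the changeid( oldid, newid ) idea mentioned above, here is a minimal sketch. It assumes the RDF is held in memory as plain (subject, predicate, object) string triples using bare identifiers, as in the examples in this thread, rather than full URIs such as a fragment reference into content.xml. The signature and the clash check are assumptions made for this sketch and are not taken from KOffice or abiword.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
Triple = Tuple[str, str, str]


def changeid(root: ET.Element, triples: List[Triple],
             oldid: str, newid: str) -> List[Triple]:
    """Rename xml:id oldid to newid and retarget RDF references to it.

    The element tree is updated in place; a new triple list is returned.
    """
    ids = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    if oldid not in ids:
        raise KeyError("no element carries xml:id %r" % oldid)
    if newid in ids:
        raise ValueError("xml:id %r is already in use" % newid)

    # Update the element in the content tree.
    for el in root.iter():
        if el.get(XML_ID) == oldid:
            el.set(XML_ID, newid)

    # Retarget every triple whose subject or object names the old id.
    # Simplification: this sketch cannot tell a literal that happens to
    # equal oldid apart from a genuine reference to the element.
    def swap(term: str) -> str:
        return newid if term == oldid else term

    return [(swap(s), p, swap(o)) for s, p, o in triples]
```

RDF held outside the package is out of reach of such a function, which is the breakage noted above.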


  • 13.  RE: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-21-2011 03:11
    The reason for preserving xml:id values is that they can be referenced by any URI from other material, including RDF/XML but not limited to that (and from outside the package, if someone has a URI scheme for that). So we have this problem already, no matter what happens with change tracking.

    I would have preferred that we had never introduced xml:id for any other use, since we already have private identifier systems and we could have maintained them instead of going through a limited conversion to xml:id on what appear to have been solely ideological grounds. We broke down-level compatibility by fixing something that wasn't broken, in my considered opinion.

    - Dennis


  • 14.  Re: [office-collab] Using RDF for Change Tracking serialization?

    Posted 05-13-2011 08:01
    On Thu, 2011-05-12 at 10:39 +0100, Robin LaFontaine wrote:
    > I do not think this has been considered. Interesting idea. Please clarify:
    >
    > - you have shown how it applies to ac:change attributes, but presumably
    >   it could also be applied to the other GCT attributes as well? Then
    >   there would just be an ID and RDF referencing this ID and containing
    >   all the CT information

    Looking over a few other examples one might get the following. There might be issues in there as I sort of winged it creating them. I still need to think on the examples that move / delete content, though. The final example does a bit of that, and one may end up explicitly citing the ODF XML types per revision in RDF.

    6.1.2

    ORIGINAL

        <text:p delta:insertion-type="insert-with-content"
                delta:insertion-change-idref='ct1234'>
        This paragraph is inserted.</text:p>

    WITH RDF

    The content.xml

        <text:p xml:id="n001">This paragraph is inserted.</text:p>

    The RDF

        n001 delta:insertion-type          insert-with-content
        n001 delta:insertion-change-idref  ct1234

    6.3.2

    ORIGINAL

        <text:p>
        This text will be made
        <text:span delta:insertion-type='insert-around-content'
                   delta:insertion-change-idref='ct1234'
                   text:style-name="bold-style">bold</text:span>.
        </text:p>

    WITH RDF

    The content.xml

        <text:p>
        This text will be made
        <text:span xml:id="n002" text:style-name="bold-style">bold</text:span>.
        </text:p>

    The RDF

        n002 delta:insertion-type          insert-around-content
        n002 delta:insertion-change-idref  ct1234

    6.5.2

    ORIGINAL

        <text:p split:split01='sp1'>
          This paragraph will be split into two.
        </text:p>
        <text:p delta:insertion-type='split'
                delta:insertion-change-idref='ct1'
                delta:split-id='sp1'>
        This will be in the second paragraph.
        </text:p>

    WITH RDF

    The content.xml

        <text:p xml:id="n005">
          This paragraph will be split into two.
        </text:p>
        <text:p xml:id="n006">
        This will be in the second paragraph.
        </text:p>

    The RDF

        n005 split:split01                 sp1
        n006 delta:insertion-type          split
        n006 delta:insertion-change-idref  ct1
        n006 delta:split-id                sp1

    6.11.2

    ORIGINAL

        <text:p>
        How text is <delta:inserted-text-start delta:inserted-text-id="it632507360" delta:insertion-change-idref="ct1"/>very easily <delta:inserted-text-end delta:inserted-text-idref="it632507360"/>added.
        </text:p>

    WITH RDF

    The content.xml

        <text:p>
        How text is <delta:inserted-text-start xml:id="n007"/>very easily <delta:inserted-text-end xml:id="n008"/>added.
        </text:p>

    The RDF

        n007 delta:inserted-text-id        it632507360
        n007 delta:insertion-change-idref  ct1
        n007 ends-at                       n008

    6.13.2

    ORIGINAL

        <delta:remove-leaving-content-start delta:removal-change-idref='ct1234' delta:end-element-idref='ee888'>
            <text:p text:style-name="Text_20_body"/>
        </delta:remove-leaving-content-start>
        <text:h text:style-name="Heading_20_1"
                text:outline-level="1"
                delta:insertion-type='insert-around-content'
                delta:insertion-change-idref='ct1234'>
        What are the ground rules?
        </text:h>
        <delta:remove-leaving-content-end delta:end-element-id='ee888'/>

    WITH RDF

    The content.xml

        <text:h xml:id="n010"
                text:style-name="Heading_20_1"
                text:outline-level="1">
        What are the ground rules?
        </text:h>

    The RDF

        n010   has-revision                  ct1234
        ct1234 element-type                  text:p
        ct1234 text:style-name               Text_20_body
        n010   delta:insertion-type          insert-around-content
        n010   delta:insertion-change-idref  ct1234

    > - presumably also RDF could be used to represent CT Sets and Stacks?

    I think this would be a really great use for RDF. The delta:tracked-changes tree could be made RDF and then possibilities like location, foaf etc immediately present themselves. For example, the below cites a person and location for a change, and also allows the software to explicitly cite another location for a change. This sort of thing could be extremely useful for companies where it might be desired to know if a change was performed while travelling, at home, or at the office. Perhaps such information would be used when accepting or assessing changes. Folks tend to be less alert while hacking text over the Pacific.

    If nothing else, I think putting this data into RDF/XML would be a wonderful thing. Links are made via the ct1 style numbers, so the identifiers should not suffer the same issues as xml:id values do. Copy and paste of delta:change-transaction shouldn't happen via the office app either, as it might for a text:p with an xml:id.

    OLD

        <delta:change-transaction delta:change-id="ct1">
           <delta:change-info>
             <dc:creator>Robin</dc:creator>
             <dc:date>2010-06-02T15:48:00</dc:date>
           </delta:change-info>
        </delta:change-transaction>

    NEW

        <uri:robin> <http://xmlns.com/foaf/0.1/name> "Robin"
        <uri:robin> <http://xmlns.com/foaf/0.1/homepage> <http://robin.deltaxml.com/>
        <uri:robin> <http://xmlns.com/foaf/0.1/based_near> _:genid1
        <uri:robin> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person>
        _:genid1 <http://www.w3.org/2003/01/geo/wgs84_pos#lat>   "51.47026"
        _:genid1 <http://www.w3.org/2003/01/geo/wgs84_pos#long>  "-2.59466"

        delta:change-transaction   delta:change-id   ct1
        ct1                        dc:creator        uri:robin
        ct1                        dc:date           2010-06-02T15:48:00
        ct1                        performed-at      _:genid2
        _:genid2 <http://www.w3.org/2003/01/geo/wgs84_pos#lat>   "15.47026"
        _:genid2 <http://www.w3.org/2003/01/geo/wgs84_pos#long>  "12.59466"

    > - you gain ability to query with SPARQL but the original XML could be
    >   queried with XQuery and XPath. I do not know the relative merits of
    >   these in this situation - any comments?

    I can't really say either way. One thing that comes to mind is that when an application wants to get value out of RDF it really wants to have SPARQL capabilities. Activities such as finding foaf data linked to a text:p are much simpler as a SPARQL query. On the other hand, an office app might not link to xqilla etc because it focuses more on load/save of ODF rather than runtime queries of it. But I've not looked at which of XQuery / SPARQL would give better value when querying this sort of data.

    > - if we want to define constraints, e.g. what constitutes a valid
    >   delete column change, would this be easier with CT in RDF or as XML?

    I'll leave this one for now. An RDFS/OWL vs XSD/RelaxNG comparison would be interesting...but perhaps more of a whole arvo activity.

    > - presumably some XML infrastructure in content.xml is still needed,
    >   for example markers for deleted items and the deleted item itself
    >   somewhere else in the document

    Yes, the keen eye will easily notice that I left the deletion examples alone :/

    > Regarding your first aside about xml:id attributes - this is a big
    > problem and the only practical solution I have seen is the simple one
    > that requires applications to keep the IDs where possible (cut and
    > paste does as you say require new IDs to be generated). Applications
    > don't want to do that but the problem of matching up changed IDs is
    > very complex and computationally expensive, so IMHO it is best to
    > require that they are preserved. After all the rest of the XML needs
    > to be retained, so why not the ID values? Perhaps the RDF itself could
    > be used to preserve them??

    Unfortunately the RDF can't really preserve the xml:id values because they are the link from the RDF graph to the content.xml. The RDF could remember what the xml:id was, but if an app then writes a text:p with a new xml:id there isn't really a way to know that the new value replaces the old. I guess one could try to infer it from the context, but that would indeed be hideously complex.
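
The mechanical part of the examples in this message (strip the inline delta: attributes, give the element an xml:id, and record the removed attributes as triples) can be sketched as follows. This is a rough illustration only: the `urn:example:delta` namespace URI is a placeholder rather than the namespace used by the GCT proposal, the function name `externalize` and the n001-style identifiers are made up to mirror the examples, and the triples are plain string tuples rather than real RDF/XML.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Placeholder namespace for illustration only; NOT the real GCT namespace.
DELTA_NS = "urn:example:delta"
# Clark-notation name that ElementTree uses for the xml:id attribute.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"
Triple = Tuple[str, str, str]


def externalize(root: ET.Element) -> List[Triple]:
    """Move delta:* attributes out of the tree into (id, predicate, value) triples.

    Elements that lose attributes receive an xml:id (if they lack one) so
    that the triples have something to point at. The tree is edited in place.
    """
    prefix = "{" + DELTA_NS + "}"
    used = {el.get(XML_ID) for el in root.iter() if el.get(XML_ID) is not None}
    triples: List[Triple] = []
    counter = 0

    for el in root.iter():
        # Snapshot the matching attributes before deleting any of them.
        delta_attrs = [(k, v) for k, v in el.attrib.items() if k.startswith(prefix)]
        if not delta_attrs:
            continue

        node_id = el.get(XML_ID)
        if node_id is None:
            # Generate n001, n002, ... skipping values already in the document.
            while True:
                counter += 1
                node_id = "n%03d" % counter
                if node_id not in used:
                    break
            used.add(node_id)
            el.set(XML_ID, node_id)

        for key, value in delta_attrs:
            del el.attrib[key]
            triples.append((node_id, "delta:" + key[len(prefix):], value))

    return triples
```

Applied to the 6.1.2 paragraph, this would leave a text:p carrying only an xml:id in the tree and return the two n001 triples shown in that example. It does nothing for the deletion cases, which still need markers in content.xml as discussed above.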