OpenDocument - Adv Document Collab SC

  • 1.  Change Tracking on the RDF itself: Some initial thoughts...

    Posted 05-29-2011 11:46
    Hi,

    I thought I'd throw around some ideas and runnable scripts for change tracking the RDF triples. The scripts use the redland library to execute. The design follows my past idea that the RDF model offered by the document API need not be the RDF model that is stored in the ODF file. For example, the RDF in the file might have all triples reified, whereas the RDF offered by the API might not expose that if it is not needed.

    Many of the examples here have technical issues, which I am aware of, but I put them forward anyway to stimulate discussion. There are some simplifications too, to keep the examples focused and compact.

    Consider the following little RDF using redland:

        #!/bin/bash
        rm -f test*db
        rdfproc test -- add bnode1 "http://www.w3.org/2003/01/geo/wgs84_pos#lat" "51.47026"
        rdfproc test -- add bnode1 "http://www.w3.org/2003/01/geo/wgs84_pos#long" "-2.59466"
        rdfproc test -- add "uri:gollum" "http://xmlns.com/foaf/0.1/name" "Gollum"
        rdfproc test -- add "uri:gollum" "http://xmlns.com/foaf/0.1/phone" "tel:11 1322342"

    The same sort of thing might be specified with change tracking as follows. The first part basically breaks the wgs84 triples into three subject, predicate, object triples. In a formal spec one would have to follow the proper reification, but I've just faked it for the purpose of discussion. After the two location triples and the three for a foaf entry for Gollum are added, all of those reified triples are given a change-tracking version. This might be the same value as the delta:change-id instead of just a number. Along those lines, the RDF would then have to create an order over revisions using the dc:date instead of their numeric values directly.
    That is, instead of sorting on the number $VER, one would have to look up the following and sort on it instead:

        totimet(delta:change-transaction@[delta:change-id=$VER]/delta:change-info/dc:date)

    It may be convenient for change tracked RDF to replicate a fragment of the content.xml//delta:tracked-changes tree in the manifest.rdf.

    Two updates to Gollum's details follow, and a deletion of his home page. The explicit forward and reverse linking of old and new RDF triples is a bit redundant. It is present because I'm playing with SPARQL to see what is the simplest way to get the "current" RDF at a revision.

        #!/bin/bash
        rm -f cttest*db
        rdfproc cttest -- add uri:r1 "uri:subject"   "bnode1"
        rdfproc cttest -- add uri:r1 "uri:predicate" "http://www.w3.org/2003/01/geo/wgs84_pos#lat"
        rdfproc cttest -- add uri:r1 "uri:object"    "51.47026"
        rdfproc cttest -- add uri:r2 "uri:subject"   "bnode1"
        rdfproc cttest -- add uri:r2 "uri:predicate" "http://www.w3.org/2003/01/geo/wgs84_pos#long"
        rdfproc cttest -- add uri:r2 "uri:object"    "-2.59466"
        rdfproc cttest -- add uri:r3 "uri:subject"   "uri:gollum"
        rdfproc cttest -- add uri:r3 "uri:predicate" "http://xmlns.com/foaf/0.1/name"
        rdfproc cttest -- add uri:r3 "uri:object"    "Gollum"
        rdfproc cttest -- add uri:r4 "uri:subject"   "uri:gollum"
        rdfproc cttest -- add uri:r4 "uri:predicate" "http://xmlns.com/foaf/0.1/phone"
        rdfproc cttest -- add uri:r4 "uri:object"    "tel:11 1322342"
        rdfproc cttest -- add uri:r5 "uri:subject"   "uri:gollum"
        rdfproc cttest -- add uri:r5 "uri:predicate" "http://xmlns.com/foaf/0.1/homepage"
        rdfproc cttest -- add uri:r5 "uri:object"    "http://en.wikipedia.org/wiki/gollum"

        rdfproc cttest -- add uri:r1 "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc cttest -- add uri:r2 "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc cttest -- add uri:r3 "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc cttest -- add uri:r4 "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc cttest -- add uri:r5 "uri:delta-change-id"  "1^^xsd:integer"

        # update Gollum's phone number
        rdfproc cttest -- add uri:r6 "uri:subject"          "uri:gollum"
        rdfproc cttest -- add uri:r6 "uri:predicate"        "http://xmlns.com/foaf/0.1/phone"
        rdfproc cttest -- add uri:r6 "uri:object"           "tel:11 6665534"
        rdfproc cttest -- add uri:r6 "uri:delta-change-id"  "2^^xsd:integer"
        rdfproc cttest -- add uri:r6 "uri:update"           "uri:r4"
        rdfproc cttest -- add uri:r4 "uri:succeddedby"      "uri:r6"

        # remove his home page.
        rdfproc cttest -- add uri:r7 "uri:delta-change-id"  "3^^xsd:integer"
        rdfproc cttest -- add uri:r7 "uri:delete"           "uri:r5"
        rdfproc cttest -- add uri:r5 "uri:succeddedby"      "uri:r7"

        # update Gollum's phone number
        rdfproc cttest -- add uri:r8 "uri:subject"          "uri:gollum"
        rdfproc cttest -- add uri:r8 "uri:predicate"        "http://xmlns.com/foaf/0.1/phone"
        rdfproc cttest -- add uri:r8 "uri:object"           "tel:11 3232 6665534"
        rdfproc cttest -- add uri:r8 "uri:delta-change-id"  "4^^xsd:integer"
        rdfproc cttest -- add uri:r8 "uri:update"           "uri:r6"
        rdfproc cttest -- add uri:r6 "uri:succeddedby"      "uri:r8"

    Using older SPARQL versions makes some of the queries a tad more verbose and convoluted:
    http://en.wikibooks.org/wiki/XQuery/SPARQL_Tutorial#Compute_the_maximum_salary

    Without yet considering dropping triples which are deleted, I have the following SPARQL which will show the added/modified triples at revision 3 of the document. Note that the optional {} and bound() portion of the query should be simpler in SPARQL 1.1 implementations. Basically, the query seeks the subject, predicate, and object with a version <= a desired number. The optional clause attempts to find the triple which succeeds the current one and has a revision in the valid range. If no such succeeding triple is found (!bound(?nestedver)), then we have the latest version of a triple that is not newer than a given ?ver.
    This is for the case where one seeks the RDF as it was in the past, i.e. the current latest version might be 10 but we want to see how things were at revision 3.

        #!/bin/bash
        rdfproc cttest query - - '
        select ?s ?p ?o
        where {
           ?s ?p ?o .
           ?s <uri:delta-change-id> ?ver
           optional {
              ?s      <uri:succeddedby>     ?sucver .
              ?sucver <uri:delta-change-id> ?nestedver .
              FILTER ( ?nestedver <= "3^^xsd:integer" && ?nestedver > ?ver )
           } .
           filter( !bound(?nestedver) && ?ver <= "3^^xsd:integer" )
        }
        '

    The results are as follows. Note that the query has to be updated to respect the uri:delete predicate. Notice that r8 is not present as it has a change-id too new. Also notice that r4 is *not* present as r6 replaces it in change-id=2.

        rdfproc: Query returned bindings results:
        result: [s=<uri:r1>, p=<uri:object>, o="51.47026"]
        result: [s=<uri:r1>, p=<uri:subject>, o="bnode1"]
        result: [s=<uri:r1>, p=<uri:predicate>, o=<http://www.w3.org/2003/01/geo/wgs84_pos#lat>]
        result: [s=<uri:r1>, p=<uri:delta-change-id>, o="1^^xsd:integer"]
        result: [s=<uri:r2>, p=<uri:object>, o="-2.59466"]
        result: [s=<uri:r2>, p=<uri:subject>, o="bnode1"]
        result: [s=<uri:r2>, p=<uri:predicate>, o=<http://www.w3.org/2003/01/geo/wgs84_pos#long>]
        result: [s=<uri:r2>, p=<uri:delta-change-id>, o="1^^xsd:integer"]
        result: [s=<uri:r3>, p=<uri:object>, o="Gollum"]
        result: [s=<uri:r3>, p=<uri:subject>, o=<uri:gollum>]
        result: [s=<uri:r3>, p=<uri:predicate>, o=<http://xmlns.com/foaf/0.1/name>]
        result: [s=<uri:r3>, p=<uri:delta-change-id>, o="1^^xsd:integer"]
        result: [s=<uri:r6>, p=<uri:object>, o="tel:11 6665534"]
        result: [s=<uri:r6>, p=<uri:update>, o=<uri:r4>]
        result: [s=<uri:r6>, p=<uri:subject>, o=<uri:gollum>]
        result: [s=<uri:r6>, p=<uri:predicate>, o=<http://xmlns.com/foaf/0.1/phone>]
        result: [s=<uri:r6>, p=<uri:succeddedby>, o=<uri:r8>]
        result: [s=<uri:r6>, p=<uri:delta-change-id>, o="2^^xsd:integer"]
        result: [s=<uri:r7>, p=<uri:delete>, o=<uri:r5>]
        result: [s=<uri:r7>, p=<uri:delta-change-id>, o="3^^xsd:integer"]

    My next move will be to respect deletion in the SPARQL. Note that this is part way there, as r5 is not shown above because it is succeeded by r7, which is itself a delete operation on that triple.
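The selection rule described in this post, including the not-yet-implemented handling of uri:delete, can be sketched outside SPARQL. The following is only an illustration in plain Python, not redland code: the RECORDS table, the triples_at() helper, and the integer change-ids are assumptions made for the sketch, and the data simply mirrors r1 to r8 from the script. A reified statement is treated as hidden once a record visible at the requested revision updates or deletes it.

```python
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical in-memory mirror of the reified statements r1..r8.
# "ver" is the change-id that introduced the record, "spo" the triple it
# carries (absent for a pure delete), "update"/"delete" the record replaced.
RECORDS = {
    "r1": {"ver": 1, "spo": ("bnode1", GEO + "lat", "51.47026")},
    "r2": {"ver": 1, "spo": ("bnode1", GEO + "long", "-2.59466")},
    "r3": {"ver": 1, "spo": ("uri:gollum", FOAF + "name", "Gollum")},
    "r4": {"ver": 1, "spo": ("uri:gollum", FOAF + "phone", "tel:11 1322342")},
    "r5": {"ver": 1, "spo": ("uri:gollum", FOAF + "homepage",
                             "http://en.wikipedia.org/wiki/gollum")},
    "r6": {"ver": 2, "spo": ("uri:gollum", FOAF + "phone", "tel:11 6665534"),
           "update": "r4"},
    "r7": {"ver": 3, "delete": "r5"},
    "r8": {"ver": 4, "spo": ("uri:gollum", FOAF + "phone", "tel:11 3232 6665534"),
           "update": "r6"},
}


def triples_at(records, revision):
    """Return the plain triples as they stood at the given revision."""
    # Only records introduced at or before the revision can take part.
    visible = {rid: rec for rid, rec in records.items() if rec["ver"] <= revision}
    # Anything a visible record updates or deletes is superseded.
    superseded = set()
    for rec in visible.values():
        for link in ("update", "delete"):
            if link in rec:
                superseded.add(rec[link])
    # Pure delete records carry no triple of their own, so they drop out here.
    return sorted(
        rec["spo"]
        for rid, rec in visible.items()
        if rid not in superseded and "spo" in rec
    )


for triple in triples_at(RECORDS, 3):
    print(triple)
```

At revision 3 this yields the lat, long, name, and second phone number triples, with the home page gone because r7 deletes r5. Under this rule the forward uri:succeddedby links are not needed; the backward uri:update and uri:delete links are enough.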


  • 2.  RE: [office-collab] Change Tracking on the RDF itself: Some initial thoughts...

    Posted 05-31-2011 15:08
    Related to today's call and discussion of the RDF, I wanted to clarify some things:

    1. In the content.xml file, the only use of RDF is as RDFa. That is, it is done only using attributes (and not all of the RDFa ones are usable). Since you can only have one particular RDFa attribute on any element that allows for it, there are ways to use <span>-type elements to introduce more of them. (Having reification via the RDFa provisions strikes me as unlikely.)

    2. In the separate RDF in an ODF 1.2 package (RDF that provides semantic information about the XML parts that constitute the ODF document as such), RDF/XML is used.

    3. There is, of course, nothing to prevent the introduction of RDF/XML elements as children of elements in the content.xml (and other) XML documents of the ODF. Under the current schema and conformance targets, these are foreign elements and treated as extensions. (It appears that this is not allowed in manifest.xml since the rules about extensions and foreign elements were not extended to that case. The top-level manifest.rdf might have some bearing on this case, but it is difficult to know for sure.)

    4. In the current ODF specification, there is nothing on whether and how the metadata carried in these RDF representations is made known to the user of an ODF consumer and how its connection to the ODF document content material is maintained. Whatever there is about coordinated change-tracking between the ODF document and the RDF tied to it needs to somehow recognize that there is considerable variability here (and the RDF is new to ODF 1.2 and it is not clear to me what level of interoperable implementations there are at this time).

    I just wanted to add some context for the discussion about how RDF might be involved as the subject of change tracking.

    - Dennis


  • 3.  RE: [office-collab] Change Tracking on the RDF itself: Some initial thoughts...

    Posted 06-01-2011 04:00
    Hi,

    I got in contact with Sebastian Trüg of KDE/Nepomuk fame regarding change tracking RDF. He was wondering why I didn't use the fourth "context" node which is available in many implementations [1].

    [1] http://en.wikipedia.org/wiki/Resource_Description_Framework#Statement_reification_and_context

    My initial thoughts were that even if the context node is available, an application might want to use the context node itself instead of surrendering it for change tracking. However, that can be handled with just a level of indirection, as they say. For example, if an application wanted to store:

        ?subj ?pred ?object ?mycontext

    then instead the model might do:

        ?subj ?pred ?object ?ctctx1
        ?ctctx1 <uri:context> ?mycontext

    This could be done by the model without the API user even knowing it. This would allow ?ctctx1 to track all sorts of wonderful information, like the delta:change-id which introduced the triple and the one that deleted it. In this light, an update of a triple is just a retract/assert (delete/insert) of a triple.
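To make the indirection concrete, here is a minimal sketch in plain Python of a model that keeps the context slot for change tracking and stores the application's own context one hop away. Nothing here is redland API: the TrackedModel class, its method names, and the generated uri:ctctxN identifiers are assumptions for illustration only. The predicate names follow the ones used in this thread.

```python
import itertools

FOAF_PHONE = "http://xmlns.com/foaf/0.1/phone"


class TrackedModel:
    """Hypothetical quad store that reserves the context node for tracking."""

    def __init__(self):
        self.quads = []  # stored form: (s, p, o, tracking_context)
        self.meta = {}   # tracking_context -> {predicate: value}
        self._ids = itertools.count(1)

    def add(self, s, p, o, context=None, change_id=1):
        # Every statement gets its own tracking context node.
        ct = f"uri:ctctx{next(self._ids)}"
        self.quads.append((s, p, o, ct))
        self.meta[ct] = {"uri:delta-change-id": change_id}
        if context is not None:
            # The application's context is kept behind one level of indirection.
            self.meta[ct]["uri:context"] = context
        return ct

    def remove(self, s, p, o, change_id):
        # Nothing is physically dropped; the live statement is marked deleted.
        for qs, qp, qo, ct in self.quads:
            live = "uri:deleted-change-id" not in self.meta[ct]
            if (qs, qp, qo) == (s, p, o) and live:
                self.meta[ct]["uri:deleted-change-id"] = change_id

    def update(self, s, p, old_o, new_o, change_id, context=None):
        # An update is a retract followed by an assert.
        self.remove(s, p, old_o, change_id)
        return self.add(s, p, new_o, context=context, change_id=change_id)

    def statements(self):
        # API view: live statements with the application's context restored.
        return [
            (s, p, o, self.meta[ct].get("uri:context"))
            for s, p, o, ct in self.quads
            if "uri:deleted-change-id" not in self.meta[ct]
        ]


model = TrackedModel()
model.add("uri:gollum", FOAF_PHONE, "tel:11 1322342",
          context="uri:mycontext", change_id=1)
model.update("uri:gollum", FOAF_PHONE, "tel:11 1322342", "tel:11 6665534",
             change_id=2, context="uri:mycontext")
print(model.statements())
```

The caller only ever sees its own context value, while the stored form keeps both phone numbers together with the change-ids that introduced and retracted them.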
    So the example from the other day, using contexts, becomes:

        $ cat buildctctx.sh
        #!/bin/bash
        rm -f ctctxtest*db
        rdfproc -c ctctxtest -- add "bnode1" "http://www.w3.org/2003/01/geo/wgs84_pos#lat"  "51.47026" "uri:cn1"
        rdfproc -c ctctxtest -- add "bnode1" "http://www.w3.org/2003/01/geo/wgs84_pos#long" "-2.59466" "uri:cn2"
        rdfproc -c ctctxtest -- add "uri:gollum" "http://xmlns.com/foaf/0.1/name"  "Gollum"         "uri:cn3"
        rdfproc -c ctctxtest -- add "uri:gollum" "http://xmlns.com/foaf/0.1/phone" "tel:11 1322342" "uri:cn4"
        rdfproc -c ctctxtest -- add "uri:gollum" "http://xmlns.com/foaf/0.1/homepage" "http://en.wikipedia.org/wiki/gollum" "uri:cn5"

        rdfproc -c ctctxtest -- add "uri:cn1" "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc -c ctctxtest -- add "uri:cn2" "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc -c ctctxtest -- add "uri:cn3" "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc -c ctctxtest -- add "uri:cn4" "uri:delta-change-id"  "1^^xsd:integer"
        rdfproc -c ctctxtest -- add "uri:cn5" "uri:delta-change-id"  "1^^xsd:integer"

        # update Gollum's phone number
        rdfproc -c ctctxtest -- add "uri:gollum" "http://xmlns.com/foaf/0.1/phone" "tel:11 6665534" "uri:cn6"
        rdfproc -c ctctxtest -- add "uri:cn6" "uri:delta-change-id"    "2^^xsd:integer"
        rdfproc -c ctctxtest -- add "uri:cn6" "uri:update"             "uri:cn4"
        rdfproc -c ctctxtest -- add "uri:cn4" "uri:succeddedby"        "uri:cn6"
        rdfproc -c ctctxtest -- add "uri:cn4" "uri:deleted-change-id"  "2^^xsd:integer"

        # remove his home page.
        rdfproc -c ctctxtest -- add "uri:cn5" "uri:deleted-change-id"  "3^^xsd:integer"

        # update Gollum's phone number
        rdfproc -c ctctxtest -- add "uri:gollum" "http://xmlns.com/foaf/0.1/phone" "tel:11 3232 6665534" "uri:cn8"
        rdfproc -c ctctxtest -- add "uri:cn8" "uri:delta-change-id"    "4^^xsd:integer"
        rdfproc -c ctctxtest -- add "uri:cn8" "uri:update"             "uri:cn6"
        rdfproc -c ctctxtest -- add "uri:cn6" "uri:succeddedby"        "uri:cn8"
        rdfproc -c ctctxtest -- add "uri:cn6" "uri:deleted-change-id"  "4^^xsd:integer"

    And the SPARQL to get the triples for the document at change-id "3" would be as follows. Note that the same caveat applies in the FILTER(), where what we really want is to order change-ids by their dc:date instead of by the numeric value of the change-id. It might be useful to consider forcing implementations to choose values for change-id such that the numeric or string comparison respects the dc:date order of the change-transaction.

    The query could very likely be made more efficient; it's just something that works for demonstration purposes™. The query starts by expanding to quads. Then the graph context node is itself used in the model as a subject to find when the triple was introduced and, optionally, when it was deleted. A triple must be introduced no later than the given change-id and not retracted again before that change-id. This assumes that if a triple is updated it is deleted and inserted again, so the updated version will have a different context node and thus it will match the query, but the old, deleted triple will not.

        $ cat ./ctctxcurrent.sparql
        #!/bin/bash
        rdfproc -c ctctxtest query - - '
        prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
        select ?graph ?s ?p ?o ?gver
        where {
          graph ?graph {
             ?s ?p ?o .
          }
          . ?graph <uri:delta-change-id> ?gver
          . OPTIONAL { ?graph <uri:deleted-change-id> ?delid }
            FILTER   ( ?gver <= "3^^xsd:integer" && ( !bound(?delid) || ?delid >= "3^^xsd:integer" ) )
        }
        '

    And execution is as follows. Note that the phone number shown is from a triple which is retracted at a later time: at "uri:delta-change-id" "4" the number will become tel:11 3232 6665534. It is also not his initial phone number from change-id="1". This of course generalizes from phone numbers to any other semantics the RDF/XML wants to express which might vary as the document does.

        $ ./ctctxcurrent.sparql
        rdfproc: Query returned bindings results:
        result: [graph=<uri:cn1>, s=<bnode1>, p=<http://www.w3.org/2003/01/geo/wgs84_pos#lat>, o="51.47026", gver="1^^xsd:integer"]
        result: [graph=<uri:cn2>, s=<bnode1>, p=<http://www.w3.org/2003/01/geo/wgs84_pos#long>, o="-2.59466", gver="1^^xsd:integer"]
        result: [graph=<uri:cn3>, s=<uri:gollum>, p=<http://xmlns.com/foaf/0.1/name>, o="Gollum", gver="1^^xsd:integer"]
        result: [graph=<uri:cn5>, s=<uri:gollum>, p=<http://xmlns.com/foaf/0.1/homepage>, o=<http://en.wikipedia.org/wiki/gollum>, gver="1^^xsd:integer"]
        result: [graph=<uri:cn6>, s=<uri:gollum>, p=<http://xmlns.com/foaf/0.1/phone>, o="tel:11 6665534", gver="2^^xsd:integer"]
        rdfproc: Query returned 5 results

    I'll send through an ODF file example shortly.
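The caveat about ordering change-ids by dc:date rather than by their own value can be sketched as follows. This is an illustration in plain Python, not redland code. The dc:date values in CHANGE_DATES are made up for the example, and rank() and quads_at() are hypothetical helpers. Change-ids are treated as opaque strings and compared only through the date of their change-transaction. The visibility test mirrors the FILTER in the query for this post: introduced no later than the requested revision, and either never deleted or deleted at that revision or later.

```python
from datetime import datetime

# Assumed dc:date per change-id (invented values, for illustration only).
CHANGE_DATES = {
    "1": "2011-05-29T11:46:00",
    "2": "2011-05-30T09:00:00",
    "3": "2011-05-31T15:08:00",
    "4": "2011-06-01T04:00:00",
}

# Context nodes from the build script: which change introduced each one
# and, optionally, which change deleted it.
CONTEXTS = {
    "uri:cn1": {"added": "1"},
    "uri:cn2": {"added": "1"},
    "uri:cn3": {"added": "1"},
    "uri:cn4": {"added": "1", "deleted": "2"},
    "uri:cn5": {"added": "1", "deleted": "3"},
    "uri:cn6": {"added": "2", "deleted": "4"},
    "uri:cn8": {"added": "4"},
}


def rank(change_id):
    """Order a change-id by the dc:date of its change-transaction."""
    return datetime.fromisoformat(CHANGE_DATES[change_id])


def quads_at(contexts, revision):
    """Return the context nodes whose triples are visible at the revision."""
    at = rank(revision)
    result = []
    for ctx, info in contexts.items():
        if rank(info["added"]) > at:
            continue  # introduced after the requested revision
        deleted = info.get("deleted")
        if deleted is not None and rank(deleted) < at:
            continue  # already retracted before the requested revision
        result.append(ctx)
    return result


print(quads_at(CONTEXTS, "3"))
```

For revision "3" this selects cn1, cn2, cn3, cn5, and cn6, the same five context nodes as the posted run. Note that with this comparison a triple is still visible at the revision that deletes it (cn5 at "3"); if it should disappear at that revision, the test would need to skip a deletion dated at or before the requested revision.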