OpenDocument - Adv Document Collab SC


Some thoughts on Change Tracking

  • 1.  Some thoughts on Change Tracking

    Posted 09-14-2011 07:11
    Hi, Firstly, apologies for missing the last conference call. Since we are gathering wiki pages for issues with GCT and ECT I thought I would reiterate some of my thoughts FWIW. I think most of the points in this email have been discussed before. I am bringing them together to help make it easier for them to be included and referenced in any document such as consensus reports etc. Note that I'm just an implementer, so any thoughts I express below are more or less how I imagine change tracking might be desired by certain consumers. My first line of thought boils down to what the motivation is for performing change tracking in the first place. At a very abstract level, one could store the content.xml et al files in a git repository and perform rather complex XML or semantic diffs on the revision history to obtain change tracking without any application support. Some of the issues raised on the list recently come up here for semantic diffs. For example, treating styles with equivalent semantics but different names as equal and thus "not a change". See for example: http://markmail.org/message/65dsnvrzmbdmjxka However, one might like the document editor itself, be that LibreOffice, Calligra, abiword, or AJAX code, to help you see changes and offer to negate them, track new changes, or perform other actions. It then becomes interesting for ODF itself to allow a representation of changes to be captured in the document file itself. This is as opposed to storing just the latest state on ODF and performing a diff with an older complete ODF document. This is where I find the ECT bucket concept extremely troublesome. In order for an application to tell you what changed it must perform a complex semantic diff between inline content in content.xml (r10) and the latest bucket (r9). To find what changed in (r9) then another complex semantic diff is needed between r9 and r8 and so forth. 
I find this an extremely critical issue as it means that users of change tracking are relying on various applications to imply changes rather than being told directly and explicitly what has changed. For government use cases this might be an unacceptable distinction, and as such the use of ODF for change tracking of documents might be rejected due to uncertainty. This is without considering the computational complexity of performing these diffs over chains of buckets to look back 20 revisions. Another way ECT buckets are, IMHO, deceptive is that they are proposed as a means to make things simple. Contrary to this, in abiword loading and saving an ECT bucket would require a different code path for buckets than for mainstream content. It seems the OOo code would also require such attention: http://markmail.org/message/ox5ft4g57tgtyepz At least for abiword, changes to an attribute are stored inline in the data model. Thus it actually becomes harder to implement writing to an ECT bucket than to just use GCT ac:change attributes during a save. This is because, to write a bucket, the code needs to consider a whole range of the in-memory model, and the revision and state attached to everything in that bucket, to figure out what the bucket content will be. I reiterate my concern about matched-pair XML elements in ECT buckets (UC7 and UC8). Having the start XML element move to a bucket will cause problems with the matching end XML element. One could at times insert a new matching end tag, thus splitting two matching elements into four. This also cascades into future changes, and the operation may very well need to change the names associated either with the new pair or the old pair of elements to maintain uniqueness. Such a name change itself needs to be change tracked, or at least to explicitly and deterministically imply the link between the two pairs (the renamed one(s) and the original one), so that GUI elements can offer both to users wishing to review revisions.
As I expressed early on, I think it might also be useful for the GCT to have some conformance levels which list the elements, attributes, etc. that must be tracked in order for an application to claim a given level of conformance. This has been discussed over a few threads on the list. For longer-lived documents, some form of change tracking epochs might be useful: http://markmail.org/message/64xcxoagwyxy4sn4 And in any case, I think change tracking on the RDF of the document should definitely not be a forgotten item: http://markmail.org/message/4t6zlmiieno2g7on
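
To make the bucket-chain cost mentioned above concrete, here is a minimal Python sketch. It assumes each revision is only available as a complete snapshot, modelled here as a plain dict of attribute values; that model is an illustrative simplification and not the ECT storage format. The helper names `diff_snapshots` and `history` and the sample revisions are hypothetical. Under that assumption, recovering what changed across N revisions takes N-1 pairwise comparisons.

```python
from typing import Dict, List, Optional, Tuple

Snapshot = Dict[str, str]
# (attribute name, old value or None, new value or None)
Change = Tuple[str, Optional[str], Optional[str]]


def diff_snapshots(old: Snapshot, new: Snapshot) -> List[Change]:
    """Return one entry for every attribute whose value differs."""
    changes: List[Change] = []
    for key in sorted(set(old) | set(new)):
        before, after = old.get(key), new.get(key)
        if before != after:
            changes.append((key, before, after))
    return changes


def history(snapshots: List[Snapshot]) -> List[List[Change]]:
    """Run one pairwise diff per adjacent pair of snapshots, oldest first."""
    return [diff_snapshots(a, b) for a, b in zip(snapshots, snapshots[1:])]


# Hypothetical revisions r8, r9 and r10 of a single draw:frame.
r8 = {"xlink:href": "Image1.jpg", "svg:width": "5in"}
r9 = {"xlink:href": "Image1.jpg", "svg:width": "6in"}
r10 = {"xlink:href": "Image2.jpg", "svg:width": "6in"}

for step in history([r8, r9, r10]):
    print(step)
```

An explicit per-attribute change record would let a consumer read the second element of each tuple directly instead of computing it.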


  • 2.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-15-2011 21:58
    monkeyiq wrote about ECT:
    > I find this an extremely critical issue as it means that users of
    > change tracking are relying on various applications to imply changes
    > rather than being told directly and explicitly what has changed.

    Hi Ben,

    well the same applies to GCT. The very fact that it needs annotations is testament to this issue. I maintain that GCT markup, while being a nice idea on the xml level, fails my requirements as an implementer of a non-xml internal data model application.

    Cheers,

    -- Thorsten Behrens
    Novell GmbH, Nördlicher Zubringer 9-11, 40470 Düsseldorf; GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 21108 (AG Düsseldorf)


  • 3.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-16-2011 00:13
    On Thu, 2011-09-15 at 23:53 +0200, Thorsten Behrens wrote:
    > monkeyiq wrote about ECT:
    > > I find this an extremely critical issue as it means that users of
    > > change tracking are relying on various applications to imply changes
    > > rather than being told directly and explicitly what has changed.
    >
    > Hi Ben,
    >
    > well the same applies to GCT. The very fact that it needs
    > annotations is testament to this issue. I maintain that GCT markup,
    > while being a nice idea on the xml level, fails my requirements as
    > an implementer of a non-xml internal data model application.
    >
    > Cheers,

    Perhaps we are looking at different aspects here. I am referring to examples like the "Edit Image/Shape/Chart" one from the ECT proposal (p. 17). In order for an application to tell you that the image file name changed from Image1.jpg to Image2.jpg, it will have to do a diff on the

        text:tracked-changes/
        text:changed-region@text:id="1"/
        text:deletion ct:id="1"/
        draw:frame

    and the inline draw:frame for

        text:change-start text:change-id="1" ct:sub-id="2"

    In the GCT this would be explicit in an ac:change attribute. There is no need to perform any analysis to see that the xlink:href was Image1.jpg in the last revision.

    I always thought that annotating a change in the GCT was more for higher-level semantic use. For example, when editing an article after getting new information that "General X does Y", one might make that part of the annotation for the change set, to keep the higher-level semantics of the edit.

    I'm interested in the non-xml data models. Abiword uses a piecetable to store and edit the document, and has a non-ODF native file format which it is geared towards. During I/O, though, things boil down to using various append() methods on the piecetable to bring in the document structure:
    https://github.com/monkeyiq/odf-2011-track-changes-git-svn/blob/master/plugins/opendocument/imp/xp/ODi_TextContent_ListenerState.cpp#L2836

    And some of the append() methods are around here:
    https://github.com/monkeyiq/odf-2011-track-changes-git-svn/blob/master/src/text/ptbl/xp/pt_PieceTable.h#L280


  • 4.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-16-2011 00:24
    On Thu, 2011-09-15 at 18:12 -0600, monkeyiq wrote:
    > Perhaps we are looking at different aspects here. I am referring to
    > examples like the "Edit Image/Shape/Chart" from the ECT (pp17). In order
    > for an application to tell you that the image file name changed from
    > Image1.jpg to Image2.jpg it will have to do a diff on the
    >
    > text:tracked-changes/
    > text:changed-region@text:id="1"/
    > text:deletion ct:id="1"/
    > draw:frame
    > And the inline draw:frame for
    > text:change-start text:change-id="1" ct:sub-id="2"
    >
    > In the GCT this would be explicit in an ac:change attribute. No need to
    > perform any analysis to see that the xlink:href was Image1.jpg in the
    > last revision.

    But changing the name of the image file in the zip package is _not_ a change to the document. The image file names are just internal references to find the appropriate image or chart description.

    Andreas

    -- 
    Andreas J. Guelzow, PhD, FTICA
    Concordia University College of Alberta


  • 5.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-16-2011 00:43
    On Thu, 2011-09-15 at 18:22 -0600, Andreas J. Guelzow wrote:
    > But changing the name of the image file in the zip package is _not_ a
    > change to the document. The image file names are just internal
    > references to find the appropriate image or chart description.

    Well, this is just the example from the ECT proposal document. It is my understanding that any semantic change in the draw:frame is handled the same way. So, for example, an edit to the caption will need to be change tracked and will produce the same need to perform a diff analysis.


  • 6.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-16-2011 00:50
    On Thu, 2011-09-15 at 18:42 -0600, monkeyiq wrote:
    > Well, this is just the example from the ECT proposal document. It is my
    > understanding that any semantic change in the draw:frame is handled the
    > same way. So for example, editing the caption will want to be change
    > tracked and will produce the above need to perform a diff analysis.

    And in GCT, while you know that the image file name has changed, you also still have to perform a diff analysis to determine whether there has indeed been a change to the image (or chart or...). So in both cases you need to determine whether there was a change and, if there was, what it consisted of (assuming that your implementation wants to be more specific than saying that there may have been a change).

    Andreas


  • 7.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-16-2011 07:56
    On Thu, 2011-09-15 at 18:49 -0600, Andreas J. Guelzow wrote:
    > And in GCT while you know that the image file name has changed you also
    > still have to perform a diff analysis to determine whether there has
    > indeed been a change to the image (or chart or...). So in both cases you
    > need to determine whether there was change and if there was a change
    > what it consisted of (assuming that your implementation wants to be
    > more specific than saying that there may have been a change).

    I think this also depends on what the implementation wants to do. For pixmap images one is dealing with compressed binary formats, so perhaps the simplest brute-force approach of comparing the RGB pixmaps would be effective. For SVG images the option is at least available to use GCT on the SVG XML file itself. Assuming that government, publication, or whatever companies want GCT on SVG, then perhaps implementations will offer it there too. Otherwise, you are indeed forced to do complex comparisons for the image, chart, etc. data. Borders on those, the caption text:p, and other inline content would not mandate such a diff in the GCT though.
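
The brute-force pixmap comparison mentioned above could look roughly like the following Python sketch. It assumes both images have already been decoded into rows of (R, G, B) tuples; decoding the compressed file formats is left out, and the function name `changed_pixels` and the sample data are hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Pixmap = List[List[Pixel]]


def changed_pixels(old: Pixmap, new: Pixmap) -> List[Tuple[int, int]]:
    """Return the (row, column) position of every pixel that differs.

    Raises ValueError when the dimensions differ, because a pixel-by-pixel
    comparison is not meaningful in that case.
    """
    if len(old) != len(new) or any(len(a) != len(b) for a, b in zip(old, new)):
        raise ValueError("pixmaps have different dimensions")
    return [
        (y, x)
        for y, (row_old, row_new) in enumerate(zip(old, new))
        for x, (p_old, p_new) in enumerate(zip(row_old, row_new))
        if p_old != p_new
    ]


red, blue = (255, 0, 0), (0, 0, 255)
before = [[red, red], [red, red]]
after = [[red, blue], [red, red]]

print(changed_pixels(before, after))  # [(0, 1)]
```

An empty result means the two pixmaps are identical, whatever the files are called in the package.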


  • 8.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 10-24-2011 11:22
    Comment at end - I have been re-reading as part of consensus report editing.

    On 16/09/2011 01:49, Andreas J. Guelzow wrote:
    > And in GCT while you know that the image file name has changed you also
    > still have to perform a diff analysis to determine whether there has
    > indeed been a change to the image (or chart or...). So in both cases you
    > need to determine whether there was change and if there was a change
    > what it consisted of (assuming that your implementation wants to be
    > more specific than saying that there may have been a change).
    >
    > Andreas

    I would not expect an application (an editing application or any other) to show a change to an image just because the name of the image file has changed. Although GCT can represent any change to the XML content, that does not mean that an application has to show changes that are not 'real' changes. I would hope that an application can be clever enough to determine that these are only changes to pointers - it takes some work but is not difficult. The same applies to changes to automatic styles and other areas - some intelligence can be applied to avoid showing changes that are not 'real'.

    Of course what is a 'real' change should be defined by ODF, but is not: different representations of automatic styles, different representations of spans etc. An editing application would surely handle this in its internal data structure before it writes out the changes.

    The intention of GCT is to be able to show a change at the lowest level, to avoid the need to diff sections of the XML. It does not of course address binary data.

    Robin

    -- 
    Robin La Fontaine, Director, DeltaXML Ltd
    "Change control for XML"
    T: +44 1684 592 144
    E: robin.lafontaine@deltaxml.com
    http://www.deltaxml.com
    Registered in England 02528681
    Reg. Office: Monsell House, WR8 0QN, UK


  • 9.  RE: [office-collab] Some thoughts on Change Tracking

    Posted 10-24-2011 18:38
    A change to the name of the image referenced in the markup means that an entirely different picture is being used. I'd certainly expect an application that supports showing change tracking to show that as a change. As an example, the only thing that changed when I tested deleting an image and replacing it with a different one (with the same dimensions) in OO.o 3.3 was the last 8 characters of the file name before the extension.

        <draw:frame draw:style-name="fr1" draw:name="graphics1" text:anchor-type="paragraph" svg:width="6.9252in" svg:height="4.328in" draw:z-index="0">
          <draw:image xlink:href="Pictures/1000000000000780000004B0D94E6AD8.jpg" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>
        </draw:frame>

        <draw:frame draw:style-name="fr1" draw:name="graphics1" text:anchor-type="paragraph" svg:width="6.9252in" svg:height="4.328in" draw:z-index="0">
          <draw:image xlink:href="Pictures/1000000000000780000004B07DFCDB06.jpg" xlink:type="simple" xlink:show="embed" xlink:actuate="onLoad"/>
        </draw:frame>

    Re-reading the back-and-forth between Ben and Andreas, perhaps some read a bit of confusion into the example? The ECT example was not about changing the name of an existing file in the package and what would happen to references to that file in the ODF markup. I agree that's not worth tracking, and I can't imagine off the top of my head why that would be useful. Recall that ECT is about tracking user actions, so that example was about the user replacing one image with another, which changes the filename in the draw:image markup.

    John
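
The comparison a consumer would have to run on the two draw:frame elements from this message can be sketched in Python with the standard library. The namespace declarations are added so the fragments parse on their own; the helper name `attribute_changes` is hypothetical, and pairing child elements by position is an assumption that only holds because both frames have the same structure.

```python
import xml.etree.ElementTree as ET

NS_DECLS = (
    'xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0" '
    'xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0" '
    'xmlns:svg="urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0" '
    'xmlns:xlink="http://www.w3.org/1999/xlink"'
)

FRAME = (
    '<draw:frame {ns} draw:style-name="fr1" draw:name="graphics1" '
    'text:anchor-type="paragraph" svg:width="6.9252in" svg:height="4.328in" '
    'draw:z-index="0">'
    '<draw:image xlink:href="Pictures/{name}" xlink:type="simple" '
    'xlink:show="embed" xlink:actuate="onLoad"/>'
    '</draw:frame>'
)


def attribute_changes(old, new, path=""):
    """Recursively list (path, attribute, old value, new value) differences.

    Children are paired by position, which is enough for two frames with
    identical structure but not for a general XML diff.
    """
    here = f"{path}/{old.tag}"
    changes = [
        (here, key, old.get(key), new.get(key))
        for key in sorted(set(old.attrib) | set(new.attrib))
        if old.get(key) != new.get(key)
    ]
    for child_old, child_new in zip(old, new):
        changes.extend(attribute_changes(child_old, child_new, here))
    return changes


old = ET.fromstring(
    FRAME.format(ns=NS_DECLS, name="1000000000000780000004B0D94E6AD8.jpg"))
new = ET.fromstring(
    FRAME.format(ns=NS_DECLS, name="1000000000000780000004B07DFCDB06.jpg"))

for _path, key, before, after in attribute_changes(old, new):
    print(key, before, "->", after)
```

ElementTree reports attribute names in `{namespace}local` form, so the single difference found is keyed as `{http://www.w3.org/1999/xlink}href`.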


  • 10.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 10-26-2011 14:38
    Yes, some confusion here I think; we have drifted off the original topic, which was the need to diff data to find out what has changed. Ben asserted that in ECT this diffing of draw:frame was needed to establish that only the ct:id attribute had changed, but in GCT the change to that specific attribute was explicit.

    Ben's assertion is true: he says that with the ECT representation it is necessary to compare the old cached draw:frame with the current one to work out whether the ct:id has changed or perhaps the caption; it could be either or both (or something else, perhaps the svg:width attribute etc). The point is that with a cached element you do not know which bit has changed; you need to work it out.

    Andreas points out that the attribute change may not be a 'real' change, which is a fair point, and the (producing) application would need to work that out and not indicate a false change. (I don't think we have ever discussed how to show a change in the situation where the .jpg file (in your example) has been edited but has the same name and size, i.e. the draw:frame XML has not changed at all. I think that is outside the scope of what either ECT or GCT is trying to achieve.) But Andreas's point does not invalidate Ben's assertion.

    Therefore the statement in the consensus report, 8.2, that the 'Bucket' approach means readers need to do comparison work to work out the detail of changes is correct, though we could perhaps give some more prose around that in the discussion of different approaches. Perhaps 'readers' should be 'consumers'.

    Robin


  • 11.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 10-26-2011 17:26
    On Wed, 2011-10-26 at 08:37 -0600, Robin LaFontaine wrote:
    > Yes, some confusion here I think, we have drifted off the original
    > topic which was to do with the need to diff data to find out what has
    > changed. Ben asserted that in ECT this diffing of draw:frame was
    > needed to establish that only the ct:id attribute had changed, but in
    > GCT the change to that specific attribute was explicit.
    >
    > Ben's assertion is true: he says that it is necessary with the ECT
    > representation to compare the old cached draw:frame with the current
    > one to work out if the ct:id has changed or perhaps the caption, it
    > could be either or both (or something else, perhaps the svg:width
    > attribute etc). The point is that with a cached element you do not
    > know which bit has changed, you need to work it out.
    >
    > Andreas points out that the attribute change may not be a 'real'
    > change, which is a fair point and the (producing) application would
    > need to work that out and not indicate a false change. (I don't think
    > we have ever discussed how to show a change in the situation where
    > the .jpg file (in your example) has been edited but has the same name
    > and size, i.e. the draw:frame XML has not changed at all. I think that
    > is outside the scope of what either ECT or GCT is trying to achieve.)

    But this really should not be outside the scope! The first image in a Gnumeric-generated ODS file is always Image1.xxx. So replacing a jpg with another jpg will always retain the image name in the package. In ECT this would be marked as a draw:frame change, I would think. What would be happening under GCT?

    Andreas
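
One way a consumer could notice the case raised here, where the picture is replaced but keeps its name in the package, is to compare a digest of the file contents. The Python sketch below is only an illustration: neither ECT nor GCT is described in this thread as doing this, the member path `Pictures/Image1.jpg` is an assumed name, and the tiny in-memory zip files merely stand in for real ODF packages.

```python
import hashlib
import io
import zipfile

MEMBER = "Pictures/Image1.jpg"  # assumed path, for illustration only


def build_package(image_bytes: bytes) -> bytes:
    """Build a tiny in-memory zip standing in for an ODF package."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr(MEMBER, image_bytes)
    return buffer.getvalue()


def member_digest(package_bytes: bytes, member: str) -> str:
    """Return the SHA-256 hex digest of one file inside the package."""
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        return hashlib.sha256(package.read(member)).hexdigest()


old_pkg = build_package(b"first picture bytes")
new_pkg = build_package(b"other picture bytes")

# Same member name in both packages, different content.
content_changed = member_digest(old_pkg, MEMBER) != member_digest(new_pkg, MEMBER)
print(content_changed)  # True
```

A digest only says that the bytes differ; it cannot tell a real picture change from a re-encoding of the same picture.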


  • 12.  Re: [office-collab] Some thoughts on Change Tracking

    Posted 09-19-2011 14:14
    Thanks for this - this kind of reflection and summary of your considered opinion is useful as reference material for the consensus report. I would encourage others to do this if you have reflections on the discussions we have had and wish to summarise the key point(s) from your viewpoint.

    There has been a lot of traffic and I will do my best to reflect the main points in the report.

    Robin

    On 14/09/2011 08:10, monkeyiq wrote:
    > Hi, Firstly, apologies for missing the last conference call. Since we
    > are gathering wiki pages for issues with GCT and ECT I thought I would
    > reiterate some of my thoughts FWIW. I think most of the points in this
    > email have been discussed before. I am bringing them together to help
    > make it easier for them to be included and referenced in any document
    > such as consensus reports etc. Note that I'm just an implementer, so
    > any thoughts I express below are more or less how I imagine change
    > tracking might be desired by certain consumers.
    > ..snip