OpenDocument - Adv Document Collab SC

  • 1.  Serialization and change tracking

    Posted 06-21-2011 16:11
    While looking at the example in the ECT proposal (the one named UC1 in Robin's earlier email) I have spotted a potential problem with different, but semantically equivalent serializations of the document and the way that recorded change transactions can be processed.

    The example given is of a paragraph that uses a bold paragraph style. A plain text character style is then applied to one of the words (ct1) followed by an underline character style being applied to part of that same word (ct2). The three versions of the paragraph are:

    1) Remove bold
    2) Remove bold
    3) Remove bold

    Assuming that the relevant paragraph and character styles have been defined, the three versions of the paragraph can be represented by:

    1)
    <text:p text:style-name="BoldStyle">
      Remove bold
    </text:p>

    2)
    <text:p text:style-name="BoldStyle">
      Remove
      <text:span text:style-name="NormalStyle">bold</text:span>
    </text:p>

    3)
    <text:p text:style-name="BoldStyle">
      Remove
      <text:span text:style-name="NormalStyle">b</text:span>
      <text:span text:style-name="UnderlinedNormalStyle">ol</text:span>
      <text:span text:style-name="NormalStyle">d</text:span>
    </text:p>

    The above representation is the cause for the potential problem because the central span mixes the effects of ct1 and ct2 in such a way that it is no longer possible, with either the ECT or GCT representation, to reject ct1 as a standalone operation. A rejection of ct1 should leave the word 'bold' in a bold face with the letters 'ol' underlined.
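To make the style inheritance in these three versions concrete, here is a minimal Python sketch that resolves the effective formatting of each character. It assumes a hypothetical style table (`STYLES`) in which each style sets only the properties it names and inherits the rest from the enclosing element; the helper names (`para`, `resolve`, `last_word`) are illustrative only. Only the `text` namespace URI comes from ODF; real ODF style resolution also involves parent and default styles, which are ignored here.

```python
import xml.etree.ElementTree as ET

TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
STYLE_NAME = f"{{{TEXT_NS}}}style-name"   # qualified name of text:style-name

# Hypothetical style definitions: each style sets only the properties it names,
# everything else is inherited from the enclosing element.
STYLES = {
    "BoldStyle": {"bold": True},
    "NormalStyle": {"bold": False},
    "UnderlinedNormalStyle": {"bold": False, "underline": True},
}
DEFAULTS = {"bold": False, "underline": False}


def para(inline):
    """Wrap inline markup in a BoldStyle paragraph, declaring the text: namespace."""
    return f'<text:p xmlns:text="{TEXT_NS}" text:style-name="BoldStyle">{inline}</text:p>'


def resolve(elem, inherited, out):
    """Append (char, bold, underline) for each character; inner styles win."""
    props = {**inherited, **STYLES.get(elem.get(STYLE_NAME), {})}
    for ch in elem.text or "":
        out.append((ch, props["bold"], props["underline"]))
    for child in elem:
        resolve(child, props, out)
        for ch in child.tail or "":  # tail text is formatted by elem, not child
            out.append((ch, props["bold"], props["underline"]))
    return out


def last_word(p_xml):
    """Formatting of the final four characters, i.e. the word 'bold'."""
    return resolve(ET.fromstring(p_xml), DEFAULTS, [])[-4:]


V1 = para("Remove bold")
V2 = para('Remove <text:span text:style-name="NormalStyle">bold</text:span>')
V3 = para('Remove <text:span text:style-name="NormalStyle">b</text:span>'
          '<text:span text:style-name="UnderlinedNormalStyle">ol</text:span>'
          '<text:span text:style-name="NormalStyle">d</text:span>')

for name, version in [("1", V1), ("2", V2), ("3", V3)]:
    print(name, last_word(version))
```

Version 3 renders 'b' and 'd' as plain and 'ol' as plain underlined, but nothing in the markup records that 'ol' was first made plain by ct1.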
    The ECT representation of change for this paragraph is given as:

    <text:p text:style-name="BoldStyle">
      Remove
      <ct:format-change-start ct:id="1"/>
      <text:span text:style-name="NormalStyle">b</text:span>
      <ct:format-change-end ct:id="1"/>
      <ct:format-change-start ct:id="2"/>
      <text:span text:style-name="UnderlinedNormalStyle">ol</text:span>
      <ct:format-change-end ct:id="2"/>
      <ct:format-change-start ct:id="1"/>
      <text:span text:style-name="NormalStyle">d</text:span>
      <ct:format-change-end ct:id="1"/>
    </text:p>

    In order to reject ct1, the spans contained immediately within start/end markers for ct1 should be removed (as I understand it), which would leave 'b' and 'd' as bold text but 'ol' would remain as plain underlined text. In fact, with the ECT representation, rejecting ct2 in the same way would leave 'b' and 'd' as plain text but would incorrectly revert 'ol' to bold text at the same time as removing the underline.

    The GCT representation is more verbose but still has the same problem that ct1 cannot be rejected on its own. In this case, ct2 can be rejected correctly.

    If an alternative representation using nested spans were used, the problem could be avoided:

    <text:p text:style-name="BoldStyle">
      Remove
      <text:span text:style-name="NormalStyle">b<text:span text:style-name="UnderlinedStyle">ol</text:span>d</text:span>
    </text:p>

    If this representation is used, ct1 can be rejected (by removing the outer span), leaving 'ol' correctly underlined but bold. This would work correctly in both proposals as far as I can tell.

    The three-span representation, which is used by OpenOffice.org at least (I haven't tested other implementations), although visually correct, does not represent the editing operations that have occurred very well, and so they cannot be tracked well (the relationship between change tracking and editing operations has been well established in previous discussions).
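The effect described above can be reproduced with a small Python model. This is a sketch under stated assumptions: spans are plain objects tagged with the id of the change that created them, rejection is modelled naively as unwrapping those spans (the reading of ECT rejection given above), and the style table is hypothetical. It shows that the flat and nested serializations render identically, yet give different results when ct1 is rejected on its own.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union

# Hypothetical style table: each style sets only the properties it names,
# anything else is inherited from the enclosing span or paragraph.
STYLES = {
    "BoldStyle": {"bold": True},
    "NormalStyle": {"bold": False},
    "UnderlinedNormalStyle": {"bold": False, "underline": True},
    "UnderlinedStyle": {"underline": True},
}
PARAGRAPH = {"bold": False, "underline": False, **STYLES["BoldStyle"]}


@dataclass
class Span:
    style: str
    change: Optional[str]  # id of the change that created this span
    children: List[Union[str, "Span"]] = field(default_factory=list)


def render(nodes, props):
    """Yield (char, bold, underline); inner span styles override outer ones."""
    for node in nodes:
        if isinstance(node, str):
            for ch in node:
                yield ch, props["bold"], props["underline"]
        else:
            yield from render(node.children, {**props, **STYLES[node.style]})


def reject(nodes, change_id):
    """Naive rejection: unwrap every span created by change_id, keeping its content."""
    out = []
    for node in nodes:
        if isinstance(node, str):
            out.append(node)
        elif node.change == change_id:
            out.extend(reject(node.children, change_id))
        else:
            out.append(Span(node.style, node.change, reject(node.children, change_id)))
    return out


def word(nodes):
    """Formatting of the last four characters ('bold')."""
    return list(render(nodes, PARAGRAPH))[-4:]


# Three sibling spans (version 3) and the nested alternative.
flat = ["Remove ",
        Span("NormalStyle", "ct1", ["b"]),
        Span("UnderlinedNormalStyle", "ct2", ["ol"]),
        Span("NormalStyle", "ct1", ["d"])]
nested = ["Remove ",
          Span("NormalStyle", "ct1", ["b", Span("UnderlinedStyle", "ct2", ["ol"]), "d"])]

print(word(flat) == word(nested))  # same rendering before any rejection
print(word(reject(flat, "ct1")))   # 'ol' stays non-bold
print(word(reject(nested, "ct1")))  # 'ol' becomes bold and underlined
```

Rejecting ct2 on the flat model in the same naive way turns 'ol' bold again, which is the incorrect result noted above; caching the prior style (as the ECT example does) is what avoids it.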
    We need to consider how to advise implementors of the best way to serialize changes such as these in order to best facilitate change tracking.

    --
    Tristan Mitchell, DeltaXML Ltd Change control for XML
    T: +44 1684 869 035 E: tristan.mitchell@deltaxml.com http://www.deltaxml.com
    Registered in England 02528681 Reg. Office: Monsell House, WR8 0QN, UK


  • 2.  RE: [office-collab] Serialization and change tracking

    Posted 06-21-2011 18:59
    Unless I’m missing something, I see no issues with the cases you mention…

    The example given in the document should indicate ct2 can be rejected properly. Remember that the previous format state of the ct2 text is cached in the text:tracked-changes section. From the example markup:

    <text:changed-region text:id="2">
      <text:format-change>
        <office:change-info> … </office:change-info>
        <text:span text:style-name="NormalStyle"/>
      </text:format-change>
    </text:changed-region>

    This stores the NormalStyle applied to “ol” prior to underlining it, so an application properly rejecting/reverting the change should restore the NormalStyle. In the same way that the app must split the single pair of ct1 change marks (around “bold”) into two pairs (one each for “b” and “d”) when ct2 is made, an application might recognize that after rejecting ct2 it has 3 adjacent spans of NormalStyle and choose to combine them back down to a single span.*

    As for rejecting ct1 without rejecting ct2, I see two possibilities for runtime behavior:
    1 – Changes can be rejected in arbitrary order. Rejecting ct1 correctly leaves “ol” as non-bold underlined text, as ct2 has not been rejected.
    2 – Changes are rejected as a stack. Rejecting ct1 means the application must unroll the stack and reject ct2 first. As described above, rejecting ct2 brings “ol” back to NormalStyle, and rejecting ct1 brings it all back to BoldStyle via merging the ct1 change marks.

    * There is a complication I see for ensuring the markup makes clear to applications whether to merge, but it appears to be a minor implementation challenge that is solvable within the provisions of the file format we’re discussing. I’ll touch on that next.
    John
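The two steps described in this reply (restore the cached prior style when rejecting ct2, then optionally merge adjacent identical spans) could be sketched as follows. Assumption, not taken from the ECT markup: the hypothetical `PRIOR` cache also records which change scope the text belonged to before ct2 (here ct1); how the markup should signal this is the merge complication mentioned in the footnote. `Run`, `reject_format_change` and `merge_adjacent` are illustrative names.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Run:
    text: str
    style: str
    change: Optional[str]  # id of the change marks surrounding this run


# Hypothetical cache modelled on <text:changed-region text:id="2">: the prior
# style of the changed text plus (an assumption) its prior change scope.
PRIOR: Dict[str, Tuple[str, Optional[str]]] = {"ct2": ("NormalStyle", "ct1")}


def reject_format_change(runs: List[Run], change_id: str) -> List[Run]:
    """Restore the cached prior style and scope of every run marked with change_id."""
    style, scope = PRIOR[change_id]
    return [replace(r, style=style, change=scope) if r.change == change_id else r
            for r in runs]


def merge_adjacent(runs: List[Run]) -> List[Run]:
    """Combine neighbouring runs whose style and change scope are identical."""
    merged: List[Run] = []
    for r in runs:
        if merged and (merged[-1].style, merged[-1].change) == (r.style, r.change):
            merged[-1] = replace(merged[-1], text=merged[-1].text + r.text)
        else:
            merged.append(r)
    return merged


runs = [Run("b", "NormalStyle", "ct1"),
        Run("ol", "UnderlinedNormalStyle", "ct2"),
        Run("d", "NormalStyle", "ct1")]
print(merge_adjacent(reject_format_change(runs, "ct2")))
# [Run(text='bold', style='NormalStyle', change='ct1')]
```

After the merge there is a single ct1 region again, so a later rejection of ct1 would bring the whole word back to the paragraph's BoldStyle.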


  • 3.  Re: [office-collab] Serialization and change tracking

    Posted 06-22-2011 10:11
    John,

    Apologies, I had not picked up on the cached previous style state in the example and so yes, as you say, there are no issues with rejecting ct2 with the ECT example.

    I'm not sure I agree with you on what style the 'ol' should be in after rejecting just ct1. I suppose it depends on how the change is made in the first place. The starting state of the text is that it inherits the bold paragraph style. There is more than one way to (sensibly) get to the final stage:

    1) If you select the word 'bold' and press the bold button or Ctrl-B (or whatever your favourite interface does to toggle bold) you have completed ct1. The next step is to highlight 'ol' and press the underline button, or Ctrl-U. This is ct2. In this case, I think that a rejection of ct1 should leave the whole of the word 'bold' as bold text with the 'ol' underlined.

    2) Rather than using the buttons or key shortcuts, you could set up character styles. If style A specifies non-bold text and is applied to the whole of the word 'bold', this is ct1. Another style specifies non-bold text (arguably unnecessarily) and underlining and is applied to 'ol'. This is ct2. In this case I agree that a rejection of ct1 should leave 'b' and 'd' as bold with 'ol' being non-bold underlined.

    The problem then becomes one of storing these subtly different changes in such a way that the rejection of ct1 gives a different result in each case.

    Performing the two different sets of actions above in OpenOffice.org does not give different results (other than that in 1. the styles are automatic styles and in 2. they are defined in the styles.xml file). I would have expected action set 1 to produce nested spans, but it produces three sibling spans as in the example given in the ECT proposal.

    On 21 Jun 2011, at 19:58, John Haug wrote:
    > Unless I’m missing something, I see no issues with the cases you mention…
    > ..snip

    --
    Tristan Mitchell, DeltaXML Ltd "Change control for XML"
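The difference between the two action sets can be modelled by whether a change is recorded as a delta on the existing formatting or as a fully specified set of properties. The sketch below is hypothetical (the record layout, `render`, and the rule of replaying surviving changes in recorded order are assumptions, not part of either proposal); it shows that both action sets produce the same final formatting but diverge when only ct1 is rejected.

```python
from typing import Dict, List, Set, Tuple

Props = Dict[str, bool]
# A hypothetical change record: (change id, start, end, mode, properties).
# "delta" merges its properties into whatever formatting is already there;
# "full" replaces the character formatting with a complete property set.
Change = Tuple[str, int, int, str, Props]

WORD = "bold"
PARAGRAPH: Props = {"bold": True, "underline": False}  # inherited from BoldStyle


def render(changes: List[Change], rejected: Set[str]) -> List[Tuple[str, bool, bool]]:
    """Replay the non-rejected changes, in recorded order, over the paragraph formatting."""
    chars = [dict(PARAGRAPH) for _ in WORD]
    for cid, start, end, mode, props in changes:
        if cid in rejected:
            continue
        for i in range(start, end):
            chars[i] = {**chars[i], **props} if mode == "delta" else dict(props)
    return [(c, p["bold"], p["underline"]) for c, p in zip(WORD, chars)]


# Action set 1: toggle buttons (Ctrl-B on 'bold', then Ctrl-U on 'ol').
toggles: List[Change] = [("ct1", 0, 4, "delta", {"bold": False}),
                         ("ct2", 1, 3, "delta", {"underline": True})]
# Action set 2: character styles that fully specify the formatting.
styles: List[Change] = [("ct1", 0, 4, "full", {"bold": False, "underline": False}),
                        ("ct2", 1, 3, "full", {"bold": False, "underline": True})]

print(render(toggles, set()) == render(styles, set()))  # identical final state
print(render(toggles, {"ct1"}))  # 'ol' bold and underlined
print(render(styles, {"ct1"}))   # 'ol' non-bold and underlined
```

Because the final states are identical, a serializer that writes only the resulting spans (as the three-sibling-span output does) cannot preserve the distinction.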


  • 4.  Re: [office-collab] Serialization and change tracking

    Posted 06-22-2011 14:46
    UC1

    These discussions raise an interesting issue, independent of the representation. Having done some experiments with OOo and Word, it appears that if (in a bold paragraph per UC1) three words are selected and made normal, then the middle one is made underlined, the changes can be undone in reverse order in the editor session, but when saved in OOo or change tracked in Word, the result is three independent format spans. So it is not possible to undo the edits that were actually made.

    So it seems that what is actually change tracked in Word (note I do not have the latest version; I am using Word:mac 2008) could be represented as this:

    <text:p text:style-name="BoldStyle">Remove
      <text:span text:style-name="NormalStyle" delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1'>b</text:span>
      <text:span text:style-name="UlNormalStyle" delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct2'>ol</text:span>
      <text:span text:style-name="NormalStyle" delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1'>d</text:span>
    </text:p>

    If ct2 is undone, the 'ol' returns to BoldStyle, which is not strictly correct. On the other hand, because all three spans are 'independent', the formatting on each can be undone independently. This is simpler to represent and arguably simpler for the user to understand.

    A better representation would be the equivalent of John's ECT representation (the improved one in the later email of 21/06/2011), and in GCT this is as below, where the formatting of 'ol' reverts more correctly back to NormalStyle if ct2 is undone.
    <text:p text:style-name="BoldStyle">Remove
      <text:span text:style-name="NormalStyle" delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1'>b</text:span>
      <text:span text:style-name="UlNormalStyle" ac:change001="ct2,modify,text:style-name,NormalStyle"
          delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1'>ol</text:span>
      <text:span text:style-name="NormalStyle" delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1'>d</text:span>
    </text:p>

    Then, as Tristan points out, the nested representation is closer to what the user did, as follows in GCT:

    <text:p text:style-name="BoldStyle">
      Remove
      <text:span text:style-name="NormalStyle"
          delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct1'>b<text:span text:style-name="UnderlinedStyle"
          delta:insertion-type='insert-around-content' delta:insertion-change-idref='ct2'>ol</text:span>d</text:span>
    </text:p>

    So, this raises the question of what we are trying to represent in the CT format:
    1. the representation of the changes in the internal data structure of an editing application, or
    2. an enhanced version of this representation which allows reverse-order undo, or
    3. something closer to the actions of the user making the edit, which in this example could be undone in any order (though probably not in more complex cases).

    What are we aiming for?

    On 22/06/2011 11:10, Tristan Mitchell wrote:
    > .snip

    ---------------------------------------------------------------------
    To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at: https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

    --
    Robin La Fontaine, Director, DeltaXML Ltd Change control for XML
    T: +44 1684 592 144 E: robin.lafontaine@deltaxml.com http://www.deltaxml.com
    Registered in England 02528681 Reg. Office: Monsell House, WR8 0QN, UK
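The GCT change attributes used above pack a change id, an operation, an attribute name and, for `modify`, the previous value into one comma-separated string. A minimal sketch of reading and undoing such records on a single element might look like this. It assumes exactly that field order (as in `ct2,modify,text:style-name,NormalStyle` and `ct1,add,text:style-name`), values without commas, and attributes held in a plain dict keyed by prefixed name; `parse_change` and `reject_attribute_change` are hypothetical helpers. Undoing a span insertion recorded via `delta:insertion-change-idref` is a separate mechanism and is not handled.

```python
from typing import Dict, Optional, Tuple


def parse_change(value: str) -> Tuple[str, str, str, Optional[str]]:
    """Split e.g. 'ct2,modify,text:style-name,NormalStyle' into its fields."""
    parts = value.split(",", 3)
    if len(parts) < 3:
        raise ValueError(f"malformed change record: {value!r}")
    old = parts[3] if len(parts) == 4 else None
    return parts[0], parts[1], parts[2], old


def reject_attribute_change(attrs: Dict[str, str], change_id: str) -> Dict[str, str]:
    """Return a copy of one element's attributes with change_id's edits undone."""
    result = dict(attrs)
    for key, value in attrs.items():
        if not key.startswith("ac:change"):
            continue
        cid, op, attr, old = parse_change(value)
        if cid != change_id:
            continue
        del result[key]  # the change record itself goes away
        if op == "add":
            result.pop(attr, None)  # the attribute did not exist before
        elif op == "modify" and old is not None:
            result[attr] = old  # restore the recorded previous value
        else:
            raise ValueError(f"unsupported operation in this sketch: {op!r}")
    return result


ol_span = {
    "text:style-name": "UlNormalStyle",
    "ac:change001": "ct2,modify,text:style-name,NormalStyle",
    "delta:insertion-type": "insert-around-content",
    "delta:insertion-change-idref": "ct1",
}
print(reject_attribute_change(ol_span, "ct2"))
```

Rejecting ct2 leaves the 'ol' span in NormalStyle, still marked as inserted by ct1, which is the more correct reversion described above.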


  • 5.  RE: [office-collab] Serialization and change tracking

    Posted 06-22-2011 17:59
    Tristan:
    > I suppose it depends on how the change is made in the first place.

    Yes, I think that’s the rub. Off the top of my head, I think it varies based on how the style change is effected in markup – along the lines of, is it specified as a delta (“add underline to whatever style is there”) or as a fully-specified style (“normal text, underlined”). In the first case, rejecting ct1 reverts the “whatever style is there” and leaves an underline on bold. In the second, rejecting ct1 doesn’t change the fact that “ol” is “normal text, underlined”. So, as with many things, it looks like application behavior. But since the two approaches above (which I believe are both possible) result in different markup, I think it’s interoperable and not idiosyncratic to a given app.

    Robin:
    > So, this raises the question of what we are trying to represent in the CT format:
    > 1. the representation of the changes in the internal data structure of an editing application or
    > 2. an enhanced version of this representation which allows reverse-order undo
    > 3. something closer to the actions of the user making the edit, which in this example could be undone in any order (though probably not in more complex cases)

    From my/our perspective, I think we’ve noted before that we feel arbitrary-order rejection is important. I think this is the dominant user mode as documents are created, edited, passed around, edited more, etc. That said, there seems to be a tradition among some ODF implementations of maintaining a clear notion of order of changes, at least for presentation to the user, even though they seem to support both ordered and arbitrary rejection. I’ve mentioned elsewhere that I think it’s important to be closer to what users are doing so that those usage scenarios are supportable by the file format. As long as the format supports data storage for an implementation to interpret it within whatever scope of features it chooses to support, we’re in good shape.
    I think that still leads us closer to 3 than 1 without disallowing 2’s reverse-order handling.
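The two runtime behaviours discussed in this thread (rejection in arbitrary order versus unwinding changes as a stack) differ in which changes an application has to reject to remove a given one. A minimal sketch, assuming a simple list of change ids in the order they were recorded; `rejection_plan` is a hypothetical helper, not part of either proposal:

```python
from typing import List


def rejection_plan(log: List[str], target: str, stack_order: bool) -> List[str]:
    """Change ids an application would reject, in order, to get rid of target."""
    if target not in log:
        raise KeyError(target)
    if not stack_order:
        return [target]  # arbitrary order: only the target change
    newer = log[log.index(target) + 1:]  # changes recorded after target
    return list(reversed(newer)) + [target]  # unwind newest first


log = ["ct1", "ct2"]
print(rejection_plan(log, "ct1", stack_order=False))  # ['ct1']
print(rejection_plan(log, "ct1", stack_order=True))   # ['ct2', 'ct1']
```

Supporting the arbitrary-order plan in the file format does not prevent an application from choosing the stack plan instead, which matches the "closer to 3 than 1 without disallowing 2" position.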


  • 6.  Re: [office-collab] Serialization and change tracking

    Posted 06-24-2011 10:03
    John:

    Thanks for your comments; it does make sense to allow arbitrary-order rejection, though it is a bit of a challenge.

    Although the nested span solution does work for the simple example we have here, it does not scale to more complex situations. I have been trying to work out if there is a way round this. I think that there may be if we can make two rules as follows:
    1. Spans cannot be nested.
    2. Two consecutive spans that have exactly the same attributes are semantically equivalent to a single span with those attributes.
    (There is also the rule that a span with no attributes is semantically equivalent to no span... but I think that is the case now.)

    This allows us to change the start position, which I think is the key to allowing a scalable solution. Currently, we have a start position as follows:

    <text:p text:style-name="BoldStyle">Remove bold</text:p>

    However, if we can change this to a start position like this:

    <text:p text:style-name="BoldStyle">Remove
      <text:span>b</text:span>
      <text:span>ol</text:span>
      <text:span>d</text:span>
    </text:p>

    then our problem is reduced to simply specifying a history of attributes on these spans. Therefore the solution would look something like this in GCT (you will do a better job of the equivalent ECT than I can):

    <text:p text:style-name="BoldStyle">Remove
      <text:span text:style-name="NormalStyle" ac:change001="ct1,add,text:style-name">b</text:span>
      <text:span text:style-name="UlNormalStyle" ac:change001="ct1,add,text:style-name"
          ac:change002="ct2,modify,text:style-name,NormalStyle">ol</text:span>
      <text:span text:style-name="NormalStyle" ac:change001="ct1,add,text:style-name">d</text:span>
    </text:p>

    This solution can be scaled up to any arbitrary changes of text formatting for any combination of character sequences. It also allows rejection in an arbitrary order, although some calculations would have to be done to effect this.
    Robin

    On 22/06/2011 18:59, John Haug wrote:
    > ..snip
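The two rules plus a per-span attribute history can be prototyped in a few lines of Python. This is one possible version of the "calculations" mentioned above, not GCT or ECT syntax: each span keeps, per attribute, an ordered list of (change id, old value, new value) records; rejecting the newest record restores its old value, while rejecting an older record splices it out so the next record inherits its old value. With this assumed rule, rejecting ct1 alone leaves 'ol' non-bold and underlined (the arbitrary-order option 1 from earlier in the thread). A span whose attributes end up empty is treated as no span, per the parenthetical rule. `FlatSpan`, `reject`, `normalize` and `uc1` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# (change id, old value, new value); None means "attribute absent".
Record = Tuple[str, Optional[str], Optional[str]]


@dataclass
class FlatSpan:
    text: str
    attrs: Dict[str, str]
    # Per attribute name, the edit records in the order they were made.
    history: Dict[str, List[Record]] = field(default_factory=dict)


def reject(span: FlatSpan, change_id: str) -> None:
    """Remove change_id from every attribute history of span, whatever its position."""
    for name, records in span.history.items():
        for i, (cid, old, _new) in enumerate(records):
            if cid != change_id:
                continue
            if i == len(records) - 1:  # newest edit: restore its old value
                if old is None:
                    span.attrs.pop(name, None)
                else:
                    span.attrs[name] = old
            else:  # older edit: splice it out of the chain
                next_cid, _, next_new = records[i + 1]
                records[i + 1] = (next_cid, old, next_new)
            del records[i]
            break


def normalize(spans: List[FlatSpan]) -> List[FlatSpan]:
    """Rule 2: merge consecutive spans with identical attributes and live histories."""
    def live(h: Dict[str, List[Record]]) -> Dict[str, List[Record]]:
        return {k: list(v) for k, v in h.items() if v}

    out: List[FlatSpan] = []
    for s in spans:
        if out and out[-1].attrs == s.attrs and out[-1].history == live(s.history):
            out[-1].text += s.text
        else:
            out.append(FlatSpan(s.text, dict(s.attrs), live(s.history)))
    return out


def uc1() -> List[FlatSpan]:
    """The flat GCT example above, with ct1/ct2 recorded as attribute histories."""
    added = ("ct1", None, "NormalStyle")
    return [
        FlatSpan("b", {"text:style-name": "NormalStyle"}, {"text:style-name": [added]}),
        FlatSpan("ol", {"text:style-name": "UlNormalStyle"},
                 {"text:style-name": [added, ("ct2", "NormalStyle", "UlNormalStyle")]}),
        FlatSpan("d", {"text:style-name": "NormalStyle"}, {"text:style-name": [added]}),
    ]


spans = uc1()
for s in spans:
    reject(s, "ct1")  # reject ct1 first; ct2 is still in force
print([(s.text, s.attrs) for s in spans])
for s in spans:
    reject(s, "ct2")
print([(s.text, s.attrs) for s in normalize(spans)])
```

Rejecting ct1 and then ct2 collapses everything into one attribute-less span, which by the third rule is just the original bold text.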


  • 7.  Re: [office-collab] Serialization and change tracking

    Posted 06-26-2011 10:39
    On Fri, 2011-06-24 at 11:02 +0100, Robin LaFontaine wrote:
    ..snip
    FWIW this is pretty much the serialization I arrived at for abiword, though I arrived at it with the help of Frank Meies when trying to code example 6.4.2 from the GCT proposal [2]. A slightly cleaned-up version of the ODT that abiword [1] currently produces for the changes is below. Note that I have changed the style-name strings so that their meanings are self-evident. The edits that produced it were:

    id=1, Remove Bold
    id=2, Remove Bold
    id=3, Remove B ol d

    ODT fragment:

    <text:p text:style-name="Normal"
            delta:insertion-type="insert-with-content"
            delta:insertion-change-idref="1">
        <text:span text:style-name="bold" delta:insertion-change-idref="1"> Remove </text:span>
        <text:span text:style-name="norm" delta:insertion-change-idref="2"
                   ac:change3="1,insert,text:style-name,"
                   ac:change4="2,modify,text:style-name,bold"> B </text:span>
        <text:span text:style-name="normUL" delta:insertion-change-idref="3"
                   ac:change4="1,insert,text:style-name,"
                   ac:change5="2,modify,text:style-name,bold"
                   ac:change6="3,modify,text:style-name,norm"> ol </text:span>
        <text:span text:style-name="norm" delta:insertion-change-idref="2"
                   ac:change3="1,insert,text:style-name,"
                   ac:change4="2,modify,text:style-name,bold"> d </text:span>
    </text:p>

    change-id=3 is simple to undo here. Undoing a change that is midway through an ac:change set is a little more complex, as you have to shuffle its attribute value down the ac:change list.
    To illustrate, assume the following sequence of edits (change-id=N, edited text; the style after each change is shown in parentheses):

    id=1, foo (bold)
    id=2, foo (norm)
    id=3, foo (ital)
    id=6, foo (underlined)

    This should give a text:span with these GCT attributes:

    text:style-name="underlined"
    ac:change4="1,insert,text:style-name,"
    ac:change5="2,modify,text:style-name,bold"
    ac:change6="3,modify,text:style-name,norm"
    ac:change7="6,modify,text:style-name,ital"

    To revert id=3, the edit sequence then changes to the following:

    id=1, foo (bold)
    id=2, foo (norm)
    id=6, foo (underlined)

    The ODT attributes will have ac:change6 removed, but its attribute value (norm) has to replace that of the subsequent ac:change, giving:

    text:style-name="underlined"
    ac:change4="1,insert,text:style-name,"
    ac:change5="2,modify,text:style-name,bold"
    ac:change7="6,modify,text:style-name,norm"

    This of course avoids the sticky situation where one might want to apply change tracking to the reversion of a change.

    [1] https://github.com/monkeyiq/odf-2011-track-changes-git-svn
    [2] http://lists.oasis-open.org/archives/office-collab/201104/msg00046.html
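The shuffle described in this post can be sketched in Python as below. This is a minimal sketch, assuming each `ac:changeN` value has the comma-separated form `change-id,operation,attribute,old-value` used in the abiword fragment, that the entries for one attribute are listed in chronological order, and that the `ac:changeN` attribute names themselves can be renumbered freely. `Entry` and `reject` are hypothetical names. Having the successor also inherit the rejected entry's operation (so a later `modify` becomes an `insert` when the original `insert` is rejected) is an assumption of this sketch, not something stated in the thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    """One parsed ac:change value: 'change-id,operation,attribute,old-value' (hypothetical)."""
    change_id: str
    op: str         # e.g. "insert" or "modify", as in the abiword output
    attr: str       # e.g. "text:style-name"
    old_value: str  # value before this change ("" for an insert)

    @classmethod
    def parse(cls, raw: str) -> "Entry":
        change_id, op, attr, old_value = raw.split(",", 3)
        return cls(change_id, op, attr, old_value)

    def serialize(self) -> str:
        return f"{self.change_id},{self.op},{self.attr},{self.old_value}"


def reject(current: Optional[str], chain: list[str], change_id: str):
    """Reject one change from a chronological ac:change chain (hypothetical helper).

    Returns (new current attribute value, new chain); None means the
    attribute is absent. The input list is not modified.
    """
    entries = [Entry.parse(raw) for raw in chain]
    index = next((i for i, e in enumerate(entries) if e.change_id == change_id), None)
    if index is None:
        raise KeyError(f"change {change_id} not recorded on this attribute")
    removed = entries.pop(index)
    if index == len(entries):
        # Latest change: restore the value it overwrote (an undone insert removes the attribute).
        current = None if removed.op == "insert" else removed.old_value
    else:
        # Midway: the following change now starts from the state before the rejected one,
        # so it takes over the rejected entry's old value (and, by assumption, its operation).
        successor = entries[index]
        successor.old_value = removed.old_value
        successor.op = removed.op
    return current, [e.serialize() for e in entries]


chain = [
    "1,insert,text:style-name,",
    "2,modify,text:style-name,bold",
    "3,modify,text:style-name,norm",
    "6,modify,text:style-name,ital",
]
# Reverting id=3: the current value stays "underlined" and change 6 now records "norm".
new_current, new_chain = reject("underlined", chain, "3")
```

Rejecting the latest change (id=6 here) is the simple case: the span's current value just reverts to the old value that change recorded.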


  • 8.  Re: [office-collab] Serialization and change tracking

    Posted 07-01-2011 08:25
    Interesting that you came to this representation via another route; it is good to hear that it makes sense to you. I think we are possibly being distracted by trying to go from any arbitrary document to another (which GCT can do), when there are sometimes simpler ways to achieve the same thing. Maybe similar techniques can be applied to other areas. See my comment below about the detail of your representation.

    On 26/06/2011 11:38, monkeyiq wrote:
    ..snip

    Looking at your cleaned-up version, I think the representation should be a bit simpler than what you propose. Currently you have the attribute delta:insertion-change-idref="n" on the text:p and also on the text:span(s) within it, where it is not needed. I think your intention was that the document after change 1 but before change 2 is like this (assuming change 1 is to add the whole paragraph, as suggested by the insert-with-content attribute, rather than to make 'Remove bold' all bold):

    <text:p text:style-name="Normal"
            delta:insertion-type="insert-with-content"
            delta:insertion-change-idref="1">
        <text:span text:style-name="bold">Remove </text:span>
        <text:span text:style-name="bold">B</text:span>
        <text:span text:style-name="bold">ol</text:span>
        <text:span text:style-name="bold">d</text:span>
    </text:p>

    In other words, all the spans are there as part of the original content, because the application knows, before it serializes the data, all the change tracking it needs to record, and so it can decide how to split up the text to support this.
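Robin's point that the application knows every tracked change before it serializes can be made concrete: the span boundaries are just the union of the start and end offsets of all recorded formatting changes. A minimal Python sketch, assuming each change is given as a half-open character range `(start, end)` into the paragraph text; `pre_split` is a hypothetical helper, not part of either proposal:

```python
def pre_split(text: str, changes: dict[str, tuple[int, int]]) -> list[tuple[str, list[str]]]:
    """Split text at every change boundary and list the changes covering each run.

    Hypothetical helper: changes maps a change id to a half-open (start, end) range.
    """
    points = {0, len(text)}
    for start, end in changes.values():
        points.update((start, end))
    cuts = sorted(p for p in points if 0 <= p <= len(text))
    runs = []
    for a, b in zip(cuts, cuts[1:]):
        # Every cut is a range boundary, so each run lies wholly inside or outside each range.
        covering = sorted(cid for cid, (s, e) in changes.items() if s <= a and b <= e)
        runs.append((text[a:b], covering))
    return runs


# ct1 restyles "bold" (offsets 7-11), ct2 underlines "ol" (offsets 8-10).
print(pre_split("Remove bold", {"ct1": (7, 11), "ct2": (8, 10)}))
# [('Remove ', []), ('b', ['ct1']), ('ol', ['ct1', 'ct2']), ('d', ['ct1'])]
```

Each run then becomes one non-nested span whose attribute history records only the changes listed for it.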
    Then you get a simpler result. Moving to change 2 we have:

    <text:p text:style-name="Normal"
            delta:insertion-type="insert-with-content"
            delta:insertion-change-idref="1">
        <text:span text:style-name="bold">Remove </text:span>
        <text:span text:style-name="norm"
                   ac:change4="2,modify,text:style-name,bold">B</text:span>
        <text:span text:style-name="norm"
                   ac:change5="2,modify,text:style-name,bold">ol</text:span>
        <text:span text:style-name="norm"
                   ac:change4="2,modify,text:style-name,bold">d</text:span>
    </text:p>

    and moving to change 3:

    <text:p text:style-name="Normal"
            delta:insertion-type="insert-with-content"
            delta:insertion-change-idref="1">
        <text:span text:style-name="bold">Remove </text:span>
        <text:span text:style-name="norm"
                   ac:change4="2,modify,text:style-name,bold">B</text:span>
        <text:span text:style-name="normUL"
                   ac:change5="2,modify,text:style-name,bold"
                   ac:change6="3,modify,text:style-name,norm">ol</text:span>
        <text:span text:style-name="norm"
                   ac:change4="2,modify,text:style-name,bold">d</text:span>
    </text:p>

    Robin
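Each snapshot Robin lists can be recovered from the final (change 3) serialization alone, because every ac:change entry records the value its change overwrote. A minimal Python sketch, assuming the entries on a span all concern the same attribute, are in chronological order, and use the four-field `change-id,operation,attribute,old-value` form shown in this thread; `history` is a hypothetical helper:

```python
from typing import Optional


def history(current: Optional[str], chain: list[str]) -> list[tuple[str, Optional[str]]]:
    """Replay an ac:change chain backwards (hypothetical helper).

    Returns the attribute value before any recorded change ("initial"),
    then the value after each change, in order. None means the attribute is absent.
    """
    entries = [raw.split(",", 3) for raw in chain]
    # The old value stored by entry i is the value just before change i.
    before = [None if op == "insert" else old for _cid, op, _attr, old in entries]
    # The value after change i is the value before change i+1, or the current value.
    values = before + [current]
    labels = ["initial"] + [cid for cid, _op, _attr, _old in entries]
    return list(zip(labels, values))


# The 'ol' span from the change-3 serialization:
print(history("normUL", ["2,modify,text:style-name,bold", "3,modify,text:style-name,norm"]))
# [('initial', 'bold'), ('2', 'norm'), ('3', 'normUL')]
```

Applying the same replay to the 'B' and 'd' spans (and treating 'Remove ' as unchanged) reproduces the style names in the change-1 and change-2 snapshots, leaving aside the `ac:change` attributes themselves.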