OpenDocument - Adv Document Collab SC

  • 1.  GCT delta:merge and paragraphs

    Posted 06-22-2011 03:59
    Hi,

    I've been considering paragraph deletion and merging in the context of the GCT, in particular the abiword implementation and how to track document changes in the data model to allow simpler IO paths. I would imagine the abstract details to be of interest to others, especially corner cases which I've not considered yet, thus this email.

    Consider an implementation of the GCT which uses metadata to explicitly track the cases when a paragraph has its start or end deleted. Such a case might call for merging two paragraphs using a delta:merge, possibly with intermediate content deleted.

    It seems that tracking such paragraph merges is best performed at the time of content deletion: the user has a selection in the document and presses the delete key, and the program then works out how best to track this in its internal data model.

    I am thinking of an end-of-paragraph-deleted marker associated with a paragraph in some way. para-end-deleted is set to true if the deletion of the selection can coalesce the next object at the same document scope:

    * this is the case if deleting from a para to another para
    * this is the case if deleting from a para through an entire table to another para
    * this is NOT the case if deleting from a para into an image (draw:frame). In this case the para content to its end is deleted and the image is wrapped in delta:removed-content.
    * this is NOT the case if deleting from a para to a para in a table or to a para in another cell of this or another table.

    For the last case, consider a document with para1, para2...paran, tableA, ... celly. If the selection extends from "ra1" (the tail of para1) through into celly then:

    * The selected content of para1, "ra1", is deleted with delta:removed-content.
    * para2...paran are enclosed in delta:removed-content to be deleted.
    * Each cell in the selection which is from tableA is handled as a subdocument.
When in a table cell, deletion is treated as though it operates on a subdocument:

* The first and last paragraph of the cell cannot have their start/end of paragraph deleted markers set to true.
* As an example, if a cell has three paragraphs in it and is entirely contained in the selection to be deleted then it might be delta:merged to a single paragraph, or perhaps, since the whole cell content is deleted, the old content might be wrapped in a delta:removed-content and a fresh text:p inserted at this change-id.

Another case: the selection starts from a table cell, cellx, and extends beyond that table, including many top level paragraphs (p4,5,6), and into another subsequent table to celly. One would like to contain the paragraphs (p4,5,6) in a single <delta:removed-content> element rather than trying to start a delta:merge on them.

To handle the outright deletion case for p4,5,6 the paragraph might also have metadata with a removed-content=change-id. When such a paragraph is encountered during a save, a single delta:removed-content can be emitted to contain this paragraph and the contiguous subsequent paragraphs whose deleted change-id has the same value.

A similar para-start-deleted tag might be set to true if this para is in the selection and the previous para is contained in the selection and has para-end-deleted = true. The paragraphs must be at the same document level, for example in the same table cell or at top level.

I have been throwing table cells around in the above. Deletion across a cell boundary implies either deletion of the end content in one cell and the start content in the next, or a cell merge. The latter operation is normally performed explicitly through a menu selection. Considering each table cell as a subdocument does raise the question of table cell merge and paragraphs.
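The save-time behaviour described above (one delta:removed-content around each run of contiguous paragraphs that share a deleted change-id) could be sketched roughly as follows. This is only an illustration under assumptions: the `Para` class, its `removed_change_id` field and the `serialize` function are hypothetical names and do not reflect abiword's actual data model or exporter.

```python
from dataclasses import dataclass
from itertools import groupby
from typing import List, Optional
from xml.sax.saxutils import escape


@dataclass
class Para:
    """Hypothetical in-memory paragraph; not abiword's real data model."""
    text: str
    # Change-id in which the whole paragraph was deleted, or None if it is live.
    removed_change_id: Optional[str] = None


def serialize(paras: List[Para]) -> str:
    """Emit one delta:removed-content per run of contiguous paragraphs
    that were deleted in the same change."""
    out = []
    # groupby only groups *consecutive* items with an equal key, which is
    # exactly the "contiguous subsequent paragraphs" condition.
    for change_id, run in groupby(paras, key=lambda p: p.removed_change_id):
        body = "".join(f"<text:p>{escape(p.text)}</text:p>" for p in run)
        if change_id is None:
            out.append(body)  # live paragraphs are written as-is
        else:
            out.append(
                f'<delta:removed-content delta:removal-change-idref="{change_id}">'
                f"{body}</delta:removed-content>"
            )
    return "".join(out)
```

With paragraphs p4, p5 and p6 all carrying the same change-id, `serialize` wraps them in a single delta:removed-content; a paragraph deleted in a different change starts a new wrapper.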
The case of table cell merge seems to give this markup, starting at an original document fragment:

<table:table-row>
  <table:table-cell table:style-name="Table1.A1">
    <text:p text:style-name="Table_20_Contents">Merged cell</text:p>
  </table:table-cell>
  <table:table-cell table:style-name="Table1.B1">
    <text:p text:style-name="Table_20_ContentsA">c2</text:p>
    <text:p text:style-name="Table_20_ContentsB">c2p2</text:p>
  </table:table-cell>
  <table:table-cell table:style-name="Table1.C1">
    <text:p text:style-name="Table_20_Contents">c3</text:p>
  </table:table-cell>
</table:table-row>

The following ODT fragment will result after merging A1 with B1. Note that there are some newlines in the below for readability which will need to be removed so as not to alter the document content.

Looking bottom up, the old cell c2 is wrapped in a delta:removed-content to remove it. The text:p content that comprises cell2 is effectively moved to the first cell, so the old text:p elements in cell2 each contain a delta:move-id to track that. Perhaps the table cell itself might have a move-id attached so things could reference it instead of the subelement. IMHO it would seem nicer to reference the old identical element though, so the new text:p move-idrefs directly to the old text:p rather than to a parent of the old element.

The table:covered-table-cell is marked insert-with-content to mark the old cell location.

The new text:p elements in the first table cell contain the old content of cell2 with idrefs to the old content. As there were two text:p in the old cell2, there are two new text:p in cell1, each with a move-idref to its respective old text:p. The first table cell also has an ac:change on its table:number-columns-spanned attribute.
<table:table-row>
  <table:table-cell table:style-name="Table1.A1"
      ac:change001="2,insert,table:number-columns-spanned"
      table:number-columns-spanned="2">
    <text:p text:style-name="Table_20_Contents">Merged cell</text:p>
    <text:p
        delta:insertion-type="insert-with-content"
        delta:move-idref="mv33">c2</text:p>
    <text:p
        delta:insertion-type="insert-with-content"
        delta:move-idref="mv34">c2p2</text:p>
  </table:table-cell>
  <table:covered-table-cell
      delta:insertion-type="insert-with-content"
      delta:insertion-change-idref="2"/>
  <delta:removed-content delta:removal-change-idref="2">
    <table:table-cell table:style-name="Table1.B1">
      <text:p text:style-name="Table_20_ContentsA" delta:move-id="mv33">c2</text:p>
      <text:p text:style-name="Table_20_ContentsB" delta:move-id="mv34">c2p2</text:p>
    </table:table-cell>
  </delta:removed-content>
  <table:table-cell table:style-name="Table1.C1">
    <text:p text:style-name="Table_20_Contents">c3</text:p>
  </table:table-cell>
</table:table-row>
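Since each new text:p is meant to point at its old text:p through a delta:move-idref, a quick sanity check on a serialized fragment is to confirm that every move-idref has a matching move-id. The following is a minimal sketch only: the function name `unresolved_move_idrefs` is made up for illustration, and it uses plain regular expressions on double-quoted attributes rather than a namespace-aware XML parser, because the fragment above does not declare its namespaces.

```python
import re


def unresolved_move_idrefs(fragment: str) -> set:
    """Return the delta:move-idref values with no matching delta:move-id.

    Only double-quoted attribute values are recognised (an assumption that
    holds for the fragment in this post).
    """
    # 'delta:move-id="' cannot match inside 'delta:move-idref="' because
    # the latter has 'ref' between 'id' and '='.
    ids = set(re.findall(r'delta:move-id="([^"]*)"', fragment))
    refs = set(re.findall(r'delta:move-idref="([^"]*)"', fragment))
    return refs - ids
```

For the merged row above the function returns an empty set, since mv33 and mv34 each appear once as a move-id and once as a move-idref.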


  • 2.  Re: GCT delta:merge and paragraphs

    Posted 06-30-2011 10:59
    On Wed, 2011-06-22 at 13:58 +1000, monkeyiq wrote:
    > Hi, I've been considering paragraph deletion and merging in the context of the GCT. In particular, the abiword implementation and how to track document changes in the data model to allow simpler IO paths. I would imagine the abstract details to be of interest to others, especially corner cases which I've not considered yet, thus this email.

    Replying to my previous email regarding markup for a delta:merge at the time when a selection exists and delete/backspace is pressed. This is mainly a brain dump to help others who might take the same implementation route in the future.

    One might also like to read the method documentation for pt_PieceTable::deleteSpanChangeTrackingAreWeMarkingDeltaMerge() in the future, in case new cases are discovered and added to the code. My github has the current state of things, but in the somewhat near future one should also check git master for abiword.

    https://github.com/monkeyiq/odf-2011-track-changes-git-svn/blob/2fbefdb33b9b957302b7e23947ae0362a65bc8c7/src/text/ptbl/xp/pt_PT_DeleteSpan.cpp#L255

    Also perhaps of interest, the abiword GCT test suite contains a bunch of documents which can be used to verify we are both doing what we should. The repository contains abw format documents which can be turned into ODF using a build of abiword with GCT enabled (the above git repo, or git master when merged). The command to use is:

    $ abiword -t odt -o /tmp/output.odt input-abiword-document.abw

    Documents of interest:

    https://github.com/monkeyiq/odf-2011-track-changes-tests/tree/master/para-split-and-merge

    There are other directories to test other functionality such as ac:change etc.

    To give a more abstract description than the header comments of the method cited above offer: I consider the range (startpos, endpos) of the selection in the document to see whether deleting this range would require a delta:merge if the document were serialized as ODF.
If it does then suitable markup is added during the deletion. The basic rules:

(1) It is not a delta merge if startpos and endpos are both fully contained in the same paragraph.

(2) It is not a delta merge if paragraph(startpos) is not in the same table cell as paragraph(endpos). If neither of these paragraphs is in a cell then it might yet be a delta merge. See point (6) if you think this is strange.

(3) If deleting from the very start or right to the very end of a paragraph X, and the process results in deleting paragraph X entirely, I prefer to serialize using removed-content rather than using a delta:merge. These two positions thus form a special case, and their appearance means it is not a delta:merge.

(4) Some of the office apps I've worked on use a special in-document marker to delimit paragraphs and other content. Abiword uses what is logically a 1 byte marker for the start of a paragraph, which lives at the end of the last line of the previous paragraph. For example, the ( $ ) position below is not shown visually and is the marker indicating that the second paragraph begins.

This is para1 $
This is the second paragraph, with only one sentence too.

In this case, if one is at the start of the second line and presses backspace, or at the end of the first line and presses delete, then it is a paragraph merge and should use a delta:merge. Likewise, if a selection starts at the $ and extends into (but not to the end of) the paragraph then this is also a delta:merge. For example, if the text in square brackets is the selection,

This is para1 [$
This is the second paragraph,] with only one sentence too.

One gets the result:

This is para1 with only one sentence too.
And the ODF might be (newlines and extra whitespace are added here for readability only):

<text:p delta:insertion-type="insert-with-content" delta:insertion-change-idref="1">
  This is para1
  <delta:merge delta:removal-change-idref="2">
    <delta:leading-partial-content></delta:leading-partial-content>
    <delta:intermediate-content></delta:intermediate-content>
    <delta:trailing-partial-content>
      <text:p delta:insertion-type="split" delta:insertion-change-idref="1">This is the second paragraph,</text:p>
    </delta:trailing-partial-content>
  </delta:merge>
  with only one sentence too.
</text:p>

(5) One might consider the case where the selection extends between two (or more) cells in a table as seen below, with the selection shown in square brackets. This can be treated as two cases of (3), because to get out of the first cell we have selected "right to the end" of the last paragraph, and to get into the second cell we have selected "right from the start" of the first paragraph in cell2.

First cell:
  cell1 para1
  c1pa[ra2
  c1para3

Second cell:
  cell2 para1
  cell] 2p2
  c2endpara

(6) If the two ODF XML elements can't legally coalesce then it is not a delta merge. An example would be somehow selecting from a paragraph into the caption of a subsequent image.

As always, the code is the true expression of things, and apologies if I missed something in this overview. FWIW I find attaching start, end, and whole deleted markers, along with the change-id in which these occurred, to the 1-byte paragraph markers quite effective for serialization to/from ODF.

Maybe some of this can be rolled into the GCT document to help other implementers. Of course I'd have to clean it up for readability and drop in some more ODF fragments. At least it's on a public mailing list already ;)
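As a rough illustration of rules (1), (2), (3) and (6) above, the decision can be written as a small predicate. This is a simplified sketch and not the abiword code: the `Pos` type, its fields and the `can_coalesce` flag are hypothetical, and the real method cited in this post may cover cases that this reading misses. Rule (5) needs no branch of its own here because a cross-cell selection is already rejected by the rule (2) check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Pos:
    """Hypothetical selection endpoint; not abiword's real position type."""
    para: int              # index of the containing paragraph
    cell: Optional[str]    # id of the containing table cell, None at top level
    at_para_start: bool    # endpoint sits at the very start of its paragraph
    at_para_end: bool      # endpoint sits at the very end of its paragraph


def is_delta_merge(start: Pos, end: Pos, can_coalesce: bool = True) -> bool:
    """Simplified reading of rules (1), (2), (3) and (6) from the post."""
    if start.para == end.para:
        return False  # rule (1): selection lies within a single paragraph
    if start.cell != end.cell:
        return False  # rule (2): different cells; None == None means top level
    if start.at_para_start or end.at_para_end:
        return False  # rule (3): a whole paragraph goes, use removed-content
    if not can_coalesce:
        return False  # rule (6): the two elements cannot legally merge
    return True
```

The example from rule (4), a selection starting at the $ marker at the end of para1 and ending part-way into the second paragraph, corresponds to `is_delta_merge(Pos(0, None, False, True), Pos(1, None, False, False))`, which this sketch treats as a delta merge.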