OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  Forensics: Working with the ODF 1.1 ODT version

    Posted 03-28-2011 16:53
    I have had an unexpected glitch in my wanting to start from the editable ODF Text version of the OASIS ODF 1.1 Standard. The versions of ODF consumers that I have do not match the pagination that is shown in the PDF of the OASIS ODF 1.1 Standard. The difference is substantial. The ODF consumers I have used all end up with fewer pages (719 pages in OO.o 2.4.1, 720 pages in OO.o 3.2.0, 718 pages in LibreOffice 3.3.2) instead of 738 pages in the PDF, and it appears to be related to how automatic page breaks are done.

    The screen capture illustrates the 1-page difference between OO.o 2.4.1 and LOffice 3.3.2. In 2.4.1, section 9 is one page longer. The creeping that ends up with one extra page begins in section 9.9 in the page break just before the Event Effect subsection. (OO.o 3.2.0 on a different machine has the break still different.) There may have been other differences in page breaks before this, but none of them changed the table of contents page numbers for any of the indexed headings.

    As discussed on today's call, this is not a material problem for creating an Errata 01 so long as we don't continue the practice of the ODF 1.0 Errata 01 of using page and line numbers. I agree, we can provide errata item descriptions in a way that does not require page and line references, and I had intended to do it that way.

    However, the problem with pagination differences is that I am then left to wonder what else is being done differently. For that reason, I would much prefer to use some sort of style or other adjustments to have a pagination that is preserved in interchange among ODF consumers and that leads to a matching PDF.

    In fact, in examining shorter pagination in a current ODF consumer, I saw that some table breaks were being presented incorrectly. This is apparently a bug in some consumers and I need to work around it. (And this is why it is important to understand what the pagination differences are all symptoms of and to find some way to mitigate them.)

    - Dennis

    PS: Here are details of the bug as something to watch out for when working on ODF TC specifications. I am not sure how to report it and to whom, but here are the symptoms:

    1. A table is split between two pages. The table is styled so that table rows are not themselves split, and headings are also preserved on continuation pages.

    2. Sometimes, the table is split such that the last table row before the split appears to have disappeared.

    3. Apparently the last row retained before the split is there, but it extends below the page body frame such that it is not visible. In the case I observed, none of the clipped row was visible at all, but it was observable that the top of another row was there (because the vertical border lines of the table could be seen extending below the last-visible row).

    4. That's probably enough to figure out what the edge case is. Of course, this bug is hard to report unless I capture a screen shot or a PDF fragment that shows the incorrect presentation. If I see any of those again I will capture screen shots.

    5. There are other differences in how schema fragments are split across pages. They can be cured by heavy-handed forcing of breaks but not by anything I've been able to do by adjusting styling or page parameters (footer distance, for example). (I notice that the jitter between schema fragment lines is also present but not always exactly the same compared to the original.)

    6. Just to demonstrate that I am not picking on the OO.o code base alone, I also opened the ODF 1.1 standard in Microsoft Office 2010 Word and got an even shorter document, although updating the table of contents makes it almost match the PDF in terms of the last page number, but only because the TOC blew up to where it ended on page 60 of 739. The Office 2010 version is not usable for my purposes because of other formatting discrepancies, especially for the schema fragments.

    PPS: I can imagine that there is something about lining up the pixels and font rendering that leads to these discrepancies of pagination, even though I have the same default printer in all cases. (But Windows 7 doesn't use the printer manufacturer's drivers any longer.) As David Wheeler mentioned on the call, these variations are well-known. My problem is that I can't tell whether the deviations hide more-material discrepancies such as defects in inter-page table splitting. This also is what has bothered me about having the editable form as the authoritative edition. There is no assurance that different users will see the same authoritative text. I think fidelity should trump reusability, and I have no evidence that the PDF renditions of these documents are worse in interchange than what I am seeing here.


  • 2.  Re: Forensics: Working with the ODF 1.1 ODT version

    Posted 03-28-2011 23:57
    Dennis,

    Your statement below:

    > As discussed on today's call, this is not a material problem for creating an Errata 01 so long as we don't continue the practice of the ODF 1.0 Errata 01 of using page and line numbers. I agree, we can provide errata item descriptions in a way that does not require page and line references, and I had intended to do it that way.

    captures all we need to do, at least as far as the errata.

    Hope you are having a great day!

    Patrick

    PS: The presentation "bug" you have discovered between implementations points, at least to me, to the need for more specifics with regard to page and layout models, so that users can rely upon getting the same layout despite different engines or applications. At a much cruder level, the same way that basic SVG rendering is the same no matter which SVG engine you use, at least for basic SVG. Personally I think that level of "interoperable" display is what users are going to interpret as interoperability to no small degree.


  • 3.  Re: [office] Forensics: Working with the ODF 1.1 ODT version

    Posted 03-29-2011 12:12
    Hi Dennis,

    I had a closer look at the ODF 1.1 ODF document: the final version of this document was created with StarOffice 8 PU2 (equivalent to OOo 2.0.2) under the Solaris SPARC operating system. The fonts used on Solaris differ from those used under, e.g., Windows. Thus, I think it will be very hard to find an ODF consumer for Windows that shows the same layout result as StarOffice 8 PU2/OOo 2.0.2 under that Solaris system. I opened the document with OOo 2.0.2 under Windows 7 64-bit and got 723 pages.

    My conclusion is that we should not base our errata on page and line references, as you already said.

    Regarding the table split defects you reported: I opened this document with OOo 3.3 under Windows 7 64-bit and investigated all tables that split at the end of a page. I did not experience any of the defects you reported. I recall that there were defects in the layout algorithm of one or another OOo 2.x version regarding splitting tables. These have been fixed. Thus I suggest using the latest release, in case you want to use OOo (or any of the alternatives that are based on OOo).

    Best regards, Oliver.
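
The producing application mentioned above is normally recorded inside the package itself. As a rough illustration (not something posted in this thread): an ODF package carries a meta.xml stream whose meta:generator element names the application that last saved the document, and whose meta:document-statistic element may carry a meta:page-count attribute reflecting that application's layout at save time. The following is a minimal Python sketch under those assumptions; the function names and the sample values (ExampleOffice/1.0, 3 pages) are invented for the demonstration and are not taken from the ODF 1.1 ODT file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces of the ODF metadata vocabulary.
META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"


def read_odf_meta(odf_file):
    """Return (generator, page_count) as recorded in the package's meta.xml.

    Either value is None when the producer did not record it.
    """
    with zipfile.ZipFile(odf_file) as package:
        root = ET.fromstring(package.read("meta.xml"))
    generator = root.findtext(f".//{{{META_NS}}}generator")
    stats = root.find(f".//{{{META_NS}}}document-statistic")
    page_count = None
    if stats is not None:
        raw = stats.get(f"{{{META_NS}}}page-count")
        page_count = int(raw) if raw is not None else None
    return generator, page_count


def build_sample_package():
    """Build a tiny in-memory package with made-up metadata (demo only)."""
    meta_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<office:document-meta xmlns:office="{OFFICE_NS}" xmlns:meta="{META_NS}">'
        "<office:meta>"
        "<meta:generator>ExampleOffice/1.0</meta:generator>"
        '<meta:document-statistic meta:page-count="3"/>'
        "</office:meta>"
        "</office:document-meta>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("meta.xml", meta_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    generator, pages = read_odf_meta(build_sample_package())
    print(generator, pages)
```

Note that the recorded page count describes the producer's layout, not what a given consumer will show on opening the file, which is exactly the gap discussed in this thread.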


  • 4.  Re: Forensics: Working with the ODF 1.1 ODT version

    Posted 03-30-2011 19:42
    Dennis,

    I think we agree this is a non-issue if you use section headers and indicate the text to be replaced. Yes?

    I understand that you are concerned about different renderings in different applications, but none of that is relevant to the question: Can we specify headings/text for correction such that every user making the corrections ends up with the same corrected text? That is the *only* question that we have to answer for errata production.

    I understand that this is an interesting presentation issue, but that is all that it is, a presentation issue.

    Hope you are having a great day!

    Patrick

    PS: Apologies for the delay in sending. This got "caught" behind another window. ;-)
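
The idea of locating a correction by section heading and text, rather than by page and line, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the document is modeled as a plain mapping from heading to section text, and the names Erratum and apply_erratum, as well as the sample heading and sentence, are hypothetical and not drawn from the ODF 1.1 specification or any errata document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Erratum:
    heading: str   # section heading that locates the change
    old_text: str  # exact text to be replaced within that section
    new_text: str  # replacement text


def apply_erratum(sections, erratum):
    """Return a copy of `sections` with the erratum applied.

    `sections` maps heading -> section text. The old text must occur
    exactly once under the heading, so every reader applying the item
    ends up with the same corrected text regardless of pagination.
    """
    if not erratum.old_text:
        raise ValueError("old_text must not be empty")
    if erratum.heading not in sections:
        raise KeyError(f"no section headed {erratum.heading!r}")
    body = sections[erratum.heading]
    occurrences = body.count(erratum.old_text)
    if occurrences != 1:
        raise ValueError(
            f"expected exactly one match under {erratum.heading!r}, "
            f"found {occurrences}"
        )
    updated = dict(sections)
    updated[erratum.heading] = body.replace(erratum.old_text, erratum.new_text)
    return updated


if __name__ == "__main__":
    # Invented sample content, for demonstration only.
    doc = {"1.2 Sample Heading": "The attribute are optional."}
    item = Erratum("1.2 Sample Heading", "attribute are", "attribute is")
    print(apply_erratum(doc, item)["1.2 Sample Heading"])
```

Requiring exactly one match is a design choice for the sketch: an ambiguous or missing target is reported as an error instead of being silently patched in the wrong place.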


  • 5.  RE: [office] Re: Forensics: Working with the ODF 1.1 ODT version

    Posted 03-30-2011 20:06
    Yes, I have already agreed. There is no difficulty producing Errata 01. However, for producing a change-tracked (and/or change-accepted) copy of the ODF 1.1 specification with Errata incorporated, I am a bit more concerned about ensuring fidelity. I am not too concerned there, except people may be surprised by the different paginations. So far I have seen nothing different in regard to fonts, metrics and text flow, only page breaks. The one glitch I saw in the breaking of a table across multiple pages has not repeated and I was able to provide a manual work-around the one time I saw it. - Dennis