OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  Alternatives for OFFICE-2102

    Posted 03-13-2017 16:34
    Hi all,

    I have tried to sort out the alternatives for OFFICE-2102 for myself and have attached the result. Please correct it where necessary and add your own aspects in case I forgot something.

    Kind regards
    Regina

    Alternatives
    ============

    = A =
    (see proposal by Michael in OFFICE-2102)
    Consumers shall collapse white space characters that occur in those elements where <text:s> [6.1.3], <text:tab> [6.1.4] and <text:line-break> [6.1.5] are allowed as element content.

    This way the element <text:text-input> would not be affected by white space collapsing, because it does not yet allow <text:s>, <text:tab> and <text:line-break> as children. I would make this explicit: children of such elements are treated on their own.

    ==Implications:==
    Multiple spaces, tabs and line breaks in an input field can be written using the ASCII characters x0020, x0009 and x000A respectively.
    Pretty printing of the document source in an editor might add white space (space, tab, line break) and thus change the appearance of the document.
    LibreOffice: Nothing to do in regard to input fields. I do not have an overview of the other children of <text:p> and <text:h>.
    Word 2010: Currently collapses the white space in input fields; would need changes in the import and export filters.

    = B =
    (see proposal by Michael in OFFICE-2102)
    Consumers shall collapse white space characters that occur in those elements where <text:s> [6.1.3], <text:tab> [6.1.4] and <text:line-break> [6.1.5] are allowed as element content.
    AND
    The elements <text:s>, <text:tab> and <text:line-break> are allowed as children of the element <text:text-input>.

    ==Implications:==
    Multiple spaces, tabs and line breaks in an input field are collapsed if they are written as ASCII characters.
    Pretty printing of the document source in an editor is possible without changing the appearance of the document.
    LibreOffice: Implement the new behavior for ODF 1.3. Implement a compatibility option for handling old documents.
    Word 2010: Adapt the export filter to write <text:s>, <text:tab> and <text:line-break>. The import filter already supports these elements.

    = C =
    Keep the current specification (only remove some glitches). Add an exception that the content of the element <text:text-input> is excluded from white space collapsing.

    ==Implications:==
    Multiple spaces, tabs and line breaks in an input field can be written using the ASCII characters x0020, x0009 and x000A respectively.
    Pretty printing of the document source in an editor might add white space (space, tab, line break) and thus change the appearance of the document.
    LibreOffice: Nothing to do.
    Word: Needs to change the import and export filters in regard to input fields.

    = D =
    Keep the current specification (only remove some glitches). Make explicit that the content of all children is affected, including the content of the element <text:text-input>.

    ==Implications:==
    The ability to use multiple spaces, tabs or line breaks in an element <text:text-input> is lost.
    Pretty printing of the document source in an editor is possible without changing the appearance of the document.
    LibreOffice: Needs to change the import and export filters for ODF 1.3 in regard to the element <text:text-input>. Needs to keep the current ODF 1.2 model and the current filters, so that existing documents are not changed.
    Word: Nothing to do.
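The difference between the alternatives for an input field can be sketched in a few lines of Python. This is a minimal illustration, not an implementation of any of the proposals: it assumes that "collapsing" means replacing every run of space, tab, carriage return and line feed with a single space, and the names `collapse`, `field`, `preserved` and `collapsed` are invented for the example.

```python
import re

# Assumption: space (x0020), tab (x0009), CR (x000D) and LF (x000A)
# are the characters subject to collapsing.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse(text: str) -> str:
    """Replace every run of white space characters with one space."""
    return _WS_RUN.sub(" ", text)


# Hypothetical character content of a <text:text-input> element.
field = "Name:  \tJohn\nDoe"

# Alternatives A and C: the field content is excluded from collapsing.
preserved = field

# Alternatives B and D: the field content is collapsed like other text.
collapsed = collapse(field)

print(repr(preserved))
print(repr(collapsed))
```

Under B, a producer that wants to keep the double space, the tab and the line break would have to write them as <text:s>, <text:tab> and <text:line-break>; under D there would be no way to keep them.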

    Attachment(s)

    txt
    Alternatives.txt   2 KB 1 version


  • 2.  Re: [office] Alternatives for OFFICE-2102

    Posted 03-30-2017 17:04
    Dear TC,

    Let me try a new summary of the problems with the broken whitespace handling in ODF 1.2:

      * Not all descendants of text:p/h allow the whitespace elements (text:tab, text:s, text:line-break) in the RelaxNG schema, although 6.1.2 "White Space Characters" §2 requires whitespace handling for all of their descendants that allow character data.
      * 6.1.2 "White Space Characters" refers to descendants of text:p/h, while 3.18 "White Space Processing and EOL Handling" refers to children in its notes. "Descendants" is correct, because text:span can already be nested and the handling should not apply only to the topmost text:span element.
      * The existing whitespace handling as described in 6.1.2 "White Space Characters" does not work, as Jos mentioned in our last call. Jos provided a simple example in which pretty printing adds a new space character to the document. The original document:

            <p><span>A</span><span>B</span></p>

        ends up, after pretty printing, as

            <p>
                <span>A</span>
                <span>B</span>
            </p>

        The whitespace inserted between the spans by pretty printing is collapsed to a single space, which is still one space too many.

    NOTE: An ODT test document with the pretty-printed XML and the additional space is attached to this mail. In addition, I have attached a second version, in which I added some text in between the spans and inserted line breaks manually (as some custom pretty printer might, before and after elements).

    So why are we doing all this?
    The likely reason for whitespace handling is to let ODF applications identify and delete the additional whitespace that is inserted when users pretty-print the XML in some other text or XML editor.

    There are many possible quick fixes that would save some time in fixing existing ODF applications, but just for the sake of theory: what would the fix be if whitespace handling were to work properly in ODF 1.3?

    It is relatively simple:

      1. Add the whitespace elements (text:tab, text:s, text:line-break) to the RelaxNG schema for every descendant of text:p/h that already allows character data (and perhaps define "character data").
      2. Make the wording consistently say "descendants".
      3. 6.1.2 "White Space Characters" (and likely other sections) has to be reworked so that
           * ODF 1.3 producers
               1. always replace multiple space characters with text:s with a count attribute
               2. replace even every single space before and after any descendant element of text:p/h with text:s (to avoid Jos' problem)
           * ODF 1.3 consumers
               1. remove any space character before and after any descendant element of text:p/h
               2. remove any line break and adjacent whitespace characters
      4. To make the above work, the version attribute(s) shall become mandatory for ODF 1.3, which should be done anyway to ease a developer's life.

    What do you think?

    Hope it helps,
    Svante

    ---------------------------------------------------------------------
    To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at:
    https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

    Attachment: BrokenWhitespacehandling.odt
    Description: application/vnd.oasis.opendocument.text

    Attachment: BrokenWhitespacehandling2.odt
    Description: application/vnd.oasis.opendocument.text
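Jos' example can be reproduced with a short Python sketch. It assumes a simplified reading of the collapsing rule (runs of space, tab, CR and LF become one space, and leading and trailing white space of the paragraph is dropped) and uses the un-namespaced <p> and <span> from the example above; `paragraph_text`, `compact` and `pretty` are names invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET


def paragraph_text(p: ET.Element) -> str:
    """Join all character data below p, then collapse white space.

    Simplified assumption: collapsing acts on the concatenated text of
    the paragraph, and leading/trailing spaces of the paragraph are
    removed.
    """
    raw = "".join(p.itertext())
    collapsed = re.sub(r"[ \t\r\n]+", " ", raw)
    return collapsed.strip(" ")


compact = ET.fromstring("<p><span>A</span><span>B</span></p>")
pretty = ET.fromstring("<p>\n  <span>A</span>\n  <span>B</span>\n</p>")

print(repr(paragraph_text(compact)))  # 'AB'
print(repr(paragraph_text(pretty)))   # 'A B'
```

The white space before the first span and after the last span disappears, but the line break and indentation between the two spans survive as one space, so the pretty-printed document no longer shows the same text.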


  • 3.  Re: [office] Alternatives for OFFICE-2102

    Posted 03-31-2017 13:40
    On 30.03.2017 19:03, Svante Schubert wrote:

    > So why are we doing all this?
    > The reason for whitespace handling is likely that ODF applications are able to identify and delete additional space inserted by pretty printing the XML being done by users in any other text/XML editor.

    and we have already seen that this doesn't work perfectly in every case, and won't work perfectly with generic heuristics, without the pretty-printer applying ODF-specific rules.

    > There are many variations to do quick fixes to save some time fixing existing ODF applications, but just for the theory what would be the fix if whitespace handling should work with ODF 1.3?
    >
    > It is relatively simple:
    >
    > 1. Add whitespace elements (text:tab, text:s, text:line-break) in the RelaxNG schema for every descendant of text:p/h that has already character data (perhaps define character data)

    let's take the first element from the list of <text:p> child elements that don't currently do whitespace processing: <dr3d:scene> 10.5.2

    it has a child <svg:title>, which allows <text/> content - so you want to do whitespace processing there since it's a descendant of <text:p>.

    however, <dr3d:scene> does not necessarily occur in a paragraph, it may also occur in a <style:handout-master> element, which is never a descendant of <text:p>.

    do we now say that <svg:title> must have whitespace processing when it occurs as a descendant of <text:p>, but not otherwise? to me that is the road to madness.

    to me a necessary criterion to apply whitespace processing is that the text content of the <text:p> descendant is conceptually part of the paragraph text - so all captions on drawing objects and authors on annotations and that sort of stuff shouldn't do whitespace processing.

    > 2. Fix the wording consistent to "descendants"

    "descendants" is better than "children", and furthermore i would perhaps move all mention of "descendants" into non-normative notes, and leave the normative text to say "processing shall be done if and only if the element allows <text:s> etc. as children".

    > 3. 6.1.2 "White Space Characters" <http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#White-space_Characters> (and likely other sections) have to be overworked that
    >      * ODF 1.3 producers
    >          1. Will exchange multiple space characters always to text:s with count attribute
    >          2. Will exchange even every single space before and after any descendant element of text:p/h with text:s (to avoid Jos' problem)
    >      * ODF 1.3 consumers
    >          1. Will remove any space character before and after any descendant element of text:p/h
    >          2. Will remove any linebreak and adjacent whitespace characters

    as said above i disagree with "any descendant".

    > 4. To make the above work, the version attribute(s) shall become mandatory for ODF 1.3, which should be done anyway to ease a developer's life.
    >
    > What do you think?

    i think we should restrict ourselves to specify something that has as much backwards compatibility as possible with existing implementations, with particular regard to how existing consumers will interpret whitespace in ODF 1.3 documents. given the current inconsistencies in implementations it's not possible to make everybody happy, but we should not introduce additional compatibility breakage that isn't there currently.
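The rule "processing shall be done if and only if the element allows <text:s> etc. as children" can be sketched as a per-element lookup. The sketch below is only an illustration under stated assumptions: the set `COLLAPSING` is a small hand-picked excerpt, not the list that would actually follow from the schema, the sample document is schematic rather than schema-valid, and the collapsing is done per text node without the cross-element rule of 6.1.2. The helper names `q` and `collapse_tree` are invented.

```python
import re
import xml.etree.ElementTree as ET

NS = {
    "text": "urn:oasis:names:tc:opendocument:xmlns:text:1.0",
    "dr3d": "urn:oasis:names:tc:opendocument:xmlns:dr3d:1.0",
    "svg": "urn:oasis:names:tc:opendocument:xmlns:svg-compatible:1.0",
}


def q(prefix: str, local: str) -> str:
    """Build the '{namespace}local' tag name that ElementTree uses."""
    return "{%s}%s" % (NS[prefix], local)


# Assumption: illustrative excerpt of elements that allow <text:s>,
# <text:tab> and <text:line-break> as children.
COLLAPSING = {q("text", "p"), q("text", "h"), q("text", "span")}
WS_RUN = re.compile(r"[ \t\r\n]+")


def collapse_tree(elem: ET.Element) -> None:
    """Collapse character data only where elem itself is in COLLAPSING."""
    if elem.tag in COLLAPSING:
        if elem.text:
            elem.text = WS_RUN.sub(" ", elem.text)
        # A child's tail is character data of elem, not of the child.
        for child in elem:
            if child.tail:
                child.tail = WS_RUN.sub(" ", child.tail)
    for child in elem:
        collapse_tree(child)


doc = ET.fromstring(
    '<text:p xmlns:text="%(text)s" xmlns:dr3d="%(dr3d)s" xmlns:svg="%(svg)s">'
    "<text:span>a   b</text:span>"
    "<dr3d:scene><svg:title>x   y</svg:title></dr3d:scene>"
    "</text:p>" % NS
)
collapse_tree(doc)

span_text = doc.find("text:span", NS).text
title_text = doc.find("dr3d:scene/svg:title", NS).text
print(repr(span_text), repr(title_text))
```

With this criterion the decision depends only on the element that directly contains the character data, so <svg:title> behaves the same whether or not its <dr3d:scene> happens to sit inside a paragraph.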


  • 4.  Re: [office] Alternatives for OFFICE-2102

    Posted 03-31-2017 13:59
    On 31.03.2017 15:39, Michael Stahl wrote:

    > On 30.03.2017 19:03, Svante Schubert wrote:
    >> 1. Add whitespace elements (text:tab, text:s, text:line-break) in the RelaxNG schema for every descendant of text:p/h that has already character data (perhaps define character data)
    >
    > let's take the first element from the list of <text:p> child elements that don't currently do whitespace processing: <dr3d:scene> 10.5.2
    >
    > it has a child <svg:title>, which allows <text/> content - so you want to do whitespace processing there since it's a descendant of <text:p>.
    >
    > however, <dr3d:scene> does not necessarily occur in a paragraph, it may also occur in a <style:handout-master> element, which is never a descendant of <text:p>.
    >
    > do we now say that <svg:title> must have whitespace processing when it occurs as a descendant of <text:p>, but not otherwise? to me that is the road to madness.
    >
    > to me a necessary criterion to apply whitespace processing is that the text content of the <text:p> descendant is conceptually part of the paragraph text - so all captions on drawing objects and authors on annotations and that sort of stuff shouldn't do whitespace processing.

    another related point: <text:p> may also be a descendant of <text:p> - for example inside <draw:text-box>... this of course means you have 2 independent paragraphs on which the whitespace processing algorithm is applied; a <text:s> at the end of the inner paragraph should not cause a space following the </draw:text-box></text:frame> in the outer paragraph to vanish.

    --
    Michael Stahl
    Software Engineer, Platform Engineering - Desktop Team
    Red Hat
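The point about nested paragraphs can be illustrated with a small Python sketch. It assumes simplified, un-namespaced element names (<p>, <frame>, <text-box>) and a simplified collapsing rule; `own_text` is a hypothetical helper that gathers the character data of one paragraph without descending into paragraphs nested inside it.

```python
import re
import xml.etree.ElementTree as ET


def own_text(p: ET.Element) -> str:
    """Collapsed character data of paragraph p, skipping nested <p>."""
    parts = []

    def walk(elem: ET.Element) -> None:
        parts.append(elem.text or "")
        for child in elem:
            if child.tag != "p":   # a nested paragraph is processed separately
                walk(child)
            parts.append(child.tail or "")

    walk(p)
    return re.sub(r"[ \t\r\n]+", " ", "".join(parts))


outer = ET.fromstring(
    "<p>before<frame><text-box><p>inner  </p></text-box></frame> after</p>"
)
inner = outer.find(".//p")

print(repr(own_text(outer)))  # 'before after'
print(repr(own_text(inner)))  # 'inner '
```

Because each paragraph is collapsed on its own, the trailing white space of the inner paragraph has no influence on the space that follows the frame in the outer paragraph.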


  • 5.  Re: [office] Alternatives for OFFICE-2102

    Posted 04-02-2017 10:02
    Hello Michael,

    I believe we have quite different views on the required approach here. To me it seems you are fixing the issue as it was written, minimising the cost for your application, but I do not see how that solves the overall problem of the feature the issue is about: allowing pretty printing of the XML by non-ODF editors.

    We should ask ourselves:
      * Why was whitespace handling added to the ODF specification at all, and
      * how can the intent be fulfilled, or
      * should it not be fulfilled, so that we deprecate all related work?

    2017-03-31 15:39 GMT+02:00 Michael Stahl <mstahl@redhat.com>:

    > On 30.03.2017 19:03, Svante Schubert wrote:
    > > So why are we doing all this?
    > > The reason for whitespace handling is likely that ODF applications are able to identify and delete additional space inserted by pretty printing the XML being done by users in any other text/XML editor.
    >
    > and we have already seen that this doesn't work perfectly in every case, and won't work perfectly with generic heuristics, without the pretty-printer applying ODF-specific rules.

    In my mail I had already suggested a possible solution for Jos' use case. Take your time and give an additional example where it does not work, so that we have a chance to improve the specification.

    > > There are many variations to do quick fixes to save some time fixing existing ODF applications, but just for the theory what would be the fix if whitespace handling should work with ODF 1.3?
    > >
    > > It is relatively simple:
    > >
    > > 1. Add whitespace elements (text:tab, text:s, text:line-break) in the RelaxNG schema for every descendant of text:p/h that has already character data (perhaps define character data)
    >
    > let's take the first element from the list of <text:p> child elements that don't currently do whitespace processing: <dr3d:scene> 10.5.2
    >
    > it has a child <svg:title>, which allows <text/> content - so you want to do whitespace processing there since it's a descendant of <text:p>.
    >
    > however, <dr3d:scene> does not necessarily occur in a paragraph, it may also occur in a <style:handout-master> element, which is never a descendant of <text:p>.
    >
    > do we now say that <svg:title> must have whitespace processing when it occurs as a descendant of <text:p>, but not otherwise? to me that is the road to madness.
    >
    > to me a necessary criterion to apply whitespace processing is that the text content of the <text:p> descendant is conceptually part of the paragraph text - so all captions on drawing objects and authors on annotations and that sort of stuff shouldn't do whitespace processing.

    Do you really think that exchanging whitespace for elements and vice versa is a road to madness? Perhaps when we realise that there is text content that breaks which is not within paragraphs/headings? Our specification should cover the intended use case, and as long as no one comes up with a different explanation, that use case is removing the whitespace inserted by pretty printing. Otherwise we should remove the use case. In either case, let us make a consistent, clean decision.

    > > 2. Fix the wording consistent to "descendants"
    >
    > "descendants" is better than "children", and furthermore i would perhaps move all mention of "descendants" into non-normative notes, and leave the normative text to say "processing shall be done if and only if the element allows <text:s> etc. as children".

    I cannot follow your reasoning for the non-normative note. Why is it non-normative? Are you already jumping to a certain solution you have in mind? The additional normative text you suggest already implies the solution you favor: changing as little as possible in the specification and continuing to allow pretty printing to insert additional whitespace into the content. Is this the best we can do?

    > > 3. 6.1.2 "White Space Characters" <http://docs.oasis-open.org/office/v1.2/os/OpenDocument-v1.2-os-part1.html#White-space_Characters> (and likely other sections) have to be overworked that
    > >      * ODF 1.3 producers
    > >          1. Will exchange multiple space characters always to text:s with count attribute
    > >          2. Will exchange even every single space before and after any descendant element of text:p/h with text:s (to avoid Jos' problem)
    > >      * ODF 1.3 consumers
    > >          1. Will remove any space character before and after any descendant element of text:p/h
    > >          2. Will remove any linebreak and adjacent whitespace characters
    >
    > as said above i disagree with "any descendant".
    >
    > > 4. To make the above work, the version attribute(s) shall become mandatory for ODF 1.3, which should be done anyway to ease a developer's life.
    > >
    > > What do you think?
    >
    > i think we should restrict ourselves to specify something that has as much backwards compatibility as possible with existing implementations, with particular regard to how existing consumers will interpret whitespace in ODF 1.3 documents. given the current inconsistencies in implementations it's not possible to make everybody happy, but we should not introduce additional compatibility breakage that isn't there currently.

    If there is something broken in the specification, we should consider fixing it. If the fix requires others to update, the price is not too high in any case. Broken backward compatibility is not evil per se.
    Have you tried a test document with the incompatible changes (whitespace elements where they are not allowed)? What happens?

    Enjoy your Sunday, Michael!
    Svante


  • 6.  Re: [office] Alternatives for OFFICE-2102

    Posted 04-03-2017 08:40
    After thinking it over, I find the feature of whitespace handling about as useful for end users as flat XML, which is similarly unstable: it has no generic mapping for xml:id (potential clashes) and no way to embrace the other files of the directory (loss of data).

    Those developer features are nice for playing around, but they are not stable enough for a production environment, where all potential edge cases have to be covered.

    Therefore, I agree not to waste further time on it. To me it is absolutely fine to fix the wording and, for instance, add an informative note stating that it works in most cases but will cause problems in certain cases, such as fields.

    In the future, a file format should rather spend the time on providing open source developer extensions for existing editors such as Atom, emacs, vi (whatever is in fashion). Then there should be no need for such a crutch as whitespace handling in applications that process the files.

    I changed my mind when I realised that any pretty printing will break existing XML signatures, and signatures will become more and more important in the future. In addition, I have always wanted better ODF support in text editors out of the box.
    Fixing whitespace handling would fix the wrong side of the problem (it makes implementations of ODF more difficult), and it would not give me, as a developer using pretty printing, any better usability.

    Talk to you later today.
    Svante
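The remark that pretty printing breaks XML signatures can be illustrated with a digest comparison. This is only a sketch under assumptions: it uses the C14N 2.0 canonicalizer from the Python standard library (Python 3.8 or later) and SHA-256 as stand-ins for whatever canonicalization and digest method an actual signature would name, and `c14n_digest` is a name invented for the example.

```python
import hashlib
import xml.etree.ElementTree as ET


def c14n_digest(xml_text: str) -> str:
    """SHA-256 digest over the canonical (C14N 2.0) form of xml_text."""
    canonical = ET.canonicalize(xml_text)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


compact = "<p><span>A</span><span>B</span></p>"
pretty = "<p>\n  <span>A</span>\n  <span>B</span>\n</p>"

# Canonicalization removes purely syntactic differences such as
# attribute order or the empty-element notation ...
print(c14n_digest('<p a="1" b="2"/>') == c14n_digest("<p b='2' a='1'></p>"))

# ... but white space in element content is kept, so the digest of a
# pretty-printed document differs from that of the original.
print(c14n_digest(compact) == c14n_digest(pretty))
```

The first comparison prints True and the second prints False: a consumer that ignores the inserted white space would still see a signature that no longer verifies.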