OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  XML:IDs

    Posted 12 days ago
    Greetings!

    Last week Svante mentioned xml:ids as a pointing method but as I recall,
    those are re-generated on saving of an ODF document. That is, there is no
    guarantee xml:ids will persist over time and changes to a document.

    I won't swear to it, but over 20 years ago now, xml:ids were not
    persisted because of file size concerns for the compressed ODF files.

    I don't think you can buy a laptop now with less than a TB of disk
    memory so has the technology changed so much that prior concerns for
    file size seem out of date?

    Thoughts? I'd much rather work with stable IDs. Even dumb systems can
    work with them.

    Hope everyone is looking forward to a great weekend!

    Patrick


  • 2.  RE: XML:IDs

    Posted 12 days ago
    Hi Patrick,

    Patrick Durusau via OASIS wrote on 10.01.2025 at 23:36:
    > Greetings!
    >
    > Last week Svante mentioned xml:ids as a pointing method but as I recall,
    > those are re-generated on saving of an ODF document. That is, there is no
    > guarantee xml:ids will persist over time and changes to a document.
    >
    > I won't swear to it, but over 20 years ago now, xml:ids were not
    > persisted because of file size concerns for the compressed ODF files.
    >
    > I don't think you can buy a laptop now with less than a TB of disk
    > memory so has the technology changed so much that prior concerns for
    > file size seem out of date?
    >
    > Thoughts? I'd much rather work with stable IDs. Even dumb systems can
    > work with them.

    The old discussion about that is in
    https://issues.oasis-open.org/browse/OFFICE-3788.

    Kind regards,
    Regina




  • 3.  RE: XML:IDs

    Posted 12 days ago

    Thanks Regina!

    Patrick

    On 1/11/2025 3:59 AM, Regina Henschel via OASIS wrote:
    Hi Patrick, Patrick Durusau via OASIS wrote on 10.01.2025 at 23:36: > Greetings! > > Last week Svante mentioned xml:ids as a pointing method...





  • 4.  RE: XML:IDs

    Posted 12 days ago
    Hi Patrick,

    In our last TC call I mentioned two different ways of referencing areas within a document.
    1. The first way is the old, common one: the explicit use of @xml:id.
      Every element requires an attribute named @xml:id with a value that is unique within the XML document.
      This works even for text, by adding elements like <text:span> (for styles) or <text:meta> (for metadata only) within paragraphs.
      This works even for interleaving ranges, by using start/end elements as we do for bookmarks: <text:bookmark-start> and <text:bookmark-end>.
    2. The second way is the new, implicit one: defining a point not by ID but by its position. The counting of XML nodes to identify a position would be defined in the ODF standard.
      Note that not all nodes of the XML syntax are counted, as the XML does not exist at run-time in office applications and the XML is not normalized. A document with the same meaning (semantics) might exist in multiple XML variations.
      Instead, multiple XML nodes are grouped into a semantic entity; these entities and their grouping would also be defined in our ODF specification. Think of the semantic entity of a table, an image or a paragraph.
      Only those entities would be counted for a position; I give an example a little later.
    Comparing the two, I clearly favour the second one, as the complexity is moved into the ODF specification and does not exist explicitly in every document!
    Imagine we place our ODF specification on a server (which we did) and would like to make every part of it referable by other ODF applications.

    Using method 1 requires that we add an @xml:id to every possible content element in the document.
    The document would grow unnecessarily in size, and references to arbitrary parts of the text would still likely not be feasible because of the large number of <text:meta> elements required.
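
To make method 1 concrete, here is a minimal Python sketch of an @xml:id lookup. The fragment and the id values "p1" and "m1" are invented for illustration, and the fragment is not a complete ODF document. It only shows that an element is reachable this way if, and only if, it carries an @xml:id.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Invented fragment: one paragraph and one <text:meta> carry ids, one paragraph does not.
FRAGMENT = """
<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <text:p xml:id="p1">Hello <text:meta xml:id="m1">World</text:meta></text:p>
  <text:p>No id here, so method 1 cannot point at this paragraph.</text:p>
</office:text>
"""


def index_by_id(root):
    """Map every xml:id value in the tree to the element that carries it."""
    return {
        el.get(XML_ID): el
        for el in root.iter()
        if el.get(XML_ID) is not None
    }


root = ET.fromstring(FRAGMENT)
ids = index_by_id(root)
print(sorted(ids))      # the only referable targets
print(ids["m1"].text)   # text wrapped by the <text:meta> element
```

The second paragraph never appears in the index, which is the cost described above: every target needs its own id, and every text range needs its own wrapper element.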

    Using method 2 would allow us to create from the syntax tree a higher-level semantic tree with fewer entities (removing some syntax noise), in which we could refer to entities and areas in the document.
    Imagine a document containing a table, an image and a paragraph with "Hello World", in that order. If we start counting in the semantic tree with 1 (as in XML, and as humans do), the position of the W of the word "World" in an XPath-like syntax would be /3/7 (third entity, seventh character).
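
The /3/7 example can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the ODF specification today: the top-level children of <office:text> are taken as the semantic entities, characters are counted over a paragraph's flattened text, and the function name resolve is made up. The table and image are empty placeholders.

```python
import xml.etree.ElementTree as ET

# Placeholder body: a table, an image frame and a paragraph, in that order.
# The <text:span> is "syntax noise" that must not change the position of "W".
BODY = """
<office:text
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:table="urn:oasis:names:tc:opendocument:xmlns:table:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0">
  <table:table/>
  <draw:frame/>
  <text:p>Hello <text:span>World</text:span></text:p>
</office:text>
"""


def resolve(body, path):
    """Resolve a 1-based '/entity/character' path to a single character."""
    entity_pos, char_pos = (int(step) for step in path.strip("/").split("/"))
    entity = list(body)[entity_pos - 1]    # assumed: top-level children are the entities
    text = "".join(entity.itertext())      # flatten spans into plain text
    return text[char_pos - 1]


body = ET.fromstring(BODY)
print(resolve(body, "/3/7"))
```

With or without the <text:span>, the flattened paragraph text is "Hello World", so the same path selects the same character in both XML variations.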

    In addition, method 2 would also work with existing documents, as these semantic entities have existed since the start of the office format but were never explicitly mentioned or specified in our specification. I strongly suggest changing that, to improve references and especially testing (my main use case) for ODF applications!

    Please note that such a reference always refers to a certain state of the document. If someone adds a new paragraph at the beginning of the document above, the reference to the letter "W" becomes /4/7.
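
The shift from /3/7 to /4/7 can be reproduced with a small Python sketch. The document is modelled as a plain list of entities, which is a simplification chosen for illustration, and the helper name locate is made up.

```python
def locate(entities, paragraph, char):
    """Return the 1-based '/entity/character' path of the first `char` in `paragraph`."""
    entity_pos = entities.index(paragraph) + 1
    char_pos = paragraph.index(char) + 1
    return f"/{entity_pos}/{char_pos}"


# Simplified model: placeholders for the table and the image, then the paragraph.
doc = ["[table]", "[image]", "Hello World"]

before = locate(doc, "Hello World", "W")

# Someone adds a new paragraph at the beginning of the document.
doc.insert(0, "A new first paragraph")

after = locate(doc, "Hello World", "W")

print(before, "->", after)
```

A stored reference therefore has to be kept together with the document state it was computed against, or be recomputed after each edit.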

    Last but not least, document editing with metadata is always tricky, as the validity of mostly invisible metadata within the document has to be guaranteed.
    This only works with confidence if the document is edited with an office application that supports this kind of metadata.
    To guarantee the quality of the metadata, we must be sure that the document was only edited by office applications that are aware of such metadata.
    The trick might be to be certain that the last office application was aware of the metadata.
    This could be solved if the applications supporting a certain metadata feature share a certain signature. After editing, they might sign the document with their "feature" signature.
    (This feature signature handling might need further discussion; I am just pointing in a direction...)

    Manual metadata editing is even worse. As a rule of thumb, manually edited metadata will likely get lost if the metadata is not visible to an application's user.
    It is quite impossible to establish overall foolproof handling in our first iteration, as there are too many fools with unexpected ideas! ;-)
    But we should focus on scenarios where the benefit exists and can be kept within a smaller, educated (not arbitrary) group, for example by using a signature to ensure a feature.

    Hope this makes sense; otherwise I will aim to answer questions by email or on Monday in our TC call.
    Svante







  • 5.  RE: XML:IDs

    Posted 12 days ago

    Yes! Thanks!

    Longer answer coming.

    Patrick

    On 1/11/2025 5:25 AM, Svante Schubert via OASIS wrote:
    Hi Patrick, In our last TC call I mentioned two different ways of referencing areas within a document. The first way was the old common way by... -posted to the "OASIS Open Document Format for Office Applications (OpenDocument) TC" community





  • 6.  RE: XML:IDs

    Posted 12 days ago

    Svante,

    Responding on the "reduced" XML tree for attachment of metadata:

    While possible, it raises the question of what level of granularity we want for metadata (leaving what metadata to one side for the moment).

    <text:h>

    <text:p>

    <table:table>

    Just considering these three, each instance has a location in the XML tree of the document upon loading. That is to say, in a valid ODF document, metadata should point to valid locations in the document, however those pointers are preserved by an application. When serializing to XML (the ODF format), an application must point metadata to elements using their position in the XML tree being serialized.

    Does a metadata-aware application know the prior location of elements and update that memory mechanism?

    This is making me wonder if we should add an attribute to elements where we want to add this metadata.

    Moving it closer (onto) the elements eliminates the pointing and tracking issues.
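
As a rough illustration of the attribute idea, the Python sketch below puts a metadata attribute directly on a paragraph and then inserts a new paragraph in front of it. The attribute name and its namespace (urn:example:metadata) are purely hypothetical and not part of ODF, and the plain "body" wrapper is a stand-in for the real document body.

```python
import xml.etree.ElementTree as ET

TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
TABLE = "urn:oasis:names:tc:opendocument:xmlns:table:1.0"
NOTE = "{urn:example:metadata}note"  # hypothetical attribute, not defined by ODF

# Stand-in body with the three element kinds mentioned above.
body = ET.Element("body")
heading = ET.SubElement(body, f"{{{TEXT}}}h")
heading.text = "Title"
para = ET.SubElement(body, f"{{{TEXT}}}p")
para.text = "Hello World"
ET.SubElement(body, f"{{{TABLE}}}table")

# Metadata sits on the element itself instead of pointing at it.
para.set(NOTE, "needs review")


def position_of(parent, element):
    """1-based position of a child, i.e. what a positional pointer would store."""
    return list(parent).index(element) + 1


def annotated(parent):
    """All children carrying the hypothetical metadata attribute."""
    return [el for el in parent if el.get(NOTE) is not None]


before = position_of(body, para)
intro = ET.Element(f"{{{TEXT}}}p")
intro.text = "A new first paragraph"
body.insert(0, intro)
after = position_of(body, para)

print(before, "->", after)                  # the position changes
print(annotated(body)[0].text)              # the metadata stays with its paragraph
```

The position of the paragraph changes with the edit, while the attribute travels with the element, so nothing has to be tracked or updated on save.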

    Thoughts?

    Feedback is greatly appreciated! See, I've been pushed to attributes! ;-)

    Patrick

    On 1/11/2025 5:25 AM, Svante Schubert via OASIS wrote:
    Hi Patrick, In our last TC call I mentioned two different ways of referencing areas within a document. The first way was the old common way by... -posted to the "OASIS Open Document Format for Office Applications (OpenDocument) TC" community