Greetings!
Just some rough notes that I'd like to sharpen up for discussion next week.
One current weakness of some, though not all, genAI pipelines is that
they remove all markup, so a document becomes a single string of text. I
can supply the citations, but there has been some rediscovery that
paragraphs relate to other paragraphs in the same section, and so on.
There are also efforts that work backward from content to recover the
structure of documents.
What if Svante's abstraction over XML trees covered both OOXML and ODF
trees? We could capture structure such as author names and titles, and
treat containers (<text:p> in ODF, p-* for all the paragraph types in
OOXML) as abstractions, with implementations handling the mappings from
OOXML and ODF to the abstraction. Then we would not have to get into the
weeds of syntax details. Whatever markup exists now, or is added in the
future, would map to the abstraction.
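To make "one abstraction, two mappings" concrete for discussion, here is a
minimal Python sketch. It is not Svante's actual abstraction; the Block
type and the from_odf/from_ooxml functions are hypothetical names for
illustration only. It assumes ODF headings and paragraphs appear as text:h
and text:p elements in content.xml, and that OOXML headings are marked by
paragraph style ids such as "Heading1" in document.xml, which is a common
convention rather than a guarantee.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Namespace URIs for ODF text elements and OOXML WordprocessingML.
TEXT = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


@dataclass
class Block:
    """Hypothetical format-neutral unit: a heading or a paragraph."""
    kind: str       # "heading" or "paragraph"
    text: str
    level: int = 0  # outline level for headings, 0 for paragraphs


def from_odf(content_xml: str) -> list[Block]:
    """Map ODF <text:h>/<text:p> elements to Blocks in document order.
    Sketch only: paragraphs nested inside other paragraphs (e.g. notes)
    would have their text counted twice."""
    blocks = []
    for el in ET.fromstring(content_xml).iter():
        if el.tag == f"{{{TEXT}}}h":
            level = int(el.get(f"{{{TEXT}}}outline-level", "1"))
            blocks.append(Block("heading", "".join(el.itertext()), level))
        elif el.tag == f"{{{TEXT}}}p":
            blocks.append(Block("paragraph", "".join(el.itertext())))
    return blocks


def from_ooxml(document_xml: str) -> list[Block]:
    """Map WordprocessingML <w:p> elements to Blocks, treating style ids
    of the form "HeadingN" (an assumption) as headings of level N."""
    blocks = []
    for p in ET.fromstring(document_xml).iter(f"{{{W}}}p"):
        # Paragraph text is split across runs; join every <w:t>.
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        style = p.find(f"{{{W}}}pPr/{{{W}}}pStyle")
        style_id = style.get(f"{{{W}}}val", "") if style is not None else ""
        m = re.fullmatch(r"Heading(\d+)", style_id)
        if m:
            blocks.append(Block("heading", text, int(m.group(1))))
        else:
            blocks.append(Block("paragraph", text))
    return blocks


# Tiny hand-written fragments of the same content in each format.
ODF = """<office:document-content
  xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
  xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0">
  <office:body><office:text>
    <text:h text:outline-level="1">Introduction</text:h>
    <text:p>Paragraphs relate to <text:span>other</text:span> paragraphs.</text:p>
  </office:text></office:body>
</office:document-content>"""

OOXML = """<w:document
  xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:pPr><w:pStyle w:val="Heading1"/></w:pPr><w:r><w:t>Introduction</w:t></w:r></w:p>
    <w:p><w:r><w:t xml:space="preserve">Paragraphs relate to </w:t></w:r><w:r><w:t>other</w:t></w:r><w:r><w:t xml:space="preserve"> paragraphs.</w:t></w:r></w:p>
  </w:body>
</w:document>"""

if __name__ == "__main__":
    # Both syntaxes land on the same abstraction.
    assert from_odf(ODF) == from_ooxml(OOXML)
    for block in from_odf(ODF):
        print(block)
```

The point of the design is that downstream AI tooling would only ever see
Block values, so paragraph-to-section relationships survive without that
tooling knowing anything about ODF or OOXML syntax.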
I suspect Jean Pauli might be interested in getting all the bands back
together for another dance, but this time with more coordination from
the beginning.
Obviously there would need to be funding, both for developing the
abstraction (separate from the ODF work) and for developing software
that takes advantage of it.
Note that there is a sea of OOXML and ODF documents held by governments
that are chomping at the bit to use AI in their workflows, so this could
be a marketing success for all concerned.
Thoughts? What do I need to make stronger? I'm less concerned with
wording than with the major themes. Should I include the usual fear of
unlabeled AI content? It would be terrible to think a colleague was
literate when they aren't. ;-)
Thanks!
Patrick
PS: Entirely written without the aid of an AI.</text:p>