OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC

Whitespace and the annotation module

John McCrae
Posted 02-16-2024 09:50
Hi all,

One more comment for the 2nd public review. I am still not sure about our rule for whitespace in elements. In particular, in the converter I have been developing I am having problems because we cannot apply pretty printing (indenting) to an XML file without changing the content in the model. Further, I think the rules are unintuitive, and many people will add whitespace and create unfortunate errors.

Instead I propose that we adopt the HTML methodology as described here: https://infra.spec.whatwg.org/#strip-newlines

In this case, before processing the content of any text-carrying element, we would first remove all newlines ('\n', '\r'), delete all leading and trailing whitespace, and replace all remaining blocks of ASCII whitespace with a single space.

I would also make a model change, replacing all references to 'non-empty string' with 'normalised string'. This means a string that contains no newlines, does not start or end with whitespace, contains no block of ASCII whitespace longer than a single space, and is non-empty. This ensures that other serializations (JSON, RDF) cannot generate content that cannot be represented in XML.

I do worry that this does not really cover Chinese, Japanese (and maybe Thai/Lao), as the whitespace rules for HTML are more complex in Unicode, but I think that this can probably be worked around by lexicographers working in these languages. We can add a note to the spec for these languages.

Regards,
John

--
John P. McCrae (he/him; #startsWithAName John (rhymes with "gone") McCrae (rhymes with "hay"))
Assistant Professor - SFI Insight Centre for Data Analytics, Data Science Institute & Computer Science, University of Galway
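
For illustration, the normalisation proposed above could be sketched in Python roughly as follows. This is only a sketch under a literal reading of the three steps in the post, not spec text: the function names `normalise` and `is_normalised` are hypothetical, and "ASCII whitespace" is assumed to mean the five characters the WHATWG Infra Standard lists (TAB, LF, FF, CR, SPACE). It also assumes that a lone tab inside the text does not count as normalised, since only a single space is allowed between words.

```python
import re

# ASCII whitespace as defined by the WHATWG Infra Standard:
# TAB, LF, FF, CR and SPACE (assumed to be the set the proposal means).
ASCII_WHITESPACE = "\t\n\x0c\r "
_WS_RUN = re.compile(r"[\t\n\x0c\r ]+")


def normalise(text: str) -> str:
    """Apply the three proposed steps to the content of a text-carrying element."""
    # Step 1: remove every LF and CR outright (no space is substituted).
    text = text.replace("\n", "").replace("\r", "")
    # Step 2: delete leading and trailing ASCII whitespace.
    text = text.strip(ASCII_WHITESPACE)
    # Step 3: replace each remaining run of ASCII whitespace with one space.
    return _WS_RUN.sub(" ", text)


def is_normalised(text: str) -> bool:
    """Check the proposed 'normalised string' constraint (hypothetical helper)."""
    # Non-empty, and unchanged by normalisation.
    return text != "" and normalise(text) == text


if __name__ == "__main__":
    # Pretty-printed content: the indentation keeps the two words apart.
    print(repr(normalise("  foo\n    bar  ")))
    # A bare line break between words: the newline is removed, not replaced.
    print(repr(normalise("foo\nbar")))
```

One consequence of reading the steps literally is that a line break with no adjacent space or indentation joins the two neighbouring words, because newlines are deleted rather than turned into spaces. Whether that is the intended behaviour would need to be confirmed before it goes into the spec.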
