OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC

  • 1.  Superscript and subscript

    Posted 01-07-2025 04:05

    Hello,

    what would be the best way to represent superscript and subscript (and other potential formatting issues) in DMLex? We have a legacy dictionary with this:
    H<sub>2</sub>O
    LEGO<sup>®</sup>

    Thanks,

    Andraž



    ------------------------------
    Andraž Repar
    Jozef Stefan Institute
    ------------------------------


  • 2.  RE: Superscript and subscript

    Posted 01-07-2025 05:32

    Interesting questions.

    Many superscript and subscript characters can be represented in plain text, without inline markup, because Unicode has codepoints for them. For example, "Subscript Two" is U+2082. So you could just go for "H₂O".
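
    As a rough illustration of that conversion, here is a minimal Python sketch that rewrites <sub>/<sup> markup into the corresponding Unicode characters. The function name and the decision to leave unconvertible content untouched are assumptions made for this example; Unicode only covers digits and a handful of symbols, so letters and other characters stay as markup.

```python
import re

# Unicode provides subscript/superscript codepoints for digits and a few
# symbols only; these tables map the plain characters to those codepoints.
SUBSCRIPTS = str.maketrans("0123456789+-=()", "₀₁₂₃₄₅₆₇₈₉₊₋₌₍₎")
SUPERSCRIPTS = str.maketrans("0123456789+-=()", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾")

# Matches <sub>...</sub> or <sup>...</sup> (non-nested).
TAG = re.compile(r"<(sub|sup)>(.*?)</\1>")


def inline_markup_to_unicode(text: str) -> str:
    """Replace <sub>/<sup> spans with Unicode characters where possible.

    Hypothetical helper: spans containing any character without a Unicode
    sub/superscript form are returned unchanged, markup included.
    """
    def replace(match: re.Match) -> str:
        kind, content = match.group(1), match.group(2)
        table = SUBSCRIPTS if kind == "sub" else SUPERSCRIPTS
        if any(ord(ch) not in table for ch in content):
            return match.group(0)  # cannot convert, keep the markup
        return content.translate(table)

    return TAG.sub(replace, text)


print(inline_markup_to_unicode("H<sub>2</sub>O"))
```

    Anything the tables cannot express is passed through as-is, so it can be reviewed by hand afterwards.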

    A more interesting question is, what language are you labelling "H₂O" with? It's not an expression in a natural language, is it? DMLex is intended for natural language. Chemical and mathematical notations are an edge case which DMLex has no out-of-the-box support for. But you could invent a private-use language subtag for those (this is allowed for IETF language tags) and then encode the expression using any notation your software knows it should use for this "language", such as MathML or whatever.
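
    To make the private-use idea concrete, here is a small Python sketch that builds such a tag. In IETF language tags, private-use subtags follow the singleton "x" and are 1 to 8 ASCII letters or digits each. The base subtag "und" (undetermined) and the subtag "chem" are only assumptions for this example, not anything DMLex prescribes.

```python
import re

# IETF (BCP 47) private-use subtags: 1 to 8 ASCII letters or digits each,
# introduced by the singleton "x".
PRIVATE_SUBTAG = re.compile(r"[A-Za-z0-9]{1,8}")


def private_use_tag(base: str, *subtags: str) -> str:
    """Append a private-use sequence to a base language tag.

    Hypothetical helper: it checks only the private-use subtags, not the
    base tag itself.
    """
    if not subtags:
        raise ValueError("at least one private-use subtag is required")
    for subtag in subtags:
        if not PRIVATE_SUBTAG.fullmatch(subtag):
            raise ValueError(f"invalid private-use subtag: {subtag!r}")
    return "-".join([base, "x", *subtags])


# "und" = undetermined language; "chem" is an invented subtag for
# chemical notation, meaningful only by private agreement.
CHEM_TAG = private_use_tag("und", "chem")
print(CHEM_TAG)
```

    Software that recognises the tag can then decide how to parse and render the expression; other consumers will simply see an unknown private-use tag.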

    For the circled R ("Registered Sign", U+00AE) I would argue that it doesn't need any inline markup at all. In most fonts, this character is already rendered raised, so it looks like a superscript: being shown like that is a property of the character itself.

    More radically, I would argue that "®" (as well as "™" and others) shouldn't be part of the headword at all. I would rather represent it as a DMLex Label whose semantics is "this headword is a registered trademark". The label can be shown to people as "®".
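
    A minimal Python sketch of that migration step, under stated assumptions: the label tag names ("registered-trademark", "trademark") are invented for this example, and the dictionary shape (a "headword" plus a list of "labels", each with a "tag") is only loosely modelled on DMLex and should be checked against the serialization you actually use.

```python
import re

# Hypothetical label tags; a real dictionary would define its own.
TRADEMARK_TAGS = {
    "\u00ae": "registered-trademark",  # ® Registered Sign
    "\u2122": "trademark",             # ™ Trade Mark Sign
}

# Matches a trademark sign, with or without a legacy <sup> wrapper.
SIGN = re.compile(r"(?:<sup>\s*)?([\u00ae\u2122])(?:\s*</sup>)?")


def split_trademark(raw: str) -> tuple[str, list[str]]:
    """Return the headword without trademark signs, plus label tags."""
    tags = [TRADEMARK_TAGS[m.group(1)] for m in SIGN.finditer(raw)]
    return SIGN.sub("", raw).strip(), tags


def to_entry(raw: str) -> dict:
    """Build an entry-like dict (assumed shape, not a normative one)."""
    headword, tags = split_trademark(raw)
    return {"headword": headword, "labels": [{"tag": tag} for tag in tags]}


print(to_entry("LEGO<sup>\u00ae</sup>"))
```

    The display layer can then map the label back to "®" when rendering, so nothing is lost for the reader.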

    Last but not least, DMLex has an Annotation Module, but that is only intended for very specific use cases and is probably of no use here.

    Michal



    ------------------------------
    Michal Mechura
    Faculty of Informatics Masaryk University
    ------------------------------



  • 3.  RE: Superscript and subscript

    Posted 01-07-2025 06:27
    I agree with Michal completely.
    Most formatting in lexicographic content is just a legacy of the print era and needs to be handled differently, and in many cases simply removed.

    Best
    Milos