OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC

  • 1.  DMLex spec

    Posted 05-16-2022 16:25
    Hi everyone, I am busy editing the DMLex spec but I think I'll need one more day to get it done properly. So, you can expect an e-mail from me towards the end of tomorrow (Tuesday 17 May). M.


  • 2.  Re: DMLex spec

    Posted 05-17-2022 20:08
    OK, so I'm going to need one more day. Sorry for keeping you waiting but the result will be worth it! M.


  • 3.  Re: DMLex spec

    Posted 05-18-2022 16:41
    Hi everyone,

    So here is my latest draft. What's new:

    - I have made some tweaks to the pseudocode in which we present examples throughout the document. Consequently, I have made some changes to the formalism through which we define the data model. The distinction between "objects" and "properties" has disappeared, and we now explicitly state the arities and types of everything everywhere (Tomaž will like that!). I think the whole document is now easier to understand for readers, even outside our tribe.

    - There have been some tiny, almost cosmetic, changes to some names of things.

    In other words, this draft doesn't really bring any conceptually new material. Everything in it is based on things we've agreed and consensuses we've built up, so I wouldn't expect any disagreement or surprised reactions at this point. That said, I strongly encourage everyone to read it from start to end now because this is our last chance to catch any problems before we present it publicly at the ELEXIS event in Florence.

    With this, we have now created a clean, logical, consistent, simple, IT-friendly, well thought-through data model for dictionaries. No one has ever done this before in the history of lexicography. It's taken us a lot of talking and proposing and writing and rewriting to distil our ideas into a single, universally usable data model, but we've done it and the result is really likeable!

    (Oh yes, the document is not in DocBook. I'll try to beat it into the DocBook format and submit it as a proper pull request through GitHub by Monday.)

    M.

    Attachment(s)

    pdf
    dmlex.pdf   142 KB 1 version


  • 4.  RE: [lexidma] Re: DMLex spec

    Posted 05-19-2022 13:09
    Hi Michal,

    I went through the whole text and I do share your excitement. While reading I could put most of the data from the dictionaries I worked on in the last (almost) 30 years into the right slots, but these are much, much better organised. Congratulations!

    My comments are in the attached file.

    Best regards,
    Simon

    Attachment(s)

    pdf
    dmlex-sk.pdf   173 KB 1 version


  • 5.  Re: [lexidma] Re: DMLex spec

    Posted 05-23-2022 15:28
    Hi Michal,

    I was working on the RDF serialization. Here are some comments and queries that came up.

    - ID and URI are the same thing in the RDF serialization. Would this be problematic?
    - I am not sure why `transcriptionScheme` is typed as a `langCode`.
    - I assume that `homographNumber` > 0 (in value, not cardinality).
    - The 0..1 restriction on definition seems a bit limiting. I know in Open English WordNet we have multiple definitions for the same sense.
    - `label` is used on multiple classes. I thought we weren't allowing this? Shouldn't it be `senseLabel`, `inflectedLabel`, etc.?
    - `source` is also used on multiple classes (LexicographicResource and Example). In this case, its meaning is quite different. In fact, it is not specified what the value of the `source` of a `lexicographicResource` should be.
    - For `sameAs` I assume we will just use OWL's built-in property.
    - `translationLanguage` is marked as 1..1. This would suggest that every lexicographic resource has a translation language even if it is monolingual. Similarly `translationLanguage` has 1..n. I guess that you mean that if users use this module this property is required, but we have no mechanism to say what modules are being used, so I think this will lead to issues with the serializations.
    - Sec 3.3, extensions to example shows `sense` still.
    - Should we use `partOfSpeech` on `headwordTranslation`? Will the POS values for the translations be the same as entries or should we have `translationPartOfSpeech` for the foreign language POS tag set?
    - Sections 3 and 4 are very confusing and I thought at first it was just a mistake with copy/paste. I guess you want to say that `language` is an additional required property in the multilingual module, but why not just say that the Multilingual module is an extension to the Bilingual module? I am not sure how we intend users to declare which modules they are using anyway.
    - The inline module is quite difficult to model in RDF. There is a bit of a clash as I want `headword` to be a string value, not an object, and I don't have the ability for it to be both like in your serialization. The same issue must exist in the JSON serialization as well. I suggest that, instead of adding children to `headword`, we introduce a property `headwordPlaceholderMarker`.

    So taking your example into RDF we would have something like this.

        @prefix dmlex: <http://www.oasis-open.org/to-be-confirmed/dmlex> .

        <abandon-verb> dmlex:headword "abandon" ;
          dmlex:partOfSpeech "verb" ;
          dmlex:sense <abandon-verb-1>, <abandon-verb-2> .

        <abandon-verb-1> dmlex:definition "to suddenly leave a place or a person" ;
          dmlex:example [ dmlex:value "I'm sorry I abandoned you like that" ] ,
                        [ dmlex:value "Abandon ship" ; dmlex:label "idiom" ] .

        <abandon-verb-2> dmlex:label "mostly-passive" ;
          dmlex:definition "to stop supporting an idea" ;
          dmlex:example [ dmlex:value "The theory has been abandoned" ] .

    Some of these properties could be mapped to OntoLex properties, e.g., `dmlex:sense` => `ontolex:denotes`, but not so many.

    I attach my first stab at making an OWL ontology for DMLex.

    Regards,
    John

    ---------------------------------------------------------------------
    To unsubscribe from this mail list, you must leave the OASIS TC that generates this mail. Follow this link to all your TCs in OASIS at: https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

    Attachment: dmlex.owl Description: application/rdf
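
    For readers who want to see how John's RDF example relates to an in-memory entry, here is a minimal Python sketch that emits the same "abandon" triples as Turtle. It is only an illustration, not part of the draft or of any DMLex tooling: the class names (`Entry`, `Sense`, `Example`), the helper `lit`, and the function `build_abandon` are hypothetical, and the `dmlex:` namespace is the placeholder IRI from the message above. It also assumes `headword` stays a plain string, as John prefers.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Placeholder namespace copied from the message; not a confirmed IRI.
PREFIX = "@prefix dmlex: <http://www.oasis-open.org/to-be-confirmed/dmlex> ."


def lit(text: str) -> str:
    """Quote a string as a Turtle literal, escaping backslashes and quotes."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


@dataclass
class Example:
    value: str
    label: Optional[str] = None

    def to_turtle(self) -> str:
        # Examples are emitted as blank nodes, as in the message.
        parts = [f"dmlex:value {lit(self.value)}"]
        if self.label is not None:
            parts.append(f"dmlex:label {lit(self.label)}")
        return "[ " + " ; ".join(parts) + " ]"


@dataclass
class Sense:
    iri: str
    definition: str
    label: Optional[str] = None
    examples: List[Example] = field(default_factory=list)

    def to_turtle(self) -> str:
        props = []
        if self.label is not None:
            props.append(f"dmlex:label {lit(self.label)}")
        props.append(f"dmlex:definition {lit(self.definition)}")
        if self.examples:
            joined = " , ".join(e.to_turtle() for e in self.examples)
            props.append("dmlex:example " + joined)
        return f"<{self.iri}> " + " ;\n  ".join(props) + " ."


@dataclass
class Entry:
    iri: str
    headword: str  # plain string, not an object with children
    part_of_speech: str
    senses: List[Sense] = field(default_factory=list)

    def to_turtle(self) -> str:
        props = [
            f"dmlex:headword {lit(self.headword)}",
            f"dmlex:partOfSpeech {lit(self.part_of_speech)}",
        ]
        if self.senses:
            refs = ", ".join(f"<{s.iri}>" for s in self.senses)
            props.append("dmlex:sense " + refs)
        blocks = [f"<{self.iri}> " + " ;\n  ".join(props) + " ."]
        blocks.extend(s.to_turtle() for s in self.senses)
        return "\n\n".join([PREFIX] + blocks)


def build_abandon() -> Entry:
    """Build the 'abandon' entry used as the example in the thread."""
    return Entry(
        iri="abandon-verb",
        headword="abandon",
        part_of_speech="verb",
        senses=[
            Sense(
                iri="abandon-verb-1",
                definition="to suddenly leave a place or a person",
                examples=[
                    Example("I'm sorry I abandoned you like that"),
                    Example("Abandon ship", label="idiom"),
                ],
            ),
            Sense(
                iri="abandon-verb-2",
                definition="to stop supporting an idea",
                label="mostly-passive",
                examples=[Example("The theory has been abandoned")],
            ),
        ],
    )


if __name__ == "__main__":
    print(build_abandon().to_turtle())
```

    The sketch builds strings by hand to stay dependency-free; a real serializer would more likely use an RDF library and would need a decision on how inline markup inside `headword` is represented.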


  • 6.  RE: [lexidma] Re: DMLex spec

    Posted 05-23-2022 21:52
    Hi John, hi all,

    A few comments are in-line below.

    Simon

    On Monday, May 23, 2022 5:28 PM, John McCrae wrote:

    > I am not sure why `transcriptionScheme` is typed as a `langCode`

    [Simon] Indeed.

    > The 0..1 restriction on definition seems a bit limiting. I know in Open English WordNet we have multiple definitions for the same sense.

    [Simon] Yes, this was also Carole's comment. More than one should be allowed.

    > `label` is used on multiple classes. I thought we weren't allowing this? Shouldn't it be `senseLabel`, `inflectedLabel`, etc.?

    [Simon] From a lexicographic point of view it makes more sense to have a generic label (type) that you can attach to different objects or parts of an entry.

    > `source` is also used on multiple classes (LexicographicResource and Example). In this case, its meaning is quite different. In fact, it is not specified what the value of the `source` of a `lexicographicResource` should be.

    [Simon] I'm not so sure about this one.

    > `translationLanguage` is marked as 1..1. This would suggest that every lexicographic resource has a translation language even if it is monolingual. Similarly `translationLanguage` has 1..n. I guess that you mean that if users use this module this property is required, but we have no mechanism to say what modules are being used so I think this will lead to issues with the serializations.

    [Simon] I thought that `language` is used in monolinguals, and `translationLanguage` in bi- and multilinguals. This probably means, if I understand John correctly, that we should force users to choose between three possible types of lexicographic resources in the first place: monolingual, bilingual, multilingual.

    > Should we use `partOfSpeech` on `headwordTranslation`. Will the POS values for the translations be the same as entries or should we have `translationPartOfSpeech` for the foreign language POS tag set?

    [Simon] I'm not sure about this. What happens on the translation side can be quite wild: one word to two or many, or zero, different POS, quasi explanations, etc.

    > Sections 3 and 4 are very confusing and I thought at first it was just a mistake with copy/paste. I guess you want to say that `language` is an additional required property in the multilingual module, but why not just say that the Multilingual module is an extension to the Bilingual module. I am not sure how we intend users to declare which modules they are using anyway.

    [Simon] I see some justification for separate bi- and multilingual modules. In general, (traditional) lexicography is rather averse to multilingual dictionaries, as it's difficult enough to present a consistent contrastive analysis of two languages, and with multilinguals it's either annoying simplification or exponential growth of CA problems.
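
    The point about `translationLanguage` cardinality depending on the kind of resource can be made concrete with a small Python sketch. It assumes a resource explicitly declares one of the three types Simon lists; the property name `resourceType`, the function `check_resource`, and the dictionary layout are hypothetical and do not come from the draft. The 1..1 and 1..n arities are the ones John quotes, read here as applying to bilingual and multilingual resources respectively.

```python
from enum import Enum
from typing import Any, Dict, List, Optional, Tuple


class ResourceType(Enum):
    MONOLINGUAL = "monolingual"
    BILINGUAL = "bilingual"
    MULTILINGUAL = "multilingual"


# (min, max) number of translationLanguage values; None means unbounded.
# Assumed reading of the thread: 0..0, 1..1 and 1..n respectively.
ARITY: Dict[ResourceType, Tuple[int, Optional[int]]] = {
    ResourceType.MONOLINGUAL: (0, 0),
    ResourceType.BILINGUAL: (1, 1),
    ResourceType.MULTILINGUAL: (1, None),
}


def check_resource(resource: Dict[str, Any]) -> List[str]:
    """Return a list of cardinality problems; an empty list means OK."""
    try:
        rtype = ResourceType(resource.get("resourceType"))
    except ValueError:
        return ["resourceType must be one of: monolingual, bilingual, multilingual"]

    problems: List[str] = []
    if not resource.get("language"):
        problems.append("language is required")

    langs = resource.get("translationLanguage", [])
    lo, hi = ARITY[rtype]
    if len(langs) < lo:
        problems.append(
            f"{rtype.value} resource needs at least {lo} translationLanguage"
        )
    if hi is not None and len(langs) > hi:
        problems.append(
            f"{rtype.value} resource allows at most {hi} translationLanguage"
        )
    return problems


if __name__ == "__main__":
    print(check_resource({"resourceType": "bilingual", "language": "en"}))
```

    With an explicit type declaration like this, a serialization can tell a monolingual resource that legitimately has no `translationLanguage` apart from a bilingual one that is missing a required value.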