OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC

Working draft

  • 1.  Working draft

    Posted 09-12-2021 19:51
    Hi all,

    So, I have converted my latest proposal to DocBook and I have submitted it to the Lexidma repository as a pull request: https://github.com/oasis-tcs/lexidma/pull/4. Go there to understand how I have decided to structure it and why. Or, if you just want to read what I have written, open the attached PDF and read sections 3 to 8. I hope we can build on this.

    M.

    PS. For those of you who want to muck about with GitHub, Oxygen and DocBook, the instructions I had previously written have now moved to: https://github.com/michmech/lexidma-howto

    Attachment: Data Model for Lexicography (DMLex), Version 1.0.pdf
    Description: Adobe PDF document


  • 2.  RE: [lexidma] Working draft

    Posted 09-20-2021 12:41
    Hi Michal, David: do we have a meeting today or not?

    Best regards,
    Simon


  • 3.  Re: [lexidma] Working draft

    Posted 09-20-2021 13:32
    Sorry guys, the meeting was on; I did not dial in due to a time zone mix-up. Did you guys meet over the PR? https://github.com/oasis-tcs/lexidma/pull/4

    -dF


  • 4.  Re: [lexidma] Working draft

    Posted 09-20-2021 13:35
    I'm sorry I couldn't make it to the meeting. I had another appointment which totally overran. I'm free now, though, if anyone wants to talk (even informally, it doesn't have to count as an official meeting).

    M.


  • 5.  Re: [lexidma] Working draft

    Posted 09-20-2021 13:37
    Simon, if you start the meeting, at least Michal and I are on

    -dF


  • 6.  Re: [lexidma] Working draft

    Posted 09-20-2021 14:06
    Well, I guess it wasn't meant to be. I have to go offline again now, but I'll be happy to meet (informally or formally) any other day this week if anybody wants to.

    M.


  • 7.  Re: [lexidma] Working draft

    Posted 09-20-2021 15:51
    All,

    Apologies again, from Michal and me, that the meeting didn't go ahead today.

    Michal and I met over his PR https://github.com/oasis-tcs/lexidma/pull/4 and agreed some more changes to be made by Wednesday. Michal will also create GitHub issues (for design decisions we didn't agree on) or GitHub Wiki text (for newly proposed design principles, extending but not overturning the currently agreed ones). Michal will continue developing the DocBook text on GitHub; I will be adding his commits/PRs, issues, and wiki contributions to the next meeting agenda.

    Since Simon wants both Michal and me in a meeting to progress the WD, the next possible date is 11th October. I will move the planned 18th October meeting to 11th October. In that meeting we can try to agree on more frequent meetings going forward, as well as on collaboration of multiple potential contributors on the WD.

    Cheers
    -dF


  • 8.  Re: [lexidma] Working draft

    Posted 09-22-2021 15:11
    Hi all,

    I have made a few changes to the pull request: https://github.com/oasis-tcs/lexidma/pull/4

    A PDF version of the working draft, which contains those changes, is attached. Read sections 3 to 8 if you want to skip over the boilerplate and read only what I have written.

    Also, I have written a summary of the principles that guided me while I was writing the working draft: https://github.com/oasis-tcs/lexidma/wiki/Document-design-principles

    These are not principles we have agreed, so they are open for discussion.

    I hope this is moving us forward and I hope to meet you all soon.

    M.

    Attachment: Data Model for Lexicography (DMLex), Version 1.0.pdf
    Description: Adobe PDF document


  • 9.  Re: [lexidma] Working draft

    Posted 10-11-2021 14:16
    Hi Michal,

    Some general comments about the draft as it is.

    - We have no way to represent fine-grained morphosyntactic information, including gender of nouns and categories of inflections. For example the annotation on this entry and its inflected form 'boise'.
    - We cannot give citations for sources of information, as is typical in historical dictionaries (see image).
    - There is no etymology information (as discussed in the call). See example in Merriam-Webster.
    - I am not sure how we intend to implement collocations (see example in Foclóir Gaeilge-Béarla).
    - No modelling for hypernym relations.

    Should I add in this modelling and make a PR to your branch?

    Regards,
    John


  • 10.  Re: [lexidma] Working draft

    Posted 10-11-2021 15:58
    John,

    I suggest that you make your PR against the lexidma master after I merge Michal's PR, real soon, hopefully today. We don't have a need for branches other than master at the moment; PRs should be made from your own fork, though. You could document your requirement on the wiki or as an issue that could be linked with your PR (just a suggestion).

    Cheers
    -dF


  • 11.  Re: [lexidma] Working draft

    Posted 10-12-2021 09:42
    > We have no way to represent fine-grained morphosyntactic information, including gender of nouns and categories of inflections. For example the annotation on this entry and its inflected form 'boise'

    For gender (and other properties) of nouns (and other word classes), the idea is that implementers would conflate them with part-of-speech labels: they would create a "masculine noun" part-of-speech label, a "feminine noun" part-of-speech label, and so on. For inflection categories, for example to say that this noun belongs to such-and-such inflectional paradigm, my suggestion would again be to conflate this with the part-of-speech label, for example "feminine noun of the second declension". Note that in DMLex, implementers can use the Controlled Vocabularies module to tell the world what their home-baked part-of-speech labels map to in some external data category ontology, for example LexInfo. For inflected forms, that's what the InflectedForm object type is for (but no tildes, please).

    > We cannot give citations for sources of information, as is typical in historical dictionaries (see image)

    True. This would be a good candidate for a module.

    > There is no etymology information (as discussed in the call). See example in Merriam-Webster:

    True again, and again it's a candidate for a module.

    > I am not sure how we intend to implement collocations (see example in Foclóir Gaeilge-Béarla)

    My idea was that each collocation would be treated as an entry (with the full-form collocation as its headword, for example "lán boise"), and this entry would be connected to its mother entry through SubentryRelation. To my mind, this is the only reasonable way to handle collocations if we want to avoid embedded subentries and headword overriding (see my report on subentrying earlier in Lexidma and also my presentation on recursion at eLex recently).

    > No modelling for hypernym relations

    I was thinking most lexicographers would be happy enough to model hypernymy and hyponymy through SimilarityRelation in the Crossreferencing module. But, by all means, feel free to propose a more detailed inventory of relation types for this module if you want.

    On a general note, I'm sure people from both inside and outside Lexidma will have a lot of questions like these: how do I handle noun gender in DMLex, how do I handle collocations in DMLex, and so on. The standard itself is probably not the right place to answer them. So, perhaps we should think about producing some sort of a cookbook to go with the standard as an additional, less formal guide to implementing DMLex.

    M.
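
The two suggestions in this message (conflated part-of-speech tags mapped through a controlled vocabulary, and collocations as separate entries linked to their mother entry) can be illustrated with a minimal Python sketch. It is not the DMLex serialization: the class and field names, the tags "n-masc" and "n-fem-2", the "lexinfo:" names and the headword "bos" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Entry:
    headword: str
    part_of_speech: Optional[str] = None  # home-baked, possibly conflated tag


@dataclass(frozen=True)
class SubentryRelation:
    parent: str  # headword of the mother entry
    child: str   # headword of the collocation entry


# Hypothetical controlled vocabulary: each home-baked tag lists the external
# categories it maps to. The "lexinfo:" names are illustrative placeholders
# and have not been checked against the LexInfo ontology.
POS_VOCABULARY: Dict[str, List[str]] = {
    "n-masc": ["lexinfo:noun", "lexinfo:masculine"],
    "n-fem-2": ["lexinfo:noun", "lexinfo:feminine"],
}

# A flat list of entries: the collocation is an entry in its own right,
# not something embedded inside its mother entry.
ENTRIES = [
    Entry("bos", "n-fem-2"),  # headword assumed for illustration
    Entry("lán boise"),
]
RELATIONS = [SubentryRelation(parent="bos", child="lán boise")]


def external_categories(entry: Entry) -> List[str]:
    """Return the external categories a tag maps to (empty if unmapped)."""
    if entry.part_of_speech is None:
        return []
    return POS_VOCABULARY.get(entry.part_of_speech, [])


def subentries_of(headword: str) -> List[str]:
    """List the headwords linked to `headword` as subentries."""
    return [r.child for r in RELATIONS if r.parent == headword]
```

Because the subentry link lives in a separate relation object, no entry ever contains another entry, which is the property the message argues for.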


  • 12.  Re: [lexidma] Working draft

    Posted 10-12-2021 12:49
    Hi Michal,

    Thanks.

    On 12/10/2021 10:42, Michal Měchura wrote:

    >> We have no way to represent fine-grained morphosyntactic information, including gender of nouns and categories of inflections. For example the annotation on this entry and its inflected form 'boise'
    > For gender (and other properties) of nouns (and other word classes), the idea is that implementers would conflate them with part-of-speech labels: they would create a "masculine noun" part-of-speech label, a "feminine noun" part-of-speech label, and so on. For inflection categories, for example to say that this noun belongs to such-and-such inflectional paradigm, my suggestion would again be to conflate this with the part-of-speech label, for example "feminine noun of the second declension". Note that in DMLex, implementers can use the Controlled Vocabularies module to tell the world what their home-baked part-of-speech labels map to in some external data category ontology, for example LexInfo. For inflected forms, that's what the InflectedForm object type is for (but no tildes, please).

    Okay, I don't like all this 'conflating'; it seems like a hack to me that is not really necessary.

    >> We cannot give citations for sources of information, as is typical in historical dictionaries (see image)
    > True. This would be a good candidate for a module.
    >> There is no etymology information (as discussed in the call). See example in Merriam-Webster:
    > True again, and again it's a candidate for a module.

    Okay, I will try to make some pull requests for candidate modules.

    >> I am not sure how we intend to implement collocations (see example in Foclóir Gaeilge-Béarla)
    > My idea was that each collocation would be treated as an entry (with the full-form collocation as its headword, for example "lán boise"), and this entry would be connected to its mother entry through SubentryRelation. To my mind, this is the only reasonable way to handle collocations if we want to avoid embedded subentries and headword overriding (see my report on subentrying earlier in Lexidma and also my presentation on recursion at eLex recently).

    Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context. I think the solution above may work, but we should certainly try it on some real dictionary data to see how practical it is.

    >> No modelling for hypernym relations
    > I was thinking most lexicographers would be happy enough to model hypernymy and hyponymy through SimilarityRelation in the Crossreferencing module. But, by all means, feel free to propose a more detailed inventory of relation types for this module if you want.

    I think that the distinction between synonyms and hypernyms is quite important and few dictionaries would like to conflate these. A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?

    > On a general note, I'm sure people from both inside and outside Lexidma will have a lot of questions like these: how do I handle noun gender in DMLex, how do I handle collocations in DMLex, and so on. The standard itself is probably not the right place to answer them. So, perhaps we should think about producing some sort of a cookbook to go with the standard as an additional, less formal guide to implementing DMLex.

    I agree. Examples will need to be documented somewhere.

    Regards,
    John


  • 13.  Re: [lexidma] Working draft

    Posted 10-12-2021 15:13
    Re: conflating gender etc. into part of speech

    > Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.

    Another option would be to have a more generic label type and (optionally) use a controlled vocabulary to tell us what kind of label it is: part of speech or gender or whatever. That would still allow conflating but would also allow "not-conflating". In fact, our sense-level usage labels do this already, so for consistency we should probably do it for our entry-level grammar labels too.

    Re: collocations

    > Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context.

    That Irish-English dictionary you're looking at (Ó Dónaill's FGB from 1977) is exceptional in that it makes no clear distinction between collocations and example sentences and other kinds of multiword units that appear inside the articles. The bold/italic type is not a consistent guide either. I know this dictionary well; I had a hand in its retrodigitization a while back. :-) If somebody wanted to convert FGB into DMLex I would advise them to either treat all those things as example sentences, or else to manually decide for each one whether or not it deserves to be its own (sub)entry. And either way, this dictionary is on the periphery of Lexidma's interests because it's an old paper one. I get the impression that modern born-digital dictionaries tend to be more clear on whether something is or isn't a (sub)entry.

    > A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?

    That's doable, yes. But still, I think we need separate Relation subtypes depending on their arity and directionality. Like, a synonymy relation can have two or more participants and is undirected, an antonymy relation must have exactly two participants and is also undirected, a hypernym/hyponym relation has two participants and is directed.

    M.
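
The "more generic label type" idea from this message could look roughly like the following Python sketch, in which a label carries only a tag and a controlled vocabulary optionally says what kind of label that tag is. The names `Label`, `LABEL_KINDS` and `labels_of_kind`, and all the tags, are hypothetical and not taken from the working draft.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical controlled vocabulary: tag -> kind of label.
# A tag missing from this table is simply untyped.
LABEL_KINDS: Dict[str, str] = {
    "noun": "partOfSpeech",
    "feminine": "gender",
    "feminine-noun": "partOfSpeech",  # a conflated tag remains possible
}


@dataclass(frozen=True)
class Label:
    tag: str

    @property
    def kind(self) -> Optional[str]:
        """Kind of label per the vocabulary, or None if the tag is untyped."""
        return LABEL_KINDS.get(self.tag)


def labels_of_kind(labels: List[Label], kind: str) -> List[Label]:
    """Select the labels whose tag the vocabulary assigns to `kind`."""
    return [label for label in labels if label.kind == kind]
```

An implementer who prefers conflation declares one tag of kind "partOfSpeech"; one who does not declares separate tags, and the same label object serves both.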


  • 14.  Re: [lexidma] Working draft

    Posted 10-13-2021 09:55
    On 12/10/2021 16:12, Michal Měchura wrote:

    > Re: conflating gender etc. into part of speech
    >> Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.
    > Another option would be to have a more generic label type and (optionally) use a controlled vocabulary to tell us what kind of label it is: part of speech or gender or whatever. That would still allow conflating but would also allow "not-conflating". In fact, our sense-level usage labels do this already, so for consistency we should probably do it for our entry-level grammar labels too.

    I would recommend a key/value style representation of linguistic properties. This is the approach taken in all other models and would connect with other controlled vocabulary mechanisms (CLARIN, GOLD, LexInfo etc.)

    > Re: collocations
    >> Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context.
    > That Irish-English dictionary you're looking at (Ó Dónaill's FGB from 1977) is exceptional in that it makes no clear distinction between collocations and example sentences and other kinds of multiword units that appear inside the articles. The bold/italic type is not a consistent guide either. I know this dictionary well; I had a hand in its retrodigitization a while back. :-) If somebody wanted to convert FGB into DMLex I would advise them to either treat all those things as example sentences, or else to manually decide for each one whether or not it deserves to be its own (sub)entry. And either way, this dictionary is on the periphery of Lexidma's interests because it's an old paper one. I get the impression that modern born-digital dictionaries tend to be more clear on whether something is or isn't a (sub)entry.

    I wouldn't say that it is just FGB. I think the practice is very common in bilingual dictionaries. For example, Collins Italian-English dictionary* has many full phrases. I was looking at the print version but the online one is similar: https://www.collinsdictionary.com/dictionary/italian-english/questo

    * The first bilingual dictionary I could find on my shelf :)

    >> A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?
    > That's doable, yes. But still, I think we need separate Relation subtypes depending on their arity and directionality. Like, a synonymy relation can have two or more participants and is undirected, an antonymy relation must have exactly two participants and is also undirected, a hypernym/hyponym relation has two participants and is directed.

    I see. What is the reason to care about symmetry and arity? It only seems useful if you are going to add some kind of reasoning or validation methodology, which would add many more complexities to the model.

    Regards,
    John
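
The thread does not spell out the key/value proposal. One possible reading, as a Python sketch: each entry carries a dictionary of linguistic properties instead of a single conflated tag. The property keys and values and the placeholder headwords are assumptions, not names from CLARIN, GOLD, LexInfo or the working draft.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    headword: str
    # Linguistic properties as key/value pairs, e.g. {"gender": "feminine"}.
    # Keys and values here are illustrative, not from any real vocabulary.
    properties: Dict[str, str] = field(default_factory=dict)


def find_entries(entries: List[Entry], **criteria: str) -> List[str]:
    """Return headwords of entries whose properties match every criterion."""
    return [
        entry.headword
        for entry in entries
        if all(entry.properties.get(key) == value for key, value in criteria.items())
    ]


# Placeholder headwords; the property values are made up for illustration.
ENTRIES = [
    Entry("headword-a", {"partOfSpeech": "noun", "gender": "feminine", "declension": "2"}),
    Entry("headword-b", {"partOfSpeech": "noun", "gender": "masculine"}),
    Entry("headword-c", {"partOfSpeech": "verb"}),
]
```

With separate keys, a query such as `find_entries(ENTRIES, partOfSpeech="noun")` finds all nouns regardless of gender, which a single conflated tag would not allow without extra mapping.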


  • 15.  Re: [lexidma] Working draft

    Posted 10-13-2021 16:19
    Re: gender etc., part of speech etc.

    > I would recommend a key/value style representation of linguistic properties. This is the approach taken in all other models and would connect with other controlled vocabulary mechanisms (CLARIN, GOLD, LexInfo etc.)

    Sounds promising. Can you explain in more detail what that would look like?

    Re: example sentences versus collocations, multi-word units

    > I wouldn't say that it is just FGB. I think the practice is very common in bilingual dictionaries. For example, Collins Italian-English dictionary* has many full phrases. I was looking at the print version but the online one is similar: https://www.collinsdictionary.com/dictionary/italian-english/questo

    Right, OK. Well, if the authors of a given dictionary don't want to make a distinction between examples on the one hand and collocations/multi-word combos on the other, then that's fine, they don't have to: they can treat them all as examples, and DMLex has a type for encoding examples. But for dictionary authors who do want to make such a distinction, my suggestion would be that they make use of the subentrying mechanism in DMLex using SubentryRelation.

    Re: cross-referencing, arity and directionality of relations

    > What is the reason to care about symmetry and arity? It only seems useful if you are going to add some kind of reasoning or validation methodology, which would add many more complexities to the model.

    It seemed like common sense to me to care about such things; I didn't even think about it. I guess this is where my object-oriented heritage is showing itself. :-) But yeah, good question. I guess it's useful for encoding the facts we want to encode and for basic type safety, like everything else in DMLex. If we decide not to care about such things, then we're going to end up allowing e.g. more than two participants in an antonymy relation, and that's nonsense: you can't have e.g. three words where each is an antonym of the other two. Or we're going to end up having two words in a hypernym/hyponym relation and not knowing which is the hypernym and which is the hyponym.

    M.
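
The "basic type safety" argument can be sketched in Python: each relation type declares its arity and directionality, and a small check rejects nonsense such as a three-way antonymy. The three relation types and their constraints come from the thread; the class names, field names and functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class RelationType:
    min_members: int
    max_members: Optional[int]  # None means no upper bound
    directed: bool


# Arity and directionality as described in the thread.
RELATION_TYPES: Dict[str, RelationType] = {
    "synonymy": RelationType(2, None, directed=False),
    "antonymy": RelationType(2, 2, directed=False),
    "hypernymy": RelationType(2, 2, directed=True),
}


@dataclass(frozen=True)
class Relation:
    kind: str
    members: Tuple[str, ...]  # for "hypernymy": (hypernym, hyponym)


def validate(relation: Relation) -> List[str]:
    """Return a list of problems; an empty list means the relation is valid."""
    rtype = RELATION_TYPES.get(relation.kind)
    if rtype is None:
        return [f"unknown relation type: {relation.kind}"]
    errors = []
    count = len(relation.members)
    if count < rtype.min_members:
        errors.append(f"{relation.kind} needs at least {rtype.min_members} members")
    if rtype.max_members is not None and count > rtype.max_members:
        errors.append(f"{relation.kind} allows at most {rtype.max_members} members")
    return errors


def same_relation(a: Relation, b: Relation) -> bool:
    """Compare two valid relations; member order matters only if directed."""
    if a.kind != b.kind:
        return False
    if RELATION_TYPES[a.kind].directed:
        return a.members == b.members
    return set(a.members) == set(b.members)
```

In this sketch the directed case keeps member order significant, so which word is the hypernym is never ambiguous, while undirected relations compare equal in any order.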