On 12/10/2021 16:12, Michal Měchura wrote:

Re: conflating gender etc. into part of speech

Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.

Another option would be to have a more generic label type and (optionally) use a controlled vocabulary to tell us what kind of label it is: part of speech or gender or whatever. That would still allow 'conflating' but would also allow 'not-conflating'. In fact, our sense-level usage labels do this already, so for consistency we should probably do it for our entry-level grammar labels too.

I would recommend a key/value style representation of linguistic properties. This is the approach taken in all other models and would connect with other controlled vocabulary mechanisms (CLARIN, GOLD, LexInfo etc.).

Re: collocations

Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context.

That Irish-English dictionary you're looking at (Ó Dónaill's FGB from 1977) is exceptional in that it makes no clear distinction between collocations and example sentences and other kinds of multiword units that appear inside the articles. The bold/italic type is not a consistent guide either. I know this dictionary well, I had a hand in its retrodigitization a while back. :-) If somebody wanted to convert FGB into DMLex I would advise them to either treat all those things as example sentences, or else to manually decide for each one whether or not it deserves to be its own (sub)entry. And either way, this dictionary is on the periphery of Lexidma's interests because it's an old paper one. I get the impression that modern born-digital dictionaries tend to be clearer on whether something is or isn't a (sub)entry.

I wouldn't say that it is just FGB. I think the practice is very common in bilingual dictionaries. For example, Collins Italian-English dictionary* has many full phrases.
I was looking at the print version but the online one is similar:
https://www.collinsdictionary.com/dictionary/italian-english/questo

* The first bilingual dictionary I could find on my shelf :)

A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?

That's doable, yes. But still, I think we need separate Relation subtypes depending on their arity and directionality. Like, a synonymy relation can have two or more participants and is undirected, an antonymy relation must have exactly two participants and is also undirected, a hypernym/hyponym relation has two participants and is directed.

I see. What is the reason to care about symmetry and arity? It only seems useful if you are going to add some kind of reasoning or validation methodology, which would add many more complexities to the model.

Regards,
John M.

On Tue, 12 Oct 2021 at 14:48, John McCrae <john.mccrae@insight-centre.org> wrote:

Hi Michal,

Thanks.

On 12/10/2021 10:42, Michal Měchura wrote:

We have no way to represent fine-grained morphosyntactic information, including gender of nouns and categories of inflections. For example the annotation on this entry and its inflected form boise

For gender (and other properties) of nouns (and other word classes) the idea is that implementers would conflate them with part-of-speech labels: they would create a masculine noun part-of-speech label, a feminine noun part-of-speech label, and so on. For inflection categories, for example to say that this noun belongs in such-and-such inflectional paradigm, my suggestion would again be to conflate this with the part-of-speech label, for example 'feminine noun of the second declension'. Note that in DMLex, implementers can use the Controlled Vocabularies module to tell the world what their home-baked part-of-speech labels map to in some external data category ontology, for example to LexInfo. For inflected forms, that's what the InflectedForm object type is for (but no tildes please).

Okay, I don't like all this 'conflating', it seems like a hack to me that is not really necessary.

We cannot give citations for sources of information, as is typical in historical dictionaries (see image)

True. This would be a good candidate for a module.

There is no etymology information (as discussed in the call). See example in Merriam-Webster:

True again, and again it's a candidate for a module.

Okay, I will try to make some pull requests for candidate modules.

I am not sure how we intend to implement collocations (see example in Foclóir Gaeilge-Béarla)

My idea was that each collocation would be treated as an entry (with the full-form collocation as its headword, for example 'lán boise'), and this entry would be connected to its mother entry through SubentryRelation.
To my mind, this is the only reasonable way to handle collocations if we want to avoid embedded subentries and headword overriding (see my report on subentrying earlier in Lexidma and also my presentation on recursion at eLex recently).

Yeah, but as you see we often get full sentences as 'collocations' and in many dictionaries the definitions rely significantly on context. I think the solution above may work, but we should certainly try it on some real dictionary data to see how practical it is.

No modelling for hypernym relations

I was thinking most lexicographers would be happy enough to model hypernymy and hyponymy through SimilarityRelation in the Crossreferencing module. But, by all means, feel free to propose a more detailed inventory of relation types for this module if you want.

I think that the distinction between synonyms and hypernyms is quite important and few dictionaries would like to conflate these. A simple solution would be to just have a single Relation object and use the controlled vocabulary model to define the types of relations?

On a general note, I'm sure people from both inside and outside Lexidma will have a lot of questions like these: how do I handle noun gender in DMLex, how do I handle collocations in DMLex, and so on. The standard itself is probably not the right place to answer them. So, perhaps we should think about producing some sort of a cookbook to go with the standard as an additional, less formal guide to implementing DMLex.

I agree. Examples will need to be documented somewhere.

Regards,
John M.
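
For illustration, here is a rough Python sketch of the "single Relation object plus controlled vocabulary" idea from the thread, with arity and directionality stored as properties of the vocabulary entry rather than as separate Relation subtypes. All names (RelationType, Relation, VOCABULARY, validate, same_relation) are hypothetical and are not taken from DMLex; the three vocabulary entries simply mirror the synonymy, antonymy and hypernymy cases described in the discussion.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RelationType:
    """One entry in a controlled vocabulary of relation types (hypothetical)."""
    name: str
    min_members: int
    max_members: Optional[int]  # None means no upper bound
    directed: bool


# Illustrative vocabulary, following the three cases named in the thread.
VOCABULARY = {
    "synonymy": RelationType("synonymy", 2, None, directed=False),
    "antonymy": RelationType("antonymy", 2, 2, directed=False),
    "hypernymy": RelationType("hypernymy", 2, 2, directed=True),
}


@dataclass(frozen=True)
class Relation:
    """A single generic relation; its type is only a key into the vocabulary."""
    type_name: str
    members: Tuple[str, ...]  # member order is meaningful only for directed types


def validate(relation: Relation) -> List[str]:
    """Return a list of problems; an empty list means the relation is valid."""
    rel_type = VOCABULARY.get(relation.type_name)
    if rel_type is None:
        return [f"unknown relation type: {relation.type_name}"]
    problems = []
    count = len(relation.members)
    if count < rel_type.min_members:
        problems.append(f"{rel_type.name} needs at least {rel_type.min_members} members")
    if rel_type.max_members is not None and count > rel_type.max_members:
        problems.append(f"{rel_type.name} allows at most {rel_type.max_members} members")
    return problems


def same_relation(a: Relation, b: Relation) -> bool:
    """Compare two relations, ignoring member order for undirected types."""
    if a.type_name != b.type_name:
        return False
    rel_type = VOCABULARY.get(a.type_name)
    # Unknown types are compared strictly, as if they were directed.
    if rel_type is None or rel_type.directed:
        return a.members == b.members
    return sorted(a.members) == sorted(b.members)
```

With this layout, checking arity is an optional validation step layered on top of the data rather than something built into the object types, which may go some way towards the concern about added complexity. Whether that trade-off is acceptable would still need to be tried on real dictionary data.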