OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC


Fwd: [lexidma] follow up on two-level senses

  • 1.  Fwd: [lexidma] follow up on two-level senses

    Posted 07-03-2020 09:19
    Forwarded due to mail address.

    ---------- Forwarded message ---------
    From: John P. McCrae <john@mccr.ae>
    Date: Friday 3 July 2020 at 10:17
    Subject: Re: [lexidma] follow up on two-level senses
    To: Miloš Jakubíček <milos.jakubicek@sketchengine.eu>
    Cc: <lexidma@lists.oasis-open.org>, Buitelaar, Paul <paul.buitelaar@nuigalway.ie>

    Hi Miloš,

    Thanks for this interesting proposal. I have a few comments.

    On Thursday 2 July 2020 at 17:57, Miloš Jakubíček <milos.jakubicek@sketchengine.eu> wrote:

    > Dear all,
    >
    > as David encouraged, I'd like to follow up by email on our most recent discussion on using sense recursion.
    >
    > We've talked about this with Michal Měchura, and Michal came up with what I consider an excellent alternative solution to this problem. Where lexicographers want to encode a two-level sense structure, they could use the labels as a kind of tag to group related senses. Not only would this avoid the need for any additional hierarchy (not to mention recursion, which would be an extrapolation ad absurdum), but it also enables what any such hierarchy prevents, namely several, possibly orthogonal, groupings of senses.
    >
    > On this point, I think it is important to realize that:
    >
    > (1) the hierarchy actually encodes similarity of senses, i.e. it says that some senses are closer to each other than to others
    >
    > (2) there is very limited agreement on this similarity

    Actually, I don't think it is merely similarity that is being encoded. In fact, these sense groupings are mostly based on ideas of systematic polysemy (as introduced by authors such as Pustejovsky [1] and Buitelaar [2]) and of complementary and contrastive senses (such as described by Weinreich [3]). These are real linguistic phenomena and still motivate modern electronic lexicographic efforts [4].

    > (3) there are many possible ways in which this similarity can be defined and seen; allowing for this means being closer to how language and word senses work
    >
    > (4) the fact that it was encoded in a hierarchical way, which only allows a one-dimensional structure, merely comes from the limits of a printed dictionary

    I am not sure I agree with this, partly for the reasons stated above, but moreover because users do not want to use an electronic dictionary as some free-form graph structure. This is something that I have learnt from WordNet: presenting the data as a flat text structure (e.g., https://en-word.net/ ) is more effective than presenting it as a graph diagram. As such, I think hierarchical groupings are still very useful in both the presentation and the production of dictionary content.

    > (5) this alternative solution therefore enables all this, and much more if needed, without introducing additional complexity.
    >
    > I think that the labels could generally use a notation similar to the one David mentioned for PoS tagging, with a prefix denoting the type of label, e.g. "sensegroup:1" or "sensegroup:etymology1" and similar, but that is to be discussed.

    From a technical point of view, there are also disadvantages to this. You are still encoding hierarchical senses, but now you are doing it in a way that is harder to work with in XPath and many other technologies, which in turn makes it harder for data creators to verify consistency. I would suggest that this be implemented as an optional sense grouping tag, e.g.,

        <senseGrp>
          <sense id="..."><defn></defn></sense>
          <sense id="..."><defn></defn></sense>
        </senseGrp>

    Also, I would note that this discussion is only really about grouping senses.
    Grouping entries is more questionable but is often motivated by linguistic phenomena like derivation, grouping etymologically distinct forms of the same word (e.g., 'bank' can first be grouped into subentries based on its Germanic/Italian/French etymologies) or morphologically distinct forms (e.g., the unique dative singular found in the seventh sense here). We should at least consider these requirements on the representation and have a plan to represent them in the model.

    Regards,
    John

    [1] James Pustejovsky, The Generative Lexicon.
    [2] http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.43.1429&rep=rep1&type=pdf
    [3] https://www.jstor.org/stable/1263534?seq=1#metadata_info_tab_contents
    [4] https://euralex.org/elx_proceedings/Euralex2010/126_Euralex_2010_9_LANGEMETS_Systematic%20Polysemy%20of%20Nouns%20and%20its%20Lexicographic%20Treatment%20in%20Estonian.pdf

    > All the best
    > Milos
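
The trade-off discussed in this thread, label-based grouping versus an explicit <senseGrp> element, can be illustrated with a minimal Python sketch using the standard library's xml.etree.ElementTree. It is only a sketch under stated assumptions: the <entry> wrapper, the <label> child element, the sense ids and the placeholder definitions are hypothetical and are not part of any LEXIDMA draft. Only <senseGrp>, <sense>, <defn> and the "sensegroup:" prefix are taken from the messages above.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical flat entry: groups are expressed as prefixed labels on each sense.
# <entry> and <label> are assumed names; "sensegroup:" comes from the thread.
FLAT = """
<entry>
  <sense id="s1"><label>sensegroup:1</label><label>sensegroup:etymology1</label><defn>d1</defn></sense>
  <sense id="s2"><label>sensegroup:1</label><label>sensegroup:etymology2</label><defn>d2</defn></sense>
  <sense id="s3"><label>sensegroup:2</label><label>sensegroup:etymology1</label><defn>d3</defn></sense>
</entry>
"""

# Hypothetical nested entry using the optional <senseGrp> element.
NESTED = """
<entry>
  <senseGrp>
    <sense id="s1"><defn>d1</defn></sense>
    <sense id="s2"><defn>d2</defn></sense>
  </senseGrp>
  <senseGrp>
    <sense id="s3"><defn>d3</defn></sense>
  </senseGrp>
</entry>
"""


def groups_from_labels(xml_text, prefix="sensegroup:"):
    """Collect sense ids per group name by scanning prefixed labels."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for sense in root.iter("sense"):
        for label in sense.findall("label"):
            text = (label.text or "").strip()
            if text.startswith(prefix):
                # A sense may carry several labels, so groups may overlap.
                groups[text[len(prefix):]].append(sense.get("id"))
    return dict(groups)


def groups_from_nesting(xml_text):
    """Read groups directly from the element hierarchy."""
    root = ET.fromstring(xml_text)
    return [
        [sense.get("id") for sense in group.findall("sense")]
        for group in root.findall("senseGrp")
    ]


label_groups = groups_from_labels(FLAT)
nested_groups = groups_from_nesting(NESTED)
print(label_groups)
print(nested_groups)
```

In this sketch the label variant allows overlapping, orthogonal groupings (s1 belongs to both "1" and "etymology1"), but recovering a group requires scanning every sense and matching label strings. The nested variant is a single path step ("senseGrp", then "sense"), which is easier to query and validate, but each sense sits in exactly one group.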