OASIS Lexicographic Infrastructure Data Model and API (LEXIDMA) TC

  • 1.  Relational remodeling: the diagram

    Posted 02-08-2021 17:32
    Hi all,

    Here (see attachment) is an entity-relationship diagram which combines the two reports I've done on relational remodelling. A couple of notes to guide you through it:

    The main data types are entries and senses. I'm sure every IT person will be happy to see that the list of senses inside an entry is always a flat list (there is no hierarchical embedding), that there are no such things as subsenses and subentries (at least not formally, as separate data types) and that every sense inside each entry is guaranteed to be about the entry's headword and not about something else. On the other hand, lexicographers should be happy to see that phenomena such as subsensing and subentries can still be captured here, just not as embedding hierarchies but as relations.

    An entry can contain data like headword and POS (top left), and a sense can contain data like definitions, translations and examples (top right). We will have to flesh out what those are, but this shouldn't be difficult. A lot of good thinking on this has been done in ELEXIS already.

    In the bottom half of the diagram you see three data types for the three relations which allow us to record the fact that, going from right to left, (1) a sense is to be shown to the end-user as a subsense of another sense, (2) an entry is to be shown to the end-user as a subentry inside a sense of another entry, and (3) an entry is to be shown to the end-user as a subentry of another entry.

    Depending on the formalism you choose for your implementation or serialization (relational database, RDF, XML, JSON), the three relations can be implemented as three tables in a relational database (like in this diagram) or as something else. I have drawn the diagram as if I were planning an implementation in a relational database. I don't know how to draw it more abstractly than this. (Incidentally, the fact that foreign key attributes like ChildSenseID etc. are explicitly mentioned in the diagram means that strictly speaking it isn't an ER diagram. But I thought it was useful to have them there anyway, to make it obvious from their names who the child is and who the parent is.)

    Senses and relations have a ListingOrder (= a sort key) which determines their order when presented to the end-user. This means that (1) inside an entry, its senses and subentries can be shown in a given order and (2) inside a sense, its subsenses and subentries can be shown in a given order.

    To summarize, this diagram expresses what I was arguing for in my last two reports. The idea is that we remodel some embedding phenomena as relations, and by doing that, we create an IT-friendly data model, with flat lists, with fewer data types and with relations between them.

    M.

    Attachment: diagram.png (PNG image)
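For readers without access to the attachment, the relational reading of the post above can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module, not the diagram itself: only ListingOrder and ChildSenseID are named in the post, so every other table name, column name, and all the sample rows are assumptions made for the example.

```python
import sqlite3

# Hypothetical schema. Only ListingOrder and ChildSenseID appear in the post;
# all other table and column names are assumptions for illustration.
SCHEMA = """
CREATE TABLE Entry (
    ID INTEGER PRIMARY KEY,
    Headword TEXT NOT NULL,
    POS TEXT
);
CREATE TABLE Sense (
    ID INTEGER PRIMARY KEY,
    EntryID INTEGER NOT NULL REFERENCES Entry(ID),
    Definition TEXT,
    ListingOrder INTEGER NOT NULL
);
-- (1) a sense is shown as a subsense of another sense
CREATE TABLE SubsenseOfSense (
    ChildSenseID INTEGER NOT NULL REFERENCES Sense(ID),
    ParentSenseID INTEGER NOT NULL REFERENCES Sense(ID),
    ListingOrder INTEGER NOT NULL
);
-- (2) an entry is shown as a subentry inside a sense of another entry
CREATE TABLE SubentryOfSense (
    ChildEntryID INTEGER NOT NULL REFERENCES Entry(ID),
    ParentSenseID INTEGER NOT NULL REFERENCES Sense(ID),
    ListingOrder INTEGER NOT NULL
);
-- (3) an entry is shown as a subentry of another entry
CREATE TABLE SubentryOfEntry (
    ChildEntryID INTEGER NOT NULL REFERENCES Entry(ID),
    ParentEntryID INTEGER NOT NULL REFERENCES Entry(ID),
    ListingOrder INTEGER NOT NULL
);
"""


def build_demo():
    """Create an in-memory database with a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO Entry VALUES (?, ?, ?)",
        [(1, "black", "adj"), (2, "blackbird", "noun")],
    )
    conn.executemany(
        "INSERT INTO Sense VALUES (?, ?, ?, ?)",
        [
            (10, 1, "placeholder definition A", 1),
            (11, 1, "placeholder definition B", 2),
            (20, 2, "placeholder definition C", 1),
        ],
    )
    # Sense 11 is displayed as a subsense of sense 10 (same entry).
    conn.execute("INSERT INTO SubsenseOfSense VALUES (11, 10, 1)")
    # Entry 2 is displayed as a subentry of entry 1.
    conn.execute("INSERT INTO SubentryOfEntry VALUES (2, 1, 1)")
    return conn


def senses_of(conn, entry_id):
    """The flat, ordered list of sense IDs belonging to an entry."""
    rows = conn.execute(
        "SELECT ID FROM Sense WHERE EntryID = ? ORDER BY ListingOrder",
        (entry_id,),
    )
    return [row[0] for row in rows]


def subsenses_of(conn, sense_id):
    """IDs of the senses related to sense_id as its subsenses, in order."""
    rows = conn.execute(
        "SELECT ChildSenseID FROM SubsenseOfSense "
        "WHERE ParentSenseID = ? ORDER BY ListingOrder",
        (sense_id,),
    )
    return [row[0] for row in rows]
```

Note that `senses_of` always returns a flat list; the subsense structure lives only in the relation table, which is the point the post is making.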


  • 2.  Re: [lexidma] Relational remodeling: the diagram

    Posted 02-10-2021 13:06
    Hi Michael,

    I was working on my own diagram that I put here:

    I think we are on a similar path, although there are some differences in modelling:

    - I am kind of assuming that every element can have an ID and some way of making child elements.
    - I am using some more general classes ('entry relation', 'sense relation', 'form', 'morphosyntactic property') that can be subtyped into more specific types ('Headword', 'SubentryOfEntryRelation').
    - I allow elements to appear in many places, e.g., examples can be of entries, senses, collocations, so I use structure over specific classes ('CollocationExampleTranslation').

    Regards,
    John

    On 08/02/2021 17:31, Michal Měchura wrote:
    > Hi all, Here (see attachment) is an entity-relationship diagram which combines the two reports I've done on relational remodelling. [...]


  • 3.  Re: [lexidma] Relational remodeling: the diagram

    Posted 02-15-2021 13:02
    Hi John,

    On Wed, 10 Feb 2021 at 14:06, John McCrae <john.mccrae@insight-centre.org> wrote:
    > I am kind of assuming that every element can have an ID and some way of making child elements.

    Yes, but not an obligatory ID. May I remind everyone that the best IDs are natural IDs, and we should not enforce artificial IDs where they are not necessary.

    > I allow elements to appear in many places, e.g., examples can be of entries, senses, collocations, so I use structure over specific classes ('CollocationExampleTranslation')

    That's an absolutely unwanted feature from the processing point of view. One of the reasons to make the model is to avoid things like this, especially as they are not necessary to convey the associated information. E.g. a translation can be of many things, of course, but each is a different type of translation and should be specified exactly as such.

    Best
    Milos


  • 4.  Re: [lexidma] Relational remodeling: the diagram

    Posted 02-15-2021 13:19
    Hi Milos,

    On 15/02/2021 13:01, Miloš Jakubíček wrote:
    > On Wed, 10 Feb 2021 at 14:06, John McCrae <john.mccrae@insight-centre.org> wrote:
    >> I am kind of assuming that every element can have an ID and some way of making child elements.
    >
    > Yes, but not an obligatory ID. May I remind everyone that the best IDs are natural IDs, and we should not enforce artificial IDs where they are not necessary.

    Sure, I think we agree here.

    >> I allow elements to appear in many places, e.g., examples can be of entries, senses, collocations, so I use structure over specific classes ('CollocationExampleTranslation')
    >
    > That's an absolutely unwanted feature from the processing point of view. One of the reasons to make the model is to avoid things like this, especially as they are not necessary to convey the associated information. E.g. a translation can be of many things, of course, but each is a different type of translation and should be specified exactly as such.

    Actually, I am not sure which view you are supporting here. The difference is between (in XML notation) reusing 'Translation', e.g.,

        <Definition> foo <Translation>foo in German</Translation> </Definition>
        <Example> bar <Translation>bar in German</Translation> </Example>

    versus introducing specific types:

        <Definition> foo <DefinitionTranslation>foo in German</DefinitionTranslation> </Definition>
        <Example> bar <ExampleTranslation>bar in German</ExampleTranslation> </Example>

    Both are fine and have different advantages and disadvantages. The former leads to a smaller and hence easier-to-implement model; the latter allows more specificity, should we need a different set of properties for a translation in different contexts. I lean towards the former, but we should evaluate what makes the most sense in terms of requirements and applications of the model.

    Regards,
    John


  • 5.  Re: [lexidma] Relational remodeling: the diagram

    Posted 02-16-2021 09:02
    Hi all,

    I have an editable link to my version of the diagram here: https://drive.google.com/file/d/1B0xD77HjZFlHVTQlCNw2fLs7OOvV5P0U/view?usp=sharing

    I have updated it with the vocabulary shared by Carole.

    Regards,
    John


  • 6.  Re: [lexidma] Relational remodeling: the diagram

    Posted 02-17-2021 18:49
    Hi John,

    On Mon, 15 Feb 2021 at 14:19, John McCrae <john.mccrae@insight-centre.org> wrote:
    > Actually, I am not sure which view you are supporting here. The difference is between (in XML notation) reusing 'Translation', e.g.,
    >
    >     <Definition> foo <Translation>foo in German</Translation> </Definition>
    >     <Example> bar <Translation>bar in German</Translation> </Example>
    >
    > versus introducing specific types:
    >
    >     <Definition> foo <DefinitionTranslation>foo in German</DefinitionTranslation> </Definition>
    >     <Example> bar <ExampleTranslation>bar in German</ExampleTranslation> </Example>
    >
    > Both are fine and have different advantages/disadvantages. The former leading to a smaller and hence easier-to-implement model,

    I think it's the other way round: the former makes it much harder to use the model (or rather, such a serialization). Having a model that is "smaller" in terms of the number of entities is no big gain. On the contrary, if the same entity plays multiple roles in the model, the model becomes much harder to understand and use.

    Translation is an excellent example of that. In theory, you "just" need to say something is translated, but in practice these translations are never the same from the modelling and usage perspective. Sometimes the translation is just one word, sometimes a whole sentence. In the case of examples, it might include references to the headword (e.g. in XML as an inline element); for definitions, not so much. When processing the data and encountering <Translation>, you will need to know which kind it is to do anything with it, so it is much better to have separate names for each.

    Best
    Milos
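The processing difference discussed in the two posts above can be illustrated with a small sketch. It uses Python's standard xml.etree.ElementTree module; the wrapping `<Sense>` root element and the two function names are assumptions added only so that each sample parses as one document, and nothing here is a proposed serialization.

```python
import xml.etree.ElementTree as ET

# Variant A: one reused <Translation> element (hypothetical <Sense> wrapper).
REUSED = """<Sense>
  <Definition>foo<Translation>foo in German</Translation></Definition>
  <Example>bar<Translation>bar in German</Translation></Example>
</Sense>"""

# Variant B: a specific element name per context.
SPECIFIC = """<Sense>
  <Definition>foo<DefinitionTranslation>foo in German</DefinitionTranslation></Definition>
  <Example>bar<ExampleTranslation>bar in German</ExampleTranslation></Example>
</Sense>"""


def translations_reused(xml_text):
    """With a reused element, the role must be recovered from the parent."""
    root = ET.fromstring(xml_text)
    found = []
    for parent in root.iter():
        for child in parent:
            if child.tag == "Translation":
                found.append((parent.tag, child.text))
    return found


def translations_specific(xml_text):
    """With specific element names, the tag alone identifies the role."""
    root = ET.fromstring(xml_text)
    return [
        (element.tag, element.text)
        for element in root.iter()
        if element.tag.endswith("Translation")
    ]
```

Both functions recover the same information. In variant A the consumer has to track the parent element to know what kind of translation it is looking at; in variant B the schema has two element names where variant A has one.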


  • 7.  Re: [lexidma] Relational remodeling: the diagram

    Posted 02-20-2021 17:18
    Hi all,

    I have redrawn my diagram to be more formally an ER diagram (see attachment). The rectangles are entities, the diamonds are relationships (the arrow indicates who's who in that relationship) and the ovals are properties. Cardinalities are indicated using crow's feet notation. The dashed shapes are entities/properties we still need to flesh out (from the ELEXIS vocabulary which Carole shared recently).

    A good way to read the diagram is to follow the numbers and read along:

    1. An entry can have one or more senses. A sense belongs to exactly one entry.
    2. A sense can be a subsense of up to one other sense. A sense can have zero or more subsenses. Business rule: they must be senses of the same entry.
    3. An entry can be a subentry of zero or more senses. A sense can have zero or more subentries. Business rule: the sense must not be a sense of the subentry.
    4. An entry can be a subentry of zero or more other entries. An entry can have zero or more subentries.

    Hope this makes it clear that (1) the core structural skeleton of each entry is that the entry has an ordered list of senses, and that (2) in addition to this core we have relationships to enable subsensing and subentrying.

    That's (part of) the data model. At view time, when the entry is being displayed to a human, the relationships can be traversed to construct a view model in which an entry will contain an ordered list of senses and (optionally) subentries, and a sense will (optionally) contain an ordered list of senses and subentries.

    M.

    Attachment: Lexidma ER.png (PNG image)
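The step from data model to view model described above might look roughly like the following sketch. All identifiers and sample data are hypothetical. Merging senses and subentries into one list sorted by listing order is also an assumption of this example, since the post only says that both can be shown in a given order. The sketch assumes the subsense relation contains no cycles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sense:
    id: str
    entry_id: str
    listing_order: int


# Hypothetical sample data: entry IDs mapped to headwords, plus a flat sense list.
ENTRIES = {"e1": "black", "e2": "blackbird", "e3": "black out"}
SENSES = [
    Sense("s1", "e1", 1),
    Sense("s2", "e1", 2),
    Sense("s3", "e1", 3),
    Sense("s4", "e2", 1),
    Sense("s5", "e3", 1),
]
# Relationship rows: (child_id, parent_id, listing_order)
SUBSENSE_OF_SENSE = [("s2", "s1", 1)]
SUBENTRY_OF_SENSE = [("e2", "s1", 2)]
SUBENTRY_OF_ENTRY = [("e3", "e1", 4)]


def check_business_rules():
    """Return True if the two business rules from the post hold for the data."""
    by_id = {sense.id: sense for sense in SENSES}
    same_entry = all(
        by_id[child].entry_id == by_id[parent].entry_id
        for child, parent, _ in SUBSENSE_OF_SENSE
    )
    not_own_sense = all(
        by_id[sense_id].entry_id != entry_id
        for entry_id, sense_id, _ in SUBENTRY_OF_SENSE
    )
    return same_entry and not_own_sense


def children(parent_id, relation):
    """(listing_order, child_id) pairs for one parent in one relation."""
    return [(order, child) for child, parent, order in relation if parent == parent_id]


def view_sense(sense_id):
    """A sense with its subsenses and subentries merged into one ordered list."""
    items = [(o, view_sense(c)) for o, c in children(sense_id, SUBSENSE_OF_SENSE)]
    items += [(o, {"subentry": ENTRIES[c]}) for o, c in children(sense_id, SUBENTRY_OF_SENSE)]
    items.sort(key=lambda pair: pair[0])
    return {"sense": sense_id, "children": [item for _, item in items]}


def view_entry(entry_id):
    """An entry with its top-level senses and subentries in listing order."""
    nested = {child for child, _, _ in SUBSENSE_OF_SENSE}
    items = [
        (sense.listing_order, view_sense(sense.id))
        for sense in SENSES
        if sense.entry_id == entry_id and sense.id not in nested
    ]
    items += [(o, {"subentry": ENTRIES[c]}) for o, c in children(entry_id, SUBENTRY_OF_ENTRY)]
    items.sort(key=lambda pair: pair[0])
    return {"headword": ENTRIES[entry_id], "children": [item for _, item in items]}
```

Calling `view_entry("e1")` yields a nested structure in which sense s1 contains subsense s2 and the subentry "blackbird", followed by sense s3 and the subentry "black out". The stored senses of e1 remain the flat list s1, s2, s3.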