OASIS XLIFF Object Model and Other Serializations (XLIFF OMOS) TC

  • 1.  Handling unsupported modules and extensions

    Posted 06-14-2017 11:58
    Hi all,

    Looking back at our issue of how an application could handle modules that it does not support.

    Here is a simple example of a basic Translation Candidate entry, with the following assumptions (for now):
    - We use “prefix_” for the modules
    - We use “namespace-IRI<space>” for the extensions

    Notation when the application supports the module:

        "mtc_matches": [
          {
            "mtc_id": "mtc1",
            "mtc_ref": "#1",
            "mtc_reference": false,
            "mtc_similarity": 50.0,
            "mtc_type": "tm",
            "source": [
              {
                "text": "source match"
              }
            ],
            "target": [
              {
                "text": "target match"
              }
            ]
          }
        ]

    Notation when the application does not support the module:

        "urn:oasis:names:tc:xliff:matches:2.0 matches": [
          {
            "urn:oasis:names:tc:xliff:matches:2.0 id": "mtc1",
            "urn:oasis:names:tc:xliff:matches:2.0 ref": "#1",
            "urn:oasis:names:tc:xliff:matches:2.0 reference": "false",
            "urn:oasis:names:tc:xliff:matches:2.0 similarity": "50.0",
            "urn:oasis:names:tc:xliff:matches:2.0 type": "tm",
            "source": [
              {
                "text": "source match"
              }
            ],
            "target": [
              {
                "text": "target match"
              }
            ]
          }
        ]

    Using “mtc_” vs. “urn:oasis:names:tc:xliff:matches:2.0 ” is not really an issue. A tool not supporting MTC can do the conversion to the namespace, and vice versa: a tool supporting MTC can map “urn:oasis:names:tc:xliff:matches:2.0 ” to “mtc_” if needed.

    This said, I think it would be *a lot simpler* to have both tools use the same mechanism: property names with a prefix, and a @context-like table to associate the prefixes with the namespace IRIs.

    One much bigger issue is in the values of the properties:

    We have to look at supported vs. unsupported from the object model viewpoint: an unsupported module is the same as an extension. The tool cannot know specific things about the values the extension uses; it has to store them in a generic way. For example, it cannot know whether the "50" in similarity="50" is a number. It may guess, but without certainty (it could be a string too), so it should not try to guess. This is relatively easy to implement for XLIFF: you put things in a map where the value is a string. Then you can write things back without knowledge of the data and without breaking anything.

    When a tool reads a JLIFF extension, it could store it the same way as for XLIFF. It could even have an extra flag recording the type of the value when reading it from JLIFF, so it could write it back the same way to JLIFF. But such a flag cannot be set (with certainty) when the data is read from XLIFF, so if the tool reads from XLIFF and needs to write to JLIFF, it cannot output "similarity":50, only "similarity":"50".

    The root cause of the issue is that XML values are not typed if you don’t have a schema, while JSON values have basic types. Hence a tool reading XLIFF cannot set a type for the data it does not support, while the same tool reading JLIFF can.

    I don’t have a solution at this point, but I thought it might help to have the issue described, so smart people can come up with options. Maybe I’m just not seeing an obvious answer.

    Cheers,
    -yves
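The key-mapping point above (a tool can convert between “mtc_” and the namespace-IRI notation in either direction) can be sketched as below. This is a minimal illustration, not part of any JLIFF draft: `CONTEXT` is a hypothetical @context-like table with only the `mtc` entry from the example, and the function names are made up. It only rewrites property names; it does not address the value-typing problem.

```python
from typing import Optional, Tuple

# Hypothetical @context-like table mapping prefixes to namespace IRIs.
# Only the "mtc" entry used in the example above is assumed here.
CONTEXT = {"mtc": "urn:oasis:names:tc:xliff:matches:2.0"}
IRI_TO_PREFIX = {iri: prefix for prefix, iri in CONTEXT.items()}


def parse_key(key: str) -> Tuple[Optional[str], str]:
    """Split a property name into (namespace IRI, local name).

    Accepts "prefix_local" (prefix looked up in CONTEXT) and
    "namespace-IRI local" (IRI and name separated by one space).
    Other keys, such as "source" or "text", are core: (None, key).
    """
    if " " in key:
        iri, local = key.split(" ", 1)
        return iri, local
    prefix, sep, local = key.partition("_")
    if sep and prefix in CONTEXT:
        return CONTEXT[prefix], local
    return None, key


def to_prefixed(key: str) -> str:
    """Use "prefix_local" when the IRI has a prefix, else keep the IRI form."""
    iri, local = parse_key(key)
    if iri is None:
        return key
    prefix = IRI_TO_PREFIX.get(iri)
    return f"{prefix}_{local}" if prefix else f"{iri} {local}"


def to_iri(key: str) -> str:
    """Use the "namespace-IRI local" notation for namespaced keys."""
    iri, local = parse_key(key)
    return key if iri is None else f"{iri} {local}"


def rename_keys(obj, rename):
    """Recursively apply a key-renaming function to JSON-like data."""
    if isinstance(obj, dict):
        return {rename(k): rename_keys(v, rename) for k, v in obj.items()}
    if isinstance(obj, list):
        return [rename_keys(v, rename) for v in obj]
    return obj  # scalar values are left untouched


if __name__ == "__main__":
    supported = {"mtc_matches": [{"mtc_id": "mtc1", "mtc_similarity": 50.0,
                                  "source": [{"text": "source match"}]}]}
    unsupported = rename_keys(supported, to_iri)
    print(list(unsupported))  # ['urn:oasis:names:tc:xliff:matches:2.0 matches']
    print(rename_keys(unsupported, to_prefixed) == supported)  # True
```

Because both notations resolve to the same (IRI, local name) pair, the choice between them is a serialization detail; the harder question remains what type each value has.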


  • 2.  Re: [xliff-omos] Handling unsupported modules and extensions

    Posted 10-31-2017 22:01
    Picking this thread up after a long time.

    I included Yves's issue with data typing in unsupported extensions in the "challenges" section of the presentation on JLIFF I gave at FEISGILTT today. I had included the relevant text from the XLIFF 2.0 spec on a slide:

    "Writers that do not support a given custom namespace based user extension SHOULD preserve that extension without Modification."

    David pointed out that we can rely on the use of "should" here. The lack of type information means we may not be able to preserve the extension faithfully, but this is not necessarily non-compliant.

    If a schema for the extension is published, then in theory the typing information can be known. However, there may be practical reasons (schema discovery, etc.) that make this difficult in some cases.
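To illustrate how a published schema could supply the missing types, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It assumes the schema text is already in hand (discovery, the practical problem mentioned above, is not addressed), and the extension namespace `urn:example:myext` and its attributes are hypothetical. It reads named `xs:attribute` declarations and uses the declared type to turn untyped XML strings into JSON values.

```python
import xml.etree.ElementTree as ET
from typing import Dict

XS = "{http://www.w3.org/2001/XMLSchema}"

# Hypothetical extension schema, used only for illustration.
EXT_XSD = """\
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema"
           targetNamespace="urn:example:myext">
  <xs:attributeGroup name="props">
    <xs:attribute name="score" type="xs:decimal"/>
    <xs:attribute name="approved" type="xs:boolean"/>
    <xs:attribute name="note" type="xs:string"/>
  </xs:attributeGroup>
</xs:schema>
"""


def attribute_types(xsd_text: str) -> Dict[str, str]:
    """Map each declared attribute name to its XSD type (e.g. 'xs:decimal').

    Simplification: only named attributes with a 'type' attribute are
    handled; anonymous simple types, ref=... and imports are ignored.
    """
    root = ET.fromstring(xsd_text)
    types = {}
    for decl in root.iter(f"{XS}attribute"):
        name, xsd_type = decl.get("name"), decl.get("type")
        if name and xsd_type:
            types[name] = xsd_type
    return types


def to_json_value(raw: str, xsd_type: str):
    """Convert an untyped XML attribute string using its declared XSD type.

    Simplification: the type's prefix is dropped and only its local part
    is compared, assuming it refers to the XML Schema namespace.
    """
    local = xsd_type.split(":")[-1]
    if local in ("decimal", "double", "float"):
        return float(raw)
    if local in ("integer", "int", "long", "nonNegativeInteger"):
        return int(raw)
    if local == "boolean":
        return raw in ("true", "1")
    return raw  # strings and unrecognized types stay as strings


if __name__ == "__main__":
    types = attribute_types(EXT_XSD)
    xml_attrs = {"score": "50", "approved": "false", "note": "50"}
    print({k: to_json_value(v, types.get(k, "xs:string")) for k, v in xml_attrs.items()})
    # {'score': 50.0, 'approved': False, 'note': '50'}
```

The same lexical form "50" becomes a number or a string depending only on the declaration, which is exactly the information a tool lacks when no schema is available.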


  • 3.  RE: [xliff-omos] Handling unsupported modules and extensions

    Posted 11-01-2017 10:48
    Hi all,

    Are we saying that we could have two different representations of the same data depending on whether or not the tool supports the extension? That doesn’t sound very good to me.

    In addition, for a tool there is no real difference between an extension it does not know about and a module it does not support. And we do have to find a solution for the case of unsupported modules. So we can probably simply use the same solution (whatever it is) for custom extensions.

    Cheers,
    -ys

    Yves Savourel
    Localization Solutions Architect
    ENLASO ® an Argos Multilingual company
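The point that an unknown extension and an unsupported module look the same to a tool can be illustrated with one generic container. The sketch below is hypothetical (class and function names are not from any XLIFF or JLIFF object model draft) and handles scalar string, number, and boolean values only. It follows the "extra flag" idea from the first message: every unrecognized property keeps its text form, plus a JSON type that can only be recorded when the value came from JLIFF.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnknownProperty:
    """Generic holder for data the tool does not understand.

    Used identically for unsupported modules and custom extensions:
    only the namespace IRI and local name identify the property.
    Scalar string/number/boolean values only in this sketch.
    """
    namespace: str
    name: str
    lexical: str                      # value as text, as XLIFF carries it
    json_type: Optional[type] = None  # known only when read from JLIFF


def from_xliff(namespace: str, name: str, text: str) -> UnknownProperty:
    # XML attribute values are untyped without a schema: no type flag.
    return UnknownProperty(namespace, name, text)


def from_jliff(namespace: str, name: str, value) -> UnknownProperty:
    # JSON values carry a basic type, so it can be recorded.
    lexical = ("true" if value else "false") if isinstance(value, bool) else str(value)
    return UnknownProperty(namespace, name, lexical, type(value))


def jliff_value(prop: UnknownProperty):
    """Value to write to JLIFF: typed if the flag is set, else a string."""
    if prop.json_type is bool:
        return prop.lexical == "true"
    if prop.json_type in (int, float):
        return prop.json_type(prop.lexical)
    return prop.lexical


if __name__ == "__main__":
    ns = "urn:oasis:names:tc:xliff:matches:2.0"
    via_jliff = from_jliff(ns, "similarity", 50.0)
    via_xliff = from_xliff(ns, "similarity", "50")
    print(json.dumps({"similarity": jliff_value(via_jliff)}))  # {"similarity": 50.0}
    print(json.dumps({"similarity": jliff_value(via_xliff)}))  # {"similarity": "50"}
```

One container serves both cases, but the output still depends on where the data was read from, which is the asymmetry described in the original message.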


  • 4.  Re: [xliff-omos] Handling unsupported modules and extensions

    Posted 11-03-2017 17:15
    No, I was just saying that there are cases where we can't guarantee that the data types are preserved. This is bad, but it's also within the language of the spec.

    I think there's a difference between custom extensions and unsupported modules, however. For all modules, even unsupported ones, the schema is known, which means we have the typing information needed to convert the data correctly from one format to the other.
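The distinction drawn here (module schemas are always known, custom extension schemas may not be) suggests a converter that dispatches on namespace. The sketch below is an assumption-laden illustration: `MODULE_TYPES` is a hypothetical, abbreviated, hand-written table (it assumes the Translation Candidates module declares `similarity` as a decimal and `reference` as a yes/no flag), whereas a real tool would derive it from the published module schema. The namespace `urn:example:ext` stands in for any custom extension.

```python
from typing import Callable, Dict

MTC = "urn:oasis:names:tc:xliff:matches:2.0"


def yes_no(raw: str) -> bool:
    # Assumed lexical forms for a yes/no attribute; "true"/"false" also accepted.
    if raw in ("yes", "true"):
        return True
    if raw in ("no", "false"):
        return False
    raise ValueError(f"not a yes/no value: {raw!r}")


# Abbreviated, assumed type table per module namespace. A real tool would
# derive this from the published module schema rather than hard-code it.
MODULE_TYPES: Dict[str, Dict[str, Callable[[str], object]]] = {
    MTC: {
        "similarity": float,
        "reference": yes_no,
        # id, ref, type, ... are plain strings: no entry needed
    },
}


def typed_value(namespace: str, name: str, raw: str):
    """Convert an XLIFF attribute string to a JSON value.

    Known module namespace: use its type table (strings by default).
    Unknown namespace (custom extension): no type info, keep the string.
    """
    table = MODULE_TYPES.get(namespace)
    if table is None:
        return raw
    return table.get(name, str)(raw)


if __name__ == "__main__":
    print(typed_value(MTC, "similarity", "50"))                 # 50.0
    print(typed_value(MTC, "reference", "no"))                  # False
    print(repr(typed_value("urn:example:ext", "score", "50")))  # '50'
```

Under this approach modules round-trip with correct types, while custom extensions fall back to strings, which the spec's "SHOULD preserve" wording tolerates.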


  • 5.  RE: [xliff-omos] Handling unsupported modules and extensions

    Posted 11-05-2017 22:42
    > For all modules, even unsupported ones, the schema is known, and that means we have the
    > typing information available to convert it from one format to another correctly.

    The reason I see an unsupported module as the same as an unknown extension is that both have data without correspondence in the tool's object model. I'm not sure we can assume a tool not supporting a given module will do anything with the schema of that module.

    I guess there are several use cases for conversion:
    1) JLIFF to OM to JLIFF
    2) JLIFF to OM to XLIFF
    3) XLIFF to OM to JLIFF

    1) JLIFF to OM to JLIFF
    I have not tried, but I assume there is likely a generic way to store such data in the object model, for example using whatever JSON object the tool implementation uses. There is no need to convert to classes of the OM, and the tool can still read and write back the data without losing information. The only thing the object model would need to implement is a way to store and retrieve generic JSON objects in its structure.

    2) JLIFF to OM to XLIFF
    Here the main problem seems to be that, in some cases, there will be no way to decide whether a set of fields should be output as attributes or elements. Unless that information is coded as some kind of annotation in the schema, I'm not sure that even knowing the schema would help here. Another option would be some kind of naming convention that distinguishes attributes from elements, but that is probably too much to ask for.

    3) XLIFF to OM to JLIFF
    Here the main issue comes from the lack of type information on the XLIFF side. One could probably infer some types (like 123, true, "text"), but not all of them. So one would have to rely on the XSD schema for that information.

    It seems to me that to truly do conversions back and forth in any direction, the OM itself needs to know the type of each field and, when it's in XML, whether it's an element or an attribute (at least for the ambiguous cases).
    Just thinking aloud...
    -ys
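The conclusion above (the OM needs each field's type and, for XML, whether it is an attribute or an element) can be sketched with a per-field descriptor. Everything named here is hypothetical: `FieldSpec`, the `urn:example:ext` namespace, and the `item`/`score`/`approved`/`comment` fields are made up for illustration, and only Python's standard library is used. With the descriptor in hand, the same OM data can be written to XML with the right attribute/element split and read back with its JSON types restored.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List

NS = "urn:example:ext"  # hypothetical extension namespace
ET.register_namespace("ext", NS)


@dataclass(frozen=True)
class FieldSpec:
    name: str
    json_type: type  # str, int, float or bool
    xml_kind: str    # "attribute" or "element"


# Hypothetical descriptors the OM would need for lossless conversion.
SPECS: List[FieldSpec] = [
    FieldSpec("score", float, "attribute"),
    FieldSpec("approved", bool, "attribute"),
    FieldSpec("comment", str, "element"),
]


def xml_text(value) -> str:
    # XML carries text only; booleans use the XSD lexical forms.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def to_xliff(values: Dict[str, object]) -> str:
    """Serialize fields as attributes or child elements, per their spec."""
    item = ET.Element(f"{{{NS}}}item")
    for spec in SPECS:
        if spec.name not in values:
            continue
        text = xml_text(values[spec.name])
        if spec.xml_kind == "attribute":
            item.set(spec.name, text)
        else:
            ET.SubElement(item, f"{{{NS}}}{spec.name}").text = text
    return ET.tostring(item, encoding="unicode")


def from_xliff(doc: str) -> Dict[str, object]:
    """Read the XML back, restoring JSON types from the specs."""
    item = ET.fromstring(doc)
    values = {}
    for spec in SPECS:
        if spec.xml_kind == "attribute":
            raw = item.get(spec.name)
        else:
            child = item.find(f"{{{NS}}}{spec.name}")
            raw = None if child is None else child.text
        if raw is None:
            continue
        if spec.json_type is bool:
            values[spec.name] = raw in ("true", "1")
        else:
            values[spec.name] = spec.json_type(raw)
    return values


if __name__ == "__main__":
    om = {"score": 50.0, "approved": False, "comment": "checked"}
    doc = to_xliff(om)
    print(doc)
    print(json.dumps(from_xliff(doc)))
    # {"score": 50.0, "approved": false, "comment": "checked"}
```

Without such descriptors, use case 2 has no basis for choosing attributes over elements, and use case 3 can only guess types from the text.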