OASIS XML Localisation Interchange File Format (XLIFF) TC

  • 1.  Y22 - Translation proposals

    Posted 09-20-2012 13:01
    I had the action item to look at item Y22 and report on its state.

    === a) id definition

    The specification lists id as an optional attribute but does not define it in the attribute section of the module; instead it points to the general id section.

    === b) score/similarity/quality

    Based on the notes in the wiki and the discussion we had a long while ago, I think we have not yet settled on what the score/similarity/quality attribute should be named and what it should represent. See for example:
    http://markmail.org/message/iuchpu5isa7vxexo?q=similarity+list:org%2Eoasis-open%2Elists%2Exliff

    I think the situation can be summarized as follows. There are three types of information:
    - how similar the source of the match is to the source of the searched text
    - how good the quality of the candidate translation is
    - some kind of score/ranking value that may take into account the two values above, and possibly others, to provide a value that can be used for ordering the matches so as to present the best first.

    The discussions seem to indicate that not all users use the same information. The question is: should we provide an attribute for each, or just one or two? If so, which ones?

    === c) type of match

    It seems there is also a need to define what kind of match the match is: MT, id-based match, in-context match, etc. If people think this is something we should have, I can try to come up with an initial list.

    Cheers,
    -ys
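    The three kinds of information listed under b) can be pictured with a minimal sketch. Everything in it is an assumption made for illustration: the Match class, its field names, the 0-100 scale, and the 0.7/0.3 weights are hypothetical and are not taken from the specification or from any tool. It only shows how a ranking value could be derived from a similarity and a quality value to present the best match first.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Match:
    """Hypothetical translation candidate; field names are illustrative only."""
    target: str
    similarity: Optional[float] = None  # source-to-source similarity, assumed 0-100
    quality: Optional[float] = None     # quality of the candidate translation, assumed 0-100


def ranking_score(match: Match) -> float:
    """Combine similarity and quality into one ordering value.

    The weights are arbitrary assumptions; a missing value counts as 0.
    """
    similarity = match.similarity if match.similarity is not None else 0.0
    quality = match.quality if match.quality is not None else 0.0
    return 0.7 * similarity + 0.3 * quality


def best_first(matches: List[Match]) -> List[Match]:
    """Return the matches ordered from highest to lowest ranking score."""
    return sorted(matches, key=ranking_score, reverse=True)


if __name__ == "__main__":
    candidates = [
        Match("candidate A", similarity=80, quality=90),
        Match("candidate B", similarity=100, quality=50),
        Match("candidate C", similarity=60),
    ]
    for m in best_first(candidates):
        print(m.target, round(ranking_score(m), 1))
```

    The sketch also shows why the choice matters: a tool that only records similarity can still be ranked, but two tools using different weights would order the same candidates differently.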


  • 2.  RE: [xliff] Y22 - Translation proposals

    Posted 09-20-2012 13:22
    Hi Yves,

    Regarding the "id" attribute, I'll put a definition in the module's own section instead of using the general one.

    For score/similarity/quality, we had better use one attribute that indicates how similar the source text from the match is to the source text being translated. If we add a second attribute for qualifying the "quality" of the translation supplied by the generating agent, there will be lots of interpretation problems.

    We do need a list of values for the type of match. It would be great if you could supply one.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya rmraya@maxprograms.com
    Maxprograms http://www.maxprograms.com


  • 3.  RE: [xliff] Y22 - Translation proposals

    Posted 09-20-2012 13:51
    Hi Yves, Rodolfo:

    > For score/similarity/quality, we better use one attribute that indicates how similar the source text from the match is to the
    > source text being translated. If we add a second attribute for qualifying the "quality" of the translation supplied by the
    > generating agent, there will be lots of interpretation problems.

    I agree with Rodolfo. I think only one attribute should be used to indicate the relation between the segment and the match (a similarity score seems a fair solution). I would discard the "quality" idea: it is extremely difficult to come up with a quantitative method for obtaining a score that would be widely accepted, and it would always be subjective.

    Regarding optional attributes that this element could have, I added some proposals to the wiki some time ago (some of them are already present in XLIFF 1.2). I only introduced the "provenance" attributes that could contain information about the origin of the translation match (which was the topic of my research).
    https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Translation%20Proposals

    Lucía


  • 4.  RE: [xliff] Y22 - Translation proposals

    Posted 09-20-2012 13:57
    Hi Lucía,

    The provenance information could be useful indeed. I would implement it by allowing an optional <mda:metadata> element in <match> that could host all the required data.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya rmraya@maxprograms.com
    Maxprograms http://www.maxprograms.com


  • 5.  RE: [xliff] Y22 - Translation proposals

    Posted 09-20-2012 14:25
    I tend to agree with Rodolfo on keeping the quality/score attribute simple: just one well-defined attribute. Is there any reason why something like edit distance could not be applied for "similarity"? And if it can, why not just call it "edit_distance"?

    On the type of matches, I have definitely seen MT, exact match (similar to the id-match), and in-context match in IBM. However, we have also just rolled out another implementation that does a parallel search against thousands of terabytes or petabytes of data to try to skim the fat off the cream elsewhere. Within IBM, we just call it "global match"; some refer to it as "optimized match". Are other organizations doing something similar, and is that type of match considered different from the three already stated?

    Best regards,
    Helena Shih Chapman
    Globalization Technologies and Architecture
    +1-720-396-6323 or T/L 938-6323
    Waltham, Massachusetts
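    To make the edit-distance suggestion concrete, here is a minimal sketch, assuming Levenshtein distance over characters. The function names and the normalisation to a 0-100 value are assumptions for illustration, not something the thread or the specification defines. A raw edit distance is an absolute count that grows with segment length, so a sketch like this divides by the longer string's length to obtain a percentage-style similarity.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical 0-100 similarity derived from the edit distance,
    normalised by the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return 100.0 * (1 - edit_distance(a, b) / longest)


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(similarity("Open the file", "Open the files"))
```

    Real tools often compare words or tokens rather than characters and may weight tags and numbers differently, which is one reason a value labelled simply "similarity" can leave more room than a name tied to one specific metric.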


  • 6.  RE: [xliff] Y22 - Translation proposals

    Posted 09-21-2012 00:32
    We also have TermBase matches at MultiCorpora, as well as fuzzy matches, which I’m sure are already on everyone’s list.

    I’m not in favor of having a special category for what Helena describes as “global matches” or “optimized matches”. I’m sure every organization has special ways of pulling out the most relevant matches, and that each organization’s way is different. In the end they are still exact or fuzzy matches, and the provenance information Lucía mentioned could handle these situations.

    Regards,

    SHIRLEY COADY
    PRODUCT MANAGER
    (819)778-7070 ext. 229
    scoady@multicorpora.ca