OASIS XML Localisation Interchange File Format (XLIFF) TC


1.2 to 2.0 Gaps and Proposals

  • 1.  1.2 to 2.0 Gaps and Proposals

    Posted 11-15-2012 04:25
    As part of our exercise to map our 1.2 implementation (and its challenges) to 2.0, we discovered the following gaps in core and modules, for which we would like to propose 5 features. These are all based on real-world use cases at Microsoft and quite probably apply to other large companies that outsource content for localization.

    Proposal 1: Add an optional build attribute to the 2.0 <file> element in core.

    In 1.2, the build-num attribute is important for us because, once we’ve handed off files to our suppliers to be localized, we expect localized files from the same build to be returned. We suspect we aren’t the only content providers doing this kind of validation. In 2.0, there is no file-level attribute we could use for this.

        <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">

    Proposal 2: Be able to specify optional custom values for the match type attribute in the <mtc:matches> module.

    Content providers and Localization Suppliers base their cost and billing models on match similarity and match types. Localization suppliers charge us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we might even want to get more granular than that as our cost and billing models evolve with the business. In 2.0, the match type doesn’t support the values exact-match and fuzzy-match, which were defined in the state-qualifier attribute in 1.2. Instead of supporting these two, or any others that may not have migrated from 1.2 to 2.0, as a separate attribute, the request is that, like the discussion on state and sub-state at the Face-to-Face in Seattle, we add a sub-type to match type. This will allow us to add extra business logic to types, such as "tm" or "mt", which are already defined in the spec.

        <match id="1" similarity="100.0" type="tm/xlf:exact">
        <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
        <match id="1" similarity="99.0" type="tm/custom:near-exact">

    Where, as noted in the 2.0 spec: The sub-category prefix is a string uniquely identifying a collection of values for a specific authority. The value is any string value defined by an authority. The prefix xlf is reserved for this specification…

    Proposal 3: Add an optional Reference Language to core.

    This is a crucial feature for Microsoft and other large companies that localize minority languages. For example, it is typical that when we localize from English into Quechua, localizers are more efficient and provide much higher quality translation when, along with the English source, we provide them with the Spanish target. In 1.2, Reference Languages could be defined in an <alt-trans> element:

        <alt-trans>
          <target xml:lang="es-es" alttranstype="reference">hola mundo %s</target>
        </alt-trans>

    There is no equivalent in 2.0, so we’d like to make this much simpler by proposing an optional <reference> element on <segment> that can have an xml:lang attribute different from source and target in the main document.

        <segment id="1">
          <source xml:lang="en-us">hello world</source>
          <target xml:lang="quz-pe">hola món</target>
          <reference xml:lang="es-es">hola mundo</reference>
        </segment>

    Proposal 4: Add an optional name attribute on <notes> in core and on the <mda:metadata> module.

    We believe it will be typical for content providers to want to group their notes or metadata in meaningful ways. This might be done so that a certain number of notes or bits of metadata can be processed in the same way, or simply grouped and displayed together, such as in an editor UI. Here are some examples:

        <notes name="comments">
          <note id="comment">This string cannot be longer than 100 characters</note>
          <note id="user">Developer@microsoft.com</note>
          <note id="date">10/21/2012 5:28:13 PM</note>
        </notes>

        <notes name="instructions">
          <note id="instruction">Do not localize the product name</note>
          <note id="user">loc_engineer@microsoft.com</note>
          <note id="date">10/21/2012 5:28:13 PM</note>
        </notes>

    As opposed to something less structured and more difficult to process:

        <notes>
          <note id="instruction">Do not localize the product name</note>
          <note id="instruction-user">Localization Engineer</note>
          <note id="instruction-date">10/21/2012 5:28:13 PM</note>
          <note id="comment">This string cannot be longer than 100 characters</note>
          <note id="comment-user">Developer</note>
          <note id="comment-date">10/21/2012 5:28:13 PM</note>
        </notes>

    Similarly, we’d like a name attribute for <mda:metadata>.

        <mda:metadata name="properties">
          <meta type="previous-source">hello world</meta>
          <meta type="string-category">TextBox</meta>
          <meta type="workorder-id">25</meta>
          <meta type="workorder-name">Hotmail</meta>
        </mda:metadata>

    Proposal 5: Add optional change tracking attributes to <segment>.

    When translation work may be shared across <segment> elements in the same file, for whatever reason, it is useful for billing purposes to track who modified a <segment> and when it was modified. This can be easily done when localization is done online in a database, but once it is offline and file-based, e.g. in an XLIFF file, having optional attributes defined on the <segment> would aid in capturing this information.

        <segment id="1" modifiedBy="translator@loc.com" modifiedDate="10/21/2012 5:28:13 PM">
          <source>hello world</source>
          <target>hola món</target>
        </segment>

    Please let us know your opinions on these proposals.

    Thanks,
    Microsoft Corporation (Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)
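
    The round-trip check that motivates Proposal 1 could be sketched as follows. This is only an illustration under stated assumptions: the build attribute is the one proposed in this post (not an existing 2.0 attribute), the helper names builds_by_file and mismatched_files are hypothetical, and the sample documents omit the XLIFF namespace for brevity.

```python
import xml.etree.ElementTree as ET


def builds_by_file(xliff_text: str) -> dict:
    """Map each <file> id to the value of its (proposed) build attribute."""
    root = ET.fromstring(xliff_text)
    return {f.get("id"): f.get("build") for f in root.iter("file")}


def mismatched_files(handed_off: str, returned: str) -> list:
    """Return ids of files whose build differs from the one handed off."""
    sent = builds_by_file(handed_off)
    back = builds_by_file(returned)
    return sorted(fid for fid in sent if back.get(fid) != sent[fid])


# Hypothetical sample documents, namespace omitted for brevity.
HANDED_OFF = (
    '<xliff><file id="1" original="mainUI.resx" '
    'build="2011-11-23-133615307_windc.win8.beta.b01"/></xliff>'
)
RETURNED_OK = HANDED_OFF
RETURNED_STALE = (
    '<xliff><file id="1" original="mainUI.resx" build="older-build"/></xliff>'
)

print(mismatched_files(HANDED_OFF, RETURNED_OK))
print(mismatched_files(HANDED_OFF, RETURNED_STALE))
```

    A file that comes back without the attribute at all is also reported, since its build then compares as missing.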


  • 2.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-15-2012 20:24
    Hi Ryan, all,

    > Proposal 1: Add an optional build attribute to 2.0 <file> element in core.
    > ..
    > <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">

    I don't see anything wrong with this.

    > Proposal 2: Be able to specify optional custom values for match type
    > attribute in the <mtc:matches> module.
    > Content providers and Localization Suppliers base their cost and billing
    > models on match similarity and match types. Localization suppliers charge
    > us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we
    > might even want to get more granular than that as our cost and billing models
    > evolve with the business.
    > In 2.0, the match type doesn’t support the values exact-match and fuzzy-match,
    > which were defined in the state-qualifier attribute in 1.2. Instead of supporting
    > these two, or any others that may not have migrated from 1.2 to 2.0,
    > as a separate attribute, the request is, that like the discussion on state
    > and sub-state in the Face-to-Face in Seattle, we add a sub-type to match type.
    > This will allow us to add extra business logic to types, such as "tm" or "mt",
    > which are already defined in the spec.
    > <match id="1" similarity="100.0" type="tm/xlf:exact">
    > <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
    > <match id="1" similarity="99.0" type="tm/custom:near-exact">

    I understand the need for the information, but to me, it seems the similarity gives you whether a match is exact or not. The example, however, shows (I think) that you are thinking about categories that could be mapped differently to the similarity depending on projects. For example, in one project a near-match corresponds to one range and in another to a different range, and you want to simply map that info to something common across your process, without having to carry the ranges around. If that's the case I wonder if XLIFF should define any default like xlf:exact, etc.

    So I wouldn't see a problem with a sub-type there.

    A side comment on the match type: especially if we allow sub-type, I'm still not sure about the values currently listed.

    > Proposal 3: Add an optional Reference Language to core.
    > This is a crucial feature for Microsoft and other large companies that localize
    > minority languages. For example, it is typical that when we localize from
    > English into Quechua, localizers are more efficient and provide much higher
    > quality translation, when along with English source, we provide them with
    > Spanish target. In 1.2, Reference Languages could be defined in
    > an <alt-trans> element:

    I see the use case and I've seen other cases like this, with Chinese (Simplified/Traditional). Could that be part of the match module? Possibly with a new attribute (e.g. reference='yes|no' defaulting to no).

    Adding something along with <source>/<target> is bound to cause additional PR issues. If it's part of the Match module, it just uses whatever the module PRs are.

    > Proposal 4: Add an optional name attribute on <notes> in core
    > and <mda:metadata> module.
    > We believe it will be typical for content providers to want to
    > ...
    > <notes name="comments">
    > <note id="comment">This string cannot be longer than 100 characters</note>
    > <note id="user">Developer@microsoft.com</note>
    > <note id="date">10/21/2012 5:28:13 PM</note>
    > </notes>

    Sounds reasonable. We'll have to allow several <notes> and <mda:metadata> (I think, but I may be wrong, that only one is allowed) at the extension point.

    The example makes me wonder about the long-term life of XLIFF though: likely this type of info (author, timestamp) will be needed by others. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and time stamp? That would obviously work only if those two pieces of information are the only examples you have in mind.

    > Proposal 5: Add optional change tracking attributes to <segment>.
    > ...
    > <segment id="1" modifiedBy="translator@loc.com"
    > modifiedDate="10/21/2012 5:28:13 PM">
    > <source>hello world</source>
    > <target>hola món</target>
    > </segment>

    Here again I'm wondering if a "change track" module may be better? You could use it not just on segments but on other elements: notes. The issue then would be how this gets updated if it's not a core component? Actually, if it's a core attribute, does it mean it's not optional? I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date. But maybe that's ok?

    cheers,
    -yves


  • 3.  Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-16-2012 01:24
    Yves, Ryan et al.

    Commenting inline..

    Cheers
    dF

    On Thu, Nov 15, 2012 at 8:23 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

        Hi Ryan, all,

        > Proposal 1: Add an optional build attribute to 2.0 <file> element in core.
        > ..
        > <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">

        I don't see anything wrong with this.

        > Proposal 2: Be able to specify optional custom values for match type
        > attribute in the <mtc:matches> module.
        > Content providers and Localization Suppliers base their cost and billing
        > models on match similarity and match types. Localization suppliers charge
        > us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we
        > might even want to get more granular than that as our cost and billing models
        > evolve with the business.
        > In 2.0, the match type doesn’t support the values exact-match and fuzzy-match,
        > which were defined in the state-qualifier attribute in 1.2. Instead of supporting
        > these two, or any others that may not have migrated from 1.2 to 2.0,
        > as a separate attribute, the request is, that like the discussion on state
        > and sub-state in the Face-to-Face in Seattle, we add a sub-type to match type.
        > This will allow us to add extra business logic to types, such as "tm" or "mt",
        > which are already defined in the spec.
        > <match id="1" similarity="100.0" type="tm/xlf:exact">
        > <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
        > <match id="1" similarity="99.0" type="tm/custom:near-exact">

        I understand the need for the information, but to me, it seems the similarity gives you whether a match is exact or not. The example, however, shows (I think) that you are thinking about categories that could be mapped differently to the similarity depending on projects. For example, in one project a near-match corresponds to one range and in another to a different range, and you want to simply map that info to something common across your process, without having to carry the ranges around. If that's the case I wonder if XLIFF should define any default like xlf:exact, etc.

    I believe there is value in decoupling the "percentage" from the "business" type of the match. The number means nothing unless we opt to prescribe a specific variety of (modified) Levenshtein, and I guess we should not open this particular can of worms.

        So I wouldn't see a problem with a sub-type there.

        A side comment on the match type: especially if we allow sub-type, I'm still not sure about the values currently listed.

        > Proposal 3: Add an optional Reference Language to core.
        > This is a crucial feature for Microsoft and other large companies that localize
        > minority languages. For example, it is typical that when we localize from
        > English into Quechua, localizers are more efficient and provide much higher
        > quality translation, when along with English source, we provide them with
        > Spanish target. In 1.2, Reference Languages could be defined in
        > an <alt-trans> element:

        I see the use case and I've seen other cases like this, with Chinese (Simplified/Traditional). Could that be part of the match module? Possibly with a new attribute (e.g. reference='yes|no' defaulting to no).

        Adding something along with <source>/<target> is bound to cause additional PR issues. If it's part of the Match module, it just uses whatever the module PRs are.

    I agree with Yves's reasons to have this within the match module, which is anyway the alt-trans successor. I guess it does not fulfill the core criteria.

        > Proposal 4: Add an optional name attribute on <notes> in core
        > and <mda:metadata> module.
        > We believe it will be typical for content providers to want to
        > ...
        > <notes name="comments">
        > <note id="comment">This string cannot be longer than 100 characters</note>
        > <note id="user">Developer@microsoft.com</note>
        > <note id="date">10/21/2012 5:28:13 PM</note>
        > </notes>

        Sounds reasonable. We'll have to allow several <notes> and <mda:metadata> (I think, but I may be wrong, that only one is allowed) at the extension point.

        The example makes me wonder about the long-term life of XLIFF though: likely this type of info (author, timestamp) will be needed by others. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and time stamp? That would obviously work only if those two pieces of information are the only examples you have in mind.

    I agree with Yves that a couple of standard attributes should be added to increase interoperability. Still, I believe that note should be fully extensible, as it is part of the general annotation mechanism and should be able to carry attributes from other namespaces.

        > Proposal 5: Add optional change tracking attributes to <segment>.
        > ...
        > <segment id="1" modifiedBy="translator@loc.com"
        > modifiedDate="10/21/2012 5:28:13 PM">
        > <source>hello world</source>
        > <target>hola món</target>
        > </segment>

        Here again I'm wondering if a "change track" module may be better? You could use it not just on segments but on other elements: notes. The issue then would be how this gets updated if it's not a core component? Actually, if it's a core attribute, does it mean it's not optional? I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date. But maybe that's ok?

    Optional attributes in core are tricky, IMHO. It means you do not need to introduce it yourself if you do not want to. But if present, it would need to be processed by agents who modify the segment. If it is thinkable that change agents do not update it, it feels more like a module...

        cheers,
        -yves

        ---------------------------------------------------------------------
        To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
        For additional commands, e-mail: xliff-help@lists.oasis-open.org
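
    The type/sub-type syntax discussed in this thread (for example tm/xlf:exact) could be split by a consumer roughly as below. This is a hedged sketch, not anything defined by the spec: the function name parse_match_type and the MatchType tuple are hypothetical, and the only grammar assumed is "base type, optionally followed by /prefix:value" as in the proposal's examples.

```python
from typing import NamedTuple, Optional


class MatchType(NamedTuple):
    base: str                # e.g. "tm" or "mt"
    prefix: Optional[str]    # sub-type authority prefix, e.g. "xlf" or "custom"
    value: Optional[str]     # sub-type value, e.g. "exact"


def parse_match_type(raw: str) -> MatchType:
    """Split 'base/prefix:value' into its parts; the sub-type is optional."""
    text = raw.strip()
    base, slash, sub = text.partition("/")
    if not base:
        raise ValueError(f"missing base type in {raw!r}")
    if not slash:
        # No sub-type present, e.g. a plain "tm" or "ice".
        return MatchType(base, None, None)
    prefix, colon, value = sub.partition(":")
    if not (prefix and colon and value):
        raise ValueError(f"sub-type must look like 'prefix:value' in {raw!r}")
    return MatchType(base, prefix, value)


print(parse_match_type("tm/xlf:exact"))
print(parse_match_type("tm/custom:near-exact"))
```

    Keeping the similarity number out of this function mirrors the point made above: the business category travels in the type, independently of how the percentage was computed.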


  • 4.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-17-2012 01:15
    Thanks Yves and David for the valuable feedback. See our comments inline below prefixed with [Microsoft]. As David suggested on another thread, we will add these soon to the wiki.

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
    Sent: Thursday, November 15, 2012 5:24 PM
    To: Yves Savourel
    Cc: xliff@lists.oasis-open.org
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals

        Yves, Ryan et al.

        Commenting inline..

        Cheers
        dF

        On Thu, Nov 15, 2012 at 8:23 PM, Yves Savourel <ysavourel@enlaso.com> wrote:

        Hi Ryan, all,

        > Proposal 1: Add an optional build attribute to 2.0 <file> element in core.
        > ..
        > <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">

        I don't see anything wrong with this.

        > Proposal 2: Be able to specify optional custom values for match type
        > attribute in the <mtc:matches> module.
        > Content providers and Localization Suppliers base their cost and billing
        > models on match similarity and match types. Localization suppliers charge
        > us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we
        > might even want to get more granular than that as our cost and billing models
        > evolve with the business.
        > In 2.0, the match type doesn’t support the values exact-match and fuzzy-match,
        > which were defined in the state-qualifier attribute in 1.2. Instead of supporting
        > these two, or any others that may not have migrated from 1.2 to 2.0,
        > as a separate attribute, the request is, that like the discussion on state
        > and sub-state in the Face-to-Face in Seattle, we add a sub-type to match type.
        > This will allow us to add extra business logic to types, such as "tm" or "mt",
        > which are already defined in the spec.
        > <match id="1" similarity="100.0" type="tm/xlf:exact">
        > <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
        > <match id="1" similarity="99.0" type="tm/custom:near-exact">

        I understand the need for the information, but to me, it seems the similarity gives you whether a match is exact or not. The example, however, shows (I think) that you are thinking about categories that could be mapped differently to the similarity depending on projects. For example, in one project a near-match corresponds to one range and in another to a different range, and you want to simply map that info to something common across your process, without having to carry the ranges around. If that's the case I wonder if XLIFF should define any default like xlf:exact, etc.

        I believe there is value in decoupling the "percentage" from the "business" type of the match. The number means nothing unless we opt to prescribe a specific variety of (modified) Levenshtein, and I guess we should not open this particular can of worms.

        So I wouldn't see a problem with a sub-type there.

        A side comment on the match type: especially if we allow sub-type, I'm still not sure about the values currently listed.

    [Microsoft] We definitely advocate decoupling the “percentage” from the “business” type of match, as David puts it. And we should not prescribe meaning to the percentage, either. Costing models built on top of these values will necessarily change from one provider/supplier to the next and, as Yves states, possibly from one project to the next. We could very easily have the following (and we do in much of our recycled content):

        <match id="1" similarity="100.0" type="tm/xlf:exact">
        <match id="1" similarity="100.0" type="ice">

    In the first case, we’ve recycled a candidate which is a 100% match, but came from a segment whose state isn’t signed off or final yet, whereas the ICE match, in our case, has the requirement of being 100% and signed off or final.

        > Proposal 3: Add an optional Reference Language to core.
        > This is a crucial feature for Microsoft and other large companies that localize
        > minority languages. For example, it is typical that when we localize from
        > English into Quechua, localizers are more efficient and provide much higher
        > quality translation, when along with English source, we provide them with
        > Spanish target. In 1.2, Reference Languages could be defined in
        > an <alt-trans> element:

        I see the use case and I've seen other cases like this, with Chinese (Simplified/Traditional). Could that be part of the match module? Possibly with a new attribute (e.g. reference='yes|no' defaulting to no).

        Adding something along with <source>/<target> is bound to cause additional PR issues. If it's part of the Match module, it just uses whatever the module PRs are.

        I agree with Yves's reasons to have this within the match module, which is anyway the alt-trans successor. I guess it does not fulfill the core criteria.

    [Microsoft] Adding this to the match module would be fine as long as the proper explanatory text and processing instructions make it clear what this data should be used for, as opposed to recycling.

        > Proposal 4: Add an optional name attribute on <notes> in core
        > and <mda:metadata> module.
        > We believe it will be typical for content providers to want to
        > ...
        > <notes name="comments">
        > <note id="comment">This string cannot be longer than 100 characters</note>
        > <note id="user">Developer@microsoft.com</note>
        > <note id="date">10/21/2012 5:28:13 PM</note>
        > </notes>

        Sounds reasonable. We'll have to allow several <notes> and <mda:metadata> (I think, but I may be wrong, that only one is allowed) at the extension point.

        The example makes me wonder about the long-term life of XLIFF though: likely this type of info (author, timestamp) will be needed by others. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and time stamp? That would obviously work only if those two pieces of information are the only examples you have in mind.

        I agree with Yves that a couple of standard attributes should be added to increase interoperability. Still, I believe that note should be fully extensible, as it is part of the general annotation mechanism and should be able to carry attributes from other namespaces.

    [Microsoft] Capturing an author and timestamp on a comment is specific to our needs, and thus that example. However, we do see value in being able to apply an author and timestamp to potentially any piece of data. So a module (as Yves suggests below) that can exist at the same extension points as metadata (and including metadata) might lend itself better to that.

        > Proposal 5: Add optional change tracking attributes to <segment>.
        > ...
        > <segment id="1" modifiedBy="translator@loc.com"
        > modifiedDate="10/21/2012 5:28:13 PM">
        > <source>hello world</source>
        > <target>hola món</target>
        > </segment>

        Here again I'm wondering if a "change track" module may be better? You could use it not just on segments but on other elements: notes. The issue then would be how this gets updated if it's not a core component? Actually, if it's a core attribute, does it mean it's not optional? I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date. But maybe that's ok?

        Optional attributes in core are tricky, IMHO. It means you do not need to introduce it yourself if you do not want to. But if present, it would need to be processed by agents who modify the segment. If it is thinkable that change agents do not update it, it feels more like a module...

    [Microsoft] Since we are heading down the same path to MUST preserve modules as well, if we introduce a “change track” module, then user agents would need to preserve it if present, but as for any other processing requirements, such as updating it, that could be specified as part of the module’s processing requirements. For example: The module MUST be preserved and SHOULD be updated by user agents.

        cheers,
        -yves
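
    The distinction drawn above between a 100% match and an ICE match can be illustrated with a small sketch. It is an assumption-laden example, not a costing model from the thread: the function name business_match_type is hypothetical, the signed_off flag stands in for "signed off or final", and treating everything below 100% as tm/xlf:fuzzy is a simplification chosen only for the illustration.

```python
def business_match_type(similarity: float, signed_off: bool) -> str:
    """Pick a business match type from similarity and source-segment state.

    Assumed rules (simplified from the discussion above):
      * 100% and signed off/final  -> "ice"
      * 100% but not signed off    -> "tm/xlf:exact"
      * anything below 100%        -> "tm/xlf:fuzzy"
    """
    if not 0.0 <= similarity <= 100.0:
        raise ValueError(f"similarity out of range: {similarity}")
    if similarity == 100.0:
        return "ice" if signed_off else "tm/xlf:exact"
    return "tm/xlf:fuzzy"


print(business_match_type(100.0, signed_off=True))
print(business_match_type(100.0, signed_off=False))
print(business_match_type(75.0, signed_off=True))
```

    Two matches with the same similarity can therefore carry different types, which is why the type cannot be derived from the percentage alone.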


  • 5.  Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-17-2012 13:19
    Ryan, one very short comment below..

    Cheers
    dF

        > Proposal 5: Add optional change tracking attributes to <segment>.
        > ...
        > <segment id="1" modifiedBy="translator@loc.com"
        > modifiedDate="10/21/2012 5:28:13 PM">
        > <source>hello world</source>
        > <target>hola món</target>
        > </segment>

        Here again I'm wondering if a "change track" module may be better? You could use it not just on segments but on other elements: notes. The issue then would be how this gets updated if it's not a core component? Actually, if it's a core attribute, does it mean it's not optional? I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date. But maybe that's ok?

        Optional attributes in core are tricky, IMHO. It means you do not need to introduce it yourself if you do not want to. But if present, it would need to be processed by agents who modify the segment. If it is thinkable that change agents do not update it, it feels more like a module...

        [Microsoft] Since we are heading down the same path to MUST preserve modules as well, if we introduce a “change track” module, then user agents would need to preserve it if present, but as for any other processing requirements, such as updating it, that could be specified as part of the module’s processing requirements. For example: The module MUST be preserved and SHOULD be updated by user agents.

    This could not work from the point of view of normative theory. You cannot have "MUST preserve" and "SHOULD update" statements on the same level. The wording should be like this:

    The general module processing requirement for CORE ONLY implementers is MUST preserve.

    IFF [if and only if] you are also the module implementer, you MUST follow its specific PRs, which are actually in conflict with the higher-level MUST preserve requirement. This is OK in normative theory, because it is a specific rule that suspends the general one under an IFF statement. [This is the same in general for ALL module PRs]

    So what you wanted to say is probably that the tools MUST preserve if they are unable to process according to the module-specific PRs, but this is equivalent to the general PR that the TC adopted by the last completed ballot, so all should be OK :-)

    Sorry if this seems like hair splitting, or if it seems self-evident to everyone, but I believe this is vital and beneficial to clarify at this point for all module owners, so that they understand the relationship of their module-specific PRs to the general MUST preserve PR defined at the core spec level.

    A related issue occurred in the discussion between Yves and myself before the TC made the decision on general PRs. Yves thought that it was absurd that only module implementers can remove a module; I argued that this is exactly what we wanted to achieve, and this is what prevailed in the ballot.

    Cheers
    dF
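
    The layering described in this message (a general MUST preserve rule, suspended only for agents that implement the module) might be sketched like this. Everything named here is hypothetical: process_module_data, the handlers mapping, the "change-track" key and the touch handler are illustration only, and module data is modeled as a plain dict rather than as XLIFF elements.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


def process_module_data(module: str, data: dict,
                        handlers: Dict[str, Handler]) -> dict:
    """Apply the general rule unless the agent implements the module."""
    handler = handlers.get(module)
    if handler is None:
        # Core-only agent for this module: MUST preserve, so hand the
        # data back untouched.
        return data
    # Module implementer: the module-specific PRs take over.
    return handler(data)


def touch(data: dict) -> dict:
    """Hypothetical module-specific PR: record who modified the data."""
    updated = dict(data)
    updated["modifiedBy"] = "translator@loc.com"
    return updated


original = {"modifiedBy": "someone@example.com"}
print(process_module_data("change-track", original, {}))
print(process_module_data("change-track", original, {"change-track": touch}))
```

    The specific rule only ever runs when a handler is registered, so the two requirements never apply to the same agent at the same time.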


  • 6.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-17-2012 13:56
    > ... A related issue occurred in the discussion between Yves and myself before TC
    > made the decision on general PRs. Yves thought that it was absurd that only module
    > implementers can remove module, I argued that this is exactly what we wanted to
    > achieve and this is what prevailed in the ballot.

    And on that note, David:

    Following that logic, since none of the modules currently has any PR stating that a tool implementing the module can remove the element, technically, once added, a module element cannot be removed ...ever. Even by the tool that placed it there.

    That's the problem with "MUST preserve" at the default level: it forces you to define module-specific PRs where you can't possibly think about everything. That results in default PRs that are simply impossible to enforce.

    But it's ok, I'm past spending time discussing it.

    Have a good week-end,
    -ys


  • 7.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 00:58
    Hi Shirley, it was agreed in last week’s TC call that we would merge our feature requests:

        3.1. (S1) Change Tracking / Version Control
        1.19. (R43) Change track module

    Can you let me know what your ideas are for this module, and maybe we can take the discussion offline for a bit to come up with a proposal for the structure and PR of the module? Our initial requirement was simply two properties that would record the author and timestamp of a piece of data, such as a source segment or a note.

    Thanks,
    Ryan


  • 8.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-27-2012 23:46
    Hi Yves,

    In last week’s TC call it was mentioned that I should work with the owners of the current features to get our requirements implemented for the proposals that weren’t deemed features. I believe you are the owner of the matches module and of notes. Can you please let me know what we need to do to move forward with getting these implemented?

    ·   Proposal 2: Be able to specify optional custom values for match type in <mtc:matches>
    ·   Proposal 4: Add an optional name attribute on <notes> in core (which also means that we need to allow zero, one or more <notes> in each position in the tree structure)

    Additionally, it was deemed that we should add Reference Language to the <mtc:matches> module. How do you want to move forward with that? Since the module is already defined in the 2.0 spec, can I just suggest the method and, if you agree, you can fold it into the current module definition? I would propose:

    1.  That we allow zero, one or more <mtc:matches> at each extension point, because you might have both recycling and reference language data.
    2.  Add an optional attribute reference="yes|no" with no as default. Additionally, the processing requirement (PR) for a “reference match” would be to allow an xml:lang on the target different from the document's, and to allow the <source> to be absent, since it would be redundant with the core <source>. For example, a Spanish reference for Quechua might look like this:

    <mtc:matches>
      <mtc:match reference="yes">
        <segment>
          <target xml:lang="es-es">hola mundo</target>
        </segment>
      </mtc:match>
    </mtc:matches>

    I’m not sure if any of these require an electronic ballot. I got the impression from the call that they don’t, but hopefully Bryan or David or someone else from the call will correct that if false.

    Please let me know how I can work with you on these.

    Ryan

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
    Sent: Friday, November 16, 2012 5:02 PM
    To: Dr. David Filip; Yves Savourel; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Thanks Yves and David for the valuable feedback. See our comments inline below prefixed with [Microsoft]. As David suggested on another thread, we will add these soon to the wiki.

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
    Sent: Thursday, November 15, 2012 5:24 PM
    To: Yves Savourel
    Cc: xliff@lists.oasis-open.org
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Yves, Ryan et al.

    Commenting inline..
    Cheers
    dF

    On Thu, Nov 15, 2012 at 8:23 PM, Yves Savourel <ysavourel@enlaso.com> wrote:
    Hi Ryan, all,

    > Proposal 1: Add an optional build attribute to 2.0 <file> element in core.
    > ..
    > <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">
    I don't see anything wrong with this.

    > Proposal 2: Be able to specify optional custom values for match type
    > attribute in the <mtc:matches> module.
    > Content providers and Localization Suppliers base their cost and billing
    > models on match similarity and match types. Localization suppliers charge
    > us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we
    > might even want to get more granular than that as our cost and billing models
    > evolve with the business.
    > In 2.0, the match type doesn’t support the values exact-match and fuzzy-match,
    > which were defined in the state-qualifier attribute in 1.2. Instead of supporting
    > these two, or any others that may not have migrated from 1.2 to 2.0,
    > as a separate attribute, the request is, that like the discussion on state
    > and sub-state in the Face-to-Face in Seattle, we add a sub-type to match type.
    > This will allow us to add extra business logic to types, such as "tm" or "mt",
    > which are already defined in the spec.
    > <match id="1" similarity="100.0" type="tm/xlf:exact">
    > <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
    > <match id="1" similarity="99.0" type="tm/custom:near-exact">
    I understand the need for the information, but to me, it seems the similarity gives you whether a match is exact or not.

    The example, however, shows (I think) that you are thinking about categories that could be mapped differently to the similarity depending on projects. For example, in one project a near-match corresponds to one range and in another to a different range, and you want to simply map that info to something common across your process, without having to carry the ranges around. If that's the case, I wonder if XLIFF should define any default like xlf:exact, etc.

    I believe there is value in decoupling the "percentage" from the "business" type of the match. The number means nothing unless we opt to prescribe a specific variety of (modified) Levenshtein, and I guess we should not open this particular can of worms..

    So I wouldn't see a problem with a sub-type there.

    A side comment on the match type: especially if we allow sub-type, I'm still not sure about the values currently listed.

    [Microsoft] We definitely advocate decoupling the “percentage” from the “business” type of match, as David puts it. And we should not prescribe meaning to the percentage, either. Costing models built on top of these values will necessarily change from one provider/supplier to the next and, as Yves states, possibly from one project to the next. We could very easily have the following (and we do in much of our recycled content):

      <match id="1" similarity="100.0" type="tm/xlf:exact">
      <match id="1" similarity="100.0" type="ice">

    In the first case, we’ve recycled a candidate which is a 100% match but came from a segment whose state isn’t signed off or final yet, whereas the ice match, in our case, has the requirement of being 100% and signed off or final.

    > Proposal 3: Add an optional Reference Language to core.
    > This is a crucial feature for Microsoft and other large companies that localize
    > minority languages. For example, it is typical that when we localize from
    > English into Quechua, localizers are more efficient and provide much higher
    > quality translation, when along with English source, we provide them with
    > Spanish target. In 1.2, Reference Languages could be defined in
    > an <alt-trans> element:
    I see the use case and I've seen other cases like this, with Chinese (Simplified/Traditional).

    Could that be part of the match module?
    Possibly with a new attribute (e.g. reference='yes|no' defaulting to no).

    Adding something along with <source>/<target> is bound to cause additional PR issues. If it's part of the Match module, it just uses whatever the module PRs are.

    I agree with Yves's reasons to have this within the match module, which is anyway the alt-trans successor. I guess it does not fulfill the core criteria.

    [Microsoft] Adding this to the match module would be fine as long as the proper explanatory text and processing instructions make it clear what this data should be used for, as opposed to recycling.

    > Proposal 4: Add an optional name attribute on <notes> in core
    > and <mds:metadata> module.
    > We believe it will be typical for content providers to want to
    > ...
    > <notes name="comments">
    >  <note id="comment">This string cannot be longer than 100 characters</note>
    >  <note id="user">Developer@microsoft.com</note>
    >  <note id="date">10/21/2012 5:28:13 PM</note>
    > </notes>
    Sounds reasonable. We'll have to allow several <notes> and <m:metadata> on the extension point (I think, but I may be wrong, that only one is allowed).

    The example makes me wonder about the long-term life of XLIFF though: likely this type of info (author, timestamp) will be needed by others. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and time stamp?
    That would obviously work only if those two pieces of info are the only examples you have in mind.

    I agree with Yves that a couple of standard attributes should be added to increase interoperability; still, I believe that note should be fully extensible, as it is part of the general annotation mechanism and should be able to carry attributes from other namespaces.

    [Microsoft] Capturing an author and timestamp on a comment is specific to our needs, and thus that example. However, we do see value in being able to apply an author and timestamp to potentially any piece of data. So a module (as Yves suggests below) that can exist at the same extension points as metadata (and including metadata) might lend itself better to that.

    > Proposal 5: Add optional change tracking attributes to <segment>.
    > ...
    > <segment id="1" modifiedBy="translator@loc.com"
    > modifiedDate="10/21/2012 5:28:13 PM">
    >    <source>hello world</source>
    >    <target>hola món</target>
    > </segment>
    Here again I'm wondering if a "change track" module may be better?
    You could use it not just on segments but on other elements: notes.
    The issue then would be how this gets updated if it's not a core component?
    Actually, if it's a core attribute, does it mean it's not optional?
    I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date.
    But maybe that's ok?

    Optional attributes in core are tricky, IMHO. It means you do not need to introduce it yourself, if you do not feel so.. But if present, it would need to be processed by agents who modify the segment. If it is thinkable that change agents do not update it, it feels more like a module...

    [Microsoft] Since we are heading down the same path to MUST preserve modules as well, if we introduce a “change track” module, then user agents would need to preserve it if present. Any other processing requirements, such as updating it, could be specified as part of the module’s processing requirements. For example: The module MUST be preserved and SHOULD be updated by user agents.

    cheers,
    -yves
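
The "reference match" idea from the message above (a match flagged reference="yes" whose <target> carries its own xml:lang and may omit <source>) can be sketched from a consumer's point of view. This is only an illustrative sketch of the proposal as worded in this thread, not of any final module definition: the sample document, the split_matches helper, the second (recycling) match, and the use of urn:oasis:names:tc:xliff:matches:2.0 as the mtc namespace URI are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Assumed namespace URI for the mtc prefix (not stated anywhere in the thread).
MTC_NS = "urn:oasis:names:tc:xliff:matches:2.0"
# ElementTree exposes xml:lang under the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample following the structure proposed in the message:
# one reference match (no <source>) and one ordinary recycling match.
SAMPLE = """<mtc:matches xmlns:mtc="urn:oasis:names:tc:xliff:matches:2.0">
  <mtc:match reference="yes">
    <segment>
      <target xml:lang="es-es">hola mundo</target>
    </segment>
  </mtc:match>
  <mtc:match similarity="75.0" type="tm">
    <segment>
      <source>hello word</source>
      <target xml:lang="quz-pe">(TM candidate)</target>
    </segment>
  </mtc:match>
</mtc:matches>"""


def split_matches(xml_text):
    """Return (references, recycling) as lists of (language, text) tuples.

    A match counts as a reference only when reference="yes"; the attribute
    defaults to "no", as suggested in the proposal.
    """
    root = ET.fromstring(xml_text)
    references, recycling = [], []
    for match in root.findall(f"{{{MTC_NS}}}match"):
        target = match.find("./segment/target")
        if target is None:
            continue  # nothing to show for this match
        entry = (target.get(XML_LANG), (target.text or "").strip())
        if match.get("reference", "no") == "yes":
            references.append(entry)
        else:
            recycling.append(entry)
    return references, recycling


if __name__ == "__main__":
    refs, tm = split_matches(SAMPLE)
    print("reference:", refs)
    print("recycling:", tm)
```

Keeping the two lists separate mirrors the point made in the thread: reference-language data is meant to inform the translator, while recycling matches are candidates for reuse, so a tool would likely treat them differently.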

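
The sub-type syntax discussed under Proposal 2 (values such as tm/xlf:exact, where a base type is optionally followed by a prefixed sub-type) could be split apart as sketched below. The MatchType tuple and parse_match_type function are hypothetical names introduced for illustration, and the base/prefix:value grammar is only what the examples in this thread suggest, not a normative definition.

```python
from typing import NamedTuple, Optional


class MatchType(NamedTuple):
    """Parts of a match type such as "tm/xlf:exact" (hypothetical model)."""
    base: str                        # e.g. "tm", "mt", "ice"
    authority: Optional[str] = None  # sub-type prefix, e.g. "xlf" or "custom"
    value: Optional[str] = None      # sub-type value, e.g. "exact"


def parse_match_type(raw: str) -> MatchType:
    """Split "base" or "base/prefix:value" into its parts.

    Surrounding whitespace is ignored. A sub-type that lacks a prefix or a
    value is rejected with ValueError.
    """
    text = raw.strip()
    base, slash, sub = text.partition("/")
    if not base:
        raise ValueError(f"missing base type in {raw!r}")
    if not slash:
        return MatchType(base)
    authority, colon, value = sub.partition(":")
    if not (colon and authority and value):
        raise ValueError(f"malformed sub-type in {raw!r}")
    return MatchType(base, authority, value)


if __name__ == "__main__":
    # Two matches can share a similarity of 100.0 yet differ in business type.
    for similarity, raw in [(100.0, "tm/xlf:exact"), (100.0, "ice"),
                            (99.0, "tm/custom:near-exact")]:
        print(similarity, parse_match_type(raw))
```

Because the similarity value never enters the parsing, a costing model built on top of this can key on the type and sub-type alone, which is the decoupling of "percentage" from "business" type argued for in the thread.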

  • 9.  Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 01:41
    Please don't ruin the design for <notes>. Only one should be allowed per insertion point.

    Regards,
    Rodolfo


  • 10.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 07:33




    So that our original reason for proposing more than one <notes> at the extension point does not get obscured by all of the replies and “see inlines”, here once again is the use case for allowing more than one <notes> per extension point:

    Proposal 4: Add an optional name attribute on <notes> in core and <mds:metadata> module.
    We believe it will be typical for content providers to want to group their notes or metadata in meaningful ways. This might be done so that a certain number of notes or bits of metadata can be processed in the same way, or simply grouped and displayed together, such as in an editor UI. Here are some examples:

    <notes name="comments">
      <note id="comment">This string cannot be longer than 100 characters</note>
      <note id="origin">developer</note>
      <note id="priority">1</note>
    </notes>

    <notes name="instructions">
      <note id="instruction">Do not localize the product name</note>
      <note id="origin">loc-engineer</note>
      <note id="priority">2</note>
    </notes>

    As opposed to something less structured and more difficult to process:

    <notes>
      <note id="instruction">Do not localize the product name</note>
      <note id="instruction-origin">loc-engineer</note>
      <note id="instructions-priority">1</note>
      <note id="comment">This string cannot be longer than 100 characters</note>
      <note id="comment-priority">2</note>
    </notes>
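
To illustrate why the grouped form is easier to process, here is a minimal sketch of how a tool might read named <notes> groups. It assumes the proposed name attribute exists and that several <notes> elements may sit side by side; the enclosing <unit> wrapper, the sample data layout, and the group_notes helper are hypothetical and only serve the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: two named <notes> groups inside an assumed <unit>.
SAMPLE = """<unit id="1">
  <notes name="comments">
    <note id="comment">This string cannot be longer than 100 characters</note>
    <note id="origin">developer</note>
    <note id="priority">1</note>
  </notes>
  <notes name="instructions">
    <note id="instruction">Do not localize the product name</note>
    <note id="origin">loc-engineer</note>
    <note id="priority">2</note>
  </notes>
</unit>"""


def group_notes(xml_text):
    """Map each <notes> group name to a dict of note id -> note text.

    Groups without a name attribute are collected under the empty string.
    """
    root = ET.fromstring(xml_text)
    groups = {}
    for notes in root.iter("notes"):
        bucket = groups.setdefault(notes.get("name", ""), {})
        for note in notes.findall("note"):
            bucket[note.get("id")] = (note.text or "").strip()
    return groups


if __name__ == "__main__":
    for name, notes in group_notes(SAMPLE).items():
        print(name, notes)
```

With the flat, unnamed form, the same lookup would need to decode naming conventions such as instruction-origin or comment-priority from the id values, which is the extra processing burden the proposal tries to avoid.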
     
    Thanks,
    Ryan
     











  • 11.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 08:33
    Still a bad use case that doesn’t justify ruining a good design.

    Regards,
    Rodolfo

    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com


  • 12.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 10:11




    Hi Rodolfo, Ryan,
     
    I think the intent of <notes> is lost with the current proposal. The feature is designed so that <notes> is a container for a group of <note>s at a specific level in the document, where each <note> is one annotation or comment in itself. The suggested change transforms that so that the <notes> element becomes the entity describing one note, with each <note> describing a specific piece of metadata related to that note. The ID is intended to be used to refer to the note from other places, such as from <mrk> elements in the inline content, so overloading it to carry the type of data would cause additional problems.
     
    I think the initial model is much easier to work with and cleaner, as it contains all note-related information in one subtree per document level where notes are allowed. Adding attributes to the <note> element is, in my opinion, the best way to go. Whether we should define more standard attributes, or whether a processor is free to use the third-party namespace extension mechanism to add them, is another question. Depending on how simple we want to keep the basic notes feature, it could be either one or a mix of the two methods.
     
    Although I’m not a fan of third-party extensions, I think this is a case where they could make sense. And if they are used for process-specific metadata only, I don’t see an issue. Of course, there will be no standard way to display them in a UI or report if they are not specified in the standard.
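
The attribute-based approach described above can be sketched as follows. This is only an illustration, not anything defined by the XLIFF specification: the namespace URI urn:example:loc-process, the ext prefix, and the origin and priority attribute names are all hypothetical, and the fragment is a bare <notes> element rather than a complete XLIFF file.

```python
import xml.etree.ElementTree as ET

# Hypothetical third-party namespace; not defined by XLIFF.
EXT_NS = "urn:example:loc-process"

# Each <note> stays a single annotation; process metadata rides on
# extension attributes instead of on extra <note> elements.
SAMPLE = f"""
<notes xmlns:ext="{EXT_NS}">
  <note id="n1" ext:origin="loc-engineer" ext:priority="1">Do not localize the product name</note>
  <note id="n2" ext:origin="developer" ext:priority="2">This string cannot be longer than 100 characters</note>
</notes>
"""


def read_notes(xml_text):
    """Return one dict per <note>, including the hypothetical extension attributes."""
    notes = []
    for note in ET.fromstring(xml_text).iter("note"):
        notes.append({
            "id": note.get("id"),
            "text": note.text,
            # ElementTree exposes namespaced attributes as "{uri}localname".
            "origin": note.get(f"{{{EXT_NS}}}origin"),
            "priority": note.get(f"{{{EXT_NS}}}priority"),
        })
    return notes


if __name__ == "__main__":
    for entry in read_notes(SAMPLE):
        print(entry)
```

A tool that does not know the extension namespace can still read the id and the note text, which is the trade-off mentioned above: the extra data has no standard presentation unless the standard defines it.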
     
    Regards,
    Fredrik Estreen
     



    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Rodolfo M. Raya
    Sent: den 28 november 2012 09:32
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    Still a bad use case that doesn’t justify ruining a good design.
     
    Regards,
    Rodolfo

    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com

     



    From: Ryan King [ mailto:ryanki@microsoft.com ]

    Sent: Wednesday, November 28, 2012 5:32 AM
    To: Rodolfo M. Raya; < xliff@lists.oasis-open.org >; Yves Savourel
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    So that our original reason for proposing having more than one <notes> at the extension point does not get obfuscated in all of the replies and “see inlines”,
    here once again, is the use case for adding more than one <notes> per extension:
     
    Proposal 4: Add an optional name attribute on <notes> in core and <mds:metadata> module.
    We believe it will be typical for content providers to want to group their notes or metadata in meaningful ways. This might be done so that a certain number of notes or bits of metadata can be processed in the same way, or simply grouped
    and displayed together, such as in an editor UI. Here are some examples:
     
    <notes name="comments">
      <note id="comment">This string cannot be longer than 100 characters</note>
      <note id="origin">developer</note>
      <note id="priority">1</note>
    </notes>
     
    <notes name="instructions">
      <note id="instruction">Do not localize the product name</note>
      <note id="origin">loc-engineer</note>
      <note id="priority">2</note>
    </notes>
     
    As opposed to something less structured and more difficult to process:
     
    <notes>
      <note id="instruction">Do not localize the product name</note>
      <note id="instruction-origin">loc-engineer</note>
      <note id="instructions-priority">1</note>
      <note id="comment">This string cannot be longer than 100 characters</note>
      <note id="comment-priority">2</note>
    </notes>
     
    Thanks,
    Ryan
     


    From: Rodolfo M. Raya [ mailto:rmraya@maxprograms.com ]

    Sent: Tuesday, November 27, 2012 5:41 PM
    To: Ryan King
    Cc: Yves Savourel; < xliff@lists.oasis-open.org >
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals


     

    Please don't ruin the design for <notes>. Only one should be allowed per insertion point.


     


    Regards,


    Rodolfo

    Sent from my iPad



    On Nov 27, 2012, at 9:45 PM, "Ryan King" < ryanki@microsoft.com > wrote:



    Hi Yves, in last week’s TC call it was mentioned that I should work with the owners of the current features to get our requirements implemented for proposals
    that weren’t deemed as features. I believe you are the owner for the matches module and notes. Can you please let me know what we need to do to move forward with getting these implemented?

     
    · Proposal 2: Be able to specify optional custom values for match type in <mtc:matches>
    · Proposal 4: Add an optional name attribute on <notes> in core (which also means that we need to allow zero, one or more <notes> in each position in the tree structure)
     
    Additionally, it was deemed that we should add Reference Language to the <mtc:matches> module. How do you want to move forward with that? Since the module is
    already defined in the 2.0 spec, can I just suggest the method and if you agree, you can fold it into the current module definition? I would propose:
     
    1. That we allow zero, one or more <mtc:matches> at each extension point, because you might have both recycling and reference language data.
    2. Add an optional attribute reference="yes|no" with no as default. Additionally, the PR for a “reference match” would be to allow an xml:lang on the target different from the document and allow the <source> not to be present, as it would be redundant information with the core <source>. For example, a Spanish reference for Quechua might look like this:
     
    <mtc:matches>
      <mtc:match reference="yes">
        <segment>
          <target xml:lang="es-es">hola mundo</target>
        </segment>
      </mtc:match>
    </mtc:matches>

     
    I’m not sure if any of these require an electronic ballot. I got the impression from the call that they don’t, but hopefully Bryan or David or someone else from
    the call will correct that if false.
     
    Please let me know how I can work with you on these.

    Ryan
     


    From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Ryan King
    Sent: Friday, November 16, 2012 5:02 PM
    To: Dr. David Filip; Yves Savourel; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    Thanks Yves and David for the valuable feedback. See our comments inline below prefixed with [Microsoft]. As David suggested on another thread, we will add
    these soon to the wiki.
     
    From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Dr. David Filip
    Sent: Thursday, November 15, 2012 5:24 PM
    To: Yves Savourel
    Cc: xliff@lists.oasis-open.org
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals
     



    Yves, Ryan et al.
     
    Commenting inline..
    Cheers
    dF
    On Thu, Nov 15, 2012 at 8:23 PM, Yves Savourel < ysavourel@enlaso.com > wrote:
    Hi Ryan, all,


    > Proposal 1: Add an optional build attribute to 2.0 <file> element in core.
    > ..
    > <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">
    I don't see anything wrong with this.
     

    > Proposal 2: Be able to specify optional custom values for match type
    > attribute in the <mtc:matches> module.
    > Content providers and Localization Suppliers base their cost and billing
    > models on match similarity and match types. Localization suppliers charge
    > us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we
    > might even want to get more granular than that as our cost and billing models
    > evolve with the business.
    > In 2.0, the match type doesn’t support the values exact-match and fuzzy-match,
    > which were defined in the state-qualifier attribute in 1.2. Instead of supporting
    > these two, or any others that may not have migrated from 1.2 to 2.0,
    > as a separate attribute, the request is, that like the discussion on state
    > and sub-state in the Face-to-Face in Seattle, we add a sub-type to match type.
    > This will allow us to add extra business logic to types, such as "tm" or "mt",
    > which are already defined in the spec.
    > <match id="1" similarity="100.0" type="tm/xlf:exact">
    > <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
    > <match id="1" similarity="99.0" type="tm/custom:near-exact">
    I understand the need for the information, but to me, it seems the similarity gives you whether a match is exact or not.

    The example, however, shows (I think) that you are thinking about categories that could be mapped differently to the similarity depending on projects. For example, in one project a near-match corresponds to one range and in another to a different range, and you want to simply map that info to something common across your process, without having to carry the ranges around. If that's the case, I wonder if XLIFF should define any default like xlf:exact, etc.
    I believe there is value in decoupling the "percentage" from the "business" type of the match. The number means nothing unless we opt to prescribe a specific variety of (modified) Levenshtein, and I guess we should not open this particular can of worms.
     
    So I wouldn't see a problem with a sub-type there.

    A side comment on the match type: especially, if we allow sub-type, I'm still not sure about the values currently listed.
     
    [Microsoft] We definitely advocate decoupling the “percentage” from the “business” type of match, as David puts it. And we should not prescribe meaning to the percentage, either. Costing models built on top of these values will necessarily change from one provider/supplier to the next and, as Yves states, possibly from one project to the next. We could very easily have the following (and we do in much of our recycled content):
      <match id="1" similarity="100.0" type="tm/xlf:exact">
      <match id="1" similarity="100.0" type="ice">
    In the first case, we’ve recycled a candidate which is a 100% match but came from a segment whose state isn’t signed off or final yet, whereas the ICE match, in our case, has the requirement of being 100% and signed off or final.
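
To make the proposed type/sub-type syntax concrete, here is a minimal sketch of how a consumer might split a value such as tm/xlf:exact and derive a billing category from it without looking at the similarity score. The syntax is the one proposed in this thread, not something the 2.0 specification is known to define; the MatchType class and both function names are hypothetical.

```python
from typing import NamedTuple, Optional


class MatchType(NamedTuple):
    base: str                 # e.g. "tm", "mt", "ice"
    authority: Optional[str]  # sub-type prefix, e.g. "xlf" or "custom"
    value: Optional[str]      # sub-type value, e.g. "exact"


def parse_match_type(raw: str) -> MatchType:
    """Split 'base' or 'base/authority:value' as proposed in this thread."""
    base, slash, sub = raw.strip().partition("/")
    if not base:
        raise ValueError(f"missing match type in {raw!r}")
    if not slash:
        return MatchType(base, None, None)
    authority, colon, value = sub.partition(":")
    if not (colon and authority and value):
        raise ValueError(f"malformed sub-type in {raw!r}")
    return MatchType(base, authority, value)


def billing_category(raw: str) -> str:
    """Hypothetical costing rule: prefer the sub-type value, else the base type."""
    parsed = parse_match_type(raw)
    return parsed.value if parsed.value is not None else parsed.base


if __name__ == "__main__":
    # Both matches could carry similarity="100.0" yet be billed differently.
    print(billing_category("tm/xlf:exact"))
    print(billing_category("ice"))
```

The point of the sketch is that the category comes from the type attribute alone, so two projects can map the same similarity score to different business categories.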

    > Proposal 3: Add an optional Reference Language to core.
    > This is a crucial feature for Microsoft and other large companies that localize
    > minority languages. For example, it is typical that when we localize from
    > English into Quechua, localizers are more efficient and provide much higher
    > quality translation, when along with English source, we provide them with
    > Spanish target. In 1.2, Reference Languages could be defined in
    > an <alt-trans> element:
    I see the use case and I've seen other cases like this, with Chinese (simplified/Traditional).

    Could that be part of the match module?
    Possibly with a new attribute (e.g. reference='yes no' defaulting to no)

    Adding something along with <source>/<target> is bound to cause additional PR issues. If it's part of the Match module, it just uses whatever the module PRs are.
     
    I agree with Yves's reasons to have this within the match module, which is anyway the alt-trans successor. I guess it does not fulfill the core criteria
     
    [Microsoft] Adding this to the match module would be fine as long as the proper explanatory text and processing instructions make it clear what this data should
    be used for as opposed to recycling.

    > Proposal 4: Add an optional name attribute on <notes> in core
    > and <mds:metadata> module.
    > We believe it will be typical for content providers to want to
    > ...
    > <notes name="comments">
    >  <note id="comment">This string cannot be longer than 100 characters</note>
    >  <note id="user">Developer@microsoft.com</note>
    >  <note id="date">10/21/2012 5:28:13 PM</note>
    > </notes>
    Sounds reasonable. We'll have to allow several <notes> and <mds:metadata> (I think, but I may be wrong, that only one is allowed) on the extension point.

    The example makes me wonder about the long-term life of XLIFF though: likely this type of info (author, timestamp) will be needed by others. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and timestamp? That would obviously work only if those two pieces of information are the only examples you have in mind.
     
    I agree with Yves that a couple of standard attributes should be added to increase interoperability, still I believe that note should be fully extendable, as it is part of the general annotation mechanism and should be able to carry attributes
    from other namespaces.
     
    [Microsoft] Capturing an author and timestamp on a comment is specific to our needs and thus that example. However, we do see value in being able to apply an author and timestamp on potentially any piece of data. So a module (as Yves suggests below) that can exist at the same extension points as metadata (and including metadata) might lend itself better to that.
     

    > Proposal 5: Add optional change tracking attributes to <segment>.
    > ...
    > <segment id="1" modifiedBy="translator@loc.com"
    > modifiedDate="10/21/2012 5:28:13 PM">
    >    <source>hello world</source>
    >    <target>hola món</target>
    > </segment>
    Here again I'm wondering if a "change track" module may be better?
    You could use it not just on segments but other elements: notes.
    The issue then would be how this gets updated if it's not a core component?
    Actually, if it's a core attribute, does it mean it's not optional?
    I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date.
    But maybe that's ok?
     
    Optional attributes in core are tricky, IMHO. It means you do not need to introduce it yourself if you do not feel like it. But if present, it would need to be processed by agents who modify the segment. If it is thinkable that change agents do not update it, it feels more like a module...
     
    [Microsoft] Since we are heading down the same path to MUST preserve modules as well, if we introduce a “change track” module, then user agents would need to preserve it if present,
    but as for any other processing requirements, such as updating it, that could be specified as part of the module’s processing requirements. For example: The module MUST be preserved and SHOULD be updated by user agents.


    cheers,
    -yves




  • 13.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 20:18




    Thanks Fredrik for the productive feedback. Since the id is intended for referring to the note from elsewhere, such as from <mrk> elements, we should at least introduce a type attribute then. Additionally, it would still be worthwhile in our opinion to be able to group <note>s with associated note “metadata”, as you call it. Any thoughts on that? Perhaps a grouping element or a group attribute, à la the discussion on metadata?
     
    <notes>
      <notegroup name="instructions">
        <note id="1" type="instruction">Do not localize the product name</note>
        <note type="origin">loc-engineer</note>
        <note type="priority">1</note>
      </notegroup>
    </notes>
     
    versus
     
    <notes>
      <note id="1" type="instruction" group="instructions">Do not localize the product name</note>
      <note type="origin" group="instructions">loc-engineer</note>
      <note type="priority" group="instructions">1</note>
    </notes>
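
As a rough illustration of how a consumer could process the second (flat) variant, the sketch below regroups the notes by their group attribute. It assumes the type and group attributes exactly as proposed in this message; neither is part of the specification, and the group_notes function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Flat variant from the proposal, using the hypothetical type and group attributes.
SAMPLE = """
<notes>
  <note id="1" type="instruction" group="instructions">Do not localize the product name</note>
  <note type="origin" group="instructions">loc-engineer</note>
  <note type="priority" group="instructions">1</note>
</notes>
"""


def group_notes(xml_text):
    """Return {group name: {note type: note text}} for every <note> found."""
    grouped = defaultdict(dict)
    for note in ET.fromstring(xml_text).iter("note"):
        # Notes without a group or type attribute fall under the empty string.
        grouped[note.get("group", "")][note.get("type", "")] = note.text
    return dict(grouped)


if __name__ == "__main__":
    for name, members in group_notes(SAMPLE).items():
        print(name, members)
```

With the flat variant the grouping is reconstructed by the reader of the file, whereas the <notegroup> variant makes it explicit in the tree; which is preferable depends on how much structure the core <notes> feature should carry.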
     
    Thanks,
    Ryan
     
     


    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Estreen, Fredrik
    Sent: Wednesday, November 28, 2012 2:11 AM
    To: Rodolfo M. Raya; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    Hi Rodolfo, Ryan,
     
    I think the intent of the <notes> is lost with the current proposal. The feature is designed so that <notes> is a container for a group of <note>s at a specific
    level in the document. Where each <note> is one annotation / comment in itself. The suggested change transforms that so that the <notes> element becomes the entity describing one note, with <note> describing specific pieces of metadata related to that note.
    The ID is intended to be used to refer to the note from other places such as from <mrk> elements in the inline content, so overloading it to be the type of data would cause additional problems.
     
    I think the initial model is much easier to work with and more clean as it contain all note related information in one sub tree per document level where notes
    are allowed. Adding attributes to the <note> element is in my opinion the best way to go. If we should have more standard attributes or if a processor is free to use the third party namespace extension mechanism to add them is another question. Depending on
    how simple we want to keep the basic notes feature it could be either or a mix of the two methods.
     
    Although I’m not a fan of the third party extensions I think this is a case where they could make sense. And if used for process specific metadata only I don’t
    see an issue. Of course there will be no standard way to display them in a UI or report if they are not specified in the standard.
     
    Regards,
    Fredrik Estreen
     



    From:
    xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Rodolfo M. Raya
    Sent: den 28 november 2012 09:32
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    Still a bad use case that doesn’t justify ruining a good design.
     
    Regards,
    Rodolfo

    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com

     



    From: Ryan King [ mailto:ryanki@microsoft.com ]

    Sent: Wednesday, November 28, 2012 5:32 AM
    To: Rodolfo M. Raya; < xliff@lists.oasis-open.org >; Yves Savourel
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    So that our original reason for proposing having more than one <notes> at the extension point does not get obfuscated in all of the replies and “see inlines”,
    here once again, is the use case for adding more than one <notes> per extension:
     
    Proposal 4: Add an optional name attribute on <notes> in core and <mds:metadata> module.
    We believe it will be typical for content providers to want to group their notes or metadata in meaningful ways. This might be done so that a certain number of notes or bits of metadata can be processed in the same way, or simply grouped
    and displayed together, such as in an editor UI. Here are some examples:
     
    <notes
    name="comments" >
      <note id=“comment">This string cannot be longer than 100 characters</note>
      <note id=“origin">developer</note>
      <note id=”priority”>1</note>
    </notes>
     
    <notes
    name="instructions" >
      <note id=“instruction">Do not localize the product name</note>
      <note id=“origin">loc-engineer</note>
      <note id=”priority”>2</note>
    </notes>
     
    As opposed to something less structured and more difficult to process:
     
    <notes>
      <note id=“instruction">Do not localize the product name</note>
      <note id=“instruction-origin">loc-engineer</note>
      <note id=”instructions-priority”>1</note>
      <note id=“comment">This string cannot be longer than 100 characters</note>
      <note id=”comment-priority”>2</note>
    </notes>
     
    Thanks,
    Ryan
     


    From: Rodolfo M. Raya [ mailto:rmraya@maxprograms.com ]

    Sent: Tuesday, November 27, 2012 5:41 PM
    To: Ryan King
    Cc: Yves Savourel; < xliff@lists.oasis-open.org >
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals


     

    Please don't ruin te design for <notes>. Only one should be allowed per insertion point.


     


    Regards,


    Rodolfo

    Sent from my iPad



    On Nov 27, 2012, at 9:45 PM, "Ryan King" < ryanki@microsoft.com > wrote:



    Hi Yves, in last week’s TC call it was mentioned that I should work with the owners of the current features to get our requirements implemented for proposals
    that weren’t deemed as features. I believe you are the owner for the matches module and notes. Can you please let me know what we need to do to move forward with getting these implemented?

     
    ·         
    Proposal 2: Be able to specify optional custom values for match type in <mtc:matches>
    ·         
    Proposal 4: Add an optional name attribute on <notes> in core (which also means that we need to allow
    zero, one or more <notes> in each position in the tree structure)
     
    Additionally, it was deemed that we should add Reference Language to the <mtc:matches> module. How do you want to move forward with that? Since the module is
    already defined in the 2.0 spec, can I just suggest the method and if you agree, you can fold it into the current module definition? I would propose:
     
    1.  That we allow zero, one or more <mtc:matches> at each extension point, because you might have both recycling and reference language data.
    2.  Add an optional attribute reference="yes|no" with no as default. Additionally, the processing requirement (PR) for a “reference match” would be to allow an xml:lang on the target different
    from the document and allow the <source> not to be present, as it would be redundant information with the core <source>. For example, a Spanish reference for Quechua might look like this:
     
    <mtc:matches>
      <mtc:match reference="yes">
       <segment>
        <target xml:lang="es-es">hola mundo</target>
       </segment>
      </mtc:match>
    </mtc:matches>
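
A minimal sketch of how a consumer might keep reference-language data apart from recycling candidates, assuming the `reference` attribute proposed above with `no` as the default. The namespace URI is a placeholder invented for this example, the second match in the sample is made up to show the contrast, and `split_matches` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for the matches module; an assumption for this sketch.
MTC_NS = "urn:example:xliff-matches-draft"
# xml:lang lives in the predefined XML namespace.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = f"""
<mtc:matches xmlns:mtc="{MTC_NS}">
  <mtc:match reference="yes">
    <segment>
      <target xml:lang="es-es">hola mundo</target>
    </segment>
  </mtc:match>
  <mtc:match similarity="75.0">
    <segment>
      <source>hello world</source>
      <target xml:lang="quz-pe">hola món</target>
    </segment>
  </mtc:match>
</mtc:matches>
"""


def split_matches(xml_text):
    """Return (references, recycling) as lists of (language, text) pairs."""
    root = ET.fromstring(xml_text)
    references, recycling = [], []
    for match in root.findall(f"{{{MTC_NS}}}match"):
        target = match.find("segment/target")
        if target is None:
            continue  # nothing to show for this match
        entry = (target.get(XML_LANG), target.text)
        # Absent attribute means "no", following the proposed default.
        if match.get("reference", "no") == "yes":
            references.append(entry)
        else:
            recycling.append(entry)
    return references, recycling


if __name__ == "__main__":
    refs, recycled = split_matches(SAMPLE)
    print("reference:", refs)
    print("recycling:", recycled)
```

Treating a missing attribute as `no` keeps existing match data valid as recycling candidates, which is why the default matters for backward compatibility.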

     
    I’m not sure if any of these require an electronic ballot. I got the impression from the call that they don’t, but hopefully Bryan or David or someone else from
    the call will correct that if false.
     
    Please let me know how I can work with you on these.

    Ryan
     


    From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Ryan King
    Sent: Friday, November 16, 2012 5:02 PM
    To: Dr. David Filip; Yves Savourel; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals


     
    Thanks Yves and David for the valuable feedback. See our comments inline below prefixed with [Microsoft]. As David suggested on another thread, we will add
    these soon to the wiki.
     
    From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Dr. David Filip
    Sent: Thursday, November 15, 2012 5:24 PM
    To: Yves Savourel
    Cc: xliff@lists.oasis-open.org
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals
     



    Yves, Ryan et al.
     
    Commenting inline..
    Cheers
    dF
    On Thu, Nov 15, 2012 at 8:23 PM, Yves Savourel < ysavourel@enlaso.com > wrote:
    Hi Ryan, all,


    > Proposal 1: Add an optional build attribute to 2.0 <file> element in core.
    > ..
    > <file id="1" original="mainUI.resx" build="2011-11-23-133615307_windc.win8.beta.b01">
    I don't see anything wrong with this.
     

    > Proposal 2: Be able to specify optional custom values for match type
    > attribute in the <mtc:matches> module.
    > Content providers and Localization Suppliers base their cost and billing
    > models on match similarity and match types. Localization suppliers charge
    > us differently for ICE Matches, Exact Matches, and Fuzzy Matches, and we
    > might even want to get more granular than that as our cost and billing models
    > evolve with the business.
    > In 2.0, the match type doesn’t support the values exact-match and fuzzy-match,
    > which were defined in the state-qualifier attribute in 1.2. Instead of supporting
    > these two, or any others that may not have migrated from 1.2 to 2.0,
    > as a separate attribute, the request is, that like the discussion on state
    > and sub-state in the Face-to-Face in Seattle, we add a sub-type to match type.
    > This will allow us to add extra business logic to types, such as "tm" or "mt",
    > which are already defined in the spec.
    > <match id="1" similarity="100.0" type="tm/xlf:exact">
    > <match id="1" similarity="75.0" type="tm/xlf:fuzzy">
    > <match id="1" similarity="99.0" type="tm/custom:near-exact">
    I understand the need for the information, but to me, it seems the similarity tells you whether a match is exact or not.

    The example, however, shows (I think) that you are thinking about categories that could be mapped differently to the similarity depending on the project. For example, in one project a near-match corresponds to one range and in another to a different range, and you
    want to simply map that info to something common across your process, without having to carry the ranges around. If that's the case, I wonder if XLIFF should define any default like xlf:exact, etc.
    I believe there is value in decoupling the "percentage" from the "business" type of the match. The number means nothing unless we opt to prescribe a specific variety of (modified) Levenshtein, and I guess we should not open this particular
    can of worms.
     
    So I wouldn't see a problem with a sub-type there.

    A side comment on the match type: especially, if we allow sub-type, I'm still not sure about the values currently listed.
     
    [Microsoft] we definitely advocate decoupling the “percentage” from the “business” type of match as David puts it. And we should not prescribe meaning to the
    percentage, either. Costing models built on top of these values will necessarily change from one provider/supplier to the next and as Yves states, possibly from one project to the next. We could very easily have the following (and we do in much of our recycled
    content):
      <match id="1" similarity="100.0" type="tm/xlf:exact">
      <match id="1" similarity="100.0" type="ice">
    In the first case, we’ve recycled a candidate which is 100% match, but came from a segment whose state isn’t signed off or final yet, whereas the ice match,
    in our case, has the requirement of being 100% and signed off or final.
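
A minimal sketch of how a tool could read the proposed `category/prefix:value` form of the match type while leaving the similarity number uninterpreted. The syntax is the one proposed in this thread, not a confirmed part of the spec, and the names `MatchType`, `parse_match_type` and `business_label` are hypothetical.

```python
from typing import NamedTuple, Optional


class MatchType(NamedTuple):
    category: str             # e.g. "tm", "mt", "ice"
    authority: Optional[str]  # sub-type prefix, e.g. "xlf" or "custom"
    value: Optional[str]      # sub-type value, e.g. "exact"


def parse_match_type(raw: str) -> MatchType:
    """Split 'tm/xlf:exact' into category, authority prefix and value."""
    category, slash, sub = raw.strip().partition("/")
    if not category:
        raise ValueError(f"missing match type category in {raw!r}")
    if not slash:
        # Plain type with no sub-type, e.g. "ice".
        return MatchType(category, None, None)
    authority, colon, value = sub.partition(":")
    if not (colon and authority and value):
        raise ValueError(f"sub-type must look like prefix:value, got {sub!r}")
    return MatchType(category, authority, value)


def business_label(raw: str) -> str:
    """Label used for costing: taken from the type only, never from similarity."""
    parsed = parse_match_type(raw)
    return parsed.value if parsed.value else parsed.category


if __name__ == "__main__":
    # Both candidates are 100% similar, yet they carry different business types.
    for similarity, raw in [(100.0, "tm/xlf:exact"), (100.0, "ice")]:
        print(similarity, parse_match_type(raw), business_label(raw))
```

Keeping the label independent of the percentage mirrors the point made above: two matches with similarity 100.0 can still fall into different billing categories.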

    > Proposal 3: Add an optional Reference Language to core.
    > This is a crucial feature for Microsoft and other large companies that localize
    > minority languages. For example, it is typical that when we localize from
    > English into Quechua, localizers are more efficient and provide much higher
    > quality translation, when along with English source, we provide them with
    > Spanish target. In 1.2, Reference Languages could be defined in
    > an <alt-trans> element:
    I see the use case and I've seen other cases like this, with Chinese (simplified/Traditional).

    Could that be part of the match module?
    Possibly with a new attribute (e.g. reference='yes no' defaulting to no)

    Adding something along with <source>/<target> is bound to cause additional PR issues. If it's part of the Match module, it just uses whatever the module PRs are.
     
    I agree with Yves's reasons to have this within the match module, which is the alt-trans successor anyway. I guess it does not fulfill the core criteria.
     
    [Microsoft] Adding this to the match module would be fine as long as the proper explanatory text and processing instructions make it clear what this data should
    be used for as opposed to recycling.

    > Proposal 4: Add an optional name attribute on <notes> in core
    > and <mds:metadata> module.
    > We believe it will be typical for content providers to want to
    > ...
    > <notes name="comments">
    >  <note id="comment">This string cannot be longer than 100 characters</note>
    >  <note id="user"> Developer@microsoft.com </note>
    >  <note id="date">10/21/2012 5:28:13 PM</note>
    > </notes>
    Sounds reasonable. We'll have to allow several <notes> and <m:metadata> on the extension point (I think, but I may be wrong, that only one is allowed).

    The example makes me wonder about the long-term life of XLIFF though: likely this type of info (author, timestamp) will be needed by others. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and time stamp?
    That would obviously work only if those two pieces of information are the only examples you have in mind.
     
    I agree with Yves that a couple of standard attributes should be added to increase interoperability, still I believe that note should be fully extendable, as it is part of the general annotation mechanism and should be able to carry attributes
    from other namespaces.
     
    [Microsoft] Capturing an author and timestamp on a comment is specific to our needs and thus that example. However, we do see value in being able to apply an
    author and timestamp on potentially any piece of data. So a module (as Yves suggests below) that can exist at the same extension points as metadata (and including metadata) might lend itself better to that.
     

    > Proposal 5: Add optional change tracking attributes to <segment>.
    > ...
    > <segment id="1" modifiedBy="translator@loc.com"
    > modifiedDate="10/21/2012 5:28:13 PM">
    >    <source>hello world</source>
    >    <target>hola món</target>
    > </segment>
    Here again I'm wondering if a "change track" module may be better?
    You could use it not just on segments but other elements: notes.
    The issue then would be how this gets updated if it's not a core component?
    Actually, if it's a core attribute, does it mean it's not optional?
    I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date.
    But maybe that's ok?
     
    Optional attributes in core are tricky, IMHO. Optional means you do not need to introduce the attribute yourself if you do not want to. But if present, it would need to be processed by agents who modify the segment. If it is conceivable that change agents
    do not update it, it feels more like a module...
     
    [Microsoft] Since we are heading down the same path to MUST preserve modules as well, if we introduce a “change track” module, then user agents would need to preserve it if present,
    but as for any other processing requirements, such as updating it, that could be specified as part of the module’s processing requirements. For example: The module MUST be preserved and SHOULD be updated by user agents.


    cheers,
    -yves



    ---------------------------------------------------------------------
    To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org
    For additional commands, e-mail:
    xliff-help@lists.oasis-open.org


     











  • 14.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 20:26
    The extra nesting level is also ugly here. Use an attribute.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
    Sent: Wednesday, November 28, 2012 6:14 PM
    To: Estreen, Fredrik; Rodolfo M. Raya; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Thanks Fredrik for the productive feedback. Since the id is specifically intended to refer to the <mrk> element, we should at least introduce a type attribute then. Additionally, it would still be worthwhile in our opinion to be able to group <notes> with associated note “metadata” as you call it. Any thoughts on that? Perhaps a grouping element or a group attribute a la the discussion on metadata?

    <notes>
      <notegroup name="instructions">
        <note id="1" type="instruction">Do not localize the product name</note>
        <note type="origin">loc-engineer</note>
        <note type="priority">1</note>
      </notegroup>
    </notes>

    versus

    <notes>
      <note id="1" type="instruction" group="instructions">Do not localize the product name</note>
      <note type="origin" group="instructions">loc-engineer</note>
      <note type="priority" group="instructions">1</note>
    </notes>

    Thanks,
    Ryan

    From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Estreen, Fredrik
    Sent: Wednesday, November 28, 2012 2:11 AM
    To: Rodolfo M. Raya; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Hi Rodolfo, Ryan,

    I think the intent of the <notes> is lost with the current proposal. The feature is designed so that <notes> is a container for a group of <note>s at a specific level in the document, where each <note> is one annotation / comment in itself. The suggested change transforms that so that the <notes> element becomes the entity describing one note, with <note> describing specific pieces of metadata related to that note. The ID is intended to be used to refer to the note from other places, such as from <mrk> elements in the inline content, so overloading it to be the type of data would cause additional problems.

    I think the initial model is much easier to work with and cleaner, as it contains all note-related information in one sub tree per document level where notes are allowed. Adding attributes to the <note> element is in my opinion the best way to go. Whether we should have more standard attributes, or whether a processor is free to use the third party namespace extension mechanism to add them, is another question. Depending on how simple we want to keep the basic notes feature, it could be either or a mix of the two methods.

    Although I’m not a fan of the third party extensions, I think this is a case where they could make sense. And if used for process-specific metadata only, I don’t see an issue. Of course there will be no standard way to display them in a UI or report if they are not specified in the standard.

    Regards,
    Fredrik Estreen

    From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Rodolfo M. Raya
    Sent: den 28 november 2012 09:32
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Still a bad use case that doesn’t justify ruining a good design.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com


  • 15.  Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 20:50
    Fredrik, all,

    same as Fredrik, I think that extensibility makes sense here. I agree that the grouping mechanism in the style of mda is not appropriate here and would change the semantics in an undesired way. Annotations are perfect extension points in general, and besides we need the extensibility here for the ITS mapping.

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie
I agree with Yves's reasons to have this within the match module, which is anyway the alt-trans successor. I guess it does not fulfill the core criteria   [Microsoft] Adding this to the match module would be fine as long as the proper explanatory text and processing instructions make it clear what this data should be used for as opposed to recycling. > Proposal 4: Add an optional name attribute on <notes> in core > and <mds:metadata> module. > We believe it will be typical for content providers to want to > ... > <notes name="comments"> >  <note id=“comment">This string cannot be longer than 100 characters</note> >  <note id=“user"> Developer@microsoft.com </note> >  <note id=“date">10/21/2012 5:28:13 PM</note> > </notes> Sounds reasonable. We'll have to allow several <notes> and <m:metadadat> (I think (but I may be wrong) only one is allowed)) on the extension point. The example makes me wonder about the long term life of XLIFF though: likely this type of info (author, timestamp) will be needed by other. Maybe a better way to address it would be to add attributes to the note and meta that carry the author and time stamp? That would obviously work only if those two info are the only example you have in mind.   I agree with Yves that a couple of standard attributes should be added to increase interoperability, still I believe that note should be fully extendable, as it is part of the general annotation mechanism and should be able to carry attributes from other namespaces.   [Microsoft] Capturing an author and timestamp on a comment is specific to our needs and thus that example. However, we do see value in being able to apply an author and timestamp on potentially any piece of data. So a module (as Yves suggests below) that can exists at the same extension points as metadata (and including metadata) might lend itself better to that.   > Proposal 5: Add optional change tracking attributes to <segment>. > ... 
> <segment id=”1” modifiedBy=” translator@loc.com ” > modifiedDate=”10/21/2012 5:28:13 PM”> >    <source>hello world</source> >    <target>hola món</target> > </segment> Here again I'm wondering if a "change track" module may be better? You could use it not just on segments but other elements: notes. The issue then would be how this gets updated if it's not a core component? Actually if it's a core attribute, does it means it's not optional? I'm not sure there is a way, even with a PR, to guarantee these data will be up-to-date. But maybe that's ok?   Optional attributes in core are tricky, IMHO It means you do not need to introduce it yourself, if you do not feel so.. But if present it would need to be processed by agents who modify the segment. If it is thinkable that change agents do not update it, it feels more like a module...   [Microsoft] Since we are heading down the same path to MUST preserve modules as well, if we introduce a “change track” module, then user agents would need to preserve it if present, but as for any other processing requirements, such as updating it, that could be specified as part of the module’s processing requirements. For example: The module MUST be preserved and SHOULD be updated by user agents. cheers, -yves --------------------------------------------------------------------- To unsubscribe, e-mail: xliff-unsubscribe@lists.oasis-open.org For additional commands, e-mail: xliff-help@lists.oasis-open.org  


  • 16.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 20:58
    David or Fredrik, can you give us an XLIFF example of how that would look?

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
    Sent: Wednesday, November 28, 2012 12:50 PM
    To: Estreen, Fredrik
    Cc: Rodolfo M. Raya; xliff@lists.oasis-open.org
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Fredrik, all,

    same as Fredrik, I think that extensibility makes sense here. I agree that the grouping mechanism in the style of mda is not appropriate here and would change the semantics in an undesired way. Annotations are perfect extension points in general, and besides we need the extensibility here for the ITS mapping.

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    On Wed, Nov 28, 2012 at 10:10 AM, Estreen, Fredrik <Fredrik.Estreen@lionbridge.com> wrote:

    Hi Rodolfo, Ryan,

    I think the intent of the <notes> is lost with the current proposal. The feature is designed so that <notes> is a container for a group of <note>s at a specific level in the document, where each <note> is one annotation / comment in itself. The suggested change transforms that so that the <notes> element becomes the entity describing one note, with <note> describing specific pieces of metadata related to that note. The ID is intended to be used to refer to the note from other places such as from <mrk> elements in the inline content, so overloading it to be the type of data would cause additional problems.

    I think the initial model is much easier to work with and cleaner, as it contains all note-related information in one subtree per document level where notes are allowed. Adding attributes to the <note> element is in my opinion the best way to go.
    Whether we should have more standard attributes, or whether a processor is free to use the third-party namespace extension mechanism to add them, is another question. Depending on how simple we want to keep the basic notes feature it could be either or a mix of the two methods.

    Although I’m not a fan of the third-party extensions I think this is a case where they could make sense. And if used for process-specific metadata only I don’t see an issue. Of course there will be no standard way to display them in a UI or report if they are not specified in the standard.

    Regards,
    Fredrik Estreen

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Rodolfo M. Raya
    Sent: 28 November 2012 09:32
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Still a bad use case that doesn’t justify ruining a good design.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com

    From: Ryan King [mailto:ryanki@microsoft.com]
    Sent: Wednesday, November 28, 2012 5:32 AM
    To: Rodolfo M. Raya; <xliff@lists.oasis-open.org>; Yves Savourel
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    So that our original reason for proposing more than one <notes> at the extension point does not get obscured by all of the replies and “see inlines”, here once again is the use case for adding more than one <notes> per extension point:

    Proposal 4: Add an optional name attribute on <notes> in core and <mds:metadata> module. We believe it will be typical for content providers to want to group their notes or metadata in meaningful ways. This might be done so that a certain number of notes or bits of metadata can be processed in the same way, or simply grouped and displayed together, such as in an editor UI.
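
A minimal sketch of the alternative Fredrik describes in this message: one <notes> container per level, with process metadata carried as attributes on each <note> from a third-party namespace. The `my` prefix, the URI `urn:example:notes-ext`, the attribute names `origin` and `priority`, and the function name are all hypothetical and chosen only for illustration; none of them comes from the XLIFF 2.0 draft. The Python part uses only the standard library and shows that grouping notes for display does not require repeating <notes>.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical extension namespace; it is NOT defined by XLIFF.
EXT = "urn:example:notes-ext"

SAMPLE = """\
<notes xmlns:my="urn:example:notes-ext">
  <note id="n1" my:origin="developer" my:priority="1">This string cannot be longer than 100 characters</note>
  <note id="n2" my:origin="loc-engineer" my:priority="2">Do not localize the product name</note>
</notes>
"""


def group_notes_by_origin(xml_text):
    """Return {origin: [(note id, priority, text), ...]} for one <notes> element."""
    root = ET.fromstring(xml_text)
    groups = defaultdict(list)
    for note in root.findall("note"):
        # Extension attributes are looked up by their namespace-qualified name.
        origin = note.get(f"{{{EXT}}}origin", "unknown")
        priority = int(note.get(f"{{{EXT}}}priority", "0"))
        groups[origin].append((note.get("id"), priority, note.text))
    return dict(groups)


if __name__ == "__main__":
    for origin, notes in group_notes_by_origin(SAMPLE).items():
        print(origin, notes)
```

In this shape each `id` stays unique and remains usable as a reference target, which is the concern raised about overloading `id` as a data type.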


  • 17.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 22:12
    Since the <notes> proposal has spawned its own thread, I’m resending this to bring the <mtc:matches> proposals into focus as well. Yves, can you please let me know what we need to do to move forward with getting these implemented?

    1. Be able to specify optional custom values for match type in <mtc:matches>
    2. Support Reference Language in <mtc:matches>
       · Allow zero, one or more <mtc:matches> at each extension point, because you might have both recycling and reference language data.
       · Add an optional attribute reference="yes|no" with no as default. Additionally, PR for a “reference match” would be to allow an xml:lang on the target different from the document and allow the <source> not to be present as it would be redundant information with the core <source>, e.g. Spanish reference for Quechua might look like this:

    <mtc:matches>
      <mtc:match reference="yes">
        <segment>
          <target xml:lang="es-es">hola mundo</target>
        </segment>
      </mtc:match>
    </mtc:matches>

    Thanks,
    Ryan
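
A minimal sketch of how a consumer might separate reference-language matches from recycling matches under this proposal. It assumes the element shape shown in the message above (the target wrapped in <segment>, <source> optional for reference matches) and the proposed `reference` attribute with `no` as the default. The namespace URI bound to `mtc` is a placeholder, the second match is an illustrative recycling candidate reusing strings from this thread, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

MTC = "urn:example:mtc"  # placeholder namespace URI, an assumption for this sketch
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

SAMPLE = """\
<mtc:matches xmlns:mtc="urn:example:mtc">
  <mtc:match reference="yes">
    <segment>
      <target xml:lang="es-es">hola mundo</target>
    </segment>
  </mtc:match>
  <mtc:match similarity="100.0" type="tm">
    <segment>
      <source>hello world</source>
      <target>hola món</target>
    </segment>
  </mtc:match>
</mtc:matches>
"""


def split_matches(xml_text):
    """Return (reference, recycling) lists of (xml:lang, target text) pairs."""
    root = ET.fromstring(xml_text)
    reference, recycling = [], []
    for match in root.findall(f"{{{MTC}}}match"):
        target = match.find("segment/target")
        if target is None:
            continue
        entry = (target.get(XML_LANG), target.text)
        # "no" is the proposed default when the attribute is absent.
        if match.get("reference", "no") == "yes":
            reference.append(entry)
        else:
            recycling.append(entry)
    return reference, recycling


if __name__ == "__main__":
    refs, recycled = split_matches(SAMPLE)
    print("reference:", refs)
    print("recycling:", recycled)
```

Because the flag sits on each match, a tool that only recycles translations can skip reference entries without inspecting their language.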


  • 18.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-29-2012 03:48
    Hi Ryan, all,

    Sorry for the delay: I'm just swamped and can't find the time to read emails anymore.

    > 1. Be able to specify optional custom values
    > for match type in <mtc:matches>

    I suppose some mechanism similar to the subType we're using in inline codes and other places could allow for custom values while making sure a top-level category is also declared.

    Since we are discussing values for match type: I'm still not convinced that the latest list makes sense:

    am - Assembled Match
    ebm - Example-based Machine Translation
    idm - ID-based Match
    ice - In-Context Exact Match
    mt - Machine Translation
    tm - Translation Memory Match

    - 'Example-based Machine Translation' should not be there IMO: it's just MT, and what type of MT is not relevant (but could be a candidate for the subtype).
    - 'In-Context Exact Match' IMO should be 'in-context' only: the fact that it's an exact one is captured in the similarity (and it could be an in-context fuzzy too).

    > 2. Support Reference Language in <mtc:matches>
    > • Allow zero, one or more <mtc:matches> at each extension point, because
    > you might have both recycling and reference language data.

    I assume you mean: allow more than one <mtc:matches> where we currently allow one? Not in *all* extension points, right?

    > • Add an optional attribute reference="yes no" with no as default.
    > Additionally, PR for a "reference match" would be to allow an xml:lang on the target
    > different from the document and allow the <source> not to be present
    > as it would be redundant information with the core <source>, e.g. Spanish
    > reference for Quechua might look like this:

    - reference='yes|no' and allowing a different language for xml:lang in those with reference='yes' seems ok to me.
    - source not being present... I don't know. If we do that for those 'matches', why not for the normal matches as well, if the source is the same? I think we mandated the source originally to simplify processing: testing for the presence or not of the source may be cumbersome for some processors (XSLT maybe?). We would need to update the definition of what a "match" is as well.

    hope this helps,
    -ys


  • 19.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-29-2012 07:14
    Thanks Yves, see my inline to your inline. Please let me know if there is anything I can do to help you document and get this added to the specification. Do you feel we need to have a roll call vote on these items in the next TC call? Thanks, Ryan


  • 20.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-01-2012 15:05
    Hi Ryan, all,

    > ... see my inline to your inline.
    > Please let me know if there is anything I can do to help you document
    > and get this added to the specification.
    > Do you feel we need to have a roll call vote on these items in the next TC call?

    (This relates to the proposed changes in the match module; see below.)

    Personally I think it's best to work by consensus first, and only go to ballot when there is no consensus. This TC is very ballot-driven, so you should do whatever makes sense in your opinion.

    As for moving things forward:
    - type probably needs a revised list
    - subType and ref probably need to be defined as they would appear in the specification, so people can see them and provide feedback if they want.

    If there is no feedback, one can assume there is no dissent and update the specification. I'm afraid I don't have much time to do specification updates currently, but Bryan, Tom or David may.

    cheers, (and sorry for being slow to answer emails)
    -yves


  • 21.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-04-2012 11:34
    Yves, I still believe we need to add termbase matches to the list. I don't see any category below in which a termbase match could be grouped. While I'm not disputing there may be some, I'm not personally aware of any tool that does not separate the terminology base from the TM. I understand that frequently the termbase is used to identify or replace terminology within a segment, and that's not a segment "match", but there are a lot of valid situations in which the entire segment is replaced from the termbase. One of the best examples I have is when translating UN documents / conference meeting minutes, there is always a list, many pages long, of all participating delegates. We advise the users of our software to enter these in a termbase - I understand this is not traditional terminology but if you can automatically translate these, it's about saving time. Same thing with slogans, titles of government ministries that change routinely (at least in Canada they do!), standard disclaimers, etc. Shirley


  • 22.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-11-2012 23:34
    Thanks Yves and Shirley, while we are discussing the correct list of match values, I'd like to know from the list if we have consensus on adding a subtype for match. Thanks, ryan


  • 23.  Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-12-2012 00:07
    I support adding private subtype.

    Pending issues:
    - Freeze of the normative top-level list
    - Mechanics of subtype: we should probably be using the same mechanics consistently, i.e. either concatenated or separate attributes. This is a spec-wide issue.

    Separate seems cleaner, but concatenation seems better for processing: subtype is automatically dropped when the main type changes, which seems desirable??

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie


  • 24.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-12-2012 06:36
    To be honest, I originally proposed concatenated because I thought that was what we agreed on for subState at the f2f and I wanted to follow suit…but maybe I misremembered that. I actually think a separate attribute is better. It is cleaner as you say, and I don't think it is really a heavy requirement to ask user agents to drop the subtype when the main type changes (or is deleted), which I agree is the correct behavior.

    Should we define any sub values in XLIFF such as "fuzzy" or "exact"? I would actually put "ice" here as well and not in the main type attribute. I reference Wikipedia for my reasoning :-) http://en.wikipedia.org/wiki/Translation_memory :

    Retrieval
    Several different types of matches can be retrieved from a TM.

    Exact match
    Exact matches appear when the match between the current source segment and the stored one is a character by character match. When translating a sentence, an exact match means the same sentence has been translated before. Exact matches are also called "100 % matches".

    In-Context Exact (ICE) match or Guaranteed Match
    An ICE match is an exact match that occurs in exactly the same context, that is, the same location in a paragraph. Context is often defined by the surrounding sentences and attributes such as document file name, date, and permissions.

    Fuzzy match
    When the match is not exact, it is a "fuzzy" match. Some systems assign percentages to these kinds of matches, in which case a fuzzy match is greater than 0% and less than 100%. Those figures are not comparable across systems unless the method of scoring is specified.

    So now we would have something like this:

    <match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy">
    <match id="1" similarity="99.0" type="tm" subtype="ms:near-exact">
    <match id="1" similarity="100.0" type="tm" subtype="xlf:exact">
    <match id="1" similarity="100.0" type="tm" subtype="xlf:ice">

    Thanks,
    ryan


  • 25.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-16-2012 03:54
    Further comments or discussion? :-)


  • 26.  RE: [xliff] Match type and subType

    Posted 12-16-2012 13:08
    Hi Ryan, Shirley, all,

    Here are a few comments:

    === I think having a subType for match is fine, and it can work, as you noted, in a similar way as the type/subtype for the inline codes.

    === For the values of type. Currently the specification lists:

    - am Assembled Match
    - ebm Example-based Machine Translation
    - idm ID-based Match
    - ice In-Context Exact Match
    - mt Machine Translation
    - tm Translation Memory Match

    a) In my opinion we should not have attributes that provide the same information several times. So 'in-context exact' is not right, because it states it is an exact match and that information is already carried by similarity.

    b) 'Example-based Machine Translation' is also wrong in this list, because the list should not need to specify what kind of MT the match comes from. Not only is that information probably not very useful, but we would then also need to define other types of MT (rule-based, statistical, hybrid, etc.)

    The definition ("Indicates the type of a <match> element.") is not very specific. So I would propose the following for the type attribute:

    type - value providing additional information about how the match was generated or further qualifying the relevance of the match. The list of pre-defined values is general, and user-specific information can be added using the subType attribute.

    Possible values:

    - am - assembled match: Match generated by assembling different translation parts together.
    - mt - machine translation candidate: Match generated by a machine translation system.
    - icm - in-context match: Match for which the context is the same as the context of the source content. For example, the source text for both contents is also preceded by an identical source segment.
    - idm - identifier-based match: Match that has an identifier identical to the source content. For example, the previous translation of a given UI component with the same ID.
    - tb - term base match: Match obtained from a terminological database.
    - tm - simple translation memory: Match obtained from a translation memory.
    - other - other type of match: Type of match not covered by any of the other top-level types. One can further specify the type of a match using the subType attribute.

    (and the definitions can be improved I'm sure).

    === As for subType. I'm not sure we should define any default values (but we should certainly reserve the 'xlf' prefix for possible future values).

    I think defining an 'ice' match for subType is not useful: it's redundant with similarity='100' + type='icm', and it also opens the door to tools setting one attribute and not the others. So it would basically require extra processing requirements just to keep duplicated information consistent. The same goes for 'exact', 'fuzzy', etc.

    If an authority wants to define sub-types such as 'fuzzy', 'near', 'exact' and 'ice', that's fine (they may correspond to different types of payment, for example), but in my opinion it's user-defined information.

    So we would have something like this:

    subType - Indicates the secondary level type for a match.

    Value description: The value is composed of a prefix and a sub-value separated by a character : (U+003A). The prefix is a string uniquely identifying a collection of values for a specific authority. The sub-value is any string value defined by an authority. The prefix xlf is reserved for this specification, but no sub-values are defined for it at this time. Other prefixes and sub-values may be defined by the users.

    Default value: Undefined

    Used in: <match>

    Processing Requirements
    • If the attribute subType is used, the attribute type MUST be specified as well.
    • If the attribute type is modified, the attribute subType MUST be updated or deleted.

    cheers,
    -yves
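
The value description and the two processing requirements proposed above could be checked mechanically. What follows is only a minimal sketch in Python, not text from the specification: the function names, the attribute dictionary, and the exact pattern are assumptions made for illustration, and the attribute is spelled subType as in the proposal.

```python
import re

# Assumed shape: "<prefix>:<sub-value>", both parts non-empty,
# split on the first ':' (U+003A).
SUBTYPE_PATTERN = re.compile(r"^([^:\s]+):(\S.*)$")


def check_match_attributes(attrs):
    """Return a list of problems found in a <match> attribute dict."""
    problems = []
    sub_type = attrs.get("subType")
    if sub_type is None:
        return problems  # subType is optional, so nothing to check
    if "type" not in attrs:
        # PR 1: subType requires type
        problems.append("subType is used but type is missing")
    if SUBTYPE_PATTERN.match(sub_type) is None:
        problems.append("subType must look like prefix:sub-value")
    elif sub_type.startswith("xlf:"):
        # The proposal reserves 'xlf' and defines no sub-values for it yet
        problems.append("prefix 'xlf' is reserved and defines no sub-values yet")
    return problems


def change_type(attrs, new_type, new_sub_type=None):
    """PR 2: when type changes, subType is either replaced or dropped."""
    updated = dict(attrs)
    updated["type"] = new_type
    updated.pop("subType", None)
    if new_sub_type is not None:
        updated["subType"] = new_sub_type
    return updated
```

With this sketch, a match carrying type="tm" and subType="ms:near-exact" passes, while changing its type to "mt" without supplying a new sub-value silently drops the old subType.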


  • 27.  RE: [xliff] Match type and subType

    Posted 01-19-2013 00:46
    +1 For the proposal as Yves outlines it. Just a note on the types and definitions, however. Going from my experience, which I understand is not everyone's experience, you can have a tm that stores an entire XLIFF document, not just segments from the document. In this way, it is entirely possible that your 100% match can be from a tm and be either exact (segment match) or in context (contextual match). In your list below tm and icm are mutually exclusive. However, using the subtype attribute, you could define type="tm" and subtype="abc:exact" or subtype="abc:context" (I'll avoid calling it ice here), so I think this proposal will cover the scenario adequately. Thanks, ryan


  • 28.  RE: [xliff] Match type and subType

    Posted 01-19-2013 13:21
    hi Ryan,

    > Going from my experience, which I understand is not everyone's experience,
    > you can have a tm that stores an entire XLIFF document, not just segments
    > from the document. In this way, it is entirely possible that your 100% match
    > can be from a tm and be either exact (segment match) or in context (contextual
    > match). In your list below tm and icm are mutually exclusive. However, using
    > the subtype attribute, you could define type="tm" and subtype="abc:exact"
    > or subtype="abc:context" (I'll avoid calling it ice here), so I think this
    > proposal will cover the scenario adequately.

    The type 'tm' does not indicate an exact match, just a match coming from a translation memory without other specific qualifier. So if your match is coming from a TM and is contextual, one would use 'icm'. Exactness is indicated by the similarity attribute.

    cheers,
    -yves
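
The rule described above (contextual TM matches use 'icm', plain TM matches use 'tm', and exactness comes only from similarity) can be illustrated with a short Python sketch. The function names are hypothetical, and the sketch assumes similarity is expressed on a 0 to 100 scale where 100 means exact, as in the examples earlier in the thread.

```python
def describe_tm_match(similarity, context_matches):
    """Build type and similarity for a match retrieved from a TM.

    Assumption: 'icm' is chosen whenever the surrounding context also
    matches; otherwise the unqualified 'tm' is used.
    """
    match_type = "icm" if context_matches else "tm"
    return {"type": match_type, "similarity": f"{similarity:.1f}"}


def is_exact(attrs):
    """Exactness is read from similarity, never from type."""
    return float(attrs["similarity"]) >= 100.0


in_context = describe_tm_match(100, context_matches=True)
plain_fuzzy = describe_tm_match(75, context_matches=False)
```

Under these assumptions an in-context fuzzy match is also expressible: type stays 'icm' and similarity drops below 100.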


  • 29.  RE: [xliff] Match type and subType

    Posted 01-21-2013 18:25
    Thanks for the clarification, Yves. I understand a match of any similarity can come from a tm. My question was about how to capture an icm coming from a tm, since they are both unique values. This statement clears it up, though, thanks: ...*without other specific qualifier*. So if your match is coming from a TM and is contextual, one would use 'icm'. Maybe you could add that statement "without other specific qualifier" to the value definition for tm. Still +1 on your proposal.


  • 30.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-17-2012 16:42
    I have some concerns about the similarity
    attribute: until there is an openly acknowledged standard around matching
    proximity, having that attribute does not make sense to me. What does it
    mean when your tool says 75%? What happens if my tool does not agree with
    that calculation? Note that I am not suggesting it is not
    useful information, but I think the cart is in front of the horse.

    Given an agreed-upon matching standard,
    I do not believe there will be a need for subType. Most of the information
    specified by similarity would be sufficient for determining what
    the subType would be.








  • 31.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 01-17-2013 18:02




    One of the main reasons why having an extensible match subtype makes sense is that the cost and billing models between content providers and localization
    suppliers can differ from one to the next. If I have a 100% match from a TM database, that match might just be an exact match or it might be an in-context exact match. Microsoft might have a contract to pay their localization supplier to review the exact match,
    but not the in-context exact match. Another company might have a different cost model where they pay to have the in-context exact match reviewed as well.
     
    Thanks,
    Ryan
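
To make the scenario concrete, here is a small Python sketch of how two companies could apply different review rules to the same user-defined sub-values. Everything in it is hypothetical: the abc: prefix is borrowed from the placeholder used elsewhere in this thread, and the policy tables stand in for contract terms that would live outside the XLIFF document.

```python
# Hypothetical per-company policies: sub-value -> "pay for review?"
COMPANY_A_POLICY = {"abc:exact": True, "abc:context": False}
COMPANY_B_POLICY = {"abc:exact": True, "abc:context": True}


def needs_review(match_attrs, policy, default=True):
    """Look up the review rule for a match by its subType.

    Matches without a subType, or with one the policy does not
    know, fall back to the default (review everything).
    """
    return policy.get(match_attrs.get("subType"), default)


in_context_exact = {"type": "tm", "similarity": "100.0", "subType": "abc:context"}
```

The XLIFF file only carries the label; which labels trigger paid review is decided by each company's own table.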
     




  • 32.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 01-18-2013 02:24
    Wait, so are we suggesting to include the cost
    model along with the matching attributes, all in XLIFF? To the tools,
    why would it matter whether a match is an in-context exact match, an
    exact match from a sister product's translation from the last 6 months, an exact match
    from public domain memories, or an exact match from another product 20 years
    ago, etc.? Most of us really only care about two types of exact matches:
    1) a real exact match within context, and 2) everything else. How a vendor is
    paid within the cost model of that company is established by a contractual
    agreement that lives outside of the XLIFF document. The same principle should
    apply to fuzzy matches.

    Unless we are talking about mixing MS
    content along with Oracle content in the same document, so that there
    is a need to distinguish which one is which when you pay your vendor,
    what is the value, within the same organization, of having the ms namespace
    tagged along with the content?



    From:      
      Ryan King <ryanki@microsoft.com>
    To:      
      Helena S Chapman/San
    Jose/IBM@IBMUS
    Cc:      
      "Dr. David Filip"
    <David.Filip@ul.ie>, Shirley Coady <scoady@multicorpora.com>,
    "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>,
    Yves Savourel <ysavourel@enlaso.com>
    Date:      
      01/17/2013 01:08 PM
    Subject:    
        RE: [xliff]
    1.2 to 2.0 Gaps and Proposals




    One of the main reasons why
    having an extensible match subtype makes sense is because the cost and
    billing models between content providers and localization supplier can
    differ from one to the next. If I have a 100% match from a TM database,
    that match might just be an exact match or it might be an in context exact
    match. Microsoft might have a contract to pay their localization supplier
    to review the exact match, but not the in context exact match. Another
    company might have a different cost model where they pay to have the in
    context exact match reviewed as well.
     
    Thanks,
    Ryan
     
    From: xliff@lists.oasis-open.org
    [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Helena S Chapman
    Sent: Monday, December 17, 2012 8:42 AM
    To: Ryan King
    Cc: Dr. David Filip; Ryan King; Shirley Coady; xliff@lists.oasis-open.org;
    Yves Savourel
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
     
    I have some concerns about the similarity attribute. Until there is an
    openly acknowledged standard around matching proximity, having that
    attribute does not make sense to me. What does it mean when your tool says
    75%? What happens if my tool does not agree with the calculation? Note that
    I am not suggesting it is not useful information, but I think the cart is
    in front of the horse.


    Based on an agreed-upon matching standard, I do not believe there will
    be a need for subType. Most of the information specified by similarity
    would be sufficient for determining what the subType would be.





    From: Ryan King <ryanki@microsoft.com>
    To: Ryan King <ryanki@microsoft.com>, "Dr. David Filip" <David.Filip@ul.ie>
    Cc: Shirley Coady <scoady@multicorpora.com>, Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
    Date: 12/15/2012 10:54 PM
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
    Sent by: <xliff@lists.oasis-open.org>







    Further comments or discussion? :)

     
    From: xliff@lists.oasis-open.org
    [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Ryan King
    Sent: Tuesday, December 11, 2012 10:35 PM
    To: Dr. David Filip
    Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals

     
    To be honest, I originally proposed concatenated because I thought that
    was what we agreed on for subState at the f2f and I wanted to follow suit…but
    maybe I misremembered that. I actually think a separate attribute is better.
    It is cleaner as you say, and I don’t think it is really a heavy requirement
    to ask user agents to drop the subtype when the main type changes (or is
    deleted), which I agree is the correct behavior.

     
    Should we define any sub values in XLIFF such as “fuzzy” or “exact”?
    I would actually put “ice” here as well and not in the main type attribute.
    I reference Wikipedia for my reasoning :)
    http://en.wikipedia.org/wiki/Translation_memory :

    Retrieval
    Several different types of matches can be retrieved from a TM.

    Exact match
    Exact matches appear when the match between the current source segment
    and the stored one is a character by character match. When translating
    a sentence, an exact match means the same sentence has been translated
    before. Exact matches are also called "100 % matches".

    In-Context Exact (ICE) match or Guaranteed Match

    An ICE match is an exact match that occurs in exactly the same context,
    that is, the same location in a paragraph. Context is often defined by
    the surrounding sentences and attributes such as document file name, date,
    and permissions.

    Fuzzy match
    When the match is not exact, it is a "fuzzy" match. Some systems
    assign percentages to these kinds of matches, in which case a fuzzy match
    is greater than 0% and less than 100%. Those figures are not comparable
    across systems unless the method of scoring is specified.

     
    So now we would have something like this:


    <match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy">

    <match id="1" similarity="99.0" type="tm" subtype="ms:near-exact">

    <match id="1" similarity="100.0" type="tm" subtype="xlf:exact">

    <match id="1" similarity="100.0" type="tm" subtype="xlf:ice">
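
As a rough illustration of how a consuming tool might read the separate-attribute form shown above, here is a minimal Python sketch. The `<matches>` wrapper, the distinct `id` values, the self-closing elements, the absence of namespaces, and the `read_matches` helper are all assumptions made for the example; none of them come from the XLIFF 2.0 draft.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: the four proposed <match> elements wrapped in a
# made-up <matches> root, with distinct ids and no namespaces for brevity.
SAMPLE = """
<matches>
  <match id="1" similarity="75.0" type="tm" subtype="xlf:fuzzy"/>
  <match id="2" similarity="99.0" type="tm" subtype="ms:near-exact"/>
  <match id="3" similarity="100.0" type="tm" subtype="xlf:exact"/>
  <match id="4" similarity="100.0" type="tm" subtype="xlf:ice"/>
</matches>
"""


def read_matches(xml_text):
    """Return one dict per <match>, splitting subtype into authority prefix and value."""
    rows = []
    for m in ET.fromstring(xml_text).iter("match"):
        subtype = m.get("subtype")
        # partition(":") yields ("xlf", ":", "fuzzy") for "xlf:fuzzy".
        prefix, _, value = subtype.partition(":") if subtype else (None, "", None)
        rows.append({
            "id": m.get("id"),
            "similarity": float(m.get("similarity")),
            "type": m.get("type"),
            "authority": prefix,
            "subtype": value,
        })
    return rows


for row in read_matches(SAMPLE):
    print(row)
```

Splitting on the first colon keeps the authority prefix (`xlf`, `ms`) separate from the value, so a tool could honor the reserved `xlf` values and pass private ones through untouched.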

     
    Thanks,
    ryan
     
    From: Dr. David Filip [ mailto:David.Filip@ul.ie ]

    Sent: Tuesday, December 11, 2012 4:06 PM
    To: Ryan King
    Cc: Shirley Coady; Yves Savourel; xliff@lists.oasis-open.org
    Subject: Re: [xliff] 1.2 to 2.0 Gaps and Proposals

     
    I support adding private subtype
     
    Pending issues:
    - Freeze of the normative top level list
    - Mechanics of subtype: we should probably be using the same mechanics consistently,
    i.e. either concatenated or separate attributes. This is a spec-wide issue.

     
    Separate seems cleaner, but concatenation seems better for processing:
    the subtype is automatically dropped when the main type changes, which seems desirable
    ??  
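
The trade-off between the two mechanics can be sketched in a few lines of Python. This is only an illustration under assumptions: attributes are modelled as a plain dict, and the two helper names are hypothetical rather than taken from any XLIFF tool or from the spec.

```python
def set_type_concatenated(attrs, new_type):
    """Concatenated form, e.g. type="tm/xlf:fuzzy".

    Overwriting the single attribute discards the old subtype as a side effect.
    """
    updated = dict(attrs)
    updated["type"] = new_type
    return updated


def set_type_separate(attrs, new_type):
    """Separate form, e.g. type="tm" subtype="xlf:fuzzy".

    The agent has to drop the subtype explicitly when the main type changes.
    """
    updated = dict(attrs)
    if updated.get("type") != new_type:
        updated.pop("subtype", None)
    updated["type"] = new_type
    return updated


print(set_type_concatenated({"type": "tm/xlf:fuzzy"}, "mt"))
print(set_type_separate({"type": "tm", "subtype": "xlf:fuzzy"}, "mt"))
```

Both calls end with only `type` set to `mt`; the difference is that the separate form needs the extra rule in the agent, which is the processing requirement under discussion.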
     
    Cheers
    dF
     
    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

     
    On Tue, Dec 11, 2012 at 11:32 PM, Ryan King < ryanki@microsoft.com >
    wrote:
    Thanks Yves and Shirley, while we are discussing the correct list of match
    values, I'd like to know from the list if we have consensus on adding a
    subtype for match.

    Thanks,
    ryan




  • 33.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 01-18-2013 20:46




    We’re not suggesting including a cost model, only extensible metadata on the match type that can be used during statistical analysis of the XLIFF to categorize
    matches as they might relate to a contractual agreement that lives outside of the XLIFF document.
     
    Your last statement assumes that every company uses a centralized contractual agreement for billing and there is no need to differentiate between internal teams/divisions,
    which is not always the case.
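
To make "statistical analysis of the XLIFF" concrete, here is a minimal Python sketch that tallies matches per type and subtype. The sample document, its namespace-free element names, and the `count_by_category` helper are assumptions for illustration only; a real analysis would also need word counts and the module's actual namespace.

```python
from collections import Counter
import xml.etree.ElementTree as ET

# Hypothetical, namespace-free sample using the proposed subtype attribute.
SAMPLE = """
<matches>
  <match id="1" similarity="100.0" type="tm" subtype="xlf:exact"/>
  <match id="2" similarity="100.0" type="tm" subtype="xlf:ice"/>
  <match id="3" similarity="100.0" type="tm" subtype="xlf:exact"/>
  <match id="4" similarity="80.0" type="tm" subtype="xlf:fuzzy"/>
  <match id="5" similarity="60.0" type="mt"/>
</matches>
"""


def count_by_category(xml_text):
    """Tally matches per (type, subtype); a missing subtype is counted as None."""
    counts = Counter()
    for m in ET.fromstring(xml_text).iter("match"):
        counts[(m.get("type"), m.get("subtype"))] += 1
    return counts


# key=str avoids comparing None with strings when sorting the categories.
for (match_type, subtype), n in sorted(count_by_category(SAMPLE).items(), key=str):
    print(match_type, subtype, n)
```

The counts carry no pricing at all; how each category is billed would still be decided by the agreement that lives outside the document.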
     
    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Helena S Chapman
    Sent: Thursday, January 17, 2013 6:24 PM
    To: Ryan King
    Cc: Dr. David Filip; Shirley Coady; xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] 1.2 to 2.0 Gaps and Proposals
     
    Wait, so are we suggesting including a cost (element) model along with the matching attribute, all in XLIFF? To the tools, why would it matter whether a match is an in-context exact match,
    an exact match from a sister product's translation in the last 6 months, an exact match from public domain memories, or an exact match from another product 20 years ago, etc.? Most of us really only care about two types of exact matches: 1) a real exact match within context
    2) everything else. How a vendor is paid within the cost model of that company is established by a contractual agreement that lives outside of the XLIFF document. The same principle should apply to fuzzy matches.

    Unless we are talking about mixing MS content along with Oracle content in the same document, and therefore there is a need to distinguish between which one is which when you pay your vendor, within
    the same organization, what's the value of having the ms namespace tagged along with the content?








  • 34.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 01-18-2013 21:57
    I see, so the point is that XLIFF should
    be able to handle ms:aspdotnet_matches, ms:windowsphone_matches, ms:whatever?








  • 35.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 01-19-2013 00:07




    No, the point is not about Microsoft’s desire to specify a bunch of internal product-specific match types. It is about being able to differentiate between
    types of matches that have the same percentage, for example 100% Exact and 100% In Context, and to treat them differently, whether for billing or even recycling.
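
A small Python sketch of why the percentage alone cannot carry this distinction. The two company policies and the `needs_review` helper are hypothetical; they only mirror the kind of contract difference described earlier in the thread.

```python
# Hypothetical review policies: the subtypes each company pays to have reviewed.
REVIEW_POLICY = {
    "company_a": {"xlf:exact"},             # reviews exact, not in-context exact
    "company_b": {"xlf:exact", "xlf:ice"},  # reviews both
}

# Two matches with identical similarity but different subtypes.
MATCHES = [
    {"similarity": 100.0, "subtype": "xlf:exact"},
    {"similarity": 100.0, "subtype": "xlf:ice"},
]


def needs_review(company, match):
    """Decide from the subtype, since similarity is the same for both matches."""
    return match["subtype"] in REVIEW_POLICY[company]


for company in REVIEW_POLICY:
    for match in MATCHES:
        print(company, match["subtype"], needs_review(company, match))
```

With only `similarity="100.0"` available, both matches would look identical to either policy; the subtype is the only field that lets the two companies treat them differently.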

     
     




  • 36.  Re: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-01-2012 19:18
    Yves, Ryan,

    The source should be required in all matches, reference or not. This was one very specific piece of feedback from the toolmakers at the 2nd XLIFF Symposium in Warsaw. SDL, Kilgray, Atril, and more agreed that having no source in alt-trans complicated the processing unnecessarily, and said that they would provide better support for an XLIFF local matching mechanism if it had a mandatory source. We should honor this wish in the matches module, IMHO.

    So it might seem like redundancy, but it is actually not so bad and is explicitly supported by the voice of an important constituency.

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    On Thu, Nov 29, 2012 at 3:47 AM, Yves Savourel < ysavourel@enlaso.com > wrote:

    Hi Ryan, all,

    Sorry for the delay: I'm just swamped and can't find the time to read emails anymore.

    > 1. Be able to specify optional custom values
    > for match type in <mtc:matches>

    I suppose some mechanism similar to the subType we're using in inline codes and other places could allow for custom values while making sure a top-level category is also declared.

    Since we are discussing values for match type: I'm still not convinced that the latest list makes sense:

    am - Assembled Match
    ebm - Example-based Machine Translation
    idm - ID-based Match
    ice - In-Context Exact Match
    mt - Machine Translation
    tm - Translation Memory Match

    - 'Example-based Machine Translation' should not be there IMO: it's just MT, what type of MT is not relevant (but could be a candidate for the subtype)
    - 'In-Context Exact Match' IMO should be 'in-context' only: the fact that it's an exact one is captured in the similarity (and it could be an in-context fuzzy too).

    > 2. Support Reference Language in <mtc:matches>
    > • Allow zero, one or more <mtc:matches> at each extension point, because
    > you might have both recycling and reference language data.

    I assume you mean: allow more than one <mtc:matches> where we currently allow one? Not in *all* extension points, right?

    > • Add an optional attribute reference="yes|no" with no as default.
    > Additionally, PR for a “reference match” would be to allow an xml:lang on the target
    > different from the document and allow the <source> not to be present
    > as it would be redundant information with the core <source>, e.g. Spanish
    > reference for Quechua might look like this:

    - reference='yes|no' and allowing a different language for xml:lang in those with reference='yes' seems ok to me.
    - source not being present... I don't know. If we do that for those 'matches', why not for the normal matches as well, if the source is the same? I think we mandated the source originally to simplify processing: testing for the presence or not of the source may be cumbersome for some processors (XSLT maybe?).

    We would need to update the definition of what a "match" is as well.

    hope this helps,
    -ys


  • 37.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 12-03-2012 19:46
    Sounds good. Let’s keep source in Reference Language.
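
For illustration only, the direction agreed in this exchange (a reference-language match that still carries its own source) might look roughly like the fragment below. This is a sketch under assumptions, not spec text: the reference attribute is the one proposed in this thread, the id and type attributes are borrowed from the earlier examples, the mtc prefix is assumed to be bound to the matches module namespace elsewhere in the file, and the sample strings reuse the English/Spanish example from the original proposal.

```xml
<!-- Sketch only: attribute names follow the proposals in this thread, not a published schema. -->
<mtc:matches>
  <!-- reference="yes" (proposed; default would be "no") marks a reference-language match -->
  <mtc:match id="1" type="tm" reference="yes">
    <!-- source is kept even though it repeats the core <source>, as the toolmakers requested -->
    <source>hello world</source>
    <!-- xml:lang on the target may differ from the document's target language for reference matches -->
    <target xml:lang="es-es">hola mundo</target>
  </mtc:match>
</mtc:matches>
```

Keeping the source in every match means a processor can treat reference matches and ordinary matches identically, without first testing whether a source is present.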


  • 38.  RE: [xliff] 1.2 to 2.0 Gaps and Proposals

    Posted 11-28-2012 00:00
    Hi all,

    In last week’s TC call it was mentioned that I should work with the owners of the current features to get our requirements implemented for proposals that weren’t deemed to be features. Can anyone tell me who the owner of the <file> element is, so that we can discuss the implementation of the following:

    • Proposal 1: Add an optional build attribute to 2.0 <file> element in core.

    I’m not sure if this requires an electronic ballot. I got the impression from the call that it doesn’t, but hopefully Bryan or David or someone else from the call will correct me if that is wrong. Please let me know who I can work with on this.

    Ryan