Here is the draft for the new Validation Module as Kevin outlines below. Please give your comments and feedback.
Thanks,
Ryan
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Estreen, Fredrik
Sent: Tuesday, February 26, 2013 10:22 AM
To: Helena S Chapman; Kevin O'Donnell
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] R37: Revised Validations Module proposal
Hi Kevin, Helena
I think it makes sense to leave the RegEx and custom rules out of the spec for the first version. Selecting and defining a single RegEx flavor to use for localization standards would indeed be good, but there are quite a few obstacles.
For me the proposed rules seem to cover the common cases I see, except for the more complex ones involving content vs. inline tags. But such rules could be added later.
The difference between noLoc and mustLoc is not so much about what would be flagged as about what message to report to the user for the different types of error. It would make sense to use a common implementation for both but allow a different severity / report message to be defined for the two types of rules. With an implementation-neutral specification, though, that does not need to go in the standard. Perhaps a clarifying sentence in the rule description would be helpful.
The normalization issues brought up are interesting. I don’t think it is a major issue, since the strings in the rules are tightly coupled with the content of the file or the localization style of the file’s provider. So I expect that if a mustLoc rule requires the localization to use a word with a “ß” in it, the rule should flag a replacement with “ss”. But normal Unicode normalization would probably make sense (or be optionally available), since there will be no control over how the user entered the target content.
For spaces I’m not sure, but I’m leaning towards thinking that it would make sense to have an option that either requires identical types of spaces (by character code) in source and target, or treats all Unicode spaces as equal. I don’t think that the joiners are considered spaces, and to me it would make sense to include them like any other character in the text to match. Either they come from the source (presumably coordinated with the rules) or they are mandated / forbidden in the target portion by a rule like mustLoc, based on the expected localization style.
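To make the two space-handling options concrete, here is a minimal Python sketch. It assumes that "Unicode space" means general category Zs (space separator), which is only one possible definition; the function names and the spaces_equal switch are hypothetical and not part of the proposal.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER (general category Cf, a format character)
NBSP = "\u00a0"  # NO-BREAK SPACE (general category Zs)


def is_unicode_space(ch: str) -> bool:
    """True if the character is in general category Zs (assumed definition of 'space')."""
    return unicodedata.category(ch) == "Zs"


def fold_spaces(text: str) -> str:
    """Replace every Zs character with U+0020; joiners and all other characters are kept."""
    return "".join(" " if is_unicode_space(ch) else ch for ch in text)


def ends_with(target: str, value: str, spaces_equal: bool = False) -> bool:
    """Hypothetical strEnds-style check with an optional 'all spaces are equal' mode."""
    if spaces_equal:
        target, value = fold_spaces(target), fold_spaces(value)
    return target.endswith(value)
```

With this definition the zero-width joiner is not a space, so it is matched like any other character. A target ending in NBSP + "!" fails a strict check against " !" (U+0020) but passes when spaces_equal is enabled.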
Overall I think the revised proposal looks like a good initial validation feature. Some details, like the normalization issues and probably the exact set of rules to include, might need more discussion and tight(er) language. But the general design should allow for interoperable behavior without depending on solving the more complex issues that extend even beyond XLIFF.
Best regards,
Fredrik Estreen
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Helena S Chapman
Sent: 26 February 2013 16:30
To: Kevin O'Donnell
Cc: xliff@lists.oasis-open.org
Subject: Re: [xliff] R37: Revised Validations Module proposal
I am a little confused.
Isn't noLoc a duplicate of mustLoc? In other words, noLoc("Microsoft") is the same as mustLoc("Microsoft", "Microsoft")?
I am always nervous about substring or pattern matching operations when it comes to multiple languages.
For example, what would the tools do with the following?
mustLoc("résumé", " ? ? ")
vs mustLoc("resume", " ? ? ")? Does that mean the document will need to contain all possible combinations, or will the document need to specify which normalization form to process this with? How about ß in text vs "ss"/"sz" or "SS"/"SZ"/"Sz"/...? Or, the other way around, the fi ligature? The same comment applies to startWith or endWith matching. Or the space character, for that matter: is a zero-width joiner considered a space? Should the tool use the Unicode standard's definition of what a "space" character is? If not, what then? Does it matter if it's a visible or invisible formatting/control character?
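The normalization question can be illustrated with a short Python sketch using the standard unicodedata module. The contains helper and its form parameter are hypothetical; the sketch only shows how the choice of normalization form changes a substring match and does not suggest which form a rule should use.

```python
import unicodedata
from typing import Optional

COMPOSED = "r\u00e9sum\u00e9"      # "résumé" with precomposed é (U+00E9)
DECOMPOSED = "re\u0301sume\u0301"  # "résumé" as e + COMBINING ACUTE ACCENT (U+0301)
SHARP_S = "\u00df"                 # ß
FI_LIGATURE = "\ufb01"             # the single-character fi ligature


def contains(text: str, value: str, form: Optional[str] = None) -> bool:
    """Substring test, optionally normalizing both sides to the same form first."""
    if form is not None:
        text = unicodedata.normalize(form, text)
        value = unicodedata.normalize(form, value)
    return value in text


raw_match = contains(DECOMPOSED, COMPOSED)               # code points compared as-is
nfc_match = contains(DECOMPOSED, COMPOSED, "NFC")        # both sides normalized to NFC
sharp_s_nfkc = unicodedata.normalize("NFKC", SHARP_S)    # normalization leaves ß alone
ligature_nfkc = unicodedata.normalize("NFKC", FI_LIGATURE)  # compatibility form splits it
```

Here raw_match is False and nfc_match is True, so composed and decomposed spellings only match once a normalization form is agreed on. Plain "resume" never matches "résumé" under any form. NFKC maps the ligature to "fi" but leaves ß unchanged; folding ß to "ss" needs a separate step such as str.casefold(), which is a case-folding operation rather than normalization.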
Best regards,
Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts
From: "Kevin O'Donnell" <kevinod@microsoft.com>
To: "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date: 02/26/2013 02:07 AM
Subject: [xliff] R37: Revised Validations Module proposal
Sent by: <xliff@lists.oasis-open.org>
Hi All,
Following the approval of the Validation feature in XLIFF 2.0, we spent some time thinking about our original proposal, researching available implementations and consulting with colleagues both
internally and outside of our company. We’d now like to present a revised, simplified proposal for implementation in XLIFF 2.0.
For us, inclusion of localization validation rules remains a high priority in XLIFF 2.0, to support localization constraints and better software localization suitability. We believe this goal
can be achieved with a straightforward Validations module that will encourage wide adoption and usage.
Background
Previously, we advocated a Validations feature that included pre-defined rules, custom rules and pattern matching using Regular Expressions. In our new proposal, we are dropping the proposal for custom rules and pattern matching and, instead, advocating a set of normative validation rules only.
Here’s a summary of the difficulties we found with proposals of pattern matching/custom rules:
· Pattern Matching (with RegEx)
o As the previous discussion proved, it is difficult to identify a single RegEx engine that will have broad industry acceptance
o Currently, few RegEx engines have extensive Unicode support
o RegEx is powerful, but requires strong technical skills to utilize fully – may prohibit adoption
o RegEx is not human readable, compared to a standard rule set
· Custom rules
o Custom rules may lead to unpredictable content in an XLIFF document. Again, based on feedback we’ve heard, it’s preferable to have a predictable rule set with strict processing instructions
If we start with a defined set of rules, we could always revisit the option to include Pattern Matching in XLIFF 2.x, giving us more time to examine the pattern matching idea (or develop a localization
RegEx standard).
Revised Proposal
Our revised proposal is based on 5 normative validation rules only. We believe these will be useful and powerful additions to the XLIFF 2.0 feature set:
· A defined list of rules will provide certainty and predictability (i.e. no custom rules), which should encourage widespread adoption among XLIFF tool providers
· Tool creators can map the defined list of rules to their UI features (e.g. flagging rule violations, adapting editing functionality based on rules)
· The list is not exhaustive (intentionally so), but provides coverage for the most common cases of localization validation we have seen
List of rules:
# | Rule | Description | Example
1 | strBegins | Test to verify that a target string begins with a defined value. | strBegins(nbsp;)
2 | strEnds | Test to verify that a target string ends with a defined value. | strEnds(“.”) existsInSource=“yes”
3 | mustLoc | Test for presence of substring in source text and verify existence of specific translated value in target string. | mustLoc(“email”, “Courriel”)
4 | noLoc | Test to ensure presence of string/substring in target string, subject to its presence in the source string. | noLoc(“Microsoft”)
5 | dblSpace | Test to verify that a certain number of double spaces exists in the target string. | dblSpace(0)
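As a rough illustration of how the five rules could behave, here is a minimal Python sketch. The function names and signatures are hypothetical, matching is plain code-point comparison with no normalization, and counting double spaces as non-overlapping pairs is an assumption the proposal does not state.

```python
def str_begins(target: str, value: str) -> bool:
    """strBegins: the target string begins with the defined value."""
    return target.startswith(value)


def str_ends(target: str, value: str) -> bool:
    """strEnds: the target string ends with the defined value."""
    return target.endswith(value)


def must_loc(source: str, target: str, source_value: str, target_value: str) -> bool:
    """mustLoc: if source_value occurs in the source, target_value must occur in the target."""
    if source_value not in source:
        return True  # rule does not apply to this unit
    return target_value in target


def no_loc(source: str, target: str, value: str) -> bool:
    """noLoc: if value occurs in the source, it must appear unchanged in the target."""
    return must_loc(source, target, value, value)


def dbl_space(target: str, expected: int) -> bool:
    """dblSpace: the target contains exactly `expected` non-overlapping double spaces."""
    return target.count("  ") == expected
```

In this sketch noLoc is implemented as mustLoc with identical source and target values, so the two rules share one test and differ only in how a tool might choose to report a failure.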
Notes/processing requirements:
· The rules allow XLIFF agents to perform tests on given XLIFF documents. The processing agent may use its own business logic to determine how to interpret the result of the test (although the result will be a consistent pass/fail, regardless of the agent performing the test)
· As with other modules, if a tool does not support the Validations module, it may ignore it, but must preserve the rules
· The rules may be augmented with an optional qualifier (existsInSource) to only test upon existence of the valid condition in source string. In the given example above, the strEnds rule will only apply if the source text meets the condition (of ending with “.”). Without the existsInSource qualifier, the default validation applies based on the target text only
· The rules may be placed at the File or Unit level
· At the file level, the rules are global
· At the unit level, the rule applies to the target
· At the unit level, it is possible to override a global rule
· At the unit level, it is possible to disable a global rule
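The existsInSource qualifier and the file/unit placement notes can be sketched in Python as follows. Everything here is hypothetical: the Rule class, the disabled flag, and the choice to key rules by name are assumptions made for illustration (keying by name is a simplification, since a real file could carry several mustLoc rules), and the proposal does not yet define a syntax.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Rule:
    name: str                       # e.g. "strEnds"
    value: str                      # the defined value to test for
    exists_in_source: bool = False  # the optional existsInSource qualifier
    disabled: bool = False          # hypothetical marker for "disable a global rule"


def effective_rules(file_rules: Dict[str, Rule], unit_rules: Dict[str, Rule]) -> Dict[str, Rule]:
    """Merge global (file-level) rules with unit-level overrides and drop disabled ones."""
    merged = dict(file_rules)
    merged.update(unit_rules)  # a unit-level rule replaces the global rule of the same name
    return {name: rule for name, rule in merged.items() if not rule.disabled}


def check_str_ends(rule: Rule, source: str, target: str) -> bool:
    """strEnds with existsInSource: only test the target if the source meets the condition."""
    if rule.exists_in_source and not source.endswith(rule.value):
        return True  # condition absent from the source, so the rule does not apply
    return target.endswith(rule.value)
```

For example, with a global strEnds(“.”) rule marked existsInSource, a source of "Save" passes regardless of the target, while a source of "Saved." requires the target to end with "." as well. A unit that supplies its own strEnds rule replaces the global one for that unit only.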
At this point, we’d like to hear feedback on the revised proposal. We will continue to refine the proposal and create examples for the specific feature implementation and appropriate syntax.
Thanks,
Microsoft Corporation.
(Kevin, Ryan, Alan, Uwe)
Attachment: Validation Module.docx