OASIS XML Localisation Interchange File Format (XLIFF) TC


2.0 Validations Module Proposal

  • 1.  2.0 Validations Module Proposal

    Posted 11-16-2012 00:02
    In anticipation of closing down on 2.0, we have two new proposals for modules. In this mail, we are proposing the first of the two, a Validations module.

    Validating localized target data is a very important part of the business of outsourcing localization, especially when the extracted source content comes from software. Content providers and localization suppliers typically use a plethora of tools to perform a multitude of validations. There is a strong desire in the industry to bring some consistency to this space, but there are currently no accepted standards or interchange formats that facilitate this activity. We would like to propose a Validations module that would help standardize this crucial activity.

    The basic idea is to define a small set of standard validation rules, with standard descriptions, around which tool developers could consistently build business logic. How a rule is applied to a string or substring would be specified using regular expressions. All of this would be contained in a Validations module.

    Here’s a draft of the module for comment:

    Validations Module
    The target text of a document can be verified against various validation rules. The Validations module should be able to store a list of pre-defined validation rules, along with a description of how to process the target text using those rules, to perform specific verifications.

    Module Specification

    Module Namespace
    The namespace for the Validations module is: urn:oasis:names:tc:xliff:validations:2.0

    Module Elements
    The elements defined in the Validations module are: <validations>, <validation>, and <matchExpression>.

    Tree Structure
    Legend:
    1 = one
    + = one or more
    ? = zero or one

    <validations> +
    |
    +---<validation> +
        |
        +---<matchExpression> 1

    validations
    Collection of validations to be applied by a validation engine.
    Contains:
    - One or more <validation> elements
    Parents:
    <file>, <group>, <unit> and <segment>
    Attributes:
    - name

    validation
    Specifies a validation rule, along with a description and a regular expression, which together define how to apply that rule to the target text.
    Contains:
    - One <matchExpression> element
    Parents:
    <validations>
    Attributes:
    - id, rule, desc

    matchExpression
    A regular expression used to match the target text or substring to which the validation rule is applied.
    Contains:
    A regular expression
    Parents:
    <validation>
    Attributes:
    - none

    Module Attributes
    The attributes defined in the Validations module are: name, id, rule, and desc.

    name
    Name - The user-defined name of a named <validations> element.
    Value description: NMTOKEN.
    Default value: undefined
    Used in: <validations>.

    id
    Identifier - A character string used to identify a <validation> element.
    Value description: NMTOKEN.
    Default value: undefined
    The value must be unique within the <validations> element.
    Used in: <validation>.

    rule
    Validation Rule - Indicates the rule that a validation engine should apply to the target text.
    Value description: A paired value with desc. See table below.
    Default value: undefined
    Used in: <validation>.

    desc
    Validation description - Indicates how a specific rule should be applied to the target text.
    Value description: A paired value with rule. See table below.
    Default value: undefined
    Used in: <validation>.

    Possible values for rule and desc attributes (format and number of rules TBD):

    Rule              Description
    ----------------  ---------------------------------------------------------
    maxLength:100     Match string can’t be longer than # of chars specified.
    minLength:10      Match string can’t be shorter than # of chars specified.
    noLoc             Match string shouldn’t be localized.
    Etc.              Etc.
    Any custom rule   Any custom description

    Examples in XLIFF:
    Using the following segment as an example:

    <segment>
      <source>Contact me at someCompany: user@somecompany.com</source>
      <target>Kontaktieren Sie mich unter someFirma: user@somecompany.com</target>
    </segment>

    maxLength:100
    The expression .+ matches “Kontaktieren Sie mich unter someFirma: user@somecompany.com”. The match succeeds, so the validation business logic checks whether the matched string is no longer than 100 characters; that also succeeds, and the business logic then takes the appropriate action.

    <val:validations>
      <validation rule="maxLength:100" desc="Match string can’t be longer than # of chars specified.">
        <matchExpression>.+</matchExpression>
      </validation>
    </val:validations>

    noLoc
    The expression someCompany does not match anything in the target text, because “someCompany” was localized as “someFirma”. The validation business logic takes the appropriate action for the match failure.

    <val:validations>
      <validation rule="noLoc" desc="Match string shouldn’t be localized.">
        <matchExpression>someCompany</matchExpression>
      </validation>
    </val:validations>

    Rules not defined in the module can still be defined using the same mechanism, though user agents that support the Validations module may or may not have a built-in implementation for them. An example might be to check whether the target text contains a valid email address.

    validEmail
    The expression [A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4} matches “user@somecompany.com”. The validation business logic takes the appropriate action for the match success.

    <val:validations>
      <validation rule="validEmail" desc="Match string is a valid email address.">
        <matchExpression>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}</matchExpression>
      </validation>
    </val:validations>

    Please let us know your opinion on this proposal.

    Thanks,
    Microsoft Corporation
    (Ryan King, Kevin O'Donnell, Uwe Stahlschmidt, Alan Michael)
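    To make the proposed semantics concrete, here is a minimal Python sketch of how a validation engine might read the three examples above and apply them to the German target. It assumes Python's `re` dialect for <matchExpression>, that length rules apply to the matched string, and that noLoc and custom rules reduce to the pass/fail of the match. The `evaluate` and `validate` helpers, the `id` values and the default-namespace declaration are all hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Namespace URI taken from the proposal.
NS = "urn:oasis:names:tc:xliff:validations:2.0"

# The proposal's three examples in one <validations> element. The default
# namespace declaration and the id values are assumptions for this sketch.
MODULE = r"""
<validations xmlns="urn:oasis:names:tc:xliff:validations:2.0" name="demo">
  <validation id="v1" rule="maxLength:100" desc="Match string can't be longer than # of chars specified.">
    <matchExpression>.+</matchExpression>
  </validation>
  <validation id="v2" rule="noLoc" desc="Match string shouldn't be localized.">
    <matchExpression>someCompany</matchExpression>
  </validation>
  <validation id="v3" rule="validEmail" desc="Match string is a valid email address.">
    <matchExpression>[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}</matchExpression>
  </validation>
</validations>
"""


def evaluate(rule: str, pattern: str, target: str) -> bool:
    """Return True if the target passes the rule (semantics are assumed)."""
    match = re.search(pattern, target)
    name, _, arg = rule.partition(":")
    if name == "maxLength":
        # Length limits apply to the matched string; no match, nothing to check.
        return match is None or len(match.group(0)) <= int(arg)
    if name == "minLength":
        return match is None or len(match.group(0)) >= int(arg)
    # noLoc (the protected string must still be present) and custom rules
    # both reduce to the success or failure of the match.
    return match is not None


def validate(module_xml: str, target: str) -> dict:
    """Apply every <validation> in a <validations> element to one target string."""
    root = ET.fromstring(module_xml.strip())
    results = {}
    for v in root.findall(f"{{{NS}}}validation"):
        expr = v.find(f"{{{NS}}}matchExpression")
        results[v.get("id")] = evaluate(v.get("rule"), expr.text, target)
    return results


TARGET = "Kontaktieren Sie mich unter someFirma: user@somecompany.com"
print(validate(MODULE, TARGET))  # {'v1': True, 'v2': False, 'v3': True}
```

    The noLoc check fails here because “someCompany” was localized, which is exactly the failure the example describes.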


  • 2.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-16-2012 14:01
    Hi Ryan, all,

    I think a validation module would be quite nice to have. It would allow catching many issues where they really need to be caught: when translating.

    A few notes of things to possibly consider:

    - What regular expression syntax should the module use? ICU? .NET? Perl? Java? XSD? ECMA? Other? For interoperability purposes it is quite important to have a well-defined way to write the regexes. I don’t have an answer; it’s just that there are precedents in SRX and ITS that demonstrate the problem is not easy to solve.

    - I notice the maxLength rule. How would this fit with the proposal that Fredrik put forward about length and size restrictions? See:
    https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Length%20and%20Size%20Restrictions
    Or with the ITS Storage Size data category that would be in an ITS module? Somehow we would have to make sure there is one way to check one thing.

    - Maybe the ‘custom rule’ could be defined with a clearer PR. For example, the email pattern case doesn’t tell you whether there is a problem. A more generic way to work with custom patterns could be to check whether a pattern in the source matches the same number of occurrences in the target. For the email example, it would mean a red flag if the email is not found in the target. One could have more sophisticated options too, like having a pattern for both the source and the target. Checker tools like XBench, QADistiller, etc. have put a lot of thought into this. It would be nice to have equivalence.

    - It seems noLoc would be very similar to <mrk id='1' translate='no'>...</mrk>. A rationale justifying both methods would be nice to offer to implementers.

    That’s all I have for now.
    -yves


  • 3.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-20-2012 01:13




    Thanks, Yves, for the comments and feedback. See our responses inline.
     


    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Yves Savourel
    Sent: Friday, November 16, 2012 6:01 AM
    To: xliff@lists.oasis-open.org
    Subject: RE: [xliff] 2.0 Validations Module Proposal


     
    Hi Ryan, all,
     
    I think a validation module would be quite nice to have.
    It would allow catching many issues where they really need to be caught: when translating.
     
    A few notes of things to possibly consider:
     
    - What regular expression syntax should the module use? ICU? .NET? Perl? Java? XSD? ECMA? Other?
    For interoperability purposes it is quite important to have a well-defined way to write the regexes.
    I don’t have an answer; it’s just that there are precedents in SRX and ITS that demonstrate the problem is not easy to solve.
     
    [ryanki] Good question. Maybe an attribute should be added to allow a user to define the regex language to use?
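    A small Python sketch of why the syntax question matters: the same pattern can pass under one engine's conventions and fail under another's. Python's `re` stands in for the engines here; `search_style`, `xsd_style` and the `ENGINES` table (keyed by a hypothetical regex-syntax attribute value) are illustrative names, and the XSD emulation covers only implicit anchoring, not XSD's other syntax differences.

```python
import re


def search_style(pattern: str, text: str) -> bool:
    # "Find" semantics (Perl, Java's Matcher.find(), .NET, ECMAScript):
    # any substring of the text may match.
    return re.search(pattern, text) is not None


def xsd_style(pattern: str, text: str) -> bool:
    # XML Schema patterns are implicitly anchored to the whole value;
    # re.fullmatch approximates that (other XSD syntax differences are ignored).
    return re.fullmatch(pattern, text) is not None


# Hypothetical values for a regex-syntax attribute on <validation>.
ENGINES = {"search": search_style, "xsd": xsd_style}

target = "Kontaktieren Sie mich"
for syntax, check in ENGINES.items():
    print(syntax, check(".", target))  # "search True", then "xsd False"

# Even \d differs: Python str patterns match any Unicode decimal digit,
# while ECMAScript's \d matches only [0-9] (emulated here with re.ASCII).
arabic_indic = "\u0661\u0662\u0663"  # Arabic-Indic digits one, two, three
print(re.fullmatch(r"\d+", arabic_indic) is not None)            # True
print(re.fullmatch(r"\d+", arabic_indic, re.ASCII) is not None)  # False
```

    A syntax attribute would at least let a tool detect which behavior the author intended, at the cost of each tool implementing several dialects.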
     
    - I notice the maxLength rule. How would this fit with the proposal that Fredrik put forward about length and size restrictions?
    See:
    https://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/Length%20and%20Size%20Restrictions
    Or with the ITS Storage Size data category that would be in an ITS module.
    Somehow we would have to make sure there is one way to check one thing.
     
    [ryanki] Since maxLength is just another type of validation, we would advocate replacing it with the more general <validations> module.
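    One reason a single mechanism matters is that "length" itself is ambiguous. The Python sketch below, built around a hypothetical `fits` helper with a `unit` switch, shows the same word passing a 6-unit limit counted in characters and failing it counted in UTF-8 bytes, the kind of distinction the ITS Storage Size data category makes.

```python
def fits(text: str, limit: int, unit: str = "chars", encoding: str = "utf-8") -> bool:
    """Check a length limit in one of two hypothetical units."""
    if unit == "chars":
        # Counts Unicode code points (not UTF-16 code units or graphemes).
        return len(text) <= limit
    if unit == "bytes":
        # Storage size in a given encoding, closer to ITS Storage Size.
        return len(text.encode(encoding)) <= limit
    raise ValueError(f"unknown unit: {unit}")


word = "Gr\u00fc\u00dfe"  # "Grüße"
print(len(word), len(word.encode("utf-8")))        # 5 7
print(fits(word, 6), fits(word, 6, unit="bytes"))  # True False
```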

     
    - Maybe the ‘custom rule’ could be defined with a clearer PR. For example, the email pattern case doesn’t tell you whether there is a problem. A more
    generic way to work with custom patterns could be to check whether a pattern in the source matches the same number of occurrences in the target. For the email example, it would mean a red flag if the email is not found in the target.
    One could have more sophisticated options too, like having a pattern for both the source and the target.
    Checker tools like XBench, QADistiller, etc. have put a lot of thought into this. It would be nice to have equivalence.
     
    [ryanki] Along with “well-known” rules like noLoc, maxLength, minLength, etc., there should just be a generic one called matchStatus (or something similar), where only the success
    or failure of the match can be acted upon. Your example of source-target comparison should probably be specified as one of the well-known rules. That would leave “true” custom rules to be defined with an x- prefix, which tools that don’t know
    anything beyond the “well-known” set could safely ignore.
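    A rough Python sketch of the ideas above: Yves's source/target occurrence comparison, a generic pass/fail rule, and skipping of x- prefixed custom rules. The rule name `sameCount`, the `x-acmeCheck` example and the `run_rule` helper are hypothetical; `matchStatus` is the placeholder name from this reply.

```python
import re
from typing import Optional

EMAIL = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}"


def count(pattern: str, text: str) -> int:
    """Number of non-overlapping matches of pattern in text."""
    return sum(1 for _ in re.finditer(pattern, text))


def run_rule(rule: str, pattern: str, source: str, target: str) -> Optional[bool]:
    """Evaluate one rule; None means 'unknown custom rule, ignored'."""
    if rule == "sameCount":
        # The pattern must occur as often in the target as in the source.
        return count(pattern, source) == count(pattern, target)
    if rule == "matchStatus":
        # Generic rule: only the success or failure of the match matters.
        return re.search(pattern, target) is not None
    if rule.startswith("x-"):
        # Tools that know only the well-known set can safely skip x- rules.
        return None
    raise ValueError(f"unknown well-known rule: {rule}")


source = "Contact me at someCompany: user@somecompany.com"
good = "Kontaktieren Sie mich unter someFirma: user@somecompany.com"
bad = "Kontaktieren Sie mich unter someFirma."
print(run_rule("sameCount", EMAIL, source, good))   # True
print(run_rule("sameCount", EMAIL, source, bad))    # False
print(run_rule("x-acmeCheck", EMAIL, source, bad))  # None
```

    Unlike the plain validEmail match, the comparison raises a red flag precisely when the address is dropped from the target.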
     
    - It seems noLoc would be very similar to <mrk id='1' translate='no'>...</mrk>. A rationale justifying both methods would be nice to offer to implementers.
     
    [ryanki] If you have the following source “Hello Microsoft” the tendency would be to use <mrk> to annotate it, or similarly, if I have “Hello %s”, the tendency might be to use <ph>
    to encode it. However, both cases introduce markup into my source that I may have to normalize during recycling to get a 100% match. So having a noLoc rule is a way to provide a “cleaner, no post-processing needed” source for recycling.  
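    The recycling concern can be sketched in Python as follows: with inline markup, the source has to be normalized before it yields the same lookup key as the plain string, whereas a noLoc rule leaves the source untouched and checks the target separately. Namespaces are omitted, and `plain_text` and `no_loc_ok` are hypothetical helpers, not part of either proposal.

```python
import re
import xml.etree.ElementTree as ET


def plain_text(fragment: str) -> str:
    """Drop inline tags but keep their text, as a TM lookup key might."""
    return "".join(ET.fromstring(fragment).itertext())


def no_loc_ok(term: str, target: str) -> bool:
    """noLoc as a match test: the term must appear unchanged in the target."""
    return re.search(re.escape(term), target) is not None


# Option 1: the protected term is marked inline (namespaces omitted).
marked = '<source>Hello <mrk id="m1" translate="no">Microsoft</mrk></source>'
# Option 2: the source stays plain and a noLoc rule carries the protection.
plain = "<source>Hello Microsoft</source>"

print(plain_text(marked) == plain_text(plain))   # True, but only after normalizing
print(no_loc_ok("Microsoft", "Hallo Microsoft"))  # True
```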
     
     
    That’s all I have for now.
    -yves
     
     



  • 4.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-20-2012 02:06
    > [ryanki] Good question. Maybe an attribute should be added
    > to allow a user to define the regex language to use?

    That's one step toward better interoperability and, at the same time, possibly toward less: tools can identify the regex syntax, but now they have to implement more than one syntax. I don't have an answer, just thinking aloud.

    > [ryanki] Since maxLength is just another type of validation,
    > we would advocate replacing it with the more general
    > <validations> module.

    That's a good argument. Based on the module proposal Fredrik just posted, it seems a complex one. Maybe maxLength is just one of the many profiles? Just thinking aloud here too.

    > [ryanki] If you have the following source “Hello Microsoft”
    > the tendency would be to use <mrk> to annotate it, or similarly, if I
    > have “Hello %s”, the tendency might be to use <ph> to encode it.
    > However, both cases introduce markup into my source that I may
    > have to normalize during recycling to get a 100% match. So having
    > a noLoc rule is a way to provide a “cleaner, no post-processing needed”
    > source for recycling.

    But now you also have "don't translate" information decoupled from the segment, which tools have to carry along with it. In many use cases having the inline markup is simpler and easier to work with (e.g. when sending the text to MT). Just thinking aloud here too.

    cheers,
    -yves


  • 5.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-28-2012 15:14
    Hi Ryan, Yves,

    I have been thinking about this proposal for some time now and have a few suggestions and comments. First, I'd like to point out that I'm really in favor of a validation feature in the standard. But to make it part of the standard, we need to ensure it provides an interoperable baseline and works well with the other features in the standard. I have added to the discussion inline below and put some additional thoughts at the end.


  • 6.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-29-2012 19:57
    Fredrik, thanks for the constructive feedback.

    It seems there has been some misconception that the original examples were comprehensive, or were the ones that should necessarily be defined, but they weren't. They only served as examples of the direction we might take. The original intent was to provide a validations module that would:

    1) Define a standard set of rules and descriptions that tool implementers could build business logic around, to apply to the string or substring indicated by the match expression (defining those standard rules and what elements they affect would have been a wider committee activity), and
    2) In the absence of a standard rule and description, allow a "custom rule", which would essentially be nothing more than the pass/fail of the match expression, with the action taken left to the user agent.

    In the meantime, we have looked at what the current inline markup in the spec offers, and it does cover much of our current validation needs (though we still have the issue of normalizing strings for recycling, which we will need to deal with). Using XPath also sounds promising and easier for authors to write and maintain, plus it has the distinction of being a W3C standard (but we may need to think about 1.0 vs. 2.0, since 1.0 is in wider use).

    So, in the interest of time and simplicity, here is how we propose to move forward in this version of the standard: allow users to store XPath expressions, an identifier (e.g. a rule name), and possibly a description in the module. Tool implementers could evaluate the success or failure of the match and take some action for that particular identifier, according to their tool's business logic, based on the description. The module can live at the <file>, <group>, or <unit> level, and because rules may indeed become complex, only standard core processing rules would apply.

    <val:validations>
      <validation rule="mustExist" desc="Match string should exist in both source and target.">{XPath Expression}</validation>
    </val:validations>

    It will be up to further discussion and debate whether we want to define a "standard" set of rules and descriptions in the module, which don't overlap with current inline markup, or simply allow users to define their own set as a first implementation of the module.

    Can we move forward with a vote on this one to approve or not approve?

    Thanks,
    Ryan
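    As an illustration only, here is a Python sketch of how a tool might evaluate a mustExist rule. The mail leaves the XPath expression unspecified; the sketch assumes one that selects translate='no' spans in <source>, and uses ElementTree's limited XPath subset with namespaces omitted (a real implementation would need a full XPath 1.0 or 2.0 engine). `must_exist` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Simplified unit (no XLIFF namespaces), reusing the segment from the proposal.
UNIT = """
<unit id="u1">
  <segment>
    <source>Contact me at <mrk id="m1" translate="no">someCompany</mrk>: user@somecompany.com</source>
    <target>Kontaktieren Sie mich unter <mrk id="m1" translate="no">someFirma</mrk>: user@somecompany.com</target>
  </segment>
</unit>
"""


def must_exist(unit_xml: str, path: str) -> list:
    """Return strings selected in <source> by path that are missing from <target>."""
    unit = ET.fromstring(unit_xml.strip())
    failures = []
    for seg in unit.iter("segment"):
        source, target = seg.find("source"), seg.find("target")
        target_text = "".join(target.itertext())
        # ElementTree supports only an XPath subset, e.g. [@attr='value'].
        for node in source.findall(path):
            text = "".join(node.itertext())
            if text not in target_text:
                failures.append(text)
    return failures


print(must_exist(UNIT, ".//mrk[@translate='no']"))  # ['someCompany']
```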


  • 7.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-29-2012 20:59
    Hi Ryan, Fredrik, all,

    While using an XPath expression may have the nice effect of eliminating the problem of the different regex engines/syntaxes, it has:

    a) the (minor) drawback of still having to deal with potentially several versions of XPath, as Ryan noted, and
    b) the drawback of being very difficult to implement when not working directly on the XLIFF documents.

    We often seem to forget that XLIFF is an *exchange* format, and most tools on both ends will read the document into a structure/database that has nothing to do with XML. Sure, one can convert the entry back into an XML fragment and apply the validation, but that is a lot of work.

    I can't think of a good solution for this, but I'm also not sure XPath is a better solution than regex: it solves some issues but brings new ones.

    cheers,
    -ys


  • 8.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-30-2012 01:45
    Hi Yves, Thanks for your input; you make a valid point with (b) below. Indeed, we recognize that XPath could prove troublesome when dealing with validation outside of the XLIFF document (which is a typical scenario). Given this, from our perspective, RegEx would be a more suitable choice for this proposal, as originally stated. Of course, it has some limitations, but we believe it's important to have a degree of certainty with the validation module, to have consistent implementation and support for validation in a wide variety of tools. If we leave the choice of rule engine open, we risk having no consistent support for rules. That said, it's not an easy choice to decide which RegEx engine(s) to officially support. We need some time to research the appropriate solution - perhaps others have opinions on worthy selections? Thanks, Kevin.


  • 9.  Re: [xliff] 2.0 Validations Module Proposal

    Posted 11-30-2012 10:51
    Well, this was extensively discussed on ITS, but the use case was just a subset of the validations we would cover. ITS 2.0 is able to specify allowed characters using a regexp. @Yves, would you care to summarize on what regexp set the group consolidated? I guess that the allowed characters are an important use case and it would be good to be consistent here.

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie
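    For the narrower Allowed Characters case, a single character class applied to each character is enough. A minimal Python sketch, assuming a simple bracketed class (for a class this simple, the same spelling is valid in XML Schema and in Python's `re`); `disallowed_chars` is a hypothetical helper.

```python
import re


def disallowed_chars(char_class: str, text: str) -> list:
    """Return the characters of text not matched by a single-character class."""
    one_char = re.compile(char_class)
    return [ch for ch in text if not one_char.fullmatch(ch)]


ALLOWED = r"[a-zA-Z0-9 :@.]"
print(disallowed_chars(ALLOWED, "Kontaktieren Sie: user@somecompany.com"))  # []
print(disallowed_chars(ALLOWED, "Gr\u00fc\u00dfe: user@somecompany.com"))  # ['ü', 'ß']
```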


  • 10.  Re: [xliff] 2.0 Validations Module Proposal

    Posted 11-30-2012 11:36
    Kevin, Yves, Ryan, all,

    Having discussed this issue with LRC developers, we would be inclined to support an XML-native method instead, such as XPath or XQuery. Yves' argument that XLIFF is only a transfer format, and that it would be useful for XLIFF processor owners to be able to run their checks on their proprietary, often DB-based, representations, is valid, and it would indeed be ideal to facilitate this type of use. However, the regex field seems too fragmented among platforms for abandoning XML-native methods to bring a real benefit.

    Despite having XLIFF represented in a proprietary DB structure, the XLIFF processor must be able to recreate the XLIFF no later than at hand-back. If it is too cumbersome for them to generate the XLIFFs or XLIFF fragments for validation purposes on the fly, nothing prevents them from translating the XPath/XQuery validation rule into a regex friendly to their particular platform.

    There are also tools that use XLIFF as their native processing format, and for them obviously the XPath or XQuery methods are the simplest. XQuery would be more SQL-friendly, so it would better help an SQL-based XLIFF processor transform the validation rules into a format that it can use natively.

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    On Fri, Nov 30, 2012 at 10:50 AM, Dr. David Filip <David.Filip@ul.ie> wrote:
    > Well, this was extensively discussed on ITS, but the use case was just a subset of the validations we would cover. ITS 2.0 is able to specify allowed characters using a regexp. @Yves, would you care to summarize on what regexp set the group consolidated? I guess that the allowed characters are an important use case and it would be good to be consistent here.
    > Cheers
    > dF
    >
    > On Fri, Nov 30, 2012 at 1:34 AM, Kevin O'Donnell <kevinod@microsoft.com> wrote:
    >> Hi Yves,
    >> Thanks for your input; you make a valid point with (b) below. Indeed, we recognize that XPath could prove troublesome when dealing with validation outside of the XLIFF document (which is a typical scenario). Given this, from our perspective, RegEx would be a more suitable choice for this proposal, as originally stated. Of course, it has some limitations, but we believe it's important to have a degree of certainty with the validation module, to have consistent implementation and support for validation in a wide variety of tools. If we leave the choice of rule engine open, we risk having no consistent support for rules. That said, it's not an easy choice to decide which RegEx engine(s) to officially support. We need some time to research the appropriate solution - perhaps others have opinions on worthy selections?
    >> Thanks,
    >> Kevin.
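To make the fragmentation concern and the "common subset" idea concrete, here is a minimal Python sketch. It assumes Python's `re` module stands in for one platform's regex engine, and the allowed-characters pattern is a hypothetical example. It shows two portability traps: XML Schema regular expressions are implicitly anchored to the whole string while Perl-style `search` is not, and a Unicode category escape such as `\p{L}`, which XML Schema accepts, is rejected outright by Python's `re`.

```python
import re

# Hypothetical allowed-characters rule built only from a plain character
# class: the kind of small subset most regex engines interpret the same way.
ALLOWED_CHARS = r"[A-Za-z0-9 .,]*"


def allowed_chars_ok(text: str, pattern: str = ALLOWED_CHARS) -> bool:
    """Emulate XML Schema semantics, where the pattern must match the whole string."""
    return re.fullmatch(pattern, text) is not None


def unanchored_match(text: str, pattern: str = ALLOWED_CHARS) -> bool:
    """Perl-style search semantics: any matching substring counts."""
    return re.search(pattern, text) is not None


def engine_accepts(pattern: str) -> bool:
    """Report whether Python's re module can compile the pattern at all."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


if __name__ == "__main__":
    print(allowed_chars_ok("Save file"))    # True
    print(allowed_chars_ok("Save file!"))   # False: '!' is not in the class
    print(unanchored_match("Save file!"))   # True: search accepts a matching substring
    print(engine_accepts(r"\p{L}+"))        # False: no \p{...} support in Python's re
```

The same rule text gives different verdicts depending on whether the engine anchors the match, which is one reason a module would have to name the regex flavor and its matching semantics explicitly.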


  • 11.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 11-30-2012 12:58
    Hi David, Kevin, all,

    > Well, this was extensively discussed on ITS, but the use
    > case was just a subset of the validations we would cover.
    > ITS 2.0 is able to specify allowed characters using a regexp.
    > @Yves, would you care to summarize on what regexp set the group consolidated?

    Currently the syntax for the ITS Allowed Characters feature is a regular expression as defined in the Character Class of XML Schema. But, as David noted, the contexts of this ITS feature and of the validation feature we are talking about here are different: Allowed Characters needs only a limited regular expression to work, while the validation case needs a full-fledged regular expression mechanism. In Allowed Characters a subset of the current syntax could be used that would a) be enough to work in all use cases, and b) be common to most regular expression engines. Why ITS is not using that solution is not understandable to me. But that is irrelevant in our case.

    In the validation feature case, using XPath or XQuery has the advantage of being a unique common syntax, but it has, in my opinion, the same disadvantages as any other regular expression engine: it may not be available in all programming languages, and it may be difficult to implement when the tools are working outside of XLIFF, which is *just an exchange format*.

    > ...Despite having XLIFF represented in a proprietary DB structure, the XLIFF processor
    > must be able to recreate the XLIFF no later than at hand-back.

    That is incorrect: most modern software tools work based on components. In the case of a validation component, that component is likely not to know anything about XLIFF, because it is the job of another component to import the XLIFF document into the system's own document model.

    > ...If it is too cumbersome for them to generate the XLIFFs or XLIFF fragments
    > for validation purposes on the fly, nothing prevents them from translating the
    > XPath/XQuery validation rule into a regex friendly to their particular platform.
    Indeed, one can do anything in programming. But there is a big difference between 'nothing prevents you from doing...' and doing it. To illustrate this: our tool set will not have a conformant implementation of the ITS Allowed Characters feature if it ends up using today's choice of syntax. No matter how much a standard must look at the users' interest, ultimately the implementers decide how far they are willing to go to accommodate interoperability.

    > ...There are also tools that use XLIFF as their native processing format,
    > and for them obviously the XPath or XQuery methods are the simplest.

    I recall explicitly stating long ago, for the record, that the TC members must never forget that XLIFF is *only* a *tool-neutral exchange* format. It is not intended to be used as the native representation of any tool. Some tools do use it natively. Great: more power to them. But that is irrelevant for the TC. Arguing that a feature is better done one way because it's easier for the tools that use XLIFF natively is, in my opinion, a big misstep: it demonstrates that one's judgment for anything XLIFF is not completely in line with its main goal, being an exchange format. It's very hard not to make that misstep. But that's OK as long as someone raises the red flag when it happens.

    This said, using XPath or XQuery because a tool works with XML (not specifically XLIFF) is a valid argument.

    The main issue is that with the validation feature we are drifting away from XLIFF being a mere representation of the source/target content toward storing metadata that cannot really be tool-agnostic. It runs against the "tool neutral" aspect of XLIFF. That is where the borderline between tool-specific extensions and modules becomes blurred. I'm not sure there are good solutions for this besides identifying the type of data used, in this case the regular expression syntax.
    Regards,
    -yves
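Yves' component argument can be sketched as follows: an importer is the only code that understands XLIFF markup, and the validation component only ever sees plain source/target strings in the tool's own model. This is a minimal Python illustration, not any tool's actual design; `Segment`, `import_xliff` and `validate` are hypothetical names, the XLIFF 2.0 core namespace URI is assumed, and "the whole target matches the pattern" is assumed to mean the check passes.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

# Assumed XLIFF 2.0 core namespace for this sketch.
XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:2.0"}


@dataclass
class Segment:
    """Hypothetical tool-internal model: nothing XLIFF-specific survives here."""
    seg_id: str
    source: str
    target: str


def import_xliff(xml_text: str) -> List[Segment]:
    """Importer component: the only place that knows XLIFF markup."""
    root = ET.fromstring(xml_text)
    segments = []
    for seg in root.iterfind(".//x:segment", XLIFF_NS):
        src = seg.find("x:source", XLIFF_NS)
        trg = seg.find("x:target", XLIFF_NS)
        segments.append(Segment(
            seg_id=seg.get("id", ""),
            source="".join(src.itertext()) if src is not None else "",
            target="".join(trg.itertext()) if trg is not None else "",
        ))
    return segments


def validate(segments: List[Segment], pattern: str) -> List[str]:
    """Validation component: sees only plain strings, never XLIFF.

    Returns the ids of segments whose target does not fully match the pattern.
    """
    rx = re.compile(pattern, re.DOTALL)
    return [s.seg_id for s in segments if rx.fullmatch(s.target) is None]


SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en" trgLang="de">
  <file id="f1">
    <unit id="u1">
      <segment id="s1"><source>Open</source><target>Öffnen</target></segment>
    </unit>
    <unit id="u2">
      <segment id="s2"><source>Close the file</source><target>Die Datei schließen und speichern</target></segment>
    </unit>
  </file>
</xliff>"""

if __name__ == "__main__":
    # Hypothetical maxLength-style rule: at most 20 characters.
    print(validate(import_xliff(SAMPLE), r".{0,20}"))  # ['s2']
```

Here the validator never sees XLIFF, so an XPath rule would have to be translated into something that works on its strings, while a regex could be applied directly, provided the regex flavor is shared.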


  • 12.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-01-2012 00:45
    Lots of good discussion, thank you. We would like to move it to an electronic ballot for approval (since I've only been to 2 of the past 5 calls, Uwe will make the official request), and we suggest the following options:

    Legend: 1 = one, + = one or more, ? = zero or one

    OPTION 1:
    <val:validations>                1
      <validation rule="" desc="">   +
        <matchExpression>            1
          <mda:metadata>             ?
    rule - required, unspecified text
    desc - optional, unspecified text
    matchExpression content - an XPath 1.0 expression
    The module can live at the <file>, <group>, or <unit> level, and standard core processing rules would apply.

    OPTION 2:
    <val:validations>                1
      <validation rule="" desc="">   +
        <matchExpression>            1
          <mda:metadata>             ?
    rule - required, unspecified text
    desc - optional, unspecified text
    matchExpression content - an XPath 2.0 expression
    The module can live at the <file>, <group>, or <unit> level, and standard core processing rules would apply.

    OPTION 3:
    <val:validations>                1
      <validation rule="" desc="">   +
        <matchExpression>            1
          <mda:metadata>             ?
    rule - required, unspecified text
    desc - optional, unspecified text
    matchExpression content - a regular expression
    The module can live at the <file>, <group>, or <unit> level, and standard core processing rules would apply.

    Our preference is Option 1.

    Thanks,
    ryan
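As a rough illustration of the Option 3 shape, the following Python sketch parses a `<val:validations>` block with `xml.etree.ElementTree` and applies each regex to a target string. Several details are assumptions, not settled by the thread: the namespace URI is the one from the original draft, the child elements are assumed to share that namespace, `<mda:metadata>` is omitted, the rule names and patterns are hypothetical, and a validation is taken to pass when its expression matches the whole target.

```python
import re
import xml.etree.ElementTree as ET
from typing import List, NamedTuple, Optional

# Namespace from the original draft; element names from the ballot options.
VAL_NS = {"val": "urn:oasis:names:tc:xliff:validations:2.0"}

SAMPLE = r"""
<val:validations xmlns:val="urn:oasis:names:tc:xliff:validations:2.0">
  <val:validation rule="maxLength" desc="Target must not exceed 20 characters">
    <val:matchExpression>.{0,20}</val:matchExpression>
  </val:validation>
  <val:validation rule="noTrailingSpace">
    <val:matchExpression>(.*\S)?</val:matchExpression>
  </val:validation>
</val:validations>
"""


class Validation(NamedTuple):
    rule: str            # required, free text in all three options
    desc: Optional[str]  # optional, free text
    expression: str      # matchExpression content (a regex under Option 3)


def read_validations(xml_text: str) -> List[Validation]:
    """Collect rule, desc and matchExpression from each <validation>."""
    root = ET.fromstring(xml_text.strip())
    result = []
    for v in root.findall("val:validation", VAL_NS):
        expr = v.find("val:matchExpression", VAL_NS)
        text = expr.text if expr is not None and expr.text else ""
        result.append(Validation(v.get("rule", ""), v.get("desc"), text))
    return result


def failed_rules(validations: List[Validation], target: str) -> List[str]:
    # Assumption: a rule passes when its expression matches the whole target.
    return [v.rule for v in validations
            if re.fullmatch(v.expression, target, re.DOTALL) is None]


if __name__ == "__main__":
    rules = read_validations(SAMPLE)
    print(failed_rules(rules, "Ouvrir"))                   # []
    print(failed_rules(rules, "Enregistrer le fichier "))  # ['maxLength', 'noTrailingSpace']
```

Swapping Option 3 for Option 1 or 2 would only change how `expression` is evaluated, but, as the replies below point out, that choice also decides what the expression is evaluated against and what counts as success.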


  • 13.  Re: [xliff] 2.0 Validations Module Proposal

    Posted 12-01-2012 01:13
    Ryan, why not offer an XQuery option? This seems the most tool-agnostic, because of its SQL friendliness. I would discourage having an XPath 1.0 option; we should encourage adoption of the newer and more powerful standard.

    Cheers
    dF


  • 14.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-01-2012 07:57
    David, as I mentioned to Yves on the other thread:

    [yves] At this point we might as well go full throttle and make that content a JavaScript script. But as you can guess, the less restrictive the content of matchExpression is, the less usable it is outside of that specific XLIFF document.

    [ryanki] This is the same reason why we aren't proposing XQuery. However, if the TC members are OK with more than just the pattern-match functionality that RegEx and XPath give us, i.e. being able to specify complex query and transformation functions akin to XSLT, then we can certainly add that option to a ballot.

    thanks,
    ryan


  • 15.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-01-2012 04:29
    Hi Ryan, all,

    A few questions:

    > desc - optional unspecified text

    I assume this attribute contains the description of what the given validation does.

    > rule - required unspecified text

    I assume this attribute holds some kind of name for the given rule, is that right?

    > <matchExpression>

    I may have missed some previous description and I apologize for this, but if that element holds an 'expression' (XPath, regex, etc.) I'm not quite sure how to apply it (from an implementer viewpoint). Does it apply to the source content, and if no match is found it's an error, or to the target content? Or both?

    If it's an XPath expression, I assume it applies to the node(s) of the source/target/both, but what return type should be used for the evaluation? (BOOLEAN? DOM_OBJECT_MODEL? NODE? NODESET? NUMBER? STRING?)

    XQuery raises even more implementation questions: how is the data source related to the entry? Is the full XQuery set available (can one query other files, etc.)? At this point we might as well go full throttle and make that content a JavaScript script. But as you can guess, the less restrictive the content of matchExpression is, the less usable it is outside of that specific XLIFF document.

    A few more notes, in general:

    I don't want to sound negative, but maybe this module needs a lot more thought and test implementations before it's truly ready for prime time. For example, what do the tool makers of applications like XBench or QA Distiller, etc. think of this? After all, they support XLIFF and QA functions and are widely used. They would be one of the types of implementers.

    If the TC wants a stable Committee Draft published in January, maybe we should focus on the core and possibly tweak the few modules that have been worked on for several months. As long as we have a good process in place to integrate new modules, there is not really a need to rush. After all, such a validation module (or other modules) could be an extension for a while.
    [BTW: I'm still not enthusiastic about the way it looks like we will integrate new modules. Incrementing the version seems to be a complex process: it will require OASIS members (not just TC members) to vote on it, something that is not easy to get done at all.]

    I have to say that I tend to agree with Rodolfo on the flurry of last-minute additions: it's probably not a very healthy way to move to a Committee Draft.

    cheers,
    -yves


  • 16.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-01-2012 07:43
    Thanks Yves, see inline.


  • 17.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-01-2012 14:40
    hi Ryan, all,

    >>> <matchExpression>
    >>
    >> I may have missed some previous description and I apologize for this,
    >> but if that element holds an 'expression' (XPath, regex, etc.) I'm not
    >> quite sure how to apply it (from an implementer viewpoint). Does it
    >> apply to the source content, and if no match is found it's an error,
    >> or to the target content? Or both?
    >
    > [ryanki] Unless we have a bunch of pre-defined standard rules and descriptions,
    > so that a tool implementer would know the business logic to take depending on
    > what the match returns, the most we can assume is that the match's success
    > or failure can be acted upon in some way. Success or failure of the match can
    > be a good start for a first implementation, at least.

    But how exactly should the expression be used? I think we have to be very specific, otherwise you can end up with applications that use the same metadata very differently, even if they use the same regex engine. For example:

    a) I can apply the regex to the target, and if I don't get a match it constitutes a failure of the validation.
    b) I can combine the text of both source and target and apply the regex, and no match constitutes a failure.

    But depending on a or b, the regex needs to be very different.

    Also: how do inline codes fit into this? Do we pass the raw XML to the regex? (Then how can tools not working natively with XLIFF use this?) Should the regex support extra metacharacters that match codes (e.g. \k means 'any code' and the tool knows how to match \k to its own representation of the code), etc.?

    How are source and target linked (if they are)? For example: many QA tools use pairs of regexes to run validation rules (see attached screenshot), one for the source, the other for the target. They often add some extra keyword or logic to allow linking the two fields.

    I'm not trying to sink your proposal, far from it.
    I just think that to be more than a private extension it probably needs a lot more processing descriptions, and input from implementers. And maybe it will end up being very simple and consider a lot of what QA tools do out of scope.

    >> ...do the tool makers of applications like XBench or QA Distiller, etc.
    >> think of this?
    >
    > [ryanki] Are there members in the TC from these companies, or do folks have
    > contacts in these companies that they can share?

    ENLASO is doing QA tools and that's why I'm providing all that feedback. SDL and Lionbridge also have QA/validation components: Fredrik provided some feedback. For XBench: Josep is on the xliff-comment list (https://lists.oasis-open.org/archives/xliff-comment/). For QA Distiller: Thomas may be on the xliff-comment list too.

    -ys

    Attachment: screenshot.png
    Description: PNG image
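Yves' point that interpretations (a) and (b) need very different regexes can be made concrete. The minimal Python sketch below expresses one intent, "the placeholder in the source must survive in the target", three ways: (a) target only, (b) one regex over source and target joined by a separator, and the paired source/target regexes many QA tools use. The function names, the `\x01` separator and the "same multiset of matches" linking rule for the paired variant are all hypothetical choices made for illustration.

```python
import re

# Hypothetical separator a tool might use when joining source and target (b).
SEP = "\x01"

PLACEHOLDER = r"\{\d+\}"  # e.g. {0}, {1}


def check_target_only(pattern: str, target: str) -> bool:
    """(a) Apply the regex to the target alone; no match means failure."""
    return re.search(pattern, target) is not None


def check_combined(pattern: str, source: str, target: str) -> bool:
    """(b) Apply one regex to source + SEP + target; no match means failure."""
    return re.search(pattern, source + SEP + target) is not None


def check_paired(src_pattern: str, trg_pattern: str, source: str, target: str) -> bool:
    """Paired regexes; assumed linking rule: both fields yield the same matches."""
    return sorted(re.findall(src_pattern, source)) == sorted(re.findall(trg_pattern, target))


# Under (b) a backreference ties the placeholder found in the source
# (before the separator) to the same text in the target (after it).
COMBINED = "(" + PLACEHOLDER + ").*" + SEP + r".*\1"

if __name__ == "__main__":
    src, bad, good = "File {0} found", "Datei {1} gefunden", "Datei {0} gefunden"
    print(check_target_only(PLACEHOLDER, bad))                # True: wrong placeholder slips through
    print(check_combined(COMBINED, src, bad))                 # False
    print(check_combined(COMBINED, src, good))                # True
    print(check_paired(PLACEHOLDER, PLACEHOLDER, src, bad))   # False
    print(check_paired(PLACEHOLDER, PLACEHOLDER, src, good))  # True
```

The target-only regex cannot express the intent at all, the combined regex depends on a separator convention both sides must share, and the paired form needs linking logic outside the regexes, which is the kind of processing description the thread says is still missing.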


  • 18.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-03-2012 19:57
    There has been a good, detailed discussion around this validations module proposal. We are grateful for the input - certainly, lots of worthwhile points to consider.

    Considering the revised proposal outlined by Ryan last week, can we proceed to an approve/reject ballot in the near future? That was the recommendation in the previous TC call. The work to decide the specific implementation and details can still continue, but we'd like to achieve certainty around the feature's inclusion in XLIFF 2.0. Regardless of the mechanics of the validation feature, I expect it will need careful consideration and compromise among implementers - something we can realistically only expect after we publish XLIFF 2.0. This will allow for improvement/extension of the feature in future revisions.

    Thanks,
    Kevin.


  • 19.  RE: [xliff] 2.0 Validations Module Proposal

    Posted 12-03-2012 23:32
    Hi Kevin,

    Personally I'm not opposed to such a validation module in 2.0. I simply think there are details to work out. I'm not sure a ballot would guarantee its inclusion in the final 2.0, but it may help ensure it's in the Committee Draft and help to get feedback from potential implementers.

    > ...I expect it will need careful consideration and
    > compromise among implementers - something we can
    > realistically only expect after we publish XLIFF 2.0.

    Hopefully we will get feedback before we have a final 2.0 (and not just for this part). That's the whole idea of the Committee Draft: https://www.oasis-open.org/policies-guidelines/tc-process#publicReview

    cheers,
    -yves