OASIS XML Localisation Interchange File Format (XLIFF) TC


RE: [xliff] R37: Revised Validations Module proposal

  • 1.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-20-2013 06:42




    Hi Yves, all,
     
    We suggest that for mustLoc, we enclose the source and replacement target values in parentheses, like so: mustLoc="(World) (Welt)"
    If for any reason a parenthesis needs to be translated, into a brace for example, we could escape it like so: mustLoc="((World)) ({Welt})"
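    The thread does not spell out how the escaping works. One plausible reading of mustLoc="((World)) ({Welt})" is balanced nesting: the outermost pair of parentheses delimits each value, and any inner parentheses are literal text. Below is a minimal Python sketch under that assumption. The names parse_paren_values and check_must_loc are hypothetical. The rule semantics ("if the source contains the first value, the target must contain the second") are also assumed, not taken from the proposal.

```python
def parse_paren_values(attr: str) -> list[str]:
    """Split a value like "(World) (Welt)" into ["World", "Welt"].

    Assumes balanced nesting: the outermost parentheses delimit a value and
    inner parentheses are kept literally, so "((World)) ({Welt})" yields
    ["(World)", "{Welt}"].
    """
    values: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in attr:
        if ch == "(":
            if depth > 0:            # nested "(" is literal text
                current.append(ch)
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unbalanced ')' in {attr!r}")
            depth -= 1
            if depth == 0:           # closing delimiter ends one value
                values.append("".join(current))
                current = []
            else:                    # nested ")" is literal text
                current.append(ch)
        elif depth > 0:
            current.append(ch)       # ordinary character inside a value
        elif not ch.isspace():
            raise ValueError(f"unexpected {ch!r} outside parentheses in {attr!r}")
    if depth != 0:
        raise ValueError(f"unbalanced '(' in {attr!r}")
    return values


def check_must_loc(source: str, target: str, attr: str) -> bool:
    """Hypothetical rule: if the source contains the first value,
    the target must contain the second value."""
    src_term, tgt_term = parse_paren_values(attr)  # ValueError if not exactly two
    return src_term not in source or tgt_term in target
```

    Under this reading, a value cannot contain an unbalanced parenthesis such as a lone ")". A doubling-style escape would handle that case differently, so the TC would need to settle which convention applies.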

     
    Since we are generalizing dblSpace to occurrences, we could do something similar there as well: occurrences="( ) (3)"
    For example, where 3 pipes need to occur in the target for whatever reason.
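    Here is a minimal sketch of how an occurrences rule might be evaluated. It assumes the second value is an exact required count and that occurrences are counted without overlap, as Python's str.count does. The function name check_occurrences is hypothetical, and the simple regular expression does not handle nested parentheses.

```python
import re


def check_occurrences(target: str, attr: str) -> bool:
    """Hypothetical check: the target must contain exactly N non-overlapping
    occurrences of the given string, where attr looks like "(|) (3)"."""
    m = re.fullmatch(r"\((.+)\)\s+\((\d+)\)", attr)
    if m is None:
        raise ValueError(f"cannot parse occurrences value {attr!r}")
    needle, count = m.group(1), int(m.group(2))
    return target.count(needle) == count
```

    Whether "3" should mean "exactly 3" or "at least 3" is a design choice the proposal leaves open. Exact matching is the stricter of the two.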
     
    Further comments or suggestions welcome.
     



  • 2.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-20-2013 11:42
    Did Kevin convey the comments about normalization to you? How do we expect to deal with that in the spec?
    From: Ryan King <ryanki@microsoft.com>
    To: Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
    Date: 03/20/2013 02:41 AM
    Subject: RE: [xliff] R37: Revised Validations Module proposal
    Sent by: <xliff@lists.oasis-open.org>


  • 3.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-20-2013 17:00
    Yes, Helena, thanks for checking with me on that. We did discuss it and feel that the processing agent should be responsible for normalization of text, so we will explicitly state that in the module.

    Thanks,
    Ryan
    Sent from my Windows Phone

    From: Helena S Chapman
    Sent: 3/20/2013 4:43 AM
    To: Ryan King
    Cc: xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal


  • 4.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-20-2013 23:30
    Hi Ryan,

    While yours was my initial position as well, I do not think that it was the outcome of the discussion in the TC. We already mention the normalization approach in the size restriction module, so it would make sense to include it here too, and I think this was the conclusion on the Tuesday call. You could leave the default as “none” and declare that this leaves it to the processing agent how to deal with the situation, but if either “nfd” or “nfc” is set, it should lead to more specific behavior. Could this be an acceptable solution to all parties?

    Cheers,
    Joachim

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
    Sent: Wednesday, March 20, 2013 17:58
    To: Helena S Chapman
    Cc: xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal


  • 5.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-21-2013 18:45
    You summarized my own recollection of the discussion correctly.




    From: "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>
    To: Ryan King <ryanki@microsoft.com>, Helena S Chapman/San Jose/IBM@IBMUS
    Cc: "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
    Date: 03/20/2013 07:29 PM
    Subject: RE: [xliff] R37: Revised Validations Module proposal





     



  • 6.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-21-2013 18:55




    Borrowing from the precedent set by the size restriction module would then be the right thing to do. We could have an additional attribute called normalization on the <rule> element:
     
    normalization
    This attribute specifies the normalization to apply when validating a rule. Only normalization forms C and D, as specified by the Unicode Consortium, are supported; see Unicode Standard Annex #15 [http://unicode.org/reports/tr15/].
    Value description:
    normalization to apply
    none      No normalization should be done
    nfc         Normalization Form C should be used
    nfd         Normalization Form D should be used
    Default value: "none"
    Used in: <rule>.
     
    Either setting the attribute to none or omitting it would mean that the processing agent needs to decide how to handle normalization. Does that work for everyone?
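    To make the proposed attribute concrete, here is a minimal Python sketch of how a validator might apply it before checking a rule. It takes the literal reading that "none" (and an absent attribute) means comparing code points as-is. The message above instead leaves that case to the processing agent's discretion. The functions apply_normalization and target_contains are hypothetical names. unicodedata.normalize is the Python standard-library call.

```python
import unicodedata

# Assumed mapping from the proposed attribute values to Python's form names.
_FORMS = {"nfc": "NFC", "nfd": "NFD"}


def apply_normalization(text: str, normalization: str = "none") -> str:
    """Normalize text per a <rule normalization="..."> value.

    An absent attribute is modeled as the proposed default "none",
    which here returns the text unchanged."""
    if normalization == "none":
        return text
    if normalization not in _FORMS:
        raise ValueError(f"unsupported normalization value {normalization!r}")
    return unicodedata.normalize(_FORMS[normalization], text)


def target_contains(target: str, expected: str, normalization: str = "none") -> bool:
    """Substring check with both sides normalized the same way."""
    return apply_normalization(expected, normalization) in apply_normalization(
        target, normalization
    )
```

    With "none", a precomposed "é" in the target does not match a decomposed "e" + U+0301 in the rule. With "nfc" or "nfd", it does. That difference is exactly the false-positive risk debated later in the thread.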
     
    Thanks,
    Ryan
     
    From: Helena S Chapman [mailto:hchapman@us.ibm.com]

    Sent: Thursday, March 21, 2013 11:45 AM
    To: Schurig, Joachim
    Cc: Ryan King; xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal
     

     



  • 7.  Re: [xliff] R37: Revised Validations Module proposal

    Posted 03-21-2013 19:08
    Guys, after a brief spot check I just forwarded Asanka's minutes from Tuesday to the list. While there was no formal conclusion, the discussion tended toward including normalization types along the lines of the size restriction module. I think there was no doubt that we should provide a vehicle for conveying the required normalization type. There did not seem to be a clear consensus on the default value, though. I personally think that the default should NOT be "none". This option as a default seems vague and obscure to me. It leaves the processor to guess, based on tribal knowledge, what to do so as not to produce tons of false positives. I thought we wanted to be friendly to naive implementers :-)

    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie


  • 8.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-21-2013 20:29
    Hi David,

    This is a fair point. I’d be interested in hearing from tool providers about their preference here also.

    I see two scenarios that are relevant here:

    1. Global (language-neutral) rules: these rules are the most common here at Microsoft and do not differentiate per language (e.g., formatting rules). Therefore, rules of this nature would neither require nor benefit from normalization.
    2. Language-specific rules: these rules may benefit from normalization by default, which would indeed help reduce false positives.

    In my thinking, keeping “none” as the default keeps the module simple and avoids unnecessary overhead for the processing agent when normalization is not required. If the XLIFF creator is implementing language-specific rules, then the onus is on them to specify their preferred normalization approach.

    If we can surmise the likely prevalence of scenario 1 vs. scenario 2, that may also indicate the best default setting here.

    Other feedback/thoughts appreciated.

    Thanks,
    Kevin.

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Dr. David Filip
    Sent: Thursday, March 21, 2013 12:08 PM
    To: Helena S Chapman
    Cc: Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org; Yves Savourel
    Subject: Re: [xliff] R37: Revised Validations Module proposal


  • 9.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-27-2013 18:05
    Unfortunately, I don't entirely agree.


    I understand what you are suggesting with "language neutral rules". However, Unicode normalization should often be applied even when there are no language rules associated with the content. For instance, when someone sends an English document that contains two distinctly different characters, "?" vs "?", without normalization and the correct intention, how would the tools/processes know what to do with these two characters? When should they treat them as two different characters, and when as the same? "none" for normalization would tell them to treat these as different, and if it's "NFD" or "NFC", these two would be treated as the same (though not identical).
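    The specific characters in this example are not recoverable here. As an assumed stand-in, the classic canonically equivalent pair is precomposed é (U+00E9) versus e followed by a combining acute accent (U+0301). This short Python sketch shows what "the same (though not identical)" means in practice: the raw strings differ, but NFC and NFD each map both spellings to one shared representation. The helper code_points is hypothetical.

```python
import unicodedata

# Assumed stand-in pair, not necessarily the characters in the original mail.
precomposed = "\u00e9"   # é as a single code point, U+00E9
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT, U+0065 U+0301


def code_points(s: str) -> list[str]:
    """Render a string as a list of U+XXXX code point labels."""
    return [f"U+{ord(c):04X}" for c in s]


# "none": a raw comparison sees two different sequences.
print("none:", precomposed == decomposed)

# NFC and NFD each make the two spellings equal, but pick different forms.
for form in ("NFC", "NFD"):
    a = unicodedata.normalize(form, precomposed)
    b = unicodedata.normalize(form, decomposed)
    print(f"{form}:", a == b, code_points(a))
```

    Running it prints False for the raw comparison and True for both NFC and NFD. NFC keeps the single code point U+00E9, while NFD yields U+0065 U+0301. This is also why the choice of form matters for any length-based check.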

    This is going to be increasingly common because:

    1. We are more likely to receive truly multilingual static content these days, and we don't have to look far for an example: in Canada, most content has to be available in both English and French at the same time.
    2. When we deal with multimedia content, the use of more than one language within the same context is even more frequent. In my own household, a combination of Mandarin, Taiwanese, Japanese, and Hebrew is often mixed in with English.

    I am actually curious whether spoken-language content interchange is out of scope for XLIFF in general. What happens when we embed this into an interactive format? Do we give our community the guideline that anyone working with translation requests that are not limited to written languages should not use XLIFF for interchange?




    From: "Kevin O'Donnell" <kevinod@microsoft.com>
    To: "Dr. David Filip" <David.Filip@ul.ie>, Helena S Chapman/San Jose/IBM@IBMUS
    Cc: "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>, Ryan King <ryanki@microsoft.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
    Date: 03/21/2013 04:30 PM
    Subject: RE: [xliff] R37: Revised Validations Module proposal
    Sent by: <xliff@lists.oasis-open.org>









  • 10.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-28-2013 07:03




    Thanks for your input, Helena. I don’t have a strong position on establishing a default of “none”, but I am not sure exactly how frequently normalization will be required. For the language-neutral rules that I am familiar with, normalization would likely not be required, although I recognize there are many different scenarios here.

    In the examples you provided, the XLIFF creator would need to specify their preferred normalization approach when writing the rule. Do you have a suggestion or preference for what the default should be, instead of “none”?

    The question of using XLIFF to interchange spoken language is interesting; I’m not sure whether it has been discussed previously. I’ll let others join in with their thoughts.
     
    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Helena S Chapman
    Sent: Wednesday, March 27, 2013 11:03 AM
    To: Kevin O'Donnell
    Cc: Dr. David Filip; Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal
     
    Unfortunately, I don't entirely agree.


    I understand what you are suggesting with "language neutral rules". However, Unicode normalization should often be applied even when there is no language rules that should be associated with it.
    For instance, when someone send an English document which contains two distinctly different characters
    "?" vs "?", without normalization and the correct intention, how would the tools/processes know what to do
    with these two characters? When to treat them as two different characters and when to treat them as the same? "none" for normalization would tell them to treat these as different and if it's "NFD" or "NFC", these two would be the same (not identical).


    This is going to be increasingly common because:


    1. We are more likely to receive true multilingual static content these days, and we don't have to look far for an example: in Canada, most content has to be available in both English and French at the same time.
    2. When we deal with multimedia-type content, the use of more than one language within the same context is even more frequent. In my own household, a combination of Mandarin, Taiwanese, Japanese, and Hebrew is often mixed in with English.

    I am actually curious whether spoken-language content interchange is out of scope for XLIFF in general. What happens when we embed this into an interactive format? Do we give our community the guideline that, if one is working with translation requests that are not limited to written languages, one should not use XLIFF for interchange?





    From:         "Kevin O'Donnell" <kevinod@microsoft.com>
    To:         "Dr. David Filip" <David.Filip@ul.ie>, Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>, Ryan King <ryanki@microsoft.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
    Date:         03/21/2013 04:30 PM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal
    Sent by:         <xliff@lists.oasis-open.org>

    Hi David,

    This is a fair point. I’d be interested in hearing from tool providers about their preference here also.

    I see two scenarios that are relevant here:

    1. Global (language-neutral) rules: these rules are the most common here at Microsoft and do not differentiate per language (e.g. formatting rules). Therefore, rules of this nature would neither require nor benefit from normalization.
    2. Language-specific rules: these rules may benefit from normalization by default, and normalization would indeed help reduce false positives if present.

    In my thinking, keeping “none” as the default keeps the module simple and avoids unnecessary overhead for the processing agent when normalization is not required. If the XLIFF creator is implementing language-specific rules, then the onus is on them to specify their preferred normalization approach.
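A hedged sketch of the trade-off between the two scenarios above. `MustContainRule` is a hypothetical rule type, not the module's actual markup; its `normalization` field defaults to "none" as proposed here. With that default, a language-specific check for "Café" flags a target that happens to spell the word with a combining accent (a false positive), while the same rule with normalization set to "nfc" passes.

```python
import unicodedata
from dataclasses import dataclass


@dataclass
class MustContainRule:
    """Hypothetical validation rule: the target must contain `term`.

    `normalization` mirrors the per-rule setting discussed in the thread;
    "none" (the proposed default) compares code points as-is.
    """
    term: str
    normalization: str = "none"

    def _prepare(self, text: str) -> str:
        if self.normalization == "none":
            return text
        if self.normalization in ("nfc", "nfd"):
            return unicodedata.normalize(self.normalization.upper(), text)
        raise ValueError(f"unknown normalization: {self.normalization!r}")

    def check(self, target: str) -> bool:
        # Normalize both sides the same way, then do a plain substring test.
        return self._prepare(self.term) in self._prepare(target)


# Target text typed with a decomposed "é" (e + U+0301).
target = "Willkommen im Cafe\u0301"

default_rule = MustContainRule(term="Caf\u00e9")                       # "none"
language_rule = MustContainRule(term="Caf\u00e9", normalization="nfc")

print(default_rule.check(target))   # False: a false positive under "none"
print(language_rule.check(target))  # True
```

Even with normalization, plain substring matching can misfire when a combining mark with no precomposed form follows the matched text (for example, "q" + U+0301 still contains "q" after NFC), so a real processor would likely also need grapheme-aware boundaries.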

     

    If we can surmise the likely prevalence of scenario 1 vs. scenario 2, that may also indicate the likely best default setting here.

     

    Other feedback/thoughts appreciated.

     

    Thanks,

    Kevin.

     

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Dr. David Filip
    Sent: Thursday, March 21, 2013 12:08 PM
    To: Helena S Chapman
    Cc: Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org; Yves Savourel
    Subject: Re: [xliff] R37: Revised Validations Module proposal
     
    Guys, after a brief spot check I just forwarded Asanka's minutes from Tuesday to the list.

    While there was no formal conclusion, the discussion tended toward including normalization types along the lines of the size restriction module.

    I think there was no doubt that we should provide a vehicle for conveying the required normalization type.

    There did not seem to be a clear consensus on the default value, though. I personally think that the default should NOT be "none". This option for default seems vague and obscure to me. It lets the processor guess, based on tribal knowledge, what it should do to avoid producing tons of false positives. I thought that we wanted to be naive-implementer friendly. :-)

     
    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

     
    On Thu, Mar 21, 2013 at 6:44 PM, Helena S Chapman <hchapman@us.ibm.com> wrote:

    You correctly summarized my own recollection of the discussion.




    From:         "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>
    To:         Ryan King <ryanki@microsoft.com>, Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
    Date:         03/20/2013 07:29 PM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal

    Hi Ryan,

    While yours was my initial position as well, I do not think that it was the outcome of the discussion in the TC. We already mention the normalization approach in the size restriction module, so it would make sense to include it here, too, and I think this was the conclusion on the Tuesday call. You could leave the default as “none” and declare that this leaves it to the processing agent how to deal with the situation, but if either of the “nfd” or “nfc” values is set, it should lead to more specific behavior. Could this be an acceptable solution to all parties?

     
    Cheers,
    Joachim
     
    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Ryan King
    Sent: Wednesday, 20 March 2013 17:58
    To: Helena S Chapman
    Cc: xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal
     
    Yes, Helena, thanks for checking with me on that. We did discuss it and feel that the processing agent should be responsible for normalization of text, so we will explicitly state that in the module.

    Thanks,
    Ryan

    Sent from my Windows Phone

     





    From: Helena S Chapman
    Sent: 3/20/2013 4:43 AM
    To: Ryan King
    Cc: xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal

    Did Kevin convey the comments about normalization to you? How do we expect to deal with that in the spec?




    From:         Ryan King <ryanki@microsoft.com>
    To:         Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
    Date:         03/20/2013 02:41 AM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal
    Sent by:         <xliff@lists.oasis-open.org>

    Hi Yves, all,

    We suggest that for mustLoc, we enclose the source and replacement target values in parentheses, like so: mustLoc="(World) (Welt)"

    If, for any reason, a parenthesis itself needs to be translated (as a brace, for example), we could escape it like so: mustLoc="((World)) ({Welt})"

    Since we are generalizing dblSpace to occurrences, we could do something similar there as well: occurrences="( ) (3)"

    For example, this would cover a case where 3 pipes need to occur in the target for whatever reason.

    Further comments or suggestions welcome.
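A minimal Python sketch of how a processor might read the proposed parenthesized values. It assumes groups are delimited by parentheses matched by nesting depth, so the outer pair delimits each group and any inner pair is literal content; that is one way to read the "((World)) ({Welt})" example. The function names and the check semantics (the target must contain the replacement whenever the source contains the term; the token must occur exactly N times) are hypothetical, chosen for illustration.

```python
from __future__ import annotations


def parse_groups(value: str) -> list[str]:
    """Split a value such as "(World) (Welt)" into its parenthesized groups.

    Parentheses are matched by nesting depth, so "((World)) ({Welt})"
    yields ["(World)", "{Welt}"]. Text between groups must be whitespace.
    """
    groups: list[str] = []
    current: list[str] = []
    depth = 0
    for ch in value:
        if ch == "(":
            if depth > 0:
                current.append(ch)              # nested "(" is literal content
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unbalanced ')' in {value!r}")
            depth -= 1
            if depth == 0:
                groups.append("".join(current))  # outer pair closed a group
                current = []
            else:
                current.append(ch)              # nested ")" is literal content
        elif depth > 0:
            current.append(ch)
        elif not ch.isspace():
            raise ValueError(f"text outside parentheses in {value!r}")
    if depth != 0:
        raise ValueError(f"unclosed '(' in {value!r}")
    return groups


def check_must_loc(value: str, source: str, target: str) -> bool:
    """Hypothetical mustLoc check: if the source contains the first group,
    the target must contain the second."""
    src_term, tgt_term = parse_groups(value)
    return src_term not in source or tgt_term in target


def check_occurrences(value: str, target: str) -> bool:
    """Hypothetical occurrences check: the token must appear exactly N times."""
    token, count = parse_groups(value)
    return target.count(token) == int(count)


print(parse_groups("(World) (Welt)"))                              # ['World', 'Welt']
print(parse_groups("((World)) ({Welt})"))                          # ['(World)', '{Welt}']
print(check_must_loc("(World) (Welt)", "Hello World", "Hallo Welt"))  # True
print(check_occurrences("( ) (3)", "a b c d"))                     # True
```

One consequence of this reading is that a single unbalanced parenthesis cannot be expressed as content; a dedicated escape character would be needed for that.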




  • 11.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-28-2013 17:39
    I would suggest NFC as the default.
    Chances are, information is unlikely to be authored with characters in decomposed form.



    From:         "Kevin O'Donnell" <kevinod@microsoft.com>
    To:         Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         "Dr. David Filip" <David.Filip@ul.ie>, "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>, Ryan King <ryanki@microsoft.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
    Date:         03/28/2013 03:03 AM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal
    Sent by:         <xliff@lists.oasis-open.org>

    Thanks for your input Helena.
    I don’t have a strong position on establishing a default of “none”,
    but am not sure exactly how frequently normalization will be required –
    for the language-neutral rules that I am familiar with, normalization would
    likely not be required, although I recognize there are many different scenarios
    here.
     
    In the examples you provided,
    the XLIFF creator would need to specify their preferred normalization approach
    when writing the rule. Do you have a suggestion/preference for what the
    default should be, instead of “none”?
     
    The question of using XLIFF
    to interchange spoken language is interesting; I’m not sure if this has
    been discussed previously. I’ll let others join in with their thoughts.
     

  • 12.  Re: [xliff] R37: Revised Validations Module proposal

    Posted 04-02-2013 12:49
    IMHO validation is a different use case from size restriction. It makes sense to have none as the default there, but not here. See inline.

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    On Wed, Mar 27, 2013 at 6:02 PM, Helena S Chapman <hchapman@us.ibm.com> wrote:

    Unfortunately, I don't entirely agree.

    I understand what you are suggesting with "language neutral rules". However, Unicode normalization should often be applied even when there are no language rules associated with it.

    +1

    For instance, when someone sends an English document which contains two distinctly different characters "?" vs "?", without normalization and a clearly stated intention, how would the tools/processes know what to do with these two characters? When should they treat them as two different characters, and when as the same? "none" for normalization would tell them to treat these as different, and if it's "NFD" or "NFC", these two would be the same (not identical).

    This is going to be increasingly common because:

    +1

    1. We are more likely to receive true multilingual static content these days, and we don't have to look far for an example: in Canada, most content has to be available in both English and French at the same time.
    2. When we deal with multimedia-type content, the use of more than one language within the same context is even more frequent. In my own household, a combination of Mandarin, Taiwanese, Japanese, and Hebrew is often mixed in with English.

    I am actually curious whether spoken-language content interchange is out of scope for XLIFF in general. What happens when we embed this into an interactive format? Do we give our community the guideline that, if one is working with translation requests that are not limited to written languages, one should not use XLIFF for interchange?

    To include voice content we would need to re-charter. I think it is the next frontier and worth discussing. I just would not put it on the front burner right now, with 2.0 preparing for the first public review.


  • 13.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 04-02-2013 15:00




    Hi All,
     
    Just back from vacation and catching up on email. Reading the current spec, I think it could make sense to simply rely on section “2.6.8 Content Comparison” in the core specification for what (if any) normalization to apply for validation comparisons. It stipulates that NFC is used to compare the equality of content.
     
    Best regards,
    Fredrik Estreen
     



    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org]
    On Behalf Of Dr. David Filip
    Sent: den 2 april 2013 14:49
    To: Helena S Chapman
    Cc: Kevin O'Donnell; Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org; Yves Savourel
    Subject: Re: [xliff] R37: Revised Validations Module proposal


     


    IMHO validation is different use case to size restriction. It makes sense to have none as default there but not here..


    See inline..







    Dr. David Filip



    =======================

    LRC CNGL LT-Web CSIS


    University of Limerick, Ireland


    telephone: +353-6120-2781


    cellphone: +353-86-0222-158


    facsimile: +353-6120-2734


    mailto:
    david.filip@ul.ie



     

    On Wed, Mar 27, 2013 at 6:02 PM, Helena S Chapman < hchapman@us.ibm.com > wrote:
    Unfortunately, I don't entirely agree.


    I understand what you are suggesting with "language neutral rules". However, Unicode normalization should often be applied even when there is no language rules that should be associated with it.

    +1 


    For instance, when someone send an English document which contains two distinctly different characters
    "?" vs "?", without normalization and the correct intention, how would the tools/processes know what to do with these two characters? When to
    treat them as two different characters and when to treat them as the same? "none" for normalization would tell them to treat these as different and if it's "NFD" or "NFC", these two would be the same (not identical).


    This is going to be increasingly common because:


    +1 




    1. We are more likely to receive true multilingual static content these days. And, we don't have to look far for an example. In Canada, most content has to be available both in English and French at the same time.

    2. When we deal with multimedia type content, the use of more than one language within the same context is even more frequent. In my own household, a combination of Mandarin, Taiwanese, Japanese, and Hebrew are
    often mixed in with English.

    I am actually curious if the spoken language content interchange is out of scope of XLIFF in general? What happens when we embed this into an interactive format? Do we give our community the guideline that if one
    is working with translation requests that are not limited to written languages, don't use XLIFF for interchange?



     


    To include voice content we would need to re-charter. I think it is the next frontier and worth discussion.. I would just see it not at the front burner right now with 2.0 preparing for the first public review.. 







    From:         "Kevin O'Donnell" < kevinod@microsoft.com >


    To:         "Dr. David Filip" < David.Filip@ul.ie >,
    Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         "Schurig, Joachim" < Joachim.Schurig@lionbridge.com >,
    Ryan King < ryanki@microsoft.com >, " xliff@lists.oasis-open.org " < xliff@lists.oasis-open.org >,
    Yves Savourel < ysavourel@enlaso.com >

    Date:         03/21/2013 04:30 PM


    Subject:         RE: [xliff] R37: Revised Validations Module proposal




    Sent by:         < xliff@lists.oasis-open.org >







    Hi David,
     
    This is a fair point. I’d be interested in hearing from tool providers about their preference here also.

     
    I see two scenarios that are relevant here:

     
    1.       Global (language neutral) rules : these rules are the most common here at Microsoft and do not differentiate per-language (e.g. formatting rules). Therefore, rules of this nature
    would not require/benefit from normalization
    2.       Language-specific rules : these rules, by default, may benefit from normalization and indeed would help reduce false positives if present

     
    In my thinking, keeping “none” as default keeps the module simple and avoids unnecessary overhead by the processing agent when not required. If the XLIFF creator is implementing language-specific
    rules, then they have an onus to specify their preferred normalization approach.

     
    If we can surmise the likely prevalence of scenario 1 vs. scenario 2, that may also indicate the likely best default setting here.

     
    Other feedback/thoughts appreciated.

     
    Thanks,
    Kevin.
     
    From:
    xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Dr. David Filip
    Sent: Thursday, March 21, 2013 12:08 PM
    To: Helena S Chapman
    Cc: Schurig, Joachim; Ryan King;
    xliff@lists.oasis-open.org ; Yves Savourel
    Subject: Re: [xliff] R37: Revised Validations Module proposal
     
    Guys, after a brief spotcheck I just forwarded Asanka's minutes from Tuesday to the list.

    While there was no formal conclusion the discussion tended to inclusion of normalization types along the lines of the size restriction module.

    I think there was no doubt that we should provide a vehicle for conveying the normalization type required.

    There did not seem to be a clear consensus on the default value though. I personally think that the default should NOT be "none". This option for default seems vague and obscure to me..  It lets the processor guess based on tribal knowledge what they should
    do not to produce tons of false positives. I thought that we wanted to be naive implementer friendly.. :-)

     
    Cheers
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158

    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

     
    On Thu, Mar 21, 2013 at 6:44 PM, Helena S Chapman <hchapman@us.ibm.com> wrote:

    You summarized my own recollection of the discussion correctly.

    From:         "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>
    To:         Ryan King <ryanki@microsoft.com>, Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>, Yves Savourel <ysavourel@enlaso.com>
    Date:         03/20/2013 07:29 PM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal

    Hi Ryan,

    while yours was my initial position as well, I do not think that it was the outcome of the discussion in the TC. We already mention the normalization approach in the size restriction module, so it would make sense to include it here, too, and I think this was the conclusion on the Tuesday call. You could leave the default as “none” and declare that this leaves it to the processing agent to decide how to deal with the situation, but if either “nfd” or “nfc” is set, it should lead to more specific behavior.
    Could this be an acceptable solution to all parties?

    Cheers,
    Joachim

    From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Ryan King
    Sent: Wednesday, March 20, 2013 17:58
    To: Helena S Chapman
    Cc: xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal

    Yes, Helena, thanks for checking with me on that. We did discuss it and feel that the processing agent should be responsible for normalization of text, so we will explicitly state that in the module.

    Thanks,
    Ryan

    Sent from my Windows Phone

    From: Helena S Chapman
    Sent: 3/20/2013 4:43 AM
    To: Ryan King
    Cc: xliff@lists.oasis-open.org; Yves Savourel
    Subject: RE: [xliff] R37: Revised Validations Module proposal

    Did Kevin convey the comments about normalization to you? How do we expect to deal with that in the spec?

    From:         Ryan King <ryanki@microsoft.com>
    To:         Yves Savourel <ysavourel@enlaso.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
    Date:         03/20/2013 02:41 AM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal
    Sent by:         <xliff@lists.oasis-open.org>

    Hi Yves, all,

    We suggest that for mustLoc, we enclose the source and replacement target values in parentheses, like so: mustLoc="(World) (Welt)"

    If, for any reason, a parenthesis is required to be translated, as a brace for example, we could escape it like so: mustLoc="((World)) ({Welt})"

    Since we are generalizing dblSpace to occurrences, we could do something similar there as well: occurrences="( ) (3)"

    For example, where 3 pipes need to occur in the target for whatever reason.

    Further comments or suggestions welcome.
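    The proposal leaves the exact grammar of the parenthesized values open. Below is a minimal Python sketch of one possible reading, assuming that each value is one top-level parenthesized group, that groups may be separated by whitespace, and that any parentheses inside a value must be balanced (which is how the "((World)) ({Welt})" example parses). The function names `parse_paren_values` and `parse_must_loc` are hypothetical and not part of any XLIFF API.

```python
def parse_paren_values(attr: str) -> list[str]:
    """Split '(World) (Welt)' into ['World', 'Welt'].

    Assumed grammar (not defined in the thread): each value is one top-level
    parenthesized group, groups may be separated by whitespace, and
    parentheses inside a value must be balanced.
    """
    values: list[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(attr):
        if ch == "(":
            if depth == 0:
                start = i + 1  # value text begins after the outer "("
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unmatched ')' at index {i}")
            depth -= 1
            if depth == 0:
                values.append(attr[start:i])  # drop the outer parentheses
        elif depth == 0 and not ch.isspace():
            raise ValueError(f"text outside parentheses at index {i}")
    if depth != 0:
        raise ValueError("unbalanced parentheses")
    return values


def parse_must_loc(attr: str) -> tuple[str, str]:
    """Return (source, target) from a mustLoc-style value."""
    values = parse_paren_values(attr)
    if len(values) != 2:
        raise ValueError(f"expected 2 values, got {len(values)}")
    return values[0], values[1]


print(parse_must_loc("(World) (Welt)"))      # ('World', 'Welt')
print(parse_must_loc("((World)) ({Welt})"))  # ('(World)', '{Welt}')
```

    Under this reading, a value that contains a lone, unbalanced parenthesis cannot be expressed; a final grammar would need to settle that case, for example with a dedicated escape character.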




  • 14.  Re: [xliff] R37: Revised Validations Module proposal

    Posted 04-08-2013 23:42
    All, following the TC discussions last week, I believe the following question needs to be answered:

    Does the core need a default, or even an enforced, storage normalization?

    I believe the answer is no, and that we are OK with specifying a default for content comparison purposes that can be overridden in the validation module and in the size restriction module, and nowhere else in the spec as it stands now, AFAIK. If these two modules introduce an explicit attribute that allows for the override, the default should be NFC in both cases, the same as in 2.6.8.

    If we decide that it is not enough to have normalization defaults for comparison purposes ONLY, we could introduce an optional normalization attribute in core that could live on any of the structural elements, from <file> down to <source> and <target>. There would be inheritance, and the default/inherited value would be assumed (MUST for processors) where nothing is specified/inherited.

    The default could be either "none" or "NFC".

    In case we go for the core attribute, I believe the default should be "none" for everything (including storage) except comparison purposes, which would cover section 2.6.8 and both modules.

    Please note that I am NOT actually proposing to have the core attribute; I am just trying to accelerate the discussion by charting all viable options.

    Please indicate which option seems preferable to you and, possibly, any other viable options you see.

    Thanks and regards
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    cellphone: +353-86-0222-158
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    On Tue, Apr 2, 2013 at 3:59 PM, Estreen, Fredrik <Fredrik.Estreen@lionbridge.com> wrote:

    Hi All,

    Just back from vacation and catching up on email. Reading the current spec, I think it could make sense to simply rely on section “2.6.8 Content Comparison” in the core specification for what (if any) normalization to apply for validation comparisons. It stipulates that NFC is used to compare the equality of content.

    Best regards,
    Fredrik Estreen
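    David explicitly does not propose the core attribute, but the inheritance rule he charts can be stated precisely. Below is a minimal Python sketch, assuming a hypothetical, unprefixed `normalization` attribute with the values "none", "nfc" or "nfd", and a simplified document without the XLIFF namespace or required attributes: the effective value for an element is the nearest value found on the element or its ancestors, otherwise the applicable default ("none" for storage, "nfc" for comparison, as suggested above). The function name `effective_normalization` is illustrative only.

```python
import xml.etree.ElementTree as ET

STORAGE_DEFAULT = "none"    # suggested default for storage and everything else
COMPARISON_DEFAULT = "nfc"  # suggested default for comparison (2.6.8 and both modules)


def effective_normalization(root: ET.Element, elem: ET.Element, default: str) -> str:
    """Walk from elem up to root and return the first 'normalization' value found."""
    # ElementTree has no parent pointers, so build a child -> parent map first.
    parents = {child: parent for parent in root.iter() for child in parent}
    node = elem
    while node is not None:
        value = node.get("normalization")  # hypothetical attribute name
        if value is not None:
            return value
        node = parents.get(node)
    return default


doc = ET.fromstring(
    "<xliff><file normalization='nfd'><unit><segment>"
    "<source>a</source><target normalization='nfc'>b</target>"
    "</segment></unit></file>"
    "<file><unit><segment><source>c</source></segment></unit></file></xliff>"
)
first_file, second_file = doc.findall("file")
print(effective_normalization(doc, first_file.find(".//source"), STORAGE_DEFAULT))      # nfd (inherited from <file>)
print(effective_normalization(doc, first_file.find(".//target"), STORAGE_DEFAULT))      # nfc (own value)
print(effective_normalization(doc, second_file.find(".//source"), STORAGE_DEFAULT))     # none (storage default)
print(effective_normalization(doc, second_file.find(".//source"), COMPARISON_DEFAULT))  # nfc (comparison default)
```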


  • 15.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 04-09-2013 09:39




    I opt for no normalization in core; quoting from my other mail today:

    I actually think there should not be an attribute about normalization in the XLIFF core. In my opinion it makes sense to have it in length restrictions (because they refer to the storage format) and, to a lesser extent, in validation, but not in core. If you expect or need a specific normalization in your document, apply it.

    Actually, even the normalization attributes in the size restriction module do not require the content to be stored in that normalization form, which would also be difficult, as the same content could, for example, have NFC applied for its storage size and NFD for general size restrictions. They are only used to determine which size calculation to apply.

    So I think we both agree.

    Regards,
    Joachim
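    Joachim's point, that the normalization setting selects a size calculation rather than a storage form, can be illustrated with Python's standard `unicodedata` module. Below is a minimal sketch, assuming sizes are counted either in code points or in UTF-8 bytes; the function names and the lowercase "none"/"nfc"/"nfd" values are illustrative assumptions, not the size restriction module's actual profile definitions. The stored string is never rewritten, only measured.

```python
import unicodedata


def normalized(text: str, form: str) -> str:
    """Return a normalized copy of text; 'none' returns it untouched."""
    return text if form == "none" else unicodedata.normalize(form.upper(), text)


def code_point_size(text: str, form: str = "none") -> int:
    """Length in Unicode code points after applying the requested form."""
    return len(normalized(text, form))


def utf8_storage_size(text: str, form: str = "none") -> int:
    """Length in bytes of the UTF-8 encoding after applying the requested form."""
    return len(normalized(text, form).encode("utf-8"))


stored = "Cafe\u0301"  # 'e' followed by U+0301 COMBINING ACUTE ACCENT, as stored
print(code_point_size(stored, "nfd"))    # 5
print(code_point_size(stored, "nfc"))    # 4 ('e' + accent compose to U+00E9)
print(utf8_storage_size(stored, "nfc"))  # 5
print(utf8_storage_size(stored, "nfd"))  # 6
```

    The same stored content yields different sizes depending on which form a given restriction names, which is why one document can use NFC for storage size and NFD for general size restrictions at the same time.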
     





  • 16.  Re: [xliff] R37: Revised Validations Module proposal

    Posted 04-09-2013 11:07
    Thanks Joachim, I believe we agree.. (see also inline) So the consensus should be something like this: No storage default. NFC as comparison default in all three cases core and both modules, but both modules allowing for override by having the dedicated attribute @Helena, @Fredrik, others, Please shout by Wed End of Your Day, if this does not seem OK, otherwise I'd like to ask Fredrik and Ryan to implement this in their modules by the end of their day Thu, so that Tom and I can finalize the spec for meeting next week @Tom, will you be able to modify schema by Monday if this is done by Friday? Thanks dF Dr. David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone: +353-6120-2781 cellphone: +353-86-0222-158 facsimile: +353-6120-2734 mailto: david.filip@ul.ie On Tue, Apr 9, 2013 at 10:38 AM, Schurig, Joachim < Joachim.Schurig@lionbridge.com > wrote: I opt for no normalization on core – quote from my other mail today:   I actually think there should not be an attribute about normalization in the XLIFF core. In my opinion it makes sense to have it in length restrictions (because they refer to the storage format) and to a lesser extent in validation, but not in core. If you expect or need a specific normalization in your document, apply it. +1    Actually, even the normalization attributes in the size restriction module do not require the content be stored in that normalization +1  – which would also be difficult as the same content could have e.g. NFC for their storage size applied and NFD for general size restrictions.. it’s only used to know which calculation about sizes to apply. +1    So I think we both agree.   Regards, Joachim   From: Dr. David Filip [mailto: David.Filip@ul.ie ] Sent: Dienstag, 9. 
April 2013 01:41 To: Estreen, Fredrik Cc: Helena S Chapman; Kevin O'Donnell; Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org ; Yves Savourel Subject: Re: [xliff] R37: Revised Validations Module proposal   All, following the TC discussions last week, I believe the following needs to be answered.   Does the core need a default or even enforced storage normalization? I believe the answer is no and we are OK with specifying a default for content comparison purposes that can be overridden in the validation module and in the size restriction module and nowhere else in the spec as it stands now AFAIK. In case these two introduce an explicit attribute that allows for the override, the default should be NFC in both cases, same as in 2.6.8.   If we decided that it is not enough to have normalization defaults for comparison purposes ONLY, we could introduce an optional normalization attribute in core that could live on any of the structural elements, from <file> down to <source> and <target>, there would be inheritance and the default/inherited would be assumed (MUST for processors) where nothing is specified/inherited.   The default could be either "none" or "NFC" In case we go for the core attribute, I believe the default should be "none" for everything (including storage) except comparison purposes that would cover section 2.6.8 and both modules.   Please note that I am NOT actually proposing to have the core attribute, I am just trying to accelerate the discussion by charting all viable options.   Please indicate what option seems preferable to you, eventually if you see any other viable options..   Thanks and regards dF   Dr. 
David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone:  +353-6120-2781 cellphone: +353-86-0222-158 facsimile:  +353-6120-2734 mailto: david.filip@ul.ie   On Tue, Apr 2, 2013 at 3:59 PM, Estreen, Fredrik < Fredrik.Estreen@lionbridge.com > wrote: Hi All,   Just back from vacation and catching up on email. Reading the current spec I think it could make sense to simply rely on section “2.6.8 Content Comparison” in the core specification for what (if any) normalization to apply for validation comparisons. It stipulates that NFC is used to compare the equality of content.   Best regards, Fredrik Estreen   From: xliff@lists.oasis-open.org [mailto: xliff@lists.oasis-open.org ] On Behalf Of Dr. David Filip Sent: den 2 april 2013 14:49 To: Helena S Chapman Cc: Kevin O'Donnell; Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org ; Yves Savourel Subject: Re: [xliff] R37: Revised Validations Module proposal   IMHO validation is different use case to size restriction. It makes sense to have none as default there but not here.. See inline.. Dr. David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone:  +353-6120-2781 cellphone: +353-86-0222-158 facsimile:  +353-6120-2734 mailto: david.filip@ul.ie   On Wed, Mar 27, 2013 at 6:02 PM, Helena S Chapman < hchapman@us.ibm.com > wrote: Unfortunately, I don't entirely agree. I understand what you are suggesting with "language neutral rules". However, Unicode normalization should often be applied even when there is no language rules that should be associated with it. +1  For instance, when someone send an English document which contains two distinctly different characters "?" vs "?", without normalization and the correct intention, how would the tools/processes know what to do with these two characters? When to treat them as two different characters and when to treat them as the same? 
"none" for normalization would tell them to treat these as different, and if it's "NFD" or "NFC", these two would be the same (not identical). This is going to be increasingly common because: +1  1. We are more likely to receive true multilingual static content these days. And, we don't have to look far for an example. In Canada, most content has to be available both in English and French at the same time. 2. When we deal with multimedia type content, the use of more than one language within the same context is even more frequent. In my own household, a combination of Mandarin, Taiwanese, Japanese, and Hebrew is often mixed in with English. I am actually curious: is spoken language content interchange out of scope for XLIFF in general? What happens when we embed this into an interactive format? Do we give our community the guideline that if one is working with translation requests that are not limited to written languages, don't use XLIFF for interchange?   To include voice content we would need to re-charter. I think it is the next frontier and worth discussing.. I would just not put it on the front burner right now, with 2.0 preparing for the first public review..
From:         "Kevin O'Donnell" < kevinod@microsoft.com > To:         "Dr. David Filip" < David.Filip@ul.ie >, Helena S Chapman/San Jose/IBM@IBMUS Cc:         "Schurig, Joachim" < Joachim.Schurig@lionbridge.com >, Ryan King < ryanki@microsoft.com >, " xliff@lists.oasis-open.org " < xliff@lists.oasis-open.org >, Yves Savourel < ysavourel@enlaso.com > Date:         03/21/2013 04:30 PM Subject:         RE: [xliff] R37: Revised Validations Module proposal Sent by:         < xliff@lists.oasis-open.org > Hi David,   This is a fair point. I’d be interested in hearing from tool providers about their preference here also.   I see two scenarios that are relevant here:   1. Global (language-neutral) rules: these rules are the most common here at Microsoft and do not differentiate per-language (e.g. formatting rules). Therefore, rules of this nature would not require/benefit from normalization. 2. Language-specific rules: these rules, by default, may benefit from normalization, which, if present, would indeed help reduce false positives.   In my thinking, keeping “none” as the default keeps the module simple and avoids unnecessary overhead by the processing agent when not required. If the XLIFF creator is implementing language-specific rules, then the onus is on them to specify their preferred normalization approach.   If we can surmise the likely prevalence of scenario 1 vs. scenario 2, that may also indicate the likely best default setting here.   Other feedback/thoughts appreciated.   Thanks, Kevin.
From: xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Dr. David Filip Sent: Thursday, March 21, 2013 12:08 PM To: Helena S Chapman Cc: Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org ; Yves Savourel Subject: Re: [xliff] R37: Revised Validations Module proposal   Guys, after a brief spot check I just forwarded Asanka's minutes from Tuesday to the list. While there was no formal conclusion, the discussion tended toward including normalization types along the lines of the size restriction module. I think there was no doubt that we should provide a vehicle for conveying the normalization type required. There did not seem to be a clear consensus on the default value though. I personally think that the default should NOT be "none". This option for default seems vague and obscure to me..  It leaves the processor to guess, based on tribal knowledge, what to do to avoid producing tons of false positives. I thought that we wanted to be naive-implementer friendly.. :-)   Cheers dF Dr. David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone: +353-6120-2781 cellphone: +353-86-0222-158 facsimile: +353-6120-2734 mailto: david.filip@ul.ie
On Thu, Mar 21, 2013 at 6:44 PM, Helena S Chapman < hchapman@us.ibm.com > wrote: You correctly summarized my own recollection of the discussion. From:         "Schurig, Joachim" < Joachim.Schurig@lionbridge.com > To:         Ryan King < ryanki@microsoft.com >, Helena S Chapman/San Jose/IBM@IBMUS Cc:         " xliff@lists.oasis-open.org " < xliff@lists.oasis-open.org >, Yves Savourel < ysavourel@enlaso.com > Date:         03/20/2013 07:29 PM Subject:         RE: [xliff] R37: Revised Validations Module proposal   Hi Ryan,   while yours was my initial position as well, I do not think that it was the outcome of the discussion in the TC. We already mention the normalization approach in the size restriction module, so it would make sense to include it here, too, and I think this was the conclusion on the Tuesday call. You could leave the default at “none” and declare that this would leave it to the processing agent how to deal with the situation, but setting either “nfd” or “nfc” should lead to more specific behavior. Could this be an acceptable solution to all parties?   Cheers, Joachim
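The distinction Helena draws between "none" and NFC/NFD can be shown with Python's standard unicodedata module. This is a minimal sketch, not code from the thread: the character pair (a precomposed é versus e followed by a combining acute accent) is an illustrative choice, not necessarily the pair in her original message, and the helper name equal_under is hypothetical.

```python
import unicodedata


def equal_under(form: str, a: str, b: str) -> bool:
    """Compare two strings after applying a normalization form.

    form is "none", "NFC" or "NFD"; "none" means a plain code-point
    comparison with no normalization applied.
    """
    if form == "none":
        return a == b
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


precomposed = "\u00e9"  # é as one code point (LATIN SMALL LETTER E WITH ACUTE)
decomposed = "e\u0301"  # e followed by COMBINING ACUTE ACCENT

for form in ("none", "NFC", "NFD"):
    print(form, equal_under(form, precomposed, decomposed))
# none False
# NFC True
# NFD True
```

The two strings render identically but are not identical code-point sequences, which is exactly the "same (not identical)" case: only an explicit normalization form makes a comparison treat them as equal.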


  • 17.  Re: [xliff] R37: Revised Validations Module proposal

    Posted 04-09-2013 18:34
    I am ok with this as you described.




    From:         "Dr. David Filip" <David.Filip@ul.ie>
    To:         "Schurig, Joachim" <Joachim.Schurig@lionbridge.com>, "Estreen, Fredrik" <Fredrik.Estreen@lionbridge.com>,
    Helena S Chapman/San Jose/IBM@IBMUS, Tom Comerford <tom@supratext.com>,
    Ryan King <ryanki@microsoft.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
    Date:         04/09/2013 07:07 AM
    Subject:         Re: [xliff] R37: Revised Validations Module proposal
    Sent by:         <xliff@lists.oasis-open.org>




    Thanks Joachim,
    I believe we agree.. (see also inline)

    So the consensus should be something like this:
    No storage default. NFC as the comparison default in all three
    cases (core and both modules), with both modules allowing an override
    through a dedicated attribute.
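A minimal sketch of how a validation tool might apply this consensus, assuming a hypothetical module attribute whose value is "none", "nfc" or "nfd" and whose absence means the NFC comparison default. The names effective_form and contents_equal are illustrative only; the stored strings are never rewritten, in keeping with "no storage default".

```python
import unicodedata
from typing import Optional

# Assumption: comparisons default to NFC (as in core section 2.6.8)
# when a module does not say otherwise.
DEFAULT_COMPARISON_FORM = "nfc"
VALID_FORMS = {"none", "nfc", "nfd"}


def effective_form(module_attr: Optional[str]) -> str:
    """Return the normalization form to use for a module's comparisons.

    module_attr stands for a hypothetical module attribute value
    ("none", "nfc" or "nfd", any case), or None when it is absent.
    """
    if module_attr is None:
        return DEFAULT_COMPARISON_FORM
    form = module_attr.lower()
    if form not in VALID_FORMS:
        raise ValueError(f"unknown normalization form: {module_attr!r}")
    return form


def contents_equal(a: str, b: str, module_attr: Optional[str] = None) -> bool:
    """Compare two strings; normalized copies are used, inputs are untouched."""
    form = effective_form(module_attr)
    if form == "none":
        return a == b
    return unicodedata.normalize(form.upper(), a) == unicodedata.normalize(form.upper(), b)


precomposed, decomposed = "\u00e9", "e\u0301"
print(contents_equal(precomposed, decomposed))          # True (NFC default)
print(contents_equal(precomposed, decomposed, "none"))  # False (explicit override)
```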

    @Helena, @Fredrik, others,
    Please shout by the end of your day Wednesday if this does not
    seem OK; otherwise I'd like to ask Fredrik and Ryan to implement this in
    their modules by the end of their day Thursday, so that Tom and I can finalize
    the spec for the meeting next week.
    @Tom, will you be able to modify the schema by Monday if this
    is done by Friday?

    Thanks
    dF



    On Tue, Apr 9, 2013 at 10:38 AM, Schurig, Joachim < Joachim.Schurig@lionbridge.com >
    wrote:
    I opt for no normalization
    on core – quote from my other mail today:
     
    I actually think there
    should not be an attribute about normalization in the XLIFF core. In my
    opinion it makes sense to have it in length restrictions (because they
    refer to the storage format) and to a lesser extent in validation, but
    not in core. If you expect or need a specific normalization in your document,
    apply it.
    +1 
     
    Actually, even the normalization
    attributes in the size restriction module do not require the content be
    stored in that normalization
    +1 
    – which would also be
    difficult, as the same content could have e.g. NFC applied for its storage size
    and NFD for general size restrictions.. it’s only used to know
    which calculation about sizes to apply.
    +1 
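Joachim's point, that the size restriction normalization only selects which size calculation to apply and does not dictate how content is stored, can be sketched as follows. The unit names "codepoints" and "utf8-bytes" and the function measured_size are hypothetical, not the size restriction module's actual profiles; the sketch only shows that one stored string yields different sizes depending on the form used for measuring.

```python
import unicodedata


def measured_size(text: str, form: str, unit: str) -> int:
    """Size of text as measured under a normalization form.

    form is "none", "NFC" or "NFD"; unit is "codepoints" or "utf8-bytes".
    Only the measurement uses the normalized copy; text itself is untouched.
    """
    measured = text if form == "none" else unicodedata.normalize(form, text)
    if unit == "codepoints":
        return len(measured)
    if unit == "utf8-bytes":
        return len(measured.encode("utf-8"))
    raise ValueError(f"unknown unit: {unit!r}")


stored = "Caf\u00e9"  # stored content with a precomposed é, never rewritten

for form in ("NFC", "NFD"):
    for unit in ("codepoints", "utf8-bytes"):
        print(form, unit, measured_size(stored, form, unit))
# NFC codepoints 4
# NFC utf8-bytes 5
# NFD codepoints 5
# NFD utf8-bytes 6
```

So a storage limit measured under NFC and a general limit measured under NFD can coexist on the same content, which is why the attribute describes the calculation rather than the storage form.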
     
    So I think we both agree.
     
    Regards,
    Joachim
     
    From: Dr. David Filip [mailto: David.Filip@ul.ie ]

    Sent: Tuesday, 9 April 2013 01:41
    To: Estreen, Fredrik
    Cc: Helena S Chapman; Kevin O'Donnell; Schurig, Joachim; Ryan King;
    xliff@lists.oasis-open.org ;
    Yves Savourel

    Subject: Re: [xliff] R37: Revised Validations Module proposal
     
    All, following the TC discussions last week, I believe
    the following needs to be answered.
     
    Does the core need a default or even enforced storage normalization?
    I believe the answer is no, and that we are OK with specifying
    a default for content comparison purposes that can be overridden in
    the validation module and in the size restriction module, and nowhere else
    in the spec as it stands now, AFAIK. If these two introduce an explicit
    attribute that allows for the override, the default should be NFC in both
    cases, the same as in 2.6.8.
     
    If we decided that it is not enough to have normalization
    defaults for comparison purposes ONLY, we could introduce an optional normalization
    attribute in core that could live on any of the structural elements, from
    <file> down to <source> and <target>. There would be
    inheritance, and the inherited value, or the default where nothing is
    specified or inherited, would be assumed (MUST for processors).
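David charts this as an option rather than a proposal, but the inheritance he describes can be sketched with Python's xml.etree.ElementTree. The attribute name normalization, the "none" default, and the namespace-free skeleton are all assumptions for illustration; real XLIFF documents are namespaced and richer than this.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical attribute name and default for the charted core option.
ATTR = "normalization"
DEFAULT = "none"


def resolve_inherited(root: ET.Element) -> Dict[ET.Element, str]:
    """Map every element to its effective value of the inherited attribute.

    An element's own value wins; otherwise it inherits its parent's effective
    value; the document root falls back to DEFAULT.
    """
    effective: Dict[ET.Element, str] = {}

    def walk(elem: ET.Element, inherited: str) -> None:
        value = elem.get(ATTR, inherited)
        effective[elem] = value
        for child in elem:
            walk(child, value)

    walk(root, DEFAULT)
    return effective


# Simplified, namespace-free XLIFF-like skeleton, for illustration only.
doc = ET.fromstring(
    '<xliff><file normalization="nfc"><unit><segment>'
    '<source>Hello</source><target normalization="nfd">Hallo</target>'
    "</segment></unit></file></xliff>"
)
values = resolve_inherited(doc)
print(values[doc])                    # none (nothing specified, default)
print(values[doc.find(".//source")])  # nfc (inherited from <file>)
print(values[doc.find(".//target")])  # nfd (set locally)
```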
     
    The default could be either "none" or "NFC".
    If we go for the core attribute, I believe the default
    should be "none" for everything (including storage) except comparison
    purposes, which would cover section 2.6.8 and both modules.
     
    Please note that I am NOT actually proposing to have the
    core attribute, I am just trying to accelerate the discussion by charting
    all viable options.
     
    Please indicate which option seems preferable to you, and let us know
    if you see any other viable options..
     
    Thanks and regards
    dF
     

     
    On Tue, Apr 2, 2013 at 3:59 PM, Estreen, Fredrik < Fredrik.Estreen@lionbridge.com >
    wrote:
    Hi All,
     
    Just back from vacation and
    catching up on email. Reading the current spec I think it could make sense
    to simply rely on section “2.6.8 Content Comparison” in the core specification
    for what (if any) normalization to apply for validation comparisons. It
    stipulates that NFC is used to compare the equality of content.
     
    Best regards,
    Fredrik Estreen
     
    From: xliff@lists.oasis-open.org
    [mailto: xliff@lists.oasis-open.org ]
    On Behalf Of Dr. David Filip
    Sent: 2 April 2013 14:49
    To: Helena S Chapman
    Cc: Kevin O'Donnell; Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org ;
    Yves Savourel

    Subject: Re: [xliff] R37: Revised Validations Module proposal
     
    IMHO validation is a different use case from size restriction.
    It makes sense to have "none" as the default there, but not here..
    See inline..

     
    On Wed, Mar 27, 2013 at 6:02 PM, Helena S Chapman < hchapman@us.ibm.com >
    wrote:
    Unfortunately, I don't entirely agree.

    I understand what you are suggesting with "language neutral rules".
    However, Unicode normalization should often be applied even when there
    are no language rules associated with it.
    +1 
    For instance, when someone sends an English
    document which contains two distinctly different characters "?"
    vs "?", without normalization
    and the correct intention, how would the tools/processes know what to do
    with these two characters? When to treat them as two different characters
    and when to treat them as the same? "none" for normalization
    would tell them to treat these as different, and if it's "NFD"
    or "NFC", these two would be the same (not identical).

    This is going to be increasingly common because:
    +1 


    1. We are more likely to receive true multilingual static content these
    days. And, we don't have to look far for an example. In Canada, most content
    has to be available both in English and French at the same time.
    2. When we deal with multimedia type content, the use of more than one
    language within the same context is even more frequent. In my own household,
    a combination of Mandarin, Taiwanese, Japanese, and Hebrew is often mixed
    in with English.

    I am actually curious: is spoken language content interchange out
    of scope for XLIFF in general? What happens when we embed this into an interactive
    format? Do we give our community the guideline that if one is working with
    translation requests that are not limited to written languages, don't use
    XLIFF for interchange?
     
    To include voice content we would need to re-charter. I
    think it is the next frontier and worth discussing.. I would just not put it
    on the front burner right now, with 2.0 preparing for the first public
    review.. 




    From:         "Kevin O'Donnell" < kevinod@microsoft.com >
    To:         "Dr. David Filip" < David.Filip@ul.ie >, Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         "Schurig, Joachim" < Joachim.Schurig@lionbridge.com >,
    Ryan King < ryanki@microsoft.com >,
    " xliff@lists.oasis-open.org " < xliff@lists.oasis-open.org >,
    Yves Savourel < ysavourel@enlaso.com >
    Date:         03/21/2013 04:30 PM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal
    Sent by:         < xliff@lists.oasis-open.org >







    Hi David,
     
    This is a fair point. I’d be interested in hearing from tool providers
    about their preference here also.
     
    I see two scenarios that are relevant here:
     
    1. Global (language-neutral) rules: these rules
    are the most common here at Microsoft and do not differentiate per-language
    (e.g. formatting rules). Therefore, rules of this nature would not require/benefit
    from normalization.
    2. Language-specific rules: these rules, by
    default, may benefit from normalization, which, if present, would indeed
    help reduce false positives.
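Kevin's two scenarios can be contrasted in a short sketch. Both rules below are hypothetical examples rather than rules from any module: a placeholder count check, whose outcome is the same with or without normalization for this input, and a required-term check, which reports a spurious error when the target spells é in decomposed form.

```python
import unicodedata


def nfc(text: str) -> str:
    return unicodedata.normalize("NFC", text)


def placeholders_match(source: str, target: str) -> bool:
    """Hypothetical language-neutral rule: same number of '{0}' placeholders."""
    return source.count("{0}") == target.count("{0}")


def has_term(target: str, term: str) -> bool:
    """Hypothetical language-specific rule: a required target term must appear."""
    return term in target


source = "Caf\u00e9 {0} is open"
target = "Le Cafe\u0301 {0} est ouvert"  # é entered as e + combining acute
term = "Caf\u00e9"                       # required term, precomposed é

# Scenario 1: normalization does not change the outcome of this rule.
print(placeholders_match(source, target), placeholders_match(nfc(source), nfc(target)))
# True True

# Scenario 2: without normalization the rule fails, i.e. a false positive.
print(has_term(target, term), has_term(nfc(target), nfc(term)))
# False True
```

The sketch supports both sides of the thread: the placeholder rule gains nothing from normalizing, while the term rule needs it to avoid flagging equivalent text.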
     
    In my thinking, keeping “none” as the default keeps the module simple and
    avoids unnecessary overhead by the processing agent when not required.
    If the XLIFF creator is implementing language-specific rules, then the onus
    is on them to specify their preferred normalization approach.

     
    If we can surmise the likely prevalence of scenario 1 vs. scenario 2, that
    may also indicate the likely best default setting here.

     
    Other feedback/thoughts appreciated.
     
    Thanks,
    Kevin.
     
    From: xliff@lists.oasis-open.org
    [ mailto:xliff@lists.oasis-open.org ]
    On Behalf Of Dr. David Filip
    Sent: Thursday, March 21, 2013 12:08 PM
    To: Helena S Chapman
    Cc: Schurig, Joachim; Ryan King; xliff@lists.oasis-open.org ;
    Yves Savourel
    Subject: Re: [xliff] R37: Revised Validations Module proposal

     
    Guys, after a brief spot check I just forwarded Asanka's minutes from Tuesday
    to the list.
    While there was no formal conclusion, the discussion tended toward including
    normalization types along the lines of the size restriction module.

    I think there was no doubt that we should provide a vehicle for conveying
    the normalization type required.
    There did not seem to be a clear consensus on the default value though.
    I personally think that the default should NOT be "none". This
    option for default seems vague and obscure to me..  It leaves the processor
    to guess, based on tribal knowledge, what to do to avoid producing tons
    of false positives. I thought that we wanted to be naive-implementer friendly..
    :-)
     
    Cheers
    dF


     
    On Thu, Mar 21, 2013 at 6:44 PM, Helena S Chapman < hchapman@us.ibm.com >
    wrote:
    You correctly summarized my own recollection of the discussion.




    From:         "Schurig, Joachim" < Joachim.Schurig@lionbridge.com >
    To:         Ryan King < ryanki@microsoft.com >, Helena S Chapman/San Jose/IBM@IBMUS
    Cc:         " xliff@lists.oasis-open.org " < xliff@lists.oasis-open.org >,
    Yves Savourel < ysavourel@enlaso.com >
    Date:         03/20/2013 07:29 PM
    Subject:         RE: [xliff] R37: Revised Validations Module proposal

     






    Hi Ryan,
     
    while yours was my initial position as well, I do not think that it was
    the outcome of the discussion in the TC. We already mention
    the normalization approach in the size restriction module, so it would
    make sense to include it here, too, and I think this was the conclusion
    on the Tuesday call. You could leave the default at “none” and declare
    that this would leave it to the processing agent how to deal with the situation,
    but setting either “nfd” or “nfc” should lead to more
    specific behavior. Could this be an acceptable solution to all parties?

     
    Cheers,
    Joachim
     




  • 18.  Re: [xliff] R37: Revised Validations Module proposal

    Posted 04-09-2013 18:53
    Thanks Helena, @Fredrik, in the other conversation today it seemed to me you wanted to stick to the default "none" in your module.. Is that so, or can you go with "NFC" as the default? I do not really mind, I am just trying to nail down a summary consensus on the normalization issue.. Thanks dF Dr. David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone: +353-6120-2781 cellphone: +353-86-0222-158 facsimile: +353-6120-2734 mailto: david.filip@ul.ie


  • 19.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-20-2013 14:37
    Hi Ryan, > We suggest that for mustLoc, we enclose the source and replacement > target values in parentheses, like so: mustLoc="(World) (Welt)" > If for any reason, a parenthesis is required to be translated, > as a brace for example, we could escape it like so: > mustLoc="((World)) ({Welt})" > Since we are generalizing the dblSpace to occurrences, then we > could do something similar there as well: occurrences="( ) (3)" > For example, where 3 pipes need to occur in the target for whatever reason. Wouldn't it be simpler if we were to escape ' ' with '': That would mean less escaping to do. -ys
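The occurrences syntax quoted above can be sketched as a small parser and check. The value is written "( )" above, but the text says pipes are being counted, so the example assumes the value "(|) (3)"; it also assumes the count means "exactly", which the proposal does not state. The function names are hypothetical.

```python
import re
from typing import Tuple

# Hypothetical reading of the proposed syntax occurrences="(<text>) (<count>)".
_OCCURRENCES = re.compile(r"\((.*)\) \((\d+)\)", re.DOTALL)


def parse_occurrences(value: str) -> Tuple[str, int]:
    match = _OCCURRENCES.fullmatch(value)
    if match is None:
        raise ValueError(f"malformed occurrences value: {value!r}")
    text, count = match.group(1), int(match.group(2))
    if not text:
        raise ValueError("the text to count must not be empty")
    return text, count


def occurrences_ok(value: str, target: str) -> bool:
    """True if target contains the text exactly `count` times.

    Assumes "exactly" rather than "at least"; str.count counts
    non-overlapping occurrences.
    """
    text, count = parse_occurrences(value)
    return target.count(text) == count


# Assumes the character to count is a pipe, as the proposal's wording suggests.
print(occurrences_ok("(|) (3)", "Name|Size|Date|Owner"))  # True
print(occurrences_ok("(|) (3)", "Name|Size Date|Owner"))  # False
```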


  • 20.  RE: [xliff] R37: Revised Validations Module proposal

    Posted 03-20-2013 17:20
    Hi Yves, all, Yes, escaping the pipe would be a much simpler solution than the original proposal. However, I think parentheses are a better solution because they rule out ambiguity. For example, the following are not the same: mustLoc="World Welt" mustLoc="World Welt" mustLoc=" World Welt " But I might think they are, and that the only difference between them is making the value more human-readable. By contrast, the following are less ambiguous: mustLoc="(World) (Welt)" mustLoc="( World ) ( Welt )" Other opinions are welcome, of course. Ryan
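Ryan's argument for parentheses can be made concrete with a small parser for the proposed mustLoc syntax. This is a sketch under stated assumptions: groups are separated only by whitespace, and parentheses inside a group must be balanced, which is one reading of the "((World)) ({Welt})" example that yields the literal source text "(World)". The function names are hypothetical.

```python
from typing import List, Tuple


def parse_groups(value: str) -> List[str]:
    """Split a value such as "(World) (Welt)" into its parenthesized groups.

    Assumes groups are separated by whitespace only, and that any parentheses
    inside a group are balanced, so "((World))" yields "(World)".
    """
    groups: List[str] = []
    depth = 0
    start = 0
    for i, ch in enumerate(value):
        if ch == "(":
            if depth == 0:
                start = i + 1  # group content begins after the outer "("
            depth += 1
        elif ch == ")":
            if depth == 0:
                raise ValueError(f"unbalanced ')' at index {i}")
            depth -= 1
            if depth == 0:
                groups.append(value[start:i])  # whitespace inside is kept
        elif depth == 0 and not ch.isspace():
            raise ValueError(f"unexpected {ch!r} outside a group at index {i}")
    if depth != 0:
        raise ValueError("unclosed '('")
    return groups


def parse_must_loc(value: str) -> Tuple[str, str]:
    """Return the (source, target) pair from a mustLoc value."""
    groups = parse_groups(value)
    if len(groups) != 2:
        raise ValueError(f"expected 2 groups, got {len(groups)}")
    return groups[0], groups[1]


print(parse_must_loc("(World) (Welt)"))      # ('World', 'Welt')
print(parse_must_loc("( World ) ( Welt )"))  # (' World ', ' Welt ')
print(parse_must_loc("((World)) ({Welt})"))  # ('(World)', '{Welt}')
```

Under this reading, whitespace inside a group is preserved exactly, which is the disambiguation Ryan describes; the trade-off is that a lone, unbalanced parenthesis cannot be expressed without some further escaping rule.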