OASIS eXtensible Access Control Markup Language (XACML) TC

  • 1.  Unicode issues

    Posted 11-02-2008 14:40
    All,
    
    We have previously discussed Unicode issues for our string functions and 
    the W3C working draft here:
    
    http://www.w3.org/TR/2005/WD-charmod-norm-20051027/
    
    I posted some questions for clarification about this to their mailing list.
    
    http://lists.w3.org/Archives/Public/www-international/2008OctDec/0004.html
    
    It turns out that the specification does not meet our needs. After some 
    thinking on the issues I have written up the following for the next 
    working draft:
    
    A new section:
    --8<--
    7.1 Unicode issues
    
    In Unicode it is possible to represent some letters by different 
    character sequences. The process of converting Unicode strings into 
    canonical character sequences is called normalization. An operation is 
    normalization-sensitive if its output(s) are different depending on the 
    state of normalization of the input(s); if the output(s) are textual, 
    they are deemed different only if they would remain different were they 
    to be normalized. (Quoted from [CM]).
    
    An XACML implementation MUST NOT perform any normalization-sensitive 
    operations unless it has ensured that the inputs are normalized. An 
    XACML implementation MUST behave as if each normalization-sensitive 
    operation normalizes the string into Unicode normalization form C. An 
    implementation MAY use some other form of internal processing as long as 
    the externally visible results are identical to this specification.
    
    For more information and specification of normalization forms see [UAX15].
    --8<--
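
To illustrate what "normalization-sensitive" means in the draft text above, here is a minimal Python sketch. It assumes the standard-library unicodedata module as the normalizer; the helper name nfc and the sample strings are made up for the illustration and are not part of the draft.

```python
import unicodedata

# Two different code point sequences for the same letter "é".
composed = "\u00e9"     # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # "e" followed by COMBINING ACUTE ACCENT


def nfc(s: str) -> str:
    """Return s in Unicode normalization form C (hypothetical helper)."""
    return unicodedata.normalize("NFC", s)


# A raw comparison is normalization-sensitive: the result depends on
# which of the two representations the caller happened to supply.
raw_equal = composed == decomposed

# Normalizing both inputs to NFC first removes that dependency.
nfc_equal = nfc(composed) == nfc(decomposed)

print(raw_equal, nfc_equal)
print(len(decomposed), len(nfc(decomposed)))
```

String length is normalization-sensitive in the same way as equality, which is why the last line prints two different lengths for the same letter.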
    
    The references are:
    
    [CM]     Character Model for the World Wide Web 1.0: 
    Normalization, W3C Working Draft, 27 October 2005, 
    http://www.w3.org/TR/2005/WD-charmod-norm-20051027/, World Wide Web 
    Consortium.
    
    [UAX15]    Davis, Mark, Unicode Standard Annex #15: Unicode 
    Normalization Forms, Unicode 5.1, available from 
    http://unicode.org/reports/tr15/
    
    In the above-mentioned thread on the www-international mailing list, I 
    wrote that string-equal would be defined by binary equality of the 
    strings when encoded in a common Unicode encoding form. However, I think 
    I will stick with what we decided before, that is, "code-point 
    collation" as defined in XQuery.
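
As a rough illustration of the two candidate definitions, the Python sketch below compares strings code point by code point. The function names and sample characters are assumptions made for this example only. For well-formed strings the two definitions give the same answer for equality; they only diverge for ordering, where UTF-16 binary order places supplementary characters before some BMP characters.

```python
def code_points(s: str) -> list:
    """Return the Unicode code points of s as a list of integers."""
    return [ord(ch) for ch in s]


def codepoint_equal(a: str, b: str) -> bool:
    """Equality in the sense of code-point collation (illustrative only)."""
    return code_points(a) == code_points(b)


bmp_char = "\uff5e"        # U+FF5E, inside the Basic Multilingual Plane
supp_char = "\U00010000"   # U+10000, first supplementary code point

# Ordering by code point: U+FF5E sorts before U+10000.
code_point_order = code_points(bmp_char) < code_points(supp_char)

# Ordering by UTF-16 bytes: the surrogate pair D800 DC00 sorts before FF5E.
utf16_order = bmp_char.encode("utf-16-be") < supp_char.encode("utf-16-be")

print(code_point_order, utf16_order)
print(codepoint_equal("abc", "abc"), codepoint_equal("\u00e9", "e\u0301"))
```

Note that this comparison does no normalization by itself, so the last call reports the precomposed and decomposed forms of "é" as unequal.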
    
    Regarding case mapping, I have added the following wording to the 
    existing string-normalize-to-lower-case XACML function: "Case mapping 
    shall be done as specified for the fn:lower-case function in [XF] with 
    no tailoring for particular languages or environments." [XF] is 
    http://www.w3.org/TR/2007/REC-xpath-functions-20070123/
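
The effect of "no tailoring" can be sketched in Python, on the assumption that the built-in str.lower() (which applies the locale-independent Unicode default case mappings) is a reasonable stand-in for fn:lower-case. The function name below is hypothetical, and this is not a conformance-tested implementation.

```python
def normalize_to_lower_case(s: str) -> str:
    """Lower-case s with the untailored Unicode default mappings.

    Assumption: Python's str.lower() is close enough to fn:lower-case
    for illustration; it never consults the process locale.
    """
    return s.lower()


# With Turkish tailoring, "I" would map to dotless "ı" (U+0131).
# Without tailoring it maps to plain "i".
plain_i = normalize_to_lower_case("I")

# U+0130 (capital I with dot above) maps to "i" plus COMBINING DOT ABOVE
# under the default mapping, not to a bare "i" as Turkish tailoring would.
dotted_i = normalize_to_lower_case("\u0130")

print(plain_i == "i", dotted_i == "i\u0307")
```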
    
    I also noted that the existing normalize-space XACML function had no 
    definition of whitespace. I added (as in XQuery): "The whitespace 
    characters are defined in the metasymbol S (Production 3) of [XML]." 
    [XML] refers to http://www.w3.org/TR/2006/REC-xml-20060816/
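
The practical consequence of tying whitespace to the XML S production can be sketched as follows. The sketch assumes that the XACML function strips leading and trailing whitespace only, and that S covers exactly space, tab, carriage return and line feed; the names are made up for the example.

```python
# Characters matched by the XML metasymbol S: #x20, #x9, #xD, #xA.
XML_WHITESPACE = "\u0020\u0009\u000d\u000a"


def normalize_space(s: str) -> str:
    """Strip leading and trailing XML whitespace (illustrative sketch)."""
    return s.strip(XML_WHITESPACE)


trimmed = normalize_space(" \t\r\n abc \n")

# NO-BREAK SPACE (U+00A0) is not in S, so it is left in place here,
# whereas Python's default str.strip() would remove it.
nbsp_kept = normalize_space("\u00a0abc\u00a0")
nbsp_default = "\u00a0abc\u00a0".strip()

print(repr(trimmed), repr(nbsp_kept), repr(nbsp_default))
```

Without an explicit definition, an implementation relying on a platform default such as str.strip() would treat more characters as whitespace than one following the XML production.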
    
    I have added a section for Unicode security issues.
    
    --8<--
    9.3 Unicode security issues
    
    There are many security considerations related to use of Unicode. An 
    XACML implementation SHOULD follow the advice given in the relevant 
    version of [UTR36].
    --8<--
    
    [UTR36] refers to http://unicode.org/reports/tr36/
    
    Best regards,
    Erik