OASIS XML Localisation Interchange File Format (XLIFF) TC


Fwd: Handling escaped characters in Translation Units


    Posted 05-23-2005 16:21


    Subject: Fwd: Handling escaped characters in Translation Units


    Dear TC, the xliff-tools project would greatly appreciate your insight on the
    following problem they have been discussing:
    
    ----------  Forwarded Message  ----------
    
    Subject: Handling escaped characters in Translation Units
    Date: Tuesday 10 May 2005 16:52
    From: Asgeir Frimannsson <asgeirf@redhat.com>
    To: Paul Gampe <pgampe@redhat.com>
    Cc: Jim Hogan <j.hogan@qut.edu.au>
    
    Hi Paul,
    
    Here's an issue we've been discussing at length on the xliff-tools
     mailing list, in a discussion initiated by Yves Savourel last week. I believe
     this is an issue that needs a recommended approach from the XLIFF TC. Let me
     know what you think :)
    
    Handling Escaped Characters in Translation Units
    
    In source code, it is very common to use escape sequences for characters
     such as newline (\u000A) and horizontal tab (\u0009).
    
    For example:
    
    printf("Please Enter the following Data:\n\
    \t- First Name\n\
    \t- Last Name\n");
    
    Here we've used the escape characters '\n' and '\t' representing newlines and
     tabs.
    
    This fragment would be represented in PO as follows:
    
    msgid ""
    "Please Enter the following Data:\n"
    "\t- First Name\n"
    "\t- Last Name\n"
    
    
    This could be mapped to XLIFF using two different approaches:
    
    Approach A:
    
    We could preserve the escaped characters:
    
    <source>Please Enter the following Data:\n\
    \t- First Name\n\t- Last Name\n</source>
    
    We could further enhance this by abstracting the escaped characters to <ph>
     elements:
    
    <source>Please Enter the following Data:<ph id='1' ctype='lb'>\n</ph>\
    <ph id='2' ctype='x-ht'>\t</ph>- First Name<ph id='3' ctype='lb'>\n</ph>\
    <ph id='4' ctype='x-ht'>\t</ph>- Last Name<ph id='5' ctype='lb'>\n</ph>\
    </source>
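
As a rough illustration of the <ph> abstraction above, a filter could do something like the following. This is only a sketch in Python, not code from any existing tool: the function name escapes_to_ph and the CTYPES table are made up for the example, only '\n' and '\t' are handled, and an escaped backslash followed by 'n' or 't' would be misread.

```python
import re
from xml.sax.saxutils import escape

# Assumed mapping from escape sequence to ctype, copied from the example above.
CTYPES = {r"\n": "lb", r"\t": "x-ht"}

def escapes_to_ph(text):
    """Wrap each literal backslash-n or backslash-t sequence in a numbered <ph>."""
    parts = []
    next_id = 1
    pos = 0
    for match in re.finditer(r"\\[nt]", text):
        # Plain text between escape sequences is XML-escaped as usual.
        parts.append(escape(text[pos:match.start()]))
        seq = match.group(0)
        parts.append("<ph id='%d' ctype='%s'>%s</ph>" % (next_id, CTYPES[seq], seq))
        next_id += 1
        pos = match.end()
    parts.append(escape(text[pos:]))
    return "<source>" + "".join(parts) + "</source>"
```

The escape sequence itself is kept as the content of each <ph>, so back-conversion only needs to strip the elements again.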
    
    Issue A-1: If using this approach, would filters have to discard real newline
     characters (\u000A) in translation units? How would this affect TM lookups?
    
    Issue A-2: How would editors handle this approach? For software messages,
     they would have to disable entering newlines, and in some way format the
     message according to the values of the ctype attributes? (Not having visual
     indicators for e.g. newlines would not be very translator-friendly.)
    
    Issue A-3: Where do we stop? In Java .properties files we usually add a
     "\u0020" to indicate a leading space. For example:
    
    my_message = \u0020Some Text
    
    Should this be represented as:
    
    <source>\u0020Some Text</source>
    or
    <source> Some Text</source>
    ?
    
    Approach B:
    
    Many of the escaped characters have native Unicode values we could use in
     XLIFF. We could replace '\t' with a real TAB (\u0009) character, and likewise
     for other escape sequences, giving us the following XLIFF fragment:
    
    <source>Please Enter the following Data:
    	- First Name
    	- Last Name
    </source>
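
For comparison, Approach B might be sketched as below. Again this is only an illustration in Python: unescape_to_source and the REAL_CHARS table are hypothetical names, and the table covers just a small assumed subset of escape sequences. A real filter would need the full set defined by the source format.

```python
from xml.sax.saxutils import escape

# Assumed subset of escape sequences and the real characters they stand for.
REAL_CHARS = {"n": "\n", "t": "\t", "\\": "\\", '"': '"'}

def unescape_to_source(text):
    """Replace known escape sequences with real characters, then XML-escape."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text) and text[i + 1] in REAL_CHARS:
            out.append(REAL_CHARS[text[i + 1]])
            i += 2  # skip the backslash and the character after it
        else:
            out.append(ch)
            i += 1
    return "<source>" + escape("".join(out)) + "</source>"
```

Unknown escape sequences are passed through untouched here, which is one of several possible policies.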
    
    Issue B-1: DOS/Windows use "\r\n", while UNIX (and most programming
     languages) use "\n" as line endings. On back-conversion, how would we know
     whether to write "\n" or "\r\n" in the translated source file?
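
One possible way around this, purely as a sketch: the filter could note which newline escape the original string used and reuse it when writing the translation back. The Python below assumes the original escaped string is still available at back-conversion time (for example from a skeleton file). The function names are hypothetical, and backslashes in the translated text are not re-escaped.

```python
def detect_newline_escape(original):
    """Return the newline escape sequence used by the original escaped string."""
    return r"\r\n" if r"\r\n" in original else r"\n"

def reescape(translated, newline_escape):
    """Turn real tabs and line breaks in the translation back into escapes."""
    # Normalise whatever the editor produced to a single real newline first.
    normalized = translated.replace("\r\n", "\n").replace("\r", "\n")
    return normalized.replace("\t", r"\t").replace("\n", newline_escape)
```

With this, a unit whose original text used "\r\n" would get "\r\n" back even if the translator's editor only inserted "\n".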
    
    Issue B-2: There are some escape characters used in PO (and probably other
     source formats?) that XML does not allow. For example the "\a" (\u0007, the
     Alert or Bell control character). How should these be handled? (Yes, asking
     the developer what that character is doing in a localised message is a good
     start.)
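
A filter could at least detect such characters before writing the XLIFF. The sketch below (Python, hypothetical function names) checks each character against the Char production of XML 1.0. What to do with the offending characters afterwards is exactly the open question.

```python
def is_xml10_char(ch):
    """True if ch is allowed by the XML 1.0 Char production."""
    cp = ord(ch)
    return (cp in (0x9, 0xA, 0xD)
            or 0x20 <= cp <= 0xD7FF
            or 0xE000 <= cp <= 0xFFFD
            or 0x10000 <= cp <= 0x10FFFF)

def illegal_xml_chars(text):
    """List (index, code point) pairs for characters XML 1.0 does not allow."""
    return [(i, "U+%04X" % ord(ch))
            for i, ch in enumerate(text)
            if not is_xml10_char(ch)]
```

Tab, line feed and carriage return pass the check, while the Bell character (U+0007) is reported.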
    
    Conclusion
    
    It would be good to have a recommended approach for handling this, which all
     representation guides could share.
    
    The full archived discussion on this is available at:
    http://lists.freedesktop.org/archives/xliff-tools/2005-May/000169.html
    
    cheers,
    asgeir
    
    -------------------------------------------------------
    

