xliff message
Subject: Fwd: Handling escaped characters in Translation Units
Dear TC, the xliff-tools project would greatly appreciate your insight on the
following problem, which they have been discussing:
---------- Forwarded Message ----------
Subject: Handling escaped characters in Translation Units
Date: Tuesday 10 May 2005 16:52
From: Asgeir Frimannsson <asgeirf@redhat.com>
To: Paul Gampe <pgampe@redhat.com>
Cc: Jim Hogan <j.hogan@qut.edu.au>
Hi Paul,
Here's an issue we've been discussing at length on the xliff-tools
mailing list, in a thread initiated by Yves Savourel last week. I believe
this is an issue that needs a recommended approach from the XLIFF TC. Let me
know what you think :)
Handling Escaped Characters in Translation Units
In source code, it is very common to use escape sequences for characters
such as newline (\u000A) and horizontal tab (\u0009).
For example:
printf("Please Enter the following Data:\n\
\t- First Name\n\
\t- Last Name\n");
Here we've used the escape sequences '\n' and '\t' to represent newlines and
tabs.
This fragment would be represented in PO as follows:
msgid ""
"Please Enter the following Data:\n"
"\t- First Name\n"
"\t- Last Name\n"
This could be mapped to XLIFF using two different approaches:
Approach A:
We could preserve the escaped characters:
<source>Please Enter the following Data:\n\t- First Name\n\
\t- Last Name\n</source>
We could further enhance this by abstracting the escaped characters to <ph>
elements:
<source>Please Enter the following Data:<ph id='1' ctype='lb'>\n</ph>\
<ph id='2' ctype='x-ht'>\t</ph>- First Name<ph id='3' ctype='lb'>\n</ph>\
<ph id='4' ctype='x-ht'>\t</ph>- Last Name<ph id='5' ctype='lb'>\n</ph>\
</source>
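A minimal sketch of how a filter might produce the <ph> form, assuming Python.
The helper name to_source_with_ph and the CTYPES table are hypothetical; only
the 'lb' and 'x-ht' values come from the fragment above. Escape sequences that
are not in the table (including an escaped backslash) are passed through
unchanged:

```python
import re
from xml.sax.saxutils import escape

# Hypothetical mapping from escape sequence to ctype value.
CTYPES = {"\\n": "lb", "\\t": "x-ht"}

# Match a backslash plus the character after it, so that an escaped
# backslash ("\\\\") is consumed as one unit and never misread as "\\n".
ESCAPE_RE = re.compile(r"\\(.)", re.S)

def to_source_with_ph(text):
    """Wrap each literal \\n or \\t escape sequence in a <ph> element."""
    parts = []
    last = 0
    next_id = 1
    for m in ESCAPE_RE.finditer(text):
        seq = m.group(0)
        if seq not in CTYPES:
            continue  # unknown escape: leave it in the surrounding text
        parts.append(escape(text[last:m.start()]))
        parts.append("<ph id='%d' ctype='%s'>%s</ph>" % (next_id, CTYPES[seq], seq))
        next_id += 1
        last = m.end()
    parts.append(escape(text[last:]))
    return "<source>" + "".join(parts) + "</source>"
```

Text between the escapes is XML-escaped, so a message containing '<' or '&'
still yields well-formed markup.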
Issue A-1: If using this approach, would filters have to discard real newline
characters (\u000A) in translation units? How would this affect TM lookups?
Issue A-2: How would editors handle this approach? For software messages,
they would have to disable entering newlines, and in some way format the
message according to the values of the ctype attributes? (Not having visual
indicators for e.g. newlines would not be a very translator-friendly
approach.)
Issue A-3: Where do we stop? In Java .properties files we usually add a
"\u0020" to indicate a leading space. For example:
my_message = \u0020Some Text
Should this be represented as:
<source>\u0020Some Text</source>
or
<source> Some Text</source>
?
Approach B:
Many of the escaped characters have native Unicode values we could use in
XLIFF. We could replace '\t' with a real TAB (\u0009) character, and similarly
for other escape sequences, giving us the following XLIFF fragment:
<source>Please Enter the following Data:
- First Name
- Last Name
</source>
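A minimal sketch of the replacement step in Approach B, assuming Python. The
function name unescape and the UNESCAPE table are hypothetical, and the table
covers only a small subset of escapes; anything not listed is left as it was:

```python
import re

# Assumed subset of escape sequences to replace with real characters.
UNESCAPE = {"n": "\n", "t": "\t", "r": "\r", "\\": "\\", '"': '"'}

def unescape(text):
    """Replace known escape sequences with the characters they stand for."""
    def repl(m):
        # Unknown escapes are returned unchanged, backslash included.
        return UNESCAPE.get(m.group(1), m.group(0))
    return re.sub(r"\\(.)", repl, text, flags=re.S)
```

For instance, unescape(r"\t- First Name\n") returns a string that starts with
a real TAB and ends with a real newline.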
Issue B-1: DOS/Windows use "\r\n", while UNIX (and most programming
languages) use "\n" as line endings. On back-conversion, how would we know
whether to write "\n" or "\r\n" in the translated source file?
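One possible way around this, sketched in Python under the assumption that the
filter can store a small piece of metadata alongside the translation unit
(where it would be stored is left open here). The names extract and merge are
hypothetical. The line-ending style is recorded on extraction and reapplied on
back-conversion:

```python
def extract(text):
    """Normalise line endings to LF and report which style the source used."""
    style = "crlf" if "\r\n" in text else "lf"
    return text.replace("\r\n", "\n"), style

def merge(translated, style):
    """Reapply the recorded line-ending style to the translated text."""
    # Normalise first so a CRLF typed by a translator is not doubled.
    normalised = translated.replace("\r\n", "\n")
    if style == "crlf":
        return normalised.replace("\n", "\r\n")
    return normalised
```

This sketch does not preserve mixed line endings within one message; a source
that mixes both styles would come back with "\r\n" throughout.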
Issue B-2: There are some escape sequences used in PO (and probably other
source formats?) that stand for characters XML does not allow. For example
"\a" (\u0007, the Alert or Bell control character). How should these be
handled? (Yes, asking the developer what that character is doing in a
localised message is a good start.)
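A filter could at least detect such characters before writing XLIFF. The
sketch below, in Python, checks a string against the Char production of
XML 1.0; the function name find_invalid_xml_chars is hypothetical, and what to
do with the hits (reject, warn, or keep the escaped form) is left open:

```python
import re

# Anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
INVALID_XML_RE = re.compile(
    "[^\u0009\u000A\u000D\u0020-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)

def find_invalid_xml_chars(text):
    """Return (offset, code point) pairs for characters XML 1.0 forbids."""
    return [(m.start(), "U+%04X" % ord(m.group()))
            for m in INVALID_XML_RE.finditer(text)]
```

Tab, line feed and carriage return pass the check, while the other C0 control
characters, U+0007 among them, are reported.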
Conclusion
It would be good to have a recommended approach for handling this, which all
representation guides could share.
The full archived discussion on this is available at:
http://lists.freedesktop.org/archives/xliff-tools/2005-May/000169.html
cheers,
asgeir