xliff message
Subject: Processing extension elements
It seems to me that the fundamental question for extensions
in <source>/<target> is how "generic" tools will be able to
deal with them, while preserving them.
Here are the possible ways of processing extensions I can
think of (without worrying about how this would be expressed
in the XLIFF schema):
#1- The unknown elements and their content are stripped out.
#2- The unknown elements are stripped out, their content
left part of the <source>/<target>.
#3- The unknown elements are preserved and treated as <g>
(or <x/> if they are empty elements).
#4- The unknown elements are preserved and treated as <ph>
(their content is seen as code).
#5- The unknown elements have some XLIFF-understood
indication on how to be treated.
- "generic tool" means a tool that does the minimal
processing allowed by the specification. It does not know
any specific extension.
- "as seen by a generic tool" means how the unknown tags
would be interpreted in memory (regardless of how they are
actually represented) by tools that would not know what
to do with them.
- There are actually two cases of processing: during merge
and not during merge. During a merge process the unknown
elements should be ignored by the generic tool (just like
an <mrk> element). One has to decide what to do with the
content: discard it or treat it as part of the text.
Now let's see examples, pros, and cons for each case:
============================================================
#1- The unknown elements and their content are stripped out.
------------------------------------------------------------
The most drastic solution.
Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>
As seen by a generic tool:
<source xml:lang='en'>This is </source>
Saved by a generic tool:
<source xml:lang='en'>This is </source>
Probably not what we want, as extensions that enclose
the original content become a death trap for translatable
text.
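
A minimal Python sketch of what #1 could look like in a generic
tool, using the standard xml.etree.ElementTree module. It is only
an illustration: the namespace URI bound to the htm prefix is
hypothetical (the example above leaves it undeclared), and the
rule "any namespaced element is unknown" is an assumption made
for this sample.

```python
import xml.etree.ElementTree as ET


def is_extension(elem):
    # Assumption: XLIFF's own elements are un-namespaced in this sample,
    # so any namespaced element counts as "unknown".
    return elem.tag.startswith("{")


def strip_with_content(parent):
    """Solution #1: remove unknown elements together with their content."""
    prev = None  # last child that was kept
    for child in list(parent):
        if is_extension(child):
            # Text that follows the removed element still belongs to the parent.
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)
        else:
            strip_with_content(child)
            prev = child
    return parent


# Hypothetical namespace URI for the htm prefix.
source = ET.fromstring(
    "<source xmlns:htm='urn:example:htm' xml:lang='en'>"
    "This is <htm:b>big</htm:b></source>"
)
strip_with_content(source)
seen_text = "".join(source.itertext())
print(repr(seen_text))
```

The word "big" is gone from seen_text, which is the death-trap
effect described above.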
============================================================
#2- The unknown elements are stripped out, their content
left part of the <source>/<target>.
------------------------------------------------------------
Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>
As seen by a generic tool:
<source xml:lang='en'>This is big</source>
Saved by a generic tool:
<source xml:lang='en'>This is big</source>
A very simple way to deal with unknown tags. But it would
add unwanted content if the content of the extension
elements is really metadata, as shown below.
Original entry:
<source xml:lang='en'>This is
<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def>
</source>
As seen by a generic tool:
<source xml:lang='en'>This is big'big</source>
Saved by a generic tool:
<source xml:lang='en'>This is big'big</source>
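
The unwrapping in #2 can be sketched in Python as below. This is
a rough illustration rather than a reference implementation: the
namespace URIs are hypothetical, the "namespaced means unknown"
test is an assumption, and the metadata sample is written on one
line so that whitespace does not obscure the result.

```python
import xml.etree.ElementTree as ET


def is_extension(elem):
    # Assumption: any namespaced element is unknown to the generic tool.
    return elem.tag.startswith("{")


def unwrap_extensions(parent):
    """Solution #2: drop unknown tags but keep their content in place."""
    for child in list(parent):
        unwrap_extensions(child)  # clean the subtrees first

    children = list(parent)
    for child in children:
        parent.remove(child)

    kept = []  # children that survive, in document order

    def append_text(text):
        # Attach text to whatever precedes it: the last kept child or the parent.
        if not text:
            return
        if kept:
            kept[-1].tail = (kept[-1].tail or "") + text
        else:
            parent.text = (parent.text or "") + text

    for child in children:
        if is_extension(child):
            append_text(child.text)
            kept.extend(list(child))  # known grandchildren move up one level
            append_text(child.tail)
        else:
            kept.append(child)
    parent.extend(kept)
    return parent


# Hypothetical namespace URIs for the htm and x prefixes.
simple = ET.fromstring(
    "<source xmlns:htm='urn:example:htm' xml:lang='en'>"
    "This is <htm:b>big</htm:b></source>"
)
metadata = ET.fromstring(
    "<source xmlns:x='urn:example:x' xml:lang='en'>This is "
    "<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def></source>"
)
simple_text = "".join(unwrap_extensions(simple).itertext())
metadata_text = "".join(unwrap_extensions(metadata).itertext())
print(simple_text)
print(metadata_text)
```

The second result shows the pronunciation metadata merged into
the translatable text, as in the example above.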
============================================================
#3- The unknown elements are preserved and treated as <g>
(or <x/> if they are empty elements).
------------------------------------------------------------
Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>
As seen by a generic tool:
<source xml:lang='en'>This is <g id='0'>big</g></source>
Saved by a generic tool:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>
This solution would also add unwanted content if the
content of the extension elements is really metadata, as
shown below.
Original entry:
<source xml:lang='en'>This is
<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def>
</source>
As seen by a generic tool:
<source xml:lang='en'>This is <g id='0'><g id='1'>big</g>
<g id='2'>'big</g></g></source>
Saved by a generic tool:
<source xml:lang='en'>This is
<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def>
</source>
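
One way a generic tool might implement #3 is to keep a lookup
table from the generated ids to the original element names, so
the <g>/<x/> view can be turned back into the original markup on
save. The Python sketch below assumes hypothetical namespace
URIs, assumes that any namespaced element is unknown, and assumes
the entry contains no native <g> or <x/> whose id could collide
with the generated ones.

```python
import copy
import xml.etree.ElementTree as ET


def is_extension(elem):
    # Assumption: any namespaced element is unknown to the generic tool.
    return elem.tag.startswith("{")


def to_generic_view(source):
    """Solution #3: show unknown elements as <g>, or <x/> when empty."""
    view = copy.deepcopy(source)
    table = {}  # generated id -> (original tag, original attributes)
    next_id = 0
    for elem in view.iter():  # document order, so ids follow the markup
        if is_extension(elem):
            key = str(next_id)
            table[key] = (elem.tag, dict(elem.attrib))
            is_empty = len(elem) == 0 and not (elem.text or "").strip()
            elem.tag = "x" if is_empty else "g"
            elem.attrib.clear()
            elem.set("id", key)
            next_id += 1
    return view, table


def restore(view, table):
    """Put the original extension elements back before saving."""
    out = copy.deepcopy(view)
    for elem in out.iter():
        if elem.tag in ("g", "x") and elem.get("id") in table:
            tag, attrib = table[elem.get("id")]
            elem.tag = tag
            elem.attrib.clear()
            elem.attrib.update(attrib)
    return out


# Hypothetical namespace URI for the x prefix.
source = ET.fromstring(
    "<source xmlns:x='urn:example:x' xml:lang='en'>This is "
    "<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def></source>"
)
view, table = to_generic_view(source)
seen = [(e.tag, e.get("id")) for e in view.iter()]
saved = restore(view, table)
print(seen)
```

The view nests three <g> elements with ids 0, 1 and 2, and
restore() gives back a tree that serializes like the original.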
============================================================
#4- The unknown elements are preserved and treated as a <ph>
(their content is seen as code).
------------------------------------------------------------
This is John's scenario (I think). It works fine if the
content of all extension elements is metadata.
Original entry:
<source xml:lang='en'>This is big<x:note>blah blah</x:note>
</source>
As seen by a generic tool:
<source xml:lang='en'>This is big<ph id='0'>blah blah</ph>
</source>
Saved by a generic tool:
<source xml:lang='en'>This is big<x:note>blah blah</x:note>
</source>
But it does not work for text content inside extension
elements, as it would be seen as "code".
Original entry:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>
As seen by a generic tool:
<source xml:lang='en'>This is <ph id='0'>big</ph></source>
(Code not text --------------------------^ )
Saved by a generic tool:
<source xml:lang='en'>This is <htm:b>big</htm:b></source>
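
The effect of #4 on what a translator gets to see can be sketched
as a text-extraction function that treats the content of every
unknown element as code. This is an illustrative Python sketch
only: translatable_text is a made-up helper name, the namespace
URIs are hypothetical, and "namespaced means unknown" is an
assumption.

```python
import xml.etree.ElementTree as ET


def is_extension(elem):
    # Assumption: any namespaced element is unknown to the generic tool.
    return elem.tag.startswith("{")


def translatable_text(elem):
    """Solution #4: text exposed for translation when unknown = <ph>."""
    parts = [elem.text or ""]
    for child in elem:
        if not is_extension(child):
            # Known inline element: its content is still text.
            parts.append(translatable_text(child))
        # Unknown element: acts like <ph>, so its content is skipped as code.
        parts.append(child.tail or "")
    return "".join(parts)


# Hypothetical namespace URIs for the x and htm prefixes.
note_entry = ET.fromstring(
    "<source xmlns:x='urn:example:x' xml:lang='en'>"
    "This is big<x:note>blah blah</x:note></source>"
)
bold_entry = ET.fromstring(
    "<source xmlns:htm='urn:example:htm' xml:lang='en'>"
    "This is <htm:b>big</htm:b></source>"
)
note_text = translatable_text(note_entry)
bold_text = translatable_text(bold_entry)
print(repr(note_text))
print(repr(bold_text))
```

The note is correctly hidden in the first entry, but in the
second entry the word "big" is hidden as well.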
============================================================
#5- The unknown elements have some XLIFF-understood
indication on how to be treated.
------------------------------------------------------------
There are two ways to indicate this:
By an XLIFF-defined attribute the extension elements would
have or by enclosing the extensions in a special new XLIFF
element such as <extend>.
Original entry:
<source xml:lang='en'>This is
<x:def xlf:totrans='yes'><x:term>big</x:term><x:pron
xlf:totrans='no'>'big</x:pron></x:def></source>
As seen by a generic tool:
<source xml:lang='en'>This is <g id='0'><g id='1'>big</g>
<ph id='2'>'big</ph></g></source>
Saved by a generic tool:
<source xml:lang='en'>This is
<x:def xlf:totrans='yes'><x:term>big</x:term><x:pron
xlf:totrans='no'>'big</x:pron></x:def></source>
This is more flexible, since it lets you specify how each
element should be processed.
However, it may not always be doable, as the extension
elements may belong to a namespace that does not itself allow
extensions, so you would not be able to use xlf:totrans (or
whatever flag is decided on). In that case the solution would
be to use an <extend> XLIFF element, as Matt (I think)
suggested. But as you can imagine, this would start to make
the <source>/<target> content rather crowded.
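
The attribute variant of #5 could be sketched in Python as below.
Several things here are assumptions, not part of the proposal:
the namespace URIs are hypothetical, any namespaced element is
taken to be unknown, and an element without xlf:totrans is taken
to inherit the value of its nearest flagged ancestor (defaulting
to 'yes'), which is one reading of why <x:term> shows up as <g>
in the example.

```python
import copy
import xml.etree.ElementTree as ET

XLF_NS = "urn:example:xlf"        # hypothetical URI for the xlf prefix
TOTRANS = "{%s}totrans" % XLF_NS  # the xlf:totrans flag from the example


def is_extension(elem):
    # Assumption: any namespaced element is unknown to the generic tool.
    return elem.tag.startswith("{")


def to_generic_view(source):
    """Solution #5: xlf:totrans picks <g> (text) or <ph> (code)."""
    view = copy.deepcopy(source)
    next_id = 0

    def walk(parent, inherited):
        nonlocal next_id
        for child in parent:
            flag = inherited
            if is_extension(child):
                # Assumption: an unflagged element inherits from its ancestors.
                flag = child.get(TOTRANS, inherited)
                child.tag = "g" if flag == "yes" else "ph"
                child.attrib.clear()
                child.set("id", str(next_id))
                next_id += 1
            walk(child, flag)

    walk(view, "yes")
    return view


source = ET.fromstring(
    "<source xmlns:x='urn:example:x' xmlns:xlf='urn:example:xlf' "
    "xml:lang='en'>This is <x:def xlf:totrans='yes'><x:term>big</x:term>"
    "<x:pron xlf:totrans='no'>'big</x:pron></x:def></source>"
)
view = to_generic_view(source)
seen = [(e.tag, e.get("id")) for e in view.iter()]
print(seen)
```

Saving would need the same kind of id-to-original lookup as any
other approach that preserves the unknown elements; it is left
out here to keep the sketch short.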
============================================================
Personal opinion
------------------------------------------------------------
It seems that allowing extension elements that can have
either translatable or "code" content in <source>/<target>
would add a significant cost in processing and complexity,
and I'm not sure allowing code content (i.e. metadata)
would be wise anyway.
I see no big problem with the <htm:b>-type of extensions, as
they are simply a more customized way of using <mrk>, and
"generic" tools could probably deal with them without too
much change in their implementation. So I tend to like
solution #3 better (at least for now).
-yves