Fredrik,
Thank you for your insights. I’m
using XSLT to extract & merge XLIFF with a skeleton. In fact, for
convenience I use XLIFF 1.1 for the skeleton too. The original content is XHTML
and XML. The <g> tag seems natural for inline tags and assures that the
merged content will be well-formed. I would be concerned that <bpt> and
<ept> might not match correctly. For example,
ORIGINAL SOURCE
Italic texts starts <i><b>in the middle of
first sentence</b>. Italics ends after the second
sentence.</i>
XLIFF SOURCE
<source>Italic texts starts <bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>in
the middle of
first sentence<ept id='b1' ctype='x-html-b'/>. Italics
ends after the second sentence.<ept
id='i1' ctype='x-html-i'/></source>
XLIFF TARGET
<target>Italic texts starts <bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>in
the middle of
first sentence<ept id='i1' ctype='x-html-i'/><ept id='b1' ctype='x-html-b'/>.
<bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>Italics
ends after the second sentence.<ept
id='i1' ctype='x-html-i'/></target>
MERGED TRANSLATION
Italic texts starts <i><b>in the middle of
first sentence</i></b>. <i><b>Italics ends after the second
sentence.</i>
Notice that in the merged translation <i> and <b>
overlap and that a closing </b> is missing, even though the contents of the
<target> tag are well-formed.
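
The mismatch can be checked mechanically. Below is a minimal sketch in Python, assuming the simplified empty-element <bpt/> and <ept/> notation used in the example above rather than the full XLIFF content model; the function name check_pairing is hypothetical. The XML parser accepts the target without complaint, while a stack-based check reports the overlap and the unclosed code.

```python
import xml.etree.ElementTree as ET

# The XLIFF TARGET from the example above, in the same simplified notation.
TARGET = (
    "<target>Italic texts starts <bpt id='i1' ctype='x-html-i'/>"
    "<bpt id='b1' ctype='x-html-b'/>in the middle of first sentence"
    "<ept id='i1' ctype='x-html-i'/><ept id='b1' ctype='x-html-b'/>. "
    "<bpt id='i1' ctype='x-html-i'/><bpt id='b1' ctype='x-html-b'/>"
    "Italics ends after the second sentence."
    "<ept id='i1' ctype='x-html-i'/></target>"
)


def check_pairing(target_xml):
    """Return a list of pairing problems among <bpt/> and <ept/> markers."""
    # Parsing succeeds: the XLIFF itself is well-formed XML.
    root = ET.fromstring(target_xml)
    problems = []
    stack = []  # ids of the codes that are currently open
    for elem in root.iter():
        if elem.tag == "bpt":
            stack.append(elem.get("id"))
        elif elem.tag == "ept":
            code_id = elem.get("id")
            if stack and stack[-1] == code_id:
                stack.pop()  # properly nested close
            elif code_id in stack:
                problems.append(
                    f"overlap: '{code_id}' closed before '{stack[-1]}'"
                )
                stack.remove(code_id)
            else:
                problems.append(f"unmatched close: '{code_id}'")
    # Anything still open at the end was never closed.
    problems.extend(f"never closed: '{code_id}'" for code_id in stack)
    return problems


for problem in check_pairing(TARGET):
    print(problem)
```

A check of this kind would have to run in the editor or in the merge step, because XML validation of the XLIFF file alone does not catch the problem.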
I share some of the same feelings about
the ‘id’ attribute in <trans-unit>. I was using it to
reference back to the skeleton and it was not unique. Then I noticed that the ‘xid’
attribute is used to reference <trans-unit> or <bin-unit> elements,
which I thought would be tenuous because of the lack of uniqueness. In
conversations with Rodolfo, it came to light that the ‘id’
attribute of TU and BU must, in fact, be unique within the <file>
element. It’s implied in the spec, but not explicit. Realizing that, I
couldn’t use it to reference the skeleton, so I’m left with
assigning an arbitrary ‘id’ value because it’s required and
either using a custom attribute (XLIFF 1.1+) or overloading ‘resname’
in XLIFF 1.0.
Perhaps we should change the spec to not
require the ‘id’ attribute, similar to <group>. In XSLT, I’m
left with either using generate-id(), which produces unseemly values like ‘ID3F2D83’
in most XML processors, or using an extension to run JavaScript to assign a
sequence number, for example, 1, 2, 3, etc.
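
For comparison, here is what the sequence-number approach looks like outside XSLT. This is a minimal sketch in Python, not the XSLT extension described above; it assumes un-namespaced XLIFF element names for brevity, and the helper name number_units and the sample document are hypothetical. Numbering restarts in each <file>, which is enough to make the ‘id’ unique within the <file> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; real XLIFF 1.1 files declare a namespace,
# which is omitted here to keep the sketch short.
SAMPLE = (
    "<xliff version='1.1'>"
    "<file original='a.html' source-language='en' datatype='html'><body>"
    "<trans-unit><source>One</source></trans-unit>"
    "<group><trans-unit><source>Two</source></trans-unit></group>"
    "<bin-unit mime-type='image/png'/>"
    "</body></file>"
    "<file original='b.html' source-language='en' datatype='html'><body>"
    "<trans-unit><source>Three</source></trans-unit>"
    "</body></file>"
    "</xliff>"
)


def number_units(xliff_root):
    """Assign 1, 2, 3, ... to every trans-unit and bin-unit in each <file>."""
    for file_elem in xliff_root.iter("file"):
        counter = 0  # restart the sequence for every <file>
        for unit in file_elem.iter():
            if unit.tag in ("trans-unit", "bin-unit"):
                counter += 1
                unit.set("id", str(counter))
    return xliff_root


root = number_units(ET.fromstring(SAMPLE))
for file_elem in root.iter("file"):
    ids = [u.get("id") for u in file_elem.iter() if u.get("id")]
    print(file_elem.get("original"), ids)
```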
I’m using ‘id’ in
<g> to reference the skeleton. I’m concerned that segmentation will
cause problems with referencing the skeleton. To illustrate, please consider
the example from above.
ORIGINAL SOURCE
Italic texts starts <i><b>in the middle of
first sentence</b>. Italics ends after the second
sentence.</i>
XLIFF SOURCE
<source>Italic texts starts <g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>in
the middle of
first sentence</g>. Italics ends after the second
sentence.</g></source>
where ‘1’ references <i>
and ‘2’ references <b>.
XLIFF TARGET SEGMENTED
<target>Italic texts starts <g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>in
the middle of first sentence</g></g>. <g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>Italics
ends after the second sentence.</g></g> </target>
Note that the tags are duplicated so there
are two <g> tags with id=’1’ and two with id=’2’.
There are two <g> tags that map to one <i> in the skeleton and two
to one <b>. This scenario precludes merging the target text with the
skeleton for inline tags that have been duplicated as a result of segmentation
or reordering. Perhaps the target text should not be merged with the skeleton,
but simply reconstructed. This would be a blending of the minimal (with
skeleton) and maximal (no skeleton for inline tags) approaches.
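
To make the reconstruction idea concrete, here is a minimal sketch in Python. It assumes that every <g> carries a ‘ctype’ of the form ‘x-html-NAME’ from which the original element name can be recovered; that convention comes from the examples above and is not a rule of the XLIFF spec. Attributes of the original inline elements are not handled and would still have to come from somewhere else. The function names duplicated_ids and reconstruct are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# The segmented XLIFF TARGET from the example above.
SEGMENTED_TARGET = (
    "<target>Italic texts starts <g id='1' ctype='x-html-i'>"
    "<g id='2' ctype='x-html-b'>in the middle of first sentence</g></g>. "
    "<g id='1' ctype='x-html-i'><g id='2' ctype='x-html-b'>Italics "
    "ends after the second sentence.</g></g> </target>"
)

PREFIX = "x-html-"  # assumed ctype convention, taken from the examples


def duplicated_ids(target_xml):
    """Return the <g> ids that occur more than once in the target."""
    root = ET.fromstring(target_xml)
    counts = Counter(g.get("id") for g in root.iter("g"))
    return sorted(code_id for code_id, n in counts.items() if n > 1)


def reconstruct(target_xml):
    """Rebuild the inline markup from ctype alone, ignoring the skeleton."""
    root = ET.fromstring(target_xml)
    for g in list(root.iter("g")):
        ctype = g.get("ctype", "")
        if not ctype.startswith(PREFIX):
            raise ValueError(f"cannot map ctype {ctype!r} to an element name")
        g.tag = ctype[len(PREFIX):]  # 'x-html-i' becomes 'i'
        g.attrib.clear()             # drop the XLIFF-only attributes
    # Serialize the content of <target> without the <target> wrapper.
    parts = [root.text or ""]
    parts.extend(ET.tostring(child, encoding="unicode") for child in root)
    return "".join(parts)


print(duplicated_ids(SEGMENTED_TARGET))
print(reconstruct(SEGMENTED_TARGET))
```

Because the output is serialized from a parsed tree, the reconstructed markup is always properly nested, whereas a lookup of id ‘1’ or ‘2’ in the skeleton would be ambiguous for this target.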
The conversion to TMX seems worth
considering too.
Fortunately, none of these issues seem
insurmountable. It’s mostly a matter of clearing up ambiguities as we
resolve interoperability issues and establish best practices.
Regards,
Doug Domeny
Software Analyst
Ektron, Inc.
+1 603 594-0249 x212
http://www.ektron.com
Hello,
I just joined and this is my first post!
My XLIFF experience is mostly as a XLIFF
Editor/filter programmer (Transolution).
I must say that from my
point of view I much prefer the <bpt>/<ept> way of wrapping inline tags,
and if the editor has tag checking it's easy to check that they are valid.
I had the same problem
with deciphering the use of the <g> tag from the spec as Rodolfo, and until I
read the "XLIFF 1.2 Representation Guide for HTML" I was hoping I
never had to deal with them as containing translatable content. XLIFF is quite
a lot to digest and the <g> tag really doubles the effort as it breaks the
simple logic that can be used on a flat structure for translatable content.
Also, at some point you will need to convert XLIFF to TMX and then you need to
convert it to <bpt>/<ept> anyway. Using <ph>/<bpt>/<ept> gives you a very
generic and straightforward approach: you preserve the original source
format information exactly as it is and you can treat all formats the same.
That said, I can see why people like the <g> approach. It's easier to wrap in
existing translation tools and process with XSLT, it also looks nicer in a text
editor, and I suppose it lessens the need for skeleton files.
I have an implementation question regarding the <g> tag. In the XLIFF
documentation, the specification of the <g> tag's "id" attribute is
different from that of <ph>/<bpt>/<ept>:
ph-tag:
The required id attribute is used to identify the <ph> inline code
g-tag:
The required id attribute is used to reference the replaced code in the
skeleton file.
Does this mean that there
can be <g> and <ph> tags with the same id in a segment? And what
if there is no skeleton file?
This brings me to a general complaint about the XLIFF spec: it is very vague
and leaves a lot of room for personal taste and/or misunderstandings. This
makes it hard to create a generic editor that works with XLIFF files in the wild.
For example, TUs have a required ID attribute, but it can be anything and does not
even have to be unique, so why is it required in the first place?
Cheers,
Fredrik Corneliusson
From:
bryan.s.schnabel@exgate.tek.com [mailto:bryan.s.schnabel@exgate.tek.com]
Sent: den 7 mars 2006 01:26
To: ddomeny@ektron.com;
rodolfo@heartsome.net
Cc: xliff@lists.oasis-open.org
Subject: RE: [xliff] RE: How to
translate text within G tags?
I thought about this when
I wrote that portion of the HTML profile.
From a philosophical
view, I strongly think bpt/ept should only be used in XLIFF files that
are derived from non-markup formats (RTF, for example).
I really don't like the
idea of using bpt/ept on XLIFF files derived from HTML, XHTML, or XML
files. I see "begin paired tag" and "end paired tag"
as an artificial device. It could easily lead to malformed XML on the
conversion from XLIFF back to HTML.
Assuming the source file
is well formed, it would be a shame to have to delimit inline elements in an
artificial way. If <g> tags are defined in the spec in such a way that
they are thought to be for non-translatable text, I would vote to either update
the specification, or come up with a new element for identifying translatable
inline elements in <target> elements.
Thanks to Doug and
Rodolfo for bringing this issue to light,