Excellent proposal, Doug!
It doesn't quite solve modifications to data in the absence of text,
but it definitely provides a suitable compromise for 1.1.
Mark Levins
IBM Software Group,
Dublin Software Laboratory,
Airways Industrial Estate,
Cloghran,
Dublin 17,
Ireland.
Phone: +353 1 704 6676
IBM Tie Line 166676
Doug Domeny <ddomeny@ektron.com>
23/01/2003 18:05
Please respond to ddomeny@ektron.com
To: xliff@lists.oasis-open.org
cc:
Subject: RE: [xliff] From Mat Lovatt: reformat Summary Of Options.doc
Thank you for the summary, Tony. I agree with the
options, but I have a few comments about compatibility and the need
to retool. And I actually have another option too.
I refer to the guideline for minor releases
( http://lists.oasis-open.org/archives/xliff/200208/msg00005.html ):
"Shall be comprised of small changes that
would not require re-qualification of supporting tools or technologies"
There are several aspects to compatibility to consider:
1. XLIFF 1.0 document validates against XLIFF 1.1
schema. Given the flexibility of schemas, it would almost always
be possible to create a schema that allowed both 1.0 and 1.1 structures.
2. XLIFF 1.1 tool can process either XLIFF 1.0 or
1.1 documents without requiring extensive effort to handle
XLIFF 1.0 documents.
3. XLIFF 1.0 tool can process either XLIFF 1.0 or
1.1 documents without modification (assuming a reasonably careful
implementation).
Aspects #1 and #2 deal with backward compatibility (from
the tool's perspective). That is, new tools and new schemas handle
old data. The issue is not one of possibility, but of practicality.
Is it easy to create the tools?
Aspect #3 is forward compatibility (from the tool's
perspective). That is, can the old tool handle the new data? This
is similar to asking whether MS Word 97 can read an MS Word 2000 document
(allowing for some loss). Another example is whether an old browser,
say IE 3, can render a new HTML document, say XHTML 1.0, again allowing
for some loss for unknown tags. The primary rule for forward compatibility
in a browser is, "render the contents of an unknown tag". This
aspect of forward compatibility is crucial to meeting the guideline
for not re-qualifying supporting tools.
XLIFF tools, however, are not as simple as browsers.
An XLIFF tool must be able to modify the contents, not just render
them. Because the contents must be modified, the XLIFF tool requires
more knowledge of the tags. This is why adding extension points
(non-XLIFF tags) to content within <source> and <target> has
been deferred.
Here are some comments regarding each option listed
below as they pertain to "re-qualification of supporting tools
or technologies".
Option 1 (siblings)
I believe this is forward compatible, assuming the
tool doesn't assume that <target> immediately follows <source>.
The other concern is how <target-info>
appears in <alt-trans> where multiple <target> elements
are allowed.
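To illustrate the assumption behind Option 1, here is a minimal Python sketch (not taken from any existing XLIFF tool) of looking up <target> by element name rather than by position. The <source-info> and <target-info> names are the ones proposed in this thread; the sample unit and the function names find_target and target_by_position are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical Option 1 trans-unit: <source-info> sits between <source> and <target>.
SAMPLE = """
<trans-unit id="1">
  <source>Source Text</source>
  <source-info/>
  <target>Translated Text</target>
  <target-info/>
</trans-unit>
"""

def find_target(trans_unit):
    """Return the <target> child by element name, wherever it sits among the children."""
    return trans_unit.find("target")

def target_by_position(trans_unit):
    """Fragile variant: assumes <target> is the element right after <source>."""
    children = list(trans_unit)
    return children[1] if len(children) > 1 else None

unit = ET.fromstring(SAMPLE)
print(find_target(unit).tag)         # target
print(target_by_position(unit).tag)  # source-info
```

A tool written like find_target would keep working with the sibling layout; one written like target_by_position would pick up <source-info> by mistake.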
I took another look at the XLIFF 1.0 DTD. Here are
the <trans-unit> and <alt-trans> definitions:
<!ELEMENT trans-unit (source,target?,(count-group|note|context-group|prop-group|alt-trans)*)>
<!ELEMENT alt-trans (source?,target+,(note|context-group|prop-group)*)>
The new DTD would be:
<!ELEMENT trans-unit (source, source-info?, target?, target-info?,
    (count-group|note|context-group|prop-group|alt-trans)*)>
<!ELEMENT alt-trans (source?, source-info?, (target, target-info?)+,
    (note|context-group|prop-group)*)>
I think we all have some reservations about this
approach because it is awkward to have two source elements and worse yet,
difficult to match a given <target-info> element with its corresponding
<target> element.
Option 2 (restructure)
We all agree this is a clean structure but not compatible.
Option 3 (embedded)
Allow me to give a different example using a <font>
tag and a placeholder tag.
<trans-unit id="Option 1" translate="yes">
  <source><font face="Arial" size="2"></font><ph/>Source Text</source>
  <target><font face="Arial" size="3"></font><ph/>Translated Text </target>
</trans-unit>
The inclusion of extension points for <source>
and <target> is deferred because they introduce unknown tags
into text that is processed by a TM tool. This option introduces
unknown tags to the text content. This option isn't fully compatible
because the TM tool will need to ignore <font> and other unknown
tags. Granted, the unknown tags should come before the rest of the text
to be translated, but I still do not believe it is forward compatible.
Besides, correctly parsing this structure is almost impossible.
How does the tool know which tag is the last format tag and which
is the first inline "placeholder" tag? Adding more "placeholder"
tags to the specification would be impossible because the tool would
have to assume any unknown tag is a format tag. This appears to not
be a viable option.
Option 4 (combined)
This really isn't technically different than Option
2 other than to say that the XLIFF 1.1 schema and XLIFF 1.1 tools
must support the old XLIFF 1.0 structure as well as the new structure.
I do believe the effort is minimal to have the <source-info>
and <target-info> tags be optional. However, if they are present, they
are likely to break existing XLIFF 1.0 tools that look for the <source>
as an immediate child of <trans-unit>. For instance, my existing XSL
transforms would need to be updated to support XLIFF 1.1 documents.
Therefore, this option isn't fully compatible with 1.0 even though
it is backward compatible.
With all this said, I went back to determine the
original purpose for proposing elements for reformatting. The issue
is being able to specify which format values may be modified
during translation. In XLIFF 1.0, as you know, there are several
attributes that specify formatting for the text, namely coord,
font, css-style, style, and exstyle. The 'reformat' attribute of
<trans-unit> is either "yes" or "no" indicating
whether any or none of the format attribute values can be changed.
The changed value is stored in the <target> tag.
The problem is that 'reformat' does not give sufficient
control to say that some formats may be changed but others
may not. For example, there is no way to allow coord-cx to change while
keeping coord-x and coord-y fixed. The original proposal was to turn each
format attribute into an element, each with its own 'reformat'
attribute. This approach is fine except for the compatibility problems
that have been discussed at length.
Here's the new option.
Extend the possible values for the 'reformat' attribute
to provide sufficient control. XLIFF 1.0 presently uses ";"-delimited
lists within attribute values to store multiple values. The 'coord'
attribute is an example. Its value is actually four values, "x;y;cx;cy",
where "#" can be used for 'don't care'.
So let's extend 'reformat' the same way. Of course,
we keep "yes" and "no" for compatibility.
"yes" = all format attributes may be changed
"no" = no format attributes may be changed
...or a semicolon-delimited list of the following
in any order. If an attribute is listed, it means it may be reformatted.
coord = all 4 coords
coord-x
coord-y
coord-cx
coord-cy
font = all 3 font values
font-name
font-size
font-weight
css-style
style
exstyle
Example:
<trans-unit coord="#;#;183;272" font="Arial;2;normal" reformat="coord-cx;font-name" ...>
  <source>...</source>
  <target coord="#;#;181;272" font="System;2;normal">...</target>
  <alt-trans coord="#;#;183;272" font="Arial;2;normal">
    <target coord="#;#;180;272" font="Arial Bold;2;normal">...</target>
    <target coord="#;#;185;272" font="Arial, Helvetica;2;normal">...</target>
  </alt-trans>
</trans-unit>
Parsing the reformat list is fairly easy, even with XSLT,
which has a limited set of string functions.
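For comparison, here is a rough Python sketch of how a tool might parse the extended 'reformat' value. The token names are the ones listed in this message; the function names parse_reformat and changed_coords, and the choice to silently ignore unknown tokens, are assumptions made for illustration and are not part of any XLIFF specification.

```python
# Tokens proposed in this message; "coord" and "font" are shorthand for groups.
GROUPS = {
    "coord": ("coord-x", "coord-y", "coord-cx", "coord-cy"),
    "font": ("font-name", "font-size", "font-weight"),
}
SIMPLE = ("css-style", "style", "exstyle")
ALL_FIELDS = frozenset(f for g in GROUPS.values() for f in g) | frozenset(SIMPLE)

def parse_reformat(value="yes"):
    """Return the set of format fields that may be changed."""
    value = value.strip()
    if value == "yes":
        return set(ALL_FIELDS)
    if value == "no":
        return set()
    allowed = set()
    for token in value.split(";"):
        token = token.strip()
        if token in GROUPS:
            allowed.update(GROUPS[token])   # expand "coord" / "font"
        elif token in ALL_FIELDS:
            allowed.add(token)
        # Unknown tokens are ignored; that is a choice of this sketch.
    return allowed

def changed_coords(before, after):
    """Compare two "x;y;cx;cy" values and name the components that differ."""
    names = GROUPS["coord"]
    pairs = zip(names, before.split(";"), after.split(";"))
    return {name for name, b, a in pairs if b != a}

allowed = parse_reformat("coord-cx;font-name")
changed = changed_coords("#;#;183;272", "#;#;181;272")
print(sorted(allowed))      # ['coord-cx', 'font-name']
print(changed <= allowed)   # True
```

With the values from the example above, the only coordinate that differs between <trans-unit> and <target> is coord-cx, which the reformat list permits.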
This option is 100% compatible, both forward and
backward. It does not affect the structure at all. The only problem
I can foresee is how an XLIFF 1.0 tool treats a value for
reformat that it does not recognize. If the tool assumes "yes", it could
interpret a value of "coord-cx;font-name" as "yes" and allow format
values to be changed that should not be. If it assumes "no" instead, it
would not allow any changes. Since the default value for 'reformat' is
"yes", I don't see either of the possibilities as being
too harmful.
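The two fallback behaviours can be sketched in a few lines of Python. This models a hypothetical XLIFF 1.0 tool that only understands "yes" and "no"; the function name legacy_reformat and its fallback parameter are invented for this illustration and do not describe any real tool.

```python
def legacy_reformat(value, fallback):
    """Model an XLIFF 1.0 tool that only understands "yes" and "no".

    `fallback` is what the hypothetical tool assumes for any other value.
    """
    if value in ("yes", "no"):
        return value
    return fallback

for fallback in ("yes", "no"):
    effective = legacy_reformat("coord-cx;font-name", fallback)
    outcome = "all changes allowed" if effective == "yes" else "no changes allowed"
    print(fallback, "->", outcome)
```

Either way the old tool still processes the document; it is just more permissive or more restrictive than the list intended.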
Regards,
Doug Domeny
Ektron, Inc.
+1 603 594-0249
http://www.ektron.com