Thank you for the summary, Tony. I agree with the options, but I have a few comments
about compatibility and the need to retool. And I actually have another option
too.
- Shall be comprised of small changes that would not require re-qualification of
supporting tools or technologies
There are several aspects to compatibility to consider:
1. XLIFF 1.0 document validates against XLIFF 1.1 schema. Given the flexibility of
schemas, it would almost always be possible to create a schema that allowed both
1.0 and 1.1 structures.
2. XLIFF 1.1 tool can process either XLIFF 1.0 or 1.1 documents without
requiring extensive effort to handle XLIFF 1.0 documents.
3. XLIFF 1.0 tool can process either XLIFF 1.0 or 1.1 documents without
modification (assuming a reasonably careful implementation).
Aspects #1 and #2 deal with backward compatibility (from the tool's
perspective). That is, new tools and new schemas handle old data. The issue is
not one of possibility, but of practicality. Is it easy to create the
tools?
Aspect #3 is forward compatibility (from the tool's perspective). That is, can the old
tool handle the new data? This is similar to asking whether MS Word 97 can read
a MS Word 2000 document (allowing for some loss). Another example is whether an
old browser, say IE 3, can render a new HTML document, say XHTML 1.0. Again,
allowing for some loss for unknown tags. The primary rule for forward
compatibility in a browser is, "render the contents of an unknown tag". This
aspect of forward compatibility is crucial to meeting the guideline for not
re-qualifying supporting tools.
XLIFF tools, however, are not as simple as browsers. An XLIFF tool must be able to
modify the contents, not just render them. Because the contents must be
modified, the XLIFF tool requires more knowledge of the tags. This is why adding
extension points (non XLIFF tags) to content within <source> and
<target> has been deferred.
Here are some comments regarding each option listed below as they pertain to
"re-qualification of supporting tools or technologies".
Option 1 (siblings)
I believe this is forward compatible, assuming the tool doesn't assume that
<target> immediately follows <source>.
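To illustrate the assumption at stake, here is a minimal Python sketch (not taken from any actual XLIFF tool) that contrasts a positional lookup with a lookup by element name. The sample unit and the empty <source-info/> and <target-info/> elements are placeholders of my own; the content of those elements is not defined here.

```python
import xml.etree.ElementTree as ET

# Hypothetical Option 1 trans-unit; the *-info elements are empty placeholders.
UNIT = """<trans-unit id="1">
  <source>Hello</source>
  <source-info/>
  <target>Bonjour</target>
  <target-info/>
</trans-unit>"""

unit = ET.fromstring(UNIT)
children = list(unit)

# Fragile: assumes <target> is the element right after <source>.
positional = children[1]

# Robust: looks the element up by name, wherever it sits among the children.
by_name = unit.find("target")

print(positional.tag)  # the sibling inserted by Option 1, not <target>
print(by_name.text)    # the translated text
```

A tool written in the second style would keep working with the sibling layout; one written in the first style would not.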
The other concern is how <target-info> appears in <alt-trans> where
multiple <target> elements are allowed.
I took another look at the XLIFF 1.0 DTD. Here are the <trans-unit> and
<alt-trans> definitions:
<!ELEMENT trans-unit (source,target?,(count-group|note|context-group|prop-group|alt-trans)*) >
<!ELEMENT alt-trans (source?,target+,(note|context-group|prop-group)*) >
The new DTD would be:
<!ELEMENT trans-unit (source, source-info?, target?, target-info?, (count-group|note|context-group|prop-group|alt-trans)*) >
<!ELEMENT alt-trans (source?, source-info?, (target, target-info?)+, (note|context-group|prop-group)*) >
I think we all have some reservations about
this approach because it is awkward to have two source elements and worse yet,
difficult to match a given <target-info> element with its corresponding
<target> element.
Option 2 (restructure)
We all agree this is a clean structure but not compatible.
Option 3 (embedded)
Allow me to give a different example using a <font> tag and a placeholder
tag.
<trans-unit id="Option 1" translate="yes">
<source><font face="Arial" size="2"></font><ph/>Source Text</source>
<target><font face="Arial" size="3"></font><ph/>Translated Text</target>
</trans-unit>
The inclusion of extension points for <source> and <target> is deferred
because they introduce unknown tags into text that is processed by a TM tool.
This option introduces unknown tags to the text content. This option isn't fully
compatible because the TM tool will need to ignore <font> and other
unknown tags. Granted the unknown tags should come before the rest of the text
to be translated, but I still do not believe it is forward compatible.
Besides, correctly parsing this structure is almost impossible. How does
the tool know which tag is the last format tag and which is the first inline
"placeholder" tag? Adding more "placeholder" tags to the specification would be
impossible because the tool would have to assume any unknown tag is a format
tag. This appears to not be a viable option.
Option 4 (combined)
This really isn't technically different than Option 2 other than to say that the
XLIFF 1.1 schema and XLIFF 1.1 tools must support the old XLIFF 1.0 structure as
well as the new structure. I do believe the effort is minimal to have the
<source-info> and <target-info> tags be optional. However, if they
are present, they are likely to break existing XLIFF 1.0 tools that look
for the <source> as an immediate child of <trans-unit>. For
instance, my existing XSL transforms would need to be updated to support
XLIFF 1.1 documents. Therefore, this option isn't fully compatible with 1.0 even
though it is backward compatible.
With all this said, I went back to determine the original purpose for proposing
elements for reformatting. The issue concerns being able to specify which
format values may be modified during translation. In XLIFF 1.0, as you know,
there are several attributes to specify formatting for the text.
Namely, coord, font, css-style, style, and exstyle. The 'reformat'
attribute of <trans-unit> is either "yes" or "no" indicating whether any
or none of the format attribute values can be changed. The changed value is
stored in the <target> tag.
The problem is that 'reformat' does not give sufficient control to be able to say
that some formats may be changed, but others cannot. For example, it may be
permissible to change coord-cx, but not coord-x or coord-y. The original proposal
was to turn each format attribute into an element, and each element would have its
own 'reformat' attribute. This approach is fine except for the compatibility
problems that have been discussed at length.
Here's the new option.
Extend the possible values for the 'reformat' attribute to provide sufficient control.
XLIFF 1.0 presently uses ";"-delimited lists within attribute values to store
multiple values. The 'coord' attribute is an example. Its value is actually
four values: "x;y;cx;cy", where "#" can be used for 'don't care'.
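As a rough sketch of how a tool might read such a value (Python used purely for illustration; the function name parse_coord and the choice to map "#" to None and the numbers to integers are my own assumptions):

```python
def parse_coord(value):
    """Split a 'coord' value of the form "x;y;cx;cy" into a dict.

    A "#" component means 'don't care' and is returned as None.
    """
    names = ("x", "y", "cx", "cy")
    parts = [p.strip() for p in value.split(";")]
    if len(parts) != len(names):
        raise ValueError("expected four ';'-separated components: %r" % value)
    return {n: (None if p == "#" else int(p)) for n, p in zip(names, parts)}


coords = parse_coord("#;#;183;272")
print(coords)
```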
So let's extend 'reformat' the same way. Of course, we keep "yes" and "no" for
compatibility.
"yes" = all format attributes may be changed
"no" = no format attributes may be changed
...or a semicolon-delimited list of the following in any order. If an attribute is
listed, it means it may be reformatted.
coord = all 4 coords
coord-x
coord-y
coord-cx
coord-cy
font = all 3 font values
font-name
font-size
font-weight
css-style
style
exstyle
Example:
<trans-unit coord="#;#;183;272" font="Arial;2;normal" reformat="coord-cx;font-name" ...>
<source>...</source>
<target coord="#;#;181;272" font="System;2;normal">...</target>
<alt-trans coord="#;#;183;272" font="Arial;2;normal">
<target coord="#;#;180;272" font="Arial Bold;2;normal">...</target>
<target coord="#;#;185;272" font="Arial, Helvetica;2;normal">...</target>
</alt-trans>
</trans-unit>
Parsing the reformat list is fairly easy, even with XSLT, which has a
limited set of string functions.
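For what it's worth, here is a minimal sketch of that parsing in Python rather than XSLT. The function name parse_reformat, the expansion of "coord" and "font" into their component names, and the decision to reject unknown tokens are my own assumptions, not part of any specification.

```python
# Group names expand to their individual components (per the list above).
GROUPS = {
    "coord": {"coord-x", "coord-y", "coord-cx", "coord-cy"},
    "font": {"font-name", "font-size", "font-weight"},
}
SIMPLE = {"css-style", "style", "exstyle"}
ALL_FORMATS = SIMPLE.union(*GROUPS.values())


def parse_reformat(value="yes"):
    """Return the set of format values that may be changed."""
    value = value.strip()
    if value == "yes":
        return set(ALL_FORMATS)
    if value == "no":
        return set()
    allowed = set()
    for token in value.split(";"):
        token = token.strip()
        if not token:
            continue  # tolerate stray or trailing semicolons
        if token in GROUPS:
            allowed |= GROUPS[token]
        elif token in ALL_FORMATS:
            allowed.add(token)
        else:
            raise ValueError("unknown reformat token: %r" % token)
    return allowed


allowed = parse_reformat("coord-cx;font-name")
print(sorted(allowed))
```

With the example above, only coord-cx and font-name come back as changeable, so a tool could compare the <target> values against the <trans-unit> values and flag any other difference.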
This option is 100% compatible, both forward and backward. It does not affect the
structure at all. The only problem I can foresee an XLIFF 1.0 tool having is if
an invalid value for reformat is assumed to be "yes" instead of "no", which allows
some values to be changed that shouldn't be. That is, an XLIFF 1.0 tool could
interpret a value of "coord-cx;font-name" as "yes" and allow all of the
format values to change. Of course, if it assumed "no" instead of "yes" it would
not allow any changes. Since the default value for 'reformat' is "yes", I don't
see either of the possibilities as being too harmful.
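The two possibilities can be modelled in a few lines of Python. This is only a guess at how an XLIFF 1.0 tool might behave; the function legacy_may_reformat and its fallback parameter are hypothetical.

```python
def legacy_may_reformat(value, fallback):
    """Model a tool that only understands "yes" and "no".

    Any other value is treated as the tool's fallback ("yes" or "no").
    Returns True if the tool would permit format changes.
    """
    if value in ("yes", "no"):
        return value == "yes"
    return fallback == "yes"


# A lenient tool permits every change; a strict tool permits none.
lenient = legacy_may_reformat("coord-cx;font-name", fallback="yes")
strict = legacy_may_reformat("coord-cx;font-name", fallback="no")
print(lenient, strict)
```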
Regards,
Doug Domeny
Ektron, Inc. +1 603 594-0249 http://www.ektron.com