Hi everyone,
I went back to the DITA users who had told me they didn't want to have to put xml:lang on every topic, and they more or less agreed that putting xml:lang on every topic would not be a huge burden. So I believe we have consensus, which we didn't have before, on the following points:
- Setting xml:lang on each topic will assist in reuse across maps, faceted search, and possibly authoring, and is strongly recommended.
- Setting xml:lang on each topic is not too much of a burden for adopters.
Yay :)
However:
- Accidents happen. If a team tries to put xml:lang on every topic but fails in 0.1% of cases, that is approximately one error every 500-1000 pages.
- Some processors treat xml:lang as cascading. With the current wording of the draft 1.2 spec, these processors would be required to stop this behavior.
- There are teams with legacy content that does not have xml:lang on every topic, because they produce single-language deliverables and their processors have thus far picked it up from the map.
A good point has been raised that the map might not have the correct xml:lang for describing the language of a topic that doesn't itself include xml:lang. However, if the xml:lang in the map is different from the processor's default, which is more likely to correctly describe the language of the topic: the map, or the processor default? The processor default is more likely to be correct if we assume that a topic that does not declare xml:lang is likely to be in English. However, in an environment where people are making an effort to declare xml:lang in all topics, including English ones, I would think that accidental omission of xml:lang would be equally likely for all languages.
David H. said that he'd be OK with having a processor pick up the xml:lang on the map and use it to set the processor default. This seems to me to be functionally equivalent to having the xml:lang attribute cascade.
The draft 1.2 spec currently says:
"The @xml:lang value does not cascade from one map to another or from a map to a topic, and an @xml:lang value specified in a map does not override @xml:lang values specified in other maps or in topics."
I propose changing this to:
"Processors may treat the @xml:lang value as cascading from one map to another or from a map to a topic. An @xml:lang value specified in a map may be used if @xml:lang is not defined in child maps or topics. However, an @xml:lang value specified in a map must never override explicit @xml:lang values specified in other maps or in topics."
How does that sound? BTW, the draft 1.2 spec already has plenty of wording to encourage people to set xml:lang on every topic.
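To make the proposed precedence concrete, here is a minimal sketch in Python. It is only an illustration of the wording above, not taken from any existing processor: the function name effective_lang, the cascade_from_map switch (standing in for the "may" in the proposal), and the "en" default are all assumptions of mine.

```python
from typing import Optional


def effective_lang(topic_lang: Optional[str],
                   map_lang: Optional[str],
                   processor_default: str = "en",   # assumed default
                   cascade_from_map: bool = True) -> str:
    """Return the language a processor would assume for a topic.

    Hypothetical helper: models the proposed wording only.
    """
    # An explicit value on the topic is never overridden by the map.
    if topic_lang:
        return topic_lang
    # Optional behavior: fall back to the map's value when the topic has none.
    if cascade_from_map and map_lang:
        return map_lang
    # Otherwise the processor default applies.
    return processor_default


print(effective_lang("fr", "de"))                          # fr
print(effective_lang(None, "de"))                          # de
print(effective_lang(None, "de", cascade_from_map=False))  # en
```

A processor that does not cascade simply behaves as if cascade_from_map were False, which matches the current draft wording.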
Su-Laine
-----Original Message-----
From: Bruce Nevin (bnevin) [mailto:bnevin@cisco.com]
Sent: Thursday, August 05, 2010 12:47 PM
To: bryan.s.schnabel@tektronix.com; dita@lists.oasis-open.org
Subject: RE: [dita] Cascading of xml:lang attribute
No, that's just what I had in mind, Bryan. Thanks! I've been aware of XLIFF for some time, without really looking into it.
The DITA 1.2 spec talks about this issue at "2.1.3.9.1 The @xml:lang attribute". In particular, it says: "The @xml:lang value does not cascade from one map to another or from a map to a topic...." So a value of @xml:lang that is set in a map applies only to content that is immediately contained within the map, such as a title.
Su-Laine's discussion looks at cascading, a top-down notion, from the other direction, bottom up. If @xml:lang is unspecified on some element (a note, in her example), look to the parent element, and do so recursively within the topic, but don't look outside the topic to the map. If no value is set anywhere in that walk up the DOM, assume a default.
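A rough sketch of that bottom-up lookup, in Python with the standard xml.etree.ElementTree module. The helper name lang_of, the sample topic, and the "en" default are assumptions for illustration. ElementTree does not keep parent pointers, so the sketch builds a child-to-parent map first.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
DEFAULT_LANG = "en"  # assumed processor default


def lang_of(elem, parents):
    """Walk up from elem until an xml:lang is found; never leaves the topic."""
    node = elem
    while node is not None:
        value = node.get(XML_LANG)
        if value:
            return value
        node = parents.get(node)  # None once we pass the topic root
    return DEFAULT_LANG


# Hypothetical topic: German overall, with a French note.
topic = ET.fromstring(
    '<topic id="t" xml:lang="de"><body>'
    '<p>Text</p>'
    '<note xml:lang="fr"><p>Remarque</p></note>'
    '</body></topic>'
)
parents = {child: parent for parent in topic.iter() for child in parent}

print(lang_of(topic.find("body/note/p"), parents))  # fr
print(lang_of(topic.find("body/p"), parents))       # de
```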
I suppose a processor in fact might more efficiently work from the top down. When you encounter an element with @xml:lang set, push the current value onto a stack and use the new value on down the DOM from there until it's overridden or you exit the current branch dominated by the element that carried that value; whereupon take the earlier value back off the push-down stack. When you exit a topic, the last value that you can take off the stack is the system-set default.
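The top-down variant might look something like the following Python sketch. Again, the function name walk_with_lang, the sample topic, and the "en" system default are my own assumptions, and a real processor would do more than report the language of each element.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def walk_with_lang(root, default_lang="en"):
    """Yield (tag, effective language) for each element in document order."""
    stack = [default_lang]  # bottom of the stack: the system-set default

    def visit(elem):
        own = elem.get(XML_LANG)
        if own:
            stack.append(own)      # new value applies on down this branch
        yield elem.tag, stack[-1]
        for child in elem:
            yield from visit(child)
        if own:
            stack.pop()            # leaving the branch restores the earlier value

    yield from visit(root)


# Hypothetical topic: German overall, with a French note in the middle.
topic = ET.fromstring(
    '<topic id="t" xml:lang="de"><body>'
    '<p>a</p><note xml:lang="fr"><p>b</p></note><p>c</p>'
    '</body></topic>'
)
for tag, lang in walk_with_lang(topic):
    print(tag, lang)
```

Each element is visited once, and the paragraph after the note gets the topic's value back because the note's value was popped on the way out.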
(BTW, I just noticed a couple of typos in the section on use with @conref/@conkeyref: "If the reference element does not have an explicit value for the @xml:lang attribute, the effective value for its @xml:lang attribute is determined by using the standard @xml:lang inheritance from the referenced source.." Should be referenced element, and a single period. Do I have to go make a formal comment?)
Su-Laine was reporting that users don't want the overhead of setting @xml:lang on every topic. The spec seems to suggest that this be automated in some way ("applications should") but also makes authors responsible ("authors are urged"):
"Applications should ensure that every highest level topic element explicitly assigns the @xml:lang attribute. Authors are urged to set the @xml:lang attribute in the source language so that the translator may change it in the target language. Because some translation software does not permit translators to add elements, the absence of the @xml:lang element from the source language may result in higher administrative costs for translation." (I assume that "may change it in the target language" means "may change it in a copy (?) of the topic in which the content is translated to the target language")
This gives one strong reason why users might want to bite the bullet and find a way to tag every topic. There is also the issue of sharing content to an environment where a different default language prevails.
/B
> -----Original Message-----
> From: bryan.s.schnabel@tektronix.com
> [mailto:bryan.s.schnabel@tektronix.com]
> Sent: Thursday, August 05, 2010 2:18 PM
> To: Bruce Nevin (bnevin); dita@lists.oasis-open.org
> Subject: RE: [dita] Cascading of xml:lang attribute
>
> Hi Bruce,
>
> Regarding your question:
>
> > can't you send fragments smaller than a topic for translation if
> > that's all that has changed?
>
> I think that depends on your method of sending topics out for
> translation. I see this as an absolute real-world
> requirement. And I think a great way is to use the open
> standard for translation, XLIFF, as part of your DITA process.
>
> I hope to resurrect the DITA Translation SC (under the DITA
> Adoption TC) as soon as I can marshal the bandwidth. One of
> the things members of that SC have talked about in the past
> is evaluating the use of XLIFF as a best practice for
> translating DITA. Using XLIFF as your Interchange File Format
> when translating DITA topics enables exactly what you are
> talking about. The idea is that all of the topics referenced
> by a given map would be transformed into an XLIFF file (XLIFF
> is a preferred file format for translation providers and is
> natively consumable by translation tools).
>
> So if fragments within the topic need to be translated the
> XLIFF file can identify the strings that need to be
> translated, and lock (but provide for context) the strings
> that do not need translation. Here's a little sample:
>
> For this topic segment:
>
>
> Lester William Polsfuss aka Les Paul was
> a pioneer of the development of the Solid
> body Electric Guitar and modern recording
> technologies.
>
> Clarence Leonidas Fender, also known as
> Leo Fender, moved the development of Electric
> Guitars into the modern era, with smaller, more
> portable solid body Electric Guitars.
>
>
> This XLIFF segment would trigger the translator to translate
> the second paragraph, but provide the first paragraph as a
> locked, for-context-only topic fragment (note the @state attribute):
>
>
>
> Hopefully this answer doesn't abstract too far beyond the
> scenario you had in mind.
>
> - Bryan
>
> -----Original Message-----
> From: Bruce Nevin (bnevin) [mailto:bnevin@cisco.com]
> Sent: Thursday, August 05, 2010 8:56 AM
> To: JoAnn Hackos; Chris Nitchie; Su-Laine Yeo; Helfinstine,
> David; DITA TC
> Cc: Robert D Anderson
> Subject: RE: [dita] Cascading of xml:lang attribute
>
> I've not been directly involved with this, but can't you send
> fragments smaller than a topic for translation if that's all
> that has changed? Or are other means used to identify just
> those parts that need the translator's attention?
>
> A topic may have different xml:lang values on different
> fragments in it. Quotations, citations, and legal
> requirements for bilingual environments come to mind.
>
> /B
>
> > -----Original Message-----
> > From: JoAnn Hackos [mailto:joann.hackos@comtech-serv.com]
> > Sent: Wednesday, August 04, 2010 4:57 PM
> > To: Bruce Nevin (bnevin); Chris Nitchie; Su-Laine Yeo; Helfinstine,
> > David; DITA TC
> > Cc: Robert D Anderson
> > Subject: RE: [dita] Cascading of xml:lang attribute
> >
> > When we automate the process of sending topics out for translation, we
> > ask the translators to change the xml:lang attribute to the correct
> > languages, which in the CMS environment enables the topics to be
> > synchronized correctly with the source language topics. It's very
> > important that the attribute be placed on every topic correctly.
> >
> > When topics are changed, we are able to send only those topics for
> > retranslation or, in some cases, only the individual strings that have
> > been changed. All of these controls help to reduce translation costs.
> >
> > The xml:lang attribute at the map level will not have the correct
> > effect. The translators do not see the maps.
> >
> > JoAnn
> >
> > JoAnn Hackos PhD
> > President
> > Comtech Services, Inc.
> > joann.hackos@comtech-serv.com
> > Skype joannhackos
> >
> >
> >
> >
> >
> > -----Original Message-----
> > From: Bruce Nevin (bnevin) [mailto:bnevin@cisco.com]
> > Sent: Wednesday, August 04, 2010 9:09 AM
> > To: Chris Nitchie; Su-Laine Yeo; Helfinstine, David; DITA TC
> > Cc: Robert D Anderson
> > Subject: RE: [dita] Cascading of xml:lang attribute
> >
> > If the value of xml:lang cascades to a topic that has no value set,
> > then it would be mandatory (a strongly advised best practice?) for
> > someone or something to set xml:lang on every topic to avoid problems.
> > But if we expect every topic to have xml:lang set, then there's no
> > reason to have xml:lang cascade. By that logic, it should be up to the
> > processor to decide what assumptions are appropriate when xml:lang is
> > not explicitly specified, and on what basis to make such assumptions.
> >
> > The descriptive/prescriptive dichotomy isn't apt. Any attribute value
> > specifies something descriptively about the content, and any attribute
> > value that isn't used for some kind of processing has no use case. The
> > value of xml:lang is no exception.
> >
> > /B
> >
> > > -----Original Message-----
> > > From: Chris Nitchie [mailto:cnitchie@ptc.com]
> > > Sent: Tuesday, August 03, 2010 8:50 PM
> > > To: Su-Laine Yeo; Helfinstine, David; DITA TC
> > > Cc: Robert D Anderson
> > > Subject: Re: [dita] Cascading of xml:lang attribute
> > >
> > > I would think in such a situation, where you have to manage a large
> > > number of languages, the only rational process is to mark each piece
> > > of content with its language. The potential for assigning the wrong
> > > language to a piece of content via cascading, processor defaults, or
> > > any other mechanism is higher in such cases than it is for the
> > > customer with only one or two languages.
> > >
> > > If xml:lang cascaded from maps to topics when there's no explicit
> > > xml:lang on the topic, you'd wind up with content in the output marked
> > > with the wrong language via cascading, and we would have to call that
> > > valid DITA processing even though it's obviously incorrect. The
> > > xml:lang and other locale-related attributes are different from other
> > > cascading attributes because they are descriptive, not prescriptive;
> > > they describe the content as it is, not metadata for how it should be
> > > processed. Topics are in a language, and they're in that language no
> > > matter what map references them, and no matter whether they specify
> > > xml:lang or not. Allowing a map - or anything else - to impose a
> > > language setting invites outcomes that are simply wrong. I suspect,
> > > but can't say for sure, that the language in the spec about processor
> > > defaults is there because something has to establish a language
> > > eventually, but it's not a very good substitute for assigning language
> > > markers on your content.
> > >
> > > Chris
> > >
> > > On 8/3/10 6:07 PM, "Su-Laine Yeo"
> > >