Hi Arle,

I didn't mean to say your examples did not solve the issue. I meant to say that I did not think your examples were complex enough to require an elegant solution (i.e., I was side-stepping the real issue). My interpretation was that it was just a matter of asking, "Does a translator need all the junk adjacent to the translatable strings (a) for context, or (b) in order to adjust the junk based on the unique needs of the target language?" If 'yes', we need to add the junk into the trans-unit; if 'no', we just stick it in the skeleton. I assumed your examples led us to the 'no' conclusion. But I now see that you meant them to lead us to the 'yes' conclusion (based on the scenario your two bullet points paint). This was not a problem with your examples - it was my misinterpretation of them.

To me the very interesting take-away from this thread is a philosophical one. I'll try to explain (though I'll probably not communicate this very well). It sounds to me like you are saying it is the responsibility of the XLIFF spec to solve the problem of: *difficult-to-identify-translatable-vs-non-translatable-strings-within-dense-blobs*

My view is that it is the responsibility of the filter/tool maker to do the difficult job of identifying translatable vs. non-translatable text. It is therefore the responsibility of the XLIFF spec to provide translators with (1) a clear place to translate translatable strings; (2) enough context to translate the strings accurately; and (3) the ability to make adjustments to non-translatable content in cases where the target language creates a need for such adjustment. (1) and (2) seem not too difficult (although not super easy by any means). And (3) seems to be the tricky bit that I confessed to side-stepping (a rough sketch of what I mean by (1)-(3) follows below).

Sorry that I muddied it up by misreading your examples. (And for the record, I love the kind of person who looks deeply at problems - finding problems is the first half of finding solutions.)

- Bryan
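As a rough illustration of (1)-(3), a hypothetical XLIFF-flavored fragment might look like the one below. The <note> element does exist in XLIFF 1.2; the <adjustable-ref> element and its attributes are invented here purely for illustration and are not part of any spec or proposal:

<trans-unit id="t1">
 <!-- (1) a clear place to translate the translatable string -->
 <source>country</source>
 <target>Land</target>
 <!-- (2) enough context to translate the string accurately -->
 <note>Value of the "type" field in the jsonStore data; the surrounding JSON and script are non-translatable.</note>
 <!-- (3) hypothetical hook: an editable reference to the adjacent non-translatable
      material held in the skeleton, for target languages that force adjustments to it -->
 <adjustable-ref idref="s1" editable="yes"/>
</trans-unit>

Of the three, only (3) needs machinery that does not already exist in some form, which is consistent with it being the tricky bit.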
From: Arle Lommel [mailto:alommel@gala-global.org]
Sent: Tuesday, June 14, 2011 2:02 PM
To: xliff-inline@lists.oasis-open.org
Subject: Re: [xliff-inline] Complex tag content
Hi Bryan,

You are correct that my examples don't solve the issue. I brought them up as problems, not solutions. (Maybe I'm the kind of person you come to hate: the one who says “but what about this…” and never gives a solution.)

I see a few issues with your response:

· If we don't push these things to the skeleton, but instead want to handle them as inline elements in the trans-unit (one of the three options we discussed), then we can't use your solution. This may be an argument for using the skeleton solution over the other options. (A sketch of what the inline-element option would look like follows below.)

· To expose just the translatable part requires knowledge of what should be touched. Imagine that there had been translatable content in my first example: how would we even identify it, given that the function is written in some proprietary hivelogic format? (I presume the function was written this way specifically to make it unintelligible.) Similarly, in the second case, automatically identifying what should be translated could be a real problem.

So we are still left with the need for an elegant way to allow access when needed and hide it when not. I don't know if there is a generalizable solution to this problem. Good import filters would help, but who would write an HTML filter (for example) that can identify the hivelogic garbage and parse it? So there will always be a need for access to this kind of crap in an editor environment :-( This is the kind of thing that makes translators want to quit and take up something easy like rocket surgery…

There are a lot of issues here, but these examples show why simple solutions sometimes don't work.

Best,

Arle
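Just to make the first bullet concrete, here is roughly what the inline-element option would force us into for the first example, using XLIFF 1.2-style inline elements (<bpt>/<ept> for the span, <ph> for the script) purely as an illustration; the id values are arbitrary:

<trans-unit id="b1">
 <source>© 2011 Localization World
  <bpt id="1">&lt;span class="style21"&gt;</bpt> +1 208 263 8178<ept id="1">&lt;/span&gt;</ept>
  <ph id="2">&lt;script type="text/javascript"&gt; … the entire escaped hivelogic_enkoder()
   blob would have to be carried here … &lt;/script&gt;</ph>
 </source>
</trans-unit>

Every byte of the script ends up inside the trans-unit (at best collapsed behind a placeholder in the editor), which is exactly the access-versus-exposure tension described above.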
On Jun 14, 2011, at 16:18, Schnabel, Bryan S wrote:

Hi Arle,

Thanks for the gnarly examples (yuck!). In each of these cases my guess is that you'd want to hide the ugliness in the skeleton and expose just the translatable parts in the trans-units, like this:

<xliff>
 <header>
  <skeleton>
   <xlf_skel_module:p><xlf_skel_module:source idref="b1" />
    <script type="text/javascript">
    /* <![CDATA[ */
    function hivelogic_enkoder(){var kode=
    "kode="nrgh@%>,**=,40kwjqho1hgrn+wDudkf1hgrnBkwjqho1hgrn?l+.{@hgrn\000,l+"+
    "wDudkf1hgrn.,4.l+wDudkf1hgrn@.{~,5@.l>,40kwjqho1hgrn+?l>3@l+uri>**@{>_%@{g"+
    "hnr,\000+fghFrduFkrpiuj1lqwu@V{.;>45.@,f?3+fli6>,0+lDwghFrdufkh1rg@n~f.,l"+
    ".k>jwhq1oghnrl?3>l@u+ir*>@*>{~_%__kCuj3q33/____.ijkIugxInuslxm4otzxCY~1>A7"+
    "81C/iB6.iol9A/3.oGzjkIugxink4ujCq\001i1/o1nAmzkt4rjkquoB6AoCx.lu-AC-A~\0"+
    "01(nFxm6t662b1lmnLxj{Lqxvo{p7rw}{F\\\0014AD:;4F2lE91lro<D261rJ}mnLxj{lq"+
    "n7xmFt4l332____44Dr}qwpunn7xmEtDrF91rx{Do00\001F+D5GJ.;myHo{p:~x3{33z____"+
    "u{m\00066b6xuomx{{LzrJuh.<Aq==x7=\000b.\177Ih\177\177xm,oh.{y:oxp{~33"+
    "____3{z\000u6m66ubmx{oLxr{uz{Fx\000mu.yIhqrt~m,.Hq4u\0003~33:____z\000"+
    "yqo\001p{F+mntxC(jkqu@_%__ghnr_%@hgrn%>nrgh@nrgh1vsolw+**,1uhyhuvh+,1mrlq"+
    "+**,";x='';for(i=0;i<kode.length;i++){c=kode.charCodeAt(i)-3;if(c<0)c+=12"+
    "8;x+=String.fromCharCode(c)}kode=x"
    ;var i,c,x;while(eval(kode));}hivelogic_enkoder();
    /* ]]> */
    </script>
   </xlf_skel_module:p>
  </skeleton>
 </header>
 <body>
  <trans-unit id="b1">
   <source>© 2011 Localization World <inliner name="span"
    xlf_preserve_att_module:class="style21"> +1 208 263 8178 </inliner>
   </source>
   <target>Land</target>
  </trans-unit>
 </body>
</xliff>

Just to illustrate a possible future mechanism (not meant to be a proposal), I've added some nonexistent XLIFF core and module elements (for example, <inliner> would be core and <xlf_skel_module> would be a module) and a nonexistent method for preserving source attributes (xlf_preserve_att_module:class in the example). And I left the text entities just as they are in the source. I think you could make a case that ubiquitous text entities like the copyright sign and the non-breaking space could/should be thought of on the same level as letters or characters and can be moved, deleted, inserted, etc. (perhaps a contentious assumption).

And your second example is not only ugly, it is also (XML) malformed. Ignoring the malformedness, I'd still just stick the ugliness in the skeleton and expose only the translatable content to the translator:

<xliff>
 <header>
  <skeleton>
   <xlf_skel_module:span dojoType="dojo.data.ItemFileWriteStore" jsId="jsonStore" data=
    "{"identifier":"id",
      "label": "label",
      "items": [
       { "id":"AF","label":"<b>IMx</b>", "type":"continent"},
       { "id":"AR","label":"
         <!-- this div is malformed--><div class="aaaa" ",
         "type":" <xlf_skel_module:source idref="a1" /> " } ] }"
   ></xlf_skel_module:span>
  </skeleton>
 </header>
 <body>
  <trans-unit id="a1">
   <source>country</source>
   <target>Land</target>
  </trans-unit>
 </body>
</xliff>

Notice that I skillfully side-stepped the more pressing point made at today's SC meeting: that there needs to be a way of exposing the code (in extreme cases) to the translator. My answer does not address that scenario. And if I read your examples correctly, neither do they.

Thanks,

Bryan
From: Arle Lommel [mailto:alommel@gala-global.org]
Sent: Tuesday, June 14, 2011 12:11 PM
To: xliff-inline@lists.oasis-open.org
Subject: [xliff-inline] Complex tag content
In line with today's discussion, I wanted to share an example of a complex inline code I found before the meeting:

<p>© 2011 Localization World <span class="style21"> +1 208 263 8178</span>
 <script type="text/javascript">
 /* <![CDATA[ */
 function hivelogic_enkoder(){var kode=
 "kode="nrgh@%>,**=,40kwjqho1hgrn+wDudkf1hgrnBkwjqho1hgrn?l+.{@hgrn\000,l+"+
 "wDudkf1hgrn.,4.l+wDudkf1hgrn@.{~,5@.l>,40kwjqho1hgrn+?l>3@l+uri>**@{>_%@{g"+
 "hnr,\000+fghFrduFkrpiuj1lqwu@V{.;>45.@,f?3+fli6>,0+lDwghFrdufkh1rg@n~f.,l"+
 ".k>jwhq1oghnrl?3>l@u+ir*>@*>{~_%__kCuj3q33/____.ijkIugxInuslxm4otzxCY~1>A7"+
 "81C/iB6.iol9A/3.oGzjkIugxink4ujCq\001i1/o1nAmzkt4rjkquoB6AoCx.lu-AC-A~\0"+
 "01(nFxm6t662b1lmnLxj{Lqxvo{p7rw}{F\\\0014AD:;4F2lE91lro<D261rJ}mnLxj{lq"+
 "n7xmFt4l332____44Dr}qwpunn7xmEtDrF91rx{Do00\001F+D5GJ.;myHo{p:~x3{33z____"+
 "u{m\00066b6xuomx{{LzrJuh.<Aq==x7=\000b.\177Ih\177\177xm,oh.{y:oxp{~33"+
 "____3{z\000u6m66ubmx{oLxr{uz{Fx\000mu.yIhqrt~m,.Hq4u\0003~33:____z\000"+
 "yqo\001p{F+mntxC(jkqu@_%__ghnr_%@hgrn%>nrgh@nrgh1vsolw+**,1uhyhuvh+,1mrlq"+
 "+**,";x='';for(i=0;i<kode.length;i++){c=kode.charCodeAt(i)-3;if(c<0)c+=12"+
 "8;x+=String.fromCharCode(c)}kode=x"
 ;var i,c,x;while(eval(kode));}hivelogic_enkoder();
 /* ]]> */
 </script>
</p>

This is from the LocalizationWorld website and has a <script> tag embedded in a <p>, as you can see. In this case, you could hide this from the translator with no negative consequence, but consider the following (from https://github.com/maqetta/maqetta/issues/79):
<span dojoType= "dojo.data.ItemFileWriteStore"
 jsId= "jsonStore"
 data= "{"identifier":"id",
   "label": "label",
   "items": [
    { "id":"AF","label":"<b>IMx</b>", "type":"continent"},
    { "id":"AR","label":"
      <div class="aaaa" ",
      "type":" country " } ] }" >

Presumably this could appear in something going to XLIFF, and the content of the inline tag contains potentially localizable content (the "country" value near the end). It shows that there are cases where discarding or hiding the content of inline markup in XLIFF would create problems. On the other hand, I wouldn't want to expose more translators to either of the cases above. So I'm not sure what the best solution is: you can't live with it and you can't live without it.

(I know, ideally there would have been an internationalization process that would have externalized the localizable content in the second example, but we know that won't always happen in the real world. A sketch of what that might have looked like follows below.)

Best,

Arle
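To sketch what that internationalization step might have produced, assuming dojo's ItemFileWriteStore can be pointed at an external resource through a url parameter instead of an inline data attribute (the url usage, the file name, and the item values below are illustrative assumptions, not taken from the actual site):

<span dojoType="dojo.data.ItemFileWriteStore" jsId="jsonStore"
      url="data/countries_en.json"></span>

with the values that need localization living in a separate per-locale file (data/countries_en.json here) that a JSON-aware filter could extract the "label" and "type" strings from:

{ "identifier": "id",
  "label": "label",
  "items": [
    { "id": "AF", "label": "<b>IMx</b>", "type": "continent" },
    { "id": "AR", "label": "Argentina", "type": "country" }
  ] }

That would keep the markup free of translatable text and spare both the extraction filter and the translator from the malformed attribute value.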