XLIFF Inline Markup SC

  • 1.  Type of codes

    Posted 11-14-2011 16:31
    Hi everyone,

    One last summary on the type of codes discussion. I've listed below the four options we have:

    A) ph, pc, sc/ec
    This is the current draft. We allow both pc and sc/ec, the second being used when the first cannot. The main advantage is that it offers the cleanness of pc to users who need it. Its drawback is that it makes things a bit more complex for the tools.

    B) ph, sc/ec
    This would remove pc, marking all span-like original codes with sc/ec. It somewhat simplifies the writing of the documents. Its main drawback is that not having a pc notation is not very friendly for XML-like original formats.

    C) ph, pc
    This would change ph to also handle the sc/ec cases. This would add several attributes to ph. Essentially it would transfer the syntax of two element names to new attributes in ph. The pc element would stay to provide a clean markup for the well-formed span-like codes.

    D) ph
    This would code all cases with ph: one code to rule them all. The ph element would have many attributes to handle all the cases. This option is especially attractive for tools extracting only placeholder codes.

    Notes:

    - Regardless of the option, we would still have a distinction between placeholder and span-like codes. In other words, even with option D (<ph> only) a filter should be able to extract an HTML <BR/> and a <B>...</B> while making a distinction, for example: <ph id='1'>&lt;BR/></ph> and <ph id='2' kind='start'>&lt;B></ph>...<ph rid='2' kind='end'>&lt;/B></ph>. At some point we'll have to have a discussion about whether extraction tools MUST or only SHOULD make the distinction, but that's independent of the representation.

    - Keep <mrk> in mind as well. The presence of <pc> would make the use of <mrk> a bit more complex (e.g. <mrk> overlapping <pc>). But not that much, because <mrk> can overlap <mrk> too, so the use case exists even if <pc> is not there.

    My personal opinion: I think having distinct elements for original codes that are placeholders vs. span-like is useful, even important. So I would not put everything into ph. I would tend to think having the span-like codes handled only with sc/ec is fine. But looking at how v1.2 is used, the requirements we have for 2.0, and the feedback of the past months, I can see there are arguments for also having pc, and since the drawbacks of keeping pc are not that big, I would keep it. We just need to make sure it can be mapped transparently to sc/ec and conversely.

    So I would pick option A.

    What would you pick and, as importantly, why? (so all can understand the rationale of the choice)

    Cheers,
    -yves
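
    The requirement that pc "can be mapped transparently to sc/ec" can be illustrated with a short Python sketch. This is only an assumption-laden illustration, not part of the draft: the function name pc_to_sc_ec is hypothetical, and the attribute names (id on <sc>, rid on <ec>) are borrowed from the <ph id/rid> example in the post rather than taken from any agreed syntax.

```python
import copy
import xml.etree.ElementTree as ET


def pc_to_sc_ec(segment_xml: str) -> str:
    """Rewrite every <pc id="N">...</pc> as <sc id="N"/>...<ec rid="N"/>.

    Assumes every <pc> carries an id attribute; the sc/ec attribute
    names are illustrative guesses, not a confirmed syntax.
    """
    src = ET.fromstring(segment_xml)
    dst = ET.Element(src.tag, dict(src.attrib))
    last = None  # last element appended to dst; following text becomes its tail

    def emit_text(text):
        if not text:
            return
        if last is None:
            dst.text = (dst.text or "") + text
        else:
            last.tail = (last.tail or "") + text

    def emit_elem(elem):
        nonlocal last
        elem.tail = None
        dst.append(elem)
        last = elem

    def walk(node):
        emit_text(node.text)
        for child in node:
            if child.tag == "pc":
                # Flatten the span: start marker, content, end marker.
                emit_elem(ET.Element("sc", {"id": child.get("id")}))
                walk(child)
                emit_elem(ET.Element("ec", {"rid": child.get("id")}))
            else:
                # Other inline codes (e.g. <ph>) are copied unchanged.
                emit_elem(copy.deepcopy(child))
            emit_text(child.tail)

    walk(src)
    return ET.tostring(dst, encoding="unicode")


if __name__ == "__main__":
    print(pc_to_sc_ec('<source>Click <pc id="1">here</pc> now</source>'))
```

    The reverse mapping (sc/ec back to pc) is only possible when the start and end markers sit in the same segment and nest properly, which is the case option A reserves sc/ec for.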


  • 2.  RE: [xliff-inline] Type of codes

    Posted 12-07-2011 17:14
    Hi Yves,

    I vote for A).

    Rationale: Even though in my personal workload I occasionally convert non-XML code into XLIFF and back (i.e., firmware, proprietary UI code for our instruments, RTF, Interleaf proprietary markup, Framemaker, etc.), it is becoming increasingly clear that most non-XML conversions can have, and do have, XML alternatives.

    Upshot: excluding the XML-friendly <pc> option, and forcing well-formed XML to use the (my opinion, sorry if it's harsh) hackish <sc>/<ec> approach in the name of simplification, is very unattractive.

    Similarly, excluding <sc>/<ec> seems to me an unattractive option. In real life there are cases where segmentation causes inlines to be separated (start and end tags). Overloading <pc> or <ph> with attributes to handle this case, while perfectly logical from an XML processing point of view, seems to me to lack clarity.

    Please let me know if I should provide details or examples.

    Bryan
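
    The segmentation case mentioned above can be sketched in Python. This is a minimal illustration under assumptions: the helper names are hypothetical, the split is a naive string split rather than a real segmenter, and the sc/ec attribute names (id, rid) are guesses modeled on the <ph> example earlier in the thread.

```python
import xml.etree.ElementTree as ET


def is_well_formed(fragment: str) -> bool:
    """Check whether an inline fragment parses once wrapped in a segment."""
    try:
        ET.fromstring(f"<segment>{fragment}</segment>")
        return True
    except ET.ParseError:
        return False


def split_at(content: str, marker: str):
    """Naively split content just after the first occurrence of marker."""
    head, tail = content.split(marker, 1)
    return head + marker, tail


# The same bold span crossing a sentence boundary, in both notations.
PC_STYLE = 'First <pc id="1">sentence. Second</pc> sentence.'
SC_EC_STYLE = 'First <sc id="1"/>sentence. Second<ec rid="1"/> sentence.'

if __name__ == "__main__":
    for label, content in (("pc", PC_STYLE), ("sc/ec", SC_EC_STYLE)):
        head, tail = split_at(content, "sentence. ")
        print(label, is_well_formed(head), is_well_formed(tail))
```

    With the pc notation neither half is well-formed XML after the split, whereas the sc/ec halves each remain well-formed on their own.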


  • 3.  RE: [xliff-inline] Type of codes

    Posted 12-07-2011 17:31
    Hi Yves,

    My vote on this is for B.

    My main reason is that it will allow only one style of tagging. Also supporting <pc>, which I agree is nicer from an XML perspective, forces all implementations to support two completely different ways to represent the same tagging, spanning and non-spanning. It also requires a transformation from <pc> to <sc>/<ec> to be defined in order to allow some cases of sub-segmentation. So my argument is simplicity.

    My second preference would be for D: a single <ph> with attributes defining its usage (given that it is only used non-spanning). It provides the same functionality as B but, in my opinion, in a less clear way.

    Regards,
    Fredrik Estreen


  • 4.  Re: [xliff-inline] Type of codes

    Posted 12-07-2011 17:52
    Do we know what equivalent processing models are currently used by various tools? I don't know the internals of most of the tools, so I am asking out of curiosity/ignorance. But my thought is that there is no sense in making a solution that is more elaborate than what most tools actually do. Part of the answer to this question should be based on how we support what the tools do and need.

    One of the problems with TMX 1.4b's metamarkup system, for example, is that it required tool developers to provide information (at the element level) that they simply didn't have. So one tool might treat everything as <it> or <ph> simply because <bpt> and <ept> required more information than their filters provided.

    -Arle


  • 5.  RE: [xliff-inline] Type of codes

    Posted 12-07-2011 18:39
    Hi Arle, all,

    > But my thought is that there is no sense in making
    > a solution that is more elaborate than what most
    > tools actually do. Part of the answer to this
    > question should be based on how do we support what
    > the tools do and need.

    Actually I don't think it matters: any of the proposed notations handles the same functionality as far as using standalone or open/close codes is concerned. Only the syntax changes.

    As for not providing open/close functionality because some tools (and there are quite a few indeed) do not support those types of codes, I think it would cause an important loss of functionality. We would not be able to map 1.2 files to 2.0 anymore, for example; QA tools which use open/close for validation would not be able to work very well; etc.

    It would penalize quite a few tools, and I have no doubt that we would end up with custom extensions to fill the need :)

    By the way: Bryan, Fredrik: thanks for posting your viewpoints too. It should help in the decision next week. Hopefully we'll get more posts on the topic.

    Cheers,
    -yves
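
    The kind of open/close validation a QA tool might run can be sketched as follows. This is a hypothetical illustration, not any tool's actual check: the function name unpaired_codes is made up, and it uses the option D notation from the first post (<ph kind='start'> with id, <ph kind='end'> with rid), which was itself only an example.

```python
import xml.etree.ElementTree as ET


def unpaired_codes(segment_xml: str):
    """Return (starts_without_end, ends_without_start) for a segment.

    Codes are <ph> elements; a missing kind attribute means a
    standalone placeholder, which needs no partner.
    """
    root = ET.fromstring(segment_xml)
    open_ids = []   # ids of start codes still waiting for their end
    orphans = []    # rids of end codes seen without a preceding start
    for ph in root.iter("ph"):
        kind = ph.get("kind")
        if kind == "start":
            open_ids.append(ph.get("id"))
        elif kind == "end":
            rid = ph.get("rid")
            if rid in open_ids:
                open_ids.remove(rid)
            else:
                orphans.append(rid)
    return open_ids, orphans


GOOD = ("<target><ph id='2' kind='start'>&lt;B></ph>bold"
        "<ph rid='2' kind='end'>&lt;/B></ph><ph id='1'>&lt;BR/></ph></target>")
# Same segment, but the closing code was lost during translation.
BROKEN = "<target><ph id='2' kind='start'>&lt;B></ph>bold<ph id='1'>&lt;BR/></ph></target>"

if __name__ == "__main__":
    print(unpaired_codes(GOOD))
    print(unpaired_codes(BROKEN))
```

    Without the start/end information, whether carried by element names or by attributes, the second segment could not be flagged: every code would look like an independent placeholder.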


  • 6.  Re: [xliff-inline] Type of codes

    Posted 12-07-2011 19:43
    Hi Yves,

    Thanks for the response. To be clear, I wasn't suggesting getting rid of the functionality, but rather questioning whether it belongs at the element or the attribute level. I realize that any of the models could support the same functionality, so for me the question is which model comes closest to the most common philosophy.

    For example, if 95% of tools used only <ph> equivalents (I'm deliberately making an absurd argument, I know), then it would make sense to use only the <ph> element and add the additional functionality as optional attributes. If, on the other hand, we found that most tools use the more elaborate model with matching of start and end tags, that would argue (to me at least) that a <ph>-only model wouldn't be as good a fit. All that is independent of whether or not the simpler (in terms of elements) models can encapsulate the functionality of the more complex element sets.

    I tend to like the idea of a <ph>-only model at the conceptual level because it captures a fundamental unity and it allows a match (at the element level) between all sorts of tool-specific ways of dealing with markup, thus capturing the similarities. On the other hand, if we use option 1 and a tool comes along that uses only <ph>, then its markup will not match the intention of the system, no matter what we do, because it would be unable to supply the elements we hope to see. Sure, tools could work around that, but if we push the information to attributes, it reveals the fundamental similarity.

    But I have to admit that this is really an aesthetic/theoretical preference, not one guided by practical concerns.

    Best,
    -Arle


  • 7.  Re: [xliff-inline] Type of codes

    Posted 12-07-2011 19:41
    Hi all,

    I would go for A (and C as a second option).

    The simple reason is that both A and C contain pc, which is XML friendly. X(ML)LIFF should not be XML hostile, IMHO.

    Primarily, I am for expressivity. As for simplicity, even A is a big simplification (in a good sense) compared to TMX 1.4b and XLIFF 1.2. I agree with Yves that fewer inline elements does not necessarily mean simplicity, as the semantics must then be carried by attributes anyway.

    I consider A the minimum reasonably expressive set. C would be a compromise, largely harmless, but still unnecessary, IMHO.

    Rgds
    dF

    Dr. David Filip
    LRC | CNGL | LT-Web | CSIS
    University of Limerick, Ireland