XLIFF Inline Markup SC

  • 1.  Representing information for the starting/ending/standalone parts

    Posted 09-19-2011 16:40
    Hi everyone,

    As you know, we have two representations of span-like codes:

    1) <sc> and <ec>, which represent the start code and the end code. They are useful when segmentation breaks up spans, for overlapping spans, etc.
    2) <pc>, which represents the same thing in an XML-friendly way, so it can be used when the span-like inline code is well-formed.

    We have several attributes that store the same type of information for both the start part and the end part. Here is an example: It shows the four types of representations we have, and the optional display information. That information may be different for the starting and ending parts.

    a) Use the same attribute name as much as possible, and have a "special" one for the ending part when used in the <pc> element?

    <ph id='1' disp="[br/]"/>
    <sc rid='1' disp="[b]"/>
    <ec id='1' disp="[/b]"/>
    <pc id='1' disp="[b]" dispEnd="[/b]"/>

    b) Variation of a) but using two different attributes for <pc> to be a bit clearer? [this is the one we have been using so far]

    <ph id='1' disp="[br/]"/>
    <sc id='1' disp="[b]"/>
    <ec id='1' disp="[/b]"/>
    <pc id='1' dispStart="[b]" dispEnd="[/b]"/>

    c) Use different attributes depending on each element's type of representation?

    <ph id='1' disp="[br/]"/>
    <sc id='1' dispStart="[b]"/>
    <ec id='1' dispEnd="[/b]"/>
    <pc id='1' dispStart="[b]" dispend="[/b]"/>

    This would apply to the following four pieces of information:

    - disp (alternative display)
    - equiv (text equivalent)
    - nid (reference to original data)
    - subFlows (points to sub-flows)

    The names can be worked out ('nid', for example, is pretty bad), but I'd like to get a consensus on the concept to use.

    Any thoughts?

    Thanks,
    -ys


  • 2.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 09-19-2011 18:25
    My vote:

    c) Use different attributes depending on each element's type of representation?

    <ph id='1' disp="[br/]"/>
    <sc id='1' dispStart="[b]"/>
    <ec id='1' dispEnd="[/b]"/>
    <pc id='1' dispStart="[b]" dispend="[/b]"/>

    (typo on <pc>, should be 'dispEnd')

    Consistent with my preference for clarity over brevity.

    - Bryan


  • 3.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 09-20-2011 10:58
    My personal preference would be:

    ---
    b) Variation of a) but using two different attributes for <pc> to be a bit clearer? [this is the one we have been using so far]

    <ph id='1' disp="[br/]"/>
    <sc id='1' disp="[b]"/>
    <ec id='1' disp="[/b]"/>
    <pc id='1' dispStart="[b]" dispEnd="[/b]"/>
    ---

    The rationale is that adding 'Start' or 'End' to the attribute name is redundant on a start or end code. It may even lead the user to think there is a dispEnd if she sees a dispStart in an <sc>.

    Cheers,
    -ys
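
    A minimal sketch of how a reader might handle option b), assuming Python's standard xml.etree.ElementTree and the draft attribute names from this thread. The function name display_parts is hypothetical and the attribute names were still under discussion, so this is an illustration only.

```python
import xml.etree.ElementTree as ET


def display_parts(el: ET.Element) -> tuple:
    """Return (start_display, end_display) for an inline code under option b).

    Assumption: <pc> uses dispStart/dispEnd, while <ph>, <sc> and <ec>
    all use a plain 'disp' attribute.
    """
    if el.tag == "pc":
        return (el.get("dispStart"), el.get("dispEnd"))
    if el.tag == "ec":
        # An end code only carries display text for the closing part.
        return (None, el.get("disp"))
    # <sc> and <ph> carry a single display value for their only part.
    return (el.get("disp"), None)


pc = ET.fromstring("<pc id='1' dispStart='[b]' dispEnd='[/b]'>text</pc>")
sc = ET.fromstring("<sc id='1' disp='[b]'/>")
ec = ET.fromstring("<ec id='1' disp='[/b]'/>")
ph = ET.fromstring("<ph id='1' disp='[br/]'/>")

pc_parts = display_parts(pc)
sc_parts = display_parts(sc)
ec_parts = display_parts(ec)
ph_parts = display_parts(ph)
```

    Under option c) the <sc> and <ec> branches would read dispStart and dispEnd instead, so the element name and the attribute name would carry the same information twice.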


  • 4.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 10-04-2011 07:56
    Hi Yves, I share your view. I wonder if we have already discussed the following: When should the native code go into "disp/dispStart/dispEnd"? Put differently: We discussed local skeletons. When should they be used? Cheers, Christian


  • 5.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 10-04-2011 07:51
    Hi Yves, Thanks for the summary. I guess I lost track of the argument why we need two representations (sc/ec and pc). Could someone please remind me? Cheers, Christian


  • 6.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 10-04-2011 10:46
    Hi Christian,

    > ...I guess I lost track of the argument why we
    > need two representations (sc/ec and pc).
    > Could someone please remind me?

    It's a good question.

    Original codes like HTML <b> may start in one segment and end in another. So being able to represent the start code and the end code as standalone XLIFF elements is necessary, because the segments are represented as elements and one cannot have something like this: <seg>..<pc>..</seg><seg>..</pc>..</seg>.

    We could represent all span-like codes with <sc>/<ec>. But we currently have a requirement (#17), "Should preserve span-like structures" ( http://wiki.oasis-open.org/xliff/OneContentModel/Requirements#Shouldpreservespan-likestructures ), and the <pc> representation is the proposed solution for that.

    Having <pc> does bring a set of extra attributes, because we have to store both starting and ending information in the same element.

    One of the questions we have not discussed yet is how stringent we want the specification to be on the use of <sc>/<ec> vs. <pc>.

    On a side note: We could go even further in simplicity and have a single <ph> element with an optional attribute that indicates whether it represents a start, end or placeholder code. Overall it wouldn't change the information XLIFF would carry. I think having distinct elements allows a little more control over validation.

    Cheers,
    -yves
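
    A minimal sketch of the well-formedness point above, assuming Python's standard xml.etree.ElementTree. The <unit> wrapper, the sample text, and the helper name is_well_formed are hypothetical; only <seg>, <pc>, <sc> and <ec> come from the thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical draft markup: a <pc> span that starts in one segment and
# ends in the next. The tags overlap, so this is not well-formed XML.
PC_ACROSS_SEGMENTS = (
    "<unit>"
    "<seg>Click <pc id='1'>the bold</seg>"
    "<seg>button</pc> now.</seg>"
    "</unit>"
)

# The same content with standalone start/end codes: each <seg> closes cleanly.
SC_EC_ACROSS_SEGMENTS = (
    "<unit>"
    "<seg>Click <sc id='1'/>the bold</seg>"
    "<seg>button<ec id='1'/> now.</seg>"
    "</unit>"
)


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


pc_ok = is_well_formed(PC_ACROSS_SEGMENTS)
sc_ec_ok = is_well_formed(SC_EC_ACROSS_SEGMENTS)
```

    The parser rejects the first string because </seg> arrives while <pc> is still open, whereas the empty <sc/> and <ec/> elements can sit in different segments.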


  • 7.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 10-07-2011 08:33
    Hi Yves, Thanks for the reminder, and the additional info. Please find some comments below. Cheers, Christian


  • 8.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 10-07-2011 21:54
    Hi Christian,

    >> ...have only a unique <ph> element with an optional
    >> attribute that would indicate whether it represents a
    >> start, end or placeholder code...
    >
    > To my ears, the work with just one representation
    > sounds attractive.

    Before we even get to discuss a single-element representation :) we would first have to contemplate the elimination of the <pc> representation.

    I'm curious about everyone's opinion. I'm guessing Bryan is very much in favor of having it (and it is in the list of requirements). What about others? <sc>/<ec> alone, or both <sc>/<ec> and <pc>?

    Here is a rough list of pros and cons I can think of:

    Pros of eliminating <pc>:

    - It simplifies the implementation to a great extent: No need to find out if the code has a corresponding opening/closing in the same segment.
    - It reduces our list of attributes, getting rid of all the ...Start and ...End ones like equivStart/equivEnd. So it's a bit simpler to write, read and implement.

    Cons of eliminating <pc>:

    - We don't implement requirement 17, "Should preserve span-like structures" ( http://wiki.oasis-open.org/xliff/OneContentModel/Requirements#Shouldpreservespan-likestructures ).
    - It may make life more complicated for XSLT-based processors. (...Although I'm not 100% sure about this, because having <pc> does not eliminate <sc>/<ec>, since you can have spans split by segments. So presumably XSLT-based tools will have to deal with <sc>/<ec> anyway.)

    More pros/cons, anyone?

    Cheers,
    -ys


  • 9.  RE: [xliff-inline] Representing information for the starting/ending/standalone parts

    Posted 10-07-2011 23:11
    Hi Yves, Christian, and SC,

    > I'm guessing Bryan is very much in favor . . .
    > What about others?

    While we wait for the opinions of others, I have a few thoughts. My lurking leads me to understand the options as (numbering added by me):

    > (1) <sc>/<ec> alone or (2) both <sc>/<ec> and <pc>?

    . . . or (I thought I heard) (3) <pc> alone with attributes to also carry the load of <sc>/<ec>

    And for the record, yes, I prefer (2) or (3), and I admit I have absolutely no love for (1).

    One reason for disliking (1) is that I prefer clarity over brevity. I understand immediately what <sc>, <ec>, and <pc> mean. If we eliminate <pc> and use <sc>'s and <ec>'s in a single span, we need to inspect the id attribute on each to know which goes with the other. But if we have a rule that <pc> is always used in a single span, and <sc>/<ec> is only used in cases where the code crosses spans, the nature of the code and the span could not be clearer.

    And I don't think I agree with, or maybe I just don't understand, the first bullet under "Pro of eliminating <pc>:"

    > - It simplifies the implementation to a great extent:
    > No need to find out if the code has a corresponding
    > opening/closing in the same segment.

    I don't think eliminating <pc> simplifies the implementation for the reason you give. In fact, I think that if you have a well-formed <pc> with start and end tags, there can be no confusion about whether or not you have a pair in the segment; you know you have a pair.

    Finally, in my experience (processing documentation and firmware in and out of XLIFF), the need to cross segments has been a very rare corner case (i.e., we would be picking the uglier representation to accommodate the less frequent use case). This, of course, could be challenged by people who have experience to the contrary.

    I very much agree that it would be good to hear the opinions of others. Thanks to Yves and Christian for shedding further light on this important topic.

    - Bryan
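
    A minimal sketch of the id matching described above, assuming Python's standard xml.etree.ElementTree and that an <sc> and its <ec> share the same id value, as in the examples earlier in the thread. The function name classify_codes and the sample segment are hypothetical.

```python
import xml.etree.ElementTree as ET


def classify_codes(seg: ET.Element) -> dict:
    """Group the ids of <sc>/<ec> codes in one segment by pairing status."""
    starts = {el.get("id") for el in seg.iter("sc")}
    ends = {el.get("id") for el in seg.iter("ec")}
    return {
        "paired": starts & ends,       # opened and closed in this segment
        "open_only": starts - ends,    # closed in a later segment
        "close_only": ends - starts,   # opened in an earlier segment
    }


# Span 1 is complete inside the segment; span 2 is opened but not closed here.
seg = ET.fromstring(
    "<seg>A <sc id='1'/>bold<ec id='1'/> word and <sc id='2'/>an open span</seg>"
)
result = classify_codes(seg)
```

    With <pc>, span 1 would need no such lookup, since the element itself delimits the pair; span 2 would still need <sc> because it crosses the segment boundary.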