XLIFF Inline Markup SC

  • 1.  Editing operations and splitting segment

    Posted 05-27-2012 13:47
    Hi Fredrik, all,

    I'm working on integrating the rules for editing a target segment in the draft. For the part about editing when there is an existing target, one of the PEs is:

    MUST NOT Split the segment into two segments

    And the text explains this as:

    "The reason to not allow splitting of segments with content in the target node is because there is no guarantee that the content in the two nodes are linguistically in the same order, allowing that operation would pose a risk to the integrity of the content."

    I'm not sure what "there is no guarantee that the content in the two nodes are linguistically in the same order" means, and what nodes we are talking about. Could you elaborate and/or give an example if you have time? (If not, that's OK: we'll clarify this when we get to it during the F2F.)

    Thanks,
    -ys


  • 2.  RE: [xliff-inline] Editing operations and splitting segment

    Posted 05-27-2012 21:35
    Hi Yves,

    What I mean there is that, given a piece of text in two languages, it might be impossible to find one point in the source and one point in the target such that the first part of the source matches the first part of the target linguistically, and similarly for the second part.

    Here is a short example. I do not think there is any way to split these into 2+2 pieces and keep the paired pieces meaning the same thing. (Apologies if the German is not correct, but I think it is.)

    "I spoke with him two days ago."
    "Vor zwei Tagen habe ich mit ihm gesprochen."

    These pairs make no sense:

    1. "I spoke with him" "Vor zwei Tagen"
    2. " two days ago." " habe ich mit ihm gesprochen."

    Regards,
    Fredrik Estreen


  • 3.  RE: [xliff-inline] Editing operations and splitting segment

    Posted 05-29-2012 12:00
    Hi Fredrik,

    Thanks for the example. I understand your concern better now.

    However, I wonder if we are not going too far with those processing expectations. For instance, an XLIFF document may be the input for an aligner tool, which would break sentences and align the resulting segments. So its input may be like this:

        <unit id='1'>
          <segment>
            <source>Error detected: File not found.</source>
            <target>Erreur détectée : Fichier non trouvé.</target>
          </segment>
        </unit>

    And its output like this:

        <unit id='1'>
          <segment>
            <source>Error detected: </source>
            <target>Erreur détectée : </target>
          </segment>
          <segment>
            <source>File not found.</source>
            <target>Fichier non trouvé.</target>
          </segment>
        </unit>

    The input would have an existing target, and it would be perfectly reasonable to re-segment and align the content. As long as the operations are controlled, the results should be valid.

    In other words, doing anything to the target can make that target a mismatch for the source. Re-segmenting an existing target is no more dangerous than other operations.

    Also: <target> has an order attribute that can be used to modify the order in which the target segments need to be re-constructed. So even the example you gave could probably be handled properly.

    Cheers,
    -ys
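
    The re-segmentation in the example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name split_segment and its parameters are hypothetical, the split offsets are supplied by the caller (standing in for whatever an aligner would compute), and inline codes are not handled at all.

```python
import xml.etree.ElementTree as ET

UNIT = """<unit id='1'>
  <segment>
    <source>Error detected: File not found.</source>
    <target>Erreur détectée : Fichier non trouvé.</target>
  </segment>
</unit>"""


def split_segment(unit, index, source_at, target_at):
    """Split the plain-text segment at `index` into two segments.

    `source_at` and `target_at` are character offsets chosen by the caller.
    Hypothetical helper: it assumes the segment holds text only, no inline codes.
    """
    old = unit.findall("segment")[index]
    src = old.findtext("source")
    tgt = old.findtext("target")
    position = list(unit).index(old)
    unit.remove(old)
    pieces = [(src[:source_at], tgt[:target_at]),
              (src[source_at:], tgt[target_at:])]
    for offset, (source_part, target_part) in enumerate(pieces):
        segment = ET.Element("segment")
        ET.SubElement(segment, "source").text = source_part
        ET.SubElement(segment, "target").text = target_part
        unit.insert(position + offset, segment)
    return unit


unit = ET.fromstring(UNIT)
source_text = unit.findtext("segment/source")
target_text = unit.findtext("segment/target")

# The split points are picked by hand here; an aligner would compute them.
split_segment(unit, 0,
              source_text.index("File"),
              target_text.index("Fichier"))

pairs = [(s.findtext("source"), s.findtext("target"))
         for s in unit.findall("segment")]
```

    The split is safe here only because both offsets fall on a boundary where source and target correspond. Nothing in the code can verify that, which is the risk raised earlier in the thread.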


  • 4.  RE: [xliff-inline] Editing operations and splitting segment

    Posted 05-29-2012 12:21
    Hi Yves,

    Yes, an alignment tool that knows how to do this properly could do it. What I'm afraid of is gratuitous splitting.

    I did not know about an order attribute on target, but with it, it would be possible to rearrange sentences in the target while keeping the source and target pairs consistent. I don't see that attribute in the draft.

    There are also the potential issues of putting the correct tags in the correct piece and in the correct order within the unit.

    Regards,
    Fredrik Estreen


  • 5.  RE: [xliff-inline] Editing operations and splitting segment

    Posted 05-31-2012 05:48
    Hi Fredrik,

    > ...I did not know about an order attribute on target
    > but with such it would be possible to re arrange
    > sentences in target while keeping the source and
    > target pairs consistent.
    > I don't see that attribute in the draft.

    That's right. It seems it didn't make it to the schema or attributes list yet. We needed it to be able to represent XLIFF 1.2 entries (which can be re-ordered). We'll have to fix this.

    That may be an interesting challenge for XSD 1.0, since the default needs to be dynamic (i.e. the value for the first segment: 1, for the second: 2, etc.), but I suppose a default of 0 and a processing expectation would do too.

    Its usage is described in the "Segments Order" section.

    Cheers,
    -yves
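
    To make the "dynamic default" concrete, here is a minimal Python sketch of how a reader might rebuild the target text of a unit from an order attribute. It assumes the attribute is named order, sits on <target>, and defaults to the 1-based position of its segment. That is how the thread describes it, but the draft schema did not define it yet. The function name target_in_order is hypothetical. The sample unit reuses the English/German pair from earlier in the thread, split by hand.

```python
import xml.etree.ElementTree as ET

UNIT = """<unit id='1'>
  <segment>
    <source>I spoke with him </source>
    <target order='2'>habe ich mit ihm gesprochen.</target>
  </segment>
  <segment>
    <source>two days ago.</source>
    <target order='1'>Vor zwei Tagen </target>
  </segment>
</unit>"""


def target_in_order(unit):
    """Concatenate target texts following the (assumed) `order` attribute.

    When `order` is absent, the segment's 1-based position is used, which
    mirrors the dynamic default discussed in the thread.
    """
    keyed = []
    for position, segment in enumerate(unit.findall("segment"), start=1):
        target = segment.find("target")
        if target is None:
            continue
        order = int(target.get("order", position))
        keyed.append((order, target.text or ""))
    # sorted() is stable, so equal orders keep document order.
    return "".join(text for _, text in sorted(keyed, key=lambda item: item[0]))


unit = ET.fromstring(UNIT)
rebuilt_target = target_in_order(unit)
rebuilt_source = "".join(s.findtext("source") for s in unit.findall("segment"))
```

    With a static default of 0 instead, the same function would need the processing expectation mentioned above: treat 0 as "use the segment position".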


  • 6.  RE: [xliff-inline] Editing operations and splitting segment

    Posted 06-01-2012 09:46
    Hi Yves,

    I thought a bit about this (re-ordering targets) and I think it has some interesting implications for both inline codes and BiDi.

    For inline codes, it means we need the IDs of the codes to be unique in the unit and not just in the segment. We also need to allow the user to use codes from any segment in the <unit> in any other segment, as well as mix inline tags from multiple segments in one segment. That in turn will complicate tag checking, as you need to check the whole unit and not just one segment at a time. It also complicates match target generation from TM operations, as you will send a source segment with possibly a different tag set than you need in the target.

    An example of when this arises would be a text with multiple segments that are re-ordered in the translation for linguistic reasons, but that needs to keep a span starting inside the first segment and ending inside the last for stylistic reasons. Add some references or index markers which follow the language and you get a mix. How match generation would be able to get a copy of the actual source tag from another segment into the target, I do not see now. And we really need to copy, in order to preserve potentially unknown data in module or third-party namespaces. So the processing expectations have to allow this and generally operate at the unit level instead of the segment level.

    For BiDi, it provides the answer to where we reset the Unicode BiDi algorithm: segment or unit. I think we need it to be at the unit level with re-orderable targets. That seems most in line with the normal per-paragraph behavior. It will, however, be a bit more difficult to implement.

    Regards,
    Fredrik Estreen
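
    The difference between checking tags per segment and per unit can be sketched as follows. This is an illustration only: the inline element name ph is an assumption (the thread does not name the inline elements), the helper names are hypothetical, and the sample text is a placeholder rather than real content.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Placeholder unit: the second target uses code 1, which belongs to the
# source of the first segment. <ph> is an assumed inline element name.
UNIT = """<unit id='1'>
  <segment>
    <source>aaa <ph id='1'/> bbb </source>
    <target order='2'>BBB</target>
  </segment>
  <segment>
    <source>ccc <ph id='2'/> ddd</source>
    <target order='1'>CCC <ph id='2'/> AAA <ph id='1'/> DDD </target>
  </segment>
</unit>"""


def ids_in(content):
    """Return the id values of all elements nested inside `content`."""
    if content is None:
        return []
    return [el.get("id") for el in content.iter()
            if el is not content and el.get("id") is not None]


def code_ids(unit, container):
    """Collect inline code ids from every <source> or <target> in the unit."""
    ids = []
    for segment in unit.findall("segment"):
        ids.extend(ids_in(segment.find(container)))
    return ids


def duplicate_ids(ids):
    """Ids that occur more than once, i.e. not unique at the unit level."""
    return sorted(i for i, n in Counter(ids).items() if n > 1)


def target_ids_missing_from_source(unit):
    """Unit-level check: target codes with no counterpart anywhere in the source."""
    return sorted(set(code_ids(unit, "target")) - set(code_ids(unit, "source")))


def segments_failing_local_check(unit):
    """Segment-level check: indexes whose target uses codes absent from its own source."""
    failing = []
    for index, segment in enumerate(unit.findall("segment")):
        source_ids = set(ids_in(segment.find("source")))
        target_ids = set(ids_in(segment.find("target")))
        if not target_ids <= source_ids:
            failing.append(index)
    return failing


unit = ET.fromstring(UNIT)
source_duplicates = duplicate_ids(code_ids(unit, "source"))
missing_at_unit_level = target_ids_missing_from_source(unit)
failing_segments = segments_failing_local_check(unit)
```

    In this sample the unit-level check passes, while the segment-level check flags the second segment. That is the case described above: a checker working one segment at a time would reject a target that is valid for the unit as a whole.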


  • 7.  RE: [xliff-inline] Editing operations and splitting segment

    Posted 06-02-2012 05:09
    Hi Fredrik, all,

    It seems we are dealing with two classes of issues here:

    a) The uniqueness of IDs for inline codes at the unit level.
    b) Changing the order of the target segments.

    I believe we have had uniqueness of IDs at the unit level since the start. (The fact that the different scopes of the id attribute are not described yet in the spec says how much work is still needed.) We are already working at the unit level: inline codes or annotations can span several segments, original data are defined at the unit level, added codes have an id unique within the unit, etc. One way to look at it is that <unit> is a single <segment> initially, so any splitting has to result in unique ids at the unit level afterward, since they were unique in the original segment.

    As for changing the order of target segments, I agree: it brings some problems, especially with re-ordering codes and with differences between the codes in source and target when matching. But I think those issues exist anyway when you start applying matches. Your target matches may already have different codes than the source.

    Those challenges exist in 1.2 as well, where segments can be re-ordered and id uniqueness is at the trans-unit level, since the segments are defined by <mrk> elements. They are not something 2.0 is introducing.

    In addition, the alternative (id unique within each segment) brings its own set of issues as well. For example, you may have to re-ID some inline codes when joining segments, and that would cause problems with merging for the codes without the original data.

    Cheers,
    -yves