After our panel discussion today at the symposium, and after trying to visualize this, I think we may be over-complicating the structure by using annotations to point to modules that contain segment-level metadata. For example, here is what we have defined today in the spec:

  <unit>
    <segment id="1">
      <source>Hello World. Hello World 2.</source>
      <target>Hello World. Hello World 2.</target>
      <ctr:changeTrack>...</ctr:changeTrack>
      <mda:metadata>...</mda:metadata>
      <val:validation>...</val:validation>
    </segment>
  </unit>

And here is the same thing using annotations after re-segmenting, in the way I think we've been discussing it, where maybe the first segment needs validation but the second doesn't, while both need metadata and both need change tracking:

  <unit>
    <segment id="1">
      <source><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata" ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></source>
      <target><mrk id="1" type="changeTrack" ref="#c1"><mrk id="2" type="metadata" ref="#m1"><mrk id="3" type="validation" ref="#v1">Hello World.</mrk></mrk></mrk></target>
    </segment>
    <segment id="2">
      <source><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata" ref="#m2">Hello World 2.</mrk></mrk></source>
      <target><mrk id="1" type="changeTrack" ref="#c2"><mrk id="2" type="metadata" ref="#m2">Hello World 2.</mrk></mrk></target>
    </segment>
    <ctr:changeTrack id="c1">...</ctr:changeTrack>
    <mda:metadata id="m1">...</mda:metadata>
    <val:validation id="v1">...</val:validation>
    <ctr:changeTrack id="c2">...</ctr:changeTrack>
    <mda:metadata id="m2">...</mda:metadata>
    <val:validation id="v3">...</val:validation>
  </unit>

Right away, as Yves pointed out, that is a lot of <mrk> elements (and there would potentially be more with matches, etc.) surrounding the actual source and target text. It is also ambiguous, because it looks like I have <mrk> elements embedded in other <mrk> elements, which is technically not the case.
Maybe it would make more sense to have each module or extension that carries segment-level metadata define an attribute that could be used in a custom annotation for referencing. For example, something like a custom "reference" annotation:

  <unit>
    <segment id="1">
      <source><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" val:validationID="v1" translate="yes">Hello World</mrk></source>
      <target><mrk id="1" type="reference" ctr:changeTrackID="c1" mda:metadataID="m1" val:validationID="v1" translate="yes">Hello World</mrk></target>
    </segment>
    <segment id="2">
      <source><mrk id="2" type="reference" ctr:changeTrackID="c2" mda:metadataID="m2" translate="yes">Hello World 2</mrk></source>
      <target><mrk id="2" type="reference" ctr:changeTrackID="c2" mda:metadataID="m2" translate="yes">Hello World 2</mrk></target>
    </segment>
    <ctr:changeTrack id="c1">...</ctr:changeTrack>
    <mda:metadata id="m1">...</mda:metadata>
    <val:validation id="v1">...</val:validation>
    <ctr:changeTrack id="c2">...</ctr:changeTrack>
    <mda:metadata id="m2">...</mda:metadata>
    <val:validation id="v3">...</val:validation>
  </unit>

What do you think?

Ryan
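To make the lookup concrete, here is a minimal Python sketch of how a tool might resolve a "reference" annotation back to the unit-level module elements. It is only an illustration under stated assumptions: the namespace URIs are placeholders, the changeTrackID / metadataID / validationID attributes are the hypothetical ones proposed above (they are not in the spec), and resolve_references is just an illustrative helper name.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace URIs, assumed for this sketch only.
NS = {
    "ctr": "urn:example:changetracking",
    "mda": "urn:example:metadata",
    "val": "urn:example:validation",
}

# Per module: (hypothetical attribute on <mrk>, unit-level element it points to).
MODULE_REFS = {
    "ctr": ("changeTrackID", "changeTrack"),
    "mda": ("metadataID", "metadata"),
    "val": ("validationID", "validation"),
}

UNIT = """
<unit xmlns:ctr="urn:example:changetracking"
      xmlns:mda="urn:example:metadata"
      xmlns:val="urn:example:validation">
  <segment id="1">
    <source><mrk id="1" type="reference" ctr:changeTrackID="c1"
      mda:metadataID="m1" val:validationID="v1">Hello World</mrk></source>
  </segment>
  <segment id="2">
    <source><mrk id="2" type="reference" ctr:changeTrackID="c2"
      mda:metadataID="m2">Hello World 2</mrk></source>
  </segment>
  <ctr:changeTrack id="c1"/>
  <mda:metadata id="m1"/>
  <val:validation id="v1"/>
  <ctr:changeTrack id="c2"/>
  <mda:metadata id="m2"/>
</unit>
"""


def resolve_references(unit):
    """Map each segment id to the module elements its reference <mrk> points to."""
    # Index the unit-level module elements by (module prefix, id).
    targets = {}
    for prefix, (_, elem_name) in MODULE_REFS.items():
        for elem in unit.findall(f"{{{NS[prefix]}}}{elem_name}"):
            targets[(prefix, elem.get("id"))] = elem

    resolved = {}
    for segment in unit.findall("segment"):
        found = {}
        for mrk in segment.findall("source/mrk[@type='reference']"):
            for prefix, (attr_name, _) in MODULE_REFS.items():
                # Namespaced attributes are keyed as "{uri}localname".
                ref = mrk.get(f"{{{NS[prefix]}}}{attr_name}")
                if ref is not None:
                    found[prefix] = targets.get((prefix, ref))
        resolved[segment.get("id")] = found
    return resolved


unit = ET.fromstring(UNIT)
for seg_id, modules in resolve_references(unit).items():
    print(seg_id, sorted(modules))
# prints:
# 1 ['ctr', 'mda', 'val']
# 2 ['ctr', 'mda']
```

The point of the sketch is that a segment-based tool only has to read one <mrk> per segment and look up at most one element per module, instead of walking a stack of <mrk> elements.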
-----Original Message-----
From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of Yves Savourel
Sent: Wednesday, June 12, 2013 5:48 AM
To: XLIFF Main List
Subject: [xliff] Re-segmentation

Hi all,

Thinking more about the different solutions for re-segmentation in 2.0, especially about solution #4:

- We would have to define PRs for the <segment> attributes like translate, approved, state, etc. Note that translate would logically become a <mrk translate='yes no'>. Does that mean we should always have this info as an <mrk>?

- We would have to add an id to all top elements like <matches> and <changeTrack>, and allow multiple of them at the <unit> level.

- The part that concerns me most is the paradigm shift for developers. Traditionally many tools are segment-based, and with solution #4 they would have to change how much of the segment metadata is stored, and decide what to do with the parts that don't correspond to a segment anymore (overlapping <mrk>s and sub-segment <mrk>s).

- We may end up with <segment> elements containing a lot of <mrk>s at both ends. It may take some effort to deal with those. They may have some side effects on functions like TM matching, etc.

I'm still relatively sure that #4 is probably the better representation in the long term, but it is a very big change. So the more feedback we get before we go that way, the better. And we really need examples and working implementations for this.

Cheers,
-yves