OASIS XML Localisation Interchange File Format (XLIFF) TC


Segmentation as core or not

  • 1.  Segmentation as core or not

    Posted 11-01-2011 20:55
    Hi all,

    To continue the discussion of whether the "segmentation" feature is core or not:

    I think Dave has an obviously valid point when saying that segmentation is not necessarily done at the time of the extraction, and therefore we could have un-segmented XLIFF.

    But to me a "segment" is not necessarily the result of a segmentation process; it can be a "block" extracted from the original format (as our definition states: http://wiki.oasis-open.org/xliff/OneContentModel#Definitions.2BAC8-Terminology ). So each un-segmented entry is, by nature, a segment that simply contains potentially several sentences.

    Maybe things would be clearer if we think about the element <segment> as a "part" rather than a "segment"? The Segmentation representation addresses how to organize and manipulate such parts.

    <unit id='1'>
    <part>
    <source>Sentence one. Sentence two.</source>
    </part>
    </unit>

    <unit id='1'>
    <part>
    <source>Sentence one. </source>
    </part>
    <part>
    <source> Sentence two.</source>
    </part>
    </unit>

    Maybe, viewed from that angle, it's clearer that such an element needs to be part of the core?

    Cheers,
    -ys
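
    The two <unit> examples above differ only in how many parts hold the same text. As a rough illustration of going from the first form to the second, here is a minimal Python sketch. The splitting rule, the function names and the use of ElementTree are all assumptions made for illustration; nothing here comes from the XLIFF draft or from any tool mentioned in the thread.

```python
import re
import xml.etree.ElementTree as ET

# The un-segmented form from the message above.
UNSEGMENTED = """<unit id='1'>
  <part>
    <source>Sentence one. Sentence two.</source>
  </part>
</unit>"""


def split_sentences(text):
    # Naive rule (an assumption, not a real segmentation engine):
    # break after '.', '!' or '?' when followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def segment_unit(unit):
    """Replace each <part> of a unit with one <part> per sentence."""
    for part in list(unit.findall("part")):
        sentences = split_sentences(part.findtext("source", default=""))
        position = list(unit).index(part)
        unit.remove(part)
        for offset, sentence in enumerate(sentences):
            new_part = ET.Element("part")
            ET.SubElement(new_part, "source").text = sentence
            unit.insert(position + offset, new_part)
    return unit


unit = segment_unit(ET.fromstring(UNSEGMENTED))
sources = [p.findtext("source") for p in unit.findall("part")]
print(sources)
```

    Note that this sketch drops the white space between sentences, whereas the hand-written example above keeps it inside the parts. A real implementation would have to decide where that white space lives.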


  • 2.  Re: [xliff] Segmentation as core or not

    Posted 11-02-2011 01:53
    Yves, I want to make sure I understand your viewpoint. Based on what you suggested, is it possible to have an entire chapter or book as a single *part* when passing it around in an XLIFF file? If so, why call it a segment?

    <unit id='1'>
    <part>
    <source>Sentence one. Sentence two. Sentence three. .... Sentence two thousand and forty five.</source>
    </part>
    </unit>

    Best regards,

    Helena Shih Chapman
    Globalization Technologies and Architecture
    +1-720-396-6323 or T/L 938-6323
    Waltham, Massachusetts


  • 3.  RE: [xliff] Segmentation as core or not

    Posted 11-02-2011 03:02
    Hi Helena,

    I guess theoretically it would be possible to have an entire chapter in one “part”. But the extraction tools would not likely do that. Even when there is no sentence-based segmentation, the extractors do break down the content into much smaller parts: typically the equivalent of paragraphs for document-type files, or strings for UI-type files.

    Actually quite a few tools, especially for software, don’t go beyond that type of segmentation. If you look at many tools for PO files or Java properties files, for example, their entries are not often sentence-segmented. And they create TMX files where the entries are called “segments”.

    Others may correct me, but I think calling those extracted parts “segments” is simply a relatively common practice.

    Personally I think the important thing is to be very clear on what those “parts” are, regardless of what we end up calling the elements. That said, we should obviously pick a name that is not too confusing. It seems “segment” has been used for a while to mean both the container of something un-segmented and segmented (see for example TMX’s <seg>), but maybe I’ve been too deep in TMX/XLIFF/etc. for too long to see the world with un-tainted eyes :)

    Hope this helps,
    -yves


  • 4.  RE: [xliff] Segmentation as core or not

    Posted 11-02-2011 13:38
    Hi all,

    I think we might be putting the cart before the horse. I think David W. and Christian (among others) have an action item to come up with criteria for determining whether a proposed or accepted feature is core or an extended module. Perhaps we should wait until we have a more mature discussion on what criteria to use before we try to determine whether this feature is core or not. But by all means, continue the technical discussion on this feature.

    Just thinking out loud here.

    - Bryan


  • 5.  RE: [xliff] Segmentation as core or not

    Posted 11-02-2011 14:08
    It almost reads as if what the localization
    industry is used to calling a "segment" is really a "partition":
    basically something that has been cut and classified, but could be further
    divided or broken off into finer fragments? Since I have only been involved
    in localization topics for the last 3-4 years, I am probably close to the
    un-tainted eyes.

    To me, a segment in the localization
    world is something that usually has something to do with payment. That
    is, even if one is paying for a service by the word, the cost of each word can
    still be determined by the complexity of a segment (e.g. length).





  • 6.  RE: [xliff] Segmentation as core or not

    Posted 11-02-2011 17:08
    Hi Helena,

    There is some confusion in terminology. Changing the element name to <part> helps in visualization but doesn’t solve the issue at hand.

    An XLIFF file is a container for text extracted for localization. If there isn’t text to localize, there is no XLIFF because there is nothing to Interchange (the “L” and “I” in XLIFF are failing).

    In many cases, the text extracted for localization needs to be further partitioned to facilitate the translation process. There are cases in which translators prefer to translate paragraphs of text because it produces better translations. In other cases (probably the majority of cases), translators prefer to translate sentences because it facilitates TM matching and translation reuse. The process of splitting extracted text into sentences is known as “segmentation”.

    The issue listed in the wiki related to segmentation deals with the division of extracted text into “segments” and the rearrangement of the segmented text when the boundaries detected by an automated process do not suit the preferences of the translator.

    Segmentation can be done during text extraction, when the XLIFF file is created, or in a second pass after the XLIFF has been created. Segmentation also happens at translation time when translators merge or split existing segments.

    An XLIFF file must have containers for the extracted text. Having those containers is not a “feature”, it is a necessity. Being able to split the text and store the “segments”, “parts” or “fragments” in the same XLIFF can be viewed as a feature that may be qualified as “core” or “module”.

    The proposal currently in the wiki doesn’t make it easy to differentiate between text that has been “extracted” and text that has been “extracted and segmented”. If we had a clear distinction between just extracted and segmented, we would be able to tell whether the segmentation process and its result belong to the “core” or “module” category.

    When segmentation is done while the XLIFF file is being generated, each segment can be represented as a unit for translation. That was the original way of working with XLIFF 1.0 and 1.1. In XLIFF 1.2 the notion of representing segmentation in the XLIFF document was introduced.

    Working with XLIFF 1.2 you can have a segmented file with each <trans-unit> containing one segment, or you can have files that contain multiple segments in a <trans-unit> element, each of them enclosed in special markup designed with a combination of <seg-source> and <mrk> elements.

    The model for representing segmentation introduced in XLIFF 1.2 has several problems that must be fixed in XLIFF 2.0.

    The proposal for using <unit>, <segment> and <ignorable> that we have in the current draft of the XLIFF schema allows representing segmentation. The problem with the schema is that it does not tell you whether the text contained in the XLIFF file has been just extracted or extracted and segmented.

    The work you did with Yves in the wiki helps in understanding the status of the extracted text. With the attributes, elements and processing expectations you designed, it is possible to know if the text has been segmented, if further segmentation is allowed and what restrictions apply. It’s a very nice design.

    The discussion is about the qualification of your work. Is it essential or is it optional? If essential, that’s a “core” feature and the elements and attributes used should be in the main XML Schema and documented as an integral part of XLIFF. If representing segmentation is an optional goal, then those elements and attributes should live in a separate optional XML Schema (a “module”) and be documented in an annex of the specification or in a separate guideline.

    In my personal opinion, representing segmentation as it was designed should be a required part of the XLIFF 2.0 standard. I would call it a “core” feature.

    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com
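
    To make the XLIFF 1.2 case described above more concrete, here is a minimal Python sketch of reading a <trans-unit> whose segments are marked with <seg-source> and <mrk mtype="seg">. The sample trans-unit and the function name are made up for illustration. The sketch assumes segments are not nested and falls back to the whole <source> when no segmentation markup is present.

```python
import xml.etree.ElementTree as ET

# XLIFF 1.2 namespace, bound to the prefix "x" for path lookups.
NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}

# Illustrative trans-unit holding two segments (not taken from a real file).
TRANS_UNIT = """<trans-unit id="1" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <source>Sentence one. Sentence two.</source>
  <seg-source><mrk mtype="seg" mid="1">Sentence one.</mrk> <mrk mtype="seg" mid="2">Sentence two.</mrk></seg-source>
</trans-unit>"""


def segments_of(trans_unit):
    """Return the segments of a trans-unit, or its whole source if unsegmented."""
    marks = trans_unit.findall("x:seg-source/x:mrk[@mtype='seg']", NS)
    if marks:
        return ["".join(m.itertext()) for m in marks]
    source = trans_unit.find("x:source", NS)
    return ["".join(source.itertext())]


segments = segments_of(ET.fromstring(TRANS_UNIT))
print(segments)
```

    A reader of such a file has to look inside each <trans-unit> to find out whether it was segmented, which is one way to see the "just extracted" versus "extracted and segmented" ambiguity discussed above.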


  • 7.  RE: [xliff] Segmentation as core or not

    Posted 11-03-2011 19:04

    Here is a simple XML example.

    <?xml version="1.0" encoding="utf-8"?>
    <document>
      <title>This is my document title</title>
      <body id="one" short="Document's short description">
        <para num="first">This document describes how the user is to use product [product].  The first
              step is to press the <bi>start</bi> button; there are no other actions.
        </para>    
      </body>
    </document>

    Say this format is unique to this product, so no translation tool supports it.  The developer has been told to provide only XLIFF files for globalization purposes.  He does not know about terminology, word counting, segmentation, etc.  He only knows which pieces of text are translatable.  The XLIFF 1.2 file he would probably generate would be:

    <?xml version="1.0" encoding="utf-8"?>
    <xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2" xml:lang="EN">
      <file source-language="EN" datatype="plaintext" original="file.xml">
        <header></header>
        <body>
          <trans-unit id="1">
            <source>This is my document title</source>
          </trans-unit>
          <trans-unit id="2">
            <source>Document's short description</source>
          </trans-unit>
          <trans-unit id="3">
            <source>This document describes how the user is to use product <x id="1"/>.  The first
              step is to press the <g id="2">start</g> button; there are no other actions.
            </source>
          </trans-unit>
        </body>
      </file>
    </xliff>

    This simple XLIFF file should not contain segmentation information because some tools may not care about segmentation.  For example, a program for term extraction, spell checking, word counting, or grammar checking only cares about the readable text.
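
    As a rough illustration of the point above, here is a minimal Python sketch of a word counter that reads only the readable text of each <trans-unit> and never looks at segmentation. The shortened sample file, the function names and the "split on white space" counting rule are assumptions made for illustration, not how any particular tool works.

```python
import xml.etree.ElementTree as ET

# Shortened, illustrative variant of the XLIFF 1.2 file above.
XLIFF = """<xliff version="1.2">
  <file source-language="EN" datatype="plaintext" original="file.xml">
    <body>
      <trans-unit id="1">
        <source>This is my document title</source>
      </trans-unit>
      <trans-unit id="3">
        <source>Press the <g id="2">start</g> button<x id="1"/>.</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def local_name(tag):
    # Drop a '{namespace}' prefix so the sketch works with or without xmlns.
    return tag.rsplit("}", 1)[-1]


def readable_text(xliff_root):
    """Map each trans-unit id to its source text with inline codes removed."""
    result = {}
    for element in xliff_root.iter():
        if local_name(element.tag) != "trans-unit":
            continue
        for child in element:
            if local_name(child.tag) == "source":
                result[element.get("id")] = "".join(child.itertext()).strip()
    return result


texts = readable_text(ET.fromstring(XLIFF))
# Crude word count: split on white space (an assumption, not a standard).
word_counts = {unit_id: len(text.split()) for unit_id, text in texts.items()}
print(word_counts)
```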

    In my opinion, the XLIFF "core" elements should be the minimum set of elements which are required to extract the source text from the original file format in such a way that the source text can be replaced by the translated text in the original file, and the translated file will be usable by the product.  This would include:


    Identify that the XML file contains XLIFF information.
    Identify the source file from which the text was extracted and its attributes, like file name, source language, original file format, etc.
    Identify each contiguous block of text based on the source file's formatting rules.
    Identify non-translatable inline items which are embedded in the text.
    Identify text formatting requirements, like text on a single line, reflowable, maximum length, etc.

    Any XLIFF elements or attributes which are needed for a specific application's use would be placed in a "module".
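
    The round trip that this definition of "core" is built around (extract the source text, then put the translated text back where it came from) can be sketched in a few lines of Python. Everything here is hypothetical and for illustration only: the rule table, the function names and the French strings are made up, and inline items and formatting requirements are left out.

```python
import xml.etree.ElementTree as ET

# Cut-down version of the sample document, without inline elements.
ORIGINAL = """<document>
  <title>This is my document title</title>
  <body id="one" short="Document's short description"/>
</document>"""

# Hypothetical extraction rules: unit id -> (element path, attribute or None).
RULES = {
    "1": ("title", None),
    "2": ("body", "short"),
}


def extract(root):
    """Return {unit id: source text} for every rule."""
    units = {}
    for unit_id, (path, attribute) in RULES.items():
        node = root.find(path)
        units[unit_id] = node.text if attribute is None else node.get(attribute)
    return units


def merge(root, translations):
    """Write translated text back to where the source text came from."""
    for unit_id, text in translations.items():
        path, attribute = RULES[unit_id]
        node = root.find(path)
        if attribute is None:
            node.text = text
        else:
            node.set(attribute, text)
    return root


root = ET.fromstring(ORIGINAL)
units = extract(root)
# Made-up translations, standing in for what would come back in the XLIFF.
translated = merge(root, {"1": "Titre de mon document", "2": "Description courte"})
print(ET.tostring(translated, encoding="unicode"))
```

    Nothing in this round trip depends on how the text is segmented. It only needs a stable id for each extracted block.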



    David

    Corporate Globalization Tool Development
    EMail:  waltersd@us.ibm.com          
    Phone: (507) 253-7278,   T/L:553-7278,   Fax: (507) 253-1721

    CHKPII:                     http://w3-03.ibm.com/globalization/page/2011
    TM file formats:     http://w3-03.ibm.com/globalization/page/2083
    TM markups:         http://w3-03.ibm.com/globalization/page/2071


    "Rodolfo M. Raya" ---11/02/2011 12:09:10 PM---Hi Helena,





    From:

    "Rodolfo M. Raya" <rmraya@maxprograms.com>



    To:

    <xliff@lists.oasis-open.org>



    Date:

    11/02/2011 12:09 PM



    Subject:

    RE: [xliff] Segmentation as core or not



    Sent by:

    <xliff@lists.oasis-open.org>




    Hi Helena,
     
    There is a confusion in terminology. Changing the element name to <part> helps in visualization but doesn’t solve the issue at hand.
     
    An XLIFF file is a container for text extracted for localization. If there isn’t text to localize, there is no XLIFF because there is nothing to Interchange (the “L” and “I” in XLIFF are failing).
     
    In many cases, the text extracted for localization needs to be further partitioned to facilitate the translation process. There are cases in which translators prefer to translate paragraphs of text because it produces better translations. In other cases (probably the majority of cases), translators prefer to translate sentences because it facilitates TM matching and translation reuse. The process of splitting extracted text into sentences is known as “segmentation”.
     
    The issue listed in the wiki related to segmentation deals with division of extracted text into “segments” and rearrangement of the segmented text when the boundaries detected by an automated process are not suitable according to the preferences of the translator.
     
    Segmentation can be done during text extraction, when the XLIFF file is created, or in a second pass after the XLIFF has been created. Segmentation also happens at translation time when translators merge or split existing segments.
     
    An XLIFF file must have containers for the extracted text. Having those containers is not a “feature”, it is a necessity. Being able to split the text and store the “segments”, “parts” or “fragments” in the same XLIFF can be viewed as a feature that may be qualified as “core” or “module”.
     
    The proposal currently in the wiki doesn’t make it easy to differentiate between text that has been “extracted” and text that has been “extracted and segmented”. If we had a clear distinction between just extracted and segmented we would be able to tell if the segmentation process and its result belongs to the “core” or “module” category.
     
    When segmentation is done while the XLIFF file is being generated, each segment can be represented as a unit for translation. That was the original way of working with XLIFF 1.0 and 1.1. In XLIFF 1.2 the notion of representing segmentation in the XLIFF document was introduced.
     
    Working with XLIFF 1.2 you can have a segmented file with each <trans-unit> containing one segment or you can have files that contain multiple segments in a <trans-unit> element, each of them enclosed in special markup designed with a combination of <seg-source> and <mrk> elements.
     
    The model for representing segmentation  introduced in XLIFF 1.2 has several problems that must be fixed in XLIFF 2.0.
     
    The proposal for using <unit>, <segment> and <ignorable> that we have in current draft of the XLIFF schema allows representing segmentation. The problem with the schema is that it does not tell you if the text contained in the XLIFF file has been just extracted or extracted and segmented.
     
    The work you did with Yves in the wiki helps in understanding the status of the extracted text. With the attributes, elements and processing expectations you designed it is possible to know if the text has been segmented, if further segmentation is allowed and what restrictions apply. It’s a very nice design.
     
    The discussion is about the qualification of your work. Is it essential or is it optional? If essential, that’s a “core” feature, and the elements and attributes used should be in the main XML Schema and documented as an integral part of XLIFF. If representing segmentation is an optional goal, then those elements and attributes should live in a separate optional XML Schema (a “module”) and be documented in an annex of the specification or in a separate guideline.
     
    In my personal opinion, representing segmentation as designed should be a required part of the XLIFF 2.0 standard. I would call it a “core” feature.
     
    Regards,
    Rodolfo
    --
    Rodolfo M. Raya       rmraya@maxprograms.com
    Maxprograms       http://www.maxprograms.com
     
    From:  xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Helena S Chapman
    Sent:  Wednesday, November 02, 2011 12:07 PM
    To:  Yves Savourel
    Cc:  xliff@lists.oasis-open.org
    Subject:  RE: [xliff] Segmentation as core or not
     
    It almost reads as if what the localization industry is used to calling a "segment" is really a "partition". Basically something that has been cut and classified, but could be further divided or broken off into finer fragments? Since I have only been involved in localization topics for the last 3-4 years, I am probably close to having the un-tainted eyes.

    To me, a segment in the localization world is something that usually has something to do with payment. That is, even if one is paying for a service by the word, the cost of each word can still be determined by the complexity of a segment (e.g. length etc.).




    From:         Yves Savourel < ysavourel@enlaso.com >  
    To:         Helena S Chapman/San Jose/IBM@IBMUS  
    Cc:         < xliff@lists.oasis-open.org >  
    Date:         11/01/2011 11:02 PM  
    Subject:         RE: [xliff] Segmentation as core or not  


    Hi Helena,  
     
    I guess theoretically it would be possible to have an entire chapter in one “part”. But the extraction tools would not likely do that. Even when there is no sentence-based segmentation the extractors do break down the content into much smaller parts; typically the equivalent of paragraphs for document-type files, or strings for UI-type files.
     
    Actually quite a few tools, especially for software, don’t go beyond that type of segmentation. If you look at many tools for PO files, or Java properties files for example: their entries are not often sentence-segmented. And they create TMX files where the entries are called “segments”.
     
    Others may correct me, but I think calling those extracted parts “segments” is simply a relatively common practice.
     
    Personally I think the important thing is to be very clear on what those “parts” are, regardless of how we end up naming the elements. That said we should obviously pick a name that is not too confusing.
    It seems “segment” has been used for a while to mean both the container of something un-segmented and segmented (see for example TMX’s <seg>), but maybe I’ve been too deep in TMX/XLIFF/etc. for too long to see the world with un-tainted eyes :)  
     
    Hope this helps,  
    -yves  
     
     




  • 8.  Core vs. Module (was RE: [xliff] Segmentation as core or not)

    Posted 11-03-2011 19:31
    Hi all,

    I agree with David on the first part of his note. But I changed the subject line to focus on the second part of David's note (which I also tend to agree with).

    <snip>

    In my opinion, the XLIFF "core" elements should be the minimum set of elements which are required to extract the source text from the original file format in such a way that the source text can be replaced by the translated text in the original file, and the translated file will be usable by the product. This would include:

    1. Identify that the XML file contains XLIFF information.
    2. Identify the source file from which the text was extracted and its attributes, like file name, source language, original file format, etc.
    3. Identify each contiguous block of text based on the source file's formatting rules.
    4. Identify non-translatable inline items which are imbedded in the text.
    5. Identify text formatting requirements, like text on a single line, reflowable, maximum length, etc.

    Any XLIFF elements or attributes which are needed for a specific application's use would be placed in a "module".

    David

    </snip>

    I think this is an important argument/discussion. My opinion is that we should try to have a separate discussion on criteria for classifying any feature as Core vs. Module, as a first step, and then apply those criteria to each feature (and the above criteria from David might be what we decide on).

    The other option is to try to decide, feature by feature, if it is core or module without established criteria. In other words, it would be entirely possible that we would have a conversation each time that goes something like this:

    "I think feature N should be Core because a, b, and c."

    (and then)

    "And I base that on my opinion that in order to be core, a feature must be x, y, and z."

    And I could see that each of us might have a different x, y, and z. And we might even change our own x, y, and z from feature to feature.

    Thanks,

    Bryan


  • 9.  What is XLIFF trying to solve? That's the real question (was: Core vs. Module)

    Posted 11-03-2011 20:19
    On reading Bryan’s statement, I decided to go back to the XLIFF Charter and see if it clearly defines the problem that XLIFF is to solve rather than assume I know what XLIFF is supposed to do. If it does define the problem, then the core vs. module issue would be easily settled based on whether a feature is essential to meet the charter’s goals (core) or not (module).

    Unfortunately the current charter’s Statement of Purpose as a whole isn’t actually a Statement of Purpose, but rather a grab-bag of observations about the industry, discussion of the past, talk about things that could be done, etc. The closest to a statement of purpose is this bit:

    The purpose of the OASIS XLIFF TC is to define, through extensible XML vocabularies, and promote the adoption of, a specification for the interchange of localisable software and document based objects and related metadata.

    That, actually, is the germ of a statement of purpose, but it accounts for less than 9% of the text in the statement. It is also, unfortunately, too vague to make these determinations. So I would like to suggest that we make a statement of purpose, even informal, that states clearly what our goals are.

    I've gone back and looked at David’s proposed 2.0 Charter again, and I am not sure it is quite what is needed either. Here are the most relevant portions from the August 16 version (the most recent I find at the moment):

    Core – Basic part of the specification that contains all and only substantial elements that cannot possibly be excluded without negatively affecting the standard’s capability to allow for basic language technology related transformations.

    Unfortunately “basic language technology related transformations” is undefined and likely to be the subject of some contention since one tool might see something as vital that another tool sees as secondary.

    Module – a part of the specification that fulfills all of the following conditions
    i) Does not overlap with Core
    ii) Is compatible with Core
    iii) Comprises all elements and their processing rules that form a meaningful functional whole

    Description of Business Needs the Program should address

    Customers’ voice:
    The 1.x standard is too complex
    The 1.x standard has too generous extensibility
    The 1.x standard lacks explicit conformance criteria

    The overall goal is to ensure interoperability throughout Language Technology related content transformations during the whole content lifecycle. Although the XLIFF 1.x standard was intended primarily as an exchange format the industry practice shows that the defined format is also suitable for storage and legacy content leverage purposes.

    While David's proposed charter was admirable, it too lacks a clear statement of the scope of the problem XLIFF is to address. (And I can't fault David, since I reviewed it and I certainly never noticed until today that it doesn't define the problem.) The bit about “ensur[ing] interoperability throughout Language Technology related content transformations during the whole content lifecycle” sounds good, but unfortunately it too is a bit on the vague side since anyone could read almost any issue into that. Everything we are discussing could fit under that rubric.

    So I think that we have a fundamental issue here in that we may all assume different scopes and purposes without realizing it and therefore disagree on core vs. module as a result. If we look at http://wiki.oasis-open.org/xliff/XLIFF2.0/FeatureTracking with this lack of clarity in mind, what we'll find is that the argument about core vs. module for the features is circular because we don’t already know what the criteria are and we are actually arguing about the criteria under the cover of the features (and vice versa).

    If David and Helena say that segmentation is core and someone else disagrees, we don’t have a basis for that determination right now other than whether individuals want or don't want it and whether tools do this or do not (but even that is a value judgment). I hadn't realized this was the problem until today. But what it means is that we need to step back and come to some agreement about the problem we are to solve in terms of principles defined abstractly (i.e., not in terms of specific features). Then we can map the features to the abstract description and not get lost in arguments about core vs. module informed by diverging assumptions.

    Note that I am not arguing for some sort of doctrinaire position: If we come up with a statement of purpose but find something not covered that we all agree is really critical for making the standard more useful, we should feel free to include it upon discussion.

    So, although this mail is already too long, let me end with some questions:

    1. What are the central business and technical needs that XLIFF 2.0 is to address? (Maybe we need a matrix listing multiple things and let people vote on how important they are. Ideally we'll end up with some logically coherent problem that XLIFF solves and a list of principles needed to meet that need.)
    2. What does XLIFF 1.2 already do really well that should be brought forward to the core of XLIFF 2.0? (Obviously, we don't want to lose XLIFF 1.2 functionality in XLIFF 2.0.)

    I think if we can answer those questions, the discussion of other things will become much easier. It might also help to look for some parallel standards in other areas and see how they define their problems.

    Best,

    -Arle


  • 10.  Re: [xliff] What is XLIFF trying to solve? That's the real question (was: Core vs. Module)

    Posted 11-04-2011 22:16
    Thanks Arle, for this insightful analysis.

    I agree that Core vs. Module is dependent on definitions that we currently lack. This is why I think that discussing inclusion of any singular feature in core is premature. I think that you extracted the core statements from both Charters (the TC Charter and the WIP XLIFF 2.0 Program Charter). I am very well aware of the 'Core' definition issue with the WIP Charter, hence I have proposed charting the business processes in which XLIFF is expected to play the leading role. This approach is also reflected in the current WIP version of the Charter on SVN (Rev 17, 2011-10-04 10:02:43 GMT, Author: David.Filip):

    [Core – Basic part of the specification that contains all and only substantial elements that cannot possibly be excluded without negatively affecting the standard’s capability to allow for basic language technology related transformations. [ongoing discussion on this concept, DavidF will work on deriving this concept from main success scenario rather than the vague notion of a basic LT transformation – presented at the 2nd XLIFF Symposium, got buy-in and support from industry, i.e. Andrew Pimlott]]

    The state of definitions in the WIP Charter reflects the current state of discussion in the TC, i.e. lack of clarity and consensus in the TC on what is substantial and what is not. I agree that feature-specific discussions at this stage naturally tend to become circular. Based on earlier discussion with Yves, I have proposed to base the notion of 'substantial' on modeling the main success scenarios. With this goal in mind and as part of P&L SC located Charter work, I have drafted a preliminary high level process diagram (attached as BPMN 2 compliant xpdl and also a png image) that could help us (in a later more mature version reflecting hopefully SC and TC discussions) pinpoint the right overall goal, including basic definitions crucial for discerning core vs. module.

    The current (attached) version of the overall XLIFF Business Process suggests that there are some six high level functional areas that are more or less relevant for XLIFF creation, processing, and consumption. I believe that all of them are relevant and that we might end up with some +-2 functional areas to be described in lower level detail to facilitate the following:

    i) Provide unambiguous and sufficiently detailed definitions needed for stating XLIFF goal and substantial functions/parts
    ii) Provide a framework for stating processing requirements [in terms of what is admissible 'before' and 'after' on top of basic constraints of well-formedness and validity]

    If the currently proposed (or similar) framework of main functional areas (business processes and main success scenarios) is sanctioned by the TC, the issue of core vs. module can be stated from two different points of view.

    EITHER
    1) Core is all and only such elements that are necessary for extraction of source and re-import of target. [Only the first Pool in the current model, all other areas would be considered specialized modules] [I gather that such an approach might be favored by large buyers]

    OR
    2) In each and every Pool, we would identify as core the minimum that is required for including activities from that pool at all. Modules would comprise advanced functionality for each pool. [A hybrid approach would try to pick a few pools that would be considered core activities along with the first Pool. Prominently maybe translating and its pre-requisites.]

    Considering what Arle pointed out and what I tried to explain above, I would second Bryan's proposal to *continue technical work on owned features* (to continue moving maturing features into section 1 on wiki) *while consciously deferring determination of their Core vs. Module status* to a later stage where we will have elaborated the main success scenarios and will have chosen the overall approach to Core vs. Module criteria [based on minimum requirements of consensually identified main success scenarios].

    Based on this thinking aloud, it occurred to me that we might well need to split features we are currently working on into core and module parts later on; that would be the case with the option 2) approach to Core vs. Module, which now seems more viable to me. That is, I believe, another good reason for continued and intensified Charter work and for deferring Core status discussions pertaining to particular features.

    Thanks for your attention
    dF

    Dr. David Filip
    =======================
    LRC CNGL LT-Web CSIS
    University of Limerick, Ireland
    telephone: +353-6120-2781
    mobile: +353-86-049-34-68
    facsimile: +353-6120-2734
    mailto: david.filip@ul.ie

    Attachment: Diagram 1.xpdl (Description: Binary data)
    Attachment: GeneralBP.png (Description: PNG image)


  • 11.  RE: [xliff] Segmentation as core or not

    Posted 11-03-2011 19:39
    Hi David,   Your example deals with XLIFF 1.2 and we are working on XLIFF 2.0.   Please provide an example based on the XML Schema draft for XLIFF 2.0 that we have in SVN. This will help you differentiating between basic elements and attributes required in XLIFF 2.0 and those optional parts that should be in a module.   Notice that your example can be fully segmented with the elements already included in the schema draft. Prepare your example with what we have now in the draft and we will know what elements are not needed and should go to a module.   Regards, Rodolfo -- Rodolfo M. Raya       rmraya@maxprograms.com Maxprograms       http://www.maxprograms.com   From: xliff@lists.oasis-open.org [mailto:xliff@lists.oasis-open.org] On Behalf Of David Walters Sent: Thursday, November 03, 2011 4:47 PM To: xliff@lists.oasis-open.org Subject: RE: [xliff] Segmentation as core or not   Here is a simple XML example. <?xml version="1.0" encoding="utf-8"?> <document>   <title>This is my document title</title>   <body id="one" short="Document's short description">     <para num="first">This document describes how the user is to use product [product].  The first           step is to press the <bi>start</bi> button; there are no other actions.     </para>       </body> </document> Say this format is unique to this product, so no translation tools supports it.  The developer has been told to provide only XLIFF files for globalization purposes.  He does not know about terminology, word counting, segmentation, etc.  He only know what pieces of text are translatable.  
The XLIFF 1.2 file he would probably generate would be: <?xml version="1.0" encoding="utf-8"?>     <xliff version="1.2" xml:lang="EN">      <file source-language="EN" datatype="plaintext" original="file.xml">       <header></header>       <body>         <trans-unit id="1">            <source>This is my document title</source>          </trans-unit>         <trans-unit id="2">            <source>Document's short description</source>          </trans-unit>         <trans-unit id="3">            <source>This document describes how the user is to use product <x id="1"/>.  The first         step is to press the <g id="2">start</g> button; there are no other actions.         </source>          </trans-unit>        </body>   </file> </xliff> This simple XLIFF file should not contain segmentation information because some tools may not care about segmentation.  For example, a program for term extraction, spell checking, word counting, or grammar checking only cares about the readable text. In  my opinion, the XLIFF "core" elements should be the minimum set of elements which are required to extract the source text from the original file format in such a way that the source text can be replaced by the translated text in the original file, and the translated file will be usable by the product.  This would include: Identify that the XML file contains XLIFF information. Identify the source file from which the text was extracted and its attributes, like file name, source language, original file format, etc. Identify each contiguous block of text based on the source file's formatting rules. Identify non-translatable inline items which are imbedded in the text. Identify text formatting requirements, like text on a single line, reflowable, maximum length, etc. Any XLIFF elements or attributes which are needed for a specific application's use would be placed in a "module". 
David Corporate Globalization Tool Development EMail:   waltersd@us.ibm.com           Phone: (507) 253-7278,   T/L:553-7278,   Fax: (507) 253-1721 CHKPII:                     http://w3-03.ibm.com/globalization/page/2011 TM file formats:     http://w3-03.ibm.com/globalization/page/2083 TM markups:         http://w3-03.ibm.com/globalization/page/2071 "Rodolfo M. Raya" ---11/02/2011 12:09:10 PM---Hi Helena, From: "Rodolfo M. Raya" < rmraya@maxprograms.com > To: < xliff@lists.oasis-open.org > Date: 11/02/2011 12:09 PM Subject: RE: [xliff] Segmentation as core or not Sent by: < xliff@lists.oasis-open.org > Hi Helena,   There is a confusion in terminology. Changing the element name to <part> helps in visualization but doesn’t solve the issue at hand.   An XLIFF file is a container for text extracted for localization. If there isn’t text to localize, there is no XLIFF because there is nothing to Interchange (the “L” and “I” in XLIFF are failing).   In many cases, the text extracted for localization needs to be further partitioned to facilitate the translation process. There are cases in which translators prefer to translate paragraphs of text because it produces better translations. In other cases (probably the majority of cases), translators prefer to translate sentences because it facilitates TM matching and translation reuse. The process of splitting extracted text into sentences is known as “segmentation”.   The issue listed in the wiki related to segmentation deals with division of extracted text into “segments” and rearrangement of the segmented text when the boundaries detected by an automated process are not suitable according to the preferences of the translator.   Segmentation can be done during text extraction, when the XLIFF file is created, or in a second pass after the XLIFF has been created. Segmentation also happens at translation time when translators merge or split existing segments.   An XLIFF file must have containers for the extracted text. 
Having those containers is not a “feature”, it is a necessity. Being able to split the text and store the “segments”, “parts” or “fragments” in the same XLIFF can be viewed as a feature that may be qualified as “core” or “module”.   The proposal currently in the wiki doesn’t make it easy to differentiate between text that has been “extracted” and text that has been “extracted and segmented”. If we had a clear distinction between just extracted and segmented we would be able to tell if the segmentation process and its result belongs to the “core” or “module” category.   When segmentation is done while the XLIFF file is being generated, each segment can be represented as a unit for translation. That was the original way of working with XLIFF 1.0 and 1.1. In XLIFF 1.2 the notion of representing segmentation in the XLIFF document was introduced.   Working with XLIFF 1.2 you can have a segmented file with each <trans-unit> containing one segment or you can have files that contain multiple segments in a <trans-unit> element, each of them enclosed in special markup designed with a combination of <seg-source> and <mrk> elements.   The model for representing segmentation  introduced in XLIFF 1.2 has several problems that must be fixed in XLIFF 2.0.   The proposal for using <unit>, <segment> and <ignorable> that we have in current draft of the XLIFF schema allows representing segmentation. The problem with the schema is that it does not tell you if the text contained in the XLIFF file has been just extracted or extracted and segmented.   The work you did with Yves in the wiki helps in understanding the status of the extracted text. With the attributes, elements and processing expectations you designed it is possible to know if the text has been segmented, if further segmentation is allowed and what restrictions apply. It’s a very nice design.   The discussion is about the qualification of your work. Is it essential of is it optional? 
If essential, that’s a “core” feature and the used elements and attributes should be in the main XML Schema and documented as integral part of XLIFF. If  representing segmentation is an optional goal, then those elements and attributes should live in a separate optional XML Schema (a “module”) and documented in an annex of the specification or in a separate guideline.   In my personal opinion, representing segmentation as was designed should be a required part of the XLIFF 2.0 standard. I would call it a “core” feature.   Regards, Rodolfo -- Rodolfo M. Raya       rmraya@maxprograms.com Maxprograms       http://www.maxprograms.com   From:   xliff@lists.oasis-open.org [ mailto:xliff@lists.oasis-open.org ] On Behalf Of Helena S Chapman Sent:  Wednesday, November 02, 2011 12:07 PM To:  Yves Savourel Cc:   xliff@lists.oasis-open.org Subject:  RE: [xliff] Segmentation as core or not   It almost read like what the localization industry is used to call "segment" is really a "partition". Basically something that have been cut, classified but could be further divided or broken off into finer fragments? Since I have only been involved in localization topic for the last 3-4 years, I am probably close to the un-tainted eyes.   To me, a segment in the localization world is something that usually have something to do with payment. That is, even if one is paying a service by words, the cost of each word can still be determined by the complexity of a segment. (e.g. length etc.) From:         Yves Savourel < ysavourel@enlaso.com >   To:         Helena S Chapman/San Jose/IBM@IBMUS   Cc:         < xliff@lists.oasis-open.org >   Date:         11/01/2011 11:02 PM   Subject:         RE: [xliff] Segmentation as core or not   Hi Helena,     I guess theoretically it would be possible to have an entire chapter in one “part”. But the extraction tools would not likely do that. 
Even when there is no sentence-based segmentation the extractors do break down the content into much smaller parts; typically the equivalent of paragraphs for document-type files, or strings for UI-type file. Actually quite a few tools, especially for software, don’t go beyond that type of segmentation. If you look at many tools for PO files, or Java properties files for examples: Their entries are not often sentence-segmented. And they create TMX files where the entries are called “segments”. Others may correct me, but I think calling those extracted parts “segments” is simply a relatively common practice. Personally I think the important thing is to be very clear on what those “part” are, regardless how we end up calling the elements. That said we should obviously pick a name that is not too confusing. It seems “segment” has been used for a while to mean both the container of something un-segmented and segmented (see for example TMX’s <seg>), but maybe I’ve been too deep in TMX/XLIFF/etc. for too long to see the world with un-tainted eyes :) Hope this helps, -yves


  • 12.  RE: [xliff] Segmentation as core or not

    Posted 11-03-2011 19:55
    Hi Dave, > This simple XLIFF file should not contain segmentation > information because some tools may not care about > segmentation. I agree that some tools may not care about segmentation information. But they still should be able to work with files that have segmentation information. For example a spell-checker should work with or without segmentation. But that is beside your point. I think an XLIFF document always has some segmentation information. It's just not always the result of a sentence segmentation process. > In my opinion, the XLIFF "core" elements should be > the minimum set of elements which are required to > extract the source text from the original file > format in such a way that the source text can be > replaced by the translated text in the original file, > and the translated file will be usable by the product. I tend to agree with that. > 3. Identify each contiguous block of text based on > the source file's formatting rules. I think this is the key: your #3 is segmentation representation. Extracting entries from any file is implicitly a segmentation process. It's not the same as trying to segment by sentences, which you may do later, but it is a form of segmentation. A good proof of that is that somehow in your example, the filter decided that the content of <bi> should not have its own <trans-unit>: you are already applying some kind of segmentation rules. The only difference is that such initial extraction-driven segmentation is usually not labeled as such and people tend to see only the sentence segmentation. Imagine that your file is now in XLIFF 2.0. It is as follows: <unit id="1"> <segment> <source>This is my document title</source> </segment> </unit> <unit id="2"> <segment> <source>Document's short description</source> </segment> </unit> <unit id="3"> <segment> <source>This document describes how the user is to use product <ph id="1"/>. The first step is to press the <pc id="2">start</pc> button; there are no other actions.</source> </segment> </unit> Then you decide to apply sentence segmentation and you end up with: <unit id="1"> <segment> <source>This is my document title</source> </segment> </unit> <unit id="2"> <segment> <source>Document's short description</source> </segment> </unit> <unit id="3"> <segment> <source>This document describes how the user is to use product <ph id="1"/>. </source> </segment> <segment> <source>The first step is to press the <pc id="2">start</pc> button; there are no other actions.</source> </segment> </unit> Going from the first representation to the second is not really segmenting, it is re-segmenting. In addition, all entries are using the same representation regardless of whether or not they have gone through additional segmentation after the extraction. This makes it very easy for tools (even XSLT ones) to work with the text without having to worry about looking at different kinds of elements. This also has the drawback of not being able to know the (re-)segmentation status of the entries with a single <segment>, as Rodolfo pointed out yesterday. But we can come up with some solution for that. -ys
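A minimal sketch of the "same representation" point, in Python with the standard library's xml.etree.ElementTree. It assumes the draft <unit>/<segment>/<source> structure exactly as written in the message, with no XML namespace; the function names segment_texts and unit_text are made up for illustration and are not part of any XLIFF tool.

```python
import xml.etree.ElementTree as ET

# Unit 3 from the message: as extracted, then after sentence segmentation.
EXTRACTED = (
    '<unit id="3"><segment><source>This document describes how the user '
    'is to use product <ph id="1"/>. The first step is to press the '
    '<pc id="2">start</pc> button; there are no other actions.'
    '</source></segment></unit>'
)
RESEGMENTED = (
    '<unit id="3"><segment><source>This document describes how the user '
    'is to use product <ph id="1"/>. </source></segment>'
    '<segment><source>The first step is to press the '
    '<pc id="2">start</pc> button; there are no other actions.'
    '</source></segment></unit>'
)


def segment_texts(unit_xml):
    """Return the plain text of every <segment>/<source> in one unit."""
    unit = ET.fromstring(unit_xml)
    # itertext() flattens inline elements such as <ph/> and <pc>.
    return ["".join(src.itertext()) for src in unit.findall("segment/source")]


def unit_text(unit_xml):
    """Concatenate the segment texts back into the text of the whole unit."""
    return "".join(segment_texts(unit_xml))


if __name__ == "__main__":
    for label, xml in (("extracted", EXTRACTED), ("re-segmented", RESEGMENTED)):
        print(label, len(segment_texts(xml)), "segment(s)")
```

The same two functions handle both files: only the number of segments changes, while the concatenated text of the unit stays the same.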


  • 13.  RE: [xliff] Segmentation as core or not

    Posted 11-04-2011 12:45
    Hi Yves, I never thought about extracting the text from one format and creating an XLIFF file as a form of segmentation. But I guess you are dividing the source into parts. I agree that "segment" and "segmentation" are overloaded words which have evolved into having different meanings to different people. So I think we must be consistent on the words we use. One of the definitions I found for the word "extract" was "to take or copy out (matter), as from a book". So I prefer to think of "extraction" as the process of taking the text out of the original source format, modifying that text so that it is compatible with a new output format (i.e. replacing inline items with XLIFF inline elements), and creating a new output file. A set of guidelines could be developed on how to extract the text. Does the XLIFF 1.2 spec have guidelines for this activity? "Segmentation" would be the process to take a block of text and divide it into smaller parts (segments). The text itself is not modified, it is only divided. SRX was developed to control the segmentation of text, but has nothing to do with the extraction of text. Your example helps to separate these two activities. <unit id="3"> <segment> <source>This document describes how the user is to use product <ph id="1"/>. The first step is to press the <pc id="2">start</pc> button; there are no other actions.</source> </segment> </unit> <unit> defines the extracted text parts. <segment> is redundant and provides no additional information at this point, so it should not be required. Once some type of segmentation is performed (whether it be sentence segmentation or segmentation based on some other rules), then the <unit> is further divided into translatable segments: <unit id="3"> <segment> <source>This document describes how the user is to use product <ph id="1"/>. </source> </segment> <segment> <source>The first step is to press the <pc id="2">start</pc> button; there are no other actions.</source> </segment> </unit> David Corporate Globalization Tool Development EMail: waltersd@us.ibm.com Phone: (507) 253-7278, T/L:553-7278, Fax: (507) 253-1721 CHKPII: http://w3-03.ibm.com/globalization/page/2011 TM file formats: http://w3-03.ibm.com/globalization/page/2083 TM markups: http://w3-03.ibm.com/globalization/page/2071


  • 14.  RE: [xliff] Segmentation as core or not

    Posted 11-04-2011 20:57
    Hi David, > ...taking the text out of the original source format, > modifying that text so that it is compatible with a > new output format (i.e. replacing inline items with > XLIFF inline elements), and creating a new output > file It seems your definition of extraction includes an implicit segmentation: When you "take the text out of the original source format" you have to take a selection of the original content and that selection has to be based on some rules. They may not be called segmentation rules (or people may not think about them as segmentation rules), but that's what they are. Using <segment> in the output simply makes the result of that implicit segmentation explicit. Which is a good thing. > "Segmentation" would be the process to take a block > of text and divide it into smaller parts (segments). It sounds reasonable. And that definition can fit the "take the text out of the original source format" of your definition of extraction. > SRX was developed to control the segmentation > of text, but has nothing to do with the > extraction of text. But SRX is not the only thing that drives segmentation. For example the <withinTextRule/> element of ITS would be used in your XML example to make <bi> an element within its parent rather than a separate <unit>. It would also be used to specify sub-flows. Those are segmentation rules. See the ITS specification section 6.8 ( http://www.w3.org/TR/its/#elements-within-text ) and requirement #25 in the working document used to define the requirements for ITS: http://www.w3.org/TR/2006/WD-itsreq-20060518/#elemseg It's not about sentence-segmentation, but it's clearly about segmentation. > <unit> defines the extracted text parts. > <segment> is redundant and provides no additional > information at this point, so it should not > be required. "<unit> defines the extracted text parts" ...which initially correspond to single segments. And <segment> holds one segment. Having a single segment in the unit is just one case among others. It just happens that it's the case existing just after "extraction", before you apply further (optional) segmentation rules. If we say <segment> shouldn't be required when there is only one segment, then we could apply the same logic (they are redundant and provide no additional information) to the <source> elements and say they are not required when there is no target. Both <segment> and <source> may look useless when a unit is made of a single segment and has no target, but making them optional would cause tools (and the schema) to have to deal with complicated conditions. It's much simpler and more efficient to make them required. Finally, even if <segment> was optional when the content has not been sentence-segmented, I still think it would be a fundamental part of the XLIFF structure: states, translation candidates, comments, and many other aspects of XLIFF need to be set at the segment level and therefore they could not exist efficiently without <segment>. In other words, so many optional modules would need the "segment representation module" that it will make more sense to have the segmentation representation always available. Cheers, -yves
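To make the "complicated conditions" remark concrete, here is a rough Python sketch comparing two readers. The schema in which <segment> and <source> may be omitted is purely hypothetical (it is the alternative being argued against, not anything in the draft), and the function names are invented for this illustration.

```python
import xml.etree.ElementTree as ET


def sources_if_optional(unit_xml):
    """Reader for a HYPOTHETICAL schema where <segment> and <source> may be omitted."""
    unit = ET.fromstring(unit_xml)
    segments = unit.findall("segment")
    # Condition 1: no <segment> means the unit itself acts as the only segment.
    containers = segments if segments else [unit]
    texts = []
    for container in containers:
        source = container.find("source")
        # Condition 2: no <source> means the container holds the text directly.
        node = source if source is not None else container
        texts.append("".join(node.itertext()))
    return texts


def sources_if_required(unit_xml):
    """Reader when both elements are always present: one path, no conditions."""
    unit = ET.fromstring(unit_xml)
    return ["".join(src.itertext()) for src in unit.findall("segment/source")]


if __name__ == "__main__":
    full = '<unit id="1"><segment><source>Text</source></segment></unit>'
    print(sources_if_optional(full), sources_if_required(full))
```

The optional-element reader has to accept four shapes of the same unit, while the required-element reader only ever sees one.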


  • 15.  RE: [xliff] Segmentation as core or not

    Posted 11-08-2011 01:40
    Hello, We discussed this a little bit in IBM today. Our view would still be that segmentation does not need to be in core for interchange. However, we had discussed a little bit about the idea of a logging facility which would give a list of what operations had occurred on a particular document as it progresses through a workflow. The logging facility could store, within the document, for each such operation: * timestamp * type of operation ( from a namespace ) * perhaps free format text with a description of the operation For example (and please disregard specifics of the following markup, it is only given for the rough concept): <log> <logEntry timestamp="2011-11-08 01:16:07 UTC" operation="org.oasis-open.xliff.segmentation">Segmentation was performed</logEntry> <logEntry timestamp="2011-12-08 01:16:07 UTC" operation="com.example.someOtherOp.specialTranslation">Some other operation was performed</logEntry> ... </log> * XLIFF could administer a namespace containing items such as org.oasis-open.xliff.segmentation ( Java form, or it could be a URI such as with DTDs ) * Or, a company could use their own namespace (com.example for example). * This way we could answer questions such as 'has segmentation occurred?' and 'where in the workflow (sequentially, according to the timestamp) did it occur?' * the operations referenced in the <log> would not need to be core or even part of the currently referenced version of xliff - as long as the namespace was maintained Steven
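A rough Python sketch of how a tool might query such a log. The <log>/<logEntry> markup and the timestamp format are taken from the rough concept above, which the message itself says should not be read as final; the function names are invented for this illustration.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# The two entries from the message (the elided "..." entries are left out).
LOG_XML = """<log>
  <logEntry timestamp="2011-11-08 01:16:07 UTC"
            operation="org.oasis-open.xliff.segmentation">Segmentation was performed</logEntry>
  <logEntry timestamp="2011-12-08 01:16:07 UTC"
            operation="com.example.someOtherOp.specialTranslation">Some other operation was performed</logEntry>
</log>"""

SEGMENTATION_OP = "org.oasis-open.xliff.segmentation"


def read_log(log_xml):
    """Return (timestamp, operation, description) tuples sorted by timestamp."""
    entries = []
    for entry in ET.fromstring(log_xml).findall("logEntry"):
        # Assumed timestamp layout: "YYYY-MM-DD HH:MM:SS UTC", as in the example.
        stamp = datetime.strptime(entry.get("timestamp"), "%Y-%m-%d %H:%M:%S UTC")
        stamp = stamp.replace(tzinfo=timezone.utc)
        entries.append((stamp, entry.get("operation"), entry.text or ""))
    return sorted(entries, key=lambda item: item[0])


def has_occurred(entries, operation):
    """Answer 'has this operation occurred?'."""
    return any(op == operation for _, op, _ in entries)


def position_in_workflow(entries, operation):
    """Zero-based position of the first matching operation, or None."""
    for index, (_, op, _) in enumerate(entries):
        if op == operation:
            return index
    return None


if __name__ == "__main__":
    log = read_log(LOG_XML)
    print(has_occurred(log, SEGMENTATION_OP), position_in_workflow(log, SEGMENTATION_OP))
```

A tool that does not know an operation name can still list it in order, which matches the point that logged operations need not be core.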


  • 16.  RE: [xliff] Segmentation as core or not

    Posted 11-08-2011 05:26
    Hi Steven, all, > We discussed this a little bit in IBM today. > Our view would still be that segmentation > does not need to be in core for interchange. I think most (all hopefully) of us would probably agree that one important criterion for an optional module is that it does not prevent the tools implementing only the core from working properly. So if the representation of sentence-segmentation is optional it should not prevent a tool XYZ, which understands only the core elements, from working. The question then is: how can tool XYZ work with a sentence-segmented file without knowing about <segment>? <unit id='1'> <segment> <source>Sentence one. </source> </segment> <segment> <source>Sentence two.</source> </segment> </unit> I don't think it can. The only way it could would be if a unit was to store two copies of the same content: one not sentence-segmented, and the other one reserved for the tools that would implement the optional segmentation representation module. Needless to say this would result in a slew of troubles: Where does tool ABC (which implements segmentation) put its translation? How can tool XYZ (which does not implement segmentation) access it? How do we resolve differences in source? Where do we put segment status? etc. Basically it's all the problems of 1.2 all over again. In 1.2 we had no choice because we needed to be backward compatible. But in 2.0 we can have a clean way of dealing with segments. So far, the only rationale I've heard for making <segment> optional is the argument that segmentation is a different process and therefore should not be part of the core. But I think we have seen that segmentation in general is broader than sentence-segmentation and clearly happens also during extraction (see the example with ITS <withinTextRule/>), so that rationale doesn't really hold true. But maybe I'm missing other things: what are the advantages of keeping the segmentation representation optional? Cheers, -yves
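The tool XYZ question can be sketched in Python as follows. The "core-only" reader is a hypothetical assumption: it supposes a core in which <source> sits directly inside <unit>, which is not something the draft defines. The function names are likewise made up.

```python
import xml.etree.ElementTree as ET

# The sentence-segmented unit from the message.
SEGMENTED_UNIT = (
    "<unit id='1'>"
    "<segment><source>Sentence one. </source></segment>"
    "<segment><source>Sentence two.</source></segment>"
    "</unit>"
)


def core_only_sources(unit_xml):
    """Hypothetical tool XYZ: only looks for <source> directly under <unit>."""
    unit = ET.fromstring(unit_xml)
    return [src.text or "" for src in unit.findall("source")]


def segment_aware_sources(unit_xml):
    """Tool that understands <segment>: reads each segment's <source>."""
    unit = ET.fromstring(unit_xml)
    return [src.text or "" for src in unit.findall("segment/source")]


if __name__ == "__main__":
    print("core-only tool sees:", core_only_sources(SEGMENTED_UNIT))
    print("segment-aware tool sees:", segment_aware_sources(SEGMENTED_UNIT))
```

Under that assumption the core-only reader finds no text at all in the segmented unit, which is the interoperability problem the message describes.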


  • 17.  RE: [xliff] Segmentation as core or not

    Posted 11-08-2011 09:37
    Hi, I think there is a huge confusion between the segmentation process and storing segments in XLIFF. Text extracted for translation and stored in an XLIFF file needs to be stored in some elements that act as containers. If XLIFF doesn't have containers for holding localizable text, then the localizable text can't be exchanged and the "L" and "I" fail in the XLIFF acronym. Extracted text can be segmented before the XLIFF file is created (my tools have been doing this for years) or after the XLIFF has been created. A tool processing XLIFF files should not care about when segmentation was done. Moreover, the segmentation process is completely optional. To apply segmentation process to an already existing XLIFF file is an optional task. Recording that such task has been performed is the optional part. For the process to be possible, the text must already be in the XLIFF file and it has to be in some containers. Storing translatable text in XLIFF files is not optional. Elements for holding that text are required and elements for holding the translations of that text are also an integral part of XLIFF. What we have so far in the XLIFF schema draft is a set of elements and attributes for holding translatable text and its translations. In the schema we don't have information that indicates how and when the segmentation process occurred. In the wiki we have a proposal for decorating the current schema draft with elements and attributes containing information about the segmentation process. The proposal in the wiki augments the scope of the basic elements already present in the schema draft by adding attributes and processing expectations to elements that must be present in any XLIFF file. Although some attributes mentioned in the segmentation section in the wiki are not really necessary when an XLIFF file is created, the elements in which they appear are absolutely necessary. We can't document an element as part of the "core" schema and leave some of its attributes as optional in a separate "module". Minimalism is a fancy trend. I like it very much and see it as useful in some cases. We should not try to apply minimalism to the concept of XLIFF core; this would be a mistake as big as the mistake in XLIFF 1.2 that enabled custom extensions everywhere. Balance is important. Regards, Rodolfo -- Rodolfo M. Raya rmraya@maxprograms.com Maxprograms http://www.maxprograms.com


  • 18.  RE: [xliff] Segmentation as core or not

    Posted 11-08-2011 15:14
    Rodolfo. You brought up an interesting point " To apply segmentation process to an already existing XLIFF file is an optional task. Recording that such task has been performed is the optional part. For the process to be possible, the text must already be in the XLIFF file and it has to be in some containers. " I believe we are talking about two very distinct process activities here: 1. partition content into parts (core) 2. refine the definition of #1 into segments (module) I agree any existing XLIFF file will already include "parts" of content. How these parts were defined by what tools is something the module can then define. For example, one might expect metadata about what the parts mean according to other standard or non-standard definition. For example, word vs sentence according to UAX#29 or paragraph vs chapter based on Acme Translation Agency Inc. internal definition? The latter is what Steven is referring to as logging. We definitely should rethink the taxonomy of what we call "segmentation" today. Note that I didn't use the word "terminology" to further pollute the conversation. Best regards, Helena Shih Chapman Globalization Technologies and Architecture +1-720-396-6323 or T/L 938-6323 Waltham, Massachusetts


  • 19.  Core vs. Module: what are the criteria? (RE: [xliff] Segmentation as core or not)

    Posted 11-08-2011 17:36
    Seems we are in need of establishing criteria to distinguish between core and module. There are clearly specific features that are aching to be categorized. I will add this excellent comment by Yves to the record for when we get to that debate (i.e., what are the criteria we should apply when calling a feature core or module?). > Hi Steven, all, > >. . . > > I think most (all hopefully) of us would probably > agree that one important criterion for an optional > module is that it does not prevent the tools > implementing only the core from working properly. Thanks Yves and Steven. I would not have thought of this excellent point. - Bryan


  • 20.  Re: [xliff] Segmentation as core or not

    Posted 11-08-2011 12:16
    Hi Steven, I very much like the idea of a log that keeps track of operations. Still, in many cases we would need information at the unit/segment level. But this is another story. As Bryan, Arle, and I suggested elsewhere, we should defer discussion of the core/module status of particular features. I suggest that you record the log facility as a feature in the section 2 on the wiki. Rgds dF Dr. David Filip ======================= LRC CNGL LT-Web CSIS University of Limerick, Ireland telephone: +353-6120-2781 mobile: +353-86-049-34-68 facsimile: +353-6120-2734 mailto: david.filip@ul.ie


  • 21.  RE: [xliff] Segmentation as core or not

    Posted 11-08-2011 12:33
    > I suggest that you record the log facility as > a feature in the section 2 on the wiki. Maybe it could go with the item 1 in that list: http://wiki.oasis-open.org/xliff/XLIFF2.0/Feature/ChangeTracking -ys


  • 22.  RE: [xliff] Segmentation as core or not

    Posted 11-08-2011 15:14
    Question I have with the item 1 regarding "change tracking/version control" is: - Is the change tracking for the purpose of tracking changes of the XLIFF file content itself or, - change tracking is to track the activities performed to this file during the end to end process? When I think of "change tracking/version control", I perceive that as CVS/SVN where a simple change in comments (for example) would also be captured that has nothing to do with the end-to-end process. Best regards, Helena Shih Chapman Globalization Technologies and Architecture +1-720-396-6323 or T/L 938-6323 Waltham, Massachusetts