OASIS XML Localisation Interchange File Format (XLIFF) TC


Processing extension elements


    Posted 08-17-2004 19:32


    Subject: Processing extension elements


    It seems to me that the fundamental question for extensions 
    in <source>/<target> is how "generic" tools will be able to 
    deal with them while preserving them.
    
    Here are the possible ways of processing extensions that I 
    can think of (without worrying about how this would be 
    expressed in the XLIFF schema):
    
    #1- The unknown elements and their content are stripped out.
    
    #2- The unknown elements are stripped out, their content
        left as part of the <source>/<target>.
    
    #3- The unknown elements are preserved and treated as <g> 
        (or <x/> if they are empty elements).
    
    #4- The unknown elements are preserved and treated as <ph> 
        (their content is seen as code).
    
    #5- The unknown elements have some XLIFF-understood 
        indication of how they should be treated.
    
    
    - "generic tool" means a tool that does the minimal 
    processing allowed by the specification. It does not know 
    any specific extension.
    
    - "as seen by a generic tool" means how the unknown tags 
    would be interpreted in memory (regardless of how they are 
    actually represented) by tools that would not know what 
    to do with them.
    
    - There are actually two cases of processing: during merge 
    and not during merge. During a merge process the unknown 
    elements should be ignored by the generic tool (just like 
    an <mrk> element). One has to decide what to do with the 
    content: discard it or treat it as part of the text.
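
To make "unknown element" concrete, here is a minimal Python sketch.
It assumes that a generic tool treats any element outside the XLIFF
namespace as an extension. The XLIFF 1.1 namespace URI is used; the
htm namespace URI and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: XLIFF 1.1 namespace; anything outside it is an extension.
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"


def is_unknown(elem):
    """True if the element does not belong to the XLIFF namespace."""
    return not elem.tag.startswith("{" + XLIFF_NS + "}")


def unknown_elements(source):
    """List the extension elements found anywhere inside <source>."""
    return [e for e in source.iter() if e is not source and is_unknown(e)]


# 'urn:example:htm' is a hypothetical namespace for the htm: prefix.
source = ET.fromstring(
    "<source xmlns='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:htm='urn:example:htm' xml:lang='en'>"
    "This is <htm:b>big</htm:b> and <g id='1'>bold</g></source>"
)
names = [e.tag for e in unknown_elements(source)]
```

Only the htm:b element is reported; the <g> element is known to the
tool and is left out.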
    
    Now let's see examples, pros, and cons for each case:
    
    
    ============================================================
    #1- The unknown elements and their content are stripped out.
    ------------------------------------------------------------
    
    The most drastic solution.
    
    Original entry:
    
    <source xml:lang='en'>This is <htm:b>big</htm:b></source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is </source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is </source>
    
    Probably not what we want, as extensions that enclose the 
    original content would become a death trap for translatable
    text.
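
A rough Python sketch of option #1, under the same assumption that
everything outside the XLIFF 1.1 namespace is unknown. The function
name and the htm namespace URI are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"  # assumption: XLIFF 1.1


def is_unknown(elem):
    return not elem.tag.startswith("{" + XLIFF_NS + "}")


def strip_with_content(parent):
    """Option #1: drop unknown elements and everything inside them."""
    prev = None
    for child in list(parent):
        if is_unknown(child):
            # Keep the text that follows the removed element (its tail).
            tail = child.tail or ""
            if prev is None:
                parent.text = (parent.text or "") + tail
            else:
                prev.tail = (prev.tail or "") + tail
            parent.remove(child)
        else:
            strip_with_content(child)
            prev = child


source = ET.fromstring(
    "<source xmlns='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:htm='urn:example:htm' xml:lang='en'>"
    "This is <htm:b>big</htm:b></source>"
)
strip_with_content(source)
seen = "".join(source.itertext())
```

After the call, the word "big" is gone for good, which is the
death-trap effect described above.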
    
    
    
    
    ============================================================
    #2- The unknown elements are stripped out, their content 
        left as part of the <source>/<target>.
    ------------------------------------------------------------
    
    Original entry:
    
    <source xml:lang='en'>This is <htm:b>big</htm:b></source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is big</source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is big</source>
    
    A very simple way to deal with unknown tags. But it would 
    add unwanted content if the content of the extension 
    elements is really metadata, as shown below.
    
    Original entry:
    
    <source xml:lang='en'>This is 
     <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def>
    </source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is big'big</source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is big'big</source>
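
Option #2 could be sketched in Python roughly as follows. As before,
"unknown" is assumed to mean "outside the XLIFF 1.1 namespace", and
the helper names and the x namespace URI are invented for the example.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"  # assumption: XLIFF 1.1


def is_unknown(elem):
    return not elem.tag.startswith("{" + XLIFF_NS + "}")


def append_text(parent, index, text):
    """Append text just before position `index` among parent's children."""
    if not text:
        return
    if index == 0:
        parent.text = (parent.text or "") + text
    else:
        prev = parent[index - 1]
        prev.tail = (prev.tail or "") + text


def unwrap_unknown(parent):
    """Option #2: remove unknown tags but keep their content in place."""
    i = 0
    while i < len(parent):
        child = parent[i]
        if is_unknown(child):
            kids = list(child)
            append_text(parent, i, child.text)
            del parent[i]
            for offset, kid in enumerate(kids):
                parent.insert(i + offset, kid)
            # The tail follows the last spliced-in child, if there is one.
            append_text(parent, i + len(kids), child.tail)
            # Do not advance: the spliced-in children are examined next.
        else:
            unwrap_unknown(child)
            i += 1


source = ET.fromstring(
    "<source xmlns='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:x='urn:example:x' xml:lang='en'>This is "
    "<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def></source>"
)
unwrap_unknown(source)
seen = source.text
```

The pronunciation metadata ends up in the translatable text, as in
the second example above.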
    
    
    
    
    ============================================================
    #3- The unknown elements are preserved and treated as <g> 
        (or <x/> if they are empty elements).
    ------------------------------------------------------------
    
    Original entry:
    
    <source xml:lang='en'>This is <htm:b>big</htm:b></source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is <g id='0'>big</g></source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is <htm:b>big</htm:b></source>
    
    This solution would also add unwanted content if the 
    content of the extension elements is really metadata, as 
    shown below.
    
    Original entry:
    
    <source xml:lang='en'>This is 
     <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def>
    </source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is <g id='0'><g id='1'>big</g>
    <g id='2'>'big</g></g></source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is 
     <x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def>
    </source>
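
One possible way to get the "seen as <g>, saved as original" behaviour
of option #3 is to build an in-memory view and remember what each
placeholder stood for. This is only a sketch: the function names, the
x namespace URI, and the id numbering scheme are assumptions.

```python
import itertools
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"  # assumption: XLIFF 1.1
G = "{" + XLIFF_NS + "}g"
X = "{" + XLIFF_NS + "}x"


def is_unknown(elem):
    return not elem.tag.startswith("{" + XLIFF_NS + "}")


def to_view(elem, ids, originals):
    """Copy elem, showing unknown elements as <g> (or <x/> if empty)."""
    if is_unknown(elem):
        empty = len(elem) == 0 and not elem.text
        copy = ET.Element(X if empty else G, {"id": str(next(ids))})
        # Remember the real tag and attributes for the save step.
        originals[copy] = (elem.tag, dict(elem.attrib))
    else:
        copy = ET.Element(elem.tag, dict(elem.attrib))
    copy.text, copy.tail = elem.text, elem.tail
    for child in elem:
        copy.append(to_view(child, ids, originals))
    return copy


def restore(view_root, originals):
    """Inverse of to_view(): put original tags and attributes back."""
    for elem in view_root.iter():
        if elem in originals:
            tag, attrib = originals[elem]
            elem.tag = tag
            elem.attrib.clear()
            elem.attrib.update(attrib)


source = ET.fromstring(
    "<source xmlns='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:x='urn:example:x' xml:lang='en'>This is "
    "<x:def><x:term>big</x:term><x:pron>'big</x:pron></x:def></source>"
)
originals = {}
seen = to_view(source, itertools.count(), originals)
seen_ids = [e.get("id") for e in seen.iter() if e.tag == G]
restore(seen, originals)
saved_tags = [e.tag for e in seen.iter()]
```

The mapping is keyed by the placeholder objects themselves, so the
generated ids cannot be confused with ids of real <g> elements.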
    
    
    
    
    ============================================================
    #4- The unknown elements are preserved and treated as a <ph> 
        (their content is seen as code).
    ------------------------------------------------------------
    
    This is John's scenario (I think). It works fine if the 
    content of all extension elements is metadata.
    
    Original entry:
    
    <source xml:lang='en'>This is big<x:note>blah blah</x:note>
    </source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is big<ph id='0'>blah blah</ph>
    </source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is big<x:note>blah blah</x:note>
    </source>
    
    But it does not work for text content inside extension 
    elements, as it would be seen as "code".
    
    Original entry:
    
    <source xml:lang='en'>This is <htm:b>big</htm:b></source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is <ph id='0'>big</ph></source>
    (Code not text --------------------------^ )
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is <htm:b>big</htm:b></source>
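
The effect of option #4 on what a translator gets to see can be
sketched in Python as below. The sketch simplifies things by treating
every XLIFF-namespace inline element as a text container, which is not
true of real <ph>, <bpt>, <ept> or <it> content. The function name and
the extension namespace URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"  # assumption: XLIFF 1.1


def is_unknown(elem):
    return not elem.tag.startswith("{" + XLIFF_NS + "}")


def translatable_text(elem):
    """Option #4: text inside unknown elements is code, so skip it."""
    parts = [elem.text or ""]
    for child in elem:
        if not is_unknown(child):
            parts.append(translatable_text(child))
        # Text after the child always belongs to the enclosing element.
        parts.append(child.tail or "")
    return "".join(parts)


HEAD = (
    "<source xmlns='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:x='urn:example:x' xmlns:htm='urn:example:htm' xml:lang='en'>"
)
with_note = ET.fromstring(
    HEAD + "This is big<x:note>blah blah</x:note></source>"
)
with_bold = ET.fromstring(HEAD + "This is <htm:b>big</htm:b></source>")

note_text = translatable_text(with_note)
bold_text = translatable_text(with_bold)
```

The note case comes out as intended, while in the htm:b case the word
"big" is hidden from translation.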
    
    
    
    
    ============================================================
    #5- The unknown elements have some XLIFF-understood 
        indication of how they should be treated.
    ------------------------------------------------------------
    
    There are two ways to indicate this: by an XLIFF-defined 
    attribute that the extension elements would carry, or by 
    enclosing the extensions in a special new XLIFF element 
    such as <extend>.
    
    Original entry:
    
    <source xml:lang='en'>This is 
     <x:def xlf:totrans='yes'><x:term>big</x:term><x:pron 
     xlf:totrans='no'>'big</x:pron></x:def></source>
    
    As seen by a generic tool:
    
    <source xml:lang='en'>This is <g id='0'><g id='1'>big</g>
    <ph id='2'>'big</ph></g></source>
    
    Saved by a generic tool:
    
    <source xml:lang='en'>This is 
     <x:def xlf:totrans='yes'><x:term>big</x:term><x:pron 
     xlf:totrans='no'>'big</x:pron></x:def></source>
    
    This is more flexible since it lets you specify how each 
    element should be processed.
    
    However, it may not always be doable, as the extension 
    elements may belong to a namespace that does not itself 
    allow extension, so you would not be able to use 
    xlf:totrans (or whatever flag is decided on). In that case 
    the solution would be to use an <extend> XLIFF element, as 
    Matt (I think) suggested. But as you can imagine this would 
    start to make the <source>/<target> content rather crowded.
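
A sketch of how a generic tool might read such a flag. The
xlf:totrans attribute is only the placeholder name used in the example
above, not something defined by XLIFF. The sketch also assumes that an
element without the flag inherits the value of its nearest flagged
ancestor and that the top-level default is 'yes'; both choices are
assumptions that happen to reproduce the example.

```python
import itertools
import xml.etree.ElementTree as ET

XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.1"  # assumption: XLIFF 1.1
TOTRANS = "{" + XLIFF_NS + "}totrans"  # hypothetical flag from the example


def is_unknown(elem):
    return not elem.tag.startswith("{" + XLIFF_NS + "}")


def classify(parent, inherited, ids, out):
    """Option #5: decide per unknown element whether it acts as g or ph."""
    for child in parent:
        flag = inherited
        if is_unknown(child):
            # Assumption: a missing flag inherits the enclosing value.
            flag = child.get(TOTRANS, inherited)
            out.append((str(next(ids)), "g" if flag == "yes" else "ph"))
        classify(child, flag, ids, out)
    return out


source = ET.fromstring(
    "<source xmlns='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:xlf='urn:oasis:names:tc:xliff:document:1.1' "
    "xmlns:x='urn:example:x' xml:lang='en'>This is "
    "<x:def xlf:totrans='yes'><x:term>big</x:term>"
    "<x:pron xlf:totrans='no'>'big</x:pron></x:def></source>"
)
kinds = classify(source, "yes", itertools.count(), [])
```

The resulting list pairs each generated id with the XLIFF element the
tool would emulate: x:def and x:term behave as <g>, x:pron as <ph>.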
    
    
    
    
    ============================================================
    Personal opinion
    ------------------------------------------------------------
    
    It seems that allowing extension elements that can have 
    either translatable or "code" content in <source>/<target> 
    would add a significant cost in processing and complexity, 
    and I'm not sure allowing code content (i.e. metadata) 
    would be wise anyway.
    
    I see no big problem with the <htm:b>-type of extensions, as 
    they are simply a more customized way of using <mrk>, and 
    "generic" tools could probably deal with them without too 
    much change in their implementation. So I tend to like 
    solution #3 better (at least for now).
    
    -yves
    
    
    
    

