OASIS XML Localisation Interchange File Format (XLIFF) TC

A tricky problem for handling in XLIFF


    Posted 03-26-2004 08:50


    Subject: A tricky problem for handling in XLIFF


    In our last meeting, I mentioned that I had encountered a difficult problem for XLIFF.

    This issue was actually raised once before:
    http://lists.oasis-open.org/archives/xliff-comment/200303/msg00000.html
    But I think the issue merits further discussion, so here are the details.

    Here is an XML document to be translated using XLIFF:

    <?xml version="1.0"?>
    <ovpxml version="1.0" xml:lang="en-us">

      <prompt id="WELCOME">
        Welcome to the Oracle Messaging Centre.
      </prompt>

      <prompt id="DELETE_MESSAGE">
        <var name="key" type="phoneKey"/>
        To delete the message, press <varref name="key"/>.
      </prompt>

      <prompt id="MESSAGE_RECEIVED">
       <var name="timeReceived" type="dateTime"/>
        The message was received at <varref name="timeReceived" mask="hmA"/>.
      </prompt>

      <prompt id="YOU_HAVE_MESSAGE">
        <var name="numVM" type="integer"/>
        <choose>
          <when test="numVM==0">
            You have no new messages.
          </when>
          <when test="numVM==1">
            You have one new message.
          </when>
          <otherwise>
            You have <varref name="numVM"/> new messages.
          </otherwise>
        </choose>
      </prompt>

    </ovpxml>

    (FYI, after the strings are translated, actors will record these phrases in audio files, and the files are played over the phone when someone connects to a voice messaging system.)

    There are four prompts:

    1) id="WELCOME"
    This is easy and causes no problems.

    2) id="DELETE_MESSAGE"

        This is also easy.
       
        But note the <var> and <varref> elements.
        At run time, the application can specify different phone keys.
     
        E.g.
        In US English:
            To delete the message, press pound
        In Real(!) English:
            To delete the message, press hash



    3) id="MESSAGE_RECEIVED"

        This is slightly trickier, in that the translation needs to modify both the text and the mark-up.

        For example:
            <prompt id="MESSAGE_RECEIVED">
              <var name="timeReceived" type="dateTime"/>
              Le message a été reçu à <varref name="timeReceived" mask="Hm"/>.
            </prompt>

        Note that in the translation, the mask attribute has changed (from "hmA" to "Hm") due to different linguistic requirements.
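
        In XLIFF terms, one way to carry this might be to wrap the <varref> in a <ph> placeholder, so that the translator can edit the mask in the target. This is only a rough sketch: it assumes the OVPXML tag is escaped as text inside the placeholder, that the <var> declaration stays in the skeleton, and the ph id value is arbitrary.

        ```xml
        <trans-unit id="MESSAGE_RECEIVED">
          <!-- English text; the varref tag is carried as escaped text -->
          <source>The message was received at <ph id="1">&lt;varref name="timeReceived" mask="hmA"/&gt;</ph>.</source>
          <!-- French text; same placeholder, but with the mask edited -->
          <target>Le message a été reçu à <ph id="1">&lt;varref name="timeReceived" mask="Hm"/&gt;</ph>.</target>
        </trans-unit>
        ```

        The catch is that the placeholder content differs between source and target, so a tool that expects inline codes to be copied unchanged may flag it.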

    4) id="YOU_HAVE_MESSAGE"

    This is the killer problem!

    First note that the designers of the file format have 'borrowed' the choose/when/otherwise structures from XSLT.

    I quite like this idea.

    We are bound to encounter similar scenarios in certain file types, and while we may encounter different usages, should we recommend incorporating XSLT structures into XLIFF when processing such scenarios?

    For example, John mentioned that Java resource bundles can have a similar mechanism. Should we use XSLT elements to represent them?

    But the real problem comes when we examine what is returned from translation:

    <prompt id="YOU_HAVE_MESSAGE">
      <var name="numVM" type="integer"/>
      <choose>
        <when test="numVM==0">
          Nie masz nowych wiadomości
        </when>
        <when test="numVM==1">
          Masz jedną nową wiadomość
        </when>
        <when test="numVM&lt;5">
          Masz <varref name="numVM" variant="feminine_accusative"/> nowe wiadomości
        </when>
        <when test="numVM&lt;21">
          Masz <varref name="numVM" variant="feminine_accusative"/> nowych wiadomości
        </when>
        <when test="in(numVM % 10, [2,3,4])">
          Masz <varref name="numVM" variant="feminine_accusative"/> nowe wiadomości
        </when>
        <otherwise>
          Masz <varref name="numVM" variant="feminine_accusative"/> nowych wiadomości
        </otherwise>
      </choose>
    </prompt>

    There are three choices in the US English file, but six choices in the Polish translation!

    So how can we segment the prompt into trans-units?

    I see several options, none of which is simple:

    1) We can drop the entire prompt element into a single trans-unit, embed the OVPXML mark-up in the source and target, and pass the OVPXML schema to translators to ensure they modify the mark-up correctly.


    2) We can segment on each option, incorporating the test condition into the id.
    This means that the editing tool will need to add trans-units for the new conditions.

    3) We could permit multiple targets in each trans-unit, with different conditions for different targets.

    4) Since there is actually a many-to-many relationship between sources and targets, we could produce trans-units with only a source, or only a target, and implement a relationship map between related trans-units.
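
    To make option 2 a bit more concrete, here is roughly what the body might look like after Polish translation. This is a sketch only: the id encoding (numVM-eq-0, numVM-lt-5 and so on) is something I have made up to keep the ids to simple token characters, and the empty source on the added unit is an assumption about how a tool might cope with a target that has no English counterpart.

    ```xml
    <body>
      <!-- Units extracted from the English file: one per branch -->
      <trans-unit id="YOU_HAVE_MESSAGE.numVM-eq-0">
        <source>You have no new messages.</source>
        <target>Nie masz nowych wiadomości</target>
      </trans-unit>
      <trans-unit id="YOU_HAVE_MESSAGE.numVM-eq-1">
        <source>You have one new message.</source>
        <target>Masz jedną nową wiadomość</target>
      </trans-unit>
      <trans-unit id="YOU_HAVE_MESSAGE.otherwise">
        <source>You have <ph id="1">&lt;varref name="numVM"/&gt;</ph> new messages.</source>
        <target>Masz <ph id="1">&lt;varref name="numVM" variant="feminine_accusative"/&gt;</ph> nowych wiadomości</target>
      </trans-unit>
      <!-- Unit added by the editing tool for Polish: no English text exists for it -->
      <trans-unit id="YOU_HAVE_MESSAGE.numVM-lt-5">
        <source/>
        <target>Masz <ph id="1">&lt;varref name="numVM" variant="feminine_accusative"/&gt;</ph> nowe wiadomości</target>
      </trans-unit>
    </body>
    ```

    The two remaining Polish conditions would need added units of the same shape. Note that the ids alone do not record the order in which the conditions are tested, and the order matters here, so that would have to be stored somewhere as well.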

    So from the above, you will deduce that the problem is rather complex, and that I am stuck.
     
    We need to be able to translate these structures, validate the modified markup during translation, and reuse as much of the translation as possible in subsequent versions of the files.
     
    For example, if the second version of the file differs in one of the choose elements, how much of the original file can we reuse, and how much needs retranslation?

    For our first pass at implementing this file (in XLIFF 1.0), I think I will go with option 1.
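
    As a rough idea of what option 1 could look like, here is a sketch of a single trans-unit holding the whole prompt. The assumptions are mine: the file name "prompts.xml" is hypothetical, the OVPXML tags are escaped inside <ph> placeholders, and whitespace handling inside source and target is glossed over.

    ```xml
    <?xml version="1.0"?>
    <xliff version="1.0">
      <!-- "prompts.xml" is a hypothetical name for the original OVPXML file -->
      <file original="prompts.xml" source-language="en-us" target-language="pl" datatype="xml">
        <header/>
        <body>
          <trans-unit id="YOU_HAVE_MESSAGE">
            <!-- Three branches in English; OVPXML tags are escaped text inside ph -->
            <source><ph id="1">&lt;var name="numVM" type="integer"/&gt;&lt;choose&gt;&lt;when test="numVM==0"&gt;</ph>You have no new messages.
              <ph id="2">&lt;/when&gt;&lt;when test="numVM==1"&gt;</ph>You have one new message.
              <ph id="3">&lt;/when&gt;&lt;otherwise&gt;</ph>You have <ph id="4">&lt;varref name="numVM"/&gt;</ph> new messages.
              <ph id="5">&lt;/otherwise&gt;&lt;/choose&gt;</ph></source>
            <!-- Six branches in Polish; the translator has added and edited placeholders -->
            <target><ph id="1">&lt;var name="numVM" type="integer"/&gt;&lt;choose&gt;&lt;when test="numVM==0"&gt;</ph>Nie masz nowych wiadomości
              <ph id="2">&lt;/when&gt;&lt;when test="numVM==1"&gt;</ph>Masz jedną nową wiadomość
              <ph id="3">&lt;/when&gt;&lt;when test="numVM&amp;lt;5"&gt;</ph>Masz <ph id="4">&lt;varref name="numVM" variant="feminine_accusative"/&gt;</ph> nowe wiadomości
              <ph id="5">&lt;/when&gt;&lt;when test="numVM&amp;lt;21"&gt;</ph>Masz <ph id="6">&lt;varref name="numVM" variant="feminine_accusative"/&gt;</ph> nowych wiadomości
              <ph id="7">&lt;/when&gt;&lt;when test="in(numVM % 10, [2,3,4])"&gt;</ph>Masz <ph id="8">&lt;varref name="numVM" variant="feminine_accusative"/&gt;</ph> nowe wiadomości
              <ph id="9">&lt;/when&gt;&lt;otherwise&gt;</ph>Masz <ph id="10">&lt;varref name="numVM" variant="feminine_accusative"/&gt;</ph> nowych wiadomości
              <ph id="11">&lt;/otherwise&gt;&lt;/choose&gt;</ph></target>
          </trans-unit>
        </body>
      </file>
    </xliff>
    ```

    Even in this form the placeholder ids no longer line up between source and target, and the escaped test conditions end up double-escaped, which is why the translators would need the OVPXML schema to validate the result.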

    But if anyone has any better ideas, please let me know.
     
    Mat
     

