OASIS XML Localisation Interchange File Format (XLIFF) TC


embedding XSLT elements in XLIFF source and target elements

    Posted 06-20-2005 18:13

    Subject: embedding XSLT elements in XLIFF source and target elements


     

    Hi All

     

    About a year ago, I raised some issues regarding translation of a peculiar file format that Oracle intends to be the first step in localizing a voice-enabled application.

     

    This issue has recently resurfaced within the TC, as it affects the whole question of including XML structures from other namespaces directly within the XLIFF source and target elements.

     

    I have put together the following document which should refresh your memories.

     

    In summary, I am suggesting that XSLT, or the relevant elements of XSLT, be an allowed namespace within source and target, as it allows some very intricate processing to be performed that gets round some translation quality issues that have never really been addressed within I18n.

     

    I have used some very rough Java code as examples, but the issues apply to any other file type

     

    Mat

     

    ------------------------------------------------------------

     

    Translation quality and embedding XSLT within XLIFF

     

    Consider a simple Java method designed to report the number of new e-mails received by a mail server application:

     

    public void printNewmails(int count){
        System.out.println("You have " + count + " new E-mail message(s)");
    }

     

    There are several problems here.

     

    Firstly, the message uses a short-hand mechanism of handling plurals.

     

    An improved mechanism would be:

     

    public void printNewmails(int count){
        if (count == 1){
            System.out.println("You have " + count + " new E-mail message");
        }
        else {
            System.out.println("You have " + count + " new E-mail messages");
        }
    }

     

    This can be more easily localized:

    public void printNewmails(int count){
        String message = null;
        if (count == 1){
            message = LoadResource(SINGLE_MESSAGE);
        }
        else {
            message = LoadResource(MULTIPLE_MESSAGES);
        }
        // Each resource is assumed to be a format string with a placeholder for the count
        message = String.format(message, count);
        System.out.println(message);
    }

     

    But there is now a more subtle issue. Appending an ‘s’ to ‘message’ for “no messages” or for more than one message is a convention that applies in English. It most likely does not apply in other languages.

     

    For example, Irish applies a plural form to all non-zero values.

    So the logic is different in Irish, i.e.

    if (count == 0){
        // resource for a count of zero
    }
    else{
        // resource for all non-zero counts
    }

     

    Other languages may use different logic or a different number of formats.

     

    For example, Polish uses five different resources depending on the count
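
    The per-language decision described above can be sketched as a table of selector functions, one per language, each mapping a count to a resource key. This is only a rough sketch under assumptions: the class name PluralSelectors, the MESSAGE_ONE, MESSAGE_FEW and MESSAGE_MANY keys, and the rule bodies are hypothetical. The English rule follows the code above. The Polish rule is a simplified illustration of the commonly described pattern (1, counts ending in 2 to 4 except 12 to 14, everything else) and does not try to reproduce every resource a real Polish localization might need.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical sketch: each language supplies its own rule for choosing a resource key.
class PluralSelectors {
    static final Map<String, IntFunction<String>> RULES = new HashMap<>();

    static {
        // English, as in the code above: one form for exactly 1, another for everything else.
        RULES.put("en", n -> n == 1 ? "SINGLE_MESSAGE" : "MULTIPLE_MESSAGES");

        // Polish (illustrative only): a key for 1, a key for counts ending in 2-4
        // (except those ending in 12-14), and a key for all remaining counts.
        RULES.put("pl", n -> {
            if (n == 1) {
                return "MESSAGE_ONE";
            }
            int lastDigit = n % 10;
            int lastTwo = n % 100;
            boolean few = lastDigit >= 2 && lastDigit <= 4
                    && !(lastTwo >= 12 && lastTwo <= 14);
            return few ? "MESSAGE_FEW" : "MESSAGE_MANY";
        });
    }

    // Returns the resource key for the given language code and count.
    static String resourceKey(String language, int count) {
        return RULES.get(language).apply(count);
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 2, 5, 22}) {
            System.out.println(n + ": en=" + resourceKey("en", n)
                    + ", pl=" + resourceKey("pl", n));
        }
    }
}
```

    Note that in this sketch the rules still live in the application code, so the developer has to write and maintain one rule per language.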

     

     

    An even more extreme example occurs if the system can be used to print the number of new e-mails, voice mails, or faxes

     

     

    public static final int E_MAIL = 1;
    public static final int VOICE_MAIL = 2;
    public static final int FAX = 3;

    public void printNewmails(int count, int type){
        String message = null;
        String messageType = null;

        if (count == 1){
            message = LoadResource(SINGLE_MESSAGE);
        }
        else {
            message = LoadResource(MULTIPLE_MESSAGES);
        }

        if (type == E_MAIL){
            messageType = LoadResource(E_MAIL_MESSAGE);
        }
        else if (type == VOICE_MAIL){
            messageType = LoadResource(VOICE_MAIL_MESSAGE);
        }
        else if (type == FAX){
            messageType = LoadResource(FAX_MESSAGE);
        }

        // The resource is assumed to be a format string taking the count and the message type
        message = String.format(message, count, messageType);
        System.out.println(message);
    }

     

     

    The complexity here is that some languages may change the initial message depending on the gender of the message type.

    This then requires even more complex code, with the code knowing the gender of each resource.

    But how does a developer know, or code for, all the possible gender and linguistic variations? The localizer should make this decision.

     

    public void printNewmails(int count, int type, Locale loc){
        LocalisedResource messageType = LoadLocalisedResource(type, loc);
        LocalisedResource message = LoadLocalisedResource(count, messageType, loc);

        System.out.println(message.toString());
    }

     

    In this example, the first call to LoadLocalisedResource returns a structure containing the localized text of the resource for the required locale. The structure also has metadata containing the gender of the resource.

    The second call uses the gender held in the metadata of the message type to select the language-specific resource that is returned.
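
    A rough sketch of such a structure follows. The name LocalisedResource comes from the code above, but its fields, the Gender enum, the GenderedResources wrapper class and the French strings are all assumptions made for illustration. They are not taken from any real resource bundle or API.

```java
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch of a localized resource that carries gender metadata.
class GenderedResources {
    enum Gender { MASCULINE, FEMININE }

    // Localized text plus the grammatical gender supplied by the translator.
    static final class LocalisedResource {
        final String text;
        final Gender gender;

        LocalisedResource(String text, Gender gender) {
            this.text = text;
            this.gender = gender;
        }

        @Override
        public String toString() {
            return text;
        }
    }

    // Plural message templates keyed by the gender of the message type
    // (French, for illustration only).
    static final Map<Gender, String> PLURAL_TEMPLATES = new EnumMap<>(Gender.class);

    static {
        PLURAL_TEMPLATES.put(Gender.MASCULINE, "Vous avez %d nouveaux %s");
        PLURAL_TEMPLATES.put(Gender.FEMININE, "Vous avez %d nouvelles %s");
    }

    // Corresponds to the second call in the text: the gender metadata of the
    // message type selects which template is used.
    static String message(int count, LocalisedResource messageType) {
        String template = PLURAL_TEMPLATES.get(messageType.gender);
        return String.format(Locale.ROOT, template, count, messageType.text);
    }

    public static void main(String[] args) {
        LocalisedResource email = new LocalisedResource("messages", Gender.MASCULINE);
        LocalisedResource fax = new LocalisedResource("télécopies", Gender.FEMININE);
        System.out.println(message(3, email));
        System.out.println(message(3, fax));
    }
}
```

    Even in this form, the mapping from gender to template is fixed in code, so a language with more genders, or with agreement rules of a different kind, would need a code change.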

     

     

    But this still requires the developer to understand all these linguistic issues, and to know the gender of each message type for all possible languages.

     

     

    To be fully internationalized, the decision on which message to use needs to be localized, as well as the messages themselves

     

     

    Traditionally, this issue has been handled in software by designing the applications so that these issues do not occur

     

     

    public void printNewmails(int count, int type){

    will output:

        New Email message count : 3

    or

        New Fax message count : 0

     

    But if the UI quality is to be of a higher standard, or is to be converted into a voice enabled application, this level of quality is insufficient.

     

    For voice-enabled applications, further issues arise, in that the count values also need to be localized as “One”, “Two” etc.

     

    As stated above, the way to achieve the highest possible quality in localizing this type of application is to localize the decision logic as well as the resources.

    One way to do this would be to introduce the linguistic metadata and logic into the translatable XLIFF structures themselves.

    Firstly, when translating “E_MAIL”, translators can add gender “meta tags” to the resource.

    Secondly, when translating the actual messages, the if (count == ...) logic can be implemented using XSLT choose structures.

     

    The choose structures can pick up input values, such as the count, or meta tags from other structures

     

    Consider the following XSLT structure

     

    <var name="numVM" type="integer"/>
    <choose>
      <when test="numVM=0">
        You have no new messages.
      </when>
      <when test="numVM=1">
        You have one new message.
      </when>
      <otherwise>
        You have <varref name="numVM"/> new messages.
      </otherwise>
    </choose>

     

    And its Polish translation:

    <var name="numVM" type="integer"/>
    <choose>
      <when test="numVM=0">
        Nie masz nowych wiadomości
      </when>
      <when test="numVM=1">
        Masz jedną nową wiadomość
      </when>
      <when test="numVM&lt;5">
        Masz <varref name="numVM" variant="feminine_accusative"/> nowe wiadomości
      </when>
      <when test="numVM&lt;21">
        Masz <varref name="numVM" variant="feminine_accusative"/> nowych wiadomości
      </when>
      <when test="in(numVM % 10, [2,3,4])">
        Masz <varref name="numVM" variant="feminine_accusative"/> nowe wiadomości
      </when>
      <otherwise>
        Masz <varref name="numVM" variant="feminine_accusative"/> nowych wiadomości
      </otherwise>
    </choose>
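
    To make the run-time behaviour of such a structure concrete, here is a rough Java sketch that picks the first matching <when> branch and substitutes the count for each <varref>. It is not an XSLT processor. The class name ChooseEvaluator is hypothetical, and the sketch assumes tests written only as name=N or name<N, so the in(...) test and the variant attribute used in the Polish example are not handled.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical sketch: evaluates a <choose> fragment for a single integer variable.
class ChooseEvaluator {

    // Evaluates a test such as "numVM=0" or "numVM<5" against the value.
    static boolean matches(String test, int value) {
        String[] parts = test.split("[=<]");
        int operand = Integer.parseInt(parts[1].trim());
        return test.contains("<") ? value < operand : value == operand;
    }

    // Renders the text of one branch, replacing each <varref> with the value.
    static String render(Element branch, int value) {
        StringBuilder out = new StringBuilder();
        NodeList children = branch.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                out.append(child.getNodeValue());
            } else if ("varref".equals(child.getNodeName())) {
                out.append(value);
            }
        }
        // Collapse the indentation whitespace left over from the XML layout.
        return out.toString().trim().replaceAll("\\s+", " ");
    }

    // Returns the text of the first <when> whose test matches, else <otherwise>.
    static String evaluate(String chooseXml, int value) {
        Element choose;
        try {
            choose = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(chooseXml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Fragment is not well-formed XML", e);
        }
        NodeList children = choose.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() != Node.ELEMENT_NODE) {
                continue;
            }
            Element branch = (Element) child;
            String name = branch.getNodeName();
            if ("when".equals(name) && matches(branch.getAttribute("test"), value)) {
                return render(branch, value);
            }
            if ("otherwise".equals(name)) {
                return render(branch, value);
            }
        }
        return "";
    }

    static final String ENGLISH =
            "<choose>"
            + "<when test=\"numVM=0\">You have no new messages.</when>"
            + "<when test=\"numVM=1\">You have one new message.</when>"
            + "<otherwise>You have <varref name=\"numVM\"/> new messages.</otherwise>"
            + "</choose>";

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 7}) {
            System.out.println(evaluate(ENGLISH, n));
        }
    }
}
```

    The point of the sketch is that the application only supplies the count. Which branches exist, and in what order they are tested, is decided entirely by the fragment, and so by whoever translated it.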

    Now consider how this could be transported in an XLIFF document.

    The obvious solution is to include a trans-unit for each of the three English variants.

    But this requires six trans-units in Polish.

    The Polish translator could add the extra trans-units, with the <when> conditions included in the trans-unit ids.

     

    This becomes awkward for reuse.

     

    If the second English resource is modified in a later version, which of the six Polish trans-units can be reused, and which need retranslation?

    It also requires that the XLIFF editing tool be both able and permitted to add extra trans-units.

     

    A simpler variation may be to put the entire structure as the translatable text in a single XLIFF trans unit

     

    The translator then localizes a single XSLT fragment for each resource, and the translated XLIFF file then becomes:

     

     

    <xliff version='1.1'
     xmlns='urn:oasis:names:tc:xliff:document:1.1'>
     <file original='hello.txt' source-language='en' target-language='pl'
      datatype='plaintext'>
      <body>
       <trans-unit id='new_messages'>
        <source><var name="numVM" type="integer"/>
        <choose>
          <when test="numVM=0">
            You have no new messages.
          </when>
          <when test="numVM=1">
            You have one new message.
          </when>
          <otherwise>
            You have <varref name="numVM"/> new messages.
          </otherwise>
        </choose></source>
        <target><var name="numVM" type="integer"/>
        <choose>
          <when test="numVM=0">
            Nie masz nowych wiadomości
          </when>
          <when test="numVM=1">
            Masz jedną nową wiadomość
          </when>
          <when test="numVM&lt;5">
            Masz <varref name="numVM" variant="feminine_accusative"/> nowe wiadomości
          </when>
          <when test="numVM&lt;21">
            Masz <varref name="numVM" variant="feminine_accusative"/> nowych wiadomości
          </when>
          <when test="in(numVM % 10, [2,3,4])">
            Masz <varref name="numVM" variant="feminine_accusative"/> nowe wiadomości
          </when>
          <otherwise>
            Masz <varref name="numVM" variant="feminine_accusative"/> nowych wiadomości
          </otherwise>
        </choose></target>
       </trans-unit>
      </body>
     </file>
    </xliff>


