OASIS XML Localisation Interchange File Format (XLIFF) TC


RE: [xliff] XLIFF and TM, Glossary, Segmenation Rules - Was: RE: [xliff] ULI


    Posted 05-10-2011 15:23
    Hi All,

    I think we need a full discussion on this topic.

    I agree in principle with the idea that we should reference other standards where appropriate. In the case of glossaries and segmentation rules I would prefer to reference rather than do the work ourselves. However, I think we need to reserve the right to work on glossaries and segmentation rules if the currently available standards are no longer developed. The method suggested by Rodolfo is one possible method for this. It may also be the case that we agree on a package which has all these elements rather than have them as part of an XLIFF file, and of course someone else could write this package.

    In relation to translation memory exchange I think the current situation is more complex. To my mind TMX was written with segment-based translation memory in mind and it has not kept pace with technology innovations. We have a situation now where the technology is providing greater leveraging through innovations such as context exact matching, and this advantage is being lost when TMX is used to transfer a TM from one tool to another.

    My colleague, Gábor Ugray (head of development at Kilgray), gave a presentation at last year's XLIFF Symposium where he suggested that the XLIFF file itself could be used for TM exchange. This helps deal with issues such as context matching and possibly improves how corpus-based TMs are exchanged. However, you do lose the one-source-to-many-targets approach of TMX. It is possible that Rodolfo's proposal for a module for TM exchange could help a lot here. There is also the possibility that the alt-trans element is one where we could have multiple targets.

    What I would like to suggest is that we have a discussion at our next meeting and decide what is in scope and what is not. I think we need to change the charter but it may make sense to wait a while longer before doing that.

    Thanks,
    Peter.

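To make the alt-trans idea concrete, here is a minimal sketch of how a tool might read several target languages out of one XLIFF 1.2 trans-unit. It assumes that each extra language is carried in its own alt-trans element with xml:lang on the target; that convention, the sample content, and the helper name collect_targets are illustrative assumptions, not something the TC has agreed.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical sample: one source, one main target, two alt-trans targets.
SAMPLE = """<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="demo.txt" source-language="en" target-language="de" datatype="plaintext">
    <body>
      <trans-unit id="1">
        <source>Save</source>
        <target>Speichern</target>
        <alt-trans>
          <target xml:lang="fr">Enregistrer</target>
        </alt-trans>
        <alt-trans>
          <target xml:lang="es">Guardar</target>
        </alt-trans>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def collect_targets(xliff_text):
    """Return {trans-unit id: {language: target text}} for an XLIFF 1.2 string."""
    root = ET.fromstring(xliff_text)
    result = {}
    for file_el in root.findall("x:file", XLIFF_NS):
        # Targets without xml:lang fall back to the file-level target language.
        default_lang = file_el.get("target-language")
        for unit in file_el.iterfind(".//x:trans-unit", XLIFF_NS):
            targets = {}
            main = unit.find("x:target", XLIFF_NS)  # direct child only
            if main is not None:
                targets[main.get(XML_LANG, default_lang)] = main.text
            for alt in unit.findall("x:alt-trans/x:target", XLIFF_NS):
                targets[alt.get(XML_LANG, default_lang)] = alt.text
            result[unit.get("id")] = targets
    return result


if __name__ == "__main__":
    print(collect_targets(SAMPLE))
```

This only shows that the one-source-to-many-targets shape of TMX can be approximated; whether alt-trans is the right carrier for it is exactly the open question for the meeting.
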
From: Helena S Chapman [mailto:hchapman@us.ibm.com]
Sent: Tuesday, May 10, 2011 1:10 PM
To: Lieske, Christian
Cc: Rodolfo M. Raya; xliff@lists.oasis-open.org
Subject: Re: [xliff] XLIFF and TM, Glossary, Segmenation Rules - Was: RE: [xliff] ULI

I also have concerns about the same statement. If we want to be completely independent of Unicode, let's consider redefining UTF-8, 16, 32 as well, since that is how content is commonly encoded when exchanged. On top of that, we should also redefine BCP 47 in case we ever need to tell each other which locales and scripts we are exchanging the content in.

Bottom line: interdependency with a reputable standards organization is a good thing. Having to be "contingent" upon a not-so-active standards organization is a bad thing. A kitchen-sink, cover-it-all standard and implementation has not proven to work well in my programming experience. I quite frankly disagree with the approach. Referencing another standard within the context is a better method.

Best regards,
Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts

From: "Lieske, Christian" <christian.lieske@sap.com>
To: "Rodolfo M. Raya" <rmraya@maxprograms.com>, "xliff@lists.oasis-open.org" <xliff@lists.oasis-open.org>
Date: 05/10/2011 02:13 AM
Subject: [xliff] XLIFF and TM, Glossary, Segmenation Rules - Was: RE: [xliff] ULI

Hi Rodolfo,

I have some concerns related to your comment/proposal

RR> If we define our own formats for TM, glossary and segmentation rules, we would be independent from LISA, ETSI, Unicode and others.

Here are some of them:

1. The charter of the XLIFF TC delineates the TC's scope. We would have to check that the charter covers the proposal.
2. The bandwidth and the expertise of the TC members may not allow the comprehensive coverage/scope that you propose.
3. "Define our own formats" may lead to a situation in which requirements are not captured properly, and synergies and economies of scale are not possible.

Best regards,
Christian

From: Rodolfo M. Raya [mailto:rmraya@maxprograms.com]
Sent: Tuesday, 3 May 2011 18:54
To: xliff@lists.oasis-open.org
Subject: RE: [xliff] ULI

Hi Helena,

In the few minutes I requested after the meeting I proposed an idea: include in XLIFF 2.0 a set of optional modules for holding TM data (something like TMX), glossary data, and segmentation rules (something along the lines of SRX).

A translation job requires multiple files: source documents, XLIFF (or equivalent), TMX files and glossaries. What I'm proposing would reduce the number of files involved, by including some of the usual exchange files in an XLIFF 2.0 document.

If we define our own formats for TM, glossary and segmentation rules, we would be independent from LISA, ETSI, Unicode and others.

Those containers would be absolutely optional but could help interoperability if developers have to deal with just one technical committee defining exchange rules.

Best regards,
Rodolfo
--
Rodolfo M. Raya <rmraya@maxprograms.com>
Maxprograms http://www.maxprograms.com

From: Helena S Chapman [mailto:hchapman@us.ibm.com]
Sent: Tuesday, May 03, 2011 1:27 PM
To: Helena S Chapman
Cc: Lucia.Morado; xliff@lists.oasis-open.org; Yves Savourel
Subject: RE: [xliff] ULI

Forgot to mention IBM concerns:
- Integration of the memory specification under XLIFF or separately through the Unicode ULI TC. Unless we can do the following, we are not entirely convinced TMX should not live on its own:
  * <trans-unit> be limited to one entry per segment and not more
  * restructure the <source> and <target> against the TMX <tuv xml:lang="xx"> tags. However, I do not see a need to specify more than one language pair at a time.
That is something never used in practice in TMX.
  * fold the <note> and <prop> tags together
  * TMX needs <note> on <tuv> level because the translator may have added comments as to why the text was translated this way

From: Helena S Chapman/San Jose/IBM@IBMUS
To: "Lucia.Morado" <Lucia.Morado@ul.ie>
Cc: xliff@lists.oasis-open.org, "Yves Savourel" <ysavourel@translate.com>
Date: 05/03/2011 12:18 PM
Subject: RE: [xliff] ULI

As mentioned, I'd like to call an off-line meeting. Please write me if you are interested.

Agenda:
- I have some idea of the initial focus area: SRX and TMX related. Specifically,
  1. TR29 specifying the definition and types of textual segmentation rules that apply across the entire life cycle of content, not just during the translation phase.
  2. SRX being the interchange standard for #1.
  3. CLDR to publish language-specific segmentation behavior. Following the core/module concept, the specialty module would not be expected to be 100% interoperable. However, the core should be.
  4. Sort out whether there is a need for memory-specific standards outside XLIFF. I have a list of concerns and issues to share. See below.
- Does any of the above not belong in Unicode? I need your input on what I should stay out of, and WHY. What type of interaction point would be needed between the two?

I plan to call this meeting soon, before I kick off the ULI activity on the Unicode side. Please write me by noon EDT Wed May 4th with a few availability options and I will send an invitation shortly after. I will not be able to accommodate all the requirements; my apologies in advance.

Best regards,
Helena Shih Chapman
Globalization Technologies and Architecture
+1-720-396-6323 or T/L 938-6323
Waltham, Massachusetts

From: "Lucia.Morado" <Lucia.Morado@ul.ie>
To: "Yves Savourel" <ysavourel@translate.com>, <xliff@lists.oasis-open.org>
Date: 05/03/2011 10:30 AM
Subject: RE: [xliff] ULI

I see Helena as the current TC officer; she might clarify this in today's meeting.
http://unicode.org/uli/

Lucia
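
The IBM concerns earlier in this thread turn on how an XLIFF translation unit would line up with the TMX tu/tuv structure, including a note at tuv level. A minimal sketch of one possible mapping follows. It assumes one language pair per unit, copies every trans-unit note onto the target-side tuv, and flattens inline markup to plain text; the function name trans_unit_to_tu and the sample unit are hypothetical, and none of this is a mapping defined by either specification.

```python
import xml.etree.ElementTree as ET

XLIFF_NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def trans_unit_to_tu(unit, source_lang, target_lang):
    """Map one XLIFF 1.2 <trans-unit> to a TMX-style <tu> with up to two <tuv>."""
    tu = ET.Element("tu", {"tuid": unit.get("id", "")})
    for tag, lang in (("source", source_lang), ("target", target_lang)):
        text_el = unit.find(f"x:{tag}", XLIFF_NS)
        if text_el is None:
            continue  # e.g. an untranslated unit has no <target>
        tuv = ET.SubElement(tu, "tuv", {XML_LANG: lang})
        if tag == "target":
            # Assumption: unit-level notes are translator comments, so they
            # travel with the target-language variant.
            for note in unit.findall("x:note", XLIFF_NS):
                ET.SubElement(tuv, "note").text = note.text
        # Assumption: inline codes are dropped and only the text is kept.
        ET.SubElement(tuv, "seg").text = "".join(text_el.itertext())
    return tu


# Hypothetical sample unit for illustration.
UNIT = ET.fromstring(
    '<trans-unit xmlns="urn:oasis:names:tc:xliff:document:1.2" id="7">'
    "<source>Save</source><target>Speichern</target>"
    "<note>Menu label, keep it short.</note></trans-unit>"
)

if __name__ == "__main__":
    print(ET.tostring(trans_unit_to_tu(UNIT, "en", "de"), encoding="unicode"))
```

A real converter would also have to decide what to do with inline codes, prop data and match context, which is where most of the disagreement in this thread lies.
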