OASIS XML Localisation Interchange File Format (XLIFF) TC

Meeting minutes

    Posted 03-06-2026 06:18

    Dear all,

    Please find below this week's meeting minutes.

    Best,

    Lucía 

    -----------------

    Administration

    R: I move to approve the February 17 meeting minutes. https://groups.oasis-open.org/discussion/meeting-minutes-34

    M: I second.

    R: Meeting minutes approved.

    Technical

    Revisit the test suite topic: https://github.com/oasis-tcs/xliff-xliff-22/tree/master/xliff-21/test-suite . Yoshito. New repository?

    Y: I have a question that came up. I think your validator probably has a problem validating the note element.

    R: Send me the files and I will look into it.

    Y: In short, the id attribute of the note element is optional, right? When there is a notes element with two or more note elements, you can have multiple notes within it without defining an id. But it looks like your validator automatically assigns an id, so if you have two note elements the file is flagged as invalid because of a duplicated note id. Do you know what I mean?

    R: I understand. If you send me the file, I will look into it.

    Y: Sure, I will.
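
    A minimal sketch of the behaviour Y describes, assuming that note ids only need to be unique within their enclosing notes element and that a note without an id should simply be left out of the uniqueness check. The sample file and the function name duplicate_note_ids are illustrative, not taken from the test suite or from the validator under discussion.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XLIFF_NS = "urn:oasis:names:tc:xliff:document:2.0"

# Illustrative input: two notes, neither of which declares an id.
SAMPLE = f"""<xliff xmlns="{XLIFF_NS}" version="2.1" srcLang="en" trgLang="fr">
  <file id="f1">
    <unit id="u1">
      <notes>
        <note>First note, no id.</note>
        <note>Second note, no id.</note>
      </notes>
      <segment><source>Hello</source></segment>
    </unit>
  </file>
</xliff>"""


def duplicate_note_ids(xml_text):
    """Return explicit note ids that repeat inside a single <notes> element."""
    root = ET.fromstring(xml_text)
    duplicates = []
    for notes in root.iter(f"{{{XLIFF_NS}}}notes"):
        # Only ids that are actually present take part in the check;
        # no id is generated for notes that omit the attribute.
        explicit = [
            note.get("id")
            for note in notes.findall(f"{{{XLIFF_NS}}}note")
            if note.get("id") is not None
        ]
        duplicates += [i for i, count in Counter(explicit).items() if count > 1]
    return duplicates


print(duplicate_note_ids(SAMPLE))
```

    Under these assumptions the sample yields no duplicates, while two notes that both declare id="n1" would be reported.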

    R: Have you contacted Kelly about the new repository?

    Y: Sorry, I didn't. I will do it before the next meeting.

    New translation memory standard. https://github.com/oasis-tcs/xliff-xliff-22/tree/master/memory

    R: I don't have any news about the translation memory standard, but I like the work on the provenance module, so maybe we can switch to that.

    Provenance module. Work on the draft.

    L: Just to give some background information for Mihai, in case he hasn't followed the minutes of the last meetings: we decided to create a new module for provenance metadata instead of having a document where we specify where the metadata should go, and we started drafting this document, which I think we also shared with you, Mihai. If you don't have access, let us know. We follow more or less the same structure as previous modules, with an introduction and then all the standardized information. In the first draft that I created, which we discussed in the previous meeting, I included all the information in attributes. During the discussion at the last meeting, Mathijs and Rodolfo in particular said that it was better to have the information within elements so we can define it better than with just attributes. I think Mathijs has been working on that.

    M: Working is a strong word. I wrote down a couple of sentences.

    L: Well, I created just one element, provenance, but I wasn't sure how you wanted to structure the rest of the attributes, so I left a comment for you to work on that. And I think you have made some progress on that.

    M: Yeah, and I'm open for discussion, because while I was working on it I thought: we should really discuss this. Which direction do we go? Does a change in the provenance go back to one single element, or even one single attribute? Or is the change much larger than that? And how do we then encapsulate what has been changed?

    R: I would assume that the change applies to the section where the element is. If you are inside a unit, then it applies to the unit; if you're inside a file, I would assume that the change or the provenance applies to the file. I would take the parent element as the one that has been modified.

    M: Yeah, but the main question here for me is: do we only care about target changes? I change the target, therefore I put provenance in. Or do we say no: my quality score, my quality threshold, those are things that I am now updating, and that's the only thing I do, but it is also provenance data.

    R: It's provenance. It doesn't have to be just target. You may be adding matches or you may be adding glossary terms.

    M: So that's the question: if there are different types of things you can change, each of which requires a provenance entry, how should the provenance entry reflect what has been changed?

    R: Remember that these attributes may contain text. So the text will be free, but not interchangeable. So we need to define the attributes of the things. That's the key. Remember that last meeting you mentioned who, what, why and when? So, for example, there was an agent adding terms in glossary elements on November 3, 2015. You can have other changes, for example the glossary that was used, or the number of terms that were added. So we need to differentiate a couple of things that were not clear to me in the document. Some things must be fixed and must have a certain value set, so you cannot escape it: you define added, removed, or just a few verbs or things that are fixed. But we also need some free text to explain, "you need to review it", just for example, and for some things it will be necessary to have free text. But for the most important parts, we need to have elements and attributes that are predefined.

    L: Yes, my question, when I was thinking about how to convert this, because I have everything in attributes, was that I didn't know the level that you wanted. For example, do we want to have agent as an attribute, or agent as an element with an attribute name etc. inside? That was my first big conceptual question: which level do you want?

    M: I generally don't like elements that don't have any attributes and only have a single text value. To me that is an attribute. So agent is an attribute, in my opinion.

    L: So in that case we can already reuse what I have written about that. So you would also put date at the provenance level.

    M: Maybe provenance is more of a container for all the things inside. But inside the provenance container there are multiple changes, because something may have changed at different points in time. So every change (and the word "change" can of course change) has the date, the who, the what, the why, and how much it cost.

    R:  So we don't have the attribute for cost yet. That's something we need to add.

    L: So the attributes will be within the change element. And then you have some kind of ID or the tool that will handle it.

    M: I don't think an id makes sense here. The date should set it apart, and the tool that did it.

    R: Also the tool could be just an attribute.

    M: Exactly, it's an attribute. And then as a tool I can add my own metadata to it to distinguish. But to me the challenge is like your example, Rodolfo: "I changed the terms in the term base" is a very different "what" from "I changed the text of the translation" or "I changed the quality score". Those are very different things, with different structures as well, and that might be hard to encapsulate. That for me is the biggest question.

    R:  Yes, we have a situation because many changes will be applied inside the segment, for example source or target. And in the segment we cannot put elements. We need to use attributes.

    M:  Exactly.

    R: We can use attributes from any namespace, and that would include the provenance module. So it's important that we have attributes that define the changes. But we will be restricted to single attributes. For example, if a single segment had three changes, we cannot store all the changes unless we use a ref.

    M: And that's why I added ref, because it's needed in all the unit-level provenance where you want to reference a segment. On file level it's not necessary, but on unit level we need the ref.

    R: Yes. So we can have the ref and then you can have any number of changes.

    M: Yes.
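
    A rough sketch of the unit-level idea agreed above: one provenance container holding any number of change records, each pointing at a segment through ref. The namespace URI, the element names provenance and change, and the attribute names and values are all hypothetical; the draft module has not fixed any of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace; nothing here is defined by the draft module yet.
PROV_NS = "urn:example:xliff:provenance-draft"
CHANGE = f"{{{PROV_NS}}}change"


def add_change(provenance, ref, agent, tool, date, what):
    """Append one change record that points at a segment id through ref."""
    return ET.SubElement(
        provenance,
        CHANGE,
        {"ref": ref, "agent": agent, "tool": tool, "date": date, "what": what},
    )


def changes_for(provenance, segment_id):
    """Return the change records for one segment, oldest first."""
    hits = [c for c in provenance.findall(CHANGE) if c.get("ref") == segment_id]
    # Dates share one fixed UTC pattern here, so string order is time order.
    return sorted(hits, key=lambda c: c.get("date"))


# A unit-level container with three changes to segment s1 and one to s2.
provenance = ET.Element(f"{{{PROV_NS}}}provenance")
add_change(provenance, "s1", "mt-engine", "tool-a", "2026-03-01T09:00:00Z", "target-added")
add_change(provenance, "s1", "translator", "tool-b", "2026-03-02T14:30:00Z", "target-edited")
add_change(provenance, "s2", "term-agent", "tool-c", "2026-03-02T15:00:00Z", "terms-added")
add_change(provenance, "s1", "reviewer", "tool-b", "2026-03-03T08:15:00Z", "quality-score-updated")

for change in changes_for(provenance, "s1"):
    print(change.get("date"), change.get("agent"), change.get("what"))
```

    Because the records live in the container rather than on the segment itself, a segment with three changes needs no more than its id.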

    R: We may still need to define what the change applies to. We may need a special attribute, "applies to", just to select whether it applies to source or to target.

    M:  Nothing should ever change in the source. No change should ever be made to the source.

    R: You may not make a change to the source, but it can still apply to the source. A term can be found in the source. So I'm not saying modify it. You can add terms for the source, and that doesn't mean you changed the source. We can sing but don't blame the sinner. Don't say who edited the source.

    M: Ok.

    L: So if I understand correctly, a change element means that there has been a change. But what if we just want to specify the date when the original content was created?

    M: Now that's a good question. Should certain elements of this provenance be top level to the XLIFF, saying this XLIFF was generated at this date, so this is the starting point of this whole change-tracking thing?

    R: You can apply that attribute to the file element, at the file level.

    M: I like that.

    R: Reading this, I also had a comment about something. I believe it was you, Lucía, who wrote that the recommended pattern to use for dates is this one. I would not recommend a pattern, I would specify it, so we have compatibility. So instead of saying you may use any ISO format, say: use this particular pattern from the ISO standard. Then we all use the same.

    L: I agree. I had taken that verbatim from 1.2.
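
    To illustrate the difference between recommending and specifying a date pattern, here is a small sketch that accepts exactly one form and rejects every other ISO 8601 variant. It assumes the pattern chosen is the UTC form YYYY-MM-DDThh:mm:ssZ; the minutes do not record which pattern will be specified, and the function name is made up for the example.

```python
import re
from datetime import datetime, timezone

# Assumed single pattern: YYYY-MM-DDThh:mm:ssZ (UTC, no fractions, no offsets).
DATE_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}Z")


def parse_provenance_date(value):
    """Parse a date in the one specified pattern; reject anything else."""
    if not DATE_PATTERN.fullmatch(value):
        raise ValueError(f"date does not match the specified pattern: {value!r}")
    # strptime also rejects impossible values such as month 13.
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


print(parse_provenance_date("2015-11-03T09:30:00Z"))

# Other valid ISO 8601 forms are refused on purpose.
for other in ("2015-11-03", "20151103T093000Z", "2015-11-03T09:30:00+01:00"):
    try:
        parse_provenance_date(other)
    except ValueError as error:
        print("rejected:", error)
```

    With a single pattern every tool can compare and sort dates as plain strings, which is not possible when any ISO 8601 form is allowed.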

    L: What is left now is to clean up the document a little. I haven't had time to see what Mathijs wrote for this new changes element, so we might need some wording for it, following the usual format for defining an element, and so on. Then we redefine those attributes as attributes of, I guess, change or changes. And then the last ones, usage and usage unit: usage would be an element, and we need to figure out how to move them around.

    R: That's okay. One comment about usage: if we make usage an element, it must contain a number. So we define the content. We can say it has a required attribute, which is usage unit, and contains a number.

    L:  Are you talking about the element? Or if you define this as an attribute?

    R: No, we are talking about making unit an element, a child of change. But we need to define the content. The content of usage should be a number, because we are counting, and the attribute usage unit, which defines the way we count, must be required.

    M: Just as a side note, if it turns out that in this whole thing only usage and usage unit are separate elements and the rest are attributes, we might then just say, okay, for cleanliness these are attributes too, and they become optional. So the unit becomes optional.

    R: Yeah, unit would be an optional child of change. If it's present, it must contain a number.

    L:  So usage will be the element and unit will be the attribute of usage.

    M: Yes. Does it make sense for us to introduce a required attribute if we then say you can freely specify its value? We can think of characters, words, sentences, units, minutes, tokens, but there could be so much more that different tools charge by, also in the future. Does it make sense to make such a thing required if we don't know the actual units?

    Mi: I think we have a bit of a problem with the number of words, right? The standard that we have is ancient, to the point where you cannot find it anymore. I only managed to find a mirror on somebody's personal website.

    R:  You can find it on Unicode TR. I believe it's 39.

    Mi:  We have to say very clearly, what do you mean by word? How do you count words?

    M: No, I think that's not the point, because that's up to the tool. The MT providers charge something and they will tell you how they charge. And of course, word counting is not an exact science, so we are not going to count words for you. It's just whatever unit the tool gives you for this number; it's not our responsibility, or anyone else's, to check if it's correct.

    Mi: The whole point of XLIFF is interchange, right? That means between your tool and my tool. If your tool and my tool define what a word means differently, then the whole thing is not portable.

    M: Sure. But the reason we're recording usage is not to operate on it, only to record it. So my tool can read your tool's usage and just store it. It's not meant to be used to add logic on top of it.

    Mi: Then I would say that if we make this unit mandatory, we should also have a tool id. A mandatory tool.

    M: Yeah, maybe that's indeed the way to shape it. So the tool itself becomes the element, and the usage goes on top of that, mandatory. And the tool has a reference, so you can, for example, reference a particular model in OpenAI or whatever. That's the tool, and it will give you a number, which will be, for example, tokens. You don't know how it counted that, and we don't know how it counted that, but at least you know it's that tool and that's how they charge. And this is, of course, the unit.

    R: That's the cost in that tool.

    M: In that tool.

    Mi:  Yes, yeah, yeah, but I need to know the tool.

    Ma: Yeah, you need to know the tool for it to have meaning. And maybe that's also it: maybe we don't need units, we just need the tool, and then the units follow from the tool.

    R: So we have the unit for a given tool that gives us a cost in whatever that unit says.

    M:  Yeah, it's all about the tool. Indeed.
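
    One possible reading of where the usage discussion ended up, sketched under stated assumptions: usage is an optional child of change, its content must be a number, the tool must be known whenever usage is recorded, and the unit is optional because it follows from the tool. The element and attribute names are hypothetical, namespaces are left out for brevity, and example-mt-model is a made-up tool reference.

```python
import xml.etree.ElementTree as ET


def read_usage(change):
    """Return (tool, amount, unit) for a change element, or None if it has no usage."""
    usage = change.find("usage")
    if usage is None:
        return None  # usage is optional
    tool = change.get("tool")
    if not tool:
        # The number is only meaningful relative to the tool that produced it.
        raise ValueError("usage recorded without a tool")
    # float() raises ValueError when the content is not a number.
    amount = float((usage.text or "").strip())
    return tool, amount, usage.get("unit")


SAMPLE = """<change tool="example-mt-model" date="2026-03-01T09:00:00Z">
  <usage unit="tokens">1250</usage>
</change>"""

print(read_usage(ET.fromstring(SAMPLE)))
```

    The reader only stores what the producing tool reported; it does not recount or convert anything, in line with the point that usage is kept for the record and is not verified.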

    L: Okay, so how do you suggest that we continue with this? Mathijs, do you want to put more of these ideas into the document, or do you want me to try to capture what we discussed so that you can work on that?

    M: Yeah, let me write something more down later today, while it's still fresh.

    L: Thank you. No more business. Meeting adjourned.



    ------------------------------
    Lucía Morado Vázquez
    Researcher and lecturer
    University of Geneva
    ------------------------------