Dear all,
Please find yesterday's meeting minutes.
Best,
Lucía
----------
Attendance: Yoshito, Rodolfo, Mathijs, Lucía. We have quorum.
Administration
R: I move to approve the August 19 meeting minutes: https://groups.oasis-open.org/discussion/meeting-minutes-24
M: I second.
R: Meeting minutes approved.
Technical
Promotion activities of the new version.
L: I have worked on the promotional materials for the webinar (webinar description, blog post, social media announcement, etc.). It is all in the document I shared with you. About the date: I might not be able to attend on the initial date (30 September). Do you agree to postpone it for a week and do it on 7 October instead? It is at the same time as the official meeting, so it is already on your agendas if you are planning to attend the TC meeting.
R: It is fine with me.
Y: ok for me as well.
L: Thank you for your understanding, I will contact Mihai after the meeting and if he is also ok with it, we will do it on the 7.
L: They also told us to include questions for the registrants. What do you think about these three?
R: Would we expect people to know about XLIFF, given that we are talking about the changes in XLIFF?
L: The first time we talked about this webinar, we were thinking about doing a small intro explaining what XLIFF is. Then we took that out of the agenda. But I think it would be nice to have. I don't know if people who don't know about XLIFF are interested in coming to this type of webinar. So I don't know. It's really up to you.
M: You'd be surprised. Like when I talked about XLIFF at LocWorld: a lot of people knew about it, of course, because we're all in the localization industry, but they were like, "I really want to know more and learn more about it." So they know it exists. So it doesn't hurt, I'd say.
R: I believe it depends on the answers we get from the audience.
L: I have another argument for having this introduction: this webinar is going to be recorded. For a future audience that might not be familiar with XLIFF, I think the intro is good promotional material, so they know what the standard is. When I send this information to the OASIS team, I can also ask how and when we will get the answers from the audience, and whether they can send us some information one or two weeks before, so you can adapt anything if needed.
M: If you need me to cover any topics still, let me know. Happy to also present anything.
R: I would prefer that you do the introduction to XLIFF, because I want to see a different point of view. I've been in this for so many years and it's so ingrained in my brain. I want to know how people see this, especially newer generations. I started with this over 20 years ago, so my point of view may even be archaic. Maybe wrong, but it's very simple.
M: I'd be happy to do the introduction, and I can then maybe bring it a little bit into context and explain how I work for a localization tech company, how we heavily use XLIFF, that we love it, and why. Perhaps a little bit of background before the official proceedings.
Y: I think that's the important part.
R: I am still explaining XLIFF to people.
L: Okay, so that's great. To organize this according to the timeline that we have: do you think you will be able to have a look at this document? If you can, let's say by Thursday, have a look and see if you want to change anything, so that I can send all the information to them on Friday. If I hear no news, I will assume that you agree with everything that's there.
Revisit the test suite topic: https://github.com/oasis-tcs/xliff-xliff-22/tree/master/xliff-21/test-suite . Yoshito.
Y: I started working on that. I realized one thing: I had a look at the schemas of the modules, and they are aligned to 2.0, but the new ones to 2.2. It's kind of awkward, right?
R: Yes, because the other modules represent things that already existed. And when 2.1 was released... Let me check. I'm looking for the ITS module. The namespace for ITS has 2.1 at the end, but it's a very different namespace, so it doesn't even follow the same pattern. Let me share my screen. Let me go back to the ITS module, namespace and validation. Look at this ending. It's completely different. It doesn't follow the pattern. So I don't see a very big problem here.
Y: This is not a problem, just a consistency issue. And I just want to confirm the scope of this work.
R: So the problem you see with the test suite is that it was actually mixed up, because there were many things in the valid folder that were not valid. I started fixing them. I don't know if I committed all the changes, but there were too many things listed as valid that were not valid. And in many of the invalid ones, the intended error was not the only error.
Y: Okay. So I think we work on this in two steps. The first step is just to incorporate the 2.2 changes. The second step is that we need a validation system.
R: Yes. And to make sure that the invalid ones contain only the error that the title says.
Y: Yeah. Another question is about in and out. I'm not 100% sure about the purpose of in and out.
R: That was a crazy idea at the time. The idea was that there would be a process that takes a file (those are the "in" files), the process would run on the files, and then you get the output. Then you need to compare the input with the output, find the errors, and things like that. I'm not sure it's really...
Y: For the future, maybe we drop in and out. I will try to finish this topic before the next call.
R: When you make the changes, I will run all the files through my validation tool. That reminds me that I have to update my validation tools to support 2.2.
M: Yes, I'll talk more about that later. That's my whole story.
M: Okay, you can start now.
Metadata exchange information, Mathijs, all.
M: I've been working on XLIFF for the last two weeks, and part of the goal was to also support 2.1 and 2.0 more properly. Initially we just did 2.2 by default, and we naively thought that it was forward compatible. Not so. We discovered that Phrase only accepts 2.0, and that Phrase most likely uses your library under the hood to check whether a file is valid. And your online tool only does 2.0. Anyway, the only structural change, as far as I know from reading the documentation carefully, is that 2.2 compared to 2.1 has the top-level notes and metadata.
R: Yes.
M: So we had to figure out a way: if a 2.2 file has top-level notes or metadata, how do we put them in 2.1 and 2.0 instead and still make it completely transferable? So we built something for that. That's not the problem.
Y: Yeah. And just one more change: there is now a way to add notes that point to the segment directly, without a marker.
M: So because of the change, notes can be top level, but that's not reflected in 3.3.1.21, about ID, where it says that for a note the ID value must be unique within the immediate file, group, or unit. But I'm missing XLIFF here.
R: No, it's not an omission. The thing is, you have only one XLIFF element in the file.
M: It must be unique within the xliff.
R: Yes, but it really doesn't matter that much, because if you make it unique at XLIFF level, then you cannot reuse the ID at another level. That would complicate things unnecessarily. The important thing is that within files and units you don't repeat the IDs, because that's where people will be making changes most of the time. But if you have the notes at XLIFF level, the order of the notes or the IDs would not matter, because they apply to the full file.
M: But the ID should still be unique, right?
R: It should be unique, but it's not mandatory, because there is no provision anywhere saying that it must be unique at XLIFF level. It is important at unit and file level.
M: Okay, I thought it was an oversight, but that's interesting to learn about. Very niche, but sure.
R: I hope you understand the reason: the note at file level applies to that file, and it's important that we don't cross files. And at unit level it's the same situation.
M: It doesn't cross anything. And XLIFF does not cross anything, because there's only one XLIFF top level.
R: Okay, but if you're at the top level, notes and metadata apply to the full file and you only have one notes container. In that case it really doesn't matter at all whether you have duplicated IDs or not.
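(Editorial illustration, not part of the discussion.) The scoping rule described above can be sketched as a small check: note IDs are compared only against the other notes in the same notes container, so reusing an ID at another level is not reported. This is a minimal sketch; the function name, the sample document and the 2.2 namespace string in it are assumptions, and the check matches elements by local name only.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def duplicate_note_ids(root):
    """Return {container label: [duplicated ids]} for each <notes> container."""
    problems = {}
    for parent in root.iter():
        for child in parent:
            if local(child.tag) != "notes":
                continue
            ids = [n.get("id") for n in child
                   if local(n.tag) == "note" and n.get("id")]
            dupes = sorted(i for i, count in Counter(ids).items() if count > 1)
            if dupes:
                label = f"{local(parent.tag)}[{parent.get('id', '')}]"
                problems[label] = dupes
    return problems


# Hypothetical sample; the namespace value is an assumption.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.2" version="2.2" srcLang="en">
  <notes><note id="n1">Applies to the whole document</note></notes>
  <file id="f1">
    <notes><note id="n1">File-level note</note><note id="n1">Repeated id</note></notes>
    <unit id="u1"><segment><source>Hello</source></segment></unit>
  </file>
</xliff>"""

print(duplicate_note_ids(ET.fromstring(SAMPLE)))  # {'file[f1]': ['n1']}
```

The top-level note reuses the ID n1 without being flagged, because it lives in a different container than the file-level notes.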
M: Okay, clear. I was then working on the code for both notes and metadata, because they both had to be moved from the top level into a file, and put into a group just to say: look, this is actually part of the top level. And I noticed that both notes and metadata have a category attribute, except that in metadata a category is more of a group-level thing, while with notes you just have to give each note the same category. I found it interesting. It feels weirdly discrepant that notes and metadata behave differently in this way.
R: Yes. Right. You're right.
M: Is that like discussed in the past?
R: not that I remember.
R: The problem with category in notes is that you're limited.
M: Yeah. So I can do this, and they all have the same category in theory. But I actually want to make groups. That's what I felt I needed, because I wanted to make a group just to explain that this note actually belongs on the top level, and I'm just doing this to be compatible with 2.1 and 2.0. So they both have this category thing, except the category in metadata is more flexible, because you have these groups that you can give a category to, and then everything below falls in the same category. So I feel like maybe notes need these groups as well. Then they're fully compatible with each other, or aligned.
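(Editorial illustration, not Mathijs's actual implementation.) One way to picture the workaround described above is to copy each top-level note into every file and mark it with a category, so a consumer can tell that it originally belonged at the top level. This is only a sketch under stated assumptions: the marker value "x-top-level", the function name and the sample document (including its namespace string) are hypothetical, and the sketch does not rewrite the namespace or version attribute, resolve ID collisions with existing file-level notes, or handle metadata.

```python
import copy
import xml.etree.ElementTree as ET

TOP_LEVEL = "x-top-level"  # hypothetical marker, not defined by the specification


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def push_notes_down(root):
    """Copy the notes of a top-level <notes> into every <file>, then drop it."""
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    top = root.find(f"{ns}notes")
    if top is None:
        return root
    for file_el in root.findall(f"{ns}file"):
        notes = file_el.find(f"{ns}notes")
        if notes is None:
            notes = ET.Element(f"{ns}notes")
            # Place the container right before the first unit or group.
            first = next((i for i, c in enumerate(file_el)
                          if local(c.tag) in ("unit", "group")), len(file_el))
            file_el.insert(first, notes)
        for note in top:
            clone = copy.deepcopy(note)
            existing = clone.get("category")
            # Space-separated categories, as mentioned in the discussion.
            clone.set("category", f"{TOP_LEVEL} {existing}" if existing else TOP_LEVEL)
            notes.append(clone)
    root.remove(top)
    return root


# Hypothetical sample; the namespace value is an assumption.
SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.2" version="2.2" srcLang="en">
<notes><note id="n1" category="instruction">Keep it short</note></notes>
<file id="f1"><unit id="u1"><segment><source>Hello</source></segment></unit></file>
</xliff>"""

converted = push_notes_down(ET.fromstring(SAMPLE))
print(ET.tostring(converted, encoding="unicode"))
```

Because notes have no grouping element, the marker has to be repeated on every note, which is the limitation raised in the discussion.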
R: I have been working on that recently, and I understand your point completely. (Rodolfo shares his screen and shows how metadata is being implemented in his tool.) It is quite different when you try to include notes; a note is just text.
M: But you can still add category to notes, and this could be useful.
R: The category in notes is optional and open. That is hell. It is a problem.
M: It's very open. Yes, it helps, let's say us tool builders to actually give it some structure so we can a little bit understand what's going on with these notes. And in that way I found like the metadata category system a lot more flexible. And I felt like it was missing for the notes. The category attribute is there for the same reason, but they were implemented in a different way. And I was like, interesting discrepancy.
R: It is a good candidate for a note. For example: if you have a note, you can use it like this; if you have metadata, you can do it like this.
R: In XLIFF 2.2, you can add a note without modifying the source. So it depends on what you really need.
M: I mean, we did it anyway, and I'm just separating categories by spaces to still show a hierarchy. But it could be better, I feel.
R: Yes, but as I see it, it's a good candidate for a note telling people: hey, if you have notes, you can do this; if you have metadata, you have other options. Because we have both metadata and notes. Notes are very flexible about the content. You can easily put full instructions in there. You have a restriction on how you apply a note to a segment, to a file, or to whatever you need: you have to point to a valid ID. And that's the big change in XLIFF 2.2. Before, to apply a note to a segment, you needed to modify the source. And that was horrible, because when you create an XLIFF file, you create the source hoping that no one modifies it and that people work in the target. That was my basic expectation when we started working on XLIFF 1.0, 1.1 and 1.2. And one day, suddenly, to add a note in XLIFF 2.0, you must modify the source. For me personally, that was a big problem, because the source should not be changed. If you later need to reuse the source to rebuild the original document, because the translator didn't translate a segment, and the source was modified, that's an unexpected complication. Now, in 2.2, you can add notes pointing at the segment without modifying the source. That's the major change. You have the href attribute now that you can use to point to a segment. If you want to apply a note to the source, you don't need to modify the source. You just put in your note, and it's in a different container, so it's not touching the content of the segment. The same thing happens with metadata. You can put the metadata at the unit level. Unfortunately, we don't have metadata at segment level. That's right. That's what I found interesting, because I have a client that wants metadata at the segment level for the things that I just showed you. Say I have this unit with four segments inside, and the third one has a bad translation.
And the reviewers added a lot of comments, things like qualifier grading that you cannot put in a note, because in the review job you need to qualify the segment. And metadata didn't help that much.
M: Ultimately, in my opinion, metadata and notes both serve the same goal: we're annotating something at a particular level somewhere in the hierarchy. But notes are meant to be human readable, and metadata is a lot more machine readable; providers have their own take on it and create their own. But because they have a very similar purpose apart from that, I was just wondering whether we need to make them a little bit more similar in structure.
R: We can. It's a matter of defining what change we want to make.
M: Then, just to finish my story: we've had that little item on the agenda for metadata exchange for a while now, so I just wanted to share what we do with metadata. Leaving the notes aside, all the files that we produce have this particular metadata, where we always store the original name and content type of the file, because we really want to get back from a skeleton to the original file, and that's what you need in order to do so. The last two are the source and target unique IDs of the content in the original systems, so that when a translation comes back we know where to put it. We always transfer that. So that's how we use the metadata in our particular technology.
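(Editorial illustration, not the actual format used by Mathijs's tool.) The four values described above could be carried in the metadata module roughly as follows. This is a sketch assuming the metadata module's metadata, metaGroup and meta elements; the category value, the four type names, the function name and the sample values are all hypothetical.

```python
import xml.etree.ElementTree as ET

MDA = "urn:oasis:names:tc:xliff:metadata:2.0"  # metadata module namespace
ET.register_namespace("mda", MDA)


def build_roundtrip_metadata(name, content_type, source_id, target_id):
    """Build an mda:metadata element with one group of round-trip values."""
    metadata = ET.Element(f"{{{MDA}}}metadata")
    # The category sits on the group, so every meta below inherits its meaning.
    group = ET.SubElement(metadata, f"{{{MDA}}}metaGroup", {"category": "x-roundtrip"})
    values = (
        ("original-name", name),          # hypothetical type names
        ("content-type", content_type),
        ("source-id", source_id),
        ("target-id", target_id),
    )
    for meta_type, value in values:
        meta = ET.SubElement(group, f"{{{MDA}}}meta", {"type": meta_type})
        meta.text = value
    return metadata


example = build_roundtrip_metadata("guide.html", "text/html", "src-42", "tgt-42")
print(ET.tostring(example, encoding="unicode"))
```

A single category on the group covers all four entries, which is the flexibility contrasted with notes earlier in the discussion.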
Y: It's interesting. Metadata is more flexible and adds more context.
R: One use case I found is when you're translating DITA content. You have a DITA map and you translate the DITA map, and the name of the map would be the original. But that map is pulling topics from multiple files, and in the end you have an XLIFF document with many file elements. You still need to connect them to the same original DITA map. The original attribute is filled with the local topic, and you still need to point to the container, which is the DITA map. That's where I go crazy and have to resort to metadata. So it's kind of overlapping, but essential, because I have two originals.
M: The last thing that I came across was that we of course had to do a proper conversion to 2.0, and I used your validator, also the online one, to check whether it was valid. It kept yelling at me, saying it was not valid. That's where I figured out that the words "followed by" in the XLIFF specification have to be taken to the letter. So the order of the elements matters, and I found that interesting. Your original data has to be before your units. That tripped us up a little, because in XML it doesn't matter, but apparently in XLIFF it does.
R: Yes, because in the schemas we have an order. It's probably the parser that's telling you that's not the order of elements.
M: Yeah. So why is the order holy? Why does it matter?
R: It was decided that things that affect a segment or a target or things like that should be placed before. Just imagine you're parsing a unit and you find that there is metadata. You already have that in memory when you read the segment that matches it. Or let's say a match: you want to have the match and then read the segment. Because if you read the segment first and then you read the match, when you reach the match, you're already away from the segment.
M: But that's interesting, because that applies to reading, but writing is the opposite. And that's why we initially did it the other way around. When we're writing, we're like: okay, we're writing segments. And then we see, oh, there's some metadata or there are some tags; let's just put that in a little cache. And then after we're done writing all the units, we write the metadata and the original data. So both have been made for both.
R: For writing, it's much easier because you don't need to write the unit until you have already parsed it completely, so you still have the unit in memory. You put the metadata or the matches or whatever and then write. You don't go back.
M: The fix was simple, but it was interesting to discover. Like, wait, wait. This order really matters.
R: Yes. When that was discussed, the idea was: put everything at the top, and you have everything when you reach the segment or the unit. That's why notes go first. That was the problem of pointing the note to the proper segment.
M: I understand. Just wanted to share the story. It's good to know, because in doing the whole implementation, what was least clear to me was that the order matters. It's just in the words "followed by". That's where I discovered it. So maybe we could state somewhere near the top that the order of elements matters.
M: Maybe creating a bad invalid example.
R: Yes, that's a good candidate for invalid.
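(Editorial illustration, not an agreed test-suite file.) An invalid example of the kind suggested above could place originalData after a segment inside a unit. The sketch below flags such units; it assumes the core order inside a unit is notes, then originalData, then segment or ignorable elements, looks only at core-namespace children, and is no substitute for schema validation. The function names and sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Rank of each core child of <unit>; lower ranks must come first.
RANK = {"notes": 0, "originalData": 1, "segment": 2, "ignorable": 2}


def split(tag):
    """Return (namespace, local name) for an ElementTree tag."""
    if tag.startswith("{"):
        ns, name = tag[1:].split("}", 1)
        return ns, name
    return "", tag


def out_of_order_units(root):
    """Return the ids of units whose core children are not in ranked order."""
    bad = []
    for unit in root.iter():
        ns, name = split(unit.tag)
        if name != "unit":
            continue
        ranks = []
        for child in unit:
            child_ns, child_name = split(child.tag)
            if child_ns == ns and child_name in RANK:  # skip module elements
                ranks.append(RANK[child_name])
        if ranks != sorted(ranks):
            bad.append(unit.get("id"))
    return bad


SAMPLE = """<xliff xmlns="urn:oasis:names:tc:xliff:document:2.0" version="2.0" srcLang="en">
 <file id="f1">
  <unit id="u1">
   <originalData><data id="d1">&lt;b&gt;</data></originalData>
   <segment><source><ph id="1" dataRef="d1"/>Hello</source></segment>
  </unit>
  <unit id="u2">
   <segment><source><ph id="1" dataRef="d1"/>Hello</source></segment>
   <originalData><data id="d1">&lt;b&gt;</data></originalData>
  </unit>
 </file>
</xliff>"""

print(out_of_order_units(ET.fromstring(SAMPLE)))  # ['u2']
```

Unit u2 contains the same elements as u1 and is well-formed XML, yet it is reported because originalData follows the segment.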
M: I also checked the source code of your tool of course to discover some things. It was interesting to see our differences in parsing, but that's always what you get.
R: But you're probably looking at the Java version.
Yeah. The thing is that now I'm moving everything from Java to TypeScript. Even the XML parser.
M: Our tool is also open source (shares his screen).
R: So thanks for sharing.
L: Thank you, Mathijs. Your experience is very valuable to us, and also so fresh. If you really want those changes to be in the next spec, you can create a proposal. It might not be done in the next month or so, but it would be great to have it written down in GitHub with all the things that you think should be clarified (the category attribute, the fact that order matters, etc.), so they are included not only in the minutes but also in GitHub.
Y: So what is the process going forward? We try to capture those as a proposal or an issue in the XLIFF 2.2 GitHub repository, right?
R: Or maybe an errata, depending on the changes we have.
Y: Meanwhile, we can open an issue in the repository.
L: The best thing now is just to gather all this information, and then we can discuss, maybe in another meeting, how we proceed: whether we are planning to do a new version or whether that can be an errata, etc. But the most important thing is that we have everything documented so we can act on it. Thank you.
L: No more business, meeting adjourned.
------------------------------
Lucía Morado Vázquez
Researcher and lecturer
University of Geneva
------------------------------