XLIFF Inline Markup SC

  • 1.  Storing native data

    Posted 12-01-2010 22:00
    Hi everybody,
    
    Just a note to remind you that everyone has the action item to think about how to store native data for our XLIFF inline codes...
    
    Here are some general thoughts that may help the discussion:
    
    We currently have three requirements to cover:
    
    a) no storage of the native data
    b) storage of the native data outside the inline code
    c) storing the data along with the inline code
    
    I'll talk only about c) since a) and b) are easy to do.
     
    There are not many ways to store the native data along with its inline code representation. It can go either in one or more attributes or in the content of the element.
    
    === Storing in some attribute(s):
    
    For example: 
    
    Pros:
    
    - Avoids mixed text nodes in XLIFF: all text nodes are real text. There is no need to look at the parent to guess what a text node is, which makes some processing a lot simpler.
    
    Cons:
    
    - Preserving whitespace may require some effort
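
    To make the trade-off concrete, here is a minimal Python sketch of the attribute approach. The element name `code`, the attribute names `start`/`end`, and the sample native data `[b]`/`[/b]` are all placeholders assumed for illustration, not proposed XLIFF markup. It shows that every text node stays real text, and that a literal line break inside an attribute comes back as a space after parsing, which is where the whitespace effort comes from.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <code>, start= and end= are invented names.
# The native data lives in attributes, so every text node is real text.
SEGMENT = '<source>Some <code id="1" start="[b]" end="[/b]">bold</code> text</source>'


def translatable_text(segment_xml: str) -> str:
    """Concatenate all text nodes; no filtering by parent is needed."""
    root = ET.fromstring(segment_xml)
    return "".join(root.itertext())


def native_codes(segment_xml: str) -> list:
    """Return (id, start data, end data) for each inline code."""
    root = ET.fromstring(segment_xml)
    return [(el.get("id"), el.get("start"), el.get("end")) for el in root.iter("code")]


def attribute_after_parse(raw_value: str) -> str:
    """Write raw_value literally into an attribute and read it back.

    raw_value must not contain quotes, '<' or '&' in this simplified sketch.
    XML attribute-value normalization turns literal tabs and line breaks
    into spaces, so they do not survive unless written as character references.
    """
    return ET.fromstring(f'<code start="{raw_value}"/>').get("start")
```

    With this layout `translatable_text(SEGMENT)` sees only the words of the sentence, while `attribute_after_parse("line1\nline2")` no longer contains the line break.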
    
    
    === Storing as the content of the inline code:
    
    For example: text
    
    Pros:
    
    - Easy to store even large data chunks
    
    - Easy to preserve whitespace
    
    - Automatically provides a way to handle overlapping codes, with no need for a different markup.
    
    Cons:
    
    - Introduces text nodes that are not text content
    
    - Forces a break with the XML model: the content affected by the code is no longer its descendants but its following siblings, until the end-code element is reached.
    
    - Forces use of matching id or other synchronization mechanism to pair the start and end elements.
    
    - Forces us to provide another mechanism to allow XML-friendly paired codes for inline markup with no native data.
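
    A rough Python sketch of what these pros and cons look like in practice. The element names `sc` and `ec` (start code, end code) and the sample data are assumptions made for illustration only. The native data is stored as element content, the codes overlap, and a consumer has to (1) skip text nodes that are not real text and (2) pair start and end elements by id while walking siblings.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <sc>/<ec> are invented names for start/end codes.
# Codes 1 and 2 overlap, which nested paired elements could not express.
SEGMENT = (
    '<source>Some <sc id="1">[b]</sc>bold <sc id="2">[i]</sc>and'
    '<ec id="1">[/b]</ec> italic<ec id="2">[/i]</ec> text</source>'
)


def all_text_nodes(root) -> str:
    """Naive extraction: native data leaks into the text."""
    return "".join(root.itertext())


def real_text(root) -> str:
    """Keep only text that is a direct child of the segment element."""
    parts = [root.text or ""]
    for child in root:
        parts.append(child.tail or "")
    return "".join(parts)


def paired_spans(root) -> dict:
    """Map each code id to the text between its start and end elements.

    The affected content is made of siblings, so it is collected by
    scanning forward and matching ids. An end code without a start
    code raises KeyError in this sketch.
    """
    open_codes = {}  # id -> text pieces seen since the start code
    spans = {}
    for child in root:
        code_id = child.get("id")
        if child.tag == "sc":
            open_codes[code_id] = []
        elif child.tag == "ec":
            spans[code_id] = "".join(open_codes.pop(code_id))
        for pieces in open_codes.values():
            pieces.append(child.tail or "")
    return spans
```

    Usage: parse with `root = ET.fromstring(SEGMENT)` and compare `all_text_nodes(root)` with `real_text(root)` to see the extra filtering the content-based layout requires.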
    
    
    Cheers,
    -yves
    
    
    


  • 2.  Re: [xliff-inline] Storing native data

    Posted 12-02-2010 02:33
    > Cons:
    >
    > - Preserving whitespace may require some effort
    One way we could deal with this would be to require that any 
    significant whitespace be prefixed with a backslash (“\”), just as is 
    done in path names in Unix. A true backslash would then have to be 
    represented as “\\”. This sort of escaping (particularly of spaces and 
    backslashes) is well known and understood, and it would allow multiple 
    spaces to be included unambiguously, e.g., “\ \ \ \ #AD23” (starting 
    with four spaces). It would require some extra processing, but not an 
    unreasonable amount, and it would solve the problem.
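
    As a minimal sketch of how little processing this involves, here is one possible pair of helpers in Python. The function names are made up for illustration, and the sketch assumes that only spaces and backslashes are escaped; tabs and line breaks would need their own handling.

```python
def escape_native(data: str) -> str:
    """Prefix every space and every backslash with a backslash."""
    out = []
    for ch in data:
        if ch in ("\\", " "):
            out.append("\\")
        out.append(ch)
    return "".join(out)


def unescape_native(escaped: str) -> str:
    """Reverse escape_native: a backslash makes the next character literal."""
    out = []
    chars = iter(escaped)
    for ch in chars:
        if ch == "\\":
            nxt = next(chars, None)
            if nxt is None:
                raise ValueError("dangling backslash at end of data")
            out.append(nxt)
        else:
            out.append(ch)
    return "".join(out)
```

    Escaping four spaces followed by `#AD23` gives exactly the `\ \ \ \ #AD23` form shown above, and `unescape_native` restores the original string.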
    
    -Arle
    
    


  • 3.  RE: [xliff-inline] Storing native data

    Posted 12-06-2010 08:56
    Hi Yves/all,
    
    Please find a couple of thoughts/comments below. They refer to some of the concepts/terminology suggested in http://lists.oasis-open.org/archives/xliff-inline/201012/msg00002.html ...
    
    >We currently have three requirements to cover:
    >
    >a) no storage of the native data
    >b) storage of the native data outside the inline code
    >c) storing the data along with the inline code
    >
    >I'll talk only about c) since a) and b) are easy to do.
    
    I would tend to rephrase this as follows:
    
    In cases where supplementary entities related to inline entities, or surrogate entities (for "illegal/non-XML" characters) have to be represented, two representation mechanisms can be envisioned:
    
    1. attached to the material to which it pertains (corresponds to c) )
    2. detached from the material to which it pertains (corresponds to b) )
    
    I wonder if we really should come up with suggestions for both mechanisms. Is there a requirement to have both? If not, I would tend to think that we should devise only a single representation. This presumably would be beneficial for interchange (since tool providers would have only one way to represent/implement the aforementioned entities).
    
    As for ideas related to the possible representation mechanism, two things come to mind:
    
    1. distinguishing between "carrier" and "data"
    2. using XPointer
    
    The distinction between "carrier" and "data" pertains to the following observation: The "data" in the example could be stored outside of the "carrier" (here: the "start/end" attributes).
    
    	Example:
    
    		
    
    	Alternative:
    
    		
    
    		<data>[data]</data>
    		<data>[/data]</data>
    
    We thus introduce a level of indirection - and need a mechanism to establish a link between the "carrier" and the "data". This could be done via XPointer in various ways.
    
    One possibility is to use XPointer in the "carrier":
    
    		
    
    		<data>[data]</data>
    		<data>[/data]</data>
    
    Another possibility would be to use XPointer in the container for the "data":
    
    		
    
    		<data>[data]</data>
    		<data>[/data]</data>
    
    If I understood XPointer correctly, you may even get rid of the "carrier", since XPointer allows you to easily identify not only elements but even strings. Example:
    
    		
    
    		<data>[data]</data>
    
    This would attach "[data]" to any string "high".
    
    Best regards,
    Christian
    
    


  • 4.  RE: [xliff-inline] Storing native data

    Posted 12-06-2010 16:23
    Hi Christian, all,
    
    
    > I wonder if we really should come up with suggestions 
    > for both mechanisms. Is there a requirement to have both?
    
    Yes there is.
    
    
    > One possibility is to use XPointer in the "carrier":
    >	
    >	<data>[data]</data>
    >	<data>[/data]</data>
    > Another possibility would be to use XPointer in the container for the "data":
    >	
    >	<data>[data]</data>
    >	<data>[/data]</data>
    
    From a practical viewpoint I'm not sure if we would want to "burden" an XLIFF document with XLink syntax for every single inline code.
    It seems this would also restrict the way tools can work, forcing them to use XLink and possibly preventing XLIFF implementations where no XLink processor is available.
    But linking the data to the code through a unique id could certainly work, and it could be implemented with XLink by applications that choose to do so.
    
    This separation between the code tag and its corresponding data is currently used by SDLXLIFF. I wonder if anyone has any feedback on its usage: what are its advantages vs. its drawbacks? For example, it seems relatively easy to work with when you have the whole document in memory, but it may be less practical when using a stream-based parser, or when working with snippets of XLIFF, etc.
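
    To illustrate the in-memory vs. stream-based point, here is a rough Python sketch of id-based linking. Everything in the sample is an assumption: the `unit` and `code` elements, the `startRef`/`endRef` attributes and the `id` on `data` are invented names, and this is not how SDLXLIFF actually lays things out. With the whole document in memory the lookup is a simple dictionary. With a streaming parser the codes arrive before their data, so they have to be held until the data has been seen.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical layout: inline codes carry only ids, data is stored elsewhere.
DOC = """<unit>
  <source>Some <code id="1" startRef="d1" endRef="d2">bold</code> text</source>
  <data id="d1">[b]</data>
  <data id="d2">[/b]</data>
</unit>"""


def resolve_in_memory(doc_xml: str) -> dict:
    """Whole document loaded: build the data store, then look up each code."""
    root = ET.fromstring(doc_xml)
    store = {d.get("id"): d.text or "" for d in root.iter("data")}
    return {
        c.get("id"): (store[c.get("startRef")], store[c.get("endRef")])
        for c in root.iter("code")
    }


def resolve_streaming(doc_xml: str) -> dict:
    """Streamed document: codes are seen first and must wait for their data."""
    pending = []  # (code id, start ref, end ref) not yet resolvable
    store = {}
    stream = io.BytesIO(doc_xml.encode("utf-8"))
    for _event, el in ET.iterparse(stream):  # default: "end" events only
        if el.tag == "code":
            pending.append((el.get("id"), el.get("startRef"), el.get("endRef")))
        elif el.tag == "data":
            store[el.get("id")] = el.text or ""
    return {cid: (store[s], store[e]) for cid, s, e in pending}
```

    Both functions give the same result for `DOC`; the difference is that the streaming version cannot hand a fully resolved code to the caller at the moment it reads it, and a snippet cut out without its `data` elements cannot be resolved at all.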
    
    -ys