OASIS Open Document Format for Office Applications (OpenDocument) TC


17.5 on IRIs

  • 1.  17.5 on IRIs

    Posted 09-22-2008 16:54
    Greetings,
    
    To continue the discussion from the call this morning, I would call 
    everyone's attention to a prior suggestion by Michael (I overlooked this):
    
    > *Every IRI reference that is not a relative-path reference does* not
    > need any special processing. This especially means that absolute-paths
    > do not reference files inside the package, but within the hierarchy the
    > package is contained in, for instance the file system. IRI references
    > inside a package may leave the package, but once they have left the
    > package, they never can return into the package or another one.
    Note that we still have:
    
    "but within the hierarchy the package is contained in, for instance the 
    file system."
    
    which doesn't make sense, or at least not at first.
    
    To some degree I am guessing, but I think the language of both the 
    first and second sentences was meant to be talking about IRIs that are 
    not relative paths.
    
    Thus:
    
    First sentence wants to say: No special rules for non-relative-path 
    references. (OK by me)
    
    Second sentence wants to say: Repeats that absolute paths don't 
    reference files in the package (repetition, but OK) *and* that an 
    absolute-path IRI can reference packages, for example in a file system.
    
    In other words, the second part of the second sentence was simply an 
    *observation* about the capacity of an absolute path IRI.
    
    Third sentence wants to say: IRI can point to something outside the 
    package (ok) but once it leaves it can't come back. ???
    
    Well, IRIs only point to one location, and since we don't have link 
    hubs (an XLink feature) there is no known mechanism for a single IRI to 
    point outside of a package and then back into a package. Note, too, that 
    we have already said that an absolute IRI can't point into a package.
    
    OK, having gone the long way around (apologies, but I wanted it to be 
    clear that remarks from others, and not any cleverness on my part, have 
    resulted in the following), here is what I would propose to "fix" the 
    paragraph in question:
    
    ****
    Every IRI reference that is not a relative-path reference does not need 
    any special processing. Absolute-paths can not reference files inside a 
    package, but may, for instance, address packages that are held in a file 
    hierarchy. IRI references inside a package may address anything 
    addressable by an IRI that is outside of a package, but no IRI outside 
    of a package may address any location within any package.
    ****
    
    A bit wordy for me and I would suggest further edits on the second 
    sentence, now that I suspect we know what was meant and to replace the 
    paragraph with:
    
    ****
    Every IRI reference that is not a relative-path reference does not need 
    any special processing. Absolute-paths can not reference files inside a 
    package. IRI references inside a package may address anything 
    addressable by an IRI that is outside of a package, but no IRI outside 
    of a package may address any location within any package.
    ****
    
    The second half of the third sentence strikes me as redundant with the 
    second sentence. So, my personal preference would be:
    
    ****
    Every IRI reference that is not a relative-path reference does not need 
    any special processing. An absolute-path IRI can not reference files 
    inside a package. IRI references inside a package may address anything 
    addressable by an IRI that is outside of a package.
    ****
    
    Note that I started to say "that is outside of *the* package" to make 
    reference to the package containing the IRI but that would be wrong 
    because we don't want absolute IRIs addressing files in *any* package.
    
    Hope everyone is having a great day!
    
    Patrick
    
    -- 
    Patrick Durusau
    patrick@durusau.net
    Chair, V1 - US TAG to JTC 1/SC 34
    Convener, JTC 1/SC 34/WG 3 (Topic Maps)
    Editor, OpenDocument Format TC (OASIS), Project Editor ISO/IEC 26300
    Co-Editor, ISO/IEC 13250-1, 13250-5 (Topic Maps)
    
    


  • 2.  RE: [office] 17.5 on IRIs

    Posted 09-22-2008 17:55
    Patrick,
    
    Here's what I notice, looking over the complete 17.5 more carefully:
    
    1. The first sentence of 17.5 suggests that relative URIs can be used to
    reference files within the file system.  (On the call, I thought that was
    allowed, then I thought it wasn't, and now I think it is again.)  Suggested
    clean-up: change "but can also" in 686 line 5 to read "and can also".  (This
    is minor.)  This is a very handy thing (e.g., with master documents and
    other cases where material is incorporated in a document by reference),
    since it allows the material to be moved as a group and have the collection
    still work.  [I note further that the preservation of referential integrity
    for *any* URI that leads out of a package is no more assured for absolute URIs
    than for relative URIs.]
    
    2. It appears we are attempting to ban URI schemes that might allow access
    to a part of a Zip file.  I don't think that is in our power.  So some
    statements in 17.5 about how parts of a package are not to be accessed are
    too strong in their wording.  
    
    3. I suspect what we want to say is that within files contained within
    packages, URIs that refer to sub files in packages (1) shall be relative,
    (2) shall only refer to sub files in the same package, and (3) shall not in
    any way depend on the current location and any naming of the package
    containing the referencing file and the target sub-file.  [I am not sure
    this closes all of the doors, but it is as close as I can figure what is
    intended.  We also need to consider whether it could be "should" rather
    than "shall", although I think it would be a stretch to change it from
    shall/must at this point.] 
    
    4. If (3) is the gist of it, I think we can make 17.5 much simpler. 
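    
    Point (3) can be made concrete with a small Python sketch. It is a hypothetical checker, not anything ODF defines: it assumes a reference satisfies conditions (1) to (3) exactly when it is a relative-path reference (no scheme, no leading "/") whose ".." segments never climb above the package root, since any path that leaves the package and comes back would have to spell out the package's own name or location. The function name and the in-Zip part names are illustrative only.

```python
from urllib.parse import urlsplit


def is_portable_package_ref(part: str, ref: str) -> bool:
    """Hypothetical check of conditions (1)-(3) for a reference `ref` found in
    package sub-file `part` (an in-Zip name such as "Object 1/content.xml")."""
    parsed = urlsplit(ref)
    # (1) must be relative: no scheme, and no leading "/" (which also rules out "//")
    if parsed.scheme or ref.startswith("/"):
        return False
    # Directory depth of the referencing sub-file below the package root
    depth = part.count("/")
    for segment in parsed.path.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            depth -= 1
            # (2)/(3): going above the root leaves the package; any way back in
            # would depend on the package's own name and location
            if depth < 0:
                return False
        else:
            depth += 1
    return True
```

    The check only answers whether a reference is an acceptable reference to a sub-file of the same package; references that deliberately leave the package, such as "../MasterChecker2.odt" from content.xml, are simply reported as not being in-package references.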
    
    I will step away from the keyboard now, waiting to see what others have to
    say about this.
    
     - Dennis
    
    


  • 3.  Re: [office] 17.5 on IRIs

    Posted 09-22-2008 20:38
    Hi,
    
    I prefer version 2 of Patrick's proposed text.
    
    When I was translating ISO/IEC 26300 into Brazilian Portuguese, I spent 
    almost a week finding the best way to translate this particular sentence 
    into Portuguese. I think that proposal number 2 will be easy to 
    translate and makes explicit everything that we need to say.
    
    Best,
    
    Jomar
    
    
    


  • 4.  RE: [office] 17.5 on IRIs

    Posted 09-23-2008 02:13
    Interesting.  I see jurisdictional problems with statements along these
    lines:
    
        Absolute-paths can not reference files 
        inside a package. IRI references inside 
        a package may address anything 
        addressable by an IRI that is outside of a 
        package, but no IRI outside of a package 
        may address any location within any package.
    
    I don't think we have jurisdiction over IRIs and over the Zip format.  We
    can restrict how those are applied for ODF (as with the Zip MIME-type hack
    for ODF packages), but only in the context of their occurrence in the ODF
    representation of documents.  
    
    In particular, I don't see how we can say anything about ways applications
    handle IRIs not found within an ODF-format file and whether schemes that
    provide access to sub-files inside of an ODF package from outside of that
    package can be defined or not.  It seems to me that we have no jurisdiction
    over such prospects.
    
    I can understand relieving ODF processors of the obligation to deal with
    IRI schemes in an ODF-format file that would direct them to sub-files in
    packages other than any holding the ODF-format file in which the IRI
    appears.  Of course, for an IRI whose entire resolution can be delegated to
    the host operating system, I am not sure why this is much of a concern.  
    
    MORE THINKING ABOUT RELATIVE PATHS (not the same as RELATIVE URI/IRI)
    
    Relative paths must be resolved by the ODF processor, and it is useful to
    limit the problem and confine resolution to internal sub-files of the same
    package.  
    
    If the relative path reaches above the package, as allowed in 17.5, it can
    then be transformed into a relative path to be resolved using the operating
    system and with a base location using the directory in which the current
    package is located.  This seems to be what 17.5 is out to accomplish.  It is
    hard to know whether such resolution could retrieve a part of some package
    and how the processor would be able to detect and prevent that.  We can
    certainly say that relative paths should not lead to such a situation.
    
    There is an additional wrinkle in the use of relative paths.  Although it is
    clear that an IRI can be encoded in UTF-8 (or UTF-16) in the XML attribute
    that has an IRI as its value, it is not clear whether anything but
    single-byte codes (and preferably 7-bit only codes to avoid any ambiguity
    with double-byte coding schemes and single-byte codepage variants) can
    appear in the part that codes a Zip filename.  The cited reference on the
    Zip format doesn't say boo about the character encoding of filenames for the
    Zip elements.  When the relative IRI refers up and out of the package,
    handling any Unicodeness will be a matter to be worked out with the hosting
    platform.  (The string of "../" parts that move up to the directory location
    of the package and are stripped out to leave the directory-relative part are
    the same in ISO-646 and UTF-8 and in wide characters, so no problem there and
    these disappear in the creation of a package-internal Zip filename.)  
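    
    On the filename-encoding question: later revisions of PKWARE's APPNOTE (not necessarily the Zip reference cited by ODF) define general-purpose bit 11, the "language encoding flag", to mark a filename stored as UTF-8. The Python sketch below is only an illustration with made-up part names: it writes two empty entries with the standard-library `zipfile` module and reads back whether each stored entry has that bit set. CPython sets it only for names that cannot be encoded as ASCII.

```python
import io
import zipfile

# General-purpose bit 11 ("language encoding flag"): the filename is UTF-8
UTF8_NAME_FLAG = 0x800


def utf8_flags(names):
    """Write empty entries named `names` into an in-memory Zip, then report,
    for each stored entry, whether its UTF-8 filename flag is set."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    with zipfile.ZipFile(buf) as zf:
        return {info.filename: bool(info.flag_bits & UTF8_NAME_FLAG)
                for info in zf.infolist()}


print(utf8_flags(["Pictures/plain.png", "Pictures/größe.png"]))
```

    Whether a given ODF consumer honours the flag is a separate question that the sketch does not answer.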
    
    Finally, I note that 17.5 does not prohibit a relative IRI from being used
    to refer to the (Zipped) package itself, or another Zipped package.  We just
    don't want to allow IRIs that somehow refer into a package.
    
    Just thinking around the ramifications, not proposing at this point  ...
    
     - Dennis 
    
    


  • 5.  RE: [office] 17.5 on IRIs

    Posted 09-23-2008 03:53
    Patrick and others,
    
    A quick question.  My reading of this ISO 26300 passage
    
    "A relative-path reference (as described in §6.5 of [RFC3987]) that occurs
    in a file that is contained in a package has to be resolved exactly as it
    would be resolved if the whole package gets unzipped into a directory at its
    current location. The base IRI for resolving relative-path references is the
    one that has to be used to retrieve the (unzipped) file that contains the
    relative-path reference."
    
    is that the base IRI is the Zip-file "path" to the sub-file that contains
    the relative-path reference.  So the relative path may need to be
    de-"../"-ed and the correct prefix pushed onto what's left to make the Zip
    file name that is used.   Of course, if de-"../"-ing exceeds the path depth
    of the sub-file, we know that the remainder must be used with the host-OS
    file system.  
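    
    This reading can be sketched in Python. The helper below is hypothetical (its name and the two-way "package"/"host" result are assumptions for illustration): it de-"../"-s the reference against the in-Zip directory of the referencing sub-file and, if the ".." segments climb past the package root, hands the remainder to the host file system relative to the directory that holds the package. It assumes the reference has no scheme, query or fragment and is already percent-decoded.

```python
def resolve_relative_path(part: str, ref: str):
    """Resolve relative-path `ref` found in package sub-file `part` (an in-Zip
    name such as "Object 1/content.xml") as if the package were unzipped in place.

    Returns ("package", zip_name) for targets inside the package, or
    ("host", path) with path relative to the directory holding the package.
    """
    stack = part.split("/")[:-1]  # directory of the referencing sub-file
    overflow = 0                  # ".." segments that climbed above the package root
    for segment in ref.split("/"):
        if segment in ("", "."):
            continue
        if segment == "..":
            if stack:
                stack.pop()
            else:
                overflow += 1
        else:
            stack.append(segment)
    if overflow == 0:
        return ("package", "/".join(stack))
    # The first ".." above the root lands in the directory holding the package;
    # each further ".." climbs one more level in the host file system.
    return ("host", "/".join([".."] * (overflow - 1) + stack) or ".")
```

    With content.xml as the referencing part, "../MasterChecker2.odt" comes back as ("host", "MasterChecker2.odt"), that is, a sibling of the package, which matches the master-document usage described next.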
    
    For the most part (e.g., in the content.xml file), the in-package relative
    URIs start with "./" followed by what is actually the Zip filename for the
    referenced object.  
    
    I have a master document, and the way its content.xml file references the
    related component documents is via relative URIs such as
    "../MasterChecker2.odt".
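    
    To survey which relative URIs a document actually uses, one can list the `xlink:href` attributes in a package part. The Python sketch below uses only the standard library; the XML is a made-up, heavily trimmed fragment in the style of a master document's content.xml (the `text:section-source` and `draw:image` elements are shown for illustration, with everything else omitted).

```python
import xml.etree.ElementTree as ET

XLINK_HREF = "{http://www.w3.org/1999/xlink}href"

# Hypothetical, trimmed content.xml fragment, for illustration only
SAMPLE = """<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    xmlns:draw="urn:oasis:names:tc:opendocument:xmlns:drawing:1.0"
    xmlns:xlink="http://www.w3.org/1999/xlink">
  <office:body><office:text>
    <text:section text:name="Chapter1">
      <text:section-source xlink:href="../MasterChecker2.odt"/>
    </text:section>
    <draw:frame><draw:image xlink:href="./Pictures/logo.png"/></draw:frame>
  </office:text></office:body>
</office:document-content>"""


def xlink_hrefs(xml_text: str) -> list:
    """Return every xlink:href value, in document order."""
    root = ET.fromstring(xml_text)
    return [el.get(XLINK_HREF) for el in root.iter()
            if el.get(XLINK_HREF) is not None]


print(xlink_hrefs(SAMPLE))
```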
    
    I have no handy ODF samples with any XML files below the top level of the
    package that also have relative URIs in them.  The manifest.xml file is not
    at the top level, but its file references are all via manifest:full-path
    references and these are strings, not URIs.  In fact, they are like Zip
    filenames for Zip package parts (and some empty directory-named parts)
    although there is no description in the ODF specification of the format or
    encoding of these attribute values.
    
    But we seem to have enough evidence that the base IRI for relative-path
    references is the IRI of the container of the package sub-file that contains
    the references.
    
    So this particular paragraph seems perfectly clear.  In 17.5, it is only the
    final paragraph that is lacking, and the list at the beginning of the
    section is problematic too.
    
     - Dennis
    
    


  • 6.  Re: [office] 17.5 on IRIs

    Posted 09-23-2008 09:26
    Dennis and others
    
    2008/9/23 Dennis E. Hamilton 


  • 7.  Re: [office] 17.5 on IRIs

    Posted 09-23-2008 10:10
    I forgot to attach documentsignatures.xml to previous post.  Here's
    the relevant section anyway:
    
    


  • 8.  Re: [office] 17.5 on IRIs

    Posted 09-23-2008 14:48
    Hi Bob,
    
    On 09/23/08 11:28, Bob Jolliffe wrote:
    > Dennis and others
    > 
    > 2008/9/23 Dennis E. Hamilton 


  • 9.  RE: [office] 17.5 on IRIs

    Posted 09-23-2008 17:09
    Michael's rationale in response to Bob Jolliffe is consistent with the
    evidence I found in some ODF documents produced with OO.o.  Relative-path
    URIs are to be interpreted as relative to the location of the sub-file
    containing the URI-valued attribute.
    
    The following observations may be helpful in the comprehension of this
    approach:
    
    0. GLOBAL EDGE CASES (SELF-REFERENCE, etc.).  The problem of needing to know
    the name of files at the end of relative paths is not new.  It is clearly a
    brittle arrangement to embed the filename of the current package (or any
    other) in a reference.  Likewise, a reference to the very file-part that
    contains it is easy to write and just as interesting to make work (or not).
    The prospect exists in any path-based system that can navigate into, up, and
    down a hierarchy.  The prospect is equally available for absolute paths.
    There is no assurance that such a reference will work (not only because a
    file may not be present [with the expected name] but also because of locking
    and so on).  On the other hand, I have on my computer a number of applications
    that are smart enough to detect when a file already being worked on is
    referenced and then make it work.  (I haven't tested any of them to see if
    they can detect and prevent unending recursions, but that is rarely a
    problem.  Notice that browsers don't mind.)
    
    1. NOT USING URIs FOR COMPLETE PACKAGE PART REFERENCES.  Although the
    manifest:full-path attribute is underspecified (i.e., not much description
    at all) in ODF 1.0, it has a string value, not a URI value.  For sub-files,
    it is the full in-Zip filename (that is, having the in-zip "path" built-in).
    This attribute or maybe a package:full-path attribute might be co-opted for
    the DSIG case (just making things up here).
    
    2. PART-RELATIVE EASES NESTING.  The advantage of using the location of the
    reference-bearing sub-file as the "base URI" is that embedding of one ODF
    document in another via nesting in a single package (not a package in a
    package) still works, and the embedded ODF document does not have to be
    reprocessed in any way, simply imported into the appropriate nesting level
    with prefixing of filenames to reflect the new nested location.
    
    3. LACK OF WAY TO BE PACKAGE-RELATIVE.  The disadvantage of (2) is the
    absence of a way to treat the top of the actual package as a "home" point.
    It would be possible to co-opt "/" or even "~/" for this, but the first
    contradicts other quite-clear provisions of 17.5 and the second is likely
    misleading.  But other devices are possible.
    
    4. BRITTLENESS OF GLOBAL DEPENDENCIES.  A problem with this hypothetical
    nesting in (2) is that any package-relative paths that go global to the
    nested package are now likely broken.  A device such as (3) would help, but
    there is the problem of making sure that the external content is preserved
    in its path-global location and that there are not name conflicts for
    external content wanted by the containing document and nested documents.
    (Although it doesn't solve name-conflict cases, if a package (e.g., an .odt
    file) were nested in another package, the rule for relative global
    references could always involve going to the location of the top-level
    package.)
    
    5. IMPROVING GLOBAL-DEPENDENCY MANAGEMENT.  The scenarios in (2-4) are
    rather far-fetched, and relative paths always turn out to be insufficient
    for global references, one way or another.  The last time I worried about
    such things (circa 1990-1992), we found that the ideal way to deal with
    references global to packages was to use indirection.  That is, content
    referred to an index in which external references were carried (the manifest
    is a good place for this), and one could determine external/global
    dependencies by inspection of the index without having to process the
    content.  Under these conditions, relocation and renaming of material and
    even dynamic adjustment when processing from a different place than the
    index assumes can be handled in a reasonable way. But that is a whole
    different game (although important when working with document-management and
    content-management systems that should not touch or alter content in any
    way).  This idea goes back even farther to systems for global-reference
    management for pure procedures in Multics and the like.  And don't you just
    love it when old fogeys talk about how they did it in a past century on
    systems that no one remembers or cares about?     
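    
    Observation (2) can be checked with a small Python sketch. It assumes part-relative resolution (join against the directory of the referencing sub-file, then collapse "." and "..") and models a document as a set of in-Zip names plus (source part, reference) pairs; the "Object 1/" prefix and the part names are hypothetical. Nesting only prefixes the part names; the reference strings are left untouched and still resolve.

```python
import posixpath


def resolve(part: str, ref: str) -> str:
    """Part-relative resolution of an in-package relative-path reference."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(part), ref))


def all_targets_present(parts, refs) -> bool:
    """True if every (source part, reference) pair resolves to an existing part."""
    return all(resolve(src, ref) in parts for src, ref in refs)


def nest(parts, refs, prefix: str):
    """Move a document under `prefix`; the reference strings are NOT rewritten."""
    return ({prefix + p for p in parts},
            [(prefix + src, ref) for src, ref in refs])


parts = {"content.xml", "Pictures/chart.png"}
refs = [("content.xml", "./Pictures/chart.png")]
nested_parts, nested_refs = nest(parts, refs, "Object 1/")
print(all_targets_present(parts, refs),
      all_targets_present(nested_parts, nested_refs))
```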
    
     - Dennis
    
    


  • 10.  Re: [office] 17.5 on IRIs

    Posted 09-23-2008 21:46
    Hi Michael, Andreas and all
    
    Thanks for the clarifications.  Unfortunately I've had my head a bit
    buried in the DSIG stuff, so my assumptions about the way relative IRIs
    (or URIs) are interpreted have been shaped in that particular dark
    place :-)
    
    I am happy to hear that the DSIG case is an outlier and a peculiarity
    as I agree with Andreas, that, in my own words, it is quite strange.
    
    So, at least for the purposes of the errata and 1.1, there is no
    inconsistency in the interpretation of the root of a relative IRI - I
    think...
    
    Regarding the DSIG revisions for v1.2, I hope to reshape our earlier
    proposal quite soon.  We are also still waiting for formal comment
    from both the DSS-X and the ETSI XAdES TCs.
    
    Regards
    Bob
    
    2008/9/23 Michael Brauer - Sun Germany - ham02 - Hamburg
    


  • 11.  Re: [office] 17.5 on IRIs

    Posted 09-23-2008 15:02
    On Tue, 2008-09-23 at 10:28 +0100, Bob Jolliffe wrote:
    
    > 
    > On the basis of my experience above, I have worked on the assumption
    > that the way to interpret a relative IRI is that it is always relative
    > to the root directory of the package.
    
    To use your words from a different situation, this starts to stretch the
    meaning of relative and is quite strange. This would be akin to relative
    path names in a shell always referring to the home directory of the user
    rather than to the current location.
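    
    The difference between the root-relative assumption and the part-relative reading shows up as soon as the referencing part is not at the top of the package. A minimal Python comparison, with hypothetical part names:

```python
import posixpath


def part_relative(part: str, ref: str) -> str:
    # Base is the directory of the sub-file that contains the reference
    return posixpath.normpath(posixpath.join(posixpath.dirname(part), ref))


def root_relative(part: str, ref: str) -> str:
    # Base is always the package root; `part` is deliberately ignored
    return posixpath.normpath(ref)


for ref in ("Pictures/a.png", "../Pictures/a.png"):
    print(ref,
          part_relative("Object 1/content.xml", ref),
          root_relative("Object 1/content.xml", ref))
```

    For parts stored at the package root, such as content.xml, the two readings coincide, which may be why the difference goes unnoticed in most documents.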
    
    Andreas
    -- 
    Andreas J Guelzow 


  • 12.  Re: [office] 17.5 on IRIs

    Posted 09-24-2008 09:40
    Patrick,
    
    On 22.09.08 18:55, Patrick Durusau wrote:
    > Greetings,
    > 
    > To continue the discussion from the call this morning, I would call 
    > everyone's attention to a prior suggestion by Michael (I overlooked this):
    > 
    >> *Every IRI reference that is not a relative-path reference does* not
    >> > need any special processing. This especially means that absolute-paths
    >> > do not reference files inside the package, but within the hierarchy the
    >> > package is contained in, for instance the file system. IRI references
    >> > inside a package may leave the package, but once they have left the
    >> > package, they never can return into the package or another one.
    
    In the resolution of the comment that I have discussed with Murata-san, 
    there is also a change to the previous paragraph. The new text is:
    
    ****
    A relative-path reference *(as defined in §4.2 of [RFC3986], except
    that it may contain the additional characters that are allowed in IRI
    references [RFC3987])* that occurs in a file that is contained in a
    package has to be resolved exactly as it would be resolved if the whole
    package gets unzipped into a directory at its current location. The base
    IRI for resolving relative-path references is the one that has to be
    used to retrieve the (unzipped) file that contains the relative-path
    reference.
    
    *Every IRI reference that is not a relative-path reference does* not
    need any special processing. This especially means that absolute-paths
    do not reference files inside the package, but within the hierarchy the
    package is contained in, for instance the file system. IRI references
    inside a package may leave the package, but once they have left the
    package, they never can return into the package or another one.
    ****
    
    This change is important, because it clarifies what we mean by a 
    relative-path reference. A relative-path reference is just a relative 
    path, and differs from the term relative URI. It is one kind of 
    relative URI, but there are others. For instance, a URI starting with a 
    "/" is a relative URI, but it is not a relative path.
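    
    The distinction follows RFC 3986 §4.2: a relative reference beginning with "//" is a network-path reference, one beginning with a single "/" is an absolute-path reference, and any other relative reference is a relative-path reference. A minimal Python sketch (the function name and labels are illustrative) that sorts reference strings into these classes, so that only the last class would get the package-specific treatment described in 17.5:

```python
from urllib.parse import urlsplit


def classify_reference(ref: str) -> str:
    """Classify an IRI reference using the RFC 3986 section 4 categories."""
    if urlsplit(ref).scheme:
        return "uri"            # has a scheme, e.g. "http://..." or "file:..."
    if ref.startswith("//"):
        return "network-path"   # relative reference with an authority
    if ref.startswith("/"):
        return "absolute-path"  # a relative URI, but not a relative path
    return "relative-path"      # e.g. "Pictures/a.png", "../x.odt", "#mark"
```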
    
    
    > 
    > OK, having gone the long way around (apologies but I wanted it to be 
    > clear that remarks from others and not any cleverness on my part has 
    > resulted in the following) here is what I would propose to "fix" the 
    > paragraph in question:
    
    I'm not sure which part of the paragraph is addressed by the comment for 
    which we are looking for a resolution, but I have no objections to 
    improving the language. Your 2nd suggestion sounds okay to me, except that 
    I would add that IRIs may also address files within the same package. That is:
    
    ****
    Every IRI reference that is not a relative-path reference does not need
    any special processing. Absolute-paths can not reference files inside a
    package. IRI references inside a package may address anything
    addressable by an IRI that is outside of a package *or within the same 
    package*, but no IRI outside of a package may address any location 
    within any package.
    ****
    
    Michael
    
    
    -- 
    Michael Brauer, Technical Architect Software Engineering
    StarOffice/OpenOffice.org
    Sun Microsystems GmbH             Nagelsweg 55
    D-20097 Hamburg, Germany          michael.brauer@sun.com
    http://sun.com/staroffice         +49 40 23646 500
    http://blogs.sun.com/GullFOSS
    
    Sitz der Gesellschaft: Sun Microsystems GmbH, Sonnenallee 1,
    	   D-85551 Kirchheim-Heimstetten
    Amtsgericht Muenchen: HRB 161028
    Geschaeftsfuehrer: Thomas Schroeder, Wolfgang Engels, Dr. Roland Boemer
    Vorsitzender des Aufsichtsrates: Martin Haering