OASIS Open Document Format for Office Applications (OpenDocument) TC

  • 1.  Encryption and data leakage

    Posted 05-11-2010 17:40
    The approach we inherited from ODF 1.1 encrypts each file in the ZIP 
    independently.  Although the contents of the files are not viewable due to 
    the encryption, there are bits of information that potentially "leak", such 
    as:
    
    1) The file size
    2) The file date
    3) The file name
    4) The file mime type
    5) The hash of the first 1024 bytes of the file
    
    For example, even in an encrypted document I could see a file name called 
    "big-secret-takeover-june-3.jpg" and know some information that the person 
    who wrote the encrypted document might be rather surprised to see in the 
    open.
    
    Although not required by ODF, an implementation, if it is clever, can 
    avoid some of these leakages.  For example, the timestamp of the file can 
    be turned into the time of encryption rather than the original time stamp. 
     And the file name can be randomized rather than indicate the original 
    file name.  This might be fine for ODF, since these time stamps and file 
    names need not be preserved.  So long as we preserve referential 
    integrity of the package, the names of images are not significant.
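
    A minimal sketch of that kind of scrubbing, assuming the caller has 
    already encrypted each entry.  The function name and the returned mapping 
    are hypothetical, and the sketch ignores entries that must keep a fixed 
    name, such as the manifest.

```python
import io
import secrets
import time
import zipfile


def write_scrubbed_package(entries, out):
    """Write entries to a ZIP with random names and a fresh timestamp.

    entries maps original path -> bytes (assumed already encrypted).
    Returns a dict mapping original path -> randomized path, which the
    caller would use to rewrite references inside the package.
    """
    now = time.localtime()[:6]  # time of writing, not the original time stamp
    mapping = {}
    with zipfile.ZipFile(out, "w") as zf:
        for original, data in entries.items():
            # Keep nothing from the original name, not even the extension.
            randomized = secrets.token_hex(8)
            mapping[original] = randomized
            zf.writestr(zipfile.ZipInfo(randomized, date_time=now), data)
    return mapping
```

    The returned mapping is what preserves referential integrity: every 
    reference to an original name has to be rewritten to the randomized one.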
    
    However, we still should be concerned here.  First, the reason we split 
    Part 3 into its own part was the belief that it could be useful for 
    purposes other than just ODF 1.2.  Many of us hoped that it would have 
    other uses.  But I don't think we can assume that all uses can ignore the 
    original file names and time stamps.  These might be significant for some 
    uses. 
    
    Second, even within ODF, especially if we allow package extensions, we 
    might see items added to packages where the names of files (which may 
    ultimately be end-user-defined) cannot safely be renamed to random names. For 
    example, there may be referential integrity constraints that a generic ODF 
    processor is not aware of.  Maybe there is RDF that points to a contained 
    image or other package resource.  In any case, the approach is very 
    fragile.
    
    Finally, even without extensions, and with the use of randomized names, we 
    still leak information, based on knowing the size and hash of the first 
    1024 bytes of the file.  For example, if I have a copy of 
    "big-secret-takeover-june-3.jpg" I can easily check to see what encrypted 
    documents also contain that same image.  I can similarly probe for any 
    other resource where I know in advance its size and/or contents. 
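
    The probe described above can be sketched as follows.  This assumes the 
    checksum is a base64-encoded SHA-1 digest; whether the hashed bytes are 
    the raw or the compressed file is left to the caller, and the manifests 
    structure is a hypothetical stand-in for parsed manifest entries.

```python
import base64
import hashlib


def fingerprint(data):
    """Return (size, base64 SHA-1 of the first 1024 bytes) for data."""
    digest = hashlib.sha1(data[:1024]).digest()
    return len(data), base64.b64encode(digest).decode("ascii")


def documents_containing(known_file, manifests):
    """List documents whose manifest exposes the same size and checksum.

    manifests maps a document name to a list of (size, checksum) pairs
    read from that document's unencrypted manifest.
    """
    target = fingerprint(known_file)
    return [doc for doc, entries in manifests.items() if target in entries]
```

    No password is needed at any point, which is what makes this a leak 
    rather than an attack on the cipher.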
    
    There are three ways of getting around this problem that come to mind.  
    One is to keep a "shadow directory" for the ZIP that contains the 
    original names, time stamps, and sizes of the files.  Encrypt this 
    "shadow directory" when the document is encrypted.  For each encrypted 
    file, prepend some random bytes (not sure how many is optimal) in order 
    to prevent data leakage of the original size and the hash of the first 
    1024 bytes.
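
    One possible shape for the random prefix, as a sketch only.  The 
    one-byte length header and the 16 to 255 byte range are assumptions made 
    for illustration, not a recommendation on what is optimal.

```python
import secrets

MIN_PAD = 16   # always add some random bytes so the leading block never repeats
MAX_PAD = 255  # largest value that fits in the one-byte length header


def add_random_prefix(plaintext):
    """Prepend a length byte and that many random bytes, before encrypting."""
    pad_len = MIN_PAD + secrets.randbelow(MAX_PAD - MIN_PAD + 1)
    return bytes([pad_len]) + secrets.token_bytes(pad_len) + plaintext


def strip_random_prefix(padded):
    """Undo add_random_prefix after decrypting."""
    pad_len = padded[0]
    return padded[1 + pad_len:]
```

    With this layout the hash of the first 1024 bytes no longer matches a 
    known file, but the size is only blurred by a couple of hundred bytes, 
    so a wider padding range would be needed to hide the size of large files.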
    
    Another approach is to encrypt the original full path of the file, with 
    its timestamp appended, using the original derived key, base64 encode the 
    result, and then write that out as the full path for the ZIP entry. That 
    way you do not need another file in the ZIP. 
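
    A sketch of the entry-name approach.  The cipher is deliberately left as 
    a caller-supplied function keyed with the derived key, since no 
    particular algorithm is assumed here; the function names and the newline 
    separator are hypothetical.

```python
import base64


def opaque_entry_name(path, timestamp, encrypt):
    """Build a ZIP entry name that hides the original path and timestamp.

    encrypt is a caller-supplied bytes -> bytes function.
    """
    record = f"{path}\n{timestamp}".encode("utf-8")
    # The URL-safe alphabet avoids "/", which ZIP treats as a path separator.
    return base64.urlsafe_b64encode(encrypt(record)).decode("ascii")


def recover_path_and_timestamp(name, decrypt):
    """Reverse opaque_entry_name given the matching decrypt function."""
    record = decrypt(base64.urlsafe_b64decode(name)).decode("utf-8")
    path, timestamp = record.rsplit("\n", 1)
    return path, timestamp
```

    Note that standard base64 would not work unmodified, because its 
    alphabet includes "/".  The length of the resulting name also still 
    tracks the length of the original path.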
    
    The other way is to move to a whole-package encryption method, rather than 
    trying to do this file-by-file. 
    
    -Rob
    


  • 2.  RE: [office] Encryption and data leakage

    Posted 05-11-2010 17:49
    An additional leak comes when embedding additional files. I checked our implementation, and an embedded image will get changed to 'image1.jpg', which doesn't reveal the original file name, but when I embed another document into the base document, it is clear from the package contents there is an embedded document, and the type of document would be apparent.
    
    I also agree that making the hash of the first 1024 bytes of the file public information is a fairly serious flaw.