OASIS Topology and Orchestration Specification for Cloud Applications (TOSCA) TC

RE: [tosca] Re: Proposal: CSARs should be tarballs, not ZIPs

  • 1.  RE: [tosca] Re: Proposal: CSARs should be tarballs, not ZIPs

    Posted 04-16-2021 19:26




    Let's put this on the agenda for Tuesday's meeting. A couple of observations:
     

    I'm not sure if the CSAR version should be related to the container format. For one, a processor can't get to the version until after unpacking the container. Do we really need to adopt a standard file name extension? Processors should be smart enough to try whatever the various supported container formats are and figure it out for themselves.
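As a rough illustration of a processor figuring the format out for itself, the sketch below guesses the container type from the leading bytes of a package instead of its extension. The function and constant names are hypothetical and not part of any CSAR specification; the signatures themselves are the standard ZIP, gzip and ustar magic values.

```python
import gzip
import io
import tarfile
import zipfile

ZIP_MAGIC = b"PK\x03\x04"  # local file header that opens a non-empty ZIP
GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of every gzip member
USTAR_OFFSET = 257         # POSIX and GNU tar headers carry "ustar" here


def sniff_container(head: bytes) -> str:
    """Guess the container format from the first 512 bytes of a package."""
    if head.startswith(ZIP_MAGIC):
        return "zip"
    if head.startswith(GZIP_MAGIC):
        return "gzip"  # most likely a gzipped tarball; decompress to confirm
    if head[USTAR_OFFSET:USTAR_OFFSET + 5] == b"ustar":
        return "tar"
    return "unknown"


def demo_packages():
    """Build tiny ZIP, TAR and TAR+GZIP packages in memory for the demo."""
    payload = b"tosca_definitions_version: tosca_simple_yaml_1_3\n"
    zip_buf = io.BytesIO()
    with zipfile.ZipFile(zip_buf, "w") as zf:
        zf.writestr("service.yaml", payload)
    tar_buf = io.BytesIO()
    with tarfile.open(fileobj=tar_buf, mode="w") as tf:
        info = tarfile.TarInfo("service.yaml")
        info.size = len(payload)
        tf.addfile(info, io.BytesIO(payload))
    tar_bytes = tar_buf.getvalue()
    return zip_buf.getvalue(), tar_bytes, gzip.compress(tar_bytes)
```

Only the first 512 bytes are needed, so the check works on a stream before anything is unpacked. It does not cover empty or self-extracting ZIPs, or very old tar variants without the "ustar" marker.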
     
    Thanks,
     
    Chris
     

    From: tosca@lists.oasis-open.org <tosca@lists.oasis-open.org>
    On Behalf Of Tal Liron
    Sent: Friday, April 16, 2021 10:24 AM
    To: tosca@lists.oasis-open.org
    Subject: [tosca] Re: Proposal: CSARs should be tarballs, not ZIPs

     





    Some additional thoughts:

    Remember that the CSAR version is separate from the TOSCA version. The current CSAR version is 1.1. So my proposal here would be for CSAR 2.0 (it's a significant enough change that I think it would warrant a
    major semantic version change).

    But, backwards compatibility would mean that systems would still be able to support CSAR 1.1, which is in ZIP. To be 100% clear: you could write a TOSCA 2.0 service template and package it in CSAR 1.1. We would have to be clear in the TOSCA 2.0 spec that this
    is supported.

    Another thought regarding extensions -- if we move to tarballs, it might be a good idea to choose a different extension than ".csar" so that processors would easily know if they're dealing with a new-style vs. old-style container. (This
    is a common problem with systems that upgrade their formats.) So, perhaps something like this:



    ".csar" extension: means CSAR 1.1 or CSAR 1.0, meaning it's a ZIP
    ".csar2" extension: means CSAR 2.0 (and beyond), meaning it's a TAR
    ".csar2.gz" extension: GZIPped TAR


    It's a bit awkward, but 100% deterministic.
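A minimal sketch of how a processor might apply the extension scheme listed above. The function name and the returned labels are made up for illustration; only the three extensions come from the proposal.

```python
from typing import Tuple


def container_for(filename: str) -> Tuple[str, str]:
    """Map a package file name to (CSAR version family, container format)."""
    name = filename.lower()
    # Check the longest suffix first so ".csar2.gz" is not mistaken for ".gz" noise.
    if name.endswith(".csar2.gz"):
        return ("CSAR 2.0+", "tar+gzip")
    if name.endswith(".csar2"):
        return ("CSAR 2.0+", "tar")
    if name.endswith(".csar"):
        return ("CSAR 1.0/1.1", "zip")
    raise ValueError(f"not a recognized CSAR extension: {filename}")
```

The mapping needs no inspection of the file contents, which is the deterministic property the proposal is after.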








     


    On Thu, Apr 15, 2021 at 12:03 PM Tal Liron < tliron@redhat.com > wrote:




    In a conversation I had with someone who professes to "hate TOSCA", one of the issues that came up was how bad CSARs are. And one point made hit home.


     


    CSARs are currently defined as ZIP containers. Unfortunately, ZIP is not a streaming format, instead requiring random access to locations in the container. The entire container needs to be read in order to access an individual entry. Thus any processing of a CSAR has to take place on an accessible file system, which means that if the CSAR is at a URL then the whole package would have to be downloaded first.
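To make the random-access point concrete: a ZIP keeps its index (the central directory) at the end of the file, so a reader that only moves forward through a stream reaches it last. The sketch below locates that index in a small in-memory archive. It assumes a simple archive with no trailing comment and no ZIP64 extensions; the helper names and entry names are illustrative.

```python
import io
import struct
import zipfile

# End-of-central-directory (EOCD) record: 22 bytes when there is no comment.
EOCD = struct.Struct("<4sHHHHIIH")


def central_directory_span(data: bytes):
    """Return (offset, size) of the central directory of a comment-less ZIP."""
    (sig, _disk, _cd_disk, _n_here, _n_total,
     size, offset, comment_len) = EOCD.unpack(data[-EOCD.size:])
    if sig != b"PK\x05\x06" or comment_len:
        raise ValueError("expected a plain ZIP without an archive comment")
    return offset, size


def demo_zip() -> bytes:
    """Small metadata entry followed by a 100 kB stand-in for a VM image."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("TOSCA-Metadata/TOSCA.meta", b"CSAR-Version: 1.1\n")
        zf.writestr("images/vm.img", b"\0" * 100_000)
    return buf.getvalue()
```

For the demo archive the reported offset lies beyond all of the entry data, which is why ZIP readers typically seek to the end of the file before they can list or locate entries.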


     


    If you're dealing with a CSAR with very big artifacts (virtual machine images) then this quickly becomes a major burden on different parts of the system which need to process specific parts of a CSAR. This is indeed a pain point with currently
    existing TOSCA solutions, e.g. ONAP.


     


    There's a reason why "tarballs" are so often preferred in packaging. A ".tar.gz" file is streamable for two reasons: gunzip is streaming decompression of a single file, and that single file is a "tape archive" (tar), which is a straightforward
    concatenation, likewise streaming. There is no random access. Thus a CSAR processor can choose to process just a specific entry and not have to download the entirety. It can throw away bytes that do not interest it.
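A hedged sketch of the streaming behaviour described above, using Python's tarfile module in its forward-only "r|gz" mode. CountingStream, read_entry and demo_archive are hypothetical helpers, and the entry names are only examples; the counter stands in for bytes fetched from a URL.

```python
import io
import os
import tarfile


class CountingStream:
    """Forward-only byte source that records how much was pulled from it."""

    def __init__(self, data: bytes):
        self._source = io.BytesIO(data)
        self.consumed = 0

    def read(self, size: int = -1) -> bytes:
        chunk = self._source.read(size)
        self.consumed += len(chunk)
        return chunk


def read_entry(stream, wanted: str) -> bytes:
    """Return one entry from a gzipped tar stream, reading no further."""
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            if member.name == wanted:
                return archive.extractfile(member).read()
    raise KeyError(wanted)


def demo_archive(meta: bytes) -> bytes:
    """Gzipped tarball: small metadata first, then a 1 MiB stand-in artifact."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        for name, body in (("TOSCA-Metadata/TOSCA.meta", meta),
                           ("images/vm.img", os.urandom(1 << 20))):
            info = tarfile.TarInfo(name)
            info.size = len(body)
            archive.addfile(info, io.BytesIO(body))
    return buf.getvalue()
```

The saving depends on entry order: a tarball has no index, so everything ahead of the wanted entry still has to be read through and discarded. In this demo the metadata comes first, so only a small prefix of the stream is consumed.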


     


    Note that if one can benefit from random access to a tarball, then it's easy enough to unpack it in its entirety, and indeed in a much more efficient way than a ZIP: the tarball can be unpacked and streamed directly to the filesystem. A ZIP would still have to be downloaded first to accomplish the same function, leading indeed to more than double the storage requirement.
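The unpack-while-streaming idea can be sketched as follows, again with tarfile in "r|gz" mode. This is a simplified illustration, not a hardened extractor: unpack_stream and demo are hypothetical names, only regular files are written, and any entry that would land outside the destination is rejected.

```python
import io
import os
import shutil
import tarfile
import tempfile


def unpack_stream(stream, dest: str) -> list:
    """Write regular files from a gzipped tar stream straight into dest."""
    root = os.path.realpath(dest)
    written = []
    with tarfile.open(fileobj=stream, mode="r|gz") as archive:
        for member in archive:
            if not member.isfile():
                continue  # sketch: skip directories, links and devices
            target = os.path.realpath(os.path.join(root, member.name))
            if os.path.commonpath([root, target]) != root:
                raise ValueError(f"entry escapes destination: {member.name}")
            os.makedirs(os.path.dirname(target), exist_ok=True)
            with archive.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)  # copy in chunks, never whole file
            written.append(member.name)
    return written


def demo():
    """Pack two small files, then unpack them from the stream into a temp dir."""
    files = {
        "TOSCA-Metadata/TOSCA.meta": b"CSAR-Version: 1.1\n",
        "definitions/service.yaml":
            b"tosca_definitions_version: tosca_simple_yaml_1_3\n",
    }
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as archive:
        for name, body in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(body)
            archive.addfile(info, io.BytesIO(body))
    buf.seek(0)
    with tempfile.TemporaryDirectory() as dest:
        names = unpack_stream(buf, dest)
        path = os.path.join(dest, "definitions", "service.yaml")
        with open(path, "rb") as fh:
            return names, fh.read()
```

In a real processor the stream argument would be the body of an HTTP response rather than an in-memory buffer, so the compressed package never needs to be stored alongside the unpacked files.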


     


    So, it's very obvious to me that this needs to change in TOSCA 2.0 with a new CSAR specification.


     


    My specific recommendations:


     


    1. Let's first standardize on TAR. So a raw ".csar" extension would be exactly a "tape archive" (a tarball).


    2. Let's then standardize on GZIP for the supported algorithm. So a ".csar.gz" extension would imply a GZIPped CSAR. There are many other popular algorithms used (bz2, xz) but in the interests of interoperability it's best to recommend
    one. The usefulness of adding the extra ".gz" is to clarify if decompression is needed, and indeed many toolchains recognize that convention automatically.
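The two recommendations above could be sketched on the producing side like this. pack_csar and package_name are hypothetical helpers, and the file names in any usage are placeholders; the only parts taken from the proposal are the TAR container, optional GZIP, and the ".csar" / ".csar.gz" extensions.

```python
import io
import tarfile


def pack_csar(files: dict, compress: bool = False) -> bytes:
    """Pack {name: bytes} into a tarball, optionally gzipped, in dict order."""
    buf = io.BytesIO()
    mode = "w:gz" if compress else "w"
    with tarfile.open(fileobj=buf, mode=mode) as archive:
        for name, body in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(body)
            archive.addfile(info, io.BytesIO(body))
    return buf.getvalue()


def package_name(base: str, compress: bool = False) -> str:
    """Pick the extension that signals whether decompression is needed."""
    return base + (".csar.gz" if compress else ".csar")
```

Entries are written in the order given, so a packager that wants streaming readers to reach the metadata quickly would list it first. That ordering is an assumption of this sketch, not something the proposal requires.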


     











  • 2.  Re: [tosca] Re: Proposal: CSARs should be tarballs, not ZIPs

    Posted 04-16-2021 20:48
    On Fri, Apr 16, 2021 at 2:25 PM Chris Lauwers < lauwers@ubicity.com > wrote:

    > Let's put this on the agenda for Tuesday's meeting. A couple of observations:
    >
    > I'm not sure if the CSAR version should be related to the container format. For one, a processor can't get to the version until after unpacking the container.

    You might be right that CSAR doesn't have to be linked to the container format. But if that's the case, we would need to update the spec, because it currently is linked.

    I think that it is a good idea to link it. This gives guidance to TOSCA processors as to what they need to support. If we leave it implementation-specific there would be no portability at all.

    By the way, I also glossed over the TOSCA-Meta-File-Version, which is currently at 1.0. That does not have to bump. That's what you can indeed only read after unpacking. But the CSAR version does refer to the container itself and its structure.

    > Do we really need to adopt a standard file name extension? Processors should be smart enough to try whatever the various supported container formats are and figure it out for themselves.

    I do agree with this. For example, TOSCA source files are in ".yaml", not in ".tosca". The extension adheres to the format, not the content. In the same way I think a CSAR should be ".tar", ".tar.gz" (or ".zip" if we still want to support it? I hope not). I kept the ".csar" only for backwards compatibility, but it's also true that we don't have to be backwards compatible.

    This is not what we currently have in the spec, so we would again need to update it. It's a good idea, I support it.