tosca-comment

  • 1.  RE: TOSCA inputs and outputs for shell scripts

    Posted 10-08-2020 02:47
    Thanks Adam. Do you see any disadvantages/advantages of using YAML instead of JSON for exchanging input/output values between the orchestrator and the artifacts (“scripts”)?

    Chris

    From: adam souzis <adam@onecommons.org>
    Sent: Wednesday, October 7, 2020 4:30 PM
    To: Chris Lauwers <lauwers@ubicity.com>
    Cc: tosca-comment@lists.oasis-open.org
    Subject: Re: TOSCA inputs and outputs for shell scripts



    On Tue, Oct 6, 2020 at 7:56 PM Chris Lauwers <lauwers@ubicity.com<mailto:lauwers@ubicity.com>> wrote:
    • Clearly using environment variables for outputs doesn’t work. As alternatives, we discussed passing a file descriptor, or an environment variable that specifies a location to which to write outputs. Have you given any more thought to this?
    Yes, I considered implementing something like that, where the shell script could echo outputs to a file descriptor set in an environment variable and the orchestrator would read from that file after the script terminated. But I didn't find it necessary to implement this, since my orchestrator has TOSCA extensions that let the user configure the environment variables passed to the shell script or process, as well as a template that has access to the script's stdout, stderr, and return code. So essentially the user is free to implement any private protocol between the orchestrator and the shell script as needed. This is particularly useful if you need to work with pre-existing scripts.
    I’m not sure how having access to the script’s stdout and stderr allows the orchestrator to retrieve the appropriate operation output values. If a script has to return output values, doesn’t it need to be designed to do so?
    When my orchestrator invokes the artifact processor for bash scripts, it will “clone” the stdout file descriptor to a new file descriptor (typically number 3), and then redirect stdout and stderr to various log files. The script will then “echo” any required output values to the “cloned” file descriptor.
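    This kind of descriptor-passing protocol can be sketched from the orchestrator side. The following is a minimal, hypothetical illustration (not either implementation from this thread): it uses a pipe rather than a cloned stdout, and the environment variable name OUTPUT_FD is invented.

    ```python
    import json
    import os
    import subprocess

    # Orchestrator side: create a pipe and hand the write end to the script
    # via an (invented) OUTPUT_FD environment variable.
    r, w = os.pipe()
    script = 'echo \'{"ip": "10.0.0.1"}\' >&$OUTPUT_FD'
    proc = subprocess.Popen(
        ["bash", "-c", script],
        env={**os.environ, "OUTPUT_FD": str(w)},
        pass_fds=(w,),  # keep the write end open (and inheritable) in the child
    )
    proc.wait()
    os.close(w)  # close our copy so the read below sees EOF
    with os.fdopen(r) as f:
        outputs = json.loads(f.read())
    # outputs == {"ip": "10.0.0.1"}
    ```

    The script only ever sees a file descriptor number, so the same protocol works whether the descriptor points at a file, a pipe, or a cloned stdout.
    
    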


    Yes, I just mean that the user-defined templates can implement a protocol like yours, or something different, as needed. For example, to support Terraform a template can read outputs from the Terraform state file (which is encoded as JSON).


    Regarding file descriptors, their main benefit compared to a file path is that file descriptors are more secure: it is easier to avoid potential race conditions and information leaks, and they can point to a private temp file not visible on the file system. They are also more flexible (e.g. they don't require access to a file system, and they can point to a pipe, so they can more easily support streaming, etc.)

    • Even for input values, environment variables are not always convenient. What is the standard way to encode maps, lists, or complex data types so they can be carried in an environment variable? I’m curious how your implementation handles this.

    My implementation handles this by trying to parse the variable's value as JSON and, if parsing fails, treating it as a string. This approach is easy to implement and handles numbers, lists, and maps pretty much as the user would expect.
    That’s a reasonable and elegant approach, but why wouldn’t we always encode all input values as JSON?


    Yes, that makes sense... in my case it also needs to be able to handle external environment variables that weren't set by the orchestrator.
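    The try-JSON-then-fall-back-to-string approach described above is a one-liner in most languages; a minimal Python sketch (the function name is invented for illustration):

    ```python
    import json

    def parse_env_value(value: str):
        """Parse an env var value as JSON; fall back to a plain string."""
        try:
            return json.loads(value)
        except ValueError:  # includes json.JSONDecodeError
            return value

    # Numbers, lists, and maps come out as the user would expect:
    # parse_env_value("42")           -> 42
    # parse_env_value('[1, 2]')       -> [1, 2]
    # parse_env_value('{"a": true}')  -> {"a": True}
    # ...while anything that isn't valid JSON stays a string:
    # parse_env_value("hello world")  -> "hello world"
    ```

    This also shows the caveat raised above: the fallback is what lets the same code handle external environment variables that were never JSON-encoded in the first place.
    
    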

    Thanks,

    Chris




  • 2.  Re: TOSCA inputs and outputs for shell scripts

    Posted 10-08-2020 03:30
    YAML is nicer for humans, but for machine interchange I think JSON is
    clearly superior:

    * whitespace has syntactic significance in YAML, which is particularly
    problematic with environment variables
    * YAML has gotchas and corner cases when parsing (e.g. "yes"/"no" strings can
    be parsed as booleans), while JSON is simple and fast to parse
    * almost all modern languages have built-in support for JSON but require an
    external library to parse YAML

    If you do want to support YAML I suggest specifying a safe subset (i.e.
    JSON :)
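    The "yes"/"no" gotcha above is a YAML 1.1 parser behavior; in JSON the same input is unambiguous, which is easy to check with any stdlib JSON parser (Python shown here purely as an illustration):

    ```python
    import json

    # In JSON, the string "no" must be quoted, so it stays a string:
    assert json.loads('"no"') == "no"

    # A bare token is a hard parse error, never a silent boolean:
    try:
        json.loads("no")
        silently_parsed = True
    except ValueError:
        silently_parsed = False
    assert silently_parsed is False
    ```
    
    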

    Adam



  • 3.  Re: [tosca-comment] Re: TOSCA inputs and outputs for shell scripts

    Posted 10-08-2020 05:08
    On Wed, Oct 7, 2020 at 10:30 PM adam souzis <adam@onecommons.org> wrote:

    > YAML is nicer for humans but for machine interchange I think JSON is
    > clearly superior:
    >

    Adam, I bring good news: the whitespace style in YAML is optional. You can
    use JSON-style if you prefer. And in fact as of version 1.2 YAML is a
    strict superset of JSON. Thus, every YAML 1.2 parser would be able to
    consume JSON. (YAML 1.2 also introduces breaking changes; it's a long
    story.)

    JSON, like YAML, also has special keywords: true, false, null, etc. So
    you have the same potential challenge as with YAML in consuming strings
    that look like these keywords. A conformant encoder should properly escape
    such values.

    Note that JSON has a very serious deficiency compared to YAML: it does not
    distinguish between integers and floats. (There are implementations that
    do, but they are non-conformant and portability is not guaranteed.)
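    The hazard is easiest to see with integers above 2^53, which an IEEE-754 double cannot represent exactly. A hedged illustration follows: Python's own json module happens to preserve ints, so the float() conversion below stands in for a JavaScript-style consumer that decodes every JSON number as a double.

    ```python
    import json

    big = 2 ** 53 + 1  # 9007199254740993: not representable as a double
    assert float(big) != big

    # Python's json module keeps the int/float distinction, so the text
    # round-trips here...
    assert json.loads(json.dumps(big)) == big

    # ...but a consumer that decodes every JSON number as a double
    # (as JavaScript does) would corrupt the value:
    assert int(float(json.dumps(big))) != big
    ```
    
    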

    As of TOSCA 2.0 we've beefed up the specification for primitive data type
    interchange and finally separated it from YAML. Thus, though TOSCA syntax
    is written in YAML, its data types (for inputs, outputs, properties,
    attributes, etc.) are not limited by YAML. So, at the end of the
    day, I think it would be safe to use whatever interchange format is
    convenient for your application, as long as you are aware of its limitations
    (e.g. no integers in JSON) and understand TOSCA's assumptions. I agree that
    JSON's universality is a big advantage.




  • 4.  Re: [tosca-comment] Re: TOSCA inputs and outputs for shell scripts

    Posted 10-08-2020 05:13
    Yes, indeed -- that's why I wrote:

    > If you do want to support YAML I suggest specifying a safe subset (i.e.
    > JSON :)






  • 5.  Re: [tosca-comment] Re: TOSCA inputs and outputs for shell scripts

    Posted 10-08-2020 05:17
    Also, regarding parsing, see https://github.com/cblp/yaml-sucks for examples.




  • 6.  Re: [tosca-comment] Re: TOSCA inputs and outputs for shell scripts

    Posted 10-08-2020 05:27
    On Thu, Oct 8, 2020 at 12:17 AM adam souzis <adam@onecommons.org> wrote:

    > Also, regarding parsing, see https://github.com/cblp/yaml-sucks for examples.

    The YAML spec is not ambiguous, and especially in 1.2 there is greater
    clarity on which schemas should be supported for greater compatibility.

    But you are very right that the quality of YAML parsers (most of which are
    still stuck at 1.1) is ... variable. I can even add a few more examples to
    that page. But the point is that these parsers do not follow the spec.

    The bottom line is that it's outside of TOSCA's scope: choose whatever
    makes sense to you. I generally avoid JSON due to the lack of integers,
    meaning that TOSCA data may not survive round trips, but if it works for
    you, why not? (By the way, note that neither TOSCA nor YAML has
    unsigned integers.) I imagine that many orchestrators will use
    databases to store values and may not use textual interchange formats at
    all.
