Open Supplychain Information Modeling TC

  • 1.  Information about 'not' having a component

    Posted 09-05-2024 12:39

    Won't be obvious, but related to my pedigree/provenance comments are two other concepts:

    • Information about 'not' having a component
    • SBOM completeness

     

    I believe there is a distinction between data, information, and knowledge.

    I would like to know (ie derive knowledge from information derived from data) that a product is or isn't exploitable via TTP-whatever (based on CVE-whatever). To achieve that knowledge, I may use different information depending on the situation:

    • Case 1:
      • I have data from build and deploy that lets me derive a complete, accurate (within what quality metrics I supply) component inventory SBOM that shows I definitively don't have the component affected by the CVE.
    • Case 2:
      • I have data from an executable scanner that gives me an incomplete SBOM but still lets me say definitively (within what quality metrics I supply) that I don't have the component affected by the CVE.
    • Case 3:
      • I have data from build/deploy by which I have complete SBOMs for some components, but for some I only know the programming language – and it's not the programming language of the CVE (e.g., in real life I have a Raspberry Pi running my Elixir software on top of Nerves OS, which I know has zero Java in it, so it can't be affected by Log4j).
    • Lots of other cases not relevant to my argument at the moment.
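A minimal sketch of how the three cases above could feed one decision, assuming a deliberately simplified model. The names `Evidence`, `Verdict`, and `assess` are hypothetical, and the "quality metrics" mentioned in the cases are ignored here.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    NOT_AFFECTED = "not affected"
    AFFECTED = "affected"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class Evidence:
    """Hypothetical 'information' derived from build, deploy, or scan data."""
    components: frozenset = frozenset()         # components known to be present
    complete: bool = False                      # is the inventory claimed complete?
    absent_components: frozenset = frozenset()  # explicit "not present" assertions
    absent_languages: frozenset = frozenset()   # e.g. "no Java anywhere"


def assess(evidence, vulnerable_component, component_language):
    """Derive 'knowledge' (a verdict) from whatever information is available."""
    if vulnerable_component in evidence.components:
        return Verdict.AFFECTED
    if evidence.complete:                                   # Case 1
        return Verdict.NOT_AFFECTED
    if vulnerable_component in evidence.absent_components:  # Case 2
        return Verdict.NOT_AFFECTED
    if component_language in evidence.absent_languages:     # Case 3
        return Verdict.NOT_AFFECTED
    return Verdict.UNKNOWN


# Illustrative inputs only; component names are placeholders.
case1 = Evidence(components=frozenset({"openssl"}), complete=True)
case2 = Evidence(components=frozenset({"openssl"}),
                 absent_components=frozenset({"log4j-core"}))
case3 = Evidence(absent_languages=frozenset({"java"}))
no_info = Evidence()
```

All three cases reach the same verdict by different routes, while `no_info` stays `UNKNOWN`: an incomplete inventory that merely fails to list a component is not the same as a statement that the component is absent.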

     

    There are many subtle information terms that need definition in the above:

    • The concept of a 'complete' SBOM vs an 'incomplete' SBOM
    • The concept of an SBOM declaring it doesn't have a component (note that this is the root 'information' most useful in all 3 cases)
    • Hidden, but present due to term 'accurate', are the concepts of:
      • Information model – the 'schema' by which we can answer
      • Information – the instantiation of the schema with actual info
      • Ground Truth – to be accurate it means our 'information' as instantiated is congruent with 'ground truth'
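One possible way to make 'complete' and 'accurate' concrete by example, assuming components can be compared as plain names and that ground truth is somehow known. These definitions are a sketch for discussion, not agreed TC definitions, and the function names are hypothetical.

```python
def is_accurate(claimed_present, claimed_absent, ground_truth):
    """Every assertion made (presence or absence) is congruent with ground truth."""
    return claimed_present <= ground_truth and not (claimed_absent & ground_truth)


def is_complete(claimed_present, ground_truth):
    """Nothing in ground truth is missing from the inventory."""
    return ground_truth <= claimed_present


# Illustrative values only.
ground_truth = {"erlang-otp", "elixir", "nerves"}
scanner_view = {"elixir"}          # an incomplete inventory
declared_absent = {"log4j-core"}   # an explicit 'not present' assertion
```

Under these definitions `scanner_view` is accurate but not complete, and the absence assertion can be checked against ground truth independently of completeness.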

    I think we need to actually define the above concepts, at least by example, to be able to meaningfully resolve some of the issues we discussed last meeting.

    -- 

    Duncan Sparrell

    sFractal Consulting

    iPhone, iTypo, iApologize

    I welcome VSRE emails. Learn more at http://vsre.info/

     



  • 2.  RE: Information about 'not' having a component

    Posted 09-05-2024 15:38
    "I believe there is a distinction between data, information, and knowledge" - could you elucidate the distinction you see, a little? I'm not disagreeing with the statement but would like to learn more about how you separate the three. What would your rough sketch definitions be?

    No disagreement either that we need to define "complete" and "accurate" and so on. Can an "incomplete" SBOM nonetheless be accurate? Etc.

    Wild idea to ponder: an SBOM is a claim by an identity as to what it knows about the constituent components of a given software product. An empty document ("I know nothing! - signed, Isaac") is a valid SBOM by that definition, though it's not a component inventory in any meaningful sense of the term.

    Also a document saying "I know nothing except there's no Java in here - signed, Isaac" would be a valid SBOM similarly... and in fact could even be useful in practice (provided that you have some trust in Isaac's claims). And it'd still barely qualify as an inventory.
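A rough sketch of the "SBOM as a claim by an identity" idea, assuming claims are simple present/absent statements. `Claim` and `ClaimedSbom` are hypothetical names; signing and trust in the claimant are out of scope here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    """One statement about a product; present=False asserts absence."""
    subject: str   # e.g. a component name or a language tag
    present: bool


@dataclass(frozen=True)
class ClaimedSbom:
    """Hypothetical model: what a claimant says it knows about a product."""
    product: str
    claimant: str
    claims: tuple = ()

    def is_inventory(self):
        # Only positive claims enumerate components.
        return any(c.present for c in self.claims)

    def denies(self, subject):
        return any(c.subject == subject and not c.present for c in self.claims)


# "I know nothing!" and "I know nothing except there's no Java in here".
know_nothing = ClaimedSbom(product="widget", claimant="Isaac")
no_java = ClaimedSbom("widget", "Isaac", (Claim("language:java", False),))
```

Both documents are valid values of the model, neither is an inventory, and only the second carries information that can rule out a Java-specific vulnerability.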

    Isaac






  • 3.  RE: Information about 'not' having a component

    Posted 09-05-2024 15:50

    You bring up another term I think we need to define – "Assertion".

     

    Wrt data/info/knowledge, I'll defer to the standard DIKW pyramid, and Wikipedia is probably as good a reference as any:

    https://en.wikipedia.org/wiki/DIKW_pyramid

     

    Wrt OSIM – I consider CycloneDX and SPDX to be data. As a human I can look at a particular instantiation and call this CycloneDX file (#1) an SBOM, and call another one (#2) a VEX instead of an SBOM. And I can look at a third file (#3), this one in SPDX, and say "it's the same SBOM as the first CycloneDX one". To be able to do that, I have in my head an 'information model' of what an SBOM is. Two different data formats can give the same information (#1 & #3), whereas not everything in a given data format is an SBOM (#2). I want to codify (preferably with JADN, but I could live with ASN.1 if someone else does the ASN.1) what the 'information' is that makes #1 and #3 an SBOM but #2 not an SBOM.
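A toy illustration of the data-versus-information point, assuming the only 'information' in an SBOM is a set of component name/version pairs. The function `to_information` and the rule it uses to decide what counts as an SBOM are hypothetical simplifications. The field names (`bomFormat`, `components`, `vulnerabilities`, `spdxVersion`, `packages`, `versionInfo`) follow the CycloneDX and SPDX 2.x JSON encodings, and the sample documents are heavily abbreviated.

```python
def to_information(doc):
    """Map a parsed JSON document to an abstract SBOM value, or None.

    Assumption: only name/version pairs matter; real documents carry far more.
    """
    if doc.get("bomFormat") == "CycloneDX" and "components" in doc:
        return frozenset((c["name"], c.get("version")) for c in doc["components"])
    if "spdxVersion" in doc and "packages" in doc:
        return frozenset((p["name"], p.get("versionInfo")) for p in doc["packages"])
    return None  # well-formed data, but not an SBOM under this toy model


# Three pieces of data: a CycloneDX SBOM, a CycloneDX VEX, and an SPDX SBOM.
cdx_sbom = {"bomFormat": "CycloneDX", "specVersion": "1.5",
            "components": [{"type": "library", "name": "jason", "version": "1.4.1"}]}
cdx_vex = {"bomFormat": "CycloneDX", "specVersion": "1.5",
           "vulnerabilities": [{"id": "CVE-2021-44228"}]}
spdx_sbom = {"spdxVersion": "SPDX-2.3",
             "packages": [{"name": "jason", "versionInfo": "1.4.1"}]}
```

The two SBOMs map to the same abstract value even though their data formats differ, while the VEX document maps to nothing because it is not an SBOM under this model.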

     

    -- 

    Duncan Sparrell

    sFractal Consulting

    iPhone, iTypo, iApologize

    I welcome VSRE emails. Learn more at http://vsre.info/

     

     






  • 4.  RE: Information about 'not' having a component

    Posted 09-05-2024 15:58

    Ha! Two meetings in and we're down to the epistemological foundation already! :)

    Yup, let's add "assertion" to the pile. Probably "attestation" too. fwiw ssci.io/attestations-deck covers a number of these (uses "claim" but "assertion" likely works too; uses the SLSA definition of "provenance") and shows how they fit together in practice.

    Isaac




  • 5.  RE: Information about 'not' having a component

    Posted 09-06-2024 07:39
    Great presentation! And a lot of terms to add to definitions list. And some great use cases. 

    iPhone, iTypo, iApologize





  • 6.  RE: Information about 'not' having a component

    Posted 09-09-2024 11:39

    WRT data/information/knowledge, I'd add that Jay and Isaac both participate in SLSA, and note that SLSA defines "Artifact" to be "an immutable blob of data".  Immutability is what links software (where validation is performed by programming language grammars) and data/information (where validation is performed by concrete (data) and abstract (information) models), and its importance with respect to artifact identity management is easy to overlook.

    UML says "DataTypes model Types whose instances are distinguished only by their value."  This means that instances have (immutable) values, and that if an instance has a different value then it is a different instance, and thus must have a different identity.  (I don't know if software has an analog to information "significance"; if a difference (such as whitespace or comments) is insignificant then it isn't included in the immutable abstract value.)
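A small sketch of value-based identity, assuming JSON data in which whitespace and key order are the insignificant differences. The canonicalization used here (sorted keys, compact separators) is an ad hoc choice for illustration, not a standard canonical form, and both function names are hypothetical.

```python
import hashlib
import json


def artifact_identity(text):
    """Identity of the concrete artifact: hash of the exact bytes."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def value_identity(text):
    """Identity of the abstract value: hash of a canonical re-serialization."""
    value = json.loads(text)
    canonical = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same logical value, different concrete representations.
a = '{"name": "jason", "version": "1.4.1"}'
b = '{\n  "version": "1.4.1",\n  "name": "jason"\n}'
# A different logical value.
c = '{"name": "jason", "version": "1.4.2"}'
```

The two representations `a` and `b` are different artifacts but share one value identity, while `c` has a different value and therefore a different identity.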

    In any case, knowledge is usually modeled as an ontology/knowledge graph, where graph nodes are immutable but edges can be added at any time, often modifying the semantics of existing nodes.


    David







  • 7.  RE: Information about 'not' having a component

    Posted 09-09-2024 13:20

    With respect to "what is an SBOM", the following is a pending addition to the JADN committee note:

    Because abstraction establishes a correspondence between logical values and concrete representations, information modeling can be described as a process of synthesis starting with conceptual/logical design leaving details for later, or analysis starting with existing data to find patterns and meanings:

    • An IM defines the essential content of data artifacts used in computing independently of how they are represented for processing, communication, or storage.
    • An IM defines logical equivalence of data artifacts such that all representations of the same logical value are equivalent and data can be converted from any representation to any other without loss of information.

     

    Ideally an information model would be used at the design stage to define exactly what questions we want an SBOM to answer; if it doesn't answer the minimum required questions then it isn't an SBOM at all; other answers may be accurately defined but optional.  That exercise could still be performed without regard to CDX or SPDX.
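A sketch of the "minimum required questions" idea, assuming a document can be reduced to a mapping from question to answer. The question lists below are hypothetical placeholders chosen only to show the mechanism, not a proposed minimum.

```python
# Hypothetical question lists, chosen only to illustrate the idea.
REQUIRED_QUESTIONS = ("product", "author", "components")
OPTIONAL_QUESTIONS = ("licenses", "absent_components")


def is_sbom(answers):
    """True only if every required question is answered, even with an empty answer."""
    return all(q in answers for q in REQUIRED_QUESTIONS)


def unanswered_optional(answers):
    """Optional questions the document leaves open."""
    return [q for q in OPTIONAL_QUESTIONS if q not in answers]


candidate = {"product": "widget", "author": "Isaac", "components": []}
vex_like = {"product": "widget", "author": "Isaac",
            "vulnerabilities": ["CVE-2021-44228"]}
```

In this sketch an empty component list still answers the question, whereas a document that never addresses components is not an SBOM at all, whatever else it contains.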

    The analysis process is more challenging.  Individual chunks of information can be reverse-engineered from each format, but automatically establishing their equivalence could be a challenge.

    As always, the most effective approach might be to meet in the middle – define what we think an SBOM should be, informed by but not constrained by existing formats.  Then do a gap analysis to see how they map to each other and to our golden ideal.
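The gap analysis could start as something as simple as the following sketch. The ideal field list and the per-format mappings are illustrative placeholders, not an audited statement of what CycloneDX or SPDX supports, and `gap_analysis` is a hypothetical helper.

```python
# Hypothetical "golden ideal": the fields we decided an SBOM should carry.
IDEAL = {"component_name", "version", "supplier", "absent_components"}

# Illustrative, deliberately partial mappings: ideal field -> format field.
MAPPINGS = {
    "CycloneDX": {
        "component_name": "components[].name",
        "version": "components[].version",
        "supplier": "components[].supplier",
    },
    "SPDX": {
        "component_name": "packages[].name",
        "version": "packages[].versionInfo",
        "supplier": "packages[].supplier",
    },
}


def gap_analysis(ideal, mappings):
    """For each format, list the ideal fields that have no mapping yet."""
    return {fmt: sorted(ideal - set(mapped)) for fmt, mapped in mappings.items()}


gaps = gap_analysis(IDEAL, MAPPINGS)
```

Each entry in `gaps` is a list of questions the ideal model asks that the mapping for that format does not yet answer, which gives a concrete agenda for the meet-in-the-middle discussion.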

    David Kemp
    NSA Cybersecurity Collaboration Center