OASIS Cyber Threat Intelligence (CTI) TC

  • 1.  Thoughts and ideas

    Posted 07-02-2024 14:42
    Dear Friends in our CTI Community,

    It has been a while since I was actively involved, and as such I have not been following all of the day-to-day. Sorry about that. But after so many years I needed to step away for a bit--I just needed a break and some time away.

    However, since I stepped down and away, I have had some time to think about what we have done together and the work as a whole. Through this I have some initial questions. I apologize if you have already discussed this at length and I was just not following along.

    I know there have been several suggestions for improvements and enhancements to STIX 2.1 and TAXII 2.1. The infamous incident object rings a bell loudly. 

    But I am not as curious about minor updates and enhancements or even basic feature requests. I am more interested in what everyone thinks a STIX 3.0 and TAXII 3.0 could and should look like. Meaning, what did we get right, what did we get wrong, and what do we wish we had done differently? And ultimately, how has the landscape shifted since we started this journey with STIX 1.0 and TAXII 1.0 and their major evolution to JSON and the SDO/SRO object model we have today?

    If we were to say, we wanted to release a STIX 3.0 and TAXII 3.0 in 3-5 years, what would that look like? What would we keep and what would we change? What do we need to do to harmonize the broader ecosystem around this? How do we see CTI and response changing, especially with the advent of a hyper-connected AI and 6G enabled edge? There has been a lot of great work in all kinds of areas in and around this space. But just like when we went from STIX 1.x to STIX 2.x, sometimes you need to reset and bring things together with new ideas and a broader scope.

    Now obviously I have a lot of ideas here, but I am curious to know what this community thinks and what everyone's appetite is for rolling up our sleeves. 

    Thanks
    Bret


  • 2.  RE: Thoughts and ideas

    Posted 07-10-2024 17:44

    I'm in a similar situation regarding involvement but am intending to devote more time to this area.  I'll break down the basic Pros and Cons as I see them.  I'll also provide some solution paths for the Cons (so as not to just complain about them).

    Pros:

    1. The STIX and TAXII protocols have done fairly well at defining the various CTI things the community needs as far as a taxonomy is concerned.
    2. They have mostly structured a hierarchy from the taxonomy.
    3. They have mostly accomplished a way to define CTI documents and a method to share those documents.
    4. They propose a process for extending parts of the hierarchy.

    Cons:

    1. The STIX protocol from its earliest conception attempts to use a graph analogy to structure a document but misses the mark on well-applied graph theory and practice.
    2. The STIX protocol is not machine readable / understandable. To effectively use CTI data, the STIX protocol must move from taxonomy and hierarchy to an ontology and the related standardized principles of ontology--description logic.  This will enable a solution path for CTI knowledge processing and management concerns such as data governance, AI explainability, inferred knowledge, and other CTI analytics.
    3. The STIX protocol can apply better object oriented principles for class-subclass and property-subproperty relationships.  This would align the work with well defined ontology principles.  Problem areas in the current work include duplicative definitions and puns across the various STIX definitions.
    4. The STIX protocol does not adopt namespace solutions that help us "play well with others".

    Solutions:

    These solutions should be considered a primer for standardized knowledge graph technologies.

    For Con 1:

    The STIX graph analogy uses node-edge-node relations but, in some of the definitions, things you expect to be edges are described as nodes.  For instance, a "relationship" is properly described (almost) universally as an edge that connects two nodes.  Yet, STIX defines a "Relationship" as a node with a "source" and a "target".  Using a directed graph (common for knowledge graphs) we can see the differences:

    (SourceNode) -- relationship --> (TargetNode)

    versus:

    (SourceNode) <-- source -- (Relationship) -- target --> (TargetNode)

    While this is technically not wrong (in graph theory, edges can be converted to nodes and all sorts of other blasphemy), it unnecessarily complicates the linked data and query solutions.  Using a knowledge graph solution, the source (Domain) and target (Range) are a result of the directed edge and describe a binary predicate.  In a description logic ontology, information can be easily inferred from the edge logic.  There is nothing gained by dividing the relationship into two node-edge-node statements since a directed edge already defines source and target.
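
    To make the extra query cost concrete, here is a minimal rdflib sketch (all ex: names are hypothetical, merely shaped like STIX):

    from rdflib import Graph

    # A direct edge: one triple carries the whole relationship.
    direct = Graph().parse(format="turtle", data="""
    @prefix ex: <http://example.org/> .
    ex:APT1 ex:uses ex:MalwareX .
    """)

    # The STIX-style shape: the relationship becomes a node of its own.
    reified = Graph().parse(format="turtle", data="""
    @prefix ex: <http://example.org/> .
    ex:rel1 a ex:Relationship ;
        ex:source ex:APT1 ;
        ex:target ex:MalwareX ;
        ex:relationship_type "uses" .
    """)

    # One hop answers the question on the direct edge...
    q1 = "SELECT ?m WHERE { ?s <http://example.org/uses> ?m }"
    # ...while the node form needs a join across three properties.
    q2 = """SELECT ?m WHERE {
        ?r <http://example.org/source> ?s ;
           <http://example.org/target> ?m ;
           <http://example.org/relationship_type> "uses" .
    }"""
    print(list(direct.query(q1)))   # MalwareX
    print(list(reified.query(q2)))  # MalwareX again, with more work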

    However, a "relationship" as a node is sometimes beneficial as it can be used to apply additional data to the relationship.  This is known a "reification" in the knowledge graph arena.  Instead of using additional (Relationship) -- some property -> (Data)statements, a knowledge graph can reference a whole node-edge-node statement as a node:

    << (SourceNode) -- relationship --> (TargetNode) >> -- some property --> (Data)

    or

    (SomeNode) -- some property --> << (SourceNode) -- relationship --> (TargetNode) >>

    This is generally considered a better solution as the context is clearer--something is being said about the entire statement instead of just the edge of a statement.
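
    The << ... >> notation above is the RDF-star quoted-triple syntax.  Where RDF-star support is unavailable, classic RDF reification via rdf:Statement expresses the same idea--a sketch, again with hypothetical ex: names:

    from rdflib import Graph

    # Classic reification: describe the whole (APT1 uses MalwareX) statement
    # as a node, then attach data to it.
    g = Graph().parse(format="turtle", data="""
    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/> .

    ex:APT1 ex:uses ex:MalwareX .

    ex:stmt1 a rdf:Statement ;
        rdf:subject   ex:APT1 ;
        rdf:predicate ex:uses ;
        rdf:object    ex:MalwareX ;
        ex:confidence 80 .
    """)
    print(len(g))  # 6 triples: the base edge plus the statement describing it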

    There are other issues, but the above is the primary one.

    For Con 2:

    The solution is combined with Con 4 below as they are related. 

    For Con 3:

    As a generic example, one might describe a directed edge relationship between a Person and an Email Address as:

    (John) -- has a --> (email) -- defined as --> (john@example.org)

    or as individual statements

    (John) -- has a --> (email_1)
    (email_1) -- defined as --> (john@example.org)

    The "has a" and "defined as" properties are highly generic properties that can be used in other kinds of relationships.  These are the kind of definitions I see in STIX.  This should be a clue that these properties are super-properties of some better defined sub-properties to represent specific relations.  There is no well defined expectations of what the domain and range are for these properties.  If we allow context alone to determine meaning for these properties, we fail to provide enough logic for effective use (and related governance) for the user base.  It raises questions of who's definition of what context is appropriate.

    A better solution might be via the following two blocks:

    (Kind) -- type --> (Class)
    (Individual) -- type --> (Kind)
    (Organization) -- type --> (Kind)
    hasEmail -- type --> (Property)
    hasEmail -- range --> (Email)

    These are common ontology definitions provided by a "specific domain" standards body (like CTI).  The ontology definitions for "Class", "Property", "type" and "range" are given by another "general domain" standards body--the "specific domain" standard aligns with a "general domain" standard that helps define and share the specific work with specific domain users as well as a larger community of data users.

    (John) -- type --> (Individual)
    (John) -- hasEmail --> (_container1_)
    (_container1_) -- type --> (Home)
    (_container1_) -- hasValue --> (mailto:john@example.org)

    The above is a user created document using the combined standard definitions for the specific and general domains.  The "hasEmail" property implies that "_container1_" is an "Email" class type in the prior ontology statement.  It is also directly defined as a "Home" class type and, therefore, defined as a Home Email address.  In knowledge graph speak, "_container1_" is called a Blank Node as it's just a place to consolidate related data and does not need to be a Named Node.

    It is also interesting to note that the ontology statements are declared and defined in exactly the same way that data statements are defined.  Redefinition and deprecation of ontology statements can then be automated in these solutions since they are machine readable.  For the current STIX protocol, change requires independent work by every implementer.  By using standardized ontologies, we minimize impact on the implementations.

    In fact, I've snuck in the W3C vCard Ontology for this example.  This highlights the very relevant reason to adopt a namespace solution.  STIX can use existing ontologies to augment its own ontology--we don't need to reinvent the wheel...again.
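
    Rendered with the real vCard namespace, the example looks like this (a sketch: vcard:Individual, vcard:hasEmail, vcard:Home, and vcard:hasValue are actual W3C vCard Ontology terms; John and the use of rdflib are mine):

    from rdflib import Graph

    g = Graph().parse(format="turtle", data="""
    @prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
    @prefix ex:    <http://example.org/> .

    ex:John a vcard:Individual ;
        vcard:hasEmail [
            a vcard:Home ;                            # the _container1_ blank node
            vcard:hasValue <mailto:john@example.org>
        ] .
    """)
    # Same graph, different transport format (assumes rdflib's bundled JSON-LD).
    print(g.serialize(format="json-ld"))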

    For Con 4:

    Without a well defined ontology, the STIX protocol is difficult to use with other data standards without a lot of extra implementation work by individual organizations.  This forces them to develop bespoke data solutions in their own bubble.  That, in turn, makes it difficult for them to effectively share their data with others and adds additional maintenance cost to their solutions.  The status quo fosters a fractured user domain.

    By adopting an ontology solution, we must also embrace namespace solutions to help us share data in the CTI domain with others.  We become a resource for a larger community of data users.  A STIX namespace (and possible sub-namespaces) organizes the ontology and compliant data to be shared and used with other ontologies.  For instance, if NIEM and STIX adopt ontological solutions, the ontologies can be aligned by establishing equivalency statements--a NIEM ThingX class can be equivalent to a STIX ThingY class.  This removes any need for us to agree to use a common term or to develop any kind of data conversion process.  The equivalencies allow standardized ontology engines to query on one term and provide results that are otherwise defined by another term.

    This helps an analyst "Get Things Done"(tm).  It helps our community to connect with other communities.  The description logic that comes with an ontology provides inferencing--new data can be inferred from existing data.  For instance, (John) -- type --> (Individual) implies John is also a "Kind" by the subclass relation.  If we query for things of type "Kind" using an inferencing engine, we get John as a result.  While these are simple examples, the implication of this shift is HUGE for the user base.
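
    Both mechanisms fit in a few lines--a sketch, using the subclass axiom from above and a hypothetical NIEM/STIX alignment:

    from rdflib import Graph

    g = Graph().parse(format="turtle", data="""
    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
    @prefix owl:  <http://www.w3.org/2002/07/owl#> .
    @prefix ex:   <http://example.org/> .

    ex:Individual rdfs:subClassOf ex:Kind .      # ontology statement
    ex:John a ex:Individual .                    # data statement

    # Hypothetical alignment; an OWL-aware engine would treat the two
    # classes as interchangeable when answering queries.
    ex:niem_ThingX owl:equivalentClass ex:stix_ThingY .
    """)

    # A property path walks the subclass chain: asking for Kind finds John.
    q = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    SELECT ?who WHERE { ?who a/rdfs:subClassOf* <http://example.org/Kind> }
    """
    print([str(row.who) for row in g.query(q)])  # ['http://example.org/John']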

    This knowledge technology is also being used behind the scenes for the current AI movement.  AI solutions have difficulty explaining how they arrive at a solution and proving that their solutions are correct.  With knowledge graph technologies, the AIs have a way to "explain" how solutions are derived.  They also have a ready-made ML feature selection set from the knowledge graph repositories.

    Conclusion:

    The point is that by shifting to an ontology solution, we help a larger community get work done with lower costs to everyone and better, standardized, machine-ingestible definitions. It provides natural extensibility as a consequence--independent groups can define their own ontologies to use with the STIX ontology without concern. STIX bundles can have extension data using extension ontologies in a document without impacting end users who haven't implemented an extension--the extensions are just extra data elements that can be ignored or processed for later use.  If users want that data, they can ingest the related ontology at will and adjust their queries in short order.

    I can present these points in more detail as requested.



    ------------------------------
    Keven Ates
    US Federal Bureau of Investigation
    Washington DC
    ------------------------------



  • 3.  RE: Thoughts and ideas

    Posted 07-11-2024 08:17
    Keven,

    Exactly. I also think there are a lot of work items that we need to address to accommodate the changing landscape of AI in CTI. I know some have already done some work here. I would also like to look at a registry for objects and vocabs. 

    All in all, I think there are a lot of things that we need to do. We have learned a lot since we did the first migration from STIX 1.0/1.1 to STIX 2.0/2.1. I also believe that the changes we need are not simple iterative changes, but things that will warrant a STIX and TAXII 3.0. 

    Keven, I do think it would be good to help everyone understand what would be involved in moving to an ontology, with concrete examples. But we can talk offline about what could be good there.

    Bret



  • 4.  RE: Thoughts and ideas

    Posted 07-11-2024 10:29

    I have to admit I'm not yet sold on going for a STIX / TAXII 3.0 at this point, and if we do one, I would want to make it a lot more incremental than STIX 1 to STIX 2, with the goal of keeping it mostly backwards compatible and all changes easy to hide behind libraries.  For me:

     

    Pros:

     

    1. The STIX 2.1 work done for deterministic IDs was a huge win for producing STIX efficiently at the edge, but I would like to see this further enhanced by removing the requirement to have a created property for the original creation date for SDOs and relationships.  (A rough sketch of the deterministic-ID mechanics follows this list.)
    2. STIX 2.1 has provided a fairly decent guide for implementers to create more consistent objects within the domain.
    3. STIX 2.1's open vocabs and enumerations have proven robust in my experience and are easy for implementers to take advantage of.
    4. Both STIX 2.1 and TAXII 2.1 clients have shown a fairly low barrier to entry to do things decently.  The general push to avoid giving duplicate paths has helped push most people down a pretty good path.  That said I have noticed confusion regarding Malware Analysis and Sightings so we can probably still improve things.
    5. Extensions have proven a robust option within STIX 2.1.
    6. Data markings are intuitive to apply and it's easy to show how you can effectively redact objects from feeds using them.  There is a con on this as well.
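
    On Pro 1, the core mechanics are tiny--a rough sketch of how STIX 2.1's deterministic SCO IDs work (the namespace UUID is the one I recall the 2.1 spec fixing for SCOs; the canonicalization here only approximates the spec's rules, so verify against the text):

    import json
    import uuid

    # The fixed namespace STIX 2.1 defines for deterministic SCO identifiers
    # (from memory -- check the spec's identifier section).
    STIX_SCO_NAMESPACE = uuid.UUID("00abedb4-aa42-466c-9c01-fed23315a9b7")

    def deterministic_sco_id(obj_type: str, contributing: dict) -> str:
        # Sorted keys, no extra whitespace: an approximation of the spec's
        # canonical JSON form for the ID-contributing properties.
        canonical = json.dumps(contributing, sort_keys=True, separators=(",", ":"))
        return f"{obj_type}--{uuid.uuid5(STIX_SCO_NAMESPACE, canonical)}"

    # Two producers at the edge derive the same ID without coordinating:
    print(deterministic_sco_id("ipv4-addr", {"value": "198.51.100.7"}))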

     

    Cons:

     

    1. TAXII 2.1 was made to be really simple, and as such TAXII query was not included in the base spec but instead only called for as part of interop.  TAXII also only restricts content by feed by default, but we would have benefited from including rules on object-level markings in my opinion.  That said, this could be addressed in a 2.2 without a 3.0 release.
    2. STIX objects would benefit from being less verbose when sent over the wire / stored in bundles.  For some of our malware processing results we end up with a 2GiB STIX JSON compared to the 500MB JSON for the system-specific JSON file.  I think we would benefit from removing the created property from SDOs and just using modified for versions.  This would save space and allow for better async STIX production with deterministic IDs.  It also might be useful to allow default properties to be defined within a bundle, so when packaging STIX to be sent, properties like spec_version, created_by_ref, and object_marking_refs could be packaged into a single set of values at the bundle level when transporting en masse instead of treating each as always being stored independently (see the hypothetical sketch after this list).
    3. We should work to define an ontological mapping for enumerations and open vocabularies and give standard libraries for reading these.  This would make it easy for developers to deploy their solutions without having to worry about ontologies or googling the exact syntax of a nested namespace like "action.uses" while still allowing smarter tools to take advantage of it.
    4. We should define a new class of objects for containers.  Reports and Groups aren't SDOs; they are containers for other objects, and how we treat them needs to be different.
    5. Reports should drop the required published timestamp, since right now it's a pain for internal systems to share information about reports that are still being drafted but not officially published.  This field can be changed, so simply allowing reports to be marked unpublished would be a lot more efficient than either having a fake value or having to change object types between grouping and report depending on the stage of the work being done.
    6. While STIX is really impressive for its graph-based model, I think we understated the importance of using Reports (or Groups, which I like less) to actually create bundles of shared / related information.  From my work I found it was often best, when integrated with TAXII servers, to listen for new Report objects and then pull down all of the content and validate it so it can then be loaded into a system for analysts or automated tooling to work with.  At least from the incident side.
    7. As a bit of minor terminology fun, Incident might have been an unfortunate name, since most other standards refer to these as cases, where an incident is a type of case.  On the STIX side we ended up having to say the two were the same--but that you could say an incident was a false positive or simply being investigated--to ensure they ended up as synonyms.
    8. We would benefit from a way to version pin objects, and from digital signatures, so I hope we can get those into the next release.
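
    For Con 2, a purely hypothetical sketch of what bundle-level defaults could look like--"default_properties" is not in any STIX spec; it just illustrates the space savings and the re-expansion a consumer would do:

    # Hypothetical shape only; all UUIDs below are placeholder examples.
    bundle = {
        "type": "bundle",
        "id": "bundle--11111111-1111-4111-8111-111111111111",
        # Hypothetical: values applied to every object unless overridden.
        "default_properties": {
            "spec_version": "2.1",
            "created_by_ref": "identity--22222222-2222-4222-8222-222222222222",
            "object_marking_refs": ["marking-definition--33333333-3333-4333-8333-333333333333"],
        },
        "objects": [
            # Objects omit the defaulted properties; a consumer re-expands them.
            {"type": "indicator", "id": "indicator--44444444-4444-4444-8444-444444444444",
             "pattern": "[ipv4-addr:value = '198.51.100.7']", "pattern_type": "stix"},
        ],
    }

    def expand(bundle: dict) -> list:
        # Re-apply bundle defaults; per-object values win over defaults.
        defaults = bundle.get("default_properties", {})
        return [{**defaults, **obj} for obj in bundle["objects"]]

    print(expand(bundle))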

     

    Mixed:

    1. Relationships have a weird place in STIX at the moment.  On the one hand, having them separated into their own objects allows for really efficient information control and allows secondary parties to effectively add or comment on them, which is awesome.  However, processing them as their own objects raises the processing burden for STIX a lot, since you need to first load every object in a Report (and if you don't have a Report / Group, good luck) and then figure out how they relate to each other.  If you're dealing with a knowledge graph you can do a bit more to stream this, but you're still best off getting them as reports so you can easily ensure that the objects a relationship refers to are loaded prior to the relationship to keep things working smoothly.  Their position as a hypergraph vs a standard graph means a lot of tooling doesn't work nearly as well against STIX as it would if we could simplify the graph, though.  So it might have been better if we could find a way to simplify edges while still providing a way to assert that a relationship (rather than an entire object) is wrong, and to attach verbs, time bounding, and data markings to these relationships.
    2. I do like the idea of giving guidance in the spec on how we should interpret which properties change an object's class or, as Keven mentioned, which are sub vs super properties.  I just don't want implementers to have to worry about namespaces or the language itself, since that's why XML lost to JSON in the first place: https://corecursive.com/json-vs-xml-douglas-crockford/ where one point he made which really resonates with me is, "So there is an intellectual trap in XML, that it's really complicated. It doesn't look like it should be complicated. It's just angle brackets, but the semantics of XML can be really complicated, and they assumed it was complicated for a reason. This problem of data interchange is so complex that there's a conservation of complexity, that your format has to be complex, and the tools supporting that format have to be complex, so that the applications are manageable. And that's a false assumption, there is no such thing as conservation of complexity."  I want to make sure that while we can help guide how tools interpret a hierarchy, we don't push this burden onto every implementer of the STIX specification for exactly this reason.

     

    //SIGNED//

     

    Jeffrey Mates, Civ DC3/XT

    Computer Scientist

    Information Technology

    jeffrey.mates@us.af.mil

     






  • 5.  RE: Thoughts and ideas

    Posted 07-11-2024 14:39

    Welcome back, Bret, and thank you for starting this thread! During our monthly sessions, members have shown interest in updates for STIX and TAXII, including the use of semantic technology and new TAXII capabilities. We also have a series of items that didn't get incorporated into the STIX 2.1 (requests) or TAXII 2.1 (requests) releases.

    We hope to address all of those topics (and more) through our CTI TC Working Calls. To continue the discussion, RSVP to the working calls and participate by joining, proposing, and leading sessions!

    -Marlon 



    ------------------------------
    Marlon Taylor
    US DHS Cybersecurity and Infrastructure Security Agency (CISA)
    ------------------------------



  • 6.  RE: Thoughts and ideas

    Posted 07-12-2024 19:24

    Bret,

    The DoD/IC Ontology Working Group (DIOWG) has a DoD IC Ontology Foundry--a registry of "officially" related ontologies for its community.  It is modeled on the OBO Foundry.  It is structured as a hierarchy.  The Basic Formal Ontology is the top.  The Common Core Ontology follows and encompasses all the domain-specific ontologies.  One of them is the Cyber Ontology.  Ideally, the Cyber Ontology Working Group would oversee our STIX ontology submission into that space.  This would be a tremendous win for the CTI TC.

    For implementation, the upper level ontologies are generally not used for any real work that would interest us--it's primarily an ontology organization strategy.  We would concentrate on our domain ontology work and its use with other intersecting domains.  As a way to short-circuit to a STIX ontology, there is a rendition of the STIX protocol at the TAC TC (as you may be aware).  While it is fairly faithful to the STIX protocol, this work needs review by the CTI TC to ensure alignment.

    See the Pizza Ontology for a (humorous) intro to ontology work.  It demonstrates 3 of many (what I call transport) formats: JSON, OWL, TTL.  The JSON format is, well, JSON.  A better implementation would use JSON-LD to be more compact.  The OWL format is an XML implementation and is generally considered the baseline standard for historical reasons.  The TTL format (a.k.a., Turtle) is the most popular format by far--very human readable and generally more condensed.  Each file is "graph equivalent" to the others of the same name.  Reviewing a file, you can see how the various classes and properties are defined, their terms, labels, etc.

    The related technology makes the transport format irrelevant.  See this discussion for an overview of the format non-issue.

    @JeffreyMates - Mixed, 2:

    The linked article also says, "One of the lessons Doug would like to pass on is, don't be too tied down to the way you develop software. Embrace and explore new paradigms."

    His recollection on getting JSON adoption over XML feels much like this discussion on namespaces.

    Namespaces are a simplifier, not a complicator.  They help organize code and data.  Namespaces are used all over JavaScript.  In fact, it's a primary function of the language--when you declare an object literal, you literally (pun intended) declare a namespace scope.  Doug supports it when he is referencing the "closure" issue.  The Object Oriented paradigm is also in JavaScript with "class", so I don't get his statements on the subject, since class is a "closure" concept.

    By not using namespaces, we force the STIX protocol to apply other complexities within to compensate.  It forces implementers to apply a de facto namespace of their own when they mix STIX with non-STIX data--a STIX "indicator" (a thing, especially a trend or fact, that indicates the state or level of something) is very different from a vehicle "indicator" (a gauge or meter of a specified kind), but a STIX "indicator" may be related to a vehicle "indicator" (CAN bus attacks, anyone?).  So, how do we reduce confusion?  Namespaces...easy!  We can play in a bigger world and freely mix STIX data with other data.  Resistance is futile!
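
    A two-indicator illustration (both prefixes hypothetical):

    from rdflib import Graph

    # Two "indicator" terms coexisting, disambiguated purely by namespace.
    g = Graph().parse(format="turtle", data="""
    @prefix stix:    <http://example.org/stix#> .
    @prefix vehicle: <http://example.org/vehicle#> .
    @prefix ex:      <http://example.org/> .

    ex:obs1  a stix:Indicator .       # a CTI indicator
    ex:dash1 a vehicle:Indicator .    # a turn-signal gauge
    ex:obs1  ex:relatedTo ex:dash1 .  # CAN bus scenario: freely mixed data
    """)
    print(len(g))  # 3 triples, one graph, zero naming collisions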



    ------------------------------
    Keven Ates
    US Federal Bureau of Investigation
    Washington DC
    ------------------------------



  • 7.  RE: Thoughts and ideas

    Posted 07-15-2024 09:43

    Keven,

     

    Fair point on the value of namespaces when jumping between ontologies.  I have to admit my own personal bias against them stems from my STIX 1.1 days and the frequency with which they seemed to randomly break things (like xpaths) or cause developer confusion (indicator:Indicator, stix:Indicator, stixCommon:Indicator?).  I personally was hoping this is where we could look to something like NIEM to provide scaffolding for folks who want strong cross-domain defined namespaces through a logical conversion layer, one that won't burden folks writing network sensors with considering how CAN bus sensors or MIR sensors need to view the world.

     

    Doing this won't be an easy task, because it will mean reviving the NIEM cyber domain to properly map to STIX (as we have been working to expand it) and having linkages from STIX into NIEM Core.  But if we can focus on providing tooling to do this instead of reworking the STIX standard to be namespaced, no one will need to know that, depending on the identity_class they use, it would convert to an nc:PersonType with the name property going to nc:PersonName vs an nc:OrganizationType with the name property mapping to nc:OrganizationName.
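
    A rough sketch of that conversion layer (the identity_class values are from the STIX 2.1 open vocab; the NIEM element names are the ones above; the function itself is hypothetical tooling):

    # Hypothetical conversion layer: STIX identity_class -> NIEM Core types.
    # Producers keep writing plain STIX; only this layer knows about NIEM.
    NIEM_MAP = {
        "individual":   {"type": "nc:PersonType",       "name": "nc:PersonName"},
        "organization": {"type": "nc:OrganizationType", "name": "nc:OrganizationName"},
    }

    def to_niem(identity: dict) -> dict:
        m = NIEM_MAP[identity["identity_class"]]
        return {"@type": m["type"], m["name"]: identity["name"]}

    print(to_niem({"type": "identity", "identity_class": "individual", "name": "John"}))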

     

    I know the TAC TC has also done a lot of work in this area already, but from my time talking to developers working on CASE / UCO, and my own time working with STIX 1 vs 2, I strongly believe that one of the most important factors driving standards is how easy it is to produce usable content at the edge and to consume it with legacy systems on the backend without much additional work.  Dedicated and focused systems on the backend often come later and are secondary to the market forces that drive those first two factors.

     

    //SIGNED//

     

    Jeffrey Mates, Civ DC3/XT

    Computer Scientist

    Information Technology

    jeffrey.mates@us.af.mil

     






  • 8.  RE: Thoughts and ideas

    Posted 07-15-2024 15:35

    Keven,

    To clarify the scope of work you're envisioning, RDF-based ontologies describe relationships between resources, physical and digital.  An automobile, a person, or a pizza are physical resources, while documents and messages (IP packets, JPG files, HTML web pages, REST protocol bodies, or STIX documents) are digital resources.  So an ontology would describe relationships between subjects and objects, e.g., (bundle, contains, sco), but not the bytes traveling over the wire in a TAXII message with a bundle containing some SCOs, which are JSON values defined by the standard, not JSON-LD or TTL values.  Is this what you have in mind?

    I was hoping the Pizza ontology would be enlightening, with both the ontology and some documents and message examples exchanged in the pizza manufacturing process.  But unless I missed something, the only thing pizza related in https://raw.githubusercontent.com/TrustedAI-KG-NLP/cco-pizza/main/pizza-full.ttl is the title: 

    <http://purl.org/dc/elements/1.1/title> "Pizza Manufacturing Ontology based on Common-Core Ontology" ;

    There's nothing related to pizzas, recipes, pepperoni and mushrooms, or factories, parlors and employees.  A STIX ontology provides the hooks for enriching STIX things with a huge universe of related material, but would not define the STIX documents themselves?  A worked example of the desired work products would help.

    RDF makes an enlightening distinction between ontologies and information models. RDF Datatypes define a "Lexical-to-value (L2V) mapping", where lexical values are byte sequences sent over the wire and (logical) values exist within applications.  Application logical values are mapped to lexical representations by parsing and serialization.  But RDF:

    • defines L2V mappings only for primitive values (strings, numbers, some standard values like dates/times, URIs, etc.)
    • supports only character sequence lexical values, not byte sequences

    Because information models are built of datatypes rather than relationship tuples, they directly define documents and messages consisting of both primitives and structures, and support both text and binary lexical values.

    "The related technology makes the transport format irrelevant.  See this discussion for an overview of the format non-issue."

    "My Questions: What serializations will CCO be offered in? If more than one, are they all equivalent, or is one the standard and others derived?"

    "RDF serializations are equivalent. Any one can be converted into any of the others, using publicly-available tools or library features rdflib (not endorsing either, just indicating examples)."


    To be clear, ontology serializations are equivalent and can represent ontologies in any format.  But document/message serializations (lexical values) are defined by information datatypes to achieve the same equivalence - a message in XML can be parsed and re-serialized to JSON or CBOR or Protobuf and back to XML without loss because an IM defines the essential content that needs to be preserved.
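
    On the ontology side, the equivalence is easy to demonstrate with rdflib (named in the quoted discussion)--a minimal sketch, assuming rdflib's bundled Turtle and JSON-LD support:

    from rdflib import Graph
    from rdflib.compare import isomorphic

    ttl = """
    @prefix ex: <http://example.org/> .
    ex:bundle1 ex:contains ex:sco1 .
    """

    g1 = Graph().parse(data=ttl, format="turtle")
    jsonld = g1.serialize(format="json-ld")            # Turtle -> JSON-LD
    g2 = Graph().parse(data=jsonld, format="json-ld")  # JSON-LD -> graph

    print(isomorphic(g1, g2))  # True: same graph, different lexical form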

    (Oh, and I agree wholeheartedly that namespaces are a great simplifier - they define the scope over which uniqueness must be preserved, and beyond which independent developers can work without coordination.)



    ------------------------------
    David Kemp
    National Security Agency
    ------------------------------