CTI STIX Subcommittee


Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

  • 1.  Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 14:39





    John’s comments just reminded me that there is part of the vocabulary approach in STIX that has not yet been mentioned (big memory oversight on my part).

    That is, in addition to specifying schema-validatable vocabularies (whether default or custom) or just specifying a free-form string with no indication of where it comes from, there is also the ability to specify a non-schema-validated free-form string but explicitly reference a definition for the vocabulary that it comes from. ControlledVocabularyStringType (the type for all the properties we are talking about) has two additional properties, “vocab_name” and “vocab_reference”, to serve the exact use case John describes (“we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability to indicate which vocabulary has been chosen (including no vocab), but would not validate it via schema”). This approach may also be a great one for the original situation Jason described with his users defining values.


    Here is Section 3.7 of the STIX 1.2.1 specification part2-common:



    3.7 Vocabulary Data Types
    There are three vocabulary-related UML data types defined in the Common data model, and together they provide a content creator with four choices for defining content, listed below in order of formality. Please see STIX™ Version 1.2.1 Part 14: Vocabularies for further information on STIX vocabularies.


    · Leverage a default vocabulary using the ControlledVocabularyStringType data type. STIX v1.2.1 defines a collection of default vocabularies and associated enumerations that are based on input from the STIX community (see STIX™ Version 1.2.1 Part 14: Vocabularies); however, not all vocabulary properties have an assigned default vocabulary.

    · Formally define a custom vocabulary using the ControlledVocabularyStringType data type. To achieve value enforcement, a custom vocabulary must be formally added to the STIX Vocabulary data model. Because this is an extension of the STIX Vocabulary data model, producers and consumers MUST be aware of the addition to the data model for successful sharing of STIX documents.

    · Reference an externally-defined, custom vocabulary using the UnenforcedVocabularyStringType data type to constrain the set of values. Externally-defined vocabularies are publicly defined, but have not been included as formally specified vocabularies within the STIX Vocabulary data model using the ControlledVocabularyStringType data type. In this case, it is sufficient to specify the name of the vocabulary and a URL that defines that vocabulary.

    · Choose an arbitrary and unconstrained value using the VocabularyStringType data type.

    While not required by the general STIX language, default vocabularies should be used whenever possible to ensure the greatest level of compatibility between STIX users. If an appropriate default vocabulary is not available, a formally defined custom vocabulary can be specified and leveraged. In addition to compatibility advantages, using formally defined vocabularies (whether default vocabularies or otherwise defined) enables enforced use of valid enumeration values; please see STIX™ Version 1.2.1 Part 14: Vocabularies for the associated policy.

    If a formally defined vocabulary is not sufficient for a content producer’s purposes, the STIX Vocabulary data model allows the two alternatives listed above: externally defined custom vocabularies and arbitrary string values, which dispense with enumerated vocabularies altogether. If a custom vocabulary is not formally added to the Vocabulary data model, then no enforcement policy of appropriate values is specified.

    The UML diagram shown in Figure 3-20 illustrates the relationships between the three vocabulary data types defined in the STIX Common data model. As illustrated, all controlled vocabularies formally defined within the STIX Vocabulary data model are defined using an enumeration derived from the ControlledVocabularyStringType data type.

    As shown, the HighMediumLowVocab-1.0 enumeration (used as a defined controlled vocabulary exemplar) is defined as a specialization of the ControlledVocabularyStringType data type, and therefore it is also a specialization of the VocabularyStringType data type.

    Further details of each vocabulary class are provided in Subsections 3.7.1 through 3.7.3.





    Figure 3-20. UML diagram of the STIX™ Vocabulary data model

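The relationships that Figure 3-20 describes can be mirrored in a few lines of Python. This is only an illustrative sketch, not part of the STIX specification or of any STIX library: the Python class names simply copy the UML type names, `HighMediumLowVocab` stands in for the HighMediumLowVocab-1.0 enumeration, and the `__post_init__` guard is an assumed way of modeling "abstract".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VocabularyStringType:
    """Base of all vocabulary types; on its own it is an arbitrary string."""
    value: str


@dataclass
class UnenforcedVocabularyStringType(VocabularyStringType):
    """Value from an externally defined vocabulary; nothing is enforced."""
    vocab_name: Optional[str] = None       # multiplicity 0..1
    vocab_reference: Optional[str] = None  # multiplicity 0..1, a URI


@dataclass
class ControlledVocabularyStringType(VocabularyStringType):
    """Abstract in the UML model; concrete vocabularies specialize it."""

    def __post_init__(self) -> None:
        # Assumed modeling choice: refuse direct instantiation.
        if type(self) is ControlledVocabularyStringType:
            raise TypeError("ControlledVocabularyStringType is abstract")


@dataclass
class HighMediumLowVocab(ControlledVocabularyStringType):
    """Stand-in for the HighMediumLowVocab-1.0 enumeration."""


rating = HighMediumLowVocab("High")
print(isinstance(rating, ControlledVocabularyStringType))  # True
print(isinstance(rating, VocabularyStringType))            # True
```

The point of the sketch is only the inheritance chain: any property typed as VocabularyStringType can hold a free-form value, an unenforced value, or a value from a formally defined vocabulary.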
    3.7.1 VocabularyStringType Data Type
    The VocabularyStringType data type is the basic data type of all vocabularies. Therefore, all properties in the collection of STIX data models that make use of the Vocabulary data model must be defined to use the VocabularyStringType data type. Because this data type is a specialization of the basicDataTypes:BasicString data type, it can be used to support the arbitrary string option for vocabularies.
    3.7.2 UnenforcedVocabularyStringType Data Type
    The UnenforcedVocabularyStringType data type specifies custom vocabulary values via an enumeration defined outside of the STIX Vocabulary data model. It extends the VocabularyStringType data type. Note that the STIX Vocabulary data model does not define any enforcement policy for this data type.

    The property table of the UnenforcedVocabularyStringType data type is given in Table 3-46.
    Table 3-46. Properties of the UnenforcedVocabularyStringType data type

    Name | Type | Multiplicity | Description
    vocab_name | basicDataTypes:NoEmbeddedQuoteString | 0..1 | The vocab_name property specifies the name of the externally defined vocabulary.
    vocab_reference | basicDataTypes:URI | 0..1 | The vocab_reference property specifies the location of the externally defined vocabulary using a Uniform Resource Identifier (URI).
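As a rough illustration of the two properties in Table 3-46, the sketch below models a value that carries an optional vocabulary name and reference. Everything in it is assumed for illustration: the class name `UnenforcedVocabValue`, the example vocabulary name and URL, and the two checks, which only approximate NoEmbeddedQuoteString ("no quote characters") and URI ("has a scheme") rather than reproduce the real type definitions.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class UnenforcedVocabValue:
    """A free-form value plus optional context about where it comes from."""
    value: str
    vocab_name: Optional[str] = None       # multiplicity 0..1
    vocab_reference: Optional[str] = None  # multiplicity 0..1

    def __post_init__(self) -> None:
        # Approximation of NoEmbeddedQuoteString: reject quote characters.
        if self.vocab_name is not None and any(q in self.vocab_name for q in "\"'"):
            raise ValueError("vocab_name must not contain quote characters")
        # Approximation of a URI check: require a scheme such as https.
        if self.vocab_reference is not None and not urlparse(self.vocab_reference).scheme:
            raise ValueError("vocab_reference should be a URI with a scheme")


# Hypothetical producer-defined vocabulary; the name and URL are invented.
indicator_type = UnenforcedVocabValue(
    value="Sinkhole Traffic",
    vocab_name="ExampleOrg Indicator Types",
    vocab_reference="https://example.org/vocabs/indicator-types",
)

# Both properties are optional, so a bare free-form value is also allowed.
free_form = UnenforcedVocabValue(value="Sinkhole Traffic")
```

Note that the value itself is never checked against anything, which matches the statement that no enforcement policy is defined for this data type.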




    3.7.3 ControlledVocabularyStringType Data Type
    The ControlledVocabularyStringType data type specifies a formally defined vocabulary. It is an abstract data type, so it MUST be extended via an enumeration from the STIX Vocabulary data model (descriptions of all default vocabularies defined within the STIX Vocabulary data model are found in STIX™ Version 1.2.1 Part 14: Vocabularies [i]). Any custom vocabulary must be defined via an enumeration added to the STIX Vocabulary data model if appropriate enumeration values are to be enforced. The ControlledVocabularyStringType class has no properties of its own, so there is no associated property table.

    [i] Note that all defined vocabulary enumerations have version numbers in their names to facilitate additions to the enumerations that are backward compatible.
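To make the enforcement and versioning ideas concrete, here is a minimal sketch of a registry of formally defined vocabularies keyed by their versioned names. It is not how STIX itself enforces values (that is done through the data model and schema); the registry, the function names, the listed values (an illustrative subset, not copied from Part 14), and the "HighMediumLowVocab-1.1" revision are all hypothetical.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical registry: versioned vocabulary name -> allowed values.
# The values are an illustrative subset, not the official enumeration.
FORMAL_VOCABS: Dict[str, FrozenSet[str]] = {
    "HighMediumLowVocab-1.0": frozenset({"High", "Medium", "Low"}),
}


def is_valid(vocab: str, value: str) -> bool:
    """Return True if value is allowed by vocab; KeyError if vocab is unknown."""
    return value in FORMAL_VOCABS[vocab]


def add_revision(base: str, new_name: str, extra: Set[str]) -> None:
    """Register a revision that keeps every value of base and adds some more."""
    FORMAL_VOCABS[new_name] = FORMAL_VOCABS[base] | frozenset(extra)


# Invented revision: the old name keeps working, the new name accepts more.
add_revision("HighMediumLowVocab-1.0", "HighMediumLowVocab-1.1", {"Critical"})
print(is_valid("HighMediumLowVocab-1.0", "Critical"))  # False
print(is_valid("HighMediumLowVocab-1.1", "Critical"))  # True
print(is_valid("HighMediumLowVocab-1.1", "High"))      # True
```

Because a revision gets a new versioned name and only adds values, content written against the earlier name stays valid, which is the backward compatibility the footnote refers to.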













    I apologize for the oversight. I went to the “obvious” answer too soon without stepping back and thinking big picture. This is something I hope we as a community try to avoid.


    sean









    From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
    Date: Friday, October 23, 2015 at 10:20 AM
    To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    Subject: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions






    Even if we don’t choose to issue a new one for STIX 1.2.1 (I’m unclear on how that would work implementation-wise…it seems to be asking for compatibility issues)
    it would be awesome to come up with a more coherent and comprehensive list. So yeah, I say go for it.


    Thinking longer term, another option would be a simpler implementation of the concepts that we cover now. For example, we could just choose to have an unvalidated
    vocabulary: we would define the vocabulary and provide an ability to indicate which vocabulary has been chosen (including no vocab), but would not validate it via schema.


    I think in STIX 1.x we were a bit too strict about trying to get everything to validate in schema and in the end we just made things that should be simple much more
    complicated.


    John




    On Oct 23, 2015, at 9:38 AM, Jason Keirstead < Jason.Keirstead@CA.IBM.COM > wrote:



    Note, I have made this reply to CTI-STIX from CTI-Users

    I agree pretty much 100% with what you say Bernd. I see there is a bit of a conflict here

    - There is obviously a need to have a controlled vocabulary, so that tools and researchers can share categorized intelligence efficiently; however...

    - The current vocabulary list is seemingly arbitrary - and has many gaps, and also redundancies, as you mentioned. Off the top of my head it should have 2x - 3x as many options, and like you mention, some are redundant. I totally agree that it makes no sense
    to have different Watchlist types when that can be inferred easily from the data.

    Due to how STIX 1.X is constructed, we can easily revision this vocabulary as a non-breaking change. I would propose that the STIX TC undertake a work product to revision this vocabulary. This is a "quick win" that the TC can provide.

    If desired - I would volunteer to take the initial stab at extending the vocabulary.

    -
    Jason Keirstead
    Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown


    "Grobauer, Bernd" ---2015/10/23 07:50:32 AM---Hi, > I heard a recent proposal to remove it entirely. What would be the

    From: "Grobauer, Bernd" <Bernd.Grobauer@siemens.com>
    To: "jwunder@mitre.org" <jwunder@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, "Cliff.Palmer@gd-ms.com" <Cliff.Palmer@gd-ms.com>
    Cc: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
    Date: 2015/10/23 07:50 AM
    Subject: RE: [cti-users] Indicator Type / Vocabulary Implementation Questions
    Sent by: <cti-users@lists.oasis-open.org>





    Hi,

    > I heard a recent proposal to remove it entirely. What would be the
    > impact of that?

    I had made the suggestion to remove the Indicator Type entirely in
    my somewhat provocative mail a few weeks ago, in which I wanted
    to explore how much potential for simplification in going towards
    STIX 2.0 there might be.

    Why had I suggested to remove it?

    The main reason is that I do not find the values that are currently part of the
    standard vocabulary particularly useful:

    - Why would I put 'IP Watchlist' or 'Domain Watchlist' or 'File Hash Watchlist'
     into the Indicator Type? I could understand "Watchlist", which tells you
     to watch for whatever Observable Patterns are indicated in the indicator.

    - Another type is 'C2' -- at the same time I have the ability to reference
     in the indicator a kill chain phase ... and if the referenced kill chain
     is of any use, it will have something corresponding to 'C2'.

     Now I have (again) two ways of expressing the same thing ... we have
     just stumbled over this issue a few days ago in a sharing group we
     are part of: we use the reference to the killchain phase to indicate
     C2-activity, others use the indicator type.

     Similarly, "Exfiltration" -- should that not be described with a reference
     from the indicator to a TTP "Exfiltration"?

    Other entries in the standard vocabulary ("Malicious Email", "Host Characteristics")
    seem like there would be no end to the list of allowed vocabulary (think
    "Malicious <enter CybOX object type here>" as pattern for generating vocabulary...)

    My suggestion to get rid of the indicator type was really a bit of a calculated
    provocation -- I have no trouble with keeping it in STIX. But we should
    ensure that the standard vocabulary is defined such that it really adds
    value rather than adding confusion by allowing yet more ways to describe
    the same thing in different ways.

    Kind regards,

    Bernd

    ----------------

    Bernd Grobauer, Siemens CERT




















  • 2.  Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 14:43
    The twofold problem with this approach however is what I mentioned previously

    - Many STIX consuming systems do not have internet access, and thus can not download arbitrary vocabularies created by others

    - Entities producing STIX documents may also not have any access to a reasonable place to post a vocabulary for the public.

    -
    Jason Keirstead
    Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




  • 3.  Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 15:14





    Understood.


    This seems like a fundamental issue with diverse information exchange.
    The only way I can see that STIX could try to overcome this issue is by providing structures that enable the producer to transmit their full vocab definition as part of the content itself. This was discussed way back when the vocabulary structure was being tackled and was ruled out as unnecessary complexity and something best handled outside the scope of STIX. That is why the “vocab_name” and “vocab_reference” properties were added as a middle ground between schema-validatable vocab definitions, which have a hard requirement for access to the vocab definition, and a pure free-form string with no context of where it came from. The idea is that producers could simply reference vocabularies by name and/or by link to an actual explanatory definition in order to give context to a free-form string. These could be common vocabularies used out in the world or defined by another standard, or they could be custom ones where the producer could post a simple web page listing the vocabulary values with definitions for each.


    I am certainly not asserting that everything is perfect or that all problems are solved without cost or downside. I think there will always be some issues we have to work around. I just wanted to provide some context on what capability is currently available
    and why it is the way it currently is.


    Thank you for the great conversation around this topic. If nothing comes out of it other than renewed momentum toward improving the IndicatorType default vocab, I think it will have been well worth our time. ;-)


    sean









    From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Friday, October 23, 2015 at 10:42 AM
    To: "Barnum, Sean D." <sbarnum@mitre.org>
    Cc: John Wunder <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions





    The twofold problem with this approach however is what I mentioned previously

    - Many STIX consuming systems do not have internet access, and thus can not download arbitrary vocabularies created by others

    - Entities producing STIX documents may also not have any access to a reasonable place to post a vocabulary for the public.
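One way a consumer without internet access might cope with the first point is to treat vocab_reference purely as an identifier and look it up in a locally held copy of the vocabularies it knows about. The sketch below is an assumption about how such a tool could behave, not something the thread or the STIX specification prescribes; `LOCAL_VOCABS`, `classify`, the example URIs, and the values are all invented.

```python
from typing import Dict, FrozenSet, Optional

# Hypothetical offline cache of externally defined vocabularies, keyed by the
# vocab_reference URI. In practice it might be loaded from files distributed
# out of band; the URI and values here are invented for illustration.
LOCAL_VOCABS: Dict[str, FrozenSet[str]] = {
    "https://example.org/vocabs/indicator-types": frozenset(
        {"Sinkhole Traffic", "Beaconing"}
    ),
}


def classify(value: str, vocab_reference: Optional[str] = None) -> str:
    """Describe how much an offline consumer can say about a value."""
    if vocab_reference is None:
        return "free-form"               # no vocabulary was indicated
    known_values = LOCAL_VOCABS.get(vocab_reference)
    if known_values is None:
        return "vocabulary-unavailable"  # referenced, but not cached locally
    return "recognized" if value in known_values else "unrecognized"


print(classify("Beaconing", "https://example.org/vocabs/indicator-types"))
print(classify("Beaconing", "https://example.com/other-vocab"))
print(classify("Beaconing"))
```

This does nothing for the second point: a producer with no place to publish a vocabulary still has to get the definition to consumers by some other channel.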


    -
    Jason Keirstead
    Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown


    "Barnum,
    Sean D." ---2015/10/23 11:38:47 AM---John’s comments just reminded me that there is part of the vocabulary approach in STIX that has not

    From: "Barnum, Sean D." <sbarnum@mitre.org>
    To: "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    Date: 2015/10/23 11:38 AM
    Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
    Sent by: <cti-stix@lists.oasis-open.org>





    John’s comments just reminded me that there is part of the vocabulary approach in STIX that has not yet been mentioned (big memory oversight on my part).
    That is that in addition to specifying schema-validatable vocabularies (whether default or custom) or just specifying a free-form string with no idea where it comes from there is also the ability to specify a non-schema-validated
    free-form string but to explicitly reference a definition for a vocabulary that it comes from.
    ControlledVocabularyStringType (the type for all the properties we are
    talking about) has two additional properties “vocab_name” and “vocab_reference” to serve the exact use case John describes ("we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability to indicate which vocabulary
    has been chosen (including no vocab), but would not validate it via schema”). This approach may also be a great one for the original situation Jason described with his users defining values.

    Here is Section 3.7 of the STIX 1.2.1 specification part2-common:


    3.7 Vocabulary Data Types

    There are three vocabulary-related UML data types defined in the Common data model, and together they provide a content creator with four choices for defining content, listed below in order of formality. Please see STIX™ Version 1.2.1 Part 14: Vocabularies for further information on STIX vocabularies.


    · Leverage a default vocabulary using the ControlledVocabularyStringType data type. STIX v1.2.1 defines a collection of default vocabularies and associated enumerations that are based on input from the STIX community (see STIX™ Version 1.2.1 Part 14: Vocabularies); however, not all vocabulary properties have an assigned default vocabulary.
    · Formally define a custom vocabulary using the ControlledVocabularyStringType data type. To achieve value enforcement, a custom vocabulary must be formally added to the STIX Vocabulary data model. Because this is an extension of the STIX Vocabulary data model, producers and consumers MUST be aware of the addition to the data model for successful sharing of STIX documents.
    · Reference an externally-defined, custom vocabulary using the UnenforcedVocabularyStringType data type to constrain the set of values. Externally-defined vocabularies are publicly defined, but have not been included as formally specified vocabularies within the STIX Vocabulary data model using the ControlledVocabularyStringType data type. In this case, it is sufficient to specify the name of the vocabulary and a URL that defines that vocabulary.
    · Choose an arbitrary and unconstrained value using the VocabularyStringType data type.

    While not required by the general STIX language, default vocabularies should be used whenever possible to ensure the greatest level of compatibility between STIX users. If an appropriate default vocabulary is not available a formally defined custom vocabulary can be specified and leveraged. In addition to compatibility advantages, using formally defined vocabularies (whether default vocabularies or otherwise defined) enables enforced use of valid enumeration values; please see STIX™ Version 1.2.1 Part 14: Vocabularies for the associated policy.

    If a formally defined vocabulary is not sufficient for a content producer’s purposes, the STIX Vocabulary data model allows the two alternatives listed above: externally defined custom vocabularies and arbitrary string values, which dispense with enumerated vocabularies altogether. If a custom vocabulary is not formally added to the Vocabulary data model then no enforcement policy of appropriate values is specified.

    The UML diagram shown in Figure 3-20 illustrates the relationships between the three vocabulary data types defined in the STIX Common data model. As illustrated, all controlled vocabularies formally defined within the STIX Vocabulary data model are defined using an enumeration derived from the ControlledVocabularyStringType data type.

    As shown, the HighMediumLowVocab-1.0 enumeration (used as a defined controlled vocabulary exemplar) is defined as a specialization of the ControlledVocabularyStringType data type, and therefore it is also a specialization of the VocabularyStringType data type.

    Further details of each vocabulary class are provided in Subsections 3.7.1 through 3.7.3.


    Figure 3-20. UML diagram of the STIX™ Vocabulary data model


    1.1.1 VocabularyStringType Data Type

    The VocabularyStringType data type is the basic data type of all vocabularies. Therefore, all properties in the collection of STIX data models that make use of the Vocabulary data model must be defined to use the VocabularyStringType data type. Because this data type is a specialization of the basicDataTypes:BasicString data type, it can be used to support the arbitrary string option for vocabularies.


    1.1.2 UnenforcedVocabularyStringType Data Type

    The UnenforcedVocabularyStringType data type specifies custom vocabulary values via an enumeration defined outside of the STIX Vocabulary data model. It extends the VocabularyStringType data type. Note that the STIX vocabulary data model does not define any enforcement policy for this data type.

    The property table of the UnenforcedVocabularyStringType data type is given in Table 3-46.

    Table 3-46. Properties of the UnenforcedVocabularyStringType data type

    Name             Type                                  Multiplicity  Description
    vocab_name       basicDataTypes:NoEmbeddedQuoteString  0..1          The vocab_name property specifies the name of the externally defined vocabulary.
    vocab_reference  basicDataTypes:URI                    0..1          The vocab_reference property specifies the location of the externally defined vocabulary using a Uniform Resource Identifier (URI).





    1.1.3 ControlledVocabularyStringType Data Type

    The ControlledVocabularyStringType data type specifies a formally defined vocabulary. It is an abstract data type so it MUST be extended via an enumeration from the STIX Vocabulary data model (descriptions of all default vocabularies defined within the STIX Vocabulary data model are found in STIX™ Version 1.2.1 Part 14: Vocabularies [i]).
    Any custom vocabulary must be defined via an enumeration added to the STIX Vocabulary data model, if appropriate enumeration values are to be enforced.
    The ControlledVocabularyStringType class has no properties of its own, so there is no associated property table.
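
    As a rough illustration of how the three data types described above relate to one another, here is a minimal Python sketch. It is not taken from the specification: the Python class layout, the ALLOWED attribute, and the three values listed for HighMediumLowVocab are assumptions made for the example (see Part 14 for the real enumeration). Only the type names and the vocab_name / vocab_reference properties come from the quoted text.

```python
from dataclasses import dataclass
from typing import ClassVar, Optional


@dataclass
class VocabularyStringType:
    """Base type: any arbitrary, unconstrained string."""
    value: str


@dataclass
class UnenforcedVocabularyStringType(VocabularyStringType):
    """Free-form value that points at an externally defined vocabulary."""
    vocab_name: Optional[str] = None       # multiplicity 0..1
    vocab_reference: Optional[str] = None  # multiplicity 0..1, a URI


@dataclass
class ControlledVocabularyStringType(VocabularyStringType):
    """Abstract: must be extended by an enumeration that lists valid values."""
    ALLOWED: ClassVar[frozenset] = frozenset()

    def __post_init__(self):
        if type(self) is ControlledVocabularyStringType:
            raise TypeError("abstract type; extend it with an enumeration")
        if self.value not in self.ALLOWED:
            raise ValueError(f"{self.value!r} is not in {type(self).__name__}")


class HighMediumLowVocab(ControlledVocabularyStringType):
    # Illustrative subset only; the actual values are defined in Part 14.
    ALLOWED = frozenset({"High", "Medium", "Low"})
```

    In this sketch only the controlled subtype rejects unknown values; the unenforced type merely records where its value is supposed to come from, which mirrors the "no enforcement policy" wording above.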




    [i] Note that all defined vocabulary enumerations have version numbers in their names to facilitate additions to the enumerations that are backward compatible.





    I apologize for the oversight. I went to the “obvious” answer too soon without stepping back and thinking big picture. This is something I hope we as a community try to avoid.

    sean

    From: " cti-stix@lists.oasis-open.org "
    < cti-stix@lists.oasis-open.org > on behalf of John Wunder < jwunder@mitre.org >
    Date: Friday, October 23, 2015 at 10:20 AM
    To: " cti-stix@lists.oasis-open.org " < cti-stix@lists.oasis-open.org >
    Subject: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Even if we don’t choose to issue a new one for STIX 1.2.1 (I’m unclear on how that would work implementation-wise…it seems to be asking for compatibility issues) it would be awesome to come up with a more coherent and comprehensive
    list. So yeah, I say go for it.

    Thinking longer term, another option would be a simpler implementation of the concepts that we cover now. For example, we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability
    to indicate which vocabulary has been chosen (including no vocab), but would not validate it via schema.

    I think in STIX 1.x we were a bit too strict about trying to get everything to validate in schema and in the end we just made things that should be simple much more complicated.

    John



    On Oct 23, 2015, at 9:38 AM, Jason Keirstead < Jason.Keirstead@CA.IBM.COM > wrote:
    Note, I have made this reply to CTI-STIX from CTI-Users

    I agree pretty much 100% with what you say Bernd. I see there is a bit of a conflict here

    - There is obviously a need to have a controlled vocabulary, so that tools and researchers can share categorized intelligence efficiently; however...

    - The current vocabulary list is seemingly arbitrary - and has many gaps, and also redundancies, as you mentioned. Off the top of my head it should have 2x - 3x as many options, and like you mention, some are redundant. I totally agree that it makes no sense
    to have different Watchlist types when that can be inferred easily from the data.

    Due to how STIX 1.X is constructed, we can easily revise this vocabulary as a non-breaking change. I would propose that the STIX TC undertake a work product to revise this vocabulary. This is a "quick win" that the TC can provide.

    If desired - I would volunteer to take the initial stab at extending the vocabulary.

    -
    Jason Keirstead
    Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown


    "Grobauer, Bernd" ---2015/10/23 07:50:32 AM---Hi, > I heard a recent proposal to remove it entirely. What would be the

    From: "Grobauer, Bernd" < Bernd.Grobauer@siemens.com >
    To: " jwunder@mitre.org " < jwunder@mitre.org >,
    Jason Keirstead/CanEast/IBM@IBMCA, " Cliff.Palmer@gd-ms.com " < Cliff.Palmer@gd-ms.com >
    Cc: " cti-users@lists.oasis-open.org " < cti-users@lists.oasis-open.org >
    Date: 2015/10/23 07:50 AM
    Subject: RE: [cti-users] Indicator Type / Vocabulary Implementation Questions
    Sent by: < cti-users@lists.oasis-open.org >





    Hi,

    > I heard a recent proposal to remove it entirely. What would be the
    > impact of that?

    I had made the suggestion to remove the Indicator Type entirely in
    my somewhat provocative mail a few weeks ago, in which I wanted
    to explore how much potential for simplification in going towards
    STIX 2.0 there might be.

    Why had I suggested to remove it?

    The main reason is that I do not find the values that are currently part of the
    standard vocabulary particularly useful:

    - Why would I put 'IP Watchlist' or 'Domain Watchlist' or 'File Hash Watchlist'
    into the Indicator Type? I could understand "Watchlist", which tells you
    to watch for whatever Observable Patterns are indicated in the indicator.

    - Another type is 'C2' -- at the same time I have the ability to reference
    in the indicator a kill chain phase ... and if the referenced kill chain
    is of any use, it will have something corresponding to 'C2'.

    Now I have (again) two ways of expressing the same thing ... we have
    just stumbled over this issue a few days ago in a sharing group we
    are part of: we use the reference to the killchain phase to indicate
    C2-activity, others use the indicator type.

    Similarly, "Exfiltration" -- should that not be described with a reference
    from the indicator to a TTP "Exfiltration"?

    Other entries in the standard vocabulary ("Malicious Email", "Host Characteristics")
    seem like there would be no end to the list of allowed vocabulary (think
    "Malicious <enter CybOX object type here>" as pattern for generating vocabulary...)

    My suggestion to get rid of the indicator type was really a bit of a calculated
    provocation -- I have no trouble with keeping it in STIX. But we should
    ensure that the standard vocabulary is defined such that it really adds
    value rather than adding confusion by allowing yet more ways to describe
    the same thing in different ways.

    Kind regards,

    Bernd

    ----------------

    Bernd Grobauer, Siemens CERT



















  • 4.  Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 16:58




    re: "The only way I can see that STIX could try to overcome this issue is providing structures enabling the producer to transmit their full vocab definition as part of the content itself. "



    https://avro.apache.org/docs/current/










    From: < cti-stix@lists.oasis-open.org > on behalf of Sean Barnum < sbarnum@mitre.org >
    Date: Friday, October 23, 2015 at 11:13 AM
    To: Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Cc: John Wunder < jwunder@mitre.org >, " cti-stix@lists.oasis-open.org " < cti-stix@lists.oasis-open.org >
    Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions









  • 5.  Re: [cti-stix] [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 17:57
    We just need a way to guide developers so that when they write implementations, those implementations do not go POOF when they get something they did not expect. We cannot assume or rely on the fact that a schema exists somewhere in the wild and that somehow software will be able to download it and automatically write code to support it.

    When I talk about simplicity, as I have said many times before, it does not mean reducing expressiveness.

    <throwing spaghetti at the wall to see what sticks>

    Perhaps this issue could be solved by a layered approach.....

    Step 1: Greatly increase the default vocabulary, removing weirdness and things that are duplicate in nature. Try and get the default vocabulary to a 60/40 or 70/30 rule. It would also be really nice if the terms were backed by a numerical element. String matching in code is notoriously inefficient.

    Step 2: Provide a secondary entry point for an additional vocabulary for those groups that want or need to define their own. We would obviously need an entry point into the controlled vocabulary that can be a "fall back", something that is a lot higher up the food chain.

    By doing something like this, software that can only work with the default vocabulary can skip or throw away things it does not understand and have a fall back to something that is "close enough" for what they need. In the second example below, the "sub_type" would be optional and parsers could easily throw it away or skip it. In fact most JSON parsers today naturally just skip things that do not map to a struct.

    {
      "stixtype": "indicator",
      "type": "IP Watchlist",
      ....
    }

    or

    {
      "stixtype": "indicator",
      "type": "Malware",
      "sub_type": {
        "vocab": "http://a.b.com/vocab-foo",
        "type": "X-RAT-Downloader"
      },
      ....
    }

    </spaghetti runs down the wall and pools in a gob of mess on the floor>

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Oct 23, 2015, at 10:57, Patrick Maroney < Pmaroney@Specere.org > wrote:

    re: "The only way I can see that STIX could try to overcome this issue is providing structures enabling the producer to transmit their full vocab definition as part of the content itself."

    https://avro.apache.org/docs/current/

    From: < cti-stix@lists.oasis-open.org > on behalf of Sean Barnum < sbarnum@mitre.org >
    Date: Friday, October 23, 2015 at 11:13 AM
    To: Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Cc: John Wunder < jwunder@mitre.org >, " cti-stix@lists.oasis-open.org " < cti-stix@lists.oasis-open.org >
    Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
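
    To make the fall-back behaviour in the layered proposal above a little more concrete, here is a small Python sketch of a consumer. It is only an illustration under assumptions: the field names ("type", "sub_type", "vocab") are the ones from the example JSON in this message, while DEFAULT_TYPES, the classify function and its known_vocabs parameter are hypothetical and not part of any STIX release.

```python
import json

# Hypothetical stand-in for the default Indicator Type vocabulary.
DEFAULT_TYPES = {"IP Watchlist", "Malware"}


def classify(raw, known_vocabs=frozenset()):
    """Return (type, sub_type) for one JSON document.

    The top-level "type" is kept only if it is a default term. The
    "sub_type" is kept only if its "vocab" is one this consumer knows;
    otherwise it is skipped and the default term is the fall back.
    """
    obj = json.loads(raw)
    base = obj.get("type")
    if base not in DEFAULT_TYPES:
        base = None  # unknown top-level term; the caller decides what to do
    sub = obj.get("sub_type")
    if isinstance(sub, dict) and sub.get("vocab") in known_vocabs:
        return base, sub.get("type")
    return base, None


doc = json.dumps({
    "stixtype": "indicator",
    "type": "Malware",
    "sub_type": {"vocab": "http://a.b.com/vocab-foo", "type": "X-RAT-Downloader"},
})
print(classify(doc))                                          # default-only consumer
print(classify(doc, frozenset({"http://a.b.com/vocab-foo"})))  # consumer that knows vocab-foo
```

    A consumer that has never heard of vocab-foo still ends up with "Malware", which is the "close enough" outcome described above.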


  • 6.  Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 18:02
    To fix an issue like this it's better to identify the root cause. And yes, even if it's adding some 'relative complexity', I think a parent/childs approach a la CWE/CAPEC would help
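
    For readers who want a picture of the parent/child idea, combined with the numeric term IDs suggested earlier in the thread, a minimal Python sketch follows. Everything in it is hypothetical: the Term class, the ID numbers and the hierarchy shown are invented for illustration and do not come from CWE, CAPEC or any STIX vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Term:
    term_id: int                     # stable ID, so a term can be renamed safely
    name: str
    parent_id: Optional[int] = None  # None marks a top-level term


# Hypothetical hierarchy, for illustration only.
TERMS = {t.term_id: t for t in (
    Term(1, "Malware"),
    Term(2, "X-RAT-Downloader", parent_id=1),
    Term(3, "Watchlist"),
    Term(4, "IP Watchlist", parent_id=3),
)}


def generalize(term_id, supported):
    """Walk up the parents until an ID the consumer supports is found."""
    current = TERMS.get(term_id)
    while current is not None:
        if current.term_id in supported:
            return current.term_id
        if current.parent_id is None:
            return None
        current = TERMS.get(current.parent_id)
    return None


# A consumer that only knows the top-level terms maps ID 2 back to ID 1.
print(generalize(2, {1, 3}))
```

    Comparing integer IDs also sidesteps the string-matching cost mentioned in the quoted message, and the walk up the tree gives a consumer the high-level view when it cannot handle the detailed term.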


  • 7.  RE: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-27-2015 00:20




    [+1]
     
    Terry MacDonald
    Senior STIX Subject Matter Expert
    SOLTRA   An FS-ISAC and DTCC Company
    +61 (407) 203 206
    terry@soltra.com
     
     
    From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org]
    On Behalf Of Jerome Athias
    Sent: Saturday, 24 October 2015 5:02 AM
    To: Jordan, Bret <bret.jordan@bluecoat.com>
    Cc: Patrick Maroney <Pmaroney@specere.org>; Sean D. Barnum <sbarnum@mitre.org>; Jason Keirstead <Jason.Keirstead@ca.ibm.com>; Wunder, John A. <jwunder@mitre.org>; cti-stix@lists.oasis-open.org
    Subject: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
     
    To fix an issue like this it's better to identify the root cause. And yes, even if it's adding some 'relative complexity', I think a parent/childs approach a la CWE/CAPEC would help



  • 8.  Re: [cti-stix] [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 18:07



    I like this idea…one problem with the custom vocabulary approach is that even if terms kind of align with the default vocabulary the software can’t bucket it correctly. This approach lets you make a best match with the default vocab while still being able to
    use your custom one.







  • 9.  Re: [cti-stix] [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 18:12
    If we did things like this generally across the board then we get a simple one-way-of-doing things. Which means, we get more compatible code in the market. An approach like this allows us to be very expressive, which is needed by the groups that people like Pat represent. And we NEED to make sure they are taken care of. But we also need to make sure that disparate applications that process and use STIX can actually talk to each other.

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."


  • 10.  Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

    Posted 10-23-2015 15:35
    I put that on hold for a long time because of other issues (things to improve) but while it is discussed now...

    To be efficient the controlled vocabularies must be common to all. This is not possible today because they are limited and not structured in an efficient way.

    Fire on me for complexity (you're welcome), but they should be structured like the CWEs (or CAPEC). Don't say it's not possible. It was done for the weaknesses (CWEs) (what btw some still call 'vulnerabilities' just because the cybersec community is not mature enough to use a common, proper ontology...)

    Using IDs, so u can rename them without causing issues.
    With parents/childs so u can use a high level point of view, or go deep into the classification if u want/can

    It is a needed effort for interoperability/automation

    On Friday, 23 October 2015, Jason Keirstead < Jason.Keirstead@ca.ibm.com > wrote:

    The twofold problem with this approach however is what I mentioned previously

    - Many STIX consuming systems do not have internet access, and thus can not download arbitrary vocabularies created by others

    - Entities producing STIX documents may also not have any access to a reasonable place to post a vocabulary for the public.

    -
    Jason Keirstead
    Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown

    "Barnum, Sean D." ---2015/10/23 11:38:47 AM---John’s comments just reminded me that there is part of the vocabulary approach in STIX that has not

    From: "Barnum, Sean D." < sbarnum@mitre.org >
    To: "Wunder, John A."
<jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Date: 2015/10/23 11:38 AM
Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
Sent by: <cti-stix@lists.oasis-open.org>

John’s comments just reminded me that there is part of the vocabulary approach in STIX that has not yet been mentioned (big memory oversight on my part).

That is that in addition to specifying schema-validatable vocabularies (whether default or custom) or just specifying a free-form string with no idea where it comes from, there is also the ability to specify a non-schema-validated free-form string but to explicitly reference a definition for a vocabulary that it comes from.

ControlledVocabularyStringType (the type for all the properties we are talking about) has two additional properties “vocab_name” and “vocab_reference” to serve the exact use case John describes (“we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability to indicate which vocabulary has been chosen (including no vocab), but would not validate it via schema”). This approach may also be a great one for the original situation Jason described with his users defining values.

Here is Section 3.7 of the STIX 1.2.1 specification part2-common:

1.1 Vocabulary Data Types

There are three vocabulary-related UML data types defined in the Common data model, and together they provide a content creator with four choices for defining content, listed below in order of formality. Please see STIX™ Version 1.2.1 Part 14: Vocabularies for further information on STIX vocabularies.

· Leverage a default vocabulary using the ControlledVocabularyStringType data type. STIX v1.2.1 defines a collection of default vocabularies and associated enumerations that are based on input from the STIX community (see STIX™ Version 1.2.1 Part 14: Vocabularies); however, not all vocabulary properties have an assigned default vocabulary.
· Formally define a custom vocabulary using the ControlledVocabularyStringType data type. To achieve value enforcement, a custom vocabulary must be formally added to the STIX Vocabulary data model. Because this is an extension of the STIX Vocabulary data model, producers and consumers MUST be aware of the addition to the data model for successful sharing of STIX documents.
· Reference an externally-defined, custom vocabulary using the UnenforcedVocabularyStringType data type to constrain the set of values. Externally-defined vocabularies are publicly defined, but have not been included as formally specified vocabularies within the STIX Vocabulary data model using the ControlledVocabularyStringType data type. In this case, it is sufficient to specify the name of the vocabulary and a URL that defines that vocabulary.
· Choose an arbitrary and unconstrained value using the VocabularyStringType data type.

While not required by the general STIX language, default vocabularies should be used whenever possible to ensure the greatest level of compatibility between STIX users. If an appropriate default vocabulary is not available, a formally defined custom vocabulary can be specified and leveraged. In addition to compatibility advantages, using formally defined vocabularies (whether default vocabularies or otherwise defined) enables enforced use of valid enumeration values; please see STIX™ Version 1.2.1 Part 14: Vocabularies for the associated policy. If a formally defined vocabulary is not sufficient for a content producer’s purposes, the STIX Vocabulary data model allows the two alternatives listed above: externally defined custom vocabularies and arbitrary string values, which dispense with enumerated vocabularies altogether. If a custom vocabulary is not formally added to the Vocabulary data model then no enforcement policy of appropriate values is specified.
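As a rough illustration of the four options above, here is a minimal Python sketch of the three vocabulary data types. It is not the STIX schema or any library's API: the Python class layout, the `allowed` attribute, the example vocabulary name and the `example.org` URL are all assumptions made for the example, and the enumeration shown is only an illustrative subset of values.

```python
from dataclasses import dataclass
from typing import ClassVar, FrozenSet, Optional


@dataclass
class VocabularyStringType:
    """Base type: an arbitrary, unconstrained string value (option 4)."""
    value: str


@dataclass
class UnenforcedVocabularyStringType(VocabularyStringType):
    """Value from an externally defined vocabulary; nothing is enforced (option 3)."""
    vocab_name: Optional[str] = None       # multiplicity 0..1
    vocab_reference: Optional[str] = None  # multiplicity 0..1, a URI


@dataclass
class ControlledVocabularyStringType(VocabularyStringType):
    """Abstract: usable only through a subclass that supplies an enumeration."""
    allowed: ClassVar[FrozenSet[str]] = frozenset()  # hypothetical helper attribute

    def __post_init__(self) -> None:
        if type(self) is ControlledVocabularyStringType:
            raise TypeError("ControlledVocabularyStringType must be extended")
        if self.value not in self.allowed:
            raise ValueError(f"{self.value!r} is not in {type(self).__name__}")


@dataclass
class HighMediumLowVocab(ControlledVocabularyStringType):
    """Stand-in for a formally defined vocabulary (options 1 and 2)."""
    # Illustrative subset only, not the full published enumeration.
    allowed: ClassVar[FrozenSet[str]] = frozenset({"High", "Medium", "Low"})


# Option 3 in use: name the external vocabulary and where it is defined.
# Both the vocabulary name and the URL are placeholders.
custom = UnenforcedVocabularyStringType(
    "Sinkhole",
    vocab_name="Example Indicator Types",
    vocab_reference="https://example.org/vocab",
)
```

In this sketch only the controlled type rejects values, which mirrors the point that enforcement applies solely to vocabularies formally added to the data model.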
The UML diagram shown in Figure 3-20 illustrates the relationships between the three vocabulary data types defined in the STIX Common data model. As illustrated, all controlled vocabularies formally defined within the STIX Vocabulary data model are defined using an enumeration derived from the ControlledVocabularyStringType data type. As shown, the HighMediumLowVocab-1.0 enumeration (used as a defined controlled vocabulary exemplar) is defined as a specialization of the ControlledVocabularyStringType data type, and therefore it is also a specialization of the VocabularyStringType data type. Further details of each vocabulary class are provided in Subsections 3.7.1 through 3.7.3.

Figure 3-20. UML diagram of the STIX™ Vocabulary data model

1.1.1 VocabularyStringType Data Type

The VocabularyStringType data type is the basic data type of all vocabularies. Therefore, all properties in the collection of STIX data models that make use of the Vocabulary data model must be defined to use the VocabularyStringType data type. Because this data type is a specialization of the basicDataTypes:BasicString data type, it can be used to support the arbitrary string option for vocabularies.

1.1.2 UnenforcedVocabularyStringType Data Type

The UnenforcedVocabularyStringType data type specifies custom vocabulary values via an enumeration defined outside of the STIX Vocabulary data model. It extends the VocabularyStringType data type. Note that the STIX vocabulary data model does not define any enforcement policy for this data type.

The property table of the UnenforcedVocabularyStringType data type is given in Table 3-46.

Table 3-46. Properties of the UnenforcedVocabularyStringType data type

Name | Type | Multiplicity | Description
vocab_name | basicDataTypes:NoEmbeddedQuoteString | 0..1 | The vocab_name property specifies the name of the externally defined vocabulary.
vocab_reference | basicDataTypes:URI | 0..1 | The vocab_reference property specifies the location of the externally defined vocabulary using a Uniform Resource Identifier (URI).

1.1.3 ControlledVocabularyStringType Data Type

The ControlledVocabularyStringType data type specifies a formally defined vocabulary. It is an abstract data type so it MUST be extended via an enumeration from the STIX Vocabulary data model (descriptions of all default vocabularies defined within the STIX Vocabulary data model are found in STIX™ Version 1.2.1 Part 14: Vocabularies [i]). Any custom vocabulary must be defined via an enumeration added to the STIX Vocabulary data model, if appropriate enumeration values are to be enforced.

The ControlledVocabularyStringType class has no properties of its own, so there is no associated property table.

[i] Note that all defined vocabulary enumerations have version numbers in their names to facilitate additions to the enumerations that are backward compatible.

I apologize for the oversight. I went to the “obvious” answer too soon without stepping back and thinking big picture. This is something I hope we as a community try to avoid.

sean

From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
Date: Friday, October 23, 2015 at 10:20 AM
To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Subject: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions

Even if we don’t choose to issue a new one for STIX 1.2.1 (I’m unclear on how that would work implementation-wise…it seems to be asking for compatibility issues) it would be awesome to come up with a more coherent and comprehensive list. So yeah, I say go for it.

Thinking longer term, another option would be a simpler implementation of the concepts that we cover now.
For example, we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability to indicate which vocabulary has been chosen (including no vocab), but would not validate it via schema. I think in STIX 1.x we were a bit too strict about trying to get everything to validate in schema and in the end we just made things that should be simple much more complicated.

John

On Oct 23, 2015, at 9:38 AM, Jason Keirstead <Jason.Keirstead@CA.IBM.COM> wrote:

Note, I have made this reply to CTI-STIX from CTI-Users

I agree pretty much 100% with what you say Bernd.

I see there is a bit of a conflict here

- There is obviously a need to have a controlled vocabulary, so that tools and researchers can share categorized intelligence efficiently; however...
- The current vocabulary list is seemingly arbitrary - and has many gaps, and also redundancies, as you mentioned. Off the top of my head it should have 2x - 3x as many options, and like you mention, some are redundant. I totally agree that it makes no sense to have different Watchlist types when that can be inferred easily from the data.

Due to how STIX 1.X is constructed, we can easily revision this vocabulary as a non-breaking change. I would propose that the STIX TC undertake a work product to revision this vocabulary. This is a "quick win" that the TC can provide. If desired - I would volunteer to take the initial stab at extending the vocabulary.

-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security
www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: "Grobauer, Bernd" <Bernd.Grobauer@siemens.com>
To: "jwunder@mitre.org" <jwunder@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, "Cliff.Palmer@gd-ms.com" <Cliff.Palmer@gd-ms.com>
Cc: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
Date: 2015/10/23 07:50 AM
Subject: RE: [cti-users] Indicator Type / Vocabulary Implementation Questions
Sent by: <cti-users@lists.oasis-open.org>

Hi,

> I heard a recent proposal to remove it entirely. What would be the
> impact of that?

I had made the suggestion to remove the IncidentType entirely in my somewhat provocative mail a few weeks ago, in which I wanted to explore how much potential for simplification in going towards STIX 2.0 there might be.

Why had I suggested to remove it? The main reason is that I do not find the values that are currently part of the standard vocabulary particularly useful:

- Why would I put 'IP Watchlist' or 'Domain Watchlist' or 'File Hash Watchlist' into the Indicator Type? I could understand "Watchlist", which tells you to watch for whatever Observable Patterns are indicated in the indicator.
- Another type is 'C2' -- at the same time I have the ability to reference in the indicator a kill chain phase ... and if the referenced kill chain is of any use, it will have something corresponding to 'C2'. Now I have (again) two ways of expressing the same thing ... we have just stumbled over this issue a few days ago in a sharing group we are part of: we use the reference to the killchain phase to indicate C2-activity, others use the indicator type.

Similarly, "Exfiltration" -- should that not be described with a reference from the indicator to an TTP "Exfiltration"?

Other entries in the standard vocabulary ("Malicious Email", "Host Characteristics") seem like there would be no end to the list of allowed vocabulary (think "Malicious <enter CybOX object type here>" as pattern for generating vocabulary...)

My suggestion to get rid of the indicator type was really a bit of a calculated provocation -- I have no trouble with keeping it in STIX. But we should ensure that the standard vocabulary is defined such that it really adds value rather than adding confusion by allowing yet more ways to describe the same thing in different ways.

Kind regards,

Bernd

----------------
Bernd Grobauer, Siemens CERT


  • 11.  RE: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions