Understood.
This seems like a fundamental issue with diverse information exchange.
The only way I can see for STIX to overcome this issue is to provide structures that let the producer transmit their full vocab definition as part of the content itself. This was discussed way back when the vocabulary structure was being tackled
and was ruled out as unnecessary complexity and something best handled outside the scope of STIX. That is why the “vocab_name” and “vocab_reference” properties were added as a middle ground between schema-validatable vocab definitions, which carry a hard requirement
for access to the vocab definition, and a pure free-form string with no context of where it came from. The idea is that producers could simply reference vocabularies by name and/or by link to an actual explanatory definition in order to give context to a free-form
string. These could be common vocabularies used out in the world or defined by another standard, or they could be custom ones where the producer posts a simple web page listing the vocabulary values with a definition for each.
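As a rough illustration of the producer side, here is a minimal Python sketch that attaches a vocabulary name and reference to a free-form value. The namespace URI is assumed to be the STIX 1.x Indicator namespace, the attribute is written as vocab_reference (the property name used in the specification), and the vocabulary name, URL, and value are hypothetical. It is not meant as a complete or authoritative STIX serializer.

```python
import xml.etree.ElementTree as ET

# Assumption: this is the STIX 1.x Indicator namespace; adjust if yours differs.
INDICATOR_NS = "http://stix.mitre.org/Indicator-2"
ET.register_namespace("indicator", INDICATOR_NS)


def build_indicator_type(value, vocab_name=None, vocab_reference=None):
    """Build an indicator:Type element carrying a free-form value.

    vocab_name and vocab_reference are optional. When present they tell the
    consumer where the value comes from, but nothing is schema-validated.
    """
    elem = ET.Element(f"{{{INDICATOR_NS}}}Type")
    elem.text = value
    if vocab_name is not None:
        elem.set("vocab_name", vocab_name)
    if vocab_reference is not None:
        elem.set("vocab_reference", vocab_reference)
    return elem


# Hypothetical custom vocabulary published on a simple web page.
elem = build_indicator_type(
    "Phishing Landing Page",
    vocab_name="ExampleOrg Indicator Types",
    vocab_reference="https://example.org/vocabs/indicator-types",
)
print(ET.tostring(elem, encoding="unicode"))
```

Leaving both arguments out yields the pure free-form string case, with no context for the consumer.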
I am certainly not asserting that everything is perfect or that all problems are solved without cost or downside. I think there will always be some issues we have to work around. I just wanted to provide some context on what capability is currently available
and why it is the way it currently is.
Thank you for the great conversation around this topic. If nothing else comes out of it other than a renewed momentum toward improving the IndicatorType default vocab, I think it will have been well worth our time. ;-)
sean
From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
Date: Friday, October 23, 2015 at 10:42 AM
To: "Barnum, Sean D." <sbarnum@mitre.org>
Cc: John Wunder <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
The twofold problem with this approach, however, is what I mentioned previously:
- Many STIX consuming systems do not have internet access, and thus cannot download arbitrary vocabularies created by others.
- Entities producing STIX documents may also not have access to a reasonable place to post a vocabulary for the public.
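To make the first point concrete, here is a minimal Python sketch of how an offline consumer might treat a value that carries vocab_name / vocab_reference. The local registry, its contents, and the four outcome labels are all hypothetical. It only shows graceful degradation when the reference cannot be fetched; it does not address the producer-side hosting problem.

```python
# Vocabularies bundled with the consuming tool, keyed by vocab_name.
# Both the registry and its contents are hypothetical.
LOCAL_VOCABS = {
    "ExampleOrg Indicator Types": {"Phishing Landing Page", "Sinkhole"},
}


def interpret_value(value, vocab_name=None, vocab_reference=None):
    """Classify a vocabulary value without any network access.

    Returns one of:
      "known"      - the vocab is bundled locally and contains the value
      "unknown"    - the vocab is bundled locally but lacks the value
      "unresolved" - a vocab is named or referenced but not available offline
      "free-form"  - no vocabulary context was supplied at all
    """
    if vocab_name in LOCAL_VOCABS:
        return "known" if value in LOCAL_VOCABS[vocab_name] else "unknown"
    if vocab_name or vocab_reference:
        # vocab_reference is deliberately never fetched here.
        return "unresolved"
    return "free-form"
```

In the "unresolved" case the consumer still has the raw string and the vocabulary's name, but no definitions to interpret it with, which is exactly the gap described above.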
-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown
From: "Barnum, Sean D." <sbarnum@mitre.org>
To: "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Date: 2015/10/23 11:38 AM
Subject: Re: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
Sent by: <cti-stix@lists.oasis-open.org>
John’s comments just reminded me that there is a part of the vocabulary approach in STIX that has not yet been mentioned (big memory oversight on my part).
In addition to specifying schema-validatable vocabularies (whether default or custom), or just specifying a free-form string with no indication of where it comes from, there is also the ability to specify a non-schema-validated
free-form string while explicitly referencing a definition for the vocabulary that it comes from.
ControlledVocabularyStringType (the type for all the properties we are talking about) has two additional properties, “vocab_name” and “vocab_reference”, to serve the exact use case John describes (“we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability to indicate which vocabulary
has been chosen (including no vocab), but would not validate it via schema”). This approach may also be a great one for the original situation Jason described with his users defining values.
Here is Section 3.7 of the STIX 1.2.1 specification part2-common:
3.7 Vocabulary Data Types
There are three vocabulary-related UML data types defined in the Common data model, and together they provide a content creator with four choices for defining content, listed below in order of formality. Please see STIX™ Version 1.2.1 Part 14: Vocabularies for further information on STIX vocabularies.
· Leverage a default vocabulary using the ControlledVocabularyStringType data type. STIX v1.2.1 defines a collection of default vocabularies and associated enumerations that are based on input from the STIX community (see STIX™ Version 1.2.1 Part 14: Vocabularies); however, not all vocabulary properties have an assigned default vocabulary.
· Formally define a custom vocabulary using the ControlledVocabularyStringType data type. To achieve value enforcement, a custom vocabulary must be formally added to the STIX Vocabulary data model. Because this is an extension of the STIX Vocabulary data model, producers and consumers MUST be aware of the addition to the data model for successful sharing of STIX documents.
· Reference an externally-defined, custom vocabulary using the UnenforcedVocabularyStringType data type to constrain the set of values. Externally-defined vocabularies are publically defined, but have not been included as formally specified vocabularies within the STIX Vocabulary data model using the ControlledVocabularyStringType data type. In this case, it is sufficient to specify the name of the vocabulary and a URL that defines that vocabulary.
· Choose an arbitrary and unconstrained value using the VocabularyStringType data type.
While not required by the general STIX language, default vocabularies should be used whenever possible to ensure the greatest level of compatibility between STIX users. If an appropriate default vocabulary is not available a formally defined
custom vocabulary can be specified and leveraged. In addition to compatibility advantages, using formally defined vocabularies (whether default vocabularies or otherwise defined) enables enforced use of valid enumeration values; please see STIX™ Version 1.2.1 Part 14: Vocabularies for the associated policy.
If a formally defined vocabulary is not sufficient for a content producer’s purposes, the STIX Vocabulary data model allows the two alternatives listed above: externally defined custom vocabularies and arbitrary string values, which dispense
with enumerated vocabularies altogether. If a custom vocabulary is not formally added to the Vocabulary data model then no enforcement policy of appropriate values is specified.
The UML diagram shown in Figure 3-20 illustrates the relationships between the three vocabulary data types defined in the STIX Common data model. As illustrated, all controlled vocabularies formally defined within the STIX Vocabulary data model are defined using an enumeration derived from the ControlledVocabularyStringType data type.
As shown, the HighMediumLowVocab-1.0 enumeration (used as a defined controlled vocabulary exemplar) is defined as a specialization of the ControlledVocabularyStringType data type, and therefore it is also a specialization of the VocabularyStringType data type.
Further details of each vocabulary class are provided in Subsections 3.7.1 through 3.7.3.
Figure 3-20. UML diagram of the STIX™ Vocabulary data model
3.7.1 VocabularyStringType Data Type
The VocabularyStringType data type is the basic data type of all vocabularies. Therefore, all properties in the collection of STIX data models that makes use of the Vocabulary data
model must be defined to use the VocabularyStringType data type. Because this data type is a specialization of the
basicDataTypes:BasicString data type, it can be used to support the arbitrary string option for vocabularies.
3.7.2 UnenforcedVocabularyStringType Data Type
The UnenforcedVocabularyStringType data type specifies custom vocabulary values via an enumeration defined outside of the STIX Vocabulary data model. It extends the
VocabularyStringType data type. Note that the STIX vocabulary data model does not define any enforcement policy for this data type.
The property table of the UnenforcedVocabularyStringType data type is given in Table 3-46.
Table 3-46. Properties of the UnenforcedVocabularyStringType data type
Name | Type | Multiplicity | Description
vocab_name | basicDataTypes:NoEmbeddedQuoteString | 0..1 | The vocab_name property specifies the name of the externally defined vocabulary.
vocab_reference | basicDataTypes:URI | 0..1 | The vocab_reference property specifies the location of the externally defined vocabulary using a Uniform Resource Identifier (URI).
3.7.3 ControlledVocabularyStringType Data Type
The ControlledVocabularyStringType data type specifies a formally defined vocabulary. It is an abstract data type so it MUST be extended via an enumeration from the STIX Vocabulary data model (descriptions of all default vocabularies defined within the STIX Vocabulary data model are found in STIX™ Version 1.2.1 Part 14: Vocabularies [i]).
Any custom vocabulary must be defined via an enumeration added to the STIX Vocabulary data model, if appropriate enumeration values are to be enforced.
The ControlledVocabularyStringType class has no properties of its own, so there is no associated property table.
[i] Note that all defined vocabulary enumerations have version numbers in their names to facilitate additions to the enumerations that are backward compatible.
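For anyone who finds code easier to read than UML, the three data types in the excerpt above can be sketched roughly as Python classes. This is only an illustration of the relationships described in the text, not anything from the STIX schemas or libraries. The ALLOWED_VALUES attribute is a hypothetical stand-in for an enumeration, and the three values listed for HighMediumLowVocab are an illustrative subset assumed from its name (the excerpt does not list them).

```python
from dataclasses import dataclass
from typing import ClassVar, FrozenSet, Optional


@dataclass
class VocabularyStringType:
    """Base type: an arbitrary, unconstrained string value."""
    value: str


@dataclass
class UnenforcedVocabularyStringType(VocabularyStringType):
    """Value from an externally defined vocabulary; nothing is enforced."""
    vocab_name: Optional[str] = None       # multiplicity 0..1
    vocab_reference: Optional[str] = None  # multiplicity 0..1, a URI


@dataclass
class ControlledVocabularyStringType(VocabularyStringType):
    """Abstract type with no properties of its own.

    Concrete vocabularies extend it and supply the enumeration
    (modeled here by the hypothetical ALLOWED_VALUES attribute).
    """
    ALLOWED_VALUES: ClassVar[Optional[FrozenSet[str]]] = None

    def __post_init__(self):
        if self.ALLOWED_VALUES is None:
            raise TypeError("ControlledVocabularyStringType is abstract")
        if self.value not in self.ALLOWED_VALUES:
            raise ValueError(f"{self.value!r} is not in the vocabulary")


@dataclass
class HighMediumLowVocab(ControlledVocabularyStringType):
    # Illustrative subset assumed from the name; see Part 14 for the real list.
    ALLOWED_VALUES = frozenset({"High", "Medium", "Low"})
```

Only the controlled branch rejects values. The unenforced type accepts any string and merely carries the optional name and reference alongside it.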
I apologize for the oversight. I went to the “obvious” answer too soon without stepping back and thinking big picture. This is something I hope we as a community try to avoid.
sean
From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
Date: Friday, October 23, 2015 at 10:20 AM
To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Subject: [cti-stix] Re: [cti-users] Indicator Type / Vocabulary Implementation Questions
Even if we don’t choose to issue a new one for STIX 1.2.1 (I’m unclear on how that would work implementation-wise…it seems to be asking for compatibility issues) it would be awesome to come up with a more coherent and comprehensive
list. So yeah, I say go for it.
Thinking longer term, another option would be a simpler implementation of the concepts that we cover now. For example, we could just choose to have an unvalidated vocabulary: we would define the vocabulary and provide an ability
to indicate which vocabulary has been chosen (including no vocab), but would not validate it via schema.
I think in STIX 1.x we were a bit too strict about trying to get everything to validate in schema and in the end we just made things that should be simple much more complicated.
John
On Oct 23, 2015, at 9:38 AM, Jason Keirstead <Jason.Keirstead@CA.IBM.COM> wrote:
Note, I have made this reply to CTI-STIX from CTI-Users
I agree pretty much 100% with what you say, Bernd. I see there is a bit of a conflict here:
- There is obviously a need to have a controlled vocabulary, so that tools and researchers can share categorized intelligence efficiently; however...
- The current vocabulary list is seemingly arbitrary, with many gaps and also redundancies, as you mentioned. Off the top of my head it should have 2x - 3x as many options. I totally agree that it makes no sense
to have different Watchlist types when that can be inferred easily from the data.
Due to how STIX 1.X is constructed, we can easily revise this vocabulary as a non-breaking change. I would propose that the STIX TC undertake a work product to revise this vocabulary. This is a "quick win" that the TC can provide.
If desired, I would volunteer to take the initial stab at extending the vocabulary.
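One simple way to reason about "non-breaking" here is sketched below in Python, assuming a revision counts as non-breaking when every existing value survives. That definition is my assumption, not something the TC has stated. The value sets are an illustrative subset of values mentioned in this thread plus hypothetical additions, not the full enumeration.

```python
def is_non_breaking_revision(old_values, new_values):
    """True if every value in the old revision is still present in the new one."""
    return set(old_values) <= set(new_values)


# Illustrative subset of values mentioned in this thread.
current = {"IP Watchlist", "Domain Watchlist", "File Hash Watchlist", "C2", "Exfiltration"}

# Purely additive revision; the added values are hypothetical.
extended = current | {"Phishing", "Sinkhole"}

# Collapsing the Watchlist variants into one value removes existing values.
watchlists = {"IP Watchlist", "Domain Watchlist", "File Hash Watchlist"}
merged = (current - watchlists) | {"Watchlist"}

print(is_non_breaking_revision(current, extended))
print(is_non_breaking_revision(current, merged))
```

Under that assumption, filling gaps is the easy part, while removing the redundancies would need more care since existing content may already use those values.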
-
Jason Keirstead
Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown
From: "Grobauer, Bernd" <Bernd.Grobauer@siemens.com>
To: "jwunder@mitre.org" <jwunder@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, "Cliff.Palmer@gd-ms.com" <Cliff.Palmer@gd-ms.com>
Cc: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>
Date: 2015/10/23 07:50 AM
Subject: RE: [cti-users] Indicator Type / Vocabulary Implementation Questions
Sent by: <cti-users@lists.oasis-open.org>
Hi,
> I heard a recent proposal to remove it entirely. What would be the
> impact of that?
I had made the suggestion to remove the Indicator Type entirely in
my somewhat provocative mail a few weeks ago, in which I wanted
to explore how much potential for simplification there might be
in going towards STIX 2.0.
Why had I suggested to remove it?
The main reason is that I do not find the values that are currently part of the
standard vocabulary particularly useful:
- Why would I put 'IP Watchlist' or 'Domain Watchlist' or 'File Hash Watchlist'
into the Indicator Type? I could understand "Watchlist", which tells you
to watch for whatever Observable Patterns are indicated in the indicator.
- Another type is 'C2' -- at the same time I have the ability to reference
in the indicator a kill chain phase ... and if the referenced kill chain
is of any use, it will have something corresponding to 'C2'.
Now I have (again) two ways of expressing the same thing ... we have
just stumbled over this issue a few days ago in a sharing group we
are part of: we use the reference to the killchain phase to indicate
C2-activity, others use the indicator type.
Similarly, "Exfiltration" -- should that not be described with a reference
from the indicator to a TTP "Exfiltration"?
Other entries in the standard vocabulary ("Malicious Email", "Host Characteristics")
seem like there would be no end to the list of allowed vocabulary (think
"Malicious <enter CybOX object type here>" as pattern for generating vocabulary...)
My suggestion to get rid of the indicator type was really a bit of a calculated
provocation -- I have no trouble with keeping it in STIX. But we should
ensure that the standard vocabulary is defined such that it really adds
value rather than adding confusion by allowing yet more ways to describe
the same thing in different ways.
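To illustrate the Watchlist point, here is a small Python sketch of how a consumer could derive the more specific description from a plain "Watchlist" type plus the object types found in the indicator's observables. The mapping and function are hypothetical, and the labels are a simplification (an Address object is not necessarily an IP address, and a File object need not carry a hash).

```python
# Hypothetical mapping from CybOX object type names to a short label.
# Simplification: an Address object may hold something other than an IP,
# and a File object may describe more than hashes.
OBJECT_KIND = {
    "AddressObjectType": "IP",
    "DomainNameObjectType": "Domain",
    "FileObjectType": "File Hash",
}


def describe_watchlist(indicator_type, object_types):
    """Return a descriptive label, deriving the watchlist kind from the data."""
    if indicator_type != "Watchlist":
        return indicator_type
    # Unmapped object types fall back to their own name.
    kinds = sorted({OBJECT_KIND.get(t, t) for t in object_types})
    return f"Watchlist ({', '.join(kinds)})" if kinds else "Watchlist"


print(describe_watchlist("Watchlist", ["AddressObjectType"]))
```

With something like this on the consuming side, a single "Watchlist" value would carry the same information as the three separate values do today.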
Kind regards,
Bernd
----------------
Bernd Grobauer, Siemens CERT