+1 for RDF/JSON-LD. As IBM said so clearly in their article:
If we serialize the graph into a standard format, we end up having
completely portable data. *Any RDF system can consume RDF from any other
RDF system without any type of coordination.* The application or users
might not know what to do with the data yet, but they can at least consume
it and then see what they have ingested.
https://www.ibm.com/developerworks/library/wa-data-integration-at-scale_rdf/

On Tue, Oct 6, 2015 at 2:22 PM, Cory Casanave <cory-c@modeldriven.com> wrote:
> Bret,
>
> JSON comes from the assumption of a coupled environment “everyone knows
> what it means” and “everything will fit in a single hierarchical format”
> from a single data source. There is also the assumption that someone will
> write code for every use of every data element. I’m not suggesting that
> this does not have a place.
>
>
>
> RDF comes from the assumption of an open (perhaps within a protected
> ecosystem), loosely coupled environment – the “web of data” – multiple
> sources of data. This data may be consumed and integrated dynamically and
> sometimes without custom code. You said you understood the attraction of
> RDF for these purposes.
>
>
>
> So, the question is: do our use cases fall completely to one side or
> somewhere in the middle? I think your perspective is that the CTI use cases
> fall completely to the “coupled, we know what to expect” side. If you are
> correct then the “LD” context extensions are extra baggage in a message. I
> would suggest that if you are correct that the “cost” of these extra
> elements is minimal – just ignore them and code to the JSON tags. I tend to
> think the use cases fall in the middle.
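
A minimal Go sketch of the "just ignore them and code to the JSON tags" point above. The @context URL, the @type value, and the trimmed PlainIndicator struct are hypothetical, made up for illustration; the only behavior relied on is that Go's encoding/json skips object members that have no matching struct field unless told otherwise.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// PlainIndicator is a hypothetical, trimmed-down indicator that knows
// nothing about JSON-LD. It only names the plain JSON tags it cares about.
type PlainIndicator struct {
	ID    string   `json:"id"`
	Title string   `json:"title"`
	Types []string `json:"type"`
}

// withContext is an illustrative document carrying JSON-LD keywords.
// The context URL is an assumption, not a published STIX context.
const withContext = `{
  "@context": "https://example.org/hypothetical-stix-context.jsonld",
  "@type": "Indicator",
  "id": "example:indicator-8571137a-32a2-4934-8077-2129475813af",
  "title": "Some really neat indicator that we found",
  "type": ["URL Watchlist"]
}`

// DecodePlain decodes a document into PlainIndicator. Members without a
// matching field (here "@context" and "@type") are silently skipped.
func DecodePlain(doc string) (PlainIndicator, error) {
	var ind PlainIndicator
	err := json.Unmarshal([]byte(doc), &ind)
	return ind, err
}

func main() {
	ind, err := DecodePlain(withContext)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(ind.ID, ind.Title, ind.Types)
}
```

A consumer that does want the linked-data view would hand the same bytes to a JSON-LD processor instead; the plain consumer's code would not need to change.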
>
>
>
> If I am correct and there is leverage in ANY OF THE FOLLOWING:
>
> · The ability to reference external data from a CTI data
> structure and be able to understand it (to some extent). Note this may be
> less true of the “core data” but very true of “edge data”, like
> organizations, software kinds and CWEs. However, there exist today open or
> somewhat open repositories of malware and bad IP addresses, so I would
> assert SOME DATA WILL BE OPEN or partially open –probably REST.
>
> · The ability of external agents to reference CTI data and
> reference it to some extent. Thus allowing CTI to define the formats for
> some of the above (e.g. malware).
>
> · That data consumers will ingest some CTI data into their
> analytics engines (or other software) without writing any custom code.
>
> · That CTI data will be combined/related with non CTI data
>
> · That everything you want to know about a subject may not be in
> a single message.
>
> · That some level of semantics may be able to augment human or
> manual capabilities.
>
> · There may be different hierarchies of data appropriate for
> different needs/use cases.
>
> · That there is a need to validate data (without puking like XML
> tools do)
>
> · That some consumers will not want to code for the entire CTI
> framework.
>
> · That the less we need to custom code, the better (faster,
> cheaper and more reliable). You write code for your core algorithms; the
> rest can be automated.
>
> · That there may be name conflicts across the many CTI imported
> packages.
>
> · That there will be some data repositories that we want to query
>
>
>
> All of the above depend on being able to understand what you get without
> human intervention. All of the above are capabilities leveraged by RDF &
> JSON-LD (and in many cases XML). JSON-LD is much more friendly than its
> RDF-XML or XML-Schema counterparts. IMHO the small cost of those context
> declarations is justified by these capabilities, even if they are not
> leveraged by everyone. What we don’t want to do is lock ourselves into a
> “data island” where we can’t break out to the more open environment.
>
>
>
> I also don’t think something the size of CTI, even in pure JSON, will be
> small or simple.
>
>
>
> Thus from a cost/benefit or cost/risk point of view, LD is a justified
> extension to using JSON.
>
>
>
> -Cory
>
>
>
> *From:* Jordan, Bret [mailto:bret.jordan@bluecoat.com]
> *Sent:* Tuesday, October 06, 2015 1:00 PM
> *To:* Sean D. Barnum
> *Cc:* Cory Casanave; Jason Keirstead; Terry MacDonald;
> cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org; Wunder,
> John A.
> *Subject:* Re: [cti-users] [cti-stix] [cti-users] MTI Binding
>
>
>
> Well my question still remains...
>
>
>
>
>
> Please help me understand....
>
>
>
> Say for kicks and giggles I have some structs that look like this to
> consume an Indicator in a STIX package
>
>
>
> type StixMessageType struct {
>     Id         string                    `json:"id,omitempty"`
>     IdRef      string                    `json:"idref,omitempty"`
>     Timestamp  string                    `json:"timestamp,omitempty"`
>     Version    string                    `json:"version,omitempty"`
>     Indicators []indicator.IndicatorType `json:"indicators,omitempty"`
> }
>
>
>
> type IndicatorType struct {
>     Id                           string                             `json:"id,omitempty"`
>     IdRef                        string                             `json:"idref,omitempty"`
>     Timestamp                    string                             `json:"timestamp,omitempty"`
>     Version                      string                             `json:"version,omitempty"`
>     Negate                       bool                               `json:"negate,omitempty"`
>     Title                        string                             `json:"title,omitempty"`
>     Types                        []string                           `json:"type,omitempty"`
>     AlternativeIDs               []string                           `json:"alternative_ids,omitempty"`
>     Descriptions                 []common.StructuredTextType        `json:"descriptions,omitempty"`
>     ShortDescriptions            []common.StructuredTextType        `json:"short_descriptions,omitempty"`
>     ValidTimePositions           []ValidTimeType                    `json:"valid_time_positions,omitempty"`
>     Observable                   *observable.ObservableType         `json:"observable,omitempty"`
>     CompositeIndicatorExpression *CompositeIndicatorExpressionType  `json:"composite_indicator_expression,omitempty"`
>     IndicatedTTP                 []common.RelatedTTPType            `json:"indicated_ttps,omitempty"`
>     KillChainPhases              []common.KillChainPhaseReferenceType `json:"kill_chain_phases,omitempty"`
>     TestMechanisms               []TestMechanismType                `json:"test_mechanisms,omitempty"`
>     LikelyImpact                 *common.StatementType              `json:"likely_impact,omitempty"`
>     SuggestedCOAs                []SuggestedCOAsType                `json:"suggested_coas,omitempty"`
>     Handling                     []common.MarkingSpecificationType  `json:"handling,omitempty"`
>     Confidence                   *common.ConfidenceType             `json:"confidence,omitempty"`
>     Sightings                    *SightingsType                     `json:"sightings,omitempty"`
>     RelatedIndicators            *RelatedIndicatorsType             `json:"related_indicators,omitempty"`
>     RelatedCampaigns             *RelatedCampaignReferencesType     `json:"related_campaigns,omitempty"`
>     RelatedPackages              []common.RelatedPackageRefType     `json:"related_packages,omitempty"`
>     Producer                     *common.InformationSourceType      `json:"producer,omitempty"`
> }
>
> And let's say I get an indicator over the wire that looks something like
> this, built directly from the STIX 1.2 schema.
>
>
>
> {
>   "stix_package": {
>     "id": "example:package-1ad2aab5-1707-4fcc-8fd2-ebae152adeec",
>     "indicators": [
>       {
>         "id": "example:indicator-8571137a-32a2-4934-8077-2129475813af",
>         "idref": "companyfoo:indicator-1234-1234-1234-1234",
>         "timestamp": "2015-10-05T20:03:23-06:00",
>         "version": "1.2.1",
>         "title": "Some really neat indicator that we found",
>         "type": [
>           "URL Watchlist"
>         ],
>         "alternative_ids": [
>           "CV-2014-12-12345",
>           "CV-2015-02-54321"
>         ],
>         "descriptions": [
>           {
>             "id": "example:text-8769b510-9e76-4573-9778-864a51f052ae",
>             "format": "text/plain",
>             "value": "Some long description"
>           }
>         ],
>         "short_descriptions": [
>           {
>             "id": "example:text-12170f79-62f4-48d9-9a97-d7077f05714f",
>             "format": "text/plain",
>             "value": "Some shorter description"
>           }
>         ]
>       }
>     ]
>   }
> }
>
>
>
> We have a version field to say what version of an indicator it is and we
> know the type because it is in an indicator blob inside a stix_package
> blob.
>
>
>
> How does adding namespace elements make this more clear?
>
>
>
> I can easily parse this indicator and do interesting things with it. I
> can parse the TTPs that reference it and do things with them. I understand
> all of the fields in the indicator package because they are all well
> documented on the GitHub site [1] for Indicators. So I can either dump
> this data into a relational database or into a document database like
> MongoDB.
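
The "I can easily parse this indicator" step above can be sketched in Go roughly as follows. This is a minimal sketch under stated assumptions: Envelope, Package and Indicator are hypothetical, trimmed-down stand-ins for the structs quoted above (the common.* and observable.* packages are not reproduced), and the Envelope wrapper is an assumption added only because the wire example nests everything under "stix_package".

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Indicator is a hypothetical subset of the quoted IndicatorType,
// limited to fields that use only built-in types.
type Indicator struct {
	ID             string   `json:"id,omitempty"`
	IDRef          string   `json:"idref,omitempty"`
	Timestamp      string   `json:"timestamp,omitempty"`
	Version        string   `json:"version,omitempty"`
	Title          string   `json:"title,omitempty"`
	Types          []string `json:"type,omitempty"`
	AlternativeIDs []string `json:"alternative_ids,omitempty"`
}

// Package is a hypothetical subset of the quoted StixMessageType.
type Package struct {
	ID         string      `json:"id,omitempty"`
	Indicators []Indicator `json:"indicators,omitempty"`
}

// Envelope models the outer "stix_package" key of the wire example.
type Envelope struct {
	StixPackage Package `json:"stix_package"`
}

// sample is a shortened copy of the indicator shown in the thread.
const sample = `{
  "stix_package": {
    "id": "example:package-1ad2aab5-1707-4fcc-8fd2-ebae152adeec",
    "indicators": [
      {
        "id": "example:indicator-8571137a-32a2-4934-8077-2129475813af",
        "idref": "companyfoo:indicator-1234-1234-1234-1234",
        "timestamp": "2015-10-05T20:03:23-06:00",
        "version": "1.2.1",
        "title": "Some really neat indicator that we found",
        "type": ["URL Watchlist"],
        "alternative_ids": ["CV-2014-12-12345", "CV-2015-02-54321"]
      }
    ]
  }
}`

// ParsePackage decodes the envelope and returns the inner package.
func ParsePackage(data []byte) (Package, error) {
	var env Envelope
	if err := json.Unmarshal(data, &env); err != nil {
		return Package{}, err
	}
	return env.StixPackage, nil
}

func main() {
	pkg, err := ParsePackage([]byte(sample))
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, ind := range pkg.Indicators {
		fmt.Println(ind.ID, ind.Title, ind.Types, ind.AlternativeIDs)
	}
}
```

Members of the message that the structs do not name (descriptions, for example) are dropped by this decoder, which is the hard-coded consumer case under discussion in this thread.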
>
>
>
> Please help me understand why namespaces are required for structured data
> that is well defined. Like I said before, *I can totally get and fully
> understand* the need for JSON-LD in the open web. It makes perfect sense
> when you need this to share say random profile data between two or more
> entities (twitter, Facebook, youtube, etc). But we are not transporting
> random CTI, it will be in STIX. So alternative_ids are Alternative IDs,
> and a title is a Title. I do not see how JSON-LD helps us in any way other
> than making a case for RDF over UML/OWL as RDF can work with JSON-LD.
>
>
>
> The only value I can see for JSON-LD is if we want to allow overloading.
> So I can make my own Indicator format and not adhere to the STIX version of
> an Indicator. In that case, yes, I can see the value, but I can also see
> the madness and chaos that would come from it.
>
>
>
> [1] - http://stixproject.github.io/data-model/1.2/indicator/IndicatorType/
>
>
>
>
> Thanks,
>
>
>
> Bret
>
>
>
>
>
>
>
> *Bret Jordan CISSP*
>
> Director of Security Architecture and Standards | Office of the CTO
>
> Blue Coat Systems
>
> PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
>
> "Without cryptography vihv vivc ce xhrnrw, however, the only thing that
> can not be unscrambled is an egg."
>
>
>
> On Oct 6, 2015, at 10:53, Barnum, Sean D. <sbarnum@mitre.org> wrote:
>
>
>
> I would agree with Cory’s characterizations and assertions here.
>
> Jason, I think you may have misinterpreted what Cory was trying to say.
>
> He was not saying that given fields have multiple meanings. He was saying
> that differing use cases focused on different purposes may leverage
> different sets of fields and there will likely be overlap between the
> fields leveraged by different use cases. And that different use cases may
> care about a given field for different reasons and do different things with
> its content. That does not mean that the field has multiple meanings just
> that its one meaning may serve multiple purposes.
>
>
>
> Cory, please feel free to point out if I am mischaracterizing your intent.
>
>
>
> sean
>
>
>
>
>
> *From: *Cory Casanave
> *Date: *Tuesday, October 6, 2015 at 10:37 AM
> *To: *Jason Keirstead
> *Cc: *"Barnum, Sean D.", Terry MacDonald, "Jordan, Bret",
> "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org", John
> Wunder
> *Subject: *RE: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
>
>
>
> Jason,
>
> Re: This premise is untrue. Or at least, at the release of STIX 2.0, this
> has to be untrue - otherwise we have fundamentally failed in creating a
> data interchange standard. And I believe that this incongruency is at the
> heart of this whole discussion.
>
>
>
> The count of terms is easily verified, so I assume you think this is
> untrue: *may be used for very different use cases that use different
> viewpoints of the data with different root structures*
>
>
>
> Consider STIX may be used by one application to produce or consume a list
> of suspect IP addresses, and is hard-coded to that purpose and structure.
> It has been independently suggested that all uses of STIX will be coded.
>
>
>
> Another is coded for “Mitigation Strategies - Coordinated Action Plans -
> Courses of Action - Understanding of Achievable Mitigation Effects”.
>
>
>
> Other than having a common STIX envelope, I would consider these different
> viewpoints of the data with different root structures.
>
>
>
> You could identify dozens of such essentially different exchanges. I’m not
> suggesting this as a failure, only a reality of the domain and scope. Of
> course, there are different approaches to handling such diversity – which
> is part of this conversation.
>
>
>
> -Cory
>
>
>
> *From:* Jason Keirstead [mailto:Jason.Keirstead@ca.ibm.com]
> *Sent:* Tuesday, October 06, 2015 8:26 AM
> *To:* Cory Casanave
> *Cc:* Barnum, Sean D.; Terry MacDonald; Jordan, Bret;
> cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org; Wunder,
> John A.
> *Subject:* RE: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
>
>
>
> "STIX is several thousand terms and may be used for very different use
> cases that use different viewpoints of the data with different root
> structures."
>
> This premise is untrue. Or at least, at the release of STIX 2.0, this has
> to be untrue - otherwise we have fundamentally failed in creating a data
> interchange standard. And I believe that this incongruency is at the heart
> of this whole discussion.
>
> The whole point of data interchange standards is to explicitly avoid this
> premise. It is so that when I create a message such as
> "<foofarah><name>foo</name><id>bar</id></foofarah>", **I can send that
> message without any other context to any recipient on the planet** - and
> the recipient will be able to understand it, because they do not have to
> guess as to what "name", or "id" mean - because they know that I am
> following the "Fooferah 1.0" standard, which explicitly defines what is
> present in those fields.
>
>
> -
> Jason Keirstead
> Product Architect, Security Intelligence, IBM Security Systems
>
> www.ibm.com/security | www.securityintelligence.com
> Without data, all you are is just another person with an opinion - Unknown
>
>
>
>
> From: Cory Casanave <cory-c@modeldriven.com>
> To: "Barnum, Sean D." <sbarnum@mitre.org>, Terry MacDonald
> <terry.macdonald@gmail.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>
> Cc: "cti-users@lists.oasis-open.org" <cti-users@lists.oasis-open.org>,
> "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>,
> "Wunder, John A." <jwunder@mitre.org>
> Date: 2015/10/05 09:04 PM
> Subject: RE: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
> Sent by: <cti-users@lists.oasis-open.org>
> ------------------------------
>
>
>
>
> Sean,
> I very much agree. A lot of the “it’s simple” view of JSON or even early
> XML is based on its use for single and highly structured interactions
> between endpoints controlled by the same authority (My server talking to my
> android application).
>
> STIX is several thousand terms and may be used for very different use
> cases that use different viewpoints of the data with different root
> structures. On top of this is the need for extensibility and flexibility.
> This is simply the reality of the domain and the scope of STIX. The bad
> news is that regardless of the serialization format, schema language,
> model, language, etc., it is somewhat complex – that is the real and
> necessary complexity. So I am concerned that the “Pure JSON will be simple”
> view will end in some disappointment. Note that the same concerns of
> complexity are levied against NIEM, another large XML schema based data
> sharing standard.
>
> The good news is we can make it BETTER and as simple as is practical! When
> some of these requirements are folded into XML schema, it adds complexity –
> so perhaps some of these other choices REDUCE complexity even if they
> require some new learning. Where we can add semantic precision software can
> handle some of the load. If we have a way to define fine-tuned “profiles”,
> these may be much simpler for their more limited purpose. We can also make
> the models easier to understand for us humans with graphical models linked
> to semantic definitions.
>
> I am copying the following list from Shawn Riley to show the variety of
> information formats and viewpoints that we are trying to fit together under
> one, many faceted, schema:
>
> Below is some of the typical cybersecurity data and information
> users/analysts/scientists have to organize into some type of body of
> knowledge so they understand their cybersecurity ecosystem. If the
> technology can’t understand the meaning of the data then it’s the humans
> who have to understand it and “connect the dots”.
>
> Configuration/Anomaly Reporting - Infrastructure Information - Risk
> Posture - Anomalies
>
> Knowledge of Threat Actors - Threat Actor Infrastructure - Threat Actor
> Personas - Collected Threat Actor Indicators - Threat Actor Attribution -
> Trend Analysis - Victim Information
>
> Incident Awareness - Incident Information - Incident Data - Infrastructure
> Impact and Effects - Investigations/cases - Alerting Indicators - Victim
> Information
>
> Indications and Warnings - Events and Alerts - Tipping and Cueing -
> Warnings - Impact assessments - Potential Indicators
>
> Vulnerability Knowledge - Vulnerabilities - Exploits - Potential Victim
> Information
>
> Mitigation Strategies - Coordinated Action Plans - Courses of Action -
> Understanding of Achievable Mitigation Effects
>
> Mitigation Actions and Responses - Computer Network Defense Situational
> Awareness - Action Tasking and Status - Effectiveness Reporting - After
> Action Reporting and Lessons Learned
>
>
> *From:* cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org]
> *On Behalf Of* Barnum, Sean D.
> *Sent:* Monday, October 05, 2015 9:18 AM
> *To:* Terry MacDonald; Jordan, Bret
> *Cc:* cti-users@lists.oasis-open.org; cti-stix@lists.oasis-open.org;
> Wunder, John A.
> *Subject:* [cti-stix] Re: [cti-users] Re: [cti-stix] [cti-users] MTI
> Binding
>
> I think that using these simple idioms would be great for folks to see
> roughly what the different forms look like but I do not think they would be
> sufficient for comparing size and complexity as a whole.
> These are VERY simple example structures. More complex examples would
> likely differ from these simple ones in how each representation tackles size
> and complexity.
>
> sean
>
> *From:* <cti-users@lists.oasis-open.org> on behalf of Terry MacDonald
> *Date:* Friday, October 2, 2015 at 5:51 PM
> *To:* "Jordan, Bret"
> *Cc:* "cti-users@lists.oasis-open.org", "cti-stix@lists.oasis-open.org",
> John Wunder
> *Subject:* Re: [cti-users] Re: [cti-stix] [cti-users] MTI Binding
>
> +1. Is a nice idea as we can see a size and complexity comparison. Is
> there any chance each person can document the process that the generation
> took? I'm thinking it could be useful to see how complicated the toolchain
> for developing each type of output is.
>
> Cheers
> Terry MacDonald
> On 3 Oct 2015 6:33 am, "Jordan, Bret" <bret.jordan@bluecoat.com> wrote:
> I think this is a great idea..
>
> Thanks,
>
> Bret
>
>
>
> *Bret Jordan CISSP*
> Director of Security Architecture and Standards | Office of the CTO
> Blue Coat Systems
> PGP Fingerprint: 63B4 FC53 680A 6B7D 1447 F2C0 74F8 ACAE 7415 0050
> "Without cryptography vihv vivc ce xhrnrw, however, the only thing that
> can not be unscrambled is an egg."
>
> On Oct 2, 2015, at 14:08, Wunder, John A. <jwunder@mitre.org> wrote:
>
> How about we take two of the idioms on the stixproject.github.io site?
>
> - http://stixproject.github.io/documentation/idioms/c2-indicator/
> - http://stixproject.github.io/documentation/idioms/simple-incident/
>
> Thanks for helping out. I think it would be nice to see these as:
>
> - Current STIX XML (Done already)
> - Simplified XML (TBD, maybe if the JSON one is quick I’ll do this too)
> - JSON/JSON-Schema (Wunder)
> - JSON-LD (Casanave)
> - Any others people are interested in (PMML, Thrift, ProtoBuf, etc.)
>
> John
>
> On Oct 2, 2015, at 11:50 AM, Cory Casanave <cory-c@modeldriven.com> wrote:
>
> Re: Examples.
> Pick your examples, I can help out. Would prefer to baseline off of the
> same schema subset & example data, current STIX is fine to define the
> examples. I suggest at least one that is very simple “pure hierarchical
> data” and at least one with some related entities.
> -Cory
>
>
>
>
>
>
>