So I’ve been thinking about this a bit, and I believe the main things we want to capture with regard to analysis metadata are (let me know if anything is missing!):
Static Analysis
    Tool Identity
Dynamic Analysis
    Analysis run length (in seconds?)
    Tool Identity
    VM Identity
    Guest OS Identity
    Installed Software on Guest OS
It seems like we can accomplish much of this through the use of the existing Software Object, so I’m working on prototyping a new “analysis-type” that can do so.
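As a rough illustration of the metadata listed above, a record for one analysis run might be sketched as follows. The `Software` class mirrors only the name, vendor, and version properties of the existing Software Object; everything else (`AnalysisMetadata`, `analysis_type`, `run_length_seconds`, `vm`, `guest_os`, `installed_software`, and the example values) is a hypothetical name chosen for illustration, not a property from any STIX or CybOX draft.

```python
from dataclasses import dataclass, field, asdict
from typing import List, Optional


@dataclass
class Software:
    """Minimal stand-in for a Software Object (name, vendor, version only)."""
    name: str
    vendor: Optional[str] = None
    version: Optional[str] = None


@dataclass
class AnalysisMetadata:
    """Hypothetical container for the analysis metadata listed above."""
    analysis_type: str                       # "static" or "dynamic"
    tool: Software                           # tool identity (both analysis types)
    run_length_seconds: Optional[int] = None # dynamic only
    vm: Optional[Software] = None            # dynamic only
    guest_os: Optional[Software] = None      # dynamic only
    installed_software: List[Software] = field(default_factory=list)

    def __post_init__(self):
        if self.analysis_type not in ("static", "dynamic"):
            raise ValueError("analysis_type must be 'static' or 'dynamic'")
        has_environment = (
            self.run_length_seconds is not None
            or self.vm is not None
            or self.guest_os is not None
            or bool(self.installed_software)
        )
        # Assumption: environment details only make sense for dynamic analysis.
        if self.analysis_type == "static" and has_environment:
            raise ValueError("environment details apply to dynamic analysis only")


# All values below are made-up placeholders.
dynamic_run = AnalysisMetadata(
    analysis_type="dynamic",
    tool=Software(name="ExampleSandbox", version="1.0"),
    run_length_seconds=120,
    vm=Software(name="ExampleHypervisor"),
    guest_os=Software(name="Windows 7", vendor="Microsoft"),
    installed_software=[Software(name="Office 2016", vendor="Microsoft")],
)
record = asdict(dynamic_run)  # plain nested dicts, ready for JSON serialization
```

Reusing one `Software` shape for the tool, the VM, the guest OS, and the installed software is the point of the sketch: only the container around it would be new.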
Jeff – your proposal is interesting, though one significant omission I see is the specific analysis-based context around the observed data and the entities inside. For instance, with regard to dynamic analysis, how would you capture that a particular file was opened, deleted, created, etc.? I think you’d probably end up having to create many corresponding top-level relationships (“creates”, “deletes”, etc.), which would result in a larger and more complex representation than what we have currently.
Regards,
Ivan
From: "Mates, Jeffrey CIV DC3DCCI" <Jeffrey.Mates@dc3.mil>
Date: Monday, September 18, 2017 at 9:00 AM
To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, "Katz, Gary CTR DC3DCCI" <Gary.Katz.ctr@dc3.mil>
Cc: Bret Jordan <Bret_Jordan@symantec.com>, Ivan Kirillov <ikirillov@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Subject: RE: [cti-stix] RE: [Non-DoD Source] Re: [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object
Going through the current model I think a lot of the dynamic and static analysis needs can be fit by using the Observed Data object along with a set of full CybOX graphs. An extension to Observable would be
needed to track environment information, but at least in my mind that seems to be the nearest fit. It would also benefit from having an additional relationship type to malware to clarify that the observed data object was built by sandboxing or tearing apart
the malware. Here’s a quick and dirty stab at what it might look like:
To avoid having my mail client mangle this I included the example as a text attachment. It shows a sample run of a binary file through both a dynamic and static analysis environment. It also shows a file being
flagged as delivery phase and an IP being flagged as a C2. It does this by building four overlapping observed data objects, one malware object, two attack patterns, and relationships between these.
I created custom relationship types to be slightly more descriptive than “derived-from” for this specifically:
“describes” between observed data and attack pattern, to indicate that the attack pattern describes (and thus provides the kill chain phase for) the observed data blocks.
“built-from” between malware and observed data, to show that the malware was used to perform the static and dynamic analysis that produced the observed data blocks.
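For readers without the attachment, the two custom relationship types might look roughly like this when built as plain dictionaries. This is only a sketch: the helper `make_relationship` is hypothetical, the IDs are freshly generated placeholders, the direction of each relationship (which end is `source_ref`) is an assumption, and other required common properties such as `created` and `modified` are omitted for brevity.

```python
import uuid


def make_relationship(relationship_type, source_ref, target_ref):
    """Build a minimal STIX-style relationship dict (timestamps omitted)."""
    return {
        "type": "relationship",
        "id": f"relationship--{uuid.uuid4()}",
        "relationship_type": relationship_type,
        "source_ref": source_ref,
        "target_ref": target_ref,
    }


# Placeholder identifiers standing in for the objects in the attachment.
malware_id = f"malware--{uuid.uuid4()}"
observed_data_id = f"observed-data--{uuid.uuid4()}"
attack_pattern_id = f"attack-pattern--{uuid.uuid4()}"

# Assumed direction: the attack pattern describes the observed data.
describes = make_relationship("describes", attack_pattern_id, observed_data_id)

# Assumed direction: the observed data was built from analyzing the malware.
built_from = make_relationship("built-from", observed_data_id, malware_id)
```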
I think this example works for the following concerns:
#3: By building separate derived observed data blocks and linking them to attack patterns we can give these kill chain phases. This isn’t very elegant and would be simpler if we added kill_chain_phases directly
as a property to observed data thus cutting out two objects and relationships.
#4: We can use tags to indicate if an observed-data block was built from static or dynamic analysis if desired.
#5: Tags can be used to capture a good chunk of this data, but if this needs to be captured in a more structured manner it seems like an extension to the observed data block would be the way to go.
#6: We can tag to indicate if analysis is static, but for recording variable type values it seems like an extension for each affected CybOX object would be required.
#7: We can record HTTP and other network traffic metadata from dynamic analysis using this object. If they need raw data I can’t see any way to link artifacts and network traffic without a custom extension,
and even that seems inelegant so I’m not sure that’s a good solution.
Jeffrey Mates, Civ DC3/DCCI
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Computer Scientist
Defense Cyber Crime Institute
jeffrey.mates@dc3.mil 410-694-4335
From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of Jason Keirstead
Sent: Friday, September 15, 2017 8:14 PM
To: Katz, Gary CTR DC3DCCI <Gary.Katz.ctr@dc3.mil>
Cc: Bret Jordan <Bret_Jordan@symantec.com>; Kirillov, Ivan A. <ikirillov@mitre.org>; cti-stix@lists.oasis-open.org
Subject: Re: [cti-stix] RE: [Non-DoD Source] Re: [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object
Isn't it more than just the OS? You also need to capture the configuration of the OS and of the sandbox itself (e.g., are you hooking sleep?)
Sent from IBM Verse
Katz, Gary CTR DC3\DCCI --- [cti-stix] RE: [Non-DoD Source] Re: [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object ---
From: "Katz, Gary CTR DC3\DCCI" <Gary.Katz.ctr@dc3.mil>
To: "Jason Keirstead" <Jason.Keirstead@ca.ibm.com>, "Bret Jordan" <Bret_Jordan@symantec.com>
Cc: "Kirillov, Ivan A." <ikirillov@mitre.org>, cti-stix@lists.oasis-open.org
Date: Fri, Sep 15, 2017 2:51 PM
Subject: [cti-stix] RE: [Non-DoD Source] Re: [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object
It seems that #5 would be accomplished through the Operating System object which is in the Cyber Observables playground. From the looks of it, we still have some work to do maturing the object. The OS object
would then be included or referenced by the malware object to denote the characteristics of the environment in which the malware was run.
From: cti-stix@lists.oasis-open.org [mailto:cti-stix@lists.oasis-open.org] On Behalf Of Jason Keirstead
Sent: Thursday, September 14, 2017 11:03 PM
To: Bret Jordan <Bret_Jordan@symantec.com>
Cc: Kirillov, Ivan A. <ikirillov@mitre.org>; cti-stix@lists.oasis-open.org
Subject: [Non-DoD Source] Re: [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object
For what it's worth, I was at an event today and one of the presentations had a section on maldocs and sandbox detonation and one of the things specifically discussed
was how important it was to capture the configuration of your sandbox along with the sample, so that you know, for example, how long is being allowed for the detonation. So, in a completely independent forum, it's backing Bret up here.
Sent from IBM Verse
Bret Jordan --- [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object ---
From: "Bret Jordan" <Bret_Jordan@symantec.com>
To: "Kirillov, Ivan A." <ikirillov@mitre.org>, cti-stix@lists.oasis-open.org
Date: Thu, Sep 14, 2017 4:40 PM
Subject: [cti-stix] Re: [EXT] Re: [cti-stix] Feedback on Malware Object
I think we could sweet talk people around issue 1. But item 5 is really important for STIX Malware. As it currently stands there is no real way to understand the dynamic analysis, which type of system generated it, how that virtual
environment was configured, and what it actually applies to. In defense of item 5, we knew this might be an issue. But we decided to get something put together in the minigroup and then see if this was really an issue.
This is a significant issue for us, and given the sheer volume of STIX Malware content that we could potentially create for the eco-system, I would hope that the minigroup and STIX SC would consider this feedback from our internal
teams and work through how we can address this.
Bret
From: Kirillov, Ivan A. <ikirillov@mitre.org>
Sent: Thursday, September 14, 2017 1:58:01 PM
To: Bret Jordan; cti-stix@lists.oasis-open.org
Subject: [EXT] Re: [cti-stix] Feedback on Malware Object
Thanks for the feedback Bret (and thanks to your team as well)! On the face of it, #2, #3, #4, #6, and probably #7 are things that we could tweak or update in the existing data model without significantly changing the scope or intent of
the current object.
#1 is something that we discussed at length during the initial discussions around this object, and our decision to have a single object was based on the fact that having a separate malware family and instance object would be highly duplicative
in terms of the data that both would convey.
#5 would mean significantly changing the scope of this SDO. If you recall, our original goal was to create an object suitable for the “80%” use case, that is capable of capturing roughly 80% of commonly reported data around malware. To
this end, I don’t think we were envisioning a 1:1 mapping between this object and the output of a sandbox tool such as that produced by Symantec; we were thinking that there would have to be some post-processing or filtering of data that would take place before sandbox or other analysis
data would be populated in this object. In addition, while I don’t doubt that we could create a data model and structure to capture details of the particular execution environment for each sandbox run and its corresponding output, doing so would engender a
fair amount of specification and schema complexity. It’s also worth pointing out that if you have a strong need for #5, this is something that MAEC does to a large extent.
Anyhow, let’s discuss this during tomorrow’s call. Also, we’d be very interested if any other sandbox tool vendors (FireEye, etc.) have similar (or any) feedback, so please let us know.
Regards,
Ivan
From: <cti-stix@lists.oasis-open.org> on behalf of Bret Jordan <Bret_Jordan@symantec.com>
Date: Thursday, September 14, 2017 at 12:42 PM
To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
Subject: [cti-stix] Feedback on Malware Object
Here is the initial feedback from Symantec on the 2.1 Malware Object.
1. Malware needs to be two different objects, one for family and one for the instance. It is really confusing to think of these as the same object.
2. Malware needs to track the build version and vanity name. Example: 3rd generation of this malware.
3. Need to be able to track which sub-phase of the kill chain it is in or provide some sort of nesting logic. Basically, this malware calls this other malware, which calls this other malware. Need to know the order in which they were called as this is critical information. There are often multiple phases that require understanding of the nesting.
4. Maybe change the terms to be Static Events and Dynamic Events; this would reduce the overlap and the guessing of where things are documented. Example is Mutexes (down below)
5. Dynamic Analysis
    Needs to capture multiple passes based on the type of execution environment that it was run on (Windows 7, Windows 10 + Office 2016, Windows 10 + Chrome)
    Needs to track which type of sandbox it was run on and how that sandbox or virtual machine was configured / track the execution environment of where it was run, not just what it targets.
    How the information was collected.
    Where it was seen.
    Where it will probably run
    Need to track the platform that these run on, some things are not applicable to certain platforms. DLLs on Linux, Android MMS on Windows.
    Need ability to search across data to find similar file creates across various runs of dynamic analysis under different configurations
6. Static Analysis
    Has a lot of events that are really dynamic events. This is what it would do if/when it were to run
    Mutex for example might be based on computer name or GUID. Information can be derived from the execution environment. Same with filenames. This makes these more “Dynamic Events”
7. Need to make a Communications Analysis/Events section and pull out all of the communication pieces.
    HTTP / HTTPS
    Network Requests
    IRC
    Raw Sockets
    UDP
    Network Flow
    Contacted IPs, etc.
Bret
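To make the request under Dynamic Analysis to search for similar file creates across runs more concrete, here is a minimal sketch. The data layout (a list of runs, each with an `environment` label and a list of `events` carrying `action` and `path`) is entirely hypothetical and is not taken from the Malware Object draft; comparing paths case-insensitively is likewise an assumption that suits Windows guests.

```python
from collections import defaultdict


def files_created_across_runs(runs):
    """Return file paths created in more than one execution environment.

    The result maps each (lowercased) path to the set of environments
    in which a "create" event for that path was recorded.
    """
    seen = defaultdict(set)
    for run in runs:
        for event in run["events"]:
            if event["action"] == "create":
                seen[event["path"].lower()].add(run["environment"])
    return {path: envs for path, envs in seen.items() if len(envs) > 1}


# Made-up sample data for two runs of the same binary.
runs = [
    {
        "environment": "Windows 7",
        "events": [
            {"action": "create", "path": "C:\\Temp\\a.exe"},
            {"action": "delete", "path": "C:\\Temp\\b.tmp"},
        ],
    },
    {
        "environment": "Windows 10 + Office 2016",
        "events": [
            {"action": "create", "path": "c:\\temp\\a.exe"},
            {"action": "create", "path": "C:\\Users\\x.dat"},
        ],
    },
]

common = files_created_across_runs(runs)
```

A query like this only works if each run records its execution environment alongside its events, which is the gap described under Dynamic Analysis.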