I think "added to server" makes sense because the object is only added to the server once, even if it is exposed through multiple collections for different sharing communities.
Allan
From: Bret Jordan <Bret_Jordan@symantec.com>
Date: Wednesday, January 24, 2018 at 8:12 AM
To: Allan Thomson <athomson@lookingglasscyber.com>, "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>
Subject: Re: [EXT] Re: [cti-taxii] More questions about tracking objects in collections
Allan,
The problem is how and where you pull the date_added information from. See the data set examples. If you pull it from the "server", then yes, it will work. But if you try to track it at the collection level, then you need to track all of the versions in the collection as well, since you would only have a date_added in the collection for the main object.
The spec originally said that date_added is when the object is added to the server, not when it is added to a collection. We talked a few weeks back about changing that, but I now think that change would be an error. It probably does need to be at the server level. Otherwise you are forcing the server to do an enormous amount of expensive bookkeeping to update all of the collections that the object is in.
Bret
From: Allan Thomson <athomson@lookingglasscyber.com>
Sent: Wednesday, January 24, 2018 9:02:51 AM
To: Bret Jordan; cti-taxii@lists.oasis-open.org
Subject: [EXT] Re: [cti-taxii] More questions about tracking objects in collections
Bret, I'm not sure I follow the problem, nor do I agree with some of your statements in the description.
When a client connects to a TAXII server for the very first time, it will not specify an added_after because it has no knowledge of the timestamps of the objects on the server. So the 1st request will have no added_after. The response to the 1st request will contain 5 objects, as you state, up to indicator--1 ver 5, dated 2003. The second request from the client *should* ask for added_after set to the date of the last object in the returned result, which would be 2003. The second response would then return indicator--1 ver 6 and so on.
This process seems to work as intended: it allows a client to get all objects and eventually reach a point where it is only getting the objects added since its last response.
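The polling flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory REPO mirrors the example data set from the original message, years stand in for real timestamps, and get_objects, sync, and PAGE_SIZE are hypothetical names, not part of any TAXII API. It also assumes date_added is tracked per version at the server level.

```python
from typing import List, Optional, Tuple

# (object id, version, date_added); years stand in for real timestamps.
Record = Tuple[str, int, int]

# Hypothetical server-wide repo mirroring the example data set.
REPO: List[Record] = (
    [("indicator--1", ver, 1998 + ver) for ver in range(1, 13)]
    + [(f"indicator--{n}", 1, 2009 + n) for n in range(2, 8)]
)

PAGE_SIZE = 5  # illustrative server limit


def get_objects(added_after: Optional[int] = None) -> List[Record]:
    """Return up to PAGE_SIZE records with date_added > added_after, oldest first."""
    matching = [r for r in REPO if added_after is None or r[2] > added_after]
    matching.sort(key=lambda r: r[2])
    return matching[:PAGE_SIZE]


def sync() -> List[Record]:
    """Poll until the server returns nothing new."""
    received: List[Record] = []
    added_after: Optional[int] = None  # first request carries no added_after
    while True:
        page = get_objects(added_after)
        if not page:
            return received
        received.extend(page)
        added_after = page[-1][2]  # date_added of the last record returned


records = sync()
print(len(records))            # 18
print(records[4], records[5])  # ver 5 (2003) ends page one, ver 6 (2004) starts page two
```

Note that this only works cleanly because every record here has a distinct date_added; with ties at a page boundary, a strict "greater than" filter could skip records.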
So I’m not sure I follow what problem you are identifying.
Allan
From: <cti-taxii@lists.oasis-open.org> on behalf of Bret Jordan <Bret_Jordan@symantec.com>
Date: Tuesday, January 23, 2018 at 9:07 PM
To: "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>
Subject: [cti-taxii] More questions about tracking objects in collections
All,
I brought up a tangential question a while ago, but I am still struggling with how best to address this. I think the conclusion that we came to a few weeks back might in fact be wrong. I will do my best to explain the problem, at least the way I see it.
1) A TAXII Server will probably want to have a repository / database of STIX content
2) A TAXII Collection will have some subset of the total objects in its collection
3) A STIX object can be in multiple collections
4) An object in a collection probably means all versions of that object. Meaning, you would probably not want to individually track all versions of an object in each collection. It also means that if you update an object, you probably want the update to be available in every collection that it is found in. You can see the problem at scale: take 1 billion objects and 1000 collections with 20% overlap of data.
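One way to picture points 1 through 4 is the sketch below. It is a simplified illustration, not a proposed design: the repo and collections structures and the add_version and versions_in helpers are hypothetical names chosen for this example. Each version is stored once in the server repository, and a collection records only the object ids it exposes.

```python
from typing import Dict, List, Set, Tuple

# Server-wide repository: every version of every object is stored once.
repo: Dict[str, List[Tuple[int, int]]] = {}  # id -> [(version, date_added)]

# Collections only record which object ids they expose (hypothetical names).
collections: Dict[str, Set[str]] = {
    "collection-1": {"indicator--1"},
    "collection-2": {"indicator--1"},
}


def add_version(object_id: str, version: int, date_added: int) -> None:
    """Store a new version once; no per-collection rows are touched."""
    repo.setdefault(object_id, []).append((version, date_added))


def versions_in(collection: str) -> List[Tuple[str, int, int]]:
    """All versions of every object the collection exposes."""
    return [
        (object_id, version, date_added)
        for object_id in sorted(collections[collection])
        for version, date_added in repo.get(object_id, [])
    ]


add_version("indicator--1", 1, 1999)
add_version("indicator--1", 2, 2000)

# Both collections see the update without any collection-level write.
print(versions_in("collection-1") == versions_in("collection-2"))  # True
```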
Now let's look at the following small data set (yes, the ID is not a valid UUIDv4 and the ver is not a timestamp, but this is done to illustrate things):
STIX Object Repo
indicator--1 ver 1, date_added 1999
indicator--1 ver 2, date_added 2000
indicator--1 ver 3, date_added 2001
indicator--1 ver 4, date_added 2002
indicator--1 ver 5, date_added 2003
indicator--1 ver 6, date_added 2004
indicator--1 ver 7, date_added 2005
indicator--1 ver 8, date_added 2006
indicator--1 ver 9, date_added 2007
indicator--1 ver 10, date_added 2008
indicator--1 ver 11, date_added 2009
indicator--1 ver 12, date_added 2010
indicator--2 ver 1, date_added 2011
indicator--3 ver 1, date_added 2012
indicator--4 ver 1, date_added 2013
indicator--5 ver 1, date_added 2014
indicator--6 ver 1, date_added 2015
indicator--7 ver 1, date_added 2016
Collection 1 Repo
indicator--1, date_added 1999
indicator--2, date_added 2011
indicator--3, date_added 2012
indicator--4, date_added 2013
indicator--5, date_added 2014
indicator--6, date_added 2015
indicator--7, date_added 2016
5) Now let's assume that the TAXII server is set to send only 5 objects at a time (this is done for illustration purposes).
6) When the client makes its first request with an added_after URL parameter of 1998, the client will get the records indicator--1 ver 1 through indicator--1 ver 5, and the X-Headers will both be 1999.
7) The second request will get weird: if you ask for an added_after value of 1999, you will either get the same records again or skip the remaining versions of indicator--1.
You can obviously try to record the version (modified timestamp) in the collection table as well, but that will be an enormous amount of bookkeeping at scale and prone to errors and problems. The only way I can really see to solve this is to track date_added by the time the object comes in to the server, not the time it was actually added to the collection.
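Steps 5 through 7 can be reproduced with a small simulation. This is a sketch under assumptions: the collection table holds one date_added per object id (as in "Collection 1 Repo"), years stand in for timestamps, and get_objects with its inclusive flag is a hypothetical helper used only to show both possible interpretations of the second request.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, int]  # (object id, version)

# Collection table: one date_added per object id, not per version.
COLLECTION_DATE_ADDED: Dict[str, int] = {"indicator--1": 1999}
COLLECTION_DATE_ADDED.update({f"indicator--{n}": 2009 + n for n in range(2, 8)})

# Versions held in the server repo for each object id.
VERSIONS: Dict[str, List[int]] = {"indicator--1": list(range(1, 13))}
VERSIONS.update({f"indicator--{n}": [1] for n in range(2, 8)})

PAGE_SIZE = 5  # illustrative server limit


def get_objects(added_after: int, inclusive: bool) -> List[Record]:
    """Expand matching collection rows into versions, then cut to one page."""
    rows = sorted(COLLECTION_DATE_ADDED.items(), key=lambda kv: kv[1])
    out: List[Record] = []
    for object_id, date_added in rows:
        keep = date_added >= added_after if inclusive else date_added > added_after
        if keep:
            out.extend((object_id, ver) for ver in VERSIONS[object_id])
    return out[:PAGE_SIZE]


first = get_objects(1998, inclusive=False)   # indicator--1 ver 1 through ver 5
repeat = get_objects(1999, inclusive=True)   # the same five records again
skip = get_objects(1999, inclusive=False)    # indicator--1 ver 6 to 12 never sent

print(first == repeat)  # True
print(skip)
```

Either interpretation of added_after=1999 loses: the inclusive filter loops on the same page, and the exclusive filter jumps straight to indicator--2.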
So while we talked a few weeks ago about changing the text to say when an object was added to a collection, I think that might be in error. Otherwise it gets ugly.
From a scale standpoint, think of collections with 100 million records or more, where some objects may be revised 10,000 times.
Bret