CTI TAXII Subcommittee

  • 1.  More questions about tracking objects in collections

    Posted 01-24-2018 05:07
    All,

    I brought up a tangential question a while ago, but I am still struggling with how best to address this. I think the conclusion we came to a few weeks back might in fact be wrong. I will do my best to explain the problem, at least the way I see it.

    1) A TAXII Server will probably want to have a repository / database of STIX content.
    2) A TAXII Collection will hold some subset of the total objects.
    3) A STIX object can be in multiple collections.
    4) An object in a collection probably means all versions of that object. That is, you would probably not want to individually track every version of an object in each collection. It also means that if you update an object, you probably want the update to be available in every collection it is found in. You can see the problem at scale: take 1 billion objects and 1000 collections with 20% overlap of data.

    Now let's look at the following small data set (yes, the ID is not a valid UUIDv4 and the version is not a timestamp, but this is done to illustrate things):

    STIX Object Repo
    indicator--1 ver 1, date_added 1999
    indicator--1 ver 2, date_added 2000
    indicator--1 ver 3, date_added 2001
    indicator--1 ver 4, date_added 2002
    indicator--1 ver 5, date_added 2003
    indicator--1 ver 6, date_added 2004
    indicator--1 ver 7, date_added 2005
    indicator--1 ver 8, date_added 2006
    indicator--1 ver 9, date_added 2007
    indicator--1 ver 10, date_added 2008
    indicator--1 ver 11, date_added 2009
    indicator--1 ver 12, date_added 2010
    indicator--2 ver 1, date_added 2011
    indicator--3 ver 1, date_added 2012
    indicator--4 ver 1, date_added 2013
    indicator--5 ver 1, date_added 2014
    indicator--6 ver 1, date_added 2015
    indicator--7 ver 1, date_added 2016

    Collection 1 Repo
    indicator--1, date_added 1999
    indicator--2, date_added 2011
    indicator--3, date_added 2012
    indicator--4, date_added 2013
    indicator--5, date_added 2014
    indicator--6, date_added 2015
    indicator--7, date_added 2016

    5) Now let's assume that the TAXII server is set to send only 5 objects per response (this is done for illustration purposes).
    6) When the client makes its first request with an added_after URL parameter of 1998, the client will get the records indicator--1 ver 1 through indicator--1 ver 5, and the X-Headers (X-TAXII-Date-Added-First and X-TAXII-Date-Added-Last) will both be 1999.
    7) The second request is where it gets weird: if you use 1999 as the added_after value, you will either get the same records again or skip the remaining versions of indicator--1.

    You can obviously try to record the version (modified timestamp) in the collection table as well, but that will be an enormous amount of bookkeeping at scale and prone to error. The only way I can really see to solve this is to track date_added by the time each version of the object comes in, not the time the object was actually added to the collection.

    So while we talked a few weeks ago about changing the text so that date_added means when an object was added to a collection, I think that might be in error. Otherwise it gets ugly.

    From a scale standpoint, think of collections with 100 million records or more, where some objects may be revised 10,000 times.

    Bret
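A minimal Python sketch of steps 5 to 7, assuming a hypothetical server that stores one date_added per object in the collection table, returns every version of each matching object, and caps each response at 5 records. The names `REPO`, `COLLECTION`, and `get_objects` are illustrative only; years stand in for timestamps, as in the data set above.

```python
# Hypothetical reproduction of steps 5-7; years stand in for timestamps.
PAGE_SIZE = 5  # illustrative server page limit from step 5

# STIX Object Repo: (id, version, date_added in the repo)
REPO = [("indicator--1", v, 1998 + v) for v in range(1, 13)] + [
    (f"indicator--{n}", 1, 2009 + n) for n in range(2, 8)
]

# Collection 1 Repo: ONE date_added per object id, shared by all its versions
COLLECTION = {"indicator--1": 1999}
COLLECTION.update({f"indicator--{n}": 2009 + n for n in range(2, 8)})


def get_objects(added_after, inclusive=False):
    """Filter on the per-object collection date_added, return every version
    of each matching object, and cap the response at PAGE_SIZE records."""
    rows = sorted(
        (COLLECTION[obj_id], obj_id, ver)
        for obj_id, ver, _ in REPO
        if obj_id in COLLECTION
        and (COLLECTION[obj_id] >= added_after if inclusive
             else COLLECTION[obj_id] > added_after)
    )
    page = rows[:PAGE_SIZE]
    if not page:
        return [], None, None
    # first/last date_added of the page, as the X-Headers would report them
    return [(obj_id, ver) for _, obj_id, ver in page], page[0][0], page[-1][0]


page1, first, last = get_objects(1998)
print(page1, first, last)  # indicator--1 ver 1..5, headers 1999 / 1999

# Request 2, strictly after 1999: indicator--1 ver 6..12 are never returned
skipped, _, _ = get_objects(1999)
print(skipped)  # indicator--2 .. indicator--6, ver 1 each

# Request 2, inclusive of 1999: the same five records come back again
repeated, _, _ = get_objects(1999, inclusive=True)
print(repeated == page1)  # True
```

Whichever way the comparison on 1999 is done, the client cannot reach indicator--1 ver 6 through ver 12, because all twelve versions share the single collection date_added of 1999.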


  • 2.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 01:28
    I would say part of the problem is that multiple versions are returned. IMO, unless you set a special flag, only the latest version of an object should be returned.

    That is an interesting point: if there are a large number of versions, you can end up skipping some of them, and this is an issue.

    --
    John-Mark
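A sketch of John-Mark's suggestion on the same toy data, assuming a hypothetical server that returns only the latest version of each object unless an `all_versions` flag is set, and a client that feeds the last date_added of each page back in as added_after. Both the flag and the helper names are made up for illustration.

```python
# Hypothetical sketch on the toy data from the first post (years stand in
# for timestamps). Collection membership and date_added are per object id.
PAGE_SIZE = 5

REPO = [("indicator--1", v) for v in range(1, 13)] + [
    (f"indicator--{n}", 1) for n in range(2, 8)
]
COLLECTION = {"indicator--1": 1999}
COLLECTION.update({f"indicator--{n}": 2009 + n for n in range(2, 8)})

# Latest version number per object id
LATEST = {}
for obj_id, ver in REPO:
    LATEST[obj_id] = max(ver, LATEST.get(obj_id, 0))


def get_objects(added_after, all_versions=False):
    """Return one page; only the latest version of each object unless the
    hypothetical all_versions flag is set."""
    rows = sorted(
        (COLLECTION[obj_id], obj_id, ver)
        for obj_id, ver in REPO
        if obj_id in COLLECTION
        and COLLECTION[obj_id] > added_after
        and (all_versions or ver == LATEST[obj_id])
    )
    page = rows[:PAGE_SIZE]
    last = page[-1][0] if page else None
    return [(obj_id, ver) for _, obj_id, ver in page], last


def sync(all_versions=False, added_after=1998):
    """Client loop: the last date_added of each page becomes the next added_after."""
    received = []
    while True:
        page, last = get_objects(added_after, all_versions)
        if not page:
            return received
        received.extend(page)
        added_after = last


print(sync())                        # indicator--1 ver 12, then indicator--2..7
print(len(sync(all_versions=True)))  # 11: indicator--1 ver 6..12 are skipped
```

With the default, every object arrives in two pages. With the flag set, the client still receives only versions 1 to 5 of indicator--1, which is the skipping issue noted above.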


  • 3.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 01:45
    I agree. Only the latest version of a STIX object should be returned, and there should be a way for the client to explicitly ask for all versions if the client desires.

    Cheers
    Terry MacDonald
    Cosive


  • 4.  Re: [EXT] Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 01:52
    We already support the ability for a client to ask for "all" versions. The problem is, when they do that, we have a weird pagination problem to deal with.

    Bret
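One way to get clean pagination with all versions, sketched under the assumption of Bret's suggestion in the first post: date_added is taken from the time each version arrived rather than from the object's entry in the collection table, while membership is still tracked once per object id. `MEMBERS`, `get_all_versions`, and `sync` are hypothetical names.

```python
# Hypothetical sketch: date_added comes from each version's own arrival time,
# while collection membership stays per object id.
PAGE_SIZE = 5

# (id, version, date_added when that version arrived on the server)
REPO = [("indicator--1", v, 1998 + v) for v in range(1, 13)] + [
    (f"indicator--{n}", 1, 2009 + n) for n in range(2, 8)
]
MEMBERS = {f"indicator--{n}" for n in range(1, 8)}  # Collection 1, by id only


def get_all_versions(added_after):
    """One page of every version whose own date_added is after added_after."""
    rows = sorted(
        (added, obj_id, ver)
        for obj_id, ver, added in REPO
        if obj_id in MEMBERS and added > added_after
    )
    page = rows[:PAGE_SIZE]
    last = page[-1][0] if page else None
    return [(obj_id, ver) for _, obj_id, ver in page], last


def sync(added_after=1998):
    """Page through the collection, resuming from the last date_added seen."""
    received, pages = [], 0
    while True:
        page, last = get_all_versions(added_after)
        if not page:
            return received, pages
        received.extend(page)
        pages += 1
        added_after = last


received, pages = sync()
print(len(received), pages)  # 18 4
```

This relies on each record having a distinct date_added. If several records shared one timestamp across a page boundary, the client would face the same choice between repeating and skipping records.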


  • 5.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 15:23




    You’re assuming the TAXII server *is* the database that holds all records, including all history. This should not be assumed.

    The TAXII server collections are simply queues where data get posted and data get pulled. Therefore, the server should not have any logic, or assumed logic, that keeps a historical record of all object versions. That would have a much bigger impact on what the TAXII server does, and I think it goes beyond what the current spec says it does.

    I suggest the TAXII server should not just return the latest version. It should return all data that has been pushed to it.

    This is necessary for clients that want to receive objects that they missed since the last time they synced.

    Allan
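A minimal sketch of the "collection as a queue" model Allan describes, assuming each POST is appended with its own date_added (a simple counter here) and nothing is collapsed to the latest version. `QueueCollection`, `post`, and `pull` are hypothetical names, not part of any TAXII library.

```python
class QueueCollection:
    """Hypothetical 'collection as a queue': every POST is appended with its own
    date_added, and nothing is deduplicated or collapsed to the latest version."""

    def __init__(self):
        self._entries = []  # (date_added, stix_id, modified), in arrival order

    def post(self, stix_id, modified):
        # A simple arrival counter stands in for a real date_added timestamp.
        date_added = len(self._entries) + 1
        self._entries.append((date_added, stix_id, modified))
        return date_added

    def pull(self, added_after=0):
        """Return everything posted after the client's last sync point."""
        return [entry for entry in self._entries if entry[0] > added_after]


feed = QueueCollection()
feed.post("indicator--1", "2018-01-01")
last_sync = feed.post("indicator--2", "2018-01-02")  # client syncs here
feed.post("indicator--1", "2018-01-03")  # two newer versions arrive later
feed.post("indicator--1", "2018-01-04")

missed = feed.pull(last_sync)
print(missed)
# [(3, 'indicator--1', '2018-01-03'), (4, 'indicator--1', '2018-01-04')]
```

A client that synced after the second post receives both newer versions of indicator--1, not just the latest one.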
     



  • 6.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 15:43
    The spec quite clearly says right now that the default behavior is that only the latest version of any individual object is returned, *unless* the client passes "version=all".

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown
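For reference, a client-side sketch of the request Jason describes, assuming his "version=all" corresponds to the TAXII 2.0 `match[version]` filter parameter on the collection objects endpoint. The host, API root, and collection id are made up, and `objects_url` is a hypothetical helper.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical API root and collection id, for illustration only.
OBJECTS_ENDPOINT = "https://taxii.example.com/api1/collections/collection-1/objects/"


def objects_url(added_after, all_versions=False):
    """Build a GET URL for the objects endpoint.

    Without match[version], the server default applies (latest version only).
    """
    params = {"added_after": added_after}
    if all_versions:
        params["match[version]"] = "all"
    return OBJECTS_ENDPOINT + "?" + urlencode(params)


latest_only = objects_url("2018-01-24T00:00:00.000Z")
every_version = objects_url("2018-01-24T00:00:00.000Z", all_versions=True)
print(parse_qs(urlsplit(every_version).query))
# {'added_after': ['2018-01-24T00:00:00.000Z'], 'match[version]': ['all']}
```

`urlencode` percent-encodes the brackets and colons; `parse_qs` is used here only to show what the server would decode.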



  • 7.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 15:45




    Jason, I agree, but I believe that applies in the context of an object query for a specific object, where it makes sense to return only the latest version.

    I was referring to a collection, and pulling a collection down by timestamp.

    In that case, everything that has been posted to the collection should be returned, including prior versions of the object.

    In this sense, I consider the collection a ‘queue’ and not a db per se.
     

    Allan Thomson,

    CTO,
    Lookingglass Cyber Solutions
    This electronic message transmission contains information from LookingGlass Cyber Solutions, Inc. which may be attorney-client privileged, proprietary and/or confidential.
    The information in this message is intended only for use by the individual(s) to whom it is addressed.  If you believe that you have received this message in error, please contact the sender, delete this message, and be aware that any review, use, disclosure,
    copying or distribution of the contents contained within is strictly prohibited

     

    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Thursday, January 25, 2018 at 7:42 AM
    To: Allan Thomson <athomson@lookingglasscyber.com>
    Cc: "Bret Jordan (CS)" <Bret_Jordan@symantec.com>, "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Terry MacDonald <terry.macdonald@cosive.com>
    Subject: Re: [cti-taxii] More questions about tracking objects in collections


     

    The spec quite clearly says right now that the default behavior is that only the latest version of any individual object is returned, *unless* the client passes "version=all".


    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Allan Thomson <athomson@lookingglasscyber.com>
    To:         Terry MacDonald <terry.macdonald@cosive.com>, John-Mark Gurney <jmg@newcontext.com>
    Cc:         "Bret Jordan (CS)" <Bret_Jordan@symantec.com>, "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>
    Date:         01/25/2018 11:29 AM
    Subject:         Re: [cti-taxii] More questions about tracking objects in collections
    Sent by:         <cti-taxii@lists.oasis-open.org>




    You’re assuming the TAXII server * is * the database that holds all records including all history. This should not be assumed.
     
    The TAXII server collections are simply queues where data get posted and data get pulled. Therefore it should not have any logic or assumed logic that it keeps historical record of all object versions. That is introducing a lot
    more impact on what the TAXII server is doing and I think goes beyond what the current spec says it does.
     
    I suggest the TAXII server should not just return the latest version. It should return all data that has been pushed to it.
     
    This is necessary for clients that want to receive objects that they missed since the last time they sync-d.
     
    Allan
     
    From: <cti-taxii@lists.oasis-open.org> on behalf of Terry MacDonald <terry.macdonald@cosive.com>
    Date: Wednesday, January 24, 2018 at 5:45 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: "Bret Jordan (CS)" <Bret_Jordan@symantec.com>, "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>
    Subject: Re: [cti-taxii] More questions about tracking objects in collections
     
    I agree. Only the latest version of a STIX object should be returned, and there should be a way for the client to explicitly ask for all versions if the client desires.

     
    Cheers
    Terry MacDonald
    Cosive
     
    On 25/01/2018 14:28, "John-Mark Gurney" <jmg@newcontext.com> wrote:
    Bret Jordan wrote this message on Wed, Jan 24, 2018 at 05:07 +0000:
    > I have brought up a tangential question a bit ago, but I am still struggling with how best to address this. I think the conclusion that we came to a few weeks back might in fact be wrong... I will do my best to try to explain the problem, at least the way I see it.
    >
    >
    > 1) A TAXII Server will probably want to have a repository / database of STIX content
    >
    > 2) A TAXII Collection will have some subset of the total objects in its collection
    >
    > 3) A STIX object can be in multiple collections
    >
    > 4) An object in a collection probably means all versions of that object. Meaning, you would probably not want to individually track all versions of an object in each collection. This also means that if you update an object, you probably want the update to be available in every collection that it is found in. You can see the problem at scale: take 1 billion objects and 1000 collections with 20% overlap of data.
    >
    >
    > Now let's look at the following small data set (yes, the ID is not a valid UUIDv4 and the ver is not a timestamp, but this is done to illustrate things):
    >
    >
    > STIX Object Repo
    >
    > indicator--1 ver 1, date_added 1999
    >
    > indicator--1 ver 2, date_added 2000
    >
    > indicator--1 ver 3, date_added 2001
    >
    > indicator--1 ver 4, date_added 2002
    >
    > indicator--1 ver 5, date_added 2003
    >
    > indicator--1 ver 6, date_added 2004
    >
    > indicator--1 ver 7, date_added 2005
    >
    > indicator--1 ver 8, date_added 2006
    >
    > indicator--1 ver 9, date_added 2007
    >
    > indicator--1 ver 10, date_added 2008
    >
    > indicator--1 ver 11, date_added 2009
    >
    > indicator--1 ver 12, date_added 2010
    >
    > indicator--2 ver 1, date_added 2011
    >
    > indicator--3 ver 1, date_added 2012
    >
    > indicator--4 ver 1, date_added 2013
    >
    > indicator--5 ver 1, date_added 2014
    >
    > indicator--6 ver 1, date_added 2015
    >
    > indicator--7 ver 1, date_added 2016
    >
    >
    > Collection 1 Repo
    >
    > indicator--1, date_added 1999
    >
    > indicator--2, date_added 2011
    >
    > indicator--3, date_added 2012
    >
    > indicator--4, date_added 2013
    >
    > indicator--5, date_added 2014
    >
    > indicator--6, date_added 2015
    >
    > indicator--7, date_added 2016
    >
    >
    > 5) Now let's assume that the TAXII server is set to only send 5 objects per response (this is done for illustration purposes).
    >
    > 6) When the client makes its first request with an added_after URL parameter of 1998, the client will get the records indicator--1 ver 1 through indicator--1 ver 5, and the X-Headers (X-TAXII-Date-Added-First and X-TAXII-Date-Added-Last) will both be 1999.
    >
    > 7) The second request will get weird: if you ask for an added_after value of 1999, you will either get the same records again or skip the remaining versions of indicator--1.
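    Steps 5 to 7 can be reproduced with a short Python simulation of the "Collection 1 Repo" data, where all versions of an object share one collection-level date_added. This is a sketch under stated assumptions. The page size comes from step 5. The `inclusive` switch is hypothetical and stands in for a server that treats added_after as "at or after" instead of "strictly after". All function names are illustrative.

```python
PAGE_SIZE = 5  # server-side page limit from step 5

# Hypothetical model of Collection 1 Repo: one collection-level date_added per
# object, shared by all of its versions. indicator--1 has 12 versions.
COLLECTION_DATE_ADDED = {"indicator--1": 1999,
                         **{f"indicator--{n}": 2009 + n for n in range(2, 8)}}
VERSION_COUNT = {"indicator--1": 12}  # every other object has 1 version


def all_rows():
    """Expand every stored version as (date_added, stix_id, version), sorted."""
    return sorted((added, stix_id, ver)
                  for stix_id, added in COLLECTION_DATE_ADDED.items()
                  for ver in range(1, VERSION_COUNT.get(stix_id, 1) + 1))


def get_page(added_after, inclusive=False):
    """Return one page and the (first, last) date_added a server would report."""
    rows = [r for r in all_rows()
            if (r[0] >= added_after if inclusive else r[0] > added_after)]
    page = rows[:PAGE_SIZE]
    headers = (page[0][0], page[-1][0]) if page else None
    return page, headers


page1, headers = get_page(1998)
print([v for _, _, v in page1], headers)   # [1, 2, 3, 4, 5] (1999, 1999)

# Strictly after 1999: versions 6-12 of indicator--1 are silently skipped.
skipped, _ = get_page(1999)
print([i for _, i, _ in skipped])
# ['indicator--2', 'indicator--3', 'indicator--4', 'indicator--5', 'indicator--6']

# At or after 1999: the client just receives page 1 again.
repeat, _ = get_page(1999, inclusive=True)
print(repeat == page1)                     # True
```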
    >
    >
    > You can obviously try to record the version (modified timestamp) in the collection table as well, but that will be an enormous amount of bookkeeping at scale, and error-prone. The only way I can really see to solve this is to track the date_added by the time each version of the object comes in, not the time the object was first added to the collection.
    >
    >
    > So, while we talked a few weeks ago about changing the text to say when an object was added to a collection, I think that change might be in error. Otherwise it gets ugly.
    >
    >
    > From a scale standpoint, think of collections with 100 million records or more, where some objects may be revised 10,000 times.
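    Bret's alternative, a date_added per version that records when that version came in (as in the "STIX Object Repo" list), can be checked against the same toy data. The names are hypothetical. The sketch relies on every row having a distinct date_added. Real timestamps can collide, so a server would still need some tie-breaking rule, which this sketch does not model.

```python
PAGE_SIZE = 5

# Hypothetical per-version rows (date_added, stix_id, version): date_added is
# when each version arrived, not when the object first joined the collection.
ROWS = sorted([(1998 + v, "indicator--1", v) for v in range(1, 13)]
              + [(2009 + n, f"indicator--{n}", 1) for n in range(2, 8)])


def get_page(added_after):
    """Up to PAGE_SIZE rows added strictly after added_after, oldest first."""
    return [r for r in ROWS if r[0] > added_after][:PAGE_SIZE]


def fetch_all(start):
    """Page through the collection, bookmarking the last date_added seen."""
    seen, bookmark = [], start
    while True:
        page = get_page(bookmark)
        if not page:
            return seen
        seen.extend(page)
        bookmark = page[-1][0]


result = fetch_all(1998)
print(len(result), result == ROWS)  # 18 True
```

    All 18 versions arrive in four pages (5, 5, 5 and 3 rows), with nothing skipped or repeated.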

    I would say part of the problem is that multiple versions are returned.
    IMO, unless you set a special flag, only the latest version of an object
    should be returned.

    That is an interesting point: if there are a large number of versions,
    you can end up skipping them, and this is an issue.

    --
    John-Mark

    ---------------------------------------------------------------------
    To unsubscribe from this mail list, you must leave the OASIS TC that
    generates this mail.  Follow this link to all your TCs in OASIS at:
    https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php
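    John-Mark's suggestion can be checked against the same toy data. If only the latest version of each object is returned, collection-level date_added values page cleanly. This is again a hypothetical sketch. The names and the page size of 5 come from Bret's illustration, not from the spec.

```python
PAGE_SIZE = 5

# Hypothetical model of Collection 1 Repo: one collection-level date_added per object.
COLLECTION_DATE_ADDED = {"indicator--1": 1999,
                         **{f"indicator--{n}": 2009 + n for n in range(2, 8)}}
LATEST_VERSION = {"indicator--1": 12}  # every other object only has version 1


def get_latest_page(added_after):
    """Up to PAGE_SIZE (date_added, stix_id, latest_version) rows after added_after."""
    rows = sorted((added, stix_id, LATEST_VERSION.get(stix_id, 1))
                  for stix_id, added in COLLECTION_DATE_ADDED.items()
                  if added > added_after)
    return rows[:PAGE_SIZE]


page1 = get_latest_page(1998)
page2 = get_latest_page(page1[-1][0])   # bookmark is 2014
print(page1[0])                         # (1999, 'indicator--1', 12)
print(len(page1) + len(page2))          # 7
```

    Each of the seven objects arrives exactly once. However, versions 1 to 11 of indicator--1 are never delivered, which is the trade-off Allan objects to in his reply.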










  • 8.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 15:52
    The way filtering is defined right now, it can apply to any endpoint, i.e., you apply your filter to the GET on /collections/<id>/objects. So the way I read the spec, this "return the latest version" behavior is true throughout all endpoints right now.

    We actually don't have any endpoint that would allow something like a "global search" right now anyway, so the /collections/<id>/objects endpoints are the only places filters even really make sense.

    Tackling this "global query" problem is part of the proposal Terry and I created...

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

    From:         Allan Thomson <athomson@lookingglasscyber.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "Bret Jordan (CS)" <Bret_Jordan@symantec.com>, "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>, "John-Mark Gurney" <jmg@newcontext.com>, Terry MacDonald <terry.macdonald@cosive.com>
    Date:         01/25/2018 11:45 AM
    Subject:         Re: [cti-taxii] More questions about tracking objects in collections
    Sent by:         <cti-taxii@lists.oasis-open.org>

    Jason – Agree, but I believe that's done in the context of an object query for a specific object, where it makes sense that it returns the latest one only.

    I was referring to a collection and pulling a collection down by timestamp.

    In that case, everything that has been posted to the collection should be returned, including prior versions of the object.

    In this sense, I consider the collection a 'queue' and not a db per se.

    Allan Thomson, CTO, Lookingglass Cyber Solutions
    This electronic message transmission contains information from LookingGlass Cyber Solutions, Inc. which may be attorney-client privileged, proprietary and/or confidential. The information in this message is intended only for use by the individual(s) to whom it is addressed. If you believe that you have received this message in error, please contact the sender, delete this message, and be aware that any review, use, disclosure, copying or distribution of the contents contained within is strictly prohibited
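    For reference, a filtered GET against the collection objects endpoint Jason mentions might be built like this in Python. The sketch assumes TAXII 2.0-style query parameters: `added_after` and `match[<field>]`, with `match[version]=all` playing the role of the "version=all" mentioned in the thread. The API root URL, the collection ID and the helper name `objects_url` are placeholders.

```python
from urllib.parse import urlencode


def objects_url(api_root, collection_id, added_after=None, **match):
    """Build a GET URL for a collection's objects endpoint (hypothetical helper).

    Keyword arguments become match filters, e.g. type="indicator" ->
    match[type]=indicator and version="all" -> match[version]=all.
    """
    params = {}
    if added_after is not None:
        params["added_after"] = added_after
    for name, value in match.items():
        params[f"match[{name}]"] = value
    url = f"{api_root.rstrip('/')}/collections/{collection_id}/objects/"
    # Keep brackets and colons readable instead of percent-encoding them.
    return url + ("?" + urlencode(params, safe="[]:") if params else "")


print(objects_url("https://taxii.example.com/api1/", "collection-1",
                  added_after="2018-01-25T00:00:00Z",
                  type="indicator", version="all"))
# https://taxii.example.com/api1/collections/collection-1/objects/?added_after=2018-01-25T00:00:00Z&match[type]=indicator&match[version]=all
```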



  • 9.  Re: [cti-taxii] More questions about tracking objects in collections

    Posted 01-25-2018 16:01




    I was specifically thinking about the case where no filter is applied.
     
    From reading section 3.5 of the spec:
     
    “If no match parameter is specified then the TAXII Client is requesting all content be returned for that Endpoint”
     
    This would suggest that everything that has been posted to the collection should be returned.
     
    Or at least, this is how I interpret it.
     
    Either way, if there is ambiguity in the spec (which there seems to be), then we need to clarify it.
     
    I’m not sure I understand your point about global query. That sounds like turning a TAXII server into an object database, which I think is a mistake. If you want an object/relational/etc. database, there are plenty to pick from.

     

    Allan
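    Allan's reading of section 3.5 (no match parameter means no filtering) can be sketched in Python as follows. The helper name and record shape are hypothetical. The OR-within-a-field, AND-across-fields convention for comma-separated values is an assumption of this sketch. Whether "all content" also includes superseded versions is the open question in this thread, so version handling is deliberately left out.

```python
def apply_match(records, match=None):
    """Filter dict records by hypothetical TAXII-style match fields.

    Assumed convention: a comma-separated value is an OR within one field,
    and separate fields are ANDed. With no match parameters at all, every
    record passes (all() over an empty set of conditions is True).
    """
    match = match or {}
    return [rec for rec in records
            if all(rec.get(name) in value.split(",")
                   for name, value in match.items())]


records = [{"id": "indicator--1", "type": "indicator"},
           {"id": "malware--1", "type": "malware"}]
print(len(apply_match(records)))                                 # 2
print(len(apply_match(records, {"type": "indicator"})))          # 1
print(len(apply_match(records, {"type": "indicator,malware"})))  # 2
```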

     

    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Thursday, January 25, 2018 at 7:52 AM
    To: Allan Thomson <athomson@lookingglasscyber.com>
    Cc: "Bret Jordan (CS)" <Bret_Jordan@symantec.com>, "cti-taxii@lists.oasis-open.org" <cti-taxii@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Terry MacDonald <terry.macdonald@cosive.com>
    Subject: Re: [cti-taxii] More questions about tracking objects in collections


     
