OASIS Cyber Threat Intelligence (CTI) TC

  • 1.  RE: [EXT] Re: [cti] TAXII Pagination

    Posted 10-09-2019 15:45




    All,
     
    As I was reading this proposed solution for TAXII Pagination, it occurred to me that a TAXII Server currently has no way of advertising its self-imposed limit for pagination requests. If it did, a client could learn its limit ahead of time from the server's api_root resource. This is a different problem from the one originally raised in this thread, but it is related.

    What I propose is adding a new property called max_limit; the details are below.
     




    Property Name | Type | Description

    title (required) | string | A human-readable plain text name used to identify this API instance.

    description (optional) | string | A human-readable plain text description for this API Root.

    versions (required) | list of type string | The list of TAXII versions that this API Root is compatible with. The values listed in this property MUST match the media types defined in Section 1.6.8.1 and MUST include the optional version parameter. A value of "application/taxii+json;version=2.1" MUST be included in this list to indicate conformance with this specification.

    max_content_length (required) | integer | The maximum size of the request body in octets (8-bit bytes) that the server can support. The value of max_content_length MUST be a positive integer greater than zero. This applies to requests only and is determined by the server. Requests with total body length values smaller than this value MUST NOT result in an HTTP 413 (Request Entity Too Large) response. If, for example, the server supports 100 MB of data, the value for this property would be 100*1024*1024, which equals 104,857,600. This property contains useful information for the client when it POSTs requests to the Add Objects endpoint.

    max_limit (required) | integer | The maximum server-imposed limit for pagination requests. The value of max_limit MUST be a positive integer greater than zero. This only applies to pagination requests made to this API Root. Any request with a limit greater than max_limit will be overridden by the server's self-imposed limit.
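
As a rough illustration of how a client might use the proposed property, here is a minimal sketch. The resource values (the title and a max_limit of 500) are made up, max_limit is only the property proposed in this message, and effective_limit is a hypothetical helper, not part of any TAXII library.

```python
import json

# Hypothetical API Root resource; every value here is illustrative only.
API_ROOT_JSON = """
{
  "title": "Example API Root",
  "versions": ["application/taxii+json;version=2.1"],
  "max_content_length": 104857600,
  "max_limit": 500
}
"""


def effective_limit(api_root: dict, requested: int) -> int:
    """Return the page size the server would honor for a requested limit."""
    max_limit = api_root["max_limit"]
    if max_limit <= 0:
        raise ValueError("max_limit MUST be a positive integer greater than zero")
    # A request above max_limit is overridden by the server's own limit.
    return min(requested, max_limit)


api_root = json.loads(API_ROOT_JSON)
print(effective_limit(api_root, 20))    # request is under the cap
print(effective_limit(api_root, 2000))  # request is capped at max_limit
```

Knowing the cap up front lets a client size its requests instead of discovering the override from a shorter-than-expected page.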




     
    Any thoughts?
     
    - Emmanuelle
     


    From: cti@lists.oasis-open.org <cti@lists.oasis-open.org>
    On Behalf Of Allan Thomson
    Sent: Friday, October 4, 2019 3:57 PM
    To: Matt Pladna <mpladna@lookingglasscyber.com>; Bret Jordan <Bret_Jordan@symantec.com>; cti@lists.oasis-open.org
    Subject: [EXT] Re: [cti] TAXII Pagination


     
    +1
     

    Allan Thomson
    CTO ( +1-408-331-6646)

    LookingGlass Cyber Solutions
     

    From: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    on behalf of Matt Pladna <mpladna@lookingglasscyber.com>
    Date: Friday, October 4, 2019 at 12:50 PM
    To: Bret Jordan <Bret_Jordan@symantec.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Subject: Re: [cti] TAXII Pagination


     

    Thanks Bret,
     
    I like this approach and believe it's a small, flexible change that lets a client consume data in the page sizes they want, regardless of what backend the target server uses.

     
    Looking forward to feedback from others.
     
    Thanks,
     

    From: <cti@lists.oasis-open.org> on behalf of Bret Jordan <Bret_Jordan@symantec.com>
    Date: Friday, October 4, 2019 at 15:16
    To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Subject: [cti] TAXII Pagination


     


    All,


     


    In TAXII 2.1 we have a pretty good pagination solution, but it suffers from a known issue when multiple records have the same date added value. We originally tried to address this by saying that the date added value MUST have microsecond-level precision, but that is not sufficient for some.


     


    As such, I have been working with LookingGlass on a potential solution that requires the fewest changes to make this work. After many back-and-forth versions, I think we have something that might work. Please review.


     


     


    TAXII Pagination Proposal


     


    To keep things simple, for mental visualization, we will be defining the scenarios in terms of small numbers.  But one must realize that in production, these numbers will be many orders of magnitude
    larger.


     


    1 Fundamental Design Goals


    Completely stateless for the server in the true RESTful sense



    Simple way for clients to start synchronization after some point in time, without having to sync the entire collection.


    Example: A collection may have billions of records in it going back 10 years. But a client really only cares about syncing or getting data from the past 6 months.



    Need ability to paginate records where every record has its own date_added value


    Need ability to paginate records where many records may have the same date_added value


     


    2 Proposed Solution Summary




    Add a single optional property called "next" (type: string) to the TAXII Envelope
    Add a URL parameter called "next"
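
To make the two additions concrete, here is a minimal sketch of an envelope carrying the proposed "next" property and of how a client might read it. The envelope contents and the token value are placeholders, showing "more" as a JSON boolean is an assumption, and next_token is a hypothetical helper.

```python
import json

# Illustrative envelope; the object and the token value are placeholders.
ENVELOPE = json.loads("""
{
  "more": true,
  "next": "123456789",
  "objects": [{"type": "indicator"}]
}
""")


def next_token(envelope: dict):
    """Return the value to send back as the `next` URL parameter, if any."""
    if envelope.get("more") and envelope.get("next"):
        return envelope["next"]
    return None


print(next_token(ENVELOPE))                          # 123456789
print(next_token({"more": False, "objects": []}))    # None
```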


     


    3 Scenario



     


    The collection has 200 indicator records; however, the first 100 records all have the same date_added timestamp.


     


    3.1 Problem


    Our current method breaks if and only if the client has a limit of less than 100 or the server artificially limits the records to fewer than 100. Under this condition the client will not get all of the records or will have an inconsistent experience.
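
The failure can be reproduced with a small simulation. This is only a sketch under stated assumptions: the client is assumed to page by sending the last date_added it saw as the next added_after, and the in-memory RECORDS list, query, and sync are hypothetical stand-ins for a real collection, server, and client.

```python
from datetime import datetime, timedelta, timezone

T0 = datetime(2019, 10, 4, tzinfo=timezone.utc)

# 200 records: the first 100 share one date_added, the rest are 1 microsecond apart.
RECORDS = [(i, T0) for i in range(100)] + [
    (100 + i, T0 + timedelta(microseconds=i + 1)) for i in range(100)
]


def query(added_after, limit):
    """Time-based paging only: records strictly after added_after, oldest first."""
    matches = [r for r in RECORDS if r[1] > added_after]
    return matches[:limit]


def sync(limit):
    """Page through the collection using only the last date_added seen."""
    seen = []
    added_after = T0 - timedelta(days=1)
    while True:
        page = query(added_after, limit)
        if not page:
            return seen
        seen.extend(page)
        added_after = page[-1][1]


print(len(sync(20)))   # 120: 80 records sharing the first timestamp are skipped
print(len(sync(100)))  # 200: a limit of 100 happens to cover the whole group
```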


     


    3.2 Example Initial Request From Client


    ?added_after=2010-01-01T01:01:01.123456Z&limit=20


     


    3.3 Server Processes Query Request


    The server queries the datastore for records that match the rest of the request, with a record limit of 21 (the client-provided or server-imposed limit value + 1).


     




    The server checks the results to see whether 21 records were returned.

    If NO, then there are no more records that match the query, and the TAXII server can send the results in a TAXII envelope to the client:

        TAXII Envelope "more" property set to "false"
        TAXII Envelope "next" property is left empty

    If YES, then there are more records, and the server would respond with the following:

        TAXII Envelope "more" property set to "true"
        TAXII Envelope "next" property set to a string value. For a relational database this could be the index autoID, for Elasticsearch it could be the Scroll ID, for other systems it could be a cursor ID, or it could be any string (or int represented as a string), depending on the requirements of the server and the black magic it is doing in the background. The key is that it is something the server knows how to process, and the client only needs to send it back to the server in the next request to get more data.
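
The server-side steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the in-memory ROWS table, fetch, and get_page are hypothetical, the token is assumed to be an auto-increment row id, the added_after filter is left out for brevity, and returning only the first `limit` records when the extra one is present is an assumption the proposal implies but does not spell out.

```python
from typing import Optional

# Hypothetical datastore: (auto-increment id, object) rows, ordered by id.
ROWS = [(i, {"type": "indicator", "n": i}) for i in range(1, 46)]


def fetch(after_id: int, limit: int):
    """Stand-in for a datastore query returning at most `limit` rows."""
    return [row for row in ROWS if row[0] > after_id][:limit]


def get_page(limit: int, next_token: Optional[str] = None) -> dict:
    """Build an envelope for one page using the limit + 1 check."""
    after_id = int(next_token) if next_token else 0
    rows = fetch(after_id, limit + 1)   # ask for one extra record
    more = len(rows) == limit + 1       # the extra record means more pages exist
    page = rows[:limit]                 # never return the extra record
    envelope = {"more": more, "objects": [obj for _, obj in page]}
    if more:
        envelope["next"] = str(page[-1][0])  # opaque to the client
    return envelope


first = get_page(20)
print(first["more"], first["next"], len(first["objects"]))  # True 20 20
last = get_page(20, "40")
print(last["more"], len(last["objects"]))                   # False 5
```

Because the token encodes the position, the server keeps no per-client state between requests.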




     


    3.4 Example Follow On Request From Client


    ?added_after=2010-01-01T01:01:01.123456Z&limit=20&next=123456789
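
On the client side, the initial and follow-on requests differ only in the "next" parameter. The sketch below shows one way to build both query strings and loop until the envelope reports no more records. build_query and sync are hypothetical helpers, and fetch_page stands in for whatever HTTP call the client actually makes.

```python
from urllib.parse import urlencode


def build_query(added_after: str, limit: int, next_token=None) -> str:
    """Build the query string for an initial or follow-on request."""
    params = {"added_after": added_after, "limit": limit}
    if next_token is not None:
        params["next"] = next_token
    # Keep ':' unescaped so the timestamp reads as in the examples above.
    return "?" + urlencode(params, safe=":")


def sync(fetch_page, added_after: str, limit: int) -> list:
    """Collect all objects, echoing back `next` until `more` is false."""
    objects, next_token = [], None
    while True:
        envelope = fetch_page(build_query(added_after, limit, next_token))
        objects.extend(envelope.get("objects", []))
        if not envelope.get("more"):
            return objects
        next_token = envelope["next"]


print(build_query("2010-01-01T01:01:01.123456Z", 20))
print(build_query("2010-01-01T01:01:01.123456Z", 20, "123456789"))
```

The client never interprets the token; it only sends back whatever the previous envelope contained.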


     


     


    If we can verify that this does solve the issue and is still easy to implement (I believe so), this is something that we could do for TAXII 2.1, if the TC agrees. Yes, it would require another CSD and Public Review, but it would allow us to address this last known issue.


     


     


    Thoughts ????


     


    Bret


     






  • 2.  Re: [cti] [EXT] Re: [cti] TAXII Pagination

    Posted 10-10-2019 14:22



    I thought a bit more about this, and after talking with Allan, I think the limit on the server side will probably be variable, based on the size of the content. So we might need to tweak the text and make sure that max_content_length works in both directions.
     


    For example, say a server normally limits a client to 500 objects at a time, but some of those objects are really big, like a gigabyte in size. The server may need to dynamically change the limit based on the size of the objects, so a client would always need to check the envelope to see whether there are more records.
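
One way a server might vary the page size by content size is sketched below. This is only an illustration of the idea in this message: take_page and its byte budget are hypothetical, and measuring each object by its serialized JSON length is an assumption.

```python
import json


def take_page(objects: list, max_objects: int, max_bytes: int):
    """Return (page, more): stop at max_objects or when the byte budget is spent."""
    page, used = [], 0
    for obj in objects:
        size = len(json.dumps(obj).encode("utf-8"))
        # Always return at least one object so the client makes progress.
        if page and (len(page) >= max_objects or used + size > max_bytes):
            break
        page.append(obj)
        used += size
    return page, len(page) < len(objects)


objects = [{"data": "x" * 10}, {"data": "x" * 1000}, {"data": "x" * 10}]
page, more = take_page(objects, max_objects=500, max_bytes=200)
print(len(page), more)  # 1 True: the large second object ends the page early
```

Since the page may be shorter than the requested limit, the client cannot infer the end of the data from the page length and has to rely on the envelope.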


    Thoughts ????


    Bret






  • 3.  Re: [cti] RE: [EXT] Re: [cti] TAXII Pagination

    Posted 10-10-2019 15:33
    The "next" method is precisely what I was originally asking for a few months ago, so I agree with this approach. One question, however: with this solution, can one simply opt to use the "next" method and skip the time-based method entirely? This would be ideal.

    -
    Jason Keirstead
    Chief Architect - IBM Security Threat Management
    www.ibm.com/security

    "Would you like me to give you a formula for success? It's quite simple, really. Double your rate of failure." - Thomas J. Watson