CTI STIX Subcommittee

  • 1.  Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 16:22
    I've been thinking a lot about this and I think it makes sense.

    One of the concerns we had at the time we chose UUID4 is that users of libraries like python-stix would need to remember to set the UUID5 namespace -- or, if they don't and python-stix has some default namespace, different tools using the libraries could have overlapping IDs. This would also apply to users of the new Java libraries that I've seen come out. It might mean these libraries requiring that people set a unique namespace before creating any objects, vs. now where they can just go ahead and create IDs by default. I'd be curious what other people think about this problem and how we can help avoid it becoming an issue (especially given how many people use those libraries).

    John

    On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey Polzunov" <cti-stix@lists.oasis-open.org on behalf of sergey@eclecticiq.com> wrote:

    Hey everybody!

    The current STIX2 spec defines the `identifier` for STIX2 objects as follows:

    > An identifier universally and uniquely identifies a SDO, SRO, Bundle, or Marking Definition. Identifiers MUST follow the form object-type--UUIDv4, where object-type is the exact value (all type names are lowercase strings, by definition) from the type property of the object being identified or referenced and where the UUIDv4 is an RFC 4122-compliant Version 4 UUID. The UUID MUST be generated according to the algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122].

    from http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265

    I think the requirement to have UUID4 brings more problems than benefits. It makes the STIX1->STIX2 transition difficult, hurting existing STIX1 users. I will try to show it in these 2 use cases.

    Use case 1
    ----------

    Imagine that I'm a client of an intelligence provider A. I've been a client for a long time and I have received intelligence in STIX1.2, which I stored in my DB. I fetch new intelligence daily, downloading only fresh data. Often fresh data links to old objects for context.

    Provider A decides to upgrade and switch to STIX2. In addition to the old STIX1.2 feed, the provider creates a new STIX2 feed with the same data. In STIX2 all objects have new identifiers and Provider A does not bother to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have 2 options:

    - clean slate option: drop all old data from this provider and re-fetch everything. That will work if Provider A is the only provider I use or if I never referenced Provider A's data from my own intelligence. Not a great plan.
    - new era option: leave my STIX1.2 data graph in place and start consuming the new STIX2 feed from today. This option has one big issue: new STIX2 data will not be connected to the STIX1.2 data I already have, because the STIX2 ids are all different. If I want to deduce the connections, I need to deduplicate the data against my existing STIX1.2 DB. This means my ingestion pipeline must be smart enough to compare STIX1.2 objects to STIX2 objects and be fast enough to do that for every new STIX2 object. This will be difficult to implement and will have a huge performance penalty.

    Use case 2
    ----------

    Imagine that I'm an NCSC. I receive intelligence from providers, combine it and distribute it to my clients. My providers are still on STIX1.2 but my clients want STIX2, so I must convert the STIX1.2 I receive into STIX2. Full STIX1.2 entities I can transform easily, but what do I do with the IDREFs I have in my STIX1.2 data?

    I can generate a new STIX2 id every time I see a new STIX1.2 IDREF in incoming data and store the STIX1.2->STIX2 mapping somewhere to be used the next time I see this IDREF. This is painful and will require additional resources, but it is doable. But it will only work until the moment my providers switch to STIX2 and start sending me full objects for those IDREFs with new random STIX2 identifiers! I cannot predict these identifiers and I can't match them with the ones I generated. So my thinking is - what is the point in even bothering with old IDREFs? I will just drop them, sometimes sending my clients disconnected STIX2 entities, hoping that they will figure it out.

    Proposed solutions if UUID5 is allowed in STIX2 identifiers:

    Use case 1 solution
    -------------------

    There can be a guideline that recommends that providers use old STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are predictable I, as a client, can greatly simplify my deduplication logic. I can run a DB migration once to calculate STIX2 identifiers for all my STIX1.2 objects and use these on ingestion for deduplication. Appending STIX2 data to my STIX1.2 DB will be much easier.

    I'm also interested in pushing Provider A to adopt this STIX2 identifier generation practice because it will save me money.

    Use case 2 solution
    -------------------

    With UUID5 I have a way out: I can generate new STIX2 ids from old STIX1.2 ids! I can parse the IDREF value, which looks like `[ns prefix]:[construct type]-[GUID]`, and use the provider's namespace / construct type to build a new STIX2 identifier. The logic will be like this:

    - the full IDREF will be the input for the UUID5 function
    - for STIX1.2 types that were split (like TTP), I do not know the exact STIX2 type the Provider would use for an old TTP. My solution here would be to play safe and create relations for all possible types: for an IDREF to a TTP, I will create 4 relations: one to a possible Tool object, one to Malware, one to Attack Pattern and one to Identity. It is an overhead but it is a small price for keeping the intelligence graph interconnected.

    Again, when the time comes and my providers move to STIX2, I'm interested in pushing them to adopt this id generation schema for old objects, because it will save me, as an NCSC, money.
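
    The IDREF handling described for use case 2 could be sketched roughly as below. This is only an illustration under assumptions: the function name `candidate_stix2_ids`, the regular expression, and the `CANDIDATE_TYPES` table are hypothetical and not taken from the linked gist or from any library; only the TTP entry follows the four candidate types named in the message.

```python
import re
import uuid

# Assumption: the producer namespace is derived from NAMESPACE_URL and the
# producer's URL, as suggested in the proposal.
PRODUCER_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://eclecticiq.com/ns")

# Hypothetical lookup from a STIX1.2 construct type to candidate STIX2 types.
# The "ttp" entry mirrors the four types listed in the message; the other
# entries are illustrative only.
CANDIDATE_TYPES = {
    "ttp": ["tool", "malware", "attack-pattern", "identity"],
    "threat-actor": ["threat-actor"],
    "indicator": ["indicator"],
}

# Matches `[ns prefix]:[construct type]-[GUID]`.
IDREF_RE = re.compile(
    r"^(?P<prefix>[^:]+):(?P<construct>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})$"
)


def candidate_stix2_ids(idref):
    """Return every STIX2 id that a STIX1.2 IDREF might correspond to."""
    match = IDREF_RE.match(idref)
    if match is None:
        raise ValueError("unrecognised IDREF: {!r}".format(idref))
    construct = match.group("construct").lower()
    # The full IDREF string is the UUID5 name, so the result is reproducible.
    stix2_uuid = uuid.uuid5(PRODUCER_NS, idref)
    # Assumption: unknown construct types keep their name unchanged.
    stix2_types = CANDIDATE_TYPES.get(construct, [construct])
    return ["{}--{}".format(t, stix2_uuid) for t in stix2_types]
```

    For a TTP IDREF this returns four ids that share one UUID and differ only in the type prefix, which is what allows a relation to be created for each possible target type.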
    To reiterate, I would like to propose:

    - a change in the STIX2 spec to allow both UUID5 and UUID4 to be used in an identifier of SDO, SRO, MarkingDefinition and Custom Object entities;
    - creating a guideline, complementary to the spec, that would explain how STIX1.2 ids can be transformed into STIX2 ids for an easier transition.

    Practicalities:

    UUID5 ids require the use of a namespace. The UUID5 RFC (https://tools.ietf.org/html/rfc4122#section-4.3) defines some generic namespaces (https://tools.ietf.org/html/rfc4122#appendix-C) but does not prohibit the use of custom ones. I suggest this algorithm:

    - the namespace UUID5 is generated by using the predefined `NameSpace_URL` namespace and the producer's URL;
    - for old objects, the GUID part of the STIX2 identifier is a namespaced UUID5 generated from the old STIX1.2 id;
    - for new objects, the GUID part of the STIX2 identifier is either a namespaced UUID5 with a random UUID4 string, or just a random UUID4.

    Example python code for generating UUID5 with custom namespace:

        In [1]: import uuid
           ...:
           ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
           ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
           ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
           ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
           ...:
           ...: print("new STIX2 id: {}".format(stix2_id))

        new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e

    BONUS: python functions to convert STIX1.2 IDREFs into STIX2 identifiers - https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5

    Thank you,
    Sergey Polzunov
    EclecticIQ
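
    The three rules of the suggested algorithm could be wrapped into helpers along these lines. This is a sketch, not part of the proposal text: the names `producer_namespace` and `make_stix2_id` and the `namespaced_new` flag are made up here for illustration.

```python
import uuid


def producer_namespace(producer_url):
    """Rule 1: derive the producer namespace from NAMESPACE_URL and a URL."""
    return uuid.uuid5(uuid.NAMESPACE_URL, producer_url)


def make_stix2_id(object_type, namespace, stix12_id=None, namespaced_new=False):
    """Build a STIX2 identifier for a converted or a brand-new object."""
    if stix12_id is not None:
        # Rule 2: old objects get a reproducible UUID5 of the old STIX1.2 id.
        object_uuid = uuid.uuid5(namespace, stix12_id)
    elif namespaced_new:
        # Rule 3, first option: a UUID5 whose name is a random UUID4 string.
        object_uuid = uuid.uuid5(namespace, str(uuid.uuid4()))
    else:
        # Rule 3, second option: a plain random UUID4.
        object_uuid = uuid.uuid4()
    return "{}--{}".format(object_type, object_uuid)


ns = producer_namespace("https://eclecticiq.com/ns")
converted = make_stix2_id(
    "threat-actor", ns,
    stix12_id="eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473",
)
fresh = make_stix2_id("indicator", ns)
```

    Converting the same STIX1.2 id twice yields the same identifier, while every call for a new object yields a different one.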


  • 2.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 18:08
    I would also support this.

    I have learned more about the inner workings of UUID4/5 and I don't have
    any reservations about it anymore. The odds of collision with a
    properly-implemented UUID5 are on par with UUID4.

    As far as John's comment goes - all this means IMHO is the library has to
    force you to provide a namespace (i.e. make it a mandatory argument in
    your constructor or whatever).

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security
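
    One way a library might force callers to provide a namespace is to make it a required constructor argument with no default. The class below is purely hypothetical; it is not the API of python-stix or of any existing STIX library.

```python
import uuid


class IdGenerator:
    """Hypothetical id factory that cannot be built without a namespace."""

    def __init__(self, namespace):
        # No default value: omitting the argument fails at construction time
        # instead of silently falling back to a shared namespace.
        if not isinstance(namespace, uuid.UUID):
            raise TypeError("namespace must be a uuid.UUID")
        self._namespace = namespace

    def new_id(self, object_type, name):
        """Return a namespaced UUID5 identifier for the given name."""
        return "{}--{}".format(object_type, uuid.uuid5(self._namespace, name))


tool_a = IdGenerator(uuid.uuid5(uuid.NAMESPACE_URL, "https://a.example/ns"))
tool_b = IdGenerator(uuid.uuid5(uuid.NAMESPACE_URL, "https://b.example/ns"))
```

    Two tools configured with different namespaces then produce different identifiers for the same input name, so their IDs do not overlap by accident.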



  • 3.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 20:11
    Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400:
    > I would also support this.
    >
    > I have learned more about the inner workings of UUID4/5 and I don't have
    > any reservations about it anymore. The odds of collision with a
    > properly-implemented UUID5 are on-par with UUID4
    >
    > As far as John's comment below - all this means IMHO is the library has to
    > force you to provide a namespace (ie make it a mandatory argument in your
    > constructor or whatever).

    The one requirement I would like to make sure about UUIDv5 is that it is NOT based upon the data from the object, otherwise versioning will break. The reason we didn't use UUIDv5 is that most of the proposals to use it were to make it a hash of the contents, such as name and description, and then update the UUID whenever the name and/or description changed.

    If we do this, the namespace should probably be the identity of the new STIX2 object. This would prevent collisions from happening when two entities try to create a "new" STIX2 object from a STIX1 object...

    --
    John-Mark
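
    One possible reading of this suggestion is sketched below: the UUID part of the creator's identity id serves as the UUID5 namespace, and the old STIX1 id (never the object's content) serves as the name. The helper names and the two identity ids are hypothetical placeholders.

```python
import uuid


def namespace_from_identity(identity_id):
    """Reuse the UUID part of an `identity--<uuid>` id as the namespace."""
    return uuid.UUID(identity_id.split("--", 1)[1])


def convert_id(object_type, identity_id, stix1_id):
    """Derive a STIX2 id from the converter's identity and the STIX1 id.

    Name, description and other mutable properties are deliberately not
    inputs, so later versions of the object keep the same id.
    """
    namespace = namespace_from_identity(identity_id)
    return "{}--{}".format(object_type, uuid.uuid5(namespace, stix1_id))


# Hypothetical identities of two organisations converting the same object.
ORG_A = "identity--11111111-1111-4111-8111-111111111111"
ORG_B = "identity--22222222-2222-4222-8222-222222222222"
STIX1_ID = "eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473"

id_from_a = convert_id("threat-actor", ORG_A, STIX1_ID)
id_from_b = convert_id("threat-actor", ORG_B, STIX1_ID)
```

    Here the two organisations end up with different ids for the same STIX1 object, and neither id changes when the object's name or description is revised.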


  • 4.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 20:20
    I would think we would want to use a DNS
    or URL namespace, would we not? - Jason Keirstead Lead Architect - IBM Security Connect www.ibm.com/security "Things may come to those who wait, but only the things left by those
    who hustle." - Unknown From:      
      John-Mark Gurney <jmg@newcontext.com> To:      
      Jason Keirstead <Jason.Keirstead@ca.ibm.com> Cc:      
      "Wunder, John
    A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org"
    <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com> Date:      
      02/04/2019 04:10 PM Subject:    
        Re: [cti-stix]
    Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier Jason Keirstead wrote this message on Mon, Feb 04,
    2019 at 14:08 -0400: > I would also support this. > > I have learned more about the inner workings of UUID4/5 and I don't
    have > any reservations about it anymore. The odds of collision with a > properly-implemented UUID5 are on-par with UUID4 > > As far as John's comment below - all this means IMHO is the library
    has to > force you to provide a namespace (ie make it a mandatory argument
    in your > constructor or whatever). The one requirement I would like to make sure about UUIDv5 is that it is NOT based upon the data from the object, otherwise versioning will break. The reason we didn't use UUIDc4 as most of the proposals to use it was to make it a hash of the contents, such as name and description, and then update the UUID whenever the name and/or description changed.. If we do this, the name space should probably be the identity of the new STIX2 object.  This would prevent collisions from happening when two entities try to create a "new" STIX2 object from a STIX1
    object... > From:   "Wunder, John A." <jwunder@mitre.org> > To:     Sergey Polzunov <sergey@eclecticiq.com>, > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> > Date:   02/04/2019 12:22 PM > Subject:        [cti-stix] Re: [EXT] [cti-stix]
    ability to use UUID5 in > STIX2 identifier > Sent by:        <cti-stix@lists.oasis-open.org> > > > > I've been thinking a lot about this and I think it makes sense. > > One of the concerns we had at the time we chose UUID4 is that users
    of > libraries like python-stix would need to remember to set the UUID5
    > namespace -- or, if they don't and python-stix has some default namespace,
    > different tools using the libraries could have overlapping IDs. This
    would > also apply to users of the new Java libraries that I've seen come
    out. It > might mean these libraries requiring that people set a unique namespace
    > before creating any objects, vs. now where it can just go ahead and
    create > IDs by default. I'd be curious what other people think about this
    problem > and how we can help avoid it becoming an issue (especially given how
    many > people use those libraries). > > John > > ïOn 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf
    of Sergey > Polzunov" <cti-stix@lists.oasis-open.org on behalf of > sergey@eclecticiq.com> wrote: > >     Hey everybody! >   >     Current STIX2 spec definition of an`identifier` for
    STIX2 objects is > as follows: >   >     > An identifier universally and uniquely identifies
    a SDO, SRO, > Bundle, or Marking Definition. Identifiers MUST follow the form > object-type--UUIDv4, where object-type is the exact value (all type
    names > are lowercase strings, by definition) from the type property of the
    object > being identified or referenced and where the UUIDv4 is an RFC > 4122-compliant Version 4 UUID. The UUID MUST be generated according
    to the > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122]. >     from > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265 > >   >     I think the requirement to have UUID4 brings more problems
    than > benefits. It makes STIX1->STIX2 transition difficult, hurting existing
    > STIX1 users. >     I will try to show it in these 2 use cases. >   >   >     Use case 1 >     ---------- >         Imagine that I'm a client of an intelligence
    provider A. I've been a client for a long time and I have received intelligence
    in STIX1.2, which I stored in my DB. I fetch new intelligence daily, downloading
    only fresh data. Often fresh data links to old objects for context.

    Provider A decides to upgrade and switch to STIX2. In addition to the old
    STIX1.2 feed, the provider creates a new STIX2 feed with the same data. In STIX2
    all objects have new identifiers and Provider A does not bother to supply a
    mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have 2 options:

    - clean slate option: drop all old data from this provider and re-fetch
      everything. That will work if Provider A is the only provider I use or if I
      never referenced Provider A's data from my own intelligence. Not a great plan.
    - new era option: leave my STIX1.2 data graph in place and start consuming the
      new STIX2 feed from today. This option has one big issue: new STIX2 data will
      not be connected to the STIX1.2 data I already have, because STIX2 ids are all
      different. If I want to deduce connections, I need to deduplicate the data
      against my existing STIX1.2 DB. This means my ingestion pipeline must be smart
      enough to compare STIX1.2 objects to STIX2 objects and fast enough to do that
      for every new STIX2 object. This will be difficult to implement and will have
      a huge performance penalty.

    Use case 2
    ----------
    Imagine that I'm an NCSC. I receive intelligence
    from providers, combine it and distribute it to my clients. My providers are
    still on STIX1.2 but my clients want STIX2, so I must convert the STIX1.2 I
    receive into STIX2. Full STIX1.2 entities I can transform easily, but what do I
    do with the IDREFs I have in my STIX1.2 data?

    I can generate a new STIX2 id every time I see a new STIX1.2 IDREF in incoming
    data and store the STIX1.2->STIX2 mapping somewhere to be used next time I see
    this IDREF. This is painful and will require additional resources, but it is
    doable. But it will only work until the moment my providers switch to STIX2 and
    start sending me full objects for those IDREFs with new random STIX2
    identifiers! I cannot predict these identifiers and I can't match them with the
    ones I generated. So my thinking is - what is the point in even bothering with
    old IDREFs? I will just drop them, sending my clients sometimes disconnected
    STIX2 entities, hoping that they will figure it out.


    Proposed solutions if UUID5 is allowed in STIX2 identifiers:

    Use case 1 solution
    -------------------
    There can be a guideline that will recommend
    providers to use old STIX1.2 IDs as input for new STIX2 identifiers. If STIX2
    identifiers are predictable, I, as a client, can greatly simplify my
    deduplication logic. I can run a DB migration once to calculate STIX2
    identifiers for all my STIX1.2 objects and use these on ingestion for
    deduplication. Appending STIX2 data to my STIX1.2 DB will be much easier.

    I'm also interested in pushing Provider A to adopt this STIX2 identifier
    generation practice because it will save me money.

    Use case 2 solution
    -------------------
    With UUID5 I have a way out: I can generate new STIX2
    ids from old STIX1.2 ids! I can parse the IDREF value, which looks like
    `[ns prefix]:[construct type]-[GUID]`, and use the provider's namespace /
    construct type to build a new STIX2 identifier. The logic will be like this:

    - the full IDREF will be the input for the UUID5 function
    - for STIX1.2 types that were split (like TTP), I do not know the exact STIX2
      type the Provider would use for an old TTP. My solution here would be to play
      safe and create relations for all possible types: for an IDREF to a TTP, I
      will create 4 relations: one to a possible Tool object, one to Malware, one
      to Attack Pattern and one to Identity. It is an overhead but it is a small
      price for keeping an interconnected intelligence graph.

    Again, when the time comes and my providers move to STIX2, I'm interested in
    pushing them to adopt this id generation schema for old objects, because it
    will save me, as an NCSC, money.


    To reiterate, I would like to propose:

    - a change in STIX2 spec to allow both UUID5 and UUID4
      to be used in an identifier of SDO, SRO, MarkingDefinition and Custom Object
      entities;
    - creating a guideline, complementary to the spec, that would explain how
      STIX1.2 ids can be transformed into STIX2 for easier transition.


    Practicalities:

    UUID5 ids require the use of a namespace. The UUID5 RFC
    (https://tools.ietf.org/html/rfc4122#section-4.3) defines some generic
    namespaces (https://tools.ietf.org/html/rfc4122#appendix-C) but does not
    prohibit the use of custom ones. I suggest this algorithm:

    - the namespace UUID5 is generated by using the predefined `NameSpace_URL`
      namespace and the producer's URL;
    - for old objects, the GUID part of the STIX2 identifier is a namespaced UUID5
      generated from the old STIX1.2 id;
    - for new objects, the GUID part of the STIX2 identifier is either a namespaced
      UUID5 with a random UUID4 string, or just a random UUID4.

    Example python code for generating UUID5 with custom
    namespace:

        In [1]: import uuid
           ...:
           ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
           ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
           ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
           ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
           ...:
           ...: print("new STIX2 id: {}".format(stix2_id))

        new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e


    BONUS: python functions to convert STIX1.2 IDREFs into STIX2 identifiers -
    https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5


    Thank you,
    Sergey Polzunov
    EclecticIQ

    -- John-Mark