CTI STIX Subcommittee


Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

  • 1.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 20:30




    It is to assist in semantic equivalence normalization across producers, just like we have done for SCOs.
     
    The reality is that objects such as Locations, Identities, etc., are likely to be repeated and used about as widely as many Observables.
    The ability to inherently converge on the equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.
     

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262

    E: sean.barnum@fireeye.com
     

    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:25 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    Actually, I don't agree with this part.

    The entire point of UUIDv5 is that I should not care what method you use to compute your IDs - because it's in your namespace, so it's not my problem anymore. I don't think we want to codify it.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>,
    Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:23 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>






    Agree.
     
    And I would suggest we DO want the calculation to be based on the data from the object, as that is how we get value from such an ID.
    It should just not be done on ALL properties of the object.
    We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs.
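    A minimal Python sketch of the property-based approach, under loud assumptions: SHARED_NAMESPACE is a made-up namespace that every producer would have to agree on, ID_PROPERTIES is a hypothetical choice of "semantically relevant" properties, and json.dumps with sorted keys stands in for whatever canonicalization a specification would actually define.

```python
import json
import uuid

# Hypothetical namespace shared by ALL producers; IDs only converge if everyone uses it.
SHARED_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/stix/semantic-ids")

# Hypothetical per-type list of ID-contributing ("semantically relevant") properties.
ID_PROPERTIES = {
    "location": ("country", "region", "city"),
    "identity": ("name", "identity_class"),
}


def semantic_id(obj: dict) -> str:
    """Derive a STIX-style ID from a selected subset of the object's properties."""
    obj_type = obj["type"]
    # Keep only the selected properties that are actually present on the object.
    selected = {k: obj[k] for k in ID_PROPERTIES[obj_type] if k in obj}
    # Simplified canonical form: sorted keys, no insignificant whitespace.
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return f"{obj_type}--{uuid.uuid5(SHARED_NAMESPACE, canonical)}"


# Two producers describe the same city with different descriptions.
a = {"type": "location", "country": "NL", "city": "Amsterdam", "description": "HQ"}
b = {"type": "location", "country": "NL", "city": "Amsterdam", "description": "Office"}
print(semantic_id(a) == semantic_id(b))  # True
```

    Because description is not an ID-contributing property in this sketch, editing it leaves the ID unchanged; changing any selected property yields a different ID.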
     
     
    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:20 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
     
    I would think we would want to use a DNS or URL namespace, would we not?
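    A sketch of a producer-scoped DNS namespace, assuming each producer controls a domain name (the domains below are hypothetical). Any producer-local key can then be mapped into that namespace, and consumers treat the resulting ID as opaque.

```python
import uuid


def producer_namespace(domain: str) -> uuid.UUID:
    """Derive a producer-specific namespace from a DNS name the producer controls."""
    return uuid.uuid5(uuid.NAMESPACE_DNS, domain)


def producer_id(obj_type: str, domain: str, local_key: str) -> str:
    """Map any producer-local key (database key, STIX 1.x id, ...) to a STIX-style ID."""
    return f"{obj_type}--{uuid.uuid5(producer_namespace(domain), local_key)}"


# Same local key, different producers: different IDs, so no cross-producer collision.
id_a = producer_id("indicator", "producer-a.example", "row-42")
id_b = producer_id("indicator", "producer-b.example", "row-42")
print(id_a != id_b)  # True
```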






    From:         John-Mark Gurney <jmg@newcontext.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:10 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier








    Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400:
    > I would also support this.
    >
    > I have learned more about the inner workings of UUID4/5 and I don't have
    > any reservations about it anymore. The odds of collision with a
    > properly implemented UUID5 are on par with UUID4.
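    A rough check of the collision claim, assuming SHA-1 output behaves like random bits for distinct names: both UUID versions fix 4 version bits and 2 variant bits, leaving 122 variable bits, so the same birthday approximation applies to both. UUIDv5 maps an identical (namespace, name) pair to the same UUID by design; that is intended equality, not a collision.

```python
import math
import uuid

# Both v4 and v5 reserve 4 version bits and 2 variant bits out of 128.
FREE_BITS = 128 - 4 - 2


def collision_probability(n: int, bits: int = FREE_BITS) -> float:
    """Birthday approximation: p ~= 1 - exp(-n^2 / 2^(bits + 1))."""
    return -math.expm1(-(n * n) / 2.0 ** (bits + 1))


v4 = uuid.uuid4()
v5 = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/some-object")  # hypothetical name
print(v4.version, v5.version)                      # 4 5
print(v4.variant == v5.variant == uuid.RFC_4122)   # True
print(collision_probability(10**9))                # about 9.4e-20
```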
    >
    > As far as John's comment below - all this means IMHO is the library has to
    > force you to provide a namespace (i.e. make it a mandatory argument in your
    > constructor or whatever).
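    One way a library could make the namespace mandatory, sketched in Python; IdFactory and make_id are hypothetical names, not part of python-stix or any existing library.

```python
import uuid


class IdFactory:
    """Hypothetical ID generator that has no library-wide default namespace."""

    def __init__(self, namespace: uuid.UUID):
        # The namespace is a required argument: forgetting it fails at
        # construction time instead of silently sharing a default with other tools.
        if not isinstance(namespace, uuid.UUID):
            raise TypeError("namespace must be a uuid.UUID")
        self.namespace = namespace

    def make_id(self, obj_type: str, name: str) -> str:
        """Deterministic UUIDv5 ID scoped to this factory's namespace."""
        return f"{obj_type}--{uuid.uuid5(self.namespace, name)}"


factory = IdFactory(uuid.uuid5(uuid.NAMESPACE_DNS, "tool-a.example"))  # hypothetical domain
print(factory.make_id("malware", "example-name"))

try:
    IdFactory()  # no namespace given: Python raises TypeError for the missing argument
except TypeError:
    print("refused: a namespace is required")
```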

    The one requirement I would like to make sure about UUIDv5 is that it
    is NOT based upon the data from the object, otherwise versioning will
    break.

    The reason we didn't use UUIDv5 is that most of the proposals to use it
    were to make it a hash of the contents, such as name and description, and
    then update the UUID whenever the name and/or description changed.
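    A small Python illustration of the versioning concern, with a hypothetical producer namespace and object: hashing mutable content changes the ID when a new version edits the description, while hashing a stable key does not.

```python
import uuid

NS = uuid.uuid5(uuid.NAMESPACE_DNS, "producer-a.example")  # hypothetical producer namespace


def content_id(obj: dict) -> str:
    # ID derived from mutable content (name + description).
    return f"{obj['type']}--{uuid.uuid5(NS, obj['name'] + '|' + obj['description'])}"


def stable_id(obj_type: str, source_key: str) -> str:
    # ID derived from a key that never changes between versions (e.g. a STIX 1.x id).
    return f"{obj_type}--{uuid.uuid5(NS, source_key)}"


v1 = {"type": "malware", "name": "BadThing", "description": "first draft"}
v2 = dict(v1, description="revised analysis")  # a new version of the same object

print(content_id(v1) == content_id(v2))  # False: the update looks like a brand-new object
print(stable_id("malware", "example:ttp-123") == stable_id("malware", "example:ttp-123"))  # True
```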

    If we do this, the namespace should probably be the identity of the
    new STIX2 object. This would prevent collisions from happening when
    two entities try to create a "new" STIX2 object from a STIX1 object...
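    A sketch of one reading of this suggestion, taking "identity" to mean the Identity object of the organisation doing the conversion and using its UUID as the UUIDv5 namespace (the identities and the STIX1 id below are hypothetical). Two organisations converting the same STIX1 object then get different IDs.

```python
import uuid


def converted_id(obj_type: str, creator_identity_id: str, stix1_id: str) -> str:
    """Use the UUID part of the creator's 'identity--<uuid>' ID as the UUIDv5 namespace."""
    namespace = uuid.UUID(creator_identity_id.split("--", 1)[1])
    return f"{obj_type}--{uuid.uuid5(namespace, stix1_id)}"


org_a = f"identity--{uuid.uuid4()}"  # hypothetical converting organisation A
org_b = f"identity--{uuid.uuid4()}"  # hypothetical converting organisation B

same_source = "example:indicator-1"  # hypothetical STIX1 id
print(converted_id("indicator", org_a, same_source) ==
      converted_id("indicator", org_b, same_source))  # False
```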

    > From:   "Wunder, John A." <jwunder@mitre.org>
    > To:     Sergey Polzunov <sergey@eclecticiq.com>,
    > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    > Date:   02/04/2019 12:22 PM
    > Subject:        [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in
    > STIX2 identifier
    > Sent by:        <cti-stix@lists.oasis-open.org>
    >
    >
    >
    > I've been thinking a lot about this and I think it makes sense.
    >
    > One of the concerns we had at the time we chose UUID4 is that users of
    > libraries like python-stix would need to remember to set the UUID5
    > namespace -- or, if they don't and python-stix has some default namespace,
    > different tools using the libraries could have overlapping IDs. This would
    > also apply to users of the new Java libraries that I've seen come out. It
    > might mean these libraries require people to set a unique namespace
    > before creating any objects, vs. now, where they can just go ahead and create
    > IDs by default. I'd be curious what other people think about this problem
    > and how we can help avoid it becoming an issue (especially given how many
    > people use those libraries).
    >
    > John
    >
    > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey
    > Polzunov" <cti-stix@lists.oasis-open.org on behalf of
    > sergey@eclecticiq.com> wrote:
    >
    >     Hey everybody!
    >  
    >     Current STIX2 spec definition of an `identifier` for STIX2 objects is
    > as follows:
    >  
    >     > An identifier universally and uniquely identifies a SDO, SRO,
    > Bundle, or Marking Definition. Identifiers MUST follow the form
    > object-type--UUIDv4, where object-type is the exact value (all type names
    > are lowercase strings, by definition) from the type property of the object
    > being identified or referenced and where the UUIDv4 is an RFC
    > 4122-compliant Version 4 UUID. The UUID MUST be generated according to the
    > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122].
    >     from
    > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265
    >
    >  
    >     I think the requirement to have UUID4 brings more problems than
    > benefits. It makes STIX1->STIX2 transition difficult, hurting existing
    > STIX1 users.
    >     I will try to show it in these 2 use cases.
    >  
    >  
    >     Use case 1
    >     ----------
    >         Imagine that I'm a client of an intelligence provider A. I've been
    > a client for a long time and I have received intelligence in STIX1.2,
    > which I stored in my DB. I fetch new intelligence daily, downloading only
    > fresh data. Often fresh data links to old objects for context.
    >         Provider A decides to upgrade and switch to STIX2. In addition to
    > its old STIX1.2 feed, the provider creates a new STIX2 feed with the same data.
    > In STIX2 all objects have new identifiers and Provider A does not bother
    > to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have
    > 2 options:
    >         - clean slate option: drop all old data from this provider and
    > re-fetch everything. That will work if Provider A is the only provider I
    > use or if I never referenced Provider A's data from my own intelligence.
    > Not a great plan.
    >         - new era option: leave my STIX1.2 data graph in place and start
    > consuming new STIX2 feed from today. This option has one big issue: new
    > STIX2 data will not be connected to STIX1.2 data I already have, because
    > STIX2 ids are all different. If I want to deduce the connections, I need to
    > deduplicate the data against my existing STIX1.2 DB. This means my
    > ingestion pipeline must be smart enough to compare STIX1.2 objects to
    > STIX2 objects and be fast enough to do that for every new STIX2 object.
    > This will be difficult to implement and will have a huge performance
    > penalty.
    >  
    >     Use case 2
    >     ----------
    >         Imagine that I'm an NCSC. I receive intelligence from providers,
    > combine it and distribute it to my clients. My providers are still on
    > STIX1.2 but my clients want STIX2, so I must convert the STIX1.2 I receive
    > into STIX2. Full STIX1.2 entities I can transform easily, but what do I do
    > with the IDREFs I have in my STIX1.2 data?
    >         I can generate a new STIX2 id every time I see a new STIX1.2 IDREF in
    > incoming data and store the STIX1.2->STIX2 mapping somewhere to be used the next
    > time I see this IDREF. This is painful and will require additional
    > resources, but it is doable. But it will only work until the moment my
    > providers switch to STIX2 and start sending me full objects for those
    > IDREFs with new random STIX2 identifiers! I cannot predict these
    > identifiers and I can't match them with the ones I generated. So my
    > thinking is - what is the point in even bothering with old IDREFs? I will
    > just drop them, sending my clients sometimes disconnected STIX2 entities,
    > hoping that they will figure it out.
    >  
    >  
    >     Proposed solutions if UUID5 is allowed in STIX2 identifiers:
    >  
    >     Use case 1 solution
    >     -------------------
    >         There could be a guideline recommending that providers use old
    > STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are
    > predictable, I, as a client, can greatly simplify my deduplication logic. I
    > can run a DB migration once to calculate STIX2 identifiers for all my
    > STIX1.2 objects and use these on ingestion for deduplication. Appending
    > STIX2 data to my STIX1.2 DB will be much easier.
    >         I'm also interested in pushing Provider A to adopt this STIX2
    > identifier generation practice because it will save me money.
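    A minimal sketch of the migration-plus-deduplication step, assuming the provider follows the guideline with the namespace shown here (a hypothetical URL) and that each stored STIX 1.2 object's STIX2 type is known. Deduplication on ingestion then becomes a dictionary lookup instead of a content comparison.

```python
import uuid

# Hypothetical provider namespace, built the way the proposal suggests.
PROVIDER_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://provider-a.example/ns")


def stix2_id_for(stix2_type: str, stix12_id: str) -> str:
    return f"{stix2_type}--{uuid.uuid5(PROVIDER_NS, stix12_id)}"


# One-off migration: compute the STIX2 ID of every stored STIX 1.2 object.
stored_stix12 = {
    "provider-a:threat-actor-1": "threat-actor",  # hypothetical stored ids -> STIX2 type
    "provider-a:indicator-7": "indicator",
}
migrated = {stix2_id_for(t, old): old for old, t in stored_stix12.items()}


def ingest(stix2_obj: dict) -> str:
    """Return the STIX 1.2 id this incoming object duplicates, or 'new'."""
    return migrated.get(stix2_obj["id"], "new")


incoming = {"type": "indicator", "id": stix2_id_for("indicator", "provider-a:indicator-7")}
print(ingest(incoming))  # provider-a:indicator-7
```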
    >  
    >     Use case 2 solution
    >     -------------------
    >     With UUID5 I have a way out: I can generate new STIX2 ids from old
    > STIX1.2 ids! I can parse the IDREF value, which looks like `[ns
    > prefix]:[construct type]-[GUID]`, and use the provider's namespace / construct
    > type to build a new STIX2 identifier. The logic will be like this:
    >         - the full IDREF will be the input for the UUID5 function
    >         - for STIX1.2 types that were split (like TTP), I do not know the
    > exact STIX2 type the provider would use for an old TTP. My solution here would be
    > to play safe and create relations for all possible types: for IDREF to
    > TTP, I will create 4 relations: one to a possible Tool object, one to
    > Malware, one to Attack Pattern and one to Identity. It is an overhead but
    > it is a small price for keeping interconnected intelligence graph.
    >         Again, when time comes and my providers move to STIX2, I'm
    > interested in pushing them to adopt this id generation schema for old
    > objects, because it will save me, as NCSC, money.
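    The parsing and fan-out steps could look roughly like this in Python. The regular expression assumes the GUID part is a hyphenated 36-character GUID, and PRODUCER_URLS and STIX2_CANDIDATES are hypothetical lookup tables; this is an illustration, not the code from Sergey's gist.

```python
import re
import uuid

# [ns prefix]:[construct type]-[GUID]; the construct type may itself contain hyphens.
IDREF_RE = re.compile(
    r"^(?P<prefix>[^:]+):(?P<construct>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})$"
)

# Hypothetical lookup tables; a real converter would need complete ones.
PRODUCER_URLS = {"eclecticiq": "https://eclecticiq.com/ns"}
STIX2_CANDIDATES = {
    "threat-actor": ["threat-actor"],
    "ttp": ["tool", "malware", "attack-pattern", "identity"],  # TTP was split in STIX2
}


def candidate_ids(idref: str) -> list:
    """Return every STIX2 ID the referenced STIX 1.2 object could end up with."""
    match = IDREF_RE.match(idref)
    if match is None:
        raise ValueError(f"not a [prefix]:[type]-[GUID] IDREF: {idref}")
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, PRODUCER_URLS[match["prefix"]])
    guid_part = uuid.uuid5(namespace, idref)  # the full IDREF is the UUIDv5 name
    return [f"{t}--{guid_part}" for t in STIX2_CANDIDATES[match["construct"].lower()]]


for stix2_id in candidate_ids("eclecticiq:ttp-6b2a1f0e-3c4d-4e5f-8a9b-0c1d2e3f4a5b"):
    print(stix2_id)  # one candidate each for tool, malware, attack-pattern, identity
```

    All four candidates share the same UUID part and differ only in the type prefix, so one relationship per candidate type covers whichever type the provider eventually picks.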
    >  
    >  
    >     To reiterate, I would like to propose:
    >     - a change in STIX2 spec to allow both UUID5 and UUID4 to be used in
    > an identifier of SDO, SRO, MarkingDefinition and Custom Object entities;
    >     - creating a guideline, complementary to the spec, that would explain
    > how STIX1.2 ids can be transformed into STIX2 for easier transition.
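    If the specification were changed as proposed, a validator would need to accept either version. A sketch, assuming the simplified TYPE_RE is good enough for lowercase type names (it is not the specification's exact grammar):

```python
import re
import uuid

TYPE_RE = re.compile(r"^[a-z0-9][a-z0-9-]*$")  # simplified lowercase type-name check


def is_valid_identifier(identifier: str, allowed_versions=(4, 5)) -> bool:
    """Check 'object-type--UUID' where the UUID is an RFC 4122 UUID of an allowed version."""
    obj_type, sep, uuid_part = identifier.partition("--")
    if not sep or not TYPE_RE.match(obj_type):
        return False
    try:
        value = uuid.UUID(uuid_part)
    except ValueError:
        return False
    # Require the RFC 4122 variant, an allowed version, and the canonical hyphenated form.
    return (value.variant == uuid.RFC_4122
            and value.version in allowed_versions
            and str(value) == uuid_part.lower())


print(is_valid_identifier(f"indicator--{uuid.uuid4()}"))                                  # True
print(is_valid_identifier(f"threat-actor--{uuid.uuid5(uuid.NAMESPACE_URL, 'x')}"))        # True
print(is_valid_identifier(f"indicator--{uuid.uuid5(uuid.NAMESPACE_URL, 'x')}", (4,)))     # False
```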
    >  
    >  
    >     Practicalities:
    >  
    >     UUID5 ids require use of a namespace. UUID5 RFC (
    > https://tools.ietf.org/html/rfc4122#section-4.3
    > ) defines some generic namespaces (
    > https://tools.ietf.org/html/rfc4122#appendix-C
    > ) but does not prohibit the use of custom ones. I suggest this algorithm:
    >         - namespace UUID5 is generated by using predefined `NameSpace_URL`
    > namespace and producer's URL;
    >         - for old objects, GUID part of STIX2 identifier is namespaced
    > UUID5 generated from old STIX1.2 id
    >         - for new objects, GUID part of STIX2 identifier is either
    > namespaced UUID5 with random UUID4 string, or just random UUID4.
    >  
    >     Example python code for generating UUID5 with custom namespace:
    >  
    >         In [1]: import uuid
    >            ...:
    >            ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
    >            ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
    >            ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
    >            ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
    >            ...:
    >            ...: print("new STIX2 id: {}".format(stix2_id))
    >  
    >         new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
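    The three steps of the suggested algorithm can be folded into one function. A sketch: the function name and signature are illustrative, and for new objects it takes the "namespaced UUID5 with random UUID4 string" branch rather than a bare UUID4. For an old object it performs the same two uuid5 calls as the session above.

```python
import uuid
from typing import Optional


def stix2_id(stix2_type: str, producer_url: str, stix12_id: Optional[str] = None) -> str:
    """Illustrative STIX2 ID generator following the proposed namespace scheme."""
    # Step 1: producer namespace = UUIDv5 of the producer's URL under NAMESPACE_URL.
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, producer_url)
    if stix12_id is not None:
        # Old object: deterministic, so anyone holding the STIX 1.2 id can recompute it.
        value = uuid.uuid5(namespace, stix12_id)
    else:
        # New object: namespaced UUIDv5 of a random UUIDv4 string.
        value = uuid.uuid5(namespace, str(uuid.uuid4()))
    return f"{stix2_type}--{value}"


old = stix2_id("threat-actor", "https://eclecticiq.com/ns",
               "eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473")
new = stix2_id("threat-actor", "https://eclecticiq.com/ns")
print(old)
print(new)
```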
    >  
    >  
    >     BONUS: python functions to convert STIX1.2 IDREFs into STIX2
    > identifiers -
    > https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
    >
    >  
    >  
    >     Thank you,
    >     Sergey Polzunov
    >     EclecticIQ
    >  
    >  
    >  
    >  
    >
    >
    >
    >
    >
    >

    --
    John-Mark




    This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto)
    by others is strictly prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.

     






  • 2.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 20:34
    I don't think that figuring out semantic equivalence should be a goal here - referring back to Sergey's original email, that is not why people are trying to do this.

    People want to do this so they can have a bi-directional traceability method from a STIX ID back and forth into an already-existing ID system.

    Forcing them to generate the IDs based on properties defeats the whole purpose.

    -
    Jason Keirstead




  • 3.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 20:41




    Yeah I agree with Jason here.
     
    I don't think that would work for UUIDv5 anyway, since everyone will be using different namespaces: even if you hash the data the same way, you end up with different (non-transparent) IDs:
     
    >>> uid = uuid.uuid4()
    >>> uuid.uuid5(uid, "data")
    UUID('c5bb29ba-4d85-5280-b202-7085b7485b62')
    >>> uid2 = uuid.uuid4()
    >>> uuid.uuid5(uid2, "data")
    UUID('d09fafa9-ead3-54f1-b1c7-1d1d55f66fb7')
     
    You can see that with the same data (just the string literal "data") you end up with different IDs. You could of course use the namespace of a different producer to try to back-generate a UUIDv5 for an object they've presumably created,
    but IMO that's a recipe for trouble (did they actually create that object?). Instead, you should just rely on having received that object before and use whatever UUID it had.

    It seems like the simplest thing is to just allow UUIDv5 via the various proposals we've seen. It meets the use cases below and is a very straightforward change.
     
    John
     

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:35 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, John Wunder <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    I don't think that figuring out semantic equivalence should be a goal here; referring back to Sergey's original email, that is not why people are trying to do this.


    People want to do this so they can have a bidirectional traceability method between a STIX ID and an already-existing ID system.

    Forcing them to generate the IDs based on properties defeats the whole purpose.
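    A minimal sketch of the traceability use case, assuming a producer that mints STIX IDs as UUIDv5 of its existing internal IDs. The forward direction is stateless and repeatable; UUIDv5 is a one-way SHA-1 hash, so going back from a STIX ID still needs a lookup. `IdMapper`, the namespace URL and the internal ID format are hypothetical.

```python
import uuid

# Hypothetical producer namespace, derived from a placeholder URL.
PRODUCER_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://producer.example/stix")


class IdMapper:
    """Hypothetical helper mapping an existing internal ID system to STIX IDs."""

    def __init__(self, namespace: uuid.UUID):
        self.namespace = namespace
        self._reverse = {}  # STIX ID -> internal ID

    def to_stix(self, stix_type: str, internal_id: str) -> str:
        # Forward direction: a pure function of (namespace, internal_id),
        # so it can be recomputed at any time without stored state.
        stix_id = f"{stix_type}--{uuid.uuid5(self.namespace, internal_id)}"
        self._reverse[stix_id] = internal_id
        return stix_id

    def to_internal(self, stix_id: str) -> str:
        # Reverse direction: UUIDv5 cannot be inverted, so the internal ID
        # comes from the lookup recorded in to_stix().
        return self._reverse[stix_id]


mapper = IdMapper(PRODUCER_NS)
sid = mapper.to_stix("indicator", "db-row-42")
print(sid)
print(mapper.to_internal(sid))  # db-row-42
```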

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>,
    "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:30 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>






    It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
     
    The reality is that objects such as Locations, Identities, etc. are likely to be repeated and used about as widely as many Observables.
    The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.
     
    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com
     
    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:25 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
     
    Actually, I don't agree with this part.

    The entire point of UUIDv5 is that I should not care what method you use to compute your IDs: because it's in your namespace, it's not my problem anymore. I don't think we want to codify it.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:23 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>







    Agree.

    And I would suggest we DO want the calculation to be done based upon the data from the object, as that is how we get value from such an ID.
    It should just not be done on ALL properties of the object.
    We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs.
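    A sketch of what "define the semantically relevant properties" could look like for SDOs such as Location and Identity. Which properties contribute, the namespace, and the canonicalization are all assumptions here, not anything the group agreed on; the JSON serialization below only approximates a canonical form.

```python
import json
import uuid

# Placeholder namespace; an agreed convention would fix a single value.
EXAMPLE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.org/stix/deterministic-ids")

# Hypothetical per-type lists of "semantically relevant" properties.
ID_CONTRIBUTING_PROPERTIES = {
    "location": ["region", "country", "administrative_area", "city",
                 "latitude", "longitude"],
    "identity": ["name", "identity_class"],
}


def deterministic_id(obj: dict) -> str:
    obj_type = obj["type"]
    selected = {k: obj[k] for k in ID_CONTRIBUTING_PROPERTIES[obj_type] if k in obj}
    # Sorted keys plus compact separators only approximate a canonical JSON
    # form; a real convention would pin canonicalization down exactly
    # (for example with RFC 8785, the JSON Canonicalization Scheme).
    canonical = json.dumps(selected, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False)
    return f"{obj_type}--{uuid.uuid5(EXAMPLE_NS, canonical)}"


hq = {"type": "location", "country": "NL", "city": "Amsterdam", "description": "HQ"}
office = {"type": "location", "city": "Amsterdam", "country": "NL",
          "description": "Branch office"}
# description is not ID-contributing, so both converge on one ID.
print(deterministic_id(hq) == deterministic_id(office))  # True
```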

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:20 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    I would think we would want to use a DNS or URL namespace, would we not?
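    A sketch of Jason's question using the predefined RFC 4122 namespaces in Python's `uuid` module: a producer derives its own namespace from a domain (`NAMESPACE_DNS`) or URL (`NAMESPACE_URL`) it already controls, then mints object IDs inside it. The domain, URL and local ID are placeholders.

```python
import uuid

# Placeholder domain/URL that a producer might already control.
ns_from_dns = uuid.uuid5(uuid.NAMESPACE_DNS, "producer.example")
ns_from_url = uuid.uuid5(uuid.NAMESPACE_URL, "https://producer.example/stix")

# Either derived namespace can then mint object IDs from local identifiers.
local_id = "local-id-123"
via_dns = f"indicator--{uuid.uuid5(ns_from_dns, local_id)}"
via_url = f"indicator--{uuid.uuid5(ns_from_url, local_id)}"

print(via_dns)
print(via_url)
print(via_dns == via_url)  # False: the two derived namespaces differ
```

    Hashing local IDs directly under `NAMESPACE_DNS` or `NAMESPACE_URL` would make two producers that happen to reuse the same local ID collide; deriving a per-producer namespace first keeps them apart.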

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         John-Mark Gurney <jmg@newcontext.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:10 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier









    Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400:
    > I would also support this.
    >
    > I have learned more about the inner workings of UUID4/5 and I don't have
    > any reservations about it anymore. The odds of collision with a
    > properly implemented UUID5 are on par with UUID4.
    >
    > As far as John's comment below goes, all this means IMHO is that the library
    > has to force you to provide a namespace (i.e., make it a mandatory argument
    > in your constructor or whatever).
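    Jason's suggestion can be sketched as an ID factory whose constructor requires a namespace, so nothing falls back to a shared library default (the concern John raises in the quoted message). `StixIdFactory` and its methods are hypothetical; this is not the API of python-stix, python-stix2 or any Java library.

```python
import uuid


class StixIdFactory:
    """Hypothetical ID factory in which the namespace is mandatory.

    There is deliberately no library-wide default, so two tools built on the
    same library cannot silently share a namespace.
    """

    def __init__(self, namespace: uuid.UUID):
        if not isinstance(namespace, uuid.UUID):
            raise TypeError("namespace must be a uuid.UUID")
        self.namespace = namespace

    def deterministic(self, object_type: str, name: str) -> str:
        # UUIDv5 scoped to this producer's namespace.
        return f"{object_type}--{uuid.uuid5(self.namespace, name)}"

    @staticmethod
    def random(object_type: str) -> str:
        # UUIDv4 stays available when there is no stable input to hash.
        return f"{object_type}--{uuid.uuid4()}"


factory = StixIdFactory(uuid.uuid5(uuid.NAMESPACE_DNS, "producer.example"))
print(factory.deterministic("malware", "legacy-id-7"))

try:
    StixIdFactory()  # no namespace given: Python raises TypeError
except TypeError as exc:
    print("rejected:", exc)
```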

    The one requirement I would like to make sure about UUIDv5 is that it
    is NOT based upon the data from the object, otherwise versioning will
    break.

    The reason we didn't use UUIDv5 was that most of the proposals to use it
    made it a hash of the contents, such as name and description, and
    then updated the UUID whenever the name and/or description changed.
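    A sketch of the versioning problem John-Mark describes. In STIX 2, every version of an object keeps the same `id` and versions are distinguished by `modified`, so an ID hashed from mutable content would change on every edit. The namespace, helper names and sample object are placeholders.

```python
import uuid

NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://producer.example/stix")  # placeholder


def id_from_content(obj: dict) -> str:
    # The pattern warned about above: hash mutable content (name + description).
    return f"{obj['type']}--{uuid.uuid5(NS, obj['name'] + '|' + obj['description'])}"


def id_from_stable_key(obj: dict, stable_key: str) -> str:
    # Hash something that never changes across versions, e.g. a legacy ID.
    return f"{obj['type']}--{uuid.uuid5(NS, stable_key)}"


v1 = {"type": "malware", "name": "ExampleRAT", "description": "initial analysis"}
v2 = dict(v1, description="updated analysis")  # intended as a new version of v1

print(id_from_content(v1) == id_from_content(v2))  # False: looks like a new object
print(id_from_stable_key(v1, "legacy-42") == id_from_stable_key(v2, "legacy-42"))  # True
```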

    If we do this, the namespace should probably be the identity of whoever
    creates the new STIX2 object. This would prevent collisions when two
    entities try to create a "new" STIX2 object from the same STIX1 object.
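    A sketch of the collision-avoidance idea above, assuming (hypothetically) that the creator's Identity UUID is reused directly as the UUIDv5 namespace: two organizations converting the same STIX1 object get distinct IDs, while each organization's conversion stays repeatable. The Identity IDs are made up.

```python
import uuid

STIX1_ID = "eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473"

# Hypothetical Identity IDs for two organizations converting the same STIX1 data.
CONVERTER_A = "identity--0c7b5b88-8ff7-4a4d-aa9d-feb398cd0061"
CONVERTER_B = "identity--a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"


def namespace_for_creator(identity_id: str) -> uuid.UUID:
    # Hypothetical convention: reuse the UUID part of the creator's Identity ID.
    return uuid.UUID(identity_id.split("--", 1)[1])


def converted_id(stix2_type: str, stix1_id: str, creator_identity: str) -> str:
    ns = namespace_for_creator(creator_identity)
    return f"{stix2_type}--{uuid.uuid5(ns, stix1_id)}"


id_a = converted_id("threat-actor", STIX1_ID, CONVERTER_A)
id_b = converted_id("threat-actor", STIX1_ID, CONVERTER_B)
print(id_a != id_b)  # True: the two converters do not collide
print(id_a == converted_id("threat-actor", STIX1_ID, CONVERTER_A))  # True: repeatable
```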

    > From:   "Wunder, John A." <jwunder@mitre.org>
    > To:     Sergey Polzunov <sergey@eclecticiq.com>,
    > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    > Date:   02/04/2019 12:22 PM
    > Subject:        [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in
    > STIX2 identifier
    > Sent by:        <cti-stix@lists.oasis-open.org>
    >
    >
    >
    > I've been thinking a lot about this and I think it makes sense.
    >
    > One of the concerns we had at the time we chose UUID4 is that users of
    > libraries like python-stix would need to remember to set the UUID5
    > namespace -- or, if they don't and python-stix has some default namespace,
    > different tools using the libraries could have overlapping IDs. This would
    > also apply to users of the new Java libraries that I've seen come out. It
    > might mean these libraries would require people to set a unique namespace
    > before creating any objects, whereas now they can just go ahead and create
    > IDs by default. I'd be curious what other people think about this problem
    > and how we can help avoid it becoming an issue (especially given how many
    > people use those libraries).
    >
    > John
    >
    > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey
    > Polzunov" <cti-stix@lists.oasis-open.org on behalf of
    > sergey@eclecticiq.com> wrote:
    >
    >     Hey everybody!
    >  
    >     The current STIX2 spec definition of an `identifier` for STIX2 objects
    > is as follows:
    >  
    >     > An identifier universally and uniquely identifies a SDO, SRO,
    > Bundle, or Marking Definition. Identifiers MUST follow the form
    > object-type--UUIDv4, where object-type is the exact value (all type names
    > are lowercase strings, by definition) from the type property of the object
    > being identified or referenced and where the UUIDv4 is an RFC
    > 4122-compliant Version 4 UUID. The UUID MUST be generated according to the
    > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122].
    >     from
    > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265
    >
    >  
    >     I think the requirement to use UUID4 brings more problems than
    > benefits. It makes the STIX1->STIX2 transition difficult, hurting existing
    > STIX1 users.
    >     I will try to show this with two use cases.
    >  
    >  
    >     Use case 1
    >     ----------
    >         Imagine that I'm a client of intelligence provider A. I've been
    > a client for a long time and I have received intelligence in STIX1.2,
    > which I stored in my DB. I fetch new intelligence daily, downloading only
    > fresh data. Fresh data often links to old objects for context.
    >         Provider A decides to upgrade and switch to STIX2. In addition to
    > the old STIX1.2 feed, the provider creates a new STIX2 feed with the same
    > data. In STIX2, all objects have new identifiers, and Provider A does not
    > bother to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a
    > client, have two options:
    >         - clean slate option: drop all old data from this provider and
    > re-fetch everything. That will work if Provider A is the only provider I
    > use or if I never referenced Provider A's data from my own intelligence.
    > Not a great plan.
    >         - new era option: leave my STIX1.2 data graph in place and start
    > consuming the new STIX2 feed from today. This option has one big issue:
    > new STIX2 data will not be connected to the STIX1.2 data I already have,
    > because the STIX2 ids are all different. If I want to deduce connections,
    > I need to deduplicate the data against my existing STIX1.2 DB. This means
    > my ingestion pipeline must be smart enough to compare STIX1.2 objects to
    > STIX2 objects and fast enough to do that for every new STIX2 object.
    > This will be difficult to implement and will carry a huge performance
    > penalty.
    >  
    >     Use case 2
    >     ----------
    >         Imagine that I'm an NCSC. I receive intelligence from providers,
    > combine it and distribute it to my clients. My providers are still on
    > STIX1.2 but my clients want STIX2, so I must convert the STIX1.2 I receive
    > into STIX2. Full STIX1.2 entities I can transform easily, but what do I do
    > with the IDREFs in my STIX1.2 data?
    >         I can generate a new STIX2 id every time I see a new STIX1.2 IDREF
    > in incoming data and store the STIX1.2->STIX2 mapping somewhere to be used
    > the next time I see this IDREF. This is painful and will require additional
    > resources, but it is doable. But it will only work until the moment my
    > providers switch to STIX2 and start sending me full objects for those
    > IDREFs with new random STIX2 identifiers! I cannot predict these
    > identifiers and I can't match them with the ones I generated. So my
    > thinking is: what is the point in even bothering with old IDREFs? I will
    > just drop them, sometimes sending my clients disconnected STIX2 entities
    > and hoping that they will figure it out.
    >  
    >  
    >     Proposed solutions if UUID5 is allowed in STIX2 identifiers:
    >  
    >     Use case 1 solution
    >     -------------------
    >         There could be a guideline recommending that providers use old
    > STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are
    > predictable, I, as a client, can greatly simplify my deduplication logic.
    > I can run a DB migration once to calculate STIX2 identifiers for all my
    > STIX1.2 objects and use these on ingestion for deduplication. Appending
    > STIX2 data to my STIX1.2 DB will be much easier.
    >         I'm also interested in pushing Provider A to adopt this STIX2
    > identifier generation practice because it will save me money.
    >  
    >     Use case 2 solution
    >     -------------------
    >     With UUID5 I have a way out: I can generate new STIX2 ids from old
    > STIX1.2 ids! I can parse the IDREF value, which looks like
    > `[ns prefix]:[construct type]-[GUID]`, and use the provider's namespace /
    > construct type to build a new STIX2 identifier. The logic will be like this:
    >         - the full IDREF will be the input for the UUID5 function;
    >         - for STIX1.2 types that were split (like TTP), I do not know the
    > exact STIX2 type the provider would use for an old TTP. My solution here
    > would be to play it safe and create relations for all possible types: for
    > an IDREF to a TTP, I will create 4 relations: one to a possible Tool
    > object, one to Malware, one to Attack Pattern and one to Identity. It is
    > an overhead, but it is a small price for keeping an interconnected
    > intelligence graph.
    >         Again, when the time comes and my providers move to STIX2, I'm
    > interested in pushing them to adopt this id generation scheme for old
    > objects, because it will save me, as an NCSC, money.
    >  
    >  
    >     To reiterate, I would like to propose:
    >     - a change in the STIX2 spec to allow both UUID5 and UUID4 to be used in
    > the identifier of SDO, SRO, MarkingDefinition and Custom Object entities;
    >     - creating a guideline, complementary to the spec, that would explain
    > how STIX1.2 ids can be transformed into STIX2 ids for an easier transition.
    >  
    >  
    >     Practicalities:
    >  
    >     UUID5 ids require the use of a namespace. The UUID RFC
    > (https://tools.ietf.org/html/rfc4122#section-4.3) defines some generic
    > namespaces (https://tools.ietf.org/html/rfc4122#appendix-C) but does not
    > prohibit the use of custom ones. I suggest this algorithm:
    >         - the namespace UUID5 is generated using the predefined
    > `NameSpace_URL` namespace and the producer's URL;
    >         - for old objects, the GUID part of the STIX2 identifier is a
    > namespaced UUID5 generated from the old STIX1.2 id;
    >         - for new objects, the GUID part of the STIX2 identifier is either a
    > namespaced UUID5 of a random UUID4 string, or just a random UUID4.
    >  
    >     Example Python code for generating a UUID5 with a custom namespace:
    >  
    >         In [1]: import uuid
    >            ...:
    >            ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
    >            ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
    >            ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
    >            ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
    >            ...:
    >            ...: print("new STIX2 id: {}".format(stix2_id))
    >  
    >         new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
    >  
    >  
    >     BONUS: Python functions to convert STIX1.2 IDREFs into STIX2
    > identifiers: https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
    >
    >  
    >  
    >     Thank you,
    >     Sergey Polzunov
    >     EclecticIQ
    >  
    >  
    >  
    >  
    >
    >
    >
    >
    >
    >
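    A rough sketch of the IDREF conversion Sergey describes in his use case 2 solution and "Practicalities" section. It parses `[ns prefix]:[construct type]-[GUID]`, derives the producer namespace from `NAMESPACE_URL` and the producer's URL, hashes the full IDREF, and fans a TTP out to the four candidate types he lists. `PRODUCER_URLS`, `CANDIDATE_TYPES` and `stix2_candidates` are hypothetical names with a deliberately partial mapping; this is not the code in the linked gist.

```python
import re
import uuid

# Hypothetical lookup from STIX1.2 namespace prefix to producer URL.
PRODUCER_URLS = {"eclecticiq": "https://eclecticiq.com/ns"}

# STIX1.2 construct types -> candidate STIX2 types (partial, illustrative).
# The TTP fan-out follows the four targets listed in the proposal.
CANDIDATE_TYPES = {
    "threat-actor": ["threat-actor"],
    "ttp": ["tool", "malware", "attack-pattern", "identity"],
}

IDREF_RE = re.compile(
    r"^(?P<prefix>[^:]+):(?P<ctype>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)


def stix2_candidates(idref: str) -> list:
    m = IDREF_RE.match(idref)
    if m is None:
        raise ValueError(f"not a [prefix]:[type]-[GUID] IDREF: {idref!r}")
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, PRODUCER_URLS[m["prefix"]])
    # The full IDREF is the UUIDv5 name, so every candidate shares one UUID part.
    guid = uuid.uuid5(namespace, idref)
    return [f"{t}--{guid}" for t in CANDIDATE_TYPES[m["ctype"].lower()]]


for stix2_id in stix2_candidates("eclecticiq:ttp-5f1c0e2a-6b7d-4c8e-9f01-23456789abcd"):
    print(stix2_id)
```

    Because the UUID part depends only on the producer URL and the IDREF, the same function could also drive the one-time DB migration from use case 1: compute the IDs for every stored STIX1.2 object once, then match incoming STIX2 objects against that index.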

    --
    John-Mark



     






  • 4.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 21:38




    We want distinct Deterministic ID UUIDs to discriminate between Sources of the same/similar information and to better detect/prevent parroting and/or leakage.
     
    A given community of trust can establish conventions within the UUIDv5 encoding as required (e.g., just Namespace, Namespace+Community_ID, Namespace+Key). We should only be using and asserting immutable objects and using the canonical representations suggested. A number of UUIDv5 use cases were proposed historically, including non-attributional Source Traceability.
     
    However, none of this impacts those not wishing to participate in Deterministic Reference ID use cases, but empowers those of us who do.
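    One possible reading of the three conventions Patrick lists (just Namespace, Namespace+Community_ID, Namespace+Key), sketched with nested UUIDv5 namespaces. The base namespace, community name and key are placeholders, and nesting is only one way to combine them; a keyed scheme meant to resist guessing might prefer a proper MAC such as HMAC over hashing the key into a UUIDv5 namespace.

```python
import uuid

BASE_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://producer.example/stix")  # placeholder


def id_namespace_only(obj_type: str, name: str) -> str:
    # Anyone who knows BASE_NS and the name can recompute this ID.
    return f"{obj_type}--{uuid.uuid5(BASE_NS, name)}"


def id_namespace_community(obj_type: str, name: str, community_id: str) -> str:
    # A sub-namespace per community of trust: same name, different ID per community.
    community_ns = uuid.uuid5(BASE_NS, "community:" + community_id)
    return f"{obj_type}--{uuid.uuid5(community_ns, name)}"


def id_namespace_key(obj_type: str, name: str, shared_key: str) -> str:
    # Only holders of shared_key can recompute, and so recognize, these IDs.
    keyed_ns = uuid.uuid5(BASE_NS, "key:" + shared_key)
    return f"{obj_type}--{uuid.uuid5(keyed_ns, name)}"


name = "campaign-2019-01"
print(id_namespace_only("campaign", name))
print(id_namespace_community("campaign", name, "finance-isac"))
print(id_namespace_key("campaign", name, "s3cret-shared-key"))
```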
     

    Patrick Maroney
    DarkLight
    Mobile: (609)841-5104
    Email:   patrick.maroney@darklight.ai
     
    www.darklight.ai

     
     

    From: <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
    Date: Monday, February 4, 2019 at 3:40 PM
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    Yeah I agree with Jason here.
     
    I don t think that would work for UUIDv5 anyway, since everyone will be using different namespaces even if you hash the data the same way you end up with different (non-transparent) IDs:
     
    >>> uid = uuid.uuid4()
    >>> uuid.uuid5(uid, "data")
    UUID('c5bb29ba-4d85-5280-b202-7085b7485b62')
    >>> uid2 = uuid.uuid4()
    >>> uuid.uuid5(uid2, "data")
    UUID('d09fafa9-ead3-54f1-b1c7-1d1d55f66fb7')
     
    You can see that with the same data (just the string literal data ) you end up with different IDs. You could of course use the namespace of a different producer to try to back generate a UUIDv5 for an object they ve presumably created,
    but IMO that s a recipe for trouble (did they actually create that object?). Instead you should just rely on having received that object before and use whatever UUID it had.
     
    It seems like the simplest thing is to just allow UUIDv5 via the various proposals we ve seen. It meets the use cases below and is a very straightforward change.
     
    John
     

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:35 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, John Wunder <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    I don't think that figuring out semantic equivalence should be a goal here - referring back to Sergey's original email, that is not why people are trying to do this.


    People want to do this so they can have a bi-directional traceability method from a STIX ID back and forth into an already-existing ID system.

    Forcing them to generate the IDs based on properties, defeats the whole purpose.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>,
    "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:30 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>






    It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
     
    The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables.
    The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.
     
    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com
     
    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:25 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
     
    Actually, I don't agree with this part.

    The entire point of UUIDv5, is that I should not care what method you use to compute your IDs - because it's in your namespace, so its not my problem anymore. I don't think we want to codify it.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:23 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>







    Agree.

    And I would suggest we DO want the calculation to be done based upon the data from the object . As that is how we get value from such an ID.
    It should just not be done on ALL properties of the object.
    We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs.

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:20 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    I would think we would want to use a DNS or URL namespace, would we not?

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         John-Mark Gurney <jmg@newcontext.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:10 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier









    Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400:
    > I would also support this.
    >
    > I have learned more about the inner workings of UUID4/5 and I don't have
    > any reservations about it anymore. The odds of collision with a
    > properly-implemented UUID5 are on-par with UUID4
    >
    > As far as John's comment below - all this means IMHO is the library has to
    > force you to provide a namespace (ie make it a mandatory argument in your
    > constructor or whatever).

    The one requirement I would like to make sure about UUIDv5 is that it
    is NOT based upon the data from the object, otherwise versioning will
    break.

    The reason we didn't use UUIDc4 as most of the proposals to use it was
    to make it a hash of the contents, such as name and description, and
    then update the UUID whenever the name and/or description changed..

    If we do this, the name space should probably be the identity of the
    new STIX2 object.  This would prevent collisions from happening when
    two entities try to create a "new" STIX2 object from a STIX1 object...

    > From:   "Wunder, John A." <jwunder@mitre.org>
    > To:     Sergey Polzunov <sergey@eclecticiq.com>,
    > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    > Date:   02/04/2019 12:22 PM
    > Subject:        [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in
    > STIX2 identifier
    > Sent by:        <cti-stix@lists.oasis-open.org>
    >
    >
    >
    > I've been thinking a lot about this and I think it makes sense.
    >
    > One of the concerns we had at the time we chose UUID4 is that users of
    > libraries like python-stix would need to remember to set the UUID5
    > namespace -- or, if they don't and python-stix has some default namespace,
    > different tools using the libraries could have overlapping IDs. This would
    > also apply to users of the new Java libraries that I've seen come out. It
    > might mean these libraries requiring that people set a unique namespace
    > before creating any objects, vs. now where it can just go ahead and create
    > IDs by default. I'd be curious what other people think about this problem
    > and how we can help avoid it becoming an issue (especially given how many
    > people use those libraries).
    >
    > John
    >
    > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey
    > Polzunov" <cti-stix@lists.oasis-open.org on behalf of
    > sergey@eclecticiq.com> wrote:
    >
    >     Hey everybody!
    >  
    >     Current STIX2 spec definition of an`identifier` for STIX2 objects is
    > as follows:
    >  
    >     > An identifier universally and uniquely identifies a SDO, SRO,
    > Bundle, or Marking Definition. Identifiers MUST follow the form
    > object-type--UUIDv4, where object-type is the exact value (all type names
    > are lowercase strings, by definition) from the type property of the object
    > being identified or referenced and where the UUIDv4 is an RFC
    > 4122-compliant Version 4 UUID. The UUID MUST be generated according to the
    > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122].
    >     from
    > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265
    >
    >  
    >     I think the requirement to have UUID4 brings more problems than
    > benefits. It makes STIX1->STIX2 transition difficult, hurting existing
    > STIX1 users.
    >     I will try to show it in these 2 use cases.
    >  
    >  
    >     Use case 1
    >     ----------
    >         Imagine that I'm a client of an intelligence provider A. I've been
    > a client for a long time and I have received intelligence in STIX1.2,
    > which I stored in my DB. I fetch new intelligence daily, downloading only
    > fresh data. Often fresh data links to old objects for context.
    >         Provider A decides to upgrade and switch to STIX2. In addition to
    > an old STIX1.2 feed, provider creates new STIX2 feed with the same data.
    > In STIX2 all objects have new identifiers and Provider A does not bother
    > to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have
    > 2 options:
    >         - clean slate option: drop all old data from this provider and
    > re-fetch everything. That will work if Provider A is the only provider I
    > use or if I never referenced Provider A's data from my own intelligence.
    > Not a great plan.
    >         - new era option: leave my STIX1.2 data graph in place and start
    > consuming new STIX2 feed from today. This option has one big issue: new
    > STIX2 data will not be connected to STIX1.2 data I already have, because
    > STIX2 ids are all different. If I want to deduce connection, I need to
    > deduplicate the data against my existing STIX1.2 DB. This means my
    > ingestion pipeline must be smart enough to compare STIX1.2 objects to
    > STIX2 objects and be fast enough to do that for every new STIX2 object.
    > This will be difficult to implement and will have a huge performance
    > penalty.
    >  
    >     Use case 2
    >     ----------
    >         Imagine that I'm a NCSC. I receive intelligence from providers,
    > combine it and distribute it to my clients. My providers are still on
    > STIX1.2 but my clients want STIX2, so I must convert STIX1.2 I receive
    > into STIX2. Full STIX1.2 entities I can transform easily but what do I do
    > with IDREFs I have in my STIX1.2 data?
    >         I can generate new STIX2 id every time I see new STIX1.2 IDREF in
    > incoming data and store STIX1.2->STIX2 mapping somewhere to be used next
    > time I see this IDREF. This is painful and will require additional
    > resources, but it is doable. But it will only work until the moment my
    > providers switch to STIX2 and start sending me full objects for those
    > IDREFs with new random STIX2 identifiers! I can not predict these
    > identifiers and I can't match them with the ones I generated. So my
    > thinking is - what is the point in even bothering with old IDREFs? I will
    > just drop them, sending my clients sometimes disconnected STIX2 entities,
    > hoping that they will figure it out.
    >  
    >  
    >     Proposed solutions if UUID5 is allowed in STIX2 identifiers:
    >  
    >     Use case 1 solution
    >     -------------------
    >         There can be a guideline that will recommend providers to use old
    > STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are
    > predictable I, as a client, can greatly simplify my deduplication logic. I
    > can run DB migration once to calculate STIX2 identifiers for all my
    > STIX1.2 objects and use these on ingestion for deduplication. Appending
    > STIX2 data to my STIX1.2 DB will be much easier.
    >         I'm also interested in pushing Provider A to adopting this STIX2
    > identifier generation practice because it will save me money.
    >  
    >     Use case 2 solution
    >     -------------------
    >     WIth UUID5 I have a way out: I can generate new STIX2 ids from old
    > STIX1.2 ids! I can parse IDREF value, that looks like `[ns
    > prefix]:[construct type]-[GUID]`, and use provider's namespace / construct
    > type to build new STIX2 identifier. The logic will be like this:
    >         - full IDREF will be input for UUID5 function
    >         - for STIX1.2 types that were split (like TTP), I do not know
    > exact STIX2 type Provider would use for old TTP. My solution here would be
    > to play safe and create relations for all possible types: for IDREF to
    > TTP, I will create 4 relations: one to a possible Tool object, one to
    > Malware, one to Attack Pattern and one to Identity. It is an overhead but
    > it is a small price for keeping interconnected intelligence graph.
    >         Again, when time comes and my providers move to STIX2, I'm
    > interested in pushing them to adopt this id generation schema for old
    > objects, because it will save me, as NCSC, money.
    >  
    >  
    >     To reiterate, I would like to propose:
    >     - a change in the STIX2 spec to allow both UUID5 and UUID4 to be used in
    > the identifier of SDO, SRO, Marking Definition and Custom Object entities;
    >     - creating a guideline, complementary to the spec, that would explain
    > how STIX1.2 ids can be transformed into STIX2 ids for an easier transition.
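    If the proposed spec change were adopted, a consumer-side identifier check would need to accept either UUID version. A minimal sketch, assuming a simplified lowercase type-name pattern (not the spec's full type grammar) and, as a further assumption of this sketch, the canonical lowercase hyphenated UUID form; `is_valid_identifier` is a hypothetical name.

```python
import re
import uuid

# Simplified lowercase type-name check (an assumption of this sketch,
# not the full STIX type-name grammar).
TYPE_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")


def is_valid_identifier(identifier: str, allowed_versions=(4, 5)) -> bool:
    """Check `object-type--UUID`, accepting the listed UUID versions."""
    object_type, sep, uuid_part = identifier.partition("--")
    if not sep or not TYPE_RE.match(object_type):
        return False
    try:
        parsed = uuid.UUID(uuid_part)
    except ValueError:
        return False
    return (
        parsed.variant == uuid.RFC_4122          # RFC 4122 layout
        and parsed.version in allowed_versions   # v4 (random) or v5 (name-based)
        and str(parsed) == uuid_part             # canonical lowercase hyphenated form
    )
```

    Passing `allowed_versions=(4,)` reproduces the current UUIDv4-only rule, so the same check covers both the existing and the relaxed constraint.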
    >  
    >  
    >     Practicalities:
    >  
    >     UUID5 ids require the use of a namespace. The UUID RFC
    > (https://tools.ietf.org/html/rfc4122#section-4.3) defines some generic
    > namespaces (https://tools.ietf.org/html/rfc4122#appendix-C) but does not
    > prohibit the use of custom ones. I suggest this algorithm:
    >         - the namespace UUID is a UUID5 generated from the predefined
    > `NameSpace_URL` namespace and the producer's URL;
    >         - for old objects, the GUID part of the STIX2 identifier is a
    > namespaced UUID5 generated from the old STIX1.2 id;
    >         - for new objects, the GUID part of the STIX2 identifier is either a
    > namespaced UUID5 of a random UUID4 string, or just a random UUID4.
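    The three-step algorithm above can be sketched as one helper. The function names are hypothetical, and the new-object branch follows the first option in the list (a UUIDv5 over a random UUIDv4 string); plain `uuid.uuid4()` would be the other option.

```python
import uuid
from typing import Optional


def producer_namespace(producer_url: str) -> uuid.UUID:
    # Step 1: the producer's namespace is itself a UUIDv5 under NAMESPACE_URL.
    return uuid.uuid5(uuid.NAMESPACE_URL, producer_url)


def make_stix2_id(object_type: str, producer_url: str,
                  stix12_id: Optional[str] = None) -> str:
    ns = producer_namespace(producer_url)
    if stix12_id is not None:
        # Step 2: old objects get a deterministic id that anyone who knows
        # the producer URL and the STIX1.2 id can recompute.
        guid = uuid.uuid5(ns, stix12_id)
    else:
        # Step 3: new objects get a UUIDv5 over a random UUIDv4 string,
        # so the result is still unpredictable.
        guid = uuid.uuid5(ns, str(uuid.uuid4()))
    return f"{object_type}--{guid}"
```

    Both branches yield a version-5 UUID, so a consumer cannot tell from the id alone whether an object was migrated from STIX1.2 or created fresh.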
    >  
    >     Example Python code for generating a UUID5 with a custom namespace:
    >  
    >         In [1]: import uuid
    >            ...:
    >            ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
    >            ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
    >            ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
    >            ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
    >            ...:
    >            ...: print("new STIX2 id: {}".format(stix2_id))
    >  
    >         new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
    >  
    >  
    >     BONUS: python functions to convert STIX1.2 IDREFs into STIX2
    > identifiers -
    > https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
    >
    >  
    >  
    >     Thank you,
    >     Sergey Polzunov
    >     EclecticIQ
    >  
    >  
    >  
    >  
    >
    >
    >
    >
    >
    >

    --
    John-Mark



    This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto)
    by others is strictly prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.

     

     






  • 5.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 22:48




    There is a reason why created_by_ref is a common property on all SDOs: the ability to attribute information to its source independently of the identity of the particular object id.
     
    There are also reasons why the STIX specification defines very strict rules on when an entity is, and is not, allowed to modify information.
     
    I suggest we don't conflate the identity of an individual object with the source of that information, and vice versa.
     
    I'm not suggesting we don't consider UUIDv5 for object ids, but rather that we keep in mind there are reasons why we defined created_by_ref and strict rules on modification/replication.
     
    Allan
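    Allan's distinction can be checked on the consumer side. Under the STIX 2 versioning rules he refers to, new versions of an object are expected to come only from its creator, so the same id arriving with different created_by_ref values suggests an id scheme has conflated object identity with source (for example, two producers hashing the same content in a shared namespace). A minimal sketch, assuming objects are plain dicts with an `id` key and an optional `created_by_ref` key; the function name is hypothetical.

```python
from collections import defaultdict


def ids_with_conflicting_creators(objects):
    """Map each id seen with more than one created_by_ref to those creators.

    Objects lacking created_by_ref are recorded under None.
    """
    creators = defaultdict(set)
    for obj in objects:
        creators[obj["id"]].add(obj.get("created_by_ref"))
    return {obj_id: refs for obj_id, refs in creators.items() if len(refs) > 1}
```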
     

    From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of Patrick Maroney <pmaroney@darklight.ai>
    Date: Monday, February 4, 2019 at 1:38 PM
    To: "Wunder, John" <jwunder@mitre.org>, Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    We want distinct Deterministic ID UUIDs to discriminate between Sources of the same or similar information, and to better detect/prevent parroting and/or leakage.
     
    A given community of trust can establish conventions within the UUIDv5 encoding as required (e.g., just Namespace, Namespace+Community_ID, Namespace+Key). We should only be using and asserting immutable objects and using the canonical representations suggested. A number of UUIDv5 use cases were proposed historically, including non-attributional Source Traceability.
     
    However, none of this impacts those not wishing to participate in Deterministic Reference ID use cases, but it empowers those of us who do.
     

    Patrick Maroney
    DarkLight
    Mobile: (609)841-5104
    Email:   patrick.maroney@darklight.ai
     
    www.darklight.ai
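    Patrick's conventions (just Namespace, Namespace+Community_ID, Namespace+Key) can be illustrated by folding an optional qualifier into the UUIDv5 name. A minimal sketch under stated assumptions: sorted-key compact JSON stands in for an agreed canonical representation, the `qualifier|canonical` layout is hypothetical, and mixing a key into the name is only an illustration, not a vetted keyed construction (UUIDv5 is a truncated SHA-1, not a MAC).

```python
import json
import uuid
from typing import Optional


def canonical_json(props: dict) -> str:
    # Stand-in canonical form (sorted keys, no insignificant whitespace);
    # a real community would agree on its own canonicalization rules.
    return json.dumps(props, sort_keys=True, separators=(",", ":"))


def deterministic_id(obj_type: str, canonical: str, namespace: uuid.UUID,
                     qualifier: Optional[str] = None) -> str:
    # qualifier=None          -> "just Namespace"
    # qualifier=community id  -> "Namespace+Community_ID"
    # qualifier=shared key    -> "Namespace+Key" (illustrative only)
    name = canonical if qualifier is None else f"{qualifier}|{canonical}"
    return f"{obj_type}--{uuid.uuid5(namespace, name)}"
```

    The same canonical content yields one stable id per convention and a different id per namespace or qualifier, which is what lets sources of the same information stay distinguishable.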

     
     


     






  • 6.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-04-2019 22:53




    No problem, Allan. The sole objective here is gaining support for relaxing the constraints on which form of UUID generation is used for reference IDs.
     

    Patrick Maroney
    DarkLight
    Mobile: (609)841-5104
    Email:   patrick.maroney@darklight.ai
     
    www.darklight.ai

     
     

    From: Allan Thomson <athomson@lookingglasscyber.com>
    Date: Monday, February 4, 2019 at 5:47 PM
    To: Patrick Maroney <pmaroney@darklight.ai>, John Wunder <jwunder@mitre.org>, Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    There is a reason why created_by_ref property is a common property on all SDO. i.e. ability to attribute to the source of the information independent of the identity of the particular object id.
     
    There are also reasons why there are very strict rules defined in the STIX specification on when an entity is allowed to modify information vs not.
     
    I suggest we don t conflate identity of an individual object with the concept of source of that information and vs versa.
     
    I m not suggesting we don t consider UUIDv5 for ids of objects but rather we keep in mind there are reasons why we defined created_by and strict rules on modification/replication.
     
    Allan
     

    From: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> on behalf of Patrick Maroney <pmaroney@darklight.ai>
    Date: Monday, February 4, 2019 at 1:38 PM
    To: "Wunder, John" <jwunder@mitre.org>, Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    We want distinct Deterministic ID UUIDs to discriminate between Sources of same/similar information and to better detect/prevent parroting and/or leakage.
     
    A given of community of trust can establish conventions within the UUIDv5 encoding as required (e.g., just Namespace, Namespace+Community_ID, Namespace+Key).   We should only be using and asserting immutable objects and using the canonical
    representations suggested.   A number of UUIDv5 use cases were proposed historically including non-attributional Source Traceability.
     
    However, none of this impacts those not wishing to participate in Deterministic Reference ID use cases, but empowers those of us who do.
     

    Patrick Maroney
    DarkLight
    Mobile: (609)841-5104
    Email:   patrick.maroney@darklight.ai
     
    www.darklight.ai

     
     

    From: <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
    Date: Monday, February 4, 2019 at 3:40 PM
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    Yeah I agree with Jason here.
     
    I don t think that would work for UUIDv5 anyway, since everyone will be using different namespaces even if you hash the data the same way you end up with different (non-transparent) IDs:
     
    >>> uid = uuid.uuid4()
    >>> uuid.uuid5(uid, "data")
    UUID('c5bb29ba-4d85-5280-b202-7085b7485b62')
    >>> uid2 = uuid.uuid4()
    >>> uuid.uuid5(uid2, "data")
    UUID('d09fafa9-ead3-54f1-b1c7-1d1d55f66fb7')
     
    You can see that with the same data (just the string literal data ) you end up with different IDs. You could of course use the namespace of a different producer to try to back generate a UUIDv5 for an object they ve presumably created,
    but IMO that s a recipe for trouble (did they actually create that object?). Instead you should just rely on having received that object before and use whatever UUID it had.
     
    It seems like the simplest thing is to just allow UUIDv5 via the various proposals we ve seen. It meets the use cases below and is a very straightforward change.
     
    John
     

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:35 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, John Wunder <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     

    I don't think that figuring out semantic equivalence should be a goal here - referring back to Sergey's original email, that is not why people are trying to do this.


    People want to do this so they can have a bi-directional traceability method from a STIX ID back and forth into an already-existing ID system.

    Forcing them to generate the IDs based on properties, defeats the whole purpose.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>,
    "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:30 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>






    It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
     
    The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables.
    The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.
     
    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com
     
    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:25 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
     
    Actually, I don't agree with this part.

    The entire point of UUIDv5, is that I should not care what method you use to compute your IDs - because it's in your namespace, so its not my problem anymore. I don't think we want to codify it.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown





    From:         Sean Barnum <sean.barnum@FireEye.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:23 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by:         <cti-stix@lists.oasis-open.org>







    Agree.

    And I would suggest we DO want the calculation to be done based upon the data from the object . As that is how we get value from such an ID.
    It should just not be done on ALL properties of the object.
    We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs.

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com
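Sean's suggestion is to compute a UUIDv5 over a defined set of semantically relevant properties rather than all of them. The sketch below illustrates that idea under stated assumptions. The namespace URL, the per-type property lists, and the `deterministic_id` helper are hypothetical. `json.dumps` with sorted keys stands in for whatever canonical serialization a spec would actually mandate.

```python
import json
import uuid

# Hypothetical namespace; a real scheme would fix this value in a spec or agreement.
EXAMPLE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/stix-ns")

# Hypothetical "semantically relevant" properties per object type.
CONTRIBUTING_PROPERTIES = {
    "location": ["region", "country", "administrative_area", "city"],
    "identity": ["name", "identity_class"],
}


def deterministic_id(obj: dict) -> str:
    """Derive an object-type--UUIDv5 identifier from a subset of properties."""
    obj_type = obj["type"]
    selected = {
        key: obj[key]
        for key in CONTRIBUTING_PROPERTIES[obj_type]
        if key in obj
    }
    # Sorted keys and compact separators give a stable string for the same
    # values (a simplification, not a full JSON canonicalization scheme).
    name = json.dumps(selected, sort_keys=True, separators=(",", ":"))
    return "{}--{}".format(obj_type, uuid.uuid5(EXAMPLE_NAMESPACE, name))


hq = {"type": "location", "country": "ca", "city": "Ottawa", "description": "HQ"}
branch = {"type": "location", "city": "Ottawa", "country": "ca", "description": "Branch office"}
print(deterministic_id(hq) == deterministic_id(branch))  # True
```

Because `description` is not in the hypothetical property list, two producers that use the same namespace converge on one ID for the same city. The same sketch also shows John-Mark's concern further down the thread: editing any hashed property changes the ID.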

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:20 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    I would think we would want to use a DNS or URL namespace, would we not?

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown
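Jason asks whether to use a DNS or a URL namespace. RFC 4122 predefines both `NAMESPACE_DNS` and `NAMESPACE_URL`, and either one can seed a producer-specific namespace via UUIDv5. The minimal sketch below compares the two. `example.com` and the helper names are placeholders.

```python
import uuid


def producer_namespace_from_dns(domain: str) -> uuid.UUID:
    # RFC 4122 predefined DNS namespace + the producer's fully qualified domain name.
    return uuid.uuid5(uuid.NAMESPACE_DNS, domain)


def producer_namespace_from_url(url: str) -> uuid.UUID:
    # RFC 4122 predefined URL namespace + a URL the producer controls.
    return uuid.uuid5(uuid.NAMESPACE_URL, url)


dns_ns = producer_namespace_from_dns("example.com")
url_ns = producer_namespace_from_url("https://example.com/ns")

# Either choice is deterministic: anyone can recompute the same namespace UUID.
print(dns_ns == producer_namespace_from_dns("example.com"))  # True
# The two choices yield different namespaces, so a community has to pick one.
print(dns_ns == url_ns)  # False
```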

    From:         John-Mark Gurney <jmg@newcontext.com>
    To:         Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc:         "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 04:10 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400:
    > I would also support this.
    >
    > I have learned more about the inner workings of UUID4/5 and I don't have
    > any reservations about it anymore. The odds of collision with a
    > properly-implemented UUID5 are on par with UUID4.
    >
    > As far as John's comment below - all this means IMHO is the library has to
    > force you to provide a namespace (i.e., make it a mandatory argument in your
    > constructor or whatever).
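Jason suggests that libraries make the namespace a mandatory constructor argument, which would address John's concern about a shared default namespace. The sketch below shows one way that could look. `StixIdGenerator` is a hypothetical class, not part of python-stix or any other existing library. It falls back to a random UUIDv4 when no name is given.

```python
import uuid
from typing import Optional


class StixIdGenerator:
    """Hypothetical ID helper that cannot be constructed without a namespace."""

    def __init__(self, namespace: uuid.UUID):
        # No default value: callers must choose their own namespace explicitly.
        if not isinstance(namespace, uuid.UUID):
            raise TypeError("namespace must be a uuid.UUID")
        self.namespace = namespace

    def new_id(self, object_type: str, name: Optional[str] = None) -> str:
        if name is None:
            # No stable name available: fall back to a random UUIDv4.
            value = uuid.uuid4()
        else:
            # Reproducible UUIDv5 scoped to this producer's namespace.
            value = uuid.uuid5(self.namespace, name)
        return "{}--{}".format(object_type, value)


gen = StixIdGenerator(uuid.uuid5(uuid.NAMESPACE_DNS, "example.com"))
print(gen.new_id("indicator", "internal-id-42"))
print(gen.new_id("indicator"))

try:
    StixIdGenerator()  # omitting the namespace fails at construction time
except TypeError as exc:
    print("refused:", exc)
```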

    The one requirement I would like to make sure about UUIDv5 is that it
    is NOT based upon the data from the object, otherwise versioning will
    break.

    The reason we didn't use UUIDv5 was that most of the proposals to use it
    were to make it a hash of the contents, such as name and description, and
    then update the UUID whenever the name and/or description changed.

    If we do this, the namespace should probably be the identity of the
    new STIX2 object.  This would prevent collisions from happening when
    two entities try to create a "new" STIX2 object from a STIX1 object...
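John-Mark sets two constraints. First, derive the namespace from the converting organization's identity. Second, hash a stable identifier such as the STIX 1.x ID rather than the object's content. The sketch below follows both. Reusing the UUID part of the creator's `identity--...` ID directly as the UUIDv5 namespace is an assumption for illustration, and the `example:` IDREF prefix is a placeholder.

```python
import uuid


def stix2_id_from_stix1(object_type: str, stix1_id: str, creator_identity_id: str) -> str:
    """Map a STIX 1.x ID to a STIX2 ID scoped to the converting organization."""
    # Assumption: the UUID part of the creator's identity--<uuid> ID is reused
    # directly as the UUIDv5 namespace.
    creator_ns = uuid.UUID(creator_identity_id.split("--", 1)[1])
    # Only the STIX 1.x ID is hashed, never name/description, so later
    # versions of the object keep the same ID.
    return "{}--{}".format(object_type, uuid.uuid5(creator_ns, stix1_id))


stix1_id = "example:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473"
org_a = "identity--{}".format(uuid.uuid4())
org_b = "identity--{}".format(uuid.uuid4())

id_a = stix2_id_from_stix1("threat-actor", stix1_id, org_a)
id_b = stix2_id_from_stix1("threat-actor", stix1_id, org_b)
print(id_a != id_b)  # True: two converters of the same STIX 1 object do not collide
print(id_a == stix2_id_from_stix1("threat-actor", stix1_id, org_a))  # True: reproducible
```

Name and description never enter the hash. A new version of the object with an edited description therefore keeps its ID.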

    > From:   "Wunder, John A." <jwunder@mitre.org>
    > To:     Sergey Polzunov <sergey@eclecticiq.com>,
    > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    > Date:   02/04/2019 12:22 PM
    > Subject:        [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in
    > STIX2 identifier
    > Sent by:        <cti-stix@lists.oasis-open.org>
    >
    >
    >
    > I've been thinking a lot about this and I think it makes sense.
    >
    > One of the concerns we had at the time we chose UUID4 is that users of
    > libraries like python-stix would need to remember to set the UUID5
    > namespace -- or, if they don't and python-stix has some default namespace,
    > different tools using the libraries could have overlapping IDs. This would
    > also apply to users of the new Java libraries that I've seen come out. It
    > might mean these libraries requiring that people set a unique namespace
    > before creating any objects, vs. now where it can just go ahead and create
    > IDs by default. I'd be curious what other people think about this problem
    > and how we can help avoid it becoming an issue (especially given how many
    > people use those libraries).
    >
    > John
    >
    > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey
    > Polzunov" <cti-stix@lists.oasis-open.org on behalf of
    > sergey@eclecticiq.com> wrote:
    >
    >     Hey everybody!
    >  
    >     Current STIX2 spec definition of an `identifier` for STIX2 objects is
    > as follows:
    >  
    >     > An identifier universally and uniquely identifies a SDO, SRO,
    > Bundle, or Marking Definition. Identifiers MUST follow the form
    > object-type--UUIDv4, where object-type is the exact value (all type names
    > are lowercase strings, by definition) from the type property of the object
    > being identified or referenced and where the UUIDv4 is an RFC
    > 4122-compliant Version 4 UUID. The UUID MUST be generated according to the
    > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122].
    >     from
    > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265
    >
    >  
    >     I think the requirement to have UUID4 brings more problems than
    > benefits. It makes STIX1->STIX2 transition difficult, hurting existing
    > STIX1 users.
    >     I will try to show it in these 2 use cases.
    >  
    >  
    >     Use case 1
    >     ----------
    >         Imagine that I'm a client of an intelligence provider A. I've been
    > a client for a long time and I have received intelligence in STIX1.2,
    > which I stored in my DB. I fetch new intelligence daily, downloading only
    > fresh data. Often fresh data links to old objects for context.
    >         Provider A decides to upgrade and switch to STIX2. In addition to
    > an old STIX1.2 feed, provider creates new STIX2 feed with the same data.
    > In STIX2 all objects have new identifiers and Provider A does not bother
    > to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have
    > 2 options:
    >         - clean slate option: drop all old data from this provider and
    > re-fetch everything. That will work if Provider A is the only provider I
    > use or if I never referenced Provider A's data from my own intelligence.
    > Not a great plan.
    >         - new era option: leave my STIX1.2 data graph in place and start
    > consuming new STIX2 feed from today. This option has one big issue: new
    > STIX2 data will not be connected to STIX1.2 data I already have, because
    > STIX2 ids are all different. If I want to deduce connection, I need to
    > deduplicate the data against my existing STIX1.2 DB. This means my
    > ingestion pipeline must be smart enough to compare STIX1.2 objects to
    > STIX2 objects and be fast enough to do that for every new STIX2 object.
    > This will be difficult to implement and will have a huge performance
    > penalty.
    >  
    >     Use case 2
    >     ----------
    >         Imagine that I'm an NCSC. I receive intelligence from providers,
    > combine it and distribute it to my clients. My providers are still on
    > STIX1.2 but my clients want STIX2, so I must convert STIX1.2 I receive
    > into STIX2. Full STIX1.2 entities I can transform easily but what do I do
    > with IDREFs I have in my STIX1.2 data?
    >         I can generate new STIX2 id every time I see new STIX1.2 IDREF in
    > incoming data and store STIX1.2->STIX2 mapping somewhere to be used next
    > time I see this IDREF. This is painful and will require additional
    > resources, but it is doable. But it will only work until the moment my
    > providers switch to STIX2 and start sending me full objects for those
    > IDREFs with new random STIX2 identifiers! I cannot predict these
    > identifiers and I can't match them with the ones I generated. So my
    > thinking is - what is the point in even bothering with old IDREFs? I will
    > just drop them, sending my clients sometimes disconnected STIX2 entities,
    > hoping that they will figure it out.
    >  
    >  
    >     Proposed solutions if UUID5 is allowed in STIX2 identifiers:
    >  
    >     Use case 1 solution
    >     -------------------
    >         There can be a guideline that will recommend providers to use old
    > STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are
    > predictable I, as a client, can greatly simplify my deduplication logic. I
    > can run DB migration once to calculate STIX2 identifiers for all my
    > STIX1.2 objects and use these on ingestion for deduplication. Appending
    > STIX2 data to my STIX1.2 DB will be much easier.
    >         I'm also interested in pushing Provider A to adopt this STIX2
    > identifier generation practice because it will save me money.
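The one-time migration Sergey describes for use case 1 could look roughly like the sketch below. It assumes Provider A follows the scheme from his Practicalities section: a namespace derived from `NAMESPACE_URL` plus a provider URL, with the full STIX 1.2 ID as the UUIDv5 name. The provider URL, the IDs, and the helper names are all hypothetical. A real pipeline would persist the index in its database rather than in a dict.

```python
import uuid

# Assumption: Provider A follows the Practicalities scheme. The URL is hypothetical.
PROVIDER_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://provider-a.example/ns")


def predicted_stix2_id(stix2_type: str, stix12_id: str) -> str:
    return "{}--{}".format(stix2_type, uuid.uuid5(PROVIDER_NS, stix12_id))


def build_migration_index(stix12_objects):
    """One-time migration: predicted STIX2 ID -> stored STIX 1.2 ID."""
    return {
        predicted_stix2_id(stix2_type, stix12_id): stix12_id
        for stix12_id, stix2_type in stix12_objects
    }


def match_existing(stix2_obj: dict, index: dict):
    """On ingestion, return the known STIX 1.2 ID for this object, or None."""
    return index.get(stix2_obj["id"])


stored = [
    ("provider-a:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473", "threat-actor"),
    ("provider-a:indicator-5a3e1b2c-0000-4000-8000-000000000001", "indicator"),
]
index = build_migration_index(stored)

# An object arriving later on Provider A's STIX2 feed.
incoming = {"type": "threat-actor", "id": predicted_stix2_id("threat-actor", stored[0][0])}
print(match_existing(incoming, index))  # provider-a:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473
```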
    >  
    >     Use case 2 solution
    >     -------------------
    >     With UUID5 I have a way out: I can generate new STIX2 ids from old
    > STIX1.2 ids! I can parse IDREF value, that looks like `[ns
    > prefix]:[construct type]-[GUID]`, and use provider's namespace / construct
    > type to build new STIX2 identifier. The logic will be like this:
    >         - full IDREF will be input for UUID5 function
    >         - for STIX1.2 types that were split (like TTP), I do not know
    > exact STIX2 type Provider would use for old TTP. My solution here would be
    > to play safe and create relations for all possible types: for IDREF to
    > TTP, I will create 4 relations: one to a possible Tool object, one to
    > Malware, one to Attack Pattern and one to Identity. It is an overhead but
    > it is a small price for keeping interconnected intelligence graph.
    >         Again, when time comes and my providers move to STIX2, I'm
    > interested in pushing them to adopt this id generation schema for old
    > objects, because it will save me, as NCSC, money.
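Sergey's use case 2 conversion has three steps:

1. Parse the `[ns prefix]:[construct type]-[GUID]` IDREF.
2. Use the full IDREF as the UUIDv5 name.
3. Fan a TTP out to every STIX2 type it might have become.

The sketch below follows those steps. The prefix-to-URL table and the partial type mapping are hypothetical placeholders, not an agreed mapping. Construct types are lowercased so that `TTP` and `ttp` map alike.

```python
import re
import uuid

IDREF_RE = re.compile(
    r"^(?P<prefix>[^:]+):(?P<ctype>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12})$"
)

# Hypothetical lookup from STIX 1.2 namespace prefix to the producer's namespace URL.
PREFIX_URLS = {"example": "https://example.com/ns"}

# Hypothetical, partial mapping. TTP was split in STIX2, so it fans out.
CANDIDATE_TYPES = {
    "threat-actor": ["threat-actor"],
    "indicator": ["indicator"],
    "campaign": ["campaign"],
    "ttp": ["tool", "malware", "attack-pattern", "identity"],
}


def candidate_stix2_ids(idref: str) -> list:
    match = IDREF_RE.match(idref)
    if match is None:
        raise ValueError("not a STIX 1.2 IDREF: {!r}".format(idref))
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, PREFIX_URLS[match.group("prefix")])
    # The full IDREF is the UUIDv5 name, so every candidate shares one UUID part.
    value = uuid.uuid5(namespace, idref)
    ctype = match.group("ctype").lower()
    return ["{}--{}".format(t, value) for t in CANDIDATE_TYPES[ctype]]


for stix2_id in candidate_stix2_ids("example:ttp-8ac90ff3-ecf8-4835-95b8-6aea6a623df5"):
    print(stix2_id)
```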
    >  
    >  
    >     To reiterate, I would like to propose:
    >     - a change in STIX2 spec to allow both UUID5 and UUID4 to be used in
    > an identifier of SDO, SRO, MarkingDefinition and Custom Object entities;
    >     - creating a guideline, complementary to the spec, that would explain
    > how STIX1.2 ids can be transformed into STIX2 for easier transition.
    >  
    >  
    >     Practicalities:
    >  
    >     UUID5 ids require use of a namespace. UUID5 RFC (
    > https://tools.ietf.org/html/rfc4122#section-4.3
    > ) defines some generic namespaces (
    > https://tools.ietf.org/html/rfc4122#appendix-C
    > ) but does not prohibit the use of custom ones. I suggest this algorithm:
    >         - namespace UUID5 is generated by using predefined `NameSpace_URL`
    > namespace and producer's URL;
    >         - for old objects, GUID part of STIX2 identifier is namespaced
    > UUID5 generated from old STIX1.2 id
    >         - for new objects, GUID part of STIX2 identifier is either
    > namespaced UUID5 with random UUID4 string, or just random UUID4.
    >  
    >     Example python code for generating UUID5 with custom namespace:
    >  
    >         In [1]: import uuid
    >            ...:
    >            ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
    >            ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
    >            ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
    >            ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
    >            ...:
    >            ...: print("new STIX2 id: {}".format(stix2_id))
    >  
    >         new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
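The Practicalities list has three rules:

- Derive the producer namespace from `NAMESPACE_URL` and the producer's URL.
- For old objects, use a UUIDv5 of the STIX 1.2 ID.
- For new objects, use either a UUIDv5 of a random UUIDv4 string or a plain UUIDv4.

The helper below combines them. `make_stix2_id` and its `use_uuid5_for_new` flag are hypothetical, and the producer URL is a placeholder.

```python
import uuid
from typing import Optional


def make_stix2_id(object_type: str, producer_url: str,
                  stix12_id: Optional[str] = None,
                  use_uuid5_for_new: bool = True) -> str:
    # Producer namespace: RFC 4122 URL namespace + the producer's URL.
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, producer_url)
    if stix12_id is not None:
        # Old object: deterministic UUIDv5 of the STIX 1.2 ID.
        value = uuid.uuid5(namespace, stix12_id)
    elif use_uuid5_for_new:
        # New object, option 1: namespaced UUIDv5 of a random UUIDv4 string.
        value = uuid.uuid5(namespace, str(uuid.uuid4()))
    else:
        # New object, option 2: plain random UUIDv4.
        value = uuid.uuid4()
    return "{}--{}".format(object_type, value)


PRODUCER_URL = "https://producer.example/ns"  # hypothetical
old_id = make_stix2_id("threat-actor", PRODUCER_URL,
                       "producer:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473")
new_v5 = make_stix2_id("indicator", PRODUCER_URL)
new_v4 = make_stix2_id("indicator", PRODUCER_URL, use_uuid5_for_new=False)
print(old_id, new_v5, new_v4, sep="\n")
```

In option 1 the name is random, so the resulting UUIDv5 is no more reproducible than a plain UUIDv4. As far as this sketch goes, the visible difference is that every ID the producer emits carries version 5.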
    >  
    >  
    >     BONUS: python functions to convert STIX1.2 IDREFs into STIX2
    > identifiers -
    > https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
    >
    >  
    >  
    >     Thank you,
    >     Sergey Polzunov
    >     EclecticIQ
    >  

    --
    John-Mark

    This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto)
    by others is strictly prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.

  • 7.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-05-2019 13:06
    UUIDv5 enables this. Those who want to use a specific deterministic ID can just form a consortium of agreement, choose a namespace for this consortium to use, and use that namespace among all of their products.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

    From:         Patrick Maroney <pmaroney@darklight.ai>
    To:         "Wunder, John A." <jwunder@mitre.org>, Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc:         "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Date:         02/04/2019 05:38 PM
    Subject:         Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    We want distinct Deterministic ID UUIDs to discriminate between Sources of same/similar information and to better detect/prevent parroting and/or leakage.

    A given community of trust can establish conventions within the UUIDv5 encoding as required (e.g., just Namespace, Namespace+Community_ID, Namespace+Key).

    We should only be using and asserting immutable objects and using the canonical representations suggested.

    A number of UUIDv5 use cases were proposed historically, including non-attributional Source Traceability.

    However, none of this impacts those not wishing to participate in Deterministic Reference ID use cases, but it empowers those of us who do.

    Patrick Maroney
    DarkLight
    Mobile: (609)841-5104
    Email: patrick.maroney@darklight.ai
    www.darklight.ai

    From: <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
    Date: Monday, February 4, 2019 at 3:40 PM
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Yeah I agree with Jason here.

    I don't think that would work for UUIDv5 anyway, since everyone will be using different namespaces. Even if you hash the data the same way, you end up with different (non-transparent) IDs:

        >>> uid = uuid.uuid4()
        >>> uuid.uuid5(uid, "data")
        UUID('c5bb29ba-4d85-5280-b202-7085b7485b62')
        >>> uid2 = uuid.uuid4()
        >>> uuid.uuid5(uid2, "data")
        UUID('d09fafa9-ead3-54f1-b1c7-1d1d55f66fb7')

    You can see that with the same data (just the string literal "data") you end up with different IDs. You could of course use the namespace of a different producer to try to back-generate a UUIDv5 for an object they've presumably created, but IMO that's a recipe for trouble (did they actually create that object?). Instead you should just rely on having received that object before and use whatever UUID it had.

    It seems like the simplest thing is to just allow UUIDv5 via the various proposals we've seen. It meets the use cases below and is a very straightforward change.

    John

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:35 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, John Wunder <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    I don't think that figuring out semantic equivalence should be a goal here. Referring back to Sergey's original email, that is not why people are trying to do this.

    People want to do this so they can have a bi-directional traceability method from a STIX ID back and forth into an already-existing ID system. Forcing them to generate the IDs based on properties defeats the whole purpose.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

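Three ideas from this exchange fit in a few lines of code: Jason's consortium namespace, John's observation that different namespaces yield different IDs for the same data, and Patrick's Namespace+Community_ID convention. The consortium and community names below are hypothetical. Deriving a community namespace by hashing a community ID into the consortium namespace is only one possible reading of Patrick's convention.

```python
import uuid

# Hypothetical consortium namespace, agreed out of band by its members.
CONSORTIUM_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "consortium.example")


def scoped_id(object_type: str, name: str, namespace: uuid.UUID = CONSORTIUM_NS) -> str:
    return "{}--{}".format(object_type, uuid.uuid5(namespace, name))


# Two members hashing the same agreed-upon name in the shared namespace converge.
member_a = scoped_id("location", "country:ca")
member_b = scoped_id("location", "country:ca")

# A producer hashing the same name in its own namespace does not (John's point).
own_ns = uuid.uuid5(uuid.NAMESPACE_DNS, "producer.example")
outsider = scoped_id("location", "country:ca", namespace=own_ns)

# One possible "Namespace+Community_ID" convention: a per-community namespace.
community_ns = uuid.uuid5(CONSORTIUM_NS, "community-42")
community = scoped_id("location", "country:ca", namespace=community_ns)

print(member_a == member_b, member_a == outsider, member_a == community)  # True False False
```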


  • 8.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-07-2019 18:01
    I have been thinking about this a lot over the past few weeks. There is a lot of history here, and my concern has always been ensuring that two organizations do not produce two different objects with the same ID.

    I personally believe that the ID of the object should not be used to find semantic equivalence. If we want to add another property for that, then great.

    I would support the following updated text:

    ### BEGIN FROM Part1 Section 2.7
    An identifier universally and uniquely identifies a SDO, SRO, Bundle, Language Content, or Marking Definition. Identifiers MUST follow the form object-type--hash, where object-type is the exact value (all type names are lowercase strings, by definition) from the type property of the object being identified or referenced and where the hash is either an RFC 4122-compliant Version 4 UUID or a Version 5 UUID. The UUID MUST be generated according to the algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) or section 4.3 (Version 5 UUID) [RFC4122].

    For UUIDv5, the namespace SHOULD be the organization's fully qualified DNS name (example.com) and the name should be a value consistent within the organization that guarantees that two different objects do not generate the same ID. For example, an organization MAY choose to use all properties, a subset of all properties, an organizational content identifier, a STIX 1.x identifier, or something else as string data for the name. Some objects in STIX MAY define a list of properties that an organization MAY use for the name portion of the UUIDv5 hash, that MIGHT allow deterministic generation and verification of semantic equivalency.
    ### END

    If we did this definition right, we could use the SAME identifier for both STIX content and Cyber Observables, thus preventing a lot of future confusion in the market.
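Bret's proposed text defines the identifier form as object-type, then `--`, then an RFC 4122 Version 4 or Version 5 UUID. A consumer-side check for that form might look like the sketch below. A UUIDv5 namespace must itself be a UUID, so the usage assumes the organization's DNS name is turned into one via `NAMESPACE_DNS`. That step, the helper name, and the simplified type-name regex are assumptions, not spec text.

```python
import re
import uuid

# Simplified type-name check: lowercase ASCII letters, digits and hyphens.
TYPE_RE = re.compile(r"^[a-z0-9][a-z0-9-]*[a-z0-9]$")


def is_valid_identifier(identifier: str) -> bool:
    """Check object-type--<UUID> where the UUID is RFC 4122 version 4 or 5."""
    object_type, sep, uuid_part = identifier.partition("--")
    if not sep or not TYPE_RE.match(object_type):
        return False
    try:
        value = uuid.UUID(uuid_part)
    except ValueError:
        return False
    # Require the RFC 4122 variant, version 4 or 5, and the canonical
    # lowercase hyphenated spelling (uuid.UUID also accepts other spellings).
    return (value.variant == uuid.RFC_4122
            and value.version in (4, 5)
            and str(value) == uuid_part)


# Assumption: "example.com" is turned into a namespace UUID via NAMESPACE_DNS.
org_ns = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")
v5_id = "indicator--{}".format(uuid.uuid5(org_ns, "internal-content-id-1234"))
v4_id = "indicator--{}".format(uuid.uuid4())
v3_id = "indicator--{}".format(uuid.uuid3(org_ns, "internal-content-id-1234"))
print(is_valid_identifier(v5_id), is_valid_identifier(v4_id), is_valid_identifier(v3_id))
# True True False
```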
    Bret

    From: cti-stix@lists.oasis-open.org <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Sent: Tuesday, February 5, 2019 6:06 AM
    To: Patrick Maroney
    Cc: cti-stix@lists.oasis-open.org; John-Mark Gurney; Wunder, John A.; Sean Barnum; Sergey Polzunov
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    UUIDv5 enables this. Those who want to use a specific deterministic ID can just form a consortium of agreement, choose a namespace for this consortium to use, and use that namespace across all of their products.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

    From: Patrick Maroney <pmaroney@darklight.ai>
    To: "Wunder, John A." <jwunder@mitre.org>, Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Date: 02/04/2019 05:38 PM
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    We want distinct deterministic ID UUIDs to discriminate between sources of the same or similar information and to better detect/prevent parroting and/or leakage.

    A given community of trust can establish conventions within the UUIDv5 encoding as required (e.g., just Namespace, Namespace+Community_ID, Namespace+Key).

    We should only be using and asserting immutable objects and using the canonical representations suggested.

    A number of UUIDv5 use cases were proposed historically, including non-attributional source traceability.

    However, none of this impacts those not wishing to participate in deterministic reference ID use cases, but it empowers those of us who do.

    Patrick Maroney
    DarkLight
    Mobile: (609)841-5104
    Email: patrick.maroney@darklight.ai
    www.darklight.ai

    From: <cti-stix@lists.oasis-open.org> on behalf of John Wunder <jwunder@mitre.org>
    Date: Monday, February 4, 2019 at 3:40 PM
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Yeah, I agree with Jason here.

    I don’t think that would work for UUIDv5 anyway: since everyone will be using different namespaces, even if you hash the data the same way you end up with different (non-transparent) IDs:

        >>> import uuid
        >>> uid = uuid.uuid4()
        >>> uuid.uuid5(uid, "data")
        UUID('c5bb29ba-4d85-5280-b202-7085b7485b62')
        >>> uid2 = uuid.uuid4()
        >>> uuid.uuid5(uid2, "data")
        UUID('d09fafa9-ead3-54f1-b1c7-1d1d55f66fb7')

    You can see that with the same data (just the string literal “data”) you end up with different IDs. You could of course use the namespace of a different producer to try to back-generate a UUIDv5 for an object they’ve presumably created, but IMO that’s a recipe for trouble (did they actually create that object?). Instead you should just rely on having received that object before and use whatever UUID it had.

    It seems like the simplest thing is to just allow UUIDv5 via the various proposals we’ve seen. It meets the use cases below and is a very straightforward change.

    John

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:35 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, John Wunder <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    I don't think that figuring out semantic equivalence should be a goal here. Referring back to Sergey's original email, that is not why people are trying to do this. People want to do this so they can have a bi-directional traceability method from a STIX ID back and forth into an already-existing ID system. Forcing them to generate the IDs based on properties defeats the whole purpose.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

    From: Sean Barnum <sean.barnum@FireEye.com>
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date: 02/04/2019 04:30 PM
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by: <cti-stix@lists.oasis-open.org>

    It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
    The reality is that objects such as Locations, Identities, etc. are likely to be repeated and widely used about as much as many Observables.
    The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com

    From: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:25 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Actually, I don't agree with this part.

    The entire point of UUIDv5 is that I should not care what method you use to compute your IDs, because it's in your namespace, so it's not my problem anymore. I don't think we want to codify it.

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

    From: Sean Barnum <sean.barnum@FireEye.com>
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date: 02/04/2019 04:23 PM
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    Sent by: <cti-stix@lists.oasis-open.org>

    Agree.

    And I would suggest we DO want the calculation to be done “based upon the data from the object”, as that is how we get value from such an ID.
    It should just not be done on ALL properties of the object.
    We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs.

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com

    From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Monday, February 4, 2019 at 3:20 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    I would think we would want to use a DNS or URL namespace, would we not?

    -
    Jason Keirstead
    Lead Architect - IBM Security Connect
    www.ibm.com/security

    "Things may come to those who wait, but only the things left by those who hustle." - Unknown

    From: John-Mark Gurney <jmg@newcontext.com>
    To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Cc: "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com>
    Date: 02/04/2019 04:10 PM
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400:
    > I would also support this.
    >
    > I have learned more about the inner workings of UUID4/5 and I don't have
    > any reservations about it anymore. The odds of collision with a
    > properly-implemented UUID5 are on-par with UUID4
    >
    > As far as John's comment below - all this means IMHO is the library has to
    > force you to provide a namespace (ie make it a mandatory argument in your
    > constructor or whatever).

    The one requirement I would like to make sure about UUIDv5 is that it
    is NOT based upon the data from the object, otherwise versioning will
    break.

    The reason we didn't use UUIDv5 was that most of the proposals to use it
    were to make it a hash of the contents, such as name and description, and
    then update the UUID whenever the name and/or description changed.
    If we do this, the namespace should probably be the identity of the
    new STIX2 object. This would prevent collisions from happening when
    two entities try to create a "new" STIX2 object from a STIX1 object...

    > From:   "Wunder, John A." <jwunder@mitre.org>
    > To:     Sergey Polzunov <sergey@eclecticiq.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    > Date:   02/04/2019 12:22 PM
    > Subject:        [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier
    > Sent by:        <cti-stix@lists.oasis-open.org>
    >
    > I've been thinking a lot about this and I think it makes sense.
    >
    > One of the concerns we had at the time we chose UUID4 is that users of libraries like python-stix would need to remember to set the UUID5 namespace -- or, if they don't and python-stix has some default namespace, different tools using the libraries could have overlapping IDs. This would also apply to users of the new Java libraries that I've seen come out. It might mean these libraries requiring that people set a unique namespace before creating any objects, vs. now where they can just go ahead and create IDs by default. I'd be curious what other people think about this problem and how we can help avoid it becoming an issue (especially given how many people use those libraries).
    >
    > John
    >
    > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey Polzunov" <cti-stix@lists.oasis-open.org on behalf of sergey@eclecticiq.com> wrote:
    >
    >     Hey everybody!
    >
    >     The current STIX2 spec definition of an `identifier` for STIX2 objects is as follows:
    >
    >     > An identifier universally and uniquely identifies a SDO, SRO, Bundle, or Marking Definition. Identifiers MUST follow the form object-type--UUIDv4, where object-type is the exact value (all type names are lowercase strings, by definition) from the type property of the object being identified or referenced and where the UUIDv4 is an RFC 4122-compliant Version 4 UUID. The UUID MUST be generated according to the algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122].
    >     (from http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265)
    >
    >     I think the requirement to have UUID4 brings more problems than benefits. It makes the STIX1->STIX2 transition difficult, hurting existing STIX1 users. I will try to show it in these 2 use cases.
    >
    >     Use case 1
    >     ----------
    >     Imagine that I'm a client of an intelligence provider A. I've been a client for a long time and I have received intelligence in STIX1.2, which I stored in my DB. I fetch new intelligence daily, downloading only fresh data. Often fresh data links to old objects for context.
    >     Provider A decides to upgrade and switch to STIX2. In addition to the old STIX1.2 feed, the provider creates a new STIX2 feed with the same data. In STIX2 all objects have new identifiers, and Provider A does not bother to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have 2 options:
    >     - clean slate option: drop all old data from this provider and re-fetch everything. That will work if Provider A is the only provider I use or if I never referenced Provider A's data from my own intelligence. Not a great plan.
    >     - new era option: leave my STIX1.2 data graph in place and start consuming the new STIX2 feed from today. This option has one big issue: new STIX2 data will not be connected to the STIX1.2 data I already have, because the STIX2 ids are all different. If I want to deduce connections, I need to deduplicate the data against my existing STIX1.2 DB. This means my ingestion pipeline must be smart enough to compare STIX1.2 objects to STIX2 objects and fast enough to do that for every new STIX2 object. This will be difficult to implement and will have a huge performance penalty.
    >
    >     Use case 2
    >     ----------
    >     Imagine that I'm an NCSC. I receive intelligence from providers, combine it and distribute it to my clients. My providers are still on STIX1.2 but my clients want STIX2, so I must convert the STIX1.2 I receive into STIX2. Full STIX1.2 entities I can transform easily, but what do I do with the IDREFs I have in my STIX1.2 data?
    >     I can generate a new STIX2 id every time I see a new STIX1.2 IDREF in incoming data and store the STIX1.2->STIX2 mapping somewhere to be used the next time I see this IDREF. This is painful and will require additional resources, but it is doable. But it will only work until the moment my providers switch to STIX2 and start sending me full objects for those IDREFs with new random STIX2 identifiers! I cannot predict these identifiers and I can't match them with the ones I generated. So my thinking is: what is the point in even bothering with old IDREFs? I will just drop them, sometimes sending my clients disconnected STIX2 entities, hoping that they will figure it out.
    >
    >     Proposed solutions if UUID5 is allowed in STIX2 identifiers:
    >
    >     Use case 1 solution
    >     -------------------
    >     There can be a guideline that recommends providers use old STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are predictable, I, as a client, can greatly simplify my deduplication logic. I can run a DB migration once to calculate STIX2 identifiers for all my STIX1.2 objects and use these on ingestion for deduplication. Appending STIX2 data to my STIX1.2 DB will be much easier.
    >     I'm also interested in pushing Provider A to adopt this STIX2 identifier generation practice because it will save me money.
    >
    >     Use case 2 solution
    >     -------------------
    >     With UUID5 I have a way out: I can generate new STIX2 ids from old STIX1.2 ids! I can parse the IDREF value, which looks like `[ns prefix]:[construct type]-[GUID]`, and use the provider's namespace / construct type to build a new STIX2 identifier. The logic will be like this:
    >     - the full IDREF will be the input for the UUID5 function
    >     - for STIX1.2 types that were split (like TTP), I do not know the exact STIX2 type the provider would use for an old TTP. My solution here would be to play safe and create relations for all possible types: for an IDREF to a TTP, I will create 4 relations: one to a possible Tool object, one to Malware, one to Attack Pattern and one to Identity. It is an overhead, but it is a small price for keeping an interconnected intelligence graph.
    >     Again, when the time comes and my providers move to STIX2, I'm interested in pushing them to adopt this id generation scheme for old objects, because it will save me, as an NCSC, money.
    >
    >     To reiterate, I would like to propose:
    >     - a change in the STIX2 spec to allow both UUID5 and UUID4 to be used in an identifier of SDO, SRO, MarkingDefinition and Custom Object entities;
    >     - creating a guideline, complementary to the spec, that would explain how STIX1.2 ids can be transformed into STIX2 for an easier transition.
    >
    >     Practicalities:
    >
    >     UUID5 ids require the use of a namespace. The UUID5 RFC (https://tools.ietf.org/html/rfc4122#section-4.3) defines some generic namespaces (https://tools.ietf.org/html/rfc4122#appendix-C) but does not prohibit the use of custom ones. I suggest this algorithm:
    >     - the namespace UUID5 is generated using the predefined `NameSpace_URL` namespace and the producer's URL;
    >     - for old objects, the GUID part of the STIX2 identifier is a namespaced UUID5 generated from the old STIX1.2 id;
    >     - for new objects, the GUID part of the STIX2 identifier is either a namespaced UUID5 of a random UUID4 string, or just a random UUID4.
    >
    >     Example python code for generating UUID5 with a custom namespace:
    >
    >         In [1]: import uuid
    >            ...:
    >            ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
    >            ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
    >            ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
    >            ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
    >            ...:
    >            ...: print("new STIX2 id: {}".format(stix2_id))
    >
    >         new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
    >
    >     BONUS: python functions to convert STIX1.2 IDREFs into STIX2 identifiers - https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
    >
    >     Thank you,
    >     Sergey Polzunov
    >     EclecticIQ

    --
    John-Mark

    This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto) by others is strictly prohibited.
    If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.
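    Sergey's use case 2 solution and the algorithm under "Practicalities" in the quoted message can be combined into one small converter. This is a sketch under stated assumptions, not Sergey's gist and not any STIX library: the names `producer_namespace`, `stix2_ids_for_idref`, the `CANDIDATE_TYPES` table (which STIX 1.2 construct names map to which STIX 2 types), and the sample TTP IDREF are all hypothetical. It follows his description: the namespace is the UUIDv5 of the producer URL under `NAMESPACE_URL`, the full IDREF is the UUIDv5 name, and a TTP fans out to the four possible STIX 2 types.

```python
import re
import uuid

# Hypothetical mapping from STIX 1.2 construct names to candidate STIX 2
# types. TTP was split in STIX 2, so it fans out to every type it may have become.
CANDIDATE_TYPES = {
    "threat-actor": ("threat-actor",),
    "campaign": ("campaign",),
    "indicator": ("indicator",),
    "ttp": ("tool", "malware", "attack-pattern", "identity"),
}

# [ns prefix]:[construct type]-[GUID], with the 36-character GUID at the end.
IDREF_RE = re.compile(
    r"(?P<prefix>[^:]+):(?P<construct>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12})"
)


def producer_namespace(producer_url):
    """Per-producer namespace: UUIDv5 of the producer URL under NAMESPACE_URL."""
    return uuid.uuid5(uuid.NAMESPACE_URL, producer_url)


def stix2_ids_for_idref(idref, namespace):
    """Return candidate STIX 2 identifiers for one STIX 1.2 IDREF.

    The full IDREF string is the UUIDv5 name, so the same IDREF always
    yields the same UUID; candidates differ only in their type prefix.
    """
    match = IDREF_RE.fullmatch(idref)
    if match is None:
        raise ValueError(f"unrecognized STIX 1.2 IDREF: {idref!r}")
    construct = match.group("construct").lower()
    if construct not in CANDIDATE_TYPES:
        raise KeyError(f"no STIX 2 type mapping for construct {construct!r}")
    guid = uuid.uuid5(namespace, idref)
    return [f"{stix2_type}--{guid}" for stix2_type in CANDIDATE_TYPES[construct]]


ns = producer_namespace("https://eclecticiq.com/ns")
print(stix2_ids_for_idref("eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473", ns))
print(stix2_ids_for_idref("example:ttp-e610a4f1-9676-eab3-bcc6-b2768d58281a", ns))
```

    Because the whole IDREF is the UUIDv5 name, the four TTP candidates share one UUID and differ only in their type prefix. That is valid under the identifier rule, but a consumer that keys objects by the bare UUID would see them collide.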


  • 9.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 01:27
    Sean Barnum wrote this message on Mon, Feb 04, 2019 at 20:29 +0000:
    > It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
    >
    > The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables.
    > The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.

    Except that it will completely break versioning if we have the same UUID created by different authors.

    We have to be careful merging the SCO UUID language and the SDO UUID language such that we both break each other.

    We already have an issue w/ UUID's being repeated from the same org, but for different types of objects:
    https://github.com/pan-unit42/playbook_viewer/issues/7

    Expanding this such that the same UUID's can be generated by different orgs, and the same type will break STIX significantly.

    --
    John-Mark
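    John-Mark's versioning objection, and John Wunder's earlier observation that different namespaces give different IDs for the same data, can both be seen in a few lines of Python. Everything named here is a hypothetical illustration: the DNS names, the shared namespace URL, and the choice of `name` and `country` as the "semantically relevant" properties are assumptions, not anything STIX defines. Only the standard library `uuid` module is used.

```python
import uuid

# Hypothetical per-producer namespaces, derived from each organization's DNS name.
ORG_A_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "org-a.example")
ORG_B_NS = uuid.uuid5(uuid.NAMESPACE_DNS, "org-b.example")
# Hypothetical namespace that every producer agrees to use for Locations.
SHARED_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://shared.example/stix/location")


def location_id(namespace, name, country):
    """Derive a Location ID from an assumed set of semantically relevant properties."""
    return "location--{}".format(uuid.uuid5(namespace, f"{name}|{country}"))


# Shared namespace + same properties: two different authors get the SAME ID.
a_shared = location_id(SHARED_NS, "Ottawa", "CA")
b_shared = location_id(SHARED_NS, "Ottawa", "CA")

# Per-producer namespaces + same properties: the IDs differ.
a_own = location_id(ORG_A_NS, "Ottawa", "CA")
b_own = location_id(ORG_B_NS, "Ottawa", "CA")

# Changing a hashed property yields a new ID, so the edited object no
# longer shares the ID of its earlier versions.
a_edited = location_id(ORG_A_NS, "City of Ottawa", "CA")

print(a_shared == b_shared, a_own == b_own, a_own == a_edited)  # True False False
```

    The shared-namespace case is what makes the cross-producer convergence Sean describes possible, and it is also exactly the case where two authors end up asserting the same `id`. Per-producer namespaces avoid that collision but give up the convergence.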


  • 10.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 02:42




    While I can't attest to HOW the three IDs referenced have the same suffix UUID value, aren't they still unique given their prefixes are unique?
     
      "id": "relationship--4832076b-7a4c-4952-8853-6446de513176"
      "id": "report--4832076b-7a4c-4952-8853-6446de513176"
      "id": "campaign--4832076b-7a4c-4952-8853-6446de513176"
     
    Note also that these are NOT UUIDv5.
     

    Patrick Maroney
    DarkLight
    www.darklight.ai

     
     

    From: <cti-stix@lists.oasis-open.org> on behalf of John-Mark Gurney <jmg@newcontext.com>
    Organization: New Context
    Date: Wednesday, February 13, 2019 at 8:26 PM
    To: Sean Barnum <sean.barnum@FireEye.com>
    Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier


     


    We already have an issue w/ UUIDs being repeated from the same org, but for different types of objects:

    https://github.com/pan-unit42/playbook_viewer/issues/7

    Expanding this such that the same UUIDs can be generated by different orgs, and for the same type, will break STIX significantly.
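The "NOT UUIDv5" observation can be checked mechanically: RFC 4122 stores the UUID version in the first hex digit of the third group, and Python's uuid module exposes it as UUID.version. A small sketch, assuming identifiers of the form object-type--UUID; the helper name and the https://example.com/ns namespace URL are placeholders, not part of any STIX library.

```python
import uuid
from typing import Optional


def identifier_uuid_version(identifier: str) -> Optional[int]:
    """Return the RFC 4122 version of the UUID part of an `object-type--UUID` id.

    uuid.UUID.version is None when the UUID is not an RFC 4122 variant.
    """
    _, _, suffix = identifier.partition("--")
    return uuid.UUID(suffix).version


ids = [
    "relationship--4832076b-7a4c-4952-8853-6446de513176",
    "report--4832076b-7a4c-4952-8853-6446de513176",
    "campaign--4832076b-7a4c-4952-8853-6446de513176",
]

for identifier in ids:
    print(identifier.partition("--")[0], identifier_uuid_version(identifier))
# relationship 4
# report 4
# campaign 4

# For comparison, a name-based UUIDv5 (placeholder namespace URL) reports version 5.
v5_id = "threat-actor--" + str(uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/ns"))
print(identifier_uuid_version(v5_id))  # 5
```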







  • 11.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 23:22
    Patrick Maroney wrote this message on Thu, Feb 14, 2019 at 02:42 +0000:
    > While I can't attest to HOW the three IDs referenced have the same suffix UUID value, aren't they still unique given their prefixes are unique?
    >
    > "id": "relationship--4832076b-7a4c-4952-8853-6446de513176"
    > "id": "report--4832076b-7a4c-4952-8853-6446de513176"
    > "id": "campaign--4832076b-7a4c-4952-8853-6446de513176"
    >
    > Note also that these are NOT UUIDv5.

    Yes, per the specification as currently written, they are "valid". But I know some people assumed that the UU part of UUID would hold, and the above shows that these are not universally unique. This means that if you used just the UUID as your primary key, without including the type, you may have issues storing the objects due to colliding UUIDs.

    Also, though they are "valid", they are clearly not properly generated per the specification, which requires that a TRNG or PRNG be used, and that obviously did not happen:

    "The UUID MUST be generated according to the algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122]."

    --
    John-Mark
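To illustrate the primary-key problem described above, here is a sketch in which plain Python dicts stand in for a database table (an assumption made for brevity): keying by the UUID part alone silently overwrites objects that share a suffix, while keying by the full identifier keeps all three objects from the example.

```python
import uuid

objects = [
    {"type": "relationship", "id": "relationship--4832076b-7a4c-4952-8853-6446de513176"},
    {"type": "report", "id": "report--4832076b-7a4c-4952-8853-6446de513176"},
    {"type": "campaign", "id": "campaign--4832076b-7a4c-4952-8853-6446de513176"},
]

# Keyed by the UUID part only: later objects overwrite earlier ones.
by_uuid = {}
for obj in objects:
    by_uuid[uuid.UUID(obj["id"].partition("--")[2])] = obj

# Keyed by the full identifier (type prefix included): nothing is lost.
by_id = {obj["id"]: obj for obj in objects}

print(len(by_uuid), len(by_id))  # 1 3
```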


  • 12.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 13:42
    +1. Let's just take the simple change to add UUIDv5 and leave it at that. Every change we make doesn't need to mean revisiting everything we've already done. IMO we need to be much more deliberate about changes than much of this conversation (and other recent conversations on the list) suggests.

    On 2/13/19, 8:27 PM, "cti-stix@lists.oasis-open.org on behalf of John-Mark Gurney" <cti-stix@lists.oasis-open.org on behalf of jmg@newcontext.com> wrote:

        Sean Barnum wrote this message on Mon, Feb 04, 2019 at 20:29 +0000:
        > It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
        >
        > The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables.
        > The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.

        Except that it will completely break versioning if we have the same UUID created by different authors. We have to be careful merging the SCO UUID language and the SDO UUID language so that we don't break each other.

        We already have an issue w/ UUIDs being repeated from the same org, but for different types of objects:
        https://github.com/pan-unit42/playbook_viewer/issues/7

        Expanding this such that the same UUIDs can be generated by different orgs, and for the same type, will break STIX significantly.

        --
        John-Mark
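One way to weigh the concern about the same UUID being created by different authors against the library-default concern raised earlier in the thread is to make the namespace a required constructor argument. The IdGenerator class and the example.com URLs below are hypothetical. The sketch shows that per-producer namespaces yield different ids for the same input, while a shared namespace makes them converge: that convergence is both the deduplication benefit Sean describes and the cross-author collision John-Mark warns about.

```python
import uuid


class IdGenerator:
    """Hypothetical STIX 2 id generator that cannot be built without a namespace."""

    def __init__(self, namespace: uuid.UUID):
        if not isinstance(namespace, uuid.UUID):
            raise TypeError("a producer-specific namespace UUID is required")
        self.namespace = namespace

    def make_id(self, object_type: str, name: str) -> str:
        # Name-based UUIDv5: same namespace + same name -> same UUID.
        return "{}--{}".format(object_type, uuid.uuid5(self.namespace, name))


# Two producers, each with its own namespace (URLs are placeholders).
producer_a = IdGenerator(uuid.uuid5(uuid.NAMESPACE_URL, "https://producer-a.example/ns"))
producer_b = IdGenerator(uuid.uuid5(uuid.NAMESPACE_URL, "https://producer-b.example/ns"))

id_a = producer_a.make_id("identity", "ACME Corp")
id_b = producer_b.make_id("identity", "ACME Corp")
print(id_a == id_b)  # False: different namespaces, different UUIDs

# A shared namespace makes two independent generators produce the same id.
shared = uuid.uuid5(uuid.NAMESPACE_URL, "https://shared.example/ns")
print(IdGenerator(shared).make_id("identity", "ACME Corp") ==
      IdGenerator(shared).make_id("identity", "ACME Corp"))  # True
```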


  • 13.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 14:01
    By the way, another important (and in the case of some core software components from security vendors) issue that we have already mentioned: https://github.com/oasis-tcs/cti-stix2/issues/133 Could we relax the "MUST" to a "RECOMMENDED" for the UUIDv4 validation in the upcoming version of the STIX format? It's very common for UUID libraries to fall back to a time-based format when the PRNG is not accessible. Thank you. On 14/02/2019 14:42, Wunder, John A. wrote: > +1. Let's just take the simple change to add UUIDv5 and leave it at that. Every change we make doesn't need to mean revisiting everything we've already done. IMO we need to be much more deliberate about changes than much of this conversation (and other recent conversations on the list) suggest. > > ïOn 2/13/19, 8:27 PM, "cti-stix@lists.oasis-open.org on behalf of John-Mark Gurney" <cti-stix@lists.oasis-open.org on behalf of jmg@newcontext.com> wrote: > > Sean Barnum wrote this message on Mon, Feb 04, 2019 at 20:29 +0000: > > It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs. > > > > The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables. > > The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable. > > Except that it will completely break versioning if we have the same > UUID created by different authors. We have to be careful merging the > SCO UUID language and the SDO UUID language such that we both break > each other. > > We already have an issue w/ UUID's being repeated from the same org, > but for different types of objects: > https://github.com/pan-unit42/playbook_viewer/issues/7 > > Expanding this such that the same UUID's can be generated by different > orgs, and the same type will break STIX significantly. 
> > > From: Jason Keirstead <Jason.Keirstead@ca.ibm.com> > > Date: Monday, February 4, 2019 at 3:25 PM > > To: Sean Barnum <sean.barnum@FireEye.com> > > Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > > > Actually, I don't agree with this part. > > > > The entire point of UUIDv5, is that I should not care what method you use to compute your IDs - because it's in your namespace, so its not my problem anymore. I don't think we want to codify it. > > > > - > > Jason Keirstead > > Lead Architect - IBM Security Connect > > www.ibm.com/security > > > > "Things may come to those who wait, but only the things left by those who hustle." - Unknown > > > > > > > > > > From: Sean Barnum <sean.barnum@FireEye.com> > > To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com> > > Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Date: 02/04/2019 04:23 PM > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > Sent by: <cti-stix@lists.oasis-open.org> > > ________________________________ > > > > > > > > Agree. > > > > And I would suggest we DO want the calculation to be done based upon the data from the object . As that is how we get value from such an ID. > > It should just not be done on ALL properties of the object. > > We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs. 
> > > > Sean Barnum > > Principal Architect > > FireEye > > M: 703.473.8262 > > E: sean.barnum@fireeye.com > > > > From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com> > > Date: Monday, February 4, 2019 at 3:20 PM > > To: John-Mark Gurney <jmg@newcontext.com> > > Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > > > I would think we would want to use a DNS or URL namespace, would we not? > > > > - > > Jason Keirstead > > Lead Architect - IBM Security Connect > > www.ibm.com/security > > > > "Things may come to those who wait, but only the things left by those who hustle." - Unknown > > > > > > > > > > From: John-Mark Gurney <jmg@newcontext.com> > > To: Jason Keirstead <Jason.Keirstead@ca.ibm.com> > > Cc: "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Date: 02/04/2019 04:10 PM > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > ________________________________ > > > > > > > > > > Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400: > > > I would also support this. > > > > > > I have learned more about the inner workings of UUID4/5 and I don't have > > > any reservations about it anymore. The odds of collision with a > > > properly-implemented UUID5 are on-par with UUID4 > > > > > > As far as John's comment below - all this means IMHO is the library has to > > > force you to provide a namespace (ie make it a mandatory argument in your > > > constructor or whatever). > > > > The one requirement I would like to make sure about UUIDv5 is that it > > is NOT based upon the data from the object, otherwise versioning will > > break. 
> > > > The reason we didn't use UUIDc4 as most of the proposals to use it was > > to make it a hash of the contents, such as name and description, and > > then update the UUID whenever the name and/or description changed.. > > > > If we do this, the name space should probably be the identity of the > > new STIX2 object. This would prevent collisions from happening when > > two entities try to create a "new" STIX2 object from a STIX1 object... > > > > > From: "Wunder, John A." <jwunder@mitre.org> > > > To: Sergey Polzunov <sergey@eclecticiq.com>, > > > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> > > > Date: 02/04/2019 12:22 PM > > > Subject: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in > > > STIX2 identifier > > > Sent by: <cti-stix@lists.oasis-open.org> > > > > > > > > > > > > I've been thinking a lot about this and I think it makes sense. > > > > > > One of the concerns we had at the time we chose UUID4 is that users of > > > libraries like python-stix would need to remember to set the UUID5 > > > namespace -- or, if they don't and python-stix has some default namespace, > > > different tools using the libraries could have overlapping IDs. This would > > > also apply to users of the new Java libraries that I've seen come out. It > > > might mean these libraries requiring that people set a unique namespace > > > before creating any objects, vs. now where it can just go ahead and create > > > IDs by default. I'd be curious what other people think about this problem > > > and how we can help avoid it becoming an issue (especially given how many > > > people use those libraries). > > > > > > John > > > > > > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey > > > Polzunov" <cti-stix@lists.oasis-open.org on behalf of > > > sergey@eclecticiq.com> wrote: > > > > > > Hey everybody! 
> > > > > > Current STIX2 spec definition of an`identifier` for STIX2 objects is > > > as follows: > > > > > > > An identifier universally and uniquely identifies a SDO, SRO, > > > Bundle, or Marking Definition. Identifiers MUST follow the form > > > object-type--UUIDv4, where object-type is the exact value (all type names > > > are lowercase strings, by definition) from the type property of the object > > > being identified or referenced and where the UUIDv4 is an RFC > > > 4122-compliant Version 4 UUID. The UUID MUST be generated according to the > > > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122]. > > > from > > > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265 > > > > > > > > > I think the requirement to have UUID4 brings more problems than > > > benefits. It makes STIX1->STIX2 transition difficult, hurting existing > > > STIX1 users. > > > I will try to show it in these 2 use cases. > > > > > > > > > Use case 1 > > > ---------- > > > Imagine that I'm a client of an intelligence provider A. I've been > > > a client for a long time and I have received intelligence in STIX1.2, > > > which I stored in my DB. I fetch new intelligence daily, downloading only > > > fresh data. Often fresh data links to old objects for context. > > > Provider A decides to upgrade and switch to STIX2. In addition to > > > an old STIX1.2 feed, provider creates new STIX2 feed with the same data. > > > In STIX2 all objects have new identifiers and Provider A does not bother > > > to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have > > > 2 options: > > > - clean slate option: drop all old data from this provider and > > > re-fetch everything. That will work if Provider A is the only provider I > > > use or if I never referenced Provider A's data from my own intelligence. > > > Not a great plan. 
> > > - new era option: leave my STIX1.2 data graph in place and start > > > consuming new STIX2 feed from today. This option has one big issue: new > > > STIX2 data will not be connected to STIX1.2 data I already have, because > > > STIX2 ids are all different. If I want to deduce connection, I need to > > > deduplicate the data against my existing STIX1.2 DB. This means my > > > ingestion pipeline must be smart enough to compare STIX1.2 objects to > > > STIX2 objects and be fast enough to do that for every new STIX2 object. > > > This will be difficult to implement and will have a huge performance > > > penalty. > > > > > > Use case 2 > > > ---------- > > > Imagine that I'm a NCSC. I receive intelligence from providers, > > > combine it and distribute it to my clients. My providers are still on > > > STIX1.2 but my clients want STIX2, so I must convert STIX1.2 I receive > > > into STIX2. Full STIX1.2 entities I can transform easily but what do I do > > > with IDREFs I have in my STIX1.2 data? > > > I can generate new STIX2 id every time I see new STIX1.2 IDREF in > > > incoming data and store STIX1.2->STIX2 mapping somewhere to be used next > > > time I see this IDREF. This is painful and will require additional > > > resources, but it is doable. But it will only work until the moment my > > > providers switch to STIX2 and start sending me full objects for those > > > IDREFs with new random STIX2 identifiers! I can not predict these > > > identifiers and I can't match them with the ones I generated. So my > > > thinking is - what is the point in even bothering with old IDREFs? I will > > > just drop them, sending my clients sometimes disconnected STIX2 entities, > > > hoping that they will figure it out. > > > > > > > > > Proposed solutions if UUID5 is allowed in STIX2 identifiers: > > > > > > Use case 1 solution > > > ------------------- > > > There can be a guideline that will recommend providers to use old > > > STIX1.2 IDs as input for new STIX2 identifiers. 
    > > > If STIX2 identifiers are predictable, I, as a client, can greatly
    > > > simplify my deduplication logic. I can run a DB migration once to
    > > > calculate STIX2 identifiers for all my STIX1.2 objects and use these on
    > > > ingestion for deduplication. Appending STIX2 data to my STIX1.2 DB will
    > > > be much easier.
    > > > I'm also interested in pushing Provider A to adopt this STIX2
    > > > identifier generation practice because it will save me money.
    > > >
    > > > Use case 2 solution
    > > > -------------------
    > > > With UUID5 I have a way out: I can generate new STIX2 ids from old
    > > > STIX1.2 ids! I can parse the IDREF value, which looks like `[ns
    > > > prefix]:[construct type]-[GUID]`, and use the provider's namespace and
    > > > construct type to build a new STIX2 identifier. The logic will be:
    > > > - the full IDREF is the input for the UUID5 function;
    > > > - for STIX1.2 types that were split (like TTP), I do not know the
    > > > exact STIX2 type the provider would use for an old TTP. My solution here
    > > > is to play safe and create relations for all possible types: for an
    > > > IDREF to a TTP, I will create 4 relations: one to a possible Tool
    > > > object, one to Malware, one to Attack Pattern and one to Identity. It is
    > > > an overhead, but it is a small price for keeping an interconnected
    > > > intelligence graph.
    > > > Again, when the time comes and my providers move to STIX2, I'm
    > > > interested in pushing them to adopt this id generation scheme for old
    > > > objects, because it will save me, as an NCSC, money.
    > > >
    > > >
    > > > To reiterate, I would like to propose:
    > > > - a change in the STIX2 spec to allow both UUID5 and UUID4 to be used in
    > > > the identifier of SDO, SRO, MarkingDefinition and Custom Object entities;
    > > > - creating a guideline, complementary to the spec, that would explain
    > > > how STIX1.2 ids can be transformed into STIX2 for an easier transition.
    > > >
    > > >
    > > > Practicalities:
    > > >
    > > > UUID5 ids require the use of a namespace. The UUID5 RFC
    > > > (https://tools.ietf.org/html/rfc4122#section-4.3) defines some generic
    > > > namespaces (https://tools.ietf.org/html/rfc4122#appendix-C) but does not
    > > > prohibit the use of custom ones. I suggest this algorithm:
    > > > - the namespace UUID5 is generated from the predefined `NameSpace_URL`
    > > > namespace and the producer's URL;
    > > > - for old objects, the GUID part of the STIX2 identifier is a namespaced
    > > > UUID5 generated from the old STIX1.2 id;
    > > > - for new objects, the GUID part of the STIX2 identifier is either a
    > > > namespaced UUID5 of a random UUID4 string, or just a random UUID4.
    > > >
    > > > Example python code for generating UUID5 with a custom namespace:
    > > >
    > > > In [1]: import uuid
    > > >    ...:
    > > >    ...: stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
    > > >    ...: namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
    > > >    ...: stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
    > > >    ...: stix2_id = 'threat-actor--{}'.format(stix2_uuid)
    > > >    ...:
    > > >    ...: print("new STIX2 id: {}".format(stix2_id))
    > > >
    > > > new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
    > > >
    > > >
    > > > BONUS: python functions to convert STIX1.2 IDREFs into STIX2
    > > > identifiers - https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
    > > >
    > > >
    > > > Thank you,
    > > > Sergey Polzunov
    > > > EclecticIQ
    > >
    > > --
    > > John-Mark
    >
    > --
    > John-Mark

    --
    Alexandre Dulaunoy
    CIRCL - Computer Incident Response Center Luxembourg
    16, bd d'Avranches
    L-1160 Luxembourg
    info@circl.lu - www.circl.lu - (+352) 247 88444
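The conversion steps Sergey lists (a per-producer namespace derived from `NAMESPACE_URL`, the full STIX1.2 IDREF as the UUIDv5 name, a fan-out to every candidate STIX2 type for split constructs like TTP, and UUIDv5-of-a-random-UUIDv4 for brand-new objects) can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the code in the linked gist: the producer URL, the `SPLIT_TYPES` table and all helper names are hypothetical, and the IDREF parser assumes the `[ns prefix]:[construct type]-[GUID]` shape with a standard 36-character GUID.

```python
import uuid
from typing import Dict, List, Tuple

# Hypothetical producer URL, modelled on the eclecticiq.com/ns example above.
PRODUCER_URL = "https://eclecticiq.com/ns"

# Hypothetical table of STIX1.2 constructs split across several STIX2 types.
# Construct types not listed here are assumed to keep the same name in STIX2.
SPLIT_TYPES: Dict[str, List[str]] = {
    "ttp": ["tool", "malware", "attack-pattern", "identity"],
}


def producer_namespace(producer_url: str) -> uuid.UUID:
    """Per-producer namespace: UUIDv5 of the producer's URL under NAMESPACE_URL."""
    return uuid.uuid5(uuid.NAMESPACE_URL, producer_url)


def parse_idref(idref: str) -> Tuple[str, str, uuid.UUID]:
    """Split '[ns prefix]:[construct type]-[GUID]' into its three parts."""
    prefix, rest = idref.split(":", 1)
    # The GUID is the last 36 characters; one '-' separates it from the type.
    construct_type, guid = rest[:-37], uuid.UUID(rest[-36:])
    return prefix, construct_type, guid


def stix2_ids_for_idref(idref: str, namespace: uuid.UUID) -> List[str]:
    """Candidate STIX2 ids for one STIX1.2 IDREF.

    The whole IDREF is the UUIDv5 name, so every candidate shares one UUID
    and differs only in its object-type prefix.
    """
    _, construct_type, _ = parse_idref(idref)
    stix2_uuid = uuid.uuid5(namespace, idref)
    object_types = SPLIT_TYPES.get(construct_type, [construct_type])
    return ["{}--{}".format(t, stix2_uuid) for t in object_types]


def new_stix2_id(object_type: str, namespace: uuid.UUID) -> str:
    """Object with no STIX1.2 history: namespaced UUIDv5 of a random UUIDv4 string."""
    return "{}--{}".format(object_type, uuid.uuid5(namespace, str(uuid.uuid4())))


if __name__ == "__main__":
    ns = producer_namespace(PRODUCER_URL)
    print(stix2_ids_for_idref("eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473", ns))
    # Hypothetical TTP IDREF: yields four ids that share a single UUID.
    print(stix2_ids_for_idref("eclecticiq:ttp-2f9c1a3e-8d4b-4c7a-9e1f-0a6b5c4d3e2f", ns))
```

The threat-actor call follows the same steps as Sergey's IPython snippet. The TTP call shows a side effect of using the full IDREF as the name: one UUID appears under several object types, which is the same-UUID-across-types pattern John-Mark raises in the thread (the pan-unit42 playbook_viewer issue).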


  • 14.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 14:40
    I can assert from real-world experience, not opinion, that this is simply untrue. It does not break versioning or STIX.

    This is how we operate with billions of objects today, from a large number of sources both internal and external, and have for over 2 years. It works and works well.

    There really is not much difference between versioning in this scenario and versioning of content coming from diverse sources within a single org. You have to track and consider the id, the modified date and the source of the object (the source should be explicit and NOT presumed to always be conflated into the ID).

    The situations of players out there using the same ids for multiple different objects are NOT the fault of how IDs are specified in STIX. Those situations are clear violations of the spec. They are simply doing it wrong. Those poor implementations that are clearly non-conformant should not be used to prevent us from improving how we specify and handle ids.

    Sean Barnum
    Principal Architect
    FireEye
    M: 703.473.8262
    E: sean.barnum@fireeye.com

    On 2/13/19, 8:26 PM, "cti-stix@lists.oasis-open.org on behalf of John-Mark Gurney" <cti-stix@lists.oasis-open.org on behalf of jmg@newcontext.com> wrote:

    Sean Barnum wrote this message on Mon, Feb 04, 2019 at 20:29 +0000:
    > It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs.
    >
    > The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables.
    > The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable.

    Except that it will completely break versioning if we have the same UUID created by different authors. We have to be careful merging the SCO UUID language and the SDO UUID language such that we both break each other.

    We already have an issue w/ UUIDs being repeated from the same org, but for different types of objects: https://github.com/pan-unit42/playbook_viewer/issues/7

    Expanding this so that the same UUIDs can be generated by different orgs for the same type will break STIX significantly.

    --
    John-Mark
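Sean's model, where a consumer tracks the id, the modified timestamp and an explicit source for every object, can be sketched as a small in-memory store keyed on all three. This is a hypothetical illustration, not FireEye's implementation: `VersionStore` and the `source` label are assumptions (the source is carried out of band here, not as a STIX property), and `latest()` assumes every `modified` value uses the same STIX timestamp format, so plain string comparison orders them correctly.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

VersionKey = Tuple[str, str, str]  # (source, id, modified)


@dataclass
class VersionStore:
    """Keeps every object version it sees, keyed on (source, id, modified)."""

    versions: Dict[VersionKey, dict] = field(default_factory=dict)

    def add(self, source: str, obj: dict) -> bool:
        """Store obj under its source; return False if that exact version is already held."""
        key = (source, obj["id"], obj["modified"])
        if key in self.versions:
            return False
        self.versions[key] = obj
        return True

    def latest(self, source: str, stix_id: str) -> dict:
        """Newest version of stix_id from one source; raises ValueError if there is none."""
        candidates = [
            obj for (src, oid, _), obj in self.versions.items()
            if src == source and oid == stix_id
        ]
        # Same-format STIX timestamps compare correctly as strings.
        return max(candidates, key=lambda o: o["modified"])


if __name__ == "__main__":
    store = VersionStore()
    # Hypothetical deterministic Location id that two producers both computed.
    loc = {"id": "location--c9b3d1a2-4e5f-5a6b-8c7d-9e0f1a2b3c4d",
           "modified": "2019-02-14T14:40:00.000Z", "name": "Luxembourg"}
    store.add("producer-a", loc)
    store.add("producer-b", dict(loc, name="Grand Duchy of Luxembourg"))
    print(len(store.versions))  # prints 2: same id and modified, kept apart by source
```

Keying on the source is what lets two producers' versions of one shared deterministic id coexist. A consumer that keys only on `(id, modified)`, as section 3.4 of the spec describes, has no such field to separate them, which is the point John-Mark presses in his reply.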


  • 15.  Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier

    Posted 02-14-2019 23:34
    Sean Barnum wrote this message on Thu, Feb 14, 2019 at 14:39 +0000: > I can assert from real-world experience not opinion that this is simply untrue. It does not break versioning or STIX. From 3.4: A version of a STIX Object is identified uniquely by the combination of its id and modified properties. STIX Objects have a single object creator, the entity that generates the id for the object and creates the first version. The object creator may (but not necessarily will) be identified in the created_by_ref property of the object. Only the object creator is permitted to create new versions of a STIX Object. Producers other than the object creator MUST NOT create new versions of that object. We assume that two different creators will produce a different identifiers. Otherwise we cannot tell versions of two the objects belong to which producer. You may have done this in your product, allowed two different producers to generate different versions of an SDO, but that does not mean that it is interoperable and will work with everyone. > This is how we operate with billions of objects today from a large number of sources internal and external and have for over 2 years. It works and works well. > There really is not much difference between versioning in this scenario or with content coming from diverse sources within a single org. > You have to track and consider the id, the modified date and the source of the object (the source should be explicit and NOT presumed to always be conflated into the ID). I have no issues w/ players adding their own ID to an object to do this, but the UUID needs to be unique, not mostly unique. > The situations of players out there using the same ids for multiple different objects is NOT the fault of how IDs are specified in STIX. Those situations are clear violations of the spec. They are simply doing it wrong. 
Those poor implementations that are clearly non-conformant should not be used to prevent us from improving how we specify and handle ids. > ïOn 2/13/19, 8:26 PM, "cti-stix@lists.oasis-open.org on behalf of John-Mark Gurney" <cti-stix@lists.oasis-open.org on behalf of jmg@newcontext.com> wrote: > > Sean Barnum wrote this message on Mon, Feb 04, 2019 at 20:29 +0000: > > It is to assist in semantic equivalence normalization across producers. Just like we have done for SCOs. > > > > The reality is that objects such as Locations, Identities, etc are likely to be repeated and widely used about as much as many Observables. > > The ability to inherently converge on equivalence of things like Locations and Identities via UUIDv5 calculation is extremely valuable. > > Except that it will completely break versioning if we have the same > UUID created by different authors. We have to be careful merging the > SCO UUID language and the SDO UUID language such that we both break > each other. > > We already have an issue w/ UUID's being repeated from the same org, > but for different types of objects: > https://github.com/pan-unit42/playbook_viewer/issues/7 > > Expanding this such that the same UUID's can be generated by different > orgs, and the same type will break STIX significantly. > > > From: Jason Keirstead <Jason.Keirstead@ca.ibm.com> > > Date: Monday, February 4, 2019 at 3:25 PM > > To: Sean Barnum <sean.barnum@FireEye.com> > > Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, John-Mark Gurney <jmg@newcontext.com>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > > > Actually, I don't agree with this part. > > > > The entire point of UUIDv5, is that I should not care what method you use to compute your IDs - because it's in your namespace, so its not my problem anymore. I don't think we want to codify it. 
> > > > - > > Jason Keirstead > > Lead Architect - IBM Security Connect > > www.ibm.com/security > > > > "Things may come to those who wait, but only the things left by those who hustle." - Unknown > > > > > > > > > > From: Sean Barnum <sean.barnum@FireEye.com> > > To: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, John-Mark Gurney <jmg@newcontext.com> > > Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Date: 02/04/2019 04:23 PM > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > Sent by: <cti-stix@lists.oasis-open.org> > > ________________________________ > > > > > > > > Agree. > > > > And I would suggest we DO want the calculation to be done based upon the data from the object . As that is how we get value from such an ID. > > It should just not be done on ALL properties of the object. > > We just need to define the semantically relevant properties to use for the calculation. This is exactly what we have just done for SCOs. > > > > Sean Barnum > > Principal Architect > > FireEye > > M: 703.473.8262 > > E: sean.barnum@fireeye.com > > > > From: <cti-stix@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com> > > Date: Monday, February 4, 2019 at 3:20 PM > > To: John-Mark Gurney <jmg@newcontext.com> > > Cc: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > > > I would think we would want to use a DNS or URL namespace, would we not? > > > > - > > Jason Keirstead > > Lead Architect - IBM Security Connect > > www.ibm.com/security > > > > "Things may come to those who wait, but only the things left by those who hustle." 
- Unknown > > > > > > > > > > From: John-Mark Gurney <jmg@newcontext.com> > > To: Jason Keirstead <Jason.Keirstead@ca.ibm.com> > > Cc: "Wunder, John A." <jwunder@mitre.org>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, Sergey Polzunov <sergey@eclecticiq.com> > > Date: 02/04/2019 04:10 PM > > Subject: Re: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in STIX2 identifier > > ________________________________ > > > > > > > > > > Jason Keirstead wrote this message on Mon, Feb 04, 2019 at 14:08 -0400: > > > I would also support this. > > > > > > I have learned more about the inner workings of UUID4/5 and I don't have > > > any reservations about it anymore. The odds of collision with a > > > properly-implemented UUID5 are on-par with UUID4 > > > > > > As far as John's comment below - all this means IMHO is the library has to > > > force you to provide a namespace (ie make it a mandatory argument in your > > > constructor or whatever). > > > > The one requirement I would like to make sure about UUIDv5 is that it > > is NOT based upon the data from the object, otherwise versioning will > > break. > > > > The reason we didn't use UUIDc4 as most of the proposals to use it was > > to make it a hash of the contents, such as name and description, and > > then update the UUID whenever the name and/or description changed.. > > > > If we do this, the name space should probably be the identity of the > > new STIX2 object. This would prevent collisions from happening when > > two entities try to create a "new" STIX2 object from a STIX1 object... > > > > > From: "Wunder, John A." 
<jwunder@mitre.org> > > > To: Sergey Polzunov <sergey@eclecticiq.com>, > > > "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org> > > > Date: 02/04/2019 12:22 PM > > > Subject: [cti-stix] Re: [EXT] [cti-stix] ability to use UUID5 in > > > STIX2 identifier > > > Sent by: <cti-stix@lists.oasis-open.org> > > > > > > > > > > > > I've been thinking a lot about this and I think it makes sense. > > > > > > One of the concerns we had at the time we chose UUID4 is that users of > > > libraries like python-stix would need to remember to set the UUID5 > > > namespace -- or, if they don't and python-stix has some default namespace, > > > different tools using the libraries could have overlapping IDs. This would > > > also apply to users of the new Java libraries that I've seen come out. It > > > might mean these libraries requiring that people set a unique namespace > > > before creating any objects, vs. now where it can just go ahead and create > > > IDs by default. I'd be curious what other people think about this problem > > > and how we can help avoid it becoming an issue (especially given how many > > > people use those libraries). > > > > > > John > > > > > > On 2/4/19, 11:12 AM, "cti-stix@lists.oasis-open.org on behalf of Sergey > > > Polzunov" <cti-stix@lists.oasis-open.org on behalf of > > > sergey@eclecticiq.com> wrote: > > > > > > Hey everybody! > > > > > > Current STIX2 spec definition of an`identifier` for STIX2 objects is > > > as follows: > > > > > > > An identifier universally and uniquely identifies a SDO, SRO, > > > Bundle, or Marking Definition. Identifiers MUST follow the form > > > object-type--UUIDv4, where object-type is the exact value (all type names > > > are lowercase strings, by definition) from the type property of the object > > > being identified or referenced and where the UUIDv4 is an RFC > > > 4122-compliant Version 4 UUID. 
The UUID MUST be generated according to the > > > algorithm(s) defined in RFC 4122, section 4.4 (Version 4 UUID) [RFC4122]. > > > from > > > http://docs.oasis-open.org/cti/stix/v2.0/cs01/part1-stix-core/stix-v2.0-cs01-part1-stix-core.html#_Toc496709265 > > > > > > > > > I think the requirement to have UUID4 brings more problems than > > > benefits. It makes STIX1->STIX2 transition difficult, hurting existing > > > STIX1 users. > > > I will try to show it in these 2 use cases. > > > > > > > > > Use case 1 > > > ---------- > > > Imagine that I'm a client of an intelligence provider A. I've been > > > a client for a long time and I have received intelligence in STIX1.2, > > > which I stored in my DB. I fetch new intelligence daily, downloading only > > > fresh data. Often fresh data links to old objects for context. > > > Provider A decides to upgrade and switch to STIX2. In addition to > > > an old STIX1.2 feed, provider creates new STIX2 feed with the same data. > > > In STIX2 all objects have new identifiers and Provider A does not bother > > > to supply a mapping of STIX1.2 ids to STIX2 ids. Now I, as a client, have > > > 2 options: > > > - clean slate option: drop all old data from this provider and > > > re-fetch everything. That will work if Provider A is the only provider I > > > use or if I never referenced Provider A's data from my own intelligence. > > > Not a great plan. > > > - new era option: leave my STIX1.2 data graph in place and start > > > consuming new STIX2 feed from today. This option has one big issue: new > > > STIX2 data will not be connected to STIX1.2 data I already have, because > > > STIX2 ids are all different. If I want to deduce connection, I need to > > > deduplicate the data against my existing STIX1.2 DB. This means my > > > ingestion pipeline must be smart enough to compare STIX1.2 objects to > > > STIX2 objects and be fast enough to do that for every new STIX2 object. 
> > > This will be difficult to implement and will have a huge performance > > > penalty. > > > > > > Use case 2 > > > ---------- > > > Imagine that I'm a NCSC. I receive intelligence from providers, > > > combine it and distribute it to my clients. My providers are still on > > > STIX1.2 but my clients want STIX2, so I must convert STIX1.2 I receive > > > into STIX2. Full STIX1.2 entities I can transform easily but what do I do > > > with IDREFs I have in my STIX1.2 data? > > > I can generate new STIX2 id every time I see new STIX1.2 IDREF in > > > incoming data and store STIX1.2->STIX2 mapping somewhere to be used next > > > time I see this IDREF. This is painful and will require additional > > > resources, but it is doable. But it will only work until the moment my > > > providers switch to STIX2 and start sending me full objects for those > > > IDREFs with new random STIX2 identifiers! I can not predict these > > > identifiers and I can't match them with the ones I generated. So my > > > thinking is - what is the point in even bothering with old IDREFs? I will > > > just drop them, sending my clients sometimes disconnected STIX2 entities, > > > hoping that they will figure it out. > > > > > > > > > Proposed solutions if UUID5 is allowed in STIX2 identifiers: > > > > > > Use case 1 solution > > > ------------------- > > > There can be a guideline that will recommend providers to use old > > > STIX1.2 IDs as input for new STIX2 identifiers. If STIX2 identifiers are > > > predictable I, as a client, can greatly simplify my deduplication logic. I > > > can run DB migration once to calculate STIX2 identifiers for all my > > > STIX1.2 objects and use these on ingestion for deduplication. Appending > > > STIX2 data to my STIX1.2 DB will be much easier. > > > I'm also interested in pushing Provider A to adopting this STIX2 > > > identifier generation practice because it will save me money. 
> > > > > > Use case 2 solution > > > ------------------- > > > WIth UUID5 I have a way out: I can generate new STIX2 ids from old > > > STIX1.2 ids! I can parse IDREF value, that looks like `[ns > > > prefix]:[construct type]-[GUID]`, and use provider's namespace / construct > > > type to build new STIX2 identifier. The logic will be like this: > > > - full IDREF will be input for UUID5 function > > > - for STIX1.2 types that were split (like TTP), I do not know > > > exact STIX2 type Provider would use for old TTP. My solution here would be > > > to play safe and create relations for all possible types: for IDREF to > > > TTP, I will create 4 relations: one to a possible Tool object, one to > > > Malware, one to Attack Pattern and one to Identity. It is an overhead but > > > it is a small price for keeping interconnected intelligence graph. > > > Again, when time comes and my providers move to STIX2, I'm > > > interested in pushing them to adopt this id generation schema for old > > > objects, because it will save me, as NCSC, money. > > > > > > > > > To reiterate, I would like to propose: > > > - a change in STIX2 spec to allow both UUID5 and UUID4 to be used in > > > an identifier of SDO, SRO, MarkingDefinition and Custom Object entities; > > > - creating a guideline, complimentary to the spec, that would explain > > > how STIX1.2 ids can be transformed into STIX2 for easier transition. > > > > > > > > > Practicalities: > > > > > > UUID5 ids require use of a namespace. UUID5 RFC ( > > > https://tools.ietf.org/html/rfc4122#section-4.3 > > > ) defines some generic namespaces ( > > > https://tools.ietf.org/html/rfc4122#appendix-C > > > ) but does not prohibits the use of custom ones. 
> > > I suggest this algorithm:
> > > - the namespace UUID5 is generated from the predefined `NameSpace_URL`
> > > namespace and the producer's URL;
> > > - for old objects, the GUID part of the STIX2 identifier is a namespaced
> > > UUID5 generated from the old STIX1.2 id;
> > > - for new objects, the GUID part of the STIX2 identifier is either a
> > > namespaced UUID5 of a random UUID4 string, or just a random UUID4.
> > >
> > > Example Python code for generating a UUID5 with a custom namespace:
> > >
> > >     import uuid
> > >
> > >     stix12_id = 'eclecticiq:threat-actor-07fa8672-4bca-46e1-a60f-023882b4a473'
> > >     # Producer namespace: UUID5 of the producer's URL under NAMESPACE_URL
> > >     namespace_uuid = uuid.uuid5(uuid.NAMESPACE_URL, 'https://eclecticiq.com/ns')
> > >     # GUID part of the STIX2 id: UUID5 of the old STIX1.2 id in that namespace
> > >     stix2_uuid = uuid.uuid5(namespace_uuid, stix12_id)
> > >     stix2_id = 'threat-actor--{}'.format(stix2_uuid)
> > >
> > >     print("new STIX2 id: {}".format(stix2_id))
> > >
> > > Output:
> > >
> > >     new STIX2 id: threat-actor--adee573a-12e9-5dd3-958b-0040d32c6b3e
> > >
> > >
> > > BONUS: Python functions to convert STIX1.2 IDREFs into STIX2
> > > identifiers:
> > > https://gist.github.com/traut/fd4b9b8de3c2aa0e161d68c4099656e5
> > >
> > >
> > > Thank you,
> > > Sergey Polzunov
> > > EclecticIQ
> >
> > --
> > John-Mark
> >
> > This email and any attachments thereto may contain private, confidential, and/or privileged material for the sole use of the intended recipient. Any review, copying, or distribution of this email (or any attachments thereto) by others is strictly prohibited. If you are not the intended recipient, please contact the sender immediately and permanently delete the original and any copies of this email and any attachments thereto.
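The three-step algorithm suggested above can be sketched as plain functions. This is an illustration under the thread's assumptions, not a normative scheme. The function names are hypothetical, and because the proposal allows a new object to get either a namespaced UUID5 of a random UUID4 string or a bare UUID4, that choice is exposed as a flag.

```python
import uuid


def producer_namespace(producer_url: str) -> uuid.UUID:
    """Step 1: namespace UUID5 from the predefined NAMESPACE_URL and the producer's URL."""
    return uuid.uuid5(uuid.NAMESPACE_URL, producer_url)


def id_for_old_object(namespace: uuid.UUID, stix2_type: str, stix12_id: str) -> str:
    """Step 2: old objects get a UUID5 of their STIX1.2 id, so the id is reproducible."""
    return "{}--{}".format(stix2_type, uuid.uuid5(namespace, stix12_id))


def id_for_new_object(namespace: uuid.UUID, stix2_type: str, use_uuid5: bool = True) -> str:
    """Step 3: new objects get a UUID5 of a random UUID4 string, or just the UUID4."""
    random_part = uuid.uuid4()
    if use_uuid5:
        return "{}--{}".format(stix2_type, uuid.uuid5(namespace, str(random_part)))
    return "{}--{}".format(stix2_type, random_part)
```

Either way the new-object identifier is unpredictable; only identifiers for migrated STIX1.2 objects can be recomputed by a consumer who knows the producer's URL and the original id.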
>
> --
> John-Mark
>
> ---------------------------------------------------------------------
> To unsubscribe from this mail list, you must leave the OASIS TC that
> generates this mail.  Follow this link to all your TCs in OASIS at:
> https://www.oasis-open.org/apps/org/workgroup/portal/my_workgroups.php

--
John-Mark
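The following sketch shows one way to implement the Use case 2 conversion from Sergey's proposal. It covers the TTP fan-out to every possible STIX2 type and a check that accepts both UUIDv4 and UUIDv5, as the proposed spec change would. The producer URL table and the STIX1.2-to-STIX2 type mapping are hypothetical placeholders. The parser assumes the GUID part of the IDREF is a canonical 36-character UUID string. This code is not taken from the linked gist.

```python
import re
import uuid

# Hypothetical lookup tables: IDREF namespace prefix -> producer URL, and
# STIX1.2 construct type -> possible STIX2 types.
PRODUCER_URLS = {"eclecticiq": "https://eclecticiq.com/ns"}
CONSTRUCT_TO_STIX2_TYPES = {
    "threat-actor": ["threat-actor"],
    "campaign": ["campaign"],
    "indicator": ["indicator"],
    # TTP was split in STIX2, so the producer's choice of type is unknown.
    "ttp": ["tool", "malware", "attack-pattern", "identity"],
}

# [ns prefix]:[construct type]-[GUID]; the GUID is anchored at the end.
IDREF_RE = re.compile(
    r"^(?P<prefix>[^:]+):(?P<construct>.+)-"
    r"(?P<guid>[0-9a-fA-F]{8}-(?:[0-9a-fA-F]{4}-){3}[0-9a-fA-F]{12})$"
)
STIX2_ID_RE = re.compile(
    r"^(?P<type>[a-z0-9-]+)--(?P<uuid>[0-9a-f]{8}-(?:[0-9a-f]{4}-){3}[0-9a-f]{12})$"
)


def candidate_stix2_ids(idref: str) -> list:
    """Return every STIX2 id the producer could have assigned to a STIX1.2 IDREF."""
    match = IDREF_RE.match(idref)
    if match is None:
        raise ValueError("not a [prefix]:[construct]-[GUID] IDREF: " + idref)
    namespace = uuid.uuid5(uuid.NAMESPACE_URL, PRODUCER_URLS[match["prefix"]])
    # The full IDREF is the UUID5 name, so every candidate shares the same UUID part.
    guid = uuid.uuid5(namespace, idref)
    stix2_types = CONSTRUCT_TO_STIX2_TYPES[match["construct"].lower()]
    return ["{}--{}".format(t, guid) for t in stix2_types]


def is_acceptable_stix2_id(identifier: str) -> bool:
    """Proposed rule: the UUID part of a STIX2 identifier may be version 4 or 5."""
    match = STIX2_ID_RE.match(identifier)
    if match is None:
        return False
    return uuid.UUID(match["uuid"]).version in (4, 5)
```

For a TTP IDREF this returns four candidates. The converter would then emit one relationship per candidate, which is the overhead the Use case 2 solution accepts in exchange for a connected graph.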