OASIS Cyber Threat Intelligence (CTI) TC


RE: [cti] Idea for Internationalization (was: Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0))

  • 1.  RE: [cti] Idea for Internationalization (was: Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0))

    Posted 02-02-2016 03:03




     
    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.
     
    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.
     
    We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients can evaluate the applicability and importance.  The original, English version is also included,
    so if the recipient deems it applicable, he/she can translate the rest.
     
    This is all for human-readable reports, but the use case seems similar.
     
    Proposal:
     

    1. Add an optional language tag in all top level constructs.
    2. Add an optional Alternate Language tag to Relationship objects.
    3. Producers can create multiple language-specific versions of whatever top-level objects they wish.
    4. Producers can create Alternate Language relationships between these alternate language objects.
    5. Consumers can choose not to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.
    6. If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.
     
    Use Cases:
     

    1. I produce content in English, but have Japanese constituents.  I publish everything in English, and a subset of the info in Japanese.  For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language relationship.
    2. I consume the content in #1, and English is my primary language.  Upon receipt, I discard the Japanese versions and the alternate language relationships.
    3. I consume the content in #1, but Japanese is my primary language.  Upon receipt, I consume both the English and Japanese versions.  When both exist for a given item, I display the Japanese version first, and provide a link to the English version.  When only an English version exists, I display the English version.
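
The proposal and use cases above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the `lang` property name, the `alternate-language` relationship type, the `source`/`target` keys, and the object ids are all made up for the example and are not taken from any STIX draft.

```python
from typing import Dict, List

# Hypothetical top-level objects: one English advisory with a Japanese
# alternate, and one advisory that exists only in English.
objects: Dict[str, dict] = {
    "report--1-en": {"id": "report--1-en", "lang": "en", "title": "Phishing advisory"},
    "report--1-ja": {"id": "report--1-ja", "lang": "ja", "title": "フィッシングに関する注意喚起"},
    "report--2-en": {"id": "report--2-en", "lang": "en", "title": "Malware advisory"},
}

# Hypothetical "alternate-language" relationships linking versions of the same item.
relationships: List[dict] = [
    {"type": "alternate-language", "source": "report--1-en", "target": "report--1-ja"},
]


def alternates(object_id: str) -> List[str]:
    """Return ids linked to object_id by an alternate-language relationship."""
    linked = []
    for rel in relationships:
        if rel["type"] != "alternate-language":
            continue
        if rel["source"] == object_id:
            linked.append(rel["target"])
        elif rel["target"] == object_id:
            linked.append(rel["source"])
    return linked


def version_for(object_id: str, preferred_lang: str) -> dict:
    """Use case 3: show the preferred-language version if one exists, else the original."""
    for alt_id in alternates(object_id):
        alt = objects.get(alt_id)
        if alt is not None and alt.get("lang") == preferred_lang:
            return alt
    return objects[object_id]


def discard_other_languages(keep_lang: str) -> Dict[str, dict]:
    """Use case 2: keep only the objects written in keep_lang."""
    return {oid: obj for oid, obj in objects.items() if obj.get("lang") == keep_lang}
```

A Japanese-speaking consumer would call `version_for("report--1-en", "ja")` and get the Japanese object, while the same call for `report--2-en` falls back to the English one because no alternate exists.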
     
    For consideration,
     
    Chris Ricard
    FS-ISAC
     
     
     



    From: Jordan, Bret
    Sent: Monday, February 1, 2016 9:02 PM
    To: Masuoka, Ryusuke
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

     
    Thanks for the feedback.  The tear line I am trying to figure out is where this is a specification issue and where it is an implementation issue.

    One idea that we have tossed around on Slack is the idea that each top level object (TLO) would have a field called "lang".  This would be the language that the object is
    written in.  This would enable tools to select and filter by a language.

    Then, given the fact that only a handful of fields for a given TLO have the ability to be translated (you are not going to translate an IP address, for example), we have tossed
    around the idea of creating a "translation" object that could be sent either with the original TLO or separately.

    Going down the path this way, though I am not yet advocating that it is the best way, would allow organizations to do very interesting things with CTI data:

    1) A threat intel provider could issue TLOs in language specific versions if they wanted.

    2) A threat intel provider could produce language translations and attach them to the TLO.

    3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.

    There was some initial concern about this model as some believe it might have issues with versioning.  But I do not think so, as you would not want translated objects to
    auto point to a new version.  They would be tied at the hip to the version that they were created for.

    The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.


     


    What would you think of something like this?

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."









     



    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:

    Hi, Bret,

    I guess it depends. But what I see is scenarios like the following:

    - A Japanese entity receives CTI information pieces in English.
      The entity determines that some of them are important/critical
      and worth translating into Japanese, adds descriptions in Japanese,
      and redistributes them to other Japanese entities (if redistribution is allowed).
      The CTIM (CTI Management System) of a receiving party displays
      the Japanese description whenever possible, while allowing access to
      the original English descriptions.

    - Japanese entities produce CTI in Japanese (not in English, surprise!).
      An entity decides that some of them are important/critical and worth
      translating into English, adds descriptions in English,
      and redistributes them to other countries (if redistribution is allowed).
      The CTIM of a receiving party displays the English description if so set,
      while allowing access to the original Japanese (likely more accurate)
      descriptions.

    Regards,

    Ryu


     




    From:   Jordan, Bret [ mailto:bret.jordan@bluecoat.com ]
    Sent:   Tuesday, February 02, 2016 10:17 AM
    To:   Masuoka, Ryusuke; cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)




     


    Some questions:

    Will organizations producing threat intelligence produce one incident for each language?
    Or will they produce one big incident that contains all of the languages?

    For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version?
    Or would you expect the TAXII server to send you both?

    What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?

    Thanks,

    Bret





     




     




     
















     





    On Feb 1, 2016, at 18:07, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:



     




    Hi,

    It is not a UTF-8 issue (I understand that most modern programming languages
    and other standards deal with it correctly).

    It is about having text fields in multiple languages.
    For example, descriptions of a package in English and Japanese.
    The system will pick which language to display based on
    the language code ("en" or "jp") in the field.

    Is it something already discussed in Slack?
    (Sorry if so.)

    Regards,

    Ryu




     






    From:   cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Jordan, Bret
    Sent:   Tuesday, February 02, 2016 9:59 AM
    To:   Masuoka, Ryusuke
    Cc:   Barnum, Sean D.; cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)






     




    I would really like to understand this.  Do you mean to make sure the text fields are not limited to ASCII so that you can put in other character sets?  JSON gives us UTF-8 by default,
    so this alone should make things easier for our international friends.
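
As a quick illustration of the encoding point, here is a sketch using only Python's standard `json` module (the sample title is made up). Non-ASCII text survives a JSON round trip whether it is written as raw UTF-8 or as `\uXXXX` escapes, which is why character encoding by itself is not the internationalization problem being asked about.

```python
import json

# Made-up sample object with a Japanese title.
obj = {"title": "新しいキャンペーン"}

# Default behaviour: non-ASCII characters are written as \uXXXX escapes,
# so the serialized text is pure ASCII.
escaped = json.dumps(obj)

# ensure_ascii=False keeps the characters as-is; the result is then
# encoded to UTF-8 bytes for transport.
raw_utf8 = json.dumps(obj, ensure_ascii=False).encode("utf-8")

# Both forms decode back to the same object.
from_escaped = json.loads(escaped)
from_raw = json.loads(raw_utf8.decode("utf-8"))
```

What JSON does not provide is any way to say which language a given string is written in, or to carry the same field in several languages.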





     






    If this is not what you mean, please explain and give us some context.  We have had some passionate debates on Slack about this recently, but I feel now that we do not
    really understand the problem that we were trying to solve.  Can you help us understand the problem?  What works, what does not work, what you need it to do and why?

    I really want to make sure our baby works for everyone.  But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner
    store and get a coke".









     




    Thanks,






     






    Bret







     






     






     




















     







    On Feb 1, 2016, at 17:46, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:





     






    Hi,






     






    Is there a place for "Internationalization" of text fields?






    I would like very much to see it in STIX 2.0 (or CTI Common?)






    and I am willing to contribute.






     






    Regards,






     






    Ryu






     








    From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On
    Behalf Of   Barnum, Sean D.
    Sent:   Tuesday, February 02, 2016 3:49 AM
    To:   cti@lists.oasis-open.org
    Subject:   [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)








     










    All,








     








    As discussed at the face to face meeting and briefly on the TC monthly call and the list we plan to work toward our aggressive July target date for draft STIX 2.0, TAXII 2.0 and CybOX 3.0 specs utilizing a
    more product management approach with roughly monthly tranches focused on resolving all in-scope identified issues relevant to a particular capability area.








     








    A proposed tranche plan is:

    February 29th - Run Indicators to the ground.  Get these fundamentals worked through to enable us to talk to vendors on the RSA show floor about it, and have something to show them.
    March 31st - Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground.
    April 30th - Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground.
    May 31st - Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic support (opinions, assertions, hypotheses, etc.) to ground.
    June 30th - All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc.) and quality.




    This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning and key object support decisions with remaining
    tranches focused on key object refactoring based on decisions from the Indicator tranche.








    Please let us  know if you see any issues with this tranche plan.








     








     








    The first tranche (Indicators) is the most relevant for now as it begins today.
    Below is a draft plan for the Indicator tranche. This draft Indicator Tranche plan is also in the wiki.
    This is a very aggressive plan considering the number of issues to discuss and decide and the limited time to do it.
    We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it.
    If you have comments, feedback or issues with this draft plan please let us know so that we may adapt as appropriate.








     








     








     






    Indicator tranche plan

    Objective:
    To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are required to support common indicator use cases.

    Target completion date:
    February 29, 2016

    Proposed workflow:

    - Raise and describe the issue with a brief wiki writeup
    - Discuss issue on list and/or slack (with summaries made on list). Anyone with a proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema, etc. clearly marked as a proposal) to the wiki writeup and announce it to the list.
    - Discuss, debate, review proposals, comment as appropriate within defined time window to work towards consensus.
    - Discuss key issues on weekly working call.
    - If consensus (unanimous or at least no strong objections) reached:
      - Capture normative language in pre-draft spec document
      - Capture consensus changes in JSON Schema implementation
      - Capture consensus changes in UML model
      - Capture statement of consensus in issue tracker
      - Mark issue tracker as “Consensus Achieved”
      - Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion
    - If consensus not achieved (strong objection exists) within allowed time window:
      - Discuss and decide whether issue is absolutely necessary for MVP and if not decide to postpone
        OR
      - Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during last week of tranche
     






  • 2.  Re: [cti] Idea for Internationalization

    Posted 02-02-2016 03:30
    Those are great use cases and match what Ryu brought up.  We have not yet heard from the majority of the community, but I believe from conversations we have had on Slack that we have some general consensus around the idea that:

    All TLOs should have a field called "lang" that defines the language of the object (ex, en_us)

    What we do not yet know is should this be required?  Or should it be optional?

    Beyond that we have a few different options:

    1. Translated content is embedded inside the original TLO
       - This I believe is riddled with problems as individuals start making translations of a TLO and needing to re-issue the TLO even if they did not originally produce it.
       - We will get in to all sorts of versioning problems, similar to what we have today in STIX 1.2

    2. Translated versions of a TLO will be represented as a new TLO
       - This could work, if there was a relationship object or a translation object that could connect them.
       - There might be some weirdness in the work flow that we have yet to identify.

    3. Another option would be to create a translation object that just contains the fields that can be translated.  You would then have the parent TLO written in lang=foo and the translations that are written in lang=fr, lang=de, lang=jp etc.
       - This is very similar to #2.  However, this object contains just a subset of the fields.
       - This would allow disparate organizations to produce translations of a TLO independent of the original producers and then release it without needing to re-release the original TLO
       - You could schema validate this method
       - This would be super easy for consumers and parsers to deal with

    4. The last option as I see it, is something similar to #3, but where the fields are not defined in the spec.  A translator can flag any fields they want to translate and include them in the object.  This object will be tied to the original in the same way as #3.
       - On the consuming side, you could do really interesting things in software with merging the data in your database.
       - The problem with this is you would not be able to schema validate the translation object as you would have no way of knowing ahead of time which fields would be included.
       - This is very flexible, but may produce problems for consumers or parsers.

    Thanks,

    Bret
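
The validation trade-off between the last two options in Bret's reply (translatable fields fixed by the spec versus chosen freely by the translator) can be sketched in Python as below. This is a hand-rolled check, not a JSON Schema implementation, and the object shape (`lang`, `fields`) and the allowed field set are assumptions made for the example.

```python
from typing import List

# Fixed-field option (assumed shape): the spec lists which fields may be translated.
ALLOWED_TRANSLATION_FIELDS = {"title", "description"}


def validate_fixed(translation: dict) -> List[str]:
    """Return a list of problems; an empty list means the object is valid."""
    problems = []
    if not isinstance(translation.get("lang"), str):
        problems.append("missing or non-string 'lang'")
    for name, value in translation.get("fields", {}).items():
        if name not in ALLOWED_TRANSLATION_FIELDS:
            problems.append(f"field not translatable: {name}")
        elif not isinstance(value, str):
            problems.append(f"field is not a string: {name}")
    return problems


def validate_open(translation: dict) -> List[str]:
    """Open-field option: any field name is allowed, so only value types can be checked."""
    problems = []
    if not isinstance(translation.get("lang"), str):
        problems.append("missing or non-string 'lang'")
    for name, value in translation.get("fields", {}).items():
        if not isinstance(value, str):
            problems.append(f"field is not a string: {name}")
    return problems
```

With a fixed field list, a translation that tries to override something like an IP address is rejected up front; with open fields the same object passes, and the consumer has to decide what to do with it.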


  • 3.  RE: [cti] Idea for Internationalization

    Posted 02-02-2016 07:02




    Hi, Bret, Ricard,

    Thank you for the feedback. I am replying to this thread as the subject is
    more appropriate.

    The use case scenarios are not all about translating existing
    (text) objects. In some cases:

    - Some major text fields (title, description, etc.) are produced
      in multiple languages at the time of package creation.

      For example, a Japanese entity creates a new CTI package
      and gives its title in Japanese as well as English so that
      at least someone who gets interested in the title and
      does not read Japanese can contact the original producer
      for further details. (This is often the case for papers written
      in Japanese. They provide titles and abstracts in English.)

      It is not always clear which is the original and which is the translation.

    I know that it might lose, to some degree, traceability of which text is original,
    but how about making it possible for any text field to
    have text values in as many languages as necessary,
    each specified in the object by a "lang" tag? Something like the following.
     
    -----
    {
      "type": "stix-package",
      "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
      "title": [
        {
          "lang": "en",
          "value": "Some really neat campaign that we found"
        },
        {
          "lang": "ja",
          "value": "????????????????????"
        }
      ]
    }
    -----
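
On the consuming side, a field shaped like the example above could be resolved with a few lines of Python. This is a sketch only: `pick_text` is a hypothetical helper, the Japanese value is a made-up placeholder, and falling back to the first entry when no preferred language matches is one possible policy, not something the proposal specifies.

```python
from typing import List, Optional

# Multi-language text field in the shape shown above (list of lang/value pairs).
# The Japanese value is a made-up placeholder for illustration.
title = [
    {"lang": "en", "value": "Some really neat campaign that we found"},
    {"lang": "ja", "value": "私たちが見つけたキャンペーン"},
]


def pick_text(field: List[dict], preferred: List[str]) -> Optional[str]:
    """Return the value for the first preferred language present in field.

    Falls back to the first entry when none of the preferred languages
    is available, and returns None for an empty field.
    """
    by_lang = {entry["lang"]: entry["value"] for entry in field}
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return field[0]["value"] if field else None
```

A consumer that prefers Japanese would call `pick_text(title, ["ja", "en"])`; a consumer whose only preferred language is missing from the field still gets the first available text rather than nothing.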
     
    What do you say?
     
    Regards,
     
    Ryu
     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Jordan, Bret
    Sent: Tuesday, February 02, 2016 12:30 PM
    To: Chris Ricard
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Idea for Internationalization


     
    Those are great use cases and match what Ryu brought up.  We have not yet heard from the majority of the community, but I believe from conversations we have had on Slack that we have some general consensus around
    the idea that:

    All TLOs should have a field called "lang" that defines the language of the object (e.g., en_us)

    What we do not yet know is whether this should be required or optional.


    Beyond that we have a few different options:

    1. Translated content is embedded inside the original TLO

       - I believe this is riddled with problems, as individuals start making translations of a TLO and need to re-issue the TLO even if they did not originally produce it.
       - We will get into all sorts of versioning problems, similar to what we have today in STIX 1.2.

    2. Translated versions of a TLO will be represented as a new TLO

       - This could work, if there was a relationship object or a translation object that could connect them.
       - There might be some weirdness in the workflow that we have yet to identify.

    3. Another option would be to create a translation object that just contains the fields that can be translated.  You would then have the parent TLO written in lang=foo and the translations written in lang=fr, lang=de, lang=jp, etc.

       - This is very similar to #2.  However, this object contains just a subset of the fields.
       - This would allow disparate organizations to produce translations of a TLO independently of the original producers and then release them without needing to re-release the original TLO.
       - You could schema validate this method.
       - This would be super easy for consumers and parsers to deal with.

    4. The last option, as I see it, is something similar to #3, but where the fields are not defined in the spec.  A translator can flag any fields they want to translate and include them in the object.  This object will be tied to the original in the same way as #3.

       - On the consuming side, you could do really interesting things in software with merging the data in your database.
       - The problem with this is that you would not be able to schema validate the translation object, as you would have no way of knowing ahead of time which fields would be included.
       - This is very flexible, but may produce problems for consumers or parsers.
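
A rough sketch of the separate translation object with a fixed set of translatable fields, to make the validation point concrete. The object type `translation`, the property names `object_ref`, `object_version` and `fields`, the identifiers, and the particular set of translatable fields are all hypothetical; they are not taken from any draft.

```python
import copy
from typing import Any, Dict

# Assumption: the spec would enumerate which fields may be translated.
# This set is illustrative only.
TRANSLATABLE_FIELDS = frozenset({"title", "description"})


def validate_translation(translation: Dict[str, Any]) -> None:
    """Reject translation objects that carry fields outside the fixed set."""
    extra = set(translation["fields"]) - TRANSLATABLE_FIELDS
    if extra:
        raise ValueError(f"non-translatable fields: {sorted(extra)}")


def localized_view(tlo: Dict[str, Any], translation: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the TLO with the translated fields overlaid.

    The original TLO is neither modified nor re-issued, and the translation
    only applies to the exact object version it was written for.
    """
    validate_translation(translation)
    pinned_to = (translation["object_ref"], translation["object_version"])
    if pinned_to != (tlo["id"], tlo["version"]):
        raise ValueError("translation was made for a different object or version")
    view = copy.deepcopy(tlo)
    view.update(translation["fields"])
    view["lang"] = translation["lang"]
    return view


incident = {
    "type": "incident",
    "id": "incident--0001",          # placeholder identifier
    "version": 1,
    "lang": "en",
    "title": "Phishing wave against retail banks",
    "source_ip": "198.51.100.7",     # not translatable, carried over as-is
}

french = {
    "type": "translation",           # hypothetical object type
    "object_ref": "incident--0001",
    "object_version": 1,
    "lang": "fr",
    "fields": {"title": "Vague de phishing contre les banques"},
}

view = localized_view(incident, french)
print(view["lang"], view["title"], view["source_ip"])
```

Dropping the `TRANSLATABLE_FIELDS` check gives the more flexible variant in which any field may be translated, at the price of not being able to validate the object ahead of time.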











     


    Thanks,


     


    Bret



     


     


     



    Bret Jordan CISSP

    Director of Security Architecture and Standards Office of the CTO


    Blue Coat Systems



    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050


    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 









     



    On Feb 1, 2016, at 20:02, Chris Ricard < cricard@fsisac.us > wrote:

     


     


    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.


     


    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.


     


    We have a Japanese translator on staff, who identifies the important advisories and translates enough of them into Japanese so the recipients can evaluate the
    applicability and importance.  The original English version is also included, so if the recipient deems it applicable, he/she can translate the rest.


     


    This is all for human-readable reports, but the use case seems similar.


     


    Proposal:


     


    1. Add an optional language tag in all top level constructs.

    2. Add an optional Alternate Language tag to Relationship objects.

    3. Producers can create multiple language-specific versions of whatever top-level objects they wish.

    4. Producers can create Alternate Language relationships between these alternate language objects.

    5. Consumers can choose to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.

    6. If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.


     


    Use Cases:


     


    1. I produce content in English, but have Japanese constituents.  I publish everything in English, and a subset of the info in Japanese.  For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language relationship.

    2. I consume the content in #1, and English is my primary language.  Upon receipt, I discard the Japanese versions and the alternate language relationships.

    3. I consume the content in #1, but Japanese is my primary language.  Upon receipt, I consume both the English and Japanese versions.  When both exist for a given item, I display the Japanese version first, and provide a link to the English version.  When only an English version exists, I display the English version.
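
The consumer behaviour in use case 3 could be sketched roughly as follows. This assumes each object carries an `id` and a `lang`, and models an Alternate Language relationship as a plain pair of ids instead of a full Relationship object; the function name and the identifiers are made up for the example.

```python
from typing import Dict, List, Set, Tuple

Obj = Dict[str, str]
Link = Tuple[str, str]  # assumed stand-in for an Alternate Language relationship


def choose_display(objects: List[Obj], links: List[Link], primary: str) -> List[Tuple[Obj, List[str]]]:
    """For each group of linked language versions, pick the version to show
    first and list the ids of the remaining versions to link to."""
    by_id = {o["id"]: o for o in objects}
    # Start with every object in its own group, then merge linked groups.
    group_of: Dict[str, Set[str]] = {o["id"]: {o["id"]} for o in objects}
    for a, b in links:
        if a in by_id and b in by_id:
            merged = group_of[a] | group_of[b]
            for member in merged:
                group_of[member] = merged
    seen: Set[str] = set()
    result = []
    for obj in objects:
        if obj["id"] in seen:
            continue
        group = group_of[obj["id"]]
        seen |= group
        members = [by_id[i] for i in sorted(group)]
        # Prefer the primary language; otherwise show whatever exists.
        first = next((m for m in members if m["lang"] == primary), members[0])
        others = [m["id"] for m in members if m["id"] != first["id"]]
        result.append((first, others))
    return result


objects = [
    {"id": "advisory-1-en", "lang": "en"},
    {"id": "advisory-1-ja", "lang": "ja"},
    {"id": "advisory-2-en", "lang": "en"},  # published in English only
]
links = [("advisory-1-en", "advisory-1-ja")]

for first, others in choose_display(objects, links, primary="ja"):
    print(first["id"], "->", others)
```

Use case 2 needs no grouping at all: an English-first consumer can simply keep the objects whose `lang` is `en` and drop the links.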


     


    For consideration,


     


    Chris Ricard


    FS-ISAC


     


     


     




    From:   Jordan, Bret
    Sent:   Monday, February 1, 2016 9:02 PM
    To:   Masuoka, Ryusuke
    Cc:   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)



     


    Thanks for the feedback.  The tear line I am trying to figure out is where this is a specification issue and where it is an implementation issue.



     




    One idea that we have tossed around on Slack is the idea that each top level object (TLO) would have a field called "lang".  This would be the language that the object is
    written in.  This would enable tools to select and filter by a language.  




     




    Then, given that only a handful of fields in a given TLO can be translated (you are not going to translate an IP address, for example), we have tossed
    around the idea of creating a "translation" object that could be sent either with the original TLO or separately.




     




    Going down this path, though I am not yet advocating that it is the best way, would allow organizations to do very interesting things with CTI data:




     




    1) A threat intel provider could issue TLOs in language specific versions if they wanted.




     




    2) A threat intel provider could produce language translations and attach them to the TLO.  




     




    3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.




     




    There was some initial concern about this model, as some believe it might have issues with versioning.  But I do not think so, as you would not want translated objects to automatically
    point to a new version.  They would be tied at the hip to the version they were created for.




     



    The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.




     




    What would you think of something like this?









     



    Thanks,




     




    Bret





     




     




     
















     





    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com >
    wrote:



     




    Hi, Bret,




     




    I guess it depends. But what I see is scenarios like the following:




     




    - A Japanese entity receives pieces of CTI information in English.
      The entity determines that some of them are important/critical
      and worth translating into Japanese, adds descriptions in Japanese,
      and redistributes them to other Japanese entities (if redistribution is allowed).
      The CTIM (CTI Management System) of a receiving party displays
      the Japanese description whenever possible, while allowing access to
      the original English descriptions.

    - Japanese entities produce CTI in Japanese (not in English, surprise!).
      An entity decides that some of it is important/critical and worth
      translating into English, adds descriptions in English,
      and redistributes it to other countries (if redistribution is allowed).
      The CTIM of a receiving party displays the English description if so set,
      while allowing access to the original Japanese (likely more accurate)
      descriptions.




     




    Regards,




     




    Ryu




     






    From:   Jordan, Bret [ mailto:bret.jordan@bluecoat.com ]
    Sent:   Tuesday, February 02, 2016 10:17 AM
    To:   Masuoka, Ryusuke/ ??   ?? ;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)






     




    Some questions:




    Will organizations producing threat intelligence produce one incident for each language?  


    Or will they produce one big incident that contains all of the languages?


    For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version?  


    Or would you expect the TAXII server to send you both?


    What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?






     






     












    Thanks,






     






    Bret







     






     






     




















     







    On Feb 1, 2016, at 18:07, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:





     






    Hi,






     






    This is not about UTF-8 (I understand that most modern programming languages
    and other standards deal with it correctly).

    It is about having text fields in multiple languages.
    For example, descriptions of a package in English and Japanese.
    The system will pick which language to display based on
    the language code ("en" or "jp") in the field.






     






    Is it something already discussed in Slack?






    (Sorry if so.)






     






    Regards,






     






    Ryu






     








    From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On
    Behalf Of   Jordan, Bret
    Sent:   Tuesday, February 02, 2016 9:59 AM
    To:   Masuoka, Ryusuke/ ??   ??
    Cc:   Barnum, Sean D.;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)








     






    I would really like to understand this.  Do you mean making sure the text fields are not limited to ASCII, so that you can put in other character sets?  JSON gives us UTF-8 by default, so this alone should make things easier
    for our international friends.







     








    If this is not what you mean, please explain and give us some context.  We have had some passionate debates on Slack about this recently, but I now feel that we do not really understand the problem that we were trying
    to solve.  Can you help us understand the problem?  What works, what does not work, what do you need it to do, and why?








     








    I really want to make sure our baby works for everyone.  But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a coke".










     





    Thanks,








     








    Bret









     








     








     
























     









    On Feb 1, 2016, at 17:46, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:







     








    Hi,








     








    Is there a place for "Internationalization" of text fields?
    I would like very much to see it in STIX 2.0 (or CTI Common?)
    and I am willing to contribute.








     








    Regards,








     








    Ryu








     










    From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On
    Behalf Of   Barnum, Sean D.
    Sent:   Tuesday, February 02, 2016 3:49 AM
    To:   cti@lists.oasis-open.org
    Subject:   [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)










     












    All,










     










    As discussed at the face-to-face meeting, and briefly on the TC monthly call and the list, we plan to work toward our aggressive July target date for draft STIX
    2.0, TAXII 2.0 and CybOX 3.0 specs using more of a product management approach, with roughly monthly tranches focused on resolving all in-scope identified issues relevant to a particular capability area.










     










    A proposed tranche plan is:

    - February 29th – Run Indicators to the ground.  Get these fundamentals worked through to enable us to talk to vendors on the RSA show floor about it, and have something to show them.
    - March 31st – Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground.
    - April 30th – Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground.
    - May 31st – Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic support (opinions, assertions, hypotheses, etc.) to ground.
    - June 30th – All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc.) and quality.





    This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning
    and key object support decisions with remaining tranches focused on key object refactoring based on decisions from the Indicator tranche.










    Please let us  know if you see any issues with this tranche plan.










     










     










    The first tranche (Indicators) is the most relevant for now as it begins today.










    Below is a draft plan for the Indicator tranche. This draft Indicator Tranche plan is also in the wiki.










    This is a very aggressive plan considering the number of issues to discuss and decide and the limited time in which to do it.










    We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it.










    If you have comments, feedback or issues with this draft plan please let us know so that we may adapt as appropriate.










     










     










     







    Indicator tranche plan

    Objective:
    To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are
    required to support common indicator use cases.

    Target completion date:
    February 29, 2016

    Proposed workflow:

    1. Raise and describe the issue with a brief wiki writeup.
    2. Discuss the issue on list and/or Slack (with summaries made on list). Anyone with a proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema, etc., clearly marked as a proposal) to the wiki writeup and announce it to the list.
    3. Discuss, debate, review proposals, and comment as appropriate within the defined time window to work towards consensus.
    4. Discuss key issues on the weekly working call.
    5. If consensus (unanimous or at least no strong objections) is reached:
       - Capture normative language in pre-draft spec document
       - Capture consensus changes in JSON Schema implementation
       - Capture consensus changes in UML model
       - Capture statement of consensus in issue tracker
       - Mark issue tracker as “Consensus Achieved”
       - Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion
    6. If consensus is not achieved (strong objection exists) within the allowed time window:
       - Discuss and decide whether the issue is absolutely necessary for MVP and, if not, decide to postpone
         OR
       - Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during the last week of the tranche



     








  • 4.  RE: [cti] Idea for Internationalization

    Posted 02-02-2016 10:38




    Hi Bret / Ryu / All,
     
    Embedding the different translations within a single TLO object (as Ryu has suggested) is my preferred option. If we use a relationship
    to join the two different translations of the same information, then we end up with two bits of information that are effectively the same thing. This causes problems when third-parties link their threat intel to the two translations of the same information.

     
    Let’s imagine OrgA creates IncidentA(en) and IncidentA(jp) and relates them together using a translation-of relationship. OrgX has CampaignX(en)
    that they want to relate to the Incidents that they’ve seen. This raises several problems:
    ·  It considerably increases the amount of data stored and moved. There are duplicate fields in both objects, and we require a separate
       relationship object to relate the two together. This is another relationship that needs to be walked, and extra storage is used up unnecessarily by duplicate information.
    ·  Which object do they relate to their threat intel? The English version or the translated version? Both?
    ·  If OrgA updates IncidentA(en) and IncidentA(jp) isn’t updated, then which object is considered the “truth”?
    ·  What if a consumer only receives IncidentA(en), yet a third party has published a relationship linking IncidentA(jp)
       to a different Campaign? If the translations were embedded within the same single object, then the single relationship would cover all translations.
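
The last point can be shown with a very small sketch. The identifiers and the `source`/`target` relationship shape below are made up for illustration and do not reflect any agreed serialization.

```python
from typing import Dict, List, Set


def related_campaigns(known_ids: Set[str], relationships: List[Dict[str, str]]) -> List[str]:
    """Return the campaigns whose relationship targets an object the consumer holds."""
    return [r["source"] for r in relationships if r["target"] in known_ids]


# Separate objects per language: a third party linked the Japanese version only,
# while this consumer received just the English version.
split_relationships = [{"source": "CampaignY", "target": "IncidentA-jp"}]
print(related_campaigns({"IncidentA-en"}, split_relationships))  # []

# One object with embedded translations: a single id covers every language.
embedded_relationships = [{"source": "CampaignY", "target": "IncidentA"}]
print(related_campaigns({"IncidentA"}, embedded_relationships))  # ['CampaignY']
```

With separate per-language objects, the consumer would have to walk the translation-of relationship (and hold both objects) before the campaign link becomes visible.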
     
    This has bigger implications on the object lifecycle than many people realize. Anytime that we separate the same logical ‘thing’ into
    separate versions of that thing we are potentially opening ourselves up to this problem. If we are describing an Incident, then there should be one Incident object with the data relating to that object within that one object. New revisions of that object should
    update that same object, and updates to that object should not affect any relationships pointing to that object.
     
    I’m worried that we are potentially making a future problem for ourselves if we head down this path.
     
    Cheers
     

    Terry MacDonald
    Senior STIX Subject Matter Expert
    SOLTRA   An FS-ISAC and DTCC Company
    +61 (407) 203 206
    terry@soltra.com

     

     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Masuoka, Ryusuke
    Sent: Tuesday, 2 February 2016 6:02 PM
    To: Jordan, Bret <bret.jordan@bluecoat.com>; Chris Ricard <cricard@fsisac.us>
    Cc: cti@lists.oasis-open.org
    Subject: RE: [cti] Idea for Internationalization


     
    Hi, Bret, Ricard,

     
    Thank you for feedback. I reply to this threat as the subject is

    more appropriate.

     
    The use case scenarios are not all about translating existing

    (text) objects. In some cases:
     
    - Some major text fields (title, description, etc.) are produced

    in multiple languages at the time of package creation.
     
      For example, a Japanese entity creates a new CTI package
    and gives its title in Japanese as well as English so that

    at least someone, who gets interested in the title and

    does not read Japanese, can contact the original producer
    for further details. (It is often the case for papers written
    in Japanese. They provide titles and abstracts in English.)
     
      It is not always clear which is original and which is translation.
     
    I know that it might lose to some degree tractability of which text is original,

    but how about making it possible for any text field to

    have its text values as many languages as necessary,
    specified in the object by
    “ lang ”
    tag. Something like the following.
     
    -----
    {
      "type": "stix-package",
      "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
      “title”: [
        {
          “lang”: “en”,
          “value”: “Some really neat campaign that we found”
         },

         {
           “lang”: “ja”,
           “value”: “ ? ??????????????????? ”
          }
        ]
    }
    -----
     
    What do you say?
     
    Regards,
     
    Ryu
     


    From:
    cti@lists.oasis-open.org
    [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Jordan, Bret
    Sent: Tuesday, February 02, 2016 12:30 PM
    To: Chris Ricard
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Idea for Internationalization


     
    Those are great use cases and match what Ryu brought up.  We have not yet heard from the majority of the community yet, but I believe from conversations we have had on Slack that we have
    some general consensus around the idea that:



    All TLOs should have a field called "lang" that defines the languages of the object (ex, en_us)  



    What we do not yet know is should this be required?  Or should be be optional?


    Beyond that we have a few different options:



     




    Translated content is embedded inside the original TLO



    This I believe is riddled with problems as individuals start making translations of a TLO and needing to re-issue the TLO even if they did not originally produce it.  
    We will get in to all sorts of versioning problems, similar to what we have today in STIX 1.2



    Translated versions of a TLO will be represented as a new TLO



    This could work, if there was a relationship object or a translation object that could connect them. 
    There might be some weirdness in the work flow that we have yet to identify.  



    Another option would be to create a translation object that just contains the fields that can be translated.  You would then have the parent TLO written in lang=foo and the translations that are written in
    lang=fr, lang=de, lang=jp etc.



    This is very similar to #2.  However, this object contains just a subset of the fields.
    This would allow desperate organizations to produce translations of a TLO independent of the original producers and then release it without needing to re-release the original TLO
    You could schema validate this method
    This would be super easy for consumers and parsers to deal with



    The last option as I see it, is something similar to #3, but where the fields are not defined in the spec.  A translator can flag any fields they want to translate and include them in the object.  This object
    will be tied to the original in the same way as #3.  



    On the consuming side, you could do really interesting things in software with merging the data in your database.
    The problem with this is you would not be able to schema validate the translation object as you would have no way of knowing ahead of time, which fields would be included.
    This is very flexible, but may produce problems for consumers or parsers. 











     


    Thanks,


     


    Bret



     


     


     



    Bret Jordan CISSP

    Director of Security Architecture and Standards Office of the CTO


    Blue Coat Systems



    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050


    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 









     



    On Feb 1, 2016, at 20:02, Chris Ricard < cricard@fsisac.us >
    wrote:

     


     


    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.


     


    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.


     


    We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients
    can evaluate the applicability and importance.  The original, English verison is also included, so if the recipient deems it applicable, he/she can translate the rest.


     


    This is all for human-readable reports, but the use case seems similar.


     


    Proposal:


     


    1.           Add
    an optional language tag in all top level constructs. 


    2.          Add
    an optional Alternate Language tag to Relationship objects.


    3.          Producers
    can create multiple language-specific versions of whatever top-level objects they wish. 


    4.          Producers
    can create Alternate Language relationships between these alternate language objects.


    5.          Consumers
    can choose to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.


    6.          If
    the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.


     


    Use Cases:


     


    1.           I
    produce content in English, but have Japanese constituents.  I publish everything in English, and a subset of the info in Japanese.  For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language
    relationship.


    2.          I
    consume the content in #1, and English is my primary language.  Upon receipt, I discard the Japanese versions and the alternate language relationships.


    3.          I
    consume the content in #1, but Japanese is my primary language.  Upon receipt, I consume both the English and Japanese versions.  When both exist for a given item, I display the Japanese version first, and provide a link to the English version.  When only
    an English version exists, I display the English version.


     


    For consideration,


     


    Chris Ricard


    FS-ISAC


     


     


     




    From:   Jordan, Bret
    Sent:   Monday, February 1, 2016 9:02 PM
    To:   Masuoka, Ryusuke
    Cc:   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)



     


    Thanks for the feedback.  The tear line I am trying to figure out is where is this a specification issue and where is it an implementation issue.  



     




    One idea that we have tossed around on Slack is the idea that each top level object (TLO) would have a field called "lang".  This would be the language
    that the object is written in.  This would enable tools to select and filter by a language.  




     




    Then given the fact that only a handful of fields for a given TLO have the ability to be translated, you are not going to translate an IP address for
    example, we have tossed around the idea of creating a "translation" object that could be sent either with the original TLO or separately. 




     




    Going down the path this way, though I am not yet advocating that is the best way, would allow organizations to do very interesting things with CTI data:




     




    1) A threat intel provider could issue TLOs in language specific versions if they wanted.




     




    2) A threat intel provider could produce language translations and attach them to the TLO.  




     




    3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.




     




    There was some initial concern about this model as some believe it might have issues with versioning.  But I do not think so, as you would not want translated
    objects to auto point to a new version.  They would be tied at the hip, to the version that were created for.




     



    The reason for looking at doing something like this is to avoid need for turning every String field in the serialization in to an array of objects.  




     




    What would you think of something like this?









     



    Thanks,




     




    Bret





     




     




     





    Bret Jordan CISSP



    Director of Security Architecture and Standards Office of the CTO




    Blue Coat Systems





    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050




    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 











     





    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com >
    wrote:



     




    Hi, Bret,




     




    I guess it depends. But what I see is scenarios like the following:




     




    - A Japanese entity receives CTI information pieces in English.




      The entity determines some of them are important/critical




      and worth translating them into Japanese, add descriptions in Japanese




    and redistribute them to other Japanese entities (if redistribution is allowed).




      The CTIM (CTI Management System) of a receiving party displays




      the Japanese description whenever possible, while allowing access to




      the original English descriptions.




     




    - Japanese entities produce CTI in Japanese (not in English, surprise!).




      An entity decides some of them are important/critical and worth




    translating them into English, add descriptions in English,




    and redistribute them to other countries (if redistribution is allowed).




    The CTIM of a receiving party displays  the English description if so set,




    while allowing access to the original Japanese (likely more accurate)




    descriptions.




     




    Regards,




     




    Ryu




     






    From:   Jordan,
    Bret [ mailto:bret.jordan@bluecoat.com ]  
    Sent:   Tuesday, February 02, 2016 10:17 AM
    To:   Masuoka, Ryusuke/ ??   ?? ;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)






     




    Some questions:




    Will organizations producing threat intelligence produce one incident for each language?  


    Or will they produce one big incident that contains all of the languages?


    For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version?  


    Or would you expect the TAXII server to send you both?


    What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?






     






     












    Thanks,






     






    Bret







     






     






     







    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."













     







    On Feb 1, 2016, at 18:07, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:

    Hi,

    It is not a UTF-8 thing (I understand that most modern programming languages
    and other standards deal with that correctly).

    It is about having text fields in multiple languages.
    For example, descriptions of a package in English and Japanese.
    The system will pick which language to display based on
    the language code ("en" or "jp") in the field.

    Is it something already discussed in Slack?
    (Sorry if so.)

    Regards,

    Ryu






     








    From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Jordan, Bret
    Sent:   Tuesday, February 02, 2016 9:59 AM
    To:   Masuoka, Ryusuke
    Cc:   Barnum, Sean D.;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)








     






    I would really like to understand this. Do you mean making sure the text fields are not ASCII-only, so that you can put in other character sets?  JSON gives us UTF-8 by default, so this
    alone should make things easier for our international friends.

    If this is not what you mean, please explain and give us some context.  We have had some passionate debates on Slack about this recently, but I now feel that we do not really understand
    the problem that we were trying to solve.  Can you help us understand the problem?  What works, what does not work, what do you need it to do, and why?

    I really want to make sure our baby works for everyone.  But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a
    coke".

    Thanks,

    Bret









     








     








     
























     









    On Feb 1, 2016, at 17:46, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:

    Hi,

    Is there a place for "Internationalization" of text fields?
    I would like very much to see it in STIX 2.0 (or CTI Common?)
    and I am willing to contribute.

    Regards,

    Ryu








     










    From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On
    Behalf Of   Barnum, Sean D.
    Sent:   Tuesday, February 02, 2016 3:49 AM
    To:   cti@lists.oasis-open.org
    Subject:   [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)










     












    All,

    As discussed at the face-to-face meeting, and briefly on the TC monthly call and the list, we plan to work toward our aggressive July target
    date for draft STIX 2.0, TAXII 2.0 and CybOX 3.0 specs using a more product-management-style approach, with roughly monthly tranches, each focused on resolving all in-scope identified issues relevant to a particular capability area.

    A proposed tranche plan is:

    February 29th - Run Indicators to ground.  Get these fundamentals worked through so that we can
    talk to vendors on the RSA show floor about them and have something to show them.
    March 31st - Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor
    top level abstractions to ground.
    April 30th - Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run
    Campaign to ground.
    May 31st - Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic
    support (opinions, assertions, hypotheses, etc.) to ground.
    June 30th - All other remaining top level elements. Review pass for consistency (field name choices, naming
    conventions, structure patterns, etc.) and quality.

    This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is
    likely to cover patterning and key object support decisions, with the remaining tranches focused on key object refactoring based on decisions from the Indicator tranche.

    Please let us know if you see any issues with this tranche plan.










     










     










    The first tranche (Indicators) is the most relevant for now, as it begins today.
    Below is a draft plan for the Indicator tranche. This draft Indicator
    Tranche plan is also in the wiki.

    This is a very aggressive plan considering the number of issues to discuss and decide and the limited time in which to do it.
    We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it.
    If you have comments, feedback or issues with this draft plan, please let us know so that we may adapt as appropriate.










     










     










     







    Indicator tranche plan

    Objective:
    To discuss and reach consensus on all in-scope tracker issues
    for STIX 2.0 that are required to support common indicator use cases.

    Target completion date:
    February 29, 2016

    Proposed workflow:

    1. Raise and describe the issue with a brief wiki writeup.
    2. Discuss the issue on the list and/or Slack (with summaries made on the list). Anyone with a proposed solution may add details of their proposal (proposed normative text, examples, diagrams,
       schema, etc., clearly marked as a proposal) to the wiki writeup and announce it to the list.
    3. Discuss, debate, review proposals, and comment as appropriate within the defined time window to work towards consensus.
    4. Discuss key issues on the weekly working call.
    5. If consensus (unanimous or at least no strong objections) is reached:
       - Capture normative language in pre-draft spec document
       - Capture consensus changes in JSON Schema implementation
       - Capture consensus changes in UML model
       - Capture statement of consensus in issue tracker
       - Mark issue tracker as “Consensus Achieved”
       - Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion
    6. If consensus is not achieved (strong objection exists) within the allowed time window:
       - Discuss and decide whether the issue is absolutely necessary for MVP and, if not, decide to postpone
         OR
       - Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during the last week of the tranche



     








  • 5.  Re: [cti] Idea for Internationalization

    Posted 02-02-2016 12:25




    I think this option is what Bret was describing as #1. As he says, there are a few downsides: first, it’s more complicated for single language content (not a huge deal) and, second, it doesn’t allow for translations by someone other than the individual producer
    (since they can’t revise the object). They’d have to use some relationship approach, which would give us two ways of doing it and is exactly what we want to avoid. I initially thought this might be uncommon, but it’s exactly what Chris just outlined as a use
    case.


    For that reason I prefer option #3 from Bret’s e-mail. It supports both original translations from the producer as well as shared third-party translations. It doesn’t complicate things for single language content and it doesn’t have the relationship tracking
    downsides that Terry mentions below because it’s not a separate TLO (well, it is, but of a special type).


    It would look something like this:


    [
      {
        "type": "indicator",
        "id": "indicator--UUID",
        "lang": "en-US",
        "title": "English title"
      },
      {
        "type": "indicator-translation",
        "id": "indicator-translation--UUID",
        "object_ref": "indicator--UUID",
        "lang": "jp",
        "title": "Japanese Title"
      }
    ]


    The second object (the translation) would also need a way to point to a specific revision (if we use a different approach than relationships) and the producer, but we don’t have consensus on that so I omitted it. Option 4 would look the same, by the way,
    except the type would be “translation” and we wouldn’t have individual schemas specifying which fields are translatable.


    Because the indicator is the only indicator TLO there, all domain relationships would go through it as the original.
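
A minimal sketch of how a consumer might resolve the two objects above into a single localized view. It assumes the `indicator-translation` type and `object_ref` field from the example (both are proposals in this thread, not agreed fields), keeps the placeholder IDs, and treats `TRANSLATABLE_FIELDS` and `localized_view` as hypothetical names:

```python
# Sketch only: "indicator-translation" and "object_ref" follow the example
# above and are proposals, not agreed STIX 2.0 fields.

TRANSLATABLE_FIELDS = ("title", "description")  # assumed set of translatable fields


def localized_view(objects, object_id, preferred_lang):
    """Return a copy of the original object with translatable fields
    overlaid from a translation in preferred_lang, if one exists."""
    original = next(o for o in objects if o["id"] == object_id)
    view = dict(original)
    if original.get("lang") == preferred_lang:
        return view
    for obj in objects:
        is_translation = obj["type"].endswith("-translation")
        if (is_translation and obj.get("object_ref") == object_id
                and obj.get("lang") == preferred_lang):
            for field in TRANSLATABLE_FIELDS:
                if field in obj:
                    view[field] = obj[field]
            view["lang"] = preferred_lang
            break
    return view


bundle = [
    {"type": "indicator", "id": "indicator--UUID", "lang": "en-US",
     "title": "English title"},
    {"type": "indicator-translation", "id": "indicator-translation--UUID",
     "object_ref": "indicator--UUID", "lang": "jp", "title": "Japanese Title"},
]

print(localized_view(bundle, "indicator--UUID", "jp")["title"])
```

The view keeps the original object's `id`, so any relationship that targets the indicator still resolves no matter which language is displayed. When no translation matches, the original text is returned unchanged.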


    John








    From: < cti@lists.oasis-open.org > on behalf of Terry MacDonald < terry@soltra.com >
    Date: Tuesday, February 2, 2016 at 5:37 AM
    To: "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Jordan, Bret" < bret.jordan@bluecoat.com >, Chris Ricard < cricard@fsisac.us >
    Cc: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: RE: [cti] Idea for Internationalization








    Hi Bret / Ryu / All,
     
    Embedding the different translations within a single TLO object (as Ryu has suggested) is my preferred option. If we use a relationship
    to join the two different translations of the same information, then we end up with two bits of information that are effectively the same thing. This causes problems when third parties link their threat intel to the two translations of the same information.

    Let's imagine OrgA creates IncidentA(en) and IncidentA(jp) and relates them together using a translation-of relationship. OrgX has CampaignX(en)
    that they want to relate to the Incidents that they’ve seen.
    ·  It sizeably increases the amount of data stored and moved. There are duplicate fields in both objects, and we require a
       separate relationship object to relate the two together. This is another relationship that needs to be walked, and extra storage is used up unnecessarily by duplicate information.
    ·  Which object do they relate to their threat intel? The English version or the translated version? Both?
    ·  If OrgA updates IncidentA(en) and IncidentA(jp) isn’t updated, then which object is considered the “truth”?
    ·  What if a consumer only receives the IncidentA(en), yet a third party has published a relationship linking IncidentA(jp)
       to another different Campaign? If the translations were embedded within the same single object then the single relationship would cover all translations.
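
The last bullet can be illustrated with a small sketch. All identifiers below (`incident--A-en`, `incident--A-jp`, `campaign--X`, the `translation-of` and `related-to` relationship types, and the `campaigns_for` helper) are hypothetical and only stand in for the OrgA/OrgX scenario; the sketch also assumes the campaign is always the `source_ref`:

```python
# Hypothetical relationship objects for the OrgA / OrgX scenario.
relationships = [
    {"source_ref": "incident--A-en", "target_ref": "incident--A-jp",
     "relationship_type": "translation-of"},
    {"source_ref": "campaign--X", "target_ref": "incident--A-jp",
     "relationship_type": "related-to"},
]


def campaigns_for(incident_id, relationships, follow_translations):
    """Find sources of non-translation relationships that target the incident,
    optionally also looking at its linked translations."""
    ids = {incident_id}
    if follow_translations:
        for r in relationships:
            if (r["relationship_type"] == "translation-of"
                    and incident_id in (r["source_ref"], r["target_ref"])):
                ids.update((r["source_ref"], r["target_ref"]))
    return sorted(
        r["source_ref"] for r in relationships
        if r["relationship_type"] != "translation-of" and r["target_ref"] in ids
    )


# A consumer holding only the English incident misses the campaign link
# unless it also receives and walks the translation-of relationship.
print(campaigns_for("incident--A-en", relationships, follow_translations=False))
print(campaigns_for("incident--A-en", relationships, follow_translations=True))
```

With a single embedded object there would be only one incident ID, so the extra walk would not be needed.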
     
    This has bigger implications on the object lifecycle than many people realize. Anytime that we separate the same logical ‘thing’ into
    separate versions of that thing we are potentially opening ourselves up to this problem. If we are describing an Incident, then there should be one Incident object with the data relating to that object within that one object. New revisions of that object should
    update that same object, and updates to that object should not affect any relationships pointing to that object.
     
    I’m worried that we are potentially making a future problem for ourselves if we head down this path.
     
    Cheers
     

    Terry MacDonald
    Senior STIX Subject Matter Expert
    SOLTRA   An FS-ISAC and DTCC Company
    +61 (407) 203 206
    terry@soltra.com
     

     


    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Masuoka, Ryusuke
    Sent: Tuesday, 2 February 2016 6:02 PM
    To: Jordan, Bret < bret.jordan@bluecoat.com >; Chris Ricard < cricard@fsisac.us >
    Cc: cti@lists.oasis-open.org
    Subject: RE: [cti] Idea for Internationalization


     
    Hi, Bret, Ricard,

    Thank you for the feedback. I am replying to this thread as its subject is
    more appropriate.

    The use case scenarios are not all about translating existing
    (text) objects. In some cases:

    - Some major text fields (title, description, etc.) are produced
      in multiple languages at the time of package creation.

      For example, a Japanese entity creates a new CTI package
      and gives its title in Japanese as well as English so that
      someone who is interested in the title but
      does not read Japanese can at least contact the original producer
      for further details. (This is often the case for papers written
      in Japanese: they provide titles and abstracts in English.)

      It is not always clear which is the original and which is the translation.

    I know that we might lose, to some degree, traceability of which text is the original,
    but how about making it possible for any text field to
    carry its text value in as many languages as necessary,
    each marked in the object by a "lang" tag? Something like the following.
     
    -----
    {
      "type": "stix-package",
      "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
      "title": [
        {
          "lang": "en",
          "value": "Some really neat campaign that we found"
        },
        {
          "lang": "ja",
          "value": "? ???????????????????"
        }
      ]
    }
    -----
     
    What do you say?
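
A minimal sketch of how a consuming system might pick the text to display from a multi-valued field like the one above. The `pick_text` helper, the English fallback, and the placeholder Japanese string are assumptions for illustration, not part of the proposal:

```python
def pick_text(values, preferred, fallback="en"):
    """values: list of {"lang": ..., "value": ...} dicts, as in the example.
    preferred: language codes in order of preference."""
    by_lang = {v["lang"]: v["value"] for v in values}
    for lang in list(preferred) + [fallback]:
        if lang in by_lang:
            return by_lang[lang]
    # Nothing matched: show whatever the producer listed first, if anything.
    return values[0]["value"] if values else None


title = [
    {"lang": "en", "value": "Some really neat campaign that we found"},
    {"lang": "ja", "value": "(Japanese title)"},  # placeholder text
]

print(pick_text(title, ["ja"]))
print(pick_text(title, ["hu"]))
```

A reader who asks for a language that is not present (Hungarian here) gets the assumed English fallback rather than nothing.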
     
    Regards,
     
    Ryu
     


    From: cti@lists.oasis-open.org
    [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Jordan, Bret
    Sent: Tuesday, February 02, 2016 12:30 PM
    To: Chris Ricard
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Idea for Internationalization


     
    Those are great use cases and match what Ryu brought up.  We have not yet heard from the majority of the community, but I believe from conversations we have had on Slack that we have
    some general consensus around the idea that:

    All TLOs should have a field called "lang" that defines the language of the object (e.g., en_us)

    What we do not yet know is whether this should be required or optional.

    Beyond that we have a few different options:

    1. Translated content is embedded inside the original TLO

       I believe this is riddled with problems, as individuals start making translations of a TLO and need to re-issue the TLO even if they did not originally produce it.
       We will get into all sorts of versioning problems, similar to what we have today in STIX 1.2.

    2. Translated versions of a TLO will be represented as a new TLO

       This could work if there was a relationship object or a translation object that could connect them.
       There might be some weirdness in the workflow that we have yet to identify.

    3. Another option would be to create a translation object that contains just the fields that can be translated.  You would then have the parent TLO written in lang=foo and the translations written in
       lang=fr, lang=de, lang=jp etc.

       This is very similar to #2.  However, this object contains just a subset of the fields.
       This would allow disparate organizations to produce translations of a TLO independently of the original producers and then release them without needing to re-release the original TLO.
       You could schema validate this method.
       This would be super easy for consumers and parsers to deal with.

    4. The last option, as I see it, is something similar to #3, but where the fields are not defined in the spec.  A translator can flag any fields they want to translate and include them in the object.  This object
       will be tied to the original in the same way as #3.

       On the consuming side, you could do really interesting things in software with merging the data in your database.
       The problem with this is that you would not be able to schema validate the translation object, as you would have no way of knowing ahead of time which fields would be included.
       This is very flexible, but may produce problems for consumers or parsers.
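
The validation difference between options #3 and #4 can be sketched as follows. The per-type allowlist `TRANSLATABLE`, the field names `object_ref` and `lang`, and the `validate_translation` helper are assumptions for illustration; nothing here is agreed spec:

```python
# Hypothetical per-type list of translatable fields (what option #3 would
# fix in the spec and option #4 would leave open).
TRANSLATABLE = {"indicator": {"title", "description"}}
STRUCTURAL = {"type", "id", "object_ref", "lang"}


def validate_translation(obj, strict=True):
    """strict=True approximates option #3 (fields defined per type);
    strict=False approximates option #4 (any field may be translated)."""
    errors = [f"missing required field: {key}"
              for key in ("type", "id", "object_ref", "lang") if key not in obj]
    if strict:
        # Assumes IDs of the form "<type>--<uuid>".
        base_type = obj.get("object_ref", "").split("--")[0]
        allowed = TRANSLATABLE.get(base_type, set())
        for key in obj:
            if key not in STRUCTURAL and key not in allowed:
                errors.append(f"field not translatable for {base_type}: {key}")
    return errors


ok = {"type": "indicator-translation", "id": "indicator-translation--UUID",
      "object_ref": "indicator--UUID", "lang": "fr", "title": "Titre"}
bad = dict(ok, pattern="...")

print(validate_translation(ok))
print(validate_translation(bad))
print(validate_translation(bad, strict=False))
```

Under the strict check a translated `pattern` field is rejected up front; under the open check it passes, and the consumer has to decide what to do with it.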











     


    Thanks,


     


    Bret



     


     


     












     



    On Feb 1, 2016, at 20:02, Chris Ricard < cricard@fsisac.us >
    wrote:

     


     


    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.


     


    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.


     


    We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients
    can evaluate the applicability and importance.  The original, English version is also included, so if the recipient deems it applicable, he/she can translate the rest.


     


    This is all for human-readable reports, but the use case seems similar.


     


    Proposal:


     


    1.  Add an optional language tag in all top level constructs.

    2.  Add an optional Alternate Language tag to Relationship objects.

    3.  Producers can create multiple language-specific versions of whatever top-level objects they wish.

    4.  Producers can create Alternate Language relationships between these alternate language objects.

    5.  Consumers can choose to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.

    6.  If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.


     


    Use Cases:


     


    1.  I produce content in English, but have Japanese constituents.  I publish everything in English, and a subset of the info in Japanese.  For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language
        relationship.

    2.  I consume the content in #1, and English is my primary language.  Upon receipt, I discard the Japanese versions and the alternate language relationships.

    3.  I consume the content in #1, but Japanese is my primary language.  Upon receipt, I consume both the English and Japanese versions.  When both exist for a given item, I display the Japanese version first, and provide a link to the English version.  When only
        an English version exists, I display the English version.


     


    For consideration,


     


    Chris Ricard


    FS-ISAC


     


     


     




    From:   Jordan, Bret
    Sent:   Monday, February 1, 2016 9:02 PM
    To:   Masuoka, Ryusuke
    Cc:   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)



     


    Thanks for the feedback.  The tear line I am trying to figure out is where this is a specification issue and where it is an implementation issue.

    One idea that we have tossed around on Slack is that each top level object (TLO) would have a field called "lang".  This would be the language
    that the object is written in.  This would enable tools to select and filter by language.

    Then, given that only a handful of fields for a given TLO can be translated (you are not going to translate an IP address, for
    example), we have tossed around the idea of creating a "translation" object that could be sent either with the original TLO or separately.

    Going down this path, though I am not yet advocating that it is the best way, would allow organizations to do very interesting things with CTI data:

    1) A threat intel provider could issue TLOs in language-specific versions if they wanted.

    2) A threat intel provider could produce language translations and attach them to the TLO.

    3) End users could augment or add their own translations without needing to re-release the entire TLO, and thus avoid versioning issues.

    There was some initial concern about this model, as some believe it might have issues with versioning.  But I do not think so, as you would not want translated
    objects to automatically point to a new version.  They would be tied at the hip to the version that they were created for.
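
A small sketch of the "tied at the hip" idea: a translation is applied only to the exact revision it was written for. The `version` and `object_version` fields are hypothetical, since the thread has not settled how a revision would be referenced, and `applicable_translations` is an illustrative helper:

```python
# Hypothetical fields: "version" on the original and "object_version" on the
# translation. How revisions are referenced is not settled in this thread.

def applicable_translations(original, translations):
    """Keep only translations pinned to exactly this revision of the original."""
    return [
        t for t in translations
        if t["object_ref"] == original["id"]
        and t["object_version"] == original["version"]
    ]


indicator_v2 = {"type": "indicator", "id": "indicator--UUID", "version": 2,
                "lang": "en", "title": "Updated English title"}
translations = [
    {"object_ref": "indicator--UUID", "object_version": 1, "lang": "fr",
     "title": "Titre (v1)"},
    {"object_ref": "indicator--UUID", "object_version": 2, "lang": "de",
     "title": "Titel (v2)"},
]

current = applicable_translations(indicator_v2, translations)
print([t["lang"] for t in current])
```

The stale French translation is not silently carried over to revision 2; a consumer could still choose to show it, flagged as written for an older revision.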




     



    The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.

    What would you think of something like this?









     



    Thanks,




     




    Bret





     




     




     
















     





    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com >
    wrote:



     




    Hi, Bret,

    I guess it depends. But what I see is scenarios like the following:

    - A Japanese entity receives CTI information pieces in English.
      The entity determines that some of them are important/critical
      and worth translating into Japanese, adds descriptions in Japanese,
      and redistributes them to other Japanese entities (if redistribution is allowed).
      The CTIM (CTI Management System) of a receiving party displays
      the Japanese description whenever possible, while allowing access to
      the original English descriptions.

    - Japanese entities produce CTI in Japanese (not in English, surprise!).
      An entity decides that some of them are important/critical and worth
      translating into English, adds descriptions in English,
      and redistributes them to other countries (if redistribution is allowed).
      The CTIM of a receiving party displays the English description if so set,
      while allowing access to the original Japanese (likely more accurate)
      descriptions.

    Regards,

    Ryu




     

















  • 6.  Re: [cti] Idea for Internationalization

    Posted 02-02-2016 14:14
    In case this is of interest, OASIS does have active experts in internationalization/translation. The XML Localisation Interchange File Format (XLIFF) TC ( https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=xliff ) is working on standards to support multi-lingual data exchange for localisation. If anyone wanted to kick best practices / pros & cons around on that subject, I'm sure you'd find folks on that side happy to talk.  Best,  /chet On Tue, Feb 2, 2016 at 7:25 AM, Wunder, John A. < jwunder@mitre.org > wrote: I think this option is what Bret was describing as #1. As he says, there’s a few downsides: first, it’s more complicated for single language content (not a huge deal) and, second, it doesn’t allow for translations by someone other than the individual producer (since they can’t revise the object). They’d have to use some relationship approach, which would give us two ways of doing it and is exactly what we want to avoid. I initially thought this might be uncommon, but it’s exactly what Chris just outlined as a use case. For that reason I prefer option #3 from Bret’s e-mail. It supports both original translations from the producer as well as shared third-party translations. It doesn’t complicated things for single language content and it doesn’t have the relationship tracking downsides that Terry mentions below because it’s not a separate TLO (well, it is, but of a special type). It would look something like this: [   {     “type”: “indicator”,     “id”: “indicator—UUID”,     “lang”: “en-US”,     “title”: “English title”,   },   {     “type”: “indicator-translation”,     “id”: “indicator-translation—UUID”,     “object_ref”: “indicator—UUID”     “lang”: “jp”,     “title”: “Japanese Title”   } ] The second object (the translation) would also need a way to point to a specific revision (if we use a different approach than relationships) and the producer, but we don’t have consensus on that so I omitted it. 
    Option 4 would look the same, by the way, except the type would be "translation" and we wouldn't have individual schemas specifying which fields are translatable. Because the indicator is the only indicator TLO there, all domain relationships would go through it as the original.

    John

    From: <cti@lists.oasis-open.org> on behalf of Terry MacDonald <terry@soltra.com>
    Date: Tuesday, February 2, 2016 at 5:37 AM
    To: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>, "Jordan, Bret" <bret.jordan@bluecoat.com>, Chris Ricard <cricard@fsisac.us>
    Cc: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Subject: RE: [cti] Idea for Internationalization

    Hi Bret / Ryu / All,

    Embedding the different translations within a single TLO object (as Ryu has suggested) is my preferred option. If we use a relationship to join the two different translations of the same information, then we end up with two bits of information that are effectively the same thing. This causes problems when third parties link their threat intel to the two translations of the same information.

    Let's imagine OrgA creates IncidentA(en) and IncidentA(jp) and relates them together using a translation-of relationship. OrgX has CampaignX(en) that they want to relate to the Incidents that they've seen.

    ·  It sizeably increases the size of data stored and moved. There are duplicate fields in both objects, and we require a separate relationship object to relate the two together. This is another relationship that needs to be walked, and extra storage is used up unnecessarily by duplicate information.
    ·  Which object do they relate to their threat intel? The English version or the translated version? Both?
    ·  If OrgA updates IncidentA(en) and IncidentA(jp) isn't updated, then which object is considered the "truth"?
    ·  What if a consumer only receives IncidentA(en), yet a third party has published a relationship linking IncidentA(jp) to a different Campaign? If the translations were embedded within the same single object, then the single relationship would cover all translations.

    This has bigger implications for the object lifecycle than many people realize. Anytime we separate the same logical 'thing' into separate versions of that thing, we are potentially opening ourselves up to this problem. If we are describing an Incident, then there should be one Incident object with the data relating to that object within that one object. New revisions of that object should update that same object, and updates to that object should not affect any relationships pointing to that object.

    I'm worried that we are potentially making a future problem for ourselves if we head down this path.

    Cheers

    Terry MacDonald
    Senior STIX Subject Matter Expert
    SOLTRA   An FS-ISAC and DTCC Company
    +61 (407) 203 206
    terry@soltra.com

    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Masuoka, Ryusuke
    Sent: Tuesday, 2 February 2016 6:02 PM
    To: Jordan, Bret <bret.jordan@bluecoat.com>; Chris Ricard <cricard@fsisac.us>
    Cc: cti@lists.oasis-open.org
    Subject: RE: [cti] Idea for Internationalization

    Hi, Bret, Ricard,

    Thank you for the feedback. I am replying to this thread as its subject is more appropriate.

    The use case scenarios are not all about translating existing (text) objects. In some cases:

    - Some major text fields (title, description, etc.) are produced in multiple languages at the time of package creation.

    For example, a Japanese entity creates a new CTI package and gives its title in Japanese as well as English, so that someone who is interested in the title but does not read Japanese can at least contact the original producer for further details. (It is often the case for papers written in Japanese.
    They provide titles and abstracts in English.)

    It is not always clear which is the original and which is the translation.

    I know that we might lose, to some degree, traceability of which text is the original, but how about making it possible for any text field to carry its value in as many languages as necessary, each identified within the object by a "lang" tag? Something like the following.

    -----
    {
      "type": "stix-package",
      "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
      "title": [
        {
          "lang": "en",
          "value": "Some really neat campaign that we found"
        },
        {
          "lang": "ja",
          "value": "? ???????????????????"
        }
      ]
    }
    -----

    What do you say?

    Regards,

    Ryu

    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
    Sent: Tuesday, February 02, 2016 12:30 PM
    To: Chris Ricard
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Idea for Internationalization

    Those are great use cases and match what Ryu brought up. We have not yet heard from the majority of the community, but I believe from conversations we have had on Slack that we have some general consensus around the idea that:

    All TLOs should have a field called "lang" that defines the language of the object (e.g., en_us).

    What we do not yet know is whether this should be required or optional.

    Beyond that we have a few different options:

    1. Translated content is embedded inside the original TLO.
       This, I believe, is riddled with problems, as individuals start making translations of a TLO and need to re-issue the TLO even if they did not originally produce it. We will get into all sorts of versioning problems, similar to what we have today in STIX 1.2.

    2. Translated versions of a TLO will be represented as a new TLO.
       This could work, if there was a relationship object or a translation object that could connect them. There might be some weirdness in the workflow that we have yet to identify.
    3. Another option would be to create a translation object that just contains the fields that can be translated. You would then have the parent TLO written in lang=foo and the translations written in lang=fr, lang=de, lang=jp, etc.
       - This is very similar to #2. However, this object contains just a subset of the fields.
       - This would allow disparate organizations to produce translations of a TLO independently of the original producers and then release them without needing to re-release the original TLO.
       - You could schema validate this method.
       - This would be super easy for consumers and parsers to deal with.

    4. The last option, as I see it, is something similar to #3, but where the fields are not defined in the spec. A translator can flag any fields they want to translate and include them in the object. This object will be tied to the original in the same way as #3.
       - On the consuming side, you could do really interesting things in software with merging the data in your database.
       - The problem with this is that you would not be able to schema validate the translation object, as you would have no way of knowing ahead of time which fields would be included.
       - This is very flexible, but may produce problems for consumers or parsers.

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards
    Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Feb 1, 2016, at 20:02, Chris Ricard <cricard@fsisac.us> wrote:

    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.

    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.
    We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients can evaluate the applicability and importance. The original, English version is also included, so if the recipient deems it applicable, he/she can translate the rest.

    This is all for human-readable reports, but the use case seems similar.

    Proposal:

    1. Add an optional language tag in all top level constructs.
    2. Add an optional Alternate Language tag to Relationship objects.
    3. Producers can create multiple language-specific versions of whatever top-level objects they wish.
    4. Producers can create Alternate Language relationships between these alternate language objects.
    5. Consumers can choose to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.
    6. If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.

    Use Cases:

    1. I produce content in English, but have Japanese constituents. I publish everything in English, and a subset of the info in Japanese. For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language relationship.
    2. I consume the content in #1, and English is my primary language. Upon receipt, I discard the Japanese versions and the alternate language relationships.
    3. I consume the content in #1, but Japanese is my primary language. Upon receipt, I consume both the English and Japanese versions. When both exist for a given item, I display the Japanese version first, and provide a link to the English version. When only an English version exists, I display the English version.
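    The consumer behaviour in use cases 2 and 3 can be sketched roughly as follows. This is only an illustration under stated assumptions: the relationship shape (type "relationship", relationship_type "alternate-language", source_ref, target_ref), the sample ids and the function name pick_display are all hypothetical, since the proposal does not fix a serialization. The sketch also handles only directly linked versions, not longer chains of alternates.

```python
def pick_display(objects, preferred_lang):
    """Group objects with their alternate-language versions and choose which
    one to show first for a consumer whose primary language is `preferred_lang`.

    Assumption: alternate versions are linked by relationship objects with
    relationship_type "alternate-language" (a hypothetical name).
    """
    by_id = {o["id"]: o for o in objects if o.get("type") != "relationship"}

    # Map each object id to the ids of its alternate-language versions (both directions).
    alternates = {}
    for rel in objects:
        if rel.get("type") == "relationship" and rel.get("relationship_type") == "alternate-language":
            a, b = rel["source_ref"], rel["target_ref"]
            alternates.setdefault(a, set()).add(b)
            alternates.setdefault(b, set()).add(a)

    shown, seen = [], set()
    for obj_id in by_id:
        if obj_id in seen:
            continue
        group = [obj_id] + sorted(alternates.get(obj_id, ()))
        group = [g for g in group if g in by_id]  # ignore links to objects we never received
        seen.update(group)
        in_preferred = [g for g in group if by_id[g].get("lang") == preferred_lang]
        primary = in_preferred[0] if in_preferred else group[0]
        shown.append({"primary": primary, "alternates": [g for g in group if g != primary]})
    return shown


received = [
    {"type": "incident", "id": "incident--1", "lang": "en", "title": "Advisory A"},
    {"type": "incident", "id": "incident--2", "lang": "ja", "title": "Advisory A (Japanese)"},
    {"type": "incident", "id": "incident--3", "lang": "en", "title": "Advisory B"},
    {
        "type": "relationship",
        "id": "relationship--1",
        "relationship_type": "alternate-language",
        "source_ref": "incident--1",
        "target_ref": "incident--2",
    },
]

for_japanese_reader = pick_display(received, "ja")
for_english_reader = pick_display(received, "en")
```

    A consumer following use case 2 would simply drop the "alternates" entries, while use case 3 would render them as links next to the primary version.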
    For consideration,

    Chris Ricard
    FS-ISAC

    From: Jordan, Bret
    Sent: Monday, February 1, 2016 9:02 PM
    To: Masuoka, Ryusuke
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

    Thanks for the feedback. The tear line I am trying to figure out is where this is a specification issue and where it is an implementation issue.

    One idea that we have tossed around on Slack is that each top level object (TLO) would have a field called "lang". This would be the language that the object is written in. This would enable tools to select and filter by language.

    Then, given that only a handful of fields for a given TLO can be translated (you are not going to translate an IP address, for example), we have tossed around the idea of creating a "translation" object that could be sent either with the original TLO or separately.

    Going down this path, though I am not yet advocating that it is the best way, would allow organizations to do very interesting things with CTI data:

    1) A threat intel provider could issue TLOs in language-specific versions if they wanted.
    2) A threat intel provider could produce language translations and attach them to the TLO.
    3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.

    There was some initial concern about this model, as some believe it might have issues with versioning. But I do not think so, as you would not want translated objects to automatically point to a new version. They would be tied at the hip to the version they were created for.

    The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.

    What would you think of something like this?
Thanks,   Bret       Bret Jordan CISSP Director of Security Architecture and Standards Office of the CTO Blue Coat Systems PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050 "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:   Hi, Bret,   I guess it depends. But what I see is scenarios like the following:   - A Japanese entity receives CTI information pieces in English.   The entity determines some of them are important/critical   and worth translating them into Japanese, add descriptions in Japanese and redistribute them to other Japanese entities (if redistribution is allowed).   The CTIM (CTI Management System) of a receiving party displays   the Japanese description whenever possible, while allowing access to   the original English descriptions.   - Japanese entities produce CTI in Japanese (not in English, surprise!).   An entity decides some of them are important/critical and worth translating them into English, add descriptions in English, and redistribute them to other countries (if redistribution is allowed). The CTIM of a receiving party displays  the English description if so set, while allowing access to the original Japanese (likely more accurate) descriptions.   Regards,   Ryu   From:   Jordan, Bret [ mailto:bret.jordan@bluecoat.com ]   Sent:   Tuesday, February 02, 2016 10:17 AM To:   Masuoka, Ryusuke/ ??   ?? ;   cti@lists.oasis-open.org Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)   Some questions: Will organizations producing threat intelligence produce one incident for each language?   Or will they produce one big incident that contains all of the languages? For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version?   
Or would you expect the TAXII server to send you both? What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?     Thanks,   Bret       Bret Jordan CISSP Director of Security Architecture and Standards Office of the CTO Blue Coat Systems PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050 "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."    On Feb 1, 2016, at 18:07, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:   Hi,   Not UTF-8 thing (I understand most of modern programming languages and other standards deal with it correctly).   It is about having text fields in multiple languages. For example, descriptions of a package in English and Japanese. The system will pick which language to display based on the language code ( “ en ”   or   “ jp ” ) in the field.   Is it something already discussed in Slack? (Sorry if so.)   Regards,   Ryu   From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Jordan, Bret Sent:   Tuesday, February 02, 2016 9:59 AM To:   Masuoka, Ryusuke/ ??   ?? Cc:   Barnum, Sean D.;   cti@lists.oasis-open.org Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)   I would really like to understand this... . Do you mean to make sure the text fields are not ASCII so that you can put in other character sets?  JSON gives us UTF-8 by default.  So this alone should make things easier for our international friends..   If this is not what you mean.  Please explain and give us some context.  We have had some passionate debates on Slack about this recently, but I feel now, that we do not really understand the problem that we were trying to solve.  Can you help us understand the problem?  What works, what does not work, what you need it to do and why?   I really want to make sure our baby works for everyone.  
But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a coke".   Thanks,   Bret       Bret Jordan CISSP Director of Security Architecture and Standards Office of the CTO Blue Coat Systems PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050 "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."    On Feb 1, 2016, at 17:46, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:   Hi,   Is there a place for   “ Internationalization ”   of text fields? I would like very much to see it in STIX 2.0 (or CTI Common?) and I am willing to contribute.   Regards,   Ryu   From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Barnum, Sean D. Sent:   Tuesday, February 02, 2016 3:49 AM To:   cti@lists.oasis-open.org Subject:   [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)   All,   As discussed at the face to face meeting and briefly on the TC monthly call and the list we plan to work toward our aggressive July target date for draft STIX 2.0, TAXII 2.0 and CybOX 3.0 specs utilizing a more product management approach with roughly monthly tranches focused on resolving all in-scope identified issues relevant to a particular capability area.   A  proposed tranche plan  is: February 29th - Run Indicators to the ground.  Get these fundamentals worked through to enable us to talk to vendor on the RSA show floor about it.  And have something to show them.  March 31st – Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground. April 30th – Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground. May 31st – Run controlled vocabularies to ground. Run automated COA default extension to ground. 
Run analytic support (opinions, assertions, hypotheses, etc.) to ground. June 30th – All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc) and quality. This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning and key object support decisions with remaining tranches focused on key object refactoring based on decisions from the Indicator tranche. Please let us  know if you see any issues with this tranche plan.     The first tranche (Indicators) is the most relevant for now as it begins today. Below is a draft plan for the Indicator tranche. This draft  Indicator Tranche plan is also in the wiki. This is a very aggressive plan considering the amount of issues to discuss and decide and the limited time to do it. We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it. If you have comments, feedback or issues with this draft plan please let us know so that we may adapt as appropriate.       Indicator tranche plan Objective: To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are required to support common indicator use cases. Target completion date: February 29, 2016 Proposed workflow: Raise and describe the issue with a brief wiki writeup Discuss issue on list and/or slack (with summaries made on list). Anyone with proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema,etc clearly marked as a proposal) to the wiki writeup and announce it to the list. Discuss, debate, review proposals, comment as appropriate within defined time window to work towards consensus.  Discuss key issues on weekly working call. 
If consensus (unanimous or at least no strong objections) reached: Capture normative language in pre-draft spec document Capture consensus changes in JSON Schema implementation Capture consensus changes in UML model Capture statement of consensus in issue tracker Mark issue tracker as “Consensus Achieved” Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion If consensus not achieved (strong objection exists) within allowed time window: Discuss and decide whether issue is absolutely necessary for MVP and if not decide to postpone OR o      Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during last week of tranche   -- /chet  ---------------- Chet Ensign Director of Standards Development and TC Administration  OASIS: Advancing open standards for the information society http://www.oasis-open.org Primary: +1 973-996-2298 Mobile: +1 201-341-1393 


  • 7.  RE: [cti] Idea for Internationalization

    Posted 02-02-2016 16:26
    I feel like this is something we should pursue. Most (or perhaps all) of these types of issues have been dealt with in other subject fields, so it's worth learning from the folks who have done this before. (FTR, I have too, but it's been a while and was focused on UI implementation as much as it was on content).

    --
    Kyle Maxwell [kmaxwell@verisign.com]
    Senior Analyst, iDefense Applied Security Intelligence


  • 8.  RE: [cti] Idea for Internationalization

    Posted 02-05-2016 08:12
    Hi,

    Thank you for the input.

    I looked up

    XLIFF Version 1.2
    http://docs.oasis-open.org/xliff/xliff-core/xliff-core.html

    and it goes like this (as in "2. General Structure"):

        <xliff version='1.2'
               xmlns='urn:oasis:names:tc:xliff:document:1.2'>
         <file original='hello.txt' source-language='en' target-language='fr'
               datatype='plaintext'>
          <body>
           <trans-unit id='hi'>
            <source>Hello world</source>
            <target>Bonjour le monde</target>
            <alt-trans>
             <target xml:lang='es'>Hola mundo</target>
            </alt-trans>
           </trans-unit>
          </body>
         </file>
        </xliff>

    The original text is in <source>, the (translation) target text is in <target>, alternate translations are put in <alt-trans>, and all those elements are put in <trans-unit>.

    Regards,

    Ryu
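    To make the structure of the XLIFF 1.2 sample above concrete, here is a minimal Python sketch that reads it with the standard library and collects the text of each trans-unit by language. The function name translations_by_lang and the per-language dictionary layout are assumptions made for illustration; they are not part of XLIFF or of any STIX proposal.

```python
import xml.etree.ElementTree as ET

# The sample from "2. General Structure" of the XLIFF 1.2 specification, as quoted above.
XLIFF_SAMPLE = """<xliff version='1.2'
       xmlns='urn:oasis:names:tc:xliff:document:1.2'>
 <file original='hello.txt' source-language='en' target-language='fr'
       datatype='plaintext'>
  <body>
   <trans-unit id='hi'>
    <source>Hello world</source>
    <target>Bonjour le monde</target>
    <alt-trans>
     <target xml:lang='es'>Hola mundo</target>
    </alt-trans>
   </trans-unit>
  </body>
 </file>
</xliff>"""

NS = {"x": "urn:oasis:names:tc:xliff:document:1.2"}
# ElementTree exposes xml:lang under the fixed XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def translations_by_lang(xliff_text):
    """Return {trans-unit id: {language code: text}} for an XLIFF 1.2 document."""
    root = ET.fromstring(xliff_text)
    result = {}
    for file_el in root.findall("x:file", NS):
        source_lang = file_el.get("source-language")
        target_lang = file_el.get("target-language")
        for unit in file_el.findall("x:body/x:trans-unit", NS):
            texts = {}
            source = unit.find("x:source", NS)
            if source is not None:
                texts[source_lang] = source.text
            target = unit.find("x:target", NS)  # direct child only, not the alt-trans targets
            if target is not None:
                texts[target_lang] = target.text
            for alt in unit.findall("x:alt-trans/x:target", NS):
                alt_lang = alt.get(XML_LANG)
                if alt_lang:
                    texts[alt_lang] = alt.text
            result[unit.get("id")] = texts
    return result


units = translations_by_lang(XLIFF_SAMPLE)
```

    The resulting per-language mapping is close in spirit to the list of "lang"/"value" pairs suggested earlier in the thread for text fields.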


  • 9.  Re: [cti] Idea for Internationalization

    Posted 02-02-2016 15:12
    Yes, doing my option 3 or option 4 allows us to have the translation work done by the original producer and by 3rd parties along the way. The solution that Ryu and Terry have called out works, but only for the original producer. Everyone who wants to add a translation, or fix a translation, would have to re-issue and re-version the entire TLO, which will break linkages to campaigns and threat actors as the translations grow organically by themselves. Further, you lose the ability to track the original source. Also, every time we require some 3rd party to rev the original TLO, we run the risk of losing data markings and other key things.

    Option 3 allows for the creation of tiny add-on objects that contain just the fields that one might translate.

    Option 4 allows you to create effectively a dict of the object and translate any fields you want.

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards
    Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Feb 2, 2016, at 05:25, Wunder, John A. <jwunder@mitre.org> wrote:

    I think this option is what Bret was describing as #1. As he says, there are a few downsides: first, it's more complicated for single-language content (not a huge deal) and, second, it doesn't allow for translations by someone other than the individual producer (since they can't revise the object). They'd have to use some relationship approach, which would give us two ways of doing it and is exactly what we want to avoid. I initially thought this might be uncommon, but it's exactly what Chris just outlined as a use case.

    For that reason I prefer option #3 from Bret's e-mail. It supports both original translations from the producer as well as shared third-party translations.
It doesn’t complicated things for single language content and it doesn’t have the relationship tracking downsides that Terry mentions below because it’s not a separate TLO (well, it is, but of a special type). It would look something like this: [   {     “type”: “indicator”,     “id”: “indicator—UUID”,     “lang”: “en-US”,     “title”: “English title”,   },   {     “type”: “indicator-translation”,     “id”: “indicator-translation—UUID”,     “object_ref”: “indicator—UUID”     “lang”: “jp”,     “title”: “Japanese Title”   } ] The second object (the translation) would also need a way to point to a specific revision (if we use a different approach than relationships) and the producer, but we don’t have consensus on that so I omitted it. Option 4 would look the same, by the way, except the type would be “translation” and we wouldn’t have individual schemas specifying which fields are translatable. Because the indicator is the only indicator TLO there, all domain relationships would go through it as the original. John From:   < cti@lists.oasis-open.org > on behalf of Terry MacDonald < terry@soltra.com > Date:   Tuesday, February 2, 2016 at 5:37 AM To:   Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com >, Jordan, Bret < bret.jordan@bluecoat.com >, Chris Ricard < cricard@fsisac.us > Cc:   cti@lists.oasis-open.org < cti@lists.oasis-open.org > Subject:   RE: [cti] Idea for Internationalization Hi Bret / Ryu / All,   Embedding the different translations within a single TLO object (as Ryu has suggested) is my preferred option. If we use a relationship to join the two different translations of the same information, then we end up with two bits of information that are effectively the same thing. This causes problems when third-parties link their threat intel to the two translations of the same information.   Lets imagine OrgA creates IncidentA(en) and IncidentA(jp) and relates them together using translation-of relationship. 
OrgX has CampaignX(en) that they want to relate to the Incidents that they’ve seen. ·            It sizeably increases the size of data stored and moved. There are duplicate fields in both objects, and we require a separate relationship object to relate the two together. This is another relationship that needs to be walked and we then require extra storage unnecessarily used up by duplicate information. ·            Which object do they relate to their threat intel? The English version or the translated version? Both? ·            If OrgA updates IncidentA(en) and IncidentA(jp) isn’t updated, then which object is considered the “truth”? ·            What if a consumer only receives the IncidentA(en), yet a third-party has published a relationship linking IncidentA(jp) to another different Campaign? If the translations were embedded within the same single object then the single relationship would cover all translations.   This has bigger implications on the object lifecycle than many people realize. Anytime that we separate the same logical ‘thing’ into separate versions of that thing we are potentially opening ourselves up to this problem. If we are describing an Incident, then there should be one Incident object with the data relating to that object within that one object. New revisions of that object should update that same object, and updates to that object should not affect any relationships pointing to that object.   I’m worried that we are potentially making a future problem for ourselves if we head down this path.   
Cheers   Terry MacDonald Senior STIX Subject Matter Expert SOLTRA   An FS-ISAC and DTCC Company +61 (407) 203 206   terry@soltra.com     From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Masuoka, Ryusuke Sent:   Tuesday, 2 February 2016 6:02 PM To:   Jordan, Bret < bret.jordan@bluecoat.com >; Chris Ricard < cricard@fsisac.us > Cc:   cti@lists.oasis-open.org Subject:   RE: [cti] Idea for Internationalization   Hi, Bret, Ricard,   Thank you for feedback. I reply to this threat as the subject is more appropriate.   The use case scenarios are not all about translating existing (text) objects. In some cases:   - Some major text fields (title, description, etc.) are produced in multiple languages at the time of package creation.     For example, a Japanese entity creates a new CTI package and gives its title in Japanese as well as English so that at least someone, who gets interested in the title and does not read Japanese, can contact the original producer for further details. (It is often the case for papers written in Japanese. They provide titles and abstracts in English.)     It is not always clear which is original and which is translation.   I know that it might lose to some degree tractability of which text is original, but how about making it possible for any text field to have its text values as many languages as necessary, specified in the object by   “ lang ”   tag. Something like the following.   ----- {   type : stix-package ,   id : stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365 ,   “title”: [     {       “lang”: “en”,       “value”: “Some really neat campaign that we found”      },      {        “lang”: “ja”,        “value”: “ ? ??????????????????? ”       }     ] } -----   What do you say?   
Regards,   Ryu   From: cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Jordan, Bret Sent:   Tuesday, February 02, 2016 12:30 PM To:   Chris Ricard Cc:   cti@lists.oasis-open.org Subject:   Re: [cti] Idea for Internationalization   Those are great use cases and match what Ryu brought up.  We have not yet heard from the majority of the community yet, but I believe from conversations we have had on Slack that we have some general consensus around the idea that: All TLOs should have a field called lang that defines the languages of the object (ex, en_us)   What we do not yet know is should this be required?  Or should be be optional? Beyond that we have a few different options:   Translated content is embedded inside the original TLO This I believe is riddled with problems as individuals start making translations of a TLO and needing to re-issue the TLO even if they did not originally produce it.   We will get in to all sorts of versioning problems, similar to what we have today in STIX 1.2 Translated versions of a TLO will be represented as a new TLO This could work, if there was a relationship object or a translation object that could connect them.  There might be some weirdness in the work flow that we have yet to identify.   Another option would be to create a translation object that just contains the fields that can be translated.  You would then have the parent TLO written in lang=foo and the translations that are written in lang=fr, lang=de, lang=jp etc. This is very similar to #2.  However, this object contains just a subset of the fields. This would allow desperate organizations to produce translations of a TLO independent of the original producers and then release it without needing to re-release the original TLO You could schema validate this method This would be super easy for consumers and parsers to deal with The last option as I see it, is something similar to #3, but where the fields are not defined in the spec.  
       A translator can flag any fields they want to translate and include them in the object. This object will be tied to the original in the same way as #3.
       - On the consuming side, you could do really interesting things in software with merging the data in your database.
       - The problem with this is that you would not be able to schema validate the translation object, as you would have no way of knowing ahead of time which fields would be included.
       - This is very flexible, but may produce problems for consumers or parsers.

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Feb 1, 2016, at 20:02, Chris Ricard < cricard@fsisac.us > wrote:

    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.

    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.

    We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients can evaluate the applicability and importance.  The original, English version is also included, so if the recipient deems it applicable, he/she can translate the rest.

    This is all for human-readable reports, but the use case seems similar.

    Proposal:

    1. Add an optional language tag in all top level constructs.
    2. Add an optional Alternate Language tag to Relationship objects.
    3. Producers can create multiple language-specific versions of whatever top-level objects they wish.
    4. Producers can create Alternate Language relationships between these alternate language objects.
    5.
       Consumers can choose to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.
    6. If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.

    Use Cases:

    1. I produce content in English, but have Japanese constituents.  I publish everything in English, and a subset of the info in Japanese.  For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language relationship.
    2. I consume the content in #1, and English is my primary language.  Upon receipt, I discard the Japanese versions and the alternate language relationships.
    3. I consume the content in #1, but Japanese is my primary language.  Upon receipt, I consume both the English and Japanese versions.  When both exist for a given item, I display the Japanese version first, and provide a link to the English version.  When only an English version exists, I display the English version.

    For consideration,

    Chris Ricard
    FS-ISAC

    From:   Jordan, Bret
    Sent:   Monday, February 1, 2016 9:02 PM
    To:   Masuoka, Ryusuke
    Cc:   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

    Thanks for the feedback.  The tear line I am trying to figure out is where is this a specification issue and where is it an implementation issue.

    One idea that we have tossed around on Slack is the idea that each top level object (TLO) would have a field called lang.  This would be the language that the object is written in.  This would enable tools to select and filter by a language.
    Then, given that only a handful of fields for a given TLO can be translated (you are not going to translate an IP address, for example), we have tossed around the idea of creating a translation object that could be sent either with the original TLO or separately.

    Going down the path this way, though I am not yet advocating that is the best way, would allow organizations to do very interesting things with CTI data:

    1) A threat intel provider could issue TLOs in language specific versions if they wanted.

    2) A threat intel provider could produce language translations and attach them to the TLO.

    3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.

    There was some initial concern about this model, as some believe it might have issues with versioning.  But I do not think so, as you would not want translated objects to automatically point to a new version.  They would be tied at the hip to the version that they were created for.

    The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.

    What would you think of something like this?

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:

    Hi, Bret,

    I guess it depends. But what I see is scenarios like the following:

    - A Japanese entity receives pieces of CTI information in English. The entity determines that some of them are important/critical and worth translating into Japanese, adds descriptions in Japanese, and redistributes them to other Japanese entities (if redistribution is allowed).
      The CTIM (CTI Management System) of a receiving party displays the Japanese description whenever possible, while allowing access to the original English descriptions.

    - Japanese entities produce CTI in Japanese (not in English, surprise!). An entity decides that some of them are important/critical and worth translating into English, adds descriptions in English, and redistributes them to other countries (if redistribution is allowed). The CTIM of a receiving party displays the English description if so set, while allowing access to the original Japanese (likely more accurate) descriptions.

    Regards,

    Ryu

    From:   Jordan, Bret [ mailto:bret.jordan@bluecoat.com ]
    Sent:   Tuesday, February 02, 2016 10:17 AM
    To:   Masuoka, Ryusuke;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

    Some questions:

    - Will organizations producing threat intelligence produce one incident for each language? Or will they produce one big incident that contains all of the languages?
    - For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version? Or would you expect the TAXII server to send you both?
    - What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Feb 1, 2016, at 18:07, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:

    Hi,

    Not the UTF-8 thing (I understand that most modern programming languages and other standards deal with it correctly).

    It is about having text fields in multiple languages.
    For example, descriptions of a package in English and Japanese. The system will pick which language to display based on the language code (“en” or “jp”) in the field.

    Is it something already discussed in Slack? (Sorry if so.)

    Regards,

    Ryu

    From:   cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Jordan, Bret
    Sent:   Tuesday, February 02, 2016 9:59 AM
    To:   Masuoka, Ryusuke
    Cc:   Barnum, Sean D.;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

    I would really like to understand this... Do you mean to make sure the text fields are not ASCII so that you can put in other character sets?  JSON gives us UTF-8 by default.  So this alone should make things easier for our international friends.

    If this is not what you mean, please explain and give us some context.  We have had some passionate debates on Slack about this recently, but I feel now that we do not really understand the problem that we were trying to solve.  Can you help us understand the problem?  What works, what does not work, what you need it to do and why?

    I really want to make sure our baby works for everyone.  But as I said on Slack, I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a coke.

    Thanks,

    Bret

    Bret Jordan CISSP
    Director of Security Architecture and Standards Office of the CTO
    Blue Coat Systems
    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050
    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg."

    On Feb 1, 2016, at 17:46, Masuoka, Ryusuke < masuoka.ryusuke@jp.fujitsu.com > wrote:

    Hi,

    Is there a place for “Internationalization” of text fields? I would like very much to see it in STIX 2.0 (or CTI Common?) and I am willing to contribute.
    Regards,

    Ryu

    From:   cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]   On Behalf Of   Barnum, Sean D.
    Sent:   Tuesday, February 02, 2016 3:49 AM
    To:   cti@lists.oasis-open.org
    Subject:   [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)

    All,

    As discussed at the face to face meeting and briefly on the TC monthly call and the list, we plan to work toward our aggressive July target date for draft STIX 2.0, TAXII 2.0 and CybOX 3.0 specs utilizing a more product management approach with roughly monthly tranches focused on resolving all in-scope identified issues relevant to a particular capability area.

    A proposed tranche plan is:

    - February 29th - Run Indicators to the ground.  Get these fundamentals worked through to enable us to talk to vendors on the RSA show floor about it.  And have something to show them.
    - March 31st - Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground.
    - April 30th - Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground.
    - May 31st - Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic support (opinions, assertions, hypotheses, etc.) to ground.
    - June 30th - All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc.) and quality.

    This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning and key object support decisions, with remaining tranches focused on key object refactoring based on decisions from the Indicator tranche. Please let us know if you see any issues with this tranche plan.

    The first tranche (Indicators) is the most relevant for now as it begins today. Below is a draft plan for the Indicator tranche.
    This draft Indicator Tranche plan is also in the wiki. This is a very aggressive plan considering the number of issues to discuss and decide and the limited time to do it. We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it. If you have comments, feedback or issues with this draft plan please let us know so that we may adapt as appropriate.

    Indicator tranche plan

    Objective: To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are required to support common indicator use cases.

    Target completion date: February 29, 2016

    Proposed workflow:

    - Raise and describe the issue with a brief wiki writeup.
    - Discuss issue on list and/or slack (with summaries made on list).
    - Anyone with a proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema, etc., clearly marked as a proposal) to the wiki writeup and announce it to the list.
    - Discuss, debate, review proposals, comment as appropriate within defined time window to work towards consensus.  Discuss key issues on weekly working call.
    - If consensus (unanimous or at least no strong objections) reached:
      - Capture normative language in pre-draft spec document
      - Capture consensus changes in JSON Schema implementation
      - Capture consensus changes in UML model
      - Capture statement of consensus in issue tracker
      - Mark issue tracker as “Consensus Achieved”
      - Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion
    - If consensus not achieved (strong objection exists) within allowed time window:
      - Discuss and decide whether issue is absolutely necessary for MVP and if not decide to postpone, OR
      - Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during last week of tranche


  • 10.  RE: [cti] Idea for Internationalization

    Posted 02-02-2016 15:29




    Hi Bret,
     
    “ The solution that Ryu and Terry have called out, work, but only for the original producer.  Everyone that wants to add a translation,
    or fix a translation, would have to re-issue and re-version the entire TLO.  Which will break linkages to campaign and threat actors as the translations grow organically by themselves.” 
     
    Not true, if we use the incremental versioning method described in the TWIGS proposal to the F2F. That *would* be
    true if we used major versioning, but in TWIGS we removed major versioning because of the difficulties it caused in situations like this. If we do incremental versioning as written in the TWIGS proposal, then the Object ID doesn’t change with new versions
    of the object, which means that Option 1 (roughly what Ryu and I proposed) doesn’t result in broken linkages to campaigns and threat actors.
     
    This also doesn’t result in losing the ability to track the source, as another of the TWIGS proposal ‘rules’ applies here: 
    only content producers can update the objects they release. This also means that translations can only be released by the content producers, which is not optimal.
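
A minimal Python sketch of the two rules mentioned above, under stated assumptions: the `ObjectStore` class and the `producer` and `revision` field names are hypothetical, invented for illustration, and are not taken from the TWIGS proposal. It only shows why a relationship that points at a stable object ID keeps resolving after a revision adds a translation, and how a producer-only update rule blocks third-party translations.

```python
class ObjectStore:
    """Keeps every revision of an object under one stable ID (illustrative only)."""

    def __init__(self) -> None:
        self._revisions: dict[str, list[dict]] = {}

    def publish(self, obj: dict) -> None:
        history = self._revisions.setdefault(obj["id"], [])
        # Hypothetical rule: only the original producer may revise an object.
        if history and history[0]["producer"] != obj["producer"]:
            raise PermissionError("only the original producer may revise this object")
        history.append(obj)

    def latest(self, object_id: str) -> dict:
        return max(self._revisions[object_id], key=lambda o: o["revision"])


store = ObjectStore()
store.publish({
    "id": "incident--A", "producer": "OrgA", "revision": 1,
    "title": [{"lang": "en", "value": "Incident A"}],
})

# A third party links its campaign to the incident by ID only.
relationship = {"source_ref": "campaign--X", "target_ref": "incident--A"}

# OrgA later adds a Japanese title as revision 2; the ID is unchanged.
store.publish({
    "id": "incident--A", "producer": "OrgA", "revision": 2,
    "title": [{"lang": "en", "value": "Incident A"},
              {"lang": "ja", "value": "(Japanese title)"}],
})

resolved = store.latest(relationship["target_ref"])
print(resolved["revision"], [t["lang"] for t in resolved["title"]])
```

The same rule that protects source tracking is what prevents anyone other than OrgA from publishing a translation in this model, which is the drawback noted above.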
     
    I prefer Option 1 but can get on board with Option 3 if that’s what others prefer. A translation-only object related to the root object
    does increase the overall size, but if the object is allowed to be partially filled, fewer extra characters are wasted.
     
    Cheers
     

    Terry MacDonald
    Senior STIX Subject Matter Expert
    SOLTRA   An FS-ISAC and DTCC Company
    +61 (407) 203 206
    terry@soltra.com
     

     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Jordan, Bret
    Sent: Wednesday, 3 February 2016 2:12 AM
    To: Wunder, John A. <jwunder@mitre.org>
    Cc: cti@lists.oasis-open.org
    Subject: Re: [cti] Idea for Internationalization


     
    Yes, doing my option 3 or option 4 allows us to have the translation work done by the original producer and by 3rd parties along the way.  The solution that Ryu and Terry have called out works, but only for the original producer.  Everyone
    that wants to add a translation, or fix a translation, would have to re-issue and re-version the entire TLO, which will break linkages to campaigns and threat actors as the translations grow organically by themselves. 

     


    Further, you lose the ability to track the original source.  Also, every time we require some 3rd party to rev the original TLO, we run the risk of losing data markings and other key things.

     


    Option 3 allows for the creation of tiny add-on objects that contain just the fields that one might translate. Option 4 effectively allows you to create a dict of the object and translate any fields you want.  







     


    Thanks,


     


    Bret



     


     


     



    Bret Jordan CISSP

    Director of Security Architecture and Standards Office of the CTO


    Blue Coat Systems



    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050


    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 









     



    On Feb 2, 2016, at 05:25, Wunder, John A. < jwunder@mitre.org > wrote:

     



    I think this option is what Bret was describing as #1. As he says, there are a few downsides: first, it’s more complicated for single language content (not a huge deal) and,
    second, it doesn’t allow for translations by someone other than the individual producer (since they can’t revise the object). They’d have to use some relationship approach, which would give us two ways of doing it and is exactly what we want to avoid. I initially
    thought this might be uncommon, but it’s exactly what Chris just outlined as a use case.


     


    For that reason I prefer option #3 from Bret’s e-mail. It supports both original translations from the producer as well as shared third-party translations. It doesn’t complicate
    things for single language content and it doesn’t have the relationship tracking downsides that Terry mentions below because it’s not a separate TLO (well, it is, but of a special type).


     


    It would look something like this:


     


    [
      {
        "type": "indicator",
        "id": "indicator--UUID",
        "lang": "en-US",
        "title": "English title"
      },
      {
        "type": "indicator-translation",
        "id": "indicator-translation--UUID",
        "object_ref": "indicator--UUID",
        "lang": "jp",
        "title": "Japanese Title"
      }
    ]


     


    The second object (the translation) would also need a way to point to a specific revision (if we use a different approach than relationships) and the producer, but we don’t
    have consensus on that so I omitted it. Option 4 would look the same, by the way, except the type would be “translation” and we wouldn’t have individual schemas specifying which fields are translatable.


     


    Because the indicator is the only indicator TLO there, all domain relationships would go through it as the original.
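
To make the consumer side of this proposal concrete, here is a rough Python sketch of how a tool might overlay a translation object onto its original for display. It assumes the object shapes from the example in this message; `TRANSLATABLE_FIELDS` and `localized_view` are hypothetical names chosen for illustration, and nothing here is settled spec.

```python
# Assumed list of fields a translation object may carry (illustrative only).
TRANSLATABLE_FIELDS = ("title", "description")


def localized_view(original: dict, translations: list[dict], preferred_lang: str) -> dict:
    """Return a copy of `original` with translatable fields taken from the first
    translation that references it and matches `preferred_lang`."""
    view = dict(original)
    for tr in translations:
        if tr.get("object_ref") != original["id"]:
            continue  # translation belongs to a different object
        if tr.get("lang") != preferred_lang:
            continue  # wrong language, keep looking
        for field in TRANSLATABLE_FIELDS:
            if field in tr:
                view[field] = tr[field]
        view["lang"] = tr["lang"]
        break
    return view


indicator = {
    "type": "indicator",
    "id": "indicator--UUID",
    "lang": "en-US",
    "title": "English title",
}
translation = {
    "type": "indicator-translation",
    "id": "indicator-translation--UUID",
    "object_ref": "indicator--UUID",
    "lang": "jp",
    "title": "Japanese Title",
}

print(localized_view(indicator, [translation], "jp")["title"])
print(localized_view(indicator, [translation], "hu")["title"])
```

When no translation matches the requested language, the original is returned unchanged, so relationships and non-text fields always come from the one indicator object.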


     


    John



     


    From:   < cti@lists.oasis-open.org >
    on behalf of Terry MacDonald < terry@soltra.com >
    Date:   Tuesday, February 2, 2016 at 5:37 AM
    To:   "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Jordan, Bret" < bret.jordan@bluecoat.com >,
    Chris Ricard < cricard@fsisac.us >
    Cc:   " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject:   RE: [cti] Idea for Internationalization


     




    Hi Bret / Ryu / All,


     


    Embedding the different translations within a single TLO object (as Ryu has suggested) is my preferred option. If we use a relationship to join the two different
    translations of the same information, then we end up with two bits of information that are effectively the same thing. This causes problems when third-parties link their threat intel to the two translations of the same information.


     


    Let’s imagine OrgA creates IncidentA(en) and IncidentA(jp) and relates them together using a translation-of relationship. OrgX has CampaignX(en) that they want to
    relate to the Incidents that they’ve seen.

    ·  It considerably increases the size of data stored and moved. There are duplicate fields in both objects, and we require a separate relationship object to relate the two together. This is another relationship that needs to be walked, and the duplicate information uses up extra storage unnecessarily.

    ·  Which object do they relate to their threat intel? The English version or the translated version? Both?

    ·  If OrgA updates IncidentA(en) and IncidentA(jp) isn’t updated, then which object is considered the “truth”?

    ·  What if a consumer only receives IncidentA(en), yet a third party has published a relationship linking IncidentA(jp) to a different Campaign? If the translations were embedded within the same single object then the single relationship would cover all translations.


     


    This has bigger implications on the object lifecycle than many people realize. Anytime that we separate the same logical ‘thing’ into separate versions of that
    thing we are potentially opening ourselves up to this problem. If we are describing an Incident, then there should be one Incident object with the data relating to that object within that one object. New revisions of that object should update that same object,
    and updates to that object should not affect any relationships pointing to that object.


     


    I’m worried that we are potentially making a future problem for ourselves if we head down this path.


     


    Cheers


     



    Terry MacDonald


    Senior STIX Subject Matter Expert


    SOLTRA   An FS-ISAC and DTCC Company


    +61 (407) 203 206   terry@soltra.com


     



     




    From:   cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On
    Behalf Of   Masuoka, Ryusuke
    Sent:   Tuesday, 2 February 2016 6:02 PM
    To:   Jordan, Bret < bret.jordan@bluecoat.com >; Chris Ricard < cricard@fsisac.us >
    Cc:   cti@lists.oasis-open.org
    Subject:   RE: [cti] Idea for Internationalization




     


    Hi, Bret, Ricard,


     


    Thank you for the feedback. I am replying to this thread as the subject is more appropriate.

    The use case scenarios are not all about translating existing (text) objects. In some cases:

    - Some major text fields (title, description, etc.) are produced in multiple languages at the time of package creation.

      For example, a Japanese entity creates a new CTI package and gives its title in Japanese as well as English so that at least someone who gets interested in the title and does not read Japanese can contact the original producer for further details. (This is often the case for papers written in Japanese: they provide titles and abstracts in English.)

      It is not always clear which is the original and which is the translation.

    I know that it might lose, to some degree, traceability of which text is original, but how about making it possible for any text field to have its text values in as many languages as necessary, specified in the object by a “lang” tag? Something like the following.


     


    -----
    {
      "type": "stix-package",
      "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
      "title": [
        {
          "lang": "en",
          "value": "Some really neat campaign that we found"
        },
        {
          "lang": "ja",
          "value": "? ???????????????????"
        }
      ]
    }
    -----
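
For illustration only, a consumer could resolve such a multi-language field with a small helper like the Python sketch below. The name `pick_text` and the fallback policy (first listed entry when no preferred language is present) are assumptions made for this sketch, not part of the proposal.

```python
def pick_text(values: list[dict], preferred: list[str]) -> str:
    """Pick the text for the first language in `preferred` that is present;
    otherwise fall back to the first entry the producer listed."""
    by_lang = {entry["lang"]: entry["value"] for entry in values}
    for lang in preferred:
        if lang in by_lang:
            return by_lang[lang]
    return values[0]["value"]


title = [
    {"lang": "en", "value": "Some really neat campaign that we found"},
    {"lang": "ja", "value": "(Japanese title)"},
]

print(pick_text(title, ["ja", "en"]))  # reader prefers Japanese
print(pick_text(title, ["hu"]))        # no Hungarian text, falls back to first entry
```

The fallback branch is one possible answer to the "language you do not speak" question raised earlier in the thread: the consumer still gets some text rather than nothing.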


     


    What do you say?


     


    Regards,


     


    Ryu


     




    From: cti@lists.oasis-open.org   [ mailto:cti@lists.oasis-open.org ]   On
    Behalf Of   Jordan, Bret
    Sent:   Tuesday, February 02, 2016 12:30 PM
    To:   Chris Ricard
    Cc:   cti@lists.oasis-open.org
    Subject:   Re: [cti] Idea for Internationalization




     


    Those are great use cases and match what Ryu brought up.  We have not yet heard from the majority of the community, but I believe from conversations we have had on Slack that we have
    some general consensus around the idea that:

    - All TLOs should have a field called "lang" that defines the language of the object (e.g., en_us).

    What we do not yet know is whether this should be required or optional.

    Beyond that we have a few different options:

    1. Translated content is embedded inside the original TLO.
       - I believe this is riddled with problems, as individuals start making translations of a TLO and need to re-issue the TLO even if they did not originally produce it.
       - We will get into all sorts of versioning problems, similar to what we have today in STIX 1.2.

    2. Translated versions of a TLO are represented as a new TLO.
       - This could work if there was a relationship object or a translation object that could connect them.
       - There might be some weirdness in the workflow that we have yet to identify.

    3. Another option would be to create a translation object that just contains the fields that can be translated.  You would then have the parent TLO written in lang=foo and the translations written in lang=fr, lang=de, lang=jp, etc.
       - This is very similar to #2.  However, this object contains just a subset of the fields.
       - This would allow disparate organizations to produce translations of a TLO independent of the original producers and then release them without needing to re-release the original TLO.
       - You could schema validate this method.
       - This would be super easy for consumers and parsers to deal with.

    4. The last option, as I see it, is something similar to #3, but where the fields are not defined in the spec.  A translator can flag any fields they want to translate and include them in the object.  This object will be tied to the original in the same way as #3.
       - On the consuming side, you could do really interesting things in software with merging the data in your database.
       - The problem with this is that you would not be able to schema validate the translation object, as you would have no way of knowing ahead of time which fields would be included.
       - This is very flexible, but may produce problems for consumers or parsers.
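
The validation difference between the last two options can be sketched in Python as follows. This is a hedged illustration: the `TRANSLATABLE` table, the `ENVELOPE` field set and both function names are assumptions invented here, and the object shape is only a guess at what a translation object might carry.

```python
# Hypothetical per-type list of translatable fields that a spec could define (option 3).
TRANSLATABLE = {"indicator": {"title", "description"}}
# Hypothetical bookkeeping fields every translation object would carry.
ENVELOPE = {"type", "id", "object_ref", "lang"}


def validate_option3(translation: dict) -> list[str]:
    """Option 3: fields are fixed by the spec, so unknown fields can be rejected."""
    base_type = translation["type"].removesuffix("-translation")
    allowed = TRANSLATABLE.get(base_type)
    if allowed is None:
        return [f"no translation schema for {base_type!r}"]
    extra = set(translation) - ENVELOPE - allowed
    return [f"field {name!r} is not translatable" for name in sorted(extra)]


def validate_option4(translation: dict) -> list[str]:
    """Option 4: any field may appear, so only the envelope can be checked."""
    missing = ENVELOPE - set(translation)
    return [f"missing {name!r}" for name in sorted(missing)]


good = {
    "type": "indicator-translation",
    "id": "indicator-translation--UUID",
    "object_ref": "indicator--UUID",
    "lang": "jp",
    "title": "Japanese Title",
}
loose = {**good, "pattern": "translated by mistake"}

print(validate_option3(good))
print(validate_option3(loose))
print(validate_option4(loose))
```

Under option 3 the stray `pattern` field is caught up front; under option 4 the same object passes, and it is left to each consumer to decide what to do with it.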












     




    Thanks,




     




    Bret





     




     




     





    Bret Jordan CISSP



    Director of Security Architecture and Standards Office of the CTO




    Blue Coat Systems





    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050




    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 











     





    On Feb 1, 2016, at 20:02, Chris Ricard < cricard@fsisac.us >
    wrote:



     




     




    Five minutes before this email, I was CCed on a Japanese translation of an advisory we published earlier today.




     




    Background:  We have FS-ISAC members in Japan, as well as a partner org in Japan that we work with.




     




    We have a Japanese translator on staff, who identifies the important advisories, and translates enough of them into Japanese so the recipients can evaluate the
    applicability and importance.  The original, English version is also included, so if the recipient deems it applicable, he/she can translate the rest.




     




    This is all for human-readable reports, but the use case seems similar.




     




    Proposal:




     




    1. Add an optional language tag in all top level constructs.

    2. Add an optional Alternate Language tag to Relationship objects.

    3. Producers can create multiple language-specific versions of whatever top-level objects they wish.

    4. Producers can create Alternate Language relationships between these alternate language objects.

    5. Consumers can choose to maintain alternate language versions of the objects, or can choose to maintain some or all of the alternate language versions.

    6. If the consumer chooses to maintain Alternate Languages, the Alternate Language relationship objects would support the relationship between the alternate language versions of the same object.




     




    Use Cases:




     




    1. I produce content in English, but have Japanese constituents.  I publish everything in English, and a subset of the info in Japanese.  For those objects that I also publish in Japanese, I link the English and Japanese versions together with an Alternate Language relationship.

    2. I consume the content in #1, and English is my primary language.  Upon receipt, I discard the Japanese versions and the alternate language relationships.

    3. I consume the content in #1, but Japanese is my primary language.  Upon receipt, I consume both the English and Japanese versions.  When both exist for a given item, I display the Japanese version first, and provide a link to the English version.  When only an English version exists, I display the English version.
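
Use cases 2 and 3 above could be sketched in Python roughly as follows. Everything named here is an assumption for illustration: the `alternate-language` relationship type string, the `relationship_type`, `source_ref` and `target_ref` field names, and the `build_display_list` helper are not defined anywhere in this thread. The sketch also only follows direct links between two objects, not chains across three or more languages.

```python
def build_display_list(objects: list[dict], relationships: list[dict],
                       primary_lang: str, keep_alternates: bool) -> list[dict]:
    """For each group of alternate-language objects, choose which one to show."""
    by_id = {o["id"]: o for o in objects}
    alternates: dict[str, set[str]] = {}
    for rel in relationships:
        if rel["relationship_type"] != "alternate-language":
            continue
        a, b = rel["source_ref"], rel["target_ref"]
        alternates.setdefault(a, set()).add(b)
        alternates.setdefault(b, set()).add(a)

    entries = []
    handled: set[str] = set()
    for obj in objects:
        if obj["id"] in handled:
            continue
        linked = [by_id[i] for i in sorted(alternates.get(obj["id"], ())) if i in by_id]
        group = [obj] + linked
        handled.update(o["id"] for o in group)
        in_primary = [o for o in group if o["lang"] == primary_lang]
        shown = in_primary[0] if in_primary else group[0]
        # Use case 2 discards the other versions; use case 3 keeps them as links.
        others = [o["id"] for o in group if o is not shown] if keep_alternates else []
        entries.append({"show": shown["id"], "alternates": others})
    return entries


objects = [
    {"id": "advisory--1-en", "lang": "en", "title": "Advisory one"},
    {"id": "advisory--1-ja", "lang": "ja", "title": "(Japanese title)"},
    {"id": "advisory--2-en", "lang": "en", "title": "Advisory two"},
]
relationships = [
    {"relationship_type": "alternate-language",
     "source_ref": "advisory--1-en", "target_ref": "advisory--1-ja"},
]

english_reader = build_display_list(objects, relationships, "en", keep_alternates=False)
japanese_reader = build_display_list(objects, relationships, "ja", keep_alternates=True)
print(english_reader)
print(japanese_reader)
```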




     




    For consideration,




     




    Chris Ricard




    FS-ISAC




     




     




     






    From:   Jordan, Bret
    Sent:   Monday, February 1, 2016 9:02 PM
    To:   Masuoka, Ryusuke
    Cc:   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)





     




    Thanks for the feedback.  The tear line I am trying to figure out is where is this a specification issue and where is it an implementation issue.  





     






    One idea that we have tossed around on Slack is the idea that each top level object (TLO) would have a field called "lang".  This would be the language that the object is written in.  This would enable tools to select
    and filter by a language.  






     






    Then, given that only a handful of fields for a given TLO can be translated (you are not going to translate an IP address, for example), we have tossed around the idea of creating a "translation"
    object that could be sent either with the original TLO or separately. 






     






    Going down the path this way, though I am not yet advocating that is the best way, would allow organizations to do very interesting things with CTI data:






     






    1) A threat intel provider could issue TLOs in language specific versions if they wanted.






     






    2) A threat intel provider could produce language translations and attach them to the TLO.  






     






    3) End users could augment or add their own translations without needing to re-release the entire TLO and thus avoid versioning issues.






     






    There was some initial concern about this model, as some believe it might have issues with versioning.  But I do not think so, as you would not want translated objects to automatically point to a new version.  They would be tied
    at the hip to the version that they were created for.






     





    The reason for looking at doing something like this is to avoid the need to turn every String field in the serialization into an array of objects.  






     






    What would you think of something like this?










     




    Thanks,






     






    Bret







     






     






     







    Bret Jordan CISSP





    Director of Security Architecture and Standards Office of the CTO






    Blue Coat Systems







    PGP Fingerprint: 63B4 FC53 680A 6B7D 1447  F2C0 74F8 ACAE 7415 0050






    "Without cryptography vihv vivc ce xhrnrw, however, the only thing that can not be unscrambled is an egg." 













     







    On Feb 1, 2016, at 18:29, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:





     






    Hi, Bret,






     






    I guess it depends. But what I see is scenarios like the following:






     






    - A Japanese entity receives CTI information pieces in English.
      The entity determines that some of them are important/critical
      and worth translating into Japanese, adds descriptions in Japanese,
      and redistributes them to other Japanese entities (if redistribution is allowed).
      The CTIM (CTI Management System) of a receiving party displays
      the Japanese description whenever possible, while allowing access to
      the original English descriptions.






     






    - Japanese entities produce CTI in Japanese (not in English, surprise!).
      An entity decides that some of them are important/critical and worth
      translating into English, adds descriptions in English,
      and redistributes them to other countries (if redistribution is allowed).
      The CTIM of a receiving party displays the English description if so set,
      while allowing access to the original Japanese (likely more accurate)
      descriptions.
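The display behavior in both scenarios can be sketched as a preference list with a fallback to the original text. The `pick_description` helper and the object shapes are assumptions for illustration; nothing here is defined by STIX.

```python
from typing import Dict, List, Tuple


def pick_description(
    original: dict, translations: List[dict], preferred: List[str]
) -> Tuple[str, str]:
    """Return (lang, text) for the first preferred language that is available.

    Falls back to the original description when none of the preferred
    languages has a translation, so the original always stays reachable.
    """
    available: Dict[str, str] = {original["lang"]: original["description"]}
    for translation in translations:
        available.setdefault(translation["lang"], translation["description"])
    for lang in preferred:
        if lang in available:
            return lang, available[lang]
    return original["lang"], original["description"]


# Hypothetical English original with a Japanese description added later.
original = {"lang": "en", "description": "Known C2 server"}
translations = [{"lang": "ja", "description": "既知のC2サーバー"}]

shown_in_japan = pick_description(original, translations, ["ja", "en"])
shown_elsewhere = pick_description(original, translations, ["hu"])
```

A receiving system set to Japanese gets the Japanese text, while a reader whose language is not covered still sees the original English.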






     






    Regards,






     






    Ryu






     








    From:   Jordan, Bret [mailto:bret.jordan@bluecoat.com]
    Sent:   Tuesday, February 02, 2016 10:17 AM
    To:   Masuoka, Ryusuke;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)








     






    Some questions:





    Will organizations producing threat intelligence produce one incident for each language?  


    Or will they produce one big incident that contains all of the languages?


    For an indicator with a localized title / description, would a TAXII server just send you the jp version vs the en_us version?  


    Or would you expect the TAXII server to send you both?


    What would be the expected behavior if you got a version in a language that you did not speak, say Hungarian?







     








     














    Thanks,








     








    Bret









     








     








     
























     









    On Feb 1, 2016, at 18:07, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:







     








    Hi,








     








    Not a UTF-8 thing (I understand that most modern programming languages
    and other standards deal with it correctly).








     








    It is about having text fields in multiple languages.
    For example, descriptions of a package in English and Japanese.
    The system will pick which language to display based on
    the language code ("en" or "jp") in the field.
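A minimal sketch of this per-field model, where a text field holds one entry per language and the system picks one by code. The list-of-entries layout and the `text_for` helper are assumptions, not an agreed serialization. The code uses "ja", the registered language subtag for Japanese ("jp" is the country code).

```python
from typing import List, Optional


def text_for(field_values: List[dict], lang_code: str, default_code: str = "en") -> Optional[str]:
    """Pick the entry whose "lang" equals `lang_code`, else the default language.

    Returns None when neither language is present.
    """
    by_lang = {entry["lang"]: entry["value"] for entry in field_values}
    if lang_code in by_lang:
        return by_lang[lang_code]
    return by_lang.get(default_code)


# Hypothetical package whose description is carried in two languages.
package = {
    "type": "package",
    "id": "package--1",
    "description": [
        {"lang": "en", "value": "Phishing campaign targeting banks"},
        {"lang": "ja", "value": "銀行を狙ったフィッシング攻撃"},
    ],
}

japanese_text = text_for(package["description"], "ja")
fallback_text = text_for(package["description"], "hu")
```

The cost of this layout is that every translatable string field becomes an array of objects in the serialization.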








     








    Is it something already discussed in Slack?








    (Sorry if so.)








     








    Regards,








     








    Ryu








     










    From:   cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jordan, Bret
    Sent:   Tuesday, February 02, 2016 9:59 AM
    To:   Masuoka, Ryusuke
    Cc:   Barnum, Sean D.;   cti@lists.oasis-open.org
    Subject:   Re: [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)










     








    I would really like to understand this.  Do you mean making sure the text fields are not restricted to ASCII, so that you can put in other character sets?  JSON gives us UTF-8 by default, so this
    alone should make things easier for our international friends.
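To illustrate the character-set point only, here is a small sketch using Python's standard `json` module: non-ASCII text survives a round trip through UTF-8 encoded JSON, and also through the ASCII-only escaped form. The package contents are made up.

```python
import json

# Hypothetical object with a Japanese title.
package = {"type": "package", "title": "キャンペーン"}

# Serialize with the characters kept as-is, then encode as UTF-8 for the wire.
wire_bytes = json.dumps(package, ensure_ascii=False).encode("utf-8")
round_tripped = json.loads(wire_bytes.decode("utf-8"))

# By default json.dumps escapes non-ASCII characters as \uXXXX sequences,
# which decodes to exactly the same text.
escaped = json.dumps(package)
from_escaped = json.loads(escaped)
```

This covers carrying the characters, but it says nothing about which language a string is written in.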









     










    If this is not what you mean, please explain and give us some context.  We have had some passionate debates on Slack about this recently, but I now feel that we do not really understand
    the problem that we were trying to solve.  Can you help us understand the problem?  What works, what does not work, what do you need it to do, and why?










     










    I really want to make sure our baby works for everyone.  But as I said on Slack, "I do not want to engineer a space ship when all we need is a bike to run to the corner store and get a coke".











     






    Thanks,










     










    Bret











     










     










     




























     











    On Feb 1, 2016, at 17:46, Masuoka, Ryusuke <masuoka.ryusuke@jp.fujitsu.com> wrote:









     










    Hi,










     










    Is there a place for "Internationalization" of text fields?










    I would like very much to see it in STIX 2.0 (or CTI Common?)










    and I am willing to contribute.










     










    Regards,










     










    Ryu










     












    From:   cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Barnum, Sean D.
    Sent:   Tuesday, February 02, 2016 3:49 AM
    To:   cti@lists.oasis-open.org
    Subject:   [cti] Draft tranche plan for achieving our July target date for draft specs (STIX 2.0, TAXII 2.0, CybOX 3.0)












     














    All,












     












    As discussed at the face-to-face meeting, and briefly on the TC monthly call and the list, we plan to work toward our aggressive July target date for draft STIX
    2.0, TAXII 2.0 and CybOX 3.0 specs using a more product-management-style approach, with roughly monthly tranches, each focused on resolving all in-scope identified issues relevant to a particular capability area.












     












    A proposed tranche plan is:

    February 29th - Run Indicators to the ground. Get these fundamentals worked through so that we can talk to vendors on the RSA show floor about it and have something to show them.
    March 31st - Run remaining cross-cutting issues to ground. Run Identity-based Victim, Source and Actor top level abstractions to ground.
    April 30th - Run Incidents (investigations) to ground. Run Asset top level abstraction to ground. Run Campaign to ground.
    May 31st - Run controlled vocabularies to ground. Run automated COA default extension to ground. Run analytic support (opinions, assertions, hypotheses, etc.) to ground.
    June 30th - All other remaining top level elements. Review pass for consistency (field name choices, naming conventions, structure patterns, etc.) and quality.






    This should cover the existing in-scope issues in a coherent and dependency-aware iterative fashion. For CybOX, the Indicator tranche is likely to cover patterning
    and key object support decisions with remaining tranches focused on key object refactoring based on decisions from the Indicator tranche.












    Please let us know if you see any issues with this tranche plan.












     












     












    The first tranche (Indicators) is the most relevant for now as it begins today.












    Below is a draft plan for the Indicator tranche. This draft Indicator Tranche plan is also in the wiki.












    This is a very aggressive plan considering the number of issues to discuss and decide and the limited time to do it.












    We will strive to achieve this plan and encourage active collaboration from everyone to help us accomplish it.












    If you have comments, feedback or issues with this draft plan please let us know so that we may adapt as appropriate.












     












     












     








    Indicator tranche plan

    Objective:
    To discuss and reach consensus on all in-scope tracker issues for STIX 2.0 that are
    required to support common indicator use cases.
    Target completion date:
    February 29, 2016
    Proposed workflow:

    1. Raise and describe the issue with a brief wiki writeup
    2. Discuss issue on list and/or slack (with summaries made on list). Anyone with proposed solution may add details of their proposal (proposed normative text, examples, diagrams, schema, etc. clearly
       marked as a proposal) to the wiki writeup and announce it to the list.
    3. Discuss, debate, review proposals, comment as appropriate within defined time window to work towards consensus.
    4. Discuss key issues on weekly working call.
    5. If consensus (unanimous or at least no strong objections) reached:
         o Capture normative language in pre-draft spec document
         o Capture consensus changes in JSON Schema implementation
         o Capture consensus changes in UML model
         o Capture statement of consensus in issue tracker
         o Mark issue tracker as “Consensus Achieved”
         o Clearly mark relevant issue wiki pages as “Consensus Achieved” or potentially move them to a separate Consensus repo to avoid confusion
    6. If consensus not achieved (strong objection exists) within allowed time window:
         o Discuss and decide whether issue is absolutely necessary for MVP and if not decide to postpone
           OR
         o Capture current consensus status in issue tracker, mark as “Consensus Stalled”, move on to other issues and revisit the issue during last week of tranche










     








  • 11.  RE: [cti] Idea for Internationalization

    Posted 02-02-2016 15:37
    Instead of re-issuing the TLO at all, I don't get why we can't just have a "translation" TLO. This avoids having to re-issue objects at all. All IDs stay the same. It also allows any party to add translations. Here is the concept.

    {
      "type": "stix-package",
      "id": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
      "lang": "en-US",
      "title": "My Campaign"
    }

    {
      "type": "translation",
      "id": "stix-translation--78ac772a-ba02-4693-9b0b-39d568bc8514",
      "field": "title",
      "lang": "jp",
      "value": "??????"
    }

    "relationships": [
      {
        "id": "relationship--1",
        "type": "relationship",
        "from": "stix-package--ad3d029f-6fe7-4923-aafc-3b69aed32365",
        "to": "stix-translation--78ac772a-ba02-4693-9b0b-39d568bc8514",
        "relationship_value": "translation"
      }
    ]

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown

    From: Terry MacDonald <terry@soltra.com>
    To: "Jordan, Bret" <bret.jordan@bluecoat.com>, "Wunder, John A." <jwunder@mitre.org>
    Cc: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Date: 02/02/2016 11:29 AM
    Subject: RE: [cti] Idea for Internationalization
    Sent by: <cti@lists.oasis-open.org>

    Hi Bret,

    “The solution that Ryu and Terry have called out, work, but only for the original producer. Everyone that wants to add a translation, or fix a translation, would have to re-issue and re-version the entire TLO. Which will break linkages to campaign and threat actors as the translations grow organically by themselves.”

    Not true, if we use the incremental versioning method which was described in the TWIGS proposal to the F2F. That *would* be true if we used major versioning, but in TWIGS we removed major versioning because of the difficulties it caused in situations like this.
    If we do incremental versioning as we have written in the TWIGS proposal, then the Object ID doesn’t change with new versions of the object. That means Option 1 (kind of what Ryu and I proposed) doesn’t result in broken linkages to campaigns and threat actors. It also doesn’t lose the ability to track source, as another one of the TWIGS proposal ‘rules’ applies here: only content producers can update the objects they release. This does mean that translations can only be released by the content producers as well, which is not optimal.

    I prefer Option 1 but can get on board with Option 3 if that’s what others prefer. A translation-only object related to the root object does increase the overall size, but if the object is allowed to be only partially filled, that will reduce the extra characters wasted.

    Cheers

    Terry MacDonald
    Senior STIX Subject Matter Expert
    SOLTRA
    An FS-ISAC and DTCC Company
    +61 (407) 203 206
    terry@soltra.com
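A toy sketch of the two rules as described in this message: the object ID stays the same across versions, and only the original producer may publish an update. The `ObjectStore` class, the integer versions, and the `title_ja` field are all assumptions for illustration; this is not the TWIGS text itself.

```python
class ObjectStore:
    """Toy in-memory store keyed by object ID (illustrative only)."""

    def __init__(self):
        self._latest = {}

    def publish(self, obj, producer):
        """Store `obj` as the latest version, enforcing the two assumed rules."""
        current = self._latest.get(obj["id"])
        if current is not None:
            if current["producer"] != producer:
                raise PermissionError("only the original producer may update this object")
            if obj["version"] <= current["version"]:
                raise ValueError("new version must be higher than the current one")
        self._latest[obj["id"]] = dict(obj, producer=producer)

    def resolve(self, object_id):
        """Return the latest stored version for an ID."""
        return self._latest[object_id]


store = ObjectStore()
store.publish({"id": "campaign--1", "version": 1, "title": "My Campaign"}, producer="acme")

# A relationship created against version 1 refers to the ID only.
relationship = {"from": "threat-actor--9", "to": "campaign--1"}

# Version 2 adds a translated title; the ID is unchanged.
store.publish(
    {"id": "campaign--1", "version": 2, "title": "My Campaign", "title_ja": "キャンペーン"},
    producer="acme",
)
linked = store.resolve(relationship["to"])
```

The relationship still resolves after the update, which is the "no broken linkages" point. The producer check also shows the drawback noted above: a third party cannot add a translation this way.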