OASIS Cyber Threat Intelligence (CTI) TC


Re: [cti] Internationalization: lang field required or optional?

  • 1.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-23-2017 21:30




    Prefer optional.
     

    From: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org> on behalf of "Wunder, John" <jwunder@mitre.org>
    Date: Thursday, February 23, 2017 at 12:59 PM
    To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Subject: [cti] Internationalization: lang field required or optional?


     

    Hey everyone,
     
    We’re getting very close to having a completed approach for internationalization; you can see the full writeup here:

    https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz
     
    We do have one remaining question before we can move forward, though. As part of the proposal, every top-level object has a “lang” field that identifies the language of the text content in that object.
    What we need to decide is whether we make that field required or optional.
     
    If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not.
     
    Here are some thoughts:
     
    Making it required:

    - All SDOs and SROs would have a language tag, so consumers could depend on it being there
    - It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    - It shows we have a commitment to internationalization

    Making it optional:

    - Any SRO or SDO could have a language tag, so consumers could not depend on it
    - Producers would not have to create it
    - We do have a SHOULD requirement saying that it should be included
     
    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional, I think those who need or want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless, because they need it, while those who couldn’t be bothered will be able to leave it off and we won’t have mislabeled data. Also, it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO, and I’ve already had one conversation with someone who said there’s a lot of bloat; I would like to avoid adding to that.
     
    Anyway, what does everyone think…required or optional?
     
    John
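
To make the two options concrete, here is a minimal sketch of how a validator might behave under each policy. It is only an illustration: the `validate` function, the `BASE_REQUIRED` tuple, and the sample object are assumptions made for this example, not part of the proposal or of any STIX library.

```python
from typing import Any

# Hypothetical subset of the properties already required on every SDO/SRO.
BASE_REQUIRED = ("type", "id", "created", "modified")


def validate(obj: dict[str, Any], lang_required: bool) -> list[str]:
    """Return a list of validation errors; an empty list means the object passes."""
    required = BASE_REQUIRED + (("lang",) if lang_required else ())
    errors = [f"missing required property: {name}" for name in required if name not in obj]
    # Under either policy, a lang value that is present must be a non-empty string.
    if "lang" in obj and not (isinstance(obj["lang"], str) and obj["lang"]):
        errors.append("lang must be a non-empty string")
    return errors


# Illustrative object with a made-up id and no lang property.
indicator = {
    "type": "indicator",
    "id": "indicator--00000000-0000-4000-8000-000000000001",
    "created": "2017-02-23T12:59:00Z",
    "modified": "2017-02-23T12:59:00Z",
}

print(validate(indicator, lang_required=False))
print(validate(indicator, lang_required=True))
```

Under the optional policy the object passes as-is; under the required policy the same object is rejected until the producer adds a lang value, whether or not that value is accurate.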






  • 2.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-23-2017 22:05
    My thoughts....

    1) In reality we are talking about a feature, not a property.
    2) If the property for this feature is optional, then the only products that will implement this feature are those that care about internationalization.
    3) If it is required, then everyone will be forced to implement it.

    Personally I see this as a data quality issue, not a STIX issue. And I think both sides can suffer from it.

    Problems with Required:
    a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en"
    b) product or tool does provide a UX for it, but analyst does not care and it just remains whatever the default is.

    Problems with Optional:
    a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data. So it is undefined.
    b) product or tool does care and provides a UX for it, but the analyst does not care and leaves it blank.
    c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it. So when the data goes back out the other side, the language tag is now missing.

    I personally do not see the harm in requiring tools to support and populate the lang tag. In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown". Then at least as the consumer you would know that the producer did not know the language, versus getting an object where the language tag is omitted and you do not know if:
    i) they did not know the language
    ii) their tool did not support it
    iii) they were just lazy and did not add it.

    Once again, this is a data quality problem, and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right. If it is optional, then you are just guessing all the time.

    Bret
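
The difference between an omitted tag and an explicit "unknown" value, and the broker problem in (c), can be sketched roughly as follows. This is an illustration only: the sentinel string is simply the placeholder suggested above, and `lang_status`, `naive_broker`, and the sample report are hypothetical names invented for the example.

```python
from typing import Any

# Placeholder sentinel taken from the suggestion above; no spec defines it here.
UNKNOWN = "unknown"


def lang_status(obj: dict[str, Any]) -> str:
    """Classify what a consumer can conclude about an object's language."""
    lang = obj.get("lang")
    if lang is None:
        # Ambiguous: the producer did not know, did not support it, or skipped it.
        return "absent"
    if lang == UNKNOWN:
        # Unambiguous: the producer supports the tag but did not know the language.
        return "declared-unknown"
    return "declared"


def naive_broker(obj: dict[str, Any], known_fields: set[str]) -> dict[str, Any]:
    """Re-emit only the properties this (hypothetical) broker understands."""
    return {key: value for key, value in obj.items() if key in known_fields}


# Illustrative object with a shortened, made-up id.
report = {"type": "report", "id": "report--0001", "name": "Example", "lang": "ja"}

print(lang_status(report))
print(lang_status({"type": "report", "id": "report--0002", "lang": UNKNOWN}))
print(lang_status(naive_broker(report, {"type", "id", "name"})))
```

The last call shows case (c): the tag was present when the object went in, but a broker that does not carry lang through turns a declared language into an absent one, and the consumer cannot tell why.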


  • 3.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-23-2017 23:01




    If you are expecting to use different language content, then it’s required for interoperability reasons.
     
    But marking it required in the spec means that all content must have it, even when most content is not multi-language.
     
    I generally would prefer more tolerance at the spec level and let the products/market use good behavior to drive which fields are included or not.
     
    If people care about language and multi-language support, then they will use it. If they don’t, then they won’t be interoperable, as that will be part of the test in the interop spec.
     
    allan
     


  • 4.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-23-2017 23:28
    >>  But marking it required in the spec means that all content must have it, even when most content is not multi-language.

    This is not correct. The lang field is not for marking multi-language content, but rather for identifying the language of the SDO/SRO.

    Bret
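
As a rough illustration of that reading, where each object carries at most one lang value describing its own text, a consumer could group objects by language as sketched below. The objects, their shortened ids, and the `group_ids_by_lang` helper are all made up for this example.

```python
from collections import defaultdict
from typing import Any, Optional

# Hypothetical objects; ids are shortened placeholders, not valid STIX identifiers.
objects = [
    {"type": "campaign", "id": "campaign--0001", "name": "Operation Example", "lang": "en"},
    {"type": "campaign", "id": "campaign--0002", "name": "作戦例", "lang": "ja"},
    {"type": "campaign", "id": "campaign--0003", "name": "Untitled"},  # lang omitted
]


def group_ids_by_lang(objs: list[dict[str, Any]]) -> dict[Optional[str], list[str]]:
    """Map each lang value (None when the tag is omitted) to the ids that carry it."""
    groups: dict[Optional[str], list[str]] = defaultdict(list)
    for obj in objs:
        groups[obj.get("lang")].append(obj["id"])
    return dict(groups)


print(group_ids_by_lang(objects))
```

Every object lands in exactly one group, including single-language content; the `None` group is the set a consumer would have to guess about if the field stays optional.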


  • 5.  RE: [cti] Internationalization: lang field required or optional?

    Posted 02-23-2017 23:45
    I’m leaning towards required right now, for mostly the same reasons Bret enumerated. From an interoperability perspective, leaving it optional and undefined can cause all sorts of assumption grief down the road.


  • 6.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-24-2017 02:43




    I think we are talking past each other.
     
    I’m not suggesting it’s only for multi-language.
     
    But I think this email thread is not the ideal place to resolve the issues.
     
    allan
     


  • 7.  RE: [cti] Internationalization: lang field required or optional?

    Posted 02-24-2017 08:57




    Hi,

     
    [I will reiterate (with a small update) the concern I stated in the #i18n Slack channel here.]
     
    My concern is that no one would include the lang: tag in their STIX if we make it optional, that this would make defining the lang: tag almost meaningless, and that it is somewhat problematic from a “promoting interoperability” point of view.
     
    I guess analysts usually communicate within circles speaking the same language, e.g. English or Japanese. That works just fine without the lang: tag as long as they stay within a circle of the same language.
    What I envision (and I think it is the reason why we want to introduce the lang: tag in the first place) is that we will see more CTI flowing across language borders.
    At that point, without the lang: tag, we cannot provide good interoperability for CTI content that crosses language borders.
     
    I saw an argument that it is difficult to determine which language the user is using.
    But these days, every modern OS runs with a language (package and/or locale) configured.
    As such, it is just a matter of the application picking up the language the OS is already using and providing that language code for the lang: tag.
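    [I believe something like the following should work for Python on Windows (data is a STIX Object):
    import ctypes
    import locale
    # Windows-only: ask the OS for the user's default UI language ID
    lang_id = ctypes.windll.kernel32.GetUserDefaultUILanguage()
    language = locale.windows_locale[lang_id]  # e.g. "en_US"
    data["lang"] = language.split("_")[0]  # keep only the primary language subtag, e.g. "en"
    ]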
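     
    A more portable variant of the same idea, as a sketch only: it relies on the standard-library locale.getlocale() and simply leaves the tag off when no usable language can be derived. The helper names primary_language and lang_from_process_locale are invented here, and on a shared server the process locale says little about the individual user.

```python
import locale

def primary_language(locale_name):
    """Reduce a locale name such as "ja_JP" or "en-US" to its primary language subtag.

    Returns None when the name does not start with a 2- or 3-letter code
    (for example "C", or Windows-style names like "English_United States").
    """
    if not locale_name:
        return None
    tag = locale_name.replace("-", "_").split("_")[0].lower()
    if 2 <= len(tag) <= 3 and tag.isalpha():
        return tag
    return None

def lang_from_process_locale():
    """Best-effort guess from the process locale; may be None."""
    try:
        name, _encoding = locale.getlocale()
    except ValueError:
        # Some platforms report locale names that Python cannot parse.
        return None
    return primary_language(name)

data = {"type": "indicator"}  # stand-in for a STIX object
lang = lang_from_process_locale()
if lang is not None:
    data["lang"] = lang  # only set the tag when we actually have a value
```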
     
    It would be a very conscious act for an analyst to provide CTI in one language while the application is displaying its menus in another.
    Therefore, it should not be much of a burden to let the analyst specify the lang: tag anyway.
     
    That is my argument for making twelve additional bytes (thank you for the correct count, JMG!) required.
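     
    For reference, the twelve-byte figure can be reproduced with a short sketch, assuming compact JSON serialization (no whitespace) and a two-letter tag such as "en". The sample object and the helper name compact_size are made up.

```python
import json

def compact_size(obj):
    """Size in bytes of obj serialized as compact UTF-8 JSON."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))

# Hypothetical minimal object; real STIX objects carry more properties.
without_lang = {"type": "indicator", "id": "indicator--example"}
with_lang = dict(without_lang, lang="en")

# The difference is exactly the characters of ,"lang":"en"
overhead = compact_size(with_lang) - compact_size(without_lang)
print(overhead)  # 12
```

    A longer tag (for example one with a region subtag) or pretty-printed JSON would cost a few bytes more.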
     
    Regards,
     
    Ryu
     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Allan Thomson
    Sent: Friday, February 24, 2017 11:43 AM
    To: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    I think we are talking past each other.
     
    I’m not suggesting it’s only for multi-language.
     
    But I think this email thread is not the ideal place to resolve the issues.
     
    allan
     

    From: Bret Jordan <Bret_Jordan@symantec.com>
    Date: Thursday, February 23, 2017 at 3:27 PM
    To: Allan Thomson <athomson@lookingglasscyber.com>, "Wunder, John" <jwunder@mitre.org>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    >>  But marking it required in the spec means that all content must have it even when most content is not multi-language.
     
    This is not correct.  The lang field is not for marking multi-language content, but rather for identifying the language of the SDO/SRO.
     
    Bret





    From: Allan Thomson <athomson@lookingglasscyber.com>
    Sent: Thursday, February 23, 2017 4:01:24 PM
    To: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    If you are expecting to use different language content then it’s required for interoperability reasons.
     
    But marking it required in the spec means that all content must have it even when most content is not multi-language.
     
    I generally would prefer more tolerance at the spec level and let the products/market use good behavior to drive what fields are included or not.
     
    If people care about language and multi-language support then they will use it. If they don’t then they won’t be interoperable, as that will be part of the test in the interop spec.
     
    allan
     









  • 8.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-24-2017 13:15

    I also agree with Allan and John in the preference to make this optional.

    In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully, the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.

    There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as, since they may not know the native language of the content's producer.  There is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry; they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC 5646 decision.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown



  • 9.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-24-2017 14:05




    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.
     
    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information. It’s certainly something we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level.
     
    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. If “lang” were also required, any program that uses python-stix2 would therefore need to require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic.
     
    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly since “unspecified” is OK for machine-to-machine communication that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date.
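     
    To make the "define consumer behavior when it's missing" idea concrete, here is a minimal consumer-side sketch. It is an illustration of the suggestion only, not spec text: the helper names are hypothetical, and "unspecified" is a placeholder string rather than a registered language tag.

```python
def effective_lang(obj, default="unspecified"):
    """Language a consumer would assume for obj when "lang" is absent."""
    return obj.get("lang", default)

def effective_modified(obj):
    """Treat a missing "modified" as equal to "created"."""
    return obj.get("modified", obj["created"])

# Hypothetical object with neither "lang" nor "modified" set by the producer.
indicator = {
    "type": "indicator",
    "id": "indicator--example",
    "created": "2017-02-24T14:05:00.000Z",
}

print(effective_lang(indicator))
print(effective_modified(indicator))
```

    With rules like these, a producer that omits either field still yields an unambiguous object on the consumer side.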
     
    Greg
     











  • 10.  RE: [cti] Internationalization: lang field required or optional?

    Posted 02-27-2017 08:21




    Hi,

     
    I think the differences come from the use cases each of us has in mind.
     
    Human-readable text is for humans to consume, but the “lang:” tag is for the system to produce and consume.
    This is an on-the-wire, between-systems question of required versus optional.
    When the system knows the language code of the human-readable text, it can handle things better and provide a much better UI, etc.
     
    My question is: what use cases make it worthwhile to define the lang: tag if it is optional?
     
    Regards,
     
    Ryu
     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Back, Greg
    Sent: Friday, February 24, 2017 11:05 PM
    To: Jason Keirstead; Allan Thomson
    Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?


     
    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.
     
    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving
    them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information. It’s certainly we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level.

     
    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from)
    can be reasonably inferred. Any program that uses python-stix2 needs to therefore require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that
    many people use (for example, via a web service), it’s problematic.
     
    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or
    “unspecified” is likely OK, particularly that “unspecified” is OK for machine-to-machine communication that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly
    specified in JSON, consumers MUST assume it was last modified at the “created” date.
     
    Greg
     

    From:
    < cti@lists.oasis-open.org > on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Friday, February 24, 2017 at 7:15 AM
    To: Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?


     

    I also agree with Alan and John in the preference to make this optional.

    In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII.
    We should be looking for opportunities to keep the data format trim. Truthfully, the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.

    There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which is how many products would have to mark content,
    since they may not know the native language of the content's producer.  There is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry; they never adopted the ISO entry. So making this mandatory may force
    a revisit of the RFC 5646 decision.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




    From:         Allan Thomson < athomson@lookingglasscyber.com >
    To:         Bret Jordan < Bret_Jordan@symantec.com >,
    "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/23/2017 07:01 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?
    Sent by:         < cti@lists.oasis-open.org >






    If you are expecting to use different language content then it’s required for interoperability reasons.
     
    But marking it required in the spec means that all content must have it even when most content is not multi-language.

     
    I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not.
     
    If people care about language and multi-language support then they will use it. If they don’t then they won’t be interoperable as that will be part of the test in the interop
    spec.
     
    allan
     
    From:
    Bret Jordan < Bret_Jordan@symantec.com >
    Date: Thursday, February 23, 2017 at 2:04 PM
    To: Allan Thomson < athomson@lookingglasscyber.com >, "Wunder, John" < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    My thoughts....
     
    1) In reality we are talking about a feature not a property.  
    2) If the property for this feature is optional, then the only products that will implement this feature are those that care about internationalization.
    3) If it is required, then everyone will be forced to implement it.

     
    Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it.

     
    Problems with Required:
    a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en"
    b) product or tool does provide a UX for it, but analyst does not care and it just remains whatever the default is.
     
    Problems with Optional:
    a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.  
    b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank.
    c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language
    tag is now missing.
     
    I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly
    do not know the language, you could just flag it as "unknown".  Then at least as the consumer you would know that the producer did not know the language.  Versus getting an object where the language tag is omitted and you do not know if:
    i) they did not know the language
    ii) their tool did not support it
    iii) they were just lazy and did not add it.
     
    Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at
    a guess all the time.
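
    The interop test mentioned above could be sketched roughly as follows, assuming the objects under test are plain JSON dictionaries. The helper name missing_lang is hypothetical. Note that a presence check like this cannot tell a correct tag from one hard coded to "en", which is the first problem listed under Required.

```python
from typing import Any, Dict, Iterable, List


def missing_lang(objects: Iterable[Dict[str, Any]]) -> List[str]:
    """Return the ids of objects that have no usable "lang" value.

    An absent key and an empty string are both counted as missing.
    """
    return [o.get("id", "<no id>") for o in objects if not o.get("lang")]


# Abbreviated objects for illustration only; the ids are made up.
sample = [
    {"type": "indicator", "id": "indicator--1", "lang": "en"},
    {"type": "malware", "id": "malware--2"},
    {"type": "campaign", "id": "campaign--3", "lang": ""},
]
failures = missing_lang(sample)
```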
     
    Bret





    From:
    cti@lists.oasis-open.org < cti@lists.oasis-open.org > on behalf of Allan Thomson < athomson@lookingglasscyber.com >
    Sent: Thursday, February 23, 2017 2:29:59 PM
    To: Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    Prefer optional.
     
    From:
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > on behalf of "Wunder, John" < jwunder@mitre.org >
    Date: Thursday, February 23, 2017 at 12:59 PM
    To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: [cti] Internationalization: lang field required or optional?
     
    Hey everyone,
     
    We’re getting very close to having a completed approach for internationalization, you can see the full writeup here:
    https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz
     
    We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language
    of the text content in that object. What we need to decide is whether we make that field required or optional.
     
    If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional,
    producers could either include the field or not.
     
    Here are some thoughts:
     
    Making it required:



    - All SDOs and SROs would have a language tag, so consumers could depend on it being there
    - It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    - It shows we have a commitment to internationalization
     
    Making it optional:
     
    - Any SRO or SDO could have a language tag, so consumers could not depend on it
    - Producers would not have to create it
    - We do have a SHOULD requirement saying that it should be included
     
    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open
    source tools) will hardcode it to English and things will be mislabeled. If it’s optional, I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because
    they need it…while those who couldn’t be bothered will be able to leave it off and we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with
    someone who said there’s a lot of bloat…would like to avoid adding to that.
     
    Anyway, what does everyone think…required or optional?
     
    John









  • 11.  RE: [cti] Internationalization: lang field required or optional?

    Posted 02-28-2017 14:58




    Greetings,
     
    If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions.
    The terms identified in RFC 2119 allow for conditions.
    Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase:
     
    “If a producer other than the object creator wishes to create a new version, they
    MUST instead create a new object with a new id.”
     
    So let’s say we go with
    OPTIONAL … MUST/SHALL…
     
    These are somewhat convoluted but:
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process.
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border.
     
    Ryu asks if it’s worth spending time defining use cases?
    If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field.
    Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user.
     
    IMHO, tool providers can easily accommodate this field in their UI and in the interchange.

    How tool providers enhance their user experience is not the CTI TC’s concern.
    I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice.
     
    Gus
     
    Gus Creedon

     
    7940 Jones Branch Drive, Tysons, VA 22102
    Office: (703)917-7272       Cell: (571)335-6899
     

     
     
     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Masuoka, Ryusuke
    Sent: Monday, February 27, 2017 3:21 AM
    To: Back, Greg <gback@mitre.org>; Jason Keirstead <Jason.Keirstead@ca.ibm.com>; Allan Thomson <athomson@lookingglasscyber.com>
    Cc: Bret Jordan <Bret_Jordan@symantec.com>; Wunder, John A. <jwunder@mitre.org>; cti@lists.oasis-open.org
    Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional?


     
    Hi,

     
    I think the differences are in the use cases in each one’s mind.
     
    Human readable texts are for humans to consume, but the “lang:” tag is for the system to produce/consume.
    This is an on-the-wire/between-systems requirement/optionality.
    With the system knowing the language code for the human readable texts, the system can handle things better and provide much better UI, etc.
     
    My question is what use cases make it worth defining the lang: tag if it is optional.
     
    Regards,
     
    Ryu
     


    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Back, Greg
    Sent: Friday, February 24, 2017 11:05 PM
    To: Jason Keirstead; Allan Thomson
    Cc: Bret Jordan; Wunder, John A.;
    cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?


     
    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.
     
    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if
    leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information. It’s certainly something we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec
    level.
     
    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified,
    valid_from) can be reasonably inferred. Any program that uses python-stix2 needs to therefore require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a
    server that many people use (for example, via a web service), it’s problematic.
     
    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined”
    or “unspecified” is likely OK, particularly since “unspecified” is fine for machine-to-machine communication that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not
    explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date.
     
    Greg
     









  • 12.  RE: [cti] Internationalization: lang field required or optional?

    Posted 02-28-2017 17:22
    It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be optional).

    Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en" or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined".

    But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional, then we can't use RFC 5646 anymore and have to switch to ISO 639-X.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown
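
    A minimal sketch of the TIP scenario described above, assuming an optional lang field: the producer emits the tag only when it actually knows the language and otherwise leaves it off rather than guessing "en". The function name build_indicator and the abbreviated object layout are illustrative assumptions; the output is not a complete, valid Indicator.

```python
import json
from typing import Any, Dict, Optional


def build_indicator(description: str, lang: Optional[str] = None) -> Dict[str, Any]:
    """Build an abbreviated indicator dict, adding "lang" only when it is known."""
    obj: Dict[str, Any] = {"type": "indicator", "description": description}
    if lang:  # None or "" means the platform never asked the user
        obj["lang"] = lang
    return obj


# The platform did not ask, so no tag is emitted.
unknown = build_indicator("Observed C2 beacon")
# The user explicitly selected French, so the tag is included.
known = build_indicator("Balise C2 observée", lang="fr")

wire_unknown = json.dumps(unknown)
wire_known = json.dumps(known, ensure_ascii=False)
```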



  • 13.  Re: [cti] Internationalization: lang field required or optional?

    Posted 02-28-2017 20:03




    We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t
    set it to some useful value, they’ll just make it unknown.
     
    FWIW I tried to collect where we stand across Slack and e-mail:
    Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense)
    Required: Bret (Symantec), Ryu (Hitachi), Rob (iDefense), Gus (LMI), Trey (Kingfisher)
     
    In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.”
     
    In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much
    more than “optional”…it’s you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing.
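
The warning-versus-error behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: `check_lang` and the placeholder object id are hypothetical names, and the code is not taken from the STIX 2 validator or its API.

```python
def check_lang(stix_object: dict, strict: bool = False) -> list:
    """Report a missing "lang" property as a warning, or as an error in strict mode.

    Hypothetical helper for illustration; not the STIX 2 validator's implementation.
    """
    findings = []
    if "lang" not in stix_object:
        # A SHOULD-level rule: a warning by default, an error when strict is set.
        level = "error" if strict else "warning"
        object_id = stix_object.get("id", "<no id>")
        findings.append((level, f"{object_id}: 'lang' SHOULD be present when the language is known"))
    return findings


# Placeholder object; the id is made up for this example.
indicator = {
    "type": "indicator",
    "id": "indicator--00000000-0000-4000-8000-000000000000",
    "description": "Beispielbeschreibung",
}

for level, message in check_lang(indicator):
    print(level, message)
```

With `strict=True` the same finding is reported at the "error" level, which is roughly the effect John describes for the `--strict` flag.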
     
    Anybody else want to chime in on this?
     
    John
     

    From:
    Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Tuesday, February 28, 2017 at 12:21 PM
    To: "CREEDON, Gus" <GCREEDON@lmi.org>
    Cc: Allan Thomson <athomson@lookingglasscyber.com>, "Bret Jordan (CS)" <Bret_Jordan@symantec.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>, Greg Back <gback@mitre.org>, John Wunder <jwunder@mitre.org>, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
    Subject: RE: [cti] Internationalization: lang field required or optional?


     

    It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be
    optional).

    Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator
    into the platform and shares it out. I have *NO IDEA* what language that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When
    I share that Indicator out, the language field *should not* be "en" or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined". But as I pointed out the other day, unfortunately the IETF has not decided to adopt the
    "und" language code from ISO! So if we don't make the field optional, then we can't use
    RFC5646 anymore and have to switch to ISO 639-X.


    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




    From:         "CREEDON, Gus" <GCREEDON@lmi.org>
    To:         "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>, "Back, Greg" <gback@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA,
    "Allan Thomson" <athomson@lookingglasscyber.com>
    Cc:         Bret Jordan <Bret_Jordan@symantec.com>, "Wunder, John A." <jwunder@mitre.org>, "cti@lists.oasis-open.org"
    <cti@lists.oasis-open.org>
    Date:         02/28/2017 10:58 AM
    Subject:         RE: [cti] Internationalization: lang field required or optional?






    Greetings,
     
    If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions.
    The  terms identified in RFC 2119 allow for conditions.
    Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase:
     
    “If a producer other than the object creator wishes to create a new version, they
    MUST instead create a new object with a new id .”
     
    So let’s say we go with
    OPTIONAL … MUST/SHALL…
     
    These are somewhat convoluted but:
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process.
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border.
     
    Ryu asks if it’s worth spending time defining use cases?
    If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field.
    Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user.
     
    IMHO, tool providers can easily accommodate this field in their UI and in the interchange.

    How tool providers enhance their user experience is not the CTI TC’s concern.
    I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice.
     
    Gus
     
    Gus Creedon

     
    7940 Jones Branch Drive, Tysons, VA 22102
    Office: (703)917-7272     Cell: (571)335-6899
     

     
     
     
    From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Masuoka, Ryusuke
    Sent: Monday, February 27, 2017 3:21 AM
    To: Back, Greg <gback@mitre.org>; Jason Keirstead <Jason.Keirstead@ca.ibm.com>; Allan Thomson <athomson@lookingglasscyber.com>
    Cc: Bret Jordan <Bret_Jordan@symantec.com>; Wunder, John A. <jwunder@mitre.org>; cti@lists.oasis-open.org
    Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional?
     
    Hi,

     
    I think the differences are in the use cases in each one’s mind.
     
    Human readable texts are for humans to consume, but the “lang:” tag is for the system to produce/consume.
    This is an on-the-wire/between-systems requirement/optionality.
    With the system knowing the language code for the human readable
    texts, the system can handle things better and provide much better UI, etc.
     
    My question is what it is worth (in terms of use cases) to define the lang: tag if it is optional.
     
    Regards,
     
    Ryu
     
    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Back, Greg
    Sent: Friday, February 24, 2017 11:05 PM
    To: Jason Keirstead; Allan Thomson
    Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.
     
    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and
    I don’t (yet) think this is true for language information. It’s certainly something we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level.

     
    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Any program
    that uses python-stix2 needs to therefore require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web
    service), it’s problematic.
     
    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly since “unspecified”
    is OK for machine-to-machine communication that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last
    modified at the “created” date.
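
Greg's point about defining consumer behaviour for missing fields can be sketched like this. It is a minimal illustration of his suggestion, not anything the STIX specification mandates; `with_consumer_defaults` is a hypothetical helper, and using `None` to stand for "unspecified" is an assumption made for the example.

```python
def with_consumer_defaults(stix_object: dict) -> dict:
    """Return a copy of the object with consumer-side defaults filled in.

    Hypothetical helper: "modified" falls back to "created", and a missing
    "lang" is recorded as None, meaning the language is unspecified.
    """
    normalized = dict(stix_object)  # copy so the received object is left untouched
    normalized.setdefault("modified", normalized.get("created"))
    normalized.setdefault("lang", None)
    return normalized


received = {"type": "indicator", "created": "2017-02-24T12:00:00.000Z"}
normalized = with_consumer_defaults(received)
print(normalized["modified"], normalized["lang"])
```

The producer sends fewer bytes, and every consumer still ends up with the same view of the object because the fallback is defined in one place.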
     
    Greg
     
    From: < cti@lists.oasis-open.org > on behalf of Jason
    Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Friday, February 24, 2017 at 7:15 AM
    To: Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    I also agree with Allan and John in the preference to make this optional.

    In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully,
    the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.
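
To put a rough number on the per-object cost, the sketch below measures how many bytes a "lang" property adds to a compactly serialized object. The helper name `lang_overhead_bytes` and the sample object are assumptions for illustration; real overhead depends on how a given producer serializes its JSON.

```python
import json


def lang_overhead_bytes(stix_object: dict, lang: str = "en") -> int:
    """Bytes added by a "lang" property when the object is serialized compactly."""
    compact = {"separators": (",", ":")}  # no whitespace between tokens
    without = json.dumps(stix_object, **compact).encode("utf-8")
    with_lang = json.dumps({**stix_object, "lang": lang}, **compact).encode("utf-8")
    return len(with_lang) - len(without)


sample = {"type": "indicator", "pattern": "[ipv4-addr:value = '198.51.100.1']"}
print(lang_overhead_bytes(sample))           # 12
print(lang_overhead_bytes(sample, "en-GB"))  # 15
```

For "en" the added text is `,"lang":"en"`, which is 12 bytes per object before any transport compression.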

    There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as since they may not know the producer of the content's native language.  There
    is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry; they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC5646 decision.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         Allan Thomson < athomson@lookingglasscyber.com >
    To:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/23/2017 07:01 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?
    Sent by:         < cti@lists.oasis-open.org >








    If you are expecting to use different language content then it’s required for interoperability reasons.

    But marking it required in the spec means that all content must have it even when most content is not multi-language.


    I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not.

    If people care about language and multi-language support then they will use it. If they don’t, then they won’t be interoperable, as that will be part of the test in the interop spec.

    allan

    From: Bret Jordan < Bret_Jordan@symantec.com >
    Date: Thursday, February 23, 2017 at 2:04 PM
    To: Allan Thomson < athomson@lookingglasscyber.com >, "Wunder, John" < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    My thoughts....

    1) In reality we are talking about a feature not a property.  
    2) If the property for this feature is optional, then the only products that will implement this feature are those that care about internationalization.
    3) If it is required, then everyone will be forced to implement it.

    Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it.


    Problems with Required:
    a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en"
    b) product or tool does provide a UX for it, but analyst does not care and it just remains whatever the default is.

    Problems with Optional:
    a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.  
    b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank.
    c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing.


    I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".
     Then at least as the consumer you would know that the producer did not know the language.  Versus getting an object where the language tag is omitted and you do not know if:
    i) they did not know the language
    ii) their tool did not support it
    iii) they were just lazy and did not add it.

    Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time.

    Bret





    From: cti@lists.oasis-open.org < cti@lists.oasis-open.org >
    on behalf of Allan Thomson < athomson@lookingglasscyber.com >
    Sent: Thursday, February 23, 2017 2:29:59 PM
    To: Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    Prefer optional.

    From: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    on behalf of "Wunder, John" < jwunder@mitre.org >
    Date: Thursday, February 23, 2017 at 12:59 PM
    To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: [cti] Internationalization: lang field required or optional?

    Hey everyone,

    We’re getting very close to having a completed approach for internationalization, you can see the full writeup here:
    https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz

    We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field
    required or optional.

    If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not.

    Here are some thoughts:

    Making it required:



    -           All SDOs and SROs would have a language tag, so consumers could depend on it being there
    -           It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    -           It shows we have a commitment to internationalization

    Making it optional:

    -           Any SRO or SDO could have a language tag, so consumers could not depend on it
    -           Producers would not have to create it
    -           We do have a SHOULD requirement saying that it should be included

    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional,
    I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and
    we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that.

    Anyway, what does everyone think…required or optional?

    John











  • 14.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 02:26
    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design.  If we make it optional, then it is highly unlikely that it will take off en masse.  You will have a few groups here and there that will do it, but the rest will just ignore it.

    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse than what we have today?

    Further, I think we can use the RFC but say, in the event that you do not know the language, you MUST use "und".  No need to go to the ISO version.  We can use these RFCs however we want.

    Bret


  • 15.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 10:31
    On 01.03.2017 02:25:37, Bret Jordan wrote:
    >
    > If we make it required, then we might get some people doing stupid
    > stuff and defaulting it to "en". But how is that different or worse
    > than what we have today? Further, I think we can use the RFC but
    > say, in the event that you do not know the language, you MUST use
    > "und". No need to go to the ISO version. We can use these RFCs
    > however we want.
    >

    +1

    --
    Cheers,
    Trey

    Kingfisher Operations, sprl
    gpg fingerprint: 85F3 5F54 4A2A B4CD 33C4 5B9B B30D DD6E 62C8 6C1D

    "There is more simplicity in the man who eats caviar on impulse than in the man who eats Grape-Nuts on principle." --G.K. Chesterton


  • 16.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 12:46
    Perhaps we could lobby to get it added to the IANA registry?

    I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

    > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we make it optional, then
    > it is highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it.

    The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown

    From:         Bret Jordan <Bret_Jordan@symantec.com>
    To:         "Wunder, John A." <jwunder@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" <GCREEDON@lmi.org>
    Cc:         Allan Thomson <athomson@lookingglasscyber.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>, "Back, Greg" <gback@mitre.org>, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
    Date:         02/28/2017 10:26 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?

    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we make it optional, then it is highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it.
    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse then what we have today?  Further, I think we can use the RFC but say, in the event that you do not know the language, you MUST use "und".  No need to go to the ISO version.  We can use these RFCs how ever we want.

    Bret


  • 17.  RE: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 13:45




    > The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.
     
    This is my biggest concern here.  Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case.  Our
    goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.
     
    > If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default
    language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.
     
    I can get behind this idea.  We could add a required ‘default-lang’ field to the STIX Bundle which could indicate the language used for all SDOs in the Bundle.  If there is a deviation,
    we add an optional ‘lang’ field to the SDO to capture that.  Normative text should include that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.
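
    The Bundle-level default with a per-object override could be resolved on the consumer side roughly as follows. This is only a sketch of the idea in this message, not spec text: the ‘default-lang’ property exists only as a proposal here, and the helper name effective_lang is made up for illustration.

```python
from typing import Any, Dict, Optional


def effective_lang(bundle: Dict[str, Any], obj: Dict[str, Any]) -> Optional[str]:
    """Return the object's own lang if present, else the bundle default.

    'default-lang' is the hypothetical Bundle property proposed in this
    thread; it is not part of any published STIX specification.
    """
    return obj.get("lang") or bundle.get("default-lang")


bundle = {
    "type": "bundle",
    "default-lang": "en",  # hypothetical: applies to every object below
    "objects": [
        {"type": "indicator", "description": "Malicious URL"},
        # Deviation from the bundle default, so the object carries its own tag.
        {"type": "indicator", "description": "URL malveillante", "lang": "fr"},
    ],
}

langs = [effective_lang(bundle, obj) for obj in bundle["objects"]]
print(langs)
```

    With this shape only the objects that deviate from the default pay for the extra bytes, which is the concern Jason raised.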
     
    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Jason Keirstead
    Sent: Wednesday, March 01, 2017 7:44 AM
    To: Bret Jordan
    Cc: Allan Thomson; cti@lists.oasis-open.org; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder, John A.; Masuoka, Ryusuke
    Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?
     
    Perhaps we could lobby to get it added to the IANA registry?


    I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

    > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we make it optional, then
    > it is highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it.


    The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem
    without having to add 12 superfluous bytes to every single STIX object.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




    From:         Bret Jordan < Bret_Jordan@symantec.com >
    To:         "Wunder, John A." < jwunder@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA,
    "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc:         Allan Thomson < athomson@lookingglasscyber.com >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, "Back, Greg" < gback@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Date:         02/28/2017 10:26 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?






    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we make it optional, then it is
    highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it.


    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse than what we have today?  Further, I think we can use the RFC but say, in
    the event that you do not know the language, you MUST use "und".  No need to go to the ISO version.  We can use these RFCs however we want.

    Bret
     




    From: Wunder, John A. < jwunder@mitre.org >
    Sent: Tuesday, February 28, 2017 1:02:20 PM
    To: Jason Keirstead; CREEDON, Gus
    Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org ; Back, Greg; Masuoka, Ryusuke
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t set
    it to some useful value, they’ll just make it unknown.
     
    FWIW I tried to collect where we stand across Slack and e-mail:
    Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense)
    Required: Bret (Symantec), Ryu (Hitachi), Rob (iDefense), Gus (LMI), Trey (Kingfisher)
     
    In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.”
     
    In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much more
    than “optional”…it’s you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing.
     
    Anybody else want to chime in on this?
     
    John
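
    The warning-versus-error behaviour John describes for a SHOULD statement might look like the sketch below. It is not the STIX 2 validator's code or API; check_lang and its strict parameter are hypothetical names that only mirror the behaviour described in the message above.

```python
from typing import Any, Dict, List, Tuple


def check_lang(obj: Dict[str, Any], strict: bool = False) -> List[Tuple[str, str]]:
    """Return (severity, message) findings for the SHOULD-level lang rule.

    Hypothetical helper: a missing lang is a warning by default and is
    promoted to an error when strict is True.
    """
    if "lang" in obj:
        return []
    severity = "error" if strict else "warning"
    obj_id = obj.get("id", "<no id>")
    return [(severity, f"{obj_id}: lang SHOULD be present when the language is known")]


indicator = {"type": "indicator", "id": "indicator--example"}

print(check_lang(indicator))               # reported as a warning
print(check_lang(indicator, strict=True))  # reported as an error
```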
     
    From: Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Tuesday, February 28, 2017 at 12:21 PM
    To: "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc: Allan Thomson < athomson@lookingglasscyber.com >, "Bret Jordan (CS)" < Bret_Jordan@symantec.com >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >, Greg Back < gback@mitre.org >, John Wunder < jwunder@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Subject: RE: [cti] Internationalization: lang field required or optional?
     
    It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be optional).

    Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language
    that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en"
    or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined". But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional,
    then we can't use RFC 5646 anymore and have to switch to ISO 639-X.
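
    The producer-side choice in this example can be sketched as below, covering both outcomes being debated. The function with_lang and the lang_required switch are hypothetical, and "und" is used only because it is the value proposed elsewhere in this thread for an unknown language.

```python
from typing import Any, Dict, Optional

UNKNOWN_TAG = "und"  # value proposed in this thread for "language not known"


def with_lang(obj: Dict[str, Any], known_lang: Optional[str],
              lang_required: bool) -> Dict[str, Any]:
    """Return a copy of obj with lang set according to the chosen policy.

    Hypothetical helper. If the platform never asked the user for a
    language, known_lang is None and nothing is guessed: the field is
    omitted when optional, or set to UNKNOWN_TAG when required.
    """
    out = dict(obj)
    if known_lang:
        out["lang"] = known_lang
    elif lang_required:
        out["lang"] = UNKNOWN_TAG
    return out


indicator = {"type": "indicator", "description": "free text typed by a user"}

optional_policy = with_lang(indicator, None, lang_required=False)
required_policy = with_lang(indicator, None, lang_required=True)
print(optional_policy)
print(required_policy)
```

    Either way the platform avoids stamping "en" on text whose language it never asked about.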


    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         "CREEDON, Gus" < GCREEDON@lmi.org >
    To:         "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Back, Greg" < gback@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "Allan Thomson" < athomson@lookingglasscyber.com >
    Cc:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/28/2017 10:58 AM
    Subject:         RE: [cti] Internationalization: lang field required or optional?







    Greetings,

    If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions.
    The  terms identified in RFC 2119 allow for conditions.
    Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase:

    “If a producer other than the object creator wishes to create a new version, they
    MUST instead create a new object with a new id .”

    So let’s say we go with
    OPTIONAL … MUST/SHALL…

    These are somewhat convoluted but:
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process.
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border.

    Ryu asks if it’s worth spending time defining use cases?
    If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field.
    Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user.

    IMHO, tool providers can easily accommodate this field in their UI and in the interchange.

    How tool providers enhance their user experience is not the CTI TC’s concern.
    I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice.

    Gus

    Gus Creedon

    7940 Jones Branch Drive, Tysons, VA 22102
    Office: (703)917-7272     Cell: (571)335-6899





    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Masuoka, Ryusuke
    Sent: Monday, February 27, 2017 3:21 AM
    To: Back, Greg < gback@mitre.org >; Jason Keirstead < Jason.Keirstead@ca.ibm.com >; Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >; Wunder, John A. < jwunder@mitre.org >;
    cti@lists.oasis-open.org
    Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional?

    Hi,

    I think the differences are in use cases in each one’s mind.

    Human readable texts are for humans to consume, but
    “lang:” tag is for the system to produce/consume.
    This is an on-the-wire/between-systems requirement/optionality.
    With the system knowing the language code for the human readable
    texts, the system can handle things better and provide much better UI, etc.

    My question is what it is worth (in terms of use cases) to define the lang: tag if it is optional.

    Regards,

    Ryu

    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Back, Greg
    Sent: Friday, February 24, 2017 11:05 PM
    To: Jason Keirstead; Allan Thomson
    Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.

    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information.
    It’s certainly something we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level.


    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Language cannot: if “lang” were required, any program that uses python-stix2 would need to either require the
    user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic.

    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly since “unspecified” is OK for machine-to-machine communication
    that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date.
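
    The defaulting rule suggested above can be sketched as follows. This is only an illustration of the idea, not behavior defined by STIX or implemented by python-stix2: the function name effective_fields and the "und" sentinel for a missing language are assumptions made for this example.

```python
from typing import Any, Dict

# Assumption for this sketch: "und" stands in for "language unspecified".
UNSPECIFIED_LANG = "und"


def effective_fields(obj: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of obj with consumer-side defaults filled in.

    - A missing "modified" is treated as equal to "created".
    - A missing "lang" is treated as unspecified.
    The input object is not changed. "created" is assumed to be present.
    """
    resolved = dict(obj)
    resolved.setdefault("modified", obj["created"])
    resolved.setdefault("lang", UNSPECIFIED_LANG)
    return resolved


# Hypothetical object that omits both optional fields.
indicator = {
    "type": "indicator",
    "created": "2017-02-24T23:05:00.000Z",
    "pattern": "[ipv4-addr:value = '198.51.100.1']",
}
resolved = effective_fields(indicator)
```

    With a rule like this, a consumer never has to distinguish "field absent" from "field present but unknown"; both resolve to the same value before any further processing.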

    Greg

    From: < cti@lists.oasis-open.org > on behalf
    of Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Friday, February 24, 2017 at 7:15 AM
    To: Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    I also agree with Allan and John in the preference to make this optional.

    In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully,
    the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.

    There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as since they may not know the producer of the content's native language.  There
    is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry, they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC 5646 decision.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         Allan Thomson < athomson@lookingglasscyber.com >
    To:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John
    A." < jwunder@mitre.org >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >
    Date:         02/23/2017 07:01 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?
    Sent by:         < cti@lists.oasis-open.org >








    If you are expecting to use different language content then it’s required for interoperability reasons.

    But marking it required in the spec means that all content must have it even when most content is not multi-language.


    I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not.

    If people care about language and multi-language support then they will use it. If they don’t then they won’t be interoperable as that will be part of the test in the interop spec.

    allan

    From: Bret Jordan < Bret_Jordan@symantec.com >
    Date: Thursday, February 23, 2017 at 2:04 PM
    To: Allan Thomson < athomson@lookingglasscyber.com >, "Wunder, John" < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    My thoughts....

    1) In reality we are talking about a feature not a property.  
    2) If the property for this feature is optional, then the only products that will implement this feature are those that care about internationalization.
    3) If it is required, then everyone will be forced to implement it.

    Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it.


    Problems with Required:
    a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en"
    b) product or tool does provide a UX for it, but analyst does not care and it just remains whatever the default is.

    Problems with Optional:
    a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.  
    b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank.
    c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing.


    I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".
     Then at least as the consumer you would know that the producer did not know the language.  Versus getting an object where the language tag is omitted and you do not know if:
    i) they did not know the language
    ii) their tool did not support it
    iii) they were just lazy and did not add it.

    Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time.

    Bret






    From:
    cti@lists.oasis-open.org < cti@lists.oasis-open.org >
    on behalf of Allan Thomson < athomson@lookingglasscyber.com >
    Sent: Thursday, February 23, 2017 2:29:59 PM
    To: Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    Prefer optional.

    From: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    on behalf of "Wunder, John" < jwunder@mitre.org >
    Date: Thursday, February 23, 2017 at 12:59 PM
    To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: [cti] Internationalization: lang field required or optional?

    Hey everyone,

    We’re getting very close to having a completed approach for internationalization, you can see the full writeup here:
    https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz

    We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field
    required or optional.

    If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not.

    Here are some thoughts:

    Making it required:



    -           All SDOs and SROs would have a language tag, so consumers could depend on it being there
    -           It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    -           It shows we have a commitment to internationalization

    Making it optional:

    -           Any SRO or SDO could have a language tag, so consumers could not depend on it
    -           Producers would not have to create it
    -           We do have a SHOULD requirement saying that it should be included

    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional,
    I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and
    we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that.

    Anyway, what does everyone think…required or optional?

    John












  • 18.  RE: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 14:53




    >> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike
    that.
     
    >This is my biggest concern here.  Just because today the majority of use cases are single >language (usually English) we shouldn’t assume that will always
    be the case.  Our goal is to >democratize the sharing of intelligence content and future proof the specs as best as we can.
     
    I couldn’t agree more. I can’t count the number of times a short-term, dare I say it US-centric, decision has had serious consequences on the long-term internationalization
    of data sharing I have worked on.
     
    -   All dates of birth must be complete dates
        o   Workaround – if complete date is unknown then a date must be chosen just to conform to specs. Most often 1-1 is chosen but not always.
        o   Outcome – Hard to match identities. May have consequences on if someone is treated as a minor or adult in legal terms

    -   Everyone must have a first and last name
        o   Workaround – If they only have one name then sometimes ‘UNKNOWN’ is used or ‘BLANK’ or ‘No First Name’, or ‘No Last Name’ or ‘NFN’ or….. or …. Or
        o   Outcome – It is impossible to filter or search as there is no consistency

    -   Back end systems only accept 7 bit ascii but encoding format is removed from the message format to ‘save bytes flying over the wire’ (sound familiar here?)
        o   Workaround – No workaround. By law other countries have to record and send a person’s true name with accents (e.g. René, Jesús). Some words actually change meaning without the accent.
        o   Outcome – You can send accented characters to each other but please don’t send to the US or it might break our system…..
     
    My background isn’t CTI, it’s Biometrics and Identity, but this here is the reason I joined this group and why this is my first comment.
     
    Regards,

    Derek Northrope
    Head of Biometrics
     
    Associate Director
    Enterprise and Cyber Security
    Fujitsu Americas, Inc.
     
    Cell:        +1 613 410-3532
    E-mail:       
    derek.northrope@ca.fujitsu.com
    LinkedIn:    https://ca.linkedin.com/in/dereknorthrope
     

     


    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Coderre, Robert
    Sent: Wednesday, March 01, 2017 8:45 AM
    To: Jason.Keirstead@ca.ibm.com; Bret_Jordan@symantec.com
    Cc: athomson@lookingglasscyber.com; cti@lists.oasis-open.org; gback@mitre.org; GCREEDON@lmi.org; jwunder@mitre.org; Masuoka, Ryusuke
    Subject: RE: [cti] Internationalization: lang field required or optional?


     
    > The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.
     
    This is my biggest concern here.  Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the
    case.  Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.
     
    > If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define
    a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.
     
    I can get behind this idea.  We could add a required ‘default-lang’ field to the STIX Bundle which could indicate language used for all SDO in the Bundle.  If
    there is a deviation, we add an optional ‘lang’ field to the SDO to capture that.  Normative text should include that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.
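
    A minimal sketch of how a consumer might resolve languages under the proposal above. The 'default-lang' Bundle field is only the idea floated in this message, not part of any published STIX specification, and the function name resolve_languages and the sample data are made up for illustration.

```python
from typing import Any, Dict, List, Tuple


def resolve_languages(bundle: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Pair each object id with its effective language.

    An object-level "lang" overrides the Bundle-level "default-lang".
    "default-lang" is treated as required, so a Bundle without it
    raises KeyError.
    """
    default_lang = bundle["default-lang"]  # hypothetical required field
    return [
        (obj["id"], obj.get("lang", default_lang))
        for obj in bundle.get("objects", [])
    ]


# Hypothetical Bundle: one object inherits the default, one deviates.
bundle = {
    "type": "bundle",
    "default-lang": "en",
    "objects": [
        {"type": "indicator", "id": "indicator--1"},
        {"type": "report", "id": "report--2", "lang": "ja"},
    ],
}
languages = resolve_languages(bundle)
```

    The language is stated once per Bundle in the common single-language case, and only deviating objects carry their own tag.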
     
    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Jason Keirstead
    Sent: Wednesday, March 01, 2017 7:44 AM
    To: Bret Jordan
    Cc: Allan Thomson; cti@lists.oasis-open.org ; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder, John A.; Masuoka, Ryusuke
    Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?
     
    Perhaps we could lobby to get it added to the IANA registry?


    I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

    > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we > make
    it optional, then it is highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it.


    The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language"
    for my ecosystem without having to add 12 superfluous bytes to every single STIX object.
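
    The per-object overhead mentioned above can be checked with a short calculation. This is a rough sketch assuming compact JSON (no whitespace) encoded as UTF-8; the sample object and the feed size are invented for the example.

```python
import json


def encoded_size(obj: dict) -> int:
    """Size in bytes of the compact UTF-8 JSON encoding of obj."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Hypothetical minimal object used only to measure the difference.
base = {"type": "indicator", "labels": ["malicious-activity"]}

# Extra bytes added by one "lang" property, including its separating comma.
overhead_en = encoded_size({**base, "lang": "en"}) - encoded_size(base)
overhead_und = encoded_size({**base, "lang": "und"}) - encoded_size(base)

# Assumed feed size, for scale only.
objects_in_feed = 1_000_000
total_overhead_mb = overhead_en * objects_in_feed / 1_000_000
```

    Under these assumptions a two-letter tag costs 12 bytes per object and "und" costs 13, so the overhead grows linearly with the number of objects in a feed.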

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




    From:         Bret Jordan < Bret_Jordan@symantec.com >
    To:         "Wunder, John A." < jwunder@mitre.org >,
    Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc:         Allan Thomson < athomson@lookingglasscyber.com >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, "Back, Greg" < gback@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Date:         02/28/2017 10:26 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?






    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design.  If we make it
    optional, then it is highly unlikely that will take off en masse.  You will have a few groups here and there that will do it, but the rest will just ignore it.


    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse than what we have today?  Further, I think we can use
    the RFC but say, in the event that you do not know the language, you MUST use "und".  No need to go to the ISO version.  We can use these RFCs however we want.

    Bret
     




    From: Wunder, John A. < jwunder@mitre.org >
    Sent: Tuesday, February 28, 2017 1:02:20 PM
    To: Jason Keirstead; CREEDON, Gus
    Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org ; Back, Greg; Masuoka, Ryusuke
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left
    it off won’t set it to some useful value, they’ll just make it unknown.
     
    FWIW I tried to collect where we stand across Slack and e-mail:
    Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense)
    Required: Bret (Symantec), Ryu (Fujitsu), Rob (iDefense), Gus (LMI), Trey (Kingfisher)
     
    In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content
    is known.”
     
    In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel
    like it’s much more than “optional”…it’s you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing.
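
    The warning-versus-error behavior described above can be illustrated with a small sketch. This is not the STIX 2 validator's actual code or API; check_lang and its message text are hypothetical, and only the idea of a strict mode that promotes SHOULD warnings to errors comes from the description above.

```python
from typing import Any, Dict, List, Tuple


def check_lang(obj: Dict[str, Any], strict: bool = False) -> Tuple[List[str], List[str]]:
    """Check the SHOULD rule for "lang" on one object.

    Returns (warnings, errors). A missing "lang" produces a warning,
    or an error when strict is True.
    """
    warnings: List[str] = []
    errors: List[str] = []
    if "lang" not in obj:
        message = (
            f"{obj.get('id', '<no id>')}: 'lang' SHOULD be present "
            "when the language of the content is known"
        )
        (errors if strict else warnings).append(message)
    return warnings, errors


# Hypothetical object without a language tag.
sample = {"type": "indicator", "id": "indicator--1"}
lenient_result = check_lang(sample)
strict_result = check_lang(sample, strict=True)
```

    In the lenient mode the object still validates but the producer is told about the omission; in strict mode the same omission fails validation.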
     
    Anybody else want to chime in on this?
     
    John
     
    From:
    Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Tuesday, February 28, 2017 at 12:21 PM
    To: "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc: Allan Thomson < athomson@lookingglasscyber.com >, "Bret Jordan (CS)" < Bret_Jordan@symantec.com >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >, Greg Back < gback@mitre.org >, John Wunder < jwunder@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Subject: RE: [cti] Internationalization: lang field required or optional?
     
    It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be
    optional).

    Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language
    that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en"
    or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined". But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional,
    then we can't use RFC 5646 anymore and have to switch to ISO 639-X.


    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         "CREEDON, Gus" < GCREEDON@lmi.org >
    To:         "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Back, Greg" < gback@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "Allan Thomson" < athomson@lookingglasscyber.com >
    Cc:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/28/2017 10:58 AM
    Subject:         RE: [cti] Internationalization: lang field required or optional?







    Greetings,

    If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions.
    The  terms identified in RFC 2119 allow for conditions.
    Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase:

    “If a producer other than the object creator wishes to create a new version, they
    MUST instead create a new object with a new id .”

    So let’s say we go with
    OPTIONAL … MUST/SHALL…

    These are somewhat convoluted but:
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process.
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border.

    Ryu asks if it’s worth spending time defining use cases?
    If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field.
    Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user.

    IMHO, tool providers can easily accommodate this field in their UI and in the interchange.

    How tool providers enhance their user experience is not the CTI TC’s concern.
    I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice.

    Gus

    Gus Creedon

    7940 Jones Branch Drive, Tysons, VA 22102
    Office: (703)917-7272     Cell: (571)335-6899





    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Masuoka, Ryusuke
    Sent: Monday, February 27, 2017 3:21 AM
    To: Back, Greg < gback@mitre.org >; Jason Keirstead < Jason.Keirstead@ca.ibm.com >; Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >; Wunder, John A. < jwunder@mitre.org >;
    cti@lists.oasis-open.org
    Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional?

    Hi,

    I think the differences are in use cases in each one’s mind.

    Human readable texts are for humans to consume, but
    “lang:”tag is for the system to produce/consume.
    This is an on-the-wire/between-systems requirement/optionality.
    With the system knowing the language code for the human readable
    texts, the system can handle things better and provide much better UI, etc.

    My question is what is worth (use cases) to define lang: tag if it is optional.

    Regards,

    Ryu

    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Back, Greg
    Sent: Friday, February 24, 2017 11:05 PM
    To: Jason Keirstead; Allan Thomson
    Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.

    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information.
    It’s certainly we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level.


    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Any program that uses python-stix2 needs to therefore require the
    user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic.

    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly that “unspecified” is OK for machine-to-machine communication
    that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date.

    Greg

    From: < cti@lists.oasis-open.org >
    on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Friday, February 24, 2017 at 7:15 AM
    To: Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder
    < jwunder@mitre.org >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    I also agree with Alan and John in the preference to make this optional.

    In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully,
    the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.

    There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as since they may not know the producer of the content's native language.  There
    is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry, they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC5646decision.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         Allan Thomson < athomson@lookingglasscyber.com >
    To:         Bret Jordan < Bret_Jordan@symantec.com >,
    "Wunder, John A." < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/23/2017 07:01 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?
    Sent by:         < cti@lists.oasis-open.org >








    If you are expecting to use different language content then its required for interoperability reasons.

    But by marking it required in the spec means that all content must have it even when most content is not multi-language.


    I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not.

    If people care about language and multi-language support then they will use it. If they don’t then they wont be interoperable as that will be part of the test in the interop spec.

    allan

    From: Bret Jordan < Bret_Jordan@symantec.com >
    Date: Thursday, February 23, 2017 at 2:04 PM
    To: Allan Thomson < athomson@lookingglasscyber.com >,
    "Wunder, John" < jwunder@mitre.org >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    My thoughts....

    1) In reality we are talking about a feature not a property.  
    2) If it is property of this feature is optional, then the only products that will implement this feature, are those that care about internationalization.
    3) If it is required, then everyone will be forced to implement it.

    Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it.


    Problems with Required:
    a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en"
    b) product or tool does provide a UX for it, but analyst does not care and it just remains what ever the default is.

    Problems with Optional:
    a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.  
    b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank.
    c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing.


    I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".
     Then at least as the consumer you would know that the producer did not know the language.  Versus getting an object where the language tag is omitted and you do not know if:
    i) they did not know the language
    ii) there tool did not support it
    iii) they were just lazy and did not add it.

    Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time.

    Bret






    From:
    cti@lists.oasis-open.org < cti@lists.oasis-open.org >
    on behalf of Allan Thomson < athomson@lookingglasscyber.com >
    Sent: Thursday, February 23, 2017 2:29:59 PM
    To: Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    Prefer optional.

    From: " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org > on behalf of "Wunder, John" < jwunder@mitre.org >
    Date: Thursday, February 23, 2017 at 12:59 PM
    To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: [cti] Internationalization: lang field required or optional?

    Hey everyone,

    We’re getting very close to having a completed approach for internationalization, you can see the full writeup here:
    https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz

    We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field
    required or optional.

    If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not.

    Here are some thoughts:

    Making it required:



    -           All SDOs and SROs would have a language tag, so consumers could depend on it being there
    -           It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    -           It shows we have a commitment to internationalization

    Making it optional:

    -           Any SRO or SDO could have a language tag, so consumers could not depend on it
    -           Producers would not have to create it
    -           We do have a SHOULD requirement saying that it should be included

    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional,
    I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and
    we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that.

    Anyway, what does everyone think…required or optional?

    John

    This e-mail and any attached files are only for the use of its intended recipient(s). Its contents are confidential and may be privileged. Fujitsu does not guarantee that this e-mail has not
    been intercepted and amended or that it is virus free. If you have received this e-mail and are not the intended recipient, please contact the sender by e-mail and destroy all copies of this e-mail and any attachments. / Le présent courriel, ainsi que ses
    pièces jointes, ne peut être utilisé que par le ou les destinataires auxquels il a été transmis. Les renseignements qu'il contient sont confidentiels, voire même protégés. Fujitsu ne peut garantir que ce courriel n'a pas été intercepté ou modifié, ou qu'il
    ne contient aucun virus. Si vous avez reçu ce courriel sans en être le destinataire prévu, veuillez communiquer par courriel avec son expéditeur et en détruire toutes les copies et pièces jointes.

  • 19.  RE: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 17:22
    For what it is worth - my argument against this is not coming from a US-centric point of view... not only am I not from the US, but our products have to support ~15 different languages at a minimum, as they are used globally.

    Everyone needs to be on the same page in this discussion - and I feel very strongly that we are talking past each other with the terms "use" and "optional". Some key points to be clear on:

    a) There is a very large difference between an individual "using" something and a piece of software implementing support for it. As this is a discussion around a data interchange specification, our primary concern is *software implementation*, not individual users.

    b) There is indisputably going to be a very large amount of CTI data for which no language can be inferred. This data may be machine generated, or net-new data input by tools that do not know the language of their immediate users, or the *many billions* of pieces of existing CTI data being migrated to STIX 2. Does anyone actually dispute this? I don't think anyone does, but if someone does then we should open this thread up.

    Therefore - when we say that we want this field to be "optional" - we are not asking whether the *specification of a language* is allowed to be optional at all... point (b) above proves that we *must* support, via some mechanism, a way to allow a language that is unspecified.

    What we are discussing when we say we want a field to be "optional" is simply the mechanism by which a piece of software should indicate that a language is unknown. Should that be by simply omitting the field outright, or should it be lang="und"? We are arguing about those 10 bytes on each object - nothing more. We are not arguing capability at all - all are agreed on the need for the capability.

    The main argument folks seem to be making for having this field be mandatory on each object is an assumption that if the field is optional, it won't be "used". Again - when a statement like this is made, what people are saying is "if the field is optional, then software vendors will not implement support in their products to set the field". I would argue very strongly against this, because it is false in my opinion. Software vendors will do what the market compels them to - and for any software vendor that targets multiple markets (i.e., most major ones) - this includes internationalization.

    In our own case, we will most certainly implement support in our products to set this field, because I need to support ~15 languages. However, just because I will add support for this field does not mean I support making it mandatory, because I am going to have to transmit millions of pieces of data for which no language can be inferred, and I do not want to transmit all of these GB for no real reason, when having an empty JSON field conveys the exact same information.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown

    From: "Derek.Northrope@ca.fujitsu.com" <Derek.Northrope@ca.fujitsu.com>
    To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Cc: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
    Date: 03/01/2017 10:53 AM
    Subject: RE: [cti] Internationalization: lang field required or optional?
    Sent by: <cti@lists.oasis-open.org>

    >> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    > This is my biggest concern here. Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case. Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.

    I couldn’t agree more.
    I can’t think of the amount of times a short term, dare I say it US centric decision, has had serious consequences on the long term internationalization of data sharing I have worked on.

    · All dates of birth must be complete dates
        o Workaround – if complete date is unknown then a date must be chosen just to conform to specs. Most often 1-1 is chosen but not always.
        o Outcome - Hard to match identities. May have consequences on if someone is treated as a minor or adult in legal terms
    · Everyone must have a first and last name
        o Workaround – If they only have one name then sometimes ‘UNKNOWN’ is used or ‘BLANK’ or ‘No First Name’, or ‘No Last Name’ or ‘NFN’ or….. or …. Or
        o Outcome – It is impossible to filter or search as there is no consistency
    · Back end systems only accept 7 bit ascii but encoding format is removed from the message format to ‘save bytes flying over the wire’ (sound familiar here?)
        o Workaround – No workaround. By law other countries have to record and send a person’s true name with accents (e.g. René, Jesús). Some words actually change meaning without the accent.
        o Outcome – You can send accented characters to each other but please don’t send to the US or it might break our system…..

    My background isn’t CTI, its Biometrics and Identity, but this here is the reason I joined this group and why this is my first comment.

    Regards,
    Derek Northrope
    Head of Biometrics
    Associate Director Enterprise and Cyber Security
    Fujitsu Americas, Inc.
    Cell: +1 613 410-3532
    E-mail: derek.northrope@ca.fujitsu.com
    LinkedIn: https://ca.linkedin.com/in/dereknorthrope

    This e-mail and any attached files may contain confidential and/or privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive this e-mail for the recipient), you may not review, copy or distribute this message. Please contact the sender by reply e-mail and delete all copies of this message. Do you really need to print this e-mail? Think green!

    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Coderre, Robert
    Sent: Wednesday, March 01, 2017 8:45 AM
    To: Jason.Keirstead@ca.ibm.com; Bret_Jordan@symantec.com
    Cc: athomson@lookingglasscyber.com; cti@lists.oasis-open.org; gback@mitre.org; GCREEDON@lmi.org; jwunder@mitre.org; Masuoka, Ryusuke
    Subject: RE: [cti] Internationalization: lang field required or optional?

    > The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    This is my biggest concern here. Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case. Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.

    > If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.

    I can get behind this idea. We could add a required ‘default-lang’ field to the STIX Bundle which could indicate language used for all SDO in the Bundle. If there is a deviation, we add an optional ‘lang’ field to the SDO to capture that. Normative text should include that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.
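
As an illustration of the Bundle-level default suggested above, the following is a rough sketch of how a consumer might resolve the language of each object. The ‘default-lang’ field name is taken from the message; the helper `effective_lang` and the sample bundle contents are hypothetical, and nothing here reflects an adopted part of the specification.

```python
from typing import Optional


def effective_lang(bundle: dict, obj: dict) -> Optional[str]:
    """Resolve an object's language under the proposed scheme.

    A per-object 'lang' takes precedence; otherwise the object inherits
    the bundle-level 'default-lang' (None if neither is present).
    """
    return obj.get("lang", bundle.get("default-lang"))


# Hypothetical bundle: one object inherits the default, one deviates from it.
bundle = {
    "type": "bundle",
    "default-lang": "en",
    "objects": [
        {"type": "indicator", "description": "Phishing domain"},
        {"type": "report", "lang": "fr", "description": "Campagne d'hameçonnage"},
    ],
}

langs = [effective_lang(bundle, obj) for obj in bundle["objects"]]
print(langs)  # ['en', 'fr']
```

With this shape, the language is stated once for the common case and repeated only on objects that differ from the default.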
    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org] On Behalf Of Jason Keirstead
    Sent: Wednesday, March 01, 2017 7:44 AM
    To: Bret Jordan
    Cc: Allan Thomson; cti@lists.oasis-open.org; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder, John A.; Masuoka, Ryusuke
    Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?

    Perhaps we could lobby to get it added to the IANA registry? I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

    > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design. If we make it optional, then it is highly unlikely that will take off in mass. You will have a few groups here and there that will do it, but the rest will just ignore it.

    The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown

    From: Bret Jordan <Bret_Jordan@symantec.com>
    To: "Wunder, John A." <jwunder@mitre.org>, Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" <GCREEDON@lmi.org>
    Cc: Allan Thomson <athomson@lookingglasscyber.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>, "Back, Greg" <gback@mitre.org>, "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
    Date: 02/28/2017 10:26 PM
    Subject: Re: [cti] Internationalization: lang field required or optional?

    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design. If we make it optional, then it is highly unlikely that will take off in mass. You will have a few groups here and there that will do it, but the rest will just ignore it.

    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en". But how is that different or worse then what we have today? Further, I think we can use the RFC but say, in the event that you do not know the language, you MUST use "und". No need to go to the ISO version. We can use these RFCs how ever we want.

    Bret

    From: Wunder, John A. <jwunder@mitre.org>
    Sent: Tuesday, February 28, 2017 1:02:20 PM
    To: Jason Keirstead; CREEDON, Gus
    Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org; Back, Greg; Masuoka, Ryusuke
    Subject: Re: [cti] Internationalization: lang field required or optional?

    We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t set it to some useful value, they’ll just make it unknown.
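
The point that an explicit "unknown" value carries the same information as leaving the field off, at the cost of a few extra bytes per object, can be sketched roughly as follows. The "und" value is the one suggested in the thread; the helpers `resolved_lang` and `wire_size`, the sample object, and the choice of compact JSON serialization are assumptions made for illustration, so the measured overhead applies only to this particular encoding.

```python
import json


def resolved_lang(obj: dict) -> str:
    """Consumer-side view: a missing lang and an explicit "und" mean the same."""
    return obj.get("lang") or "und"


def wire_size(obj: dict) -> int:
    """Size in bytes of the object as compact UTF-8 JSON (an assumed encoding)."""
    return len(json.dumps(obj, separators=(",", ":")).encode("utf-8"))


# Hypothetical object whose language the producer does not know.
omitted = {"type": "indicator", "description": "192.0.2.1 seen scanning"}
explicit = {**omitted, "lang": "und"}

# Both forms resolve to the same thing for a consumer...
same_meaning = resolved_lang(omitted) == resolved_lang(explicit)

# ...but the explicit form costs extra bytes on every object.
overhead = wire_size(explicit) - wire_size(omitted)

print(same_meaning, overhead)
```

Whether that per-object overhead matters is exactly the judgment call being debated; the sketch only shows that the two forms are interchangeable for a consumer that treats absence as "unknown".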
FWIW I tried to collect where we stand across Slack and e-mail: Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense) Required: Bret (Symantec), Ryu (Hitachi), Rob (iDefense), Gus (LMI), Trey (Kingfisher) In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.” In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much more than “optional”…it’s you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing. Anybody else want to chime in on this? John From: Jason Keirstead < Jason.Keirstead@ca.ibm.com > Date: Tuesday, February 28, 2017 at 12:21 PM To: "CREEDON, Gus" < GCREEDON@lmi.org > Cc: Allan Thomson < athomson@lookingglasscyber.com >, "Bret Jordan (CS)" < Bret_Jordan@symantec.com >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, Greg Back < gback@mitre.org >, John Wunder < jwunder@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com > Subject: RE: [cti] Internationalization: lang field required or optional? It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be optional). Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... 
maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en" or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined". But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional, then we can't use RFC5646anymore and have to switch to ISO 639-X. - Jason Keirstead STSM, Product Architect, Security Intelligence, IBM Security Systems www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown From:         "CREEDON, Gus" < GCREEDON@lmi.org > To:         "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Back, Greg" < gback@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "Allan Thomson" < athomson@lookingglasscyber.com > Cc:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Date:         02/28/2017 10:58 AM Subject:         RE: [cti] Internationalization: lang field required or optional? Greetings, If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions. The  terms identified in RFC 2119 allow for conditions. Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase: “If a producer other than the object creator wishes to create a new version, they MUST instead create a new object with a new id .” So let’s say we go with OPTIONAL … MUST/SHALL… These are somewhat convoluted but: OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process. OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border. 
Ryu asks if it’s worth spending time defining use cases? If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field. Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user. IMHO, tool providers can easily accommodate this field in their UI and in the interchange. How tool providers enhance their user experience is not the CTI TC’s concern. I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice. Gus Gus Creedon 7940 Jones Branch Drive, Tysons, VA 22102 Office: (703)917-7272     Cell: (571)335-6899 From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Masuoka, Ryusuke Sent: Monday, February 27, 2017 3:21 AM To: Back, Greg < gback@mitre.org >; Jason Keirstead < Jason.Keirstead@ca.ibm.com >; Allan Thomson < athomson@lookingglasscyber.com > Cc: Bret Jordan < Bret_Jordan@symantec.com >; Wunder, John A. < jwunder@mitre.org >; cti@lists.oasis-open.org Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional? Hi, I think the differences are in use cases in each one’s mind. Human readable texts are for humans to consume, but “lang:”tag is for the system to produce/consume. This is an on-the-wire/between-systems requirement/optionality. With the system knowing the language code for the human readable texts, the system can handle things better and provide much better UI, etc. My question is what is worth (use cases) to define lang: tag if it is optional. Regards, Ryu From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Back, Greg Sent: Friday, February 24, 2017 11:05 PM To: Jason Keirstead; Allan Thomson Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org Subject: Re: [cti] Internationalization: lang field required or optional? 
I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional. Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information. It’s certainly we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level. As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Any program that uses python-stix2 needs to therefore require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic. Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly that “unspecified” is OK for machine-to-machine communication that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date. Greg From: < cti@lists.oasis-open.org > on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com > Date: Friday, February 24, 2017 at 7:15 AM To: Allan Thomson < athomson@lookingglasscyber.com > Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: Re: [cti] Internationalization: lang field required or optional? 
I also agree with Alan and John in the preference to make this optional. In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully, the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information. There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as since they may not know the producer of the content's native language.  There is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry, they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC5646decision. - Jason Keirstead STSM, Product Architect, Security Intelligence, IBM Security Systems www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown From:         Allan Thomson < athomson@lookingglasscyber.com > To:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Date:         02/23/2017 07:01 PM Subject:         Re: [cti] Internationalization: lang field required or optional? Sent by:         < cti@lists.oasis-open.org > If you are expecting to use different language content then its required for interoperability reasons. But by marking it required in the spec means that all content must have it even when most content is not multi-language. I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not. 
If people care about language and multi-language support then they will use it. If they don’t then they wont be interoperable as that will be part of the test in the interop spec. allan From: Bret Jordan < Bret_Jordan@symantec.com > Date: Thursday, February 23, 2017 at 2:04 PM To: Allan Thomson < athomson@lookingglasscyber.com >, "Wunder, John" < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: Re: [cti] Internationalization: lang field required or optional? My thoughts.... 1) In reality we are talking about a feature not a property.   2) If it is property of this feature is optional, then the only products that will implement this feature, are those that care about internationalization. 3) If it is required, then everyone will be forced to implement it. Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it. Problems with Required: a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en" b) product or tool does provide a UX for it, but analyst does not care and it just remains what ever the default is. Problems with Optional: a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.   b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank. c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing. I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".  Then at least as the consumer you would know that the producer did not know the language.  
Versus getting an object where the language tag is omitted and you do not know if: i) they did not know the language ii) there tool did not support it iii) they were just lazy and did not add it. Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time. Bret From: cti@lists.oasis-open.org < cti@lists.oasis-open.org > on behalf of Allan Thomson < athomson@lookingglasscyber.com > Sent: Thursday, February 23, 2017 2:29:59 PM To: Wunder, John A.; cti@lists.oasis-open.org Subject: Re: [cti] Internationalization: lang field required or optional? Prefer optional. From: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > on behalf of "Wunder, John" < jwunder@mitre.org > Date: Thursday, February 23, 2017 at 12:59 PM To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: [cti] Internationalization: lang field required or optional? Hey everyone, We’re getting very close to having a completed approach for internationalization, you can see the full writeup here: https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field required or optional. If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not. 
    Here are some thoughts:

    Making it required:

    - All SDOs and SROs would have a language tag, so consumers could depend on it being there
    - It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    - It shows we have a commitment to internationalization

    Making it optional:

    - Any SRO or SDO could have a language tag, so consumers could not depend on it
    - Producers would not have to create it
    - We do have a SHOULD requirement saying that it should be included

    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional, I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that.

    Anyway, what does everyone think…required or optional?

    John


  • 20.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-04-2017 20:57
    I think I am coming around to Jason's views. But I do have a few concerns with our one-size-fits-all approach. Namely, I do not think the "lang" property should be in the "Common Properties" for all objects, because it does not make any sense to have it on the Observed Data or Sighting objects. There are no string fields on these objects that allow for human-generated content.

    I can also see how for things like Indicators, where the vast majority of indicators will have nothing to translate, the "lang" field should be optional. This makes a lot of sense. Where I still find myself on the fence is with higher-level intelligence. This is almost always human generated, and I feel it would be within reason to make "lang" required for these objects.

    Bret

    From: cti@lists.oasis-open.org <cti@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Sent: Wednesday, March 1, 2017 10:21:24 AM
    To: Derek.Northrope@ca.fujitsu.com
    Cc: cti@lists.oasis-open.org; Masuoka, Ryusuke
    Subject: RE: [cti] Internationalization: lang field required or optional?

    For what it is worth - my argument against this is not coming from a US-centric point of view... not only am I not from the US, but our products have to support ~ 15 different languages at a minimum as they are used globally.

    Everyone needs to be on the same page on this discussion - and I feel very strongly like we are talking past each-other with the terms "use" and "optional". Some key points to be clear on..

    a) There is a very large difference between an individual "using" something, and a piece of software implementing support for it. As this is a discussion around a data interchange specification, our primary concern is surrounding *software implementation*, not individual users.

    b) There is indisputably going to be a very large amount of CTI data for which no language can be inferred.
This data may be machine generated,  or net-new data input by tools who do not know the language of their immediate users, or the *many billions* of pieces of existing CTI data being migrated to STIX 2. Does anyone actually dispute this? I don't think anyone is disputing it, but if there are then we should open this thread up. Therefore - when we say that we want this field to be "optional" - we are not referring to if the *specification of a language* is allowed to be optional at all... point (b) above proves that we *must* support, via some mechanism, a way to allow a language that is unspecified. What we are discussing when we say we want a field to be "optional", is simply discussing the mechanism by which a piece of software should indicate that a language is unknown. Should that be by simply omitting the field outright, or should it be lang="und" ? We are arguing about those 10 bytes on each object - nothing more. We are not arguing capability at all - all are agreed on the need for the capability. The main argument folks seem to be making for having this field be mandatory on each object, is an assumption that if the field is optional, that it won't be "used". Again - when a statement like this is made, what people are saying is "if the field is optional, than software vendors will not implement support in their products to set the field". I would argue very strongly against this, because it is false in my opinion. Software vendors will do what the market compels them to - and for any software vendor that targets multiple markets (ie, most major ones) - this includes internationalization. In our own case, we will most certainly implement support in our products to set this field, because I need to support ~ 15 languages. 
However, just because I will add support for this field does not mean I support making it mandatory, because I am going to have to transmit millions of pieces of data for which no language can be inferred, and I do not want to transmit all of these GB for no real reason, when having an empty JSON field conveys the exact same information.

- Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security
www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: "Derek.Northrope@ca.fujitsu.com" <Derek.Northrope@ca.fujitsu.com>
To: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
Cc: "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
Date: 03/01/2017 10:53 AM
Subject: RE: [cti] Internationalization: lang field required or optional?
Sent by: <cti@lists.oasis-open.org>

>> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

> This is my biggest concern here. Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case. Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.

I couldn’t agree more. I can’t count the number of times a short-term, dare I say it US-centric, decision has had serious consequences for the long-term internationalization of data sharing I have worked on.

·  All dates of birth must be complete dates
   o  Workaround - If the complete date is unknown then a date must be chosen just to conform to specs. Most often 1-1 is chosen but not always.
   o  Outcome - Hard to match identities. May have consequences on whether someone is treated as a minor or adult in legal terms.
·  Everyone must have a first and last name
   o  Workaround - If they only have one name then sometimes ‘UNKNOWN’ is used, or ‘BLANK’, or ‘No First Name’, or ‘No Last Name’, or ‘NFN’, and so on.
   o  Outcome - It is impossible to filter or search as there is no consistency.
·  Back end systems only accept 7-bit ASCII but encoding format is removed from the message format to ‘save bytes flying over the wire’ (sound familiar here?)
   o  Workaround - No workaround. By law other countries have to record and send a person’s true name with accents (e.g. René, Jesús). Some words actually change meaning without the accent.
   o  Outcome - You can send accented characters to each other but please don’t send to the US or it might break our system.

My background isn’t CTI, it’s Biometrics and Identity, but this here is the reason I joined this group and why this is my first comment.

Regards,
Derek Northrope
Head of Biometrics
Associate Director Enterprise and Cyber Security
Fujitsu Americas, Inc.

Cell: +1 613 410-3532
E-mail: derek.northrope@ca.fujitsu.com
LinkedIn: https://ca.linkedin.com/in/dereknorthrope

This e-mail and any attached files may contain confidential and/or privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive this e-mail for the recipient), you may not review, copy or distribute this message. Please contact the sender by reply e-mail and delete all copies of this message. Do you really need to print this e-mail? Think green!

From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Coderre, Robert
Sent: Wednesday, March 01, 2017 8:45 AM
To: Jason.Keirstead@ca.ibm.com; Bret_Jordan@symantec.com
Cc: athomson@lookingglasscyber.com; cti@lists.oasis-open.org; gback@mitre.org; GCREEDON@lmi.org; jwunder@mitre.org; Masuoka, Ryusuke/ ?? ??
Subject: RE: [cti] Internationalization: lang field required or optional?

> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.   This is my biggest concern here.  Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case.  Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.   > If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.   I can get behind this idea.  We could add a required ‘default-lang’ field to the STIX Bundle which could indicate language used for all SDO in the Bundle.  If there is a deviation, we add an optional ‘lang’ field to the SDO to capture that.  Normative text should include that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.   From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Jason Keirstead Sent: Wednesday, March 01, 2017 7:44 AM To: Bret Jordan Cc: Allan Thomson; cti@lists.oasis-open.org ; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder, John A.; Masuoka, Ryusuke Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?   Perhaps we could lobby to get it added to the IANA registry? I do not actually understand why it is defined in ISO 639, but not in the IANA registry. > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we > make it optional, then it is highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it. 
The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that. If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object. - Jason Keirstead STSM, Product Architect, Security Intelligence, IBM Security Systems www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown From:         Bret Jordan < Bret_Jordan@symantec.com > To:         "Wunder, John A." < jwunder@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" < GCREEDON@lmi.org > Cc:         Allan Thomson < athomson@lookingglasscyber.com >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, "Back, Greg" < gback@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com > Date:         02/28/2017 10:26 PM Subject:         Re: [cti] Internationalization: lang field required or optional? In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag in to their design.  If we make it optional, then it is highly unlikely that will take off in mass.  You will have a few groups here and there that will do it, but the rest will just ignore it. If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse then what we have today?  Further, I think we can use the RFC but say, in the event that you do not know the language, you MUST use "und".  No need to go to the ISO version.  We can use these RFCs how ever we want. Bret From: Wunder, John A. 
< jwunder@mitre.org > Sent: Tuesday, February 28, 2017 1:02:20 PM To: Jason Keirstead; CREEDON, Gus Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org ; Back, Greg; Masuoka, Ryusuke Subject: Re: [cti] Internationalization: lang field required or optional? We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t set it to some useful value, they’ll just make it unknown. FWIW I tried to collect where we stand across Slack and e-mail: Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense) Required: Bret (Symantec), Ryu (Hitachi), Rob (iDefense), Gus (LMI), Trey (Kingfisher) In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.” In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much more than “optional”…it’s you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing. Anybody else want to chime in on this? John From: Jason Keirstead < Jason.Keirstead@ca.ibm.com > Date: Tuesday, February 28, 2017 at 12:21 PM To: "CREEDON, Gus" < GCREEDON@lmi.org > Cc: Allan Thomson < athomson@lookingglasscyber.com >, "Bret Jordan (CS)" < Bret_Jordan@symantec.com >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, Greg Back < gback@mitre.org >, John Wunder < jwunder@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com > Subject: RE: [cti] Internationalization: lang field required or optional? It is not going to be hard at all coming up with use cases that generate optional conditions. 
In fact, I suspect this is going to be the majority (which is why it should be optional).

Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en" or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined".

But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional, then we can't use RFC 5646 anymore and have to switch to ISO 639-X.

- Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security
www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: "CREEDON, Gus" < GCREEDON@lmi.org >
To: "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Back, Greg" < gback@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "Allan Thomson" < athomson@lookingglasscyber.com >
Cc: Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
Date: 02/28/2017 10:58 AM
Subject: RE: [cti] Internationalization: lang field required or optional?

Greetings,

If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions. The terms identified in RFC 2119 allow for conditions.
Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase: “If a producer other than the object creator wishes to create a new version, they MUST instead create a new object with a new id.”

So let’s say we go with OPTIONAL … MUST/SHALL… These are somewhat convoluted but:

OPTIONAL - the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process.

OPTIONAL - the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border.

Ryu asks if it’s worth spending time defining use cases? If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field. Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user.

IMHO, tool providers can easily accommodate this field in their UI and in the interchange. How tool providers enhance their user experience is not the CTI TC’s concern. I believe, “REQUIRED - MUST be filled in with a valid code”, is the better choice.

Gus

Gus Creedon
7940 Jones Branch Drive, Tysons, VA 22102
Office: (703)917-7272     Cell: (571)335-6899

From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Masuoka, Ryusuke
Sent: Monday, February 27, 2017 3:21 AM
To: Back, Greg < gback@mitre.org >; Jason Keirstead < Jason.Keirstead@ca.ibm.com >; Allan Thomson < athomson@lookingglasscyber.com >
Cc: Bret Jordan < Bret_Jordan@symantec.com >; Wunder, John A. < jwunder@mitre.org >; cti@lists.oasis-open.org
Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional?

Hi,

I think the differences are in use cases in each one’s mind. Human readable texts are for humans to consume, but the “lang:” tag is for the system to produce/consume.
This is an on-the-wire/between-systems requirement/optionality. With the system knowing the language code for the human readable texts, the system can handle things better and provide much better UI, etc. My question is what is worth (use cases) to define lang: tag if it is optional. Regards, Ryu From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Back, Greg Sent: Friday, February 24, 2017 11:05 PM To: Jason Keirstead; Allan Thomson Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org Subject: Re: [cti] Internationalization: lang field required or optional? I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional. Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information. It’s certainly we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level. As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Any program that uses python-stix2 needs to therefore require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic. Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly that “unspecified” is OK for machine-to-machine communication that doesn’t involve humans. 
This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date.

Greg

From: < cti@lists.oasis-open.org > on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com >
Date: Friday, February 24, 2017 at 7:15 AM
To: Allan Thomson < athomson@lookingglasscyber.com >
Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
Subject: Re: [cti] Internationalization: lang field required or optional?

I also agree with Allan and John in the preference to make this optional. In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully, the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.

There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as, since they may not know the native language of the content's producer. There is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry; they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC 5646 decision.

- Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security
www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: Allan Thomson < athomson@lookingglasscyber.com >
To: Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A."
< jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Date:         02/23/2017 07:01 PM Subject:         Re: [cti] Internationalization: lang field required or optional? Sent by:         < cti@lists.oasis-open.org > If you are expecting to use different language content then its required for interoperability reasons. But by marking it required in the spec means that all content must have it even when most content is not multi-language. I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not. If people care about language and multi-language support then they will use it. If they don’t then they wont be interoperable as that will be part of the test in the interop spec. allan From: Bret Jordan < Bret_Jordan@symantec.com > Date: Thursday, February 23, 2017 at 2:04 PM To: Allan Thomson < athomson@lookingglasscyber.com >, "Wunder, John" < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: Re: [cti] Internationalization: lang field required or optional? My thoughts.... 1) In reality we are talking about a feature not a property.   2) If it is property of this feature is optional, then the only products that will implement this feature, are those that care about internationalization. 3) If it is required, then everyone will be forced to implement it. Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it. Problems with Required: a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en" b) product or tool does provide a UX for it, but analyst does not care and it just remains what ever the default is. Problems with Optional: a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.   
b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank. c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing. I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".  Then at least as the consumer you would know that the producer did not know the language.  Versus getting an object where the language tag is omitted and you do not know if: i) they did not know the language ii) there tool did not support it iii) they were just lazy and did not add it. Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time. Bret From: cti@lists.oasis-open.org < cti@lists.oasis-open.org > on behalf of Allan Thomson < athomson@lookingglasscyber.com > Sent: Thursday, February 23, 2017 2:29:59 PM To: Wunder, John A.; cti@lists.oasis-open.org Subject: Re: [cti] Internationalization: lang field required or optional? Prefer optional. From: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > on behalf of "Wunder, John" < jwunder@mitre.org > Date: Thursday, February 23, 2017 at 12:59 PM To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: [cti] Internationalization: lang field required or optional? Hey everyone, We’re getting very close to having a completed approach for internationalization, you can see the full writeup here: https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz We do have one remaining question before we can move forward though. 
As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field required or optional. If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not. Here are some thoughts: Making it required: -           All SDOs and SROs would have a language tag, so consumers could depend on it being there -           It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise -           It shows we have a commitment to internationalization Making it optional: -           Any SRO or SDO could have a language tag, so consumers could not depend on it -           Producers would not have to create it -           We do have a SHOULD requirement saying that it should be included My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional, I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that. Anyway, what does everyone think…required or optional? John This e-mail and any attached files are only for the use of its intended recipient(s). Its contents are confidential and may be privileged. 
Fujitsu does not guarantee that this e-mail has not been intercepted and amended or that it is virus free. If you have received this e-mail and are not the intended recipient, please contact the sender by e-mail and destroy all copies of this e-mail and any attachments. / Le présent courriel, ainsi que ses pièces jointes, ne peut être utilisé que par le ou les destinataires auxquels il a été transmis. Les renseignements qu'il contient sont confidentiels, voire même protégés. Fujitsu ne peut garantir que ce courriel n'a pas été intercepté ou modifié, ou qu'il ne contient aucun virus. Si vous avez reçu ce courriel sans en être le destinataire prévu, veuillez communiquer par courriel avec son expéditeur et en détruire toutes les copies et pièces jointes.
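
For readers following the thread, the mechanisms under discussion can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of any STIX specification or of the stix2-validator: the bundle-level "default-lang" property is the name proposed in the thread and was only a suggestion, the helper names (effective_lang, check_lang, extra_bytes) are hypothetical, and the object ids are placeholders rather than valid STIX ids. It shows (1) resolving an object's language from its own "lang", then a bundle default, then "und"; (2) reporting a missing "lang" as a warning, or as an error in a strict mode; and (3) measuring what an explicit "und" tag adds to the compact JSON form.

```python
import json

UNDETERMINED = "und"  # value suggested in the thread for "language unknown"


def effective_lang(obj, bundle=None):
    """Return the language tag that applies to one object.

    Order: the object's own "lang", then the bundle's "default-lang"
    (a proposal from the thread, not a published property), then
    UNDETERMINED when neither is present.
    """
    if obj.get("lang"):
        return obj["lang"]
    if bundle and bundle.get("default-lang"):
        return bundle["default-lang"]
    return UNDETERMINED


def check_lang(obj, strict=False):
    """Report a missing "lang" as a warning, or as an error when strict.

    Hypothetical check written for this example; it does not call the
    real STIX 2 validator.
    """
    if "lang" in obj:
        return []
    level = "error" if strict else "warning"
    return [(level, f"{obj.get('id', '<no id>')}: 'lang' SHOULD be present")]


def extra_bytes(obj, tag=UNDETERMINED):
    """Bytes added to the compact JSON encoding by an explicit lang tag."""
    def dump(o):
        return json.dumps(o, separators=(",", ":")).encode("utf-8")
    return len(dump({**obj, "lang": tag})) - len(dump(obj))


# Placeholder bundle: the ids are illustrative, not valid STIX identifiers.
bundle = {
    "type": "bundle",
    "default-lang": "en",
    "objects": [
        {"type": "indicator", "id": "indicator--1", "lang": "fr"},
        {"type": "indicator", "id": "indicator--2"},
    ],
}

for o in bundle["objects"]:
    print(o["id"], effective_lang(o, bundle), check_lang(o), extra_bytes(o))
```

Under the "optional" reading, a consumer would call something like effective_lang and treat a missing tag the same way as an explicit "und"; under the "required" reading, check_lang with strict=True is the kind of simple interop test mentioned in the thread.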


  • 21.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-05-2017 05:16
    I am not a fan of caveats. I much prefer having rules that apply always. I'm leaning towards the required lang attribute on all objects as it future-proofs the objects, so that if we do add in text fields then we'll already have a lang attribute to apply to it.

    Cheers
    Terry MacDonald
    Cosive

    On 5/03/2017 9:57 AM, "Bret Jordan" < Bret_Jordan@symantec.com > wrote:

    I think I am coming around to Jason's views. But I do have a few concerns with our one-size-fits-all approach. Namely, I do not think the "lang" property should be on the "Common Properties" for all objects, because it does not make any sense to have it on the Observed Data or Sighting objects. There are no string fields on these objects that allow for human generated content.

    I can also see how for things like Indicators, where the vast majority of indicators will have nothing to translate, that the "lang" field should be Optional. This makes a lot of sense. Where I still find myself on the fence is with higher level intelligence. This is usually human generated and I feel it would be within reason to make "lang" required for these objects.

    Bret

    From: cti@lists.oasis-open.org < cti@lists.oasis-open.org > on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Sent: Wednesday, March 1, 2017 10:21:24 AM
    To: Derek.Northrope@ca.fujitsu.com
    Cc: cti@lists.oasis-open.org ; Masuoka, Ryusuke
    Subject: RE: [cti] Internationalization: lang field required or optional?

    For what it is worth - my argument against this is not coming from a US-centric point of view... not only am I not from the US, but our products have to support ~ 15 different languages at a minimum as they are used globally.

    Everyone needs to be on the same page on this discussion - and I feel very strongly like we are talking past each other with the terms "use" and "optional". Some key points to be clear on..
a) There is a very large difference between an individual "using" something, and a piece of software implementing support for it. As this is a discussion around a data interchange specification, our primary concern is surrounding *software implementation*, not individual users. b) There is indisputably going to be a very large amount of CTI data for which no language can be inferred. This data may be machine generated,  or net-new data input by tools who do not know the language of their immediate users, or the *many billions* of pieces of existing CTI data being migrated to STIX 2. Does anyone actually dispute this? I don't think anyone is disputing it, but if there are then we should open this thread up. Therefore - when we say that we want this field to be "optional" - we are not referring to if the *specification of a language* is allowed to be optional at all... point (b) above proves that we *must* support, via some mechanism, a way to allow a language that is unspecified. What we are discussing when we say we want a field to be "optional", is simply discussing the mechanism by which a piece of software should indicate that a language is unknown. Should that be by simply omitting the field outright, or should it be lang="und" ? We are arguing about those 10 bytes on each object - nothing more. We are not arguing capability at all - all are agreed on the need for the capability. The main argument folks seem to be making for having this field be mandatory on each object, is an assumption that if the field is optional, that it won't be "used". Again - when a statement like this is made, what people are saying is "if the field is optional, than software vendors will not implement support in their products to set the field". I would argue very strongly against this, because it is false in my opinion. 
Software vendors will do what the market compels them to - and for any software vendor that targets multiple markets (ie, most major ones) - this includes internationalization. In our own case, we will most certainly implement support in our products to set this field, because I need to support ~ 15 languages. However, just because I will add support for this field does not mean I support making it mandatory, because I am going to have to transmit millions of pieces of data for which no language can be inferred, and I do not want to transmit all of these GB for no real reason, when having an empty JSON field conveys the exact same information.

- Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security
www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: "Derek.Northrope@ca.fujitsu.com" < Derek.Northrope@ca.fujitsu.com >
To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
Cc: "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
Date: 03/01/2017 10:53 AM
Subject: RE: [cti] Internationalization: lang field required or optional?
Sent by: < cti@lists.oasis-open.org >

>> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

> This is my biggest concern here. Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case. Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.

I couldn’t agree more. I can’t count the number of times a short-term, dare I say it US-centric, decision has had serious consequences for the long-term internationalization of data sharing I have worked on.

·  All dates of birth must be complete dates
   o  Workaround - If the complete date is unknown then a date must be chosen just to conform to specs. Most often 1-1 is chosen but not always.
   o  Outcome - Hard to match identities. May have consequences on whether someone is treated as a minor or adult in legal terms.
·  Everyone must have a first and last name
   o  Workaround - If they only have one name then sometimes ‘UNKNOWN’ is used, or ‘BLANK’, or ‘No First Name’, or ‘No Last Name’, or ‘NFN’, and so on.
   o  Outcome - It is impossible to filter or search as there is no consistency.
·  Back end systems only accept 7-bit ASCII but encoding format is removed from the message format to ‘save bytes flying over the wire’ (sound familiar here?)
   o  Workaround - No workaround. By law other countries have to record and send a person’s true name with accents (e.g. René, Jesús). Some words actually change meaning without the accent.
   o  Outcome - You can send accented characters to each other but please don’t send to the US or it might break our system.

My background isn’t CTI, it’s Biometrics and Identity, but this here is the reason I joined this group and why this is my first comment.

Regards,
Derek Northrope
Head of Biometrics
Associate Director Enterprise and Cyber Security
Fujitsu Americas, Inc.

Cell: +1 613 410-3532
E-mail: derek.northrope@ca.fujitsu.com
LinkedIn: https://ca.linkedin.com/in/dereknorthrope

This e-mail and any attached files may contain confidential and/or privileged material for the sole use of the intended recipient. Any review, use, distribution or disclosure by others is strictly prohibited. If you are not the intended recipient (or authorized to receive this e-mail for the recipient), you may not review, copy or distribute this message. Please contact the sender by reply e-mail and delete all copies of this message. Do you really need to print this e-mail? Think green!

From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Coderre, Robert
Sent: Wednesday, March 01, 2017 8:45 AM
To: Jason.Keirstead@ca.ibm.com ; Bret_Jordan@symantec.com
Cc: athomson@lookingglasscyber.com ; cti@lists.oasis-open.org ; gback@mitre.org ; GCREEDON@lmi.org ; jwunder@mitre.org ; Masuoka, Ryusuke/ ?? ??
Subject: RE: [cti] Internationalization: lang field required or optional?

> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

This is my biggest concern here. Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case. Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.

> If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.

I can get behind this idea. We could add a required ‘default-lang’ field to the STIX Bundle which could indicate the language used for all SDOs in the Bundle. If there is a deviation, we add an optional ‘lang’ field to the SDO to capture that. Normative text should include that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.

From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ] On Behalf Of Jason Keirstead
Sent: Wednesday, March 01, 2017 7:44 AM
To: Bret Jordan
Cc: Allan Thomson; cti@lists.oasis-open.org ; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder, John A.; Masuoka, Ryusuke
Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?

Perhaps we could lobby to get it added to the IANA registry?
I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

> In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design. If we make it optional, then it is highly unlikely that will take off en masse. You will have a few groups here and there that will do it, but the rest will just ignore it.

The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX object.

- Jason Keirstead
STSM, Product Architect, Security Intelligence, IBM Security Systems
www.ibm.com/security
www.securityintelligence.com

Without data, all you are is just another person with an opinion - Unknown

From: Bret Jordan < Bret_Jordan@symantec.com >
To: "Wunder, John A." < jwunder@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" < GCREEDON@lmi.org >
Cc: Allan Thomson < athomson@lookingglasscyber.com >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, "Back, Greg" < gback@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
Date: 02/28/2017 10:26 PM
Subject: Re: [cti] Internationalization: lang field required or optional?

In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design. If we make it optional, then it is highly unlikely that will take off en masse. You will have a few groups here and there that will do it, but the rest will just ignore it.
If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse then what we have today?  Further, I think we can use the RFC but say, in the event that you do not know the language, you MUST use "und".  No need to go to the ISO version.  We can use these RFCs how ever we want. Bret From: Wunder, John A. < jwunder@mitre.org > Sent: Tuesday, February 28, 2017 1:02:20 PM To: Jason Keirstead; CREEDON, Gus Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org ; Back, Greg; Masuoka, Ryusuke Subject: Re: [cti] Internationalization: lang field required or optional? We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t set it to some useful value, they’ll just make it unknown. FWIW I tried to collect where we stand across Slack and e-mail: Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense) Required: Bret (Symantec), Ryu (Hitachi), Rob (iDefense), Gus (LMI), Trey (Kingfisher) In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.” In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much more than “optional”…it’s you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing. Anybody else want to chime in on this? John From: Jason Keirstead < Jason.Keirstead@ca.ibm.com > Date: Tuesday, February 28, 2017 at 12:21 PM To: "CREEDON, Gus" < GCREEDON@lmi.org > Cc: Allan Thomson < athomson@lookingglasscyber. 
com >, "Bret Jordan (CS)" < Bret_Jordan@symantec.com >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >, Greg Back < gback@mitre.org >, John Wunder < jwunder@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu. com > Subject: RE: [cti] Internationalization: lang field required or optional? It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be optional). Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en" or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined". But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional, then we can't use RFC5646anymore and have to switch to ISO 639-X. - Jason Keirstead STSM, Product Architect, Security Intelligence, IBM Security Systems www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown From:         "CREEDON, Gus" < GCREEDON@lmi.org > To:         "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu. com >, "Back, Greg" < gback@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "Allan Thomson" < athomson@lookingglasscyber. com > Cc:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." 
< jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Date:         02/28/2017 10:58 AM Subject:         RE: [cti] Internationalization: lang field required or optional? Greetings, If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions. The  terms identified in RFC 2119 allow for conditions. Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase: “If a producer other than the object creator wishes to create a new version, they MUST instead create a new object with a new id .” So let’s say we go with OPTIONAL … MUST/SHALL… These are somewhat convoluted but: OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process. OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border. Ryu asks if it’s worth spending time defining use cases? If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field. Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user. IMHO, tool providers can easily accommodate this field in their UI and in the interchange. How tool providers enhance their user experience is not the CTI TC’s concern. I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice. 
Gus Gus Creedon 7940 Jones Branch Drive, Tysons, VA 22102 Office: (703)917-7272     Cell: (571)335-6899 From: cti@lists.oasis-open.org [ mailt o:cti@lists.oasis-open.org ] On Behalf Of Masuoka, Ryusuke Sent: Monday, February 27, 2017 3:21 AM To: Back, Greg < gback@mitre.org >; Jason Keirstead < Jason.Keirstead@ca.ibm.com >; Allan Thomson < athomson@lookingglasscyber. com > Cc: Bret Jordan < Bret_Jordan@symantec.com >; Wunder, John A. < jwunder@mitre.org >; cti@lists.oasis-open.org Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional? Hi, I think the differences are in use cases in each one’s mind. Human readable texts are for humans to consume, but “lang:”tag is for the system to produce/consume. This is an on-the-wire/between-systems requirement/optionality. With the system knowing the language code for the human readable texts, the system can handle things better and provide much better UI, etc. My question is what is worth (use cases) to define lang: tag if it is optional. Regards, Ryu From: cti@lists.oasis-open.org [ mailt o:cti@lists.oasis-open.org ] On Behalf Of Back, Greg Sent: Friday, February 24, 2017 11:05 PM To: Jason Keirstead; Allan Thomson Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org Subject: Re: [cti] Internationalization: lang field required or optional? I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional. Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption).  IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information. It’s certainly we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level. As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. 
All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Any program that uses python-stix2 needs to therefore require the user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic. Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly that “unspecified” is OK for machine-to-machine communication that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date. Greg From: < cti@lists.oasis-open.org > on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com > Date: Friday, February 24, 2017 at 7:15 AM To: Allan Thomson < athomson@lookingglasscyber. com > Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: Re: [cti] Internationalization: lang field required or optional? I also agree with Alan and John in the preference to make this optional. In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully, the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information. 
There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined" - which many products would have to mark content as since they may not know the producer of the content's native language.  There is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry, they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC5646decision. - Jason Keirstead STSM, Product Architect, Security Intelligence, IBM Security Systems www.ibm.com/security www.securityintelligence.com Without data, all you are is just another person with an opinion - Unknown From:         Allan Thomson < athomson@lookingglasscyber. com > To:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Date:         02/23/2017 07:01 PM Subject:         Re: [cti] Internationalization: lang field required or optional? Sent by:         < cti@lists.oasis-open.org > If you are expecting to use different language content then its required for interoperability reasons. But by marking it required in the spec means that all content must have it even when most content is not multi-language. I generally would prefer more tolerance in the spec level and let the products/market use good behavior to drive what fields are included or not. If people care about language and multi-language support then they will use it. If they don’t then they wont be interoperable as that will be part of the test in the interop spec. allan From: Bret Jordan < Bret_Jordan@symantec.com > Date: Thursday, February 23, 2017 at 2:04 PM To: Allan Thomson < athomson@lookingglasscyber. com >, "Wunder, John" < jwunder@mitre.org >, " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: Re: [cti] Internationalization: lang field required or optional? My thoughts.... 
1) In reality we are talking about a feature not a property.   2) If it is property of this feature is optional, then the only products that will implement this feature, are those that care about internationalization. 3) If it is required, then everyone will be forced to implement it. Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it. Problems with Required: a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en" b) product or tool does provide a UX for it, but analyst does not care and it just remains what ever the default is. Problems with Optional: a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.   b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank. c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing. I personally do not see the harm in requiring tools to support and populate the Lang tag.  In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".  Then at least as the consumer you would know that the producer did not know the language.  Versus getting an object where the language tag is omitted and you do not know if: i) they did not know the language ii) there tool did not support it iii) they were just lazy and did not add it. Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time. Bret From: cti@lists.oasis-open.org < cti@ lists.oasis-open.org > on behalf of Allan Thomson < athomson@lookingglasscyber. 
com > Sent: Thursday, February 23, 2017 2:29:59 PM To: Wunder, John A.; cti@lists.oasis-open.org Subject: Re: [cti] Internationalization: lang field required or optional? Prefer optional. From: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > on behalf of "Wunder, John" < jwunder@mitre.org > Date: Thursday, February 23, 2017 at 12:59 PM To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org > Subject: [cti] Internationalization: lang field required or optional? Hey everyone, We’re getting very close to having a completed approach for internationalization, you can see the full writeup here: https://docs.google.com/ document/d/15qD9KBQcVcY4FlG9n_ VGhqacaeiLlNcQ7zVEjc8I3b4/ edit#heading=h.61fy0hlsdirz We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field required or optional. If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not. Here are some thoughts: Making it required: -           All SDOs and SROs would have a language tag, so consumers could depend on it being there -           It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise -           It shows we have a commitment to internationalization Making it optional: -           Any SRO or SDO could have a language tag, so consumers could not depend on it -           Producers would not have to create it -           We do have a SHOULD requirement saying that it should be included My opinion is that we should make it optional. 
If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional, I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that. Anyway, what does everyone think…required or optional? John This e-mail and any attached files are only for the use of its intended recipient(s). Its contents are confidential and may be privileged. Fujitsu does not guarantee that this e-mail has not been intercepted and amended or that it is virus free. If you have received this e-mail and are not the intended recipient, please contact the sender by e-mail and destroy all copies of this e-mail and any attachments. / Le présent courriel, ainsi que ses pièces jointes, ne peut être utilisé que par le ou les destinataires auxquels il a été transmis. Les renseignements qu'il contient sont confidentiels, voire même protégés. Fujitsu ne peut garantir que ce courriel n'a pas été intercepté ou modifié, ou qu'il ne contient aucun virus. Si vous avez reçu ce courriel sans en être le destinataire prévu, veuillez communiquer par courriel avec son expéditeur et en détruire toutes les copies et pièces jointes.


  • 22.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-05-2017 23:59




    +1.
     

    From:
    "cti@lists.oasis-open.org" <cti@lists.oasis-open.org> on behalf of Jason Keirstead <Jason.Keirstead@ca.ibm.com>
    Date: Wednesday, March 1, 2017 at 9:21 AM
    To: "Derek.Northrope@ca.fujitsu.com" <Derek.Northrope@ca.fujitsu.com>
    Cc: "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>, "masuoka.ryusuke@jp.fujitsu.com" <masuoka.ryusuke@jp.fujitsu.com>
    Subject: RE: [cti] Internationalization: lang field required or optional?


     

    For what it is worth - my argument against this is not coming from a US-centric point of view... not only am I not from the US, but our products have to support ~ 15 different
    languages at a minimum as they are used globally.

    Everyone needs to be on the same page in this discussion - and I feel very strongly that we are talking past each other with the terms "use" and "optional". Some key points to be clear on:


    a) There is a very large difference between an individual "using" something, and a piece of software implementing support for it. As this is a discussion around a data interchange specification,
    our primary concern is surrounding *software implementation*, not individual users.

    b) There is indisputably going to be a very large amount of CTI data for which no language can be inferred. This data may be machine generated, or net-new data input by tools that do not know the
    language of their immediate users, or the *many billions* of pieces of existing CTI data being migrated to STIX 2. Does anyone actually dispute this? I don't think anyone is disputing it, but if anyone is, then we should open this thread up.

    Therefore - when we say that we want this field to be "optional" - we are not referring to if the *specification of a language* is allowed to be optional at all... point (b) above proves that we
    *must* support, via some mechanism, a way to allow a language that is unspecified.


    What we are discussing when we say we want a field to be "optional" is simply the mechanism by which a piece of software should indicate that a language is unknown. Should that be by
    simply omitting the field outright, or should it be lang="und"? We are arguing about those 10 bytes on each object - nothing more. We are not arguing capability at all - all are agreed on the need for the capability.

    The main argument folks seem to be making for having this field be mandatory on each object is an assumption that if the field is optional, it won't be "used". Again - when a statement like
    this is made, what people are saying is "if the field is optional, then software vendors will not implement support in their products to set the field". I would argue very strongly against this, because it is false in my opinion. Software vendors will do what
    the market compels them to - and for any software vendor that targets multiple markets (i.e., most major ones) - this includes internationalization.


    In our own case, we will most certainly implement support in our products to set this field, because I need to support ~15 languages. However, just because I will add support for this field does
    not mean I support making it mandatory, because I am going to have to transmit millions of pieces of data for which no language can be inferred, and I do not want to transmit all of these GB for no real reason, when having an empty JSON field conveys the exact
    same information.
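
    To put a rough number on the bytes being argued over, here is a minimal sketch that measures how much a "lang" property adds to one serialized object. It assumes compact JSON serialization and uses "und" as the unknown-language value discussed in this thread; the helper name lang_overhead_bytes and the placeholder indicator are illustrative only and come from neither the STIX specification nor any library.

```python
import json

def lang_overhead_bytes(obj: dict, tag: str = "und") -> int:
    """Return how many extra UTF-8 bytes a "lang" property adds to obj.

    Assumes compact JSON (no spaces after separators); hypothetical helper.
    """
    compact = {"separators": (",", ":")}
    without = json.dumps(obj, **compact).encode("utf-8")
    with_lang = json.dumps({**obj, "lang": tag}, **compact).encode("utf-8")
    return len(with_lang) - len(without)

# Placeholder object for illustration; the id is not a real identifier.
indicator = {
    "type": "indicator",
    "id": "indicator--00000000-0000-4000-8000-000000000000",
    "pattern": "[ipv4-addr:value = '198.51.100.1']",
}

per_object = lang_overhead_bytes(indicator)
print(per_object)              # 13 bytes for ,"lang":"und"
print(per_object * 1_000_000)  # 13000000 bytes across a million objects
```

    Under these assumptions the cost is 13 bytes per object, in the same range as the 10 to 12 bytes quoted in the thread; pretty-printed JSON would add a little more.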

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




    From:         "Derek.Northrope@ca.fujitsu.com" <Derek.Northrope@ca.fujitsu.com>
    To:         "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>
    Cc:         "Masuoka, Ryusuke" <masuoka.ryusuke@jp.fujitsu.com>
    Date:         03/01/2017 10:53 AM
    Subject:         RE:  [cti] Internationalization: lang field required or optional?
    Sent by:         <cti@lists.oasis-open.org>






    >> The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.
     
    > This is my biggest concern here.  Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case.  Our goal is to democratize the sharing of intelligence content and future proof the specs as best as we can.
     
    I couldn’t agree more. I can’t count the number of times a short-term, dare I say it US-centric, decision has had serious consequences for the long-term internationalization of the data sharing
    I have worked on.
     
    ·  All dates of birth must be complete dates
       o  Workaround – If the complete date is unknown then a date must be chosen just to conform to specs. Most often 1-1 is chosen but not always.
       o  Outcome – Hard to match identities. May have consequences on whether someone is treated as a minor or adult in legal terms.
    ·  Everyone must have a first and last name
       o  Workaround – If they only have one name then sometimes ‘UNKNOWN’ is used or ‘BLANK’ or ‘No First Name’, or ‘No Last Name’ or ‘NFN’ or … or …
       o  Outcome – It is impossible to filter or search as there is no consistency.
    ·  Back end systems only accept 7-bit ASCII but encoding format is removed from the message format to ‘save bytes flying over the wire’ (sound familiar here?)
       o  Workaround – No workaround. By law other countries have to record and send a person’s true name with accents (e.g. René, Jesús). Some words actually change meaning without the accent.
       o  Outcome – You can send accented characters to each other but please don’t send to the US or it might break our system…..
     
    My background isn’t CTI, it’s Biometrics and Identity, but this here is the reason I joined this group and why this is my first comment.
     
    Regards,
    Derek Northrope
    Head of Biometrics
     
    Associate Director
    Enterprise and Cyber Security
    Fujitsu Americas, Inc.
     
    Cell:        +1 613 410-3532
    E-mail:      derek.northrope@ca.fujitsu.com
    LinkedIn:   https://ca.linkedin.com/in/dereknorthrope
     
     
    From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Coderre, Robert
    Sent: Wednesday, March 01, 2017 8:45 AM
    To: Jason.Keirstead@ca.ibm.com; Bret_Jordan@symantec.com
    Cc: athomson@lookingglasscyber.com; cti@lists.oasis-open.org; gback@mitre.org; GCREEDON@lmi.org; jwunder@mitre.org; Masuoka, Ryusuke
    Subject: RE: [cti] Internationalization: lang field required or optional?
     
    > The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.
     
    This is my biggest concern here.  Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case.  Our goal is to democratize the sharing of
    intelligence content and future proof the specs as best as we can.
     
    > If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having
    to add 12 superfluous bytes to every single STIX object.
     
    I can get behind this idea.  We could add a required ‘default-lang’ field to the STIX Bundle which could indicate language used for all SDO in the Bundle.  If there is a deviation, we add an optional ‘lang’ field
    to the SDO to capture that.  Normative text should include that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.
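
    A minimal sketch of how a consumer might resolve the language under this proposal: an object-level ‘lang’ wins, otherwise the Bundle default applies. The ‘default-lang’ property name comes from the suggestion above and is not taken from a published STIX specification; the function name effective_lang and the sample objects are hypothetical.

```python
from typing import Optional

def effective_lang(obj: dict, bundle: dict) -> Optional[str]:
    """Object-level "lang" overrides the bundle-wide default (hypothetical helper)."""
    return obj.get("lang") or bundle.get("default-lang")

bundle = {
    "type": "bundle",
    "default-lang": "en",  # hypothetical property from the proposal above
    "objects": [
        {"type": "indicator", "description": "Known C2 address"},
        {"type": "indicator", "description": "Adresse C2 connue", "lang": "fr"},
    ],
}

langs = [effective_lang(o, bundle) for o in bundle["objects"]]
print(langs)  # ['en', 'fr']
```

    One consequence of this design is that an object copied out of its Bundle loses its language unless the default is written onto it first.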
     
    From:
    cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Jason Keirstead
    Sent: Wednesday, March 01, 2017 7:44 AM
    To: Bret Jordan
    Cc: Allan Thomson; cti@lists.oasis-open.org ; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder,
    John A.; Masuoka, Ryusuke
    Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?
     
    Perhaps we could lobby to get it added to the IANA registry?


    I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

    > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design.  If we make it optional, then it is highly unlikely that it will take off en masse.
    > You will have a few groups here and there that will do it, but the rest will just ignore it.


    The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having to add 12 superfluous bytes to every single STIX
    object.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         Bret Jordan < Bret_Jordan@symantec.com >
    To:         "Wunder, John A." < jwunder@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc:         Allan Thomson < athomson@lookingglasscyber.com >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >, "Back, Greg" < gback@mitre.org >,
    "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Date:         02/28/2017 10:26 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?








    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design.  If we make it optional, then it is highly unlikely that it will take off en masse.  You
    will have a few groups here and there that will do it, but the rest will just ignore it.


    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en".  But how is that different or worse than what we have today?  Further, I think we can use the RFC but say, in the event that you do not know the language, you
    MUST use "und".  No need to go to the ISO version.  We can use these RFCs however we want.

    Bret





    From: Wunder, John A. < jwunder@mitre.org >
    Sent: Tuesday, February 28, 2017 1:02:20 PM
    To: Jason Keirstead; CREEDON, Gus
    Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org ; Back, Greg; Masuoka, Ryusuke
    Subject: Re: [cti] Internationalization: lang field required or optional?

    We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t set it to some useful value, they’ll just make it unknown.

    FWIW I tried to collect where we stand across Slack and e-mail:
    Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense)
    Required: Bret (Symantec), Ryu (Hitachi), Rob (iDefense), Gus (LMI), Trey (Kingfisher)

    In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.”

    In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much more than “optional”…it’s you should do this unless you have a good reason
    not to. So with the field “optional”, the validator would still throw a warning if it’s missing.
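
    The warning-versus-error behaviour described here can be illustrated with a toy check. This is not the STIX 2 validator's code or API; check_lang and its strict parameter are hypothetical stand-ins for a SHOULD rule and a strict mode.

```python
from typing import List, Tuple

def check_lang(obj: dict, strict: bool = False) -> List[Tuple[str, str]]:
    """Return (severity, message) findings for a missing "lang" property.

    A SHOULD rule yields a warning by default and an error in strict mode.
    Hypothetical sketch, not the real validator.
    """
    if "lang" in obj:
        return []
    severity = "error" if strict else "warning"
    kind = obj.get("type", "object")
    return [(severity, f"{kind}: 'lang' SHOULD be present when the language is known")]

findings = check_lang({"type": "indicator"})
strict_findings = check_lang({"type": "indicator"}, strict=True)
print(findings[0][0])         # warning
print(strict_findings[0][0])  # error
```

    Either way the object still parses; only the severity of the report changes, which is the sense in which "optional" here is stronger than it sounds.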

    Anybody else want to chime in on this?

    John

    From: Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Tuesday, February 28, 2017 at 12:21 PM
    To: "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc: Allan Thomson < athomson@lookingglasscyber.com >, "Bret Jordan (CS)" < Bret_Jordan@symantec.com >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >,
    Greg Back < gback@mitre.org >, John Wunder < jwunder@mitre.org >,
    "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Subject: RE: [cti] Internationalization: lang field required or optional?

    It is not going to be hard at all coming up with use cases that generate optional conditions. In fact, I suspect this is going to be the majority (which is why it should be optional).

    Example - I have a cloud-based TIP that is heavily UK based, and thus my user accounts do not prompt them to specify their language (the product UI is English only). Someone enters an indicator into the platform and shares it out. I have *NO IDEA* what language
    that Indicator description is written in... you may *assume* it is English, but I really do not know, because I didn't ask the user... maybe they typed in French or Spanish, who knows. When I share that Indicator out, the language field *should not* be "en"
    or "en-GB", because I have no idea what the language actually is - it should be empty, or "undefined". But as I pointed out the other day, unfortunately the IETF has not decided to adopt the "und" language code from ISO! So if we don't make the field optional,
    then we can't use RFC 5646 anymore and have to switch to ISO 639-X.
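
    The producer-side choice in this example can be sketched as follows: the platform labels content only when it actually captured the user's language, and otherwise either omits the property or, if the field were required, emits an explicit unknown marker. The function build_indicator, its lang_required switch, and the use of "und" as that marker are assumptions made for illustration, not settled behaviour.

```python
from typing import Optional

def build_indicator(description: str, pattern: str,
                    user_lang: Optional[str] = None,
                    lang_required: bool = False) -> dict:
    """Build an indicator-like dict without guessing the language (hypothetical)."""
    obj = {"type": "indicator", "description": description, "pattern": pattern}
    if user_lang:
        obj["lang"] = user_lang  # the platform actually knows the language
    elif lang_required:
        obj["lang"] = "und"      # explicit "unknown" marker, as debated above
    return obj

optional_style = build_indicator("suspicious host", "[ipv4-addr:value = '198.51.100.1']")
required_style = build_indicator("suspicious host", "[ipv4-addr:value = '198.51.100.1']",
                                 lang_required=True)
print("lang" in optional_style)  # False
print(required_style["lang"])    # und
```

    In neither branch does the producer default to "en", which is the mislabeling both sides of the thread want to avoid.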


    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         "CREEDON, Gus" < GCREEDON@lmi.org >
    To:         "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >, "Back, Greg" < gback@mitre.org >,
    Jason Keirstead/CanEast/IBM@IBMCA, "Allan Thomson" < athomson@lookingglasscyber.com >
    Cc:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/28/2017 10:58 AM
    Subject:         RE: [cti] Internationalization: lang field required or optional?









    Greetings,

    If we do not make it REQUIRED, then we may be looking at a lot of work coming up with use cases that generate OPTIONAL conditions.
    The terms identified in RFC 2119 allow for conditions.
    Parsing a sentence from STIX 2.0, 3.4 Versioning, we do assign a condition to the ‘MUST instead create’ phrase:

    “If a producer other than the object creator wishes to create a new version, they
    MUST instead create a new object with a new id.”

    So let’s say we go with
    OPTIONAL … MUST/SHALL…

    These are somewhat convoluted but:
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer intends to enable consumers to accelerate the language identification process.
    OPTIONAL – the lang: field MUST/SHALL be used in the STIX message if the producer broadcasts to consumers who reside across a sovereign border.

    Ryu asks if it’s worth spending time defining use cases?
    If we don’t intend to make lang: REQUIRED, then we need to develop conditions to satisfy the business/use case and express them in the object field.
    Again, that could turn into a lot of work and overly complicate the tool developer’s UI if they want to Q&A their way through the options with the user.

    IMHO, tool providers can easily accommodate this field in their UI and in the interchange.

    How tool providers enhance their user experience is not the CTI TC’s concern.
    I believe, “REQUIRED – MUST be filled in with a valid code”, is the better choice.

    Gus

    Gus Creedon

    7940 Jones Branch Drive, Tysons, VA 22102
    Office: (703)917-7272     Cell: (571)335-6899





    From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Masuoka, Ryusuke
    Sent: Monday, February 27, 2017 3:21 AM
    To: Back, Greg < gback@mitre.org >; Jason Keirstead < Jason.Keirstead@ca.ibm.com >;
    Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >; Wunder, John A. < jwunder@mitre.org >;
    cti@lists.oasis-open.org
    Subject: [EXTERNAL] RE: [cti] Internationalization: lang field required or optional?

    Hi,

    I think the differences are in use cases in each one’s mind.

    Human-readable text is for humans to consume, but
    the “lang:” tag is for the system to produce/consume.
    This is an on-the-wire/between-systems requirement/optionality.
    With the system knowing the language code for the human-readable
    text, the system can handle things better and provide a much better UI, etc.

    My question is: what use cases make it worthwhile to define the lang: tag if it is optional?

    Regards,

    Ryu

    From: cti@lists.oasis-open.org [ mailto:cti@lists.oasis-open.org ]
    On Behalf Of Back, Greg
    Sent: Friday, February 24, 2017 11:05 PM
    To: Jason Keirstead; Allan Thomson
    Cc: Bret Jordan; Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    I originally didn’t feel strongly either way, but I’m coming around to feeling pretty strongly it should be optional.

    Language is necessary only for human consumption (vs. encoding, which is necessary for machine consumption). IMO, fields should only be required if leaving them off makes effective CTI sharing difficult, and I don’t (yet) think this is true for language information.
    It’s certainly something we can specify in conformance levels or interoperability profiles, but I feel it would be a mistake to require it at the spec level.


    As I’ve been working on python-stix2, creating an Indicator only requires “labels” and “pattern”. All other required fields (type, id, created, modified, valid_from) can be reasonably inferred. Language cannot be inferred that way, so if “lang” were required, any program that uses python-stix2 would need to require the
    user to enter that information, or make an assumption on their behalf. Getting the “current user’s” language works fine on personal machines, but on a server that many people use (for example, via a web service), it’s problematic.

    Also, a field doesn’t need to be required if we define how consumers should behave when it’s missing; in this case, saying that the language is “undefined” or “unspecified” is likely OK, particularly since “unspecified” is fine for machine-to-machine communication
    that doesn’t involve humans. This is the reason I’ve always felt “modified” should be optional; IMO it’s perfectly reasonable to mandate that, if not explicitly specified in JSON, consumers MUST assume it was last modified at the “created” date.
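
    The consumer-side defaulting described above can be sketched in a few lines. This is only an illustration under stated assumptions: the helper names (effective_lang, effective_modified) and the UNSPECIFIED sentinel are hypothetical, objects are treated as plain dicts, and nothing here is python-stix2 API. Note that “modified” is a required property in the spec as it stands; the fallback only shows what the suggested rule would look like.

```python
# Hypothetical consumer-side defaults for properties that a producer left out.
# Objects are plain dicts parsed from STIX JSON; none of these names come from
# the STIX specification or from python-stix2.

UNSPECIFIED = None  # assumed sentinel meaning "producer did not state a language"


def effective_lang(obj):
    """Return the object's language tag, or UNSPECIFIED when it is absent."""
    return obj.get("lang", UNSPECIFIED)


def effective_modified(obj):
    """Return "modified" if present, otherwise fall back to "created"."""
    return obj.get("modified", obj["created"])


indicator = {
    "type": "indicator",
    "created": "2017-02-24T12:00:00.000Z",
    "description": "Texte saisi par un utilisateur",
}

print(effective_lang(indicator))      # no "lang" key, so UNSPECIFIED
print(effective_modified(indicator))  # falls back to the "created" value
```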

    Greg

    From: < cti@lists.oasis-open.org > on behalf of Jason Keirstead < Jason.Keirstead@ca.ibm.com >
    Date: Friday, February 24, 2017 at 7:15 AM
    To: Allan Thomson < athomson@lookingglasscyber.com >
    Cc: Bret Jordan < Bret_Jordan@symantec.com >, John Wunder < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    I also agree with Allan and John in the preference to make this optional.

    In general I do not like sending bytes when bytes are not required in a data interchange format, especially when considering the scale of data we will be dealing with in STIX/TAXII. We should be looking for opportunities to keep the data format trim. Truthfully,
    the vast majority of data in an ecosystem will all be the same language, and thus having to transmit a language tag for every single object in a package is redundant information.

    There is also another issue with making it "required", and that is that we would then have to support "unknown" or "undefined", which many products would have to mark content as, since they may not know the native language of the content's producer. There
    is an ISO 639 language tag for "undefined", but there is no IETF tag for "undefined" in the IANA registry; they never adopted the ISO entry. So making this mandatory may force a revisit of the RFC 5646 decision.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown





    From:         Allan Thomson < athomson@lookingglasscyber.com >
    To:         Bret Jordan < Bret_Jordan@symantec.com >, "Wunder, John A." < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Date:         02/23/2017 07:01 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?
    Sent by:         < cti@lists.oasis-open.org >










    If you are expecting to use different language content then it’s required for interoperability reasons.

    But marking it required in the spec means that all content must have it even when most content is not multi-language.


    I generally would prefer more tolerance at the spec level and let the products/market use good behavior to drive what fields are included or not.

    If people care about language and multi-language support then they will use it. If they don’t then they won’t be interoperable as that will be part of the test in the interop spec.

    allan

    From: Bret Jordan < Bret_Jordan@symantec.com >
    Date: Thursday, February 23, 2017 at 2:04 PM
    To: Allan Thomson < athomson@lookingglasscyber.com >, "Wunder, John" < jwunder@mitre.org >,
    " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: Re: [cti] Internationalization: lang field required or optional?

    My thoughts....

    1) In reality we are talking about a feature, not a property.
    2) If the property for this feature is optional, then the only products that will implement this feature are those that care about internationalization.
    3) If it is required, then everyone will be forced to implement it.

    Personally I see this as a data quality issue, not a STIX issue.  And I think both sides can suffer from it.


    Problems with Required:
    a) product or tool does not care, does not provide a UX for it, and just hard codes it to something, say "en"
    b) product or tool does provide a UX for it, but analyst does not care and it just remains whatever the default is.

    Problems with Optional:
    a) product or tool does not care, does not provide a UX for it, and just leaves it out of the data.  So it is undef.  
    b) product or tool does care and provides a UX for it and the analyst does not care and leaves it blank.
    c) Broker product or tool takes in data that has a lang tag, but they do not support that feature so they never implemented it.  So when the data goes back out the other side, the language tag is now missing.


    I personally do not see the harm in requiring tools to support and populate the lang tag. In the spec we can define an "unknown" value, so if you are doing bulk loading of data and you honestly do not know the language, you could just flag it as "unknown".
    Then at least as the consumer you would know that the producer did not know the language, versus getting an object where the language tag is omitted and you do not know if:
    i) they did not know the language
    ii) their tool did not support it
    iii) they were just lazy and did not add it.

    Once again, this is a data quality problem and if we make the lang field required, then it is a SUPER EASY interop test to see if they do it right.  If it is optional, then you are just at a guess all the time.

    Bret







    From: cti@lists.oasis-open.org < cti@lists.oasis-open.org >
    on behalf of Allan Thomson < athomson@lookingglasscyber.com >
    Sent: Thursday, February 23, 2017 2:29:59 PM
    To: Wunder, John A.; cti@lists.oasis-open.org
    Subject: Re: [cti] Internationalization: lang field required or optional?

    Prefer optional.

    From: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    on behalf of "Wunder, John" < jwunder@mitre.org >
    Date: Thursday, February 23, 2017 at 12:59 PM
    To: " cti@lists.oasis-open.org " < cti@lists.oasis-open.org >
    Subject: [cti] Internationalization: lang field required or optional?

    Hey everyone,

    We’re getting very close to having a completed approach for internationalization, you can see the full writeup here:
    https://docs.google.com/document/d/15qD9KBQcVcY4FlG9n_VGhqacaeiLlNcQ7zVEjc8I3b4/edit#heading=h.61fy0hlsdirz

    We do have one remaining question before we can move forward though. As part of the proposal, every single top-level object has a “lang” field, that identifies the language of the text content in that object. What we need to decide is whether we make that field
    required or optional.

    If we make the field required, every top-level object in STIX (SDOs and SROs) would have to have a “lang” field in it or it would be invalid STIX. If we make it optional, producers could either include the field or not.

    Here are some thoughts:

    Making it required:



    -           All SDOs and SROs would have a language tag, so consumers could depend on it being there
    -           It would encourage producers to actually fill it out, because they wouldn’t be creating valid STIX otherwise
    -           It shows we have a commitment to internationalization

    Making it optional:

    -           Any SRO or SDO could have a language tag, so consumers could not depend on it
    -           Producers would not have to create it
    -           We do have a SHOULD requirement saying that it should be included

    My opinion is that we should make it optional. If it’s required, I think people who don’t want to do internationalization (especially those creating one-off scripts or open source tools) will hardcode it to English and things will be mislabeled. If it’s optional,
    I think those who need/want to support internationalization and would do it right (most/all vendors, major open source projects) will populate it correctly regardless…because they need it…while those who couldn’t be bothered will be able to leave it off and
    we won’t have mis-labeled data. Also it’s almost not worth saying, but we already have a bunch of required fields on every SDO/SRO and I’ve already had one conversation with someone who said there’s a lot of bloat…would like to avoid adding to that.

    Anyway, what does everyone think…required or optional?

    John

















  • 23.  Re: [cti] Internationalization: lang field required or optional?

    Posted 03-01-2017 15:36




    I’d definitely support having it at the TAXII feed level. Could just use the standard HTTP Content-Language header that way.
     
    I don’t really feel good about having it in bundle. We’ve tried pretty hard to make sure bundle remains “throwaway”. If it came down to that I would prefer just duplicating it on every
    object to maintain that pattern. Sorry :(

     
    John
     

    From:
    "Coderre, Robert" <rcoderre@verisign.com>
    Date: Wednesday, March 1, 2017 at 8:44 AM
    To: "Jason.Keirstead@ca.ibm.com" <Jason.Keirstead@ca.ibm.com>, "Bret Jordan (CS)" <Bret_Jordan@symantec.com>
    Cc: "athomson@lookingglasscyber.com" <athomson@lookingglasscyber.com>, "cti@lists.oasis-open.org" <cti@lists.oasis-open.org>, Greg Back <gback@mitre.org>, "GCREEDON@lmi.org" <GCREEDON@lmi.org>, John Wunder <jwunder@mitre.org>, "masuoka.ryusuke@jp.fujitsu.com"
    <masuoka.ryusuke@jp.fujitsu.com>
    Subject: RE: [cti] Internationalization: lang field required or optional?


     

    > The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.
     
    This is my biggest concern here.  Just because today the majority of use cases are single language (usually English) we shouldn’t assume that will always be the case.  Our goal is to democratize
    the sharing of intelligence content and future proof the specs as best as we can.
     
    > If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my
    ecosystem without having to add 12 superfluous bytes to every single STIX object.
     
    I can get behind this idea. We could add a required ‘default-lang’ field to the STIX Bundle which could indicate the language used for all SDOs in the Bundle. If there is a deviation, we add
    an optional ‘lang’ field to the SDO to capture that. Normative text should state that any STIX communication MUST have a language defined, either at the Bundle level or at the individual object level as needed.
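
    The lookup order in this proposal (object-level ‘lang’ first, then the Bundle default) could look roughly like the sketch below. It assumes plain dicts and uses the ‘default-lang’ name floated in this message; resolve_lang is a hypothetical helper, and none of this is part of a published STIX specification.

```python
# Sketch of the proposed lookup order: the object's own "lang" wins, otherwise
# the bundle-level default applies. "default-lang" is the property name floated
# in this thread, not part of any published STIX specification.


def resolve_lang(bundle, obj):
    """Return the language that applies to obj inside bundle."""
    if "lang" in obj:
        return obj["lang"]
    if "default-lang" in bundle:
        return bundle["default-lang"]
    # Under the proposed normative text this case would be invalid.
    raise ValueError("no language defined at bundle or object level")


bundle = {
    "type": "bundle",
    "default-lang": "en",
    "objects": [
        {"type": "indicator", "description": "Known C2 address"},
        {"type": "indicator", "lang": "fr", "description": "Adresse C2 connue"},
    ],
}

langs = [resolve_lang(bundle, o) for o in bundle["objects"]]
print(langs)  # ['en', 'fr']
```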
     
    From: cti@lists.oasis-open.org [mailto:cti@lists.oasis-open.org]
    On Behalf Of Jason Keirstead
    Sent: Wednesday, March 01, 2017 7:44 AM
    To: Bret Jordan
    Cc: Allan Thomson; cti@lists.oasis-open.org; Back, Greg; CREEDON, Gus; Jason Keirstead; Wunder, John A.; Masuoka, Ryusuke
    Subject: [EXTERNAL] Re: [cti] Internationalization: lang field required or optional?
     
    Perhaps we could lobby to get it added to the IANA registry?


    I do not actually understand why it is defined in ISO 639, but not in the IANA registry.

    > In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design. If we make it optional, then it is highly unlikely
    that it will take off en masse. You will have a few groups here and there that will do it, but the rest will just ignore it.


    The problem is, this tag is not actually required in the majority of use cases - it is just bytes flying over the wire for no reason. I really dislike that.

    If we are going to go down this path, then I would propose it be done at the STIX package level at least, or perhaps the TAXII level... somewhere I can define a "default language" for my ecosystem without having
    to add 12 superfluous bytes to every single STIX object.

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown




    From:         Bret Jordan < Bret_Jordan@symantec.com >
    To:         "Wunder, John A." < jwunder@mitre.org >, Jason Keirstead/CanEast/IBM@IBMCA, "CREEDON, Gus" < GCREEDON@lmi.org >
    Cc:         Allan Thomson < athomson@lookingglasscyber.com >, " cti@lists.oasis-open.org "
    < cti@lists.oasis-open.org >, "Back, Greg" < gback@mitre.org >, "Masuoka, Ryusuke" < masuoka.ryusuke@jp.fujitsu.com >
    Date:         02/28/2017 10:26 PM
    Subject:         Re: [cti] Internationalization: lang field required or optional?






    In an effort to help defend Ryu's use cases, if we make it required, then over time it is more likely that UIs will start to incorporate the language tag into their design. If we make it optional, then it is highly unlikely
    that it will take off en masse. You will have a few groups here and there that will do it, but the rest will just ignore it.


    If we make it required, then we might get some people doing stupid stuff and defaulting it to "en". But how is that different or worse than what we have today? Further, I think we can use the RFC but say, in the event that
    you do not know the language, you MUST use "und". No need to go to the ISO version. We can use these RFCs however we want.

    Bret
     




    From: Wunder, John A. < jwunder@mitre.org >
    Sent: Tuesday, February 28, 2017 1:02:20 PM
    To: Jason Keirstead; CREEDON, Gus
    Cc: Allan Thomson; Bret Jordan; cti@lists.oasis-open.org ; Back, Greg; Masuoka, Ryusuke
    Subject: Re: [cti] Internationalization: lang field required or optional?
     
    We could always introduce our own “unknown” value, but that feels identical to making it optional except we have more bytes on the wire -- the same people who would have left it off won’t set it to some useful
    value, they’ll just make it unknown.
     
    FWIW I tried to collect where we stand across Slack and e-mail:
    Optional: myself (MITRE), Jason (IBM), Allan (LookingGlass), Greg Back (MITRE), Wouter (eclecticIQ), Alexandre (MISP), JMG (NewContext), Lauri (cyberdefense)
    Required: Bret (Symantec), Ryu (Fujitsu), Rob (iDefense), Gus (LMI), Trey (Kingfisher)
     
    In terms of normative statements, if we do end up keeping it optional, perhaps we could strengthen it to: “The lang property SHOULD be present when the language of the content is known.”
     
    In the STIX 2 validator, SHOULD statements will throw a warning, plus there’s a --strict flag you can pass to change those warnings into errors. So if we add this, I feel like it’s much more than “optional”…it’s
    you should do this unless you have a good reason not to. So with the field “optional”, the validator would still throw a warning if it’s missing.
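
    As a rough picture of that warning-versus-error behaviour, here is a toy check. It is not the STIX 2 validator's code or API; check_lang and its return shape are invented for illustration, and only the idea of a strict mode comes from the paragraph above.

```python
# Toy check that mimics the behaviour described above: a missing "lang" is a
# warning by default and an error in strict mode. This is not the STIX 2
# validator's code or API; check_lang and its return shape are made up here.


def check_lang(obj, strict=False):
    """Return a list of (severity, message) findings for one STIX object."""
    findings = []
    if "lang" not in obj:
        severity = "error" if strict else "warning"
        findings.append((severity, "lang SHOULD be present when the language is known"))
    return findings


sdo = {"type": "indicator", "description": "no language tag here"}

print(check_lang(sdo))               # one finding with severity "warning"
print(check_lang(sdo, strict=True))  # same finding, severity "error"
```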
     
    Anybody else want to chime in on this?
     
    John
     













  • 24.  RE: [cti] Internationalization: lang field required or optional?

    Posted 03-15-2017 21:29
    +1 for everything Allan said. I would prefer optional.