CTI STIX Subcommittee

  • 1.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 13:55
    Hi John – thanks for sending the options.

    Although I prefer Option 1, I think Option 2 is a reasonable middle ground for me, as it is easier to just map all strings to lowercase.

    allan

    On 6/8/16, 5:44 AM, "cti-stix@lists.oasis-open.org on behalf of Wunder, John A." <cti-stix@lists.oasis-open.org on behalf of jwunder@mitre.org> wrote:

    >This topic applies to all open and controlled vocabularies, not just kill chains, so I changed the subject line.
    >
    >To sum up, I’m hearing three options:
    >
    >1. Terms are defined as case-insensitive in the specification and implementations MUST treat Threat-Blah as == THREAT-BLAH
    >2. Terms are defined as case-sensitive in the specification, but values of that field MUST be lower-case (note: I don’t know what this means for non-Latin character sets, if anything. I assume there’s prior art we can use though)
    >3. Terms are defined as case-sensitive in the specification, we have a SHOULD requirement to follow our naming and design rules unless there is a good reason not to (i.e. the tool has existing values for that field it can’t or doesn’t want to change). This is how the spec is written now.
    >
    >Correct me if I’m wrong, but here are the opinions I’ve heard:
    >
    >Allan prefers #1.
    >Bret prefers #2.
    >Myself, Jason, and JMG prefer #3
    >
    >Anybody else want to weigh in?
    >
    >John
    >
    >On 6/8/16, 1:31 AM, "Jordan, Bret" <bret.jordan@bluecoat.com> wrote:
    >
    >>I would greatly prefer that all vocabs are case sensitive and that they MUST be lower-case. That makes it very simple all the way around.
    >>
    >>Bret
    >>
    >>Sent from my Commodore 64
    >>
    >>> On Jun 8, 2016, at 1:41 AM, Allan Thomson <athomson@lookingglasscyber.com> wrote:
    >>>
    >>> I think we are discussing trade-offs that impact products creating or using STIX.
    >>>
    >>> I personally much prefer lower case for all terms but that’s not the point of deciding case sensitive or not.
    >>>
    >>> I think you should also consider the users of our products in this.
    >>>
    >>> A user will not know which case the STIX spec defined the terms in, and products that expose these terms in their UI will have to support case-insensitive searching/use.
    >>>
    >>> Users will just type what they think the term is without regard to uppercase, lowercase, camel-case … etc.
    >>>
    >>> By making terms case sensitive in the protocol exchange you are forcing products to know what exact case was used in the spec, and then products will have to know how to map from what users do to what the underlying protocol uses.
    >>>
    >>> For me, not having to care about case sensitivity when a user enters a term of an open vocab in all CAPS while the spec defined it in lowercase would be a good thing.
    >>>
    >>> I also think for open vocabs products will have to support the option to extend the vocab, and therefore unless you are careful you could end up with multiple versions of the same term just because the users entered the term using different cases.
    >>>
    >>> For example, all of the following are clearly the same term:
    >>>
    >>> THREAT-BLAH
    >>> Threat-Blah
    >>> threat-blah
    >>> threat-Blah
    >>> threat-BLAH
    >>>
    >>> … etc.
    >>>
    >>> Allan
    >>>
    >>>> On 6/7/16, 4:53 PM, "John-Mark Gurney" <jmg@newcontext.com> wrote:
    >>>>
    >>>> Jason Keirstead wrote this message on Tue, Jun 07, 2016 at 09:04 -0300:
    >>>>> I would vastly prefer that the standard declares that vocabularies are
    >>>>> case-sensitive. If vocabularies are case-insensitive it is a headache. Note
    >>>>> that I am *not* saying that I think that we should mandate that entries all
    >>>>> be lower-case - I am saying that we should mandate that the vocabulary is
    >>>>> case-sensitive and compares should be done that way.
    >>>>
    >>>> I agree... Trying to do case insensitive compares introduces complexities
    >>>> that case sensitive does not.. Simple ==/strcmp for most uses...
    >>>>
    >>>> --
    >>>> John-Mark
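
    For readers following along, the three options summarized above could be sketched roughly as follows in Python. This is only an illustration under stated assumptions: the function names and the sample vocabulary KNOWN_TERMS are hypothetical and do not come from the STIX specification.

```python
# Hypothetical sketch of the three options under discussion.
# KNOWN_TERMS is an illustrative vocabulary, not one defined by STIX.
KNOWN_TERMS = {"threat-blah"}


def matches_option1(value: str) -> bool:
    """Option 1: terms are case-insensitive, so compare after case folding."""
    return value.casefold() in {term.casefold() for term in KNOWN_TERMS}


def valid_option2(value: str) -> bool:
    """Option 2: compares are case-sensitive and values MUST be lower-case."""
    return value == value.lower()


def matches_option3(value: str) -> bool:
    """Option 3: plain case-sensitive compare; lower-case is only a SHOULD."""
    return value in KNOWN_TERMS


def normalize_user_input(text: str) -> str:
    """UI-side helper: map whatever the user typed to the lower-case form."""
    return text.strip().lower()
```

    Under options 2 and 3 the wire comparison stays a simple equality test, and any tolerance for what users type (Allan's concern) lives in a helper such as normalize_user_input at the product's UI layer.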


  • 2.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 14:35
    I can live with #2.

    On 6/8/16, 9:54 AM, "Allan Thomson" <athomson@lookingglasscyber.com> wrote:

    >Hi John – thanks for sending the options.
    >
    >Although I prefer Option 1, I think Option 2 is a reasonable middle ground for me as that is easier to just map all strings to lowercase.
    >
    >allan


  • 3.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 14:39
    I’m a fan of #3

    On 6/8/16, 10:34 AM, "cti-stix@lists.oasis-open.org on behalf of Wunder, John A." <cti-stix@lists.oasis-open.org on behalf of jwunder@mitre.org> wrote:

    >I can live with #2.


  • 4.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 15:33
    I think that makes us roughly 50:50, with a preference towards #2 given people’s fallback choices.

    I was curious how lower-casing works with non-latin characters and it seems doable, though naturally more complicated than you would hope: http://stackoverflow.com/questions/929079/unicode-lowercase-characters

    Other languages don’t really have case distinctions so the topic isn’t relevant to them. For normative requirement purposes we can probably identify an existing place where people specify upper-case Unicode characters and just prohibit them.

    John

    On 6/8/16, 10:38 AM, "cti-stix@lists.oasis-open.org on behalf of Paul Patrick" <cti-stix@lists.oasis-open.org on behalf of ppatrick@isightpartners.com> wrote:

    >I’m a fan of #3
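
    As a rough illustration of the "just prohibit upper-case Unicode characters" idea, a check could lean on the Unicode general categories exposed by Python's standard unicodedata module. The function name has_uppercase is hypothetical, and treating both Lu (uppercase letter) and Lt (titlecase letter) as prohibited is an assumption made for this sketch, not something the thread or the spec settles.

```python
import unicodedata


def has_uppercase(term: str) -> bool:
    """Return True if any character is an uppercase (Lu) or titlecase (Lt) letter.

    Characters from scripts without case distinctions fall into other
    categories (for example Lo) and are therefore always accepted.
    """
    return any(unicodedata.category(ch) in ("Lu", "Lt") for ch in term)
```

    A validator for option #2 could then reject any vocabulary value for which has_uppercase returns True, without needing locale-specific lower-casing rules.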


  • 5.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 16:45
    Case insensitivity can get extremely complicated with non-latin characters.

    The definitive example is Turkish - http://www.i18nguy.com/unicode/turkish-i18n.html

    -
    Jason Keirstead
    STSM, Product Architect, Security Intelligence, IBM Security Systems
    www.ibm.com/security
    www.securityintelligence.com

    Without data, all you are is just another person with an opinion - Unknown

    From: "Wunder, John A." <jwunder@mitre.org>
    To: "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>
    Date: 06/08/2016 12:33 PM
    Subject: Re: [cti-stix] Vocab case sensitivity in STIX
    Sent by: <cti-stix@lists.oasis-open.org>

    I think that makes us roughly 50:50, with a preference towards #2 given people’s fallback choices.

    I was curious how lower-casing works with non-latin characters and it seems doable, though naturally more complicated than you would hope: http://stackoverflow.com/questions/929079/unicode-lowercase-characters

    Other languages don’t really have case distinctions so the topic isn’t relevant to them. For normative requirement purposes we can probably identify an existing place where people specify upper-case Unicode characters and just prohibit them.

    John




  • 6.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 22:13
    Jason Keirstead wrote this message on Wed, Jun 08, 2016 at 13:44 -0300:
    > Case insensitivity can get extremely complicated with non-latin characters.
    >
    > The definitive example is Turkish -
    > http://www.i18nguy.com/unicode/turkish-i18n.html

    This is exactly why I support 3... If we support 2, we need to define
    either a limited character set (e.g. latin-1 only) with well-defined
    rules, or well-defined rules on case sensitivity for ALL Unicode
    characters, and be willing to break other languages like Turkish...

    The header on:
    http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt

    Helps... Also points out that some case transitions involve going
    from one code point to two...

    Hmm... I did find W3C's case folding page:
    https://www.w3.org/International/wiki/Case_folding

    So, anyone who has an opinion on this topic should read it, and then
    decide if they want to change their vote...

    More info on case mapping from Unicode:
    http://unicode.org/faq/casemap_charprop.html

    Another fun example from the Unicode page:
    "For example, while the default uppercase mapping of "a" is "A" and
    the default mapping of "à" is "À", the uppercase conversion of "Je
    vais à Paris" in some forms of French might be "JE VAIS A PARIS"
    Notice how the "à" is uppercased as "A" in this case."

    IMO, the spec should be 3, but we provide non-normative text on how
    organizations and vendor products should allow such input.. If all
    the tools follow the rules, then comparison is a non-issue...

    --
    John-Mark
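
    To make the points about Turkish and one-to-two code point transitions concrete, here is a small sketch using only Python's built-in, locale-independent str methods. The helper name equal_ignoring_case is hypothetical; the behaviour shown is that of Python's default Unicode case mappings, not of any rule the TC has adopted.

```python
# Default (locale-independent) Unicode case operations on Python's str type.

# One code point can become two: German sharp s folds to "ss".
sharp_s = "\u00df"                        # LATIN SMALL LETTER SHARP S
folded = sharp_s.casefold()

# Lower-casing capital I with dot above yields "i" plus a combining dot above.
dotted_capital_i = "\u0130"               # LATIN CAPITAL LETTER I WITH DOT ABOVE
lowered = dotted_capital_i.lower()

# Turkish dotless i does not survive an upper/lower round trip.
dotless_i = "\u0131"                      # LATIN SMALL LETTER DOTLESS I
round_trip = dotless_i.upper().lower()


def equal_ignoring_case(a: str, b: str) -> bool:
    """Case-insensitive compare using default Unicode case folding."""
    return a.casefold() == b.casefold()
```

    With the default folding, "THREAT-BLAH" and "threat-blah" compare equal, but dotless "ı" and "I" do not, even though a Turkish reader would treat them as the same letter. That mismatch is the kind of breakage the message above is warning about.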


  • 7.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 22:34
    One point for vocabs.... I thought we had decided that all controlled vocabularies would be defined in the standard as English, and that it was up to the local implementation to provide translations in other languages.
    If this is still the case, does this also apply to open vocabs? If this is the case then I'd go option #3 (fallback #2). Otherwise if we are still going English only then option #1 seems logical.
    Cheers
    Terry MacDonald


  • 8.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-09-2016 12:35




    I don’t think we should mandate that values from extended vocabularies (either other values in open vocabs, or extension values in controlled vocabs) be in English. Setting aside the difficulty of actually
    verifying that (either as a tool trying to produce valid content or as a validation program), it would mean that people doing STIX in other languages either need some ability to translate to English or can’t
    use extended vocab values at all, because they can’t produce English text.
     
    The values in vocabularies we define should all be in English. They’re pre-defined and tools can localize their interfaces with appropriate translations even in completely non-English ecosystems…they
    wouldn’t have that same ability for tool or user developed values.
     
    Let’s schedule this topic for the call on Tuesday. If we aren’t able to resolve it then, it should probably go to a vote.
     
    John
     

    From: <cti-stix@lists.oasis-open.org> on behalf of Terry MacDonald <terry.macdonald@cosive.com>
    Date: Wednesday, June 8, 2016 at 6:34 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, "cti-stix@lists.oasis-open.org" <cti-stix@lists.oasis-open.org>, "Wunder, John A." <jwunder@mitre.org>
    Subject: Re: [cti-stix] Vocab case sensitivity in STIX


     



    One point for vocabs.... I thought we had decided that all controlled vocabularies would be defined in the standard as English, and that it was up to the local implementation to provide translations in other languages.
    If this is still the case, does this also apply to open vocabs? If this is the case then I'd go option #3 (fallback #2). Otherwise if we are still going English only then option #1 seems logical.
    Cheers
    Terry MacDonald



  • 9.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-09-2016 14:44
    By definition , a field value that comes from a vocabulary  comes from a… wait for it… I know this is going to be a new idea… a  vocabulary . I.e., it is an opaque stream of octets that means whatever the vocabulary says it is. We can define these vocabulary words to have meanings in English, like “Attack”, “Probe”, “Manager”, etc. Or, we can define these vocabulary words to have no meanings in English, like “Value1”, “Foobar”, “Mumble”. The point is that by definition, the vocabulary somewhere defines what these words mean. So, if the thought is we want to let someone say “Anschlag” or “??” and have it mean “Attack”, we are on a fool’s errand. The only way to have interoperability is to have one and only one identifier that a machine understands. A machine need not know English, German, or Chinese. A machine need only know that the string 0x41 0x74 0x74 0x61 0x63 0x6B is the identifier for the thing we call an “Attack”. You might call it an “Anschlag” or “??”. Feel free to let your UI translate the string 0x41 0x74 0x74 0x61 0x63 0x6B to whatever you need to for your customers to understand the message. Now if we are talking about free-text fields, like “Description of the indicator,” then it is a free-text field and we do not need to worry about case folding. Talking about case folding, a number of recent IETF protocols are  case sensitive. Guess what, interoperability improved. So, a sensible way to sidestep the issue altogether is to say that STIX is case sensitive. Attack does not equal attack does not equal ATTACK does not equal aTtaCK. Just say ‘attack’ or ‘ATTACK’ (pick one) and we are done. Likewise, if giving guidance for external vocabularies, we just note the STIX is case sensitive and if your external vocabulary is not case sensitive, we would strongly recommend you say what case things should be in. On Jun 9, 2016, at 8:34 AM, Wunder, John A. 
    <jwunder@mitre.org> wrote:

    I don’t think we should mandate that values from extended vocabularies (either other values in open vocabs, or extension values in controlled vocabs) be in English…ignoring the issues of actually verifying that (either as a tool trying to produce valid content or as a validation program), it means that people doing STIX in other languages either need some ability to translate to English, or can’t use extended vocab values because they can’t produce English text.

    The values in vocabularies we define should all be in English. They’re pre-defined and tools can localize their interfaces with appropriate translations even in completely non-English ecosystems…they wouldn’t have that same ability for tool or user developed values.

    Let’s schedule this topic for the call on Tuesday. If we aren’t able to resolve it then, it should probably go to a vote.

    John

    From: <cti-stix@lists.oasis-open.org> on behalf of Terry MacDonald <terry.macdonald@cosive.com>
    Date: Wednesday, June 8, 2016 at 6:34 PM
    To: John-Mark Gurney <jmg@newcontext.com>
    Cc: Jason Keirstead <Jason.Keirstead@ca.ibm.com>, cti-stix@lists.oasis-open.org <cti-stix@lists.oasis-open.org>, Wunder, John A. <jwunder@mitre.org>
    Subject: Re: [cti-stix] Vocab case sensitivity in STIX

    One point for vocabs.... I thought we had decided that all controlled vocabularies would be defined in the standard as English, and that it was up to the local implementation to provide translations in other languages. If this is still the case, does this also apply to open vocabs? If this is the case then I'd go option #3 (fallback #2). Otherwise if we are still going English only then option #1 seems logical.

    Cheers
    Terry MacDonald

    On 9/06/2016 8:13 AM, John-Mark Gurney <jmg@newcontext.com> wrote:

    Jason Keirstead wrote this message on Wed, Jun 08, 2016 at 13:44 -0300:
    > Case insensitivity can get extremely complicated with non-latin characters.
    >
    > The definitive example is Turkish -
    > http://www.i18nguy.com/unicode/turkish-i18n.html

    This is exactly why I support 3... If we support 2, we need to define either a limited character set (e.g. latin-1 only) with well-defined rules, or well-defined rules on case sensitivity for ALL Unicode characters, and be willing to break other languages like Turkish...

    The header on:
    http://www.unicode.org/Public/UCD/latest/ucd/CaseFolding.txt
    helps... It also points out that some case transitions involve going from one code point to two...

    Hmm... I did find W3C's case folding page:
    https://www.w3.org/International/wiki/Case_folding

    So, anyone who has an opinion on this topic should read it, and then decide if they want to change their vote...

    More info on case mapping from Unicode:
    http://unicode.org/faq/casemap_charprop.html

    Another fun example from the Unicode page:

    For example, while the default uppercase mapping of a is A and the default mapping of à is À, the uppercase conversion of "Je vais à Paris" in some forms of French might be "JE VAIS A PARIS". Notice how the à is uppercased as A in this case.

    IMO, the spec should be 3, but we provide non-normative text on how organizations and vendor products should allow such input. If all the tools follow the rules, then the issue of comparison is a non-issue...

    --
    John-Mark
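The case-mapping pitfalls raised above can be seen with nothing more than Python's built-in, locale-independent string methods. This is only an illustrative sketch: the names `ATTACK`, `same_term`, and `case_mapping_pitfalls` are made up for this example and do not come from the STIX specification or any STIX library.

```python
# Illustrative sketch only; all names here are hypothetical.

# The identifier quoted in the thread: 0x41 0x74 0x74 0x61 0x63 0x6B.
ATTACK = bytes([0x41, 0x74, 0x74, 0x61, 0x63, 0x6B]).decode("ascii")


def same_term(a: str, b: str) -> bool:
    """Case-sensitive comparison: two terms match only if they are identical."""
    return a == b


def case_mapping_pitfalls() -> dict:
    """Collect a few default Unicode case mappings that surprise people."""
    return {
        # One code point (U+00DF) becomes two when uppercased.
        "sharp_s_upper": "\u00df".upper(),
        # U+0130 (capital I with dot above) lowercases to "i" + U+0307.
        "dotted_capital_i_lower_length": len("\u0130".lower()),
        # The default mapping of "I" is "i", not the Turkish dotless U+0131.
        "plain_capital_i_lower": "I".lower(),
    }


if __name__ == "__main__":
    print(ATTACK)                          # Attack
    print(same_term("Attack", "attack"))   # False
    for name, value in case_mapping_pitfalls().items():
        print(name, repr(value))
```

An exact comparison like `same_term` needs none of these mapping rules, which is the practical appeal of treating terms as case sensitive.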


  • 10.  Re: [cti-stix] Vocab case sensitivity in STIX

    Posted 06-08-2016 22:40
    Wunder, John A. wrote this message on Wed, Jun 08, 2016 at 15:32 +0000:
    > I think that makes us roughly 50:50, with a preference towards #2 given people’s fallback choices.
    >
    > I was curious how lower-casing works with non-latin characters and it seems doable, though naturally more complicated than you would hope: http://stackoverflow.com/questions/929079/unicode-lowercase-characters
    >
    > Other languages don’t really have case distinctions so the topic isn’t relevant to them. For normative requirement purposes we can probably identify an existing place where people specify upper-case Unicode characters and just prohibit them.

    That is an interesting point... Prohibiting code points of category Lu from the field, in addition to maybe Lt.

    --
    John-Mark
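The idea of prohibiting code points by general category can be sketched with Python's standard `unicodedata` module. This is a rough illustration under the assumption that "Lu" (uppercase letter) and "Lt" (titlecase letter) are the only categories rejected; the names `FORBIDDEN_CATEGORIES`, `offending_code_points`, and `is_allowed_vocab_value` are hypothetical and not taken from the STIX specification.

```python
import unicodedata
from typing import List

# Assumption for this sketch: reject uppercase (Lu) and titlecase (Lt) letters.
FORBIDDEN_CATEGORIES = {"Lu", "Lt"}


def offending_code_points(value: str) -> List[str]:
    """Return every character whose general category is forbidden."""
    return [ch for ch in value if unicodedata.category(ch) in FORBIDDEN_CATEGORIES]


def is_allowed_vocab_value(value: str) -> bool:
    """True if the value contains no uppercase or titlecase code points."""
    return not offending_code_points(value)


if __name__ == "__main__":
    for candidate in ["attack", "Attack", "\u01c5", "\u653b\u51fb"]:
        print(repr(candidate), is_allowed_vocab_value(candidate))
```

A check like this never performs a case conversion, so scripts without case distinctions pass through untouched and no locale-specific mapping rules are involved.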