OASIS Static Analysis Results Interchange Format (SARIF) TC

  • 1.  Important subtlety in region definition (long)

    Posted 05-24-2018 16:33
    Please read to the end, where I pose two questions.

    The new change draft for Issue #93, “Problems with regions”, addresses concerns raised by Jim and Luke, and incorporates some nice design suggestions from Jim.

    The spec now says that a single region object can represent both a “text region” (a contiguous sequence of characters) and a “binary region” (a contiguous sequence of bytes), using separate sets of properties. And the spec says that if a region object represents both a text region and a binary region, then the text-related properties and the binary-related properties must represent exactly the same range of bytes.

    The spec does not allow you to specify a region by a mixture of text- and binary-related properties. For example, consider a UTF-16 file with no BOM and contents "abcde\n", and consider a region that includes the characters "bcd". The spec allows you to represent this region in many ways, such as:

    Text-related line/column properties:

        { "startLine": 1, "startColumn": 2, "endColumn": 5 }

    Text-related offset/length properties:

        { "charOffset": 1, "charLength": 3 }

    A mixture of text-related line/column and offset/length properties:

        { "startLine": 1, "startColumn": 2, "charLength": 3 }

    Binary-related offset/length properties:

        { "byteOffset": 2, "byteLength": 6 }

    But the spec does not allow this:

        { "startLine": 1, "byteOffset": 2, "byteLength": 6 }  # INVALID

    I could have written the spec to allow this, but I chose not to, for simplicity. The spec already has paragraphs of text and dozens of examples illustrating valid combinations of the text-related properties alone. I judged that language attempting to enumerate all legal combinations of text-related and binary-related properties would be too difficult to express, and too difficult for implementers to understand and implement correctly. Instead, I required each set of properties to stand alone, and the two sets to be consistent with each other.

    As the spec stands, that mixed region above would be equivalent to:

        {
          "startLine": 1,
          "startColumn": 1,   // Missing startColumn defaults to 1.
          "endLine": 1,       // Missing endLine defaults to startLine.
          "endColumn": 6,     // Missing endColumn defaults to (length of endLine) + 1,
                              // exclusive of newline sequence.
          "byteOffset": 2,
          "byteLength": 6
        }

    … and now the text-related properties and the byte-related properties represent different byte ranges.

    My two questions are:

    1. Do you agree with my proposal to treat the text-related properties and binary-related properties separately?
    2. If so, should I state that explicitly, and give this example?

    *sigh* Having written all this, I guess the answer to #2 has to be “Yes” if the answer to #1 is “Yes”.

    Thanks,
    Larry

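    To make the equivalence in the first post concrete, here is a minimal Python sketch that maps the line/column form of the "bcd" region to its character range and then to its byte range. The helper names are hypothetical and are not part of SARIF or any SDK. The sketch assumes little-endian UTF-16 (the post only says "UTF-16 with no BOM"), "\n" as the only newline sequence, and text within the Basic Multilingual Plane so that Python string indices line up with character offsets.

```python
# Illustrative sketch only; helper names are hypothetical, not part of SARIF or any SDK.
CONTENT = "abcde\n"
ENCODING = "utf-16-le"  # Assumption: little-endian, no BOM. The post does not state the byte order.


def line_col_to_char_range(text, start_line, start_column, end_column):
    """Map a single-line region (1-based, endColumn exclusive) to (charOffset, charLength)."""
    lines = text.split("\n")  # Assumption: "\n" is the only newline sequence.
    # Characters on all earlier lines, counting one newline character per line.
    line_start = sum(len(line) + 1 for line in lines[: start_line - 1])
    return line_start + start_column - 1, end_column - start_column


def char_range_to_byte_range(text, char_offset, char_length, encoding=ENCODING):
    """Map (charOffset, charLength) to (byteOffset, byteLength) by encoding the text."""
    byte_offset = len(text[:char_offset].encode(encoding))
    byte_length = len(text[char_offset:char_offset + char_length].encode(encoding))
    return byte_offset, byte_length


line_col = {"startLine": 1, "startColumn": 2, "endColumn": 5}
char_range = line_col_to_char_range(
    CONTENT, line_col["startLine"], line_col["startColumn"], line_col["endColumn"]
)
byte_range = char_range_to_byte_range(CONTENT, *char_range)

print(char_range)  # (1, 3), matching { "charOffset": 1, "charLength": 3 }
print(byte_range)  # (2, 6), matching { "byteOffset": 2, "byteLength": 6 }
```

    Under these assumptions, all of the valid forms listed in the post describe the same six bytes.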

  • 2.  RE: [sarif] Important subtlety in region definition (long)

    Posted 05-25-2018 20:54
    Hearing no objection, and having gotten agreement offline from Michael, I’m going to make this restriction explicit in the change draft for #93.

    Larry

    From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org> On Behalf Of Larry Golding (Comcast)
    Sent: Thursday, May 24, 2018 9:31 AM
    To: sarif@lists.oasis-open.org
    Cc: 'James A. Kupsch' <kupsch@cs.wisc.edu>; Michael Fanning <Michael.Fanning@microsoft.com>; Luke Cartey <luke@semmle.com>
    Subject: [sarif] Important subtlety in region definition (long)
    Importance: High


  • 3.  RE: [sarif] Important subtlety in region definition (long)

    Posted 05-25-2018 21:13
    I pushed a revision to the change draft that adds this section:

    3.22.4 Independence of text and binary regions

    The text-related and binary-related properties in a region object SHALL be treated independently. That is, the value of a text-related property SHALL NOT be inferred from the value of any set of binary-related properties, and vice versa.

    EXAMPLE: This example is based on the sample text file shown in NOTE 1 of §3.22.2. It represents invalid SARIF because the text-related and binary-related properties are inconsistent. At first glance they appear to be consistent because the byte at offset 2 is indeed on line 1:

        { "startLine": 1, "byteOffset": 2, "byteLength": 6 }

    However, because the default values for the missing text-related properties are determined entirely from the existing text-related properties, and independently of any binary-related properties, this region is in fact equivalent to this one:

        {
          "startLine": 1,
          "startColumn": 1,  // Missing startColumn defaults to 1.
          "endLine": 1,      // Missing endLine defaults to startLine.
          "endColumn": 6,    // Missing endColumn defaults to (length of endLine + 1),
                             // exclusive of newline sequence.
          "byteOffset": 2,
          "byteLength": 6
        }

    This makes it clear that the text-related and binary-related properties represent different ranges of bytes, and therefore the region is invalid.

    Larry
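    The independence rule can be sketched as a small consistency check: fill in the missing text-related properties from the text-related properties alone, then compare the resulting byte range with the binary-related one. This is only an illustration under stated assumptions, not a reference validator. The function names are hypothetical, the file is assumed to be "abcde\n" in little-endian UTF-16 without a BOM, "\n" is assumed to be the only newline sequence, and only the line/column properties discussed in the thread are handled.

```python
# Illustrative sketch only; function names are hypothetical, not part of SARIF or any SDK.

def fill_text_defaults(region, lines):
    """Apply the defaults described in the thread, using only text-related properties."""
    filled = dict(region)
    filled.setdefault("startColumn", 1)                # Missing startColumn defaults to 1.
    filled.setdefault("endLine", filled["startLine"])  # Missing endLine defaults to startLine.
    # Missing endColumn defaults to (length of endLine) + 1, exclusive of newline sequence.
    filled.setdefault("endColumn", len(lines[filled["endLine"] - 1]) + 1)
    return filled


def text_byte_range(region, text, encoding):
    """Return (byteOffset, byteLength) implied by the line/column properties alone."""
    lines = text.split("\n")  # Assumption: "\n" is the only newline sequence.
    r = fill_text_defaults(region, lines)

    def char_index(line, column):
        # Characters on all earlier lines (plus one newline each), then the column offset.
        return sum(len(l) + 1 for l in lines[: line - 1]) + column - 1

    start = char_index(r["startLine"], r["startColumn"])
    end = char_index(r["endLine"], r["endColumn"])  # endColumn is exclusive
    return len(text[:start].encode(encoding)), len(text[start:end].encode(encoding))


def is_consistent(region, text, encoding="utf-16-le"):
    """True unless the text-related and binary-related properties describe different bytes."""
    if "startLine" not in region or "byteOffset" not in region:
        return True  # Only one kind of region is present, so there is nothing to compare.
    expected = (region["byteOffset"], region["byteLength"])
    return text_byte_range(region, text, encoding) == expected


content = "abcde\n"
mixed = {"startLine": 1, "byteOffset": 2, "byteLength": 6}
explicit = {"startLine": 1, "startColumn": 2, "endColumn": 5, "byteOffset": 2, "byteLength": 6}

print(is_consistent(mixed, content))     # False: the text properties default to the whole line
print(is_consistent(explicit, content))  # True: both sets describe bytes 2 through 7
```

    In the mixed case, the defaulted text region covers all of line 1 (bytes 0 through 9 under these assumptions), which is why it cannot agree with a binary region of offset 2 and length 6.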