OASIS Static Analysis Results Interchange Format (SARIF) TC

 View Only
Expand all | Collapse all

Alternatives for embedding links

  • 1.  Alternatives for embedding links

    Posted 11-10-2017 00:05
    Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 2.  RE: [sarif] Alternatives for embedding links

    Posted 11-13-2017 18:21
    Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [mailto:sarif@lists.oasis-open.org] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 3.  RE: [sarif] Alternatives for embedding links

    Posted 11-13-2017 19:24
    Rather than introducing a JSONPath parser to parse references such as codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.   From: Michael Fanning [mailto:Michael.Fanning@microsoft.com] Sent: Monday, November 13, 2017 10:21 AM To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 4.  RE: [sarif] Alternatives for embedding links

    Posted 11-13-2017 19:37
    I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?   We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations could be duplicated).   From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net] Sent: Monday, November 13, 2017 11:22 AM To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Rather than introducing a JSONPath parser to parse references such as codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 10:21 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 5.  RE: [sarif] Alternatives for embedding links

    Posted 11-13-2017 23:50
    I can’t think of a way to do it that does not involve encoding the information about which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM).   Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.   From: Michael Fanning [mailto:Michael.Fanning@microsoft.com] Sent: Monday, November 13, 2017 11:37 AM To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?   We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations could be duplicated).   From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Monday, November 13, 2017 11:22 AM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Rather than introducing a JSONPath parser to parse references such as codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 10:21 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 6.  RE: [sarif] Alternatives for embedding links

    Posted 11-16-2017 00:16
    Well, any customer that is exclusively using code flows to troubleshoot issues would have an obvious need for it. The MS static driver verifier is such a tool.   I’m not yet convinced there isn’t an idea out there that satisfies all concerns. Here’s a suggestion, for example, might trigger other thoughts if you don’t actually buy it: we could keep the format you describe, which is simply a number. Instead of this number representing strictly the index into related locations, however, it could be an id that is guaranteed to exist as a property value for one and only one physical location which exists either in related locations, a stack or a code flow.   Now we still have the simple format for the embedded link that Jim suggested. The additional complexity is a ‘embeddedLinkId’ property that needs to be added to physicalLocation (or just ‘id’).   Now imagine what the consumer side looks like: on selecting an issue, a viewer must parse all of related locations, stacks and code flows associated with reach result (tools tend to only populate one of these, but this isn’t really relevant). On detecting an id associated with a location, the viewer would cache that physical location in order to properly respond to an embedded link.   Michael From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net] Sent: Monday, November 13, 2017 3:48 PM To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I can’t think of a way to do it that does not involve encoding the information about which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM).   Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 11:37 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?   We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations could be duplicated).   From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Monday, November 13, 2017 11:22 AM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Rather than introducing a JSONPath parser to parse references such as codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 10:21 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 7.  RE: [sarif] Alternatives for embedding links

    Posted 11-16-2017 01:32
    I don’t see how the fact that a tool (such as the Microsoft “Static Driver Verifier”) which exclusively uses code flows thereby has a need to encode links in their result.message property. SDV does not do that; they use simple messages like "The dispatch routine has returned without releasing the cancel spinlock" and "The property is satisfied as the driver defines a AddDevice routine." .   If they chose to produce a result.message that referenced one of the locations in the code flow (for example, "The dispatch routine has returned without releasing the cancel spinlock, which was created [here](0)" ), they could use the workaround we have discussed, of duplicating that location into the result.relatedLocations array.   Introducing an emdeddedLinkId property would avoid the need to duplicate the location, at the cost (as you say below) of the additional producer-side complexity of generating that property, and the additional consumer-side complexity of searching for the location with that property.   Larry   From: Michael Fanning [mailto:Michael.Fanning@microsoft.com] Sent: Wednesday, November 15, 2017 4:15 PM To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Well, any customer that is exclusively using code flows to troubleshoot issues would have an obvious need for it. The MS static driver verifier is such a tool.   I’m not yet convinced there isn’t an idea out there that satisfies all concerns. Here’s a suggestion, for example, might trigger other thoughts if you don’t actually buy it: we could keep the format you describe, which is simply a number. Instead of this number representing strictly the index into related locations, however, it could be an id that is guaranteed to exist as a property value for one and only one physical location which exists either in related locations, a stack or a code flow.   Now we still have the simple format for the embedded link that Jim suggested. The additional complexity is a ‘embeddedLinkId’ property that needs to be added to physicalLocation (or just ‘id’).   Now imagine what the consumer side looks like: on selecting an issue, a viewer must parse all of related locations, stacks and code flows associated with reach result (tools tend to only populate one of these, but this isn’t really relevant). On detecting an id associated with a location, the viewer would cache that physical location in order to properly respond to an embedded link.   Michael From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Monday, November 13, 2017 3:48 PM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I can’t think of a way to do it that does not involve encoding the information about which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM).   Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 11:37 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?   We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations could be duplicated).   From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Monday, November 13, 2017 11:22 AM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Rather than introducing a JSONPath parser to parse references such as codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 10:21 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 8.  RE: [sarif] Alternatives for embedding links

    Posted 11-17-2017 16:47




    It’s very reasonable to expect that one event or more events from a code flow are of particular interest and for these to be called out. See Paul’s example in issue
    #33 (this example, in fact, is what made me realize we have this deficit). PREfast defines a notion of key events. I believe SDV’s output could be improved to take advantage of this feature. Related
    locations is actually a bit of a dodge, allowing for tools that don’t provide complete code flows to deliver some supporting information. If you really need to dig into an XSS issue, for example, you need the full code flow from untrusted input to dangerous
    call (not just a related location that tells you where the source came in).
     
     
    There are many patterns that obviously could usefully connect to code flows, including complex buffer overrun issues (as in Paul’s example), use of dangling pointer, etc. For stacks, a memory leak detection tool might produce a result that
    needs to double-click into the stack itself. There are many code paths that potentially lead to an allocator. Leak detection is often therefore organized around allocations associated with a specific stack. The result is that an embedded link in such an issue
    needs to invoke the stack (i.e., there is no one related location that makes sense to call out on its own).
     
    The availability of a stack or code flow is a powerful tool for diagnosing root cause (and result validity). My opinion is that we should pursue embedded links that refer to them. I’m comfortable with the costs/complexity of using an ‘id’
    property on a physical location to do this. This seems less costly than having to duplicate one or more locations (which provide a double-click that doesn’t connect you to the richer code-flow view experience).
     
    Michael


    From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net]

    Sent: Wednesday, November 15, 2017 5:30 PM
    To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    I don’t see how the fact that a tool (such as the Microsoft “Static Driver Verifier”) which exclusively uses code flows thereby has a need to encode links in their result.message property. SDV does not do that; they use simple messages
    like "The dispatch routine has returned without releasing the cancel spinlock" and
    "The property is satisfied as the driver defines a AddDevice routine." .
     
    If they chose to produce a result.message that referenced one of the locations in the code flow (for example,
    "The dispatch routine has returned without releasing the cancel spinlock, which was created [here](0)" ), they could use the workaround we have discussed, of duplicating that location into
    the result.relatedLocations array.
     
    Introducing an emdeddedLinkId property would avoid the need to duplicate the location, at the cost (as you say below) of the additional producer-side complexity of generating that property, and the additional consumer-side complexity of
    searching for the location with that property.
     
    Larry
     



    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Wednesday, November 15, 2017 4:15 PM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    Well, any customer that is exclusively using code flows to troubleshoot issues would have an obvious need for it. The MS static driver verifier is such a tool.
     
    I’m not yet convinced there isn’t an idea out there that satisfies all concerns. Here’s a suggestion, for example, might trigger other thoughts if you don’t actually buy it: we could keep the format you describe, which is simply a number.
    Instead of this number representing strictly the index into related locations, however, it could be an id that is guaranteed to exist as a property value for one and only one physical location which exists either in related locations, a stack or a code flow.
     
    Now we still have the simple format for the embedded link that Jim suggested. The additional complexity is a ‘embeddedLinkId’ property that needs to be added to physicalLocation (or just ‘id’).
     
    Now imagine what the consumer side looks like: on selecting an issue, a viewer must parse all of related locations, stacks and code flows associated with reach result (tools tend to only populate one of these, but this isn’t really relevant).
    On detecting an id associated with a location, the viewer would cache that physical location in order to properly respond to an embedded link.
     
    Michael


    From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ]

    Sent: Monday, November 13, 2017 3:48 PM
    To: Michael Fanning < Michael.Fanning@microsoft.com >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    I can’t think of a way to do it that does not involve encoding the information about
    which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM).
     
    Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.

     


    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Monday, November 13, 2017 11:37 AM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?
     
    We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations
    could be duplicated).
     


    From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ]

    Sent: Monday, November 13, 2017 11:22 AM
    To: Michael Fanning < Michael.Fanning@microsoft.com >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    Rather than introducing a JSONPath parser to parse references such as
    codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears
    in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.
     


    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Monday, November 13, 2017 10:21 AM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted
    on in the TC following).
     
    My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame
    location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).
     
     
    [here]{relatedLocations[0]}"
     
    [here]{stacks[0].frames[2]. physicalLocation }"
     
    [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"
     
    Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.
     
    Pros :

    Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.
     
    Cons:

    It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more
    concise (arguably it’s clearer to have the explicit reference).
     
    NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to
    have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github

    issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.
     
    Michael
     


    From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ]
    On Behalf Of Larry Golding (Comcast)
    Sent: Thursday, November 9, 2017 4:03 PM
    To: sarif@lists.oasis-open.org
    Subject: [sarif] Alternatives for embedding links


     
    Hello all,
     
    Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup
    such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.
     
    Both options involve using a syntax borrowed from Markdown to specify the link:
    [ link text ]( link target ) . They differ in how
    link target is expressed.
    Option 1: Mini-language
    The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see
    Section 3.19 of the spec), with three properties:
     

    A
    uri property, which is what it sounds like: the URI of the source file. A
    region property, which specifies a region within the source file, using properties such as
    startLine and
    startColumn . For a full explanation of
    region , see Section 3.20 – but you don’t need to understand those details to understand this option.
     

    A
    uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to
    understand this option.
    This option looks
    like this :
     
    {
     
    "version" :
    "1.0.0" ,
     
    "runs" : [
        {
         
    "tool" : {
           
    "name" :
    "TaintTracker"
          },
     
         
    "results" : [
            {
             
    "ruleId" :
    "CA2001" ,
             
    "locations" : [
                {
                          "analysisTarget" :
    {
                            "uriBaseId" :
    "SRCROOT" ,
                            "uri" :
    "src/db/sql.cs" ,
                            "region" :
    {
                              "startLine" :
    63,
                              "startColumn" :
    12,
                              "endColumn" :
    18
                           }
                  }
                }
              ],
             
    "message" :
    "Tainted data is used to execute a SQL command. The data entered the system
    [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "
            }
          ]
        }
      ]
    }
     
    The link text is “ here ”, and the
    link target is expressed in the mini-language as follows:
     
    $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,

     
    $(SRCROOT) refers to the
    uriBaseId ,
    src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your
    SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.
     
    This design has an interesting consequence: since the mini-language specifies everything that a
    physicalLocation object specifies, we could consider
    removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:
     
    {
     
    "version" :
    "1.0.0" ,
     
    "runs" : [
        {
         
    "tool" : {
           
    "name" :
    "TaintTracker"
          },
     
         
    "results" : [
            {
             
    "ruleId" :
    "CA2001" ,
             
    "locations" : [
                {
                          "analysisTarget" :
    "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"
                }
              ],
             
    "message" :
    "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"
            }
          ]
        }
      ]
    }
     
    The analysisTarget property, whose value was previously a
    physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not
    require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the
    physicalLocation object on the one hand, and the mini-language on the other).
    Option 2: Index into relatedLocations
    The second option expresses the link target as an index into the
    result.relatedLocations array. To understand this option, you need to know that SARIF defines a property
    relatedLocations on the
    result object. Section 3.17.12 explains that this property contains:
     
    … an array of one or more unique (§3.9)
    annotatedCodeLocation objects (§3.25), each of which represents
    a location relevant to understanding the result .
     
    In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks
    like this :
     
    {
     
    "version" :
    "1.0.0" ,
     
    "runs" : [
        {
         
    "tool" : {
           
    "name" :
    "TaintTracker"
          },
     
         
    "results" : [
            {
             
    "ruleId" :
    "CA2001" ,
             
    "locations" : [
                {
                 
    "analysisTarget" : {
                   
    "uriBaseId" :
    "SRCROOT" ,
                   
    "uri" :
    "src/db/sql.cs" ,
                   
    "region" : {
                     
    "startLine" : 63,
                     
    "startColumn" : 12,
                     
    "endColumn" : 18
                    }
                  }
                }
              ],
             
    "message" :
    "Tainted data is used to execute a SQL command. The data entered the system
    [here](0) " ,
             
    "relatedLocations" : [
                {
                 
    "message" :
    "source of tainted data" ,
                 
    "physicalLocation" : {
                   
    "uriBaseId" :
    "SRCROOT" ,
                   
    "uri" :
    "src/ui/input.cs" ,
                   
    "region" : {
                     
    "startLine" : 20,
                     
    "startColumn" : 4
                    }
                  }
                }
              ]
            }
          ]
        }
      ]
    }
     
    The link text is “ here ”, and the
    link target is expressed as an index into the
    relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the
    message property of the
    annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0.
    Comparison of the options
    Option 1 (mini-language) has these advantages:

    It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.
     
    Option 2 (index into relatedLocations) has these advantages:

    It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains
    physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should
    consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.
     
    #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in
    themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the
    run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.
     
    We will discuss this further at the next TC meeting.
     
    Thanks for reading all of this!
    Larry






  • 9.  RE: [sarif] Alternatives for embedding links

    Posted 11-18-2017 03:10
    You’ve persuaded me of two things:   About the feature: It’s desirable to able to link directly into the code flow experience. With my proposal of duplicating code flow locations into the relatedLocations array, a user could navigate to the specified location, but it would not be presented in the context of a code flow. About the implementation: Introducing an optional embeddedLinkId property on annotatedCodeLocation , and referencing it from the simple embedded link syntax we’ve been discussing – [<linkText>](<embeddedLinkId>) , for example, [here](0) – is reasonable implementation. I’ll want to prototype that in the Visual Studio VSIX viewer before we settle on it, so I won’t write the spec-ese yet.   Larry   From: Michael Fanning [mailto:Michael.Fanning@microsoft.com] Sent: Friday, November 17, 2017 8:47 AM To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   It’s very reasonable to expect that one event or more events from a code flow are of particular interest and for these to be called out. See Paul’s example in issue #33 (this example, in fact, is what made me realize we have this deficit). PREfast defines a notion of key events. I believe SDV’s output could be improved to take advantage of this feature. Related locations is actually a bit of a dodge, allowing for tools that don’t provide complete code flows to deliver some supporting information. If you really need to dig into an XSS issue, for example, you need the full code flow from untrusted input to dangerous call (not just a related location that tells you where the source came in).     There are many patterns that obviously could usefully connect to code flows, including complex buffer overrun issues (as in Paul’s example), use of dangling pointer, etc. For stacks, a memory leak detection tool might produce a result that needs to double-click into the stack itself. There are many code paths that potentially lead to an allocator. Leak detection is often therefore organized around allocations associated with a specific stack. The result is that an embedded link in such an issue needs to invoke the stack (i.e., there is no one related location that makes sense to call out on its own).   The availability of a stack or code flow is a powerful tool for diagnosing root cause (and result validity). My opinion is that we should pursue embedded links that refer to them. I’m comfortable with the costs/complexity of using an ‘id’ property on a physical location to do this. This seems less costly than having to duplicate one or more locations (which provide a double-click that doesn’t connect you to the richer code-flow view experience).   Michael From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Wednesday, November 15, 2017 5:30 PM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I don’t see how the fact that a tool (such as the Microsoft “Static Driver Verifier”) which exclusively uses code flows thereby has a need to encode links in their result.message property. SDV does not do that; they use simple messages like "The dispatch routine has returned without releasing the cancel spinlock" and "The property is satisfied as the driver defines a AddDevice routine." .   If they chose to produce a result.message that referenced one of the locations in the code flow (for example, "The dispatch routine has returned without releasing the cancel spinlock, which was created [here](0)" ), they could use the workaround we have discussed, of duplicating that location into the result.relatedLocations array.   Introducing an emdeddedLinkId property would avoid the need to duplicate the location, at the cost (as you say below) of the additional producer-side complexity of generating that property, and the additional consumer-side complexity of searching for the location with that property.   Larry   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Wednesday, November 15, 2017 4:15 PM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Well, any customer that is exclusively using code flows to troubleshoot issues would have an obvious need for it. The MS static driver verifier is such a tool.   I’m not yet convinced there isn’t an idea out there that satisfies all concerns. Here’s a suggestion, for example, might trigger other thoughts if you don’t actually buy it: we could keep the format you describe, which is simply a number. Instead of this number representing strictly the index into related locations, however, it could be an id that is guaranteed to exist as a property value for one and only one physical location which exists either in related locations, a stack or a code flow.   Now we still have the simple format for the embedded link that Jim suggested. The additional complexity is a ‘embeddedLinkId’ property that needs to be added to physicalLocation (or just ‘id’).   Now imagine what the consumer side looks like: on selecting an issue, a viewer must parse all of related locations, stacks and code flows associated with reach result (tools tend to only populate one of these, but this isn’t really relevant). On detecting an id associated with a location, the viewer would cache that physical location in order to properly respond to an embedded link.   Michael From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Monday, November 13, 2017 3:48 PM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I can’t think of a way to do it that does not involve encoding the information about which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM).   Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 11:37 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?   We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations could be duplicated).   From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ] Sent: Monday, November 13, 2017 11:22 AM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Rather than introducing a JSONPath parser to parse references such as codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.   From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ] Sent: Monday, November 13, 2017 10:21 AM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] Alternatives for embedding links   Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted on in the TC following).   My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).     [here]{relatedLocations[0]}"   [here]{stacks[0].frames[2]. physicalLocation }"   [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"   Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.   Pros : Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.   Cons: It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more concise (arguably it’s clearer to have the explicit reference).   NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.   Michael   From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ] On Behalf Of Larry Golding (Comcast) Sent: Thursday, November 9, 2017 4:03 PM To: sarif@lists.oasis-open.org Subject: [sarif] Alternatives for embedding links   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 10.  RE: [sarif] Alternatives for embedding links

    Posted 11-20-2017 21:24




    The solution needs to generalize for relatedLocations, stacks, and code flows. I suggest we consider the following:
     

    Add an ‘id’ or ‘embeddedLinkId’ property to physical location Update stacks to include a physical location rather than the current inlined PL property members.
     
    Re: the name of this property, I see two sides. On the one hand, ‘id’ is a concise, descriptive general purpose name. If we take a feature such as Andrew has proposed (to render several SARIF constructs as graphs), this property could have
    other value. Balancing this, those other scenarios do not exist and ‘embeddedLinkId’ is extremely descriptive of its purpose.
     
    Michael
     


    From: Larry Golding (Comcast) [mailto:larrygolding@comcast.net]

    Sent: Friday, November 17, 2017 7:08 PM
    To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    You’ve persuaded me of two things:
     

    About the feature: It’s desirable to able to link directly into the code flow experience. With my proposal of duplicating code flow locations into the
    relatedLocations array, a user could navigate to the specified location, but it would not be presented in the context of a code flow. About the implementation: Introducing an optional
    embeddedLinkId property on
    annotatedCodeLocation , and referencing it from the simple embedded link syntax we’ve been discussing –
    [<linkText>](<embeddedLinkId>) , for example,
    [here](0) – is reasonable implementation. I’ll want to prototype that in the Visual Studio VSIX viewer before we settle on it, so I won’t write the spec-ese yet.
     
    Larry
     



    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Friday, November 17, 2017 8:47 AM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    It’s very reasonable to expect that one event or more events from a code flow are of particular interest and for these to be called out. See Paul’s example in issue

    #33 (this example, in fact, is what made me realize we have this deficit). PREfast defines a notion of key events. I believe SDV’s output could be improved to take advantage of this feature. Related locations is actually a bit of a dodge, allowing for tools
    that don’t provide complete code flows to deliver some supporting information. If you really need to dig into an XSS issue, for example, you need the full code flow from untrusted input to dangerous call (not just a related location that tells you where the
    source came in).
     
     
    There are many patterns that obviously could usefully connect to code flows, including complex buffer overrun issues (as in Paul’s example), use of dangling pointer, etc. For stacks, a memory leak detection tool might produce a result that
    needs to double-click into the stack itself. There are many code paths that potentially lead to an allocator. Leak detection is often therefore organized around allocations associated with a specific stack. The result is that an embedded link in such an issue
    needs to invoke the stack (i.e., there is no one related location that makes sense to call out on its own).
     
    The availability of a stack or code flow is a powerful tool for diagnosing root cause (and result validity). My opinion is that we should pursue embedded links that refer to them. I’m comfortable with the costs/complexity of using an ‘id’
    property on a physical location to do this. This seems less costly than having to duplicate one or more locations (which provide a double-click that doesn’t connect you to the richer code-flow view experience).
     
    Michael


    From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ]

    Sent: Wednesday, November 15, 2017 5:30 PM
    To: Michael Fanning < Michael.Fanning@microsoft.com >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    I don’t see how the fact that a tool (such as the Microsoft “Static Driver Verifier”) which exclusively uses code flows thereby has a need to encode links in their result.message property. SDV does not do that; they use simple messages
    like "The dispatch routine has returned without releasing the cancel spinlock" and
    "The property is satisfied as the driver defines a AddDevice routine." .
     
    If they chose to produce a result.message that referenced one of the locations in the code flow (for example,
    "The dispatch routine has returned without releasing the cancel spinlock, which was created [here](0)" ), they could use the workaround we have discussed, of duplicating that location into
    the result.relatedLocations array.
     
    Introducing an emdeddedLinkId property would avoid the need to duplicate the location, at the cost (as you say below) of the additional producer-side complexity of generating that property, and the additional consumer-side complexity of
    searching for the location with that property.
     
    Larry
     


    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Wednesday, November 15, 2017 4:15 PM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    Well, any customer that is exclusively using code flows to troubleshoot issues would have an obvious need for it. The MS static driver verifier is such a tool.
     
    I’m not yet convinced there isn’t an idea out there that satisfies all concerns. Here’s a suggestion, for example, might trigger other thoughts if you don’t actually buy it: we could keep the format you describe, which is simply a number.
    Instead of this number representing strictly the index into related locations, however, it could be an id that is guaranteed to exist as a property value for one and only one physical location which exists either in related locations, a stack or a code flow.
     
    Now we still have the simple format for the embedded link that Jim suggested. The additional complexity is a ‘embeddedLinkId’ property that needs to be added to physicalLocation (or just ‘id’).
     
    Now imagine what the consumer side looks like: on selecting an issue, a viewer must parse all of related locations, stacks and code flows associated with reach result (tools tend to only populate one of these, but this isn’t really relevant).
    On detecting an id associated with a location, the viewer would cache that physical location in order to properly respond to an embedded link.
     
    Michael


    From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ]

    Sent: Monday, November 13, 2017 3:48 PM
    To: Michael Fanning < Michael.Fanning@microsoft.com >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    I can’t think of a way to do it that does not involve encoding the information about
    which code flow you want to link to into the message, which leads to something like the mini-language from your message of this morning (Monday, November 13, 2017 10:21 AM).
     
    Lacking information about customer demand for this feature (“click on a link in a message, and navigate directly into the code flow specified by the link”), I would hesitate to complicate the format to support it.

     


    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Monday, November 13, 2017 11:37 AM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    I definitely think that minimizing parsing is important. But it would be nice if you could click directly into the code-flow experience, in particular. Can you think of a way to do this?
     
    We could advise viewers to look for a stack location or code flow location that matches the related location and to make a guess about where the user is jumping to (into a stack or code flow). This seems like a lot of work (and locations
    could be duplicated).
     


    From: Larry Golding (Comcast) [ mailto:larrygolding@comcast.net ]

    Sent: Monday, November 13, 2017 11:22 AM
    To: Michael Fanning < Michael.Fanning@microsoft.com >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    Rather than introducing a JSONPath parser to parse references such as
    codeFlow[0].annotatedCodeLocations[2] .physicalLocation , I prefer your suggested alternative: if you want to link from the text of a message to a physical location that already appears
    in (for example) a codeFlow, then recapitulate the location in the relatedLocations array.
     


    From: Michael Fanning [ mailto:Michael.Fanning@microsoft.com ]

    Sent: Monday, November 13, 2017 10:21 AM
    To: Larry Golding (Comcast) < larrygolding@comcast.net >;
    sarif@lists.oasis-open.org
    Subject: RE: [sarif] Alternatives for embedding links


     
    Thanks, Larry. To clarify your last point, we’d really like the TC to make a call on approach before the next TC, so that we can announce to the TC at next meeting that spec language for this issue is ready for final review (to be voted
    on in the TC following).
     
    My opinion on this is that we should stay away from the mini-language for the reasons you’ve cited. As far as the second option is concerned, I have a question: should we provide the ability for an embedded link to refer to a stack frame
    location, or a specific annotated code location provided in a code flow. This latter possibility occurred to me when looking at report output that Paul recently provided (related to the markdown vs. plain-text question).
     
     
    [here]{relatedLocations[0]}"
     
    [here]{stacks[0].frames[2]. physicalLocation }"
     
    [here]{codeFlow[0].annotatedCodeLocations[2] .physicalLocation }"
     
    Alternately, we could require people to recapitulate any location of interest in ‘related locations’, even if it originates with a stack or code flow.
     
    Pros :

    Any physical location that can be encapsulated within a result can be embedded as a link Click on a stack or code flow link could invoke the stack or code flow viewer experience (rather than simply opening the relevant file. As you can see, by the way, I think we should try to render this content as part as possibly as JSON.
     
    Cons:

    It’s a mini-language. The relatedLocations array dereference returns an actual physicalLocation (because relatedLocations is an array of them). We could drop .physicalLocation from the stacks/code flow information to be more
    concise (arguably it’s clearer to have the explicit reference).
     
    NOTE: stack frame doesn’t actually have a physical location, various physical location members are inlined into the frame. This was done because it doesn’t usually make sense to
    have the full expressivity of a region when referring to a stack frame. I think this was probably a mistake and we should reconsider. I opened a github

    issue to track this. Note that while a breaking change, it would be easy enough to transform the current SARIF into a correct form, if we decide to take it.
     
    Michael
     


    From: sarif@lists.oasis-open.org [ mailto:sarif@lists.oasis-open.org ]
    On Behalf Of Larry Golding (Comcast)
    Sent: Thursday, November 9, 2017 4:03 PM
    To: sarif@lists.oasis-open.org
    Subject: [sarif] Alternatives for embedding links


     
    Hello all,
     
    Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup
    such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.
     
    Both options involve using a syntax borrowed from Markdown to specify the link:
    [ link text ]( link target ) . They differ in how
    link target is expressed.
    Option 1: Mini-language
    The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see
    Section 3.19 of the spec), with three properties:
     

    A
    uri property, which is what it sounds like: the URI of the source file. A
    region property, which specifies a region within the source file, using properties such as
    startLine and
    startColumn . For a full explanation of
    region , see Section 3.20 – but you don’t need to understand those details to understand this option.
     

    A
    uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to
    understand this option.
    This option looks
    like this :
     
    {
     
    "version" :
    "1.0.0" ,
     
    "runs" : [
        {
         
    "tool" : {
           
    "name" :
    "TaintTracker"
          },
     
         
    "results" : [
            {
             
    "ruleId" :
    "CA2001" ,
             
    "locations" : [
                {
                          "analysisTarget" :
    {
                            "uriBaseId" :
    "SRCROOT" ,
                            "uri" :
    "src/db/sql.cs" ,
                            "region" :
    {
                              "startLine" :
    63,
                              "startColumn" :
    12,
                              "endColumn" :
    18
                           }
                  }
                }
              ],
             
    "message" :
    "Tainted data is used to execute a SQL command. The data entered the system
    [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "
            }
          ]
        }
      ]
    }
     
    The link text is “ here ”, and the
    link target is expressed in the mini-language as follows:
     
    $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,

     
    $(SRCROOT) refers to the
    uriBaseId ,
    src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your
    SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.
     
    This design has an interesting consequence: since the mini-language specifies everything that a
    physicalLocation object specifies, we could consider
    removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:
     
    {
     
    "version" :
    "1.0.0" ,
     
    "runs" : [
        {
         
    "tool" : {
           
    "name" :
    "TaintTracker"
          },
     
         
    "results" : [
            {
             
    "ruleId" :
    "CA2001" ,
             
    "locations" : [
                {
                          "analysisTarget" :
    "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"
                }
              ],
             
    "message" :
    "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"
            }
          ]
        }
      ]
    }
     
    The analysisTarget property, whose value was previously a
    physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not
    require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the
    physicalLocation object on the one hand, and the mini-language on the other).
    Option 2: Index into relatedLocations
    The second option expresses the link target as an index into the
    result.relatedLocations array. To understand this option, you need to know that SARIF defines a property
    relatedLocations on the
    result object. Section 3.17.12 explains that this property contains:
     
    … an array of one or more unique (§3.9)
    annotatedCodeLocation objects (§3.25), each of which represents
    a location relevant to understanding the result .
     
    In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks
    like this :
     
    {
     
    "version" :
    "1.0.0" ,
     
    "runs" : [
        {
         
    "tool" : {
           
    "name" :
    "TaintTracker"
          },
     
         
    "results" : [
            {
             
    "ruleId" :
    "CA2001" ,
             
    "locations" : [
                {
                 
    "analysisTarget" : {
                   
    "uriBaseId" :
    "SRCROOT" ,
                   
    "uri" :
    "src/db/sql.cs" ,
                   
    "region" : {
                     
    "startLine" : 63,
                     
    "startColumn" : 12,
                     
    "endColumn" : 18
                    }
                  }
                }
              ],
             
    "message" :
    "Tainted data is used to execute a SQL command. The data entered the system
    [here](0) " ,
             
    "relatedLocations" : [
                {
                 
    "message" :
    "source of tainted data" ,
                 
    "physicalLocation" : {
                   
    "uriBaseId" :
    "SRCROOT" ,
                   
    "uri" :
    "src/ui/input.cs" ,
                   
    "region" : {
                     
    "startLine" : 20,
                     
    "startColumn" : 4
                    }
                  }
                }
              ]
            }
          ]
        }
      ]
    }
     
    The link text is “ here ”, and the
    link target is expressed as an index into the
    relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the
    message property of the
    annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0.
    Comparison of the options
    Option 1 (mini-language) has these advantages:

    It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.
     
    Option 2 (index into relatedLocations) has these advantages:

    It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains
    physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should
    consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.
     
    #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in
    themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the
    run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.
     
    We will discuss this further at the next TC meeting.
     
    Thanks for reading all of this!
    Larry






  • 11.  Re: [sarif] Alternatives for embedding links

    Posted 11-13-2017 19:02
    Hello all, To add a little bit different perspective on the available options, the approach taken by TOIF involves a handful of semantic elements to identify “location relevant to understanding the results”.  While the main elements are Finding and Code location, TOIF also introduced two so-called “semantic elements”: Statement and Data Element. There are at least three important “roles” of a statement with regards to a finding (this assumes that many findings involve a certain dataflow):  - sink of a finding. A sink statement is part of a “necessary condition” of the finding. For example, in a buffer overflow a “sink” is a statement that accesses a buffer. - source of a finding. A source statement is part of the “sufficient condition” of the finding.  - transit statement. A transit statement is part of the dataflow between the source(s) and the sink. Note that this approach is used in the Julliet test set. One advantage of this approach is that it makes the roles of code locations explicit as an integral part of the specification. Compliant tools can then introduce visual markup for the presentation purposes in a safe way.   Identifying semantic element relevant to a finding cannot in general be done in a non-intrusive modification of an SCA tool, most “legacy” SCA tools do not output details of the statements involved in a finding, only code locations.  Here are some definitions taken from the TOIF spec, followed by a couple of examples. Statement Definition: An basic identifieable unit of behavior in software such as a source code statement, a basic block, a operator. Note : this corresponds to KDM ActionElement class Note : Defined in Figure 7. UML class diagram Semantic Statement Statement has code location Statement is involved in Finding Synonym: Finding is associated with statement Possibility: each Finding may be associated with many statement Statement is part of sink of Finding Note : In is a stronger form of the fact type Statement is involved in Finding where the role of Statement with respect to the Logical Weakness Model is known (i.e. sink) Statement is part of source of Finding Note : In is a stronger form of the fact type Statement is involved in Finding where the role of Statement with respect to the Logical Weakness Model is known (i.e. source) Statement is preceded by Statement Data  element Definition: An basic identifieable data item is software such as global and local variables, records, formal parameters and constants. Note : This corresponds to the KDM DataElement class Note : Defined in Figure 8. UML class diagram Semantic Data Data    element is defined at Code Location  Data    element is involved in Finding Data    element has name Data    element is involved in Statement  An example of using this “semantic markdown” in TOIF: <fact xmi:type= toif:Statement xmi:id= s10 /> <fact xmi:type= toif:StatementIsInvolvedInFinding statement= s10 finding= f10 /> <fact xmi:type= toif:StatementIsSinkOfFinding statement= s20 finding= f20 /> <fact xmi:type= toif:StatementIsSourceOfFinding statement= s30 finding= f30 /> <fact xmi:type= toif:StatementPrecedesStatement statement1= s30 statement2= s20 /> <fact xmi:type= toif:StatementPrecedesStatement statement1= s10 statement2= s20 /> <fact xmi:type= toif:StatementHasCodeLocation statement1= s10 location= loc10 /> <fact xmi:type= toif:Statement xmi:id= s20 > <description text=”*pHandler( pData, 0x200 );” /> </fact> <fact xmi:type= toif:Statement xmi:id= s30 /> <fact xmi:type= toif:Finding xmi:id= f10 /> <fact xmi:type= toif:Finding xmi:id= f20 /> <fact xmi:type= toif:CodeLocation xmi:id= loc10 > <linenumber linenumber= 1856 /> </fact> <fact xmi:type= toif:DataElement xmi:id= d10 > <name name= X /> <description text=”struct pData * X[ MAXDATA];” /> </fact> <fact xmi:type= toif:DataIsInvolvedInFinding data= finding= f10 /> <fact xmi:type= toif:DataIsInvolvedInFinding data= project= f20 /> <fact xmi:type= toif:DataIsInvolvedInStatement data= statement= s20 /> <fact xmi:type= toif:DataIsInvolvedInStatement data= statement= s30 /> <fact xmi:type= toif:DataIsDefinedAtCodeLocation data= location= loc10 /> <fact xmi:type= toif:Statement xmi:id= s20 /> <fact xmi:type= toif:Statement xmi:id= s30 /> <fact xmi:type= toif:Finding xmi:id= f10 /> <fact xmi:type= toif:Finding xmi:id= f20 /> <fact xmi:type= toif:CodeLocation xmi:id= loc10 > <linenumber linenumber= 1856 /> </fact> Cheers, Nick On Nov 9, 2017, at 7:02 PM, Larry Golding (Comcast) < larrygolding@comcast.net > wrote: Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   version : 1.0.0 ,   runs : [     {       tool : {         name : TaintTracker       },         results : [         {           ruleId : CA2001 ,           locations : [             {                       analysisTarget : {                         uriBaseId : SRCROOT ,                         uri : src/db/sql.cs ,                         region : {                           startLine : 63,                           startColumn : 12,                           endColumn : 18                        }               }             }           ],           message : Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   version : 1.0.0 ,   runs : [     {       tool : {         name : TaintTracker       },         results : [         {           ruleId : CA2001 ,           locations : [             {                       analysisTarget : $(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18             }           ],           message : Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   version : 1.0.0 ,   runs : [     {       tool : {         name : TaintTracker       },         results : [         {           ruleId : CA2001 ,           locations : [             {               analysisTarget : {                 uriBaseId : SRCROOT ,                 uri : src/db/sql.cs ,                 region : {                   startLine : 63,                   startColumn : 12,                   endColumn : 18                 }               }             }           ],           message : Tainted data is used to execute a SQL command. The data entered the system [here](0) ,           relatedLocations : [             {               message : source of tainted data ,               physicalLocation : {                 uriBaseId : SRCROOT ,                 uri : src/ui/input.cs ,                 region : {                   startLine : 20,                   startColumn : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry


  • 12.  RE: [sarif] Alternatives for embedding links

    Posted 11-13-2017 23:01
    Hi Nikolai,   Thanks for the information. SARIF also has a way to represent the “role” played by each statement in a code flow, but it’s not exactly the same as TOIF. To explain this, I need to present a brief summary of the relevant SARIF objects and properties:   A result object (Section 3.17) is similar to a TOIF “finding”. Among other properties, it may contain: A property named codeFlows (Section 3.17.10), whose value is an array of 1 or more codeFlow objects. A property named relatedLocations (Section 3.17.12), whose value is an array of one or more annotatedCodeLocation objects (see below), “each of which represents a location relevant to understanding the result”. A codeFlow object (Section 3.22) “specifies a possible execution path through the code.” Among other properties, it contains: A property named locations (Section 3.22.3), whose value is an array of 1 or more annotatedCodeLocation objects (see below), ordered according to the flow of execution. An annotatedCodeLocation object (Section 3.25) “represents a physical location together with additional information relevant to the use of the location in a particular context.” Among other properties, it contains: A property named physicalLocation (Section 3.25.3), whose value is a physicalLocation object. A property named kind (Section 3.25.9), whose value is one of a set of strings such as assignment , branch , call , callReturn , etc . SARIF log file viewers can use the call/return information to render a code flow as a tree, where the nesting represents the call/return structure. A property named taintKind (Section 3.25.13), whose value is “a string which classifies state transitions in code locations relevant to taint analysis”. The value is either source (a location where “untrusted data enters the system”), sink (a location where “untrusted data enters some security-sensitive code”), or sanitizer (a location after the execution of which the data “is presumed to be safe”).   With that background, I make a couple of observations:   Because result.relatedLocations is an array of annotatedCodeLocation objects, any related location can include information that describes the location, such as kind or taintKind . That is, a related location can include this information even if it is not part of a code flow . However, neither SARIF’s kind property nor its taintKind property has exactly the same semantics as the TOIF relationships “source of a finding,” “sink of a finding,” and “transit statement”. SARIF’s kind property describes what sort of statement this is (assignment, function call, etc.), while its taintKind property is specific to a particular kind of tool that performs “taint analysis”. Having said that, we could certainly add another property named (for example) “ role ” to the annotatedCodeLocation object, whose value would be one of resultSource , resultSink , or transitStatement . That would duplicate the semantics of the corresponding TOIF relationships. Here is my example from the discussion of Option 2, below, enhanced to show the hypothetical role property (as well as the similar-but-distinct kind and taintKind properties):   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0)" ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "role" : "resultSource" ,               "kind" : "call" ,               "taintKind": "source",               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   Larry   From: Nikolai Mansourov [mailto:nick@kdmanalytics.com] Sent: Monday, November 13, 2017 11:02 AM To: Larry Golding (Comcast) <larrygolding@comcast.net> Cc: sarif@lists.oasis-open.org Subject: Re: [sarif] Alternatives for embedding links   Hello all,   To add a little bit different perspective on the available options, the approach taken by TOIF involves a handful of semantic elements to identify “location relevant to understanding the results”.    While the main elements are Finding and Code location, TOIF also introduced two so-called “semantic elements”: Statement and Data Element.   There are at least three important “roles” of a statement with regards to a finding (this assumes that many findings involve a certain dataflow):  - sink of a finding. A sink statement is part of a “necessary condition” of the finding. For example, in a buffer overflow a “sink” is a statement that accesses a buffer. - source of a finding. A source statement is part of the “sufficient condition” of the finding.  - transit statement. A transit statement is part of the dataflow between the source(s) and the sink.   Note that this approach is used in the Julliet test set.   One advantage of this approach is that it makes the roles of code locations explicit as an integral part of the specification. Compliant tools can then introduce visual markup for the presentation purposes in a safe way.     Identifying semantic element relevant to a finding cannot in general be done in a non-intrusive modification of an SCA tool, most “legacy” SCA tools do not output details of the statements involved in a finding, only code locations.    Here are some definitions taken from the TOIF spec, followed by a couple of examples.   Statement Definition: An basic identifieable unit of behavior in software such as a source code statement, a basic block, a operator. Note : this corresponds to KDM ActionElement class Note : Defined in Figure 7. UML class diagram Semantic Statement Statement has code location Statement is involved in Finding Synonym: Finding is associated with statement Possibility: each Finding may be associated with many statement Statement is part of sink of Finding Note : In is a stronger form of the fact type Statement is involved in Finding where the role of Statement with respect to the Logical Weakness Model is known (i.e. sink) Statement is part of source of Finding Note : In is a stronger form of the fact type Statement is involved in Finding where the role of Statement with respect to the Logical Weakness Model is known (i.e. source) Statement is preceded by Statement Data element Definition: An basic identifieable data item is software such as global and local variables, records, formal parameters and constants. Note : This corresponds to the KDM DataElement class Note : Defined in Figure 8. UML class diagram Semantic Data Data    element is defined at Code Location  Data    element is involved in Finding Data    element has name Data    element is involved in Statement    An example of using this “semantic markdown” in TOIF:   <fact xmi:type="toif:Statement" xmi:id="s10"/> <fact xmi:type="toif:StatementIsInvolvedInFinding" statement="s10" finding="f10"/> <fact xmi:type="toif:StatementIsSinkOfFinding" statement="s20" finding="f20"/> <fact xmi:type="toif:StatementIsSourceOfFinding" statement="s30" finding="f30"/> <fact xmi:type="toif:StatementPrecedesStatement" statement1="s30" statement2="s20"/> <fact xmi:type="toif:StatementPrecedesStatement" statement1="s10" statement2="s20"/> <fact xmi:type="toif:StatementHasCodeLocation" statement1="s10" location="loc10"/> <fact xmi:type="toif:Statement" xmi:id="s20"> <description text=”*pHandler( pData, 0x200 );” /> </fact> <fact xmi:type="toif:Statement" xmi:id="s30"/> <fact xmi:type="toif:Finding" xmi:id="f10"/> <fact xmi:type="toif:Finding" xmi:id="f20"/> <fact xmi:type="toif:CodeLocation" xmi:id="loc10"> <linenumber linenumber="1856"/> </fact>     <fact xmi:type="toif:DataElement" xmi:id="d10"> <name name="X"/> <description text=”struct pData * X[ MAXDATA];” /> </fact> <fact xmi:type="toif:DataIsInvolvedInFinding" data= finding="f10"/> <fact xmi:type="toif:DataIsInvolvedInFinding" data= project="f20"/> <fact xmi:type="toif:DataIsInvolvedInStatement" data= statement="s20"/> <fact xmi:type="toif:DataIsInvolvedInStatement" data= statement="s30"/> <fact xmi:type="toif:DataIsDefinedAtCodeLocation" data= location="loc10"/> <fact xmi:type="toif:Statement" xmi:id="s20"/> <fact xmi:type="toif:Statement" xmi:id="s30"/> <fact xmi:type="toif:Finding" xmi:id="f10"/> <fact xmi:type="toif:Finding" xmi:id="f20"/> <fact xmi:type="toif:CodeLocation" xmi:id="loc10"> <linenumber linenumber="1856"/> </fact>   Cheers, Nick   On Nov 9, 2017, at 7:02 PM, Larry Golding (Comcast) < larrygolding@comcast.net > wrote:   Hello all,   Yesterday, I took an action to describe to you the two options we have discussed for embedding links to source files within SARIF message properties. Both options will work whether the message is plain text or contains formatting markup such as Markdown; that is, the “embedded links” proposal is independent of the “messages with formatting” proposal.   Both options involve using a syntax borrowed from Markdown to specify the link: [ link text ]( link target ) . They differ in how link target is expressed. Option 1: Mini-language The first option expresses the properties of the link target using a string in a “mini-language”. To understand this option, you need to know that SARIF defines a “ physicalLocation ” object (see Section 3.19 of the spec), with three properties:   A uri property, which is what it sounds like: the URI of the source file. A region property, which specifies a region within the source file, using properties such as startLine and startColumn . For a full explanation of region , see Section 3.20 – but you don’t need to understand those details to understand this option.   A uriBaseId property, which, if the URI is relative, indirectly specifies an absolute path upon which the relative URI is based. This is subtle; please see Section 3.3 for a full explanation, although you don’t really need to understand the details to understand this option. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : {                         "uriBaseId" : "SRCROOT" ,                         "uri" : "src/db/sql.cs" ,                         "region" : {                           "startLine" : 63,                           "startColumn" : 12,                           "endColumn" : 18                        }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data') "         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed in the mini-language as follows:   $(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data' ,   $(SRCROOT) refers to the uriBaseId , src/ui/input.cs is the uri , and the thing that looks like a URI fragment (starting with “#”) specifies the region, along with a “hover message”. The idea of the hover message is that if you click the link, your SARIF viewer application would open the specified file and highlight the region. Then, if you hovered your mouse over the region, the specified message would appear as the hover text.   This design has an interesting consequence: since the mini-language specifies everything that a physicalLocation object specifies, we could consider removing the physicalLocation object from the standard, and replacing it with a string in that format. Then the example above would appear as follows:   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {                       "analysisTarget" : "$(SRCROOT)/src/db/sql.cs#startLine=63,startColumn=12,endColumn=18"             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here]($(SRCROOT)src/ui/input.cs#startLine=20,startColumn=4,message='source of tainted data')"         }       ]     }   ] }   The analysisTarget property, whose value was previously a physicalLocation object, is now a string expressed in the mini-language. The introduction of the mini-language does not require us to remove the physicalLocation object, but Michael has argued that the spec should not have two different ways to express the same thing (the physicalLocation object on the one hand, and the mini-language on the other). Option 2: Index into relatedLocations The second option expresses the link target as an index into the result.relatedLocations array. To understand this option, you need to know that SARIF defines a property relatedLocations on the result object. Section 3.17.12 explains that this property contains:   … an array of one or more unique (§3.9) annotatedCodeLocation objects (§3.25), each of which represents a location relevant to understanding the result .   In this example, the location where the tainted data entered the system is “relevant to understanding the result”, so it makes sense in SARIF to express it as a “related location”. This option looks like this :   {   "version" : "1.0.0" ,   "runs" : [     {       "tool" : {         "name" : "TaintTracker"       },         "results" : [         {           "ruleId" : "CA2001" ,           "locations" : [             {               "analysisTarget" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/db/sql.cs" ,                 "region" : {                   "startLine" : 63,                   "startColumn" : 12,                   "endColumn" : 18                 }               }             }           ],           "message" : "Tainted data is used to execute a SQL command. The data entered the system [here](0) " ,           "relatedLocations" : [             {               "message" : "source of tainted data" ,               "physicalLocation" : {                 "uriBaseId" : "SRCROOT" ,                 "uri" : "src/ui/input.cs" ,                 "region" : {                   "startLine" : 20,                   "startColumn" : 4                 }               }             }           ]         }       ]     }   ] }   The link text is “ here ”, and the link target is expressed as an index into the relatedLocations array. Note that in this option, the “hover message” (“ source of tainted data ”) appears as the message property of the annotatedCodeLocation object in the relatedLocations array. In this example, there is only one related location, so the index is 0. Comparison of the options Option 1 (mini-language) has these advantages: It makes the SARIF file more compact (although that isn’t actually a design goal for SARIF). It enables someone reading the raw SARIF file to see the link target directly in the context of the message.   Option 2 (index into relatedLocations) has these advantages: It does not introduce a mini-language (except for the “embedded link” syntax itself, but that occurs in both options). It retains physicalLocation as a structured JSON object. It’s generally undesirable to introduce mini-languages. After all, SARIF consumers already use a (presumably) highly reliable JSON parser to read the SARIF file; why should consumers need an additional parser to crack the mini-language? It takes advantage of an existing SARIF facility (relatedLocations) which has exactly the semantics we need here (a location “relevant to understanding the result”). It avoids the need to define an escaping mechanism for characters (such as ‘#’ or ‘(‘) which, in the mini-language version, might appear in the “message” parameter of the “fragment”. It avoids the parsing needed to identify the “fragment” that specifies the region (as opposed to the “real” fragment; see #5). It avoids depending on the constraint that the “real” fragment in a URI specifying a nested file begin with a “/”.   #4 and #5 in this list require you to understand how SARIF represents locations within “nested files” (for example, files within a ZIP archive). I can explain this in more detail if necessary, but if you find #1 through #3 persuasive in themselves, I won’t bother. If you’re interested, you can see Section 3.12.9, which describes the run.files property. Look for the paragraph that starts “In some cases, a file might be nested within another file”.   We will discuss this further at the next TC meeting.   Thanks for reading all of this! Larry