OASIS Static Analysis Results Interchange Format (SARIF) TC

  • 1.  "uri" vs. "location"

    Posted 05-29-2018 18:51
    At TC #17, while discussing Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed Issue #175 , “Decide on policy for fileLocation vs. URI,” where I wrote: We have two URI-like properties that are represented by URI-valued strings: ·          tool.downloadUri ·          versionControlDetails.uri  (the URI of the repo) ... while all the others (14 of them) are represented by  fileLocation  objects, including: ·          result.workItemLocation ·          rule.helpLocation Basically I've used  fileLocation  where there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo). Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use  fileLocation everywhere?   Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use fileLocation ; if there’s only one of it, use a URI-reference-valued string.   I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18).   Larry


  • 2.  RE: [sarif] "uri" vs. "location"

    Posted 05-29-2018 19:45
    I’m not sure I agree with this distinction. I think the location concepts makes most sense when we expect a URI (particularly a file URI) that might have a non-deterministic root.   We don’t expect any of the things you’ve cited below to be rooted in a non-deterministic way (off an enlistment, for example). So on first blush I’d expect all these things to be URIs.   I think I’ve made this point before, though, and we still have locations here, so perhaps I’m forgetting a line of reasoning someone put forward that’s relevant?   From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org> On Behalf Of Larry Golding (Comcast) Sent: Tuesday, May 29, 2018 11:49 AM To: sarif@lists.oasis-open.org Subject: [sarif] "uri" vs. "location"   At TC #17, while discussing Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed Issue #175 , “Decide on policy for fileLocation vs. URI,” where I wrote: We have two URI-like properties that are represented by URI-valued strings: ·          tool.downloadUri ·          versionControlDetails.uri  (the URI of the repo) ... while all the others (14 of them) are represented by  fileLocation  objects, including: ·          result.workItemLocation ·          rule.helpLocation Basically I've used  fileLocation  where there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo). Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use  fileLocation everywhere?   Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use fileLocation ; if there’s only one of it, use a URI-reference-valued string.   I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18).   Larry


  • 3.  RE: [sarif] "uri" vs. "location"

    Posted 05-29-2018 20:34
    Yes, we have discussed it before. My counter-argument was that although we invented “URI base ids” to improve determinism, they have the nice side effect of reducing repetition and file size, even for deterministic URIs. For example:   {                                                                  # A run object   "uriBaseIds": {     "WORKITEMS": "https://github.com/Microsoft/sarif-sdk/issues"   # Deterministic, but nice to DRY out.   },     "results": [     {       "workItemLocation": {         "uriBaseId": "WORKITEMS",         "uri": "895"       }     }   ] }   That’s why I think “one/many” is an appropriate distinction. BUT! Even a singleton URI needs to be represented as a fileLocation if it’s non-deterministic. At the moment, there is no example in the spec of a property that’s both a singleton and non-deterministic. But to guard against that possibility, I’m ok with stating the guideline as:   Use fileLocation if the URI describes something of which there are potentially many per run (for example, a GitHub issue), OR if the URI is non-deterministic (it might vary from machine to machine or from run to run). Use a URI string if the URI describes something of which there is only one per run, AND is deterministic (for example, a GitHub repo).   Note that there is no spec change involved here if we adopt this policy. If we adopt a different policy (one that requires us to change the type of an existing property), then we need a spec change.   Larry   From: Michael Fanning <Michael.Fanning@microsoft.com> Sent: Tuesday, May 29, 2018 12:45 PM To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org Subject: RE: [sarif] "uri" vs. "location"   I’m not sure I agree with this distinction. I think the location concepts makes most sense when we expect a URI (particularly a file URI) that might have a non-deterministic root.   We don’t expect any of the things you’ve cited below to be rooted in a non-deterministic way (off an enlistment, for example). So on first blush I’d expect all these things to be URIs.   I think I’ve made this point before, though, and we still have locations here, so perhaps I’m forgetting a line of reasoning someone put forward that’s relevant?   From: sarif@lists.oasis-open.org < sarif@lists.oasis-open.org > On Behalf Of Larry Golding (Comcast) Sent: Tuesday, May 29, 2018 11:49 AM To: sarif@lists.oasis-open.org Subject: [sarif] "uri" vs. "location"   At TC #17, while discussing Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed Issue #175 , “Decide on policy for fileLocation vs. URI,” where I wrote: We have two URI-like properties that are represented by URI-valued strings: ·          tool.downloadUri ·          versionControlDetails.uri  (the URI of the repo) ... while all the others (14 of them) are represented by  fileLocation  objects, including: ·          result.workItemLocation ·          rule.helpLocation Basically I've used  fileLocation  where there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo). Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use  fileLocation everywhere?   Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use fileLocation ; if there’s only one of it, use a URI-reference-valued string.   I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18).   Larry


  • 4.  RE: [sarif] "uri" vs. "location"

    Posted 05-29-2018 21:17
    Ouch, you just pointed out an issue. We defined uriBaseIds in part as a mechanism to allow a post-processing tool to elide non-deterministic information. If people simply use this mechanism to DRY out their URLs, they potentially break their absolute URLs without providing an obvious way to restore them (as people can do for version controlled scenarios in some cases by pointing to a local enlistment).   You have a case for this being a useful distinction but it is a non-obvious one in the spec.   I think my personal preference here is that it’s much cleaner to separate things we expect to be absolute (named xxxUri) vs. things we expect to be rebased (fileLocation). Yes, we lose the convenience of reducing log file size but I think it’s a useful trade-off.   Michael   From: Larry Golding (Comcast) <larrygolding@comcast.net> Sent: Tuesday, May 29, 2018 1:32 PM To: Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org Subject: RE: [sarif] "uri" vs. "location"   Yes, we have discussed it before. My counter-argument was that although we invented “URI base ids” to improve determinism, they have the nice side effect of reducing repetition and file size, even for deterministic URIs. For example:   {                                                                  # A run object   "uriBaseIds": {     "WORKITEMS": " https://github.com/Microsoft/sarif-sdk/issues "   # Deterministic, but nice to DRY out.   },     "results": [     {       "workItemLocation": {         "uriBaseId": "WORKITEMS",         "uri": "895"       }     }   ] }   That’s why I think “one/many” is an appropriate distinction. BUT! Even a singleton URI needs to be represented as a fileLocation if it’s non-deterministic. At the moment, there is no example in the spec of a property that’s both a singleton and non-deterministic. But to guard against that possibility, I’m ok with stating the guideline as:   Use fileLocation if the URI describes something of which there are potentially many per run (for example, a GitHub issue), OR if the URI is non-deterministic (it might vary from machine to machine or from run to run). Use a URI string if the URI describes something of which there is only one per run, AND is deterministic (for example, a GitHub repo).   Note that there is no spec change involved here if we adopt this policy. If we adopt a different policy (one that requires us to change the type of an existing property), then we need a spec change.   Larry   From: Michael Fanning < Michael.Fanning@microsoft.com > Sent: Tuesday, May 29, 2018 12:45 PM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] "uri" vs. "location"   I’m not sure I agree with this distinction. I think the location concepts makes most sense when we expect a URI (particularly a file URI) that might have a non-deterministic root.   We don’t expect any of the things you’ve cited below to be rooted in a non-deterministic way (off an enlistment, for example). So on first blush I’d expect all these things to be URIs.   I think I’ve made this point before, though, and we still have locations here, so perhaps I’m forgetting a line of reasoning someone put forward that’s relevant?   From: sarif@lists.oasis-open.org < sarif@lists.oasis-open.org > On Behalf Of Larry Golding (Comcast) Sent: Tuesday, May 29, 2018 11:49 AM To: sarif@lists.oasis-open.org Subject: [sarif] "uri" vs. "location"   At TC #17, while discussing Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed Issue #175 , “Decide on policy for fileLocation vs. URI,” where I wrote: We have two URI-like properties that are represented by URI-valued strings: ·          tool.downloadUri ·          versionControlDetails.uri  (the URI of the repo) ... while all the others (14 of them) are represented by  fileLocation  objects, including: ·          result.workItemLocation ·          rule.helpLocation Basically I've used  fileLocation  where there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo). Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use  fileLocation everywhere?   Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use fileLocation ; if there’s only one of it, use a URI-reference-valued string.   I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18).   Larry


  • 5.  RE: [sarif] "uri" vs. "location"

    Posted 05-29-2018 21:38
    Ouch indeed. In fact, now I remember we even discussed the possibility of adding another property that would function just like run.uriBaseIds , except the values would be required to be deterministic (say, run.deterministicUriBaseIds ).   We rejected that option. But apparently we didn’t come to a common understanding of what it meant to “reject” it. You thought it meant “Don’t put deterministic URIs in run.uriBaseIds .” I thought it meant “Don’t define a separate property for deterministic URIs; just shove them in run.uriBaseIds ”.   Unfortunately for me, your understanding was the correct one. It has to be, for the reason you point out: A post-processing tool that converts a SARIF file to deterministic form will omit run.uriBaseIds , and then I can’t find my work items any more.   So we agree that we use fileLocation only for non-deterministic URIs. So, as part of this issue (#175), I now have to go through the spec and fix up the properties that incorrectly use fileLocation . I’ll have that change draft ready for TC #19.   Larry   From: Michael Fanning <Michael.Fanning@microsoft.com> Sent: Tuesday, May 29, 2018 2:17 PM To: Larry Golding (Comcast) <larrygolding@comcast.net>; sarif@lists.oasis-open.org Subject: RE: [sarif] "uri" vs. "location"   Ouch, you just pointed out an issue. We defined uriBaseIds in part as a mechanism to allow a post-processing tool to elide non-deterministic information. If people simply use this mechanism to DRY out their URLs, they potentially break their absolute URLs without providing an obvious way to restore them (as people can do for version controlled scenarios in some cases by pointing to a local enlistment).   You have a case for this being a useful distinction but it is a non-obvious one in the spec.   I think my personal preference here is that it’s much cleaner to separate things we expect to be absolute (named xxxUri) vs. things we expect to be rebased (fileLocation). Yes, we lose the convenience of reducing log file size but I think it’s a useful trade-off.   Michael   From: Larry Golding (Comcast) < larrygolding@comcast.net > Sent: Tuesday, May 29, 2018 1:32 PM To: Michael Fanning < Michael.Fanning@microsoft.com >; sarif@lists.oasis-open.org Subject: RE: [sarif] "uri" vs. "location"   Yes, we have discussed it before. My counter-argument was that although we invented “URI base ids” to improve determinism, they have the nice side effect of reducing repetition and file size, even for deterministic URIs. For example:   {                                                                  # A run object   "uriBaseIds": {     "WORKITEMS": " https://github.com/Microsoft/sarif-sdk/issues "   # Deterministic, but nice to DRY out.   },     "results": [     {       "workItemLocation": {         "uriBaseId": "WORKITEMS",         "uri": "895"       }     }   ] }   That’s why I think “one/many” is an appropriate distinction. BUT! Even a singleton URI needs to be represented as a fileLocation if it’s non-deterministic. At the moment, there is no example in the spec of a property that’s both a singleton and non-deterministic. But to guard against that possibility, I’m ok with stating the guideline as:   Use fileLocation if the URI describes something of which there are potentially many per run (for example, a GitHub issue), OR if the URI is non-deterministic (it might vary from machine to machine or from run to run). Use a URI string if the URI describes something of which there is only one per run, AND is deterministic (for example, a GitHub repo).   Note that there is no spec change involved here if we adopt this policy. If we adopt a different policy (one that requires us to change the type of an existing property), then we need a spec change.   Larry   From: Michael Fanning < Michael.Fanning@microsoft.com > Sent: Tuesday, May 29, 2018 12:45 PM To: Larry Golding (Comcast) < larrygolding@comcast.net >; sarif@lists.oasis-open.org Subject: RE: [sarif] "uri" vs. "location"   I’m not sure I agree with this distinction. I think the location concepts makes most sense when we expect a URI (particularly a file URI) that might have a non-deterministic root.   We don’t expect any of the things you’ve cited below to be rooted in a non-deterministic way (off an enlistment, for example). So on first blush I’d expect all these things to be URIs.   I think I’ve made this point before, though, and we still have locations here, so perhaps I’m forgetting a line of reasoning someone put forward that’s relevant?   From: sarif@lists.oasis-open.org < sarif@lists.oasis-open.org > On Behalf Of Larry Golding (Comcast) Sent: Tuesday, May 29, 2018 11:49 AM To: sarif@lists.oasis-open.org Subject: [sarif] "uri" vs. "location"   At TC #17, while discussing Issue #163 (“Add result.workItemLocation”), I pointed out that some “URI-valued” properties are represented as strings, while others are represented as fileLocation objects. I took an action to decide on a policy for deciding which type to use, and filed Issue #175 , “Decide on policy for fileLocation vs. URI,” where I wrote: We have two URI-like properties that are represented by URI-valued strings: ·          tool.downloadUri ·          versionControlDetails.uri  (the URI of the repo) ... while all the others (14 of them) are represented by  fileLocation  objects, including: ·          result.workItemLocation ·          rule.helpLocation Basically I've used  fileLocation  where there are potentially "many" of something (work items, rules), and URI-valued strings where there's only one of something (tool download location, VCS repo). Is this a valid distinction? Even if you can rationalize it, is it confusing? Should we use  fileLocation everywhere?   Having thought about this some more, I think the approach we’ve taken so far makes sense: If there are potentially many of something, use fileLocation ; if there’s only one of it, use a URI-reference-valued string.   I’m going to propose this at TC #19 (it’s too late to put it on the agenda for tomorrow’s TC #18).   Larry