My proposal
result.instanceGuid
result.correlationGuid
result.fingerprints
Revisit run.automationId and run.stableId. Replace these with a ‘descriptor’ object that contains a guid, a readable id, and a descriptive string.
[don’t get hung up on my names <g>]
run.automationCorrelationDescriptor
run.correlationDescriptor
run.correlationDescriptor = {
  guid = “2374285703485093480983459830938540”,
  readableId = “FxCop Nightly Run/Debug Non-Optimized”,
  description = “FxCop nightly run produced by XXX build lab for compliance certification process. Contact ‘mybuildlab@contoso.com’ with any questions.”
}
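To make the descriptor idea concrete, here is a minimal Python sketch. It is only an illustration of the shape proposed above: the class name, the `to_log_object` helper, and the choice of a random version-4 GUID are assumptions, not anything the spec defines.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CorrelationDescriptor:
    """Hypothetical descriptor: a guid, a readable id, and a description."""

    readable_id: str   # namespaced, human-readable label
    description: str   # free-form explanatory text
    # Assumption: the guid is arbitrary, so a random UUID is generated by default.
    guid: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_log_object(self) -> dict:
        """Return the descriptor using the property names from the proposal."""
        return {
            "guid": self.guid,
            "readableId": self.readable_id,
            "description": self.description,
        }


descriptor = CorrelationDescriptor(
    readable_id="FxCop Nightly Run/Debug Non-Optimized",
    description="FxCop nightly run produced by a build lab.",
)
print(descriptor.to_log_object())
```

The guid gives a consumer something opaque to key on, while the readable id and description carry the meaning for people.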
From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Friday, May 4, 2018 10:09 AM
To: Michael Fanning <Michael.Fanning@microsoft.com>; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>; sarif@lists.oasis-open.org
Subject: RE: [sarif] partialFingerprints: the words the world has been waiting for
Ok, so we’ve agreed on this so far:
result.id, a GUID, unique across every result reported in every run.
result.correlationId, a GUID, common to a set of results that are logically identical.
result.fingerprints, an array of calculated values, each of which captures a stable identifier for a set of logically identical results. An array, so that the RMS can improve a fingerprint without losing the old value.
This supports both Michael’s and Yekaterina’s/SCA’s usage scenarios.
Are we closed on this?
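For anyone who prefers to see the three properties side by side, here is a rough Python sketch of how a producer might populate them. The helper names and the placeholder fingerprint strings ("fp-v1", "fp-v2") are made up for illustration; only the three property names come from the summary above.

```python
import uuid


def new_result(correlation_id, fingerprints):
    """Build a result: fresh id per report, shared correlationId, fingerprint list."""
    return {
        "id": str(uuid.uuid4()),            # unique across every result in every run
        "correlationId": correlation_id,    # common to logically identical results
        "fingerprints": list(fingerprints), # copy, so results do not share a list
    }


def add_fingerprint(result, fingerprint):
    """Append an improved fingerprint while keeping the earlier values."""
    if fingerprint not in result["fingerprints"]:
        result["fingerprints"].append(fingerprint)


shared_correlation_id = str(uuid.uuid4())
monday = new_result(shared_correlation_id, ["fp-v1"])
tuesday = new_result(shared_correlation_id, ["fp-v1"])
add_fingerprint(tuesday, "fp-v2")  # improved value is added, "fp-v1" is retained
```

The two results report the same logical issue, so they share a correlationId but never an id, and the later one carries both the old and the improved fingerprint.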
Now as to ids: I think there’s only one other id in the spec we should require to be a GUID: run.id.
As for all the others:
Not rule.id, for sure.
Not edge.id or node.id – they only need to be unique within a graph, and values like "e1" or "n2" are easier to read.
Not graph.id, or graphTraversal.id – they only need to be unique within a result (or, in the case of graph.id, possibly within a run).
Not run.automationId – it’s intended to be a value that’s meaningful to the engineering system, like a build id.
Not run.stableId – that’s the namespaced thing like "Nightly security scan/x86".
Not physicalLocation.id – that’s the integer we use in message substitution sequences like "{2}".
Not threadFlow.id – it only needs to be unique within a codeFlow, so sequential numbers would be fine.
Not notification.id – that should be something human readable like "RunStarted".
Not message.resourceId – that should be human readable like "ErrorUnitializedVariable".
Larry
From: Michael Fanning <Michael.Fanning@microsoft.com>
Sent: Friday, May 4, 2018 7:54 AM
To: Larry Golding (Comcast) <larrygolding@comcast.net>; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>; sarif@lists.oasis-open.org; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>
Subject: RE: [sarif] partialFingerprints: the words the world has been waiting for
We have a slot for computed fingerprints, result.fingerprints, so I suggest we don’t overload it. This property is an array in part because Yekaterina indicated that in rare cases a fingerprint would change slightly, and it may be valuable to retain the previously computed version.
Let me repeat a comment that Henny made previously: our success here depends in part on clear semantics around what, exactly, lives in each property. If a correlationId could be either a computed fingerprint or a guid, we could just refer people to the fingerprints array. If you’re looking for an id, just grab the first one in the array.
I’m beginning to think it’s cleaner if we redefine ids as guids, where guids make sense in the format, i.e., an entirely arbitrary (not computed from any results data), opaque, and unique value. And leave result.fingerprints and result.partialFingerprints dedicated to the fingerprints concept.
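The distinction between an arbitrary guid and a computed fingerprint can be sketched in a few lines of Python. This is an assumption-laden illustration: the inputs to the hash (a rule id, a file path, and a whitespace-normalized snippet), the use of SHA-256, and the sample values are all hypothetical, and real tools compute fingerprints in their own ways.

```python
import hashlib
import uuid


def compute_fingerprint(rule_id, file_path, snippet):
    """Hypothetical fingerprint: derived from result data, so it is repeatable."""
    normalized = " ".join(snippet.split())  # collapse whitespace differences
    payload = "\x1f".join([rule_id, file_path, normalized])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def new_guid():
    """Arbitrary identifier: not computed from any result data."""
    return str(uuid.uuid4())


# The same logical result, formatted slightly differently, yields one fingerprint.
fp_a = compute_fingerprint("EXAMPLE001", "src/app.cs", "var s = new Stream();")
fp_b = compute_fingerprint("EXAMPLE001", "src/app.cs", "var  s =  new Stream();")

# Two guids never depend on the inputs, so they differ from call to call.
guid_a = new_guid()
guid_b = new_guid()
```

Keeping the two kinds of value in separate properties means a consumer never has to guess which behavior it is looking at.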
From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Thursday, May 3, 2018 3:33 PM
To: Michael Fanning <Michael.Fanning@microsoft.com>; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>; sarif@lists.oasis-open.org; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>
Subject: RE: [sarif] partialFingerprints: the words the world has been waiting for
Thanks for reading! Let’s put a pin in your “identifier object” question for a moment so I can ask you:
What do you think of overloading correlationId so it can hold either an arbitrary identifier (as in your usage scenario) or a computed “fingerprint” (as in Yekaterina’s SCA scenario) – as opposed to separate correlationId and fingerprint(s) properties? Remind me again why “fingerprints” (however we name it or overload it) is plural.
Now as to “identifier object”… as you say, it’s not useful for result.id or result.correlationId (because nobody’s going to generate a human-readable equivalent for those). As to run.automationId and run.stableId – I don’t see the point of a GUID associated with the namespaced human-readable values for these properties. GUIDs are fine when you have to guarantee uniqueness and there’s no central authority. IMO, within any given team’s engineering system, there would be no danger of choosing two identifiers with the same human-readable name, but different semantics requiring them to be distinguished – because you have a central authority. The complexity/value trade-off doesn’t work for me here, but I’m open to persuasion.
Thoughts?
From: Michael Fanning <Michael.Fanning@microsoft.com>
Sent: Thursday, May 3, 2018 3:14 PM
To: Larry Golding (Comcast) <larrygolding@comcast.net>; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>; sarif@lists.oasis-open.org; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>
Subject: RE: [sarif] partialFingerprints: the words the world has been waiting for
Larry, I read your proposal around id and correlationId and it is clear and makes very good conceptual sense.
These things are easy to describe in the spec: an id is a guid, generated on the fly, which is an identifier that is only good for a specific result in a single log file. The correlationId is a guid that correlates logically unique instances of a result across multiple log files.
Btw – I am wondering whether we need an identifier object, which explicitly contains a guid, a readable id (which is an arbitrary namespaced label that provides some hierarchy), and a description. This id object would work well for automationId and for stableId. For id or correlationId, it looks less helpful. We might consider renaming these to result.instanceGuid and result.correlationGuid.
From: Larry Golding (Comcast) <larrygolding@comcast.net>
Sent: Thursday, May 3, 2018 12:43 PM
To: 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>; Michael Fanning <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org; 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>
Subject: RE: [sarif] partialFingerprints: the words the world has been waiting for
Of course, we could dispense with result.fingerprints and just keep result.correlationId, documenting that it could either be an arbitrary identifier or a calculated fingerprint value.
I’m still not quite clear on whether correlationId would need to be plural in that case.
From: sarif@lists.oasis-open.org <sarif@lists.oasis-open.org> On Behalf Of Larry Golding (Comcast)
Sent: Thursday, May 3, 2018 11:03 AM
To: 'O'Neil, Yekaterina Tsipenyuk' <katrina@microfocus.com>; 'Michael Fanning' <Michael.Fanning@microsoft.com>; sarif@lists.oasis-open.org; O'Neil, Yekaterina Tsipenyuk <katrina@microfocus.com>
Subject: RE: [sarif] partialFingerprints: the words the world has been waiting for
Hi all,
Yekaterina, thanks for the explanation. Since my memory is usually so poor, I was pleased to find that I remembered most of what you just wrote