Data Provenance (DPS) TC

  • 1.  Help in resolving comment on source/provenance/use on PR#20

    Posted 05-28-2025 23:30

    A comment on PR#20 https://github.com/oasis-tcs/dps/pull/20/files/d973278765ae32de0780f17cb97463cfc45a8cec#r2112666416 states:

    "In general, I find the separation of the "trinity"-like provenance, source, and use a difficult concept. For example: why are provenance and source separate?"

    I am repeating it in this email thread to get more members involved, since I assume not everyone is following the GitHub PR comments. The repository has only 4 forks and 6 watchers, and even those 6, if they are like me, may send all their GitHub watch emails to a bit-bucket folder they never read; I get hundreds per day.

    This appears to me to be a very fundamental issue. I assumed there was general agreement on the D&TA work as a basis for moving forward; I just did the administrivia work of "OASIS-izing" it. This comment implies otherwise, at least to me, and I confess my ignorance of the work that went into creating the D&TA spec. So I would like help from those who spent time creating the D&TA spec, and I would like everyone to chime in with their opinion on accepting the D&TA work as our starting point.



    ------------------------------
    Duncan Sparrell
    Chief Cyber Curmudgeon
    sFractal Consulting LLC
    Oakton VA
    703-828-8646
    ------------------------------


  • 2.  RE: Help in resolving comment on source/provenance/use on PR#20

    Posted 05-30-2025 11:57

    Hi Duncan,

     

    Thanks for raising this, and I appreciate your effort in surfacing the question from PR#20 more broadly to the group.

    To clarify, the taxonomy reflected in the D&TA work was not arbitrarily defined; it was the result of extensive feedback gathered over more than 55 deep-dive conversations and many iterations. These involved both Working Group members and external experts, including data practitioners, compliance professionals, and legal stakeholders. We went through multiple conceptual models before settling on the current framing. It isn't perfect, but it worked for most of the organizations represented in the Working Group. Of course, there is always room for (much!) improvement.

     

    For context, here are a few of the iterations we explored (happy to dig up the working document and share more if folks would like):

     

    Framed as core provenance questions:

    • What is this data?
    • Who supplied it?
    • When was this data produced?
    • Where was the data collected?
    • How was the data collected?
    • How can I use this data?

     

    Focused on traceability and legal dimensions:

    • Lineage
    • Supplier location (origination, processing, storage)
    • Legal rights
    • Privacy & protection
    • Generation date
    • Data format
    • Generation method
    • Intended use & restrictions

     

    Designed for operational and compliance clarity:

    • Usage (rights & restrictions)
    • Collection (source, standards & validation, recency, method)

     

    These models were reviewed and evolved through months of iterative feedback. The current separation between provenance, source, and use is an intentional one, meant to provide clarity to different kinds of users (technical, legal, and business) who need to interpret and apply this metadata for different reasons.

     

    That said, this taxonomy, like everything else in our work, should remain open to refinement. If there's a way to clarify or simplify the distinction between these elements in the spec, I'd support a discussion around that.

     

    Thanks again for keeping the group engaged on this.

     

    Kristina