Data Provenance (DPS) TC

  • 1.  Proposal for name change of data provenance schema spec

    Posted 08-26-2025 10:49
    Dear members,

    Triggered by a remark from David Kemp in the chat of our meeting today,
    who noted that "Data Provenance Version 1.0" may be too terse a title
    and suggested calling it "Data Provenance Metadata Version 1.0", I propose
    the following:

    - Data Provenance Schema Version 1.0

    Rationale:

    - better describes what is provided
    - avoids the "data thing red data" repetition dilemma
    - we in fact offer a schema for metadata, not the metadata itself,
      on multiple levels: (a) the info set and (b) the data format schema sets

    Also, I would love to have David Kemp join me as Co-Editor.

    Thanks,
    Stefan.



  • 2.  RE: Proposal for name change of data provenance schema spec

    Posted 08-26-2025 13:53

    We have a few possibilities:

    • Data Provenance Standard
    • Data Provenance (deleted "Standard" as redundant: "DPS Standard"? Like "ATM Machine" for automated banking)
    • Data Provenance Metadata (dpkemp suggestion)
    • Data Provenance Schema (sthagen suggestion)

     

    We are writing a standard for something, and I think that something is a metadata specification.  So despite "data" and "metadata" sounding awkward, and at the risk of coming full circle on "Standard", I'd prefer:

    • Data Provenance Metadata Specification, Version 1.0

     

    Rationale: "Schema" is too narrow. The document is prose that both defines and explains/justifies what provenance metadata does and looks like. A schema is a machine-readable artifact that accompanies the document (and takes precedence if the two are inconsistent). And if we use "Schema", it isn't a schema for data provenance; it is a schema for metadata.

     

    I'd be happy to be listed as Co-Editor.

    Regards,
    David






  • 3.  RE: Proposal for name change of data provenance schema spec

    Posted 08-26-2025 15:07
    Thank you, David, for framing the options (and Stefan for the additional ideas).

    "Schema" feels narrow to me, since the work we are doing goes beyond just the machine-readable artifacts. At the same time, I also want to avoid the redundancy of "Standard" and the awkwardness of "data/metadata" repetition.

    For me, "Data Provenance Metadata, Version 1.0" strikes the right balance: it keeps the focus on metadata (which is what we are specifying), it's concise, and it avoids confusion with implementation artifacts like schemas.

    Also, thank you to both Stefan and David for stepping up as co-editors.

    Kristina





  • 4.  RE: Proposal for name change of data provenance schema spec

    Posted 08-26-2025 17:20
    Thanks for the feedback,

    On Tue, Aug 26, 2025, at 21:06, Kristina Podnar via OASIS wrote:
    > Thank you, David, for framing the options (and Stefan for the additional ideas).
    > [...]

    I can agree on many titles, as long as the prose and other artifacts
    work well and are consistent.

    I find that interpretation of "schema" too narrow: in my understanding
    and usage, a schema is more than some machine-readable artifact.
    It is rather a structure or pattern that describes something
    in a way that discards irrelevant details and emphasizes
    the common or characteristic parts.

    Looking it up in Merriam-Webster, I find "my" meaning in sense (2.),
    or in the "broadly" usage of sense (1.):

    """
    schema (noun), sche·ma ˈskē-mə
    plural schemata ˈskē-mə-tə also schemas

    1.
    : a diagrammatic presentation
    broadly : a structured framework or plan : OUTLINE

    2.
    : a mental codification of experience that includes a particular
    organized way of perceiving cognitively and responding to a
    complex situation or set of stimuli
    """
    “Schema.” Merriam-Webster.com Dictionary, Merriam-Webster,
    https://www.merriam-webster.com/dictionary/schema. Accessed 26 Aug. 2025.

    So, hierarchically:

    schema <- information schema <- data schema

    like:

    sequence <- list <- []


    But, as I wrote a hundred words above, and because form follows function
    for me, I can happily live with "Data Provenance Metadata Version 1.0"
    if that makes sense to all of you in "the grand scheme of things".


    PS: Thank you, David, for your support as co-editor. Looking forward
    to helping shape the specification as a team. Much appreciated!


    Cheers,
    Stefan.