Data Provenance (DPS) TC

Naming for DP Model and Specification

1. Naming for DP Model and Specification

David Kemp
Posted 06-24-2025 11:44
At the June meeting we decided:

Naming Decision: Specification renamed to "Data Provenance" (dropped "standards").

That decision relates to the initial work-product specification, which uses property tables to define provenance metadata. As we discussed, the content of the tables is just a strawman, to be replaced by actual content as determined by the TC. But the form of the specification should be agreed on early so that we have a way to specify the content.

Two approaches have been proposed:

anonymous collections of properties as shown in the contributed YAML schema and its equivalent JSON schema

named collections of properties ("Types") as shown in the property tables:

The naming decision referred to the name of the specification document, but it also applies to the schema defined by the specification. In the typed approach, the top level of the schema has the name "DataProvenance" (circled in green), while in the anonymous approach the schema root has no Type; it is just a collection of three properties.

Similarly, in the typed approach the first property is defined as a named "Source" type (circled in orange), while in the anonymous approach all of the source content is defined as sets of properties nested under the source property.
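To make the contrast concrete, here is a minimal Python sketch assuming heavily trimmed schemas. The two dicts are illustrative stand-ins, not the contributed schema or TC content; the property names source, provenance, use, and issuer are taken from the tree view later in this thread, while the type names Provenance and Use and the helper type_names are hypothetical.

```python
# Illustrative, heavily trimmed shapes; not the contributed schema.

# Anonymous approach: the root has no type name, and the source content
# is nested inline under the "source" property.
anonymous = {
    "type": "object",
    "properties": {
        "source": {
            "type": "object",
            "properties": {"issuer": {"type": "array"}},
        },
        "provenance": {"type": "object"},
        "use": {"type": "object"},
    },
}

# Named approach: the root is the "DataProvenance" type, and each property
# refers to a named type in $defs, so "Source" can be discussed by name.
named = {
    "$defs": {
        "DataProvenance": {
            "type": "object",
            "properties": {
                "source": {"$ref": "#/$defs/Source"},
                "provenance": {"$ref": "#/$defs/Provenance"},
                "use": {"$ref": "#/$defs/Use"},
            },
        },
        "Source": {
            "type": "object",
            "properties": {"issuer": {"type": "array"}},
        },
        "Provenance": {"type": "object"},
        "Use": {"type": "object"},
    },
    "$ref": "#/$defs/DataProvenance",
}


def type_names(schema: dict) -> list[str]:
    """Return the names of the types a schema defines in $defs."""
    return sorted(schema.get("$defs", {}))


print(type_names(anonymous))  # []
print(type_names(named))      # ['DataProvenance', 'Provenance', 'Source', 'Use']
```

With the named form, the specification text can refer to "Source" on its own; the anonymous form offers nothing to point at except a path such as properties/source.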

I discuss this further under Issue 16. Most TC members are probably not interested in esoteric modeling details, but this decision has a couple of practical impacts:

Specification readability: Named types can be referred to without defining all the details of their content

XML support: Unlike JSON, XML elements must have a tag, so if we want to allow XML metadata the model must define those tags:

<DataProvenance>
  <Source> < ... > </Source>
  <Provenance> < ... > </Provenance>
  <Use> < ... > </Use>
</DataProvenance>
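A minimal Python sketch, using the standard-library xml.etree.ElementTree, of why XML needs the names: every element tag is taken directly from a type name, so a schema without type names leaves a serializer nothing to write. The function build_skeleton is hypothetical, and child content is left empty where the example shows < ... >.

```python
import xml.etree.ElementTree as ET


def build_skeleton() -> ET.Element:
    """Build the outline of the XML example; tags come from type names."""
    root = ET.Element("DataProvenance")  # the top-level named type
    for type_name in ("Source", "Provenance", "Use"):
        ET.SubElement(root, type_name)   # child content omitted (< ... >)
    return root


print(ET.tostring(build_skeleton(), encoding="unicode"))
# <DataProvenance><Source /><Provenance /><Use /></DataProvenance>
```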

Therefore, I propose that the data provenance specification document and model schema use named types to define content.
2. RE: Naming for DP Model and Specification

Kristina Podnar
Posted 06-25-2025 10:55
Hi Dave (and all!),

I will summarize in my own words (sorry, I'm not a very technical person!) what I believe your proposal is, along with my response. Would you please let me know if I went off the path or misrepresented any aspect?

The DPS specification will describe metadata, and we aim to represent this in a machine-readable way. We can either do this through:

Anonymous collections of properties
Named collections (or types)

Option 1 is a flexible and straightforward solution that can use JSON, as you suggested. Option 2 would have us group properties explicitly, as the D&TA standards did (e.g., Source). Option 2 is easier to reuse, to refer to, and to support in other, non-JSON formats (like XML).

The reason the TC should care is that we need to prioritize readability (with named types, you can refer to something like "Source" instead of having to spell it out every time), and, unlike JSON, XML requires element tags (which need names), so named types are better if XML support is necessary.

Given this understanding, we should adopt the named-types approach. This will make the structure more transparent, more reusable, and compatible with XML and other systems that need named elements.

Thanks,

Kristina

3. RE: Naming for DP Model and Specification

David Kemp
Posted 06-26-2025 11:42
Kristina,

You've got it.

A couple of additional points:

An information model (like the property tables shown in the spec) supplies information that is needed for multiple data formats (like XML, JSON, and concise binary data)
A JSON schema generated from the information model automatically supplies names using $defs. A JSON schema written by hand could also supply names using $defs, but the contributed JSON schema derived from YAML does not use $defs.
A tree view can be generated from an information model or a named JSON schema, in the format of a graphical tree or an ASCII tree. Here's what both look like when derived from the anonymous schema. They would look the same generated from a named schema, except that the names would be more meaningful (e.g., "DataProvenance" instead of "Root").
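To illustrate the $defs point, here is a hypothetical Python helper, hoist_defs, that turns an anonymous schema into a named one by moving each inline object into $defs. A generator working from an anonymous schema has to invent names; the "<parent>.<property>" rule used here is an assumption chosen for illustration, and the helper only handles objects nested directly under "properties" (array items stay inline). Names chosen in an information model would replace these generated ones.

```python
def hoist_defs(schema: dict, root_name: str = "Root") -> dict:
    """Return a copy of an anonymous schema with inline objects moved to $defs.

    Hypothetical naming rule: "<parent name>.<property name>". Only objects
    nested directly under "properties" are hoisted; array items are left inline.
    """
    defs: dict[str, dict] = {}

    def visit(node: dict, name: str) -> None:
        node = dict(node)  # shallow copy; "properties" is rebuilt, not mutated
        props = node.get("properties")
        if props:
            new_props = {}
            for prop, sub in props.items():
                if sub.get("type") == "object":
                    child_name = f"{name}.{prop}"
                    visit(sub, child_name)
                    new_props[prop] = {"$ref": f"#/$defs/{child_name}"}
                else:
                    new_props[prop] = sub
            node["properties"] = new_props
        defs[name] = node

    visit(schema, root_name)
    return {"$defs": defs, "$ref": f"#/$defs/{root_name}"}


anonymous = {
    "type": "object",
    "properties": {
        "source": {"type": "object", "properties": {"issuer": {"type": "array"}}},
        "use": {"type": "object"},
    },
}
print(sorted(hoist_defs(anonymous)["$defs"]))
# ['Root', 'Root.source', 'Root.use']
print(sorted(hoist_defs(anonymous, "DataProvenance")["$defs"]))
# ['DataProvenance', 'DataProvenance.source', 'DataProvenance.use']
```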

Regards,
David

data-provenance-standards-1.0.0.schema.yml-to-json-is-lossless-conceptual.atree:

Root
├── Root.provenance
│   ├── Format
│   ├── Generation-method
│   │   └── Generation-method-item
│   ├── Root.provenance.generation-period
│   ├── Origin
│   │   └── Origin-item
│   │       └── Address
│   └── Origin-geography
│       └── Origin-geography-item
├── Root.source
│   └── Issuer
│       └── Issuer-item
│           └── Address
└── Root.use
    ├── Classification
    │   └── Classification-item
    │       └── Classification-item.regulation
    ├── Consents
    ├── Copyright
    ├── Intended-purpose
    │   └── Intended-purpose-item
    ├── License
    ├── Patent
    ├── Privacy-enhancing
    │   └── Privacy-enhancing-item
    │       ├── Parameters
    │       ├── Result
    │       └── Privacy-enhancing-item.tool-category
    ├── Processing-excluded
    │   └── Processing-excluded-item
    ├── Processing-included
    │   └── Processing-included-item
    ├── Storage-allowed
    │   └── Storage-allowed-item
    ├── Storage-forbidden
    │   └── Storage-forbidden-item
    └── Trademark
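The tree itself is mechanical output. A minimal Python sketch of the rendering step, assuming the schema has already been reduced to a nested mapping of node names; that extraction step and the function name render_tree are hypothetical, and this is not the tool that produced the tree above.

```python
def render_tree(root: str, children: dict) -> str:
    """Render a nested {name: {child: {...}}} mapping as an ASCII tree."""
    lines = [root]

    def walk(node: dict, prefix: str) -> None:
        items = list(node.items())
        for i, (name, grandchildren) in enumerate(items):
            last = i == len(items) - 1
            lines.append(prefix + ("└── " if last else "├── ") + name)
            # Continue the vertical bar only if more siblings follow.
            walk(grandchildren, prefix + ("    " if last else "│   "))

    walk(children, "")
    return "\n".join(lines)


subset = {
    "Root.source": {"Issuer": {"Issuer-item": {"Address": {}}}},
    "Root.use": {"Consents": {}, "Trademark": {}},
}
print(render_tree("Root", subset))
```

Printed, this reproduces the Root.source branch and part of the Root.use branch of the tree; feeding it names from a named schema would yield "DataProvenance" at the root instead of "Root".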

4. RE: Naming for DP Model and Specification

Kristina Podnar
Posted 06-27-2025 09:21
Thanks, David, for confirming and the additional clarifications. That really brings it all together for me.

It sounds like we're aligned on the value of using named types moving forward, which is great. I'll keep this in mind as we continue refining the specification. Hopefully everyone else on the TC is in agreement as well.

Thanks again for taking the time to explain!

Kristina

5. RE: Naming for DP Model and Specification

Stefan Hagen
Posted 06-30-2025 15:30
Dear members,


I have added some further thoughts, for your reading pleasure
and consideration, to the exchange of ideas on the ticket at:
https://github.com/oasis-tcs/dps/issues/16#issuecomment-3020193224

In my opinion there is no need to rush fixing names and topologies
yet, so enjoy.

That said, I consider us to be in a phase of exploring leaf types,
which started with the questions TC members brought up and which
Kristina kindly began answering and giving feedback on from the
perspective of a long-time D&TA discussions participant.

I am sure there are some interesting "functions" we want to represent
in our specified structure, and we need to discuss their result types
(so to speak).

To me, "functions" here are members of the objects that answer
questions like:

- to which dataset do "I" relate
- when was "I" last changed
- am "I" the latest version
- do "I" have a history
- what human language am "I" in
- which region's rules apply
- "here" are further verifiable claims in this or that format,
  requiring this or that transport protocol

Rewritten as pseudocode for a hosting structure dp
(for data provenance):

- dp.data().link()
- dp.data().changed()
- dp.data().effective()
- dp.data().history()
- dp.provenance().lang()
- dp.provenance().region()
- dp.use().consents()
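One way to read this pseudocode is as accessor methods on typed objects, each answering one of the questions above. Below is a minimal Python sketch under that assumption; the classes DP, Data, Provenance, and Use, their fields, and the sample values (including the example.org URL) are all hypothetical illustrations, not content the TC has agreed on.

```python
from dataclasses import dataclass, field
from datetime import date

# Hypothetical sketch: classes, fields, and values are illustrative only.


@dataclass
class Data:
    dataset_link: str                 # to which dataset do "I" relate
    last_changed: date                # when was "I" last changed
    is_latest: bool                   # am "I" the latest version
    versions: list[str] = field(default_factory=list)  # do "I" have a history

    def link(self) -> str:
        return self.dataset_link

    def changed(self) -> date:
        return self.last_changed

    def effective(self) -> bool:
        return self.is_latest

    def history(self) -> list[str]:
        return list(self.versions)  # copy so callers cannot mutate the record


@dataclass
class Provenance:
    language: str                     # what human language am "I" in
    region_code: str                  # which region's rules apply

    def lang(self) -> str:
        return self.language

    def region(self) -> str:
        return self.region_code


@dataclass
class Use:
    consent_list: list[str] = field(default_factory=list)

    def consents(self) -> list[str]:
        return list(self.consent_list)


@dataclass
class DP:
    """Hosting structure: each accessor returns one named part."""
    data_section: Data
    provenance_section: Provenance
    use_section: Use

    def data(self) -> Data:
        return self.data_section

    def provenance(self) -> Provenance:
        return self.provenance_section

    def use(self) -> Use:
        return self.use_section


dp = DP(
    Data("https://example.org/datasets/123", date(2025, 6, 30), True, ["v1", "v2"]),
    Provenance("en", "EU"),
    Use(["research"]),
)
print(dp.data().changed())     # 2025-06-30
print(dp.provenance().lang())  # en
```

Each result type (a date, a boolean, a list of versions) is exactly the kind of thing the TC would need to pin down per "function".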

Thanks.

All the best,
Stefan

6. RE: Naming for DP Model and Specification

Kristina Podnar
Posted 07-01-2025 01:54
Thank you, Stefan. That is a perspective I was not considering.

I propose to the co-chairs and the TC that we take this up in today's conversation. While it may not seem pressing, it seems to me to be on the critical path if we aim to get the standards out for public comment in the coming month and a half. This fits into topic #3 on our agenda.

Welcoming other perspectives and suggestions.

Kristina
