Provenance Vocabulary Core Ontology Specification

25 August 2009

This version:
http://purl.org/net/provenance/ns-20090825
Latest version:
http://purl.org/net/provenance/ns
Revision:
0.1
Authors:
Olaf Hartig (Database and Information System Research Group, Department of Computer Science, Humboldt-Universität zu Berlin)
Jun Zhao (Image Bioinformatics Research Group, Department of Zoology, University of Oxford)

This work is licensed under a Creative Commons License. This copyright applies to the Provenance Vocabulary Core Ontology Specification and accompanying documentation.

Regarding underlying technology, the Provenance Vocabulary relies heavily on W3C's RDF technology, an open Web standard that can be freely used by anyone.

The visual layout and structure of this specification were adapted from the SIOC Core Ontology Specification edited by Uldis Bojars and John G. Breslin.


Abstract

The Provenance Vocabulary enables providers of Web data to publish provenance-related metadata about their data. The Provenance Vocabulary Core Ontology provides the main classes and properties required to describe the provenance of data on the Web. This document specifies the classes and properties introduced by the Provenance Vocabulary Core Ontology.


Status of this document

NOTE: This section describes the status of this document at the time of its publication. Other documents may supersede this document.

This specification is an evolving document. This document may be updated or added to based on implementation experience, but no commitment is made by the authors regarding future updates.

The authors welcome suggestions on the Provenance Vocabulary and this document. Please send comments to the Provenance Vocabulary users' mailing list (prov-vocab-users); public archives are available. Issues with the vocabulary can be reported using the issue tracker.


1. Introduction

The Provenance Vocabulary, which is defined as an OWL-DL ontology, is partitioned into a core ontology and supplementary modules. To keep the core ontology simple, the modules provide less frequently used terms and a broad range of extensions of the core terms. At present the Provenance Vocabulary has two modules: Types and Integrity Verification.

The design of the vocabulary closely follows the model for Web data provenance presented in [Har09]. This model comprises two dimensions of Web data provenance: data creation and data access. Accordingly, the Provenance Vocabulary consists of three main parts: general terms, terms for data creation, and terms for data access.

Detailed information about using the Provenance Vocabulary and many examples can be found in [HZ09].
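
As a rough illustration of the three parts named above, the following Python sketch builds a handful of triples that touch general terms, the data creation dimension, and the data access dimension. Only the prv: terms come from this specification; the http://example.org/ resources and the helper to_ntriples are hypothetical and exist for illustration only.

```python
PRV = "http://purl.org/net/provenance/ns#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, not part of the vocabulary

# Each triple is (subject, predicate, object), all given as full URIs.
triples = [
    # General term: the described thing is a data item.
    (EX + "item1", RDF_TYPE, PRV + "DataItem"),
    # Data creation dimension.
    (EX + "item1", PRV + "createdBy", EX + "creation1"),
    (EX + "creation1", RDF_TYPE, PRV + "DataCreation"),
    # Data access dimension.
    (EX + "item1", PRV + "containedBy", EX + "doc1"),
    (EX + "doc1", PRV + "retrievedBy", EX + "access1"),
    (EX + "access1", RDF_TYPE, PRV + "DataAccess"),
]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples lines (hypothetical helper)."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(triples))
```

The sketch deliberately avoids any RDF library; in practice an RDF toolkit would usually handle serialization.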

The XML Namespace URIs that must be used by implementations of this specification are:

2. The Provenance Vocabulary Core Ontology at a glance

An alphabetical index of Provenance Vocabulary Core Ontology terms, by class and by property, is given below. All terms are hyperlinked to their detailed descriptions for quick reference.

Classes: | Actor | Artifact | CreationGuideline | DataAccess | DataCreation | DataItem | DataProvidingService | DataPublisher | Document | Execution | HumanActor | NonHumanActor |

Properties: | accessedService | containedBy | createdBy | employedArtifact | involvedActor | operatedBy | performedAt | performedBy | precededBy | retrievedBy | usedBy | usedData | usedGuideline | yieldedBy |

3. Cross-reference for the Provenance Vocabulary Core Ontology classes and properties

The Provenance Vocabulary Core Ontology introduces the following classes and properties.

3.1. General classes

Class: Actor

Actor is a general class that represents actors which usually performed the execution (see the class Execution) of an action or a process.

identifier:http://purl.org/net/provenance/ns#Actor
equivalent to: foaf:Agent
super-class of:prv:HumanActor prv:NonHumanActor
in range of:prv:involvedActor prv:performedBy


Class: HumanActor

HumanActor is a general class that represents actors who are social beings, such as persons, organizations, or companies.

identifier:http://purl.org/net/provenance/ns#HumanActor
sub-class of: prv:Actor
super-class of: prv:DataPublisher foaf:Person foaf:Organization foaf:Group
disjoint with: prv:Artifact prv:Execution
disjoint with: prv:NonHumanActor
in range of:prv:operatedBy


Class: NonHumanActor

NonHumanActor is a general class that represents actors who are not social beings.

identifier:http://purl.org/net/provenance/ns#NonHumanActor
sub-class of: prv:Actor
super-class of: prv:DataProvidingService
disjoint with: prv:HumanActor
in domain of:prv:operatedBy


Class: Execution

Execution is a general class that represents completed executions of actions or processes. An execution is usually performed by an actor (see the class Actor) and an execution, in most cases, yielded an artifact (see the class Artifact).

identifier:http://purl.org/net/provenance/ns#Execution
super-class of:prv:DataCreation prv:DataAccess
disjoint with: prv:Actor prv:Artifact
in domain of:prv:involvedActor prv:employedArtifact prv:performedBy prv:performedAt
in range of:prv:yieldedBy


Class: Artifact

Artifact is a general class that represents artifacts which can be used during the execution (see the class Execution) of an action or a process and which can also be the result of such an execution.

identifier:http://purl.org/net/provenance/ns#Artifact
super-class of:prv:DataItem prv:Document
disjoint with: prv:Actor prv:Execution
in domain of:prv:yieldedBy
in range of:prv:employedArtifact prv:containedBy


Class: DataItem

DataItem is a general class that represents data items of any kind.

identifier:http://purl.org/net/provenance/ns#DataItem
sub-class of: prv:Artifact irw:InformationResource
super-class of: prv:CreationGuideline
disjoint with: prv:Document
in domain of:prv:containedBy prv:createdBy prv:precededBy
in range of:prv:precededBy prv:usedData


Class: Document

Document is a general class that represents a Web representation (i.e. a document) with which a data item has been retrieved from the Web.

identifier:http://purl.org/net/provenance/ns#Document
sub-class of: prv:Artifact irw:WebRepresentation web:Representation
disjoint with: prv:DataItem
in domain of:prv:retrievedBy
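
The sub-class and disjointness statements listed for the core classes can be sketched as plain Python data, together with two small helpers. This is a minimal sketch, not an OWL reasoner: it covers only the prv: classes named in this specification, omits the external foaf:, irw: and web: classes, and the function names ancestors and are_disjoint are hypothetical.

```python
# Direct sub-class links between prv: classes, as listed in this specification.
SUBCLASS_OF = {
    "prv:HumanActor": ["prv:Actor"],
    "prv:NonHumanActor": ["prv:Actor"],
    "prv:DataPublisher": ["prv:HumanActor"],
    "prv:DataProvidingService": ["prv:NonHumanActor"],
    "prv:DataCreation": ["prv:Execution"],
    "prv:DataAccess": ["prv:Execution"],
    "prv:DataItem": ["prv:Artifact"],
    "prv:Document": ["prv:Artifact"],
    "prv:CreationGuideline": ["prv:DataItem"],
}

# Pairs of classes declared disjoint in this specification.
DISJOINT = {
    frozenset(pair)
    for pair in [
        ("prv:Actor", "prv:Artifact"),
        ("prv:Actor", "prv:Execution"),
        ("prv:Artifact", "prv:Execution"),
        ("prv:HumanActor", "prv:NonHumanActor"),
        ("prv:DataItem", "prv:Document"),
        ("prv:DataCreation", "prv:DataAccess"),
    ]
}


def ancestors(cls):
    """Return cls together with all of its direct and indirect super-classes."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def are_disjoint(a, b):
    """True if a and b, or any of their super-classes, are declared disjoint."""
    return any(
        frozenset((x, y)) in DISJOINT for x in ancestors(a) for y in ancestors(b)
    )


print(sorted(ancestors("prv:CreationGuideline")))
print(are_disjoint("prv:DataPublisher", "prv:DataProvidingService"))
```

Because disjointness is inherited by sub-classes, the helper checks every pair of ancestors rather than only the two classes passed in.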


3.2. Abstract properties

Note that these properties are not intended to describe instance data; they provide an abstract base for other properties.

Property: yieldedBy

This is an abstract property that refers to the execution by which an artifact was yielded.

Identifier:http://purl.org/net/provenance/ns#yieldedBy
OWL Type:ObjectProperty
Super-property of: prv:retrievedBy
Domain: prv:Artifact
Range: prv:Execution


Property: involvedActor

This is an abstract property that refers to an actor involved in an execution.

Identifier:http://purl.org/net/provenance/ns#involvedActor
OWL Type:ObjectProperty
Super-property of: prv:accessedService
Domain: prv:Execution
Range: prv:Actor


Property: employedArtifact

This is an abstract property that refers to an artifact which was used during an execution.

Identifier:http://purl.org/net/provenance/ns#employedArtifact
OWL Type:ObjectProperty
Super-property of: prv:usedData
Domain: prv:Execution
Range: prv:Artifact
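
To illustrate how the abstract properties act as a base for the concrete ones, the following Python sketch derives the abstract statements entailed by a set of concrete triples. It is a simplified sketch under the assumption that each property has at most one direct super-property, which holds for the prv: sub-property links listed in this specification; the function name with_entailed and the ex: resources are hypothetical.

```python
# Direct sub-property links between prv: properties, as listed in this specification.
SUBPROPERTY_OF = {
    "prv:retrievedBy": "prv:yieldedBy",
    "prv:createdBy": "prv:yieldedBy",
    "prv:accessedService": "prv:involvedActor",
    "prv:performedBy": "prv:involvedActor",
    "prv:usedData": "prv:employedArtifact",
    "prv:usedGuideline": "prv:usedData",
}


def with_entailed(triples):
    """Return the triples plus those entailed by walking up the sub-property links."""
    result = set(triples)
    for s, p, o in triples:
        while p in SUBPROPERTY_OF:
            p = SUBPROPERTY_OF[p]
            result.add((s, p, o))
    return result


# Hypothetical instance data: only the concrete property is asserted.
asserted = {("ex:creation1", "prv:usedGuideline", "ex:rules1")}
for triple in sorted(with_entailed(asserted)):
    print(triple)
```

Publishers therefore only need to assert the most specific property; a consumer can recover the abstract statements when needed.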


3.3. General properties

Property: containedBy

This property refers to an artifact that contained a data item. This artifact can either be a host document (see class prv:Document) or it can be another data item of a larger granularity (e.g. an RDF statement is usually contained in an RDF graph).

Identifier:http://purl.org/net/provenance/ns#containedBy
OWL Type:ObjectProperty
Domain: prv:DataItem
Range: prv:Artifact


Property: performedBy

This property refers to an actor that performed an execution.

Identifier:http://purl.org/net/provenance/ns#performedBy
OWL Type:ObjectProperty
Sub-property of: prv:involvedActor
Domain: prv:Execution
Range: prv:Actor


Property: operatedBy

This property refers to a human actor who was operating a non-human actor. For instance, a service provider operates a data providing service (see class prv:DataProvidingService). Another example is a human actor who operates a non-human data creating actor.

Identifier:http://purl.org/net/provenance/ns#operatedBy
OWL Type:ObjectProperty
Domain: prv:NonHumanActor
Range: prv:HumanActor


Property: performedAt

This property refers to the time at which an execution was performed.

Identifier:http://purl.org/net/provenance/ns#performedAt
OWL Type:DatatypeProperty
Domain: prv:Execution
Range: xsd:dateTime
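
A value for prv:performedAt can be produced with the Python standard library, as sketched below. The helper performed_at_literal is hypothetical, and requiring a timezone-aware value is a design choice of this sketch rather than a requirement of the vocabulary.

```python
from datetime import datetime, timezone

XSD_DATETIME = "http://www.w3.org/2001/XMLSchema#dateTime"


def performed_at_literal(moment):
    """Format a timezone-aware datetime as a typed literal in N-Triples syntax."""
    if moment.tzinfo is None:
        # Assumption of this sketch: reject naive values to avoid ambiguous times.
        raise ValueError("expected a timezone-aware datetime")
    return f'"{moment.isoformat()}"^^<{XSD_DATETIME}>'


literal = performed_at_literal(datetime(2009, 8, 25, 12, 30, tzinfo=timezone.utc))
print(literal)
```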


3.4. Data creation classes

Class: DataCreation

DataCreation is a class that represents the completed creation of a data item.

identifier:http://purl.org/net/provenance/ns#DataCreation
sub-class of: prv:Execution
disjoint with: prv:DataAccess
in domain of:prv:usedData prv:usedGuideline
in range of:prv:createdBy


Class: CreationGuideline

CreationGuideline is a class that represents a guideline used to guide the execution of a data creation. Examples of creation guidelines are transformation rules, mapping definitions, entailment rules, and database queries.

identifier:http://purl.org/net/provenance/ns#CreationGuideline
sub-class of: prv:DataItem
in range of:prv:usedGuideline


3.5. Data creation properties

Property: createdBy

This property refers to the creation of a data item.

Identifier:http://purl.org/net/provenance/ns#createdBy
OWL Type:ObjectProperty
Sub-property of: prv:yieldedBy
Domain: prv:DataItem
Range: prv:DataCreation


Property: usedData

This property refers to a source data item that was used during the creation of a data item. Examples of source data are the content of a document used for machine learning, the statements in a knowledge base used to entail a new statement, and the entries in a database used to answer a query. Note that all source data has provenance; we strongly encourage describing this provenance as well, at least as far as the available information permits.

Identifier:http://purl.org/net/provenance/ns#usedData
OWL Type:ObjectProperty
Sub-property of: prv:employedArtifact
Domain: prv:DataCreation
Range: prv:DataItem


Property: usedGuideline

This property refers to a creation guideline that guided the execution of a data creation. Examples of creation guidelines are transformation rules, mapping definitions, entailment rules, and database queries. Note that all creation guidelines have provenance; we strongly encourage describing this provenance as well, at least as far as the available information permits.

Identifier:http://purl.org/net/provenance/ns#usedGuideline
OWL Type:ObjectProperty
Sub-property of: prv:usedData
Domain: prv:DataCreation
Range: prv:CreationGuideline
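
The domain and range statements of the data creation properties let a consumer infer the classes of the resources involved. The following Python sketch shows this for a small, hypothetical description of a data creation; the ex: resources and the function infer_types are assumptions made for illustration, and the sketch does not add the super-classes that a full reasoner would also derive.

```python
# (domain, range) of the data creation properties, as listed in this specification.
DOMAIN_RANGE = {
    "prv:createdBy": ("prv:DataItem", "prv:DataCreation"),
    "prv:usedData": ("prv:DataCreation", "prv:DataItem"),
    "prv:usedGuideline": ("prv:DataCreation", "prv:CreationGuideline"),
}

# Hypothetical instance data describing how ex:report was created.
triples = [
    ("ex:report", "prv:createdBy", "ex:run42"),
    ("ex:run42", "prv:usedData", "ex:rawTable"),
    ("ex:run42", "prv:usedGuideline", "ex:mappingRules"),
]


def infer_types(triples):
    """Map each resource to the classes implied by property domains and ranges."""
    inferred = {}
    for s, p, o in triples:
        if p in DOMAIN_RANGE:
            domain, range_ = DOMAIN_RANGE[p]
            inferred.setdefault(s, set()).add(domain)
            inferred.setdefault(o, set()).add(range_)
    return inferred


for resource, classes in sorted(infer_types(triples).items()):
    print(resource, sorted(classes))
```

In OWL, domain and range license inferences like these rather than acting as validation constraints, which is why the sketch infers types instead of rejecting data.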


Property: precededBy

This property refers to an immediately preceding version of a data item; hence, the new version (i.e. the subject) was created using the old version (i.e. the object). We strongly encourage also describing the creation of the new version explicitly.

Identifier:http://purl.org/net/provenance/ns#precededBy
OWL Type:ObjectProperty
Sub-property of: dcterms:replaces
Domain: prv:DataItem
Range: prv:DataItem
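
A chain of prv:precededBy links can be followed to reconstruct a version history, as in the Python sketch below. It assumes that each version has at most one described predecessor, which this specification does not require; the ex: resources and the function version_history are hypothetical.

```python
def version_history(item, triples):
    """Follow prv:precededBy links from item back to its oldest described version."""
    # Assumption of this sketch: at most one predecessor per version.
    previous = {s: o for s, p, o in triples if p == "prv:precededBy"}
    history = [item]
    while history[-1] in previous:
        older = previous[history[-1]]
        if older in history:  # guard against cyclic data
            break
        history.append(older)
    return history


# Hypothetical instance data: three versions of one data item.
triples = [
    ("ex:v3", "prv:precededBy", "ex:v2"),
    ("ex:v2", "prv:precededBy", "ex:v1"),
]
print(version_history("ex:v3", triples))
```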


3.6. Data access classes

Class: DataAccess

DataAccess is a class that represents the completed execution of accessing a data item on the Web.

identifier:http://purl.org/net/provenance/ns#DataAccess
sub-class of: prv:Execution
disjoint with: prv:DataCreation
in domain of:prv:accessedService
in range of:prv:retrievedBy


Class: DataProvidingService

DataProvidingService is a class that represents a non-human actor, usually a Web service or a server, that processes data access requests and sends the requested Web representations (i.e. prv:Document) over the Web.

identifier:http://purl.org/net/provenance/ns#DataProvidingService
sub-class of: prv:NonHumanActor
super-class of: irw:Server web:Server web:Service
in domain of:prv:usedBy
in range of:prv:accessedService


Class: DataPublisher

DataPublisher is a class that represents entities such as persons, groups, or organizations who use a data providing service (see class prv:DataProvidingService) to publish data on the Web.

identifier:http://purl.org/net/provenance/ns#DataPublisher
sub-class of: prv:HumanActor
in range of:prv:usedBy


3.7. Data access properties

Property: retrievedBy

This property refers to the data access by which a document that contains a data item has been retrieved from the Web.

Identifier:http://purl.org/net/provenance/ns#retrievedBy
OWL Type:ObjectProperty
Sub-property of: prv:yieldedBy
Domain: prv:Document
Range: prv:DataAccess


Property: accessedService

This property refers to the service that provided the document during the execution of a data access.

Identifier:http://purl.org/net/provenance/ns#accessedService
OWL Type:ObjectProperty
Sub-property of: prv:involvedActor
Domain: prv:DataAccess
Range: prv:DataProvidingService


Property: usedBy

This property refers to a data publisher who used a data providing service.

Identifier:http://purl.org/net/provenance/ns#usedBy
OWL Type:ObjectProperty
Domain: prv:DataProvidingService
Range: prv:DataPublisher
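
Taken together, the three data access properties form a path from a retrieved document to the publisher behind it: prv:retrievedBy, then prv:accessedService, then prv:usedBy. The Python sketch below follows that path over hypothetical instance data; the ex: resources and the helpers objects and publishers_of are assumptions made for illustration.

```python
def objects(subject, predicate, triples):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def publishers_of(document, triples):
    """Find the data publishers reachable from a document via its data access."""
    found = []
    for access in objects(document, "prv:retrievedBy", triples):
        for service in objects(access, "prv:accessedService", triples):
            found.extend(objects(service, "prv:usedBy", triples))
    return found


# Hypothetical instance data describing one data access.
triples = [
    ("ex:doc", "prv:retrievedBy", "ex:access"),
    ("ex:access", "prv:accessedService", "ex:server"),
    ("ex:server", "prv:usedBy", "ex:alice"),
]
print(publishers_of("ex:doc", triples))
```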


5. Design Decisions

Vocabulary design decisions:

  1. use owl:disjointWith to explicitly identify disjoint classes
  2. don't define inverse properties because they are bad for interoperability (see http://dowhatimean.net/2006/06/an-rdf-design-pattern-inverse-property-labels)
  3. rule to decide about the direction of a property (i.e. a link) when both are possible: it should be possible to describe provenance of a data item starting from the data item itself (i.e. the data item becomes subject in triples) and 'working' towards more distant provenance elements. Thus, the links should not be directed towards the data item.
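
The third decision can be illustrated with a short Python sketch: because links point away from the data item, a consumer can collect its provenance by following only outgoing links. The ex: resources and the function reachable_provenance are hypothetical, and the traversal is a plain breadth-first search rather than anything prescribed by the vocabulary.

```python
from collections import deque


def reachable_provenance(start, triples):
    """Collect every resource reachable from start by following outgoing links."""
    outgoing = {}
    for s, _p, o in triples:
        outgoing.setdefault(s, []).append(o)
    seen = [start]
    queue = deque([start])
    while queue:
        for o in outgoing.get(queue.popleft(), []):
            if o not in seen:
                seen.append(o)
                queue.append(o)
    return seen


# Hypothetical instance data: every link points away from ex:item.
triples = [
    ("ex:item", "prv:createdBy", "ex:creation"),
    ("ex:creation", "prv:usedData", "ex:source"),
    ("ex:creation", "prv:performedBy", "ex:alice"),
    ("ex:item", "prv:containedBy", "ex:doc"),
    ("ex:doc", "prv:retrievedBy", "ex:access"),
]
print(reachable_provenance("ex:item", triples))
```

Starting the same traversal at a distant element such as ex:alice finds nothing further, since no link is directed back towards the data item.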

6. References