Pubby Metadata Extension
This page describes the metadata extension for the Linked Data server Pubby.
- Olaf Hartig (Humboldt-Universität zu Berlin)
- Hannes Mühleisen (Humboldt-Universität zu Berlin)
Introduction
To consume and process Linked Data reliably and legitimately, applications need various information about the consumed data (e.g., licensing and provenance). Since much of this information is available to the publishers of the data, it should become common practice for them to provide it as metadata.
The metadata extension for Pubby provides a mechanism for adding metadata to the RDF graphs served by Pubby. Hence, this extension allows the publication of metadata together with the data itself. Additionally, the extension augments the HTML output templates so that the provided metadata is also visible in the human-readable output of Pubby.
In brief, the metadata added to the served RDF graphs is generated from a template RDF graph by replacing placeholders in the template. These placeholders refer to run-time data, standard Pubby configuration options, and additional configuration variables.
The original aim of developing the metadata extension was the automatic publication of provenance-related metadata. For this reason, the metadata extension includes a default template that adds various pieces of provenance information to the published data.
Configuration
Configuring Pubby to generate metadata and include it in the served graphs is simple: extend the configuration of each served dataset with a reference to a file that contains a metadata template. This configuration is usually defined in the file config.n3 (unless you changed its location in the file web.xml). To refer to the metadata template, add the property conf:metadataTemplate to the description of the dataset. The value of this property is the name of a file that contains an N3 serialization of the template RDF graph. The metadata extension expects this file in the ./WEB-INF/templates/ directory.
Example: The configuration file config.n3 should contain a dataset configuration section with a conf:metadataTemplate statement:
# ...
<> conf:dataset [
    # ...
    # other dataset configuration options
    # ...
    conf:metadataTemplate "metadata.n3"
] .
# ...
This configuration refers to the metadata template in the file ./WEB-INF/templates/metadata.n3.
Template RDF Graph
The metadata template is an RDF graph that may contain placeholder URIs. These placeholders allow the metadata to be customized for each served resource. The placeholder URIs use the about: URI scheme. Placeholders can be used in the subject and object positions of the triples in the template graph.
The values that replace the placeholders come from three different sources. Accordingly, the placeholders are organized in three groups: runtime, config, and metadata. All placeholder URIs follow the pattern about:metadata:<group name>:<identifier>.
Placeholder Group "runtime"
Placeholders in this group are replaced with data that is only available while Pubby is running. Hence, replacing a placeholder from this group requires code that is specific to that placeholder. Currently, Pubby recognizes and replaces the following runtime placeholders:
- about:metadata:runtime:time - This placeholder is replaced by an xsd:dateTime literal that represents the current date and time.
- about:metadata:runtime:query - This placeholder is replaced by a string literal containing the SPARQL DESCRIBE query that was used to get the data about the requested resource.
- about:metadata:runtime:resource - This placeholder is replaced by the URI of the requested resource.
- about:metadata:runtime:graph - This placeholder is replaced by the URI of the current RDF graph served by Pubby.
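To illustrate, a template could combine several runtime placeholders as in the following sketch. The choice of dcterms:created and foaf:primaryTopic is an assumption of this sketch, and ex:generatingQuery in the hypothetical http://example.org/vocab# namespace is made up purely for illustration; the default template described later uses the Provenance Vocabulary instead.

```n3
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .
# Hypothetical vocabulary, for illustration only
@prefix ex:      <http://example.org/vocab#> .

# Replaced at request time by an xsd:dateTime literal
<about:metadata:runtime:graph> dcterms:created <about:metadata:runtime:time> .

# Links the served graph to the requested resource
<about:metadata:runtime:graph> foaf:primaryTopic <about:metadata:runtime:resource> .

# Replaced by a string literal holding the DESCRIBE query (ex:generatingQuery is hypothetical)
<about:metadata:runtime:graph> ex:generatingQuery <about:metadata:runtime:query> .
```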
Placeholder Group "config"
This group of placeholders corresponds to the configuration options for the current dataset as well as to the global Pubby configuration options. For instance, the placeholder about:metadata:config:sparqlEndpoint is replaced by the value of the conf:sparqlEndpoint property for the current dataset; the placeholder about:metadata:config:projectName is replaced by the value of the property conf:projectName in the global Pubby configuration.
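A template fragment using the two config placeholders mentioned above might look like the following sketch. The properties ex:sparqlEndpoint and ex:projectName are hypothetical and only show where the placeholders can appear.

```n3
# Hypothetical vocabulary, for illustration only
@prefix ex: <http://example.org/vocab#> .

# Replaced by the value of conf:sparqlEndpoint of the current dataset
<about:metadata:runtime:graph> ex:sparqlEndpoint <about:metadata:config:sparqlEndpoint> .

# Replaced by the value of conf:projectName from the global configuration
<about:metadata:runtime:graph> ex:projectName <about:metadata:config:projectName> .
```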
Placeholder Group "metadata"
Placeholders in this group allow the inclusion of additional metadata attributes. These attributes are defined as additional properties of the dataset, taken from the http://example.org/metadata# namespace (abbreviated with the prefix meta:). Each placeholder in this group is replaced by the value of the property whose fragment identifier equals the identifier part of the placeholder URI. For instance, the placeholder about:metadata:metadata:pubbyUser is replaced by the value of the property meta:pubbyUser; this requires the property to be present in the description of the current dataset:
# ...
<> conf:dataset [
    # ...
    # other dataset configuration options
    # ...
    conf:metadataTemplate "metadata.n3" ;
    meta:pubbyUser <http://example.org/URI_of_publisher>
] .
# ...
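On the template side, the attribute configured above is referenced through the matching placeholder. In the following sketch it is the object of a dcterms:publisher triple; using dcterms:publisher is an assumption of this sketch, and any suitable property would work. With the configuration above, the placeholder would be replaced by <http://example.org/URI_of_publisher>.

```n3
@prefix dcterms: <http://purl.org/dc/terms/> .

# Replaced by the value of meta:pubbyUser in the dataset description
<about:metadata:runtime:graph> dcterms:publisher <about:metadata:metadata:pubbyUser> .
```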
Example
Here is a small example of a template RDF graph:
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix xhv:  <http://www.w3.org/1999/xhtml/vocab#> .

<about:metadata:runtime:graph> foaf:primaryTopic <about:metadata:runtime:resource> .
<about:metadata:runtime:graph> xhv:license <about:metadata:metadata:license> .
This template adds two triples to the RDF graphs served by the corresponding Pubby instance. The first triple asserts that the primary topic of the served RDF graph is the resource that was requested. The second triple identifies the license of the served RDF graph and requires the metadata attribute http://example.org/metadata#license to be specified in the configuration file.
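The matching dataset configuration could look like the following sketch. The prefix declaration uses the meta: namespace given earlier; the Creative Commons license URI is only an example value.

```n3
@prefix meta: <http://example.org/metadata#> .
# ...
<> conf:dataset [
    # ...
    # other dataset configuration options
    # ...
    conf:metadataTemplate "metadata.n3" ;
    # Example value; substitute the license that applies to your data
    meta:license <http://creativecommons.org/licenses/by/3.0/>
] .
# ...
```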
Result
Each placeholder URI in the template graph is replaced as described above, yielding the metadata RDF graph. A placeholder URI for which no replacement value can be determined is replaced by a new, unique blank node. To clean up the metadata before publishing it, the metadata extension iteratively removes all triples containing a blank node that is not mentioned in any other triple of the metadata RDF graph (i.e., a blank node that has no properties). After this clean-up, the triples of the metadata RDF graph are added to the RDF graph served by Pubby.
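The following sketch walks through the iterative clean-up as it is described above, using a hypothetical template fragment (not the default template) and assuming meta:pubbyUserName is not configured. The graph and resource URIs are made up. The block shows the metadata graph after placeholder replacement, with comments marking which clean-up pass removes each triple.

```n3
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix foaf:    <http://xmlns.com/foaf/0.1/> .

# Hypothetical template fragment (meta:pubbyUserName is NOT configured):
#   <about:metadata:runtime:graph> foaf:primaryTopic <about:metadata:runtime:resource> .
#   <about:metadata:runtime:graph> dcterms:publisher [ foaf:name <about:metadata:metadata:pubbyUserName> ] .

# Metadata graph after placeholder replacement (URIs are made up):

# Kept: contains no blank nodes
<http://example.org/data/Berlin> foaf:primaryTopic <http://example.org/resource/Berlin> .

# Removed in pass 1: _:name replaced the unresolvable placeholder
# and occurs in no other triple
_:publisher foaf:name _:name .

# Removed in pass 2: after pass 1, _:publisher occurs only in this triple
<http://example.org/data/Berlin> dcterms:publisher _:publisher .
```

Under this reading, only the foaf:primaryTopic triple would end up in the served graph. A single pass would not be enough here because removing the inner triple is what leaves _:publisher without properties.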
The metadata extension not only adds the metadata to the RDF graph but also visualizes it in the HTML representation. Here is an example:
Notice the additional "Metadata" table below the usual property-value table. This new table provides a tree-like visualization of the metadata. The (more) links in the visualization let you expand the view to show the properties of the corresponding entities.
Default Template
The default metadata template that comes with Pubby automatically provides a description of the provenance of the served RDF graphs. This description uses the Provenance Vocabulary and it includes information about the creation of served RDF graphs, the Pubby instance, the DESCRIBE query that has been used for creation, the SPARQL endpoint that answered the query, etc. The following additional metadata attributes may be configured to provide more detailed information about the Pubby instance and the accessed SPARQL endpoint:
- meta:pubbyUser - This property refers to the URI of the data publisher who uses this Pubby instance to publish her/his data. The URI should be an HTTP URI that links to data about the publisher.
- meta:pubbyUserName, meta:pubbyUserHomepage - These properties provide the name and the homepage address of the data publisher who uses this Pubby instance. They can be used as an alternative to meta:pubbyUser if the publisher does not (yet) have a dereferenceable HTTP URI.
- meta:pubbyOperator - This property refers to the URI of the service provider who operates this Pubby instance. The service provider and the data publisher may be the same entity; nonetheless, it is recommended to specify both properties (using the same value) so that an application can infer that publisher and operator are the same.
- meta:pubbyOperatorName, meta:pubbyOperatorHomepage - These properties provide the name and the homepage address of the service provider who operates this Pubby instance. They can be used as an alternative to meta:pubbyOperator.
- meta:endpointUser - This property refers to the URI of the data publisher who uses the accessed SPARQL endpoint to provide her/his data.
- meta:endpointUserName, meta:endpointUserHomepage - Name and homepage address of the endpoint user; an alternative to meta:endpointUser.
- meta:endpointOperator - This property refers to the URI of the service provider who operates the accessed SPARQL endpoint.
- meta:endpointOperatorName, meta:endpointOperatorHomepage - Name and homepage address of the endpoint operator; an alternative to meta:endpointOperator.
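A dataset description that supplies some of these attributes might look like the following sketch. All URIs and names are hypothetical, and the template file name simply repeats the one from the earlier examples; adjust it to the file name of the default template in your installation. The sketch assumes the homepage properties take a URI rather than a literal, which this page does not specify. Following the recommendation above, the same URI is given for meta:pubbyUser and meta:pubbyOperator. The endpoint operator, assumed here to lack an HTTP URI, is described by name and homepage instead.

```n3
@prefix meta: <http://example.org/metadata#> .
# ...
<> conf:dataset [
    # ...
    # other dataset configuration options
    # ...
    conf:metadataTemplate "metadata.n3" ;
    # Publisher and operator of this Pubby instance are the same entity,
    # so both properties get the same (hypothetical) URI
    meta:pubbyUser <http://example.org/people/alice#me> ;
    meta:pubbyOperator <http://example.org/people/alice#me> ;
    # The same publisher provides the data through the SPARQL endpoint
    meta:endpointUser <http://example.org/people/alice#me> ;
    # Hypothetical endpoint host without a dereferenceable HTTP URI
    meta:endpointOperatorName "Example Hosting" ;
    meta:endpointOperatorHomepage <http://hosting.example.org/>
] .
# ...
```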
Known Issues
The HTML view for the metadata graph cannot handle circular references with more than two resources involved.
If anything goes wrong
Check the Pubby log files (normally located within your servlet container) for error messages; they are usually fairly informative.
