A TEI Project

Guidelines for conceptual mapping of TEI documents


1. Background

In the last few years, the TEI Ontologies SIG has been working on how world knowledge is expressed in TEI documents, in relation to other standards such as CIDOC-CRM and FRBR. This work has been reported in the meetings of the SIG, as documented on the SIG Wiki (http://wiki.tei-c.org/index.php/SIG:Ontologies). In addition to the papers listed on the Wiki, an article titled “TEI and cultural heritage ontologies: Exchange of information?” was recently published in Literary and Linguistic Computing.

As agreed at the SIG meeting in November 2007, an important next step will be to “start on the development of guidelines for how to create TEI documents that easily may be mapped to ontologies such as the CIDOC-CRM”. This document is a draft of such a set of guidelines.

2. Introduction

2.1. Definition

Not everyone in the TEI community agrees that TEI projects should have clearly articulated ontologies, because some regard ontologies as taxonomic and reductive. On the other hand, conceptual models, or even worldviews, are implicit in all TEI projects as they exist now. Vestiges and fragments of these conceptual models are present in the current TEI tag set to the degree that they are systematic. We think it is important to ask people who design TEI projects and encode individual documents to articulate and publish the conceptual model that defines their assumptions and that enables their encoding to be mapped onto ontologies such as the CIDOC-CRM. Furthermore, articulating the worldview underlying text encoding seems to us fundamental to good analysis.

The term ontology has already been described and defined: “The term ontology means literally the study of being and was until recently the name of a branch of philosophy and a term used in the singular only. During the last ten years the term has been adopted by computer and information sciences and the scope of term has been expanded significantly. Today, it may denote everything from data models to classification systems and explanatory models in natural sciences.” For these guidelines, we use ontology to mean conceptual model, “a formally defined model resulting from an analysis of a specific domain and not necessarily a data model in the computer science sense.” (Ore & Eide: “TEI and cultural heritage ontologies: Exchange of information?” LLC 24(2) 2009.)

2.2. Local ontologies in TEI

The TEI contains silently assumed world views, but no explicit ontology. The most explicit world view is expressed in the TEI header and in the bibliographic reference module, which provide a complete set of elements for encoding common library practice. In addition, the manuscript module provides a set of elements reflecting the normal content of manuscript catalogues encountered by the Master project.

Besides these two domain-specific areas, the ontological schema of TEI consists of:

Any ontology or data model that can be expressed by instances of the above elements can be expressed in TEI. For example, in the FOAF example in Wikipedia (see below), all person-person predicates/properties can easily be expressed in TEI tags. The remaining properties in the example are name, address and about. TEI has a web-pointer element that can be typed to take care of these, if desired.
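As a minimal sketch of how a person-person property could be read out of TEI, the following Python assumes that such properties are encoded with the TEI relation element and its name, active and passive attributes. The two persons, their identifiers and the "knows" relation are hypothetical and are not taken from the Wikipedia FOAF example; the "foaf:" prefix is simply prepended to the relation name.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical persons and relation, invented for illustration only.
LIST_PERSON = """
<listPerson>
  <person xml:id="pA"><persName>Person A</persName></person>
  <person xml:id="pB"><persName>Person B</persName></person>
  <relation name="knows" active="#pA" passive="#pB"/>
</listPerson>
"""


def relations_to_triples(list_person):
    """Turn each relation element into (subject, predicate, object) triples."""
    names = {
        person.get(XML_ID): person.findtext("persName")
        for person in list_person.iter("person")
    }
    triples = []
    for rel in list_person.iter("relation"):
        predicate = "foaf:" + rel.get("name")
        # active and passive hold whitespace-separated pointers such as "#pA".
        for subj in rel.get("active", "").split():
            for obj in rel.get("passive", "").split():
                triples.append(
                    (names[subj.lstrip("#")], predicate, names[obj.lstrip("#")])
                )
    return triples


triples = relations_to_triples(ET.fromstring(LIST_PERSON))
print(triples)
```

Keeping the relation separate from the person entries means the same persons can take part in any number of relations without being repeated.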

2.3. The rest of these guidelines

This document is meant as a practical set of guidelines. We will not discuss the theoretical implications in any depth; the bibliography at the end points to more theoretical discussions. Instead, we will run through several examples in which the main questions are discussed, and sum up towards the end.

3. Mapping

3.1. Example 1

To introduce our idea of mappings, we will work through a small example taken from a hypothetical archaeological excavation report. The report will be available as a TEI document, so that it can be published on the web as a first-class document and exchanged with other users in the TEI format. The information expressed in the text, based on a specific reading, will also be imported into a CIDOC-CRM based database. We therefore need a mapping of the information in the TEI document into the CIDOC-CRM database.

The text is as follows:

The excavation in Wasteland in 2005 was performed by Dr. Diggey. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.

A typical encoding of this example would be:

<p xml:id="p1">The excavation in <name type="place" xml:id="n1" key="place1">Wasteland</name> in <date xml:id="d1">2005</date> was performed by <name type="person" xml:id="n2" key="person1">Dr. Diggey</name>. He had the misfortune of breaking the beautiful sword (C50435) into 30 pieces.</p>

This encoding will provide the source for web and print publications, and it will give the necessary markup for creating name indexes. To make better indexes, which take persons and places into consideration and not only their names, one needs to introduce the persons and places themselves. This can be done using person and place elements in the TEI header:

<sourceDesc>
  <listPerson>
    <person xml:id="person1">
      <persName>Charles Atlas Diggey</persName>
      <!-- ...more about the doctor... -->
    </person>
    <!-- ...more persons... -->
  </listPerson>
  <listPlace>
    <place xml:id="place1">
      <placeName>Wasteland</placeName>
      <!-- ...more about Wasteland... -->
    </place>
    <!-- ...more places... -->
  </listPlace>
</sourceDesc>
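The link between a name in the text and the person or place in the header can be followed mechanically. The sketch below, in Python with only the standard library, builds a small index from the key attributes. The wrapping TEI element and the function name build_index are assumptions made to keep the fragment self-contained; a real document would place sourceDesc inside the full header structure and would normally declare the TEI namespace.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified wrapper around the two fragments; not a complete TEI document.
TEI_FRAGMENT = """
<TEI>
  <sourceDesc>
    <listPerson>
      <person xml:id="person1">
        <persName>Charles Atlas Diggey</persName>
      </person>
    </listPerson>
    <listPlace>
      <place xml:id="place1">
        <placeName>Wasteland</placeName>
      </place>
    </listPlace>
  </sourceDesc>
  <p xml:id="p1">The excavation in <name type="place" xml:id="n1" key="place1">Wasteland</name> in <date xml:id="d1">2005</date> was performed by <name type="person" xml:id="n2" key="person1">Dr. Diggey</name>.</p>
</TEI>
"""


def build_index(root):
    """Map each person/place id to its canonical name and its mentions in the text."""
    entities = {}
    for tag, name_tag in (("person", "persName"), ("place", "placeName")):
        for el in root.iter(tag):
            entities[el.get(XML_ID)] = {
                "kind": tag,
                "canonical": el.findtext(name_tag),
                "mentions": [],
            }
    # Attach every name in the running text to the entity its key points at.
    for name in root.iter("name"):
        key = name.get("key")
        if key in entities:
            mention = "".join(name.itertext()).strip()
            entities[key]["mentions"].append((name.get(XML_ID), mention))
    return entities


index = build_index(ET.fromstring(TEI_FRAGMENT))
for key, entry in index.items():
    print(key, entry["canonical"], entry["mentions"])
```

An index built this way lists "Dr. Diggey" under Charles Atlas Diggey, so different name forms of the same person end up in one entry.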

But there is more information we want to deduce from this little text, namely two events and the information about the object. A possible markup including this would be:

<p xml:id="p1">
  <rs type="event" xml:id="e1">The excavation in <name type="place" xml:id="n1">Wasteland</name> in <date xml:id="d1">2005</date></rs>
  was performed by <name type="person" xml:id="n2">Dr. Diggey</name>. He had the misfortune of
  <rs type="event" xml:id="e2">breaking <rs type="object" xml:id="o1">the beautiful sword <rs type="object" xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.
</p>

Here, the descriptions of the events and of the object are seen as referring strings, for which a typology is created, including event and object. This typology should be stored in the TEI header.
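The typed referring strings can be listed with a short script. The following Python sketch walks the paragraph and records, for each rs element, its identifier, its type, its text and the enclosing rs, so that the nesting of the object inside the second event is preserved. The function names are hypothetical, and the paragraph is written on one line only to keep whitespace predictable.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

PARAGRAPH = (
    '<p xml:id="p1"><rs type="event" xml:id="e1">The excavation in '
    '<name type="place" xml:id="n1">Wasteland</name> in '
    '<date xml:id="d1">2005</date></rs> was performed by '
    '<name type="person" xml:id="n2">Dr. Diggey</name>. '
    'He had the misfortune of <rs type="event" xml:id="e2">breaking '
    '<rs type="object" xml:id="o1">the beautiful sword '
    '<rs type="object" xml:id="o_id1">(C50435)</rs></rs> into 30 pieces</rs>.</p>'
)


def text_of(el):
    """Return the text content of an element with whitespace normalised."""
    return " ".join("".join(el.itertext()).split())


def referring_strings(root):
    """Return (id, type, text, id of enclosing rs) for every rs element."""
    rows = []

    def walk(el, parent_rs):
        for child in el:
            if child.tag == "rs":
                rows.append(
                    (child.get(XML_ID), child.get("type"), text_of(child), parent_rs)
                )
                walk(child, child.get(XML_ID))
            else:
                walk(child, parent_rs)

    walk(root, None)
    return rows


rows = referring_strings(ET.fromstring(PARAGRAPH))
for row in rows:
    print(row)
```

Recording the enclosing rs makes it possible to say later that the sword took part in the breaking event without reading the text again.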

To sum up, the information we want to record is all related to the two important events in the text:

Most of this information could be stored in the TEI header using what is already available, but the object would need an addition to TEI. This raises two choices, which are more or less independent of each other:

First, should this information about the real world as it is read from the text, the ontology, be stored in the TEI document header or somewhere else, e.g. another TEI document or a database?

Second, in which format should it be stored: an expanded version of TEI, RDF/OWL, or a database format?

Whichever choice is made regarding the first question, we assume there will be links between the names in the texts and the modelled objects in the ontology. And regardless of the answer to the second question, we suggest it would be good to include in the system a method for exporting the data into TEI.
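Both suggestions can be illustrated with a small in-memory model. The Python sketch below is only one possible reading of the example text: the entity keys, the labels, and the choice of CIDOC-CRM classes and properties are our assumptions, and the function names describe and export_to_tei are hypothetical. Each modelled object keeps a pointer to the xml:id of the string in the text it was read from, and persons and places can be exported back into a TEI sourceDesc fragment.

```python
import xml.etree.ElementTree as ET

XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# One reading of the example text. Keys, labels and the selection of
# CRM classes and properties are illustrative assumptions.
ENTITIES = {
    "event1": {"crm": "E7 Activity", "label": "Excavation in Wasteland", "source": "#e1"},
    "event2": {"crm": "E5 Event", "label": "Breaking of the sword", "source": "#e2"},
    "person1": {"crm": "E21 Person", "label": "Charles Atlas Diggey", "source": "#n2"},
    "place1": {"crm": "E53 Place", "label": "Wasteland", "source": "#n1"},
    "object1": {"crm": "E22 Man-Made Object", "label": "Sword C50435", "source": "#o1"},
}

STATEMENTS = [
    ("event1", "P14 carried out by", "person1"),
    ("event1", "P7 took place at", "place1"),
    ("event2", "P11 had participant", "person1"),
    ("event2", "P12 occurred in the presence of", "object1"),
]


def describe(statement):
    """Render a statement with pointers back to the strings in the text."""
    subj, prop, obj = statement
    s, o = ENTITIES[subj], ENTITIES[obj]
    return f"{s['label']} ({s['source']}) {prop} {o['label']} ({o['source']})"


def export_to_tei(entities):
    """Serialise persons and places as a TEI sourceDesc fragment."""
    source_desc = ET.Element("sourceDesc")
    targets = {
        "E21 Person": (ET.SubElement(source_desc, "listPerson"), "person", "persName"),
        "E53 Place": (ET.SubElement(source_desc, "listPlace"), "place", "placeName"),
    }
    for key, entity in entities.items():
        if entity["crm"] in targets:
            parent, tag, name_tag = targets[entity["crm"]]
            el = ET.SubElement(parent, tag, {XML_ID: key})
            ET.SubElement(el, name_tag).text = entity["label"]
    return ET.tostring(source_desc, encoding="unicode")


for statement in STATEMENTS:
    print(describe(statement))
tei = export_to_tei(ENTITIES)
print(tei)
```

Only persons and places are exported here, since the events and the object have no obvious home in the header elements used in this example.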

TEI Ontologies SIG. Date: