In the last few years, the TEI Ontologies SIG has been working on how world knowledge is expressed in TEI documents, in relation to other standards such as CIDOC-CRM and FRBR. This work has been reported in the meetings of the SIG, as documented on the SIG Wiki (http://wiki.tei-c.org/index.php/SIG:Ontologies). In addition to the papers listed on the Wiki, an article titled “TEI and cultural heritage ontologies: Exchange of information?” was recently published in Literary and Linguistic Computing.
As agreed at the SIG meeting in November 2007, an important next step will be to “start on the development of guidelines for how to create TEI documents that easily may be mapped to ontologies such as the CIDOC-CRM”. This document is a draft of such a set of guidelines.
All these standards have been developed in a context, for a purpose. Their histories are important for understanding that context and purpose, which in turn helps in understanding why the standards are as they are, and why things seen as shortcomings are there for a reason.
TEI has often been used to mark up texts without taking into consideration the world outside the text. In many cases, this is a good approach. But sometimes one would like to add information that is related to an external world. One example is a historical document with many references to persons and places. A text-internal approach would be to register all the names, using the appropriate TEI element types. But if one is interested in making an index of all the persons mentioned in the text, one is moving outside the text. The text contains the names of the persons, but the reason for saying that some of these names have a special connection is that they refer to the same physical or imaginary person.
Many of the modules in TEI can be said to have an implicit ontology. In the same way as TEI markup makes features of the text explicit, such modelling will enable us to make implicit conceptual structures explicit.
If one wants to do this, a reference to the person will need to be included in the data connected to the document, typically in the markup. This can be done in different ways, as will be discussed below.
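As an illustration of such a reference in the markup, the following minimal Python sketch groups the names in a text by the person they point to. It assumes that each persName element carries a ref attribute; the sample text and the identifiers #p1 and #p2 are invented for this example, and the function name build_person_index is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical sample fragment; the identifiers #p1 and #p2 are invented.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName ref="#p1">Mr. Olsen</persName> met
       <persName ref="#p2">Anna</persName>. Later
       <persName ref="#p1">Ole</persName> went home.</p>
  </body></text>
</TEI>"""


def build_person_index(tei_xml):
    """Group the name strings found in the text by the person they refer to."""
    root = ET.fromstring(tei_xml)
    index = defaultdict(list)
    for name in root.iterfind(".//tei:persName", TEI_NS):
        ref = name.get("ref")
        if ref:  # names without a reference remain text-internal
            index[ref].append("".join(name.itertext()).strip())
    return dict(index)


person_index = build_person_index(SAMPLE)
print(person_index)
```

The two different strings "Mr. Olsen" and "Ole" end up under the same key only because the markup says that they refer to the same person; nothing in the strings themselves would justify grouping them.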
In the following, such an approach will be called conceptual modelling. There are many different reasons for wanting to do this, and many potential end results. A single project will often want to pursue several results from this one process.
One result may be a CIDOC-CRM mapping of information from historical documents, used for import into a cultural heritage management system. Another may be an export into FRBR in order to connect the content of TEI documents to a library database. A third may be a mapping into Dublin Core in order to include information from TEI documents in a web publishing system. A fourth would be a mapping to formats under development by Google or Yahoo.
Whatever the use may be, the method will make it possible to include TEI data in the semantic web: not just the documents as items, but the world information described inside them. It allows this to be done at different levels of complexity and at different stages. The whole process can be quite simple. Alternatively, one can make a complex mapping and still export simpler versions from it when that is requested. Converting from CIDOC-CRM to Dublin Core, for instance, is a well-defined process. This will also enable mappings to future standards.
Good advice is to think about conceptual models from the outset. This should be part of the data analysis, in asking why to mark up, which leads into what to mark up. What is the target ontology, and what is the purpose of the markup?
Types of information in the header: person lists with person elements, place lists with place elements. There will be a one-to-one relationship between a person element and a real-life or fictitious person.
If one wants to record information about acting entities that are not persons (storms, animals, washing machines), e.g. in fiction or ethnographical texts, one may want to adjust TEI by adding an element agent similar to person.
Name type elements will always be encoded in TEI. Person type elements may be encoded in TEI, typically in the header or in a separate section in the body, or they may not be encoded in TEI at all and only be stored in an external conceptual model. In the former case, links will go from name type elements via person type elements to the conceptual model (and back); in the latter case, the links will go directly between the name type elements and the conceptual model.
Certain TEI elements may contain information that could be hard to map because some necessary information, e.g. the reasons why something is asserted and who is responsible for the assertion, may be hidden. Some information, such as age, will also be based on a "point zero" that may not be formally available.
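The two linking strategies can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample is a fragment rather than a complete, valid TEI document, all identifiers and the example.org URIs are invented, and the use of the corresp attribute on person to point to the external model is an assumption made for this sketch, not a recommendation.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical fragment: identifiers and example.org URIs are invented.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><profileDesc><particDesc><listPerson>
    <person xml:id="p1" corresp="http://example.org/model/person/17">
      <persName>Ole Olsen</persName>
    </person>
  </listPerson></particDesc></profileDesc></teiHeader>
  <text><body><p>
    <persName ref="#p1">Ole</persName> wrote to
    <persName ref="http://example.org/model/person/42">Anna</persName>.
  </p></body></text>
</TEI>"""


def resolve_names(tei_xml):
    """Return (name string, linking strategy, model URI) for each name in the text."""
    root = ET.fromstring(tei_xml)
    # Person elements encoded in TEI: local id -> URI in the conceptual model.
    persons = {p.get(XML_ID): p.get("corresp")
               for p in root.iterfind(".//tei:person", NS)}
    text = root.find("tei:text", NS)
    links = []
    for name in text.iterfind(".//tei:persName", NS):
        ref = name.get("ref", "")
        label = "".join(name.itertext()).strip()
        if ref.startswith("#"):
            # Former case: name -> person element -> conceptual model.
            links.append((label, "via person element", persons.get(ref[1:])))
        elif ref:
            # Latter case: name -> conceptual model directly.
            links.append((label, "direct", ref))
    return links


links = resolve_names(SAMPLE)
for link in links:
    print(link)
```

In both cases the name element in the text is the starting point; the only difference is whether an intermediate person element exists in the TEI document.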
"The term ontology means literally the study of being and was until recently the name of a branch of philosophy and a term used in the singular only. During the last ten years the term has been adopted by computer and information sciences and the scope of the term has been expanded significantly. Today, it may denote everything from data models to classification systems and explanatory models in natural sciences. In this article we use ontology in the meaning conceptual model. That is, a formally defined model resulting from an analysis of a specific domain and not necessarily a data model in the computer science sense." (Ore & Eide: "TEI and cultural heritage ontologies: Exchange of information?" LLC 24(2) 2009)
A high level of explicitness is needed in order to interpret encoded texts so that the information they express can be modelled in a conceptual model. This explicitness can be added to the markup, or it can be placed in the extraction algorithm. What is the best trade-off?
More than nesting is needed in order to connect values, as in the marriage example. In very simple cases nesting may do, but cases will soon appear in which the relationships are too complex, such as "this also applies to the persons discussed in the last paragraph".
No: a relation cannot be put directly between person elements. Relations have to be connected to the context, often the name, commonly the event (e.g. a marriage). This does not mean that two strings of characters marry, but it means that a marriage cannot be seen as a relationship taken out of time and place; it has to be connected to a place in the text.
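One way to picture this is the small Python data model below, in which a marriage is an event that has participants and is anchored to the passage asserting it, rather than a bare link between two persons. This is a sketch only; the class names, file names and identifiers are all hypothetical and do not correspond to any TEI or CIDOC-CRM construct.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextAnchor:
    document: str  # hypothetical document identifier
    passage: str   # hypothetical identifier of the passage asserting the event


@dataclass(frozen=True)
class Event:
    kind: str
    participants: tuple
    anchor: TextAnchor


def events_for(person, events):
    """Return all events in which the given person takes part."""
    return [e for e in events if person in e.participants]


# Invented data: the same person appears in two marriages,
# each asserted at a different place in a different document.
events = [
    Event("marriage", ("#p1", "#p2"), TextAnchor("letter-03.xml", "#par4")),
    Event("marriage", ("#p1", "#p5"), TextAnchor("letter-11.xml", "#par2")),
]

for e in events_for("#p1", events):
    print(e.kind, e.participants, "asserted in", e.anchor.document, e.anchor.passage)
```

Because each event keeps its anchor, two marriages involving the same person remain distinguishable, and each can be traced back to the passage that supports it.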
Any project that is going to use these methods will need some sort of application to do the actual extraction of information from the TEI documents. Such applications are often written in XSLT, but any scripting or programming language could be used, e.g. Perl or Python.
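A very small extraction application of this kind might look like the following Python sketch. It assumes that names are marked up with persName and placeName elements carrying ref attributes, and it uses an assumed, much simplified mapping to the CIDOC-CRM classes E21 Person and E53 Place; the sample text, the identifiers and the triple vocabulary are chosen for illustration only and do not constitute a validated mapping.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Assumed mapping from TEI name elements to CIDOC-CRM classes (illustrative only).
CLASS_FOR_ELEMENT = {
    "persName": "crm:E21_Person",
    "placeName": "crm:E53_Place",
}

# Hypothetical sample fragment; identifiers are invented.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p><persName ref="#p1">Ole Olsen</persName> travelled to
       <placeName ref="#pl1">Bergen</placeName>.</p>
  </body></text>
</TEI>"""


def extract_triples(tei_xml):
    """Extract (subject, predicate, object) triples from referenced names."""
    root = ET.fromstring(tei_xml)
    triples = set()
    for tag, crm_class in CLASS_FOR_ELEMENT.items():
        for el in root.iterfind(f".//tei:{tag}", NS):
            ref = el.get("ref")
            if not ref:
                continue  # unreferenced names cannot be tied to an entity
            triples.add((ref, "rdf:type", crm_class))
            triples.add((ref, "rdfs:label", "".join(el.itertext()).strip()))
    return sorted(triples)


triples = extract_triples(SAMPLE)
for t in triples:
    print(t)
```

The same logic could equally be written as an XSLT stylesheet; Python is used here only to keep the sketch short.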
Even if it is impossible to make tools for mapping all possible TEI documents, it would still be a good idea to develop applications that can be used to extract conceptual models from specified groups of TEI documents. Such applications could be used as is by some users, whereas others can use them as a base for developing their own tailor-made systems. This would be similar to the XSLT stylesheets available for transformation into HTML and PDF.
Conclusion/suggestions: Ontology mapping cannot be defined for 1, but could be defined for 2 and 3. If mappings are done on these levels, publish them, including the ODD defining the TEI version that is the source of the mapping. Should a library be built up in connection with these guidelines?
One of the good things about this approach is that once your data is stored in a conceptual model, converting to simpler structures (Dublin Core, Google-friendly models) will be easy. Should we suggest that tools be developed for this?
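To give an impression of what such a conversion to a simpler structure involves, the following Python sketch flattens a small set of triples into a Dublin Core-like record. The rule set, which sends person labels to dc:subject and place labels to dc:coverage, is an assumption made for this example and not an established crosswalk; the data is invented.

```python
# Assumed, simplified rule set: which Dublin Core element receives the
# labels of entities of each class. Not an established crosswalk.
DC_ELEMENT_FOR_CLASS = {
    "crm:E21_Person": "dc:subject",
    "crm:E53_Place": "dc:coverage",
}


def to_dublin_core(triples):
    """Flatten typed, labelled entities into a simple Dublin Core-like record."""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    record = {}
    for s, p, o in triples:
        if p != "rdfs:label":
            continue
        dc_element = DC_ELEMENT_FOR_CLASS.get(types.get(s))
        if dc_element:  # classes without a rule are dropped
            record.setdefault(dc_element, []).append(o)
    return record


# Invented conceptual model data.
model = [
    ("#p1", "rdf:type", "crm:E21_Person"),
    ("#p1", "rdfs:label", "Ole Olsen"),
    ("#pl1", "rdf:type", "crm:E53_Place"),
    ("#pl1", "rdfs:label", "Bergen"),
    ("#e1", "rdf:type", "crm:E5_Event"),
    ("#e1", "rdfs:label", "Journey to Bergen"),
]

dc_record = to_dublin_core(model)
print(dc_record)
```

The event has no counterpart in the assumed rule set and is dropped, which shows why the conversion is easy in this direction: the simpler structure keeps less information than the conceptual model.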