Good work, Øyvind!

Having read through the updated draft (and having not contributed to the
prior work at all - so apologies for that), I would offer the following
comments.

I think the project would be helped by having a sharper focus on its
ultimate deliverables. The draft mentions RDF as a result syntax, and
the "cultural semantic web", but doesn't consider the implications of
this. (It would be helpful if the example fragment were worked through
into an RDF representation.)

One implication is that we are aiming to extract assertions about the
entities of interest. That requires us not just to identify that a
person has been mentioned in the text. We must also extract
relationships between that person and an event, object, etc., where the
"predicate" (CRM Property) connecting them can be inferred from the TEI
markup. The fact that the person and the event occur near each other in
the TEI source isn't enough to achieve this - we need a method within
TEI to indicate such relationships explicitly. (In general, I think
that we should be promoting explicit marking-up of this semantic
information, not looking to text-scraping techniques to rescue us from
our failure to mark things up properly.) Of course, once we have such a
mechanism, it becomes trivially easy to generate the corresponding RDF -
and so it should be! I think that without a practical treatment of the
issue of relationships, the document will not serve a useful purpose.
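
As a minimal sketch of how direct that generation could be, assume the
relationship is recorded with TEI's <relation> element, using its name,
active and passive attributes. The TEI fragment, the example.org base
URI, the mapping from relation names to CRM properties, and the choice
of "active" as subject and "passive" as object are all hypothetical, and
the CRM namespace is an assumption; none of them come from the draft.

```python
import xml.etree.ElementTree as ET

# Assumed namespace for CIDOC-CRM properties (not taken from the draft).
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
# Hypothetical base URI for entities declared in this TEI file.
BASE = "http://example.org/tei/doc1"
TEI_NS = "{http://www.tei-c.org/ns/1.0}"

# Hypothetical mapping from TEI relation names to CRM properties.
RELATION_TO_CRM = {
    "participant": CRM + "P11_had_participant",
    "carriedOutBy": CRM + "P14_carried_out_by",
}

# Invented TEI fragment: the relationship is marked up explicitly,
# not inferred from the person and the event being close together.
TEI_FRAGMENT = """
<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="participant" active="#ev1" passive="#pers1"/>
  <relation name="unmapped" active="#ev1" passive="#pers2"/>
</listRelation>
"""


def resolve(pointer):
    """Turn a local pointer such as '#ev1' into an absolute URI."""
    return BASE + pointer if pointer.startswith("#") else pointer


def relations_to_ntriples(tei_xml):
    """Return one N-Triples line per relation with a known CRM mapping."""
    root = ET.fromstring(tei_xml)
    lines = []
    for rel in root.iter(TEI_NS + "relation"):
        prop = RELATION_TO_CRM.get(rel.get("name"))
        active, passive = rel.get("active"), rel.get("passive")
        if prop is None or active is None or passive is None:
            continue  # no mapping known: emit nothing rather than guess
        lines.append("<%s> <%s> <%s> ." % (resolve(active), prop, resolve(passive)))
    return lines


if __name__ == "__main__":
    for line in relations_to_ntriples(TEI_FRAGMENT):
        print(line)
```

The relation with no known mapping is skipped rather than guessed at,
which is the point: only what has been marked up explicitly ends up as
an assertion.
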
A second implication of our "semantic web" direction of travel is that
we should be looking to encourage the use of persistent URLs to identify
the entities we are marking up. Again, if such URLs can make it into
the TEI source, the process of creating a useful RDF equivalent will be
that much easier. This is a problem for everyone's data, because
no-one's data in its natural state contains URLs (!). However, it would
be useful if the document made clear the difference between an
identifier for an entity (which will typically carry no human-friendly
clue, e.g. http://www.geonames.org/6619872/) and the name(s) of that
entity, which are simply properties (e.g. "Royal Albert Memorial Museum
and Art Gallery", which is actually Exeter Museum). The problem is that
everyone treats names as though they were identifiers, despite their
shortcomings in this role.
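
The distinction can be sketched in a few lines. The geonames URL and the
two names of the museum are the ones quoted above; the second entity and
its example.org URI are invented purely to show what goes wrong when a
name is used as if it were an identifier.

```python
from dataclasses import dataclass, field

RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


@dataclass
class Entity:
    uri: str                                   # the identifier: opaque and unique
    names: list = field(default_factory=list)  # names are just properties


def label_triples(entity):
    """Emit each name as an rdfs:label on the entity's URI.

    Literal escaping is omitted; the sample names contain no quotes.
    """
    return ['<%s> <%s> "%s" .' % (entity.uri, RDFS_LABEL, name)
            for name in entity.names]


def find_by_name(entities, name):
    """Look entities up by name; this may return zero, one or many URIs."""
    return [e.uri for e in entities if name in e.names]


museum = Entity(
    "http://www.geonames.org/6619872/",
    ["Royal Albert Memorial Museum and Art Gallery", "Exeter Museum"],
)
# Hypothetical second entity that happens to share one of the names.
other = Entity("http://example.org/entity/2", ["Exeter Museum"])

if __name__ == "__main__":
    for line in label_triples(museum):
        print(line)
    print(find_by_name([museum, other], "Exeter Museum"))
```

One URI can carry several names, and one name can lead to several URIs,
so only the URI is safe to use as the subject of an assertion.
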
Finally, I think it would be useful to talk about reference. The fact
that a certain paragraph is "about" Exeter Museum is clearly useful
information. However, it doesn't fit nicely into the pattern of an RDF
assertion. (If one is using the Topic Maps approach, it does fit nicely
- it is an "Occurrence".) The document should address this issue and

Hope this helps.

In message <0BA1E573-3DF8-4E66-96C6-6F30A55C2A95@...>, Øyvind
Eide <oyvind.eide@...> writes
>As you may have noticed, the discussion following the workshop in Ann
>Arbor fell into a dormant phase since the last draft was sent to the
>list in December last year. We have all been busy, I assume; I soon
>had to concentrate fully on my upgrade to PhD, a UK system for
>documenting that the PhD project is running well. This was
>finalised before summer, and now I am able to take up issues with
>less external pressure (if not less important ones).
>After re-examining the work we have done so far, taking the original
>goal into consideration, I have decided to rewrite the whole thing,
>copying in from the old version as appropriate.
>I hereby enclose the rewritten version, in which the order as well as
>the content of things are changed quite a bit. The TEI as well as a
>derived HTML version are enclosed.
>[ A MIME text / html part was included here. ]
>[ A MIME text / xml part was included here. ]
>We will have a half day meeting in the SIG at the TEI annual meeting
>in Zadar in November. I hope we can have a final discussion on the
>guidelines document there, as well as decide on future work of the
>SIG. Please, if you are able to come to Zadar, take part in this!
>Otherwise, please send any comments you may have beforehand.
>The more immediate concern, however, is to comment on the enclosed
>draft, both the general layout and details, including omitted parts of
>the previous version you would like to re-introduce. There are a few
>parts with little or no content, should they be filled or should they
>The plan is to start working towards an ODD version of the document in
>a short while. Sebastian Rahtz has, luckily for us, offered to take
>part in this work; the list will, of course, be included in the
>discussion as well. Sebastian suggested an ODD just including the
>elements we know to map to CIDOC-CRM; the conversion as such could
>then be included using equiv. This is a very interesting idea to
>develop, with consequences beyond the TEI/CIDOC-CRM relationship.
>Centre for Computing in the Humanities, King's College London
>Unit for Digital Documentation, University of Oslo