People sometimes want to attach metadata to a TEI document that doesn't fit into an existing header element. For example, you might want to attach Dublin Core metadata, or you might want to include Cataloguing-in-Publication data for the TEI document. It would be good to have a place for such metadata.
We might create a new optional child of <teiHeader> called something like <containerForOtherMetadata>. In it you could put elements from another namespace (like <dc:title>), or you might create your own elements (like <cip>). I'm not sure what the content model for <containerForOtherMetadata> should be in order to allow for these things. Maybe it actually requires that <containerForOtherMetadata> be in a different namespace.
(It's possible that before this ticket is resolved we will have added a new element for CIP data as a child of <publicationStmt>. That idea is also being considered on TEI-L as I write this.)
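To make the proposal above concrete, here is a minimal sketch of what such a header might look like. It is illustrative only: <containerForOtherMetadata> and <cip> are the hypothetical names used in this ticket, not TEI elements, and the titles and placeholder values are invented for the example. The Dublin Core namespace URI is the standard one.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0"
     xmlns:dc="http://purl.org/dc/elements/1.1/">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Sample document</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished sample.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Born digital.</p>
      </sourceDesc>
    </fileDesc>
    <!-- HYPOTHETICAL wrapper: not a TEI element; name taken from this ticket -->
    <containerForOtherMetadata>
      <!-- elements from another namespace (Dublin Core) -->
      <dc:title>Sample document</dc:title>
      <dc:creator>Example, Author</dc:creator>
      <!-- HYPOTHETICAL home-grown element for CIP data; content left open -->
      <cip>...</cip>
    </containerForOtherMetadata>
  </teiHeader>
  <text>
    <body>
      <p>...</p>
    </body>
  </text>
</TEI>
```

The fragment is well-formed but would not validate against an unmodified TEI schema, which is exactly the open content-model question raised above.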
Would the proposed <standoff> element fill this gap?
Since Lou refers to "the proposed <standoff> element", I wonder whether there's a proposal of some sort discussing this element that I could look at. I'm afraid there's additional context that Lou and others are referring to that I'm unaware of.
Ah, Lou was referring to https://sourceforge.net/p/tei/feature-requests/378/ (which is also referenced in comments below).
To me, <standoff> will be for ancillary components such as editorial notes, prosopographical info and other textual content which is "part of" the edition but not part of the original source text. Metadata is a different thing, and I think it does belong in the <teiHeader> with the TEI metadata.
So if I use RDF triples to represent my Dublin Core metadata, and I want a convenient place to store it, it would go in the header, but if I use them to describe the topics or peoples represented in my text they'd go in <standoff>?
(By the way, editorial notes generally go in the body of a TEI text, don't they? And prosopographical info generally goes in the TEI Header, at present, at least, doesn't it?)
Last edit: Kevin Hawkins 2013-06-18
Currently, editorial notes usually do go in the body of the text; information about people often goes in the header (because of particDesc), while other very similar information (about e.g. ships, places, etc.) may go virtually anywhere, including the body of a separate file. It's precisely this confusion that <standoff> is trying to address, surely; there should be a standard place to put peripheral material that is not metadata.
To address the question about your RDF triples, I agree: if it's metadata, it belongs in the header, but if it's something else it belongs somewhere else. It seems unlikely to me that people would include Dublin Core encoded in RDF when it's not metadata, but I suppose it's possible.
RDF triples which annotate the text, i.e. assert that the <person> "Will Shakspear" here is the "sameAs" that person in dbpedia, would (I believe) go in the <standoff> section, because they are not metadata. Well, you can argue the opposite too.
I don't care where <standoff> (or any other name) goes, so long as there is clear guidance for users on where to put it and how to use it.
Should this discussion simply be rolled into https://sourceforge.net/p/tei/feature-requests/378/ and this ticket closed?
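For readers following the RDF argument above, this is a rough sketch of an annotation triple of the kind described (a <person> asserted to be the same as a DBpedia resource), placed in the proposed stand-off container. Assumptions: the <standOff> element was only a proposal at the time of this thread, so its placement and content model here are guesses; the xml:id value "shakspear" and the use of <listPerson> inside it are invented for illustration. The RDF and OWL namespace URIs are the standard ones.

```xml
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- teiHeader omitted for brevity -->
  <text>
    <body>
      <p><persName ref="#shakspear">Will Shakspear</persName> ...</p>
    </body>
  </text>
  <!-- PROPOSED element, sibling of <text>; contents below are an assumption -->
  <standOff>
    <listPerson>
      <person xml:id="shakspear">
        <persName>Will Shakspear</persName>
      </person>
    </listPerson>
    <!-- An annotation about the text's content, not metadata about the file -->
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:owl="http://www.w3.org/2002/07/owl#">
      <rdf:Description rdf:about="#shakspear">
        <owl:sameAs rdf:resource="http://dbpedia.org/resource/William_Shakespeare"/>
      </rdf:Description>
    </rdf:RDF>
  </standOff>
</TEI>
```

By contrast, RDF that describes the TEI file itself (for example its Dublin Core title or creator) is the header-level metadata this ticket is about.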
I don't think so; the Council didn't approve of <standoff> as the element "for everything you can't think of where to stow", but rather for a specific purpose :-)
Last edit: Piotr Banski 2013-11-10
Can someone please remind us for which "specific purpose" <standoff> is intended, and why it would not be applicable for storing RDF triples of any kind?
Only had a while now to search for that. Here's one link:
http://www.tei-c.org/Activities/Council/Meetings/tcm52.xml#body.1_div.2_div.5 (group B)
"Council agrees with Group B who thinks that this will be a beneficial addition to the TEI and that an we should have a container element <standOff> as part of model.resouceLike which contains things like <linkGrp>/<link>, <seg>/<ab>/<s>/etc. and various other elements, including some new elements to be discussed by a later working party."
I suggest the relevant minutes :-). Additionally, Kevin's not talking of "triples of any kind" -- it's you who does so, a tad casuistically, I'm afraid. Triples of /some/ kind, that you would normally use to annotate the content of <text>, sure. But triples that express information alternative to or above that normally put in the header -- well, that's the way to abuse elements even before they are born... :-)
Last edit: Piotr Banski 2013-11-10
It seems that the confusion with <standoff> might stem from the idea of attaching a new element somewhere high in the tree. But whereas <standoff> was suggested, and approved, as a sibling of <text>, Kevin suggests a new element as a child of <teiHeader>, hence in a completely different part of the tree and with different implied semantics. And +1 from me for that.
I would like to suggest keeping this ticket separate for one more reason: Kevin mentions DC and CiP, but there are also other notable metadata (or even meta-metadata) initiatives, such as CMDI from Clarin, which, for at least political reasons, the TEI might want to support (at least by offering a smooth basis for deriving CMDI instances), cf. http://www.clarin.eu/node/3219
May I add that some SIGs might want to cooperate with the Council for that purpose (I can imagine at least Ontologies and LingSIG, possibly also Music, and probably some others).
The discussion has gone completely astray because of the mistaken interference with <standoff>. Still, we do need such an element to contain external metadata (as METS allows one to do). Did the Council actually discuss this issue?
We did discuss it -- this is the action from the draft minutes:
Action: MH will offer DC examples and PS will offer MARC examples. LB will pull them together into some text to be inserted into the Guidelines. Given this, Council will consider whether to create the wrapper element.
[Minutes not finalized yet.]
Oxford 2013-11 face-to-face: MH will offer dc examples and PS will offer MARC examples. LB will pull them together into some text to be inserted into the Guidelines. Given this, Council will reconsider the feature request and whether to create the wrapper element.
Our Despatches project has an OAI-PMH interface which provides OAI records containing Dublin Core elements:
http://bcgenesis.uvic.ca/oai.xq?verb=ListRecords&metadataPrefix=oai_dc
I can easily imagine wanting to embed one or more components of an OAI record inside a TEI file. The OAI <metadata> element contains elements in the DC namespace, so if the example is to be limited to DC elements, they can be taken from this example (there are several thousand more available if required):
Re-assigning to Paul to provide his MARC example.
Surely all that OAI stuff is generated? You don't maintain it by hand, do you?
I'm really uncertain about this request. First, I don't think DC and MARC are good examples, since (echoing Sebastian) those are probably derived elements.
On the other hand, I see the need for project-specific metadata (elements). Cf. e.g. caseDesc of the St. Louis Freedom Suits Legal Encoding Project or the correspDesc element currently under discussion with the SIG correspondence. Additionally, it could facilitate the migration of legacy encodings to P5, where all the non-migratable stuff (from the header) could be put into the new wrapper element. But if there was an element with an anything-goes content model, that’s probably Pandora’s box …
Yes, the OAI stuff is generated, but I would rather like to include it in the XML view of the file if I could. We already include a lot of DC: metadata in the XHTML view.
You have a TEI XML original, which you then derive DC/OAI from, and then you want to include the result back into the original? This seems odd to me. If the OAI/DC comes from some other source entirely, and is not related to what's in the TEI, and you want to use TEI as an archival/interchange combination format, then I have a bit more interest. But including the same info twice seems like a recipe for problems.