Request to change the TEI to allow <m> to contain <hi>. This is essential for paleographic annotation of letters in languages whose linguistic elements are segmented at the morpheme (<m>) level, below the word (<w>) level. Currently, since <m> cannot contain <hi>, any paleographic annotation using <hi> with @rend on a character or sequence of characters within a morpheme does not validate.
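Here is a minimal sketch of the markup at issue. It uses an English word (following the "-ing is a morpheme" example later in this thread) and an EpiDoc-style rend="superscript" value, both purely illustrative and not taken from the project's Coptic corpus. The first fragment, <hi> directly inside <w>, follows the EpiDoc pattern. The second fragment puts the same <hi> inside an <m>, which is what fails to validate under the current content model.

```xml
<!-- Hypothetical example: the word and rend value are illustrative only. -->

<!-- Valid: <hi> directly inside <w>, following EpiDoc practice -->
<w>readin<hi rend="superscript">g</hi></w>

<!-- Same annotation with morpheme segmentation: <hi> inside <m>,
     which does not validate unless <m> is allowed to contain <hi> -->
<w>
  <m>read</m>
  <m>in<hi rend="superscript">g</hi></m>
</w>
```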
Would <seg> not be better than <hi> for paleographic annotation?
Are you suggesting that we put a <seg> in the <m> and then put the <hi> inside that?
This would be possible, but it raises the question of why we need <seg> within <m> but not within <w>. If a word or part of a word can be highlighted, shouldn't a morpheme or part of a morpheme be eligible for highlighting as well? Our suggestion is to allow <hi> directly inside <m>, which seems more intuitive to me.
We use <m> rather than <seg> for principled reasons (-ing is a morpheme), and nesting a <seg> inside an <m> seems redundant (though it does validate).
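The following is a hypothetical sketch of the <seg> workaround under discussion. It uses an illustrative word and rend value, which are assumptions and not the project's data. It validates because <m> may already contain <seg>, but the extra wrapper adds a layer with no linguistic meaning of its own.

```xml
<!-- Hypothetical workaround: a <seg> inside the <m> carries the <hi>.
     This validates, but the <seg> is only a wrapper for the highlighting. -->
<w>
  <m>read</m>
  <m>in<seg><hi rend="superscript">g</hi></seg></m>
</w>
```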
It's not very elegant, I agree. I don't think I've seen a project where a morphological hierarchy (<m>s etc.) is mixed up with a typographic transcriptional hierarchy (bold, italic, all that). But remember you can put @rend on <seg> directly, and you could distinguish this use of <seg> from linguistic annotation using its @type attribute.
I appreciate your pointing us to existing ways to use the TEI tagset.
We are trying to stay as closely aligned as possible with the usage practices of EpiDoc, the TEI subset. Our corpus is in Coptic, a language that is structured by morphemes rather than words. The only other existing TEI corpus in Coptic is the papyri.info set of Coptic documentary papyri and ostraca. The EpiDoc Guidelines and usage practices call for <hi> for the purposes we are describing (subscript, superscript, color, etc.). For reference:
http://www.stoa.org/epidoc/gl/latest/trans-charactershighlighted.html
http://www.stoa.org/epidoc/gl/latest/trans-raisedlowered.html
http://www.stoa.org/epidoc/gl/latest/trans-tallorsmall.html
To be as interoperable as possible with existing usage, we are requesting the ability to use <hi> within <m> in the same way that the Guidelines prescribe <hi> usage within <w>. Our other choices are either to follow your suggestion and use <seg> throughout our entire corpus, which would make it incompatible with the papyri.info corpus, or to use <hi> wherever we have no morpheme annotation and <seg> wherever we do. That seems a little strange and internally inconsistent, and, again, it is not interoperable with papyri.info.
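The two fallback options can be sketched as follows, using the same illustrative word and rend value. The type="paleographic" value is a hypothetical, project-defined label for telling this use of <seg> apart from linguistic segmentation, as suggested earlier in the thread. It is not an EpiDoc or TEI convention.

```xml
<!-- Option 1 (hypothetical): <seg> with @rend throughout the corpus, so even
     words without morpheme markup depart from the EpiDoc/papyri.info <hi> usage -->
<w>readin<seg type="paleographic" rend="superscript">g</seg></w>

<!-- Option 2 (hypothetical): <hi> where there is no morpheme markup... -->
<w>readin<hi rend="superscript">g</hi></w>
<!-- ...but <seg> where there is, so one feature is encoded two different ways -->
<w>
  <m>read</m>
  <m>in<seg type="paleographic" rend="superscript">g</seg></m>
</w>
```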
If the TEI agrees to the change, we will then petition EpiDoc for this change as well. We are taking this approach because we think allowing <hi> to be nested inside <m> is a less significant change than asking EpiDoc to change their <seg> and <hi> usages and Guidelines.
Thanks for the consideration.
In haste: if the TEI agrees to this change (which I would be in favour of, and can discuss further later if needed) then the EpiDoc schema will inherit it from the TEI schema, so no separate petitioning will be needed!
(Indeed, if the TEI accept this change soon, then the EpiDoc ODD might implement it before it is available in the TEI schema, on the understanding that it will become canonical within a few months.)
I think I should hand this over to EpiDoc experts at this point -- I'll ask Hugh and Gabby to take a look. Thanks for your patience!
The potential issues I would expect Council to raise would be along the lines of: does this open the door to a huge cascade of requests for similar inline-level elements (<emph>, <soCalled>, <mentioned>, etc.) inside <m>? A possible reply would be that we already allow a huge variety of non-linguistic content (line breaks, page breaks, forme work, <space>) inside <m>, so how is this different?
I agree that the precedent you mention above is already set by the inclusion of various inline elements within <m>, but perhaps even more so by the inclusion of many more inline elements within <w>. Surely a morpheme should be no more restricted in what it can contain than a word, given that they are parallel and, in some languages, almost equivalent concepts. In any case, anything that can appear inside a word can surely, by definition, also appear inside a morpheme, given that, if you're marking morphemes, all words are entirely made up of morphemes?
Gabby, you stated it better than I did. Thank you. Thanks for everyone's careful consideration.
Will implement.
Hugh, if you're going to implement this in the TEI ODD, should we also sneak it into the EpiDoc ODD this week? (Happy to do that, with documentation, if you like.)
Thanks, Hugh!
Just chiming in to say this all makes sense to me, and I prefer it to <seg>, since then we stay consistent with everything else where <m> behaves much like <w>.
Many thanks for everyone's deliberation. Does this discussion mean that this request has been approved officially?
It has. I will be implementing it soon.
On Thu, Jun 25, 2015 at 5:06 PM, Caroline T. Schroeder ctschroeder@users.sf.net wrote:
Related Feature Requests: #556
Great news. Thank you!
Just to note, Carrie, that this has already been implemented in the latest EpiDoc release (try validating against http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng and see if it works for you), in anticipation of forthcoming TEI compliance...
Thank you! It seems to be validating. Much appreciated.
Done in r13281.