#556 Allow <hi> to be contained by <m>

AMBER
closed-fixed
None
5(default)
2015-06-27
2015-05-27
No

Request to change the TEI to allow <m> to contain <hi>. This is essential for allowing paleographical annotations of letters in languages whose linguistic elements are segmented on the morpheme (<m>) level, below the word (<w>) level. Currently, since <m> does not contain <hi>, any paleographic annotation using hi@rend on a character or sequence of characters within a morpheme (<m>) does not validate.
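
As a hypothetical illustration (an invented English example, not taken from any corpus), the markup at issue might look like the following, where a single raised letter inside a morpheme is tagged with <hi>. The rend value is an assumption chosen for the example.

<w><m>wal<hi rend="superscript">k</hi></m><m>ing</m></w>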

Discussion

  • Martin Holmes

    Martin Holmes - 2015-05-27

    Would <seg> not be better than <hi> for paleographic annotation?

     
  • Amir Zeldes

    Amir Zeldes - 2015-05-27

    Are you suggesting to put a <seg> in the <m> and then do <hi>? Like this?

    <w>walk<m><seg><hi rend="bold">ing</hi></seg></m></w>
    

    This would be possible, but it raises the question why we need <seg> within <m> but not within <w>. If a word or part of a word can be highlighted, shouldn't a morpheme or part of a morpheme be eligible for highlighting as well? The suggestion is to allow this, which seems intuitive to me:

    <w>walk<m><hi rend="bold">ing</hi></m></w>
    

    We use <m> rather than <seg> for principled reasons (-ing is a morpheme), and having both seems redundant (though it does validate).

     
  • Martin Holmes

    Martin Holmes - 2015-05-27

    It's not very elegant, I agree. I don't think I've seen a project where a morphological hierarchy (<m>s etc.) is mixed up with a typographic transcriptional hierarchy (bold, italic, all that).

    But remember you can put @rend on <seg> directly:

    <w>walk<m><seg rend="bold">ing</seg></m></w>
    

    and you could distinguish this use of <seg> from linguistic annotation using its @type attribute.
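
    A minimal sketch of that suggestion, assuming a made-up type value ("typographic" is purely illustrative, not a value prescribed by the TEI):

    <w>walk<m><seg type="typographic" rend="bold">ing</seg></m></w>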

     
  • Caroline T. Schroeder

    I appreciate your pointing us to existing ways to use the TEI tagset.

    We are trying to be as synchronous as possible with the usage practices for the EpiDoc subset of the TEI. Our corpus is in Coptic, a language that is really structured by morphemes not words. The only other existing TEI corpus in Coptic is the papyri.info set of Coptic documentary papyri and ostraca. The guidelines and usage practices for EpiDoc are to use <hi> for the purposes we are describing (subscript, superscript, color, etc.). For reference:
    http://www.stoa.org/epidoc/gl/latest/trans-charactershighlighted.html
    http://www.stoa.org/epidoc/gl/latest/trans-raisedlowered.html
    http://www.stoa.org/epidoc/gl/latest/trans-tallorsmall.html

    To be as interoperable as possible with existing usage, we are requesting the ability to use <hi> within <m> in the same way that the guidelines dictate <hi> usage within <w>. Our other choices are to follow your suggestion and use <seg> throughout our entire corpus, in which case it will not be compatible with the papyri.info corpus; or to use <hi> whenever we don't have an annotation and <seg> whenever we do, but that seems a little strange and internally inconsistent. And again, that is not interoperable with papyri.info.

    If the TEI agrees to the change, we will then petition EpiDoc for this change as well. We are taking this approach because we think allowing <hi> to be nested inside <m> is a less significant change than asking EpiDoc to change their <seg> and <hi> usages and Guidelines.

    Thanks for the consideration.

     
    • BODARD Gabriel

      BODARD Gabriel - 2015-05-28

      In haste: if the TEI agrees to this change (which I would be in favour of, and can discuss further later if needed) then the EpiDoc schema will inherit it from the TEI schema, so no separate petitioning will be needed!

      (Indeed, if the TEI accept this change soon, then the EpiDoc ODD might implement it before it is available in the TEI schema, on the understanding that it will become canonical within a few months.)

       
  • Martin Holmes

    Martin Holmes - 2015-05-28

    I think I should hand this over to EpiDoc experts at this point -- I'll ask Hugh and Gabby to take a look. Thanks for your patience!

     
  • Martin Holmes

    Martin Holmes - 2015-05-28

    The potential issues I would expect to be raised by Council would be along the lines of: does this open a door to a huge cascade of requests for similar inline-level elements (<emph>, <soCalled>, <mentioned>, etc.) inside <m>? To which a possible reply would be that we already allow a huge variety of non-linguistic stuff (linebreaks, pagebreaks, forme works, <space>) inside <m>, so how is this different?
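
    For instance, a line break falling in the middle of a morpheme is the kind of non-linguistic content meant here. The following is an invented example, not drawn from any corpus:

    <w><m>walk</m><m>i<lb break="no"/>ng</m></w>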

     
    • BODARD Gabriel

      BODARD Gabriel - 2015-05-28

      I agree that the precedent you mention above is already set by the inclusion of various inline elements within <m>, but perhaps even more so by the inclusion of many more inline elements within <w>. Surely a morpheme should be no more restricted in what it can contain than a word, given that they are parallel and in some languages almost equivalent concepts. In any case, anything that can appear inside a word can, surely, by definition also appear inside a morpheme, given that if you're marking morphemes then all words are entirely made up of morphemes?

       
  • Caroline T. Schroeder

    Gabby, you stated it better than I did. Thank you. Thanks for everyone's careful consideration.

     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-05-28

    Will implement.

     
    • BODARD Gabriel

      BODARD Gabriel - 2015-05-28

      Hugh, if you're going to implement this in the TEI ODD, should we also sneak it into the EpiDoc ODD this week? (Happy to do that, with documentation, if you like.)

       
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-05-28
    • assigned_to: Hugh A. Cayless
     
    • Caroline T. Schroeder

      Thanks, Hugh!

       
  • Amir Zeldes

    Amir Zeldes - 2015-05-28

    Just chiming in to say this all makes sense to me, and I prefer it to <seg> since then we're staying consistent with everything else where <m> behaves much like <w>

     
  • Caroline T. Schroeder

    Many thanks for everyone's deliberation. Does this discussion mean that this request has been approved officially?

     
    • Hugh A. Cayless

      Hugh A. Cayless - 2015-06-25

      It has. I will be implementing it soon.

      On Thu, Jun 25, 2015 at 5:06 PM, Caroline T. Schroeder ctschroeder@users.sf.net wrote:

      Many thanks for everyone's deliberation. Does this discussion mean that
      this request been approved officially?


      Status: open
      Group: AMBER
      Created: Wed May 27, 2015 04:23 AM UTC by Caroline T. Schroeder
      Last Updated: Thu May 28, 2015 04:04 PM UTC
      Owner: Hugh A. Cayless


  • Caroline T. Schroeder

    Great news. Thank you!

     
    • BODARD Gabriel

      BODARD Gabriel - 2015-06-26

      Just to note, Carrie, that this has already been implemented in the latest EpiDoc release (try validating against http://www.stoa.org/epidoc/schema/latest/tei-epidoc.rng and see if it works for you), in anticipation of forthcoming TEI compliance...

       
  • Caroline T. Schroeder

    Thank you! It seems to be validating. Much appreciated.

     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-06-27
    • status: open --> closed-fixed
     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-06-27

    Done in r13281.
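
    For readers who want the same behaviour in a local customization, a change of this kind might be expressed in an ODD roughly as follows. This is a sketch under stated assumptions, not a copy of r13281: it uses the later "Pure ODD" content-model syntax, and it assumes that <hi> is admitted through the model.hiLike class and that the other alternatives match the content model of <m> in your TEI release. The fragment belongs inside a schemaSpec.

    <elementSpec ident="m" module="analysis" mode="change">
      <content>
        <alternate minOccurs="0" maxOccurs="unbounded">
          <textNode/>
          <classRef key="model.gLike"/>
          <!-- assumed addition: admits <hi> inside <m> -->
          <classRef key="model.hiLike"/>
          <elementRef key="seg"/>
          <elementRef key="m"/>
          <elementRef key="c"/>
          <classRef key="model.global"/>
        </alternate>
      </content>
    </elementSpec>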