#545 Deprecate oVar and pVar, Revamp oRef and pRef

AMBER
open
None
5(default)
2015-02-10
2015-01-31
No

The TEI dictionary chapter comprises four elements for referring back to forms in dictionary entries (oRef, oVar, pRef, pVar). Their respective usage has never corresponded to clear-cut scenarios, mainly because a clear set of use cases and examples is lacking. This has led to low usage of these elements in most TEI-based dictionary projects, and also to the absence of best practices for the concrete cases (examples, etymology) where marking forms and associating them with (real or virtual) entries would help formalise lexical content in a systematic way.

The main issue is that the difference between pRef and pVar (resp. oRef/oVar) does not match the logic of tagging form references in a dictionary entry:

  • pRef (resp. oRef) is limited, by its empty content, to the case where the form is exactly the one given in the same entry, which is rarely the case (e.g. when orthographic variants exist)
  • pVar is only intended to be used when there is a variation (e.g. an inflected form); but because, contrary to pRef, it has non-empty content, it is often tempting to use it to mark all types of forms
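As a minimal sketch of the distinction described above (the entry and both example sentences are invented for illustration and are not taken from the Guidelines), the current content models force the encoder to switch elements as soon as the cited form departs from the headword:

```xml
<entry>
   <form type="lemma">
      <orth>college</orth>
   </form>
   <sense>
      <!-- exact headword form: oRef must stay empty -->
      <cit type="example">
         <quote>She goes to <oRef/> in the autumn.</quote>
      </cit>
      <!-- inflected form: only oVar can carry the text of the form -->
      <cit type="example">
         <quote>Both <oVar>colleges</oVar> admit students in the autumn.</quote>
      </cit>
   </sense>
</entry>
```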

It has also been pointed out that there are issues related to the unsatisfactory definition of @type and the absence of @notation.

Proposal: we suggest dropping oVar and pVar and extending both the scope and the content model of pRef and oRef to offer a simple system for the annotation of forms (orthographic and phonetic) in dictionary entries, with a clear parallel to orth and pron in the description of forms.

The main changes would be:

  • allow text in oRef and pRef, while keeping the possibility to leave them empty when necessary
  • make them members of att.typed
  • make them members of att.lexicographic to bring them in line with, and enable full correspondence with, the linguistic/lexicographic usage of orth and pron
  • add @notation to pRef in order to bring it in line with pron (probably a good opportunity to make a class out of @notation); useful in cases where more than one notation is represented in pron
  • from a semantic point of view, allow these elements to point to any dictionary entry not just the current entry’s head item (same dictionary or even other dictionaries in the case of the marking up of etymology)
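A rough sketch of what an entry might look like if these changes were adopted. Everything here is an assumption made for illustration: the entry itself, the @type value "inflected", the #university target and the exact placement of pRef are hypothetical, and the final content models may differ.

```xml
<entry xml:id="college">
   <form type="lemma">
      <orth>college</orth>
      <pron notation="ipa">ˈkɒlɪdʒ</pron>
   </form>
   <sense>
      <!-- oRef may still be left empty when the form is exactly the headword -->
      <cit type="example">
         <quote>She goes to <oRef/> in the autumn.</quote>
      </cit>
      <!-- oRef and pRef now carry text, @type and (for pRef) @notation -->
      <cit type="example">
         <quote>Both <oRef type="inflected">colleges</oRef>
            <pRef type="inflected" notation="ipa">ˈkɒlɪdʒɪz</pRef> admit students.</quote>
      </cit>
      <!-- oRef pointing at another entry (hypothetical target) via @corresp -->
      <cit type="example">
         <quote>A <oRef/> is not always part of a
            <oRef corresp="#university">university</oRef>.</quote>
      </cit>
   </sense>
</entry>
```

With a single element per modality, the choice between an empty and a non-empty reference becomes a matter of convenience rather than a change of element.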

Related

Bugs: #720

Discussion

  • Piotr Banski

    Piotr Banski - 2015-01-31

    This looks like a good step forward -- deprecating the Var elements while giving the Ref ones more flexibility is a welcome suggestion.

    I have a remark and a request, for now:

    • Syd has already handled @notation in pVar (see ticket #523), so extending this to the modified pRef definitely calls out for a class.

    • would you please elaborate on the last point, i.e., the long-distance references, possibly by adducing some use cases? the suggestion seems logical on the one hand, but it also implies that the "tilde rendering" gets pushed from the more-or-less central focus of these elements, to a very contextual side-effect. I'm not saying it's bad, but its effect is worth highlighting already at this stage.

     
  • Laurent Romary

    Laurent Romary - 2015-02-04

    To answer Piotr on the last point, here's a possible example of what we have in mind for etymology. It describes a borrowing from English into Japanese. The idea is to mark up forms (and pronunciations) by means of the revamped oRef/pRef so that one can point to another lexical resource. It may be the case that this resource does not exist (yet) or cannot be referenced; @corresp is thus optional, of course. But the underlying semantics, that etymons are forms that would potentially deserve lexical description, seems important to me.

             <entry xml:id="taxi" xml:lang="jpn">
                <form type="lemma">
                   <orth type="transliterated" notation="romanji">takushī</orth>
                   <orth notation="katakana">タクシー</orth>
                   <pron notation="ipa">taku'shi:</pron>
                   <gramGrp>
                      <pos>noun</pos>
                   </gramGrp>
                </form>
                <sense>
                   <cit type="translation">
                      <quote>taxi</quote>
                   </cit>
                </sense>
                <etym type="borrowing">
                   <lbl>source</lbl>
                   <lang>English</lang>
                   <cit type="etymon">
                      <oRef xml:lang="eng-US" corresp="http://en.wiktionary.org/wiki/taxi">taxi</oRef>
                      <pRef xml:lang="eng-US">'tæksi</pRef>
                   </cit>
                </etym>
             </entry>
    
     
  • Piotr Banski

    Piotr Banski - 2015-02-04

    Hi Laurent, thanks for this, I'll try to have a closer look by the end of the day. This appears to take the *Ref elements into the new century, but then, that's what they needed.

    One remark for now: your @xml:lang is placed too high: on the <entry>, it also incorrectly applies to <sense> and <etym>.

     
  • Piotr Banski

    Piotr Banski - 2015-02-04

    ... and <pRef xml:lang="eng-US">'tæksi</pRef> is linguistically strange, as well. I don't know if xml:lang (or rather the relevant RFC or BP) accepts sublabels for phonetic script, but if it does, one should be applied here, I think.

     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-02-10
    • assigned_to: Stefanie Gehrke
     
  • Hugh A. Cayless

    Hugh A. Cayless - 2015-02-10

    Assigning to Stefanie. This looks like a fair amount of work, and I think will need some discussion here and/or on the Council list, and may need to be broken up into smaller chunks. [feature-requests:#544] would be invalidated if this is implemented, so I think they go together.

     

    Related

    Feature Requests: #544