
#155 Expansion to guidelines for "haplography"

Group: 9.6
Status: done
Labels: guidelines
Priority: 5 (medium)
Updated: 2024-03-21
Created: 2021-06-13
Private: No

The current guidelines for ligatures show examples for ligatured letters within a word, and those spanning two words where the ligatured letters are different.

In runic inscriptions, because runic orthography does not allow the same rune to be written twice in succession, we get another form of "ligature" where the same rune is shared between words. One example is ok kuþs (and God's) rendered as okuþs, with the k shared between the two words.

It would be great to have guidelines on how to handle such examples, either as an expansion of the ligature guidelines or as an omitted character. It also raises the question of how the XSLT would handle such a case.

Discussion

  • BODARD Gabriel

    BODARD Gabriel - 2021-06-14
    • labels: --> guidelines
     
  • BODARD Gabriel

    BODARD Gabriel - 2021-06-14

    In a Greek inscription I would call this a haplography, i.e. a kind of omission. I guess it's slightly different when it's a common linguistic phenomenon rather than an error/irregular spelling, but from your description I don't see why it needs special handling. The "k" in "okuþs" looks no different from a normal "k", right? The only difficulty is in tokenisation—much as words written with crasis in Greek, which we currently handle with a single <w> with two values in @lemma.

    I certainly wouldn't call this a ligature, unless I'm very much misunderstanding or missing something…

     
    • Karl Farrugia

      Karl Farrugia - 2021-06-14

      No, the k there would certainly not look different. As you say, the difficulty is indeed in the tokenisation and in any eventual possibility for analysis, and I was wondering if a solution similar to the one for ligatures with <link> could be used (which is why I was thinking about ligatures, even though it certainly is not one). However, your solution with crasis could certainly work, although perhaps I might have missed it in the guidelines.
      Thank you so much for your response!

       
      • BODARD Gabriel

        BODARD Gabriel - 2021-06-14

        It's quite possible that the crasis solution is not mentioned in the Guidelines (another ticket to request adding it, and suggest where? Under <w> or somewhere else?), but in short this is what I have done with this case:

        <w lemma="καί ἐκ">κἆκ</w>

        On the understanding that space-delimited values in @lemma are multiple terms that are all indexed side by side. In other words, this is functionally equivalent to, but more elegant than <w lemma="καί"><w lemma="ἐκ">κἆκ</w></w>.

        Aside: I use the same "stacked" lemmata to indicate an uncertainly restored word that we might want to appear in the index in two places (where ease of discovery is more important than statistical accuracy), e.g.: <w lemma="ὁ ὅς">τὴν</w>.
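
The "stacked" @lemma idea above can be sketched in Python. This is only an illustration of how space-delimited lemma values might be indexed side by side; it is not the EpiDoc stylesheets' actual indexing code. The function name `index_by_lemma` and the wrapping `<ab>` element are assumptions made for the example, and the TEI namespace is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def index_by_lemma(xml_fragment):
    """Map each lemma to the surface forms of the <w> elements that carry it.

    Hypothetical helper: a <w> whose @lemma holds several space-delimited
    values is filed once under each of those values.
    """
    root = ET.fromstring(xml_fragment)
    index = defaultdict(list)
    for w in root.iter("w"):
        form = "".join(w.itertext())
        for lemma in w.get("lemma", "").split():
            index[lemma].append(form)
    return dict(index)


# The two <w> elements are the ones quoted in the comment above;
# the <ab> wrapper is an assumption so that the fragment parses.
SAMPLE = '<ab><w lemma="καί ἐκ">κἆκ</w> <w lemma="ὁ ὅς">τὴν</w></ab>'

for lemma, forms in index_by_lemma(SAMPLE).items():
    print(lemma, forms)
```

Each of the four lemmata ends up with its own index entry, so a reader looking up either half of the crasis finds the same token.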

         
  • BODARD Gabriel

    BODARD Gabriel - 2021-08-17

    We will look at whether we should add guidelines for this in "Words and Lemmatization" and/or "Regularisation," in consultation with the Runic inscriptions project. To be addressed at the September EpiDoc ticket sprint.

     
  • BODARD Gabriel

    BODARD Gabriel - 2021-08-21

    I have asked Karl and colleagues at the Runic inscription project to post here the answers to these questions:

    1. Could you provide a couple of examples of this phenomenon, such as the okuþs that you gave in the ticket, including the reference of the inscription or manuscript in which it appears?
    2. How is this normally indicated in a print edition? (In a Greek or Latin text we might see the omitted character supplied, for example, as "aequ<u>s", or if across two words, "Ζεὺ(δ) δέ".)
    3. Why in particular do you need to indicate this in the digital edition? (In order to tag individual words? For indexing? To reflect the print edition? Other reasons?)
     
  • BODARD Gabriel

    BODARD Gabriel - 2021-08-21
    • summary: Expansion to the ligature guidelines --> Expansion to guidelines for "haplography"
    • status: unread --> needs-feedback
    • assigned_to: BODARD Gabriel
     
  • Karl Farrugia

    Karl Farrugia - 2021-08-23

    Thank you Gabriel for the follow-up! Here are my answers to the requests. Sorry it's a bit long, but I hope these answer your questions adequately.

    1. I'm including 5 examples here to show its utilisation in a variety of ways. All of them are in runes, transliterated. The first line is the signum, second is a transliteration as found on the inscription, third is expanded, and fourth is standardised.
      a. is quite straightforward in Old Danish, with the d being shared between gud and drotæn
      b.-d. I'm including to show that this is a specifically runic phenomenon, rather than linguistic, with the same technique being used in Latin inscriptions in runes.
      e. is a relatively rare, but nonetheless attested, phenomenon, where multiple letters are shared. In this case, the final ku is shared with the initial ku in the phrase hlku kuþrs.

    a. DR EM85;493
    gudrotæn
    gud drotæn

    Guð dróttin

    b. DR NOR1999;21
    spiritusancti
    spiritus sancti

    Spiritus sancti

    c. G 278
    inomini tomini
    in nomini tomini

    In nomine Domini

    d. N 615
    pater : noster : kuisinselo
    pater noster kui is in selo

    Pater noster, qui es in cœlis

    e. N 11
    hlkuþrs
    hlku kuþrs

    helgu Guðs
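
The relation between the expanded and the inscribed lines in examples a. to e. can be sketched in Python. This is only an illustration: the function names are hypothetical, and the rule applied (drop the longest run of letters at the start of a word that repeats the end of the preceding text) is an assumption inferred from these five examples, not a rule taken from any corpus or from the EpiDoc Guidelines. It works on transliterated letters rather than on runes.

```python
def shared_length(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def join_haplographic(words):
    """Run words together, writing letters shared across a boundary only once."""
    joined = ""
    for word in words:
        joined += word[shared_length(joined, word):]
    return joined


# signum -> (expanded reading, form as inscribed), taken from the list above.
# For G 278 and N 615 only the words that are run together are included.
EXAMPLES = {
    "DR EM85;493": ("gud drotæn", "gudrotæn"),
    "DR NOR1999;21": ("spiritus sancti", "spiritusancti"),
    "G 278": ("in nomini", "inomini"),
    "N 615": ("kui is in selo", "kuisinselo"),
    "N 11": ("hlku kuþrs", "hlkuþrs"),
}

for signum, (expanded, inscribed) in EXAMPLES.items():
    print(signum, join_haplographic(expanded.split()) == inscribed)
```

Note that in N 615 only kui and is share a letter; the remaining words are simply written without spaces, which the sketch also reproduces.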

    2. The representation differs between printed editions and online corpora. Printed editions do not mark the phenomenon, opting instead for a straight transliteration. Online corpora mark it using pipes. For example, DR EM85;493 (a. above) is given as gud| |drotæn, and N 11 (e.) as hlk|u| |k|uþrs.

    3. The main reason for indicating these cases is that all words in the project are tokenised within <w>, which lets us include both the transcription and the normalised version using <choice>, <orig> and <reg>. While full lemmatisation is not in the project's scope at the moment, this may be a possibility in the future. In addition, the project aims at tagging certain phrases, including run-of-the-mill parts like names/persons, and also things like prayers, certain formulae etc. (which we are currently wrapping in <rs> tags). I am currently wrapping the joined-up phrase in a single <w>, with the <orig> being a transliteration (e.g. gudrotæn) and the <reg> showing the full normalisation (e.g. Guð dróttin). The issue that arises is that, if we want to tag only a part of such a setup, e.g. only gud in gudrotæn, it could be a problem if it is bound up within the same <w>, since we would normally wrap the <persName> around whole tokens.
      In addition, if we opt to follow the indication system used by the online corpora, we would require a way to mark the phenomenon to be captured by the XSLT and correctly represented in the edition.
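
The single-<w> encoding described in the last point can be sketched as follows. This is a minimal illustration built with Python's standard `xml.etree.ElementTree`; the helper name `make_token` is hypothetical, the TEI namespace is omitted, and the output is not claimed to be the project's exact markup.

```python
import xml.etree.ElementTree as ET


def make_token(orig, reg):
    """Build <w><choice><orig>…</orig><reg>…</reg></choice></w> (hypothetical helper)."""
    w = ET.Element("w")
    choice = ET.SubElement(w, "choice")
    ET.SubElement(choice, "orig").text = orig
    ET.SubElement(choice, "reg").text = reg
    return w


# Example a. above: the inscribed form and its two-word normalisation.
token = make_token("gudrotæn", "Guð dróttin")
print(ET.tostring(token, encoding="unicode"))

# Both words live inside one <w>, so there is only one token to wrap.
print(len(list(token.iter("w"))))
```

Because the fragment contains a single <w>, an element such as <persName> that is normally wrapped around whole tokens has no separate token for gud to attach to, which is the difficulty raised above.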

     

    Last edit: BODARD Gabriel 2021-09-21
  • BODARD Gabriel

    BODARD Gabriel - 2021-09-21

    We propose to add:

    1. to the lemmatization page, examples (a) and (e) from Karl's note above
    2. to the reg/orig page, a couple of examples of regularising/normalising two words with haplography between them, also from the list above.
     
  • BODARD Gabriel

    BODARD Gabriel - 2021-09-21
    • Group: 9.2 --> 9.3
     
  • BODARD Gabriel

    BODARD Gabriel - 2022-01-18
    • Group: 9.3 --> 9.4
     
  • BODARD Gabriel

    BODARD Gabriel - 2022-12-13
    • status: needs-feedback --> accepted
    • assigned_to: BODARD Gabriel --> Martina Filosa
     
  • BODARD Gabriel

    BODARD Gabriel - 2022-12-13

    @filosam Please add Runic examples as discussed above to idx-wordslemmata

     
  • BODARD Gabriel

    BODARD Gabriel - 2023-06-15
    • Group: 9.4 --> 9.6
     
  • Martina Filosa

    Martina Filosa - 2024-03-06

    Done! 😊 It needs some proofreading, though.

     
  • Martina Filosa

    Martina Filosa - 2024-03-06
    • status: accepted --> needs-feedback
     
  • Martina Filosa

    Martina Filosa - 2024-03-21
    • status: needs-feedback --> done
     
