The current guidelines for ligatures show examples for ligatured letters within a word, and those spanning two words where the ligatured letters are different.
In runic inscriptions, because runic orthography does not allow the same rune to be written twice in succession, we get another form of "ligature" where a single rune is shared between two words. One example is ok kuþs ("and God's") rendered as okuþs, with the k shared between the words.
It would be great to have guidelines on how to handle such examples, either as an extension of ligatures or as an omitted character. It also raises the question of how the XSLT would handle such a case.
In a Greek inscription I would call this a haplography, i.e. a kind of omission. I guess it's slightly different when it's a common linguistic phenomenon rather than an error/irregular spelling, but from your description I don't see why it needs special handling. The "k" in "okuþs" looks no different from a normal "k", right? The only difficulty is in tokenisation, much as with words written with crasis in Greek, which we currently handle with a single <w> with two values in @lemma.
I certainly wouldn't call this a ligature, unless I'm very much misunderstanding or missing something…
No, the k there would certainly not look any different. As you say, the difficulty is indeed in the tokenisation and in any eventual possibility for analysis, and I was wondering if a solution similar to the one for ligatures with <link> could be used (which is why I was thinking about ligatures even though it certainly is not one). However, your solution for crasis could certainly work, although perhaps I might have missed it in the guidelines.
Thank you so much for your response!
It's quite possible that the crasis solution is not mentioned in the Guidelines (another ticket to request adding it, and suggest where? Under <w> or somewhere else?), but in short this is what I have done with this case:
<w lemma="καί ἐκ">κἆκ</w>
On the understanding that space-delimited values in @lemma are multiple terms that are all indexed side by side. In other words, this is functionally equivalent to, but more elegant than:
<w lemma="καί"><w lemma="ἐκ">κἆκ</w></w>
Aside: I use the same "stacked" lemmata to indicate an uncertainly restored word that we might want to appear in the index in two places (where ease of discovery is more important than statistical accuracy), e.g.:
<w lemma="ὁ ὅς">τὴν</w>
We will look at whether we should add guidelines for this in "Words and Lemmatization" and/or "Regularisation," in consultation with the Runic inscriptions project. To be addressed at the September EpiDoc ticket sprint.
I have asked Karl and colleagues at the Runic inscription project to post here the answers to these questions:
Thank you Gabriel for the follow-up! Here are my answers to the requests. Sorry it's a bit long, but I hope these answer your questions adequately.
a. is a quite straightforward example in Old Danish, with the d shared between gud and drotæn.
b.-d. are included to show that this is a specifically runic phenomenon rather than a linguistic one: the same technique is used in Latin inscriptions written in runes.
e. shows a relatively rare but nonetheless attested phenomenon, where multiple runes are shared. In this case, the final ku of hlku is shared with the initial ku of kuþrs in the phrase hlku kuþrs.
a. DR EM85;493
gudrotæn
gud drotæn
Guð dróttin
b. DR NOR1999;21
spiritusancti
spiritus sancti
Spiritus sancti
c. G 278
inomini tomini
in nomini tomini
In nomine Domini
d. N 615
pater : noster : kuisinselo
pater noster kui is in selo
Pater noster, qui es in cœlis
e. N 11
hlkuþrs
hlku kuþrs
helgu Guðs
The representation differs between printed editions and online corpora. Printed editions do not mark these cases and give a straight transliteration instead. Online corpora mark them with pipes. For example, DR EM85;493 (a. above) is given as gud| |drotæn, and N 11 (e.) as hlk|u| |k|uþrs.
The main reason for indicating these cases is that all words in the project are tokenised within <w>, to allow us to include both the transcription and the normalised version using <choice>, <orig> and <reg>. While full lemmatisation is not in the project's scope at the moment, it may be a possibility in the future. In addition, the project aims at tagging certain phrases, including run-of-the-mill parts like names/persons, and also things like prayers, certain formulae etc. (which we are currently wrapping in <rs> tags). I am currently wrapping the joined-up phrase in a single <w>, with the <orig> being a transliteration (e.g. gudrotæn) and the <reg> showing the full normalisation (e.g. Guð dróttin). The issue that comes out of this is that, if we want to tag only a part of such a setup, e.g. only gud in gudrotæn, it could be a problem if it is bound up within the same <w>, since we would normally wrap the <persName> around whole tokens.
In addition, if we opt to follow the marking system used by the online corpora, we would need a way to encode the phenomenon so that it can be captured by the XSLT and correctly represented in the edition.
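To make the two options discussed in this thread concrete, here is a rough sketch for example a. (DR EM85;493). The first <w> is the single-token encoding described above. The second applies the space-delimited @lemma approach used for crasis to the same token. The enclosing <ab> and the lemma values are assumptions for illustration only: the lemma values simply reuse the normalised forms and are not checked dictionary headwords.

```xml
<ab>
  <!-- Current practice as described above: one <w> for the joined-up phrase,
       transliteration in <orig>, full normalisation in <reg> -->
  <w>
    <choice>
      <orig>gudrotæn</orig>
      <reg>Guð dróttin</reg>
    </choice>
  </w>

  <!-- Hypothetical variant: the same token with two space-delimited lemma
       values, modelled on the crasis example; lemma values are placeholders -->
  <w lemma="Guð dróttin">
    <choice>
      <orig>gudrotæn</orig>
      <reg>Guð dróttin</reg>
    </choice>
  </w>
</ab>
```

Neither form solves the tagging problem on its own: because gud and drotæn share one <w>, a <persName> or <rs> still cannot wrap just one of the two words without splitting the token.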
Last edit: BODARD Gabriel 2021-09-21
We propose to add:
@filosam Please add Runic examples as discussed above to idx-wordslemmata
Done! 😊 It needs some proofreading, though.