EpiDoc: Epigraphic Documents in TEI XML / Request Features / #155 Expansion to guidelines for "haplography"

BODARD Gabriel - 2021-06-14

labels: --> guidelines

BODARD Gabriel - 2021-06-14

In a Greek inscription I would call this a haplography, i.e. a kind of omission. I guess it's slightly different when it's a common linguistic phenomenon rather than an error/irregular spelling, but from your description I don't see why it needs special handling. The "k" in "okuþs" looks no different from a normal "k", right? The only difficulty is in tokenisation—much as words written with crasis in Greek, which we currently handle with a single <w> with two values in @lemma.

I certainly wouldn't call this a ligature, unless I'm very much misunderstanding or missing something…

- Karl Farrugia - 2021-06-14
  
  No, the k there would certainly not look any different. As you say, the difficulty is indeed in the tokenisation and in any eventual possibility for analysis, and I was wondering if a solution similar to the one for ligatures with <link> could be used (which is why I was thinking about ligatures, even though it certainly is not one). However, your solution with crasis could certainly work, although perhaps I have missed it in the guidelines.
  Thank you so much for your response!
  
  - BODARD Gabriel - 2021-06-14
    
    It's quite possible that the crasis solution is not mentioned in the Guidelines (another ticket to request adding it, and suggest where? Under <w> or somewhere else?), but in short this is what I have done with this case:
    
    <w lemma="καί ἐκ">κἆκ</w>
    
    This is on the understanding that space-delimited values in @lemma are multiple terms that are all indexed side by side. In other words, this is functionally equivalent to, but more elegant than, <w lemma="καί"><w lemma="ἐκ">κἆκ</w></w>.
    
    Aside: I use the same "stacked" lemmata to indicate an uncertainly restored word that we might want to appear in the index in two places (where ease of discovery is more important than statistical accuracy), e.g.: <w lemma="ὁ ὅς">τὴν</w>.
    
BODARD Gabriel - 2021-08-17

We will look at whether we should add guidelines for this in "Words and Lemmatization" and/or "Regularisation," in consultation with the Runic inscriptions project. To be addressed at the September EpiDoc ticket sprint.

BODARD Gabriel - 2021-08-21

I have asked Karl and colleagues at the Runic inscription project to post here the answers to these questions:

Could you provide a couple of examples of this phenomenon, such as the okuþs that you gave in the ticket, including the reference of the inscription or manuscript in which each appears?

How is this normally indicated in a print edition? (In a Greek or Latin text we might see the omitted character supplied, for example, as "aequ<u>s", or if across two words, "Ζεὺ(δ) δέ".)

Why in particular do you need to indicate this in the digital edition? (In order to tag individual words? For indexing? To reflect the print edition? Other reasons?)

BODARD Gabriel - 2021-08-21

summary: Expansion to the ligature guidelines --> Expansion to guidelines for "haplography"

status: unread --> needs-feedback

assigned_to: BODARD Gabriel

Karl Farrugia - 2021-08-23

Thank you Gabriel for the follow-up! Here are my answers to the requests. Sorry it's a bit long, but I hope these answer your questions adequately.

I'm including 5 examples here to show its utilisation in a variety of ways. All of them are in runes, transliterated. The first line is the signum, second is a transliteration as found on the inscription, third is expanded, and fourth is standardised.
a. is quite straightforward in Old Danish, with the d being shared between gud and drotæn
b.-d. I'm including to show that this is a specifically runic phenomenon, rather than linguistic, with the same technique being used in Latin inscriptions in runes.
e. is a relatively rare, but nonetheless attested, phenomenon, where multiple letters are shared. In this case, the final ku is shared with the initial ku in the phrase hlku kuþrs.

a. DR EM85;493
gudrotæn
gud drotæn
Guð dróttin

b. DR NOR1999;21
spiritusancti
spiritus sancti
Spiritus sancti

c. G 278
inomini tomini
in nomini tomini
In nomine Domini

d. N 615
pater : noster : kuisinselo
pater noster kui is in selo
Pater noster, qui es in cœlis

e. N 11
hlkuþrs
hlku kuþrs
helgu Guðs

The representation differs between printed editions and online corpora. In printed editions, they are not marked, opting instead for a straight transliteration. In online corpora, it is marked using pipes. For example DR EM85;493 (a. above) is given as gud| |drotæn. In N 11 (e.), it is marked as hlk|u| |k|uþrs.

The main reason for indicating these cases is that all words in the project are tokenised within <w>, to allow us to include both the transcription and the normalised versions using <choice> <orig> <reg>. While full lemmatisation is not in the project's scope at the moment, this may be a possibility in the future. In addition, the project aims at tagging certain phrases, including run-of-the-mill parts like names/persons, and also things like prayers, certain formulae etc. (which we are currently wrapping in <rs> tags). I am currently wrapping the joined-up phrase in a single <w>, with the <orig> being a transliteration (e.g. gudrotæn) and the <reg> showing the full normalisation (e.g. Guð dróttin). The issue that arises is that, if we want to tag only a part of such a setup, e.g. only gud in gudrotæn, it could be a problem if it is bound up within the same <w>, since we would normally wrap the <persName> around whole tokens.
In addition, if we opt to follow the indication system used by the online corpora, we would require a way to mark the phenomenon to be captured by the XSLT and correctly represented in the edition.
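
The single-token encoding described above might look roughly like the following sketch for example (a). The element names (<w>, <choice>, <orig>, <reg>, <persName>) are the ones mentioned in this comment; the exact nesting is an assumption made for illustration, not the project's actual markup.

```xml
<!-- Sketch only: the whole joined-up phrase is treated as one token -->
<w>
  <choice>
    <orig>gudrotæn</orig>
    <reg>Guð dróttin</reg>
  </choice>
</w>

<!-- The difficulty: the shared d belongs to both gud and drotæn,
     so an element such as <persName> cannot wrap gud alone
     without splitting the token at an arbitrary point. -->
```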

Last edit: BODARD Gabriel 2021-09-21

BODARD Gabriel - 2021-09-21

We propose to add:

to the lemmatization page, examples (a) and (e) from Karl's note above

to the reg/orig page, a couple of examples of regularising/normalising two words with haplography between them, also from the list above.
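
As a rough illustration of how the two proposals could fit together, example (a) might combine the space-delimited @lemma approach from earlier in this thread with <choice>. The lemma values below are placeholders assumed for illustration, not values confirmed by the Runic inscriptions project, and this is not necessarily the form that was eventually added to the Guidelines.

```xml
<!-- Sketch only; the lemma values are hypothetical placeholders -->
<w lemma="guð dróttinn">
  <choice>
    <orig>gudrotæn</orig>
    <reg>Guð dróttin</reg>
  </choice>
</w>
```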

BODARD Gabriel - 2021-09-21

Group: 9.2 --> 9.3

BODARD Gabriel - 2022-01-18

Group: 9.3 --> 9.4

BODARD Gabriel - 2022-12-13

status: needs-feedback --> accepted

assigned_to: BODARD Gabriel --> Martina Filosa

BODARD Gabriel - 2022-12-13

@filosam Please add Runic examples as discussed above to idx-wordslemmata

BODARD Gabriel - 2023-06-15

Group: 9.4 --> 9.6

Martina Filosa - 2024-03-06

Done! 😊 It needs some proofreading, though.

Martina Filosa - 2024-03-06

status: accepted --> needs-feedback

Martina Filosa - 2024-03-21

status: needs-feedback --> done