#153 element for punctuation marks

Milestone: GREEN
Status: closed
Owner: nobody
Priority: 5
Updated: 2009-06-19
Created: 2008-12-01
Creator: Alexey Lavrentev
Private: No

Tagging punctuation marks can be useful for several reasons: a) preparing a text for processing with linguistic annotation tools; b) distinguishing marks that come from the primary source from those added by editors; c) research on punctuation systems, etc. The currently available TEI elements do not meet all the requirements for a proper encoding of punctuation marks (see the attachment, a revised version of my paper presented at the last TEI MM, for detailed arguments).
The proposed *punct* element could join the segLike model class.
Here is a tentative formal definition:

element punct
{
   att.global.attributes,
   att.transcriptional.attributes,
   att.segLike.attributes,
   attribute force { data.word }?,
   attribute unit { data.word }?,
   attribute direction { "before" | "after" | "unknown" | "inapplicable" }?,
   ( text | model.gLike | model.cLike | model.pPart.edit )*
}
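
To make the definition above more concrete, here is a rough sketch of how the proposed element might appear in a transcription. The attribute values for force and unit ("weak", "strong", "clause", "sentence") are assumptions chosen for illustration, since the definition only constrains them to data.word. The direction value is likewise a placeholder, as its exact semantics are not spelled out in this ticket.

```xml
<!-- Hypothetical sample: attribute values are illustrative, not prescribed -->
<s>
  <w>Veni</w><punct force="weak" unit="clause" direction="after">,</punct>
  <w>vidi</w><punct force="weak" unit="clause" direction="after">,</punct>
  <w>vici</w><punct force="strong" unit="sentence" direction="after">.</punct>
</s>
```

Each mark is isolated from the neighbouring words, so a linguistic annotation tool could skip or select the punct elements without having to guess token boundaries.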

Discussion

  • Lou Burnard
    2009-01-14

    I don't understand why <c> is not an adequate solution to this problem.

     
  • Lou Burnard
    2009-01-14

    • milestone: --> 871207
     
  • This was the point of my paper at TEI MM (see attachment). To sum it up: characters and punctuation marks are linguistic objects with different functional and formal properties. The proposed *punct* element has a different set of attributes and a different content model from *c*. I have also considered redefining the *c* element to make it more suitable for tagging punctuation (as it seems to be rarely used for other purposes in practice), but this would create an inconsistency in the system of linguistic segmentation tags.

     
  • David Sewell
    2009-03-29

    [will be discussed in some detail at TEI Council meeting in Lyon]

     
  • Lengthy discussion. Agreed that <c> is used in legacy data as a simplification of <punct>. Clear direction is needed, since <c> was previously used ambiguously. We don't like the name (though Menota uses it).

     
  • Lou Burnard
    2009-04-03

    • milestone: 871207 --> GREEN
     
  • Lou Burnard
    2009-06-19

    Implemented at rev 6591 (though some more examples are still needed)

     
  • Lou Burnard
    2009-06-19

    • status: open --> closed