#578 partial and recursive segmentation of s-units

GREEN
closed-fixed
Martin Holmes
None
1(low)
2013-11-27
2013-06-05
Kevin Hawkins
No

The content model of <s> allows for this element to nest inside itself. However, the definition has a note that says, "For segmentation which is partial or recursive, the seg should be used instead." In this case, it seems that <s> should not be allowed to nest inside itself, or we should drop that note.

Discussion

  • Lou Burnard
    2013-06-05

    The distinction between <s> and <seg> is precisely that the former
    may not self-nest. In P3 and earlier SGML-based versions of the
    Guidelines this was enforced by means of an inclusion exception. In P4
    it was not enforced, and the note you refer to was added. In P5 there is
    a schematron rule to enforce this constraint, so I would question your
    assertion that <s> can self-nest. The original intention btw was also
    that <s> should provide an end-to-end segmentation of a text, but we
    have not yet added a constraint to that effect.
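
    For illustration only, here is a minimal Python sketch of the kind of check such a rule performs: it reports any <s> that has another <s> as an ancestor, directly or through intervening elements such as <seg>. It is not the Schematron rule from the P5 source; the function name nested_s_elements and the sample fragments are hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
S_TAG = f"{{{TEI_NS}}}s"


def nested_s_elements(xml_text):
    """Return every TEI <s> element that has an <s> ancestor."""
    root = ET.fromstring(xml_text)
    offenders = []

    def walk(elem, inside_s):
        is_s = elem.tag == S_TAG
        if is_s and inside_s:
            offenders.append(elem)
        for child in elem:
            # Once we are inside an <s>, every descendant is "inside <s>".
            walk(child, inside_s or is_s)

    walk(root, False)
    return offenders


# Hypothetical sample fragments (not taken from the Guidelines).
flat = f'<p xmlns="{TEI_NS}"><s>One.</s> <s>Two <seg>inner</seg>.</s></p>'
nested = f'<p xmlns="{TEI_NS}"><s>Outer <s>inner</s> text.</s></p>'

print(len(nested_s_elements(flat)))    # <seg> inside <s> is not flagged
print(len(nested_s_elements(nested)))  # the inner <s> is flagged
```

    A check like this sits outside the content model, which is why a schema can still permit <s> inside <s> syntactically while the constraint rejects it.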

     
    Last edit: Kevin Hawkins 2013-06-05
  • Piotr Banski
    2013-06-05

    If <s> is used together with <phr> and <w> to directly reflect the underlying syntactic constituent structure, it makes every sense to let <s> self-nest. It makes no sense not to let it self-nest, in fact.

     
    Last edit: Piotr Banski 2013-11-10
  • Piotr Banski
    2013-06-05

    I think we're looking at an unfortunate mixture of two interpretations of <s>: as a span within running text, and as a syntactic node in a syntactic representation. The note on <seg> that Kevin quotes might make some sense on the former interpretation. It doesn't make any sense whatsoever on the latter interpretation.

     
  • Kevin Hawkins
    2013-06-05

    I was looking at the content model, not the presence of any Schematron constraints. I see now that the content model of <s> uses macro.phraseSeq, so I assume that we decided it was more elegant to keep that and add a Schematron constraint rather than set up a content model that includes all of macro.phraseSeq except for <s>.

    I suggest revising the note from:

    For segmentation which is partial or recursive, the seg should be used instead.

    to:

    For end-to-end segmentation which is partial or recursive, seg should be used instead.

     
  • Piotr Banski
    2013-06-05

    What does "end-to-end" mean, please? Especially in connection with "partial".

     
  • Kevin Hawkins
    2013-06-05

    I took that language from the previous sentence in the note in the element spec:

    The s element may be used to mark orthographic sentences, or any other segmentation of a text, provided that the segmentation is end-to-end, complete, and non-nesting.

    Lou used it as well, and I think it is being used the way we sometimes use "tessellating": that is, encoding all of the character data in exactly one instance of the element in question.
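
    To make "tessellating" concrete, here is a rough Python sketch under one reading of that definition: every non-whitespace run of character data must have exactly one enclosing <s>. The function name uncovered_text is hypothetical, and namespaces are left out to keep the example short.

```python
import xml.etree.ElementTree as ET


def uncovered_text(xml_text, tag="s"):
    """Return text chunks not enclosed by exactly one <tag> element."""
    root = ET.fromstring(xml_text)
    problems = []

    def walk(elem, depth):
        # Number of <tag> elements enclosing this element's own text.
        inner = depth + (1 if elem.tag == tag else 0)
        if elem.text and elem.text.strip() and inner != 1:
            problems.append(elem.text.strip())
        for child in elem:
            walk(child, inner)
            # Tail text follows the child, so it sits at elem's level.
            if child.tail and child.tail.strip() and inner != 1:
                problems.append(child.tail.strip())

    walk(root, 0)
    return problems


# Hypothetical sample fragments.
print(uncovered_text("<p><s>One.</s> <s>Two.</s></p>"))          # end-to-end
print(uncovered_text("<p><s>One.</s> stray <s>Two.</s></p>"))    # partial
print(uncovered_text("<p><s>Outer <s>inner</s> text.</s></p>"))  # nested
```

    On this reading, a partial segmentation fails because some text has no enclosing <s>, and a recursive one fails because some text has more than one.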

     
  • James Cummings
    2013-11-09

    What is needed to close this ticket? More clarity in the guidelines?

     
  • Kevin Hawkins
    2013-11-10

    We need to decide whether we think my proposed wording in my comment above (https://sourceforge.net/p/tei/bugs/578/#5f95) is actually clearer or just raises more questions. If it's clearer, we need to decide whether to accept it and who will implement.

     
  • Lou Burnard
    2013-11-10

    Sorry Kevin, but I find your rewording confusing. You can use <seg> for any kind of segmentation, not simply end-to-end segmentation. In fact it is quite plausible to have an end-to-end segmentation (a tessellation, if you prefer) done with <s> and then to nest <seg>s within them. And, as Piotr suggests, it makes little sense to talk about "partial" end-to-end segmentation.

     
  • Martin Holmes
    2013-11-12

    Council 2013-11-12: Action on MH to either revise the content model of s so that it doesn't nest (copying macro.phraseSeq and removing s), or remove s from macro.phraseSeq and add it back manually everywhere macro.phraseSeq would otherwise have put it.

     
  • Martin Holmes
    2013-11-12

    • assigned_to: Martin Holmes
    • Group: AMBER --> GREEN
    • Priority: 5 --> 1(low)
     
  • Martin Holmes
    2013-11-27

    • status: open --> closed-fixed
     
  • Martin Holmes
    2013-11-27

    I've implemented this at rev 12668, although I must say I don't like the results at all; the content model of <s> is now truly horrible, and will get out of sync with macro.phraseSeq if we're not careful. I would actually recommend reversing this decision and letting the Schematron do the job.
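
    The synchronisation risk can be sketched in Python as follows. This is an illustration only: the member names are hypothetical placeholders rather than the real expansion of macro.phraseSeq, and the helpers derived_s_content and drift are invented for the example.

```python
# Hypothetical, heavily abbreviated stand-in for what macro.phraseSeq allows.
PHRASE_SEQ = {"seg", "s", "w", "hi", "note"}

# Hand-copied content model for <s>: the macro's members minus <s> itself.
S_CONTENT_COPIED = {"seg", "w", "hi", "note"}


def derived_s_content(phrase_seq):
    """What the <s> content model should be: everything except <s>."""
    return set(phrase_seq) - {"s"}


def drift(phrase_seq, copied):
    """Report members the hand-copied model lacks or wrongly retains."""
    expected = derived_s_content(phrase_seq)
    return {"missing": expected - copied, "extra": copied - expected}


# In sync at the moment the copy is made.
print(drift(PHRASE_SEQ, S_CONTENT_COPIED))

# Later the macro gains a member, but the copy is not updated.
updated = PHRASE_SEQ | {"newElement"}
print(drift(updated, S_CONTENT_COPIED)["missing"])
```

    A constraint applied on top of the shared macro avoids this bookkeeping, because the exclusion of <s> is stated once instead of being baked into a copied list.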