Yes, of course: my mistake. I had overlooked the fact that the vowel+vowel
-> vowel rule isn't a straightforward orthographic process (though maybe
there are scripts in which that would be a harder distinction to make).
Let me try to summarize where I think we are in agreement:
(a) it's important and useful to indicate the basis on which a
particular segmentation has been made
(b) sometimes we want to indicate more than one possible segmentation for a
given input string
(c) naming these things is not easy
With some hesitation, I think we're agreed on using <choice> as a means
of bracketing together multiple segmentations which we wish to regard as
in some sense alternate views of the same bit of text.
We seem to be disagreeing only on what to call the various types of
segmentation: <seg> has both a type and a subtype attribute, so one
could say e.g. <seg type="word-division" subtype="sandhi"> (for your
level=3) though I agree that the typological implications of that are
less than wonderful. I don't like "level" because it implies to me a
value judgment of some sort. These are not value judgments, they are
simply different procedures that need to be applied to achieve the goal
of a meaningful segmentation.
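Purely as an illustrative sketch of the above, and not as agreed markup: the fragment below brackets two views of one string inside <choice>, each a <seg> labelled by type and subtype. The example word (tatraiva, i.e. tatra + eva) and the subtype value "orthographic" are assumptions made up for illustration; only type="word-division" and subtype="sandhi" come from the discussion, and the fragment has not been checked against any schema.

```xml
<choice>
  <!-- the string as written, with no word division applied -->
  <seg type="word-division" subtype="orthographic">tatraiva</seg>
  <!-- the same string with the vowel sandhi undone (a + e gives ai) -->
  <seg type="word-division" subtype="sandhi">tatra eva</seg>
</choice>
```

The point of the sketch is only that the basis for each segmentation is carried by the attribute values rather than by a numeric level.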
More crucially though, I am still of the view that the revised choice
model can handle the cases so far discussed. Perhaps I should now set
out exactly how I think that model should work, since I have not yet
done so. Maybe *that* would catch the attention of others on this list!
John Smith wrote:
> On Tue, 21 Sep 2004, Lou-at-home wrote:
>>Thank you for the explanation -- I hope I may tire your patience a
>>little more... Let me see if I have this right now: for levels 1 and 2,
>>the purpose of the level attribute is to indicate the basis on which a
>>word-ending was assigned at this particular point in the input compound:
>>either because of a consonant-vowel combination, or because of removal
>>of a vowel-vowel combination. Level 3, however, seems to be different:
>>segments identified at this level can't be derived from the text by
>>purely orthographic rules, they result from some additional
>>morphological or other linguistic analysis of segments identified at
>>levels 1 or 2.
>>Is that right?
> No. Level 1 analyses orthographic sequences; levels 2 and 3 both
> analyse orthographic *and* morphophonemic sequences, but do so with
> differing degrees of completeness.
>>Because if it is, I think <seg level="1"> and <seg level="2">
>>(preferably less opaquely named) are appropriately members of the
>>tei.sic class, and <seg level="3"> (preferably ditto) is an appropriate
>>member of the tei.corr class, and I still think we have a solution!
> As far as the naming goes, I'm not wedded to <sequence> and <segment>.
> But I defy you to find a "less opaque" way of referring to the
> different information recorded at the different levels of analysis.
> Please bear in mind that the "level" attributes have the agreement of
> the Sanskritists on the TEI-Sanskrit workgroup: they "feel" right.
> John Smith
> Dr J. D. Smith * john.smith@...
> Faculty of Oriental Studies * http://bombay.oriental.cam.ac.uk
> Sidgwick Avenue * Tel. 01223 335140
> Cambridge CB3 9DA * Fax 01223 335110
> Tei-choice mailing list