As I understand it, the proposal currently on the table to deal with
"sequences" in Sanskrit looks like this:
<seg type="word" subtype="level1">aasiid</seg>
<seg type="word" subtype="level3">aasiit</seg>
I'm personally content to see the elements I referred to as <sequence>
and <segment> both replaced by <seg>. As for the outer <choice> that
indicates the relationship between the different main <seg>s, that was
pretty much the point of bringing the discussion to this group.
This means that the only part of the proposal I find hard to accept
is the attribute /subtype="level1"/. Despite what Lou Burnard and
Sebastian Rahtz have said, it seems to me (and to others who have
seen it) to imply that "level1" is a subtype of "word", which it
isn't.
It would be very annoying to get so far, and then to get stuck on what
may seem a small issue. But salvation may be at hand. It has struck me
that one could argue that /type="word"/ is redundant. If the type of
the first seg is wordgroup, all its constituents will be words by
definition; similarly if it is a compound, all its members must be
compoundmembers, etc. So one could leave "word" out and use the
newly freed "type" attribute to carry the level instead:
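For the two words quoted above, that might look like the following
sketch. The enclosing elements are omitted, as before, and the values
"level1" and "level3" are simply carried over from the subtype
attributes; this is my assumption about the shape, not an agreed
encoding.

```xml
<!-- type now carries the level; "word" is implied by the enclosing seg -->
<seg type="level1">aasiid</seg>
<seg type="level3">aasiit</seg>
```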
And that would be that.
Lou has suggested (pers. comm.) a second possible approach using
<orig> and <reg>. But as far as I can see exactly the same issues are
likely to arise there -- and <reg> doesn't even have a "type"
attribute, never mind a "subtype". So at present I'm inclined to stick
with nested <seg>s.
I'd be grateful for comments. If this looks like a viable approach
from the point of view of the <choice> group, I'll see if I can sell
it to the Sanskrit group.
Dr J. D. Smith * john.smith@...
Faculty of Oriental Studies * http://bombay.oriental.cam.ac.uk
Sidgwick Avenue * Tel. 01223 335140
Cambridge CB3 9DA * Fax 01223 335110