Syd having expressed some concern about my desire to simplify the
content model for <choice> still further, I thought I should try to
defend the proposition a bit more carefully.

My proposal is to define <choice> as containing (tei.choosable,
tei.choosable+), with <sic>, <corr>, <seg>, <reg>, <orig> (which I wish
to rename <irreg>), <abbr>, <expan> all members of the tei.choosable
class, along with <choice> itself.
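
As a rough sketch only, the proposed model might look like this when written out as a DTD. The class is expanded by hand here because parameter-entity references are not allowed inside declarations in an internal subset; the (#PCDATA) content models for the member elements are simplified stand-ins, and the abbreviation in the instance is a made-up example.

```xml
<?xml version="1.0"?>
<!DOCTYPE choice [
  <!-- tei.choosable, expanded by hand: two or more members, any mix -->
  <!ELEMENT choice ((sic | corr | seg | reg | orig | abbr | expan | choice),
                    (sic | corr | seg | reg | orig | abbr | expan | choice)+)>

  <!-- Simplified stand-ins; the real elements have richer content -->
  <!ELEMENT sic   (#PCDATA)>
  <!ELEMENT corr  (#PCDATA)>
  <!ELEMENT seg   (#PCDATA)>
  <!ELEMENT reg   (#PCDATA)>
  <!ELEMENT orig  (#PCDATA)>
  <!ELEMENT abbr  (#PCDATA)>
  <!ELEMENT expan (#PCDATA)>
]>
<!-- Hypothetical instance: any two or more choosable things are valid -->
<choice><abbr>Dr</abbr><expan>Doctor</expan></choice>
```

Because both positions draw on the same single class, there is no pair of overlapping classes for a parser to trip over.
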
Syd's comment (in a note to me which I hope he won't mind my sharing --
we're all friends here) was:
>
> You know my view -- dumping everything into one big bag does not help
> a user sort out which types of things are "source" things (i.e.,
> first children) and which types of things are "derived" things (i.e.,
> subsequent children), let alone which "source" things go with which
> "derived" things.
>

My rationale is both principled and pragmatic, as is so much else in the
TEI. To start with the pragmatic: the earlier model proposed (tei.sic,
tei.corr+) suffers from the problem that the two classes must be
disjoint if we are to avoid "non-deterministic content model" errors in
DTD land. That is, we cannot have elements (such as <seg>) appearing
both as members of tei.sic and tei.corr. We've experimented with trying
to define more rigorous content models, which aimed to simply duplicate
the existing janus tags model, and found that they rapidly get rather
complex. Moreover, it seems to me that in defining <choice> more
generally as a mechanism for indicating places where a "choice" of
encodings is feasible we are adding a useful new facility that could do
more than the old janus tags provide.

The principle is the simple one of observing that the distinction
between "source" things and "derived" things is hard to sustain in
actual practice. All encoding is, in some sense, "derived" -- it's all
interpretation. The distinction between <sic> and <corr> is that the
former asserts it is a "source" thing (which it does perfectly well
wherever you place it in the content model) and you might want to
make a choice of such assertions. In the case of a truly illegible ms
for example, you might want to assert that it either reads "foo" or
"foe" but definitely not both -- so you might want a choice of two
<sic>s. In the case of a normalization or correction, you might equally
well want to assert that there is a choice of <corr>s for a single <sic>
(or more!). The assertion as to which tagging corresponds with which is
implied by the grouping within a <choice>.
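
The two cases just described might be encoded along these lines. The "foo"/"foe" readings are the ones mentioned above; the misspelling and its two candidate corrections are invented purely for illustration.

```xml
<p>
  <!-- Illegible source: two competing readings, only one of which holds -->
  <choice><sic>foo</sic><sic>foe</sic></choice>

  <!-- Hypothetical: one source reading with two candidate corrections -->
  <choice><sic>recieve</sic><corr>receive</corr><corr>relieve</corr></choice>
</p>
```

In both cases nothing beyond the enclosing <choice> is needed to say which alternates belong together.
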
Think of <choice> as being a node in a decision tree. At this point in
the text more than one encoding is feasible, and the decision as to which
should be used is up to an application. The application might, as Syd
suggests, make that decision on the basis of the order in which
alternates are presented, or on the basis of the semantics of the
alternates, or some combination thereof. The TEI doesn't need to
legislate for that though: we just say that a processor should use only
one alternate for a given purpose.
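
To make the point concrete, here is a minimal XSLT 1.0 sketch of how an application might make that decision for itself. It is not a proposed TEI behaviour: the `policy` parameter and its two values are hypothetical, and the element names are assumed to be in no namespace.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Hypothetical switch: 'order' or 'semantics' -->
  <xsl:param name="policy" select="'order'"/>

  <xsl:template match="choice">
    <xsl:choose>
      <!-- By semantics: prefer an editorially derived alternate, if any -->
      <xsl:when test="$policy = 'semantics' and (corr | reg | expan)">
        <xsl:apply-templates select="(corr | reg | expan)[1]"/>
      </xsl:when>
      <!-- By order: otherwise take whichever alternate comes first -->
      <xsl:otherwise>
        <xsl:apply-templates select="*[1]"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>
```

Either way exactly one alternate is processed for each <choice>, and a nested <choice> is resolved by the same template when it is the alternate selected.
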
In some ways, the model I'm now proposing is a simplification of what we
already have in the <app> mechanism used in the Textcrit module; in
others though it is a generalization of that, providing a useful tool
for the kind of lexical analysis that (e.g.) the Sanskrit workgroup has
requested -- to propose alternative tokenization or morphological
analyses of a single input string. That seems generally useful enough to
warrant placing this mechanism in the core module.
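
A sketch of what alternative tokenizations of one string might look like, assuming <seg> may nest. The string and the `type` value are invented for illustration and are not taken from the Sanskrit workgroup's request.

```xml
<choice>
  <!-- Reading 1: a single token -->
  <seg type="tokenization"><seg>nowhere</seg></seg>
  <!-- Reading 2: the same string split into two tokens -->
  <seg type="tokenization"><seg>now</seg><seg>here</seg></seg>
</choice>
```
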
One last point: there are several elements which carry a REG attribute,
mostly names of persons or places. Should <name> (etc.) therefore also
be a choosable element? I am not sure. It seems to me that e.g. <name
reg="Edinburgh">Auld Reekie</name> is not really asserting that
"<reg>Edinburgh</reg>" is an alternative way of encoding "<name>Auld
Reekie</name>". Rather it is annotating the latter with some extra
information, which could also, and probably more efficiently, be
provided by the existing KEY attribute or some equivalent pointing
mechanism. So I am now inclining to the view that those REG attributes
should either be removed entirely, or replaced by child <reg> elements
for <name> etc.
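
The three options can be set side by side. The first line is the example given above; the `key` value is a made-up identifier, and the placement of the child <reg> in the third line is only one possible reading of the suggestion.

```xml
<p>
  <!-- Current practice: regularized form carried on an attribute -->
  <name reg="Edinburgh">Auld Reekie</name>

  <!-- Pointing instead: "EDIN" is a hypothetical key into some gazetteer -->
  <name key="EDIN">Auld Reekie</name>

  <!-- Child element instead of attribute (placement is an assumption) -->
  <name>Auld Reekie <reg>Edinburgh</reg></name>
</p>
```
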
I'm currently working on the draft, to add more discussion and produce
some more detailed examples, which is the best way I know of reaching a
firm conclusion on such matters. Counterexamples would therefore be
most welcome!

Lou