#41 <egXML> content model

RED
closed-fixed
5
2008-09-04
2008-07-18
No

The TEI Guidelines unambiguously state that <egXML> should be used to contain XML example fragments.
For example, Section 22.4.2 Exemplification of Components (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/TD.html#TDeg) quotes the description for <egXML>:
"egXML (example of XML) contains a single
well-formed XML fragment demonstrating the use
of some XML element or attribute, in which the
egXML element itself functions as the root
element."
and further states explicitly that
"[a]n egXML element should not be used to tag
non-XML examples: the general purpose eg or q
elements should be used for such purposes."

Yet, <egXML> is formally specified to contain only text (http://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-egXML.html):

<rng:element name="egXML">
  <rng:ref name="att.global.attributes"/>
  <rng:ref name="att.xmlspace.attributes"/>
  <rng:text/>
</rng:element>

Which seems obviously wrong to me. James Cummings pointed me to a modified definition of <egXML> in http://www.tei-c.org/release/xml/tei/custom/odd/tei_odds.odd:

<elementSpec module="tagdocs" ns="http://www.tei-c.org/ns/Examples" usage="mwa" ident="egXML" mode="change">
  <content>
    <oneOrMore xmlns="http://relaxng.org/ns/structure/1.0">
      <choice>
        <text/>
        <ref name="anyTEI"/>
      </choice>
    </oneOrMore>
  </content>
</elementSpec>

Where "anyTEI" is specified as a macro listing all TEI elements. There are two problems with this definition as well:
1. all examples in the distributed TEI odd files should be properly namespaced. Instead of

<egXML xmlns="http://www.tei-c.org/ns/Examples">
<p>I fully appreciate Gen. Pope's splendid
achievements with their invaluable results; but
you must know that Major Generalships in the
Regular Army, are not as plenty as blackberries.
</p>
</egXML>

it should be

<eg:egXML xmlns:eg="http://www.tei-c.org/ns/Examples">
<p>I fully appreciate Gen. Pope's splendid
achievements with their invaluable results; but
you must know that Major Generalships in the
Regular Army, are not as plenty as blackberries.
</p>
</eg:egXML>

2. Even then, this content model would be too limited with respect to the prose description, under which any well-formed XML fragment should be allowed in <egXML>. It appears that such a general content model can be expressed in Relax NG without much trouble[1].

To sum up, I suggest that
1. Minimally, the general content model of <egXML> should be replaced with the adapted definition from tei_odds.odd.
2. Optimally, the general content model of <egXML> should be relaxed to allow for any XML element.
3. Optionally, the examples in the TEI ODD files should be recoded with proper namespaces.

Ron Van den Branden

====
[1] examples of such definitions:
* http://www.relaxng.org/tutorial-20011203.html#IDAFLZR
* http://blog.subterfusion.net/2008/accepting-any-element-in-relaxng/

Discussion

  • Lou Burnard
    2008-07-19

    Logged In: YES
    user_id=1021146
    Originator: NO

    The main use for egXML in P5 is as a means of validating the examples against the TEI example schema, which is a schema that permits any TEI element as root, but observes constraints for its children thereafter. A content model of ANY or of plain text would not give us this (very useful) degree of validation. This is why we use the definition in tei_odds rather than the "canonical" one. The latter is different because we don't assume that everyone will necessarily want to use egXML in TEI ODD documents. We chose "text" as the least annoying possible content model for people wanting to use this element to mark up examples from other XML syntaxes, fully expecting that they might want to modify it in the same way that we have for ODD purposes. I agree that ANY would have been another choice we might have made, so that egXML examples would at least be constrained to be well-formed.

    I don't understand the point you are making about the namespace.

     
  • Syd Bauman
    2008-07-19

    Logged In: YES
    user_id=686243
    Originator: NO

    Deep and impressive analysis Ron, keep up the good work! However, I
    only agree with one of your three suggestions, as follows.

    1. The content model from tei_odds.odd is, deliberately, *far* more
    restrictive than the generic <egXML> element's content, because it
    is not intended to be used for any TEI document (which might, e.g.,
    be used to exemplify some non-TEI language or elements outside the
    TEI namespace). The tei_odd schema is not intended for use by the
    general user, but rather the user who wants to write an ODD for a
    TEI customization.

    2. The general content model for <egXML> always should have been
    something that constrains the content to match the prose, e.g.
    using the pattern 'any', declared as
    any = ( element * { any* } | attribute * { text }* | text )
    might do the trick. I think this is an egregious and corrigible
    error, although it is a minor one.

    3. AFAIK, there is no difference between
    <egXML xmlns="http://www.tei-c.org/ns/Examples">
    and
    <eg:egXML xmlns:eg="http://www.tei-c.org/ns/Examples">
    As far as an XML processor is concerned, those two are the same.
    (The element with local-name 'egXML' from the namespace
    'http://www.tei-c.org/ns/Examples'.) While I can see the argument
    that the latter might be a little easier for humans to follow
    what's going on (less likely your eye skips the detail that the
    element is in a different namespace), it may turn out to be quite a
    pain to maintain in the source for the Guidelines, depending on the
    software used (as every time one runs the source through a processor,
    it may change it back).

     
  • Logged In: YES
    user_id=95949
    Originator: NO

    Two points.

    a) the reason why <egXML> does not have the generic "any XML with any attribute" pattern is that
    it does not translate to DTDs. In retrospect, I now think we should have made that work as it
    ought, and dumbed down the DTD to PCDATA. I propose to make that change at the next release unless
    there are objections from the Council when it is discussed.

    b) sorry, but

    <egXML xmlns="http://www.tei-c.org/ns/Examples">
    and
    <eg:egXML xmlns:eg="http://www.tei-c.org/ns/Examples">

    are really not the same at all. In the first case, the namespace
    is inherited by child elements, in the second case only the <egXML>
    is in that example namespace.

     
  • Syd Bauman
    2008-07-19

    Logged In: YES
    user_id=686243
    Originator: NO

    Sebastian --

    (a) Excellent! Glad to hear it.

    (b) Yes, indeed, you are correct of course. I should have been more precise, as in

    <egXML xmlns="http://www.tei-c.org/ns/Examples">
    <p>Quack!</p>
    </egXML>

    and

    <eg:egXML xmlns:eg="http://www.tei-c.org/ns/Examples">
    <eg:p>Quack!</eg:p>
    </eg:egXML>

    are the same. (Although the same applies to other descendants, too, of course.)
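
The two comparisons in this exchange can be checked mechanically. The following is a minimal sketch using Python's standard xml.etree.ElementTree parser, which reports each element as {namespace}local-name; the helper name expanded_names and the third variant (prefix on the root only, child unprefixed) are illustrative assumptions, not part of the original posts.

```python
import xml.etree.ElementTree as ET

EX = "http://www.tei-c.org/ns/Examples"


def expanded_names(xml_text):
    """Return the namespace-qualified tag of every element, in document order."""
    return [el.tag for el in ET.fromstring(xml_text).iter()]


# Default namespace declared on the root: children inherit it.
default_ns = f'<egXML xmlns="{EX}"><p>Quack!</p></egXML>'
# Prefix on the root and on every descendant.
prefixed_all = f'<eg:egXML xmlns:eg="{EX}"><eg:p>Quack!</eg:p></eg:egXML>'
# Prefix on the root only: the unprefixed child is in no namespace here.
prefixed_root = f'<eg:egXML xmlns:eg="{EX}"><p>Quack!</p></eg:egXML>'

print(expanded_names(default_ns))
print(expanded_names(prefixed_all))
print(expanded_names(prefixed_root))
```

The first two lists come out identical, while the third has a bare p for the child, which is the difference raised in point (b).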

     
  • Logged In: YES
    user_id=95949
    Originator: NO

    Just for the record, by the way, it was a conscious decision that all the
    examples in the Guidelines be encoded in the Examples namespace rather
    than the _real_ TEI namespace. It was (and probably still is) a controversial
    decision, but the rationale was twofold:

    a) otherwise, every small example
    would have to have an xmlns attribute added to its root element(s), and
    this would impose a burden on editors.

    b) if all the examples were actually in the TEI namespace, it would
    play merry hell with processing the actual Guidelines text. XSL
    constructs like <xsl:number level="any"/> would all have to be adjusted
    to exclude anything which was a descendant of egXML. Doable, of course,
    and maybe this was pusillanimity :-}
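
Rationale (b) can be illustrated without XSLT. The sketch below is only a stand-in for a document-wide count such as <xsl:number level="any"/>: it counts TEI-namespace <p> elements in two hypothetical, simplified documents using Python's standard xml.etree.ElementTree. The sample documents and the function name count_tei_paragraphs are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "http://www.tei-c.org/ns/1.0"
EX = "http://www.tei-c.org/ns/Examples"


def count_tei_paragraphs(xml_text):
    """Count every <p> in the TEI namespace, at any depth."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(f"{{{TEI}}}p"))


# Example content in the Examples namespace, as in the Guidelines source.
examples_ns = f"""<div xmlns="{TEI}">
  <p>First paragraph of prose.</p>
  <egXML xmlns="{EX}"><p>Quack!</p></egXML>
  <p>Second paragraph of prose.</p>
</div>"""

# Hypothetical alternative: the example's <p> sits in the real TEI namespace.
tei_ns = f"""<div xmlns="{TEI}">
  <p>First paragraph of prose.</p>
  <eg:egXML xmlns:eg="{EX}"><p>Quack!</p></eg:egXML>
  <p>Second paragraph of prose.</p>
</div>"""

print(count_tei_paragraphs(examples_ns))  # 2
print(count_tei_paragraphs(tei_ns))       # 3
```

With the Examples namespace, the example paragraph is invisible to the count; in the alternative it is counted along with the prose, so the processing code would need an explicit exclusion for descendants of egXML.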

     
  • Logged In: YES
    user_id=1110667
    Originator: YES

    > Just for the record, by the way, it was a conscious decision that all the
    > examples in the Guidelines be encoded in the Examples namespace rather
    > than the _real_ TEI namespace. It was (and probably still is) a
    > controversial decision, but the rationale was twofold:

    Ah, my assumption was that <egXML> was explicitly designed to allow its contents to be validated in their own namespace (instead of the Examples namespace). That's why I thought

    <p xmlns="http://www.tei-c.org/ns/1.0">
    <eg:egXML xmlns:eg="http://www.tei-c.org/ns/Examples">
    <p>Quack!
    <div>This is not exactly a valid TEI example</div>
    </p>
    </eg:egXML>
    </p>

    would
    a) allow the erroneous TEI content of <egXML> to be spotted at validation
    b) be a way to avoid the TEI namespace declaration on any child element

    But on second thought, my association between 'TEI namespacedness' and validation might be too naive? If a TEI document is validated against, say, TEI lite, will this schema association automatically apply to TEI-namespaced contents of <egXML> as well? Or am I completely missing the point?
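
The namespace half of this question can be checked directly. This is a minimal sketch that parses the fragment above with Python's standard xml.etree.ElementTree and lists the expanded name of each element. It says nothing about validation: whether a given schema then checks those elements depends on how that schema defines the content of <egXML>.

```python
import xml.etree.ElementTree as ET

# The fragment from the post above, unchanged.
doc = """<p xmlns="http://www.tei-c.org/ns/1.0">
<eg:egXML xmlns:eg="http://www.tei-c.org/ns/Examples">
<p>Quack!
<div>This is not exactly a valid TEI example</div>
</p>
</eg:egXML>
</p>"""

# Expanded names ({namespace}local-name) in document order.
tags = [el.tag for el in ET.fromstring(doc).iter()]
for tag in tags:
    print(tag)
```

The unprefixed <p> and <div> inside <eg:egXML> pick up the default namespace declared on the outer <p>, so they resolve to the TEI namespace, while only <eg:egXML> itself is in the Examples namespace.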

     
  • Logged In: YES
    user_id=95949
    Originator: NO

    I think you may be reading too much into this. In retrospect, I wonder
    if we designed this element wrong, putting <egXML> into its own namespace,
    and using that to hide the namespace of the contents.

    Still, it is what it is. You can put any element in there, and adjust
    the schema to validate them, and that's about as far as it goes. The
    Guidelines cheat, by putting fake TEI examples in there, without
    explaining it well, which I confess to being ashamed of :-{

     
  • Lou Burnard
    2008-08-17

    • milestone: --> RED
     
  • Lou Burnard
    2008-09-03

    • assigned_to: nobody --> rahtz
     
    • status: open --> closed-fixed
     