The <defaultVal> element was provided for backwards compatibility in P5 ODD as a way of representing the default value for an attribute, i.e. that which should be assumed if the attribute is not specified. This was useful in the SGML world where tag minimization was an issue, but seems much less appropriate in XML, where its effect is to require a parser to assume the default value was supplied even if the attribute is unspecified. This is annoying if the user attempts to constrain the value by supplying a closed <valList>: if they forget to remove the <defaultVal> at the same time, and the default value is not on the closed value list, then it becomes impossible to create a valid instance. It is also a nuisance for software like oddByExample which derives an ODD from document instances because occurrences of an attribute with a default value are indistinguishable from attributes for which a value is actually specified in the source document.
In most cases, I think the intention of the element would be better subsumed into the attribute definition ("if this attribute is not specified then ...") rather than being magically created by the parser. We hate magic.
However there are quite a lot of uses of this in P5 (72 to be exact, some of them in attribute classes), so removing them is not a trivial change.
I think the situation is more nuanced than this. The default value is only interpolated if you use a DOCTYPE to point at a DTD, and run it through a parser which takes notice.
So there is an alternative action, which is to stop defaultVal creating the default thing in the DTD, and leave it as a way of explicitly saying what you think the default value (in human reading terms) should be. I actually prefer this to embedding that quite interesting info in prose.
No, the default value is always "interpolated" by the current schema generation software, irrespective of whether you are making a DTD or any other kind of schema. Or so I believe. How else would the two specific problems I describe above arise?
You can prove this easily. Take any ODD file which has an <attDef> in it, and run it through an XML processor. Does the attDef gain a 'usage="opt"'? No.
If you use RELAX NG there are no circumstances under which the schema is taken account of by XML processors that I am aware of. A schema-aware processor may take account of the default in an XSD file, I suppose, but most of us here don't use those, I think. Obviously if you have a DTD, then the usage="opt" will be added after the processor reads:
<!ATTLIST attDef %att.global.attributes; %att.identified.attributes; usage (req|rec|opt) "opt"
It's also possible that we are talking at cross-purposes here.
Yes, the defaultVal instruction is interpreted and generates something in the chosen schema language, but it's usually ignored there by XML processors. Your editor may be doing the interpolation, I suppose.
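The DTD behaviour described above can be reproduced outside any TEI toolchain. The following is a minimal sketch, assuming a made-up internal DTD subset that gives <p> a default of part="N"; the sample documents and the helper name attributes_seen are illustrative only, and the parser here is Python's expat-based ElementTree rather than any of the processors mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of a TEI-like DTD: <p> gets part="N" by default.
WITH_DTD = """<!DOCTYPE p [
  <!ELEMENT p (#PCDATA)>
  <!ATTLIST p part (Y|N|I|M|F) "N">
]>
<p>Some text</p>"""

# The same instance with no DOCTYPE, so no default can be supplied.
WITHOUT_DTD = "<p>Some text</p>"


def attributes_seen(xml_text):
    """Return the attributes the parser reports for the root element."""
    return dict(ET.fromstring(xml_text).attrib)


print(attributes_seen(WITH_DTD))
print(attributes_seen(WITHOUT_DTD))
```

With the DOCTYPE present the parser is expected to report a part attribute that was never typed in the instance; without it, the element has no attributes at all.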
I think we are talking at cross purposes. Please consider the two specific issues I raised above. Case 1: An ODD in which a closed valList is added to constrain the values of some attribute and which does not include a valItem for a defaultVal specified for that attribute. Case 2: an ODD being generated (by oddbyexample) from a corpus of documents will always include any attribute which has a default value, even if it is never actually specified in the corpus. I agree these may be edge cases, but that doesn't make them any less annoying!
Last edit: Lou Burnard 2015-02-08
Case 1: yes, you can create a nonsense ODD where the defaultVal isn't present in the closed list. You could add a Schematron rule to <attDef> to check for that. It's plain ole user error.
Case 2: you're saying that oddbyexample takes account of <defaultVal> when generating its output? Sounds like an error in oddbyexample, not a reason to abandon defaultVal.
I suspect you are running oddbyexample on a set of instance documents which use DTDs. When I do that, I strip out the DOCTYPE instruction first (xmllint --dropdtd).
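The Case 1 check could be sketched as follows. This is not the Schematron rule proposed above but a Python equivalent of the same test, assuming the usual ODD structure (an attDef holding a defaultVal and a valList type="closed" of valItem elements with @ident); the function name and the sample fragment are made up for illustration.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def inconsistent_defaults(odd_text):
    """List (attribute ident, default) pairs whose default value is
    missing from the attribute's own closed valList."""
    problems = []
    for att in ET.fromstring(odd_text).iter(TEI + "attDef"):
        default = att.find(TEI + "defaultVal")
        val_list = att.find(TEI + "valList")
        if default is None or val_list is None:
            continue  # nothing to compare
        if val_list.get("type") != "closed":
            continue  # open and semi lists cannot conflict
        allowed = {item.get("ident") for item in val_list.findall(TEI + "valItem")}
        value = (default.text or "").strip()
        if value not in allowed:
            problems.append((att.get("ident"), value))
    return problems


# Hypothetical customization: closes the list but forgets the default "N".
SAMPLE = """<elementSpec xmlns="http://www.tei-c.org/ns/1.0" ident="p" mode="change">
  <attList>
    <attDef ident="part" mode="change">
      <defaultVal>N</defaultVal>
      <valList type="closed">
        <valItem ident="Y"/>
        <valItem ident="I"/>
      </valList>
    </attDef>
  </attList>
</elementSpec>"""

print(inconsistent_defaults(SAMPLE))
```

A real check would also have to resolve values inherited from attribute classes, which this sketch does not attempt.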
On 08/02/15 23:46, Sebastian Rahtz wrote:
Adding a schematron rule sounds like a good idea. I don't think most
people think of default values when creating valLists.
OK, it's an error in obe then...
Good guess, but in fact no: these documents all invoke tei_all
Schematron might help for case 1, yes.
Case 2 looks like a bug. In fact, each doc in the corpus in question has a
<?oxygen RNGSchema="http://www.tei-c.org/release/xml/tei/custom/schema/relaxng/tei_all.rng" type="xml"?>
at the start, not a DOCTYPE so something weird is going on.
Regardless of the specific mechanics of these cases, I heartily agree with getting rid of defaultVal. I've lost count of the number of times a gentle identity transform has resulted in hundreds of infuriating attributes in the output because I happened to forget to delete them from the schema. Attributes should exist when you use them, not mysteriously come into being because you're ignorant or slapdash. Using tei_all and never having heard of @part should not mean that all your elements are @part="N".
You're both reinforcing my view that we should make <defaultVal> documentary, not schema-generating, i.e. it's a processing change.
If someone can explain to me how Lou's DTD-less docs pick up default attributes when run through OBE, I shall be very interested. Martin, you don't use DTDs either - show me how these @part="N" things get in from your identity transform?
Thanks Martin, that's an even better example. Sebastian: apply the attached identity transform (ident.xsl) to the attached template file from the oxygen package (untitled-1) and you will find lots of @part attributes pop up (untitled-1-ident). No DTDs were involved. Just XSLT and oXygen!
Last edit: Lou Burnard 2015-02-09
You've demonstrated, to my surprise, that Saxon can take a RELAX NG schema into account when it processes an XML file. Of course, it doesn't happen on my command-line, only in oxygen.
To quote myself: "You're reinforcing my view that we should make <defaultVal> documentary, not schema-generating, i.e. it's a processing change."
Ah. If I do the command line "saxon untitled-1.xml ident.xsl" then I get the expected identity transformation with no added attributes. If however I do:
"saxon -val:strict untitled-1.xml ident.xsl"
to request "strict" validation, then I get the following
net.sf.saxon.trans.LicenseException: Requested feature (schema-aware XSLT) requires Saxon-EE
Anyway, is your recommendation that ODD processing should change so that <defaultVal> affects only generated DTDs, or so that it should just generate additional text in the doc, or both? Either would be an improvement, methinks.
I recommend just the text in the doc. i.e. it should make no output in any schema language. So:
I'll even volunteer to do 3., if this is all agreed (though it's not at all hard)
That all sounds good to me, except that DTD-users will then find their DTDs don't do what they expect them to. Can't we keep the dinosaurs happy by stopping defaultVal writing only for RELAXNG?
This feels to me like a discussion on where to set the border between magic and reality, for some arbitrary values used in the definition of magic.
In the course of this discussion, you have discovered what individual options trigger the magical effects. Why not just put that into three/four paragraphs of prose and two code examples, in a section on "Defaulting attribute values"?
Add to that one Schematron check for the potential bug reported by Lou as his issue #1, and a bug/issue report for oddbyexample (maybe a request for a cmdline option) that focuses on what the ODD derivation process should / shouldn't take into account.
In this way, there will be a record of the findings (and a handy link for replies on the TEI-L), one safeguard for the processing logic, and one issue report (to oddbyexample), to be considered separately. All look advantageous, and won't make dinosaurs unhappy.
Last edit: Piotr Banski 2015-02-09
I am not in favour of having DTDs behave differently from RELAX NG. But I think you should get a Council vote on that one.
Piotr - there is no bug in odd by example, it turns out. Just depends which XML processing chain you use.
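How much the processing chain matters can be illustrated with a single parser. The sketch below is not oddbyexample or Saxon; it uses Python's expat binding, whose specified_attributes setting asks the parser to report only attributes actually present in the instance. The sample document and the helper root_attributes are assumptions made for illustration.

```python
import xml.parsers.expat

# Hypothetical instance: n is typed by the author, part comes from the DTD.
DOC = """<!DOCTYPE p [
  <!ATTLIST p n CDATA #IMPLIED
              part (Y|N|I|M|F) "N">
]>
<p n="1">Some text</p>"""


def root_attributes(xml_text, only_specified):
    """Collect the root element's attributes as reported by expat."""
    seen = []
    parser = xml.parsers.expat.ParserCreate()
    # When set, expat leaves out attributes that exist only because of
    # an ATTLIST default.
    parser.specified_attributes = 1 if only_specified else 0

    def start(name, attrs):
        seen.append(dict(attrs))

    parser.StartElementHandler = start
    parser.Parse(xml_text, True)
    return seen[0]


print(root_attributes(DOC, only_specified=False))
print(root_attributes(DOC, only_specified=True))
```

A tool that derives an ODD from instances would want the second view, where defaulted attributes do not count as usage.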
Let's not forget the original issues in working out the mechanics of these symptoms:
1) Should an attribute's having a <defaultVal> mean that even in the absence of the attribute, it is assumed to be there with that value? NO, resoundingly.
2) Is there any value whatsoever in <defaultVal>? Again, resoundingly NO; it's pointless because if you want to force people to supply a value for an attribute, you can just make it req and give it a closed val list.
I'd suggest asking the community if anyone (to their knowledge) uses or depends on it. I bet nobody does. If not, we can just nuke it.
(1) Um, no not quite. In DTD Land (and, for all I know, also in XSD City) the absence of an attribute means something different depending on whether or not a default value has been specified for it.
In DTD land <p> actually means <p complete="YES">
yours Tyrannosaurus Rex
I am puzzled, Martin, about where you think we should document that the effect of no @usage on attDef means the same as "opt"? I claim it has real value as documentation
Hi Martin,
It seems to me that, given the findings in this ticket, this may be better stated as an "If (you want this behaviour and your favourite schema lg allows for it) then ... else ...".
Unless I misunderstand you, you may be stating here a general question about data modelling, rather than a question about the TEI.
The point on XSD is a good one. The behaviour there is somewhat similar to DTDs:
"Default values of both attributes and elements are declared using the default attribute, although this attribute has a slightly different consequence in each case. When an attribute is declared with a default value, the value of the attribute is whatever value appears as the attribute's value in an instance document; if the attribute does not appear in the instance document, the schema processor provides the attribute with a value equal to that of the default attribute. Note that default values for attributes only make sense if the attributes themselves are optional, and so it is an error to specify both a default value and anything other than a value of optional for use."
So my first point is wrong, in the sense that the behaviour I abhor is the one which is written into both DTD and XSD: schemas not only provide constraint and documentation of an instance document, but also complete it by adding attributes and elements which are not present in it. My second point is as Piotr says a more general one about data modelling, but that's what ODD is about, isn't it? We're not trying to provide a precursor only to known schema systems; we're trying to provide an independent modelling system which can be converted (to some extent lossily) into existing schema systems, but which is certainly allowed to support approaches to data modelling which are distinct from them.
@Sebastian: I'm not sure I understand your question about @usage.
Martin, you are writing an ODD to RELAX NG converter. You meet an attList with an attDef. You need to make an RNG attribute. Is it optional or not? You have to decide. If there is no @usage to guide you, what algorithm do you put in your program?
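The decision being asked about might look like this in a converter. It is a sketch only, not the TEI stylesheets' actual logic: the helper rng_for_att_def is hypothetical, it treats a missing @usage as "opt" (the default recorded in the ODD), and it assumes that only usage="req" produces a required attribute.

```python
import xml.etree.ElementTree as ET

RNG = "http://relaxng.org/ns/structure/1.0"


def rng_for_att_def(att_def):
    """Build a RELAX NG pattern for one attDef (hypothetical helper)."""
    # Without a documented default the converter would have to guess here.
    usage = att_def.get("usage", "opt")
    attribute = ET.Element("{%s}attribute" % RNG, {"name": att_def.get("ident")})
    ET.SubElement(attribute, "{%s}text" % RNG)
    if usage == "req":
        return attribute
    # Assumption: "rec" and "opt" both become optional in the schema.
    optional = ET.Element("{%s}optional" % RNG)
    optional.append(attribute)
    return optional


att = ET.fromstring('<attDef xmlns="http://www.tei-c.org/ns/1.0" ident="part"/>')
print(ET.tostring(rng_for_att_def(att), encoding="unicode"))
```

The "opt" fallback on the first line of the function is exactly the piece of information that a documentary default value would record.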
@Sebastian: I must be missing something obvious; I don't see how this relates to <defaultVal>. I'm not asking to get rid of @usage.
No, but how will you process <attDef> if there is no @usage present?
Sebastian Rahtz
Chief Data Architect
University of Oxford IT Services
13 Banbury Road, Oxford OX2 6NN. Phone +44 1865 283431
Eh? I'm ABSOLUTELY NOT asking to remove @usage. Are you suggesting that there's something inherent in removing <defaultVal> which causes @usage to disappear?