From: Simon D. <sim...@na...> - 2004-09-22 08:39:49
Following what Chris and I talked about re XML Schema and importing namespace'd attributes, I thought that I really should have known the answer to his question. So I re-looked at some stuff about XML validity.

The answer to the issue of mixing in attributes from another namespace is that the resulting document instances will be neither valid (as per the XML spec) nor compliant with the original schema (i.e., Chris was right). The reason why validity and compliance with an XML Schema instance are different things is covered below.

The exercise was a good memory jog and some of this stuff may be useful background for people:

1. Strict XML validity in terms of the XML standard can only occur against DTDs, not against XML Schemas (i.e., the XML 1.0 spec has no notion of validity without a DTD). This is because, of course, the XML spec pre-dates the XML Schema spec. As a result, document instances of an XML format that is only defined by XML Schema can never be strictly considered valid. For instance, in the eyes of the XML 1.0 spec, a plaintext instance may be well-formed but it can't be valid.

2. So why doesn't the world only ever use DTDs? Well, in part because DTDs aren't XML Namespace aware. Again, this is because DTDs pre-date XML Namespaces. As a result, for an XML document instance that uses XML Namespaces to be valid against a DTD, the namespace declaration and prefix *must* be hard-coded into the DTD. For instance, for a DTD to be able to validate a plaintext instance, the line element would need to be declared as plaintext:line in the DTD. But hard-coding the namespace declaration and prefix defeats the whole idea of namespaces, which are meant to allow you to arbitrarily mix XML vocabularies.

3. XML Schema attempts, amongst other things, to overcome this XML Namespace problem. XML Schema is XML Namespace aware. One of the ways that it does this is by allowing a particular XML Namespace to be the 'target' namespace for a particular set of element and attribute declarations.
With XML Schema instances you can say that elements A and B are a part of namespace X and elements C and D are a part of namespace Y. Doing so means that you can use the elements A, B, C and D in the one document instance (with appropriate namespace declarations), check the instance against the appropriate XML Schema-encoded schemas using a schema processor, and be able to say whether or not the elements in the document instance meet the constraints defined for them in the schemas. But this isn't the same as saying that the document instance is valid, since strict-by-the-XML-spec validity can only be determined against a DTD.

4. What a schema processor 'validates' isn't the document instance but 'element information items' and 'attribute information items' (see XML Schema Part 1, section 2). Element and attribute information items represent the particular information associated with a particular element or attribute (see XML Information Set at w3c.org). So XML Schema can confirm that a particular element is structurally correct, but not that an instance is correct. Hence people talk about schema 'vocabularies' rather than 'document types': schemas define 'words' (elements and attributes) rather than validate documents.

5. When you look at the XML Information Set document from w3c.org, this begins to make sense. XML Information Set explains that an XML document instance is represented by a 'document information item' that has a bunch of properties. None of these properties, however, holds an XML Namespace value for the document instance. Instead, it is the element information items (children of the document information item and of other element information items) and the attribute information items (children of particular element information items) that hold XML Namespace values. In other words, document instances don't 'belong' to XML Namespaces; only the elements and attributes within those document instances do.

6. When you think about it, this makes perfect sense.
The idea behind XML Namespaces is that you can include elements and attributes from different XML Namespaces within the one XML document instance. In such a model, you can't really say ahead of time what will end up in any one document instance. All you can hope for is that a particular piece of content in an instance is structured correctly. Of course, as a practical matter, if the root element of a document instance belongs to a namespace that acts as the default for all the other element content of the instance, that namespace in effect behaves as if it were the instance's own namespace.

7. Back to what a schema processor does with a document instance and a schema instance. The XML Schema spec defines two forms of 'validity':

1) 'local schema-validity'. Does the current element or attribute in the document instance meet the constraints of the schema instance?

2) 'overall validation' (or 'assessment'). Do the current element and all its children meet all of their schema-related constraints after any 'infoset augmentation' of the current element and its children?

'Infoset augmentation' refers to the populating of the document instance with any default values contained in the schema instance. 'Assessment' is thus populating default values for the current element and all its children and then checking 'local schema-validity' for the element and all its children. (See XML Schema Part 1, Section 2.1)

8. This leads to some not-at-first-glance-obvious properties of XML Schema:

- schemas can't define a root element (a schema has no way to say which of its globally declared elements must be the root of an instance)
- a schema processor might declare an element 'valid' (local schema-validity) even if its children aren't valid (Well, at least this is how it seems to me.
It seems that for the following example document:

<html><head><title>Test Page</title></head><body><foo>undefined child element</foo></body></html>

it would be possible for a schema processor to declare that the <html> element has 'local schema-validity' even though it might not declare that the <body> element has 'local schema-validity'. What do people think?)

Simon

-----Original Message-----
From: xen...@li... [mailto:xen...@li...] On Behalf Of Chris
Sent: Friday, September 17, 2004 9:00 PM
To: xen...@li...
Subject: Re: [Xena-devel] Xena, meta-data and export

Simon Davis wrote:

Without diving into specs, etc. to check, I think the answer to this question is that it is an acceptable practice. After all, this is how the XLink standard is meant to work. Since the attributes aren't a part of the other format's namespace, I *think* it shouldn't impact on the validity of the document.

It would surprise me if it works, because it seems like a loophole in schema enforcement. Regarding XLink, I notice that the following example DTD explicitly declares the XLink attributes, which makes me think they need to be declared in a schema....

http://www.xml.com/pub/a/2000/09/xlink/index.html?page=3

Nevertheless, one way around this objection may be to add the extra attributes to the container element, not to the root element of the embedded object itself.
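To make that last suggestion concrete, here is a minimal Python sketch using only the standard library. It assumes a hypothetical container format: the three namespace URIs, the m:checksum attribute and the foreign_attributes helper are all invented for illustration and do not come from Xena. It only shows where the namespace'd attribute ends up in the parsed document, and that namespace values hang off individual elements and attributes rather than off the document. Whether a schema processor would accept the extra attribute still depends on the container's own schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, invented for this illustration only.
WRAP_NS = "http://example.org/wrapper"
META_NS = "http://example.org/meta"
TEXT_NS = "http://example.org/plaintext"

# The metadata attribute sits on the container element, not on the
# root element of the embedded plaintext object.
DOC = f"""\
<w:container xmlns:w="{WRAP_NS}" xmlns:m="{META_NS}" m:checksum="abc123">
  <t:plaintext xmlns:t="{TEXT_NS}">
    <t:line>hello</t:line>
  </t:plaintext>
</w:container>
"""


def foreign_attributes(element, own_ns):
    """Return the namespace-qualified attribute names on `element`
    whose namespace differs from `own_ns` (hypothetical helper)."""
    own_prefix = "{" + own_ns + "}"
    return [
        name
        for name in element.attrib
        if name.startswith("{") and not name.startswith(own_prefix)
    ]


root = ET.fromstring(DOC)
embedded = root.find(f"{{{TEXT_NS}}}plaintext")

# ElementTree stores the namespace on each element's tag and on each
# qualified attribute name; there is no document-level namespace.
container_foreign = foreign_attributes(root, WRAP_NS)
embedded_foreign = [
    name
    for el in embedded.iter()
    for name in foreign_attributes(el, TEXT_NS)
]

print("container:", container_foreign)
print("embedded object:", embedded_foreign)
```

With the attribute on the container, the embedded object's own elements carry nothing from outside their namespace, so the question of whether the extra attribute is allowed moves to the container's schema rather than the embedded format's schema.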