[Xena-devel] XML Validity, XML Namespaces, and XML Schema: some notes

Following what Chris and I talked about re XML Schema and importing
namespaced attributes, I thought that I really should have known the
answer to his question, so I went back over some material on XML validity.
The answer to the issue of mixing in attributes from another namespace is
that the resulting document instances won't be either Valid (as per the XML
spec) or compliant with the original schema (ie, Chris was right). The
reason why validity and compliance with an XML Schema instance are different
things is covered below.

The exercise was a good memory jog and some of this stuff may be useful
background for people:

1. Strict XML validity in terms of the XML standard can only occur against
DTDs not against XML Schemas. (ie, the XML 1.0 spec has no notion of
validity without a DTD). This is because, of course, the XML spec pre-dates
the XML Schema spec. As a result, document instances of an XML format that
is only defined by XML Schema can never be strictly considered valid. For
instance, in the eyes of the XML 1.0 spec, a plaintext instance may be
well-formed but it can't be valid.
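
To make point 1 concrete, here is a minimal Python sketch that checks well-formedness only. It uses the standard library's xml.etree.ElementTree, which is a non-validating parser: it never checks a document against a DTD, so it cannot report validity at all. The plaintext markup and the helper name is_well_formed are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if text parses as XML; this says nothing about validity."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


# Hypothetical plaintext instance with no DOCTYPE: well-formed, but with no
# DTD there is nothing for XML 1.0 validity to be measured against.
good = "<plaintext><line>hello</line></plaintext>"
# Mismatched end tag: not even well-formed.
bad = "<plaintext><line>hello</plaintext>"

print(is_well_formed(good), is_well_formed(bad))
```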

2. So why doesn't the world only ever use DTDs? Well, in part because DTDs
aren't XML Namespace aware. Again, this is because DTDs pre-date XML
Namespaces. As a result, for an XML document instance that uses XML
Namespaces to be valid against a DTD, the namespace declaration and prefix
*must* be
hard-coded into the DTD. For instance, for a DTD to be able to validate a
plaintext instance, the line element would need to be declared as
plaintext:line in a DTD. But hard-coding the namespace declaration and
prefix defeats the whole idea of namespaces, which are meant to allow you to
arbitrarily mix XML vocabularies.
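
A rough Python sketch of the prefix problem in point 2, using the standard library's xml.dom.minidom. The namespace URI and the 'pt' prefix are assumptions made up for the example. The two documents bind different prefixes to the same namespace, so a namespace-aware tool sees the same element, while the literal qualified name (which is all a DTD can match on) differs.

```python
from xml.dom import minidom

# Assumed namespace URI, for illustration only.
NS = "http://example.org/plaintext"

doc_a = minidom.parseString(
    '<plaintext:line xmlns:plaintext="%s">hi</plaintext:line>' % NS)
doc_b = minidom.parseString(
    '<pt:line xmlns:pt="%s">hi</pt:line>' % NS)

a, b = doc_a.documentElement, doc_b.documentElement

# What a DTD matches on: the literal qualified name, prefix included.
print(a.tagName, b.tagName)

# What a namespace-aware tool matches on: (namespace URI, local name).
same_element = (a.namespaceURI, a.localName) == (b.namespaceURI, b.localName)
print(same_element)
```

A DTD that declares plaintext:line would reject the second document even though, in namespace terms, it contains the same element.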

3. XML Schema attempts, amongst other things, to overcome this XML Namespace
problem. XML Schema is XML Namespace aware. One of the ways that it does
this is by allowing a particular XML Namespace to be the 'target' namespace
for a particular set of element and attribute declarations. With XML Schema
instances you can say that elements A and B are a part of namespace X and
elements C and D are a part of namespace Y. Doing so means that you can add
the elements A, B, C, D in the one document instance (with appropriate
namespace declarations), check the instance against appropriate XML
Schema-encoded schemas using a schema processor, and be able to say whether
or not the elements in the document instance meet the constraints defined
for them in the schemas. But this isn't the same as saying that the document
instance is valid, since strict-by-the-XML-spec validity can only be
determined against a DTD.
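
The A/B/C/D scenario in point 3 can be sketched in Python as follows. This is a toy, not a schema processor: the namespace URIs are invented, and the DECLARED table only records which element names each target namespace declares, where a real schema would also constrain content and types. The idea it shows is that each element is looked up by its own namespace, independently of the others.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaces X and Y and the element names each one declares.
# A real schema also constrains content and types; this only tracks names.
NS_X = "http://example.org/x"
NS_Y = "http://example.org/y"
DECLARED = {NS_X: {"A", "B"}, NS_Y: {"C", "D"}}

doc = ET.fromstring(
    '<x:A xmlns:x="%s" xmlns:y="%s">'
    "<x:B/><y:C/><y:D/><y:E/>"
    "</x:A>" % (NS_X, NS_Y)
)


def split_tag(tag):
    """Split ElementTree's '{uri}local' form into (uri, local)."""
    if tag.startswith("{"):
        uri, local = tag[1:].split("}", 1)
        return uri, local
    return None, tag


undeclared = []
for elem in doc.iter():
    uri, local = split_tag(elem.tag)
    # Look the element up in the schema whose target namespace matches.
    if local not in DECLARED.get(uri, set()):
        undeclared.append((uri, local))

print(undeclared)
```

Only the y:E element is reported, because namespace Y declares no element of that name.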

4. What a schema processor 'validates' isn't the document instance but
'element information items' and 'attribute information items' (see XML
Schema Part 1, section 2). Element and attribute information items represent
the particular information associated with a particular element or attribute
(see XML Information Set at w3.org). So XML Schema can confirm that a
particular element is structurally correct, but not that an instance is
correct. Hence people talk about schema 'vocabularies' rather than 'document
types': schemas define 'words' (elements and attributes) rather than
validate documents.

5. When you look at the XML Information Set document from w3.org, this
begins to make sense. XML Information Set explains that an XML document
instance is represented by a 'document information item' that has a
bunch of properties. None of these properties, however, holds an XML
Namespace value for the document instance. Instead, it is the element
information items (children of the document information item and of other
element information items) and the attribute information items (attached to
particular element information items) that hold XML Namespace values. In
other words, document instances don't 'belong' to XML Namespaces; only
elements and attributes within those document instances do.
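
The same picture shows up in the DOM, which is not the Infoset itself but maps onto it closely enough for a quick check. In this sketch (Python's standard xml.dom.minidom; the document namespace is an assumption, the XLink URI is the real one) the document node carries no namespace, while the element and the prefixed attribute do. Note that the unprefixed id attribute is in no namespace at all.

```python
from xml.dom import minidom

# Hypothetical document namespace; the XLink URI is the real one.
DOC_NS = "http://example.org/plaintext"
XLINK_NS = "http://www.w3.org/1999/xlink"

doc = minidom.parseString(
    '<line xmlns="%s" xmlns:xlink="%s" id="l1" xlink:href="next.xml"/>'
    % (DOC_NS, XLINK_NS)
)
root = doc.documentElement

print(doc.namespaceURI)   # the document node itself
print(root.namespaceURI)  # the element
print(root.getAttributeNode("id").namespaceURI)  # unprefixed attribute
print(root.getAttributeNodeNS(XLINK_NS, "href").namespaceURI)  # prefixed attribute
```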

6. When you think about it, this makes perfect sense. The idea behind XML
Namespaces is that you can include elements and attributes from different
XML Namespaces within the one XML document instance. In such a model, you
can't really say ahead of time what will end up in any one document
instance. All you can hope for is that a particular piece of content in an
instance is structured correctly. Of course, as a practical concern, if the
root element in a document instance belongs to a certain namespace, that
namespace often acts as a default for all the other element content of the
instance, so it has the effect of acting as if it were the instance's
default namespace.
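
A small Python illustration of that practical effect, assuming made-up namespace URIs. A default namespace declared on the root applies to the unprefixed child elements too, while a prefixed child still belongs to its own namespace.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs, for illustration only.
MAIN_NS = "http://example.org/main"
OTHER_NS = "http://example.org/other"

doc = ET.fromstring(
    '<report xmlns="%s" xmlns:o="%s">'
    "<title>Notes</title><o:stamp/>"
    "</report>" % (MAIN_NS, OTHER_NS)
)

# ElementTree spells each name as '{namespace-uri}local-name'.
tags = [elem.tag for elem in doc.iter()]
print(tags)
```

Even here the namespace is recorded on each element, not on the document as a whole.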

7. Back to what a schema processor does with a document instance and a
schema instance. The XML Schema spec defines two forms of 'validity':
1) local schema-validity. Does the current element or attribute in the
document instance meet the constraints of the schema instance?
2) 'overall validation' (or 'assessment'). Do the current element and all
its children meet all of their schema-related constraints after any 'infoset
augmentation' of the current element and its children? 'Infoset
augmentation' refers to the populating of the document instance with any
default values contained in the schema instance. 'Assessment' thus is
populating default values for the current element and all its children and
then checking 'local schema-validity' for the element and all its children.
(See XML Schema Part 1, Section 2.1)
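
The 'populate default values' step can be sketched in Python like this. It is only a toy under stated assumptions: the DEFAULTS table stands in for default declarations in a schema, the element and attribute names are invented, and a real processor records this information in its augmented infoset and does not rewrite an in-memory tree as done here.

```python
import xml.etree.ElementTree as ET

# Toy stand-in for a schema: element name -> {attribute name: default value}.
# Names and values are invented; real schemas declare defaults in XSD syntax.
DEFAULTS = {"line": {"lang": "en"}}


def augment(elem):
    """Fill in declared defaults on elem and, recursively, on its children."""
    for name, value in DEFAULTS.get(elem.tag, {}).items():
        if name not in elem.attrib:
            elem.set(name, value)
    for child in elem:
        augment(child)


doc = ET.fromstring('<plaintext><line/><line lang="fr"/></plaintext>')
augment(doc)
langs = [line.get("lang") for line in doc.iter("line")]
print(langs)
```

The first line element picks up the default, and the second keeps the value it already had.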

8. This leads to some not-at-first-glance-obvious properties of XML Schema:
- schemas can't designate a root element (any globally declared element may
appear as the root of an instance)
- a schema processor might declare an element 'valid' (local
schema-validity) even if its children aren't valid
(Well at least this is how it seems to me. It seems that for the following
example document:
<html><head><title>Test Page</title></head><body><foo>undefined child
element</foo></body></html>
it would be possible for a schema processor to declare that the <html>
element has 'local schema-validity' even though it might not declare that
the <body> element has 'local schema-validity'. What do people think?)
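
One way to make the question concrete is a toy model in Python. It is not a schema processor and does not settle what a conforming one would report: ALLOWED_CHILDREN is an invented stand-in for content models, and locally_valid encodes just one reading of 'local schema-validity' (an element's own children are checked by name only, without looking inside them).

```python
import xml.etree.ElementTree as ET

# Toy content models: element name -> child element names it may contain.
# This is an invented stand-in for a schema, not real XML Schema semantics.
ALLOWED_CHILDREN = {
    "html": {"head", "body"},
    "head": {"title"},
    "title": set(),
    "body": {"p"},
    "p": set(),
}


def locally_valid(elem):
    """Check only elem's own children against elem's content model."""
    allowed = ALLOWED_CHILDREN.get(elem.tag)
    if allowed is None:  # elem itself is not declared
        return False
    return all(child.tag in allowed for child in elem)


def overall_valid(elem):
    """Require local validity of elem and of every descendant."""
    return locally_valid(elem) and all(overall_valid(c) for c in elem)


doc = ET.fromstring(
    "<html><head><title>Test Page</title></head>"
    "<body><foo>undefined child element</foo></body></html>"
)
body = doc.find("body")

print(locally_valid(doc), locally_valid(body), overall_valid(doc))
```

Under this reading, html passes the local check because its children are head and body, body fails because foo is not allowed, and the overall check on html fails as a result.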

Simon

-----Original Message-----
From: xen...@li...
[mailto:xen...@li...]On Behalf Of Chris
Sent: Friday, September 17, 2004 9:00 PM
To: xen...@li...
Subject: Re: [Xena-devel] Xena, meta-data and export

Simon Davis wrote: 

Without diving into specs, etc to check, I think the answer to this question
is that it is an acceptable practice. After all, this is how the XLink
standard is meant to work. Since the attributes aren't a part of the
other-format's namespace I *think* it shouldn't impact on the validity of
the document. 

It would surprise me if it works, because it seems like a loophole in schema
enforcement.

Regarding XLink, I notice that the following example DTD explicitly
declares the XLink attributes, which makes me think they need to be
declared in a schema....

http://www.xml.com/pub/a/2000/09/xlink/index.html?page=3

Nevertheless, one way around this objection may be to add the extra
attributes to the container element not to the root element of the embedded
object itself.
