Without going to details of the various proposals (I also have one brewing
at the back of my mind), there are some questions that needs to be decided
up front.
1. What is the intended power of the schema language?
 Would it be a regular language (something that can be implemented as a
finite state machine),
 Or a contextfree language (something that requires a stack),
 Or even stronger (recursivelyenumerable/Turingcomplete)?
There are two related issues that are worth considering:
2. Would a schema language allow specifying typespecific constraints?
3. The systems above are geared towards describing sequences/trees rather
than graphs. I'm not aware of a formal model for describing DAG languages
(regardless of the power). Then again I'm not much of a theoretician in this
field, my courses on this are from 17 years ago. Anyone has a clue on how to
do this formally?
An example to put this in context:
Contrived example:
amounts:
 codes: &P1
sku: 12345
barcode: 54321
amount: 7
 codes: &P2
sku: 67890
barcode: 09876
amount: 3
prices:
 codes: *P1
price: 15.99
 codes: *P2
price: 52.75
 A regular language can't even specify that the number of entries in the
"prices" sequence must be equal to the number of entries in the "amounts"
sequence.
 A context free language can do that. But I it can't specify that the value
for "codes" of the Nth element in both sequences is equal.
 A "complete" language can do that. It would require some mechanism that
specifies this value must be the actual _identical_ node rather than being
merely an equal one, however.
 But even so it is impossible to specify that barcodes are valid (satisfy a
CRC condition) without resorting to typespecific operators. You could fake
requirements such as being positive or being at a resolution of 0.01 using
regular expressions, but that's just luck, and it gets messy very fast
(imagine writing the regexp for floats between 17.5 and 327.4  and keep in
mind the exponential notation).
I think we need a clear answer on these questions before we start debating
the specifics of any schema language. At the very least, each schema
proposal needs to clarify what is its position on the three questions above,
otherwise it would be impossible to discuss it.
Some musings:
The beauty of regular languages is that they are very simple and efficient
to implement. It is also practical to specify them completely using a small
"universal" set of primitives.
So regular languages are the immediate attraction, even if they do give up
on describing certain classes of constraints...
For example, Clark is fond of RELAXNG; from what I've seen in RELAXNG it
is a regular language. It has hooks to invoke typespecific constraints.
This provides us a "proof by authority" that regular languages are
sufficient in practice. That gives a concrete and attractive answer to
questions (1) and (2).
However, RELAXNG has no way (that I know of) to specify the
equality/identity constraints necessary for describing a directed acyclic
graph (as opposed to an ordered tree). So question (3) remains open.
I've tried to figure out in my head what extensions would be required for a
regular language to be able to describe a graph instead of an ordered tree.
So far all I've got is a headache and a suspicion that it can't be done
without using more power than regular languages provide.
This is the point where one goes to the theory books to look for inspiration
but as I said I'm very rusty here. We could of course just forge ahead and
add some adhoc constructs that seem reasonable, but I'd rather have some
sort of a theoretical basis for whatever we use. It is too easy to mess this
sort of thing up beyond all hope otherwise.
Thoughts?
Have fun,
Oren BenKiki
