Re: [Yaml-core] Schema


On Friday 30 January 2004 09:46 am, you wrote:
> On Fri, Jan 30, 2004 at 09:24:53AM -0800, Sean O'Dell wrote:
> | Your folks' terminology still confuses the living daylights out of me. 
> | =)
>
> Well, you can blame me for the most part. I'm curious what you found
> difficult?  It may not be the words, but how they are introduced... or it
> may be the words themselves.  We spent *a lot* of time trying to make the
> spec less formal, I hope it was at least somewhat successful?

I think it's just that you guys spend so much time talking about it that your 
terms and speech have evolved well past what I find easily comprehensible 
when I stick my head in once every couple of months.  You need a PR guy that 
knows how to talk to the project sophomores.  =)

> | cases:
> |   - case: &firstname
> |       type: str
> |       name: firstname
> |       assert: "[A-Z][a-z]+"
> |
> |   - case: &lastname
> |       type: str
> |       name: lastname
> |       assert:
> |         any:
> |           - "[A-Z][a-z]+"
> |           - "[A-Z]'[A-Z][a-z]+"
>
> Ok, so you are defining scalar classes here?  Perhaps s/type/tag/ ?

Well, the case part of the structure is meant to describe the type and value 
of the node itself (perhaps I should move name out of the case structure; 
yeah, I'll do that).  So, for map and seq, only type really applies.  For the 
other types (the scalars), the assert structure tests the value of the 
scalar.
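To make the case/assert split concrete, here is a minimal Ruby sketch of checking one scalar against a case like the lastname example. It is an illustration under assumptions, not the actual validator: the type table, anchoring each regex to the whole value, and an "all" combinator alongside "any" are guesses, and the time type is left out.

```ruby
# Hypothetical sketch: validate a scalar against a "case" hash with
# "type" and "assert" keys. Type names follow the post; everything
# else (anchoring, the "all" combinator) is an assumption.
TYPE_CHECKS = {
  "str"  => ->(v) { v.is_a?(String) },
  "num"  => ->(v) { v.is_a?(Numeric) },
  "bool" => ->(v) { v == true || v == false },
  "null" => ->(v) { v.nil? }
}.freeze

# True when +value+ satisfies +assert+: a regex string (matched against
# the whole value), {"any" => [...]} (or) or {"all" => [...]} (and).
def assert_ok?(assert, value)
  case assert
  when nil    then true
  when String then value.to_s.match?(/\A(?:#{assert})\z/)
  when Hash
    if assert.key?("any")
      assert["any"].any? { |a| assert_ok?(a, value) }
    elsif assert.key?("all")
      assert["all"].all? { |a| assert_ok?(a, value) }
    else
      raise ArgumentError, "unknown assert keys: #{assert.keys.inspect}"
    end
  else
    raise ArgumentError, "unsupported assert: #{assert.inspect}"
  end
end

# The type test runs first; the assert only applies if the type matches.
def scalar_ok?(kase, value)
  check = TYPE_CHECKS.fetch(kase["type"]) do
    raise ArgumentError, "unknown type: #{kase['type'].inspect}"
  end
  check.call(value) && assert_ok?(kase["assert"], value)
end

lastname = {
  "type"   => "str",
  "assert" => { "any" => ["[A-Z][a-z]+", "[A-Z]'[A-Z][a-z]+"] }
}
p scalar_ok?(lastname, "O'Brien") # prints true
p scalar_ok?(lastname, "smith")   # prints false
```

For map and seq nodes only the type test would apply, which is why the assert defaults to passing when absent.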

> | schema:
> |   - node:
> |       id: root
>
> curious, what is "id" and "name" above, how do you intend
> to use them (examples of the schema would help)?

Oh, this *is* confusing, I know.  I couldn't find a better way to say this.

The name key is the name of the node in the data, and the id key is the name 
of the node in the schema (the two are really unrelated).

The purpose of the id key is to give more meaningful error messages.  
Right now, if a schema is poorly defined or if the data doesn't match, I 
report schema node paths like so:

node[0]/node[0]

The id value lets me say:

root/firstname
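A minimal Ruby sketch of that reporting, assuming each schema node is a Hash that may carry an "id" key and the validator tracks each node's position as it descends; the helper name schema_path is hypothetical.

```ruby
# Hypothetical sketch: build an error path from (schema_node, index)
# pairs, using the node's "id" when present and "node[i]" otherwise.
def schema_path(steps)
  steps.map { |node, index| node["id"] || "node[#{index}]" }.join("/")
end

root      = { "id" => "root" }
firstname = { "id" => "firstname" }
anonymous = {}

puts schema_path([[root, 0], [firstname, 0]]) # prints root/firstname
puts schema_path([[root, 0], [anonymous, 2]]) # prints root/node[2]
```

Falling back to the positional form means nodes without an id still produce a usable, if less readable, path.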

> |       case:
> |         type: map
> |       quantity: 1
>
> RELAX NG took the regex approach of not using quantity, but
> instead using ? (optional), + (one or more), * (zero or more),
> and making the default "exactly one".  So, if you want two,
> you have to explicitly duplicate the item twice.  This isn't
> such a bad price to pay for the extra simplicity and consistency
> with regular expressions.

I know simplicity and power are often trade-offs, but my way isn't very 
complicated, and it lets you specify ranges like "no more than 5" or "from 3 
to 10" and so on.  That just seems like something a schema should let people 
do.
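A rough Ruby sketch of how such a quantity check might look. Only the plain integer form (`quantity: 1`) appears in the schema above; the "min"/"max" mapping used here for ranges is an assumed spelling, not part of the posted schema.

```ruby
# Hypothetical sketch: "quantity" is an Integer (exactly n) or a Hash
# with optional "min" and "max" keys (assumed spelling for ranges).
def quantity_ok?(quantity, count)
  case quantity
  when Integer then count == quantity
  when Hash
    min = quantity.fetch("min", 0)
    max = quantity["max"] # nil means no upper bound
    count >= min && (max.nil? || count <= max)
  else
    raise ArgumentError, "unsupported quantity: #{quantity.inspect}"
  end
end

p quantity_ok?({ "max" => 5 }, 7)              # prints false ("no more than 5")
p quantity_ok?({ "min" => 3, "max" => 10 }, 4) # prints true ("from 3 to 10")
```

Compared with the RELAX NG `?`/`+`/`*` operators, a min/max pair covers those three cases (0..1, 1.., 0..) plus arbitrary bounds without duplicating items.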

> | In a nutshell, it lets you test nodes for type, and both map and seq can
> | have children nodes.  You can test scalars using regex and ranges. 
> | Scalars can be str, num, bool, null or time. Numbers have number ranges,
> | timestamps have time ranges.  The assert key handles value testing, and
> | it allows for complex "and" and "or" operations.  Also, in seq types,
> | children nodes can specify which position they should occupy
>
> I'm curious why you just don't use the order that they appear to
> imply the order, I think RELAX NG does a great job in this case;
> it just doesn't do mappings well (as everything in XML is ordered,
> with elements).

This schema actually was designed more with "data" in mind and less with YAML 
in mind, although I am working from YAML test data initially.  My next step 
is to pull in some XML data from REXML and see what sort of tweaking I need 
to do.  Ultimately, it should be useful for any data graph (or at least a 
goodly number of them) from any source, so long as it sticks to the 
map/seq/scalar system.

Regarding ordering: YAML lets you mix up structures any way you want.  Why 
should the schema suddenly turn that on its head?  I don't think the schema 
should force the YAML data to order structures in a fixed way (the way they 
appear in the schema).  My way would default to "any" so nodes can appear in 
any order, but if you NEED order, you can add in "first" and "next" keywords.  
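One way the "any" default plus "first"/"next" keywords could be checked for seq children, sketched in Ruby. The semantics are assumed, not taken from the schema: "first" means the item must sit at index 0, and "next" names the id of the item that must come immediately before. It also assumes each seq item has already been matched to a schema id.

```ruby
# Hypothetical sketch: +matched_ids+ lists, in data order, the schema id
# each seq item matched; +rules+ maps ids to ordering keys. Ids without
# rules may appear anywhere (the "any" default).
def order_ok?(matched_ids, rules)
  matched_ids.each_with_index.all? do |id, i|
    rule = rules.fetch(id, {})
    next false if rule["first"] && i != 0
    next false if rule["next"] && (i.zero? || matched_ids[i - 1] != rule["next"])
    true
  end
end

rules = { "header" => { "first" => true }, "total" => { "next" => "items" } }
p order_ok?(%w[header items total], rules) # prints true
p order_ok?(%w[items header total], rules) # prints false
```

Only the constrained ids are pinned; everything else keeps its freedom, which matches the idea of ordering being opt-in.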

The way most schema designs force order like this was probably the PRIMARY 
reason I decided to write my own schema.  Sometimes simple is good, and 
sometimes simple is just lazy, and that struck me as the lazy way of 
approaching ordering.

A troubling thought: YAML allows for maps to be ordered, and I can't for the 
life of me figure out how to check map ordering.  It comes in as a plain Hash 
in Ruby.  I'm allowing order to be specified in my schema, just in case I 
figure it out, but I think it might just be a futile exercise.  Any ideas?

        Sean O'Dell