From: Sean O'D. <se...@ce...> - 2004-01-31 11:21:04
On Friday 30 January 2004 09:46 am, you wrote:
> On Fri, Jan 30, 2004 at 09:24:53AM -0800, Sean O'Dell wrote:
> | Your folk's terminology still confuses the living daylights out of me.
> | =)
>
> Well, you can blame me for the most part. I'm curious what you found
> difficult? It may not be the words, but how they are introduced... or it
> may be the words themselves. We spent *a lot* of time trying to make the
> spec less formal, I hope it was at least somewhat successful?

I think it's just that you guys spend so much time talking about it that your terms and speech have evolved well past what I find easily comprehensible when I stick my head in once every couple of months. You need a PR guy that knows how to talk to the project sophomores. =)

> | cases:
> |   - case: &firstname
> |       type: str
> |       name: firstname
> |       assert: "[A-Z][a-z]+"
> |
> |   - case: &lastname
> |       type: str
> |       name: lastname
> |       assert:
> |         any:
> |           - "[A-Z][a-z]+"
> |           - "[A-Z]'[A-Z][a-z]+"
>
> Ok, so you are defining scalar classes here? Perhaps c/type/tag/ ?

Well, the case part of the structure is meant to convey parameters for the type/value of the node itself (perhaps I should move name out of the case structure; yeah, I'll do that). So, for map and seq, only type really applies. For other types (the scalars), the assert structure tests the value of the scalar.

> | schema:
> |   - node:
> |       id: root
>
> curious, what is "id" and "name" above, how do you intend
> to use them (examples of the schema would help)?

Oh, this *is* confusing, I know. I couldn't find a better way to say this. The name tag is the name of the node in the data and the id tag is the name of the node in the schema (unrelated really). The purpose of the id tag is to help give more meaningful error messages.
Right now, if a schema is poorly defined or if the data doesn't match, I report schema node paths like so:

  node[0]/node[0]

The id value lets me say:

  root/firstname

> |       case:
> |         type: map
> |         quantity: 1
>
> RELAX NG took the regex approach of not using quantity, but
> instead using ? (optional), + (one or more), * (zero or more),
> and making the default "exactly one". So, if you want two,
> you have to explicitly duplicate the item twice. This isn't
> such a bad price to pay for the extra simplicity and consistency
> with regular expressions.

I know simplicity and power are often trade-offs, but my way isn't very complicated, and it lets you specify ranges like "no more than 5" or "from 3 to 10" and so on. This just seems like one of those things the schema should let people do.

> | In a nutshell, it lets you test nodes for type, and both map and seq can
> | have children nodes. You can test scalars using regex and ranges.
> | Scalars can be str, num, bool, null or time. Numbers have number ranges,
> | timestamps have time ranges. The assert key handles value testing, and
> | it allows for complex "and" and "or" operations. Also, in seq types,
> | children nodes can specify which position they should occupy
>
> I'm curious why you just don't use the order that they appear to
> imply the order, I think RELAX NG does a great job in this case;
> it just doesn't do mappings well (as everything in XML is ordered,
> with elements).

This schema actually was designed more with "data" in mind and less with YAML in mind, although I am working from YAML test data initially. My next step is to pull in some XML data from REXML and see what sort of tweaking I need to do. Ultimately, it should be useful for any (or at least a goodly number of) data graphs from any source, so long as they stick to the map/seq/scalar system.

Regarding ordering: YAML lets you mix up structures any way you want. Why should the schema suddenly turn that on its head?
I don't think the schema should force the YAML data to order structures in a fixed way (the way they appear in the schema). My way would default to "any" so nodes can appear in any order, but if you NEED order, you can add in "first" and "next" keywords. The fact that most schema designs force order in this way was probably the PRIMARY reason I decided to do my own schema. Sometimes simple is good, and sometimes simple is just lazy, and that struck me as the lazy way of approaching ordering.

A troubling thought: YAML allows for maps to be ordered, and I can't for the life of me figure out how to check map ordering. It comes in as a plain Hash in Ruby. I'm allowing order to be specified in my schema, just in case I figure it out, but I think it might just be a futile exercise. Any ideas?

Sean O'Dell
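
To make the assert structure discussed above a little more concrete, here is a rough Ruby sketch of how a single pattern or an "any" list of patterns might be tested against a scalar. It is only an illustration, not the actual validator: the method name assert_matches? is hypothetical, and it assumes a pattern has to match the whole value rather than a substring, which the message does not settle.

```ruby
# Hypothetical helper (not from the actual schema code): returns true when
# `value` satisfies `assertion`. An assertion is either a pattern string or
# a Hash with an "any" list, mirroring the firstname/lastname cases above.
def assert_matches?(assertion, value)
  case assertion
  when String
    # Assumption: the pattern must match the entire scalar, so it is
    # wrapped in \A ... \z before being compiled.
    Regexp.new("\\A(?:#{assertion})\\z").match?(value)
  when Hash
    # "any" passes when at least one of the nested assertions passes.
    Array(assertion["any"]).any? { |sub| assert_matches?(sub, value) }
  else
    false
  end
end

# The lastname case from the quoted schema, written as a Ruby Hash.
lastname = { "any" => ["[A-Z][a-z]+", "[A-Z]'[A-Z][a-z]+"] }

puts assert_matches?(lastname, "O'Dell")   # second pattern matches
puts assert_matches?(lastname, "o'dell")   # neither pattern matches
```

Because the method recurses on each entry of the "any" list, nested assertions would work without extra code; an "and" counterpart could presumably be handled the same way with all? in place of any?.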