From: Steve H. <sh...@zi...> - 2002-08-22 22:51:21
|
I am starting to think about how I would write a validating YAML parser on top of PyYaml. There are two major issues here--how do you define the schemas, and how do you implement the parser. Just to get the discussion rolling, let me throw out a really simple example:

SCHEMA:

    allowed:
      - type: map
      - keys:
          - type: string
            name: FirstName
          - type: string
            name: LastName
            validation: |
              (value in ['Evans', 'Smith', 'Jones'])
          - type: string
            seq: 1
            name: Children
            max: 10

CONFORMING YAML:

    ---
    FirstName: John
    LastName: Smith
    Children:
      - Billy
      - Mary

A validating parser would want to have a pull interface to the base parser. (I don't believe any existing YAML implementations provide such an interface. Most implementations load the whole document at once.) On the particular data set that I've given, the validating layer might make the following calls into a pull parser:

    >>> parser.isStartMap()
    1
    >>> parser.getKey()
    'FirstName'
    >>> parser.getScalar()
    'John'
    >>> parser.getScalar()
    'Smith'
    >>> parser.getStartMap()
    1
    >>> parser.getKey()
    'Children'
    >>> parser.isStartMap()
    1
    >>> parser.isStartList()
    1
    >>> parser.getScalar()
    'Billy'
    >>> parser.getScalar()
    'Mary'
    >>> parser.getScalar()
    StopIteration exception thrown
    >>> parser.getKey()
    StopIteration exception thrown
    >>> parser.getKey()
    StopIteration exception thrown

Does this make sense so far?

Once you implemented all the methods for a pull parser, you would probably want to implement the all-the-data-at-once yaml.load() method on top of the pull parser methods. Since the generic load call would not have a schema to drive the parsing process, you would need some sort of node-testing method(s).

    >>> parser.getNodeType()
    'map'
    >>> parser.getNodeType()
    'scalar'
    >>> parser.getScalar()
    'FirstName'
    >>> parser.getNodeType()
    'scalar'
    >>> parser.getScalar()
    'John'
    # etc.

Thanks,

Steve
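The pull interface described in this message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `PullParser`, the event names, and the methods `peek`, `expect` and `get_node_type` are all made up here, and the sketch walks an already-loaded Python structure rather than parsing YAML text, so it shows the calling pattern and not a real streaming parser.

```python
from typing import Any, Iterator, Tuple

Event = Tuple[str, Any]


def events(node: Any) -> Iterator[Event]:
    """Flatten an already-loaded document into (kind, value) events."""
    if isinstance(node, dict):
        yield ("start_map", None)
        for key, value in node.items():
            yield ("key", key)
            yield from events(value)
        yield ("end_map", None)
    elif isinstance(node, list):
        yield ("start_seq", None)
        for item in node:
            yield from events(item)
        yield ("end_seq", None)
    else:
        yield ("scalar", node)


class PullParser:
    """Hypothetical pull interface; not an API of PyYaml."""

    def __init__(self, document: Any) -> None:
        self._events = events(document)
        self._peeked = None

    def peek(self) -> Event:
        # Look at the next event without consuming it.
        # Raises StopIteration once the document is exhausted.
        if self._peeked is None:
            self._peeked = next(self._events)
        return self._peeked

    def get_node_type(self) -> str:
        # Node test for callers that have no schema to drive them.
        return self.peek()[0]

    def expect(self, kind: str) -> Any:
        # Consume the next event, insisting that it has the given kind.
        actual, value = self.peek()
        self._peeked = None
        if actual != kind:
            raise ValueError(f"expected {kind}, got {actual}")
        return value


doc = {"FirstName": "John", "LastName": "Smith", "Children": ["Billy", "Mary"]}
parser = PullParser(doc)
parser.expect("start_map")
first_key = parser.expect("key")
first_value = parser.expect("scalar")
```

A schema-driven validator would call `expect` with whatever the schema says comes next, while a generic loader would branch on `get_node_type`.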
From: Rolf V. <rol...@he...> - 2002-08-23 10:21:05
|
It would be nice to have a schema syntax that is close to the final document, so the parser can follow it exactly while requesting elements from the incoming document. Literals appear as is, while type definitions are sort of variables in the schema definition. That is also a benefit for readability.

A way to escape type definitions is needed. Some possibilities:

    schema:
      Firstname: !def string
      LastName: !def string="Evans"|"Smith"|"Jones"
      Children:
        - !defup { max: 10 }
        - !def string

    schema:
      Firstname: !str ?
      LastName: !str "Evans"|"Smith"|"Jones"
      Children:
        - !attr {max: 10}
        - !str ?

where '?' stands for any string.

We could also think about integrating EBNF in some way, because it is compact and clean. James Clark has used it in his schema language for XML, Relax NG, compact version:

    http://www.oasis-open.org/committees/relax-ng/compact-20020607.html

The structure of the Java parser is such that it would be easy to make a pull parser extension. I'll take note.

Cheers.
Rolf.
From: Rolf V. <rol...@he...> - 2002-08-23 10:41:05
|
Rolf Veen wrote:
> A way to escape type definitions is needed. Some possibilities:
>
>     schema:
>       Firstname: !def string
>       LastName: !def string="Evans"|"Smith"|"Jones"
>       Children:
>         - !defup { max: 10 }
>         - !def string
>
>     schema:
>       Firstname: !str ?
>       LastName: !str "Evans"|"Smith"|"Jones"
>       Children:
>         - !attr {max: 10}
>         - !str ?

Oops, invalid YAML. Well, you get the idea :-).

Rolf.
From: Rolf V. <rol...@he...> - 2002-08-26 07:43:02
|
Hi, Steve.

Have you thought about how operators can be expressed? I'm thinking about OR, for example. Let's say we have a schema where some key can be either a scalar or a map, and if it is a map you want to specify its structure. And directives, such as 'if'?

A different thing is that if we finally have an event tree representing the schema, then we can build a validating parser even with the push parser, at least in Java. Code could be something like:

    YamlSchema schema = new YamlSchema("schema.yml");

    try {
        Object graph = Yaml.load("aFile.yml", schema);
    }
    catch ( SyntaxException e) { ... }
    catch ( SchemaException e) { ... }

The exceptions abort the parsing process.

Cheers.
Rolf.
From: Steve H. <sh...@zi...> - 2002-08-26 12:46:19
|
Rolf wrote:
> Have you thought about how operators can be expressed? I'm thinking
> about OR, for example. Let's say we have a schema where some key can
> be either a scalar or a map, and if it is a map you want to specify its
> structure. And directives, such as 'if'?

Well, I thought a little bit about the OR scenario. Basically I would use a sequence to provide a list of alternative types.

    name: Name
    type:
      - type: map
        items:
          - name: firstname
            type: scalar
          - name: lastname
            type: scalar
      - type: scalar

Pretty ugly, huh? This would validate either of these documents:

    Name: Rolf Veen
    ---
    Name:
      firstname: clark
      lastname: evans

> A different thing is that if we finally have an event tree representing
> the schema, then we can build a validating parser even with the push
> parser, at least in Java. Code could be something like:
>
>     YamlSchema schema = new YamlSchema("schema.yml");
>
>     try {
>         Object graph = Yaml.load("aFile.yml", schema);
>     }
>     catch ( SyntaxException e) { ... }
>     catch ( SchemaException e) { ... }
>
> The exceptions abort the parsing process.

Sure, that makes sense to me. As new events get pushed, you adjust where you are in the schema to do the validation. With the pull way, it's the opposite--you basically walk the schema, deciding what kind of data you expect, and then you pull a bit of data from the YAML document and verify that it really conforms.

When you start supporting multiple types, as in my example above, things will get a little tricky under either architecture.

Cheers,

Steve
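To make the "sequence of alternative types" idea concrete, here is a minimal sketch of how a validator might treat it, working on the Python structure an ordinary YAML load of the schema would produce. The function names `matches` and `validate` are invented for illustration, and the rule that a map must contain exactly the listed keys is an assumption; the thread does not say whether extra or missing keys are allowed.

```python
from typing import Any, Union


def matches(value: Any, type_spec: Union[str, dict, list]) -> bool:
    """Return True if value conforms to type_spec.

    type_spec is a type name, a dict with a 'type' key, or a list of
    alternatives of which any one may match (the OR case).
    """
    if isinstance(type_spec, list):
        return any(matches(value, alternative) for alternative in type_spec)
    if isinstance(type_spec, str):
        type_spec = {"type": type_spec}
    kind = type_spec["type"]
    if kind == "scalar":
        return not isinstance(value, (dict, list))
    if kind == "map":
        if not isinstance(value, dict):
            return False
        items = {item["name"]: item["type"] for item in type_spec.get("items", [])}
        # Assumption: the map must have exactly the keys the schema lists.
        return set(value) == set(items) and all(
            matches(value[name], spec) for name, spec in items.items()
        )
    raise ValueError(f"unknown type: {kind}")


def validate(document: dict, node: dict) -> bool:
    """Check the key named by the schema node inside a top-level map."""
    return node["name"] in document and matches(document[node["name"]], node["type"])


# The schema from the message, as a plain YAML load would return it.
name_schema = {
    "name": "Name",
    "type": [
        {
            "type": "map",
            "items": [
                {"name": "firstname", "type": "scalar"},
                {"name": "lastname", "type": "scalar"},
            ],
        },
        {"type": "scalar"},
    ],
}

scalar_ok = validate({"Name": "Rolf Veen"}, name_schema)
map_ok = validate({"Name": {"firstname": "clark", "lastname": "evans"}}, name_schema)
```

Trying each alternative in turn is easy on a loaded tree; on a pull or push stream the validator would need lookahead or backtracking, which is presumably the trickiness mentioned above.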
From: why t. l. s. <yam...@wh...> - 2002-09-03 06:59:48
|
I've had some time to review this schema discussion some more and I'm becoming more interested in seeing this realized. And as I've thought about it, I think we could do a lot by leveraging YAML and YPath.

More thoughts below...

Steve Howell (sh...@zi...) wrote:
> Well, I thought a little bit about the OR scenario. Basically I would use a
> sequence to provide a list of alternative types.
>
>     name: Name
>     type:
>       - type: map
>         items:
>           - name: firstname
>             type: scalar
>           - name: lastname
>             type: scalar
>       - type: scalar
>
> Pretty ugly, huh? This would validate either of these documents:
>
>     Name: Rolf Veen
>     ---
>     Name:
>       firstname: clark
>       lastname: evans

Here's my idea of a cool schema for the above:

    --- #YAML:1.0 !whys/schema
    /:
      map: {}
    /Name:
      str: {}
      seq:
        /firstname:
          str: {}
        /initial:
          str: {}
          ~: {}
        /lastname:
          str: {}

Okay, this may or may not make sense right off, but I find it pretty readable. Basically the root-level map contains YPath expressions as the keys. I don't know if they necessarily need to be processed in order, so I figured a regular map would work fine.

Each entry in the map has a value which is a map with keys for each plausible type. The value of the nested map shows options pertinent to validating that type entry.

So the above schema reads:

    /:
      map: {}

'The top level element is a map with no special options.'

    /Name:
      str: {}

'The Name key in the map can be a string...'

      seq:

'...or a sequence...'

        /firstname:
          str: {}

'...where the firstname is a string...'

        /initial:
          str: {}
          ~: {}

'...the initial can be a string or not required...'

        /lastname:
          str: {}

'...and the lastname value must be a string.'

An example of options to include with a type might be stuff like allowed styles or simple grammar. For example, you could require the lastname to be in a literal block:

    /lastname:
      str:
        block: literal

I've also thought it would be wise to allow many schemas in a single file, allowing a large schema to be broken into parts:

    --- #YAML:1.0 !whys/schema
    name-schema:
      /:
        map: {}
      # .. rest of name schema

    employees-schema:
      /:
        map: {}
      /*:
        name-schema: {}

> > A different thing is that if we finally have an event tree representing
> > the schema, then we can build a validating parser even with the push
> > parser, at least in Java. Code could be something like:
> >
> >     YamlSchema schema = new YamlSchema("schema.yml");
> >
> >     try {
> >         Object graph = Yaml.load("aFile.yml", schema);
> >     }
> >     catch ( SyntaxException e) { ... }
> >     catch ( SchemaException e) { ... }
> >
> > The exceptions abort the parsing process.

Or (for the above schema with several types):

    YamlSchemaDocument ysd = new YamlSchemaDocument( "schema.yml" );

    try {
        Object graph = Yaml.load( "aFile.yml", ysd.getSchema( "employees-schema" ) );
    }
    catch ...

I know I've wandered quite a ways from what Steve's pitching (and it's all still just coming together in my mind today), but I think it could be powerful to use YPath to do this. You'll see a similar idea in my YamlDiff solution:

    http://wiki.yaml.org/yamlwiki/YamlDiff

Another grey donkey has passed,

_why
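As a rough illustration of how a path-keyed schema like this could be checked, the sketch below handles only single-step paths such as `/Name` (no real YPath), only the `str` and `map` types, and treats `~` (loaded as `None`) as "may be absent". It uses `map` where the posted schema says `seq`, following the correction made later in the thread. The names `validate`, `type_ok` and `MISSING` are invented for this sketch, and type options such as `block: literal` are ignored.

```python
from typing import Any, Dict

MISSING = object()  # marker for "this key is not present in the document"


def type_ok(value: Any, type_name: Any, options: Dict) -> bool:
    """Check one (type, options) entry against a value."""
    if type_name is None:  # '~' in the schema: the entry is not required
        return value is MISSING
    if value is MISSING:
        return False
    if type_name == "str":
        return isinstance(value, str)
    if type_name == "map":
        # For a map, the options hold the paths of the nested entries.
        return isinstance(value, dict) and validate(value, options)
    raise ValueError(f"unknown type: {type_name!r}")


def validate(node: Any, schema: Dict) -> bool:
    """Every path in the schema must match at least one of its listed types."""
    for path, types in schema.items():
        if path == "/":
            value = node
        elif isinstance(node, dict):
            value = node.get(path.lstrip("/"), MISSING)
        else:
            value = MISSING
        if not any(type_ok(value, name, opts) for name, opts in types.items()):
            return False
    return True


schema = {
    "/": {"map": {}},
    "/Name": {
        "str": {},
        "map": {
            "/firstname": {"str": {}},
            "/initial": {"str": {}, None: {}},
            "/lastname": {"str": {}},
        },
    },
}

plain_name_ok = validate({"Name": "Rolf Veen"}, schema)
split_name_ok = validate({"Name": {"firstname": "clark", "lastname": "evans"}}, schema)
```

Note that this sketch only checks the paths the schema mentions, so keys the schema does not list pass silently; whether that is desirable is left open here.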
From: why t. l. s. <yam...@wh...> - 2002-09-03 07:21:46
|
Tom Sawyer (tra...@tr...) wrote:
> you refer to ypath. where can i find out more about that specification?
>
> cheers,
> tom

There is no specification yet. But there was a huge discussion on the list in early August. Start here:

    http://sourceforge.net/mailarchive/forum.php?thread_id=953037&forum_id=1771

Then scour this page for more:

    http://sourceforge.net/mailarchive/forum.php?max_rows=25&style=ultimate&offset=58&forum_id=1771

PyYaml currently implements a degree of YPath. Clark cites a number of examples in a reply message found in that first link.

_why
From: Tom S. <tra...@tr...> - 2002-09-03 07:54:31
|
take a look at this. it is based off of xml:proof (see http://www.transami.net/ruby/xmlproof/xmlproof.html)

    Name: %text%
    ---
    Name:
      firstname: %text%
      lastname: %text%

pretty straightforward, using two documents for the OR condition, which is one option if we can group schema fragments together in some fashion. as for the validating mechanism, xml:proof has something called a die. the %text% entry above is a die specifying datatype. you can do many things with die syntax. for example:

    - #1..9#

this validates a sequence such that it must have between 1 and 9 items. and how about this:

    symbol: /\w+/

this defines a regular expression die that must be conformed to. and of course you can combine these (and usually will):

    code: %text% /\w+/

the die syntax also provides ways to specify sort order of mappings; whether all, some or one of a group of mappings must exist; and more.

now to be sure, taking xml:proof and applying it to yaml requires a little redesign. but it certainly is doable. and IMHO this die syntax is very easy to read, mainly b/c the schema looks so much like the document to be validated, not to mention that it is rather powerful. i do not believe any other schema goes quite so far in its ability to designate the validity of a document, in any markup.

-transami

--
tom sawyer, aka transami
tra...@tr...
From: Steve H. <sh...@zi...> - 2002-09-03 12:38:17
|
Why wrote:
>
> Here's my idea of a cool schema for the above:
>
>     --- #YAML:1.0 !whys/schema
>     /:
>       map: {}
>     /Name:
>       str: {}
>       seq:
>         /firstname:
>           str: {}
>         /initial:
>           str: {}
>           ~: {}
>         /lastname:
>           str: {}
>

First, let me make a minor clarification--in my original example, I didn't have any sequences. The Name key was intended to map to a value that was either a map with firstname and lastname as the keys, or to a simple scalar.

Now, on to business. One concern about the ypath syntax--does the full ypath syntax have to be supported? If so, I think it might be quite difficult to implement a schema-validating parser on top of a simple pull parser.

I do like the conciseness of Why's syntax, although the empty hashes at the leaves of the tree are kind of ugly.

Look at my example again:

> >     name: Name
> >     type:
> >       - type: map
> >         items:
> >           - name: firstname
> >             type: scalar
> >           - name: lastname
> >             type: scalar
> >       - type: scalar

It seems like the essential problem is that at any node, we have two pieces of information--name and type--that carry equal importance. You have two choices in how you distinguish the types of metadata:

1) You can use a simple YAML map like I have.
2) You can use more syntactic sugar, like Why and Tom have.

These are my problems with syntactic sugar:

1) The punctuation can get ugly--both aesthetically and from an escaping/parsing standpoint.
2) You lose the ability to think in terms of simple data structures for schema-based applications.

One nice thing about a verbose schema is that it takes one line to parse--do your YAML load--and then the parsed structure looks *exactly* like the schema itself. Again, going back to my example...

> >     name: Name
> >     type:
> >       - type: map
> >         items:
> >           - name: firstname
> >             type: scalar
> >           - name: lastname
> >             type: scalar
> >       - type: scalar

...if I were trying to generate some kind of automatic web interface from this schema (as a developer), then I have a very simple data structure to work with. Basically, at each node, I grab the name, put that out as the label, then I grab the type, and pick the widget accordingly. Simple. If the type itself is a sequence in the YAML above, then I have to put some kind of widget in there to let the user pick which one of the available types they're entering.

Now, look at Tom's xml:proof-like data structure:

    Name: %text%
    ---
    Name:
      firstname: %text%
      lastname: %text%

From a pure readability standpoint, this is very elegant. From a programmer's standpoint, though, things that are values to me end up being map keys in the proofish data structure. This can be somewhat awkward to process.

Actually, after the proofish schema is parsed, the leaf nodes might really look more like my schema internally. For example, the proofish schema might have this:

    firstname: %text% /(\w+)/

But internally I'd have this:

    label_for_key: firstname
    value:
      datatype: text
      regex: (\w+)

I'd rather work with just one schema, which makes sense to me both externally and internally.

Cheers,

Steve
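The translation from a proofish leaf to the internal form shown at the end of this message can be sketched as below. This is not xml:proof's actual grammar: the regular expression `DIE` and the function `expand` are assumptions that cover only the two shapes quoted in the thread, `%datatype%` and `%datatype% /regex/`.

```python
import re
from typing import Any, Dict

# Assumed die shape: %datatype% optionally followed by /regex/.
DIE = re.compile(r"%(?P<datatype>\w+)%(?:\s+/(?P<regex>.*)/)?\s*$")


def expand(key: str, die: str) -> Dict[str, Any]:
    """Turn one 'key: die' pair into the verbose internal representation."""
    match = DIE.match(die.strip())
    if match is None:
        raise ValueError(f"unrecognised die: {die!r}")
    value = {"datatype": match.group("datatype")}
    if match.group("regex") is not None:
        value["regex"] = match.group("regex")
    return {"label_for_key": key, "value": value}


internal = expand("firstname", r"%text% /(\w+)/")
```

Here `internal` has the same shape as the `label_for_key` / `value` structure in the message, which is the point being made: the compact notation ends up as the verbose one once parsed.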
From: Tom S. <tra...@tr...> - 2002-09-03 14:42:28
|
hi steve,

i see your point and it makes me realize that my type of schema is really something like a "higher-level" schema vs. your "lower-level" one. to make a queer analogy, like perl vs. c. thinking of it this way, i have to say that i am now leaning toward your notation, as it is more important to have yaml's "c" in place before its "perl". does that make any sense? :-)

anyway, it might turn out that we can boost your notation to take into account the many aspects covered by my die syntax (regexp, ranges, etc.) and thus my schema would really be unnecessary, or we can keep yours strictly at the base level, and something like my idea would be a complementary technology. also i should point out that my schema is not something you'd typically parse directly in your own code, rather it is something that would be parsed by another tool, a proofreader, and your code would then communicate with that.

a quick mention of why's notation. IMHO i'd just prefer increased initial readability over the nifty use of ypath, outside of this i don't see any significant difference between it and steve's approach. but please, correct me if i'm wrong.

-tom

--
tom sawyer, aka transami
tra...@tr...
From: Steve H. <sh...@zi...> - 2002-09-03 15:10:26
|
Tom,

I like your explanation of how schemas might have different representations, depending on what level you were looking at them. I also agree with you that the lower-level representation is the more important one to nail down. You could certainly have a higher-level schema representation be parsed into a lower-level representation by some sort of proofreader too.

Another thing we should consider when making a schema language is extensibility. A schema is really just metadata for a particular set of YAML documents. My favorite use case for schemas is generating GUI apps for YAML data, because it's a use case that Brian and I implemented successfully with his Perl version of YAML.

Suppose you had two different GUI widgets for entering lists, where one works better for small lists, and the other works better for big lists. You might want an extra field in your schema for specifying that widget. A verbose syntax makes that easy.

    name: Corporation
    type: map
    items:
      - name: Board of directors
        type: seq
        widget: small_list
      - name: Employees
        type: seq
        widget: large_list

This simple extension to the schema might save you a ton of custom coding, and you can still parse the schema with an ordinary YAML loader.

Cheers,

Steve Howell
http://showell.westhost.com/

P.S. Tom, my mailer doesn't deal well with PGP signatures, so I can't easily quote your emails. I'm working on finding a better mailer.
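A small sketch of how a GUI generator might consume the extra `widget` field follows. The default widget names (`text_field`, `list_editor`, `form`, `type_chooser`) and the function `plan_widgets` are invented for illustration; only `small_list` and `large_list` come from the message.

```python
from typing import Dict, List, Tuple

# Hypothetical fallbacks used when a schema node names no widget.
DEFAULT_WIDGETS = {"scalar": "text_field", "seq": "list_editor", "map": "form"}


def plan_widgets(node: Dict) -> List[Tuple[str, str]]:
    """Return (label, widget) pairs for a schema node and its items, depth first."""
    kind = node["type"]
    if isinstance(kind, str):
        default = DEFAULT_WIDGETS.get(kind, "text_field")
    else:
        # A list of alternative types: let the user pick one first.
        default = "type_chooser"
    plan = [(node["name"], node.get("widget", default))]
    for item in node.get("items", []):
        plan.extend(plan_widgets(item))
    return plan


# The Corporation schema as an ordinary YAML load would return it.
corporation = {
    "name": "Corporation",
    "type": "map",
    "items": [
        {"name": "Board of directors", "type": "seq", "widget": "small_list"},
        {"name": "Employees", "type": "seq", "widget": "large_list"},
    ],
}

plan = plan_widgets(corporation)
```

Because the schema is plain data, the extension needs no parser changes; code that does not know about `widget` can simply ignore the key.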
From: why t. l. s. <yam...@wh...> - 2002-09-03 16:20:17
|
Steve Howell (sh...@zi...) wrote:
> Why wrote:
> >
> > Here's my idea of a cool schema for the above:
> >
> >   --- #YAML:1.0 !whys/schema
> >   /:
> >     map: {}
> >   /Name:
> >     str: {}
> >     seq:
> >       /firstname:
> >         str: {}
> >       /initial:
> >         str: {}
> >         ~: {}
> >       /lastname:
> >         str: {}
> >
>
> First, let me make a minor clarification--in my original example, I didn't have
> any sequences. The Name key was intended to map to a value that was either a
> map with firstname and lastname as the keys, or to a simple scalar.

Error on my part. The 'seq' key should be a 'map'. Sorry bout that. It was
1AM.

> Now, on to business. One concern about the ypath syntax--does the full ypath
> syntax have to be supported? If so, I think it might be quite difficult to
> implement a schema-validating parser on top of a simple pull parser.

I'm not sure if all of YPath would be needed or not. I envision that just
simple paths would be needed.

> I do like the conciseness of Why's syntax, although the empty hashes at the
> leaves of the tree are kind of ugly.

Well, can we work this out Steve? We could do this:

    # .. in the /Name validating tree ..
    map:
      /firstname:
        - str
      /initial:
        - str
        - ~
      /lastname:
        - str:
            block: literal

Visually this makes sense to me. The type list is an array of strings. If a
certain type entry has additional data, then you could make the string a map
key with the mapping storing other data.

It would help compress the schema as well, if desired:

    --- #YAML:1.0 !whys/schema
    /: [map]
    /Name:
      - str
      - seq:
          /firstname: [str]
          /initial: [str, ~]
          /lastname: [str]

> Look at my example again:
>
> > > name: Name
> > > type:
> > >   - type: map
> > >     items:
> > >       - name: firstname
> > >         type: scalar
> > >       - name: lastname
> > >         type: scalar
> > >   - type: scalar
>
> It seems like the essential problem is that at any node, we have two pieces of
> information--name and type--that carry equal importance. You have two choices
> in how you distinguish the types of metadata:
>
> 1) You can use a simple YAML map like I have.
> 2) You can use more syntactic sugar, like Why and Tom have.
>
> These are my problems with syntactic sugar:
>
> 1) The punctuation can get ugly--both aesthetically and from an escaping/parsing
> standpoint.
> 2) You lose the ability to think in terms of simple data structures for
> schema-based applications.

Yeah, that's interesting. Because I can't see myself working well with the
schema you have. It's too verbose to hand write. Plus, how does an array
definition look?

    - type: seq
      items:
        - type: scalar
        - type: integer
        - type: map

(Meaning 'an array of scalar, integer, map only, please!') Or would it be...

    - type: seq
      items:
        - type: scalar
        - type: seq

(Meaning 'scalars or seq allowed in this array!') Or would you provide a way
for both to be expressed? This is where I could see YPath excelling. Instead
of having to learn the various ways to structure my sequence schema, I know
the YPaths to get to what needs to be validated:

    /arr/0: [scalar]
    /arr/1: [integer]
    /arr/2: [map]

    /arr/*: [scalar, seq]

> One nice thing about a verbose schema is that it takes one line to parse--do
> your YAML load--and then the parsed structure looks *exactly* like the schema
> itself. Again, going back to my example...
>
> > > name: Name
> > > type:
> > >   - type: map
> > >     items:
> > >       - name: firstname
> > >         type: scalar
> > >       - name: lastname
> > >         type: scalar
> > >   - type: scalar
>
> ...if I were trying to generate some kind of automatic web interface from this
> schema (as a developer), then I have a very simple data structure to work with.
> Basically, at each node, I grab the name, put that out as the label, then I grab
> the type, and pick the widget accordingly. Simple. If the type itself is a
> sequence in the YAML above, then I have to put some kind of widget in there to
> let the user pick which one of the available types they're entering.

I can see how a plain hierarchy like you have would make this very simple.
But my hierarchy isn't far off. If I limited the YPath syntax to simple names
then the difference between us is negligible. You call it syntactic sugar,
but really we're just carrying different data to accomplish the same thing.

> I'd rather work with just one schema, which makes sense to me both externally
> and internally.

Sounds like you've already made up your mind, which sucks because discussion
is just beginning. I still think you have a lot of limitations and excess
that can be eliminated. Honestly, I'm kind of confused. You've said yourself
that you think your schema examples are ugly. You seemed displeased in your
previous messages. And yet, you're not really giving my or Tom's suggestion a
chance. You're devouring it like it's schem-a-roni. What gives?

_why
|
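The path-keyed array schemas above can be checked with very little machinery if only simple paths are allowed. The sketch below is one possible reading, not a YPath implementation: it supports exact paths and a trailing `*` step only, and it assumes that `scalar` means "anything that is not a map or seq" (so an integer also counts as a scalar). The function names `matches`, `allowed_types` and `check_seq` are made up for illustration.

```python
def matches(value, type_name):
    """Test a loaded value against one of the type names used in the thread."""
    if type_name == "map":
        return isinstance(value, dict)
    if type_name == "seq":
        return isinstance(value, list)
    if type_name == "integer":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, int) and not isinstance(value, bool)
    if type_name == "scalar":
        return not isinstance(value, (dict, list))
    return False  # unknown type names never match


def allowed_types(schema, path):
    """Look up the exact path first, then the same path ending in '*'."""
    if path in schema:
        return schema[path]
    parent = path.rsplit("/", 1)[0]
    return schema.get(parent + "/*")


def check_seq(schema, base, items):
    """Return the paths of the sequence items that the schema rejects."""
    rejected = []
    for index, item in enumerate(items):
        path = "%s/%d" % (base, index)
        allowed = allowed_types(schema, path)
        if allowed is None or not any(matches(item, t) for t in allowed):
            rejected.append(path)
    return rejected


positional = {"/arr/0": ["scalar"], "/arr/1": ["integer"], "/arr/2": ["map"]}
wildcard = {"/arr/*": ["scalar", "seq"]}
```

With `positional`, each index has its own type list and any extra item is rejected because no path covers it. With `wildcard`, every item is checked against the same list, which gives the two meanings asked about above without changing the shape of the schema.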
From: Steve H. <sh...@zi...> - 2002-09-03 17:00:42
|
----- Original Message -----
From: "why the lucky stiff" <yam...@wh...>
>
> Well, can we work this out Steve? We could do this:
>
>   # .. in the /Name validating tree ..
>   map:
>     /firstname:
>       - str
>     /initial:
>       - str
>       - ~
>     /lastname:
>       - str:
>           block: literal
>

First of all, this does look better to me than your original version.

What would you do, though, if you wanted to add a regex for firstname? Once
the "firstname" node has multiple attributes, I would go to a map:

    name: firstname
    type: str
    regex: [A-Z][a-z]+

I admit that my schema is ugly for the simplest case, but I think it's more
flexible for the complex cases.

> Plus, how does an array
> definition look?
>
>   - type: seq
>     items:
>       - type: scalar
>       - type: integer
>       - type: map
>
> (Meaning 'an array of scalar, integer, map only, please!') Or would it be...
>
>   - type: seq
>     items:
>       - type: scalar
>       - type: seq
>
> (Meaning 'scalars or seq allowed in this array!') Or would you provide
> a way for both to be expressed? This is where I could see YPath excelling.
> Instead of having to learn the various ways to structure my sequence schema,
> I know the YPaths to get to what needs to be validated:
>
>   /arr/0: [scalar]
>   /arr/1: [integer]
>   /arr/2: [map]
>
>   /arr/*: [scalar, seq]
>

How about a hybrid:

    - type: seq
      widget: popup_dialog
      items:
        - 0:
            type: scalar
        - 1:
            type: integer
        - 2:
            type: map
        - *:
            type: [{type: scalar}, {type: seq}]

> [...]
> > I'd rather work with just one schema, which makes sense to me both externally
> > and internally.
>
> Sounds like you've already made up your mind, which sucks because discussion
> is just beginning.
>

No, please don't read my posts that way. I'm just trying to state my
objectives, which can probably be reached many different ways, and I am not
even sure my objectives make sense. Just going on the lessons that I've
learned from one schema-based project, which might not be the most
representative use case for most people.

One thing that was unique about our project was that we didn't hand-write any
of the schemas. We were porting the schemas from XML schemas that were done
in XML-Spy. In general, I don't think there will be too much hand-writing of
schemas in the long run. Eventually, we should have YAML-driven tools that
generate the schemas.

> I still think you have a lot of limitations and excess that can be eliminated.
> Honestly, I'm kind of confused. You've said yourself that you think your schema
> examples are ugly. You seemed displeased in your previous messages. And yet,
> you're not really giving my or Tom's suggestion a chance. You're devouring it
> like it's schem-a-roni. What gives?
>

I was hungry!
|
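As a rough illustration of why the verbose map is easy to program against, here is a sketch of a checker for the `firstname` node with its `regex` attribute. The dict literal stands in for the loaded schema node, and `check_scalar` is a hypothetical helper that handles only the `str` type. Note that the pattern is written here as a Python string; in a YAML document a value beginning with `[` would most likely need quotes.

```python
import re

# Stand-in for the loaded verbose schema node shown above.
FIRSTNAME = {"name": "firstname", "type": "str", "regex": "[A-Z][a-z]+"}


def check_scalar(schema_node, value):
    """Return None if the value conforms, else a short error message.

    Only the 'str' type is handled in this sketch.
    """
    label = schema_node["name"]
    if not isinstance(value, str):
        return "%s: expected str" % label
    pattern = schema_node.get("regex")
    # fullmatch requires the whole value to match, not just a prefix.
    if pattern is not None and re.fullmatch(pattern, value) is None:
        return "%s: %r does not match %s" % (label, value, pattern)
    return None
```

Adding another attribute, such as a maximum length, would mean one more key in the map and one more `schema_node.get(...)` in the checker, which is the flexibility being argued for.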
From: why t. l. s. <yam...@wh...> - 2002-09-03 17:33:06
|
Steve Howell (sh...@zi...) wrote:
> >
> > Well, can we work this out Steve? We could do this:
> >
> >   # .. in the /Name validating tree ..
> >   map:
> >     /firstname:
> >       - str
> >     /initial:
> >       - str
> >       - ~
> >     /lastname:
> >       - str:
> >           block: literal
> >
>
> First of all, this does look better to me than your original version.
>
> What would you do, though, if you wanted to add a regex for firstname? Once
> the "firstname" node has multiple attributes, I would go to a map:
>
>   name: firstname
>   type: str
>   regex: [A-Z][a-z]+

    /lastname:
      - str:
          regex: '[A-Z][a-z]+'

The mapping is for options.

> How about a hybrid:
>
>   - type: seq
>     widget: popup_dialog
>     items:
>       - 0:
>           type: scalar
>       - 1:
>           type: integer
>       - 2:
>           type: map
>       - *:
>           type: [{type: scalar}, {type: seq}]

Not a bad answer. That last line really throws me though at first glance. I
guess I'd really like to see the 'type' key go away with the type becoming a
string or a map key with options. Indeed, this would be nothing short of
frosty sno-cones.

How would the '*' work in a mapping though? Can you not have a map key with
an asterisk?

> No, please don't read my posts that way. I'm just trying to state my
> objectives, which can probably be reached many different ways, and I am not even
> sure my objectives make sense. Just going on the lessons that I've learned from
> one schema-based project, which might not be the most representative use case
> for most people.

Cool. Understood.

My use case is hand-writing. I'm going to spec out my ideas for YAML
structures with whatever this ends up being. But I can see your use case as
well. YPath would complicate things for languages which don't have YPath.
And I like your idea of using some of the YPath syntax to express the schema.
(Your '*' entry in the map above.)

Personally, I'm not against having more than one schema. I think there's
times when you need a standard and times when you need to try a couple things
out to see what works. I may just start !okay/schema as an experiment or as
an abstraction layer above your full-figured formatting.

_why
|
From: Steve H. <sh...@zi...> - 2002-09-03 20:07:37
|
----- Original Message -----
From: "why the lucky stiff" <yam...@wh...>
> > What would you do, though, if you wanted to add a regex for firstname? Once
> > the "firstname" node has multiple attributes, I would go to a map:
> >
> >   name: firstname
> >   type: str
> >   regex: [A-Z][a-z]+
>
>   /lastname:
>     - str:
>         regex: '[A-Z][a-z]+'
>

I go back to Tom's idea that it's almost like two layers of schema. Your
schema is a high-level representation that compresses out the most annoying
hash keys: "type" and "name." My schema is the low-level representation
that's easy to program off of:

    schema = yaml.load(steves_doc).next()
    make_html_widget(
        schema['name'],
        schema['type'],
        schema['regex'])

vs. maybe:

    schema = yaml.load(whys_doc).next()
    slashed_name = schema.keys()[0]
    name = slashed_name[1:]
    make_html_widget(
        name,
        schema[name]['type'],
        schema[name]['regex'])

It's the whole conservation of ugliness theory. The ugliness that you remove
from the schema can seep into the code, if you're not careful.

> [...]
> How would the '*' work in a mapping though? Can you not have a map key
> with an asterisk?
>

Don't know.

>
> My use case is hand-writing. I'm going to spec out my ideas for YAML structures
> with whatever this ends up being.
> [...]
>
> Personally, I'm not against having more than one schema. I think there's times
> when you need a standard and times when you need to try a couple things out to
> see what works. I may just start !okay/schema as an experiment or as an abstraction
> layer above your full-figured formatting.
>

Sounds good. I am all in favor of experimentation, and I like the layered
approach. Also, I really think that YAML is flexible enough to support
multiple flavors of schema, including one-off versions for certain
applications.
|
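One way to picture the layered approach is a small lowering step that turns the compact, path-keyed form into the verbose name/type maps, so application code only ever sees the low-level layer. This is a sketch under assumptions: the compact schema is taken to be already loaded into Python dicts and lists, a `~` entry is assumed to load as `None`, and `lower` is a hypothetical name rather than anything provided by PyYaml.

```python
def lower(compact):
    """Turn {'/lastname': [...]}-style entries into verbose schema nodes."""
    verbose = []
    for slashed_name, type_entries in compact.items():
        alternatives = []
        for entry in type_entries:
            if isinstance(entry, dict):
                # Single-key map: the key is the type, the value holds options.
                (type_name, options), = entry.items()
                alt = {"type": type_name}
                alt.update(options or {})
            else:
                # Bare entry: a type name, or None for YAML's '~'.
                alt = {"type": entry}
            alternatives.append(alt)
        node = {"name": slashed_name.lstrip("/")}
        if len(alternatives) == 1:
            # One allowed type: flatten it into the node itself.
            node.update(alternatives[0])
        else:
            # Several allowed types: keep them as a list under 'type'.
            node["type"] = alternatives
        verbose.append(node)
    return verbose


# Stand-in for the loaded compact schema.
COMPACT = {
    "/lastname": [{"str": {"regex": "[A-Z][a-z]+"}}],
    "/initial": ["str", None],
}
```

After lowering, the first access pattern above applies to both flavors: each node has `name` and `type` keys, plus whatever options the compact form carried, so the slicing and key-guessing stays in one place instead of seeping into every consumer.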
From: why t. l. s. <yam...@li...> - 2002-09-03 14:49:16
|
Steve Howell (sh...@zi...) wrote:
> Why wrote:
> >
> > Here's my idea of a cool schema for the above:
> >
> >   --- #YAML:1.0 !whys/schema
> >   /:
> >     map: {}
> >   /Name:
> >     str: {}
> >     seq:
> >       /firstname:
> >         str: {}
> >       /initial:
> >         str: {}
> >         ~: {}
> >       /lastname:
> >         str: {}
> >
>
> First, let me make a minor clarification--in my original example, I didn't have
> any sequences. The Name key was intended to map to a value that was either a
> map with firstname and lastname as the keys, or to a simple scalar.

Error on my part. The 'seq' key should be a 'map'. Sorry bout that. It was
1AM.

> Now, on to business. One concern about the ypath syntax--does the full ypath
> syntax have to be supported? If so, I think it might be quite difficult to
> implement a schema-validating parser on top of a simple pull parser.

I'm not sure if all of YPath would be needed or not. I think most schemas are
pretty straightforward. I envision that just simple paths would be needed.

> I do like the conciseness of Why's syntax, although the empty hashes at the
> leaves of the tree are kind of ugly.

Well, can we work this out Steve? We could do this:

    # .. in the /Name validating tree ..
    map:
      /firstname:
        - str
      /initial:
        - str
        - ~
      /lastname:
        - str:
            block: literal

Visually this makes sense to me. The type list is an array of strings. If a
certain type entry has additional data, then you could make the string a map
key with the mapping storing other data.

It would help compress the schema as well:

    --- #YAML:1.0 !whys/schema
    /: [map]
    /Name:
      - str
      - seq:
          /firstname: [str]
          /initial: [str, ~]
          /lastname: [str]

> Look at my example again:
>
> > > name: Name
> > > type:
> > >   - type: map
> > >     items:
> > >       - name: firstname
> > >         type: scalar
> > >       - name: lastname
> > >         type: scalar
> > >   - type: scalar
>
> It seems like the essential problem is that at any node, we have two pieces of
> information--name and type--that carry equal importance. You have two choices
> in how you distinguish the types of metadata:
>
> 1) You can use a simple YAML map like I have.
> 2) You can use more syntactic sugar, like Why and Tom have.
>
> These are my problems with syntactic sugar:
>
> 1) The punctuation can get ugly--both aesthetically and from an escaping/parsing
> standpoint.
> 2) You lose the ability to think in terms of simple data structures for
> schema-based applications.

Yeah, that's interesting. Because I can't see myself working well with the
schema you have. It's too verbose to hand write. Plus, how does an array
look?

    - type: seq
      items:
        - type: scalar
        - type: integer
        - type: map

(Meaning 'an array of scalar, integer, map only, please!') Or would it be...

    - type: seq
      items:
        - type: scalar
        - type: seq

(Meaning 'scalars or seq allowed in this array!') Or would you provide a way
for both to be expressed? This is where I could see YPath excelling. Instead
of having to learn the various ways to structure my sequence schema, I know
the YPaths to get to what needs to be validated:

    /arr/0: [scalar]
    /arr/1: [integer]
    /arr/2: [map]

    /arr/*: [scalar, seq]

> One nice thing about a verbose schema is that it takes one line to parse--do
> your YAML load--and then the parsed structure looks *exactly* like the schema
> itself. Again, going back to my example...
>
> > > name: Name
> > > type:
> > >   - type: map
> > >     items:
> > >       - name: firstname
> > >         type: scalar
> > >       - name: lastname
> > >         type: scalar
> > >   - type: scalar
>
> ...if I were trying to generate some kind of automatic web interface from this
> schema (as a developer), then I have a very simple data structure to work with.
> Basically, at each node, I grab the name, put that out as the label, then I grab
> the type, and pick the widget accordingly. Simple. If the type itself is a
> sequence in the YAML above, then I have to put some kind of widget in there to
> let the user pick which one of the available types they're entering.

I can see how a plain hierarchy like you have would make this very simple.
But my hierarchy is straightforward as well. If I limited YPath to simple
names then the difference between us is negligible. You call it syntactic
sugar, but really we're just carrying different data to accomplish the same
thing.

> I'd rather work with just one schema, which makes sense to me both externally
> and internally.

Sounds like you've already made up your mind, which sucks because discussion
is just beginning. I still think you have a lot of limitations and excess
that can be eliminated. Honestly, I'm kind of confused. You've said yourself
that you think your schema examples are ugly. You seemed displeased in your
previous messages. And yet, you're not really giving my or Tom's suggestion a
chance. They appear to be your cannon fodder. Hopefully, you can help us
pull out the advantages of each to construct something better.

_why
|