From: <ir...@ms...> - 2002-09-08 23:40:50
I tried to read the spec again yesterday, but again it made my head spin. I don't have a background in tokenizer/parser theory so I can't say which parts are necessary and which are not. I just noticed a lot of words like "syntax", "serial", "native", "generic", "parser", "loader", "dumper", "emitter", "type family", "kind", "graph", "transfer", "taguri" that are too much to assimilate all at once. It leaves me with the feeling -- and this is what I told Steve -- "Does YAML really need a spec that's 73 pages long?" If it does, maybe it's not a "simple" replacement for XML, and maybe it's not the tool for small jobs.

Now, Python's grammar and parser module (AST objects) also make my head spin for the same reason. But fortunately they take up only one small corner of the documentation and don't prevent me from *using* Python. They do prevent me from using the parser module to write a program that lists all a program's dependencies (imports), a project I've tried to do on and off for years but have always given up on. (Yes, I could do a text search for "import" but I want to use Python's real parser.) But that's just one insignificant application, not something that prevents me from using Python.

XML on the other hand requires you to learn the gory details of DTDs, namespaces, XSLT, etc., in order to do anything more than the simplest file. You can't even use most public DTDs without a pretty intricate knowledge of them, much less add a couple private tags to them, and the whole thing gives you a sense of "Why does this have to be so complicated? Why do I have to specify so many details just because some huge corporation needs them for their documents?"

YAML has a large gap between the quick reference on the site (and the other tutorials) and the spec. The tutorials get you started but then what? For instance, the date format.
Four formats, but no explanation about what it's supposed to be used for, what it's not supposed to be used for, and why it doesn't handle certain less-precise formats users would commonly want. I realize YAML is a new and rapidly-changing project and that this will come. Still, these are the kinds of problems users face.

The gap can eventually be filled by a big users' guide so that most people don't have to look at the spec, -or- by simplifying the spec so it can double as a users' guide. For instance, do we need five quoting styles for strings (bare, '', "", |, > )? Or 3+ styles for single- and multi-line mappings? I wasn't here when each style was adopted, and maybe they are all necessary for significant problem domains, but it's worth considering whether any can be dropped. That would shorten the spec and the quickref, and perhaps make YAML more accessible. At the least, it would make the Perl/Python/Ruby implementors' job easier, not having to code so many special cases. Note that I'm not saying we need to do something about strings and mappings in particular; I'm just using them as examples.

I like Oren's direction in his big message today. Supporting schema-blind and schema-specific usages equally seems wise, and I like the idea of banishing type conversions to the smallest portion of code possible.

> | Nobody's goldfish is gonna die if I provide this *optional* method.
>
> This is where we differ. No body is going to die if _why doesn't
> throw an error when he finds duplicate keys in his loader.

It's possible for the loader to have a default recommended mode that does whatever, -and- provide options like "extract mappings as pairs", "all values are strings", or anything we find a significant constituency for in the future. The options make certain uses of YAML possible *without* detracting from the core/recommended usage. (Now, maybe Oren's schema proposal will handle all the situations these options would, we'll see.)
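
To make the options idea concrete, here is a minimal sketch in Python. It is not any real YAML loader: `load_flat`, `pairs`, and `strings_only` are hypothetical names invented for illustration, and the input is assumed to be a toy subset of flat `key: value` lines. The default mode rejects duplicate keys and converts integers; the two options switch those behaviours off without touching the default path, and the type conversion lives in one small function.

```python
def _convert(value):
    # The only place type conversion happens in this sketch:
    # integers become int, everything else stays a string.
    try:
        return int(value)
    except ValueError:
        return value


def load_flat(text, pairs=False, strings_only=False):
    """Load flat 'key: value' lines (a toy subset, not real YAML).

    pairs=True        -> return a list of (key, value) tuples,
                         keeping duplicate keys and their order.
    strings_only=True -> skip type conversion; every value is a str.
    """
    items = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {raw!r}")
        key, value = key.strip(), value.strip()
        if not strings_only:
            value = _convert(value)
        items.append((key, value))

    if pairs:
        return items

    # Default (recommended) mode: a mapping, with duplicate keys rejected.
    result = {}
    for key, value in items:
        if key in result:
            raise ValueError(f"duplicate key: {key!r}")
        result[key] = value
    return result
```

With this sketch, `load_flat("a: 1\na: 2\n")` raises `ValueError`, while the same text loaded with `pairs=True` keeps both entries. Whether a real loader should expose such switches is exactly the open question above.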
Perhaps some comparisons between YAML and Cheetah would be helpful. I'm a developer for Cheetah, a string template system for Python (www.cheetahtemplate.org). I don't work on the parser code (can you guess why?), but I do other coding and also wrote the users' guide and developers' guide.

1) We want Cheetah to be the template system of choice for the widest variety of tasks. Its primary purpose is for Webware servlets, and we won't add anything that detracts from that. But we've added features needed by CGI scripts, Python source generators, Java source generators, and shell scripts running templates as standalone commands. We don't add features one person needs. But we add features that make Cheetah suitable for an entire class of applications it wasn't before. Anyway, I see YAML as being able to fulfill a similar role in the data-storage/data-exchange realm. Certain core uses, but also flexible for other uses on the side.

2) Rather than working from a written grammar and spec, Cheetah's design was based on a spec inside somebody's head. Then the users' guide was written, and I'm now writing an EBNF grammar based on the users' guide. Not a suitable approach for an interoperable project like YAML, but it's interesting that the two grew in almost opposite directions.

3) YAML is in rapid change now anticipating a 1.0 spec freeze. Cheetah was in rapid change last summer and then entered a long beta while we slowly decided whether we're all comfortable with the current spec. Since there haven't been any requests for backward-incompatible changes for several months (except one), we're pretty confident about entering a final beta as soon as a few small tasks are completed. There are plenty of requests after that, but they are all behind-the-scenes work or adding optional features, not things that would change the spec. It's like popping popcorn, or at least that's been our experience.
Eventually the popping slows down and you can freeze the spec, confident most people are comfortable with it. And if you end up in a stalemate argument, the BDFL says this is the way it's gonna be, and the dissenters either take it or leave it. But usually the best design emerges on its own.

Thanks to Brian, Clark and Oren for putting all this work into what might turn out to be a lifesaver for XML refugees.

--
-Mike (Iron) Orr, ir...@ms... (if mail problems: ms...@oz...)
http://iron.cx/ English * Esperanto * Russkiy * Deutsch * Espan~ol