From: Clark C . E. <cc...@cl...> - 2002-09-05 18:53:27
Mike Orr wrote:
| A few more thoughts about string values and types. Part of it may seem
| to advocate strong typing and another part weak typing, but it's not
| a contradiction, just an acknowledgement that in some cases the one
| strategy is more convenient, and sometimes the other.

Thanks Mike. Indeed this is a very difficult issue, and our "graph
model" reflects a compromise. Each scalar is a string coupled with type
information. It's a middle-of-the-road approach, and the libyaml parser
(without a specific language binding) will provide direct access to this
information. What I think you and Steve are getting at is how to make a
custom loader, that is, one that doesn't necessarily use the type
information (but only the string). This is why the information model
separates parsing from loading.

| I decided to go back to MySQL as the storage format for my event
| calendar, partly because it offers more type choices and mature
| querying-by-date, but also because I like the reassurance of knowing
| each field is a certain type and cannot store another type. I'm looking
| at YAML for this application now more as an export/edit/import format
| rather than a storage format, alongside other editing formats (e.g., a
| wxWindows GUI).

YAML isn't and will never be a replacement for a relational database
store, which is optimized for serious indexing and querying. However,
the type system of YAML should be able to support clean movement of
information between databases and other systems. Eventually YAML will
get very serious path and query/transform tools, and at that point you
may want to consider using it for light-weight storage of simple
"files".

| The "contradictory" part is, I started out being glad YAML has all these
| types (string, integer, null, timestamp), but I ended up thinking I'd
| rather use it just as a string parser.
| We've talked about the
| limitations of date regexes

I'm still interested in the specific problems you are having regarding
the timestamp.

| With some applications, you want the reassurance that the value will be
| a string without forcing the .yml file maintainer to remember all the
| conversion and quoting rules. For instance, a string value may happen
| to look like a decimal integer. That's fine in Perl which automatically
| converts between int and string. But in Python, if you do a string
| operation on an integer you get an exception. Likewise, Perl
| automatically converts null (undef) to/from '', but Python raises an
| exception if you use null (None) as '' or vice-versa. So I want strong
| typing on predefined (database) values, but weak typing ("everything is
| a string") on "unreliable" values (web input to a CGI script, a
| configuration file) that still needs to be checked and validated.

Well, YAML's core purpose is data serialization and messaging. You've
identified above why a common set of type definitions is very useful
for moving data between languages/environments.

| So I like the idea of a "scalar values are always strings" option.
| Whether it's the default or not I don't care, as long as I can set it
| conveniently in one step.

This is the "graph model" view of YAML, where scalars are strings plus
a type family. Right now PyYaml merges the parsing and loading processes
so that you only have access to the Python language binding of YAML.
Thus, you don't really have direct access to the graph model...

| I also like the idea of layering we've been discussing: having an
| optional middle layer above the parser that does the type conversions.
| Whether that's part of YAML or above YAML I don't care, as long as
| there's a standardized way to do it so the app developer doesn't
| have to recreate it from scratch each time.
I'm very worried about this layer if it requires access to a third
document in order to give information about which nodes have which
types. This approach is very good for human readability, but costs
quite a bit as far as administration goes (the need for a local
registry, etc.), especially if the process works with more than one
language.

| Finally, while some applications may prefer type identifiers to be
| encoded right in the .yml document as part of the value (!!mx.DateTime),
| mine would prefer not. It reminds me too much of a Roxen configuration
| file (which puts <int>...</int> or such around integer values, for
| instance). I'd rather have the .yml maintainer and the YAML parser know
| by external knowledge that certain fields are certain types, and that
| its ostensible string value is really a constructor argument for a type
| converter, rather than putting !!type in the document or having to put
| single quotes around the value.

Yes, an external document to replace the "built-in" type bindings may
be a good idea. Right now I think that this is 6 months out at the very
least, as there are many ways to do it and lots of small issues which
will emerge. Certainly one or more bindings could hack it out in a
week... this isn't the issue. The issue is formalizing the mechanism so
that it can be used across the board. I'm really cautious about this...
many people jumped to XML because the DTDs weren't required. It'd be
nice to have a good 80% type system and then have something more
complicated for the advanced people.

Best,

Clark
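
To make the "graph model" idea from this message concrete, here is a
minimal Python sketch of a scalar that is a string plus a type family,
with two loaders on top of it: one that uses the type information and
one that ignores it ("scalar values are always strings"). All names
here (ScalarNode, CONVERTERS, load_typed, load_as_string, and the type
family labels "str", "int", "null") are hypothetical and chosen only
for illustration; they are not the libyaml or PyYaml API.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class ScalarNode:
    """Hypothetical graph-model scalar: the raw string plus a type family."""
    value: str
    type_family: str  # illustrative labels only, e.g. "str", "int", "null"


# Hypothetical converter table used by the "typed" loader.
CONVERTERS: Dict[str, Callable[[str], object]] = {
    "str": str,
    "int": int,
    "null": lambda _text: None,
}


def load_typed(node: ScalarNode) -> object:
    """Loader that honours the type family attached to the scalar."""
    return CONVERTERS[node.type_family](node.value)


def load_as_string(node: ScalarNode) -> str:
    """Custom loader that ignores the type family and keeps the string."""
    return node.value


# The same parsed nodes can be handed to either loader.
nodes = [
    ScalarNode("42", "int"),
    ScalarNode("~", "null"),
    ScalarNode("hello", "str"),
]
typed = [load_typed(n) for n in nodes]
strings = [load_as_string(n) for n in nodes]
```

Because parsing (producing the nodes) is kept separate from loading
(turning nodes into native values), an application could pick the
string-only loader for "unreliable" input and the typed loader for
predefined values without touching the parser.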