From: Keith D. <ya...@ke...> - 2002-06-17 09:07:16
Hi all,

I've been following the discussion about quoting rules, and I just wanted to put in my 2c. If I misunderstand the issue or repeat something that's already been said and rejected, please be nice when you correct me :)

With that out of the way... first, my understanding of the issue. We want YAML to be human writable. To accomplish this, we want implicit types wherever possible. However, if the implicit types are defined in a way that isn't *completely* obvious, it's really easy to say something you don't mean, as we've seen with the ending period example. Furthermore, we want to make sure we don't give ourselves a backward compatibility headache later because we've been too free with what we accept.

To start with: a period on my screen is ONE PIXEL. I don't want something I type to turn from a string into something invalid because I have a one pixel dot on the end. For the sake of argument, say you accept spaces in an implicit string, and I have my address in YAML. My address is `99 Ronald Ct`. If I change it to `99 Ronald Ct.` it suddenly isn't a string anymore. BAD.

So, here's my proposal. ALL inline strings have to be quoted, with either single or double quotes (normal escaping rules apply). Therefore, there's nothing implicit about strings. Furthermore, once you do that, you no longer have to worry about what counts as an implicit string when you're deciding what can be implicit for your other types. (You don't have to worry about whether 2002-02-02 is a string or a date.) More importantly, you don't have to worry about backward compatibility issues: strings will always be quoted, and be *obviously* strings. DONE. Forever.

This gives us a lot of neat benefits. I'll list them here for our enjoyment. To reiterate:

1. No backward compatibility issues: strings are quoted, that's it. New types can fit in without any trouble.

2. No surprises! (One pixel shouldn't change the meaning of some text. Same goes for the few pixels required for a comma or an apostrophe.)

3. Frees up unquoted text to be implicitly whatever we want. To give a bunch of examples (using backquote to delineate YAML text): null can stay as `~`, or be `none`, `nil`, `null`, or all of the above can be accepted. Similarly, booleans can just be `yes`, `no`, `true`, `false`, `on`, and `off`. Numbers can look like numbers. You can even recognize things like URLs, dates and times, e-mail addresses, IP addresses, filenames, etc. See the comments about REBOL below.

4. This is *much* better defined than "unquoted strings must match this regex, otherwise you need to put quotes around them". It takes a lot more brain cycles to constantly check that your "string" matches the string production in the YAML grammar. Finally, since *most* strings have to be quoted anyway, why not make all strings quoted, and give yourself all the benefits I'm listing?

5. This simplifies the parser a lot. If it's quoted, it's a string, and you're done. This means that anything else is a special value (like `yes` or `null`), and if it isn't, it's an error.

I think we can learn a lot from what REBOL has done with implicit types. REBOL has about 20 built-in datatypes and doesn't get confused. Strings are always quoted or surrounded by curly braces. REBOL understands e-mail addresses and URLs natively, and it understands certain date and time formats natively, as well as boolean values, numbers, money, none (REBOL's null type), tuples (like IP addresses), tags (like XML or HTML tags), file names (in a canonical cross-platform format), words (symbols in the REBOL language; think Lisp/Scheme), and more. See http://www.rebol.com/docs/core23/rebolcore-2.html#sect2 for a gentle introduction to REBOL's types, and see http://www.rebol.com/docs/core23/rebolcore-16.html for a little more depth than you probably need right now.
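
To make point 5 a bit more concrete, here is a rough Python sketch of the rule "quoted means string, anything else is a known special value or an error". It is only an illustration of the idea, not a proposed implementation: the function name `parse_scalar`, the regexes, and the exact set of accepted spellings are my own assumptions (the spellings are taken from the examples in point 3), and escape handling inside quotes is left out.

```python
import re
from datetime import date

# Assumed spellings, taken from the examples in point 3 above.
NULLS = {"~", "none", "nil", "null"}
TRUE = {"yes", "true", "on"}
FALSE = {"no", "false", "off"}

# Assumed, deliberately simple patterns for numbers and YYYY-MM-DD dates.
INT_RE = re.compile(r"[-+]?\d+")
FLOAT_RE = re.compile(r"[-+]?\d+\.\d+")
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def parse_scalar(text):
    """Classify one inline scalar under the 'all strings are quoted' rule."""
    text = text.strip()
    # Quoted text is a string, and we're done (escape processing omitted).
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return text[1:-1]
    # Unquoted text must be one of the known special values.
    if text in NULLS:
        return None
    if text in TRUE:
        return True
    if text in FALSE:
        return False
    if INT_RE.fullmatch(text):
        return int(text)
    if FLOAT_RE.fullmatch(text):
        return float(text)
    match = DATE_RE.fullmatch(text)
    if match:
        return date(*map(int, match.groups()))
    # Anything else is an error, never a silent fallback to string.
    raise ValueError(f"unquoted scalar is not a known type: {text!r}")
```

With this sketch, `parse_scalar('"99 Ronald Ct."')` gives back a string no matter what punctuation it contains, `parse_scalar("2002-02-02")` gives a date, and an unquoted `99 Ronald Ct.` raises an error instead of quietly changing type.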

Finally, if you accept a few date formats (like ISO 8601) natively, it doesn't make parsing any harder, especially since you don't have to figure out whether something is a string or not. This is going a little bit off-topic, but if you decide on a date/time format such as YYYY-MM-DD HH-MM-SS (and maybe a timezone), you don't need a "!" to tell the YAML parser that you want a date. It'll know, because it doesn't have to worry about the value being a string. You can include just the date, or just the time, etc.

I'll stop here. Take a look at how REBOL does it; I think it makes sense. Any major objections?

Respectfully,
Keith