From: Oren Ben-K. <or...@ri...> - 2002-06-16 07:23:46
|
Ryan King [mailto:rk...@pa...] wrote:
> Is it true that, under the new spec, quotes are required for
> strings containing anything other than alphanumeric, spaces, and
> underscores?

Yes and no :-) If you want the value to be considered a string, then in general yes. You forgot to mention '-' which is allowed. So, your example can be written as:

    Foo-bar: 'baz.'

Have fun,

Oren Ben-Kiki
|
From: <sh...@zi...> - 2002-06-17 01:40:51
|
> On 2002.06.16, Oren Ben-Kiki <or...@ri...> wrote: > > Ryan King [mailto:rk...@pa...] wrote: > > > Personally, I would go the complete opposite direction, and make the > quoting rule this: "You only must quote that which is syntactically > ambiguous." > > Pros: > - It makes the output quieter. Ticks and double-ticks are noisy, > and generally of no help to the humans, who can usually tell what > is meant without them. > > - It makes the input easier for humans. A machine doesn't care if > it has to put extra escaping around a string, but a human does. > Therefore, we should minimize the amount of escaping needed by a > hand-edited file, etc.), even if it means we trade a little > syntactic simplicity for machine-generated files. > I really like the goal of making YAML a human-writable format. It's an important niche for YAML to fill. If you just want persistence, most languages have alternatives. If you want interoperability, XML's not a bad choice, despite all its warts. But if you want human-writability, I don't really know many good alternatives to YAML. > Cons: > - Causes migration issues as new implicit types are needed. > YAML's still young enough for this not to be a big issue. > - Causes implementation issues (codifying "that which is > syntactically ambiguous" into a set of productions would be a > fairly challenging task) > Yep, but I think the cost/benefit ratio here could be well worth it. > I recognize that the implicit types issue is a major caveat with my idea > as-is. What if, then, we went with a different approach toward implicit > types: To do away with them. > > Here is a theoretically possible rule set: > > - Every leaf is a string by default > > - Leaves preceded by "!foo" are given type "foo" (as they currently > do) > > As a syntactic optimization, we could add this rule: > > - Leaves preceded by lone !'s are pattern-matched to determine types > > Examples: > "5" # A string: > 5 # Also just a string: > !int 5 # This parses as an "int": > ! 
5 # Also parses as an "int", because the string "5" pattern-matches > I like this solution, although maybe !! would be better. |
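The rule set quoted above (every leaf is a string, `!foo` names a type explicitly, a lone `!` asks for pattern matching) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the name `resolve_leaf` and the pattern table are hypothetical, the thread does not say which regexps a lone `!` would consult, and real YAML quoting and escaping rules are ignored.

```python
import re

# Hypothetical pattern table; the thread does not fix these regexps.
IMPLICIT_PATTERNS = [
    ("int", re.compile(r"[-+]?[0-9]+")),
    ("float", re.compile(r"[-+]?[0-9]+\.[0-9]+")),
]


def resolve_leaf(text):
    """Return (type_name, value_text) for one leaf as written in the file."""
    text = text.strip()
    # A double-quoted leaf is always a string.
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return ("str", text[1:-1])
    if text.startswith("!"):
        tag, _, rest = text[1:].partition(" ")
        rest = rest.strip()
        if tag:
            # "!int 5": the type is stated explicitly.
            return (tag, rest)
        # "! 5": a lone '!' asks for pattern matching.
        for name, pattern in IMPLICIT_PATTERNS:
            if pattern.fullmatch(rest):
                return (name, rest)
        return ("str", rest)
    # No '!' at all: the leaf stays a string, whatever it looks like.
    return ("str", text)
```

Under these rules an application that never looks at types can treat every unmarked leaf as text, which is the property the proposal is after.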
From: Oren Ben-K. <or...@ri...> - 2002-06-17 07:36:24
|
Brian Ingerson [mailto:in...@tt...] wrote: > I cautiously concur. Perhaps we've oversimplied the simple scalar. For > instance, this example in the spec is now wrong: > > first: There is no unquoted empty string. > second: 12 # This is an integer. > third: !str 12 # This is a string. > span: this contains # Interleaved > three spaces # comments. > > The final period makes it invalid. Sure, you can call it an > oversite, but if > /we've/ made the oversite, think how often other people will... My oversight - I didn't go over the simple examples to ensure they are still simple. There are probably other such bad examples now... > In my opinion, this could really make or break the "YAML as > config file" use case. Hmmm. > > Here is a theoretically possible rule set: > > > > - Every leaf is a string by default > > > > - Leaves preceded by "!foo" are given type "foo" (as > > they currently do) > > > > As a syntactic optimization, we could add this rule: > > > > - Leaves preceded by lone !'s are pattern-matched to > > determine types > > > > Examples: > > "5" # A string: > > 5 # Also just a string: > > !int 5 # This parses as an "int": > > ! 5 # Also parses as an "int", because the string > > # "5" pattern-matches That would work. In fact '! 5' has this meaning today (a standalone '!' forces implicit typing). > I'd be perfectly happy with: > > string: 2001-12-14 > date: ! 2001-12-14 > int: 42 > float: 3.12 > null: ~ > bool: ! true > url: ! http://www.yaml.org > ipaddr: ! 192.168.55.9 That would make integers, floats and null "more magical" than anything else - a very special case. Let's see how this would look with a consistent '!' for implicit: string: 2001-12-14 date: ! 2001-12-14 int: ! 42 float: ! 3.12 null: ! ~ or: ! null # Hmmm. We could use '.null' instead of or # in addition to '~', in today's scheme. # Should we? bool: ! true url: ! http://www.yaml.org ipaddr: ! 192.168.55.9 Some gut reactions: - People may forget the '!' 
and be confused about why the file doesn't work as they expect. It is one of those things you'll need a buddy to look over your shoulder to see... while a quoted '2001-12-14' is *obviously* a string. On the other hand, today people may put non-word/ /_ characters in a simple value and be confused as to why *that* doesn't work. No clear winner here, I guess. - Having the '!' for numbers is butt-ugly. For the other cases, it is "less bad". Especially if we use 'null' instead of '~'. So, it seems to me that if we go that way, only numbers - integers and floats - should be considered "basic" enough to warrant omitting the '!'. And I'd still feel uncomfortable about that :-) - It still takes getting used to. I'm not certain how users would react. The current way is *much* cleaner, when one is using an implicit type... So, I have mixed feelings about this. > address: 123 Main Street Yes, that's nice. > movie: The Good, the Bad, and the Ugly > title: YAML: The new frontier Wait! Here you are suggesting an independent relaxation. You are implying that if a simple value isn't contained in an in-line collection it may contain in-line indicators. Example: movie: The Good, the Bad, and the Ugly movies: [ 'The Good, the Bad, and the Ugly', For a fistful of dollars ] It is possible to make this special case... Do we want to? > --- > > Side note: we might want to consider '!!' for private > implicits. That would > match private explicits. Double hmmm. :-) > I know it's a lot to swallow at this point. But let's give it > some thought please. Let's. You are proposing a tradeoff - let's make implicit types "less elegant", in order to make simple strings "more elegant". This seems a zero-sum game - we can't have both, they are competing for the same "syntax space". Or are they...? Let's see if we can't have a win-win here. The problem is future implicit types. Otherwise we could just say that any simple value not matching any regexp is a string. That would be the best of all worlds. 
But we can't do that - we want to leave the door open for additional types. OK, we need to commit on what is a string. So, maybe we could work around the problem by just relaxing the regexp for implicit strings? We just restricted it from "starting with alpha" to "contain word/ /_". Perhaps *this* is the problem. How about we say that it must: - Begin with alpha (or '_'). Yes, I know, the address example breaks. But that's really a very special case - and it doesn't relate to our main use case of configuration files etc. - Contain only word, '_', ' ', and some other characters we specify that would "reasonably" appear in "simple text". That is, punctuation characters: ',', '.', ';', '!', '?', '/', '''', '"', , '(', ')'. - Note the above excludes ':'. Not because of ambiguity (see below), but because of URIs. It is possible to include ':' in the allowed characters and still have URLs be implicit types (giving up on URIs) if we do one of the following: - Add a rule that forbids any punctuation character to be doubled. This would give us URLs as implicit types (they all contain '//'). General URIs would still be a problem. However, general URIs aren't a very common data type anyway. - Simply forbid '/' in an implicit string, giving up on "this is a yes/no answer." type of sentences. That's simpler. Again it gives us URLs but not general URIs. To complement relaxing the string regexp we'd need to: - Relax the restrictions for space and in-line indicators. Today we are very restrictive; they are completely banned from all simple values. 
However:

    '- ' may be used except for at the start of a simple key;
    '[' may be used except for at the start of a simple scalar;
    ': ' and ']' may be used except for inside in-line sequences;
    ', ' may be used except for inside in-line collections.

This gives us four simple variants:

- Simple key, not in inline collection
- Simple value, not in inline collection
- Simple scalar in inline sequence
- Simple scalar in inline mapping

Each would have a slightly different set of restrictions.

So, to summarize, the alternative is:

- Relax the string regexp to "anything that seems like a sentence".
- Relax the simple value restrictions to "anything that isn't ambiguous".
- Oh, and allow .null for nulls in addition to ~ :-)

This seems a win-win; we keep the elegant simple strings *and* the elegant implicit values, at the cost of - again - implementation complexity (not too much of it).

Thoughts?

Have fun,

Oren Ben-Kiki
|
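A rough Python sketch of the relaxed string regexp described in this message, assuming it means: first character a letter or '_', remaining characters word characters, spaces, or the punctuation that is listed. The list in the message has a gap, which is left unfilled here; ':' is excluded as stated. The names `is_implicit_string` and `forbid_doubled` are hypothetical, and the optional "no doubled punctuation" rule is included only to show its effect.

```python
import re

# Punctuation listed in the message; ':' is deliberately absent.
PUNCT = ",.;!?/'\"()"

# Assumed reading: begin with a letter or '_', then word characters,
# spaces, or listed punctuation.
RELAXED_STRING = re.compile(r"[A-Za-z_][\w " + re.escape(PUNCT) + r"]*")

# Any listed punctuation character immediately repeated, such as '//'.
DOUBLED_PUNCT = re.compile("([" + re.escape(PUNCT) + r"])\1")


def is_implicit_string(value, forbid_doubled=False):
    """True if value would count as an unquoted string under this reading."""
    if not RELAXED_STRING.fullmatch(value):
        return False
    if forbid_doubled and DOUBLED_PUNCT.search(value):
        return False
    return True
```

With this reading "The Good, the Bad, and the Ugly" is a string, while "123 Main Street" (starts with a digit) and "YAML: The new frontier" (contains ':') are not, matching the trade-offs named in the message.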
From: Keith D. <ya...@ke...> - 2002-06-17 09:07:16
|
Hi all, I've been following the discussion about quoting rules. I just wanted to put in my 2c - if I misunderstand the issue or repeat something that's already been said and rejected, please be nice when you correct me :) With that out of the way... first, my understanding of the issue: We want YAML to be human writable. To accomplish this, we want implicit types wherever possible. However, if your implicit types are defined in a way that isn't *completely* obvious, it's real easy to say something you don't mean, as we've seen with the ending period example. Furthermore, we want to make sure we don't give ourselves a backward compatibility headache later because we've been too free with what we'll take in. To start with: a period on my screen is ONE PIXEL. I don't want to have something I type turn from a string into something invalid because I have a one pixel dot on the end. For the sake of argument, say you accept spaces in an implicit string, and I have my address in YAML. My address is 99 Ronald Ct. If I change it to 99 Ronald Ct. it suddenly isn't a string anymore. BAD. So, here's my proposal. ALL inline strings have to be quoted, either with a single or double quote (normal escaping rules apply). Therefore, there's nothing implicit. Furthermore, if you do that, you don't always have to worry about what is defined as an implicit string when you're worrying about what can be implicit with your other types. (You don't have to worry about whether 2002-02-02 is a string or a date.) More importantly, you don't have to worry about backward compatibility issues -- strings will always be quoted, and be *obviously* strings. DONE. Forever. This gives us a lot of neat benefits. I'll list them here for our enjoyment: To reiterate: 1. No backward compatibility issues: strings are quoted, that's it. New types can fit in without any trouble. 2. No surprises! (One pixel shouldn't change the meaning of some text. 
Same goes for the few pixels required for a comma, or apostrophe). 3. Frees up unquoted text to be implicitly whatever we want. To give a bunch of examples (using backquote to delineate YAML text). Null can stay as `~`, or be `none`, `nil`, `null` or all of the above can be accepted. Similarly: Booleans can just be `yes`, `no`, `true`, `false`, `on`, and `off`. Numbers can look like numbers. You can even recognize things like urls, dates and times, e-mail addresses, ip addresses, filenames, etc. See comments about REBOL below. 4. This is *much* better defined than "unquoted strings match this regex", otherwise you need to put quotes around it. It takes a lot more brain cycles to constantly be checking to make sure your "string" matches the string production in the YAML grammar. Finally, since *most* strings have to be quoted, why not make all strings quoted, and give yourself all the benefits I'm listing? 5. This simplifies the parser a lot. If it's quoted, it's a string, and you're done. This means that anything else is a special value (like `yes`, or `null`), and if it isn't, it's an error. I think we can learn a lot from what REBOL has done with implicit types. REBOL has like 20 built in datatypes and doesn't get confused. Strings are always quoted or surrounded by curly braces, REBOL understands e-mail addresses and URLs natively, it understands certain date and time formats natively, boolean values, numbers, money, none (REBOL's null type), tuples (like IP addresses), tags (like XML or HTML tags), file names (in a canonical cross-platform format), words (symbols in the REBOL language - think lisp/scheme), and more. See http://www.rebol.com/docs/core23/rebolcore-2.html#sect2. for a gentle introduction to REBOL's types, and see http://www.rebol.com/docs/core23/rebolcore-16.html for a little more depth than you probably need right now. 
Finally, if you accept a few date formats (like ISO8601) natively, it doesn't make it any harder to parse, especially since you don't have to figure out whether it's a string or not. This is going a little bit off-topic, but if you decide on a date/time format such as YYYY-MM-DD HH-MM-SS (and maybe a timezone), you don't have to have a "!" to tell the YAML parser that you want a date - it'll know because it doesn't have to worry about it being a string. You can include just the date, or the time, etc. I'll stop here. Take a look at how REBOL does it, I think it makes sense. Any major objections? Respectfully, Keith |
From: <sh...@zi...> - 2002-06-17 12:50:58
|
> > So, here's my proposal. ALL inline strings have to be quoted, either with a > single or double quote (normal escaping rules apply). Therefore, there's > nothing implicit. > Your proposal creates a lot of simplifications, but I really believe we need to optimize YAML for strings, so I would prefer that we make refinements to Ryan's earlier proposal, rather than forcing string quoting on people. Particularly if you use a scripting language, most of your data is likely to be strings. Often data that has the semantic type of date, integer, or URL can be processed just as easily if you read it in as a string. I could see many YAML users not even caring about types. Read everything in as a string, and let the application do the conversions. For applications that care about types, we provide the bang syntaxes for type metadata. You can either be explicit about the type, or if your type matches a certain regex, you can use the shorthand of !!. |
From: Clark C . E. <cc...@cl...> - 2002-06-17 23:27:06
|
Keith,

Welcome. Thank you for contributing your thoughts. Your post made me consider an option, let's call it "flex", where an implicit text value starts with an alpha character and can contain

    `!&().,;/-_

but not

    |\~@#$%^*+={}[]:<>

See below for the rationale!

On Mon, Jun 17, 2002 at 05:05:49AM -0400, Keith Devens wrote:
| So, here's my proposal. ALL inline strings have to be quoted, either with a
| single or double quote (normal escaping rules apply). Therefore, there's
| nothing implicit.

These are great arguments below, but let me put in the counter...

| 1. No backward compatibility issues: strings are quoted, that's it.

Well, we already have backward compatibility issues if we want to restrict implicit text to anything less than starting with alpha.

| 2. No surprises! (One pixel shouldn't change the meaning of some text.)

Yes, but in most cases it will get you a syntax error!

| 3. Frees up unquoted text to be implicitly whatever we want.

Yes; but your examples below make me think that there is a set of characters we can exclude from the implicit string regular expression so that most unquoted text works this way.

| 4. This is *much* better defined than "unquoted strings match this regex"

Well... if one wants to be sure, they can always quote! Quoting keys is also really ugly...

| 5. This simplifies the parser a lot. If it's quoted, it's a string

Yes, but I don't think string should be treated differently than any other implicit type; it should be given a regex and if someone isn't sure they can just use quotes.

That said, you have great arguments here; I guess I'm just spoiled. One thing to note is that REBOL has to have language keywords in addition to strings... we don't. So, I enjoy your reflection, it helps us a lot. And the REBOL pointer is just fantastic...

| I think we can learn a lot from what REBOL has done with implicit types.
| See http://www.rebol.com/docs/core23/rebolcore-2.html#sect2.
numbers: [ 1234, -432, 3.1415, 1.23E12, 123,4, 0,01, 1,2E12 ]
times: [ 12:34, 20:05:32, 0:25.345, 0:25,345 ]
dates: [ 20-Apr-1998, 20/Apr/1998, 20-4-1998, 1998-4-20, 1980-4-20/12:32, 1998-3-20/8:32-8:00 ]
money: [ $12.34, USD$12.34, CAD$123.45, DEM$1234,56 ]
tuples: [ 2.3.0.3.1, 255.255.0, 199.4.80.7 ]
email: in...@re...
uris: [ http://www.rebol.com, ftp://freda:gr...@da...m/dir/files ]
files: [ %data.txt, %images/photo.jpg, %../scripts/*.r ]
pairs: [ 100x50, 1024x768, -50x200 ]
issues: [ #707-467-8000, #0000-1234-5678-9999, #MFG-932-721-A ]

All of these are great examples of stuff that would be slick to have as YAML implicit types. What's interesting is that most of the examples above will work with the current compromise with three exceptions: e-mail, uris, and money. In each of these cases a special character was used.

What if we defined the text implicit to start with an alpha character, but exclude anywhere the following,

    |\~@#$%^*+={}[]:<>

leaving these special characters

    `!&().,;/-_

This is probably an improvement of the current situation. Hmm. I actually like this better than Neil's suggestion. Yes, it is more complicated; but it seems to provide more of the balance we are looking for?

Clark
|
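The "flex" rule can be tried out with a small Python check. This is a sketch under assumptions, not a spec production: the name `is_flex_text` is hypothetical, letters, digits and spaces are assumed to be allowed in the body, and characters that appear in neither of the two lists (quotes and '?', for example) are treated as allowed because the rule is phrased as an exclusion.

```python
# Characters excluded anywhere in implicit text under the "flex" proposal.
EXCLUDED = set("|\\~@#$%^*+={}[]:<>")


def is_flex_text(value):
    """True if value starts with a letter and holds no excluded character."""
    return bool(value) and value[0].isalpha() and not (set(value) & EXCLUDED)
```

For example, `is_flex_text("yes/no answer.")` holds, while `http://www.rebol.com` and `USD$12.34` fail on ':' and '$', and anything starting with a digit, '%' or '#' fails on the first character, so those forms stay free for other implicit types. The address `user@example.com` used in the test is a made-up stand-in, since the address in the message is elided.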
From: Keith D. <ya...@ke...> - 2002-06-18 09:31:55
|
How about a compromise between all these proposals? It has the following benefits:

1. Backward compatibility with current YAML in just about every case (from what I understand, correct me if I'm wrong)

2. You don't have to have the exclamation point - I'm still not totally confident I understand what that's for. It tells the parser that you want to consider what follows to be a special type instead of just a string?

3. You don't have to quote every string :)

The compromise is this: You have string quoting be optional. The default interpretation of any scalar is as a string, regardless of its contents. However, if it matches one of YAML's implicit types (boolean, null, etc.), YAML will interpret it as a special type. Besides watching out for YAML meta characters like #, it's just a string.

Here's why I think this might work, will require the least amount of line noise to DWIM, and will provide the most backward and forward compatibility: Except for the very few special YAML words, like `yes` for boolean, `2002-06-18` for dates if you so choose, etc. you won't need quotes to have scalars be interpreted as strings. OTOH, for those special values you don't need to have an exclamation point either.

Let me give some examples from REBOL, and you'll see that this is actually exactly how REBOL does things. Disclaimer: I'm not a REBOL zealot, regardless of how I sound. I don't actually use the language for anything. But I've read the book on it, and I think it's an extremely well thought out and consistently designed language. I think we can learn from it, and even if we wind up disagreeing with how the designers did something, very smart people worked on it (ever hear of Carl Sassenrath?) and we'll learn even from disagreeing with it.

Now, one clarification. Whereas YAML's default type would be 'string', REBOL's default type is "word!".

Some examples of REBOL's implicit typing (this is all actual REBOL code I'm typing into the interpreter):

    >> type? http://yahoo.com
    == url!
>> type? http:/ == url! >> type? http: ** Script Error: http needs a value ** Near: type? http: See what happened? It's not a URL anymore, it's actually a "set-word!" - REBOL expects a value to be "assigned" to the word 'http'. >> type? yes == logic! >> type? yessir ** Script Error: yessir has no value ** Near: type? yessir Here, since it doesn't understand 'yessir' by default, it considers it a word, and then evaluates the word looking for what's stored in it so it can figure out what its type is. But yessir has no value. Incidentally, you can say this: >> unset 'yes >> type? yes ** Script Error: yes has no value ** Near: type? yes And screw with your environment. There are no reserved words in REBOL, only default meanings for the words. But this segues right into my next point. If I say "unset yes" (after restarting my interpreter since I tainted my environment), watch what happens. >> unset yes ** Script Error: unset expected word argument of type: word block ** Near: unset yes I have to say "unset 'yes". Unset requires a word! value, so you have to say "I want yes the word, not yes the logic! value" by putting an apostrophe in front. (Correct me if I'm wrong, but this is straight from scheme.) To get to the point, REBOL's leading apostrophe to force a "thing" to be a word! instead of whatever the word would normally evaluate to is the same idea as putting quotes around a YAML value to force it to be a string in case it would be interpreted as something different. REBOL gets by extremely well with this implicit scheme. In the very rare cases you want to break REBOL's default behavior, you can, as in this example: >> type? to-word "http://yahoo.com" == word! This would be analogous to supplying an explicit type in YAML, if I understand correctly. This whole thing has a few consequences. 
It's much more forgiving than throwing an exception if you have a period in what would otherwise be a string :) In this, it is very much like the Perl philosophy of things -- knowing the backgrounds of some of the people in this group as well as the origins of YAML, I don't think this is a problem. This is actually very much like how Perl handles barewords. If you have a hash and want to use a bareword as a key, in the vast majority of cases it's no problem: $hash{shifty} = "eyes". But the moment you want to say $hash{shift} = "down" You have a problem because shift has special meaning in Perl. So, this brings me to the last of our considerations: forward compatibility. Defined this way, YAML will have about the same forward compatibility problems that programming languages have. Putting aside the reality of Perl 6, if a new version of Perl were to define a new function that people had used as hash keys, it'd break existing code. That's why you're not supposed to use bare words, but Perl will let you, knowing full well that it reserves the right to break it later. It's a sort of "cultural mandate", in a way. In YAML terms, as well as in Perl terms, the rule would be (YAML) and is (Perl): "you should use quotes, but you don't have to. However, be careful, because it'll break if we define a new type (or meta-character, I guess) (YAML) or function (Perl)." In short, "reserve" the words you want to be reserved, even if you don't use them yet (languages commonly reserve words they may not even have any intention of using - I think Java reserves "goto", for example). So in YAML terms, reserving words means probably something similar to what you're saying with your "flex" syntax. You want to reserve "\" to indicate that the line continues, "#" for a comment, etc. 
So, pick your "reserved words" -- including both special characters you want to be able to have on the end of scalars or within scalars to mean special things (this continues ('\'), this starts a comment ('#'), etc.), and special formats like \d\d\d\d-\d\d-\d\d and special words 'yes', 'off', etc. You could go ahead and use sigils for special things that can't otherwise be recognized by their form, like REBOL does with money. But (and this is a very minor point) I see no reason why something can't be interpreted as a string even if it starts with a number (in addition to "alpha character"s), as long as the whole value isn't a number. Also, it doesn't have to exclude certain characters from the string altogether. The spec only has to specify that certain characters can't appear where they would otherwise have meaning (for instance, a # anywhere in the string because otherwise it's a comment, or a \ at the end of a line because that might be a continuation). Ok, I think I'm done. I even think we might agree about how to handle this stuff. A major issue I don't know what you think about is whether you want YAML to take the Perl philosophy (DWIM) and not throw exceptions all over the place if YAML doesn't recognize something as a special word. I think DWIMming is good. If you accept all of the above, reserved words and meta-characters have to be defined, and then you're done. Whew. Keith |
From: Neil W. <neilw@ActiveState.com> - 2002-06-18 10:14:29
|
Hi Keith, I just have to speak up on behalf of Perl syntax here. The following is all wrong: Keith Devens [18/06/02 05:30 -0400]: > This is actually very much > like how Perl handles barewords. If you have a hash and want to use a > bareword as a key, in the vast majority of cases it's no problem: > > $hash{shifty} = "eyes". > > But the moment you want to say > > $hash{shift} = "down" > > You have a problem because shift has special meaning in Perl. So, this > brings me to the last of our considerations: forward compatibility. Defined > this way, YAML will have about the same forward compatibility problems that > programming languages have. Putting aside the reality of Perl 6, if a new > version of Perl were to define a new function that people had used as hash > keys, it'd break existing code. That's why you're not supposed to use bare > words, but Perl will let you, knowing full well that it reserves the right > to break it later. It's a sort of "cultural mandate", in a way. Perl *always* interprets a literal /-?\w+/ as a string inside "stringifying" contexts. Two such contexts are hash keys and the LHS of "fat arrows": %hash = ( shift => 27, dump => 42, int => 21, push => "hello", # quotes required on RHS using 'strict'. -pop => "looks like an option", ); shift if $hash{shift}; pop if $hash{-pop}; If you really want to use the return value of a function inside a hash key, you have to make it look like an expression so Perl will evaluate it. That's really easy in practice: die "Foo!" unless $hash{ shift }; # die "Foo!" unless $hash{shift()}; # die "Foo!" unless $hash{ print("Hello"), $a + 1 } Where your point is valid is for the following: my $a = seven; print $a; If you just run that on the commandline (perl -le '$a=seven;print$a') you'll see that $a gets the *string* "seven". That's because it's a "bareword" which is turned into a string because no such function 'seven' exists at compile time. 
In fact, even this example will still print "seven": my $a = seven; print $a; sub seven { 7 } because the compiler didn't see the subroutine until after it compiled the opcode for the assignment. It all changes as soon as you start defining things at compile time. That's because Perl's compiler consults the symbol table whenever it finds a bareword, to DWYM. This prints "7": sub seven { 7 } $a = seven; print $a; This is the compile error you're after: use strict; my $a = seven; print $a; sub seven { 7 } Because no such function exists, and "strict" forbids the compiler to make a string out of it. So you're SOL. (I had to add "my" to declare $a, too). This fixes everything up again: use strict; sub seven { 7 } my $a = seven; print $a; Only a very specific class of barewords have ever been broken, and that's not even on by default. I'm not aware of any mode you can set which overrides stringification inside hash keys and fat arrows. I can only assume you were led astray by an editor's highlighting mode, and never actually tried any of this...? > In YAML > terms, as well as in Perl terms, the rule would be (YAML) and is (Perl): > "you should use quotes, but you don't have to. However, be careful, because > it'll break if we define a new type (or meta-character, I guess) (YAML) or > function (Perl)." This sucks the big one. I want guarantees that no data serialized under YAML:1.0 will break *ever*. Except perhaps in YAML:2.0, which might conceivably be totally different. My point is, I want the freedom to add an arbitrary number of metacharacters without worry. This all hinges on the assumption that we'll continue to use metacharacters to denote implicit types. If we don't, fine. I still want to be restrictive enough that later specifications simply use more metacharacters, thus allowing more things to be legal YAML, but still preserving the exact semantics of every previous version. To some extent, this is an orthogonal issue to DWIMity. 
I still want YAML to DWIM -- but I also want the next version of YAML to DWIMT (do what I mean tomorrow). I think we're pretty close to a solution that does that. In fact, I think most of the proposals out there come very close... with the sole exception of the "treat everything that's *not* recognized as a string" proposal, which has absolutely no forward compatibility beyond quoted strings. I cannot stress enough how much it sucks. I simply guarantee I will not use YAML unless I know that YAML I ship to my customers will be readable after I upgrade my application to the next version of libyaml. That's that! Later, Neil |
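The forward-compatibility objection can be made concrete with a toy resolver in Python. Everything here is hypothetical: the two pattern tables stand in for "this version" and "the next version" of a type library, and `resolve` follows the "anything not recognized is a string" rule being argued against.

```python
import re

# Hypothetical implicit types known to an older library version.
V1_PATTERNS = {
    "int": r"[-+]?[0-9]+",
    "bool": r"yes|no|true|false|on|off",
}

# A later version adds one more implicit type.
V2_PATTERNS = dict(V1_PATTERNS, ipaddr=r"[0-9]+(\.[0-9]+){3}")


def resolve(text, patterns):
    """Return the first matching type name, falling back to 'str'."""
    for name, pattern in patterns.items():
        if re.fullmatch(pattern, text):
            return name
    return "str"
```

The same unquoted scalar `192.168.55.9` resolves to `str` with the first table and to `ipaddr` with the second, so data written against the older library changes meaning after an upgrade. Quoted strings, or a string regexp narrow enough that new types can only claim text that was previously illegal, avoid this.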
From: Keith D. <ya...@ke...> - 2002-06-18 16:13:31
|
> Perl *always* interprets a literal /-?\w+/ as a string inside "stringifying" > contexts. Two such contexts are hash keys and the LHS of "fat arrows": Yikes! What was I thinking. I was totally wrong. I hate being wrong in public. I apologize. > Only a very specific class of barewords have ever been broken, and that's not > even on by default. I'm not aware of any mode you can set which overrides > stringification inside hash keys and fat arrows. I can only assume you were > led astray by an editor's highlighting mode, and never actually tried any of > this...? I don't think I was led astray by an editor's highlighting mode. Don't know what I was thinking, or where I had first learned that this was an exception to Perl's stringification rules. Obviously I didn't try it :) That'll teach me to never post anything again at 5 in the morning. > This sucks the big one. I want guarantees that no data serialized under > YAML:1.0 will break *ever*. Except perhaps in YAML:2.0, which might > conceivably be totally different. My point is, I want the freedom to add an > arbitrary number of metacharacters without worry. > > This all hinges on the assumption that we'll continue to use metacharacters > to denote implicit types. If we don't, fine. I still want to be restrictive > enough that later specifications simply use more metacharacters, thus > allowing more things to be legal YAML, but still preserving the exact > semantics of every previous version. > > To some extent, this is an orthogonal issue to DWIMity. I still want > YAML to DWIM -- but I also want the next version of YAML to DWIMT (do > what I mean tomorrow). I think we're pretty close to a solution that > does that. In fact, I think most of the proposals out there come very > close... with the sole exception of the "treat everything that's *not* > recognized as a string" proposal, which has absolutely no forward > compatibility beyond quoted strings. I cannot stress enough how much it > sucks. 
I simply guarantee I will not use YAML unless I know that YAML I > ship to my customers will be readable after I upgrade my application to > the next version of libyaml. That's that! Ok, you win. Keith |
From: Oren Ben-K. <or...@ri...> - 2002-06-17 14:47:49
|
Clark C . Evans [mailto:cc...@cl...] wrote:
> | Stick to restrictive. That goes for *all versions*. If you're tempted to say
> | "the default is this", and it's a wide default, you're probably going to
> | cause problems later. We should keep the unquoted strings restricted to the
> | regexp /[-_A-Za-z0-9]/. That way, any strings containing those will be quoted
> | in all YAML-1.0 versions, and will continue to work correctly forever.
>
> Hmm. This seems to limit unquoted strings to a single word, a bit
> more restrictive than current. I like it.
>
> The use case for unquoted strings is enumerated values, and this
> fits perfectly with that use case. The problem we are encountering
> is when people have two or more tokens separated by a space, pretty
> soon they want to add punctuation or end with a period. This
> forces the sentence use case into a quoted (or better yet block
> or folded form). Nice. This also fits with 99% of my current
> unquoted string usage.

Neat. So, the change is to just remove the ' ' from the implicit string regexp. I'll go with that.

Combined with "tabling" the escaped block notion, this means the current spec only needs very minor fixes:

- Remove ' ' from string regexp.
- Allow throwaway_indicator+ in comment lines.
- Allow '\ ' and '\_'.
- Fix folded productions to make the special handling of the final line break explicit.

Is that it?

Have fun,

Oren Ben-Kiki
|
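On the emitting side, the single-word rule is simple to apply. The Python sketch below assumes the regexp /[-_A-Za-z0-9]/ is meant as "one or more of these characters", and that a single quote inside a single-quoted scalar is escaped by doubling it; `emit_scalar` is a hypothetical name. It also ignores one real concern: a string such as `42` matches the word pattern but would need quotes to avoid being read as an integer.

```python
import re

# Assumed reading of the proposed rule: one or more of these characters.
PLAIN_WORD = re.compile(r"[-_A-Za-z0-9]+")


def emit_scalar(value):
    """Emit a string unquoted if it is a single plain word, else quoted."""
    if PLAIN_WORD.fullmatch(value):
        return value
    # Assumption: a single quote inside the scalar is written as ''.
    return "'" + value.replace("'", "''") + "'"
```

So `Foo-bar` is written as is, `baz.` becomes `'baz.'`, and the empty string is always quoted.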
From: Clark C . E. <cc...@cl...> - 2002-06-17 15:07:22
|
| So, the change is to just remove the ' ' from the implicit
| string regexp. I'll go with that.

Great. It seems to be the 80% mark. It addresses most of Keith's issues
and also gives us a migration path in the future for other implicit types.

| Combined with "tabling" the escaped block notion

I'll leave this up to you. It seems Brian and his associates want it back
in (with a different indicator) and I'd like the spec to become final soon.

| - Remove ' ' from string regexp.
| - Allow throwaway_indicator+ in comment lines.
| - Allow '\ ' and '\_'.
| - Fix folded productions to make the special
|   handling of the final line break explicit.

;) Clark |
From: Dave L. <dl...@si...> - 2002-06-18 19:32:06
|
> I could see many YAML users not even caring about types. Read
> everything in as a string, and let the application do the conversions.

I would like to be able to do some simple processing by streaming YAML
past an awk script, so count me in.

Arguing from the basis of the native
types of current scripting languages
seems to miss the point that dates,
times, and currency amounts are all
human native types. (otherwise why
would localization be so important?)

Is it unreasonable to wish that type
metadata stay "meta", so applications
which don't instantiate objects can
still easily parse and produce YAML,
to interoperate with those which do?

(In other words -- is it difficult
to ensure that parsing YAML and type
assignment are logically separated?)

-Dave

----------

> It seems that if we had [a subtler marker], then an explicit marker
> for implicit types is the best solution all around.

If YAML is for configuration files, then to my eyes, "clobber: ! .on" is
simply asking for trouble. But that's a minor blemish. |
From: Oren Ben-K. <or...@ri...> - 2002-06-18 20:32:36
|
Dave Long wrote:
> Arguing from the basis of the native
> types of current scripting languages
> seems to miss the point that dates,
> times, and currency amounts are all
> human native types. (otherwise why
> would localization be so important?)

I'd call them "domain-specific" types, but you are basically right.

> Is it unreasonable to wish that type
> metadata stay "meta", so applications
> which don't instantiate objects can
> still easily parse and produce YAML,
> to interoperate with those which do?

It is very reasonable. And in fact it is OK for you to do so for "unknown
implicit types" - as long as you round trip them properly.

> (In other words -- is it difficult
> to ensure that parsing YAML and type
> assignment are logically separated?)

They are. At some parsing level, you just say "here's some value, it should
be an implicit type. I'll ask my type library about it". This "type library"
is completely divorced from the actual parsing.

Also, there's no requirement for your application to handle types it isn't
interested in. The minimum required is !str, !seq and !map - that's all. If
your application doesn't have any dates - don't do dates.

In general there are two types of applications. One that is schema-specific.
The schema may be very wide (e.g., "serializing Java classes") or very narrow
("my schema for data about penguins"). A schema-specific system can happily
ignore types outside the schema (in your case, dates). They just don't exist
for it. No fuss, no cost. An example would be a Java de/serialization code or
a penguin statistics tool.

A schema-independent system should just sigh when it sees an "unknown type"
and round-trip it. This is rather easy, especially in the case of an unknown
implicit type. Just store it as a string, internally, and be certain to emit
it in the same way you have scanned it. That's all. It is trickier for
unknown explicit types - you have to remember the transfer method together
with the value. Again, no big deal. An example of a schema-independent system
would be a YAML pretty-printer, or a hub routing YAML messages, etc.

The reasoning behind having some domain-specific types in the spec is that
the relevant domain is very wide. Dates, for example, appear in config files,
financial records, databases, log files, and so on. So we figured they are
important enough to *mention* in the spec (not require). We may, *if* the
need arises, define additional "widely used" types later on, based on
feedback from actual use of YAML. We just felt that date is "obviously" very
useful.

Hope this helps,

Oren Ben-Kiki |
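The separation described above (the parser hands a scanned value to a "type library", and unknown implicit types are stored as strings and emitted exactly as scanned) might look roughly like the Python sketch below. Everything in it is hypothetical: `Scalar`, `TypeLibrary`, `register`, `resolve` and `emit` are names invented for illustration, and the patterns are whatever the application chooses to register, not the ones in the spec.

```python
import re
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class Scalar:
    text: str                  # exactly as scanned; this is what gets emitted
    value: Any                 # native value, or the text itself if unresolved
    type_name: Optional[str]   # None means no registered type claimed it


class TypeLibrary:
    """Holds the application's types; it knows nothing about parsing."""

    def __init__(self) -> None:
        # List of (name, compiled pattern, converter) in registration order.
        self._resolvers = []

    def register(self, name: str, pattern: str,
                 convert: Callable[[str], Any]) -> None:
        self._resolvers.append((name, re.compile(pattern), convert))

    def resolve(self, text: str) -> Scalar:
        for name, pattern, convert in self._resolvers:
            if pattern.fullmatch(text):
                return Scalar(text, convert(text), name)
        # Unknown implicit type: keep it as a string internally.
        return Scalar(text, text, None)


def emit(scalar: Scalar) -> str:
    """Round-trip by writing the scalar the same way it was scanned."""
    return scalar.text


# A library for an application that has integers but no dates.
library = TypeLibrary()
library.register("int", r"-?[0-9]+", int)
```

With this library, `library.resolve("42")` yields a native integer, while `library.resolve("2001-12-14")` is not claimed by any type and is carried through as text, so `emit` writes it back unchanged.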
From: Ryan K. <rk...@pa...> - 2002-06-16 13:43:11
|
On 2002.06.16, Oren Ben-Kiki <or...@ri...> wrote:
> Ryan King [mailto:rk...@pa...] wrote:
> > Is it true that, under the new spec, quotes are required for
> > strings containing anything other than alphanumeric, spaces, and
> > underscores?
>
> Yes and no :-)

Ok. Thanks for confirming -- I wasn't sure if I was somehow reading the
spec incorrectly.

Personally, I would go the complete opposite direction, and make the
quoting rule this: "You only must quote that which is syntactically
ambiguous."

Pros:
  - It makes the output quieter. Ticks and double-ticks are noisy,
    and generally of no help to the humans, who can usually tell what
    is meant without them.

  - It makes the input easier for humans. A machine doesn't care if
    it has to put extra escaping around a string, but a human does.
    Therefore, we should minimize the amount of escaping needed by a
    hand-edited file, etc.), even if it means we trade a little
    syntactic simplicity for machine-generated files.

Cons:
  - Causes migration issues as new implicit types are needed.

  - Causes implementation issues (codifying "that which is
    syntactically ambiguous" into a set of productions would be a
    fairly challenging task)

I recognize that the implicit types issue is a major caveat with my idea
as-is. What if, then, we went with a different approach toward implicit
types: To do away with them.

Here is a theoretically possible rule set:

  - Every leaf is a string by default

  - Leaves preceded by "!foo" are given type "foo" (as they currently
    do)

As a syntactic optimization, we could add this rule:

  - Leaves preceded by lone !'s are pattern-matched to determine types

Examples:
  "5" # A string:
  5 # Also just a string:
  !int 5 # This parses as an "int":
  ! 5 # Also parses as an "int", because the string "5" pattern-matches

Pros:
  - You can still express everything you could previously express.

  - It is clearer to the human as to which things are going to be
    magical and which aren't.

Cons:
  - Less compact representation of special types

Thanks for listening... I welcome correction.

- Ryan King |
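The rule set proposed above is small enough to sketch directly in Python. This is only an illustration under stated assumptions: `classify_leaf` and `IMPLICIT_PATTERNS` are invented names, the two patterns are placeholders for whatever a real pattern table would hold, and falling back to a string when a lone `!` matches nothing is a guess, since the message does not say what should happen in that case.

```python
import re
from typing import Tuple

# Hypothetical pattern table consulted only for a lone "!".
IMPLICIT_PATTERNS = [
    ("int", re.compile(r"-?[0-9]+")),
    ("float", re.compile(r"-?[0-9]+\.[0-9]+")),
]


def classify_leaf(leaf: str) -> Tuple[str, str]:
    """Return (type_name, text) for a single leaf under the proposed rules."""
    leaf = leaf.strip()
    if leaf.startswith("!"):
        tag, _, rest = leaf[1:].partition(" ")
        rest = rest.strip()
        if tag:
            # "!foo value": the leaf is given type "foo".
            return tag, rest
        # "! value": pattern-match the text to determine the type.
        for name, pattern in IMPLICIT_PATTERNS:
            if pattern.fullmatch(rest):
                return name, rest
        # Assumption: a lone "!" that matches nothing stays a string.
        return "str", rest
    if len(leaf) >= 2 and leaf[0] == leaf[-1] == '"':
        # Quoted leaf: a string, with the quotes removed.
        return "str", leaf[1:-1]
    # Every other leaf is a string by default.
    return "str", leaf
```

Run against the four examples in the message, `"5"` and `5` both classify as strings, while `!int 5` and `! 5` both classify as `int`.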
From: Brian I. <in...@tt...> - 2002-06-16 21:04:16
|
On 16/06/02 09:43 -0400, Ryan King wrote:
> On 2002.06.16, Oren Ben-Kiki <or...@ri...> wrote:
> > Ryan King [mailto:rk...@pa...] wrote:
> > > Is it true that, under the new spec, quotes are required for
> > > strings containing anything other than alphanumeric, spaces, and
> > > underscores?
> >
> > Yes and no :-)
>
> Ok. Thanks for confirming -- I wasn't sure if I was somehow reading the
> spec incorrectly.
>
> Personally, I would go the complete opposite direction, and make the
> quoting rule this: "You only must quote that which is syntactically
> ambiguous."

I cautiously concur. Perhaps we've oversimplified the simple scalar. For
instance, this example in the spec is now wrong:

    first: There is no unquoted empty string.
    second: 12          # This is an integer.
    third: !str 12      # This is a string.
    span: this contains # Interleaved
          three spaces  # comments.

The final period makes it invalid. Sure, you can call it an oversight, but
if /we've/ made the oversight, think how often other people will...

In my opinion, this could really make or break the "YAML as config file"
use case.

>
> Pros:
> - It makes the output quieter. Ticks and double-ticks are noisy,
> and generally of no help to the humans, who can usually tell what
> is meant without them.
>
> - It makes the input easier for humans. A machine doesn't care if
> it has to put extra escaping around a string, but a human does.
> Therefore, we should minimize the amount of escaping needed by a
> hand-edited file, etc.), even if it means we trade a little
> syntactic simplicity for machine-generated files.
>
> Cons:
> - Causes migration issues as new implicit types are needed.
>
> - Causes implementation issues (codifying "that which is
> syntactically ambiguous" into a set of productions would be a
> fairly challenging task)
>
> I recognize that the implicit types issue is a major caveat with my idea
> as-is. What if, then, we went with a different approach toward implicit
> types: To do away with them.
>
> Here is a theoretically possible rule set:
>
> - Every leaf is a string by default
>
> - Leaves preceded by "!foo" are given type "foo" (as they currently
> do)
>
> As a syntactic optimization, we could add this rule:
>
> - Leaves preceded by lone !'s are pattern-matched to determine types
>
> Examples:
> "5" # A string:
> 5 # Also just a string:
> !int 5 # This parses as an "int":
> ! 5 # Also parses as an "int", because the string "5" pattern-matches
>
> Pros:
> - You can still express everything you could previously express.
>
> - It is clearer to the human as to which things are going to be
> magical and which aren't.
>
> Cons:
> - Less compact representation of special types

Again, I would cautiously concur. I've never really liked the implicit
types that much, except for int and float, which are imperative. All the
rest are just overkill. Well, actually I think null is pretty nice too, but
the dates drive me nuts. I hate that I need to support them. They're just
not that useful. I'd much rather be explicit with them.

I'd be perfectly happy with:

    string: 2001-12-14
    date: ! 2001-12-14
    int: 42
    float: 3.12
    null: ~
    bool: ! true
    url: ! http://www.yaml.org
    ipaddr: ! 192.168.55.9

If we accept the minimal risk of saying that strings, ints, floats and
nulls are special, then we can add all the other patterns by requiring a
bang. This is actually a very good thing, because it's a visual cue that
the "date" is an object, not a string like everything else.

Then we can allow 'string' to be the default type (as it should be). It
would not be an implicit type anymore. As long as it wasn't a float, int,
or null, and wasn't syntactically ambiguous, it would be a string.

    address: 123 Main Street
    movie: The Good, the Bad, and the Ugly
    title: YAML: The new frontier

I know it's a lot to swallow at this point. But let's give it some thought
please.

---

Side note: we might want to consider '!!' for private implicits. That would
match private explicits.

Cheers, Brian |
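Brian's variant differs from Ryan's in that null, int and float stay recognizable without a bang, and only the remaining patterns need one. A table-driven Python sketch of that split follows. The function name `resolve`, the two tables and every regular expression in them are assumptions made for illustration; the message names the types but gives no patterns. Whether a banged value is also checked against the bare patterns is likewise a guess.

```python
import re

# Assumption: these three are recognized with or without a leading "!".
BARE = [
    ("null", r"~"),
    ("int", r"-?[0-9]+"),
    ("float", r"-?[0-9]+\.[0-9]+"),
]

# Assumption: these are only tried when the value was written with "!".
BANGED = [
    ("date", r"[0-9]{4}-[0-9]{2}-[0-9]{2}"),
    ("bool", r"true|false"),
    ("url", r"[a-z]+://\S+"),
    ("ipaddr", r"[0-9]{1,3}(\.[0-9]{1,3}){3}"),
]


def resolve(text: str, banged: bool = False) -> str:
    """Return the type name for `text`; `banged` means it followed a "!"."""
    table = BARE + BANGED if banged else BARE
    for name, pattern in table:
        if re.fullmatch(pattern, text):
            return name
    # Anything not claimed above is a plain string, the default type.
    return "str"
```

Under these assumptions `2001-12-14` resolves to a string when bare and to a date when banged, matching the first two lines of Brian's example.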