From: Oren Ben-K. <or...@ri...> - 2002-06-17 09:33:06
Keith Devens [mailto:ya...@ke...] wrote:

> ... if your implicit types are defined in a way that isn't
> *completely* obvious, it's real easy to say something you
> don't mean, as we've seen with the ending period example.

Good point.

> So, here's my proposal. ALL inline strings have to be quoted,
> either with a single or double quote (normal escaping rules
> apply).

That is, make string into an explicit type. Hmmm.

> Any major objections?

You'll end up quoting a lot of single-word values in configuration files.
You could argue that in all these cases you are actually using an
"enumerated type" but since there's just one single global namespace it
would be impossible to treat the values this way. I think that's too much.

Also, compare this to the current state of affairs. The only difference is
that string is also an implicit value for a *very* restricted set of cases -
start with a letter and contain only letters, numbers, '-', '_' and ' '.
That seems "very obvious". Of course:

> ... a period on my screen is ONE PIXEL. I don't want to have
> something I type turn from a string into something invalid
> because I have a one pixel dot on the end.

There is that... and that's why I suggested relaxing the rule (allowing more
characters). Putting aside the complex rules (doubling etc.) this means just
sticking with a certain character set (that includes '.' :-). Is this
"completely obvious"?

> Take a look at how REBOL does it, I think it makes sense.

REBOL uses a character-set approach for words (they can't contain certain
characters, but are allowed to contain things like & and ?). Hmmm... '&'
makes sense in simple text, right? So it isn't that different...

Have fun,

    Oren Ben-Kiki
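The rule Oren describes, and the relaxation he suggests, can be sketched as a pair of regular expressions. This is only an illustration: the names `CURRENT_STR`, `RELAXED_STR` and `is_implicit_str` are made up here, and the relaxed pattern assumes that "allowing more characters" means nothing more than adding '.' to the set.

```python
import re

# Assumed reading of the rule quoted above: the first character is a letter,
# the rest may be letters, digits, '-', '_' or ' '.
CURRENT_STR = re.compile(r"[A-Za-z][A-Za-z0-9_ -]*")

# Hypothetical relaxed variant: same shape, but '.' is also allowed.
RELAXED_STR = re.compile(r"[A-Za-z][A-Za-z0-9_ .-]*")


def is_implicit_str(text: str, pattern: re.Pattern = CURRENT_STR) -> bool:
    """Return True if the whole unquoted scalar matches the string pattern."""
    return pattern.fullmatch(text) is not None
```

Under these assumptions `foo bar` is an implicit string, while `foo bar.` only becomes one with the relaxed pattern, which is the "one pixel dot" problem Keith raises.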
From: Oren Ben-K. <or...@ri...> - 2002-06-17 16:22:49
Brian Ingerson [mailto:in...@tt...] wrote:

> 1) ints and floats work as they do today.
> 2) null works as today (~). (This shows up like crazy in dumps, and
>    needs to be clean. It is also a native type.)

So far, so good.

> 3) dates and times require explicit declaration (or just '! ').
>    Otherwise they are strings. (Most people in Perl and Python don't
>    need automatic Date object creation. String would be the expected
>    behaviour from a config file.)

I have a problem with:

    this being much: 2002-01-16
    different from: ! 2002-01-16

It is rather confusing IMVHO. Quoting, on the other hand, makes sense to
people. I'd rather go with having to write:

    this for an str: '2002-01-16'

> 4) Unquoted strings, outside an Inline collection, must start with a
>    word-char (including numbers). After that anything goes. It's a
>    string. (This assumes we've already checked for ints and floats).

This rules out implicit types such as:

    USD 20

I don't like it.

> 5) Unquoted strings, inside an inline collection, must be only
>    made of word chars. Otherwise you must quote them.

That is more reasonable...

> 6) All future oddball types, like URI, require explicit or just '! '.
>    (Scripting languages don't have native support for anything outside
>    str, int, float, null. People will expect URIs to be strings. There
>    should be a visual indicator if that isn't true. '! ' works nicely)

I don't know. I *liked* implicit types being implicit.

> 7) Bools can either stay as is, or just simplify to '! true'
>    == '!bool 1'.
> 8) Cases not described above, result in a parse error.
> 9) Use '!!' for private types. (I'm not sure how private
>    implicit typing would work, but I'm throwing it in anyway.
>    It's probably YAGNI)

Probably. Today it starts with `.

> This (IMO) covers everything in the most reasonable way. I don't see
> any real warts. And it seems to distribute nicely over the design
> goals above.

Let's compare your proposal with Neil's proposal (as today + remove ' ' from
the str regexp):

> 1) Keep YAML tidy

Tradeoff: your littering it with ! for dates etc. isn't tidy; Neil's forcing
one to quote 'foo bar' isn't, either. No advantage.

> 2) Make YAML DWIM

Tricky. I mean 2002-01-15 to be a date. Your proposal doesn't do that. On
the other hand, I mean 99 Foo road to be a string, and Neil's proposal
doesn't do that. Again, no advantage.

> 3) Provide an upgrade path

Neil's wins here - anything that isn't a single word (with word chars) is a
potential implicit type.

> 4) Keep the productions reasonable

You mean the regexp - I think Neil's wins here, too. Str is a simple
implicit type with a simple regexp, and there's nothing magical about ints
or anything else.

All in all I favor Neil's proposal at this point:

---
one:
  - 42
  - 3.1415
two:
  - ~
  - [~, foo, ~]
three:
  - 12:34
  - 09-11-2001
  # we could support friendlier dates too!
  - Mon Jun 17 08:02:40 PDT 2002
four: # Must be quoted:
  - '123 Main Street'
  - 'm|.*?\\(foo\w*)?.+$|;'
  - 'YAML's cool!'
  - 'http://www.yaml.org'
  - '192.168.0.1'
  - true # String!
  - '09-11-2001'
  - Only-one_word
  - 'Otherwise, quote it.'
five:
  - {foo: 'bar bar bar', 'x.y': 3.12}
six:
  - http://www.yaml.org
  - 192.168.0.1
  - $19.99
seven:
  - .true
  # or:
  - ! 1
eight:
  - .net
  - $19.99
nine:
  - ! `red
  - ! `white
  - !america.com/color blue

Have fun,

    Oren Ben-Kiki
From: Brian I. <in...@tt...> - 2002-06-17 17:45:05
On 17/06/02 12:24 -0400, Oren Ben-Kiki wrote:
> Brian Ingerson [mailto:in...@tt...] wrote:
> All in all I favor Neil's proposal at this point:

Well, from a pragmatic point, I'll agree with you. The one big problem this
creates is all the legacy YAML I have now needs conversion. I guess better
now than later. I like being strict up front.

The downer, is that this could be a death blow to YAML as Config. Time will
tell. The whitespace issues also hurt us here. But at least the quoting
thing will be easy to explain.

OK. I'm in. I'd still like to hear Ryan and Steve's take.

> ---
> one:
>   - 42
>   - 3.1415
> two:
>   - ~
>   - [~, foo, ~]
> three:
>   - 12:34
>   - 09-11-2001
>   # we could support friendlier dates too!
>   - Mon Jun 17 08:02:40 PDT 2002
> four: # Must be quoted:
>   - '123 Main Street'
>   - 'm|.*?\\(foo\w*)?.+$|;'
>   - 'YAML's cool!'

    - "YAML's cool!"

>   - 'http://www.yaml.org'
>   - '192.168.0.1'
>   - true # String!
>   - '09-11-2001'
>   - Only-one_word
>   - 'Otherwise, quote it.'
> five:
>   - {foo: 'bar bar bar', 'x.y': 3.12}
> six:
>   - http://www.yaml.org
>   - 192.168.0.1
>   - $19.99
> seven:
>   - .true
>   # or:
>   - ! 1
> eight:
>   - .net
>   - $19.99
> nine:
>   - ! `red
>   - ! `white
>   - !america.com/color blue

We can still do the '!! ', right. That backtick, is one of the remaining
warts. (We already use '!!foo' for private explicit)

---

So let's hear the rest of the opinions, but I think we'll move forward the
fastest with Neil's proposal. Plus, we gotta keep Neil happy, and on board.
He's our most powerful implementor! ;D

---

As an aside, how hard would it be to add '>>', '||', (and '!!' :) ? I think
they'll be nice minor additions. Let's get it done soon if we're really
getting past this unquoted issue.

Cheers, Brian
From: Neil W. <neilw@ActiveState.com> - 2002-06-17 19:51:06
Hi yamlers,

Wow. What a weekend :)

Let me come clean right away: this regexp /[-_A-Za-z0-9]/ was actually
missing a ' ' because of a brain fart, not an epiphany. Only my subconscious
can claim any credit for it. But after reading through the discussion it
provoked, I'm beginning to like it.

Oren Ben-Kiki [17/06/02 12:24 -0400]:
> Let's compare your proposal with Neil's proposal (as today + remove ' ' from
> the str regexp):
>
> > 1) Keep YAML tidy
>
> Tradeoff: your littering it with ! for dates etc. isn't tidy; Neil's forcing
> one to quote 'foo bar' isn't, either. No advantage.

Right, but quotes have an established meaning to people -- '!' doesn't. It
may be ugly, but it is understandable. It passes the girlfriend test with
flying colours. Laura had no idea what the difference between '! foo' and
'foo' was (because she hasn't read the spec). But she immediately grasped
the difference between 'foo' and '"foo"'. She guessed (guessed!) that quotes
were required when there was ambiguity. Of course, she doesn't know what
ambiguities there are (because she hasn't read the spec).

It's interesting to watch how "beginners" learn something new. They NEVER
read the spec, they just dump some data and watch for patterns. This is how
people will use YAML. The spec will be read only by the eager beavers on
this list.

> 2) Make YAML DWIM
>
> Tricky. I mean 2002-01-15 to be a date. Your proposal doesn't do that. On
> the other hand, I mean 99 Foo road to be a string, and Neil's proposal
> doesn't do that. Again, no advantage.

Giggle. In my proposal, '99 Foo road' is a parse error. Assuming we only
want to compare _working_ YAML under each proposal, '"99 Foo road"' _is_ a
string, so it _does_ do what you want. But that's probably not what you
wanted to show...

> nine:
>   - ! `red
>   - ! `white

What the hell are those? Is my head slanting? I prefer all of these:

nine:
  - ! .red
  - ! $red
  - ! ~red
  - ! (red)
  - ! #red
  - ! @red
  - ! ?red
  - ! |red
  - ! +red
  - ! =red
  - ! %red
  - ! ^red

Wow. We have a lot of metacharacters free. It's time to add more features,
gentlemen!

Later,
Neil
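From an emitter's point of view, the single-word rule boils down to "write the string bare if it is one word, otherwise quote it". The sketch below is an assumption-laden illustration, not anyone's actual implementation: `SINGLE_WORD` combines Neil's character class with the leading-letter requirement implied by Oren's "as today" summary, `emit_str` is a hypothetical helper, and `json.dumps` merely stands in for a real double-quoted scalar writer.

```python
import json
import re

# Assumption: "today's" string rule with the space removed, i.e. a letter
# followed by letters, digits, '-' or '_'.
SINGLE_WORD = re.compile(r"[A-Za-z][-_A-Za-z0-9]*")


def emit_str(value: str) -> str:
    """Emit a string bare if it is a single word, double-quoted otherwise."""
    if SINGLE_WORD.fullmatch(value):
        return value
    # json.dumps is a stand-in for a real double-quoted scalar writer.
    return json.dumps(value)
```

With this rule `Only-one_word` stays bare, while `99 Foo road`, `09-11-2001` and `YAML's cool!` all come out double-quoted, so none of them can be mistaken for an implicit type.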
From: Oren Ben-K. <or...@ri...> - 2002-06-17 17:08:01
My main problem with requiring '!' for implicit typing is that it is rather
ugly. So I've been thinking, what if we have a less offensive syntax for it?
It seems that if we had one, then an explicit marker for implicit types is
the best solution all around. For example:

    A String: 2002-01-12
    Also a string: 12
    A date:: 2002-01-12
    An int:: 12

Or:

    - 2002-02-12 # String
    - 12 # String
    -: 2002-01-12 # date
    -: 12 # int

OK, maybe this is too subtle/obscure, but you get the idea.

    # This is *so* ugly...
    date: ! 2002-01-12
    int: ! 12

Have fun,

    Oren Ben-Kiki
From: Clark C . E. <cc...@cl...> - 2002-06-17 14:00:40
On Mon, Jun 17, 2002 at 05:34:35AM -0400, Oren Ben-Kiki wrote:
| > So, here's my proposal. ALL inline strings have to be quoted,
| > either with a single or double quote (normal escaping rules
| > apply).
|
| That is, make string into an explicit type. Hmmm.
|
| > Any major objections?
|
| You'll end up quoting a lot of single-word values in configuration files.
| You could argue that in all these cases you are actually using an
| "enumerated type" but since there's just one single global namespace it
| would be impossible to treat the values this way.

Exactly, the primary use case for non-quoted types is for enumerated
values. About 60 percent of my data fits this case, and quoting is not
acceptable.

I see a few orthogonal issues:

1. Do we have a different top-level "simple" scalar from one that is
   nested within an in-line mapping or sequence? This is important
   since ", " and ": " cannot appear in the latter, but could possibly
   appear in the former. This may complicate the productions a bit
   more, but would make YAML a bit more usable (less quoting required).
   If we want to do this, I'd suggest further limiting the nested simple
   scalar to a single word, that is

     [{ this: is_ok }, { while: this is not ok }]

2. Do we want to reserve spots for more implicit types? There are
   three alternatives:

     a) Yes, and we keep it very restrictive;
     b) Yes, but we are less restrictive;
     c) No, we never have any more implicit types.

   Case (b) may be interesting. We keep a few "reserved" indicators
   and then make everything which doesn't match implicits for date,
   time, float, integer, etc. automatically become a text implied type.

So, it is _more_ complicated, but _more_ human readable if
we have different productions for the nested vs top-level
simple scalar and if we reserve just a few additional
indicators which a simple scalar cannot contain.

Clark
From: Brian I. <in...@tt...> - 2002-06-17 15:22:56
On 17/06/02 10:09 -0400, Clark C . Evans wrote:
>
> So, it is _more_ complicated, but _more_ human readable if
> we have different productions for the nested vs top-level
> simple scalar and if we reserve just a few additional
> indicators which a simple scalar cannot contain.

I see a few design goals:

1) Keep YAML tidy
2) Make YAML DWIM
3) Provide an upgrade path
4) Keep the productions reasonable

I like the ideas that are floating around. Here's how I would most like
to see things:

1) ints and floats work as they do today.
2) null works as today (~). (This shows up like crazy in dumps, and
   needs to be clean. It is also a native type.)
3) dates and times require explicit declaration (or just '! ').
   Otherwise they are strings. (Most people in Perl and Python don't
   need automatic Date object creation. String would be the expected
   behaviour from a config file.)
4) Unquoted strings, outside an Inline collection, must start with a
   word-char (including numbers). After that anything goes. It's a
   string. (This assumes we've already checked for ints and floats).
5) Unquoted strings, inside an inline collection, must be only
   made of word chars. Otherwise you must quote them.
6) All future oddball types, like URI, require explicit or just '! '.
   (Scripting languages don't have native support for anything outside
   str, int, float, null. People will expect URIs to be strings. There
   should be a visual indicator if that isn't true. '! ' works nicely)
7) Bools can either stay as is, or just simplify to '! true' == '!bool 1'.
8) Cases not described above, result in a parse error.
9) Use '!!' for private types. (I'm not sure how private implicit typing
   would work, but I'm throwing it in anyway. It's probably YAGNI)

This (IMO) covers everything in the most reasonable way. I don't see
any real warts. And it seems to distribute nicely over the design
goals above.

---
one:
  - 42
  - 3.1415
two:
  - ~
  - [~, foo, ~]
three:
  - ! 12:34
  - ! 09-11-2001
  # we could support friendlier dates too!
  - ! Mon Jun 17 08:02:40 PDT 2002
four:
  - 123 Main Street
  - m|.*?\\(foo\w*)?.+$|;
  - YAML's cool!
  - http://www.yaml.org
  - 192.168.0.1
  - true
  - 09-11-2001
  - Unquoted strings, outside an Inline collection, must start with
    a word-char (including numbers). After that anything goes.
five:
  - {foo: 'bar bar bar', 'x.y': 3.12}
six:
  - ! http://www.yaml.org
  - ! 192.168.0.1
  - ! $19.99
seven:
  - .true
  # or:
  - ! true
eight:
  - .net
  - $19.99
nine:
  - !! red
  - !! white
  - !america.com/color blue

Cheers, Brian
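Read literally, the numbered rules describe a small decision procedure for top-level scalars. The sketch below is one possible reading, not a definitive one: `classify` and its labels are hypothetical, the `INT` and `FLOAT` patterns are simplified assumptions about how "ints and floats work today", and bools (rule 7), inline collections (rule 5) and private types (rule 9) are left out.

```python
import re

# Simplified, assumed patterns for the native numeric types (rule 1).
INT = re.compile(r"[-+]?[0-9]+")
FLOAT = re.compile(r"[-+]?[0-9]+\.[0-9]+")
# Rule 4: an unquoted string must start with a word char (digits included).
WORD_START = re.compile(r"[_A-Za-z0-9]")


def classify(scalar: str) -> str:
    """Return a label for the kind of a top-level scalar, or raise on error."""
    if scalar == "~":
        return "null"        # rule 2
    if INT.fullmatch(scalar):
        return "int"         # rule 1
    if FLOAT.fullmatch(scalar):
        return "float"       # rule 1
    if scalar.startswith("! "):
        return "implicit"    # rules 3 and 6: type is resolved from the content
    if scalar[:1] in ("'", '"'):
        return "str"         # quoted scalars are always strings
    if WORD_START.match(scalar):
        return "str"         # rule 4: word char first, then anything goes
    raise ValueError(f"parse error: {scalar!r}")  # rule 8
```

Under this reading `09-11-2001` and `192.168.0.1` are plain strings, `! 09-11-2001` asks for implicit typing, and a scalar that starts with a reserved character is rejected.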
From: Clark C . E. <cc...@cl...> - 2002-06-17 16:05:45
On Mon, Jun 17, 2002 at 08:22:43AM -0700, Brian Ingerson wrote:
| 1) Keep YAML tidy
| 2) Make YAML DWIM
| 3) Provide an upgrade path
| 4) Keep the productions reasonable
|
| I like the ideas that are floating around. Here's how I would most like
| to see things:
|
| 1) ints and floats work as they do today.
| 2) null works as today (~). (This shows up like crazy in dumps, and
|    needs to be clean. It is also a native type.)
| 3) dates and times require explicit declaration

Dates and times show up like crazy in my dumps. In business
data dates, times, and currency are big use cases -- much more
common than floating point values.

|    Otherwise they are strings. (Most people in Perl and Python don't
|    need automatic Date object creation. String would be the expected
|    behaviour from a config file.)

Yes, for system programming you don't need dates and times
very often. However, for business applications dates and
times are essential.

| 4) Unquoted strings, outside an Inline collection, must start with a
|    word-char (including numbers). After that anything goes. It's a
|    string. (This assumes we've already checked for ints and floats).
| 5) Unquoted strings, inside an inline collection, must be only
|    made of word chars. Otherwise you must quote them.

So, we want to introduce yet-another-scalar-kind? We already
have block and folded; now we want to split simple scalars
into the nested (very restrictive) and top-level (very flexible)
variants? I'm more leaning towards Neil's suggestion of limiting
unquoted values to a single word.

| 6) All future oddball types, like URI, require explicit or just '! '.
|    (Scripting languages don't have native support for anything outside
|    str, int, float, null. People will expect URIs to be strings. There
|    should be a visual indicator if that isn't true. '! ' works nicely)

I see URIs as becoming a more and more common type of data,
and I can see how one would like to include complex numbers
as well. Do we want to draw a line in the sand now and say
that YAML will never, ever have other implicit types?

| four:
|   - 123 Main Street
|   - m|.*?\\(foo\w*)?.+$|;
|   - YAML's cool!
|   - http://www.yaml.org
|   - 192.168.0.1
|   - true
|   - 09-11-2001
|   - Unquoted strings, outside an Inline collection, must start with
|     a word-char (including numbers). After that anything goes.

Of these use cases, only the last one is interesting to me,
the rest are not interesting. The first two would be better
as blocks. The URL, Date, and IP address would be better off
being typed. You seem to be dividing the current "simple"
scalar type into two subordinate types:

  flex: This type is your #4 above. I'd limit the implicit
        text recognition to start with an alpha character so
        that dates and numeric types can co-exist peacefully.

  word: This type is similar to #5. In this type the
        scalar value can only be a word (sequence of
        word chars beginning with alpha). This is
        designed to work with nested maps/lists. It could
        also be the production for "key" values for
        regular multi-line mappings.

This isn't so bad. Is it worth the extra complexity?
It would prevent a URI implicit type.

Best,

Clark
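The flex/word split amounts to choosing the recognition pattern by context. The following is a rough sketch under stated assumptions: the names `FLEX`, `WORD` and `is_implicit_text` are invented for illustration, and "word chars" is taken to mean letters, digits and '_'.

```python
import re

# "flex": top-level scalar that starts with a letter, then anything goes.
FLEX = re.compile(r"[A-Za-z].*")
# "word": nested scalar (or mapping key), a letter followed by word chars.
WORD = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_implicit_text(scalar: str, nested: bool) -> bool:
    """True if the unquoted scalar would be read as text in this context."""
    pattern = WORD if nested else FLEX
    return pattern.fullmatch(scalar) is not None
```

Because both patterns insist on a leading letter, a value such as `09-11-2001` is never claimed as text and stays available for a date type, while a multi-word sentence is accepted at the top level but has to be quoted inside an inline collection.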
From: Brian I. <in...@tt...> - 2002-06-17 16:40:19
On 17/06/02 12:14 -0400, Clark C . Evans wrote:
> On Mon, Jun 17, 2002 at 08:22:43AM -0700, Brian Ingerson wrote:
> | 1) Keep YAML tidy
> | 2) Make YAML DWIM
> | 3) Provide an upgrade path
> | 4) Keep the productions reasonable
> |
> | I like the ideas that are floating around. Here's how I would most like
> | to see things:
> |
> | 1) ints and floats work as they do today.
> | 2) null works as today (~). (This shows up like crazy in dumps, and
> |    needs to be clean. It is also a native type.)
> | 3) dates and times require explicit declaration
>
> Dates and times show up like crazy in my dumps. In business
> data dates, times, and currency are big use cases -- much more
> common than floating point values.

Yes. Agreed. But... they are not native types in Python, Perl, etc. Ints and
floats *are*. So if you want to show that a date string is going to be
loaded into a Date class, please _indicate_ it.

    date: ! 06-17-2002
    time: ! 12:15
    alternate date: ! Sun Oct 28 2001

The ISO8601 dates are so darn long that adding a '! ' won't cause any
readability loss anyway.

Also, your business cases are probably not of the "human writable YAML" form.

>
> |    Otherwise they are strings. (Most people in Perl and Python don't
> |    need automatic Date object creation. String would be the expected
> |    behaviour from a config file.)
>
> Yes, for system programming you don't need dates and times
> very often. However, for business applications dates and
> times are essential.

I'm not suggesting they aren't. But for the human writable goal, let's just
use unquoted for strings, ints and floats.

If it came down to it, I'd go with Ryan's stricter proposal of "unquoted
is string only". Ints and floats require '! '. For config stuff, the
application would know which strings should become ints and floats. For
RPC stuff, the emitter would do the right thing:

    size: ! 9
    color: blue
    weight: ! 3.5

Not too bad. (I'm catching on, Ryan)

That way, we drop all implicit typing if there is no '! '. I'd still require
that the unquoted begin with [_A-Za-z0-9]. Anything else would be a parse
error. That way all other characters are reserved.

>
> | 4) Unquoted strings, outside an Inline collection, must start with a
> |    word-char (including numbers). After that anything goes. It's a
> |    string. (This assumes we've already checked for ints and floats).
> | 5) Unquoted strings, inside an inline collection, must be only
> |    made of word chars. Otherwise you must quote them.
>
> So, we want to introduce yet-another-scalar-kind? We already
> have block and folded; now we want to split simple scalars
> into the nested (very restrictive) and top-level (very flexible)
> variants? I'm more leaning towards Neil's suggestion of limiting
> unquoted values to a single word.

Yeah. It's a possibility. But for the human writable goal, it just gets in
the way. It is a simple rule though. I'll consider it.

>
> | 6) All future oddball types, like URI, require explicit or just '! '.
> |    (Scripting languages don't have native support for anything outside
> |    str, int, float, null. People will expect URIs to be strings. There
> |    should be a visual indicator if that isn't true. '! ' works nicely)
>
> I see URIs as becoming a more and more common type of data,
> and I can see how one would like to include complex numbers
> as well. Do we want to draw a line in the sand now and say
> that YAML will never, ever have other implicit types?

No lines are being drawn. We just use an implicit type indicator. Always.

>
> | four:
> |   - 123 Main Street
> |   - m|.*?\\(foo\w*)?.+$|;
> |   - YAML's cool!
> |   - http://www.yaml.org
> |   - 192.168.0.1
> |   - true
> |   - 09-11-2001
> |   - Unquoted strings, outside an Inline collection, must start with
> |     a word-char (including numbers). After that anything goes.
>
> Of these use cases, only the last one is interesting to me,
> the rest are not interesting. The first two would be better
> as blocks. The URL, Date, and IP address would be better off
> being typed. You seem to be dividing the current "simple"
> scalar type into two subordinate types:
>
>   flex: This type is your #4 above. I'd limit the implicit
>         text recognition to start with an alpha character so
>         that dates and numeric types can co-exist peacefully.
>
>   word: This type is similar to #5. In this type the
>         scalar value can only be a word (sequence of
>         word chars beginning with alpha). This is
>         designed to work with nested maps/lists. It could
>         also be the production for "key" values for
>         regular multi-line mappings.

I'm torn. Single word is definitely simple. Both to explain and to implement.
But it just kind of leaves me feeling crippled. I have a ton of YAML docs
that would instantly become invalid. I'd need to cruft them up with lots and
lots of quotes.

I like the Perl attitude. Larry never shied away from going the extra mile to
make things DWIM. Sure, it's a double-bladed sword. So far, we've tried hard
to make YAML rock. Let's not oversimplify things too much.

Cheers, Brian

>
> This isn't so bad. Is it worth the extra complexity?
> It would prevent a URI implicit type.
>
> Best,
>
> Clark
From: Clark C . E. <cc...@cl...> - 2002-06-17 17:13:36
| > Dates and times show up like crazy in my dumps. In business
| > data dates, times, and currency are big use cases -- much more
| > common than floating point values.
|
| They are not native types in Python, Perl, etc. Ints and floats *are*.

They are native in SQL and even DBASE II. In DBASE II floating
point values weren't native. The point is dates and times are
essential business data types that are very prolific. Dates and
times are just as important as floating point values, if not more
so since there are more business programmers.

|     date: ! 06-17-2002
|     time: ! 12:15

Yuck.

| Also, your business cases are probably not of the "human writable YAML" form.

But they are human readable; invoices in YAML are clean and simple.
Being human writable is nice, but readable was our first concern, right?

| If it came down to it, I'd go with Ryan's stricter proposal of "unquoted
| is string only". Ints and floats require '! '. For config stuff, the
| application would know which strings should become ints and floats. For
| RPC stuff, the emitter would do the right thing:
|
|     size: ! 9
|     color: blue
|     weight: ! 3.5

Ok. So you want an implicit type indicator. This is butt ugly
for common dumps. Try it. Ick. If it comes down to it, I'd go
the other way and require that all strings get quoted.

| > flex: This type is your #4 above. I'd limit the implicit
| >       text recognition to start with an alpha character so
| >       that dates and numeric types can co-exist peacefully.
| >
| > word: This type is similar to #5. In this type the
| >       scalar value can only be a word (sequence of
| >       word chars beginning with alpha). This is
| >       designed to work with nested maps/lists. It could
| >       also be the production for "key" values for
| >       regular multi-line mappings.
|
| I'm torn. Single word is definitely simple. Both to explain and to implement.
| But it just kind of leaves me feeling crippled. I have a ton of YAML docs
| that would instantly become invalid. I'd need to cruft them up with lots and
| lots of quotes.

In the "!" proposal, that leaves a lot of my examples where I have
to cruft them up with lots and lots of exclamation marks.

| I like the Perl attitude. Larry never shied away from going the extra mile to
| make things DWIM. Sure, it's a double-bladed sword. So far, we've tried hard
| to make YAML rock. Let's not oversimplify things too much.

Well... let me think on this for a few days. I have a lot of
day job stuff to get done and now that we are questioning
basic syntax again I really need to approach it with a clear head.

Clark