From: Oren Ben-K. <or...@ri...> - 2002-06-18 07:49:33
Correction:

> - *Unavoidable* restrictions for each are:
>
> top-value: [ *SPC, *IND, *COM, *SEQ ]
                                 ^^^^
'- ' there doesn't cause an ambiguity. So:

> This is 4 different unique combinations:
>
> top-value: [ *SPC, *IND, *COM ]
> top-key: [ *SPC, *IND, *COM, *SEQ, *KEY ]
> [ top-seq,
>   inline-key ]: [ *SPC, *IND, *COM, *KEY ]
> [ inline-sep,
>   inline-value ]: [ *SPC, *IND, *COM, *SEP ]

And:

> If option 0 is out of the question, I'd like to suggest the following
> instead (call this option 1):
>
> top-value: [ *SPC, *IND, *COM ]
> [ top-key,
>   top-seq,
>   inline-key ]: [ *SPC, *IND, *COM, *SEQ, *KEY ]
> [ inline-sep,
>   inline-value ]: [ *SPC, *IND, *COM, *SEP ]

Now, if we just said that all styles had *SEQ (something pretty reasonable, I
think), we automatically get option 2, that is very similar to option 1. Only
three styles:

    top-value: [ *SPC, *IND, *COM, *SEQ ]
    [ top-key,
      top-seq,
      inline-key ]: [ *SPC, *IND, *COM, *SEQ, *KEY ]
    [ inline-sep,
      inline-value ]: [ *SPC, *IND, *COM, *SEQ, *SEP ]

This is more consistent than option 1. I like it better. In fact, I like it
best of all the options so far.

Thoughts?

Have fun,

    Oren Ben-Kiki

From: Oren Ben-K. <or...@ri...> - 2002-06-18 13:11:59
Clark C . Evans [mailto:cc...@cl...] wrote:
> My preference...
>
> ? [top-key, inline-key, inline-value, inline-seq]
> : [*SPC, *IND, *COM, *SEQ, *KEY, *SEP, *SPN, *INL ]
> ? [top-value, top-seq]
> : [*SPC, *IND, *COM, *SEQ, *KEY]
>
> And I even think top-value and top-seq should be *SPN.

I think the above is much too restrictive. But since you've now raised a
"grand unified style" proposal, let's handle that first and return to this if
it is still relevant.

Have fun,

    Oren Ben-Kiki

From: Oren Ben-K. <or...@ri...> - 2002-06-18 14:25:44
So, now we've got the unification out of the way :-)

> My preference...
>
> ? [top-key, inline-key, inline-value, inline-seq]
> : [*SPC, *IND, *COM, *SEQ, *KEY, *SEP, *SPN, *INL ]
> ? [top-value, top-seq]
> : [*SPC, *IND, *COM, *SEQ, *KEY]
>
> And I even think top-value and top-seq should be *SPN.

*KEY is a PITA. It will crop up in top-value (Note: It *will* crop up!). We
have a reason: it allows us to make ':' a BAD character and catch URIs as an
implicit type. So I'm willing to go with it being universal... This changes my
proposal to:

    [ top-value,
      top-key,
      top-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY ]

    [ inline-seq,
      inline-key,
      inline-value ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP, *INL ]

Hey, look, just two styles now!

I've included *INL in the in-line style above. *INL isn't that crucial anyway.

I'm much less comfortable with *SPN. There's no reason whatsoever that
inline-value should have *SPN. It unnecessarily restricts in-line collections
to "small" ones. I want in-line style to be "good enough" for all people who
have a hard time with the mandatory indented block concept.

Now, I have much less of a problem with one-line keys. So I'll go along with:

    [ top-value,
      top-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY ]

    [ inline-value,
      inline-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP ]

    [ top-key,
      inline-key ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP, *INL, *SPN ]

OK, that's three styles again (top-value, inline-value and key). The top-key
is a bit "too restricted" but so be it - it is being artificially restricted
to one line anyway, what's a bit more restriction about ', ' and '[]{}'...

Would this be acceptable to you?

Have fun,

    Oren Ben-Kiki

From: Clark C . E. <cc...@cl...> - 2002-06-18 14:46:47
On Tue, Jun 18, 2002 at 10:27:17AM -0400, Oren Ben-Kiki wrote:
| [ top-value,
|   top-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY ]
|
| [ inline-value,
|   inline-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP ]
|
| [ top-key,
|   inline-key ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP, *INL, *SPN ]
|
| OK, that's three styles again (top-value, inline-value and key). The top-key
| is a bit "too restricted" but so be it - it is being artificially restricted
| to one line anyway, what's a bit more restriction about ', ' and '[]{}'...

I'll go with this one; but would still like to see some simplification.

Clark

From: Brian I. <in...@tt...> - 2002-06-18 15:32:39
On 18/06/02 10:56 -0400, Clark C . Evans wrote:
> On Tue, Jun 18, 2002 at 10:27:17AM -0400, Oren Ben-Kiki wrote:
> | [ top-value,
> |   top-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY ]
> |
> | [ inline-value,
> |   inline-seq ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP ]
> |
> | [ top-key,
> |   inline-key ]: [ *SPC, *IND, *COM, *SEQ, *KEY, *SEP, *INL, *SPN ]
> |
> | OK, that's three styles again (top-value, inline-value and key). The top-key
> | is a bit "too restricted" but so be it - it is being artificially restricted
> | to one line anyway, what's a bit more restriction about ', ' and '[]{}'...
>
> I'll go with this one; but would still like to see some simplification.

    [ top-value,
      top-seq ]: [ *A_N, *COM, *KEY ]

    [ inline-value,
      inline-seq ]: [ *A_N, *COM, *KEY, *SEP ]

    [ top-key,
      inline-key ]: [ *A_N, *COM, *KEY, *SEP, *SPN ]

How about that? Reduced rules, and no *INL, which is just too artificial for
my taste.

Cheers, Brian

From: Oren Ben-K. <or...@ri...> - 2002-06-18 18:19:58
Brian Ingerson wrote:
> If you add the restriction we've been using all along:
> - &A_N Must start with A-Za-z0-9_ # implies *SPC *IND *SEQ

Aren't you confusing the restriction on unquoted in-line values with the
regexp for strings? Surely unquoted in-lines can start with any
non-indicator printable character (for example, '+'), as in:

    int: +12

Let's just say for shorthand that:

    *CORE ::= *SPC + *IND + *SEQ + *COM

That is, all things that really can't/shouldn't appear in *any* such value,
no matter what. The proposal becomes:

    [ top-value,
      top-seq ]: [ *CORE, *KEY ]
    # seq requires *KEY so we just ban it everywhere.

    [ inline-value,
      inline-seq ]: [ *CORE, *KEY, *SEP ]
    # Inline bans *SEP as well.

    [ top-key,
      inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
    # Keys can't span.

I think we all agree on this?

We need to agree on three things:
- The scalar styles (Clark's latest looks basically OK)
- The restrictions on the unquoted in-line style (seems the above is
  acceptable to all)
- The regexp for strings (we have some reasonable options, Clark will make a
  call).

Does that "settle it"? (hopeful tone :-)

Have fun,

    Oren Ben-Kiki

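The *CORE shorthand in the message above amounts to plain set algebra. A minimal sketch follows, assuming the restriction names can be treated as opaque labels; the Python identifiers (`CORE`, `RESTRICTIONS`, `distinct_styles`) are hypothetical and not taken from any YAML implementation.

```python
# Hypothetical model of the restriction shorthand. The labels mirror the
# names used in the thread; nothing here comes from a real YAML parser.
CORE = frozenset({"SPC", "IND", "SEQ", "COM"})

# Each context maps to the set of restrictions that applies to it.
RESTRICTIONS = {
    "top-value":    CORE | {"KEY"},
    "top-seq":      CORE | {"KEY"},
    "inline-value": CORE | {"KEY", "SEP"},
    "inline-seq":   CORE | {"KEY", "SEP"},
    "top-key":      CORE | {"KEY", "SEP", "SPN"},
    "inline-key":   CORE | {"KEY", "SEP", "SPN"},
}


def distinct_styles(restrictions):
    """Group contexts that share exactly the same restriction set."""
    groups = {}
    for context, banned in restrictions.items():
        groups.setdefault(frozenset(banned), []).append(context)
    return sorted(sorted(group) for group in groups.values())


if __name__ == "__main__":
    for group in distinct_styles(RESTRICTIONS):
        print(group)
```

Grouping by restriction set is one way to see the "three styles" count: each later style only adds restrictions to the previous one.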
From: Clark C . E. <cc...@cl...> - 2002-06-18 19:20:39
On Tue, Jun 18, 2002 at 02:21:34PM -0400, Oren Ben-Kiki wrote:
| Let's just say for shorthand that:
|
| *CORE ::= *SPC + *IND + *SEQ + *COM
|
| That is, all things that really can't/shouldn't appear in *any* such value,
| no matter what. The proposal becomes:
|
| [ top-value,
|   top-seq ]: [ *CORE, *KEY ]
| # seq requires *KEY so we just ban it everywhere.
|
| [ inline-value,
|   inline-seq ]: [ *CORE, *KEY, *SEP ]
| # Inline bans *SEP as well.
|
| [ top-key,
|   inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
| # Keys can't span.
|
| I think we all agree on this?

Ok. I don't think that multi-line scalars within an in-line
collection is good practice; but if it will help acceptance
more than hurt it then I'm game. A further clarification,
this limits quoted variants to a single line as well?

| We need to agree on three things:
| - The scalar styles (Clark's latest looks basically OK)

I talked to Brian and he doesn't like the unification and
thinks that the current path is good. So, unless there is
an outcry of support for the proposed unification; I guess
it will just fade away as an unfulfilled strawman. ;)

| - The restrictions on the unquoted in-line style (seems
|   the above is acceptable to all)

It's acceptable, given the reservations above.

Brian and I discussed how the top-* items can have
the semantics of the folded type so that we don't
introduce yet another set of folding rules. The
indentation for the multi-line top-* scalars is
determined from the first non-blank line past the
opening line. If you want any other indentation
setting, use the folded variant. This would allow
for Wiki stuff within this scalar form and keep the
logic a bit more uniform between the two scalar styles.

| - The regexp for strings (we have some reasonable
|   options, Clark will make a call).

This comes down to what direction we want to take
implicit types. We have a few choices:

  - Drop them altogether; I include using the parenthesis
    mechanism (even if it is auto-detected within parens)
    as making them explicit.

    This is simple since it has a specific indicator for
    auto-type detection. The parenthesis is just a
    nicer looking, stand-alone ! indicator. Nothing more
    nothing less; and it isn't implicit any more.

    This isn't pretty because it leaves out integers,
    timestamps, floats, null, and other very common types
    which are not strings.

  - Fix them at the current level, possibly with a few
    quick modifications. This would not prevent us from
    defining the above parenthesis mechanism which could be
    more flexible in the future.

    This is nice in that commonly used types of today are
    supported well; and we can use the parenthesis marker
    for types of tomorrow. If we go this route, we have
    to choose if stuff like tuples (ip addresses) and
    URIs should be implicit types. An argument could be
    made that these are application level objects and that
    typing beyond the core types should be done at the
    application level; either explicitly or via a
    schema which has the expected data types.

  - Allow us to add them in the future; be quite restrictive
    up front. This comes at the price of having to quote
    things, perhaps things that don't make sense that they
    should be quoted.

Before my suggestion, let me get two changes to implicit
types that I think we should implement:

  - We clean up the date/time productions to eliminate
    time. Time just doesn't make sense without date as
    there aren't any good native "time-only" types. We
    only keep the timestamp type with an optional date-only
    shorthand used only when the time is exactly
    midnight GMT. We further limit the regex to be
    exactly one token by only using a T to separate
    the date from the optional time component.

  - We add a very cute format for floating point values
    which happen to have two decimal places. 34.3% is
    treated as 0.343 ; this would make a lot of business
    YAML representations look great. Yes? (nothing
    like pure unadulterated sugar)

The compromise I'm now leaning to is a fixed approach with
a few escape hatches so we can be creative later if the
need emerges. Unfortunately, this means that it will
be a bit complicated...

  1. The regexp for matching text content should exclude
     a few characters that are not commonly used for
     punctuation that we can use later for indicators or
     other implicit types or for other purposes.

         |~#%^*+=<>$

     We really can't restrict @ due to e-mail addresses,
     \ due to DOS pathnames, : due to URLs.

  2. Anything starting with an alpha-numeric value that
     is more than one token (contains a space) is a
     text via the implicit rule.

  3. Anything starting with an alphabetic character is
     text via the implicit rule.

This rules out implicit typing of URLs, pathnames, and
email. This isn't so bad. I don't know what a URL type
would give me anyway... Hmm.

Ok. I have to get back to work... so this is off...

Clark

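Clark's percent shorthand and his rules 1 to 3 can be sketched in a few lines. This is only an illustration under stated assumptions: the function names and the `PERCENT` pattern are hypothetical, and the thread does not fix the exact regexps.

```python
import re
from decimal import Decimal

# Rule 1: characters proposed for exclusion from implicit text.
RESERVED = set("|~#%^*+=<>$")

# Assumed shape of the percent literal; the thread gives only "34.3%".
PERCENT = re.compile(r"^[-+]?\d+(\.\d+)?%$")


def has_reserved_char(value):
    """True if the value contains any character reserved by rule 1."""
    return any(ch in RESERVED for ch in value)


def percent_to_float(text):
    """Read '34.3%' as 0.343; return None if text is not a percent literal."""
    if not PERCENT.match(text):
        return None
    # Decimal shifts the decimal point exactly before converting to float.
    return float(Decimal(text[:-1]) / Decimal(100))


def is_implicit_text(value):
    """Rules 2 and 3: a leading letter, or a leading alphanumeric plus a space."""
    if not value:
        return False
    first = value[0]
    if first.isalpha():
        return True  # rule 3
    return first.isalnum() and " " in value  # rule 2
```

Under rule 3 a value such as `http://yaml.org` is plain text because it starts with a letter, which is why URLs stop being candidates for implicit typing.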
From: Brian I. <in...@tt...> - 2002-06-18 23:33:34
On 18/06/02 15:29 -0400, Clark C . Evans wrote:
> On Tue, Jun 18, 2002 at 02:21:34PM -0400, Oren Ben-Kiki wrote:
> | Let's just say for shorthand that:
> |
> | *CORE ::= *SPC + *IND + *SEQ + *COM
> |
> | That is, all things that really can't/shouldn't appear in *any* such value,
> | no matter what. The proposal becomes:
> |
> | [ top-value,
> |   top-seq ]: [ *CORE, *KEY ]
> | # seq requires *KEY so we just ban it everywhere.
> |
> | [ inline-value,
> |   inline-seq ]: [ *CORE, *KEY, *SEP ]
> | # Inline bans *SEP as well.
> |
> | [ top-key,
> |   inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
> | # Keys can't span.
> |
> | I think we all agree on this?
>
> Ok. I don't think that multi-line scalars within an in-line
> collection is good practice; but if it will help acceptance
> more than hurt it then I'm game. A further clarification,
> this limits quoted variants to a single line as well?

I concur that using multi-line scalars within an in-line collection is
not good practice. Especially when you want to allow comments on any line.
I'd really like to avoid comments in the middle of a token. I'd also like to
avoid inline-collection continuation/spanning mid-token.

    ---
    bad practices: #should be illegal IMO
    - { foo: figuring out # first line
        scalars is hard, # second line
      }
    - { foo: "even double quotes # what am I?
        should be single line" # I yaml what I yaml!

I think that both Clark and I want to preserve the original intent of inline
collections (small groupings of small things) while allowing them to use
multiple lines. But I think we all agree that scalar spanning in inline
collections is possible.

>
> | We need to agree on three things:
> | - The scalar styles (Clark's latest looks basically OK)
>
> I talked to Brian and he doesn't like the unification and
> thinks that the current path is good. So, unless there is
> an outcry of support for the proposed unification; I guess
> it will just fade away as an unfulfilled strawman. ;)

We are very close with our current spec, and I really like the multiple forms
to boot. A little bit of redundancy is a good thing (paraphrasing Larry Wall :)

I didn't like that we threw out double quoting on a whim, just because we
might be able to get by without it. Double quoting is a well known idiom that
YAML is currently capitalizing on.

I also like '>' nested folding. I'm not sure that you could do folding right
without it in all cases, because sometimes you can't autodetect the
indentation level. Thus our explicit number indicator: '>4'. Wiki also
benefits visually from the nested style folding for scalars that begin with
indented wiki content.

We've been down these treacherous roads before. We're on the right track.

> | - The restrictions on the unquoted in-line style (seems
> |   the above is acceptable to all)
>
> It's acceptable, given the reservations above.
>
> Brian and I discussed how the top-* items can have
> the semantics of the folded type so that we don't
> introduce yet another set of folding rules. The
> indentation for the multi-line top-* scalars is
> determined from the first non-blank line past the
> opening line. If you want any other indentation
> setting, use the folded variant. This would allow
> for Wiki stuff within this scalar form and keep the
> logic a bit more uniform between the two scalar styles.

I agreed with Clark that this was possible. I don't really desire it though.
I think the nested-folded form should use wiki style folding, and the
spanning top-value should use collapse style folding, where all leading and
trailing whitespace collapses to a single space. This actually gives us more
power. The added bonus is that collapse style doesn't care about indentation
width, so no explicit indicator is needed.

> | - The regexp for strings (we have some reasonable
> |   options, Clark will make a call).
>
> This comes down to what direction we want to take
> implicit types. We have a few choices:
>
>   - Drop them altogether; I include using the parenthesis
>     mechanism (even if it is auto-detected within parens)
>     as making them explicit.
>
>     This is simple since it has a specific indicator for
>     auto-type detection. The parenthesis is just a
>     nicer looking, stand-alone ! indicator. Nothing more
>     nothing less; and it isn't implicit any more.
>
>     This isn't pretty because it leaves out integers,
>     timestamps, floats, null, and other very common types
>     which are not strings.
>
>   - Fix them at the current level, possibly with a few
>     quick modifications. This would not prevent us from
>     defining the above parenthesis mechanism which could be
>     more flexible in the future.

+1

>     This is nice in that commonly used types of today are
>     supported well; and we can use the parenthesis marker
>     for types of tomorrow. If we go this route, we have
>     to choose if stuff like tuples (ip addresses) and
>     URIs should be implicit types. An argument could be
>     made that these are application level objects and that
>     typing beyond the core types should be done at the
>     application level; either explicitly or via a
>     schema which has the expected data types.
>
>   - Allow us to add them in the future; be quite restrictive
>     up front. This comes at the price of having to quote
>     things, perhaps things that don't make sense that they
>     should be quoted.
>
> Before my suggestion, let me get two changes to implicit
> types that I think we should implement:
>
>   - We clean up the date/time productions to eliminate
>     time. Time just doesn't make sense without date as
>     there aren't any good native "time-only" types. We
>     only keep the timestamp type with an optional date-only
>     shorthand used only when the time is exactly
>     midnight GMT. We further limit the regex to be
>     exactly one token by only using a T to separate
>     the date from the optional time component.

To clarify, we drop !date and !time from the spec, and just support
!timestamp. (Perhaps we can rename !timestamp to !date for brevity)

The new date could be written in either of the following formats:

    ---
    - 2001-06-18 # Implies midnight for time.
    - 2001-06-18T19:22:45.5Z

If we enforce the 'T' then we don't have any implicits with whitespace in
them, except string. This makes it easier to have a relaxed string implicit.

>
>   - We add a very cute format for floating point values
>     which happen to have two decimal places. 34.3% is
>     treated as 0.343 ; this would make a lot of business
>     YAML representations look great. Yes? (nothing
>     like pure unadulterated sugar)

I don't care. I don't care. (quoting Jon "not John" Wayne)

> The compromise I'm now leaning to is a fixed approach with
> a few escape hatches so we can be creative later if the
> need emerges. Unfortunately, this means that it will
> be a bit complicated...
>
>   1. The regexp for matching text content should exclude
>      a few characters that are not commonly used for
>      punctuation that we can use later for indicators or
>      other implicit types or for other purposes.
>
>          |~#%^*+=<>$
>
>      We really can't restrict @ due to e-mail addresses,
>      \ due to DOS pathnames, : due to URLs.

-1 # Parens and leading alphanums cover all the bases.

>   2. Anything starting with an alpha-numeric value that
>      is more than one token (contains a space) is a
>      text via the implicit rule.

+1

>   3. Anything starting with an alphabetic character is
>      text via the implicit rule.

+1

>
> This rules out implicit typing of URLs, pathnames, and
> email. This isn't so bad. I don't know what a URL type
> would give me anyway... Hmm.

+1

Cheers, Brian

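The two timestamp forms listed in the message above can be recognised with a single whitespace-free pattern. A minimal sketch, assuming only a trailing 'Z' (UTC) designator is accepted and that a date-only value means midnight UTC; the `TIMESTAMP` regexp and `parse_timestamp` are hypothetical names, not the productions from the spec draft.

```python
import re
from datetime import datetime, timezone

# Assumed pattern: date, then an optional 'T' time part ending in 'Z'.
# No whitespace is allowed, so the value is always exactly one token.
TIMESTAMP = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"
    r"(?:T(\d{2}):(\d{2}):(\d{2})(?:\.(\d+))?Z)?$"
)


def parse_timestamp(token):
    """Return an aware UTC datetime, or None if the token does not match."""
    match = TIMESTAMP.match(token)
    if match is None:
        return None
    year, month, day = (int(g) for g in match.group(1, 2, 3))
    if match.group(4) is None:
        # Date-only shorthand: implies midnight.
        return datetime(year, month, day, tzinfo=timezone.utc)
    hour, minute, second = (int(g) for g in match.group(4, 5, 6))
    fraction = match.group(7) or ""
    # Pad or truncate the fractional seconds to microseconds.
    micro = int(fraction[:6].ljust(6, "0")) if fraction else 0
    return datetime(year, month, day, hour, minute, second, micro,
                    tzinfo=timezone.utc)
```

A value with a space between date and time does not match this pattern at all, which is the property that keeps every non-string implicit free of whitespace.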
From: Brian I. <in...@tt...> - 2002-06-18 22:36:06
Sorry to be missing the action. My mail server was down for a few hours.

On 18/06/02 14:21 -0400, Oren Ben-Kiki wrote:
> Brian Ingerson wrote:
> > If you add the restriction we've been using all along:
> > - &A_N Must start with A-Za-z0-9_ # implies *SPC *IND *SEQ
>
> Aren't you confusing the restriction on unquoted in-line values with the
> regexp for strings? Surely unquoted in-lines can start with any
> non-indicator printable character (for example, '+' as in:

Yes. *blush* Sorry.

>
> int: +12
>
> Let's just say for shorthand that:
>
> *CORE ::= *SPC + *IND + *SEQ + *COM

Cool.

>
> That is, all things that really can't/shouldn't appear in *any* such value,
> no matter what. The proposal becomes:
>
> [ top-value,
>   top-seq ]: [ *CORE, *KEY ]
> # seq requires *KEY so we just ban it everywhere.
>
> [ inline-value,
>   inline-seq ]: [ *CORE, *KEY, *SEP ]
> # Inline bans *SEP as well.
>
> [ top-key,
>   inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
> # Keys can't span.
>
> I think we all agree on this?
>
> We need to agree on three things:
> - The scalar styles (Clark's latest looks basically OK)
> - The restrictions on the unquoted in-line style (seems the above is
>   acceptable to all)
> - The regexp for strings (we have some reasonable options, Clark will make a
>   call).
>
> Does that "settle it"? (hopeful tone :-)

We'll see...

Cheers, Brian

From: Oren Ben-K. <or...@ri...> - 2002-06-18 20:40:20
Clark C . Evans wrote:
> | no matter what. The proposal becomes:
> |
> | [ top-value,
> |   top-seq ]: [ *CORE, *KEY ]
> | # seq requires *KEY so we just ban it everywhere.
> |
> | [ inline-value,
> |   inline-seq ]: [ *CORE, *KEY, *SEP ]
> | # Inline bans *SEP as well.
> |
> | [ top-key,
> |   inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
> | # Keys can't span.
> |
> | I think we all agree on this?
>
> Ok. I don't think that multi-line scalars within an in-line
> collection is good practice; but if it will help acceptance
> more than hurt it then I'm game. A further clarification,
> this limits quoted variants to a single line as well?

Nope. Quoted can always span. We don't want several quoted style variants, it
is bad enough we'll have several "simple" style variants. Besides, there's
*really* no reason to limit quoted keys. You have a start quote and an end
quote - no possible ambiguity or confusion.

> | We need to agree on three things:
> | - The scalar styles (Clark's latest looks basically OK)
>
> I talked to Brian and he doesn't like the unification and
> thinks that the current path is good. So, unless there is
> an outcry of support for the proposed unification; I guess
> it will just fade away as an unfulfilled strawman. ;)

Well, I rather liked it. But if Brian feels strongly against it, I'll let it
go.

> | - The restrictions on the unquoted in-line style (seems
> |   the above is acceptable to all)
>
> It's acceptable, given the reservations above.

So maybe it isn't OK after all (quoted keys spanning lines) :-) But I think it
is reasonable as it is.

> Brian and I discussed how the top-* items can have
> the semantics of the folded type so that we don't
> introduce yet another set of folding rules. The
> indentation for the multi-line top-* scalars is
> determined from the first non-blank line past the
> opening line. If you want any other indentation
> setting, use the folded variant. This would allow
> for Wiki stuff within this scalar form and keep the
> logic a bit more uniform between the two scalar styles.

I'll go with that (more productions work... sigh).

> | - The regexp for strings (we have some reasonable
> |   options, Clark will make a call).
>
> This comes down to what direction we want to take
> implicit types. We have a few choices:
> ...
>   - Fix them at the current level, possibly with a few
>     quick modifications. This would not prevent us from
>     defining the above parenthesis mechanism which could be
>     more flexible in the future.

Basically my choice.

> ...
> Before my suggestion, let me get two changes to implicit
> types that I think we should implement:
>
>   - We clean up the date/time productions to eliminate
>     time. Time just doesn't make sense without date as
>     there aren't any good native "time-only" types.

I'll go with that.

> ... We further limit the regex to be
>     exactly one token by only using a T to separate
>     the date from the optional time component.

This I have a problem with. It is *so* unreadable.

>   - We add a very cute format for floating point values
>     which happen to have two decimal places. 34.3% is
>     treated as 0.343 ; this would make a lot of business
>     YAML representations look great. Yes? (nothing
>     like pure unadulterated sugar)

Almost too sweet for my taste. If nobody else objects, I'll go with it.

> The compromise I'm now leaning to is a fixed approach with
> a few escape hatches so we can be creative later if the
> need emerges. Unfortunately, this means that it will
> be a bit complicated...

I think we should be able to work out something simpler. After some
reflection, I'm not too happy with "good" and "bad" characters. I'll think
about this some more.

Have fun,

    Oren Ben-Kiki

From: Oren Ben-K. <or...@ri...> - 2002-06-19 08:55:33
Brian Ingerson [mailto:in...@tt...] wrote:
> > | [ top-value,
> > |   top-seq ]: [ *CORE, *KEY ]
> > | # seq requires *KEY so we just ban it everywhere.
> > |
> > | [ inline-value,
> > |   inline-seq ]: [ *CORE, *KEY, *SEP ]
> > | # Inline bans *SEP as well.
> > |
> > | [ top-key,
> > |   inline-key ]: [ *CORE, *KEY, *SEP, *SPN ]
> > | # Keys can't span.
> > |
> > | I think we all agree on this?
> >
> > Ok.

So that seems settled...

> > I don't think that multi-line scalars within an in-line
> > collection is good practice; but if it will help acceptance
> > more than hurt it then I'm game. A further clarification,
> > this limits quoted variants to a single line as well?
>
> I concur that using multi-line scalars within an in-line collection is
> not good practice. Especially when you want to allow comments
> on any line.
> I'd really like to avoid comments in the middle of a token.
> I'd also like to
> avoid inline-collection continuation/spanning mid-token.

I'm afraid I have to insist, for the same old reason: I want a way to
satisfy the "block indentation is evil" crowd.

> ---
> bad practices: #should be illegal IMO
> - { foo: figuring out # first line
>     scalars is hard, # second line
>   }
> - { foo: "even double quotes # what am I?
>     should be single line" # I yaml what I yaml!

'# what am I?' is content. And yes, this is evil. Anyone writing such text
deserves what he gets. And he'd better use a syntax-highlighting editor :-)

> I think that both Clark and I want to preserve the original
> intent of inline collections (small groupings of small things)
> while allowing them to use multiple lines.

And I disagree (for the reason I gave above).

> But I think we all
> agree that scalar spanning in inline collections is possible.

OK :-)

> > | We need to agree on three things:
> > | - The scalar styles (Clark's latest looks basically OK)
> >
> > I talked to Brian and he doesn't like the unification and
> > thinks that the current path is good. So, unless there is
> > an outcry of support for the proposed unification; I guess
> > it will just fade away as an unfulfilled strawman. ;)
>
> We are very close with our current spec, and I really like
> the multiple forms to boot. A little bit of redundancy is a
> good thing (paraphrasing Larry Wall :)

You know, YAML started very Python-ish ("do it the one true way"), and it
has evolved to include more Perl ("There is more than one way to do it");
you might say YAML is very "Parrot"-ish now. Which I believe is a good
thing. Multi-line in-line scalars are a good example of this. It allows
writing long text both in a "Python" style (using indentation) and "Perl"
style (ignoring indentation). I think this is great.

> > Brian and I discussed how the top-* items can have
> > the semantics of the folded type so that we don't
> > introduce yet another set of folding rules...
>
> I agreed with Clark that this was possible. I don't really
> desire it though.

Neither do I, for various reasons.

> I think the nested-folded form should use wiki style folding, and the
> spanning top-value should use collapse style folding...

+10. I was going to send a lengthy posting on that - but given you agree
I'll spare us all :-) I'll just mention the most important reason I agree:
if we fold in-line values, there's too little a difference between '>' and
simple. It would just beg for a unification of the kind Clark proposed. So
as long as we don't do such a unification, I strongly feel that in-line
spanning values should be collapsed (as they are today) instead of folded.

> > | - The regexp for strings (we have some reasonable
> > |   options, Clark will make a call).
> >
> >   - Fix them at the current level, possibly with a few
> >     quick modifications. This would not prevent us from
> >     defining the above parenthesis mechanism which could be
> >     more flexible in the future.
>
> +1

We all agree on that, it seems. The question is what is exactly the best
string regexp for the job. This is still TBD.

> >   - We clean up the date/time productions to eliminate
> >     time. Time just doesn't make sense without date as
> >     there aren't any good native "time-only" types. We
> >     only keep the timestamp type with an optional date-only
> >     shorthand used only when the time is exactly
> >     midnight GMT. We further limit the regex to be
> >     exactly one token by only using a T to separate
> >     the date from the optional time component.
>
> To clarify, we drop !date and !time from the spec, and just support
> !timestamp.

Right.

> (Perhaps we can rename !timestamp to !date for brevity)

+1

> The new date could be written in either of the following
> formats:
>
> ---
> - 2001-06-18 # Implies midnight for time.

I was musing about making 12:00:00.00 (noon) the default time part of a
date. If we used that, no matter what time zone you use, the date stays the
same. While if you are using 00:00:00.00, if you apply the wrong time zone,
you will get a different date. Thoughts?

> - 2001-06-18T19:22:45.5Z
>
> If we enforce the 'T' then we don't have any implicits with
> whitespace in them, except string. This makes it easier to have a
> relaxed string implicit.

'T' is *so* ugly. Whoever came up with this one deserves to be reading such
dates all day long. A space there is *much* more readable. Compare:

    - 2001-06-18T19:22:45.5+05:00

    - 2001-06-18 19:22:45.5 +05:00

And I'm certain we'll be able to find a reasonable regexp for strings
regardless of the space issue.

> >   - We add a very cute format for floating point values
> >     which happen to have two decimal places. 34.3% is
> >     treated as 0.343 ; this would make a lot of business
> >     YAML representations look great. Yes? (nothing
> >     like pure unadulterated sugar)
>
> I don't care. I don't care. (quoting Jon "not John" Wayne)

+1 - neither do I :-)

Have fun,

    Oren Ben-Kiki

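The noon-versus-midnight point in the message above is easy to check numerically. A minimal sketch, assuming the date-only value is stored as a UTC moment and "applying the wrong time zone" means viewing that moment at a different fixed offset; `local_date` is a hypothetical helper.

```python
from datetime import date, datetime, time, timedelta, timezone


def local_date(day, default_time, offset_hours):
    """Date seen when a UTC date-only value is viewed at another offset."""
    utc_moment = datetime.combine(day, default_time, tzinfo=timezone.utc)
    reader_zone = timezone(timedelta(hours=offset_hours))
    return utc_moment.astimezone(reader_zone).date()


if __name__ == "__main__":
    day = date(2001, 6, 18)
    # Midnight default: any negative offset lands on the previous day.
    print(local_date(day, time(0, 0), -5))
    # Noon default: the date survives offsets from -11 to +11 hours.
    print(all(local_date(day, time(12, 0), h) == day for h in range(-11, 12)))
```

With a noon default the date is stable for offsets strictly between -12 and +12 hours; offsets of +12 hours or more still roll over to the next day, so "no matter what time zone" holds for most zones rather than all of them.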
From: Brian I. <in...@tt...> - 2002-06-19 14:34:15
|
On 19/06/02 04:57 -0400, Oren Ben-Kiki wrote: > Brian Ingerson [mailto:in...@tt...] wrote: > > > | [ top-value, > > > | top-seq ]: [ *CORE, *KEY ] > > > | # seq requires *KEY so we just ban it everywhere. > > > | > > > | [ inline-value, > > > | inline-seq ]: [ *CORE, *KEY, *SEP ] > > > | # Inline bans *SEP as well. > > > | > > > | [ top-key, > > > | inline-key ]: [ *CORE, *KEY, *SEP, *SPN ] > > > | # Keys can't span. > > > | > > > | I think we all agree on this? > > > > > > Ok. > > So that seems settled... > > > > I don't think that multi-line scalars within an in-line > > > collection is good practice; but if it will help acceptance > > > more than hurt it then I'm game. A further clarification, > > > this limits quoted variants to a single line as well? > > > > I concur that using multi-line scalars within an in-line collection is > > not good practice. Especially when you want to allow comments > > on any line. > > I'd really like to avoid comments in the middle of a token. > > I'd also like to > > avoid inline-collection continuation/spanning mid-token. > > I'm afraid I have to insist, for the same old reason: I want a way to > satisfy the "block indentation is evil" crowd. If such a crowd exists. I haven't heard any push back on this topic. I can't really believe that anyone will seriously consider using YAML, and not use the indented structures. I also doubt that implementors will even bother to do a decent job of Dumping inline structures. It's not a Dumping requirement. Just more work for the parser/loader. But OK :) > > > --- > > bad practices: #should be illegal IMO > > - { foo: figuring out # first line > > scalars is hard, # second line > > } > > - { foo: "even double quotes # what am I? > > should be single line" # I yaml what I yaml! > > '# what am I?' is content. And yes, this is evil. Anyone writing such text > deserves what he gets. 
And he'd better use syntax-highlighting editor :-) > > > I think that both Clark and I want to preserve the original > > intent of inline collections (small groupings of small things) > > while allowing them to use multiple lines. > > And I disagree (for the reason I gave above). > > > But I think we all > > agree that scalar spanning in inline collections is possible. > > OK :-) > > > > | We need to agree on three things: > > > | - The scalar styles (Clark's latest looks basically OK) > > > > > > I talked to Brian and he doesn't like the unification and > > > thinks that the current path is good. So, unless there is > > > an outcry of support for the proposed unification; I guess > > > it will just fade away as a unfulfilled strawman. ;) > > > > We are very close with our current spec, and I really like > > the multiple forms to boot. A little bit of redundancy is a > > good thing (paraphrasing Larry Wall :) > > You know, YAML started very Python-ish ("do it the one true way"), and it > has evolved to include more Perl ("There is more than one way to do it"); > you might say it YAML is very "Parrot"-ish now. Which I believe is a good > thing. Multi-line in-line scalars are a good example of this. It allows > writing long text both in a "Python" style (using indentation) and "Perl" > style (ignoring indentation). I think this is great. True. The Perl philosophy is to definitely go the extra mile for the user (or for the camel :). No arbitrary restrictions. > > > > Brian and I discussed how the top-* items can have > > > the semantics of the folded type so that we don't > > > introduce yet another set of folding rules... > > > > I agreed with Clark that this was possible. I don't really > > desire it though. > > Neither do I, for various reasons. > > > I think the nested-folded form should use wiki style folding, and the > > spanning top-value should use collapse style folding... > > +10. 
I was going to send a lengthy posting on that - but given you agree > I'll spare us all :-) I'll just mention the most important reason I agree: > if we fold in-line values, there's too little a difference between '>' and > simple. It would just beg for a unification of the kind Clark proposed. So > as long as we don't do such a unification, I strongly feel that in-line > spanning values should be collapsed (as they are today) instead of folded. I talked a little more with Clark about this late yesterday. He brought up the point that taking Wiki away is arbitrary. He had me weakly agreeing. I think that wiki belongs where there is a solid sense of a left-hand margin. This strong sense is in the nested forms. In the non-nested forms, you are just continuing a line more or less. And there can be ambiguities as to where the left-hand margin is. So I'll side weakly with Oren. It brings up the point of hard newlines though. Do we: "Allow empty lines like the one below to indicate a hard newline, or do we collapse them and require escaping like on this line\n" > > > > | - The regexp for strings (we have some reasonable > > > | options, Clark will make a call). > > > > > > - Fix them at the current level, possibly with a few > > > quick modifications. This would not prevent us from > > > defining the above parenthesis mechanism which could be > > > more flexible in the future. > > > > +1 > > We all agree on that, it seems. The question is what exactly is the best > string regexp for the job. This is still TBD. I have an issue to raise here. Why, Oren, are you against there being an _order_ to the regexps? In reality, there is an order, so why not do it by definition? That makes the string regexp become /^\w/ if it comes *after* the int, float and date regexps. What's the big deal here? Regexp order: - int - float - date - string - null - all future parenthesized regexps How does Neil feel about this? > > > - We clean up the date/time productions to eliminate > > > time. 
Time just doesn't make sense without date as > > > there aren't any good native "time-only" types. We > > > only keep the timestamp type with an optional date-only > > > shorthand used only when the time is exactly > > > midnight GMT. We further limit the regex to be > > > exactly one token by only using a T to separate > > > the date from the optional time component. > > > > To clarify, we drop !date and !time from the spec, and just support > > !timestamp. > > Right. > > > (Perhaps we can rename !timestamp to !date for brevity) > > +1 > > > The new date could be written in either of the following > > formats: > > > > --- > > - 2001-06-18 # Implies midnight for time. > > I was musing about making 12:00:00.00 (noon) the default time part of a > date. If we used that, no matter what time zone you use, the date stays the > same. While if you are using 00:00:00.00, if you apply the wrong time zone, > you will get a different date. Thoughts? > > > - 2001-06-18T19:22:45.5Z > > > > If we enforce the 'T' then we don't have any implicits with > > whitespace in them, except string. This makes it easier to have a > > relaxed string implicit. > > 'T' is *so* ugly. Whoever came up with this one deserves to be reading such > dates all day long. A space there is *much* more readable. Compare: > > - 2001-06-18T19:22:45.5+05:00 > > - 2001-06-18 19:22:45.5 +05:00 I think all the ISO dates are ugly. But Clark convinced me that these are good because they are standard. That's about the only reason I can tolerate dates outside parens. Otherwise we could support all 20+ dates inside parens. So my question is "Is the space-instead-of-T thing part of the standard?". If so, I'll go with it provided we can make the regexps ordered. :-) If not, why are we even suggesting it? We should just put it in parens with the whole lot of them. The thing is already 38 characters long. Parens would put it at a nice even 40. 
:) > > And I'm certain we'll be able to find a reasonable regexp for strings > regardless of the space issue. > > > > - We add a very cute format for floating point values > > > which happen to have two decimal places. 34.3% is > > > treated as 0.343; this would make a lot of business > > > YAML representations look great. Yes? (nothing > > > like pure unadulterated sugar) > > > > I don't care. I don't care. (quoting Jon "not John" Wayne) > > +1 - neither do I :-) Yip. Yip. Gaaaaaadammit. Cheers, Brian |
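The noon-versus-midnight question quoted above can be checked with a few lines of Python. This is only a sketch under stated assumptions: the helper name calendar_date_seen is hypothetical, and it assumes a date-only value is expanded to a UTC timestamp and then read back under some other fixed offset.

```python
from datetime import date, datetime, time, timedelta, timezone


def calendar_date_seen(d: date, default_time: time, wrong_offset_hours: int) -> date:
    """Expand a date-only value to a UTC timestamp using default_time,
    then report the calendar date seen under a different fixed offset.
    (Hypothetical helper, not part of any YAML implementation.)"""
    stamp = datetime.combine(d, default_time, tzinfo=timezone.utc)
    other_zone = timezone(timedelta(hours=wrong_offset_hours))
    return stamp.astimezone(other_zone).date()


d = date(2001, 6, 18)
midnight = time(0, 0)
noon = time(12, 0)

for offset in (-5, 5):
    # With a midnight default, a negative offset lands on the previous day;
    # with a noon default, both offsets stay on the same calendar date.
    print(offset, calendar_date_seen(d, midnight, offset), calendar_date_seen(d, noon, offset))
```

Note that the noon default only protects the date for offsets within roughly half a day of UTC; an offset of +12 hours or more would still roll over to the next day.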
From: Oren Ben-K. <or...@ri...> - 2002-06-19 14:59:31
|
Brian Ingerson [mailto:in...@tt...] wrote: > I have an issue to raise here. Why, Oren, are you against > there being an > _order_ to the regexps? Because I do want to allow adding implicit types later on. Requiring an order makes it much more of "a package deal". It brings us to a place where we say "... and everything else is a string" which means never ever adding any implicit type (that isn't surrounded by parentheses or whatever). I think it is possible to create a string regexp that will be DWIM (I suggested a few). > > - 2001-06-18T19:22:45.5+05:00 > > > > - 2001-06-18 19:22:45.5 +05:00 > > I think all the ISO dates are ugly. But Clark convinced me > that these are > good because they are standard. That's about the only reason > I can tolerate > dates outside parens. Otherwise we could support all 20+ > dates inside parens. > > So my question is "Is the space-instead-of-T thing part of the > standard?". I think it isn't. This *was* proposed in the IETF draft on the subject - for the same reason I suggested it - but I think it never made it into a full-fledged RFC. It seems to be a common "wish" people have of the standard... I've seen a note saying that this is "controversial" and that "... reading of the ISO8601 spec doesn't _seem_ to allow it". In another place I've seen a mention that ISO8601 makes the 'T' "optional if both parties agree to it" which makes me wonder about the degree of flexibility in the standard. The only way to be certain is to buy the thing (which isn't exactly cheap) and read it. Or find someone who has access to it... > If so, I'll go with it provided we can make the regexps ordered. :-) > > If not, why are we even suggesting it? We should just put it > in parens with the whole lot of them. The thing is already > 38 characters long. Parens would put it at a nice even 40. 
:) I suggested it because it is the best combination of readability (it is very readable), standards compliance (ISO8601, or almost, and the IETF draft - OK, not an RFC, but...), plus all the good technical properties (being able to do a string sort etc.). I still think we should allow it (while also allowing the ugly ISO8601 'T' format). As for the (), I'd only use them as a last resort - for things like (16 Jan 2001). Have fun, Oren Ben-Kiki |
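A timestamp implicit that accepts both the 'T' form and the space form could look roughly like the sketch below. The pattern is an assumption written for illustration, not the regexp from the spec drafts; the names TIMESTAMP and is_timestamp are hypothetical.

```python
import re

# Hypothetical pattern: a date, optionally followed by 'T' or a single space,
# a time with optional fraction, and an optional zone ('Z' or +hh:mm / -hh:mm),
# where the zone may be preceded by one space.
TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}"                  # date: yyyy-mm-dd
    r"(?:[T ]\d{2}:\d{2}:\d{2}(?:\.\d+)?"  # separator, hh:mm:ss, optional fraction
    r"(?: ?(?:Z|[+-]\d{2}:\d{2}))?)?"      # optional zone
)


def is_timestamp(text: str) -> bool:
    """Return True if the whole scalar matches the timestamp pattern."""
    return TIMESTAMP.fullmatch(text) is not None


for sample in (
    "2001-06-18",
    "2001-06-18T19:22:45.5Z",
    "2001-06-18T19:22:45.5+05:00",
    "2001-06-18 19:22:45.5 +05:00",
    "16 Jan 2001",
):
    print(repr(sample), is_timestamp(sample))
```

Allowing the space means this implicit contains whitespace, which is exactly the interaction with the string regexp discussed above; a free-form date such as 16 Jan 2001 is not matched and would still need the parenthesized form.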
From: Brian I. <in...@tt...> - 2002-06-19 15:23:28
|
On 19/06/02 11:01 -0400, Oren Ben-Kiki wrote: > Brian Ingerson [mailto:in...@tt...] wrote: > > I have an issue to raise here. Why, Oren, are you against > > there being an > > _order_ to the regexps? > > Because I do want to allow adding implicit types later on. Requiring an > order makes it much more of "a package deal". It brings us to a place where > we say "... and everything else is a string" which means never ever adding > any implicit type (that isn't surrounded by parentheses or whatever). I > think it is possible to create a string regexp that will be DWIM (I > suggested a few). And I don't. The parens solution gives us 100% backwards compatibility for future implicits that start with word-chars. And the suggested string regexp allows us to still add implicits that start with non-word, non-indicator characters, like $19.99. (Horrors!) Strings are king. Floats and Intses are princes. Dates cause debates. I have no desire to make anything else magical without some sort of indicator. The honeymoon is long over for implicits. They just aren't that compelling to warrant all this protection for future concerns. Cheers, Brian |
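The ordered-regexp idea can be sketched in Python as a first-match-wins list. Only the order and the /^\w/ string rule come from the thread; every other pattern here (int, float, date, '~' for null, the parenthesized form) is an assumption made for illustration, and the names ORDERED_IMPLICITS and resolve are hypothetical.

```python
import re

# Order follows the list proposed in the thread: int, float, date, string,
# null, then parenthesized forms. All patterns except the string rule
# (starts with a word character) are illustrative assumptions.
ORDERED_IMPLICITS = [
    ("int", re.compile(r"[-+]?\d+")),
    ("float", re.compile(r"[-+]?\d+\.\d+")),
    ("date", re.compile(
        r"\d{4}-\d{2}-\d{2}"
        r"(?:T\d{2}:\d{2}:\d{2}(?:\.\d+)?(?:Z|[-+]\d{2}:\d{2})?)?"
    )),
    ("string", re.compile(r"\w.*", re.DOTALL)),  # /^\w/: anything starting with a word char
    ("null", re.compile(r"~")),
    ("parenthesized", re.compile(r"\(.*\)")),
]


def resolve(text: str):
    """Return the name of the first implicit type whose pattern matches the
    whole scalar, or None if no rule claims it."""
    for name, pattern in ORDERED_IMPLICITS:
        if pattern.fullmatch(text):
            return name
    return None


for sample in ("42", "19.99", "2001-06-18", "42nd street", "~", "(16 Jan 2001)", "$19.99"):
    print(repr(sample), resolve(sample))
```

Because the string rule comes after int, float and date, it can be as loose as "starts with a word character" without stealing their matches. A scalar such as $19.99 is claimed by no rule in this sketch, which is the room left for later implicits that start with non-word, non-indicator characters.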