From: Oren Ben-K. <or...@be...> - 2004-11-06 19:25:10
|
The current "each directive is a line" rule has an undesirable consequence. It means that if the top-level node of a document is a non-indented block scalar, none of its lines may start with a '%' character (think PostScript!).

%YAML 1.1
--- |
foo
%YAML 1.1
--- |
%!PS-Adobe-3.0
PostScript stuff
...

Oops!

Proposed solution: require that the '---' and '...' be on a line of their own, without even a trailing comment. Directives appear *after* the '---' (and a '---' is required for directives to appear). Example:

---
%YAML 1.1
|
foo
---
%YAML 1.1
|
%!PS-Adobe-3.0
PostScript stuff
...

This way, the only forbidden lines in non-indented top-level block scalar content are "---<line-break>" and "...<line-break>". Everything else goes.

Writing examples for all the productions is a great way to capture these nits :-)

Have fun,

Oren Ben-Kiki
|
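The failure mode described in this message can be sketched in a few lines of Python. This is only an illustration, assuming a deliberately naive scanner (the hypothetical `classify_lines` below) that applies the "each directive is a line" rule without tracking whether it is inside a block scalar; it is not how any real YAML parser is written.

```python
def classify_lines(text):
    """Label each line the way a naive 'every % line is a directive' scanner would."""
    labels = []
    for line in text.splitlines():
        if line.startswith("%"):
            # The rule under discussion: any line starting with '%' is a directive.
            labels.append(("directive", line))
        elif line.startswith("---"):
            labels.append(("document-start", line))
        elif line == "...":
            labels.append(("document-end", line))
        else:
            labels.append(("content", line))
    return labels


# A non-indented top-level block scalar holding PostScript text.
stream = "%YAML 1.1\n--- |\n%!PS-Adobe-3.0\nPostScript stuff\n...\n"
labels = classify_lines(stream)
for kind, line in labels:
    print(f"{kind:15} {line}")
```

The third line, which the author intended as scalar content, comes out labeled as a directive. That is the "Oops!" above.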
From: Clark C. E. <cc...@cl...> - 2004-11-07 02:10:13
|
On Sat, Nov 06, 2004 at 09:24:58PM +0200, Oren Ben-Kiki wrote:
| %YAML 1.1
| --- |
| foo
| %YAML 1.1
| --- |
| %!PS-Adobe-3.0
| PostScript stuff
| ...

Alternatively, we can allow directives only:

(a) at the top of the file, before the initial ---
(b) following ... and before ---

Your suggestion seems to put the directives 'inside' a given document, and that seems not quite right.

Clark
|
From: Oren Ben-K. <or...@be...> - 2004-11-07 06:49:14
|
On Sunday 07 November 2004 04:10, Clark C. Evans wrote:
> On Sat, Nov 06, 2004 at 09:24:58PM +0200, Oren Ben-Kiki wrote:
> | %YAML 1.1
> | --- |
> | foo
> | %YAML 1.1
> | --- |
> | %!PS-Adobe-3.0
> | PostScript stuff
> | ...
>
> Alternatively, we can only allow directives:
>
> (a) at the top of the file before the initial ---
> (b) following ... and before ---

OK. That would work. Note it harms concatenating streams; if the second one starts with a directive, you have to add a '...' before it, but only if the first stream didn't end with one.

> Your suggestion seems to put the directives 'inside' a given
> document, and that seems not quite right.

Right - because directives get inherited by the following documents. Which raises another concatenation issue: we have a problem if the second stream starts without directives. If the first stream specified any, then they'll change the way the second one is parsed.

So, YAML stream concatenation is no longer very simple. It requires:

- Ensuring the second stream is using the same encoding as the first one. There's no helping this one, but at least one doesn't have to examine the contents of the streams to do it.

- Adding '...' to the second stream to allow it to use directives, unless the first stream ended with one. Parsing the tail of the first stream is a serious PITA. It prevents ycat from doing a simple buffered write(read(buf-size)) for large streams. We could solve this one by allowing '...' to be repeated. This way you could always add '...' without having to examine the first stream's content.

- Adding a '%YAML' directive to the second stream unless (1) it is empty or (2) it already starts with directives. This is the nastiest one because it requires specifying some explicit YAML version for the stream - something 'ycat' has no way of knowing. How about we allow a '%YAML' production without an explicit version number? This would be a great way to 'reset' the parser in mid-stream in other cases as well.

- Adding '---' to the second stream unless (1) it is empty or (2) it already starts with one.

These last two are also a PITA but I see no way around them. To avoid lookahead, 'ycat' will copy any leading comments of the second stream, and only when seeing the first non-comment line will it insert '...', '%YAML' and '---' as necessary. Once it has reached this point, ycat can switch to a simple write(read(buf-size)) for the rest of the stream's content. A far cry from 'cat'...

Bottom line:

- Directives appear at start of stream or following a '...'.
- Directives are always followed by a '---'.
- ycat != cat.

To make 'ycat' easier:

- '...' may be repeated as long as there's nothing but comments between the two '...' lines (lets ycat avoid parsing the tail of the first stream, something which is rather complex to do).
- The %YAML directive is allowed to specify no version number (lets ycat avoid having to invent a specific YAML version number).

Seems like a reasonable compromise... Thoughts?

Have fun,

Oren Ben-Kiki
|
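The 'ycat' procedure described in this message can be sketched roughly as follows. This is a minimal in-memory sketch, not a real tool: the function name `ycat` and its string-based interface are assumptions, blank lines are treated like comments, and it relies on the two rules proposed in this message (a repeatable '...' and a versionless '%YAML'), which are proposals from this thread rather than established YAML syntax. Encoding checks are left out.

```python
def ycat(first, second):
    """Concatenate two YAML streams under the rules proposed in this thread."""
    out = [first]
    if first and not first.endswith("\n"):
        out.append("\n")

    lines = second.splitlines(keepends=True)
    i = 0
    # Copy leading comment (and blank) lines of the second stream untouched.
    while i < len(lines) and (lines[i].lstrip().startswith("#") or not lines[i].strip()):
        out.append(lines[i])
        i += 1

    if i < len(lines):  # the second stream is not empty
        head = lines[i]
        # '...' may be repeated, so it is always safe to add one here
        # without inspecting the tail of the first stream.
        out.append("...\n")
        if not head.startswith("%"):
            # Versionless %YAML 'resets' directives inherited from the first stream.
            out.append("%YAML\n")
            if not head.startswith("---"):
                out.append("---\n")
        # From here on a real tool could fall back to plain buffered copying.
        out.extend(lines[i:])
    return "".join(out)


print(ycat("--- a\n", "# c\nfoo: bar\n"), end="")
```

Only the first non-comment line of the second stream is examined, which is the point of the proposal: everything after it can be copied blindly.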
From: Clark C. E. <cc...@cl...> - 2004-11-09 20:11:00
|
On Sun, Nov 07, 2004 at 08:14:36AM +0200, Oren Ben-Kiki wrote:
| - Directives appear at start of stream or following a '...'.
| - Directives are always followed by a '---'.
| - '...' may be repeated as long as there's nothing but
|   comments between the two '...' lines
| - %YAML directive is allowed to specify no version number

Good deal.

Clark
|
From: Brian I. <in...@tt...> - 2004-11-10 00:10:04
|
On 09/11/04 15:10 -0500, Clark C. Evans wrote:
> On Sun, Nov 07, 2004 at 08:14:36AM +0200, Oren Ben-Kiki wrote:
> | - Directives appear at start of stream or following a '...'.
> | - Directives are always followed by a '---'.
> | - '...' may be repeated as long as there's nothing but
> |   comments between the two '...' lines
> | - %YAML directive is allowed to specify no version number
>
> Good deal.

How about not allowing top level scalars to use the first column?

Cheers, Brian
|
From: Oren Ben-K. <or...@be...> - 2004-11-10 05:42:30
|
On Wednesday 10 November 2004 02:12, Brian Ingerson wrote:
> How about not allowing top level scalars to use the first column?

That doesn't solve the problem because of:

---
"---" : "..."
"%YAML" : "1.1"
...

The problem is with all non-indented scalars, not just top-level ones.

Have fun,

Oren Ben-Kiki
|
From: Brian I. <in...@tt...> - 2004-11-11 17:26:44
|
On 10/11/04 07:42 +0200, Oren Ben-Kiki wrote:
> On Wednesday 10 November 2004 02:12, Brian Ingerson wrote:
> > How about not allowing top level scalars to use the first column?
>
> That doesn't solve the problem because of:
>
> ---
> "---" : "..."
> "%YAML" : "1.1"
> ...
>
> The problem is with all non-indented scalars, not just top-level ones.

But top-level, multiline, block scalars are the problem. Keys in top level mappings are not the problem because, as you show, you can quote the unindented ones. And all the other ones are indented.

The real problem is that we allow a big chunk of multiline unindented text to have a '---\n' slapped on the front of it, and we call that valid YAML. I'm suggesting that we at least indent it. It not only solves the syntax ambiguities, but also gives a visual clue that the text is YAML encoded.

Cheers, Brian
|
From: Oren Ben-K. <or...@be...> - 2004-11-11 17:54:54
|
On Thursday 11 November 2004 19:27, Brian Ingerson wrote:
> > The problem is with all non-indented scalars, not just top-level
> > ones.
>
> But top-level, multiline, block scalars are the problem. Keys in top
> level mappings are not the problem because, as you show, you can quote
> the unindented ones. And all the other ones are indented.

You can also quote the top level scalar... But you are right, it's unindented top-level block scalars which are the problem.

> The real problem is that we allow a big chunk of multiline unindented
> text to have a '---\n' slapped on the front of it, and we call that
> valid yaml. I'm suggesting that we at least indent it. It not only
> solves the syntax ambiguities, but also gives a visual clue that the
> text is YAML encoded.

Sure, we could. Gain: no forbidden lines problem; no need for a '...' before '%' directives; ycat becomes a bit simpler. Pain: you are unable to slap a header in front of some piece of text and make it YAML. Is it worth it? Clark, what do you think?

Have fun,

Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2004-11-11 19:21:41
|
On Thu, Nov 11, 2004 at 07:54:42PM +0200, Oren Ben-Kiki wrote:
| > The real problem is that we allow a big chunk of multiline unindented
| > text to have a '---\n' slapped on the front of it, and we call that
| > valid yaml. I'm suggesting that we at least indent it. It not only
| > solves the syntax ambiguities, but also gives a visual clue that the
| > text is YAML encoded.
|
| Sure, we could. Gain: no forbidden lines problem; No need for a '...'
| before '%' directives, ycat becomes a bit simpler. Pain: You are
| unable to slap a header in front of some piece of text and make it
| YAML. Is it worth it? Clark, what do you think?

I've never used this "feature", and indentation is easy: sed "s/^/ /g"

So, given the minimal end-user benefit and the irritating implementation issues, I'm OK with forbidding these buggers.

Clark
|
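For readers who prefer it spelled out, the sed one-liner above amounts to something like the following Python sketch. The function name `wrap_as_yaml`, the one-space default, and the choice of a literal ('|') block scalar header are assumptions made for illustration. Text whose first line itself begins with whitespace would additionally need an explicit indentation indicator, which this sketch does not handle.

```python
def wrap_as_yaml(text, indent=1):
    """Turn arbitrary text into an indented top-level literal block scalar."""
    pad = " " * indent
    # Same effect as sed 's/^/ /': prefix every line, blank ones included.
    body = "".join(pad + line for line in text.splitlines(keepends=True))
    if body and not body.endswith("\n"):
        body += "\n"
    return "--- |\n" + body


postscript = "%!PS-Adobe-3.0\nPostScript stuff\n...\n"
print(wrap_as_yaml(postscript), end="")
```

Once every content line is indented, neither the '%' line nor the '...' line starts in the first column, so they can no longer be mistaken for a directive or a document end marker.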
From: Oren Ben-K. <or...@be...> - 2004-11-11 20:49:33
|
On Thursday 11 November 2004 19:54, Oren Ben-Kiki wrote:
> Sure, we could. Gain: no forbidden lines problem;

That's actually not true, because you still can't use '---' as the key for a top-level non-indented mapping, unless you quote it. So a "forbidden lines" rule is still needed.

Here's a "slippery slope" for handling this:

Option A: Currently there's a "forbidden lines" problem. This can be dealt with by allowing "..." to be repeated (and to appear before the first document).

Option B1: Alternatively, we forbid non-indented top-level block scalars like Brian suggested. This, by itself, doesn't fully solve the problem.

Option B2: The above B1 option still has forbidden lines for simple, non-indented keys. Solution: use the '%' character for the document start and end lines. Since simple keys can't start with '%', there wouldn't be any "forbidden lines" anywhere - problem solved:

%---
---: foo
...: bar
===: baz
%...

Option B3: %... isn't very nice, somehow. Since we are breaking compatibility anyway, we can use anything for the end line, like %===.

At this point, what we have is a syntax which has no possible ambiguity between a document start/end/directive/comment line and actual content. Its main problem is incompatibility with YAML 1.0, but we're talking YAML 1.1 anyway - now is the time to float all incompatibility ideas.

But, we can also take a whole different approach:

Option C1: Use a directive to override the default end line. For example:

%END !@#$%
foo: bar
!@#$%

(where the default value of %END is the current '...'). At this point, there's no reason to forbid unindented top-level block scalars; just like in MIME, one can solve any ambiguity problem by simply declaring an explicit end line that is known not to appear inside the text itself:

%END !@#$%
|
foo
#bar
...
baz
!@#$%

Option C2: This still doesn't allow you to embed YAML text as unindented block content, because it will contain document start lines. No problem, throw another directive at it:

%START {{{---
%END ---}}}
# This stream contains two documents.
# 1st document is a YAML stream.
{{{--- |
# 1st YAML stream as content.
%YAML 1.1
--- First
--- Second
...
---}}}
{{{---
# 2nd document is just text.
|
Text
---}}}

At this point, we have all the power offered by MIME headers - we can take _anything_ and convert it to YAML content by surrounding it with an appropriate header and footer, even YAML streams. Neat.

My natural tendency is to go with C2 because it solves everything, is compatible with the current syntax, and is potentially the most readable, since the author decides on the best start/end lines for his content. I also love the cool feature of being able to turn anything (printable) into YAML content, including YAML streams.

We have briefly discussed this feature a while back and rejected it... and speaking of features we have discussed way back, the start/end directives are suspiciously similar to the "here" flow scalar style:

---
foo:
  bar: { baz : << -=-=-=-
unindented content
-=-=-=-
}

(Content is treated like an unindented literal block, and always ends with a line break.)

We have rejected both start/end directives and "here" content before, on the same grounds - they were seen as unnecessary complications which would be rarely, if ever, used in practice and could potentially greatly reduce readability.

These reasons have merit, but they are subjective. It's a little-noticed fact that YAML doesn't force people to use significant white space; people are free to write the whole document in a flow style, making it seem very much like a normal Perl data structure:

---
{ foo: bar, baz: [ 1, { two : 2 } ] }
...

However, YAML is still slightly biased against this use case; specifically, in such documents, a "here" flow scalar style is very useful indeed (because it is the only flow scalar style that isn't folded). The current answer for this is "use the literal block style", but of course that's not possible once the collection is written in the flow style.

Similarly, the use of YAML to "encapsulate" arbitrary printable text may seem esoteric at first glance, but is very useful for all sorts of odd "edge cases", especially whenever YAML is used in a messaging system. I suppose the current answer for that is "use multi-part MIME messages", but again that's not possible once the container is a YAML stream.

So, in a sense, both start/end directives and "here" documents are required to make YAML a "complete" format.

Thoughts?

Have fun,

Oren Ben-Kiki
|
From: Oren Ben-K. <or...@be...> - 2004-11-12 06:33:39
|
On Thursday 11 November 2004 21:21, Clark C. Evans wrote:
> On Thu, Nov 11, 2004 at 07:54:42PM +0200, Oren Ben-Kiki wrote:
> | Sure, we could. Gain: no forbidden lines problem; No need for a
> | '...' before '%' directives, ycat becomes a bit simpler. Pain: You
> | are unable to slap a header in front of some piece of text and make
> | it YAML. Is it worth it? Clark, what do you think?
>
> I've never used this "feature", indentation is easy: sed "s/^/ /g"
> So, given the minimal end-user benefit, and irritating implementation
> issues, I'm OK with forbidding these buggers.

OK, I guess that the start/end directives and "here" flow style are too much ;-)

But, if we go Brian's way - require top-level scalars to be indented - can we also change the start/end lines to be "%---" and "%===" (or some other pair of lines starting with "%")? This will completely remove the "forbidden lines" problem. Is that OK?

Have fun,

Oren Ben-Kiki
|
From: Damian C. <dam...@gm...> - 2004-11-12 11:00:44
|
> > On Thu, Nov 11, 2004 at 07:54:42PM +0200, Oren Ben-Kiki wrote:
> > | Pain: You
> > | are unable to slap a header in front of some piece of text and make
> > | it YAML. Is it worth it?

Having to indent random text to include it as a single-element YAML stream does not seem egregious to me. Compared with the shenanigans that some other formats (such as XML) require to embed one text in another, it is simple and easy to implement, even at the level of shell scripts.

To look at it another way, going to great lengths to permit '---' in column 0 without confusing the parser into starting a new document is likely to confuse the human reader even more!

> But, if we go Brian's way - require top-level scalars to be indented -
> can we also change the start/end lines to be "%---" and "%===" (or some
> other pair of lines starting with "%")?

Writers need to know to quote keys starting with '%' (I recently had to do this in a table of units of measure, which included '%' as a unit). Having to do the same with '---' used as a key is just the same, right?

I think the use of '---' to separate documents has the advantage of being such an obvious separator that the human reader can recognize it as such without having to consider whether what follows the '%' is a directive or not.

Consider the log-file use-case. The following fictional log is readable without knowing anything about YAML:

---
date: 2004-11-12
message: overflow in silo 12
---
date: 2004-11-13
message: underflow in silo 11

Using '%---' (or even %{...%}) would be distracting at best.

-- Damian

--
Damian Cugley, Alleged Literature
http://www.alleged.org.uk/pdc/
|
From: Oren Ben-K. <or...@be...> - 2004-11-12 12:20:43
|
On Friday 12 November 2004 13:00, Damian Cugley wrote:
> > ... if we go Brian's way - require top-level scalars to be
> > indented - can we also change the start/end lines to be "%---" and
> > "%===" (or some other pair of lines starting with "%")?
>
> Writers need to know to quote keys starting with '%' (I recently had
> to do this in a table of units of measure, which included '%' as a
> unit). Having to do the same with '---' used as a key is just the
> same, right?

No. The '%' is a (reserved) indicator, like '!', '|', '>', '&', '*' etc. As such, it can't appear as the first character in a plain scalar anywhere - not just in unindented ones.

In contrast, we can't forbid '-' from being a leading character because of negative numbers ("foo: -123"). So, the '---' can't start a plain scalar only if it's unindented:

---
 --- : Ok, indented
...
---
--- : Error, not indented
...

I'd like to avoid this special case if possible.

> Consider the log-file use-case. The following fictional log is
> readable without knowing anything about YAML:
>
> ---
> date: 2004-11-12
> message: overflow in silo 12
> ---
> date: 2004-11-13
> message: underflow in silo 11
>
> Using '%---' (or even %{...%}) would be distracting at best.

I admit that "%---" stands out more. But, this is a document boundary, you'd expect it to stand out:

%---
date: 2004-11-12
message: overflow in silo 12
%---
date: 2004-11-13
message: underflow in silo 11
%===

Using %{ %} is more quiet:

%{
date: 2004-11-12
message: overflow in silo 12
%{
date: 2004-11-13
message: underflow in silo 11
%}

There are a zillion more possibilities we could use. Surely we can find one that looks good:

%-
date: 2004-11-12
message: overflow in silo 12
%-
date: 2004-11-13
message: underflow in silo 11
%=

%---
date: 2004-11-12
message: overflow in silo 12
%---
date: 2004-11-13
message: underflow in silo 11
%+++

%---%
date: 2004-11-12
message: overflow in silo 12
%---%
date: 2004-11-13
message: underflow in silo 11
%+++%

%%%%%
date: 2004-11-12
message: overflow in silo 12
%%%%%
date: 2004-11-13
message: underflow in silo 11
% % %

We could also co-opt other indicators, like ">" and "|":

>>>
date: 2004-11-12
message: overflow in silo 12
>>>
date: 2004-11-13
message: underflow in silo 11
|||

Hmmm. I think I like this one. Much more quiet than using '%' and still doesn't conflict with anything. How about it?

Have fun,

Oren Ben-Kiki
|
From: trans. (T. Onoma) <tra...@ru...> - 2004-11-12 12:55:15
|
On Friday 12 November 2004 07:20 am, Oren Ben-Kiki wrote:
| On Friday 12 November 2004 13:00, Damian Cugley wrote:
| > > ... if we go Brian's way - require top-level scalars to be
| > > indented - can we also change the start/end lines to be "%---" and
| > > "%===" (or some other pair of lines starting with "%")?
| >
| > Writers need to know to quote keys starting with '%' (I recently had
| > to do this in a table of units of measure, which included '%' as a
| > unit). Having to do the same with '---' used as a key is just the
| > same, right?
|
| No. The '%' is a (reserved) indicator, like '!', '|', '>', '&', '*' etc.
| As such, it can't appear as the first character in a plain scalar
| anywhere - not just in unindented ones.
|
| In contrast, we can't forbid '-' from being a leading character because
| of negative numbers ("foo: -123"). So, the '---' can't start a plain
| scalar only if it's unindented:

What's wrong with quoting it?

---
"---" : Ok, quoted
...

| ---
| --- : Error, not indented
| ...
|
| I'd like to avoid this special case if possible.

Of all exceptions I can think of, that's about the most obvious.

BTW, do '---' and '...' really have to be two different things? Why not just use '---' for both? If there is nothing but blank space below the final '---', the parser would know that it is an end-of-stream marker.

| > Consider the log-file use-case. The following fictional log is
| > readable without knowing anything about YAML:
| >
| > ---
| > date: 2004-11-12
| > message: overflow in silo 12
| > ---
| > date: 2004-11-13
| > message: underflow in silo 11
| >
| > Using '%---' (or even %{...%}) would be distracting at best.
|
| I admit that "%---" stands out more. But, this is a document boundary,
| you'd expect it to stand out:
|
| %---
| date: 2004-11-12
| message: overflow in silo 12
| %---
| date: 2004-11-13
| message: underflow in silo 11
| %===
|
| Using %{ %} is more quiet:
|
| %{
| date: 2004-11-12
| message: overflow in silo 12
| %{
| date: 2004-11-13
| message: underflow in silo 11
| %}
|
| There are a zillion more possibilities we could use. Surely we can find
| one that looks good:
|
| %-
| date: 2004-11-12
| message: overflow in silo 12
| %-
| date: 2004-11-13
| message: underflow in silo 11
| %=
|
| %---
| date: 2004-11-12
| message: overflow in silo 12
| %---
| date: 2004-11-13
| message: underflow in silo 11
| %+++
|
| %---%
| date: 2004-11-12
| message: overflow in silo 12
| %---%
| date: 2004-11-13
| message: underflow in silo 11
| %+++%
|
| %%%%%
| date: 2004-11-12
| message: overflow in silo 12
| %%%%%
| date: 2004-11-13
| message: underflow in silo 11
| % % %
|
| We could also co-opt other indicators, like ">" and "|":
|
|
| date: 2004-11-12
| message: overflow in silo 12
|
| date: 2004-11-13
| message: underflow in silo 11
|
|
| Hmmm. I think I like this one. Much more quiet than using '%' and still
| doesn't conflict with anything. How about it?

Please, no. Stick with '---'.

(Oddly, my email program clobbered those last ones)

T.
|
From: Oren Ben-K. <or...@be...> - 2004-11-12 14:48:51
|
On Friday 12 November 2004 14:55, trans. (T. Onoma) wrote:
> | In contrast, we can't forbid '-' from being a leading character
> | because of negative numbers ("foo: -123"). So, the '---' can't
> | start a plain scalar only if it's unindented:
>
> What's wrong with quoting it?

Nothing. It is just that there's a rule saying you _must_ quote it - only when it's unindented. It's a wart.

Besides, if we do that, what's the point of forbidding unindented top-level block scalars? It's just an ineffective half-measure. As long as we have a "forbidden line" rule, what's wrong with applying it to all scalar styles?

> | I'd like to avoid this special case if possible.
>
> Of all exceptions I can think of, that's about the most obvious.

Fine. The way I see it, we either:

- Keep things as they are (with unindented block scalars), or
- Pick a start/end line pair that doesn't conflict with plain scalars ('>>>'/'|||', '|---'/'|...', or whatever), require block scalars to be indented, and get rid of the "forbidden line" rule altogether.

I don't think there's anything new to add here, so I'll shift the burden to Clark (sorry for doing this again :-) So, Clark, which way should it be?

Have fun,

Oren Ben-Kiki
|
From: Brian I. <in...@tt...> - 2004-11-12 17:35:23
|
On 12/11/04 16:48 +0200, Oren Ben-Kiki wrote:
> On Friday 12 November 2004 14:55, trans. (T. Onoma) wrote:
> > | In contrast, we can't forbid '-' from being a leading character
> > | because of negative numbers ("foo: -123"). So, the '---' can't
> > | start a plain scalar only if it's unindented:
> >
> > What's wrong with quoting it?
>
> Nothing. It is just that there's a rule saying you _must_ quote it -
> only when it's unindented. It's a wart.
>
> Besides, if we do that, what's the point of forbidding unindented
> top-level block scalars? It's just an ineffective half-measure. As long
> as we have a "forbidden line" rule, what's wrong with applying it to
> all scalar styles?
>
> > | I'd like to avoid this special case if possible.
> >
> > Of all exceptions I can think of, that's about the most obvious.
>
> Fine. The way I see it, we either:
>
> - Keep things as they are (with unindented block scalars)
> - We pick a start/end line pair that doesn't conflict with plain scalars
>   ('>>>'/'|||', '|---'/'|...', or whatever), require block scalars to be
>   indented, and get rid of the "forbidden line" rule altogether.
>
> I don't think there's anything new to add here, so I'll shift the burden
> to Clark (sorry for doing this again :-) So, Clark, which way should it
> be?

What about me? :P

I'd like to cast a strong vote for indenting top level block scalars, sticking with '---', and using quoting to remove ambiguities.

Here's my rationale. By changing fundamental syntax on a whim we are sending a bad message to the community as well as creating backwards compatibility issues.

I would also argue that indenting top level scalars is not a disruptive change because nobody uses YAML for top level scalars. It is an edge use case at best.

The changes to tagging that comprised YAML 1.1 were necessary because tagging was never well thought out in the first place. And we came out of it pretty clean, since in Perl's implementation, for instance, tagging was sparsely used and namespacing was never used, so in the end backwards compatibility was not really affected.

While I was open to fixing tagging issues, I am not willing to open the doors to reinventing YAML. Many people in the world have a sense of what YAML is now, and I'm not going to go changing it on them. We'll never get off the ground doing that.

Cheers, Brian
|
From: trans. (T. Onoma) <tra...@ru...> - 2004-11-12 18:00:33
|
On Friday 12 November 2004 09:48 am, Oren Ben-Kiki wrote:
| Fine. The way I see it, we either:
|
| - Keep things as they are (with unindented block scalars)
| - We pick a start/end line pair that doesn't conflict with plain scalars
|   ('>>>'/'|||', '|---'/'|...', or whatever), require block scalars to be
|   indented, and get rid of the "forbidden line" rule altogether.
|
| I don't think there's anything new to add here, so I'll shift the burden
| to Clark (sorry for doing this again :-) So, Clark, which way should it
| be?

Okay, but what was your take on using the same token for both start and end?

Thanks,

T.
|
From: Oren Ben-K. <or...@be...> - 2004-11-12 18:13:37
|
On Friday 12 November 2004 20:00, trans. (T. Onoma) wrote:
> Okay, but what was your take of using the same token for both start
> and end?

Well, consider:

--- foo
...
# No document yet
--- Second document
...

Pretty clear. Now consider:

--- foo
---
# No document here?
--- Second document?
---

Pretty confusing. And we'd have to change the rules so that '# No document' would be, actually, not a document instead of an empty document.

Besides, if we aren't going to mess with '---' and '...', let's just not mess with them :-)

Have fun,

Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2004-11-12 18:54:54
|
On Fri, Nov 12, 2004 at 01:00:22PM -0500, trans. (T. Onoma) wrote:
| Okay, but what was your take of using the same token for both start and end?

Our current indicators aren't quite begin/end; they are a bit more subtle.

--- means "start a new document; if there is a previous document, then consider it finished"

... means "consider this document finished, but don't start a new one quite yet".

Cheers!

Clark

--
Clark C. Evans
Prometheus Research, LLC. http://www.prometheusresearch.com/
office: +1.203.777.2550
mobile: +1.203.444.0557
Prometheus Research: Transforming Data Into Knowledge
- Research Exchange Database
- Survey & Assessment Technologies
- Software Tools for Researchers
|
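The two meanings given in this message can be made concrete with a small state machine. The sketch below is only an illustration under simplifying assumptions: `split_documents` is a hypothetical name, markers are recognised only at column 0, implicit documents (content with no preceding '---') are ignored, and nothing is actually parsed as YAML.

```python
def split_documents(stream):
    """Split a stream into documents using only the '---' and '...' marker lines."""
    docs = []
    current = None  # None means "no document is open right now"
    for line in stream.splitlines():
        if line == "---" or line.startswith("--- "):
            # '---': finish the previous document (if any) and start a new one.
            if current is not None:
                docs.append(current)
            current = []
            rest = line[3:].strip()
            if rest:
                current.append(rest)
        elif line == "...":
            # '...': finish the current document, but do not start another yet.
            if current is not None:
                docs.append(current)
            current = None
        elif current is not None:
            current.append(line)
        # Lines seen while no document is open (comments, directives) are skipped.
    if current is not None:
        docs.append(current)
    return docs


stream = "--- foo\n...\n# No document yet\n--- Second document\n...\n"
print(split_documents(stream))
```

Because '...' closes a document without opening one, the comment line between the two documents belongs to neither, which is the distinction a single shared marker would lose.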
From: Oren Ben-K. <or...@be...> - 2004-11-12 18:10:35
|
On Friday 12 November 2004 19:37, Brian Ingerson wrote:
> > I don't think there's anything new to add here, so I'll shift the
> > burden to Clark (sorry for doing this again :-) So, Clark, which
> > way should it be?
>
> What about me? :P

Well, you already said what you want, I expected Clark to know that ;-) But seriously, because I view this issue as "just" an aesthetic judgement call, we can't argue it on merits. It's up to someone to flip the coin - and that's Clark's job.

> I'd like to cast a strong vote for indenting top level block scalars,
> sticking with '---', and using quoting to remove ambiguities.
>
> Here's my rationale. By changing fundamental syntax on a whim we are
> sending a bad message to the community as well as creating backwards
> compatibility issues.

You could say that about the directive syntax as well... but point well taken. This is a strong argument for the "keep things as they are" alternative, which I'm OK with.

> I would also argue that indenting top level scalars is not a
> disruptive change because nobody uses YAML for top level scalars. It
> is an edge use case at best.

It just isn't very useful to restrict only the top-level scalars. Since top-level collections can be unindented, forbidden lines may even appear inside quoted scalars:

---
{ foo : "bar
---
...
baz" }
...

So, there are 3 places where there's a problem:

- Unindented simple plain scalar keys.
- Unindented top-level block scalars.
- Unindented scalar content inside flow collections.

IMVHO it doesn't make sense to pick on one of these three and introduce a special case to solve it, while leaving the other two unsolved. It is so much simpler to just have a single "forbidden lines" rule in the spec, and apply it to _all_ content.

> While I was open to fixing tagging issues, I am not willing to open
> the doors to reinventing YAML. Many people in the world have a sense
> of what YAML is now, and I'm not going to go changing it on them.
> We'll never get off the ground doing that.

Well, I don't know if many documents out there actually use '---' and '...'. Obviously I don't view these to be as important a part of YAML as you do - certainly not enough to call changing them "reinventing YAML". The start/end directives and "here" flow style - that would be (a bit of) reinventing. Again, a subjective judgement call...

That said, it seems nobody likes the idea of playing with '---' and '...'. So, how about I just keep things as they are today? Forget I even raised the issue :-)

Have fun,

Oren Ben-Kiki
|
From: Clark C. E. <cc...@cl...> - 2004-11-12 18:51:11
|
To start, let me say I like Brian's proposal to require indentation in the ambiguous cases. It's more YAMLish -- using whitespace to mark structure. I also think that Oren is doing a fantastic job identifying loose ends and helping us tie them up!

On Fri, Nov 12, 2004 at 08:10:36PM +0200, Oren Ben-Kiki wrote:
| Since top-level collections can be unindented, forbidden lines may
| even appear inside quoted scalars:
|
| ---
| { foo : "bar
| ---
| ...
| baz" }
| ...

Right. Also, by this logic %--- can also appear here, or any of the other delimiters you've proposed - yes? I suppose we could also require flow collections to be indented.

| So, there are 3 places where there's a problem:
|
| - Unindented simple plain scalar keys.
| - Unindented top-level block scalars.
| - Unindented scalar content inside flow collections.
|
| IMVHO it doesn't make sense to pick on one of these three and introduce a
| special case to solve it, while leaving the other two unsolved. It is
| so much simpler to just have a single "forbidden lines" rule in the
| spec, and apply it to _all_ content.

I like the current begin/end markers, and I'd rather not change them. I see the current issue as a tedious syntax issue, and not something that: (a) impacts the information model, or (b) seriously hinders usability. These were the two justifications for handling the plain scalar wart and adding a better %TAG mechanism. So, let me rule out changing the begin/end markers.

Can we get back to the initial issue? As I remember, the nasty use case was when %!PS-Adobe-3.0 PostScript content appeared in the body of a non-indented scalar. Here's what you wrote:

On Sun, Nov 07, 2004 at 08:14:36AM +0200, Oren Ben-Kiki wrote:
| - Directives appear at start of stream or following a '...'.
| - Directives are always followed by a '---'.
| - '...' may be repeated as long as there's nothing but
|   comments between the two '...' lines
| - %YAML directive is allowed to specify no version number

Let's contrast this with an extended version of Brian's solution, where all content is indented at least one space unless it is a top level block (not flow) mapping or sequence. With this alternative, the only issue is plain scalar keys starting with '---' or '...'; we can just ban this sequence _anywhere_ in a plain scalar, no? This not only solves the problem, but it could perhaps be sold as an improvement in YAML readability.

I could go with either solution. Oren's proposal, to limit directives to between a '...' (or the beginning of the file) and a '---', seems like the path of least resistance to solve the un-indented % issue.

Anything I missed?

Clark
|
From: Oren Ben-K. <or...@be...> - 2004-11-12 21:39:50
|
On Friday 12 November 2004 20:51, Clark C. Evans wrote:
> ... by this logic %--- can also appear here, or any of the
> other delimiters you've proposed - yes?

Right :-(

> I suppose we could also
> require flow collections to be indented.

Now that would be nasty. There's no need to require a space before the '{'.

> I like the current begin/end markers, and I'd rather not change
> them. I see the current issue as a tedious syntax issue, ...

It is that, isn't it? <evil grin>

> ... So, let me rule out changing the begin/end markers.

OK.

> Let's contrast this with an extended version of Brian's solution,
> where all content is indented at least one space unless it was a
> top level block (not flow) mapping or sequence.

Better: all _scalar_ content is indented by at least one space, unless it is a simple key. This is just a "max(n,1)" added in two productions, "flow-content" and "block-content".

> With this
> alternative, the only issue is plain scalar keys starting with '---'
> or '...'; we can just ban this sequence _anywhere_ in a plain scalar,
> no? This not only solves the problem, but it could perhaps be sold
> as an improvement in YAML readability.

Banning it everywhere in plain scalars is extreme overkill, given most of them will be indented anyway. It is better to ban it only where it causes a problem. Simple, too:

l-block-mapping(n) ::= l-block-map-entry(n)+
l-block-map-entry(n) ::= ( s-indent(n) ns-l-block-map-entry(n) )
                         - l-forbidden-content

> I could go with either solution. Oren's proposal, to limit
> directives between a '...' (or the begin of the file) and a '---'
> seems like the path of least resistance to solve the un-indented
> %issue.

That proposal was:

| - Directives appear at start of stream or following a '...'.
| - Directives are always followed by a '---'.
| - '...' may be repeated as long as there's nothing but
|   comments between the two '...' lines
| - %YAML directive is allowed to specify no version number

It is actually more complex than the above approach (requires a new variant for %YAML and so on).

> Anything I missed?

Yes, a decision :-)

I'll spare you this time, though. I still itch to use '|---' and '|...' instead of '---' and '...', but given this is out, Brian's way is simpler. So, sed 's/^/ /' it is. I hope this settles everything, so I'll get back to my spec.dbk...

Have fun,

Oren Ben-Kiki
|
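As an illustration of the "max(n,1)" idea and the sed one-liner in the message above, here is a minimal sketch of indenting scalar content so that no line can be mistaken for a marker or a directive. The names looks_like_marker and indent_scalar are hypothetical, and the regular expression is only an assumed reading of "forbidden content" ('---' or '...' at column 0, followed by whitespace or end of line); it is not copied from the spec draft.

```python
import re

# Assumed reading of a forbidden line: '---' or '...' at column 0,
# followed by whitespace or the end of the line.
_MARKER = re.compile(r"^(---|\.\.\.)(\s|$)")


def looks_like_marker(line):
    """True if an unindented line would read as a document marker."""
    return bool(_MARKER.match(line))


def indent_scalar(text, n=0):
    """Indent every line by max(n, 1) spaces, like sed 's/^/ /' for n <= 1."""
    pad = " " * max(n, 1)
    return "".join(pad + line for line in text.splitlines(keepends=True))


content = "%!PS-Adobe-3.0\n---\n...\nPostScript stuff\n"
indented = indent_scalar(content)

# Unindented, some lines collide with markers; indented, none do.
print(any(looks_like_marker(l) for l in content.splitlines()))   # True
print(any(looks_like_marker(l) for l in indented.splitlines()))  # False
```

Once every line carries at least one leading space, the '%', '---' and '...' collisions all disappear at once, which is why the single-space indent covers all three problem cases listed earlier in the thread.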
From: trans. (T. Onoma) <tra...@ru...> - 2004-11-12 21:48:46
|
On Friday 12 November 2004 04:39 pm, Oren Ben-Kiki wrote:
| I'll spare you this time, though. I still itch to use '|---' and '|...'
| instead of '---' and '...', but given this is out, Brian's way is
| simpler. So, sed 's/^/ /' it is. I hope this settles everything, so
| I'll get back to my spec.dbk...

So what's the final outcome?

Will this be illegal?

--- |
This is text.
...

T.
|
From: Oren Ben-K. <or...@be...> - 2004-11-12 22:14:28
|
On Friday 12 November 2004 23:48, trans. (T. Onoma) wrote:
> So what's the final outcome?
>
> Will this be illegal?
>
> --- |
> This is text.
> ...

Yes.

Have fun,

Oren Ben-Kiki
|
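To make the outcome concrete, the following sketch checks only the case asked about above: a top-level block scalar whose content lines are not indented. The function name top_level_block_scalar_ok is hypothetical, and the check is deliberately narrow (it assumes the document starts with a '--- |' or '--- >' header and ends at the first '...' line); it is not a YAML parser.

```python
def top_level_block_scalar_ok(lines):
    """Check that top-level block scalar content is indented by >= 1 space.

    Only documents starting with '--- |' or '--- >' are examined;
    anything else is outside what this sketch covers and returns True.
    """
    if not lines or not lines[0].startswith(("--- |", "--- >")):
        return True
    for line in lines[1:]:
        if line == "...":
            break  # end-of-document marker
        if line and not line.startswith(" "):
            return False  # unindented scalar content
    return True


print(top_level_block_scalar_ok(["--- |", "This is text.", "..."]))   # False
print(top_level_block_scalar_ok(["--- |", " This is text.", "..."]))  # True
```

The second call shows the form that remains legal under the agreed rule: the same document with its content indented by a single space.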
From: Brian I. <in...@tt...> - 2004-11-13 00:10:48
|
On 12/11/04 23:39 +0200, Oren Ben-Kiki wrote:
> On Friday 12 November 2004 20:51, Clark C. Evans wrote:
> > Anything I missed?
>
> Yes, a decision :-)
>
> I'll spare you this time, though. I still itch to use '|---' and '|...'
> instead of '---' and '...', but given this is out, Brian's way is
> simpler. So, sed 's/^/ /' it is. I hope this settles everything, so
> I'll get back to my spec.dbk...

+1

Cheers, Brian
|