From: Oren Ben-K. <ore...@ya...> - 2002-01-19 13:37:52
Attachments:
spec.zip
|
Hi Guys.

My e-mail connection has been really acting up, so I resorted to setting up this Yahoo account to send this. Hopefully tomorrow all will be fixed...

At any rate, attached is the latest spec. It includes "everything", but there are problems with it. I fixed the examples, but I think they need more work. The productions should be OK (but see below); however, they have been heavily reworked, so a review is most welcome. The YAC list needs minor updates according to our latest agreements; I think the spec is now a better reflection of them.

There's also a problem I encountered when formalizing Brian's proposal for chomping folded scalars. It turns out this requires unbounded lookahead (sigh). In the productions this becomes an ambiguity. So currently the spec has a built-in problem in it. An example of the problem:

    this: ]-
      normal value,
      # an empty lines

      # Another comment

A streaming API would want to send the empty line to the application before it deals with the comment (it would want to send the comment too, for an editor...). However, it wouldn't know whether it is content or comment. It isn't enough to just count empty lines because of LS/PS etc.; one would have to store the line break type. In short, a mess...

BTW, when chomping is done there's *always* lookahead, as in here:

    block: |-
      value
      # 1,000,000 chars
      # of comment
    another: ...

Only after the comment is done does the parser know whether the line break is to be reported as content or not. Granted, in this case the amount of memory required is fixed (one line break's worth), but it is still a wart.

Also note that chomping doesn't handle well the case when one wants to have blank lines following a block value (for readability), like this:

    block: |-
      some value

    another: ...

This is because chomping will also remove the trailing line break from the block together with the "comment" empty line.

Ways to avoid this being a problem:

A. If a nested leaf is chomped, the rule is that *all* empty lines are comments, period. While this sounds "special" at first, note that the special case is actually the non-chomped leaf values, which are the only place where an empty line is content. It also seems restrictive, but how often would one need content with empty lines in it that doesn't also end with a line break?

B. We could give up on chomping altogether. If the final line break of a block should be ignored, then use an escaped block and escape it:

    like : |\
      this\
    ---

Neither of these solves the trailing lines problem (one would still have to give up the trailing line break to get rid of them). The only way around this that I can see is:

C. Define two independent modifiers, one for empty lines and one for the final line break. This really seems excessive... 16 different style variants? Ugh!

Finally, we could:

D. Disallow comments in nested leaf values. Allow empty comment lines *after* the value only if preceded by at least one explicit comment line:

    this: |-
      some value
      # value done! empty
      # lines are now comments.
    another: ...

This allows simple chomping (removing one single final line break) with no lookahead at all.

I tend towards (D), myself. It seems to me that mixing comments and text values together causes more grief than it is worth. Perhaps one of you can find a better workaround... Either way, IMO the current situation is unacceptable and needs to be fixed.

I'll get back on-line this evening my time; if the timing is good we could IRC... Otherwise I'll just catch up with your replies tomorrow.

Have fun,

Oren Ben-Kiki
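Option (D) can be sketched as a classifier that labels each line as soon as it is read. This is only an illustration under assumptions that are not in the proposal: the name `classify_option_d` is hypothetical, a comment line is recognized by `#` as its first non-blank character, and the input is just the lines between the block header and the next key.

```python
from typing import List, Tuple


def classify_option_d(lines: List[str]) -> List[Tuple[str, str]]:
    """Label each line 'content' or 'comment' using only the lines seen so far.

    Assumption (not from the proposal): a line whose first non-blank
    character is '#' is an explicit comment line and ends the value.
    """
    seen_comment = False
    labeled = []
    for line in lines:
        if line.lstrip().startswith("#"):
            seen_comment = True
        # Before the first explicit comment line, everything (even an
        # empty line) is content; afterwards everything is comment.
        labeled.append(("comment" if seen_comment else "content", line))
    return labeled


labels = [kind for kind, _ in classify_option_d(
    ["some value", "", "# value done! empty", "", "# lines are now comments."]
)]
```

No label depends on a later line, which is the property that would let a streaming parser report each line immediately.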
From: Clark C . E. <cc...@cl...> - 2002-01-19 14:06:45
|
| At any rate, attached is the latest spec. It includes
| "everything", but there are problems with it.

Way cool. This is checked into CVS and is now accessible via HEAD at [1].

| this: ]-
|   normal value,
|   # an empty lines
|
|   # Another comment
| ...
|
| A streaming API would want to send the empty line to
| the application before it deals with the comment (it
| would want to send the comment too, for an editor...).
| However it wouldn't know whether it is content or
| comment.

Ahh. In the (modified) case above, it must "chomp" the empty line, but it can't determine this till the comment is processed. Ok. Got it. Ick. Let me chew on it for a while...

Clark

[1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/yaml/spec/spec.html?rev=HEAD&content-type=text/html

--
Clark C. Evans Axista, Inc.
http://www.axista.com 800.926.5525
XCOLLA Collaborative Project Management Software
From: Brian I. <in...@tt...> - 2002-01-19 16:36:52
|
On 19/01/02 09:23 -0500, Clark C . Evans wrote:
> | At any rate, attached is the latest spec. It includes
> | "everything", but there are problems with it.
>
> Way cool. This is checked into CVS and is now
> accessible via HEAD at [1].
>
> | this: ]-
> |   normal value,
> |   # an empty lines
> |
> |   # Another comment
> | ...
> |
> | A streaming API would want to send the empty line to
> | the application before it deals with the comment (it
> | would want to send the comment too, for an editor...).
> | However it wouldn't know whether it is content or
> | comment.
>
> Ahh. In the (modified) case above, it must "chomp"
> the empty line, but it can't determine this till
> the comment is processed. Ok. Got it. Ick.
> Let me chew on it for a while...

There will obviously be different parser interfaces. A line-by-line interface would have no problems here. An interface used by a loader wouldn't care about comments. A push interface could report comments early.

I'd like to know the _exact_ use case that is driving this. For whom is this causing a problem?

Cheers,

Brian
From: Clark C . E. <cc...@cl...> - 2002-01-19 17:46:48
|
On Sat, Jan 19, 2002 at 08:36:45AM -0800, Brian Ingerson wrote:
| > | A streaming API would want to send the empty line to
| > | the application before it deals with the comment (it
| > | would want to send the comment too, for an editor...).
| > | However it wouldn't know whether it is content or
| > | comment.
| >
| > Ahh. In the (modified) case above, it must "chomp"
| > the empty line, but it can't determine this till
| > the comment is processed. Ok. Got it. Ick.
| > Let me chew on it for a while...
|
| There will obviously be different parser interfaces. A line by line interface
| would have no problems here. An interface used by a loader wouldn't care
| about comments. A push interface could report comments early.

First, I want to verify that I understand "chomp": it only clears the last new-line, right? Thus, it forces a one-new-line lookahead. I've been in favor of "chomp" since this is my understanding...

If "chomp" can kill multiple trailing new-lines, then it requires the parser to create a variable-length "to-be-determined" buffer for the new lines that are encountered. If a non-new-line is found, then the buffer is reported. Otherwise, if the scalar ends, then the buffer is discarded. Unfortunately, since there are many types of new lines (FS/LS), this simply can't be a counter... thus, if chomp does kill all trailing new lines, I don't like it... it complicates the parser a bit too much.

I may have faulty memory, but I thought I agreed to the following rules:

1. Chomp only kills one trailing new line.

2. New-lines within scalars are significant; all other new lines are insignificant.

3. Comments can occur anywhere (even within scalars) and are stripped.

I'm reading into Oren's post that somehow spaces within scalars become insignificant due to interaction with comments? Or somehow chomp can snip more than one trailing new line?

With the above constraints, the parser need only store one new-line character, and thus avoids nasty variable-length look-ahead buffers. If the above rules are what we agreed, great, I don't see a problem. Otherwise, let's talk about this a bit...

Best,

Clark
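Rule 1 can be sketched as a filter that holds back at most one line break. This is a minimal sketch, assuming the scalar arrives as a stream of tokens in which each line break is its own token; `chomp_one`, `LINE_BREAKS`, and the choice of LF, CRLF, LS (U+2028) and PS (U+2029) as break forms are illustrative assumptions, not taken from the spec.

```python
from typing import Iterable, Iterator, Optional

# Assumed set of line-break tokens; the thread only names them loosely.
LINE_BREAKS = {"\n", "\r\n", "\u2028", "\u2029"}


def chomp_one(tokens: Iterable[str]) -> Iterator[str]:
    """Pass tokens through, dropping only the final trailing line break."""
    held: Optional[str] = None  # at most one break is ever stored
    for tok in tokens:
        if held is not None:
            # Something follows the held break, so it was content.
            yield held
            held = None
        if tok in LINE_BREAKS:
            held = tok  # remember its exact form until we know more
        else:
            yield tok
    # End of scalar: a still-held break is the final one and is dropped.


result = list(chomp_one(["a", "\n", "\u2028", "b", "\n"]))
```

The held token keeps the break's exact form, so memory stays fixed at one line break regardless of how many break variants exist.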
From: Clark C . E. <cc...@cl...> - 2002-01-19 18:17:20
|
| 1. Chomp only kills one trailing new line.
|
| 2. New-lines within scalars are significant,
|    all other new lines are insignificant
|
| 3. Comments can occur anywhere (even within
|    scalars) and are stripped.

Ok. I think I know what happened. Brian may have always assumed that chomp kills many trailing new lines; but I (and possibly Oren) assumed otherwise, since we didn't have any examples suggesting otherwise. And since we were not using blank lines liberally... this was consistent. Now that blank lines are liberally ignored outside scalars, it makes sense to allow chomping of more than one new line (with interspersed comments or otherwise).

The problem with a multi-line chomp is that it needs an unbounded look-ahead buffer, since a new line has multiple variants (FS, LS, NL). It seems we have three options:

A) Restrict chomp to only trim one new line; this means that new lines in scalars (regardless of comment interaction) are always content... except for the trailing new line.

B) Let the parser suffer an unbounded buffer.

C) Re-define chomp to only strip NL, and not strip FS and LS. This will turn the look-ahead into a positive integer, which is quite manageable.

I'd like A or C.

Best,

Clark
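The difference between options B and C can be sketched as two filters over a token stream. This is a hedged illustration, not a proposed implementation: the names `chomp_all_buffered` and `chomp_nl_counted` are hypothetical, each line break is assumed to arrive as its own token, and LS (U+2028) and PS (U+2029) stand in for the non-NL break variants.

```python
from typing import Iterable, Iterator, List

# Assumed break tokens: NL plus two stand-in variants.
LINE_BREAKS = {"\n", "\u2028", "\u2029"}


def chomp_all_buffered(tokens: Iterable[str]) -> Iterator[str]:
    """Option B: strip every trailing break; needs an unbounded buffer."""
    pending: List[str] = []  # exact break forms must be remembered
    for tok in tokens:
        if tok in LINE_BREAKS:
            pending.append(tok)
        else:
            yield from pending  # breaks turned out to be content
            pending.clear()
            yield tok
    # End of scalar: whatever is still pending is discarded.


def chomp_nl_counted(tokens: Iterable[str]) -> Iterator[str]:
    """Option C: strip only trailing NL; a counter is enough."""
    pending_nl = 0
    for tok in tokens:
        if tok == "\n":
            pending_nl += 1
        else:
            # Any other token, including other break variants, is
            # content and flushes the counted NLs first.
            for _ in range(pending_nl):
                yield "\n"
            pending_nl = 0
            yield tok
    # End of scalar: counted NLs are discarded.


stream = ["a", "\n", "\u2028", "b", "\n", "\u2028", "\n"]
option_b = list(chomp_all_buffered(stream))
option_c = list(chomp_nl_counted(stream))
```

Under option C the pending state is a single integer, at the cost of leaving trailing non-NL breaks in the content.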