From: Brian I. <in...@tt...> - 2002-01-29 03:07:15
On 28/01/02 20:25 -0000, Oren Ben-Kiki wrote:
> Brian Ingerson wrote:
> > > Sure, no problem about that, but consider that in this case,
> > > if (D) is not accepted, the parser may return a comment chunk
> > > and only then a chunk of the leaf value *which started before
> > > the comment chunk* it previously reported. In fact, the parser
> > > may report any number of comment chunks and then a leaf value
> > > which started before all of them.
> >
> > No. I don't see it that way. The parser just returns the next chunk,
> > whether it be a comment or a part of the leaf. Everything is still
> > in order. The application just needs to expect either.
>
> The problem is that due to folding, you may have to report
> a text content "chunk" a long, long time after you have
> actually seen it:

Good point. I could make arguments against it, but I've actually been
a fan of 'D' all along. I just wanted to make sure there was due
diligence.

> like this: ]-
> This is a line feed ->
>
> # and these
> # are 1,000,000
> # lines of comment,
> # reported in 1,000
> # repeated calls to
> # "get chunk".
> Ah, text. Goody. The parser
> can finally report the line feed
> it saw 1,000,000 lines ago.
>
> This isn't a problem if all you are interested in is the
> content in the file. It *does* become a problem if you
> are a YAML editor, because an editor is also interested in
> the syntax of the file.
>
> In extreme cases, a single *character* starts before a
> comment and ends *after* it! Example:
>
> this: \
> First line feed ->
> # Is it up here: ^
> # or down there: v
>
> # Not to mention
> # the second one:
>
> <- Second line feed
>
> Granted, this isn't insurmountable for an editor
> writer. But I think it is a PITA. Also, just
> reading this gives me a headache. It is very
> unclear that there are two consecutive empty content
> lines in the example above.
>
> So (D) is about keeping the syntax model
> and syntax-level tools simple.
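A toy Python model may make the chunk-ordering problem concrete. Everything in it is an assumption for illustration, not anything the YAML drafts specify: the function name `chunk_leaf`, the three chunk kinds, and the simplified folding rule (a break between two text lines folds to a space, or to one line feed per intervening blank line). With `comments_end_leaf=False`, the line feed seen after the first text line is only reported after all the comment chunks that follow it; with `comments_end_leaf=True`, standing in for proposal (D), the first comment ends the leaf and nothing is held back.

```python
from typing import Iterator, List, Tuple

# A chunk is (kind, text); kind is "text", "break" or "comment" (hypothetical model).
Chunk = Tuple[str, str]


def chunk_leaf(lines: List[str], comments_end_leaf: bool) -> Iterator[Chunk]:
    """Toy chunker for the body lines of a folded leaf value.

    Assumed, simplified folding rule: the break after a text line becomes a
    space if the next text line follows directly, or one line feed per blank
    line in between. Either way it cannot be reported until the next text
    line is seen. A break after the last text line is dropped in this toy.
    """
    have_text = False  # at least one text line has been reported
    blank_lines = 0    # blank lines seen since the last text line
    for line in lines:
        if line.startswith("#"):
            yield ("comment", line)
            if comments_end_leaf:
                # Proposal (D): the comment ends the leaf, so the pending break
                # is settled (dropped here) and nothing is held back. The rest
                # of the input belongs to whatever follows the leaf (not modeled).
                return
            # Without (D) the comment is reported now, while the break that
            # came *before* it is still pending.
            continue
        if line.strip() == "":
            blank_lines += 1
            continue
        if have_text:
            # Only now can the earlier break be folded and reported.
            yield ("break", "\n" * blank_lines if blank_lines else " ")
        yield ("text", line)
        have_text = True
        blank_lines = 0


if __name__ == "__main__":
    lines = ["This is a line feed ->", ""]
    lines += [f"# comment {i}" for i in range(1000)]
    lines += ["Ah, text."]
    without_d = list(chunk_leaf(lines, comments_end_leaf=False))
    print(len(without_d))   # 1003
    print(without_d[1001])  # ('break', '\n'), reported after 1,000 comment chunks
    print(list(chunk_leaf(lines, comments_end_leaf=True)))
    # [('text', 'This is a line feed ->'), ('comment', '# comment 0')]
```

In the first run the break chunk sits at index 1001, behind every comment chunk even though it was seen before them; that is the out-of-order reporting an editor would have to reconcile.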
> This includes human eyeballs reading YAML files :-)
>
> You rightly ask:
> > I fail to see the relationship between C and D. Aren't they separate
> > issues?
>
> The original motivation was just addressing lookahead, so both
> (C) and (D) were seen (by me at least) as "less necessary" if
> you assume that the other one is used.
>
> (C) said that chomp removes the final sequence of NL characters
> (line feeds, after normalization). This meant that it was possible
> to allow comments to mix with the leaf value, because lookahead was
> reduced to a counter.
>
> (D) said that the first explicit comment line terminates the
> leaf value. As originally specified, it also said that chomp
> removed the single final line break of any form from the value.
>
> As you point out, (D) and (C) can co-exist. That's just not
> the way I first thought of them.
>
> Considering proposal (E) = (C) + (D): A leaf value ends at
> the first explicit comment line, and chomp removes all trailing
> normalized line feed characters (not LS or PS)... I like
> it. It allows one to write:
>
> this: ]-
> value
>
> followed by: ]-
> this one
>
> # this was the old value,
> # it is now commented out.
>
> # this is an alternative
> # value. Neat, isn't it?
>
> # And neither the processor
> # nor the human reader has
> # to read all the way down
> # to here to know what the
> # value really is.
>
> which is: ]-
> Readable *and* efficient to parse.
>
> I think option (E) is the best of both worlds...
> If I "tended towards" (D), I'd "much rather" use
> option (E). Would (E) work for you?

I like E too. So +1.

Cheers,
Brian
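Proposal (E), as Oren states it (the leaf ends at the first explicit comment line; chomp strips every trailing normalized line feed but not LS or PS), can likewise be sketched in a few lines. This is a hypothetical model, not a specified algorithm: `leaf_value_e` is an invented name, the input lines are assumed to be already normalized (line breaks converted to line feeds) and stripped of block indentation, folding is not modeled, and a "comment line" is taken to be any line whose first character other than space or tab is `#`.

```python
from typing import List, Tuple

LS = "\u2028"  # Unicode LINE SEPARATOR; like PS (U+2029), chomp must keep it


def leaf_value_e(lines: List[str], chomp: bool) -> Tuple[str, List[str]]:
    """Split a leaf's body lines under proposal (E) (hypothetical sketch).

    Returns (value, rest). `rest` starts at the first comment line, which is
    where (E) says the leaf ends, so the parser never looks past that line.
    """
    value_lines: List[str] = []
    rest: List[str] = []
    for i, line in enumerate(lines):
        if line.lstrip(" \t").startswith("#"):
            rest = lines[i:]  # (D) part: the first comment line ends the leaf
            break
        value_lines.append(line)
    value = "".join(line + "\n" for line in value_lines)
    if chomp:
        # (C) part: strip *all* trailing line feeds. str.rstrip("\n") removes
        # only U+000A, so a trailing LS or PS is left untouched.
        value = value.rstrip("\n")
    return value, rest


if __name__ == "__main__":
    body = ["this one", "", "# this was the old value,", "# it is now commented out."]
    print(leaf_value_e(body, chomp=True))
    # ('this one', ['# this was the old value,', '# it is now commented out.'])
    print(leaf_value_e(["a" + LS], chomp=True)[0] == "a" + LS)  # True
```

In this toy model the blank line above the first comment is still part of the value until chomping. Under (D) as originally specified, chomp would remove only one line break and leave `"this one\n"`; (C)'s strip-all chomp is what yields `"this one"`, which may be part of why (E) combines the two.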