On Fri, Aug 08, 2003 at 11:51:34PM -0700, Brian Ingerson wrote:
| On 08/08/03 10:17 +0300, Oren Ben-Kiki wrote:
| > Brian Ingerson [mailto:ingy@...] wrote:
| > > > Ahh. I think you are correct. This case is ambiguous:
| > > > key: |
| > > > # content or an empty scalar with a comment
| > >
| > > I don't see this as ambiguous. It is definitely content.
| To me the following cases are all obvious. Here's my rule:
| In the absence of an explicit indentation indicator, all lines
| following the line with the pipe '|', are *content* until a
| line is reached that is the same or less indented than the line
| with the pipe. Dead simple.
So, according to this rule...
| > ---
| > Key0: |
| > # Thing (is this comment or content?)
| > ---
| > ### Section 1
| > Key1: value # This key...
| > Key2: | # That key...
| > Key3: | # This one requires
| > # a long comment (or
| > # is it content?
second two lines are content
| > Key4: | # This one doesn't, but...
| > ### Section 2 (is this comment or content?)
| > Key5: | # Not to mention:
| > # Content or comment?
| > ### Section 3 (this keeps getting better...)
Let me add another example:
one: | # more comment
this: is content
Oren, Does this work? If so, I'm game.
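
To make the rule concrete, here is a minimal Python sketch of it as I
read it. The name split_literal_block is hypothetical, and keeping blank
lines as content is my assumption; neither comes from the spec or from
any existing parser.

```python
def split_literal_block(lines, header_index):
    """Apply the 'same or less indented ends the block' rule.

    lines[header_index] is the line carrying the pipe '|'.
    Returns (content_lines, index_of_first_line_after_the_block).
    """
    def indent(line):
        # Count leading spaces only; tabs are not considered here.
        return len(line) - len(line.lstrip(" "))

    header_indent = indent(lines[header_index])
    content = []
    i = header_index + 1
    while i < len(lines):
        line = lines[i]
        # Assumption: a blank line carries no indentation information,
        # so it never terminates the block and is kept as content.
        if line.strip() and indent(line) <= header_indent:
            break
        content.append(line)
        i += 1
    return content, i


doc = ["key: |", "  # looks like a comment", "  text", "next: value"]
content, next_index = split_literal_block(doc, 0)
```

Under this reading, an indented '#' line after the pipe is content, and
a column-0 '###' line directly after a top-level "Key4: |" ends an empty
scalar.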
| > > I say, screw the lookahead and scan the whole scalar for the
| > > least indented line. That definitely does the most service to
| > > the user. For most implementations, this is a no brainer. For
| > > the C implementations on embedded devices, let them croak if
| > > the indentation can't be determined within the static buffer
| > > size. It's just a limitation of that parser.
| > +1. I just feel better knowing a truly streaming parser is possible
| > (even if it is more complicated). It is OK for a parser to have
| > (reasonable) implementation restrictions. At some level, all computers
| > are finite state machines pretending to be Turing machines anyway.
| > Implementation restrictions are inevitable.
Oren, you are a bit vague here. Here is what I think Brian is
proposing (please correct me if I'm wrong Brian):
The specification requires you to have random access to the
entire scalar value (so that you can detect the least indented
line). In this case, streaming YAML parsers which do not do
this (infinite buffer size) are not compliant; however, such
parsers could provide a user-defined maximum scalar buffer size.
IMHO, we have worked very hard up to this point to reduce lookahead
to a very minimal amount. I would very much like to keep it this way.
While Brian's suggestion may be good, I'd rather not change direction
at this point in time.
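
For comparison, the whole-scalar scan with a user-defined buffer limit
could look roughly like the Python sketch below. The names
detect_indent_by_scan, ScalarTooLarge and max_buffer_bytes are
hypothetical, and counting one byte per line break is an assumption.

```python
class ScalarTooLarge(Exception):
    """Raised when the scalar does not fit in the configured buffer."""


def detect_indent_by_scan(block_lines, max_buffer_bytes):
    """Return the indentation of the least indented non-blank line.

    Every line has to be buffered before the answer is known, which is
    exactly the lookahead cost discussed above.
    """
    buffered = 0
    least = None
    for line in block_lines:
        # Assumption: one extra byte per line for the line break.
        buffered += len(line.encode("utf-8")) + 1
        if buffered > max_buffer_bytes:
            raise ScalarTooLarge("indentation undetermined within buffer")
        if not line.strip():
            continue  # blank lines do not constrain the indentation
        width = len(line) - len(line.lstrip(" "))
        if least is None or width < least:
            least = width
    return least
```

Note that the answer is only available after the last line has been
seen, so nothing can be handed to the application before then.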
| poem: |
| YAML by Brian Ingerson
| There once was a language called YAML,
| That appealed to both reptile and mammal,
| Though neither would agree,
| On how scalars should be,
| So they fight on; the snake and the camel.
In particular, the above example is currently malformed YAML
and I'd rather keep it that way.
| OK. Streaming is still entirely possible as long as the parser reports
| which quoting style indicators were used, and the receiver maintains these
| same quoting style indicators.
Ick. The point of the built-in scalar styles is so that the
parser can handle this stuff for you. Am I mis-understanding?
| Let me give you a very simple example of why this is necessary:
| 1) A scalar with an embedded null can only be represented with
| double quotes.
| 2) If a parser reports scalars in 2k chunks...
| 3) And a scalar has a null character several megabytes into
| the string
| 4) The receiver would have to puke if it had chosen the wrong
| emission style.
| 5) So the safe thing is for the receiver to emit using the
| original style.
I think you are mixing up the emitter and parser requirements.
Certainly if the entire scalar is not provided to the emitter or
if the style is not given, then the emitter must use the double
quoted style, "just in case". However, I don't think this is
related to the parser issue above.
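
A rough Python sketch of the emitter-side decision, under one possible
reading of the two positions above. The function choose_emit_style and
the style names are hypothetical, and reusing the original style when
only chunks are available follows Brian's step 5 rather than anything
the spec says.

```python
DOUBLE_QUOTED = "double-quoted"


def choose_emit_style(original_style=None, full_value=None):
    """Pick a scalar style for the emitter.

    original_style: style reported by the parser, or None if unknown.
    full_value: the complete scalar text, or None when the emitter
                only sees it in chunks.
    """
    if original_style is None:
        # No style given: fall back to double quotes, "just in case".
        return DOUBLE_QUOTED
    if full_value is None:
        # Streaming case: the parser accepted the scalar in this style,
        # so re-emitting in the same style is assumed to be safe.
        return original_style
    if "\x00" in full_value:
        # An embedded null can only be represented with double quotes.
        return DOUBLE_QUOTED
    return original_style
```

Only the embedded-null check is modeled here; a real emitter would have
more conditions to test before trusting a non-quoted style.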
| BTW, comments could pretty easily be added to this API. I think comments
| should be reported with whether they were on their own line or not.
On Fri, Aug 08, 2003 at 05:45:57PM -0600, Shane Holloway (IEEE) wrote:
| I like that Expat calls you back for comments, and I think it would be a
| good idea for YAML as well.
Yes, the YAML parser should report comments.
| So in that vein, perhaps the parser should read the scalar with the
| comments included, and have the returned object have a setting to determine
| whether the comments should be included. With this, there could be a
| scalar block setting to specify whether to exclude comments by default in
| the content? Or, perhaps make the comment leader itself a block setting,
| with a setting for no-comments/all data?
See Oren's response. Mixing content and comments gets ugly quick.
| As for column 0 comments, I really don't care for the look of them. I
| like my comments to flow at the same level as my content -- whether it
| be code or data. ;)
Brian's rule effectively does this (only that it is column 0 with
respect to the current indentation level).
On Wed, Aug 06, 2003 at 08:15:34AM +0300, Oren Ben-Kiki wrote:
| l-blk-empty-line-feed(n) ::=
| i-spaces(<=n) b-as-line-feed
| As long as the number of spaces is less than the indentation level, the
| line is considered "empty". Of course, the question is "what is the
| indentation level?". The parser needs to detect it. We already have a
| simple rule saying that the first non-empty line must not have any
| leading spaces. So if there's no explicit indentation, and a leading
| line contains only spaces - they must be all indentation spaces...
Ok. So, in this model, as long as the number of spaces in each
leading empty line is less than the indentation of the first
non-whitespace printable, then all is ok.
| This complicates the implementation a little bit; you need both a
| counter and a memory of the maximal number of spaces in the empty lines.
| Then, if it turns out the indentation is less than that maximum, the
| file is in error. For example ('.' stands for a space):
| ....block: |
| ........Indented four spaces; above line is OK.
| => "\nIndented four spaces; above line is OK.\n"
| ....block: |
| ......Indented two spaces; above line is in error.
| => Error.
| ....block: |2
| ......Above line is OK due to explicit indentation.
| => " \nAbove line is OK due to explicit indentation.\n"
........Six leading lines
| If you want to pinpoint the error, you need to also remember where the
| line with the maximal number of spaces was. That's still acceptable (you
| need three variables instead of one, big deal).
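
The three-variable bookkeeping Oren describes can be sketched in Python
roughly as follows. The name detect_block_indent is hypothetical; the
sketch counts spaces only, returns the absolute column of the first
non-empty line, and accepts an empty line with exactly as many spaces as
the indentation, following the i-spaces(<=n) production.

```python
def detect_block_indent(block_lines):
    """Detect indentation from the first non-empty line of a block.

    Tracks three things: the space count of the current line, the
    largest space count seen on a leading empty line, and the line
    number where that maximum occurred (for error reporting).
    Returns None if every line is empty.
    """
    max_empty_spaces = 0
    max_empty_line = None
    for number, line in enumerate(block_lines, start=1):
        spaces = len(line) - len(line.lstrip(" "))
        if spaces == len(line):
            # The line holds nothing but spaces: remember the widest one.
            if spaces > max_empty_spaces:
                max_empty_spaces = spaces
                max_empty_line = number
            continue
        # First non-empty line: its leading spaces are the indentation.
        if max_empty_spaces > spaces:
            raise ValueError(
                f"line {max_empty_line} has {max_empty_spaces} spaces, "
                f"more than the detected indentation {spaces}"
            )
        return spaces
    return None
```

No lookahead past the first non-empty line is needed, so this stays
compatible with a streaming parser.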
_why, does this sound reasonable to you? It makes sense to me.