From: Tim P. <ti...@po...> - 2004-08-26 22:36:13
Hi,

I'd just like some feedback on the valid characters in a simple block mapping (referring to the 1.0 draft spec).

[205] ns-l-block-simple-entry

I've tried expanding this, ignoring all of the optional properties and aliases:

<ns-plain-single> <c-mapping-value> <ns-plain-multi>

and expanding ns-plain-single:

<ns-plain-first-char><nb-ns-plain-chunk>? <c-mapping-value> <ns-plain-first-char><nb-ns-plain-chunk>? <s-ns-plain-next>*

This leads to my first problem, in that the definition of ns-plain-first-char seems to be

( ns-plain-char(c) - c-top-indicators ) | ( -?:, ns-plain-char(c) )

and

ns-plain-char(c) = nb-plain-char(c) - s-char

and

nb-plain-char(c) = nb-plain-char-key OR nb-plain-char-out

and

nb-plain-char-key = ( nb-char - :#,[]{} ) OR ns-plain-char(key) # OR : ns-plain-char(key) OR , ns-plain-char(key)

and

nb-plain-char-out = ( nb-char - :# ) OR ns-plain-char(key) # OR : ns-plain-char(key)

I'll try to expand this a little:

( ns-plain-char(c) - c-top-indicators ) | ( -?:, ns-plain-char(c) )

( (ns-plain-char(key) # - s-char) - c-top-indicators ) OR ( (-?:, : ns-plain-char(key)) - s-char )

( ((nb-plain-char(key) - s-char) # - s-char) - c-top-indicators ) OR ( (-?:, : (nb-plain-char(key) - s-char)) - s-char )

( (((, ns-plain-char(key)) - s-char) # - s-char) - c-top-indicators ) OR ( (-?:, : ((: ns-plain-char(key)) - s-char)) - s-char )

( (((, (nb-plain-char(key) - s-char)) - s-char) # - s-char) - c-top-indicators ) OR ( (-?:, : ((: (nb-plain-char(key) - s-char)) - s-char)) - s-char )

I chose to stop at this point, but I think you can see it could carry on... At the end I got this:

( (((, ((nb-char - :#,[]{}) - s-char)) - s-char) # - s-char) - c-top-indicators ) OR ( (-?:, : ((: ((nb-char - :#,[]{}) - s-char)) - s-char)) - s-char )

which could expand to

:::d

Trying out a few other of these combinations gives valid simple keys of

a
2
$
^
-$
?^
:2
::s,##
:?###
.:,:,:,s^##

but not:

#
{
[

and nb-plain-char can be:

:#
{:,

hence ns-plain-single can be:

{:,## :#{:,

so a simple block mapping can be:

{:,## :#{:,: :X###{:,##

which would be a key of {:,## :#{:, and a value of :X###{:,##

....phew!!! I've just re-read that and I think it's correct to the spec. However, although it follows the spec, it doesn't seem quite right (and I think it might baffle most parsers out there).

The problem seems to be caused by:

[155] nb-plain-char-key contains
[157] ns-plain-char(flow-key) contains
[154] nb-plain-char contains
[155] nb-plain-char-key

which is recursive.

I've spent a good 5 or 6 solid days looking at the spec over the last two months and have even written a couple of tools to try to further understand it.

http://pollenation.dyndns.org/~tim/dc/processyamlspec.py?start=l-yaml-stream&levels=2&list=1

I can understand the desire for a formal grammar defining YAML if it is one that can be formally tested, but without an available BNF parser that can process the YAML pseudo-BNF it is impossible to test for these types of infinite recursion.

I may have understood everything wrong, but if I can't understand the spec for a simple key-value mapping after spending so much time on it, I think an alternative method of (possibly formally) explaining the specification would be useful.

Please don't think I'm making waves; I really want to get a parser built and I think that YAML is one of the best new ideas around. I want to see YAML and ReST as a combined data serialisation and content representation toolkit, and have already blueprinted a web content framework defining content with YAML/ReST and Twisted/Nevow.
I'm currently trying to write a comprehensive suite of tests for YAML, starting with the very basics and working up, including many 'fail' examples to push the edge cases. Anyway, I'm waffling now :-)

Regards

Tim

ps I'm unclear about how the existing YAML parsers manage to cope with this self-recursion (this recursion exists in the Jan 2004 spec also [193->192->189->193]). Could parser writers tell me how they got around this, if at all?
From: Oren Ben-K. <or...@be...> - 2004-08-29 17:39:58
On Friday 27 August 2004 01:36, Tim Parkin wrote:
> Hi,
>
> I'd just like some feedback on the valid characters in a simple
> block mapping...
>
> and expanding ns-plain-single

The pertinent productions in the in-work draft are:

[156] nb-plain-char-key ::= ( nb-char - ":" - "#" - "," - "[" - "]" - "{" - "}" )
                          | ( ns-plain-char(flow-key) "#" )
                          | ( ":" ns-plain-char(flow-key) )
                          | ( "," ns-plain-char(flow-key) )
[157] ns-plain-char(c) ::= nb-plain-char(c) - s-char
[158] ns-plain-first-char(c) ::= ( ns-plain-char(c) - c-top-indicators )
                               | ( ( "-" | "?" | ":" | "," ) ns-plain-char(c) )

> This leads to my first problem in that the definition of
> ns-plain-first-char seems to be...
> [recursive!]
> I chose to stop at this point but I think you can see it could carry on...
> at the end I got this...
> which could expand to
>
> :::d

Yup.

> Trying out a few other of these combinations gives valid simple keys of
>
> a
> 2
> $
> ^
> -$
> ?^

Right.

> :2
> :
> ::s,##
> :
> :?###
>
> .:,:,:,s^##

Correct.

> but not :
>
> #
> {
> [

Yes, illegal (for obvious reasons).

> and nb-plain-char can be:
> :#

Yes.

> {:,

No. '{' can't appear in plain text, period. Neither can '}', '[', ']'. Sorry.

> hence ns-plain-single can be :
>
> {:,## :#{:,

Nope. It could be, however,

-:,## :#?:

> so a simple block mapping can be:
>
> {:,## :#{:,: :X###{:,##

Nope (all these '{'). But it could be:

-:,## :#?:,: :X###-:,##

> ....phew!!! I've just re-read that and I think it's correct to the spec.

You missed the '{}[]' being completely invalid. They are removed in nb-plain-char and are _never_ introduced in any other production.

> However, although it follows the spec it doesn't seem quite right? (and
> I think it might baffle most parsers out there.)

Well, I hope not. The thing is, the recursive rule is really saying "don't have this character near a space".
In practice, a parser (lexer) would probably use a different technique to enforce this constraint.

> The problem seems to be caused by:
>
> [155] nb-plain-char-key
> ...
> which is recursive.

Yes. This is just a BNF mechanism. BNF isn't the most convenient form to express some things, but that's all we have.

> I've spent a good 5 or 6 solid days looking at the spec over the last
> two months and have even written a couple of tools to try to further
> understand it.
>
> http://pollenation.dyndns.org/~tim/dc/processyamlspec.py?start=l-yaml-stream&levels=2&list=1

Hey, nice! I did something along these lines in the examples following the BNF rules - take a peek at the pre-posting version Clark has put up.

> I can understand the desire for a formal grammar defining YAML if it is
> one that can be formally tested but without an available bnf parser that
> can process the YAML pseudo BNF it is impossible to test for these types
> of infinite recursion.

Not really. Recursive BNF rules are the norm in almost every realistic language; there's nothing especially problematic about using them for "characters" as opposed to, say, "C code blocks".

> I may have understood everything wrong but if I can't understand the
> spec for a simple key value mapping after spending so much time on it I
> think an alternative method of (possibly formally) explaining the
> specification would be useful.

Again, I encourage you to take a peek at the version Clark has put up. The productions aren't 100% debugged yet (I'll get on that first thing after I get the 'Z' books into their shelves, promise :-). But there's a whole new way of correlating the examples to the BNF that should clarify the syntax better. It would be especially helpful if you could submit specific examples that I could add to clarify whatever points you find especially sticky.
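Oren's reading of the recursive rule ("don't have this character near a space") maps naturally onto regular-expression lookaround, which avoids the recursion altogether. The sketch below is an illustration under stated assumptions, not the spec's definition: it checks one line of a plain scalar, assumes ':' must be followed by a non-space and '#' preceded by one, adds ',' to the ':' rule and excludes '[]{}' in the flow-key context (following the flow-out/flow-key split in the quoted productions), and ignores both the first-character rule and the printable-character restriction. The names OUT_CHAR, KEY_CHAR and plain_line_ok are hypothetical.

```python
import re

# flow-out context: ':' only when followed by a non-space,
# '#' only when preceded by a non-space; other non-space chars pass.
OUT_CHAR = r"(?:[^:#\s]|:(?=\S)|(?<=\S)#)"

# flow-key context: ',' joins ':' as "must be followed by a non-space",
# and '[', ']', '{', '}' never match.
KEY_CHAR = r"(?:[^:#,\[\]{}\s]|[:,](?=\S)|(?<=\S)#)"


def _line_pattern(char_re):
    # A line starts and ends with a plain char; inner spaces/tabs are allowed.
    return re.compile(rf"{char_re}(?:(?:{char_re}|[ \t])*{char_re})?")


PLAIN_LINE = {
    "flow-out": _line_pattern(OUT_CHAR),
    "flow-key": _line_pattern(KEY_CHAR),
}


def plain_line_ok(text, context="flow-out"):
    """Return True if text is one acceptable plain-scalar line in context."""
    return PLAIN_LINE[context].fullmatch(text) is not None


print(plain_line_ok("http://foo.bar/x#frag"))
print(plain_line_ok("a: b"))
print(plain_line_ok("a, b", "flow-key"))
```

The lookahead and lookbehind each inspect a single neighbouring character, which is all the recursive productions are really expressing; the first call accepts the URL, while ": " and (in the key context) ", " are rejected.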
> Please don't think I'm making waves as I really want to get a parser
> built and I think that YAML is one of the best new ideas around.

Not at all. I'm delighted someone is actually reading the spec and finding problems in it! I wish more people did :-)

Carry on the good work,

Oren Ben-Kiki

> ps I'm unclear about how the existing YAML parsers manage to cope with
> this self recursion (this recursion exists in the jan 2004 spec also
> [193->192->189->193]). Could parser writers tell me how they got around
> this, if at all?

A parser "simply" uses a stack to implement recursion - much in the same way that, say, the C run time uses a stack to do the same. In fact some parsers directly use the execution stack (while others build one of their own). If you think of each rule as a function call whose side effect is to consume some characters, it all falls into place.

There are many tricks to writing a parser, but recursion isn't usually your main problem. In fact, being able to recurse rules is (roughly) what separates a "parser" (BNF, context-free language) from a lexer (regular expression, finite automata).

"To iterate is human, to recurse, divine."

Oren Ben-Kiki
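Oren's description ("think of each rule as a function call whose side effect is to consume some characters") can be illustrated with a tiny recursive-descent parser. The grammar here is a hypothetical toy (nested flow-style lists of words), not YAML's productions, and the class and function names are made up for the sketch; the Python call stack plays the role of the parser stack. One caveat relevant to Tim's question: a rule whose first alternative leads back to itself without consuming input, as nb-plain-char-key does via ns-plain-char(flow-key) "#", would recurse forever if coded in this naive way, so such rules are usually rewritten first, for example to use a character of lookahead.

```python
# Toy grammar (hypothetical, not the YAML productions):
#   seq  ::= "[" ( item ( "," " "? item )* )? "]"
#   item ::= seq | word
#   word ::= letter-or-digit+


class ParseError(Exception):
    pass


class Parser:
    def __init__(self, text):
        self.text = text
        self.pos = 0  # every rule function advances this as it consumes

    def peek(self):
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def expect(self, ch):
        if self.peek() != ch:
            raise ParseError(f"expected {ch!r} at {self.pos}")
        self.pos += 1

    def seq(self):
        self.expect("[")
        items = []
        if self.peek() != "]":
            items.append(self.item())
            while self.peek() == ",":
                self.pos += 1
                if self.peek() == " ":
                    self.pos += 1
                items.append(self.item())
        self.expect("]")
        return items

    def item(self):
        # Recursion: an item may itself be a seq; the call stack remembers
        # where each enclosing seq() has to resume.
        if self.peek() == "[":
            return self.seq()
        return self.word()

    def word(self):
        start = self.pos
        while self.peek().isalnum():  # "" at end of input is not alnum
            self.pos += 1
        if start == self.pos:
            raise ParseError(f"expected a word at {start}")
        return self.text[start:self.pos]


def parse(text):
    p = Parser(text)
    result = p.seq()
    if p.pos != len(text):
        raise ParseError(f"trailing input at {p.pos}")
    return result


print(parse("[a, [b, c], d]"))
```

Each nested "[" pushes one more seq() call; when its "]" is consumed the call returns and the outer seq() carries on from where it stopped.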
From: Clark C. E. <cc...@cl...> - 2004-08-30 04:54:51
Tim,

The rules for the plain scalar are quite complicated:

a) each content line starts and ends with a non-space character; leading spaces are ignored and trailing spaces are an error
b) in all cases ": " and " #" are forbidden
c) if the plain scalar is a key or is inside a flow collection, then the brackets {} and [] are forbidden; furthermore, in these cases, ", " is forbidden
d) the 'first line' of a plain scalar has a further restriction: it may not start with an indicator character... unless that character is the dash, question mark, colon, or comma and is followed immediately by another non-space character
e) keys may not contain a newline
f) flow scalars are limited to 2K (BTW, where is this restriction mentioned?)

You are totally right that this is complicated; let me justify each exception:

a) sometimes people indent values in a configuration file, and having leading spaces be significant would be quite the headache; also, trailing spaces are almost always an error, so if you want them, use single quotes
b) ' #' starts a comment, ': ' starts a value
c) ", " is OK in the value of a plain scalar, such as one's name: "Evans, Clark"; however, it is the delimiter for flow mappings and sequences; similarly, {} and [] indicate flow mappings and sequences
d) this one is obvious; one shouldn't be using '*' or other items significant to YAML here
   1) since list items start with a '- ' it is OK to allow -3 for negative numbers; thus a special exception for this indicator
   2) I have no clue why the special exception exists for ? : and , in this context; IMHO, these special cases should be nixed
e) needed so that detection of keys is limited to a single line
f) this is a bugger of a production; by limiting it to 2K we can help stop parser bugs, or at least keep the damage limited

The result is quite a mess, and quite possibly the most irritating thing about the YAML spec.
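Clark's rules (a), (d) and (e) are simple enough to check directly, without any recursion. A minimal sketch, assuming a YAML 1.0-era indicator set that may not match the draft's c-top-indicators exactly; first_char_ok and simple_key_ok are hypothetical names, and rules (b), (c) and (f) are deliberately left out.

```python
# Assumed indicator set for illustration; check it against the draft's
# c-top-indicators before relying on it.
INDICATORS = set("-?:,[]{}#&*!|>'\"%@`")
# Rule (d): these may start a plain scalar if a non-space follows at once.
GLUE_OK = set("-?:,")


def first_char_ok(text):
    """Rule (d): may this text begin a plain scalar?"""
    if not text or text[0].isspace():
        return False  # rule (a): content starts with a non-space
    first = text[0]
    if first not in INDICATORS:
        return True
    return first in GLUE_OK and len(text) > 1 and not text[1].isspace()


def simple_key_ok(text):
    """Rules (a), (d) and (e) for a simple key; (b) and (c) are not checked."""
    if "\n" in text or text != text.strip():
        return False  # rule (e): one line only; rule (a): no edge whitespace
    return first_char_ok(text)


for candidate in ["-3", "- 3", "?^", "*alias", "a"]:
    print(candidate, first_char_ok(candidate))
```

The "-3" case is the negative-number exception from (d)(1); "- 3" fails because the dash is followed by a space and would start a list entry instead.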
While one may think for a moment that the rules could be made easier for the developer, lots of things one would "expect" to be able to do, such as putting an http://url as a value, wouldn't work (this is exception b, for example).

...

Oren/Brian,

As far as the productions for the plain scalar go, perhaps we could provide a set of regular expressions? This could make writing parsers a bit easier. Also, for ns-plain-multi, isn't ns-plain-single optional?

Kind Regards,

Clark

-- 
Clark C. Evans
Prometheus Research, LLC.
http://www.prometheusresearch.com/
office: +1.203.777.2550
mobile: +1.203.444.0557
Prometheus Research: Transforming Data Into Knowledge
 - Research Exchange Database
 - Survey & Assessment Technologies
 - Software Tools for Researchers
From: Tim P. <ti...@po...> - 2004-08-30 14:02:14
On Sun, 2004-08-29 at 18:39, Oren Ben-Kiki wrote:
> > {:,
>
> No. '{' can't appear in plain text, period. Neither can '}', '[', ']'.
> Sorry.
> > hence ns-plain-single can be :
> >
> > {:,## :#{:,
>
> Nope. It could be, however,
>
> -:,## :#?:
>
> > so a simple block mapping can be:
> >
> > {:,## :#{:,: :X###{:,##
>
> Nope (all these '{'). But it could be:
>
> -:,## :#?:,: :X###-:,##

Cool, that explains things better. Clark's post also adds some welcome expansion. I think I misunderstood the link between ns-plain-char and nb-char. Would this be simpler using lookahead/lookbehind regexps, or do you already have a YAMLBNF parser implemented using this grammar?

> Not really. Recursive BNF rules are the norm in almost every realistic
> language; there's nothing especially problematic about using them for
> "characters" as opposed to, say, "C code blocks".

I don't have a problem with BNF or EBNF, but a pseudo-EBNF with no available implementation, and hence no way of testing productions against it, just means I have to try to understand everything in my head before I can come up with examples. I agree that BNF can be useful to formally define a grammar, but only if you know how to process it and hence are able to comprehensively test and prove things with it.

If there were a YAMLBNF parser then I would have no problem with the YAMLBNF (I could use it to test my understanding against), apart from the desire for more clarity in explaining the reasoning behind the productions.

I think the condensation of different state examples into the YAMLBNF is the equivalent of compiling source into bytecode. More efficient, yes. More succinct, mmm, yes. But dark that way... sorry <ahem>; it makes reverse engineering them back into a legible form a formidable task.

The addition of the sort of explanations that I've just seen in my inbox from Clark makes things a lot clearer.
I'm sure there are a few things that can be done to make the YAML spec easier to get into, and I'll try my best to help once I'm fairly confident about the spec myself.

> Again, I encourage you to take a peek at the version Clark has put up. The
> productions aren't 100% debugged yet (I'll get on that first thing after I
> get the 'Z' books into their shelves, promise :-). But there's a whole new
> way of correlating the examples to the BNF that should clarify the syntax
> better. It would be especially helpful if you could submit specific examples
> that I could add to clarify whatever points you find especially sticky.

I've seen the new examples using the borders around different productions, and they're great for explaining straightforward examples of complex productions, which are necessary, but not so good at explaining edge cases of simple yet highly state-dependent productions. Or rather, let me rephrase that. The examples are good, but I'd like to see a LOT more that would document the edge cases. BUT that's something I'm working on at the moment, hence my push to understand the YAMLBNF in detail. If I get the opportunity/time I'll also try to document the edge case tests and send them back to you for possible inclusion in an appendix to the docs?

Many many thanks for the response.

Tim
From: Oren Ben-K. <or...@be...> - 2004-08-30 19:43:19
On Monday 30 August 2004 17:02, Tim Parkin wrote:
> If there were a YAMLBNF parser then I would have no problem with the
> YAMLBNF (I could use it to test my understanding against) apart from the
> desire for more clarity in explaining the reasoning behind the
> productions.

Well, I've been working on one. Or at least have been trying to make time to do so. Every time I start getting somewhere, spec work pops up and eats all my free time (I've come to believe it's a curse).

> The addition of the sort of explanations that I've just seen in my inbox
> from Clark makes things a lot clearer. I'm sure there are a few things
> that can be done to make the YAML spec easier to get into and I'll try
> my best to help once I'm fairly confident about the spec myself.

You are most welcome to do so. It's a tough nut to crack and any notions are welcome.

> If I get the opportunity/time I'll also try to document the edge case
> tests and send them back to you for possible inclusion in an appendix to
> the docs?

That would be great. A comprehensive set of examples would do wonders.

Have fun,

Oren Ben-Kiki
From: Clark C. E. <cc...@cl...> - 2004-08-30 06:51:24
Oren/Brian,

I'm thinking of simplifying the plain-scalar productions:

(a) make '- ' illegal within a plain scalar; this would help syntax highlighters signal when a list entry happens, which the current spec makes very difficult

(b) remove top-level exceptions for plain scalars like '?X' ':X' and ',X' where X is a non-space character; I don't see the reason for these exceptions. If you want this sort of thing... quote it! yea?

Thus,

[155] nb-plain-char-out ::= ( nb-char - ':' - '#' - '-' )
                          | ( ns-plain-char(flow-out) '#' )
                          | ( ':' ns-plain-char(flow-out) )
                          | ( '-' ns-plain-char(flow-out) )
[156] nb-plain-char-key ::= ( nb-char - ':' - '#' - '-' - ',' - '[' - ']' - '{' - '}' )
                          | ( ns-plain-char(flow-key) '#' )
                          | ( ':' ns-plain-char(flow-key) )
                          | ( ',' ns-plain-char(flow-key) )
                          | ( '-' ns-plain-char(flow-key) )
[157] ns-plain-char(c) ::= nb-plain-char(c) - s-char
[158] ns-plain-first-char(c) ::= ( ns-plain-char(c) - c-top-indicators )
                               | ( '-' ns-plain-char(c) )
From: Clark C. E. <cc...@cl...> - 2004-08-30 07:37:24
Ok. Assuming these two (rather minor) simplifications:

| (a) make '- ' illegal within a plain scalar, this would
|     help syntax highlighters signal when a list entry happens,
|     the current spec makes this very difficult
|
| (b) remove top-level exceptions for plain scalars like
|     '?X' and ',X' where X is a non-space character; I
|     don't see the reason for these exceptions, if you want
|     this sort of thing... quote it! yea?

I've implemented what I think is a relatively complete version of ns-plain-simple (and thus nb_ns_plain_chunk). This is attached, as well as a character table header file, and the Python code used to make the character table. Clearly the Python code isn't needed once you have the header file.

These two simplifications make the implementation easier, and the productions shorter; also, they probably make YAML more readable.

Best,

Clark
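The attached header and the Python that generated it are not preserved in this archive. Purely as an illustration of the approach (precompute per-character class flags once, then emit them as a C lookup table a lexer can index by byte), here is one way such a generator could look. Everything in it is hypothetical: the flag names, bit values, table name, and especially the indicator set, which is an assumption to be checked against the spec; none of it is the actual libyaml file.

```python
# Hypothetical flag names and bit values, chosen for this sketch only.
FLAGS = {
    "CH_SPACE": 0x01,      # ' ' or '\t'
    "CH_INDICATOR": 0x02,  # indicator characters (assumed set below)
    "CH_FLOW": 0x04,       # flow indicators: special in keys/flow collections
    "CH_PRINTABLE": 0x08,  # printable ASCII, plus tab
}

INDICATORS = "-?:,[]{}#&*!|>'\"%@`"  # assumption; check the spec's list
FLOW_CHARS = ",[]{}"


def char_flags(code):
    """Compute the flag byte for one ASCII code point (0-127)."""
    c = chr(code)
    flags = 0
    if c in " \t":
        flags |= FLAGS["CH_SPACE"]
    if c in INDICATORS:
        flags |= FLAGS["CH_INDICATOR"]
    if c in FLOW_CHARS:
        flags |= FLAGS["CH_FLOW"]
    if c == "\t" or 0x20 <= code <= 0x7E:
        flags |= FLAGS["CH_PRINTABLE"]
    return flags


def make_header(name="plain_char_table"):
    """Return C source text: #defines for the flags plus a 128-entry table."""
    lines = [f"#define {k} 0x{v:02x}" for k, v in FLAGS.items()]
    lines.append(f"static const unsigned char {name}[128] = {{")
    for row in range(0, 128, 16):
        cells = ", ".join(f"0x{char_flags(c):02x}" for c in range(row, row + 16))
        lines.append(f"    {cells},")
    lines.append("};")
    return "\n".join(lines)


print(make_header())
```

A table like this turns each character-class test in the lexer into a single array lookup and bit mask, which is why generating it once and shipping only the header is enough.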
From: Clark C. E. <cc...@cl...> - 2004-08-30 07:49:15
Sorry to spam the list with source code; this is checked into the repository at:

http://cvs.sourceforge.net/viewcvs.py/yaml/libyaml/

Best,

Clark

P.S. Yes, I know, it doesn't handle whitespace or non-ASCII character sequences yet.
From: Oren Ben-K. <or...@be...> - 2004-08-30 19:54:51
On Monday 30 August 2004 10:37, Clark C. Evans wrote:
> Ok. Assuming these two (rather minor) simplifications:
> | (a) make '- ' illegal within a plain scalar, this would
> | help syntax highlighters signal when a list entry happens,
> | the current spec makes this very difficult

I don't see how it would help. A list entry always matches the regexp "^\s*(-\s+)+". Forbidding '- ' inside plain scalars buys you nothing in this regard, and it does cost you - having to quote simple English sentences with a dash in them (like this one :-), say:

What's the matter - never seen a dash before? :-)

> | (b) remove top-level exceptions for plain scalars like
> | '?X' and ',X' where X is a non-space character; I
> | don't see the reason for these exceptions, if you want
> | this sort of thing... quote it! yea?

No. URLs:

http://foo.bar/cgi-bin/script?a=b
tag:foo.bar,2002/baz

I do not want to have to quote URLs. We've jumped through hoops to allow the '#' (fragment) just for this case. Well, URLs also use ',' and '?'. And let's not forget numbers (1,000).

If you are talking just about the _first_ character of a plain scalar - well, we _could_ rule out ',', '?' and ':' as first characters, if we wanted to (we have to allow '-'). I don't see that it gains us anything, though - certainly it doesn't simplify the implementation any.

> These two simplifications make the implementation easier, and
> the productions shorter; also, they probably make YAML more
> readable.

Alas, they don't :-(

Have fun,

Oren Ben-Kiki
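Oren's point is that list-entry detection only ever looks at the start of a line, so a dash in the middle of a plain scalar cannot be confused with one. A minimal sketch using his regexp with Python's re module; entry_content is a hypothetical helper that returns the text after the list indicator(s), or None when the line does not start a block sequence entry.

```python
import re

# Oren's pattern: optional indentation, then one or more "-" + whitespace.
LIST_ENTRY = re.compile(r"^\s*(-\s+)+")


def entry_content(line):
    """Return the text after the list indicator(s), or None if the line
    does not start a block sequence entry."""
    m = LIST_ENTRY.match(line)
    return None if m is None else line[m.end():]


# A dash inside the scalar never reaches the start-of-line pattern.
print(entry_content("- What's the matter - never seen a dash before?"))
print(entry_content("What's the matter - never seen a dash before?"))
print(entry_content("-3"))
```

Note that "-3" is not an entry because the dash must be followed by whitespace, and a bare "-" only matches if the line still carries its trailing newline.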
From: Clark C. E. <cc...@cl...> - 2004-08-30 23:22:20
On Mon, Aug 30, 2004 at 10:54:46PM +0300, Oren Ben-Kiki wrote:
| > These two simplifications make the implementation easier, and
| > the productions shorter; also, they probably make YAML more
| > readable.
|
| Alas, they don't :-(

Yea. You turn out to be spot on. I think I've implemented the ns_plain_single correctly; could you comment on the implementation [1]?

Clark

[1] cvs.sf.net:/cvsroot/yaml/libyaml , a slightly-older snapshot,
    http://cvs.sourceforge.net/viewcvs.py/yaml/libyaml/plain.c
From: Oren Ben-K. <or...@be...> - 2004-08-31 19:36:07
On Tuesday 31 August 2004 02:22, Clark C. Evans wrote:
> Yea. You turn out to be spot on. I think I've implemented the
> ns_plain_single correctly, could you comment on the implementation [1]?

I'll try to clear some YAML time this weekend; I'll take a good look at the code then.

Have fun,

Oren Ben-Kiki
From: Clark C. E. <cc...@cl...> - 2004-09-20 09:50:57
Ok. I'm continuing to work, quite slowly, on a "C" implementation that is a pull-parser and push-emitter. The code for this is checked into CVS at sf.net [1]. If I keep up at one day per week, it'll be finished by early next year, perhaps. I'm trying to follow the specification as exactly as I can. If anyone would like to 'jump in', you are certainly welcome; in particular, it'd be cool to have help doing the emitter.

What is started:

- public parser/emitter interface (yaml.h)
- basic skeleton for parser/emitter
- it knows about single-line plain scalars
- it knows about explicit documents, but only with a single-line plain scalar
- it does leading single-line comments

Next steps:

- adding support for 'nesting' so I can do collections
- figure out how to report scalars in chunks
- add very simple sequence
- add python binding

Cheers!

Clark

[1] cvs.sf.net:/cvsroot/yaml/libyaml
    http://cvs.sourceforge.net/viewcvs.py/yaml/libyaml/
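In a pull-parser the caller asks for the next event whenever it is ready, rather than the parser pushing callbacks at it. As a rough illustration of the pull side only, here is a Python generator covering roughly the subset Clark lists as started: leading single-line comments and explicit documents holding one single-line plain scalar. The event names and the pull_events function are hypothetical and are not the libyaml interface in yaml.h; there is no nesting, quoting, or validation.

```python
def pull_events(text):
    """Yield parse events lazily; the caller pulls them with next() or a loop."""
    yield ("STREAM-START",)
    in_doc = False
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line or line.lstrip().startswith("#"):
            continue  # blank lines and whole-line comments yield no events
        if line == "---" or line.startswith("--- "):
            if in_doc:
                yield ("DOCUMENT-END",)
            yield ("DOCUMENT-START",)
            in_doc = True
            line = line[3:].strip()  # text after "---", if any
            if not line:
                continue
        elif not in_doc:
            yield ("DOCUMENT-START",)  # implicit document
            in_doc = True
        yield ("SCALAR", line)  # single-line plain scalar only
    if in_doc:
        yield ("DOCUMENT-END",)
    yield ("STREAM-END",)


for event in pull_events("# leading comment\n--- hello\n--- world\n"):
    print(event)
```

Because the generator suspends after each event, the caller controls the pace, which is the property that makes a pull interface convenient to wrap in language bindings such as the planned Python one.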