From: Stoll, C. <Car...@t-...> - 2002-03-12 14:03:06
|
>-----Ursprungliche Nachricht----- >Von: Joel de Guzman [mailto:dj...@gm...] >Gesendet: Donnerstag, 7. Marz 2002 09:53 >An: spi...@li... >Betreff: Re: [Spirit-general] MAJOR BREAK-THROUGH !!! Yabadabadoo... >Must Read :-) > > > >Hello, > >Thanks for all your comments. As promised March is a Spirit hacking >month for me. It's coding time! I hope to deliver some cool stuff. > >1) About LL parsing: > >It is clear now that Spirit is an FP parser. The initial version was >recursive descent. Soon, I wish to implement Doug's idea not to limit >Spirit to RD. Since I am more accustomed to top-down, predictive LL(1) >will be my first attempt. The ideas are already forming in my mind and >implementation wise, it is very close to what we already have. The >nice thing about LL is that the hierarchical object composite is a >direct model of the grammar. > >Surely, LL will have more limitations than the uncostrained >backtracking RD parser that we already have. For instance, >spirit::ll_1 will only take in purely *static-parsers* from which a >meta-program can extract the first-follow information. There are >advantages of course. For instance, a forward-iterator is all that is >needed since there is no need to look-forward. We all know that many >well designed languages (not C/C++) can be LL(1). So, we can use LL(1) >or unconstrained back-tracking RD interchangeably on a per-grammar- >module basis. We can use the right tool for the right job. > >2) About LR parsing: > >Automata **is an implementation detail**. Bottom-up parsing can be >done using FP. Again, Spirit is an FP parser. From the research that I >have done, I know now that recursive-ascent is equivalent to FP >bottom-up parsing. YACC style LR parsers use automata. Compared to FP >parsers, this is more or less synonymous to using a virtual machine. I >have this feeling that recursive ascent, when done right with modern >C++ metaprogramming techniques will leave any automata based LR >implementations in the dust. > >At this point, I am not sure yet on the exact implementation strategy. >Surely, it will be functional (FP). Yet, I have very scarce research >material on RA (recursive ascent). I imagine that it will be some sort >of grammar-tree transformation but that too might be too simplistic. >My attack strategy is to first implement LL(1) then experiment on >writing some parser directives that involve more extensive parser- >hierarchy transormations (ala longest, shortest directive). Examples >are a parser directive that will transorm left-recursion into a valid >LL grammar. Example: > > a = b | a >> op >> b; > >which will internally, metaprogrammatically, modify the non-LL >structure to this: > > a = b >> *(op >> b); > >and another parser directive that will do left-factoring. Example: > > r = a >> b >> c >> d | a >> b >> c >> e; > >a non-LL(1) grammar, will be transformed to this: > > r = (a >> b >> c) >> d | e; > >As with the phoenix experience, the experience gained here I feel will >lead us towards the RA implementation. > >***I welcome some help doing the experimentation and research*** > >3) Numbers in rule_id: > >I knew you'd say that Doug :-)... > >One problem with using types as IDs is that it has to be declared in a >namespace somewhere outside the local function scope. Not much of a >problem but a bit awkward. A struct encaspulating the grammar might be >a good place. (BTW, I am also thinking about enum types). > >There is no problem of rule merging at all as you have said. Why?, >because grammar snippets are scoped. If you have read about my recent >posts on local-variables and local-functions in Phoenix, IDs outside a >scope are not directly visible. A grammar snippet is like a class that >cannot be extended outside it. In fact, you can even have nested >grammar snippets where the inner grammar is totally independent of the >outer grammar (like nested classes). For example: > > rule_id<0> a; > rule_id<1> b; > rule_id<2> c; > > a = ... > b = ... > c = ( > > a = ... > b = ... > c = ... > ) > >The nested rule a is a totally independent rule from the outer rule of >the same name. To access the outer scope of a, you will have to write >something like: a.outer<1>() <<< not yet implemented in the code >example presented but I know how it is done >>>. Outside the grammar, >given a grammar 'g', you'll have to write something like: g.get(id) to >explicitly ask for an inner rule in 'g' given a rule_id 'id'. As it >is, 'g' is just an opaque black box <<< also not yet implemented in >the code example presented >>>. > >Something like this will happen in Doug's example where >primary_expression is passed in as a parameter. The passed-in argument >will be like the outer rule c above. rule_ids only apply to its >immediate scope. > This looks good, does it enable rekursive definitions, too? rule_id<0> a; rule_id<1> b; b = ( a = ..., b = b.outer<1>() ... ) BTW, which rule of b is called, when I do a "b.parse(..)"? >4) About the single parser arg: > >This is an internal implementation model. As Dan said, we can have an >outer parse(first, last) interface as before. Dan and I were mildly >shocked by the performace loss. I guess this is because there are >zillions of small parse functions to call in a typical parsing >traversal. > ><<< BTW, I like the idea of a language with only single argument and >single return polymorphic functions plus a tuples facility. >>> > >5) Access to grammar rules (Jcab?) (private, public etc.). This is a >good idea. I think all of these concerns can be done. Example: > > rule_id<0> a; > rule_id<1> b; > rule_id<2> c; > > public_, > > a = ..., > b = ..., > > private_, > > c = ... > >Or: > > rule_id<0> a; // defaults to public > rule_id<1> b; // defaults to public > rule_id<2, private_t> c; > > a = ..., > b = ..., > c = ... > > <<< Or should private_ be default? >>> > I wouldn't couple the access type to the rule id, what if the global a should be public, but b.a is private public_, a = ..., b = ( private_, a = ..., public_, b = ... ),... >6) Crying compilers... > >Yes, I pity the compiler. As always, factoring is a good thing. I >think that we all have the right ingredients now to do real grammar >snippets. I'm sure there will be more concerns to come. I guess >rewriting Hartmut's C parser will make the design flaws visible. > >I wonder how the compiler will be able to take a single C-grammar >expression... I pity the compiler :-+. Can it handle that at all? >There's only one way to find out :-) Perhaps we can win a world record >for the longest C++ single expression! > I had already to shorten some of my rules, because their type had too many nested templates (it was about seven alternatives with longer sequences inside and some attached actions ;-) >7) Performance: > >Already, Dan reported that the XML parser is 50% slower than it's >hand-written C counterpart (Unfair considering no validation is done >in the Spirit parser). My intuition tells me that a no-virtual >function overhead should give a further boost in performance. But >wait, it doesn't stop there. Wait till we get to LL(1) and LR(1) >deterministic parsing! > >------------------------------------ > >Ok, that's it for today. Again, thank you all for your valuable >comments and insights. > >Many thanks, >--Joel > >PS> The FTP storage is great! Thanks JCAB. Source forge once had >public FTP facilities but AFAIK has been dropped due to heavy traffic >and some more problems. > CS |