Thread: [cedet-semantic] indentation engine for Emacs Ada mode; help getting started
Brought to you by:
zappo
From: Stephen L. <ste...@st...> - 2012-10-06 20:23:30
|
I'm rewriting the indentation engine for Emacs Ada mode. The trigger was the recent sytax introduced by Ada 2012; it breaks the old indentation engine. I've got a smie-based engine working fairly well, but it has a major problem; there are some keywords that can only be properly recognized by doing a full parse from the start of the file. So I'm looking at semantic, which could be a better parser for that style. First, is anyone aware of an indentation engine based on semantic? A working example would be a big help. For an indentation engine, I need to cache information for each found token; mainly where the start of the enclosing statement is. I'm having a hard time getting started; I can't find the main parsing loop, to see what hooks are available for caching things. I found the top level function 'bovinate'; when I run it on a python buffer (for example), it outputs semantic information; that looks very promising. However, I can't trace via edebug down into `semantic-parse-region'; that is defined by define-overloadable-function. I don't see where python-mode defines an overload for that, and I also don't see where the default is defined. How do I find the local binding for an overloadable function? I tried (mode-local-symbol 'semantic-parse-region) in the python buffer; that returned nil. -- -- Stephe |
From: Stephen L. <ste...@st...> - 2012-10-06 21:14:53
|
Stephen Leake <ste...@st...> writes: > I found the top level function 'bovinate'; when I run it on a python > buffer (for example), it outputs semantic information; that looks very > promising. However, I can't trace via edebug down into > `semantic-parse-region'; that is defined by define-overloadable-function. > > I don't see where python-mode defines an overload for that, and I also > don't see where the default is defined. Well, I figured out this part; the default of an overloaded symbol is <name>-default. So I found the core loop for semantic-parse-region; I'll have to study it some more. Next question: how do I compile a wisent grammar file in Emacs 24.2? I'd like to build a very simple grammar for a small subset of Ada, so I can play with it interactively and see if I can do what I need for an indentation engine. In Emacs 23, there was a file '../etc/grammars/wisent-grammar.el' that would read a .wy file and produce a -wy.el file. I don't see the equivalent in Emacs 24.2. Do I need to download something from sourceforge? -- -- Stephe |
From: Stephen L. <ste...@st...> - 2012-10-08 23:20:00
Attachments:
wisent-ada-simple.wy
|
David Engster <de...@ra...> writes: > Stephen Leake writes: >> David Engster <de...@ra...> writes: > >>> Also, IMO there are too many indentation issues that are more a matter >>> of taste; just look at how many comment-styles the cc-mode indentation >>> engine supports. >> >> Yes. That's not a problem in Ada; the Ada community is pretty uniform, >> so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only >> has about 4 settings for indentation! > > That must be a nice community to be in. ;-) Yes, it is. It's also a great language to write code in. In only Google appreciated that, writing code for Android would be so much easier :(. >> Back to semantic; I have a very small grammar written, compiled, and >> running. Here's the entire rules part: > > Could you please send the entire .wy file? That will it make it much > easier for me to test. Attached. > I'm guessing you still need to add some boilerplate, for example you > have to tell Semantic what kind of Lexer it should use for Ada > buffers. There is an Analyzers section in the -wy.el file output by semantic-grammar-batch-build-packages, and I see appropriate tokens when I step thru bovinate. > It's been a while since I've written a language mode myself, so I'm a > bit rusty in that area. Have a look at the small examples like > wisent-calc.el. Doesn't look significantly different to me, but I'm clearly missing something. I'll try running that with bovinate, as well. -- -- Stephe |
From: Stephen L. <ste...@st...> - 2012-10-09 05:00:46
|
Stephen Leake <ste...@st...> writes: > David Engster <de...@ra...> writes: > >> It's been a while since I've written a language mode myself, so I'm a >> bit rusty in that area. Have a look at the small examples like >> wisent-calc.el. > > Doesn't look significantly different to me, but I'm clearly missing > something. I'll try running that with bovinate, as well. Did some experimenting. wisent-calc.el defines the main function wisent-calc, which calls semantic/wisent/wisent.el wisent-parse to parse the string. That code looks like the LALR parser I was expecting, and I can see it executing the actions. But that's plain wisent, not semantic on top of wisent. I think I need to use semantic, eventually? I managed to configure a buffer containing calc source (just "1 + 1;") to do something when I call bovinate on it; it calls sematic.el semantic-repeat-parse-whole-stream. But it puts no tags in the *Parser Output* buffer. Stepping thru semantic-repeat-parse-whole-stream, I don't see any lexer tokens. So I guess that lexer is not set properly (which you suggested). wisent-calc calls (semantic-lex-init), which I did not have in my Ada buffer init. Calling that in my "calc buffer" doesn't help. semantic/semantic.el bovinate calls semantic-fetch-tags, which calls semantic-parse-region, which is an overloadable function, which I still have not figured out. How do I find out what the overlaodable function actually calls? I can't step into it in edebug, and (mode-local-symbol 'semantic-parse-region) in an Ada buffer returns nil! Which I guess means "run the default", which is semantic-parse-region-default, which eventually runs semantic-repeat-parse-whole-stream, which calls semantic-parse-stream (another overloadable), which for Ada is semantic-parse-stream-default, which is semantic-bovine.el semantic-bovinate-stream. That code looks like a parser, although it's got some features I don't understand, and it doesn't look like an LALR parser. It does have code that seems to evaluate actions, but that never gets triggered on my Ada code. Is there some documentation I can read that explains how semantic is supposed to work? It's clearly very flexible/customizable, but that also makes it hard to understand. I guess I can write my top level indent code to call wisent.el wisent-parse, at least for now, so I can play with the grammar and the indentation algorithm design. Running semantic-grammar-batch-build-packages on wisent-calc.wy produces the same wisent-calc-wy.el file that's in the cedet tarball. The main difference (as far as I can tell) from my Ada file is the Epilogue, so apparently I need to specify an Epilogue. But nothing in that file seems to point to wisent.el wisent-parse; that's simply called directly by wisent-calc. -- -- Stephe |
From: David E. <de...@ra...> - 2012-10-18 15:03:06
|
Eric M. Ludlam writes: > Performance, as David pointed out, is a problem, which is why most > grammars are just tagging grammars. The c++ parser is particularly slow > even though it is just a tagging parser, but that is because it uses the > 'bovine' parser. The wisent parser is much better. There are two java > parsers, one is a tagging parser, and the other parsers the whole file, > and is still quite competitive with the tagging only version. I must admit that even after all those years, the Semantic grammar framework is confusing to me. This surely has to do with the fact that I'm no computer science guy, so I never *properly* had to learn that stuff. ;-) Anyway, I think the difference between Bovine and Wisent should be documented better in the docs. Here's what (I think) I know: - Bovine is a LL parser. Wisent is a LALR parser. - This entails: - Bovine is top-bottom, while Wisent is bottom-up - Bovine uses a predict/match algorithm, while Wisent uses shift/reduce What really confuses me is the fact that the grammars fed into those parsers look pretty much the same to me. Yes, there are some minor issues, like Wisent having several %start rules and some changes in the lexing part, but otherwise I don't see much difference. Also, I learned that every LL grammar is a LR grammar (but not the other way round). So I'm wondering: if Wisent is so much more efficient, can't we somehow convert Bovine grammars to Wisent? I'm guessing we cannot, but I'd like to know why. :-) -David |
From: Stephen L. <ste...@st...> - 2012-10-06 22:46:24
|
Stephen Leake <ste...@st...> writes: > Next question: how do I compile a wisent grammar file in Emacs 24.2? > > I'd like to build a very simple grammar for a small subset of Ada, so I > can play with it interactively and see if I can do what I need for an > indentation engine. > > In Emacs 23, there was a file '../etc/grammars/wisent-grammar.el' that > would read a .wy file and produce a -wy.el file. I don't see the > equivalent in Emacs 24.2. > > Do I need to download something from sourceforge? So apparently the answer to that is "yes"; in cedet-1.1/semantic/wisent there is a Makefile that compiles .wy files. Copied bits of that to my Makefile, and now I have a file wisent-ada-simple-wy.el. Which defines a function wisent-ada-simple-wy--install-parser, but I'm having trouble getting that to run. I think it may be a compatibility issue between cedet-1.1 and Emacs 24; I'll try cedet-1.0.1 (tomorrow :) -- -- Stephe |
From: David E. <de...@ra...> - 2012-10-07 13:43:42
|
Stephen Leake writes: > Stephen Leake <ste...@st...> writes: > >> Next question: how do I compile a wisent grammar file in Emacs 24.2? >> >> I'd like to build a very simple grammar for a small subset of Ada, so I >> can play with it interactively and see if I can do what I need for an >> indentation engine. >> >> In Emacs 23, there was a file '../etc/grammars/wisent-grammar.el' that >> would read a .wy file and produce a -wy.el file. I don't see the >> equivalent in Emacs 24.2. >> >> Do I need to download something from sourceforge? > > So apparently the answer to that is "yes"; in cedet-1.1/semantic/wisent > there is a Makefile that compiles .wy files. Copied bits of that to my > Makefile, and now I have a file wisent-ada-simple-wy.el. > > Which defines a function wisent-ada-simple-wy--install-parser, but I'm > having trouble getting that to run. > > I think it may be a compatibility issue between cedet-1.1 and Emacs 24; > I'll try cedet-1.0.1 (tomorrow :) I'm afraid you're going the wrong way. :-) Emacs had the necessary grammar framework in admin/grammars since 23.4, I think. Those packages were just upgraded in Emacs-bzr to first-class citizens and moved to lisp/cedet/, so wisent-grammar-mode should kick in as soon as you load a .wy file. You can then generate the parser by hitting C-c C-c. If it doesn't work, please let me know (but make sure you're using latest Emacs from bzr). You can also use CEDET from bzr, but that shouldn't be necessary anymore for doing grammar work. Regarding using Wisent for indentation: no one has done that yet. In fact, our parsers don't look at every line of code. Not even close; it would just be way too slow. Instead, we are only parsing pretty large semantical chunks like functions, classes, etc. and only look inside them if we have to (for getting a list of local variables, for instance). On a meta-note: I've always been doubtful about using grammars for indentation. Speed is just one problem; the real issue is that *while* code is written, you are very often confronted with structures the grammar considers illegal and hence cannot parse. Also, IMO there are too many indentation issues that are more a matter of taste; just look at how many comment-styles the cc-mode indentation engine supports. Anyway, I'd love to be proven wrong (and maybe SMIE already does, I haven't tried it yet). BTW, a good read on indentation are Steve Yegges experiences while writing js2-mode: http://steve-yegge.blogspot.de/2008/03/js2-mode-new-javascript-mode-for-emacs.html -David |
From: Stephen L. <ste...@st...> - 2012-10-07 22:19:38
|
David Engster <de...@ra...> writes: > Stephen Leake writes: >> >> I think it may be a compatibility issue between cedet-1.1 and Emacs 24; >> I'll try cedet-1.0.1 (tomorrow :) > > I'm afraid you're going the wrong way. :-) > > Emacs had the necessary grammar framework in admin/grammars since 23.4, > I think. Those packages were just upgraded in Emacs-bzr to first-class > citizens and moved to lisp/cedet/, so wisent-grammar-mode should kick in > as soon as you load a .wy file. You can then generate the parser by > hitting C-c C-c. If it doesn't work, please let me know (but make sure > you're using latest Emacs from bzr). > > You can also use CEDET from bzr, but that shouldn't be necessary anymore > for doing grammar work. Ok. I'd rather not mess with building from trunk (I have enough else to do!). I managed to get things running with CEDET 1.1 and Emacs 24, by putting the CEDET tree at the front of load-path. That will do for now; if this project ever gets finished, I'll worry about merging to trunk. > Regarding using Wisent for indentation: no one has done that yet. I was afraid so. > In fact, our parsers don't look at every line of code. Not even close; > it would just be way too slow. Do you have recent actual numbers? I'm always wary of premature optimization in cases like this. Which is why I'd like to get a semantic Ada parser working, so I can measure it, and compare it to the equivalent SMIE parser. At the same time, it's good to hear that the grammar does not have to be exactly the same as the Ada grammar; I can leave out things that don't affect indentation; I'm also doing that with SMIE, and it simplifies things a lot. > On a meta-note: I've always been doubtful about using grammars for > indentation. I've been maintaining Emacs Ada mode for a while now, and every time someone reports a bug or wish for the indentation engine, I get the urge to rewrite it using a grammar-based approach :). So I'm finally giving in to that. > Speed is just one problem; the real issue is that *while* code is > written, you are very often confronted with structures the grammar > considers illegal and hence cannot parse. I'll have to include partially written code in my tests. My philosophy here is that as long as the indentation engine doesn't crash or hang, the bad indentation reminds you that the code is not yet legal; I find that helpful. In addition, one advantage of the SMIE parser is that it is mostly local, so illegal code far away tends not to affect it. And, as I said above, I simply leave out large parts of the grammar, that don't affect indentation. So it just doesn't see illegalities in those part of the grammar. So far, I'm much happier with the SMIE approach than the former ad-hoc approach! Not sure about semantic yet; I haven't got a parser working (more below). > Also, IMO there are too many indentation issues that are more a matter > of taste; just look at how many comment-styles the cc-mode indentation > engine supports. Yes. That's not a problem in Ada; the Ada community is pretty uniform, so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only has about 4 settings for indentation! > Anyway, I'd love to be proven wrong (and maybe SMIE already does, I > haven't tried it yet). BTW, a good read on indentation are Steve Yegges > experiences while writing js2-mode: > > http://steve-yegge.blogspot.de/2008/03/js2-mode-new-javascript-mode-for-emacs.html That is interesting. He implies that most of the grammar is useless, which I agree with. I suspect that a grammar approach to Ada is more fruitful, because Ada is a much more structured language. SMIE does take advantage of parse-partial-sexp when it can. And the actual trigger for me to give semantic a serious try was the post from Stefan Monnier pointing out that I was using the SMIE parser to do a full parse from the beginning of the buffer, so I might as well use an LALR parser. Which is semantic. If it works, it will reduce the amount of ad-hoc code in my SMIE ada-indent engine (it does seem very hard to avoid ad-hoc code!). Back to semantic; I have a very small grammar written, compiled, and running. Here's the entire rules part: package_body : PACKAGE BODY name IS declarative_part BEGIN statement END name SEMICOLON (TAG $3 'block) ; name : IDENTIFIER | name DOT IDENTIFIER (TAG $1 'name) ; declarative_part : subprogram_body ; subprogram_body : FUNCTION name RETURN name IS BEGIN statement END name SEMICOLON (TAG $2 'block) ; statement : RETURN NUMERIC_LITERAL SEMICOLON (TAG $2 'return) ; And the test text that it parses: package body Ada_Mode.Nominal is function Function_1 return Integer is begin return 1; end Function_1; begin Function_1; end Ada_Mode.Nominal; I'm running "bovinate" to test this. But I never see any tags in the bovinate output. I've stepped thru semantic-repeat-parse-whole-stream, and the 'tag' variable is always nil. I'm expecting to see 'name, 'block, 'return when the corresponding non-terminals are reduced. Is that what I should expect? Any hints? -- -- Stephe |
From: David E. <de...@ra...> - 2012-10-08 15:18:47
|
Stephen Leake writes: > David Engster <de...@ra...> writes: >> In fact, our parsers don't look at every line of code. Not even close; >> it would just be way too slow. > > Do you have recent actual numbers? I'm always wary of premature > optimization in cases like this. Well, let's say it this way: people often complain about CEDET being too slow. Parsing a big, complicated C++ file can easily take seconds. But that's C++, where especially the pre-processor can make things rather difficult. >> Speed is just one problem; the real issue is that *while* code is >> written, you are very often confronted with structures the grammar >> considers illegal and hence cannot parse. > > I'll have to include partially written code in my tests. My philosophy > here is that as long as the indentation engine doesn't crash or hang, > the bad indentation reminds you that the code is not yet legal; I find > that helpful. Sounds reasonable. Maybe this is not a big issue for indentation; for smart completions or stuff like Flymake this is a pretty big problem. >> Also, IMO there are too many indentation issues that are more a matter >> of taste; just look at how many comment-styles the cc-mode indentation >> engine supports. > > Yes. That's not a problem in Ada; the Ada community is pretty uniform, > so a _small_ choice of styles is adequate. GPS (the AdaCore IDE) only > has about 4 settings for indentation! That must be a nice community to be in. ;-) > Back to semantic; I have a very small grammar written, compiled, and > running. Here's the entire rules part: Could you please send the entire .wy file? That will it make it much easier for me to test. > I'm running "bovinate" to test this. But I never see any tags in the > bovinate output. > > I've stepped thru semantic-repeat-parse-whole-stream, and the 'tag' > variable is always nil. > > I'm expecting to see 'name, 'block, 'return when the corresponding > non-terminals are reduced. Is that what I should expect? I'm guessing you still need to add some boilerplate, for example you have to tell Semantic what kind of Lexer it should use for Ada buffers. It's been a while since I've written a language mode myself, so I'm a bit rusty in that area. Have a look at the small examples like wisent-calc.el. If you send me the complete .wy file, I can take a shot at getting it work. -David |
From: David E. <de...@ra...> - 2012-10-09 20:59:56
|
Stephen Leake writes: >> I'm guessing you still need to add some boilerplate, for example you >> have to tell Semantic what kind of Lexer it should use for Ada >> buffers. > > There is an Analyzers section in the -wy.el file output by > semantic-grammar-batch-build-packages, and I see appropriate tokens when > I step thru bovinate. I didn't have much time this evening, but I attached a wisent-ada.el file which contains some boilerplate to hopefully get you started. I think most of your current problems stem from the lexing part. As you can see in the generated parser, there are lexers for keywords, symbols, etc. defined for you, but you still have to put those in the right positions in the definition of the lexer (for example, do not put the symbol lexer before the keyword lexer). This is done in the attached wisent-ada.el file (see the `define-lex' command). BTW, I know nothing about Ada, so this lexer surely isn't the right thing yet. You can test the lexer by going to the beginning of the buffer and then call `semantic-lex-test'. It should output ((PACKAGE 1 . 8) (BODY 9 . 13) (IDENTIFIER 14 . 22) (DOT 22 . 23) (IDENTIFIER 23 . 30) (IS 31 . 33) (FUNCTION 38 . 46) .... and so on. Note that it has to output IDENTIFIER instead of just 'symbol', otherwise it doesn't work. I'm still not getting any tags with your grammar because some rule does not properly match. Especially at the beginning, writing Semantic grammars can be pretty tedious. Unfortunately, Wisent does not have a step debugger (Bovine has one), hence a good strategy is to start with very, *very* simple rules. For example, just start with compilation_unit : package_body ; package_body : PACKAGE BODY IDENTIFIER (TAG $3 'block) ; and try to parse "package body foobar". This at least works for me: (("foobar" block nil (:filename "/home/void/test.ada") #<overlay from 1 to 20 in test.ada>)) Then slowly make things more complicated. If you get stuck, remember that you can always call regular Elisp functions from the grammar. For instance, in the above example, you can write package_body : PACKAGE BODY IDENTIFIER (debug $3 'block) ; If you don't get a backtrace, then obviously the rule does not match and you have to investigate why. If you get a backtrace, you can then look at the argument list and see what '$3' actually is. So much for now; I hope that this helps you a bit. -David |
From: David E. <de...@ra...> - 2012-10-09 21:01:44
Attachments:
wisent-ada.el
|
David Engster writes: > but I attached a wisent-ada.el file No I did not. Classic. :-) -David |
From: Stephen L. <ste...@st...> - 2012-10-12 09:05:04
|
David Engster <de...@ra...> writes: > Stephen Leake writes: >>> I'm guessing you still need to add some boilerplate, for example you >>> have to tell Semantic what kind of Lexer it should use for Ada >>> buffers. >> >> There is an Analyzers section in the -wy.el file output by >> semantic-grammar-batch-build-packages, and I see appropriate tokens when >> I step thru bovinate. > > I didn't have much time this evening, but I attached a wisent-ada.el > file which contains some boilerplate to hopefully get you started. <snip> Thanks. I haven't had a chance to look at this yet; I went down another path (see my other post today). -- -- Stephe |
From: Eric M. L. <eri...@gm...> - 2012-10-12 01:42:17
|
Hi Stephen & David Sorry about being slow to reply. Punkin Chunkin season is upon me, and keeps me distracted. David is right about using semantic for indentation. There are no examples, and no good patterns to follow. I've thought about this problem on and off for a long time but I've been too focused on getting the rest of the system working well to try it out. Performance, as David pointed out, is a problem, which is why most grammars are just tagging grammars. The c++ parser is particularly slow even though it is just a tagging parser, but that is because it uses the 'bovine' parser. The wisent parser is much better. There are two java parsers, one is a tagging parser, and the other parsers the whole file, and is still quite competitive with the tagging only version. My conclusion has been that an indentation parser would start with a tagging parser, exactly as Stephen is starting out. Step one, make that work. Once that works, you can create additional goals for function bodies, and use the partial reparse engine, or something like it to start at the beginning of your function, block, or other solid anchor point, and parse up to the cursor. Wherever it stops, you would then determine indentation from that. How you get that data when it stops, I have no idea, but that's been my guess so far, and might not be too bad performance wise. The attachment with the example code didn't extract for me so I didn't look at it. The most important bit to get your parser running is the setup function. For example, wisent-java-default-setup is in the java-mode-hook. That function installs the parser and sets up lots of customizations needed for the java language. That function can also be quite simple, such as wisent-dot-setup-parser in cogre/wisent-dot.el. After re-creating you grammar, you often have to eval the newly created variables in the -wy.el file by hand w/ C-M-x because they are defvar, and just reloading the file won't work. It is a pain, but it will force things to work. You can use `semantic-describe-buffer' to make sure key overrides have been set in your buffer. Use semantic-lex-debug to make sure the lexer is creating the tokens your grammar expects. Use wisent-debug-on-entry in a rule to make sure your actions are being called when you expect. Sticking calls to (message ...) is also an easy way to make sure your actions are called. Good Luck Eric |
From: Stephen L. <ste...@st...> - 2012-10-12 09:02:48
|
"Eric M. Ludlam" <eri...@gm...> writes: > Sorry about being slow to reply. Punkin Chunkin season is upon me, > and keeps me distracted. <snip> Thanks for the advice. I must admit this project has me a little obsessed; my real work is suffering :) I'm currently experimenting with using the Wisent parser, and a lexer built directly on the Emacs C scanning functions forward-comment and skip-syntax-forward. That's the lexer SMIE uses, and it's quite fast, and certainly adequate for the task - everything is either a keyword or an identifier, I don't need any more detailed info than that for indentation. I don't see any fundamental reason why an LALR parser should be slower than an operator precedence parser, or at least, not enough slower to matter. In both cases, I end up parsing large portions of the text (whole functions, anyway), in order to compute indentation, and the SMIE parser is more than fast enough. I'm using the wisent parser because I understand it, and because it seemed easier to bypass the semantic lexer that way. Is there any documentation on the semantic parser? It works quite nicely for a very simple subset of the Ada syntax. But when I try to expand it, the wisent grammar compiler complains that all the rules and nonterminals are useless. Apparently it sees some conflict I don't see. So I wrote the same grammar in OpenToken (http://stephe-leake.org/ada/opentoken.html, an LALR parser generator written in Ada that I maintain), because it has lots of nice diagnostics to help debug grammar issues, and it does not complain, but generates a parser that works. So now I'm puzzled. There is a strong motivation to use an LALR parser for indentation; I can encode most of the information needed for indenting directly in the grammar actions. For example: package_body : PACKAGE BODY name IS declarations BEGIN statements END name SEMICOLON (wisi-cache-keywords '($1 'other nil $region1) '($4 'block-start nil $region4) '($6 'block-both nil $region6) '($8 'block-end nil $region8) '($10 'statement-end nil $region10)) ; In SMIE, that same information is embedded in ad-hoc code to "refine" the tokens. So the actual indentation engine code is quite simple; _much_ simpler, and _much_ less language-specific, than in SMIE. So the final system will be easier to maintain (because it is more closely related to the language grammar), and _much_ easier to port to a new language. So I'll keep working at it. I may try writing Ada code to dump the OpenToken parser table as a wisent lisp parser table. That will be easier for me than figuring out what's going on in the Wisent grammar compiler. -- -- Stephe |
From: Eric L. <eri...@gm...> - 2012-10-19 03:02:21
|
On 10/18/2012 11:02 AM, David Engster wrote: > Eric M. Ludlam writes: >> Performance, as David pointed out, is a problem, which is why most >> grammars are just tagging grammars. The c++ parser is particularly slow >> even though it is just a tagging parser, but that is because it uses the >> 'bovine' parser. The wisent parser is much better. There are two java >> parsers, one is a tagging parser, and the other parsers the whole file, >> and is still quite competitive with the tagging only version. > I must admit that even after all those years, the Semantic grammar > framework is confusing to me. This surely has to do with the fact that > I'm no computer science guy, so I never *properly* had to learn that > stuff. ;-) > > Anyway, I think the difference between Bovine and Wisent should be > documented better in the docs. Here's what (I think) I know: > > - Bovine is a LL parser. Wisent is a LALR parser. > - This entails: > - Bovine is top-bottom, while Wisent is bottom-up > - Bovine uses a predict/match algorithm, while Wisent uses shift/reduce > > What really confuses me is the fact that the grammars fed into those > parsers look pretty much the same to me. Yes, there are some minor > issues, like Wisent having several %start rules and some changes in the > lexing part, but otherwise I don't see much difference. > > Also, I learned that every LL grammar is a LR grammar (but not the other > way round). So I'm wondering: if Wisent is so much more efficient, can't > we somehow convert Bovine grammars to Wisent? I'm guessing we cannot, > but I'd like to know why. :-) > It hadn't occurred to me to try to port bovine grammars to wisent. Early on, that would have been very challenging, but when David Ponce merged the two language types together, it should be much easier now. That would be a big win to do that, since one whole style of parser generator could be eliminated. Sadly, I'd have to relearn all the specifics again to explain the differences. I seem to remember one difference was related to how the actions and tag quoting worked, but I don't really remember. Eric |
From: David E. <de...@ra...> - 2012-10-19 06:26:52
|
Eric Ludlam writes: > It hadn't occurred to me to try to port bovine grammars to wisent. > Early on, that would have been very challenging, but when David Ponce > merged the two language types together, it should be much easier now. > > That would be a big win to do that, since one whole style of parser > generator could be eliminated. Sadly, I'd have to relearn all the > specifics again to explain the differences. I seem to remember one > difference was related to how the actions and tag quoting worked, but > I don't really remember. Yes, I saw some differences there, but I think - while tedious - it's not difficult to convert that. One could probably write a converter which does of most of that stuff. I also thing that would be a big improvement, since we could drop Bovine and would probably gain some speed in the parsing department. Maybe I'll just try it with my Fortran90 tagging parser and see how difficult it actually is. OTOH, there are currently more urgent things to do (but much less interesting)... -David |
From: David E. <de...@ra...> - 2012-10-21 20:14:26
|
David Engster writes: > Eric Ludlam writes: >> It hadn't occurred to me to try to port bovine grammars to wisent. >> Early on, that would have been very challenging, but when David Ponce >> merged the two language types together, it should be much easier now. >> >> That would be a big win to do that, since one whole style of parser >> generator could be eliminated. Sadly, I'd have to relearn all the >> specifics again to explain the differences. I seem to remember one >> difference was related to how the actions and tag quoting worked, but >> I don't really remember. > > Yes, I saw some differences there, but I think - while tedious - it's > not difficult to convert that. One could probably write a converter > which does of most of that stuff. I looked into it a bit, and it is definitely doable. Most of the differences I've seen so far are in the lexing part and in how lambda expressions have to be quoted. The F90 tagging parser worked pretty much after 30 minutes. I then turned to the C grammar, and here things don't go as smoothly. The main problem is that it is nigh impossible to debug the actual parser. Even if we had a stepping debugger for Wisent, it actually wouldn't help much. I mean, I *can* output the shifts and reduces the parser is doing, and it also outputs what token it would expect next to match "some rule", but it does not say which. All it has are these magic tables which tell the parser what to do next; it's just so removed from the actual grammar. I can see now why people are still fond of LL parsing, it's way easier to step through the rules. Anyway, I'll try a bit more. I have a hunch it's a just little thing that's not working yet. -David |
From: David E. <de...@ra...> - 2012-10-21 21:24:36
|
David Engster writes: > The F90 tagging parser worked pretty much after 30 minutes. I then > turned to the C grammar, and here things don't go as smoothly. What is really strange is that my c.wy produces oodles of shift/reduce and reduce/reduce conflicts. So I guess there still must be something I'm missing, but I cannot figure out why I get these ambiguities in the grammar. -David |
From: David E. <de...@ra...> - 2012-10-22 18:02:24
|
David Engster writes: > David Engster writes: >> The F90 tagging parser worked pretty much after 30 minutes. I then >> turned to the C grammar, and here things don't go as smoothly. > > What is really strange is that my c.wy produces oodles of shift/reduce > and reduce/reduce conflicts. So I guess there still must be something > I'm missing, but I cannot figure out why I get these ambiguities in the > grammar. I removed the lambda expressions and send the grammar through Bison to get a more verbose output. Indeed, the grammar has several ambiguities. I think I could remove the reduce/reduce conflicts, but then there are also a few dozen(!) shift/reduce errors... so I'd rather not go further with this. It's a shame, I'd really like to now how much faster a LALR parser would be. Also, I'm wondering why we even get a working LL parser from that grammar. -David |