Thread: [cedet-semantic] Wisent grammars and bison
From: Miguel G. <mig...@gm...> - 2013-04-01 13:54:49
Hello List,

I am very much interested in using semantic's wisent C/C++ grammar in a C++ project I'm working on. Unfortunately I'm not at all familiar with bison or wisent and so, at this point, I'm simply assessing the feasibility of using semantic's grammars so as not to fall into the trap of reinventing the wheel; after all, Eric Ludlam et al have done an amazing job with the grammar (as well as with CEDET, of course!).

So, being illiterate as I am in bison/wisent speak, I've decided to turn to the semantic mailing list. My question is: how much work would be required to migrate the C/C++ grammar to bison? I've noticed that apart from the `c.by' grammar file there are also the files `c-by.el', `c.el' and `clang.el'; are these elisp files the result of running the grammar through wisent?

-- Miguel
From: Eric M. L. <er...@si...> - 2013-04-01 14:36:53
On 04/01/2013 09:26 AM, Miguel Guedes wrote:
> Hello List,
>
> I am very much interested in using semantic's wisent C/C++ grammar in a C++ project I'm working on. Unfortunately I'm not at all familiar with bison or wisent and so, at this point, I'm simply assessing the feasibility of using semantic's grammars so as not to fall into the trap of reinventing the wheel; after all, Eric Ludlam et al have done an amazing job with the grammar (as well as with CEDET, of course!).
>
> So, being illiterate as I am in bison/wisent speak, I've decided to turn to the semantic mailing list. My question is how much work would be required to migrate the C/C++ grammar to bison? I've noticed that apart from the `c.by' grammar file there are also the files `c-by.el', `c.el' and `clang.el'; are these elisp files the result of running the grammar through wisent?

Hi,

There are two parsers in CEDET. The one that the C grammar uses is not wisent; wisent is the one that is most like bison. The implication is that while the syntax is similar, the rules will run in a slightly different order. On the whole, you might find that the rules work ok. You are probably better off finding one of the other reference implementations for the C++ grammar.

You will also find that the grammars are for tagging only. They don't parse the implementation, just the function and variable definitions. I'm not sure how much of a grammar you need, so that could be a factor.

Good Luck,
Eric
From: David E. <de...@ra...> - 2013-04-01 16:33:23
Eric M. Ludlam writes:
> On 04/01/2013 09:26 AM, Miguel Guedes wrote:
>> My question is how much work would be required to migrate the C/C++ grammar to bison?
>
> There are two parsers in CEDET. The one that the C grammar uses is not wisent; wisent is the one that is most like bison. The implication is that while the syntax is similar, the rules will run in a slightly different order. On the whole, you might find that the rules work ok. You are probably better off finding one of the other reference implementations for the C++ grammar.

Which one? :-)

You cannot parse C++ with plain Bison; you need some additional trickery to deal with the context-sensitive stuff.

There is a thesis by E.D. Willink in which he wrote a grammar, which you can find here:

http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y

Read the initial comment, though.

If you want to parse C++, your current best bet is using libclang and/or libtooling.

-David
From: Miguel G. <mig...@gm...> - 2013-04-02 10:47:51
Hi David,

On Mon, 2013-04-01 at 18:33 +0200, David Engster wrote:
> You cannot parse C++ with plain Bison; you need some additional trickery to deal with the context-sensitive stuff.
>
> There is a thesis by E.D. Willink in which he wrote a grammar, which you can find here:
>
> http://www.computing.surrey.ac.uk/research/dsrg/fog/CxxGrammar.y
>
> Read the initial comment, though.

Yes, I'd come across Willink's thesis and grammar before, but I was reluctant to use it (and still am) as it was created in 1999 and doesn't seem to have been updated since.

> If you want to parse C++, your current best bet is using libclang and/or libtooling.

That was my first bet - libclang, that is. But after conducting some tests it turns out that even with optimisations turned on it performs too slowly when used on medium/large projects. It also suffers from a number of issues, such as not being thread-safe (multi-threading being the main reason I started my project), and having to fully re-parse a source file when it is edited rather than just the line(s) that have changed, among others.

I haven't looked at libtooling yet, but core devs on the clang mailing lists discouraged me from using it and advised focusing on libclang instead. I will still be posting to the clang mailing list to ask for more advice, as I may be missing something; but, really, I don't think my project needs a full compiler stack parsing code - tagging code should more than suffice.

-- Miguel
From: Miguel G. <mig...@gm...> - 2013-04-02 10:32:47
Hi Eric,

On Mon, 2013-04-01 at 10:36 -0400, Eric M. Ludlam wrote:
> There are two parsers in CEDET. The one that the C grammar uses is not wisent; wisent is the one that is most like bison.

Could you please point me to the grammar used by semantic? Also, what is the wisent grammar (./lisp/cedet/semantic/bovine/c.by) used for?

> The implication is that while the syntax is similar, the rules will run in a slightly different order. On the whole, you might find that the rules work ok. You are probably better off finding one of the other reference implementations for the C++ grammar.
>
> You will also find that the grammars are for tagging only. They don't parse the implementation, just the function and variable definitions. I'm not sure how much of a grammar you need, so that could be a factor.

Tagging is exactly what I'm looking for. I find that as a general rule semantic works really well, but it tends to suffer from latency (possibly due to its single-threaded limitation?) when working with medium/large C/C++ projects, which I personally find extremely disruptive to the coding flow. Hence why I thought of starting a project to expose a multithreaded server, accessible from within emacs (or any other medium, command line included), to carry out functions similar to semantic's - i.e. code completion, symbol look-up - as well as others I've thought of.

-- Miguel
From: Eric M. L. <er...@si...> - 2013-04-02 11:50:14
On 04/02/2013 06:32 AM, Miguel Guedes wrote:
> Hi Eric,
>
> Could you please point me to the grammar used by semantic? Also, what is the wisent grammar (./lisp/cedet/semantic/bovine/c.by) used for?

Hi,

The .by extension is for the 'bovine' LL parser, which is what is used by default. .wy extensions are for wisent LALR grammars.

> Tagging is exactly what I'm looking for. I find that as a general rule semantic works really well, but it tends to suffer from latency (possibly due to its single-threaded limitation?) when working with medium/large C/C++ projects, which I personally find extremely disruptive to the coding flow. Hence why I thought of starting a project to expose a multithreaded server, accessible from within emacs (or any other medium, command line included), to carry out functions similar to semantic's - i.e. code completion, symbol look-up - as well as others I've thought of.

Thanks for the explanation. This sounds like a pretty interesting project. Other fast parsers I've used to make CEDET faster include GNU Global, CScope, and Ebrowse. You might like the ebrowse parser, as it is a fast tagging parser, but it isn't detailed enough. CScope might be another option.

Good Luck,
Eric
From: David E. <de...@ra...> - 2013-04-02 15:05:16
Eric M. Ludlam writes:
> On 04/02/2013 06:32 AM, Miguel Guedes wrote:
>> Could you please point me to the grammar used by semantic? Also, what is the wisent grammar (./lisp/cedet/semantic/bovine/c.by) used for?
>
> The .by extension is for the 'bovine' LL parser, which is what is used by default. .wy extensions are for wisent LALR grammars.

To follow up on that: the grammars for LL (Bovine) and LALR (Wisent/Bison) are not much different syntactically. A few months ago I tried porting the C++ grammar from Bovine to Wisent, and while I could get Bison to accept it, it had a *lot* of conflicts. I'm not very familiar with grammars, parsers and all that, but it seems that LL parsers are more robust against such conflicts than LALR ones (top-down vs. bottom-up).

>> Tagging is exactly what I'm looking for. I find that as a general rule semantic works really well, but it tends to suffer from latency (possibly due to its single-threaded limitation?) when working with medium/large C/C++ projects.

Regarding Semantic's speed, there are two separate issues:

- The actual parsing: Yes, that is slow, but it is usually a one-time cost, so I'm not sure how much it really matters. The C++ parser speed can be pretty well tested with 'make itest-stl-batch', which parses roughly 100 headers from the C++ STL (vector, string, etc.). It takes roughly 10 seconds on my machine, so we're talking ~10 files per second. A proper LALR grammar for C++ would probably be significantly faster. Still, it's Emacs Lisp, of course; it will always be way slower than something in C/C++.

- Dealing with the database, which involves building search paths and finding tags in it. This very much depends on the size of the database. I think Eric had in mind to maybe use something external here one day.

>> Hence why I thought of starting a project to expose a multithreaded server, accessible from within emacs (or any other medium, command line included), to carry out functions similar to semantic's - i.e. code completion, symbol look-up - as well as others I've thought of.

People already did this using libclang. For example: https://github.com/Andersbakken/rtags

>> That was my first bet - libclang, that is. But after conducting some tests it turns out that even with optimisations turned on it performs too slowly when used on medium/large projects.

Hmm. We must have a different perspective on "medium/large projects". I found libclang to be very fast.

>> I haven't looked at libtooling yet, but core devs on the clang mailing lists discouraged me from using it and advised focusing on libclang instead.

Interesting. I found libtooling to be more powerful than libclang, because you have the full C++ API. However, last I checked, actually getting something to compile using libtooling was not easy.

-David
From: Miguel G. <mig...@gm...> - 2013-04-02 12:16:05
On Tue, 2013-04-02 at 07:50 -0400, Eric M. Ludlam wrote:
> Thanks for the explanation. This sounds like a pretty interesting project. Other fast parsers I've used to make CEDET faster include GNU Global, CScope, and Ebrowse. You might like the ebrowse parser, as it is a fast tagging parser, but it isn't detailed enough. CScope might be another option.

Thanks for the invaluable input, Eric. Will be checking out ebrowse, as I've used it in the past and, as you rightly said, it's pretty fast.

-- Miguel
From: Stephen L. <ste...@st...> - 2013-04-04 08:06:46
"Eric M. Ludlam" <er...@si...> writes:
> On 04/03/2013 07:50 AM, Stephen Leake wrote:
>> There is now another option. For parsing Ada, I've written a generalized LALR parser, using wisent as a starting point. It's still beta code; see http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html, the section on Ada mode 5.0.
>
> Unlike tagging grammars, you need to parse the whole file, so it seems like merging indentation code and tagging code is probably not a good idea. The wisi overlay on wisent seems like a nice solution.

wisi is actually an almost complete replacement for wisent. I started out using the wisent LALR parser, but then realized I needed a generalized parser. I am still using some of the lexer table construction functions.

I don't fully understand what is meant by "tagging parser". The parser actions provided by wisi store semantic information in text properties; that is then used by the indentation engine. What do the "tagging parser" actions do? How can you parse anything less than "the full syntax" with an LALR parser? You have to parse all of the input tokens.

> I'll be curious if you need any wisent patches to make your solution work. If there is some support you need (such as integrating patches assigned to GNU) let me know.

I have a copyright assignment for Emacs on file. It might make sense to integrate wisi into cedet, or it might make sense to make it a separate ELPA package (I plan to do that with Ada mode).

> If it is well generalized, I could imagine creating such an indentation engine for grammar files (.wy) to see how it works.

That should be doable now; my intent is to make wisi useful for any language. Although I have no idea how to cope with preprocessors such as those of C and C++; they change the language that needs to be parsed, on the fly!

As an example, I have a complete parser and indentation engine using wisi for GNAT gpr files; see gpr-grammar.wy and gpr-wisi.el. gpr files are the GNAT compiler option files; the syntax is similar to Ada, but much simpler. There is also a test driver framework; see build/* and test/*. I'd appreciate feedback on how useful this is for non-Ada languages :).

Note that the wisi parser expects a different structure from the lexer; you should be able to use the wisi lexer.

> Grammar files aren't very complicated, so serious overkill to be sure.

Yes, but small examples are always useful. And it might be the simplest way to do an indentation engine; that's certainly my goal.

The main difference between wisi indentation engines and any others I've seen is the use of cached info in text properties; that separates parsing from indenting, which _significantly_ speeds things up, allowing the use of more complex parsers.

--
-- Stephe
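The parse-once / indent-from-cache split Stephen describes can be pictured outside Emacs roughly as follows. This is a toy sketch in Python, with a dict standing in for text properties and brace counting standing in for a real grammar; all of the names are invented, not wisi's actual API.

```python
# Sketch of separating parsing from indenting: one parse pass stores
# a nesting level per line in a cache (playing the role of Emacs text
# properties); the indentation function then only reads the cache.

def parse_pass(lines):
    """One pass over the buffer: cache the indent level of each line."""
    cache, depth = {}, 0
    for n, line in enumerate(lines):
        # a line that starts with '}' sits at the enclosing level
        level = depth - (1 if line.lstrip().startswith('}') else 0)
        cache[n] = max(level, 0)
        depth += line.count('{') - line.count('}')
    return cache

def indent(cache, n, width=2):
    """Indentation (in columns) of line n, read straight from the cache."""
    return cache[n] * width

cache = parse_pass(["int f() {", "return 1;", "}"])
```

The point of the split is the one made above: re-indenting a line never triggers a re-parse, because the parse results are already cached at known positions.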
From: Eric M. L. <er...@si...> - 2013-04-04 11:45:26
On 04/04/2013 04:06 AM, Stephen Leake wrote:
>> Unlike tagging grammars, you need to parse the whole file, so it seems like merging indentation code and tagging code is probably not a good idea. The wisi overlay on wisent seems like a nice solution.
>
> wisi is actually an almost complete replacement for wisent. I started out using the wisent LALR parser, but then realized I needed a generalized parser. I am still using some of the lexer table construction functions.

Ah, I hadn't realized it had been separated, due to the .wy files around in your project.

> I don't fully understand what is meant by "tagging parser".

This just means that the parser only has rules for parsing the outside of a tag. For example:

  int my_c_fcn (int a)
  {
    Misc code here;
  }

It never sees the 'Misc code' because the lexer merges everything between { and } into one lexical token. It does the same to the ( and ), but the parser knows that block has arguments in it, so the parser asks the lexer to expand the block, and then it recurses.

Having wisent be an iterative parser makes it hugely robust to bad code, since it can easily skip over bogus text and try again. This works because code is very regular, and since editors depend on well-balanced { } to do indenting, people rarely put the { } notation into CPP macros, so it is pretty reliable.

It also speeds up parsing. For languages with begin/end, when the lexer sees a keyword such as 'begin', 'if', or whatever the initializer is, it can use tricks to zip to the end and wrap that up into one token. For example, for macros, if the lexer sees #if <condition> it knows how to skip out to the #endif using a handy function used by the generalized C indentation engine.

What makes most of the semantic grammars different is that they depend on the lexer hiding large portions of the buffer. The indentation engine needs access to indent those parts of the buffer.

On the flip side, the indentation engine could depend on a tagging parser to find boundaries between independent indentable parts, perhaps simplifying its job. Since there is no tagging parser for Ada (unless you wrote one) it may be hard to experiment. ;)

Of course, your wisi indentation engine could provide a tag list to semantic after it is done analyzing. I hadn't examined what you have enough to know for sure.

Eric
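To make the block-collapsing idea concrete, here is a toy lexer in the same spirit (Python rather than Emacs Lisp, and it only collapses { } blocks, not ( ) lists; none of these names are Semantic's actual functions):

```python
# Toy sketch of a "tagging" lexer: everything between { and } is
# merged into a single BLOCK token, so the parser only ever sees
# declarations. Illustrative only; not Semantic's real implementation.

def tokenize(src):
    """Return (kind, text) pairs, collapsing brace blocks."""
    tokens, i = [], 0
    while i < len(src):
        ch = src[i]
        if ch.isspace():
            i += 1
        elif ch == '{':
            start, depth = i, 1
            i += 1
            while i < len(src) and depth:
                depth += {'{': 1, '}': -1}.get(src[i], 0)
                i += 1
            tokens.append(('BLOCK', src[start:i]))  # body hidden from parser
        elif ch.isalnum() or ch == '_':
            start = i
            while i < len(src) and (src[i].isalnum() or src[i] == '_'):
                i += 1
            tokens.append(('SYMBOL', src[start:i]))
        else:
            tokens.append(('PUNCT', ch))
            i += 1
    return tokens

toks = tokenize("int my_c_fcn (int a) { misc code here; }")
```

A parser driving such a lexer would, as described above, ask for a collapsed token to be re-expanded (and recurse into it) whenever it knows the block contains something it cares about, such as an argument list.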
From: David E. <de...@ra...> - 2013-04-04 18:25:44
Stephen Leake writes:
> "Eric M. Ludlam" <er...@si...> writes:
>> On 04/03/2013 07:50 AM, Stephen Leake wrote:
>>> There is now another option. For parsing Ada, I've written a generalized LALR parser, using wisent as a starting point. It's still beta code; see http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html, the section on Ada mode 5.0.

This looks very interesting. However, I cannot download org.opentoken.stephe-2013-03-27.tar.bz2 (it gives me a 404).

Also, I don't understand how you generate the parser. From the Makefile it seems you're using Opentoken? What are you using your Wisent-derived parser generator for, then?

> It might make sense to integrate wisi into cedet, or it might make sense to make it a separate ELPA package (I plan to do that with Ada mode).

I really like how you're dealing with the grammar conflicts. If it can even deal with reduce/reduce, I would at least try to port the C++ grammar to that. I'd *really* like to get our C++ parsing faster. I had a good look at semantic-bovinate-stream and profiled it with various tests (which is difficult because it's recursive), but I cannot see any possibility for major speedups, except going parallel (by using several Emacs processes) or rewriting it as a primitive in C (which the Emacs maintainers probably wouldn't like).

> That should be doable now; my intent is to make wisi useful for any language. Although I have no idea how to cope with preprocessors such as those of C and C++; they change the language that needs to be parsed, on the fly!

This is dealt with on the lexing level, which "mostly works". It's still a major pain, though. A better solution would be to have a C preprocessor implemented in Emacs Lisp, which runs before the lexer. This does not sound terribly hard (the C preprocessor is pretty dumb, after all), but the main problem is that Semantic would see different code than the user sees in his buffer, and you have to manage a mapping between the two. I imagine this to be pretty hard to get right (and reasonably fast).

-David
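The mapping problem David mentions can be made concrete with a toy expander. This is only an illustration of the bookkeeping involved (object-like macros only, invented names); a real C preprocessor, with function-like macros, #include and #line, is far more involved.

```python
import re

# Toy macro expander that records, for every character of the expanded
# output, the position it came from in the original buffer text.

def expand(src, macros):
    """Return (expanded text, list mapping output index -> source index)."""
    out, mapping = [], []
    for m in re.finditer(r'\w+|\W', src):
        piece = m.group(0)
        if piece in macros:
            repl = macros[piece]
            # every expanded character points back at the macro's start
            mapping.extend([m.start()] * len(repl))
        else:
            repl = piece
            mapping.extend(range(m.start(), m.end()))
        out.append(repl)
    return ''.join(out), mapping

text, mapping = expand("int x = MAX;", {"MAX": "2147483647"})
```

With such a table, a position in the parser's view of the code (say, inside the expanded `2147483647`) can be translated back to the macro occurrence the user actually sees in the buffer.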
From: Stephen L. <ste...@st...> - 2013-04-04 22:26:19
David Engster <de...@ra...> writes:
> This looks very interesting. However, I cannot download org.opentoken.stephe-2013-03-27.tar.bz2 (it gives me a 404).

Sorry, that should be org.opentoken.stephe-4.0w-2013-03-27.tar.bz2 (note the 4.0w). Page fixed now.

> Also, I don't understand how you generate the parser. From the Makefile it seems you're using Opentoken?

Yes.

> What are you using your Wisent-derived parser generator for, then?

I'm not. The wisi generalized LALR parser (not the parser generator) is a rewrite of the wisent LALR parser.

> I really like how you're dealing with the grammar conflicts. If it can even deal with reduce/reduce,

It can.

> I would at least try to port the C++ grammar to that. I'd *really* like to get our C++ parsing faster.

It does slow down the parser, relative to plain LALR. The slowdown is proportional to the number of conflicts, and the number of tokens it takes to resolve each conflict. A truly ambiguous construct yields two (or more) successful parses, which is an error in wisi (the parser doesn't know which set of actions to apply).

It should still be much faster than LL, I think.

You can turn on wisi-debug, load build/run-wisi-test.el, open test/wisi/*.input, and run the parser via run-test-here; that will show the parse states, including spawning and terminating parsers for conflicts.

Hmm. I'll have to provide the *-wy.el files if you want to do that without running Opentoken first. I guess I should include those in the zip file.

> I had a good look at semantic-bovinate-stream and profiled it with various tests (which is difficult because it's recursive), but I cannot see any possibility for major speedups, except going parallel (by using several Emacs processes) or rewriting it as a primitive in C (which the Emacs maintainers probably wouldn't like).

I don't think the Emacs maintainers object to using C for proven speed gain, as long as it's significant. Of course, I'd rather write it in Ada, and they probably would object to that (sigh).

> This is dealt with on the lexing level, which "mostly works". It's still a major pain, though. A better solution would be to have a C preprocessor implemented in Emacs Lisp, which runs before the lexer.

Yes, that does seem the most robust approach. There is a small preprocessor for Ada (not in the language standard), and I have to do something with that.

> This does not sound terribly hard (the C preprocessor is pretty dumb, after all), but the main problem is that Semantic would see different code than the user sees in his buffer, and you have to manage a mapping between the two.

Yes. C macros can insert whole lines; there is a C directive that says what the original file and line is for #include, but I'm not sure that works for other macros. The Ada preprocessor preserves line numbers, and only substitutes text, so that should be easier to manage.

--
-- Stephe
From: David E. <de...@ra...> - 2013-04-05 22:15:03
Stephen Leake writes:
> Sorry, that should be org.opentoken.stephe-4.0w-2013-03-27.tar.bz2 (note the 4.0w). Page fixed now.

Thanks. I have a hard time building that, though. My Debian stable box is apparently too old to build this, since its gprbuild does not know '--target'. On my Arch Linux box, there's no gprbuild package, so I tried building from source, but I get

  gpr_version.ads:39:15: (style) space required

with gcc 4.8.

> I'm not. The wisi generalized LALR parser (not the parser generator) is a rewrite of the wisent LALR parser.

Ah, OK. I misunderstood your first post.

> It should still be much faster than LL, I think.

I think so, too.

> Hmm. I'll have to provide the *-wy.el files if you want to do that without running Opentoken first. I guess I should include those in the zip file.

Since I want to port the C++ grammar, I'm afraid I'll somehow have to get Opentoken running. :-)

> I don't think the Emacs maintainers object to using C for proven speed gain, as long as it's significant.

Maybe. Still, writing that as an Emacs primitive will take me quite some time, and I'm not even sure it will result in a major speedup, since I still have to deal with Lisp structures; I don't think there's that much difference between using 'car' in Emacs Lisp and XCAR in C. LL parsing is just painfully inefficient.

> Yes. C macros can insert whole lines; there is a C directive that says what the original file and line is for #include, but I'm not sure that works for other macros.

#include is a special case. We cannot just insert the file; that quickly leads to huge buffers which are impossible to handle as soon as you bring in the C++ STL or even Boost, which can pretty easily pull in a thousand header files. We have to cache the preprocessor stuff from those files and pull it in as needed. This would probably have to be a completely new database just for the preprocessor. I guess it would all be pretty messy.

-David
From: Stephen L. <ste...@st...> - 2013-04-06 11:51:15
David Engster <de...@ra...> writes:
> Thanks. I have a hard time building that, though.

Ah. I've been testing with GNAT 7.1, a supported release. You can install GNAT GPL 2012 from http://libre.adacore.com/.

Hmm. The build/linux_release directory is not set up to compile wisi (I have not been testing on Linux). So you have to use:

  cd build/windows_release
  gprbuild --target=x86-windows -P opentoken.gpr wisi-generate

Note that the executable has a .exe extension even on Linux; I do that to simplify the makefiles.

> Since I want to port the C++ grammar, I'm afraid I'll somehow have to get Opentoken running. :-)

Ok. Let me know if the above doesn't work. I can easily test with GNAT GPL 2012 on Windows, so I'll add that to my snapshot release process. I also have a Debian box, but it's not cooperating at the moment.

--
-- Stephe
From: Stephen L. <ste...@st...> - 2013-04-06 12:03:06
Stephen Leake <ste...@st...> writes:
> Ah. I've been testing with GNAT 7.1, a supported release.

On Windows.

> Hmm. The build/linux_release directory is not set up to compile wisi (I have not been testing on Linux). So you have to use:
>
>   cd build/windows_release
>   gprbuild --target=x86-windows -P opentoken.gpr wisi-generate

That should be x86-linux.

Sigh; apparently it's too early to be answering email today :).

--
-- Stephe
From: Stephen L. <ste...@st...> - 2013-04-06 14:33:19
Stephen Leake <ste...@st...> writes:
>>   cd build/windows_release
>>   gprbuild --target=x86-windows -P opentoken.gpr wisi-generate
>
> That should be x86-linux.

And it won't work anyway; GNAT GPL 2012 doesn't get Ada 2012 syntax quite right. I'll find workarounds, and post a new version.

--
-- Stephe
From: Stephen L. <ste...@st...> - 2013-04-06 18:15:04
Stephen Leake <ste...@st...> writes:
> And it won't work anyway; GNAT GPL 2012 doesn't get Ada 2012 syntax quite right.
>
> I'll find workarounds, and post a new version.

This is now done. New version posted to http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html, with instructions on installing GNAT GPL 2012 on Debian testing.

--
-- Stephe
From: David E. <de...@ra...> - 2013-04-07 12:27:51
Stephen Leake writes:
> This is now done. New version posted to http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html, with instructions on installing GNAT GPL 2012 on Debian testing.

Thanks, I now managed to compile wisi-generate and it seems to work nicely.

-David
From: Stephen L. <ste...@st...> - 2013-04-03 12:50:53
|
David Engster <de...@ra...> writes:

> To followup on that: The grammars for LL (Bovine) and LALR
> (Wisent/Bison) are not much different syntactically. A few months ago I
> tried porting the C++ grammar from Bovine to Wisent, and while I could
> get Bison to accept it, it had a *lot* of conflicts. I'm not very
> familiar with grammars, parsers and all that, but it seems that LL
> parsers are more robust against such conflicts than LALR ones (top-down
> vs bottom-up).

There is now another option. For parsing Ada, I've written a generalized
LALR parser, using wisent as a starting point. It's still beta code; see
http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html, the section
on Ada mode 5.0.

A "generalized" parser handles a conflict by spawning another parser,
and following both paths until one errors out. So far it's working well,
but I haven't put in all of Ada yet, so I'm not sure how fast it will
ultimately be.

--
-- Stephe
|
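[Editor's note: the fork-on-conflict strategy Stephen describes can be pictured with a toy sketch. This is not the OpenToken/wisi code; the lexicon, grammar, and function names below are invented purely for illustration, and "conflict" is modeled as a token with more than one possible interpretation.]

```python
# Toy illustration of a "generalized" parser: at each ambiguity
# (conflict), fork one parser per alternative and follow all of them;
# forks that can no longer match any rule "error out" and are dropped.

LEXICON = {"saw": ["NOUN", "VERB"], "duck": ["NOUN", "VERB"],
           "the": ["DET"]}                    # invented example lexicon
GRAMMAR = [["DET", "NOUN", "VERB"]]           # the only accepted shape

def generalized_parse(words):
    forks = [[]]                              # each fork: tags chosen so far
    for w in words:
        next_forks = []
        for tags in forks:
            for t in LEXICON.get(w, []):      # >1 tag means a conflict
                cand = tags + [t]
                # keep only forks that are still a prefix of some rule
                if any(rule[:len(cand)] == cand for rule in GRAMMAR):
                    next_forks.append(cand)
        forks = next_forks                    # dead forks are discarded
    return [f for f in forks if f in GRAMMAR]

generalized_parse("the duck saw".split())
# -> [['DET', 'NOUN', 'VERB']]  ("duck" forked NOUN/VERB; only NOUN survived)
```

A real generalized LALR parser forks the LR state stack rather than a tag list, but the control flow — spawn on conflict, kill forks on error — is the same idea.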
From: Eric M. L. <er...@si...> - 2013-04-04 01:11:05
|
On 04/03/2013 07:50 AM, Stephen Leake wrote:

> David Engster <de...@ra...> writes:
>
>> To followup on that: The grammars for LL (Bovine) and LALR
>> (Wisent/Bison) are not much different syntactically. A few months ago I
>> tried porting the C++ grammar from Bovine to Wisent, and while I could
>> get Bison to accept it, it had a *lot* of conflicts. I'm not very
>> familiar with grammars, parsers and all that, but it seems that LL
>> parsers are more robust against such conflicts than LALR ones (top-down
>> vs bottom-up).
>
> There is now another option. For parsing Ada, I've written a generalized
> LALR parser, using wisent as a starting point. It's still beta code; see
> http://stephe-leake.org/emacs/ada-mode/emacs-ada-mode.html, the section
> on Ada mode 5.0

Hi Stephen,

This sounds pretty interesting. The documentation in wisi.el is quite
detailed. I'd often thought it would be possible to create an
indentation engine with the grammars in CEDET, so it is cool to see
that it is, in practice, working.

Unlike tagging grammars, you need to parse the whole file, so it seems
like merging indentation code and tagging code is probably not a good
idea. The wisi overlay on wisent seems like a nice solution.

I'll be curious to see if you need any wisent patches to make your
solution work. If there is some support you need (such as integrating
patches assigned to GNU), let me know.

If it is well generalized, I could imagine creating such an indentation
engine for grammar files (.wy) to see how it works. Grammar files
aren't very complicated, so it would be serious overkill, to be sure.

Eric
|
From: Stephen L. <ste...@st...> - 2013-04-04 22:31:39
|
"Eric M. Ludlam" <er...@si...> writes:

> On 04/04/2013 04:06 AM, Stephen Leake wrote:
>
>> I don't fully understand what is meant by "tagging parser".
>
> This just means that the parser only has rules for parsing the outside
> of a tag. For example:
>
> int my_c_fcn (int a) {
>
>   Misc code here;
>
> }
>
> Never sees the 'Misc code' because the lexer merges everything between
> { and } into one lexical token.

Ah, that makes sense.

> On the flip side, the indentation engine could depend on a tagging
> parser to find boundaries between independent indentable parts,
> perhaps simplifying its job.

Might be useful for C or C++; Ada is easy enough to parse.

> Since there is no tagging parser for ada (unless you wrote one)

No.

> it may be hard to experiment. ;) Of course, your wisi indentation
> engine could provide a tag list to semantic after it is done
> analyzing.

Yes; I just need to understand the structure of a "tag list", and add
the appropriate actions in wisi.el and *-grammar.wy.

But if "tag" here means "navigation aid", as the various Emacs "tag"
commands do, then that's not needed with Ada; the GNAT compiler outputs
comprehensive cross-reference information, and it's much nicer to
navigate using that.

gcc and g++ also output cross-reference info for C and C++; does CEDET
have a utility to take advantage of that? I gather GNU Global does not,
which seems odd.

--
-- Stephe
|
From: Eric M. L. <er...@si...> - 2013-04-07 16:49:45
|
On 04/04/2013 06:31 PM, Stephen Leake wrote:

>> it may be hard to experiment. ;) Of course, your wisi indentation
>> engine could provide a tag list to semantic after it is done
>> analyzing.
>
> Yes; I just need to understand the structure of a "tag list", and add
> the appropriate actions in wisi.el and *-grammar.wy
>
> But if "tag" here means "navigation aid", as the various Emacs "tag"
> commands do, then that's not needed with Ada; the GNAT compiler outputs
> comprehensive cross-reference information, and it's much nicer to
> navigate using that.

Hi Stephen,

I think TAG means a navigation aid to most people, but for CEDET, tags
serve a range of purposes: navigation, completion (dumb and smart),
hints in the minibuffer, breadcrumbs, decoration, and metadata that can
be converted through srecode into code in other supported languages,
documentation, comments, or UML diagrams.

> gcc for C and g++ also output cross-reference info; does CEDET have a
> utility to take advantage of that? I gather GNU Global does not, which
> seems odd.

One of the main drivers for me in developing the tags as defined by
Semantic / CEDET is to create a data structure that is complete enough
for a wide range of purposes, and universal enough for a wide range of
languages. Tool authors who build tools on Semantic tags automatically
support all languages supported by CEDET. In reverse, anyone who
creates tags for their language automatically gets support for all the
tools built on the CEDET API.

So, even though Ada might provide nice cross-ref tables, having
something that converts them into Semantic tags will magically get you
support for all the existing CEDET-based tools. If you write Ada-based
tools on the CEDET APIs, then other languages with features similar to
Ada's might be able to use your tools.

In addition, since Ada has cross-referencing tools, you could probably
also support the symref API through them; fewer tools use that feature,
but it could save you some time rewriting what is already there.

Eric
|
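[Editor's note: the design point Eric makes — normalize each language's cross-reference output into one uniform tag shape so every tool written against that shape works for every language — can be sketched with a tiny hypothetical example. The record layout, file names, and field names below are invented; real Semantic tags are Emacs Lisp structures, not Python dicts.]

```python
# Hypothetical sketch: convert tool-specific cross-reference records
# into one uniform "tag" shape. Input format is invented and only
# stands in for whatever a compiler's cross-referencer emits.

def to_tag(record):
    """Normalize a (name, kind, file, line) record into a uniform tag."""
    name, kind, filename, line = record
    return {"name": name, "class": kind, "file": filename, "line": line}

def find_tags(tags, name):
    """A tool written against the uniform shape works unchanged for
    every language whose cross-referencer was normalized this way."""
    return [t for t in tags if t["name"] == name]

xref = [("Put_Line", "procedure", "my_pkg.ads", 12),   # invented data
        ("Count",    "variable",  "my_pkg.ads", 20)]
tags = [to_tag(r) for r in xref]
```

The payoff is the one Eric describes: the converter is written once per language, while navigation, completion, and decoration tools are written once against the tag shape.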