From: Haoyu B. <div...@gm...> - 2008-12-14 09:23:23
On Fri, Dec 12, 2008 at 6:45 AM, William S Fulton <ws...@fu...> wrote:
> dav...@da... wrote:
>>> It is up to Haoyu what he does, but putting together a prototype and
>>> proof of concept of using the gcc parser sounds like a challenging, but
>>> achievable final year project given the skills I've already seen in
>>> Haoyu. Just replacing DOH with what I outlined would be useful but not
>>> too hard. It would not be as beneficial for SWIG as a whole as we don't
>>> really have anyone at the moment who, very importantly, has demonstrated
>>> good SWIG experience, and is able to commit a chunk of time towards
>>> change.
>>>
>>
>> You guys will have to forgive me for chiming in from afar, but I saw a message titled "DOH Removal" and immediately thought "are you guys nuts?!?!" :-). In all
>> seriousness though, even though DOH could be improved in many ways or even replaced with the STL, it seems like there are much more interesting problems that
>> could be worked on instead of that.
>>
>> Regarding a switch in parsers though, let me add my 0.02. I think in the big picture, switching to gcc or gcc-xml could be a difficult prospect because there are some
>> aspects of Swig that don't easily mesh. The really big issue is that Swig has always been able to work with incomplete type information. For example, if Swig sees some
>> declaration that uses a type of "Foo *", Swig doesn't need to see the definition of "Foo" to generate code that works. As a result, it has always been possible to use Swig
>> on partial header files. gcc will definitely not allow you to be that forgiving. The other problem, that is related to this, is that type names have a lot more significance
>> in Swig than they do in a normal C compiler. For example, Swig can be programmed to do different things with a typename such as "size_t" than it does with "unsigned
>> int", even if those types are actually the same in the C compiler through the use of a typedef.
>> This is kind of a subtle point, but if typenames are lost (i.e., all of the
>> typedef names) in gcc or gcc-xml, then you're going to cause some pretty serious problems with certain Swig interface files. The mixing of Swig directives and C code
>> together is also an issue, but not as major in my opinion.
>>
>> That said, rewriting the Swig parser is still something that's overdue at this point. However, I'm wondering if another alternative (instead of hacking gcc), might be to
>> simply use a more powerful parser generator. A major problem with the Swig parser is that it's trying to parse C++ with a LALR(1) parser. Although not entirely
>> impossible, it's never been easy--if anything it's rather hacky. Frankly, a lot of problems in the Swig parser have been due to this. So, you might get better mileage
>> by trying to use something more powerful and modern (look at the ANTLR project for example).
>>
>> Also, if the whole goal of this parser rewrite is simply to support nested classes, don't bother. Supporting nested classes in the current Swig parser would actually be
>> easy. The reason nested classes were never really supported is that none of the backend language modules were programmed in a way that could deal with them in any
>> kind of sane manner. As a result, that feature just kept getting deferred and deferred and deferred and put in the "eventually, we'll do something with it" category.
>>
>> For what it's worth, I've been thinking about rewriting the Swig parser in Python as a side-project. Although I've been in hiding from Swig lately, I've actively been
>> maintaining the PLY (http://www.dabeaz.com/ply) project on the side. It's always been in the back of my mind to write a Python-based Swig parser just for kicks to
>> see what could be done with it. For example, maybe one could build a Swig-ctypes bridge with it or something. I digress.
>>
>> William, regarding your last comment about getting experienced developers, I have been thinking about how I might get back into Swig development for the last few
>> years. However, things have been a little crazy here (Mental note: Getting married, moving, having a baby, and starting a new business all in the same year doesn't
>> leave a whole lot of spare time. Yow!). I also have to admit that every time I look at Swig lately, it makes my head explode---especially the UTL. I'm actually glad to
>> hear that some plans are underway to try and clean that up.
>>
>> Anyways, getting back to the gcc issue. I would certainly encourage people to play with that, but my gut feeling tells me that using gcc is going to be significantly
>> harder in practice than in theory. Of course, I could be wrong :-).
>>
>
> Dave, I'm glad you've 'returned from the dead' and spotted the post as
> your input is probably one of the most valuable here. I really don't
> know the internals of gcc, but I'd imagine it has numerous parse stages
> and modifying it to get the behaviour we want is not going to be
> completely out of the question. Arguably, it could be a bigger job than
> hoped, but it might simply require adding in support for the SWIG %
> directives and then passing the parse tree on to SWIG. It surely isn't
> going to throw the type information away wrt typedefs, so hopefully we
> can still use this. However much I might speculate and guess, I'd like
> to see it properly analysed and rejected or accepted as a solution.
>
> I think the gcc parser would offer much more than just nested classes.
> There are many problems in expressions, other subtle parse problems and
> the ongoing expansion of the c++ language that needs dealing with. I
> doubt c++ will stand still in the near future and having an actively
> maintained parser is necessary to keep SWIG up to date. All we really
> need from gcc is the parse tree.
> If the parser is rewritten from scratch
> it will need some really dedicated person to do that and I can't see
> anyone doing that as it requires quite a bit ongoing time and
> particularly rare skills; it would be great if Dave found the time to
> knock out a new parser in Python or modify the current one, but
> realistically I don't see it happening as it is a whole lot more than a
> small part time project and we don't have sponsors to fund it. I'd
> rather sit on top of the shoulders of giants (gcc) and make the
> necessary tweaks for our requirements, which I'd hope were not too onerous.
>
> As for Haoyu, you're probably thinking what have I unleashed in the
> simple query posted?!? All I can say is please feel free to do whatever
> takes your fancy, your contribution will be welcome.
>
> William

Yes, so many interesting ideas. :)

Over the past few days I did a survey on parsing C++. Someone else did
a similar survey [1] and concluded that no open source C++ frontend
exists for general use. I also looked into the source code of GCC 4.3
and gcc-xml. I think that if we interface with gcc, we take on a new
burden of maintaining that interface; gcc-xml has 4000 LOC for doing
so. Since SWIG needs to deal with its special %... directives, things
become more complicated. As we just need to parse C++ headers, not the
full C++ language, maybe it would be easier?

I noticed that Bison now supports GLR [2]. As mentioned in [1], there
are successful GLR-based C++ parsers, such as [3]. So I think we can
upgrade our parser to GLR; then the various hacks in our parser that
are used to avoid shift/reduce and reduce/reduce conflicts could be
removed, and the parser would become cleaner and easier to maintain.

William's idea for DOH removal is similar to what I had in mind, and
David's plugin idea is also interesting. So, putting them together,
what I want to do is:

First, reimplement the DOH interface on top of the STL. This could be
done with an object inheritance hierarchy, or with templates and
function overloading.
I'll see which one is more suitable.

Second, wrap SWIG with SWIG. Since DOH would then be built on the STL,
this could be done easily with SWIG's existing STL typemaps. Then, for
example, you could extend SWIG's Python module in Python, or grab
SWIG's parse tree and do whatever you want with it in your favorite
language.

Finally, hacking on SWIG becomes fun, and we would be able to put more
effort into the UTL and the parsing frontend.

References:
[1] Parsing C++, http://www.nobugs.org/developer/parsingcpp/
[2] Bison's GLR support documentation, http://www.gnu.org/software/bison/manual/html_mono/bison.html#GLR-Parsers
[3] Elsa: The Elkhound-based C/C++ Parser, http://www.cs.berkeley.edu/~smcpeak/elkhound/sources/elsa/

--
Haoyu Bai
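
To illustrate the first step, the "templates and function overloading"
option might look roughly like the C++ sketch below. This is not
existing SWIG code: the namespace doh_sketch, the type aliases, and the
signatures of Len, Append, Setattr and Getattr are all assumptions made
for illustration, with names only loosely modelled on DOH-style calls.

```cpp
// Hypothetical sketch only: DOH-style free functions expressed as
// overloads over STL containers. None of these signatures are SWIG's.
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

namespace doh_sketch {

using String = std::string;
using List = std::vector<String>;
using Hash = std::map<String, String>;

// Len: one name, resolved at compile time by the container type.
inline std::size_t Len(const String &s) { return s.size(); }
inline std::size_t Len(const List &l) { return l.size(); }
inline std::size_t Len(const Hash &h) { return h.size(); }

// Append: concatenates onto a string, pushes an element onto a list.
inline void Append(String &s, const String &tail) { s += tail; }
inline void Append(List &l, const String &item) { l.push_back(item); }

// Setattr/Getattr: keyed access on a hash.
// Getattr returns nullptr when the key is absent.
inline void Setattr(Hash &h, const String &key, const String &value) {
  h[key] = value;
}
inline const String *Getattr(const Hash &h, const String &key) {
  auto it = h.find(key);
  return it == h.end() ? nullptr : &it->second;
}

// Usage: the same call names work across the three container types.
inline void demo() {
  String type = "int";
  Append(type, " *");
  assert(Len(type) == 5);

  List params;
  Append(params, type);
  assert(Len(params) == 1);

  Hash node;
  Setattr(node, "name", "Foo");
  assert(Getattr(node, "name") != nullptr);
  assert(Getattr(node, "type") == nullptr);
}

} // namespace doh_sketch
```

With overloads, dispatch happens at compile time and the containers
stay plain STL types, which is what would let SWIG's existing STL
typemaps apply in the second step. An inheritance hierarchy would
instead keep DOH's runtime polymorphism, at the cost of wrapping every
container in a base class.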