Has anybody already had the idea of building a simple class diagram display for Perl classes with GEF?
We have a little Perl script here that parses class files and generates a Graphviz directed graph.
It would be much cooler to have this integrated into Eclipse with GEF. Then we would have round-trip engineering - OK, just half the round trip.
I don't know about GEF, but something like Java's Type Hierarchy view would definitely be nice to have.
Apropos parsing Perl source: for quite a few weekends now, I have been reworking EPIC to parse code more accurately and faster. I chose ANTLR instead of the hand-crafted regexps that it has relied on so far. However, I am nearing the (late) conclusion that parsing accurately enough, and in real time, to provide fast and correct syntax highlighting and to support features like "Open SUB Declaration" and code completion is close to impossible.
I also briefly looked at the famed PPI module (there is talk that it will be integrated into Komodo?), but it takes 8 seconds to process my large sample file on a fast machine. Even caching the result (which you can IMHO hardly do on a document being edited) only brings it down to 1.5 seconds. So PPI seems unacceptable as a backend for syntax highlighting (with ANTLR I get something like 300 ms, albeit with lower-quality results).
Anyway, PPI might be interesting for less interactive things like refactoring or creating accurate class hierarchies. Maybe you should have a look at it.
I am currently hesitating over whether to check my latest and "greatest" ANTLR-based code into CVS, mostly because I am unsure whether the other EPIC devs could live with it. Not that there is much going on, but one should not take architectural changes like this one too lightly.
On Saturday I had a discussion with some students here who built an entity-relationship editor for Eclipse based on GEF (Graphical Editing Framework).
As I understood it, GEF provides a broadcaster/listener architecture for plugging in a data model of the code graph (which can optionally be built on EMF, the Eclipse Modeling Framework). All changes to the data model are propagated to GEF, i.e. to the diagram.
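For anyone unfamiliar with the pattern, here is a tiny Python sketch of the broadcaster/listener idea: the model owns the data and notifies every registered listener on each change, while the view only reacts. All names (`ClassModel`, `DiagramView`, `add_listener`) are hypothetical illustrations, not GEF's or EMF's actual API; the real Eclipse classes differ.

```python
from typing import Callable, Dict, List

# A listener receives an event name and the affected class name.
Listener = Callable[[str, str], None]


class ClassModel:
    """Hypothetical data model of the code graph (not a GEF/EMF API)."""

    def __init__(self) -> None:
        self.parents: Dict[str, List[str]] = {}
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _fire(self, event: str, name: str) -> None:
        # Broadcast every change to all registered views.
        for listener in self._listeners:
            listener(event, name)

    def add_class(self, name: str, parents: List[str]) -> None:
        self.parents[name] = list(parents)
        self._fire("added", name)

    def remove_class(self, name: str) -> None:
        # Only notify if the class was actually present.
        if self.parents.pop(name, None) is not None:
            self._fire("removed", name)


class DiagramView:
    """Stands in for the diagram: it merely records what it was told."""

    def __init__(self) -> None:
        self.log: List[str] = []

    def on_change(self, event: str, name: str) -> None:
        self.log.append(f"{event}:{name}")


model = ClassModel()
view = DiagramView()
model.add_listener(view.on_change)
model.add_class("My::Dog", ["My::Animal"])
model.remove_class("My::Dog")
model.remove_class("My::Missing")  # unknown class: no event
print(view.log)
```

The point of the design is that the model never knows how it is drawn, so an Outline view and a diagram could both subscribe to the same model.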
You already have the information about classes and methods (alas not member variables, because Perl is too dynamic to predict the members before runtime) somewhere in the model behind your Outline view.
With a grep or your new ANTLR code, it should be easy to collect the "use base" information as well.
With that model we could build a nice graphical class diagram (or maybe a Class Hierarchy view).
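As a rough illustration of what a text-only Class Hierarchy view could compute from such a model, here is a minimal Python sketch. It assumes the model can be reduced to a hypothetical `{class: [direct parents]}` dictionary; it inverts that map and prints each root with its subclasses indented beneath it.

```python
from typing import Dict, List


def subclass_map(parents: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert a child -> parents map into parent -> sorted children."""
    children: Dict[str, List[str]] = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, []).append(child)
    for kids in children.values():
        kids.sort()
    return children


def hierarchy_lines(parents: Dict[str, List[str]]) -> List[str]:
    """Render every root class and its subclasses as an indented tree."""
    children = subclass_map(parents)
    known = set(parents) | set(children)
    # Roots: classes with no recorded parent (including external bases).
    roots = sorted(c for c in known if not parents.get(c))
    lines: List[str] = []

    def walk(name: str, depth: int, path: frozenset) -> None:
        lines.append("  " * depth + name)
        for kid in children.get(name, []):
            if kid not in path:  # guard against (invalid) inheritance cycles
                walk(kid, depth + 1, path | {kid})

    for root in roots:
        walk(root, 0, frozenset({root}))
    return lines


example_parents = {
    "My::Animal": [],
    "My::Dog": ["My::Animal"],
    "My::Cat": ["My::Animal"],
    "My::Puppy": ["My::Dog"],
}
print("\n".join(hierarchy_lines(example_parents)))
```

With multiple inheritance, a class simply shows up under each of its parents, which is roughly what a tree-shaped view can offer for a DAG.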
But the point is: I have absolutely no idea when I would do this... (By the way, it would be nice to write some Java code for a change... I miss it. :-) ) I have never written anything for the Eclipse platform.
As for Open SUB Declaration and refactoring: I agree with you that these will be very, very difficult (or even impossible?) in Perl, because of its typelessness, odd syntax and dynamism. I've read endless discussions about this, but I am not experienced enough to form an informed opinion of my own... :-(
I came across PPI a few days ago, but I really don't know whether it is powerful enough to do a "move method" or "change method signature"...?
I also wrote a GEF-based editor for presenting (Java class and package) dependency diagrams two years ago or so. It was quite fun and easy. The major part of the work was implementing a graph layout algorithm, because there was nothing suitable in GEF back then (I don't know about today).
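To give a feel for the layout problem, here is a minimal Python sketch of only the first step of a layered (Sugiyama-style) drawing: longest-path layering, where each class sits one row below its deepest parent, followed by naive even spacing within each row. It does no crossing minimisation, which is where most of the real work lies; the `{class: [parents]}` input and the spacing parameters `dx`/`dy` are assumptions for illustration.

```python
from typing import Dict, List, Tuple


def assign_layers(parents: Dict[str, List[str]]) -> Dict[str, int]:
    """Longest-path layering: a class sits one layer below its deepest parent."""
    layers: Dict[str, int] = {}

    def layer_of(name: str, visiting: frozenset) -> int:
        if name in layers:
            return layers[name]
        if name in visiting:
            raise ValueError(f"inheritance cycle at {name}")
        ps = parents.get(name, [])  # unknown (external) bases become roots
        depth = 0 if not ps else 1 + max(layer_of(p, visiting | {name}) for p in ps)
        layers[name] = depth
        return depth

    for name in parents:
        layer_of(name, frozenset())
    return layers


def coordinates(
    parents: Dict[str, List[str]], dx: int = 120, dy: int = 80
) -> Dict[str, Tuple[int, int]]:
    """Place each layer on its own row; nodes in a row are spaced evenly."""
    rows: Dict[int, List[str]] = {}
    for name, depth in assign_layers(parents).items():
        rows.setdefault(depth, []).append(name)
    pos: Dict[str, Tuple[int, int]] = {}
    for depth, names in rows.items():
        # Alphabetical order within a row; a real layouter would reorder
        # nodes here to reduce edge crossings.
        for i, name in enumerate(sorted(names)):
            pos[name] = (i * dx, depth * dy)
    return pos


example = {
    "My::Animal": [],
    "My::Dog": ["My::Animal"],
    "My::Cat": ["My::Animal"],
    "My::Puppy": ["My::Dog"],
}
print(coordinates(example))
```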
The Perl model you mentioned only exists in a rudimentary form for the currently edited file. The way it was implemented originally (and can be seen in the current CVS code), the whole source file is simply grepped through during inactivity, and the results of this operation are fed directly into the Outline view. Note that it also spots the "use" directives found in the file. Folding is implemented on top of the "bracket matching" code, which in turn uses information generated by syntax highlighting. The desirable model-view separation is hardly present - it's all view and very little actual "model". But it mostly works and can certainly be admired for that reason.
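For illustration, a grep-style outline pass might look roughly like the following Python sketch. This is not EPIC's actual code (which is Java); the patterns are assumptions that recognise only line-initial `package`, `use` and `sub` declarations and skip POD between a `=directive` line and `=cut`. Like any such heuristic, it is easily fooled by heredocs, strings or unusual formatting.

```python
import re
from typing import List, Tuple

POD_RE = re.compile(r"^=([A-Za-z]\w*)")
PATTERNS = [
    ("package", re.compile(r"^\s*package\s+([A-Za-z_][\w:]*)")),
    ("use", re.compile(r"^\s*use\s+([A-Za-z_][\w:]*)")),
    ("sub", re.compile(r"^\s*sub\s+([A-Za-z_][\w:]*)")),
]


def outline(source: str) -> List[Tuple[int, str, str]]:
    """Return (line number, kind, name) tuples found by line-wise matching."""
    items: List[Tuple[int, str, str]] = []
    in_pod = False
    for lineno, line in enumerate(source.splitlines(), start=1):
        pod = POD_RE.match(line)
        if pod:
            # Any =directive opens (or continues) POD; =cut closes it.
            in_pod = pod.group(1) != "cut"
            continue
        if in_pod:
            continue
        for kind, rx in PATTERNS:
            m = rx.match(line)
            if m:
                items.append((lineno, kind, m.group(1)))
                break
    return items


SAMPLE = """package My::Dog;
use strict;
use base 'My::Animal';

=head1 NAME

sub not_really_a_sub { }

=cut

sub bark { return "Woof" }
"""

for item in outline(SAMPLE):
    print(item)
```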
I am currently attempting to replace the current "smoke & mirrors" architecture with something more coherent, like parsing once and generating a useful model for multiple purposes. In fact, "parse once" is not sufficient; reparsing must happen incrementally and quickly on each change. Some edits cause major performance problems: for example, opening a POD comment at the beginning of a huge file or typing a curly brace would both trigger reparsing the whole file. This stuff is quite challenging to implement acceptably. The benefit would be that syntax highlighting, bracket matching, outline etc. could all be implemented in terms of the single model rather than by depending on each other in mysterious ways.
I started small by trying to fix the Open SUB Declaration feature. However, as I went further I experienced an avalanche effect - one change led to another, and now I am on the verge of replacing the whole cbg.editor plug-in, on which EPIC heavily depends, with something new and tailored to Perl's idiosyncrasies. These are big, non-incremental changes, not discussed with anyone and not risk-free. So far the results show promise, yet I would not hold my breath waiting for a release.
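One common way to keep reparsing incremental is to cache the lexer state at the end of every line and, after an edit, re-lex forward only until a line's end state matches the cached one. The Python sketch below shows that technique with just two hypothetical states, `"code"` and `"pod"`, and single-line replacement (no insertions or deletions). It also shows why the POD case hurts: an unterminated `=pod` at the top changes the state of every following line.

```python
import re
from typing import List

POD_START = re.compile(r"^=[a-zA-Z]")


def lex_line(line: str, state: str) -> str:
    """Return the lexer state at the end of `line` ("code" or "pod")."""
    if POD_START.match(line):
        return "code" if line.startswith("=cut") else "pod"
    return state


class LineStateCache:
    """Remembers each line's end state so that edits re-lex as little as possible."""

    def __init__(self, lines: List[str]) -> None:
        self.lines = list(lines)
        self.end_states: List[str] = []
        state = "code"
        for line in self.lines:
            state = lex_line(line, state)
            self.end_states.append(state)

    def replace_line(self, index: int, text: str) -> int:
        """Replace one line and return how many lines had to be re-lexed."""
        self.lines[index] = text
        state = self.end_states[index - 1] if index > 0 else "code"
        relexed = 0
        for i in range(index, len(self.lines)):
            new_state = lex_line(self.lines[i], state)
            relexed += 1
            converged = new_state == self.end_states[i]
            self.end_states[i] = new_state
            if converged:
                break  # every later line starts in the same state as before
            state = new_state
        return relexed


cache = LineStateCache([f"my $v{i} = {i};" for i in range(1000)])
print(cache.replace_line(500, "my $w = 0;"))  # 1
print(cache.replace_line(0, "=pod"))          # 1000
```

A real Perl lexer needs far richer state (heredocs, nested quotes, regex vs. division), but the convergence check stays the same.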
The class hierarchy/diagram you envision would require collecting information from multiple source files, a whole project and beyond. I think it would be reasonably easy to gather the information in a dedicated pass over the files (contradicting my central model idea), but it would be challenging to achieve any round-trip functionality. Also note that class hierarchies can be expressed by @ISA, not necessarily by use base. Furthermore, the use statements can appear anywhere (for example inside of conditional branches). These things can be just as dynamic as variables in principle. Usually - in most cases? - they resemble static declarations, which makes it worthwhile to consider features such as you described.
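A dedicated collection pass could start with something as crude as the following Python sketch, which splits a file at `package NAME;` statements and picks up parents from `use base`, `use parent` (including `-norequire`), `@ISA = ...` and `push @ISA, ...`. The patterns are assumptions that cover only the common static spellings; conditional `use` statements, block-form `package`, fully qualified `@Foo::ISA`, or code inside strings and POD will be missed or misread, exactly the dynamism mentioned above.

```python
import re
from typing import Dict, List

PACKAGE_RE = re.compile(r"^\s*package\s+([A-Za-z_][\w:]*)\s*;", re.M)
INHERIT_RES = [
    re.compile(r"\buse\s+(?:base|parent)\s+(.+?);", re.S),          # use base/parent ...
    re.compile(r"@ISA\s*=\s*(.+?);", re.S),                          # (our) @ISA = ...
    re.compile(r"\bpush\s*\(?\s*@ISA\s*,\s*(.+?)\)?\s*;", re.S),     # push @ISA, ...
]
NAME_RE = re.compile(r"[A-Za-z_]\w*(?:::\w+)*")


def _names(expr: str) -> List[str]:
    """Extract class names from a list expression such as qw(A B) or ('A')."""
    expr = re.sub(r"-norequire\b", "", expr)
    expr = re.sub(r"\bqw\s*(?=[^\w\s])", "", expr)  # drop the qw operator itself
    return NAME_RE.findall(expr)


def parents_by_package(source: str) -> Dict[str, List[str]]:
    """Map each declared package to the parents found in its section."""
    starts = [(0, "main")] + [(m.start(), m.group(1)) for m in PACKAGE_RE.finditer(source)]
    result: Dict[str, List[str]] = {}
    for i, (start, name) in enumerate(starts):
        end = starts[i + 1][0] if i + 1 < len(starts) else len(source)
        section = source[start:end]
        found: List[str] = []
        for rx in INHERIT_RES:
            for m in rx.finditer(section):
                found.extend(_names(m.group(1)))
        if found or name != "main":
            bucket = result.setdefault(name, [])
            for parent in found:
                if parent not in bucket:
                    bucket.append(parent)
    return result


SAMPLE = """
package My::Animal;
sub new { return bless {}, shift }

package My::Dog;
use parent -norequire, 'My::Animal';

package My::Puppy;
our @ISA = qw(My::Dog Exporter);
"""

print(parents_by_package(SAMPLE))
```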
Apart from the parse-and-collect-information approach, it might be worthwhile to consider gathering information dynamically: just invoke an instance of the Perl interpreter, let it execute "use XYZ" and query whatever you like. EPIC already takes this approach in some places (for example, in code autocompletion). It is quite appealing to leverage the Perl interpreter for introspection rather than reinvent the wheel yourself. The major limiting factor is the performance overhead, but it should not matter much for your feature. I'd say, if you enjoy it, go for it.
As far as refactoring goes, the usual argument I hear is "refactoring originated from Smalltalk, which is also dynamically typed, so it is possible". I know neither Smalltalk nor its refactoring tools; they supposedly consult the user to resolve ambiguities. My current view is that if one wanted to implement something like that for Perl, PPI would be a good start, because it provides information that is as accurate as possible for a single file. Beyond that, much is unclear - PPI does not even pretend to do cross-module analysis or type inferencing.
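A minimal sketch of the dynamic approach, written in Python for illustration: it builds a `perl` command line that loads a module with `-m` (i.e. `use Module ();`, skipping `import`) and prints its direct `@ISA` entries, then runs it via `subprocess`. It assumes a `perl` executable on the PATH; the function names and the `include_dirs` parameter are hypothetical. Keep in mind that loading a module executes its code, so this should only be pointed at trusted sources.

```python
import re
import subprocess
from typing import Iterable, List

MODULE_RE = re.compile(r"[A-Za-z_]\w*(?:::\w+)*")


def isa_command(module: str, include_dirs: Iterable[str] = ()) -> List[str]:
    """Build a perl argv that loads `module` and prints its direct @ISA."""
    if not MODULE_RE.fullmatch(module):
        raise ValueError(f"not a module name: {module!r}")
    includes = [f"-I{d}" for d in include_dirs]
    code = f'print "$_\\n" for @{module}::ISA;'
    # -m loads the module without calling its import method.
    return ["perl", *includes, f"-m{module}", "-e", code]


def direct_parents(module: str, include_dirs: Iterable[str] = ()) -> List[str]:
    """Ask a real Perl interpreter for the direct parents of `module`."""
    result = subprocess.run(
        isa_command(module, include_dirs),
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return [line for line in result.stdout.splitlines() if line]


print(isa_command("My::Dog", ["lib"]))
```

Passing an argument list (no shell) avoids quoting issues, and validating the module name keeps arbitrary Perl code out of the `-e` snippet.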
From my perspective, the fact that no popular refactoring tools exist for Perl (and C++?) after so many years speaks against these languages. But then, one does not always have the freedom to choose.
Jan, I am so happy to see your posting - I admire your approach which is both thoughtful and pragmatic.
If we can add even a few refactoring operations, that is progress - we have one ("extract subroutine") now, and having more will only help. I suggest adding whatever is easy to add - that may attract more users and developers, and thus more overall brainpower.
Eventually, Perl6 might make all this easier, but until then, adding a little at a time is a Good Thing.
(See my article on Perl needing better tools: http://www.perl.com/pub/a/2005/08/25/tools.html)
Yes, the graph layout algorithm might be the biggest problem. I don't know yet whether there is something suitable in GEF. If there isn't, there is no point in writing such a class diagram plugin.
Furthermore, as you say, it would be difficult to integrate a model over all files of a project.
Nevertheless, I have realized that this project is a size too big for me. I'd have to invest so much time that I suppose it would be easier to persuade my company to stop using Perl in favour of Java. ;-) And that's an even better prospect.
Surfing around the Eclipse site, I found this new project: http://www.eclipse.org/proposals/dltk/
They seem to have the same idea, but with a totally different approach. I haven't read through it carefully, but I doubt that they will be able to cope with all the problems of Perl's dynamism and awful syntax that we discussed.
Nevertheless: I will of course continue using e-p-i-c all day because there still is a lot of Perl Code here.
If you're interested, I could post our little Perl script that generates a Graphviz class diagram with inheritance and method names. It's a very simple approach, but very useful.
To summarise the PPI situation...
1. It was never intended for real-time work.
There are so many cascading problems parsing Perl code I took the attitude that it was more important to be right than fast. In fact, there are some problems that are provably impossible and chasing them becomes an exercise in folly.
That said, it has only this week been completed to the point of parsing random line noise, so as far as Make It Work goes, we're done.
Which means Make It Fast is only just getting started. Particularly for large documents (go read Genezzo::Parse::SQL) it is most definitely slow.
2. PPI is usable for background processing
Even the Perl interpreter is useless for real-time work, and so PPI is intended only for use in background processing.
3. PPI does not implement cross-method analysis.
PPI doesn't implement any analysis at all; it is a parser and only a parser. However, it does provide the necessary base for writing this sort of code.
See things like Perl::Metrics, PPIx::Analyze and so on.
As for type inferencing, you should know as well as I do that Perl doesn't support types, and any attempt to infer them in a universal way would be an exercise in folly.
I've been slowly working towards a proof of concept of what a "refactoring editor" might actually look like, and the things that can be done for Perl might not look like what you are used to in Eclipse. But we'll see...
I am very glad to see you weigh in here, and everything you posted makes sense to me.
In January I expect to be working with Jeff Thalhammer <firstname.lastname@example.org>, who wrote "perlcritic", which uses PPI. Maybe we can help make PPI faster, and/or help add more refactoring features to EPIC.
Thanks for your insights. While I agree with the "make it work"/"make it fast" approach in general, I realise that in some situations "make it work fast (for some people)" might be preferable ;-).
My impression is that tools for inconvenient languages like Perl and C++ don't progress because people draw the conclusion "if you know you can't make it work in general, don't bother to try". PPI is a counterexample, yet I think the "folly"-related parts of your comment might reflect some of that attitude.
It is not a folly to try making a real-time Perl parser which works better than present tools - which also means faster - when applied to my and my co-workers' code (a limited subset of Perl indeed). On the other hand, I do not care as much about it being able to parse line noise or a hand-crafted sample which proves that "it cannot work".
Having said that, I would be very interested in how much you can speed up PPI in the next phase of development. Also, thanks for getting PPI out in the first place. Regardless of its suitability for one purpose or another, it undeniably helps us better estimate the challenge. It can also act as an excellent correctness benchmark for other aspiring Perl parsers.
The attitude is born of many MANY wounds and scars, the result of years of failure on the part of many brilliant people.
In fact, until I got my Perl Foundation grant, pretty much the entire Perl community was convinced I was nuts for thinking I had a way around the problems.
I do think your parser will be needed. Any sufficiently useful Perl editor will need three parsers:
1. Perl itself for things like test runs, debugging and other functions that are worth the dangers.
2. PPI or the like (syntactically-thorough) for safety and completeness when doing significant tasks (especially when modifying files).
3. A real-time parser for (at the very least) syntax highlighting and other tasks that don't alter the code.
If you are working on the last of these, good luck, and feel free to ask questions. There are many things you can learn from the way PPI does things.
But you REALLY need to pick your battles when attacking the syntax. Parsing Perl is a fractally complex problem: every time you think you've solved something, you find 3-4 more problems.
Some of the problems with Perl's syntax will kick your ass if you try to beat them comprehensively. Other problems it is certainly possible to make some headway against.
But reading your description brought back the tilting-at-windmills feelings I remember from when I tried to do the same thing 3-4 years ago.
So I guess my advice is "be careful when tilting at windmills".
As for making PPI faster, there are a few different directions to go in (take a look at PPI::XS as a good starting point if you know any XS), and when you get some time to look into it, I'll be happy to advise and assist :)