[Codenarc-developer] More performance improvements - options?

Hi everyone/Chris,

I have an idea for more performance improvements that I believe will
make another big difference in how CodeNarc performs.

There are many, many rules that look at only MethodCallExpressions.
Each of these rules (about 50 of them) walks the AST individually and
searches out these MethodCallExpressions. Most of the time in these
rules is spent just walking the tree, which is 95% redundant processing
for all the rules. It would be better if the tree were walked once,
all the MethodCallExpressions were collected, and then each of the 50
rules called with just the relevant visitMethodCallExpression without
re-walking the entire tree.
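
To make the idea concrete, here is a rough sketch of "walk once, then
dispatch". It uses a made-up Node class instead of the real Groovy AST,
and SinglePassWalker, collectMethodCalls and runRules are names invented
for illustration only, not existing CodeNarc code:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an AST node; not the real Groovy AST classes.
class Node {
    final String kind;   // e.g. "Block" or "MethodCall"
    final String text;
    final List<Node> children;

    Node(String kind, String text, Node... children) {
        this.kind = kind;
        this.text = text;
        this.children = Arrays.asList(children);
    }
}

class SinglePassWalker {
    int nodesVisited = 0;  // counts tree-walking work, for comparison only

    // Walk the whole tree once and collect every method call node.
    List<Node> collectMethodCalls(Node root) {
        List<Node> calls = new ArrayList<>();
        walk(root, calls);
        return calls;
    }

    private void walk(Node node, List<Node> calls) {
        nodesVisited++;
        if ("MethodCall".equals(node.kind)) {
            calls.add(node);
        }
        for (Node child : node.children) {
            walk(child, calls);
        }
    }

    // Hand each collected call to each rule; no rule re-walks the tree.
    void runRules(Node root, List<Consumer<Node>> rules) {
        List<Node> calls = collectMethodCalls(root);
        for (Consumer<Node> rule : rules) {
            for (Node call : calls) {
                rule.accept(call);
            }
        }
    }
}
```

With this shape the tree-walking cost is paid once per tree, however
many rules are registered; each rule only pays for the calls it is
handed.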

One idea to implement this is to create a CompositeRule that has a
list of other rules. The composite then walks the tree once and calls
each child rule whenever it encounters a MethodCallExpression. The
problem with this is
that a lot of logic works off rule names, and the composite has many
rule names. I don't see a clean way to implement this.
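
Roughly, the composite would look something like the sketch below. The
CallRule interface, CompositeCallRule and the "CompositeMethodCall" name
are all made up for illustration and are much simpler than the real
rule API:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical rule interface; the real Rule API is not reproduced here.
interface CallRule {
    String getName();

    // Returns a violation message, or null if the call is fine.
    String check(String methodName);
}

class CompositeCallRule {
    private final List<CallRule> children = new ArrayList<>();

    void add(CallRule rule) {
        children.add(rule);
    }

    // The composite has one name of its own, but each child has its own too.
    String getName() {
        return "CompositeMethodCall";
    }

    // Called once per method call found during the composite's single walk.
    List<String> visitMethodCall(String methodName) {
        List<String> violations = new ArrayList<>();
        for (CallRule child : children) {
            String message = child.check(methodName);
            if (message != null) {
                violations.add(child.getName() + ": " + message);
            }
        }
        return violations;
    }
}
```

Anything that looks a rule up by name would see only the composite's
name from the outside, even though the violations belong to the
children. That mismatch is the part I can't make clean.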

Another idea is to create a parent class that all the MethodCall
visitors could extend. Then that parent class could, in a static
block, cache all of the MethodCallExpressions and avoid walking the
tree redundantly. But this is not clean because each rule is run many
times, once for each SourceCode object (a source file). It's complex
to implement, has to worry about multithreading, and uses static
fields on instance objects. I just don't like it.

My last idea, which I think is the best solution, is to add a new
method to the SourceCode interface. SourceCode#getAst exists today. I
propose we add SourceCode#getMethodCallExpressions. Then for each
SourceCode object we would only walk the tree once and collect useful
information about the tree, and rules could get the information when
they need it. This is pretty clean to implement. The cache has the
same lifecycle as the source file, which makes sense. It would still
need to be thread-safe, but that doesn't sound hard.
I would write this method on SourceCode:
  Map<ClassNode, List<MethodCallExpression>> getMethodCallExpressions()
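
A minimal sketch of how the caching could work, assuming the map values
are lists (a class normally contains more than one call). ClassNode and
MethodCallExpression here are tiny stand-ins for the Groovy AST types,
and CachingSourceCode, walkTree and walkCount are hypothetical names,
not existing CodeNarc code:

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the Groovy AST types of the same names.
class ClassNode {
    final String name;
    ClassNode(String name) { this.name = name; }
}

class MethodCallExpression {
    final String methodName;
    MethodCallExpression(String methodName) { this.methodName = methodName; }
}

class CachingSourceCode {
    // Stand-in for the parsed AST: the calls found in each class.
    private final Map<ClassNode, List<MethodCallExpression>> ast;
    // volatile so a fully built map is safely published to other threads.
    private volatile Map<ClassNode, List<MethodCallExpression>> methodCalls;
    int walkCount = 0;  // only written while holding the lock

    CachingSourceCode(Map<ClassNode, List<MethodCallExpression>> ast) {
        this.ast = ast;
    }

    // Lazily builds the map on first use; later callers reuse it.
    Map<ClassNode, List<MethodCallExpression>> getMethodCallExpressions() {
        Map<ClassNode, List<MethodCallExpression>> result = methodCalls;
        if (result == null) {
            synchronized (this) {
                result = methodCalls;
                if (result == null) {
                    result = walkTree();
                    methodCalls = result;
                }
            }
        }
        return result;
    }

    // Placeholder for the real single walk over the AST.
    private Map<ClassNode, List<MethodCallExpression>> walkTree() {
        walkCount++;
        return Collections.unmodifiableMap(new LinkedHashMap<>(ast));
    }
}
```

The cache lives on the SourceCode instance, so it goes away with the
source file, and the double-checked lock means the walk happens at most
once even if several rules ask for it from different threads.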

There are other performance enhancements like this as well. I think
many rules also look at PropertyExpressions. As we find more things to
optimize, we could just add new methods onto the interface.

What do you think of these approaches?

-- 
Hamlet D'Arcy
ham...@gm...