From: Julian S. <js...@ac...> - 2008-03-12 11:59:42
Nuno

Sorry to be slow replying.  Slowness is because I don't have any good
alternative suggestions.

> > As the Valgrind paper says, these are fairly heavyweight optimisations
> > for a binary translation system.  You could try to do some more
> > compiler-style optimisations, but I think the scope for improvement
> > there is not so great.  But I could be wrong -- Julian, what do you
> > think?
>
> Uhm, it seems that most easy optimizations are already implemented, as I
> would expect.  Anyway I'll take a closer look (maybe next week) at what
> is implemented to see if we have any chance to improve anything at all.
> Inter-block optimizations (e.g. inlining, GCSE, ...) would be really
> cool, but I'm not sure how well Vex is suited for that.

I agree.  Earlier in the life of the project a lot of effort was put into
doing good IR-level optimisation, and then before 3.3.0 there was another
round of iropt and code-generator tuning.  So most of the easy, and even
the not-so-easy, stuff is already done.

From a general usability perspective, when a Valgrind tool runs too slowly
to be usable, that is almost always a problem with the tool, not with the
core code-generation mechanisms.  So speeding up the core doesn't improve
usability much.  Hacking on the tools usually makes a much bigger
difference.

It depends what you want to do.  If you want to do some hacking on
classical compiler transformations then perhaps there are not so many
interesting opportunities left in Valgrind.

> > A very interesting project you could try would be to implement
> > chaining for Vex -- the Valgrind paper (above) talks about this.
> > (Section 2.3.6 of http://www.valgrind.org/docs/phd2004.pdf discusses
> > the old implementation of chaining that was in pre-Vex Valgrind -- you
> > can see the actual implementation in Valgrind 2.4.1 at
> > http://www.valgrind.org/downloads/old.html.)
> Sounds interesting, but maybe it requires too much knowledge of Valgrind
> internals.  I'll take a look into this as well.

Another thing you could chase is enhancing the superblock formation.
Currently Vex follows unconditional branches and calls when forming
superblocks, but stops at indirect and conditional branches.  I
experimented with the usual simple heuristic for conditional branches --
assume backwards branches are taken and forwards branches are not taken --
and extended Vex to follow conditional branches on that basis.  It often
made performance worse, though: although the translations might be a bit
faster, they are also a lot bigger (more I1 misses), and the JIT of course
runs more slowly too.  Not worth the hassle, I reckon.

> > this in section 5.4).  A more interesting thing is to speed up any of
> > the real existing tools, especially Memcheck, since that's the most
> > widely used.
>
> Yes, but I'm not sure I can justify such a project about dynamic program
> analysis in a virtual execution class.  The class is targeted at dynamic
> binary translation, binary interpretation, intra-block optimizations,
> virtualization, and so on.

Yes, I see that.  Nevertheless, if it does happen that working on a tool
is OK, it might be worth looking at Callgrind.  It's a great tool (I have
used it quite a lot in '08) but a little slow, and from some initial
profiles it looks like there are some possibilities for speedup in there.
Also, porting the branch-mispredict profiling from Cachegrind into
Callgrind would be a cool thing to do.

J
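(Editorial aside: the "backwards taken, forwards not taken" heuristic
mentioned in the superblock discussion above can be sketched roughly as
follows.  All names here are illustrative, not Vex's actual API.)

```c
#include <stdbool.h>
#include <stdint.h>

/* Classic static branch-prediction heuristic: a conditional branch
   whose target lies below the branch instruction (a backwards branch,
   typically a loop back-edge) is predicted taken; a forwards branch is
   predicted not taken.  Hypothetical helper, for illustration only. */
bool predict_taken(uint64_t branch_addr, uint64_t target_addr)
{
   return target_addr < branch_addr;  /* backwards => predicted taken */
}

/* When forming a superblock, continue tracing along the predicted
   direction: at the branch target if predicted taken, otherwise at the
   fall-through address.  Again, not Vex's real interface. */
uint64_t next_trace_addr(uint64_t branch_addr,
                         uint64_t target_addr,
                         uint64_t fallthrough_addr)
{
   return predict_taken(branch_addr, target_addr)
             ? target_addr
             : fallthrough_addr;
}
```

As the email notes, following predicted directions lengthens traces but
also inflates translation size, which is why it was judged not worth it.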