Re: [Valgrind-developers] vex: r1418 - trunk/priv/guest-ppc32

> > Handle the out-of-range shift cases for slw/srw in a different way
> > which creates less IR and fewer insns at the back end.  Worth about 2%
> > running bzip2 -d with --tool=none.
>
> how did you find out about optimizing this?

The new JIT does continuous low-overhead profiling of the bbs being
executed, on all architectures.  I simply ran 

  valgrind --tool=none --profile-flags=10001000 bzip2 -tvv bigfile.bz2

and then read the immensely detailed result.  The 10001000 flag string
means the same as it does for --trace-flags.  With this setting you get
the initial code and the IR after instrumentation and optimisation, for
the 100 most popular translations.
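
For reference, the flag string is eight characters, each 0 or 1, and
each position switches on output for one translation stage.  Below is a
small sketch that decodes such a string.  The stage names are an
assumption: they are copied from the --help-debug text of later Valgrind
releases and may not match this version exactly.  decode_vex_flags is a
made-up helper name, not part of Valgrind.

```shell
#!/bin/sh
# Decode an 8-character --trace-flags / --profile-flags string.
# ASSUMPTION: stage names follow later Valgrind --help-debug output.
decode_vex_flags() {
    flags=$1
    # Accept exactly eight characters, each 0 or 1.
    case $flags in
        [01][01][01][01][01][01][01][01]) ;;
        *) echo "expected 8 characters, each 0 or 1" >&2; return 1 ;;
    esac
    i=1
    for stage in \
        "conversion into IR" \
        "after initial opt" \
        "after instrumentation" \
        "after second opt" \
        "after tree building" \
        "selecting insns" \
        "after reg-alloc" \
        "final assembly"
    do
        # Pick out the i-th character of the flag string.
        bit=$(printf '%s' "$flags" | cut -c "$i")
        if [ "$bit" = 1 ]; then
            echo "show $stage"
        fi
        i=$((i + 1))
    done
    return 0
}

decode_vex_flags 10001000
```

Under that assumption, 10001000 selects the first and fifth stages,
which fits the description above: the front-end view of the code plus
the IR once instrumentation and optimisation are done.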

To profile Valgrind itself more generally you can now do self-hosting and
use cachegrind (or calltree presumably).  We had some fun with that a
couple of weekends ago -- I managed to run Qt Designer on
valgrind --tool=none, itself running on valgrind --tool=cachegrind.

Before you ask ..

(1) Check out 2 trees, "inner" and "outer".  "inner" runs the app
    directly and is what you will be profiling.  "outer" does the
    profiling.

(2) Configure inner with --enable-inner and build/install as
    usual.

(3) Configure outer normally and build/install as usual.

(4) Choose a very simple program (date) and try

    outer/.../bin/valgrind --weird-hacks=enable-outer   \
       --tool=cachegrind -v inner/.../bin/valgrind --tool=none -v prog

It's fragile, confusing and slow, but it does work well enough for
you to get some useful performance data.
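
Step (4) can be wrapped in a small script so the two trees are harder to
mix up.  This is only a sketch: INNER_PREFIX and OUTER_PREFIX are
hypothetical install prefixes (substitute whatever you passed to
configure), and it assumes each tree installed its binary at
PREFIX/bin/valgrind.  The function only prints the command line, so you
can check it before running it.

```shell
#!/bin/sh
# HYPOTHETICAL install prefixes for the two trees; override as needed.
INNER_PREFIX=${INNER_PREFIX:-$HOME/vg-inner}   # built with --enable-inner
OUTER_PREFIX=${OUTER_PREFIX:-$HOME/vg-outer}   # built normally

# Print the nested command line: the outer valgrind runs cachegrind on
# the inner valgrind, which runs the target program with --tool=none.
nested_valgrind_cmd() {
    prog=$1
    echo "$OUTER_PREFIX/bin/valgrind --weird-hacks=enable-outer" \
         "--tool=cachegrind -v" \
         "$INNER_PREFIX/bin/valgrind --tool=none -v $prog"
}

# Dry run with a very simple program, as suggested in step (4).
nested_valgrind_cmd date
```

Once the printed line looks right, run it with
eval "$(nested_valgrind_cmd date)".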

> Hmmm... this leads to a further question: Could a persistent translation
> cache speed up Valgrind?

Very likely.  Nobody has ever tried it afaik though.

J