From: Julian S. <js...@ac...> - 2005-10-08 21:20:53
> > Handle the out-of-range shift cases for slw/srw in a different way
> > which creates less IR and fewer insns at the back end. Worth about 2%
> > running bzip2 -d with --tool=none.
>
> how did you find out about optimizing this?
The new JIT does continuous low-overhead profiling of the bbs being
executed, on all architectures. I simply ran
    valgrind --tool=none --profile-flags=10001000 bzip2 -tvv bigfile.bz2
and then read the immensely detailed result. The flag bits (10001000)
mean the same as for --trace-flags. With these bits set, the output shows
the initial code and the IR after instrumentation and optimisation, for
the most popular 100 translations.
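For anyone unfamiliar with the out-of-range cases in the quoted commit message: slw and srw take their shift amount from the low 6 bits of the register operand, and amounts of 32 to 63 produce zero. The sketch below is a minimal Python model of that behaviour. It only illustrates the instruction semantics and is not the IR the front end generates. The function names are made up for the example.

```python
MASK32 = 0xFFFFFFFF  # results are modelled as 32-bit values


def slw(rs, rb):
    """Illustrative model of PowerPC slw (shift left word)."""
    amount = rb & 0x3F           # only the low 6 bits of rB are used
    if amount >= 32:             # out-of-range case: result is zero
        return 0
    return (rs << amount) & MASK32


def srw(rs, rb):
    """Illustrative model of PowerPC srw (shift right word)."""
    amount = rb & 0x3F
    if amount >= 32:
        return 0
    return (rs & MASK32) >> amount


if __name__ == "__main__":
    print(hex(slw(0x80000001, 1)))   # in-range shift
    print(slw(1, 32))                # out-of-range shift
    print(hex(srw(0x80000000, 31)))  # in-range shift
```

A naive translation has to test for the out-of-range case explicitly, as the `if` does here, which is the kind of extra work the change reduces.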
To profile V more generally you can now do self-hosting and use cachegrind
(or calltree presumably). We had some fun with that a couple of weekends
ago -- I managed to run Qt designer running on valgrind --tool=none running
on valgrind --tool=cachegrind.
Before you ask ..
(1) Check out 2 trees, "inner" and "outer". "inner" runs the app
    directly and is what you will be profiling. "outer" does the
    profiling.
(2) Configure inner with --enable-inner and build/install as
    usual.
(3) Configure outer normally and build/install as usual.
(4) Choose a very simple program (date) and try
      outer/.../bin/valgrind --weird-hacks=enable-outer \
        --tool=cachegrind -v inner/.../bin/valgrind --tool=none -v prog
It's fragile, confusing and slow, but it does work well enough for
you to get some useful performance data.
> Hmmm... this leads to a further question: Could a persistent translation
> cache speed up Valgrind?
Very likely. Nobody has ever tried it afaik though.
J