From: Nicholas N. <nj...@ca...> - 2003-03-21 10:08:24
Hi,
I've been doing some benchmarking and thought I'd report the results, since
AFAIK nobody has done tests this thorough before.
All experiments were performed on an AMD Athlon 1400 MHz with 1GB of RAM
(recent upgrade from 256MB :) running Red Hat Linux 7.1, kernel version
2.4.19. The test programs are a subset of the SPEC2000 suite, basically
the ones I could get to work. All were tested with the ``test''
(smallest) inputs. I didn't test Helgrind because the programs aren't
threaded, so it would have been a bit pointless.
First of all was time performance.
Program   Time  Nulgrind  Memcheck  Addrcheck  Cachegrind
---------------------------------------------------------
bzip2    10.8s       2.5      14.1       10.0        30.9
crafty    3.2s       7.5      55.2       36.5       117.6
gap       0.9s       5.5      35.1       25.9        48.5
gcc       1.5s       8.6      38.9       27.4        70.3
gzip      1.8s       4.4      26.5       20.9        53.3
mcf       0.3s       2.8      12.1        6.3        18.5
parser    3.4s       3.6      21.6       17.0        34.3
twolf     0.2s       5.0      31.1       21.3        50.9
vortex    6.5s       7.7      62.8       52.2        88.1
---------------------------------------------------------
ammp     19.3s       2.2      24.1       21.0        46.5
art      26.1s       6.1      14.1       11.4        20.0
equake    2.1s       6.1      30.5       28.3        50.4
mesa      2.7s       4.9      41.5       32.7        63.4
---------------------------------------------------------
Column 1 gives the benchmark name, column 2 gives the time taken to run
the benchmark normally, and columns 3--6 give the slowdown factor for
each skin. Programs above the middle line are integer programs, those
below it are floating point programs.
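
To make the slowdown columns concrete: a slowdown factor multiplies the
native run time. The short Python sketch below is only an illustration (it
is not part of the test script, and the function name is made up for this
example); it estimates the wall-clock time of a run under a skin.

```python
def estimated_seconds(native_seconds, slowdown):
    """Estimate run time under a skin from the native time and slowdown factor."""
    return native_seconds * slowdown


if __name__ == "__main__":
    # bzip2 from the table: 10.8 s natively, slowdown factor 14.1 under Memcheck.
    print(f"bzip2 under Memcheck: about {estimated_seconds(10.8, 14.1):.0f} s")
```

So bzip2, which takes 10.8s natively, would need roughly two and a half
minutes under Memcheck.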
Second was code expansion.
Program   Code size  Nulgrind  Memcheck  Addrcheck  Cachegrind
--------------------------------------------------------------
bzip2          32KB       5.1      11.6        6.4         9.1
crafty        153KB       4.4      10.6        5.7         8.1
gap           137KB       5.6      12.4        6.9         9.6
gcc           561KB       5.9      12.8        7.3         9.9
gzip           28KB       5.4      12.1        6.8         9.4
mcf            28KB       5.7      13.0        7.2         9.9
parser         94KB       6.0      13.2        7.4        10.1
twolf         110KB       5.2      11.9        6.7         9.3
vortex        230KB       5.8      12.7        7.6        10.1
--------------------------------------------------------------
ammp           64KB       4.6      11.4        6.8         9.5
art            21KB       5.5      12.5        7.0         9.8
equake         42KB       5.0      11.8        6.7         9.2
mesa           65KB       4.7      10.8        6.4         8.9
--------------------------------------------------------------
Format is the same, except column two shows original code size.
I haven't given averages because (a) I'm never sure which average
(arithmetic, geometric, harmonic, pentatonic, teutonic, moronic, whatever)
to use, and (b) averages encourage people to ignore the actual results.
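
For anyone who does want a summary figure, point (a) is easy to see in
practice. The sketch below is an illustration only, using Python's standard
statistics module (geometric_mean needs Python 3.8 or later) on the Memcheck
slowdown column from the timing table. The function name is hypothetical.
The three averages of the same column come out different, with the harmonic
mean lowest and the arithmetic mean highest.

```python
from statistics import geometric_mean, harmonic_mean, mean

# Memcheck slowdown factors, copied from the timing table (bzip2 ... mesa).
MEMCHECK = [14.1, 55.2, 35.1, 38.9, 26.5, 12.1, 21.6,
            31.1, 62.8, 24.1, 14.1, 30.5, 41.5]


def summarize(factors):
    """Return the three classical means of a list of slowdown factors."""
    return {
        "arithmetic": mean(factors),
        "geometric": geometric_mean(factors),
        "harmonic": harmonic_mean(factors),
    }


if __name__ == "__main__":
    for name, value in summarize(MEMCHECK).items():
        print(f"{name:<10} mean: {value:.1f}")
```

Which of the three is the right one depends on what question is being
asked, so the per-program figures remain the thing to look at.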
The most interesting things to me were:
1. Variation in slowdown factor -- the worst slowdown for each skin was
4--6 times worse than the best slowdown.
2. How slow Memcheck is: I was thinking it's 10--20 times slower, but it's
usually worse than that.
3. How slow Addrcheck is: it's certainly not twice as fast as Memcheck.
Maybe 30% faster.
4. crafty has the worst overall time performance, but the smallest code
expansion. It's a chess puzzle solver that uses 64-bit longs heavily,
which maybe Valgrind doesn't handle so well timewise.
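
Observations 1 and 3 can be rechecked directly from the timing table. The
Python sketch below is an illustration only (it is not the test script, and
the helper names are made up for this example). It computes the ratio of
worst to best slowdown for each skin, and how many times faster Addrcheck
is than Memcheck on each program.

```python
# Slowdown factors copied from the timing table.
SKINS = ("Nulgrind", "Memcheck", "Addrcheck", "Cachegrind")
SLOWDOWN = {
    "bzip2":  (2.5, 14.1, 10.0, 30.9),
    "crafty": (7.5, 55.2, 36.5, 117.6),
    "gap":    (5.5, 35.1, 25.9, 48.5),
    "gcc":    (8.6, 38.9, 27.4, 70.3),
    "gzip":   (4.4, 26.5, 20.9, 53.3),
    "mcf":    (2.8, 12.1, 6.3, 18.5),
    "parser": (3.6, 21.6, 17.0, 34.3),
    "twolf":  (5.0, 31.1, 21.3, 50.9),
    "vortex": (7.7, 62.8, 52.2, 88.1),
    "ammp":   (2.2, 24.1, 21.0, 46.5),
    "art":    (6.1, 14.1, 11.4, 20.0),
    "equake": (6.1, 30.5, 28.3, 50.4),
    "mesa":   (4.9, 41.5, 32.7, 63.4),
}


def spread(skin):
    """Ratio of the worst slowdown to the best slowdown for one skin."""
    i = SKINS.index(skin)
    column = [row[i] for row in SLOWDOWN.values()]
    return max(column) / min(column)


def addrcheck_speedup(program):
    """How many times faster Addrcheck is than Memcheck on one program."""
    _, memcheck, addrcheck, _ = SLOWDOWN[program]
    return memcheck / addrcheck


if __name__ == "__main__":
    for skin in SKINS:
        print(f"{skin:<11} worst/best = {spread(skin):.1f}")
    for program in SLOWDOWN:
        print(f"{program:<7} Memcheck/Addrcheck = {addrcheck_speedup(program):.2f}")
```

On these figures the Memcheck/Addrcheck ratio stays below 2 for every
program, which is the point of observation 3.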
If anyone else who has the SPEC2000 benchmark suite is interested in
running tests on their machine, let me know and I'll send you my test
script and instructions on how to use it.
N