From: Jeremy F. <je...@go...> - 2002-11-20 09:57:11
On Wed, 2002-11-20 at 01:08, Julian Seward wrote:
> I'd still like to know why bzip2 runs at approx a 10x slowdown. If the
> translations are chained, there are no callouts to helpers with
> --skin=none, so the only thing happening is jumping between translations.
> In which case I'd expect a 10 x code size increase, but it's more like
> 4:1. So what's going on, I wonder?
Well, any indirect jumps (which includes calls into dynamic libraries
and returns) are still going through the dispatch loop. I think I've
worked out how to extend the chaining mechanism to handle indirect jumps
(though I think returns are still likely to be tricky, unless you've got
functions with a limited number of call-sites).
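
As a rough illustration of why indirect jumps cost more than chained direct ones: the target is only known at run time, so something has to map the original address to its translation on every jump. The sketch below is a toy model only. The names (TTEntry, tt_insert, tt_lookup), the table size and the hash are all hypothetical and are not Valgrind's actual dispatcher or translation table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy translation table, not Valgrind's real data structure. */
#define TT_SIZE 1024u            /* power of two, so masking wraps the index */

typedef struct {
    uint32_t guest_addr;         /* address of the original basic block */
    void    *host_code;          /* start of its translation; NULL = empty slot */
} TTEntry;

static TTEntry tt[TT_SIZE];

/* Multiplicative hash; the top 10 bits index the 1024-entry table. */
static unsigned tt_hash(uint32_t guest_addr) {
    return (uint32_t)(guest_addr * 2654435761u) >> 22;
}

/* Record a translation, using linear probing on collisions.
   Returns 0 on success, -1 if the table is full. */
int tt_insert(uint32_t guest_addr, void *host_code) {
    assert(host_code != NULL);   /* NULL is reserved to mark empty slots */
    unsigned start = tt_hash(guest_addr);
    for (unsigned n = 0; n < TT_SIZE; n++) {
        TTEntry *e = &tt[(start + n) & (TT_SIZE - 1)];
        if (e->host_code == NULL || e->guest_addr == guest_addr) {
            e->guest_addr = guest_addr;
            e->host_code  = host_code;
            return 0;
        }
    }
    return -1;
}

/* What an indirect jump has to do at run time: find the translation for
   a target address. Returns NULL if the target is not translated yet. */
void *tt_lookup(uint32_t guest_addr) {
    unsigned start = tt_hash(guest_addr);
    for (unsigned n = 0; n < TT_SIZE; n++) {
        TTEntry *e = &tt[(start + n) & (TT_SIZE - 1)];
        if (e->host_code == NULL) return NULL;
        if (e->guest_addr == guest_addr) return e->host_code;
    }
    return NULL;
}
```

A chained direct jump skips all of this because its target is fixed and can be patched into the translation once; an indirect jump or return pays for a lookup like tt_lookup every time unless the result is cached somehow.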
> It might be helpful to do this benchmarking with a simple microbenchmark
> with only a couple of hot bbs in the inner loop and no I/O, to factor out
> those effects.
I'm measuring user time, so it should factor out time waiting for I/O
and time spent in the kernel.
> Easy to test the net effect; disable %EIP generation altogether
> (one-liner in vg_to_ucode.c). I bet it gives another 10% or so.
> I would try it now but I have to rush off and duke it out with ARM
> code all day :-)
Good guess - disabling it gains 20% on bzip2 and 10% on cc1 (and that's
still keeping EIP up to date once per basic block). I think you're right
about computing the EIP as we need it rather than keeping it up to date.
I guess we could tack a table onto the generated code itself or something.
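
One way such a table could look, purely as a sketch: a sorted array of (host offset, original EIP) pairs stored next to each translation, searched only when the EIP is actually needed. The struct layout and the names EipMapEntry and eip_for_host_offset are hypothetical, not anything that exists in the tree.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-translation table: one entry per original instruction,
   sorted by ascending host_off. */
typedef struct {
    uint32_t host_off;    /* offset of the first generated byte for the insn */
    uint32_t guest_eip;   /* address of the original instruction */
} EipMapEntry;

/* Recover the original EIP for a position inside the generated code:
   binary search for the last entry whose host_off is <= off. */
uint32_t eip_for_host_offset(const EipMapEntry *map, size_t n, uint32_t off) {
    assert(n > 0 && map[0].host_off <= off);
    size_t lo = 0, hi = n;            /* answer always lies in [lo, hi) */
    while (hi - lo > 1) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].host_off <= off)
            lo = mid;                 /* mid starts at or before off */
        else
            hi = mid;                 /* mid starts after off */
    }
    return map[lo].guest_eip;
}
```

The lookup costs O(log n) but only runs on the rare paths that need an exact EIP, so the generated code would not have to update EIP per instruction.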
baseline: bzip2 < TAGS > /dev/null
    time=0.49s
valgrind --skin=none --chain-bb=no --extended-bb=no --enable-inceip=yes bzip2 < TAGS > /dev/null
    time=6.09s ratio:12.4
valgrind --skin=none --chain-bb=yes --extended-bb=no --enable-inceip=yes bzip2 < TAGS > /dev/null
    time=4.85s ratio:9.8
valgrind --skin=none --chain-bb=no --extended-bb=no --enable-inceip=no bzip2 < TAGS > /dev/null
    time=5.29s ratio:10.7
valgrind --skin=none --chain-bb=yes --extended-bb=no --enable-inceip=no bzip2 < TAGS > /dev/null
    time=4.04s ratio:8.2

baseline: /usr/lib/gcc-lib/i386-redhat-linux/3.0.4/cc1 -fpreprocessed coregrind/x.i -quiet -dumpbase x.i -O2 -version -o /dev/null
    time=4.47s
valgrind --skin=none --chain-bb=no --extended-bb=no --enable-inceip=yes /usr/lib/gcc-lib/i386-redhat-linux/3.0.4/cc1 -fpreprocessed coregrind/x.i -quiet -dumpbase x.i -O2 -version -o /dev/null
    time=80.11s ratio:17.9
valgrind --skin=none --chain-bb=yes --extended-bb=no --enable-inceip=yes /usr/lib/gcc-lib/i386-redhat-linux/3.0.4/cc1 -fpreprocessed coregrind/x.i -quiet -dumpbase x.i -O2 -version -o /dev/null
    time=58.96s ratio:13.1
valgrind --skin=none --chain-bb=no --extended-bb=no --enable-inceip=no /usr/lib/gcc-lib/i386-redhat-linux/3.0.4/cc1 -fpreprocessed coregrind/x.i -quiet -dumpbase x.i -O2 -version -o /dev/null
    time=74.03s ratio:16.5
valgrind --skin=none --chain-bb=yes --extended-bb=no --enable-inceip=no /usr/lib/gcc-lib/i386-redhat-linux/3.0.4/cc1 -fpreprocessed coregrind/x.i -quiet -dumpbase x.i -O2 -version -o /dev/null
    time=53.51s ratio:11.9
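
For anyone checking the arithmetic, the 20% and 10% figures follow from the chained runs with and without --enable-inceip. A small sketch of the two calculations (the function names slowdown and speedup_pct are just for illustration):

```c
#include <assert.h>

/* Slowdown of a valgrind run relative to the native baseline. */
double slowdown(double valgrind_s, double native_s) {
    assert(native_s > 0.0);
    return valgrind_s / native_s;
}

/* Speedup, in percent, of the "after" run over the "before" run. */
double speedup_pct(double before_s, double after_s) {
    assert(after_s > 0.0);
    return (before_s / after_s - 1.0) * 100.0;
}
```

With the times above, speedup_pct(4.85, 4.04) comes to roughly 20 for bzip2 and speedup_pct(58.96, 53.51) to roughly 10 for cc1. slowdown(6.09, 0.49) gives about 12.4, matching the first ratio; some of the other printed ratios (for example 4.85/0.49, shown as 9.8) look truncated rather than rounded.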
Bedtime.
J