From: Julian S. <js...@ac...> - 2002-11-19 19:03:51
On Tuesday 19 November 2002 5:20 pm, Jeremy Fitzhardinge wrote:
> On Tue, 2002-11-19 at 08:12, Nicholas Nethercote wrote:
> > There are sure to be other problems, though. And I have no idea how it
> > would affect the MemCheck instrumentation; already the code produced
> > contains redundant "PUTL tX, %ESP" instructions because MemCheck always
> > needs %ESP to be up-to-date.
>
> Perhaps a slightly simpler approach would be to dedicate a register to
> ESP rather than a memory location, so that it is persistently in
> register across basic blocks. With any luck the register allocator
> could then remove redundant moves, and turn (assuming we reserve %edi
> for %ESP):
Yes, this seems a good compromise. We already played with the issue of
5 vs 6 allocatable registers and it makes very little difference at least
to memcheck, so perhaps it would be ok to give one to %ESP.
Nick, one thing you seemed to not comment on is ...
> addl $2, 36(%ebp) (the INCEIP)
i.e., we keep exact track of %eip; does DynamoRIO? What sort of
exception model are they presenting?
> I got basic block chaining working last night. I got about 25%
> improvement (which is nice, but I was hoping for more) in the particular
> benchmark I tried (gcc 3.0.4's cc1 -O2 pass over vg_from_ucode). On the
> whole, the performance was pretty dismal: the native run took about 4.6
> seconds; the non-chained-bb nulgrind took 81.2 seconds, and the
> chained-bb nulgrind took 60 seconds. I haven't looked into it further:
> I was hoping it would be a largely CPU-bound test, but maybe it's
> actually spending all its time in malloc or something.
Strange it's so slow. I think you should try bzip2; that is almost
completely compute bound and does something like 7 malloc calls per
file processed. Here's what I have:
time ./Inst/bin/valgrind --skin=none ~/bzip2-1.0.2/bzip2 -v < ~/wbt00.ps > /dev/null
(valgrind startup msgs deleted)
(stdin): 13.372:1, 0.598 bits/byte, 92.52% saved, 782064 in, 58487 out.
real 0m7.760s
user 0m7.670s
sys 0m0.030s
time ~/bzip2-1.0.2/bzip2 -v < ~/wbt00.ps > /dev/null
(stdin): 13.372:1, 0.598 bits/byte, 92.52% saved, 782064 in, 58487 out.
real 0m0.738s
user 0m0.680s
sys 0m0.050s
So more like an 11-12x slowdown than the 35x you get.
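For reference, the ratios can be worked out directly from the two bzip2
timings above. This is only a minimal sketch in Python; the helper name
slowdown is made up for illustration and is not part of valgrind.

```python
def slowdown(instrumented_s, native_s):
    """Return how many times slower the instrumented run is than the native one."""
    return instrumented_s / native_s

# Timings (seconds) copied from the two bzip2 runs above.
real = slowdown(7.760, 0.738)
user = slowdown(7.670, 0.680)

print(f"real: {real:.1f}x, user: {user:.1f}x")
```

The user-time ratio is the more meaningful one for a compute-bound test,
since it excludes time spent waiting outside the process.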
> My next experiment might be [...]
Let me encourage you to make measurements (direct vs indirect jump counts)
to gain insight into your current hackery, before embarking on more. I for
one would like to be assured with numbers that the Right Thing is happening
and that our assumptions about costs, event frequencies, etc, are justified.
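The kind of number I have in mind could be summarised roughly as below.
This is a hypothetical sketch in Python, not valgrind code: it assumes
each basic-block exit has already been logged as either "direct" or
"indirect" (the labels, the function name jump_profile, and the sample
data are all invented), and that chaining can only help jumps whose
target is known at translation time.

```python
from collections import Counter

def jump_profile(exits):
    """Tally basic-block exits by kind; return (counts, fraction that are direct)."""
    counts = Counter(exits)
    total = sum(counts.values())
    # Guard against an empty log so the ratio is always defined.
    direct_fraction = counts["direct"] / total if total else 0.0
    return counts, direct_fraction

# Made-up sample log, NOT a real measurement.
sample = ["direct"] * 3 + ["indirect"]
counts, frac = jump_profile(sample)

print(counts["direct"], counts["indirect"], frac)
```

If the direct fraction turns out to be low for a given workload, that
would suggest chaining alone cannot buy much there.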
J