From: Patrick J. L. <lop...@gm...> - 2014-02-26 16:40:59
On Wed, Feb 26, 2014 at 7:16 AM, Julian Seward <js...@ac...> wrote:
>
> What would make Valgrind faster is
>
> (1) improve the caching of guest registers in host registers across
> basic block boundaries. Currently all guest registers cached in
> host registers are flushed back into memory at block boundaries,
> and no host register holds any live value across the boundary.
> This is simple but very suboptimal, creating large amounts of
> memory traffic.

Sounds more like large amounts of L1 cache traffic.

> I suspect that the combination of (1) and (2) causes processor write
> buffers to fill up and start stalling, although I don't have numbers
> to prove that.

Maybe, but maybe not. (3) and (especially) (4) might well have greater
impact. It is notoriously difficult to guess where a modern CPU is
spending its time without a profiler. Random memory access is of course
a disaster, but that sounds more like (4) than (1) or (2).

It would be very interesting to see a micro-profile of Valgrind.

 - Pat