From: Yan <ya...@ya...> - 2014-02-26 15:21:51
For (3), would something like making all statements conditional (as LoadG,
StoreG, and Exit already are) do, or are we talking about something more
complex?

On Wed, Feb 26, 2014 at 7:16 AM, Julian Seward <js...@ac...> wrote:
>
> On 02/26/2014 12:23 PM, Kirill Batuzov wrote:
>
> I tend to agree with Kirill. It would be great to make Valgrind/Memcheck
> faster, and there are certainly ways to do that, but using LLVM is not
> one of them.
>
> > Second, in DBT you translate code in small portions like basic blocks
> > or extended basic blocks. These have a very simple structure: there are
> > no loops, and there is no redundancy from translating a high-level
> > language to a low-level one. There is nothing sophisticated
> > optimizations can do better than very simple ones.
>
> Yes. One of the problems of the "let's use LLVM and it'll all go much
> faster" concept is that it lacks a careful analysis of what makes Valgrind
> (and QEMU, probably) run slowly in the first place.
>
> As Kirill says, the short blocks of code that V generates make it
> impossible for LLVM to do sophisticated loop optimisations etc.
> Given what Valgrind's JIT has to work with -- straight-line pieces
> of code -- it generally does a not-bad job of instruction selection
> and register allocation, and I wouldn't expect that substituting LLVM's
> implementation thereof would make much of a difference.
>
> What would make Valgrind faster is:
>
> (1) Improve the caching of guest registers in host registers across
>     basic block boundaries. Currently all guest registers cached in
>     host registers are flushed back into memory at block boundaries,
>     and no host register holds any live value across the boundary.
>     This is simple but very suboptimal, creating large amounts of
>     memory traffic.
>
> (2) Improve the way the guest program counter is represented.
>     Currently it is updated before every memory access, so that an
>     unwind is possible if one is required.
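[To make (1) and (2) concrete, here is a rough C rendering of the memory
traffic a translated block generates under the current scheme. All names
and the guest-state layout are invented for illustration; this is not real
Valgrind-generated code.]

```c
/* Illustrative sketch only -- invented names, not real Valgrind output.
 * It mimics, in C, the stores a translated block performs today:
 * guest registers are reloaded from the in-memory guest state at block
 * entry, the guest PC is written back before the memory access, and
 * everything is flushed again at the block exit. */
#include <assert.h>
#include <stdint.h>

typedef struct {                /* hypothetical in-memory guest state */
    uint64_t rax, rbx, rip;
} GuestState;

static uint64_t heap_word;      /* stands in for guest memory */

/* Translation of a two-instruction guest block:
 *   0x401000:  add %rbx, %rax
 *   0x401003:  mov %rax, (addr)   -- may fault, so rip must be current */
static void translated_block(GuestState *gs)
{
    uint64_t rax = gs->rax;     /* reload cached registers at entry */
    uint64_t rbx = gs->rbx;

    rax += rbx;

    gs->rip = 0x401003;         /* (2): PC written before every access */
    heap_word = rax;            /* the guest store itself */

    gs->rax = rax;              /* (1): flush all cached registers at  */
    gs->rbx = rbx;              /*      the block boundary             */
    gs->rip = 0x401006;
}
```

[Under (1), the reloads and flushes would disappear whenever the next block
keeps the values in host registers; under (2), the rip stores could be
deferred or reconstructed from unwind metadata instead of being written
eagerly.]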
> But this again causes lots of excess memory traffic. This is closely
>     related to (1).
>
> (3) Add some level of control-flow if-then-else support to the IR, so
>     that the fast-case paths for the Memcheck helper functions
>     (helperc_LOADV64le etc.) can be generated inline.
>
> (4) Redesign Memcheck's shadow memory implementation to use a 1-level
>     map rather than 2 levels as at present. Or something more
>     TLB-like.
>
> I suspect that the combination of (1) and (2) causes processor write
> buffers to fill up and start stalling, although I don't have numbers
> to prove that. What _is_ very obvious from profiling Memcheck with
> Cachegrind is that the generated code contains a much higher proportion
> of memory references than "normal integer code". In particular it
> contains perhaps 4 times as many stores as "normal integer code",
> which can't be a good thing.
>
> (3) is a big exercise -- much work -- but potentially very beneficial.
> (4) is also important, if only because we need a multithreaded
> implementation of Memcheck. (1) and (2) are smaller projects and would
> constitute a refinement of the existing code generation framework.
>
> > In conclusion I second what has already been said: this project sounds
> > like fun to do, but do not expect much practical result from it.
>
> The above projects (1) .. (4) would also be fun :-) and might generate
> more immediate speedups for Valgrind.
>
> J
>
> _______________________________________________
> Valgrind-developers mailing list
> Val...@li...
> https://lists.sourceforge.net/lists/listinfo/valgrind-developers
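[As a footnote to (3) and (4) above: a minimal C sketch, with hypothetical
names and layout rather than Memcheck's real data structures, of the kind
of two-level shadow-map walk that an out-of-line helper such as
helperc_LOADV64le must perform on every load, and of the fast/slow split
that if-then-else support in the IR would let the JIT emit inline.]

```c
/* Illustrative sketch only -- hypothetical names and layout, not
 * Memcheck's actual implementation.  A 32-bit address is split into a
 * primary index (level 1) and a secondary index (level 2). */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SEC_BITS 16
#define SEC_SIZE (1u << SEC_BITS)       /* bytes covered per secondary */

typedef struct { uint8_t vbits[SEC_SIZE]; } SecMap;   /* level 2 */

static SecMap *primary[1u << (32 - SEC_BITS)];        /* level 1 */

static void shadow_set(uint32_t addr, uint8_t v)
{
    SecMap **slot = &primary[addr >> SEC_BITS];
    if (*slot == NULL)
        *slot = calloc(1, sizeof(SecMap));  /* lazily allocate level 2 */
    (*slot)->vbits[addr & (SEC_SIZE - 1)] = v;
}

/* The check done on a guest load.  With if-then-else in the IR, the
 * common "secondary exists" path could be generated inline, and only
 * the NULL case would call out to a helper function. */
static uint8_t shadow_get(uint32_t addr)
{
    SecMap *sm = primary[addr >> SEC_BITS];           /* fast path */
    if (sm == NULL)
        return 0xFF;      /* slow path: "undefined" in this sketch */
    return sm->vbits[addr & (SEC_SIZE - 1)];
}
```

[A 1-level map, as (4) proposes, would drop one indirection per access at
the cost of a larger primary structure; a TLB-like variant would instead
cache recently used secondaries.]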