From: Timur I. <tim...@go...> - 2013-01-07 12:51:16
Hi Julian,

Thanks for your detailed reply!

2013/1/7 Julian Seward <js...@ac...>:
>> This is roughly a 50x slowdown, I'd expect it to be much smaller when
>> doing little to no additional instrumentation.
>
> For no-instrumentation (--tool=none) I'd expect a slowdown in the range
> 3-4. But that's only in the steady state, when the cost of doing the
> JITting is small compared to the cost of running the generated code.
>
> It may be that the case you mentioned is the worst case -- JITting a
> huge amount of code (half of Chromium) and then doing very little work
> before exiting.

Yes, this was my reasoning too. I wonder if it's possible to optimize
this worst case. We observe very slow startup of Chromium tests under
Memcheck, and startup takes ~50% of the run time on the build bots.
The particular DumpRenderTree test I mentioned before actually uses
little of the Chromium code (it is mostly a WebKit target), so it's not
half of Chromium, but it still loads very slowly.

> You can assess that by looking at these lines in the --stats=yes output:
>
>   transtab: new 3,171 (69,078 -> 1,143,658; ratio 165:10) [0 scs]
>
> which shows how much code is translated (== some measure of the JIT
> cost), and
>
>   scheduler: 82,744 event checks.
>
> which shows how many backwards branches are counted by the simulator
> (== some measure of the cost of running the code). What numbers do
> you see?

  transtab: new 173,061 (4,555,266 -> 33,374,561; ratio 73:10) [0 scs]
  scheduler: 79,073,144 event checks.

See the full stats attached.

>> Is there any way to optimize Valgrind for such usage?
>
> Do more useful work per program-start.

Unfortunately, sometimes this is not an option, e.g. in browser security
testing (generate random HTML pages until the browser crashes, then
minimize) we have to start a full browser, give it a small HTML file,
and restart if the browser crashes. Even if the browser survives, it may
be in a broken state, so we restart it anyway.
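As a side note on the transtab lines quoted above: the "ratio NN:10"
field appears to be simply generated-host-code size over guest-code
size, truncated to tenths. That reading is my own inference from the
two data points in this thread, not something taken from Valgrind's
documentation:

```python
# Figures from the two transtab lines quoted in this thread:
# (guest bytes translated, host bytes generated)
for guest_bytes, host_bytes in [(69_078, 1_143_658), (4_555_266, 33_374_561)]:
    # truncating integer division, printed by Valgrind as "NN:10"
    ratio = 10 * host_bytes // guest_bytes
    print(f"{guest_bytes:,} -> {host_bytes:,}; ratio {ratio}:10")
# prints ratio 165:10 and ratio 73:10, matching the stats output above
```

So your run expanded guest code about 7.3x, which is in the same ballpark
as the 16.5x of the smaller example, just over ~66x more translated code.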
You might imagine 90% of our testing time is wasted on startup when
using Valgrind :( And Chromium's multi-process architecture makes it
hard to use fork() tricks to save the startup time...

>> What I want is basically to instrument only a couple of .so modules,
>> leaving everything else unchanged.
>
> Not doable for Memcheck/Helgrind -- they need to track the complete
> memory state from startup.

I totally understand this Memcheck restriction. Interestingly,
Memcheck's startup+execution time is only 2x-3x more than the
startup+execution time of the none tool. I wonder if optimizing the
none-tool worst case could improve the Memcheck startup time by
1.5x-2x?

Let me disagree about Helgrind, though: data race detection works
totally fine when memory accesses are sampled, e.g. see [1] and
ThreadSanitizer. Of course, one still needs to see all the
synchronization in the app, but that is doable with more lightweight
methods (e.g. at compile time or at dynamic linking time). Other tools
may benefit from partial instrumentation too: AddressSanitizer can be
sped up by instrumenting only the "interesting" modules, and if you
had unaddr-only or leak-only Memcheck variants, they would probably
benefit from sampling as well.

[1] Daniel Marino, Madanlal Musuvathi, Satish Narayanasamy. LiteRace:
effective sampling for lightweight data-race detection. Proceedings of
the 2009 ACM SIGPLAN Conference on Programming Language Design and
Implementation (PLDI), June 15-21, 2009, Dublin, Ireland.
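For readers who don't know [1]: the core LiteRace idea is adaptive burst
sampling per code region -- a cold region is analyzed on every execution,
and its sampling rate backs off as the region gets hot, on the theory
that races tend to hide in rarely executed code. Here is a minimal
Python sketch of that back-off scheme; the class name, constants, and
exact decay policy are illustrative, not taken from LiteRace or any
real tool:

```python
class RegionSampler:
    """Adaptive burst sampling: analyze every execution of a cold code
    region, then multiplicatively back off as the region becomes hot."""

    def __init__(self, initial_rate=1.0, floor=0.001, decay=0.9):
        self.rate = initial_rate  # fraction of executions to analyze
        self.floor = floor        # never sample less than 0.1%
        self.decay = decay        # back-off factor applied per sample
        self.credit = 0.0         # accumulated sampling "budget"

    def should_sample(self):
        """Return True if this execution of the region should be analyzed."""
        self.credit += self.rate
        if self.credit >= 1.0:
            self.credit -= 1.0
            # the region is getting hot -- sample it less often next time
            self.rate = max(self.floor, self.rate * self.decay)
            return True
        return False

sampler = RegionSampler()
analyzed = sum(sampler.should_sample() for _ in range(100_000))
# the first (cold) execution is always analyzed; across 100,000
# executions only a small fraction are analyzed overall
```

The same idea transfers to Memcheck-style tools only for properties that
survive missing some accesses (races, leaks), not for definedness
tracking, which is why the "track the complete memory state" restriction
above still holds for Memcheck proper.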