From: Konstantin S. <kon...@gm...> - 2007-11-15 16:08:43
|
Hi,

I was running helgrind on various medium-sized tests and I am experiencing an issue with helgrind's speed. One of the tests:
 - runs ~3 seconds by itself
 - runs ~50 seconds under memcheck (passes)
 - dies after ~150 seconds, having completed less than 30% of its work.

The test simply does not like to be delayed that much -- it has too strict timeouts. The test has > 50 threads.

I tried profiling the helgrind run (using oprofile) and see the following profile:

samples  %        image name   symbol name
82533    14.9762  helgrind     avl_find_node         <<<< this one seems to have an indirect call of kCmp... Might be worth inlining
73035    13.2528  helgrind     cacheline_wback
66704    12.1039  helgrind     evh__mem_help_read_4  <<<< this one seems to be called indirectly...
55637    10.0958  anon (tgid:19421 range:0x4b2f000-0x5c55000)  (no symbols)  <<<< I suspect that this is the actual user's code, right?
46728     8.4791  helgrind     cacheline_fetch
26357     4.7827  helgrind     msm__handle_read
25291     4.5892  helgrind     shadow_mem_set32

(I tested an unmodified version I checked out yesterday from trunk.)

What is the typical slowdown of helgrind (compared to a native run or compared to memcheck)?
Does it depend on the number of threads/locks?
Do you have any suggestions on further analysis/improvements of helgrind performance?

Thanks,
--kcc
|
From: Julian S. <js...@ac...> - 2007-11-15 18:43:44
|
> What is the typical slowdown of helgrind (compared to native run or
> compared to memcheck)?

30-60x, but it is very workload dependent, much more so than memcheck.

> Does it depend on the number of threads/locks?

It depends on a lot of things. The following are very expensive:
- deleting memory (eg, free, delete[], stack deallocation) that contains a lock
- pthread_join

> Do you have any suggestions on further analysis/improvements of helgrind
> performance?

Run (the problem program) with -v. It prints ~100 lines of stats at the end. Send those.

J
|
From: Julian S. <js...@ac...> - 2007-11-16 09:52:53
|
> With appropriate interceptors added to hg_interceps.c, the picture is
> different.
> The test now fails due to missed timeouts (at least it looks so).
> See attachment log2. We now see much more LSETs.

Try increasing N_WAY_BITS from 16 to 17. That might improve performance
a bit. Run on the fastest machine you have, with the largest L2 cache
you can find (high-end Core 2 machine?)

> How do I make sure that the test fails due to delayed timeouts and not due
> to something else?

I don't know. Make your application have longer timeouts?

> Can I run helgrind so that it does all the intrusion (instrumentation), but
> does *not* do any TSET/LSET/etc bookkeeping?

No. What functionality should and should not be available in this
"reduced functionality" mode?

> Yet another question: can I include helgrind.h into my program as an
> alternative to creating intercepts for my own locking primitives? Do you
> have examples?

No and no.

J
|
From: Konstantin S. <kon...@gm...> - 2007-11-16 13:22:48
|
On Nov 16, 2007 12:52 PM, Julian Seward <js...@ac...> wrote:
>
> > With appropriate interceptors added to hg_interceps.c, the picture is
> > different.
> > The test now fails due to missed timeouts (at least it looks so).
> > See attachment log2. We now see much more LSETs.
>
> Try increasing N_WAY_BITS from 16 to 17. That might improve performance
> a bit. Run on the fastest machine you have, with the largest L2 cache
> you can find (high end Core 2 machine?)

Well, this makes the program fail ~10% faster :)

> > How do I make sure that the test fails due to delayed timeouts and not due
> > to something else?
>
> I don't know. Make your application have longer timeouts?
>
> > Can I run helgrind so that it does all the intrusion (instrumentation), but
> > does *not* do any TSET/LSET/etc bookkeeping?
>
> No. What functionality should and should not be available in this
> "reduced functionality" mode?

I've commented out the body of hg_handle_client_request -- now everything passes.
It's not very useful, though. :) :)
I'll continue digging into this...

> > Yet another question: can I include helgrind.h into my program as an
> > alternative to creating intercepts for my own locking primitives? Do you
> > have examples?
>
> No and no.
>
> J
|
From: Julian S. <js...@ac...> - 2007-11-18 01:43:09
|
On Friday 16 November 2007 14:22, Konstantin Serebryany wrote:
> On Nov 16, 2007 12:52 PM, Julian Seward <js...@ac...> wrote:
> > > With appropriate interceptors added to hg_interceps.c, the picture is
> > > different.
> > > The test now fails due to missed timeouts (at least it looks so).

Try updating to >= r7179. There's some small chance that r7179 improves
the situation.

J
|
From: Konstantin S. <kon...@gm...> - 2007-11-22 15:24:48
|
Nope, it did not help. :(
But thanks anyway.

--kcc

On Nov 18, 2007 4:42 AM, Julian Seward <js...@ac...> wrote:
>
> On Friday 16 November 2007 14:22, Konstantin Serebryany wrote:
> > On Nov 16, 2007 12:52 PM, Julian Seward <js...@ac...> wrote:
> > > > With appropriate interceptors added to hg_interceps.c, the picture is
> > > > different.
> > > > The test now fails due to missed timeouts (at least it looks so).
>
> Try updating to >= r7179. There's some small chance that 7179 improves
> the situation.
>
> J