|
From: Christoph B. <bar...@or...> - 2006-09-26 09:37:00
|
Hi,

when I use callgrind_control -s I often see:

  Number of threads: 0

although there are about 13 threads in the process.

When I use callgrind_control -b I only see:

  Frame: Backtrace for Thread 1
  <backtrace>
  Frame: Backtrace for Thread 4
  <backtrace>

Where are the backtraces for the other threads?

Christoph |
|
From: Howard C. <hy...@sy...> - 2006-09-26 09:57:49
|
Has anyone written a cache simulator for cachegrind that tracks caches in separate CPU cores? The main thing I'm interested in would be to see how often cache line sharing occurs in a multithreaded program.

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/ |
|
From: Josef W. <Jos...@gm...> - 2006-09-26 10:19:59
|
On Tuesday 26 September 2006 11:36, Christoph Bartoschek wrote:
> Hi,
>
> when I use callgrind_control -s I often see:
>
> Number of threads: 0

I suspect the callgrind_control script is the culprit, because the number of threads should actually always be increasing in a callgrind run (there is probably a bug hiding with fork...).

Nevertheless, to bypass this script, can you run the following in the cwd of a callgrind run:

  echo s > callgrind.cmd

Shortly afterwards, a file "callgrind.res" should exist. What is the line starting with "threads:"? Moreover, how many lines begin with "current-tid:"? This should enumerate all existing threads.

Josef |
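The two checks asked for above (the "threads:" line and the number of "current-tid:" lines) can be sketched in Python as follows. This is only a minimal illustration: the helper name `summarize_callgrind_res` and the sample text are hypothetical, and the sketch assumes nothing about "callgrind.res" beyond the two line prefixes named in the message.

```python
def summarize_callgrind_res(text):
    """Return (thread ids from the 'threads:' line, values of all 'current-tid:' lines).

    Assumption: each line of interest is '<key>: <space-separated integers>'.
    Only the two prefixes mentioned in the thread are inspected.
    """
    threads = []
    current_tids = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("threads:"):
            threads = [int(tok) for tok in line[len("threads:"):].split()]
        elif line.startswith("current-tid:"):
            current_tids.append(int(line[len("current-tid:"):].strip()))
    return threads, current_tids


# Hypothetical sample content, not taken from a real callgrind run.
sample = "threads: 1 2\ncurrent-tid: 1\n"
threads, current_tids = summarize_callgrind_res(sample)
print("threads listed:", threads)
print("current-tid lines:", len(current_tids))
```

To apply it to a real run, one would read the file first, for example `summarize_callgrind_res(open("callgrind.res").read())`.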
|
From: Christoph B. <bar...@or...> - 2006-09-26 10:33:07
Attachments:
callgrind.res
|
On Tuesday, 26 September 2006 12:16, Josef Weidendorfer wrote:
> I suspect the script callgrind_control to be the culprit,
> because the number of threads actually always should be increasing
> in a callgrind run (probably there is a bug hiding with fork..).
>
> Neverless, to circumvent this script, can you run the following
> in the cwd of a callgrind run:
>
> echo s > callgrind.cmd
>
> Shortly after, there should exist a file "callgrind.res".
> What is the line starting with "threads:"?
> Moreover, how many lines exist beginning with "current-tid:"?
> This should enumerate all existing threads.

On a simpler program that does not start one additional thread, I see in callgrind.res:

  threads: 0 1 2

but only one current-tid line:

  current-tid: 3

Christoph |
|
From: Josef W. <Jos...@gm...> - 2006-09-26 11:53:30
|
On Tuesday 26 September 2006 11:50, Howard Chu wrote:
> Has anyone written a cache simulator for cachegrind that tracks caches
> in separate CPU cores?

No. It would be nice to have, and it should not be very difficult to add to cachegrind/callgrind.

However, there are two caveats:

- You have to map VG threads to processors. This requires simulating some scheduling strategy inside valgrind. An easy one is round-robin assignment: proc = thread % procnumber. But this usually does not match reality, because a sane scheduler takes the load of each thread into account, with the goal of distributing thread load equally across processors.

- In VG, threads are scheduled in sequential order, which does not match reality on an SMP machine, where threads can run simultaneously. This can have a large influence on the number of coherency misses you are interested in (false cache sharing). Moreover, VG's scheduling interval should be small to get reasonable results, which slows down the simulation even more.

Josef

> The main thing I'm interested in would be to see
> how often cache line sharing occurs in a multithreaded program. |
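The round-robin rule from the first caveat, proc = thread % procnumber, and the reason it can diverge from a load-aware scheduler can be illustrated with a small Python sketch. The function names and the per-thread load figures are made up for illustration; nothing here is Valgrind code.

```python
from collections import defaultdict


def round_robin_map(thread_ids, num_procs):
    """Assign each thread to a simulated processor: proc = thread % num_procs."""
    return {tid: tid % num_procs for tid in thread_ids}


def load_per_proc(mapping, thread_load):
    """Sum the (hypothetical) load of all threads mapped to each processor."""
    load = defaultdict(int)
    for tid, proc in mapping.items():
        load[proc] += thread_load[tid]
    return dict(load)


# Hypothetical load per thread id: two busy threads and two nearly idle ones.
thread_load = {1: 100, 2: 5, 3: 100, 4: 5}
mapping = round_robin_map(thread_load, num_procs=2)
print(mapping)                              # both busy threads land on processor 1
print(load_per_proc(mapping, thread_load))  # processor 1 carries 200, processor 0 only 10
```

With these numbers the modulo rule puts both busy threads on the same simulated processor, whereas a load-balancing scheduler would likely separate them.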
|
From: Howard C. <hy...@sy...> - 2006-11-30 10:33:56
|
Josef Weidendorfer wrote:
> On Tuesday 26 September 2006 11:50, Howard Chu wrote:
>> Has anyone written a cache simulator for cachegrind that tracks caches
>> in separate CPU cores?
>
> No.
> It would be nice to have it, and it should not be very difficult to
> add to cachegrind/callgrind.
>
> However, there are two caveats:
> - You have to map VG threads to processors. This needs the simulation
> of some scheduling strategy inside of valgrind. An easy one is
> roundrobin assignment: proc = thread % procnumber, but this usually does
> not match reality because a sane scheduler takes the load of a thread
> into account with the goal of equally distributing threads loads to
> processors.
> - In VG, threads are scheduled in a sequentially order, which does not
> match any reality on a SMP machine where threads can run simultaneously.
> This can have large influences on the number of coherency misses you are
> interested in (false cache sharing). Moreover, VGs scheduling
> interval should be small to get reasonable results, which slowdowns the
> simulation even more.

This seems to be the greater problem, since VG's scheduler only runs one thread at a time. Right, using a smaller scheduling interval would probably help. We'd want to make sure that each thread executes for exactly the same number of cycles, to have any hope of simulating simultaneous execution.

>> The main thing I'm interested in would be to see
>> how often cache line sharing occurs in a multithreaded program.

--
  -- Howard Chu
  Chief Architect, Symas Corp.  http://www.symas.com
  Director, Highland Sun        http://highlandsun.com/hyc
  OpenLDAP Core Team            http://www.openldap.org/project/ |
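How much the scheduling interval matters can be seen in a toy model. The Python sketch below is not how cachegrind or callgrind work; it assumes one simulated core per thread, a hypothetical 64-byte line size, a fixed quantum measured in memory accesses rather than cycles, and a simplified write-invalidate rule in which a write evicts the line from every other core.

```python
LINE_SIZE = 64  # assumed cache line size in bytes


def count_invalidations(traces, quantum):
    """Interleave per-thread access traces and count write invalidations.

    traces[i] is a list of (address, is_write) pairs for thread i, which is
    pinned to simulated core i. Threads run round-robin, `quantum` accesses
    at a time. A write counts as one invalidation if any other core
    currently holds the line.
    """
    if quantum < 1:
        raise ValueError("quantum must be at least 1")
    holders = {}                    # line number -> set of cores holding it
    positions = [0] * len(traces)   # next unprocessed access per thread
    invalidations = 0
    while any(pos < len(t) for pos, t in zip(positions, traces)):
        for core, trace in enumerate(traces):
            start = positions[core]
            for addr, is_write in trace[start:start + quantum]:
                line = addr // LINE_SIZE
                sharers = holders.setdefault(line, set())
                if is_write:
                    if sharers - {core}:
                        invalidations += 1
                    holders[line] = {core}
                else:
                    sharers.add(core)
            positions[core] = min(start + quantum, len(trace))
    return invalidations


# Two threads write to different addresses that fall into the same line.
t0 = [(0, True)] * 4
t1 = [(8, True)] * 4
print(count_invalidations([t0, t1], quantum=4))  # each thread runs to completion in turn
print(count_invalidations([t0, t1], quantum=1))  # fine-grained interleaving
```

In this model the coarse interval reports a single invalidation while the fine-grained one reports seven for the same traces, which is the effect discussed above: long sequential time slices hide most of the line sharing that simultaneous execution would produce.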
|
From: Josef W. <Jos...@gm...> - 2006-09-26 13:41:16
|
On Tuesday 26 September 2006 12:33, Christoph Bartoschek wrote:
> > Moreover, how many lines exist beginning with "current-tid:"?
> > This should enumerate all existing threads.

Sorry, wrong explanation. "current-tid" specifies the currently running thread, so one such entry is enough :-)

For backtraces of the threads, the line

  frames-T: <backtrace length for thread T>

and the corresponding function/calls lines are required.

> On a simpler programm that does not starts one additional thread I see in
> callgrind.res:
>
> threads: 0 1 2

This shows 3 threads running. The strange thing here is thread "0". That should never happen, because according to pub_tool_threadstate.h

  #define VG_INVALID_THREADID ((ThreadId)(0))

thread 0 is not valid, and the code for this line starts at 1.

> but only one current-tid:
> current-tid: 3

But this thread 3 is not among the ones given above. That is strange.

Hmm... time for the next hack ;-) If you have such a "callgrind.cmd" file, you can force callgrind_control to interpret it with the following command (without callgrind running; "callgrind.cmd" will be deleted afterwards!):

  callgrind_control -w . -b & (sleep 1; rm callgrind.cmd)

Anyway, the file you attached looks sane to me. If I run the above on your attached file, it gives the stacktrace for thread 1 and 3. Thread 2 has actually has to frames, so it probably finished already? Perhaps it would be good to output something like:

  Frame: Backtrace for Thread 2
  (empty)

Josef

> > Christoph > |
|
From: Josef W. <Jos...@gm...> - 2006-09-26 14:05:44
|
On Tuesday 26 September 2006 15:41, Josef Weidendorfer wrote:
> > threads: 0 1 2
>
> This shows 3 threads running.
> The strange thing here is thread "0" - that should never
> happen, as according to pub_tool_threadstate.h
>
> #define VG_INVALID_THREADID ((ThreadId)(0))
>
> thread 0 is not valid, and the code for this line starts
> at 1.

Correction: ... the code in callgrind to output the "threads:" line always starts with 1.

> Anyway, the file you attached looks sane for me.
> If I run the above on your attached file, it gives the
> stacktrace for thread 1 and 3. Thread 2 has actually
> has to frames,

Correction again: ... has zero frames on its shadow call stack,

BTW: There is a bug with parsing the content of the "threads:" line. I will prepare a patch.

Josef |
|
From: Christoph B. <bar...@or...> - 2006-09-26 14:24:17
|
On Tuesday, 26 September 2006 15:41, Josef Weidendorfer wrote:
>
> Anyway, the file you attached looks sane for me.
> If I run the above on your attached file, it gives the
> stacktrace for thread 1 and 3. Thread 2 has actually
> has to frames, so it probably finished already?
> Perhaps it would be good to output something like:
>
> Frame: Backtrace for Thread 2
> (empty)

If I attach with gdb at the same position, I see the three threads I expect:

- Thread 1 is the same as Thread 1 in the output of callgrind_control.
- Thread 2 is the same as Thread 3 in the output of callgrind_control.
- Thread 3 is the last thread, and its backtrace looks something like this:

  pthread_cond_wait
  dcmRT_lock_DCM_LongLock
  start_thread
  clone

This thread is just waiting on a condition variable. It would also be a bug if this thread were not waiting here.

Christoph |