|
From: Vojtech F. <voj...@se...> - 2010-05-05 08:21:15
|
Hi,
I wanted to measure how much time our application spends in situations like
waiting for a locked mutex, IO, etc. - things that instruction count can hardly
measure. Callgrind has an option --collect-systime that looks to be what I want.
However, I have problems interpreting the results.
First of all the app is multithreaded. But I am interested only in one thread
that does the important work. The results suggested that a lot of time is spent
in pthread_cond_signal in a function responsible for some logging stuff. When I
turned off logging and measured the time (without valgrind) it didn't seem to
help to speed up the app. I might do my time measurement in a wrong way but I
would like know more about collecting systime. The valgrind manual does not say
much about it.
How does it work with threads? When I have 3 threads that all sleep 1 second
should the time collected be 1 or 3? (1. The whole process sleeps 1 second and
this is somehow distributed among the threads or 2. Each thread sleeps 1 second
and the result is a sum) When run with --separate-threads=yes, should each
thread show 1 spent in sleep?
>From what I saw (see below) it looks like case 1. But I don't understand how the
individual threads contributed to the result.
I tried to understand what is going on using a small sample:
#include <pthread.h>
#include <unistd.h>
#include <stdio.h>
void* child_fn ( void* arg )
{
printf("Thread %d started\n", (int)(arg));
sleep(1);
printf("Thread %d finished\n", (int)(arg));
return NULL;
}
int main ( void )
{
const int th_cnt(2);
pthread_t threads[th_cnt];
printf("Main thread started\n");
for (int i1 = 0; i1 < th_cnt; i1++)
pthread_create(&(threads[i1]), NULL, child_fn, (void*)i1);
printf("Main thread started other threads\n");
sleep(4);
printf("Main thread finished sleeping\n");
for (int i1 = 0; i1 < th_cnt; i1++)
pthread_join(threads[i1], NULL);
printf("Main thread joined other threads\n");
return 0;
}
Run in callgrind, with output:
"Main thread started
Main thread started other threads
Thread 0 started
Thread 1 started
Thread 0 finished
Thread 1 finished
Main thread finished sleeping
Main thread joined other threads"
In KCacheGrind the result (for sysTime) is a disconnected graph. 5010 spend in
start_thread (called twice) and no callees from start_thread. Another component
of the graph contains the main function together with sleep, both having cost
1003 (called once). child_fn function (called twice) has weight 0.
With output:
"Main thread started
Thread 0 started
Main thread started other threads
Thread 1 started
Thread 1 finished
Thread 0 finished
Main thread finished sleeping
Main thread joined other threads"
It looks completely different. start_thread now takes 6025, calls child_fn
(1008), that calls sleep (1008). Function main takes only 2.
When I removed "sleep(4);" line from the main thread, one possible output was:
"Main thread started
Thread 0 started
Thread 1 started
Main thread started other threads
Main thread finished sleeping
Thread 0 finished
Thread 1 finished
Main thread joined other threads"
In KCacheGrind:
main (2005) -> pthread_join (1003) -> _dl_sysinfo_int80 (2005)
start_thread (1000)
child_fn(0)
BTW: Maybe the outputs are not so reliable (and relevant) as I assume due to
buffering and synchronization.
Thanks.
|