Thanks for your suggestions. Sorry I couldn't respond sooner. I tried
running single and multiple instances of a single-threaded program, and I
can account for all the cycles. I suspected the problem might have to do
with multi-threading, so I tried a multithreaded test program with 16
threads, but there too I can account for all the cycles. The problem
reproduces only with our real application (a text search engine). Yet I can
account for all the cycles using the same application on Alpha when
profiling with DCPI.
I didn't see any buffer overflows or anything else suspicious. So I don't
know where to go from here. I guess I'll try to make VTune work and then
compare results (will let you know).
From: Philippe Elie [mailto:phil.el@...]
Sent: Saturday, September 28, 2002 9:45 PM
To: Satish Katiyar
Subject: Re: CPU cycle accounting
Satish Katiyar wrote:
> We are not able to account for all the clock cycles using Oprofile. We
> measure CPU_CLK_UNHALTED (count=10000) on a dual CPU server running at 1.2
> GHz each for a 60-second interval. We add all the clock cycles (kernel and
> user processes) and get a number of 2.4 billion cycles / sec (two CPUs)
> when the server is (almost) idle. But if we measure the same event when the
> system is spending about 95% time in a multithreaded application, we can
> account for only 1.8 billion cycles! Does anybody know what might be the
> reason ?
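The arithmetic behind the figures quoted above can be sketched in C as follows. This is only an illustration: the function names are made up for this note, and the inputs (2 CPUs, 1.2 GHz, count=10000, 1.8 billion observed cycles per second) are taken from the message itself.

```c
#include <assert.h>

/* Upper bound on unhalted cycles per second summed over all CPUs. */
double cycle_budget_per_sec(int ncpus, double hz_per_cpu)
{
    assert(ncpus > 0 && hz_per_cpu > 0.0);
    return ncpus * hz_per_cpu;
}

/* Each sample stands for `count` events, so cycles = samples * count. */
double cycles_from_samples(double samples, double count)
{
    assert(count > 0.0);
    return samples * count;
}

/* Fraction of the cycle budget that the profile accounts for. */
double fraction_accounted(double observed_per_sec, double budget_per_sec)
{
    assert(budget_per_sec > 0.0);
    return observed_per_sec / budget_per_sec;
}
```

With the numbers above, cycle_budget_per_sec(2, 1.2e9) gives 2.4e9, and fraction_accounted(1.8e9, 2.4e9) gives 0.75, so roughly a quarter of the expected cycles are missing under load.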
On UP the accuracy is very good. Can you try this silly test
on your idle SMP box?
int main(void) {
    volatile int i = 0;  /* volatile keeps the empty loop from being optimized away */
    for (i = 0 ; i < 600000000; ++i) ;
}
$ time a.out
The check is: number of a.out samples * sample rate (the count value)
== run time in seconds * CPU speed in Hz.
Then try launching two runs of the test case in parallel.
Perhaps it will give some clue.
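A slightly fuller sketch of the same test, in case it helps: it times the busy loop itself and computes how many samples the relation above predicts. The function names are invented for this note, clock() is used so that CPU time rather than wall time is measured, and the CPU speed and count value have to be supplied by whoever runs it.

```c
#include <assert.h>
#include <time.h>

/* Busy loop; volatile keeps the compiler from removing the empty loop. */
void spin(unsigned long iterations)
{
    volatile unsigned long i;
    for (i = 0; i < iterations; ++i)
        ;
}

/* CPU seconds consumed by spin(), measured with the standard clock(). */
double timed_spin(unsigned long iterations)
{
    clock_t start = clock();
    spin(iterations);
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}

/* Samples predicted by: samples * count == seconds * cpu_hz. */
double expected_samples(double seconds, double cpu_hz, double count)
{
    assert(count > 0.0);
    return seconds * cpu_hz / count;
}
```

For example, a run that burns 1.5 s of CPU on a 1.2 GHz processor with count=10000 should show about 180000 samples for a.out. A clearly lower number in the profile would point at lost samples.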
> Thinking we may be losing buffer, I also tried doubling the buffer, hash
> note sizes for OProfile but it did not show any improvement in
> Then I took samples over a 300-second interval, and the number of cycles I
> could account for improved a little bit but not completely. The
> fact is that I can consistently account for all the cycles when the system
> is relatively idle. What does this tell us ?
Nothing for now. Use dmesg to check whether the module reports
any buffer overflow; if dmesg doesn't report any buffer
overflow, no samples were lost in the module.
After a run of your application, once profiling is stopped,
can you paste /var/lib/oprofile/oprofile.log to the mailing list?
It reports some statistics about lost samples etc.
> Could it be related to interrupt delivery on a busy system ? Maybe the
> OProfile daemon fails to receive all the data from kernel when the system
> is loaded ? Any pointers will be helpful.
I've no idea. I'll write a small multithreaded application and
try it on an SMP box.
Is there anything else unusual on your box (such as APM)? What kernel
version and processor do you use?
Is your application doing a lot of context switches?
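One possible way to answer the context-switch question from inside the application is getrusage(), sketched below. The helper name is made up for this note, and it assumes a kernel that actually fills in ru_nvcsw and ru_nivcsw; some older kernels may leave them at zero.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>

/*
 * Store the voluntary and involuntary context-switch counts of the
 * calling process. Returns 0 on success, -1 if getrusage() fails.
 */
int context_switches(long *voluntary, long *involuntary)
{
    struct rusage ru;

    assert(voluntary != NULL && involuntary != NULL);
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    *voluntary = ru.ru_nvcsw;     /* gave up the CPU, e.g. blocked on I/O */
    *involuntary = ru.ru_nivcsw;  /* preempted by the scheduler */
    return 0;
}
```

Calling it at the start and end of a profiling interval and dividing the difference by the interval length gives a rough switches-per-second figure.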