From: Arthur Y. <ar...@ma...> - 2011-09-11 16:53:13
Hi, dear experts!

I am using oprofile for cluster-wide profiling of an x86_64 HPC cluster with dual-processor nodes (2 x Xeon E5345 @ 2.33GHz), running RHEL 4.8 with the 2.6.9-55.ELsmp kernel and oprofile 0.8.1-36 from the RHEL distro.

I have implemented a simple integration of oprofile with the Ganglia monitoring system, which lets me watch the value of some metric (BUS_TRAN_MEM in my case), obtained from oprofile, through Ganglia's web frontend on every node. For this purpose I start oprofile at node boot:

    sudo /usr/bin/opcontrol --start --event=BUS_TRAN_MEM:10000 --no-vmlinux

and run opreport periodically (once per minute) for node-wide profiling:

    opreport -l -m all -r -t 1

which lets me compute the metric's change over the past minute and send that information to Ganglia. That works fine, with no visible overhead.

Recently I tried to set up profiling differentiated by the single-node jobs running on the cluster. For this purpose I start oprofile at node boot with per-thread separation:

    sudo /usr/bin/opcontrol --start --event=BUS_TRAN_MEM:10000 --no-vmlinux --separate-thread

and run opreport once after each job finishes:

    opreport -t 1 -r tgid:${JOB_TGID} --merge=tgid

which gives me the job's "usage of the metric" during its execution. That also works quickly and perfectly.

But as a consequence, the node-wide opreport described above (opreport -l -m all -r -t 1) has slowed down significantly. Just after the oprofile daemon starts, opreport finishes in under a second, but after the daemon has been running for a day, it takes close to a minute, which creates noticeable overhead on the cluster nodes.

This is probably related to the growing volume of profiling data caused by --separate-thread profiling. Could somebody give a hint on any workarounds that would help decrease opreport's execution time in my case?

Best regards,
Arthur
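The once-per-minute collection step described above could be sketched as a small shell helper. The opreport output layout (a short header followed by a samples column), the header length, the state-file path, and the function names are all assumptions for illustration, not taken from the thread:

```shell
#!/bin/sh
# Sketch: compute the per-minute change of a sampled event count so it
# can be fed to Ganglia. The 3-line header and column layout assumed
# here are illustrative; adjust to the actual opreport output format.

STATE_FILE=${STATE_FILE:-/tmp/bus_tran_mem.prev}

# Sum the samples column (column 1) of an opreport listing read on stdin.
sum_samples() {
    awk 'NR > 3 { sum += $1 } END { print sum + 0 }'
}

# Print the increase since the previous invocation, persisting the
# current total in a state file between runs.
delta_since_last() {
    cur=$1
    prev=0
    [ -f "$STATE_FILE" ] && prev=$(cat "$STATE_FILE")
    printf '%s\n' "$cur" > "$STATE_FILE"
    echo $((cur - prev))
}
```

A cron job could then tie the pieces together, roughly: `cur=$(opreport -l -m all -r -t 1 | sum_samples)` followed by `gmetric -n BUS_TRAN_MEM -v "$(delta_since_last "$cur")" -t uint32` (the gmetric option names are an assumption here, so check them against your Ganglia version).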
From: Maynard J. <may...@us...> - 2011-09-14 00:56:50
Arthur Yuldashev wrote:
> [quoted message trimmed]
> But could somebody give a hint on any workarounds which help in
> decreasing opreport execution time in my case.

Arthur,

Firstly, I am assuming you never do 'opcontrol --reset'. Is that correct? Is it intentional? Do you really want accumulated profile data? If not, then execute the reset after every opreport, and that will resolve your slowdown issue. But if you *do* want the accumulated data, read on.

When using oprofile without any separation parameters (e.g., --separate=thread) and just one event, the sample data is stored in one file per sampled binary. So if your system generally has the same applications running all the time, the number of sample data files will not increase except when a not-seen-before application is executed. But with --separate=thread, as new processes are started (even if they're running the same app that earlier processes ran), you'll get a new sample file for each process.

I hope that helped.

-Maynard
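Maynard's "reset after every opreport" workaround could be wired up as a cron fragment roughly like the following. The file paths, schedule, and the cron.d form are illustrative assumptions; also note that --reset discards the accumulated samples, so a per-job tgid report run afterwards would see only samples collected since the last reset:

```shell
# /etc/cron.d/oprofile-report -- hypothetical cron fragment: report once
# per minute, then discard the accumulated samples so they never build up.
* * * * * root /usr/bin/opreport -l -m all -r -t 1 >> /var/log/opreport.log 2>&1 && /usr/bin/opcontrol --reset
```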