From: Malita, F. <Florin.Malita@Glenayre.com> - 2005-04-14 13:53:09
|
I'm using oprofile 0.8.2 on a P4 2.4GHz SMP box running RH AS 2.1 (kernel 2.4.9-e40). I had to make a couple of minor changes for it to build against that kernel but it seems to work OK except that it completely misses the processes using most of the CPU (which, ironically, are the reason for using oprofile in the first place :) Here's a top snapshot: 9:12am up 13:51, 1 user, load average: 18.01, 17.36, 15.90 1007 processes: 984 sleeping, 23 running, 0 zombie, 0 stopped CPU0 states: 44.1% user, 4.3% system, 15.0% nice, 50.5% idle CPU1 states: 42.4% user, 9.0% system, 17.4% nice, 48.1% idle Mem: 3858868K av, 1501260K used, 2357608K free, 15704K shrd, 195792K buff Swap: 2048184K av, 0K used, 2048184K free 243848K cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 2394 root 15 0 35716 34M 19452 S 10.0 0.9 10:47 lsp2_server 2409 root 15 0 35716 34M 19452 S 8.4 0.9 17:26 lsp2_server 2398 root 15 0 35716 34M 19452 S 6.8 0.9 16:06 lsp2_server 2397 root 16 0 35716 34M 19452 S 3.7 0.9 15:34 lsp2_server 2382 root 15 0 35716 34M 19452 S 3.5 0.9 13:37 lsp2_server 2372 root 15 0 35716 34M 19452 S 3.3 0.9 7:14 lsp2_server 2405 root 15 0 35716 34M 19452 S 2.3 0.9 16:09 lsp2_server 2389 root 15 0 35716 34M 19452 S 2.1 0.9 10:37 lsp2_server 2402 root 15 0 35716 34M 19452 S 1.7 0.9 20:21 lsp2_server 3845 root 16 0 1756 1752 1220 S 1.7 0.0 13:46 basebrd 24936 root 26 10 2000 2000 748 R N 1.7 0.0 0:00 top 2276 root 15 0 90216 88M 15044 R 0.9 2.3 2:25 mrsvr ... The lsp2_server is the multithreaded ACE/TAO based app I'm trying to profile. It is consistently using most of the CPU but to my surprise there's nothing about it nor about the ACE/TAO libraries in the oprofile capture... [root@vav3-as-005 samples]# opreport --exclude-dependent CPU: P4 / Xeon, speed 2392.33 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000 GLOBAL_POWER_E...| samples| %| ------------------ 931246 95.5890 vmlinux-2.4.9-e.40smp 22726 2.3327 imb 7198 0.7388 e1000.o 4520 0.4640 bash 2566 0.2634 jbd.o 2338 0.2400 ians.o 997 0.1023 ext3.o 864 0.0887 insmod 382 0.0392 usb-uhci.o 273 0.0280 killall5 223 0.0229 opreport 128 0.0131 less 123 0.0126 aic79xx.o 107 0.0110 scsi_mod.o 97 0.0100 raid1.o 84 0.0086 lspci 51 0.0052 top 49 0.0050 oprofile.o 49 0.0050 mysqlshow 40 0.0041 grep 34 0.0035 gawk 31 0.0032 lspdiag 21 0.0022 ssh 14 0.0014 sd_mod.o 8 8.2e-04 dmesg 8 8.2e-04 netstat Nothing there and nothing in the more detailed reports, not even a single event in lsp2_server or ACE/TAO. I find that weird, the only explanation I can think of is that the cycles are not spent in the app or libs but in some lower level (libc/kernel). That is extremely hard to believe though, I'm pretty sure there's lots of cycles wasted by the app itself and even more by the ACE/TAO libs. Maybe somebody can shed some light, I must be missing something... Another issue I have is the oprofile overhead - I expected that to be below 10% CPU but oprofiled is consistently using more than 50%: 9:31am up 14:10, 1 user, load average: 12.79, 16.06, 16.22 873 processes: 869 sleeping, 4 running, 0 zombie, 0 stopped CPU0 states: 42.0% user, 54.0% system, 2.1% nice, 2.1% idle CPU1 states: 82.1% user, 14.0% system, 11.0% nice, 3.0% idle Mem: 3858868K av, 1360412K used, 2498456K free, 16188K shrd, 198684K buff Swap: 2048184K av, 0K used, 2048184K free 250344K cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND 5483 root 25 0 2592 2496 552 R 65.9 0.0 0:05 oprofiled 2396 root 15 0 35828 34M 19452 S 18.1 0.9 15:20 lsp2_server 2392 root 15 0 35828 34M 19452 S 16.9 0.9 13:32 lsp2_server 2391 root 15 0 35828 34M 19452 S 10.8 0.9 19:20 lsp2_server 2399 root 15 0 35828 34M 19452 S 8.4 0.9 15:57 lsp2_server Drilling down on the vmlinux profile, it shows that by far most of the cycles are wasted in vsnprintf() and its helper function number(): [root@vav3-as-005 samples]# opreport --exclude-dependent -- demangle=smart --symbols /boot/vmlinux-2.4.9-e.40smp | head CPU: P4 / Xeon, speed 2392.33 MHz (estimated) Counted GLOBAL_POWER_EVENTS events (time during which processor is not stopped) with a unit mask of 0x01 (mandatory) count 100000 samples % symbol name 458608 49.2467 number 282615 30.3480 vsnprintf 39442 4.2354 __rdtsc_delay 17874 1.9194 get_ksyms_list 15565 1.6714 system_call 7320 0.7860 schedule 6202 0.6660 sys_select Since oprofile seems to have a high overhead here, I'm wondering if the kernel bottleneck (vsnprintf/number) is related to the oprofile module operation. IOW, is oprofile interfering too much with the system for the results to be accurate? I would appreciate any thoughts/ideas. Please CC me, I'm not subscribed. Thanks, Florin |
From: John L. <le...@mo...> - 2005-04-14 14:45:42
|
On Thu, Apr 14, 2005 at 09:52:33AM -0400, Malita, Florin wrote: > I'm using oprofile 0.8.2 on a P4 2.4GHz SMP box running RH AS 2.1 > (kernel 2.4.9-e40). I had to make a couple of minor changes for it to > build against that kernel but it seems to work OK except that it > completely misses the processes using most of the CPU (which, > ironically, are the reason for using oprofile in the first place :) Hmmm. I'm not sure we can help much until you're using a standard Linus kernel with oprofile. > The lsp2_server is the multithreaded ACE/TAO based app I'm trying to > profile. It is consistently using most of the CPU but to my surprise > there's nothing about it nor about the ACE/TAO libraries in the oprofile > capture... I've no idea what the problem could be here. > 931246 95.5890 vmlinux-2.4.9-e.40smp That seems grossly over the top. > Drilling down on the vmlinux profile, it shows that by far most of the > cycles are wasted in vsnprintf() and its helper function number(): This is usually a result of top if I remember correctly. I'm afraid you'll have to check oprofiled.log, use --verbose, maybe add debugging checks to oprofile. Can't think what the problem might be at all right now. regards john |
From: William C. <wc...@re...> - 2005-04-14 15:43:51
|
Malita, Florin wrote: > I'm using oprofile 0.8.2 on a P4 2.4GHz SMP box running RH AS 2.1 > (kernel 2.4.9-e40). I had to make a couple of minor changes for it to > build against that kernel but it seems to work OK except that it > completely misses the processes using most of the CPU (which, > ironically, are the reason for using oprofile in the first place :) OProfile assigns samples to object files based on the addresses for the samples. Thus, it is possible that much of the time is being spent in the kernel on the behalf of the process. OProfile would record those sample for the kernel, top would record the time for the process. That could explain some of the difference. OProfile does a similar thing for shared libraries. There are a couple opcontrol options that might give you a better picture of what is going on: --separate=library # associates samples for shared library with app --separate=kernel # associate samples for kernel with app > The lsp2_server is the multithreaded ACE/TAO based app I'm trying to > profile. It is consistently using most of the CPU but to my surprise > there's nothing about it nor about the ACE/TAO libraries in the oprofile > capture... > > [root@vav3-as-005 samples]# opreport --exclude-dependent > CPU: P4 / Xeon, speed 2392.33 MHz (estimated) > Counted GLOBAL_POWER_EVENTS events (time during which processor is not > stopped) with a unit mask of 0x01 (mandatory) count 100000 > GLOBAL_POWER_E...| > samples| %| > ------------------ > 931246 95.5890 vmlinux-2.4.9-e.40smp > 22726 2.3327 imb > 7198 0.7388 e1000.o > 4520 0.4640 bash > 2566 0.2634 jbd.o > 2338 0.2400 ians.o > 997 0.1023 ext3.o > 864 0.0887 insmod > 382 0.0392 usb-uhci.o > 273 0.0280 killall5 > 223 0.0229 opreport > 128 0.0131 less > 123 0.0126 aic79xx.o > 107 0.0110 scsi_mod.o > 97 0.0100 raid1.o > 84 0.0086 lspci > 51 0.0052 top > 49 0.0050 oprofile.o > 49 0.0050 mysqlshow > 40 0.0041 grep > 34 0.0035 gawk > 31 0.0032 lspdiag > 21 0.0022 ssh > 14 0.0014 sd_mod.o > 8 8.2e-04 dmesg > 8 8.2e-04 netstat > > Nothing there and nothing in the more detailed reports, not even a > single event in lsp2_server or ACE/TAO. I find that weird, the only > explanation I can think of is that the cycles are not spent in the app > or libs but in some lower level (libc/kernel). That is extremely hard to > believe though, I'm pretty sure there's lots of cycles wasted by the app > itself and even more by the ACE/TAO libs. > > Maybe somebody can shed some light, I must be missing something... > > Another issue I have is the oprofile overhead - I expected that to be > below 10% CPU but oprofiled is consistently using more than 50%: This overhead is a result of the default event. OProfile sets the sampling interval to 100,000 clock cycles per sample, this would be 24,000 samples per second per processor on your machine. You could set the sampling interval to be larger to reduce the sample rate. The following option on the opcontrol setup should reduce the sampling by an order of magnitude: --event=GLOBAL_POWER_EVENTS:1000000:1:1:1 > > 9:31am up 14:10, 1 user, load average: 12.79, 16.06, 16.22 > 873 processes: 869 sleeping, 4 running, 0 zombie, 0 stopped > CPU0 states: 42.0% user, 54.0% system, 2.1% nice, 2.1% idle > CPU1 states: 82.1% user, 14.0% system, 11.0% nice, 3.0% idle > Mem: 3858868K av, 1360412K used, 2498456K free, 16188K shrd, 198684K > buff > Swap: 2048184K av, 0K used, 2048184K free 250344K > cached > > PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME COMMAND > 5483 root 25 0 2592 2496 552 R 65.9 0.0 0:05 oprofiled > 2396 root 15 0 35828 34M 19452 S 18.1 0.9 15:20 lsp2_server > 2392 root 15 0 35828 34M 19452 S 16.9 0.9 13:32 lsp2_server > 2391 root 15 0 35828 34M 19452 S 10.8 0.9 19:20 lsp2_server > 2399 root 15 0 35828 34M 19452 S 8.4 0.9 15:57 lsp2_server > > Drilling down on the vmlinux profile, it shows that by far most of the > cycles are wasted in vsnprintf() and its helper function number(): > > [root@vav3-as-005 samples]# opreport --exclude-dependent -- > demangle=smart --symbols /boot/vmlinux-2.4.9-e.40smp | head > CPU: P4 / Xeon, speed 2392.33 MHz (estimated) > Counted GLOBAL_POWER_EVENTS events (time during which processor is not > stopped) with a unit mask of 0x01 (mandatory) count 100000 > samples % symbol name > 458608 49.2467 number > 282615 30.3480 vsnprintf > 39442 4.2354 __rdtsc_delay > 17874 1.9194 get_ksyms_list > 15565 1.6714 system_call > 7320 0.7860 schedule > 6202 0.6660 sys_select > > Since oprofile seems to have a high overhead here, I'm wondering if the > kernel bottleneck (vsnprintf/number) is related to the oprofile module > operation. IOW, is oprofile interfering too much with the system for the > results to be accurate? -Will |