From: Harry M. <hj...@ta...> - 2005-11-23 20:22:24
|
In examining the output for several related programs, I was struck by how much time as being wasted in 'poll_idle'. Because these apps typically consumed very large (1GB) files, I originally thought that this was all due to disk i/o, but now I'm not so sure. In trying to grasp how different programs behaved in terms of I/O and how that related to 'poll_idle', I ran a few other apps. Below are the oprofile outputs for 3 of them, one is for a compilation of a fairly large lib, 2 others are bioinformatics apps with which I'm familiar. Especially notable is the oprofile output for clustalw. It ingested and egested relatively tiny files, ran for about 8 minutes, but oprofile suggests that 50% of the time, the CPU was idle. This cannot be disk I/O (the system was essentially idle except for this test), so can memory operations account for this much idle time (cache misses, moving data to & from main mem, page faults, etc)? And if so, how would I go about quantifying this? Is this the point where I have to dive into tau? 1) compilation of the hdf5 library (which I needed to do anyway). CPU: AMD64 processors, speed 1990.96 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 samples % app name symbol name 1495583 30.8198 cc1 (no symbols) 870380 17.9361 vmlinux poll_idle 486332 10.0219 libc-2.3.2.so (no symbols) 241578 4.9782 bash (no symbols) 197392 4.0677 vmlinux unmap_vmas 160895 3.3156 vmlinux copy_page_range 127658 2.6307 vmlinux copy_page 108131 2.2283 ld-2.3.2.so (no symbols) 101516 2.0920 as (no symbols) 99807 2.0567 vmlinux clear_page 49755 1.0253 libbfd-2.15.so (no symbols) 48223 0.9937 vmlinux page_fault 2) a run of clustalw, a multiple alignment app from bioinformatics which takes a small amount of data (227KB) and performs a large amount of integer computation on it, and then emits a relatively small amount of data (3 files totalling 180KB). The program is fairly chatty tho (lots of screen output, redirected to a file in this case, but the total is included in the 180K output). samples % app name symbol name 8950726 48.0420 vmlinux poll_idle 7498500 40.2473 clustalw forward_pass 832632 4.4691 clustalw prfscore 382780 2.0545 clustalw pdiff 247423 1.3280 clustalw reverse_pass 99006 0.5314 clustalw diff 77909 0.4182 vmlinux timer_interrupt 77141 0.4140 clustalw open_penalty2 73940 0.3969 clustalw calc_prf1 59589 0.3198 clustalw open_penalty1 49189 0.2640 clustalw ext_penalty1 47153 0.2531 clustalw ext_penalty2 3) a run of tacg, another bioinformatics sequence matching app that consumes a larger amount of data (~5MB) and performs a fair amount of integer arithmetic on it and then emits a file of ~117MB) samples % app name symbol name 212135 53.1937 vmlinux poll_idle 111922 28.0649 libc-2.3.2.so (no symbols) 17894 4.4870 tacg hash 7601 1.9060 tacg Anti_Par 6468 1.6219 tacg Translate 3451 0.8654 tacg Cutting 2421 0.6071 tacg Degen_Cmp 2292 0.5747 libm-2.3.2.so (no symbols) 2034 0.5100 tacg HorribleAccounting 1913 0.4797 tacg LinearMap 1847 0.4631 tacg GetSequence2 1638 0.4107 vmlinux timer_interrupt Harry Mangalam wrote: > You may be right about trying to measure per-function 'interstitial' time > (good phrasing), but oprofile does allow the pooled idle time, which is > what I asked for: > > CPU: AMD64 processors, speed 1990.96 MHz (estimated) > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit > mask of 0x00 (No unit mask) count 100000 > samples % app name symbol name > 1049696 72.2791 vmlinux poll_idle > 75365 5.1894 ncbo swapn4b > 43424 2.9901 vmlinux copy_user_generic_c > 27051 1.8627 ncbo nco_var_sbt > 24031 1.6547 vmlinux memcpy > 20563 1.4159 vmlinux clear_page > 19553 1.3464 vmlinux memset > > 72% of time spent waiting for data; 5% swapping bytes, 2% actually > calculating..!#%!$%%% :( Not very much point in making this one MPI until > we can relieve that bottleneck. > > It WOULD be VERY nice to be able to calculate the per-function idle time > to see exactly which functions are hung, but that's for another day (but > probably coming soon..). I suspect that tau will be able to do this, but > it requires more instrumentation than oprofile and I'm not ready to go > that route yet. > (comments on tau from people who have used it more than welcome!) > > Thanks to the both of you. > > Harry > > > On Tuesday 22 November 2005 02:34 pm, Rob Mueller wrote: >> I wanted to do something similar as well, and asked about it here: >> >> http://www.nabble.com/Re:-%22real%22-time-counts--p1049898.html >> >> Thinking about it, it does make sense that this is not possible with a >> sampling profiler. You would sort of need the stack backtrace to somehow >> dive back into all processes that have gone into the kernel via an IO >> path and are now in a blocked state. >> >> I never got SystemTap to do exactly what I wanted, but I did get an >> instrumenting profiler to get the data I wanted (real time counts in each >> sub). In the end, the Intel Vtune package got me what I wanted, I >> couldn't find any other instrumenting profilers on linux that would give >> "wall clock" results rather than cpu time results. >> >> Rob >> >> -----Original Message----- >> From: opr...@li... >> [mailto:opr...@li...] On Behalf Of Harry >> Mangalam >> Sent: Tuesday, November 22, 2005 7:42 AM >> To: opr...@li... >> Subject: Can you also sum HALTed cycles? >> >> Trying to oprofile a heavily I/O-bound app. It would be very nice if I >> could count idle CPU cycles as well (those NOT in CPU_CLK_UNHALTED >> events (Cycles outside of halt state). ie I want to count halt states >> as well to see how much time the CPU is waiting on data. >> >> Are there flags to tell oprofile to do this as well? >> >> Or are my assumptions incorrect (counting them either wouldn't tell me >> or oprofile has no way of capturing this). >> >> Gobsmack me with a URL if I've missed something obvious. >> >> Thanks >> Harry >> >> >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >> Register for a JBoss Training Course. Free Certification Exam for All >> Training Attendees Through End of 2005. For more info visit: >> http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click >> _______________________________________________ >> oprofile-list mailing list >> opr...@li... >> https://lists.sourceforge.net/lists/listinfo/oprofile-list >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >> Register for a JBoss Training Course. Free Certification Exam >> for All Training Attendees Through End of 2005. For more info visit: >> http://ads.osdn.com/?ad_idv28&alloc_id845&op=ick >> _______________________________________________ >> oprofile-list mailing list >> opr...@li... >> https://lists.sourceforge.net/lists/listinfo/oprofile-list > |