From: Evgeny P. <Evg...@ex...> - 2005-11-22 17:28:24
|
Look here: http://oprofile.sourceforge.net/doc/kernel-profiling.html#idle (2.2 Idle time) =20 -----Original Message----- From: opr...@li... [mailto:opr...@li...] On Behalf Of Harry Mangalam Sent: Tuesday, November 22, 2005 7:42 AM To: opr...@li... Subject: Can you also sum HALTed cycles? Trying to oprofile a heavily I/O-bound app. It would be very nice if I could count idle CPU cycles as well (those NOT in CPU_CLK_UNHALTED events (Cycles outside of halt state). ie I want to count halt states as well to see how much time the CPU is waiting on data. Are there flags to tell oprofile to do this as well? Or are my assumptions incorrect (counting them either wouldn't tell me or oprofile has no way of capturing this). =20 Gobsmack me with a URL if I've missed something obvious. Thanks Harry ------------------------------------------------------- This SF.Net email is sponsored by the JBoss Inc. Get Certified Today Register for a JBoss Training Course. Free Certification Exam for All Training Attendees Through End of 2005. For more info visit: http://ads.osdn.com/?ad_id=3D7628&alloc_id=3D16845&op=3Dclick _______________________________________________ oprofile-list mailing list opr...@li... https://lists.sourceforge.net/lists/listinfo/oprofile-list |
From: Rob M. <ro...@fa...> - 2005-11-22 22:34:55
|
I wanted to do something similar as well, and asked about it here: http://www.nabble.com/Re:-%22real%22-time-counts--p1049898.html Thinking about it, it does make sense that this is not possible with a=20 sampling profiler. You would sort of need the stack backtrace to somehow=20 dive back into all processes that have gone into the kernel via an IO pat= h=20 and are now in a blocked state. I never got SystemTap to do exactly what I wanted, but I did get an=20 instrumenting profiler to get the data I wanted (real time counts in each= =20 sub). In the end, the Intel Vtune package got me what I wanted, I couldn'= t=20 find any other instrumenting profilers on linux that would give "wall clo= ck"=20 results rather than cpu time results. Rob -----Original Message----- From: opr...@li... [mailto:opr...@li...] On Behalf Of Harry Mangalam Sent: Tuesday, November 22, 2005 7:42 AM To: opr...@li... Subject: Can you also sum HALTed cycles? Trying to oprofile a heavily I/O-bound app. It would be very nice if I could count idle CPU cycles as well (those NOT in CPU_CLK_UNHALTED events (Cycles outside of halt state). ie I want to count halt states as well to see how much time the CPU is waiting on data. Are there flags to tell oprofile to do this as well? Or are my assumptions incorrect (counting them either wouldn't tell me or oprofile has no way of capturing this). Gobsmack me with a URL if I've missed something obvious. Thanks Harry ------------------------------------------------------- This SF.Net email is sponsored by the JBoss Inc. Get Certified Today Register for a JBoss Training Course. Free Certification Exam for All Training Attendees Through End of 2005. For more info visit: http://ads.osdn.com/?ad_id=3D7628&alloc_id=3D16845&op=3Dclick _______________________________________________ oprofile-list mailing list opr...@li... https://lists.sourceforge.net/lists/listinfo/oprofile-list ------------------------------------------------------- This SF.Net email is sponsored by the JBoss Inc. Get Certified Today Register for a JBoss Training Course. Free Certification Exam for All Training Attendees Through End of 2005. For more info visit: http://ads.osdn.com/?ad_idv28&alloc_id=16845&op=3Dick _______________________________________________ oprofile-list mailing list opr...@li... https://lists.sourceforge.net/lists/listinfo/oprofile-list |
From: Harry M. <hj...@ta...> - 2005-11-22 23:08:46
|
You may be right about trying to measure per-function 'interstitial' time (good phrasing), but oprofile does allow the pooled idle time, which is what I asked for: CPU: AMD64 processors, speed 1990.96 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 samples % app name symbol name 1049696 72.2791 vmlinux poll_idle 75365 5.1894 ncbo swapn4b 43424 2.9901 vmlinux copy_user_generic_c 27051 1.8627 ncbo nco_var_sbt 24031 1.6547 vmlinux memcpy 20563 1.4159 vmlinux clear_page 19553 1.3464 vmlinux memset 72% of time spent waiting for data; 5% swapping bytes, 2% actually calculating..!#%!$%%% :( Not very much point in making this one MPI until we can relieve that bottleneck. It WOULD be VERY nice to be able to calculate the per-function idle time to see exactly which functions are hung, but that's for another day (but probably coming soon..). I suspect that tau will be able to do this, but it requires more instrumentation than oprofile and I'm not ready to go that route yet. (comments on tau from people who have used it more than welcome!) Thanks to the both of you. Harry On Tuesday 22 November 2005 02:34 pm, Rob Mueller wrote: > I wanted to do something similar as well, and asked about it here: > > http://www.nabble.com/Re:-%22real%22-time-counts--p1049898.html > > Thinking about it, it does make sense that this is not possible with a > sampling profiler. You would sort of need the stack backtrace to somehow > dive back into all processes that have gone into the kernel via an IO path > and are now in a blocked state. > > I never got SystemTap to do exactly what I wanted, but I did get an > instrumenting profiler to get the data I wanted (real time counts in each > sub). In the end, the Intel Vtune package got me what I wanted, I couldn't > find any other instrumenting profilers on linux that would give "wall > clock" results rather than cpu time results. > > Rob > > -----Original Message----- > From: opr...@li... > [mailto:opr...@li...] On Behalf Of Harry > Mangalam > Sent: Tuesday, November 22, 2005 7:42 AM > To: opr...@li... > Subject: Can you also sum HALTed cycles? > > Trying to oprofile a heavily I/O-bound app. It would be very nice if I > could count idle CPU cycles as well (those NOT in CPU_CLK_UNHALTED > events (Cycles outside of halt state). ie I want to count halt states > as well to see how much time the CPU is waiting on data. > > Are there flags to tell oprofile to do this as well? > > Or are my assumptions incorrect (counting them either wouldn't tell me > or oprofile has no way of capturing this). > > Gobsmack me with a URL if I've missed something obvious. > > Thanks > Harry > > > > > ------------------------------------------------------- > This SF.Net email is sponsored by the JBoss Inc. Get Certified Today > Register for a JBoss Training Course. Free Certification Exam for All > Training Attendees Through End of 2005. For more info visit: > http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list > > > ------------------------------------------------------- > This SF.Net email is sponsored by the JBoss Inc. Get Certified Today > Register for a JBoss Training Course. Free Certification Exam > for All Training Attendees Through End of 2005. For more info visit: > http://ads.osdn.com/?ad_idv28&alloc_id845&op=ick > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |
From: Harry M. <hj...@ta...> - 2005-11-23 20:22:24
|
In examining the output for several related programs, I was struck by how much time as being wasted in 'poll_idle'. Because these apps typically consumed very large (1GB) files, I originally thought that this was all due to disk i/o, but now I'm not so sure. In trying to grasp how different programs behaved in terms of I/O and how that related to 'poll_idle', I ran a few other apps. Below are the oprofile outputs for 3 of them, one is for a compilation of a fairly large lib, 2 others are bioinformatics apps with which I'm familiar. Especially notable is the oprofile output for clustalw. It ingested and egested relatively tiny files, ran for about 8 minutes, but oprofile suggests that 50% of the time, the CPU was idle. This cannot be disk I/O (the system was essentially idle except for this test), so can memory operations account for this much idle time (cache misses, moving data to & from main mem, page faults, etc)? And if so, how would I go about quantifying this? Is this the point where I have to dive into tau? 1) compilation of the hdf5 library (which I needed to do anyway). CPU: AMD64 processors, speed 1990.96 MHz (estimated) Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000 samples % app name symbol name 1495583 30.8198 cc1 (no symbols) 870380 17.9361 vmlinux poll_idle 486332 10.0219 libc-2.3.2.so (no symbols) 241578 4.9782 bash (no symbols) 197392 4.0677 vmlinux unmap_vmas 160895 3.3156 vmlinux copy_page_range 127658 2.6307 vmlinux copy_page 108131 2.2283 ld-2.3.2.so (no symbols) 101516 2.0920 as (no symbols) 99807 2.0567 vmlinux clear_page 49755 1.0253 libbfd-2.15.so (no symbols) 48223 0.9937 vmlinux page_fault 2) a run of clustalw, a multiple alignment app from bioinformatics which takes a small amount of data (227KB) and performs a large amount of integer computation on it, and then emits a relatively small amount of data (3 files totalling 180KB). The program is fairly chatty tho (lots of screen output, redirected to a file in this case, but the total is included in the 180K output). samples % app name symbol name 8950726 48.0420 vmlinux poll_idle 7498500 40.2473 clustalw forward_pass 832632 4.4691 clustalw prfscore 382780 2.0545 clustalw pdiff 247423 1.3280 clustalw reverse_pass 99006 0.5314 clustalw diff 77909 0.4182 vmlinux timer_interrupt 77141 0.4140 clustalw open_penalty2 73940 0.3969 clustalw calc_prf1 59589 0.3198 clustalw open_penalty1 49189 0.2640 clustalw ext_penalty1 47153 0.2531 clustalw ext_penalty2 3) a run of tacg, another bioinformatics sequence matching app that consumes a larger amount of data (~5MB) and performs a fair amount of integer arithmetic on it and then emits a file of ~117MB) samples % app name symbol name 212135 53.1937 vmlinux poll_idle 111922 28.0649 libc-2.3.2.so (no symbols) 17894 4.4870 tacg hash 7601 1.9060 tacg Anti_Par 6468 1.6219 tacg Translate 3451 0.8654 tacg Cutting 2421 0.6071 tacg Degen_Cmp 2292 0.5747 libm-2.3.2.so (no symbols) 2034 0.5100 tacg HorribleAccounting 1913 0.4797 tacg LinearMap 1847 0.4631 tacg GetSequence2 1638 0.4107 vmlinux timer_interrupt Harry Mangalam wrote: > You may be right about trying to measure per-function 'interstitial' time > (good phrasing), but oprofile does allow the pooled idle time, which is > what I asked for: > > CPU: AMD64 processors, speed 1990.96 MHz (estimated) > Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit > mask of 0x00 (No unit mask) count 100000 > samples % app name symbol name > 1049696 72.2791 vmlinux poll_idle > 75365 5.1894 ncbo swapn4b > 43424 2.9901 vmlinux copy_user_generic_c > 27051 1.8627 ncbo nco_var_sbt > 24031 1.6547 vmlinux memcpy > 20563 1.4159 vmlinux clear_page > 19553 1.3464 vmlinux memset > > 72% of time spent waiting for data; 5% swapping bytes, 2% actually > calculating..!#%!$%%% :( Not very much point in making this one MPI until > we can relieve that bottleneck. > > It WOULD be VERY nice to be able to calculate the per-function idle time > to see exactly which functions are hung, but that's for another day (but > probably coming soon..). I suspect that tau will be able to do this, but > it requires more instrumentation than oprofile and I'm not ready to go > that route yet. > (comments on tau from people who have used it more than welcome!) > > Thanks to the both of you. > > Harry > > > On Tuesday 22 November 2005 02:34 pm, Rob Mueller wrote: >> I wanted to do something similar as well, and asked about it here: >> >> http://www.nabble.com/Re:-%22real%22-time-counts--p1049898.html >> >> Thinking about it, it does make sense that this is not possible with a >> sampling profiler. You would sort of need the stack backtrace to somehow >> dive back into all processes that have gone into the kernel via an IO >> path and are now in a blocked state. >> >> I never got SystemTap to do exactly what I wanted, but I did get an >> instrumenting profiler to get the data I wanted (real time counts in each >> sub). In the end, the Intel Vtune package got me what I wanted, I >> couldn't find any other instrumenting profilers on linux that would give >> "wall clock" results rather than cpu time results. >> >> Rob >> >> -----Original Message----- >> From: opr...@li... >> [mailto:opr...@li...] On Behalf Of Harry >> Mangalam >> Sent: Tuesday, November 22, 2005 7:42 AM >> To: opr...@li... >> Subject: Can you also sum HALTed cycles? >> >> Trying to oprofile a heavily I/O-bound app. It would be very nice if I >> could count idle CPU cycles as well (those NOT in CPU_CLK_UNHALTED >> events (Cycles outside of halt state). ie I want to count halt states >> as well to see how much time the CPU is waiting on data. >> >> Are there flags to tell oprofile to do this as well? >> >> Or are my assumptions incorrect (counting them either wouldn't tell me >> or oprofile has no way of capturing this). >> >> Gobsmack me with a URL if I've missed something obvious. >> >> Thanks >> Harry >> >> >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >> Register for a JBoss Training Course. Free Certification Exam for All >> Training Attendees Through End of 2005. For more info visit: >> http://ads.osdn.com/?ad_id=7628&alloc_id=16845&op=click >> _______________________________________________ >> oprofile-list mailing list >> opr...@li... >> https://lists.sourceforge.net/lists/listinfo/oprofile-list >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by the JBoss Inc. Get Certified Today >> Register for a JBoss Training Course. Free Certification Exam >> for All Training Attendees Through End of 2005. For more info visit: >> http://ads.osdn.com/?ad_idv28&alloc_id845&op=ick >> _______________________________________________ >> oprofile-list mailing list >> opr...@li... >> https://lists.sourceforge.net/lists/listinfo/oprofile-list > |
From: John L. <le...@mo...> - 2005-11-24 01:56:33
|
On Wed, Nov 23, 2005 at 12:17:46PM -0800, Harry Mangalam wrote: > suggests that 50% of the time, the CPU was idle. This cannot be disk I/O > (the system was essentially idle except for this test), so can memory > operations account for this much idle time (cache misses, moving data to & > from main mem, page faults, etc)? This isn't what poll_idle tells you. All it tells you is how much time was spent running the code in poll_idle (roughly, taking into statistical errors, etc.). Semantically, it means something like "the time we ran this code, there was nothing else ready to run". This can be for a number of reasons, but not things like you mention above (how can we be taking a page fault? poll_idle doesn't /do/ anything!). To delve deeper, you really need a system comprehension tool (systemtap perhaps) not a profiler. I can't comment on whether Tau will help you. Some simple sanity checks with vmstat might help. Just to be sure: this is a single CPU? regards john |
From: Harry M. <hj...@ta...> - 2005-11-24 05:32:58
|
Thanks for the reply and info. The application is a serial app (not OMP, not MPI), but it is running on a dual CPU machine under Linux-SMP. Could that contribute to the odd stats? Does oprofile sum the entire capacity of all the CPUs to equal 100% (!??!) That was, I guess, my indirect question - what does 'poll_idle' mean and what contributes to it / what does it include? If clustalw is doing no disk I/O and is saturating 1 CPU (by top, by xosview), what else could be contributing to this amount of poll_idle? On Wednesday 23 November 2005 05:55 pm, John Levon wrote: > On Wed, Nov 23, 2005 at 12:17:46PM -0800, Harry Mangalam wrote: > > suggests that 50% of the time, the CPU was idle. This cannot be disk I/O > > (the system was essentially idle except for this test), so can memory > > operations account for this much idle time (cache misses, moving data to > > & from main mem, page faults, etc)? > > This isn't what poll_idle tells you. All it tells you is how much time > was spent running the code in poll_idle (roughly, taking into > statistical errors, etc.). > > Semantically, it means something like "the time we ran this code, there > was nothing else ready to run". This can be for a number of reasons, but > not things like you mention above (how can we be taking a page fault? > poll_idle doesn't /do/ anything!). So it's a true 'waiting for something to do' - that's hard to believe in the midst of what I can see is going on, unless it's reporting the sum of both CPUs. (?????) > To delve deeper, you really need a system comprehension tool (systemtap > perhaps) not a profiler. I can't comment on whether Tau will help you. This has been mentioned by another person; I'll take a look; also SGI's Speedshop was just OpenSourced a few days ago, but requires quite a bit of setup with some specific versioning and patches. > Some simple sanity checks with vmstat might help. Thanks! I'm manning it now. > Just to be sure: this is a single CPU? Yes, as above -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |
From: John L. <le...@mo...> - 2005-11-24 10:03:19
|
On Wed, Nov 23, 2005 at 09:31:00PM -0800, Harry Mangalam wrote: > The application is a serial app (not OMP, not MPI), but it is running on a > dual CPU machine under Linux-SMP. Could that contribute to the odd stats? > Does oprofile sum the entire capacity of all the CPUs to equal 100% (!??!) No, not quite. Read the manual on what the percentage in the output means. In fact going through the manuals is probably a good idea anyway, I'm not sure you've yet got a clear idea of how things work. Hint: --separate=cpu might clarify things for you a bit regards john |
From: Harry M. <hj...@ta...> - 2005-11-24 16:34:27
|
On Thursday 24 November 2005 02:02 am, John Levon wrote: > On Wed, Nov 23, 2005 at 09:31:00PM -0800, Harry Mangalam wrote: > > The application is a serial app (not OMP, not MPI), but it is running on > > a dual CPU machine under Linux-SMP. Could that contribute to the odd > > stats? Does oprofile sum the entire capacity of all the CPUs to equal > > 100% (!??!) > > No, not quite. Read the manual on what the percentage in the output > means. In fact going through the manuals is probably a good idea anyway, > I'm not sure you've yet got a clear idea of how things work. > > Hint: --separate=cpu might clarify things for you a bit Thanks for the gentle suggestion. :) Willdo. > regards > john -- Cheers, Harry Harry J Mangalam - 949 856 2847 (vox; email for fax) - hj...@ta... <<plain text preferred>> |