From: Andy S. <a_s...@ya...> - 2012-05-27 03:06:23
|
Hi all, I'm profiling one big multi-threaded application with oprofile and the numbers that I get from oprofile somehow do not make much sense. I'd appreciate any help figuring out what these numbers mean. Here is my setup: CPU: Dual Intel Xeon E5345 @ 2.33GHz OS: RHEL5 Linux 2.6.18 x86_64 oprofile 0.9.4 I'm profiling CPU_CLK_UNHALTED counter to see where CPU time is spent in the code. For the simple small application (integer arithmetic in 1 thread) the number of samples that I get from oprofile matches very well (after multiplying by reset count and dividing by CPU frequency) the number of seconds reported by "time" command. For the large multi-threaded application these numbers do not match at all. The application in question is a python interpreter that spawns two additional threads, all calculations are happening in the main thread, additional threads do not consume any significant CPU. I run all my tests with a performance governor which should mean that all cores run at the same frequency (which is max 2.33GHz). What I get in this case is that numbers from oprofile seem to be much lower than what is reported by "time". Here is one example: % time python ... real 19m6.477s user 18m17.784s sys 0m12.979s opcontrol is set to include only python image: % oprofile CPU: Core 2, speed 2333 MHz (estimated) Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (Unhalted core cycles) count 100000 CPU_CLK_UNHALT...| samples| %| ------------------ 4905931 100.000 python If I do simple arithmetic for oprofile counts: 4,905,931 * 100,000 / 2,333,000,000 = 210.28 sec while time reports 18m17.784s = 1097.78 sec (user CPU) These numbers are so much different that I'm sure that I'm missing something. What do I need to get more or less close estimate of CPU time from oprofile samples? Do I need to use some other event type to make up this large difference? Andy. |
From: Maynard J. <may...@us...> - 2012-05-29 14:18:44
|
On 05/26/2012 10:05 PM, Andy Salnikov wrote: > Hi all, > > I'm profiling one big multi-threaded application with oprofile > and the numbers that I get from oprofile somehow do not > make much sense. I'd appreciate any help figuring out what > these numbers mean. > > Here is my setup: > > CPU: Dual Intel Xeon E5345 @ 2.33GHz > OS: RHEL5 Linux 2.6.18 x86_64 > oprofile 0.9.4 > > I'm profiling CPU_CLK_UNHALTED counter to see where CPU time > is spent in the code. > > For the simple small application (integer arithmetic in 1 thread) > the number of samples that I get from oprofile matches very well > (after multiplying by reset count and dividing by CPU frequency) > the number of seconds reported by "time" command. > > For the large multi-threaded application these numbers do not match > at all. The application in question is a python interpreter that spawns > two additional threads, all calculations are happening in the main > thread, additional threads do not consume any significant CPU. > I run all my tests with a performance governor which should mean > that all cores run at the same frequency (which is max 2.33GHz). > What I get in this case is that numbers from oprofile seem to be much > lower than what is reported by "time". Here is one example: > > % time python ... > real 19m6.477s > user 18m17.784s > sys 0m12.979s > > opcontrol is set to include only python image: > > % oprofile > CPU: Core 2, speed 2333 MHz (estimated) > Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a > unit mask of 0x00 (Unhalted core cycles) count 100000 > CPU_CLK_UNHALT...| > samples| %| > ------------------ > 4905931 100.000 python > > If I do simple arithmetic for oprofile counts: > 4,905,931 * 100,000 / 2,333,000,000 = 210.28 sec > while time reports 18m17.784s = 1097.78 sec (user CPU) > > These numbers are so much different that I'm sure that I'm missing > something. What do I need to get more or less close estimate of > CPU time from oprofile samples? Do I need to use some other event > type to make up this large difference? If you do 'opcontrol --status', I'm guessing that the "Separate options" is "none". Correct? When looking into this, I discovered a problem involving the combination of "--separate=none" and the specification of an image filter. I opened a bug report for this (https://sourceforge.net/tracker/?func=detail&aid=3530527&group_id=16191&atid=116191). As a workaround, remove the image filter (via 'opcontrol --image=all') and also use 'opcontrol --separate=lib,kernel'. Then after collecting your profile, generate a report in such a way as to include *only* the python application -- e.g., opreport [options] <python_pathname> -Maynard > > Andy. > > > > > ------------------------------------------------------------------------------ > Live Security Virtual Conference > Exclusive live event will cover all the ways today's security and > threat landscape has changed and how IT managers can respond. Discussions > will include endpoint security, mobile security and the latest in malware > threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ > _______________________________________________ > oprofile-list mailing list > opr...@li... > https://lists.sourceforge.net/lists/listinfo/oprofile-list > |
From: Andy S. <a_s...@ya...> - 2012-05-29 17:29:05
|
Hi Maynard, thanks for suggestion. I had indeed an image filter defined to only include python executable but --separate option was set to lib. What I found following your suggestion in other thread is that when I remove callgraph (set it to 0) then I get correct sample count. Callgraph was set to 32 before and with that setting oprofile was silently dropping a lot of samples. Thanks, Andy On 5/29/2012 7:18 AM, Maynard Johnson wrote: > On 05/26/2012 10:05 PM, Andy Salnikov wrote: >> Hi all, >> >> I'm profiling one big multi-threaded application with oprofile >> and the numbers that I get from oprofile somehow do not >> make much sense. I'd appreciate any help figuring out what >> these numbers mean. >> >> Here is my setup: >> >> CPU: Dual Intel Xeon E5345 @ 2.33GHz >> OS: RHEL5 Linux 2.6.18 x86_64 >> oprofile 0.9.4 >> >> I'm profiling CPU_CLK_UNHALTED counter to see where CPU time >> is spent in the code. >> >> For the simple small application (integer arithmetic in 1 thread) >> the number of samples that I get from oprofile matches very well >> (after multiplying by reset count and dividing by CPU frequency) >> the number of seconds reported by "time" command. >> >> For the large multi-threaded application these numbers do not match >> at all. The application in question is a python interpreter that spawns >> two additional threads, all calculations are happening in the main >> thread, additional threads do not consume any significant CPU. >> I run all my tests with a performance governor which should mean >> that all cores run at the same frequency (which is max 2.33GHz). >> What I get in this case is that numbers from oprofile seem to be much >> lower than what is reported by "time". Here is one example: >> >> % time python ... >> real 19m6.477s >> user 18m17.784s >> sys 0m12.979s >> >> opcontrol is set to include only python image: >> >> % oprofile >> CPU: Core 2, speed 2333 MHz (estimated) >> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a >> unit mask of 0x00 (Unhalted core cycles) count 100000 >> CPU_CLK_UNHALT...| >> samples| %| >> ------------------ >> 4905931 100.000 python >> >> If I do simple arithmetic for oprofile counts: >> 4,905,931 * 100,000 / 2,333,000,000 = 210.28 sec >> while time reports 18m17.784s = 1097.78 sec (user CPU) >> >> These numbers are so much different that I'm sure that I'm missing >> something. What do I need to get more or less close estimate of >> CPU time from oprofile samples? Do I need to use some other event >> type to make up this large difference? > If you do 'opcontrol --status', I'm guessing that the "Separate options" is "none". Correct? When looking into this, I discovered a problem involving the combination of "--separate=none" and the specification of an image filter. I opened a bug report for this (https://sourceforge.net/tracker/?func=detail&aid=3530527&group_id=16191&atid=116191). > > As a workaround, remove the image filter (via 'opcontrol --image=all') and also use 'opcontrol --separate=lib,kernel'. Then after collecting your profile, generate a report in such a way as to include *only* the python application -- e.g., > opreport [options]<python_pathname> > > -Maynard >> Andy. >> >> >> >> >> ------------------------------------------------------------------------------ >> Live Security Virtual Conference >> Exclusive live event will cover all the ways today's security and >> threat landscape has changed and how IT managers can respond. Discussions >> will include endpoint security, mobile security and the latest in malware >> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/ >> _______________________________________________ >> oprofile-list mailing list >> opr...@li... >> https://lists.sourceforge.net/lists/listinfo/oprofile-list >> > |