From: Carl L. <ce...@us...> - 2004-10-14 20:54:33
Will:

I did the following experiments using load_v2 on a Power 5 box. Load_v2 allocates an 8 MByte array of integers and then repeatedly walks through the array, fetching one integer per cache line and adding it to or subtracting it from the running sum. The purpose is to generate as many memory requests as possible. CPU utilization is 100% while the workload runs.

The workload timing was done in a script as follows:

    date
    load_v2
    date

In the script, I first measured the time for a single copy of load_v2 to run without Oprofile running. Then I started Oprofile and measured the execution time of a single copy of load_v2 to determine the overhead with Oprofile running, for various events and count values. I measured the overhead when counting clocks and a couple of L1 data cache events. Remember that the program generates a large number of cache misses. The following table gives the results for the various events and count values:

    event                   count    without Oprofile   with Oprofile   slowdown
    clks                     1000    410 sec            1098 sec        1098/410 = 2.678
    clks                    10000    408 sec             476 sec         476/408 = 1.167
    clks                    50000    408 sec             423 sec         423/408 = 1.037
    clks                   100000    408 sec             415 sec         415/408 = 1.017
    PM_LD_MISS_L1_G43        1000    408 sec             409 sec         409/408 = 1.002
    PM_LD_REF_L1_LSU0_G46    1000    408 sec             427 sec         427/408 = 1.047
                             5000    408 sec             415 sec         415/408 = 1.017
                            10000    410 sec             411 sec         411/410 = 1.002

I then changed load_v2 to just repeatedly do a calculation: a = a*b + 1. The purpose of this experiment was to issue as many instructions as possible. The results are as follows:

    event                   count    without Oprofile   with Oprofile   slowdown
    clks                    50000    215 sec             222 sec         222/215 = 1.033
    PM_INST_CMPL_GP9         1000    214 sec             545 sec         545/214 = 2.547
                            10000    214 sec             247 sec         247/214 = 1.154
                            50000    214 sec             218 sec         218/214 = 1.019

Based on the above data, we should set instructions and clocks to a minimum of 50,000. The rest of the events can have a minimum of 1000, with the exception of hypervisor clocks.
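The load_v2 source itself is not part of this thread. As a rough illustration of the two access patterns described above, here is a minimal C sketch; the function names, the assumed 128-byte cache line, and the strict alternation of add and subtract are assumptions made for the example, not details taken from Carl's program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical reconstruction of the two load_v2 variants; not the original source. */
#define ARRAY_BYTES   (8u * 1024u * 1024u)        /* 8 MByte array, as described */
#define LINE_BYTES    128u                        /* assumed L1 data-cache line size */
#define INTS_PER_LINE (LINE_BYTES / sizeof(int))  /* stride: one int per cache line */

/* One pass over the array: read one int per cache line and alternately
 * add it to and subtract it from the running sum. Every read touches a
 * new cache line, so most reads miss in the L1 data cache. */
long walk_once(const int *a, size_t n)
{
    long sum = 0;
    size_t k = 0;

    assert(a != NULL);
    for (size_t i = 0; i < n; i += INTS_PER_LINE, k++) {
        if (k % 2 == 0)
            sum += a[i];
        else
            sum -= a[i];
    }
    return sum;
}

/* Memory-bound workload: repeat the walk `passes` times. */
long memory_workload(const int *a, size_t n, long passes)
{
    long total = 0;

    for (long p = 0; p < passes; p++)
        total += walk_once(a, n);
    return total;
}

/* Instruction-bound variant: repeatedly compute a = a*b + 1.
 * Unsigned arithmetic wraps on overflow instead of being undefined. */
unsigned long compute_workload(unsigned long a, unsigned long b, long iters)
{
    for (long i = 0; i < iters; i++)
        a = a * b + 1;
    return a;
}
```

To reproduce the shape of the experiment, call these from a small main, time the run with and without the profiler active, and print the result so the compiler cannot discard the loops.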
There is an event for counting the number of clocks spent in hypervisor mode. Hypervisor mode is another category, like user and kernel. On IBM machines, you can run multiple virtual machines on the same physical hardware. The hypervisor is responsible for virtualizing the I/O, networks, etc. It also swaps virtual machines in and out, much as the OS swaps processes in and out of a processor. I don't have a good workload for measuring the hypervisor load, but based on what I have seen, the hypervisor should execute less than 10% of the time; it has been observed to run as much as 40% of the time. When the hypervisor is running, the counts increase at the same rate as clks. Hence the Oprofile overhead for counting hypervisor clocks will be the same as for clks while the hypervisor is running. Based on the above data, we feel 10,000 is a good minimum until we get more experience with this event.

I did not encounter any problems with the system hanging for any of the count values that I used. Perhaps if we went lower than 1000 we might see a problem.

Maynard is updating the minimum counts for the 970, Power 4 and Power 5 event files. He will resend the entire 970 patch. In separate messages, he will send patches to update the Power 4 and Power 5 event files.

Carl Love
From: Maynard P. J. <may...@us...> - 2004-10-14 21:28:45
Attachments:
ppc970-patch-for-oprofCVS-10.14.04
Carl Love wrote:
> Based on the above data, we should set instructions and clocks to a
> minimum of 50,000.
> The rest of the events can be set at 1000 for a minimum
>
> Maynard is updating the minimum counts for the 970, Power 4 and Power 5
> event files. He will resend the entire 970 patch. In separate messages,
> he will send patches to update the Power 4 and Power 5 event files.
>
Attached is a re-send of the full ppc970 patch, with the event file updated as described above.

Thanks for your help, Will and Carl!

-- 
Maynard Johnson
From: John L. <le...@mo...> - 2004-10-14 21:36:29
On Thu, Oct 14, 2004 at 04:28:30PM -0500, Maynard P. Johnson wrote:
> Carl Love wrote:
> >Based on the above data, we should set instructions and clocks to a
> >minimum of 50,000.
> >The rest of the events can be set at 1000 for a minimum
> >
> >Maynard is updating the minimum counts for the 970, Power 4 and Power 5
> >event files. He will resend the entire 970 patch. In separate messages,
> >he will send patches to update the Power 4 and Power 5 event files.
> >
> Attached is a re-send of the full ppc970 patch, with the event file
> updated as described above.

I wasn't clear on this. Are you setting new minimums because the old ones were *unsafe*, or because they just had high overhead? Overhead is not a good reason to set a minimum...

regards
john
From: Carl L. <ce...@us...> - 2004-10-14 23:50:01
John:

The initial question from Will was that we were not consistent in setting the minimum values. He was also concerned about the system hanging if the value was too low. As my experiment shows, the minimum values are high enough to prevent the system from hanging, i.e. they are safe. It seemed to us that Will was also concerned about the performance impact. The measurements clearly show that if the count value is too low, the tool incurs an excessive amount of overhead. The Oprofile documentation says that Oprofile has less than a 5% performance impact (Ch. 6). In chapter 3.6, it warns about making the count too small and incurring excessive overhead. The thought behind raising the minimums was to ensure the tool did not have an excessive performance impact, i.e. to keep it at no more than 5%. We looked at several other systems and noted that the minimum count for clocks is 100,000, which I suspect is high enough not only to avoid system hangs but also to keep the overhead very low. Since Oprofile uses the minimum count if no value is specified, it seems like you would want a value that gives reasonable performance. The idea behind the change was to be consistent across the IBM platforms and consistent with what has been done on the other platforms.

So, the question to you and Will is: what are the guidelines for setting the minimum values? We are willing to change them for the 970, Power 4 and Power 5 to make them consistent across the various platforms, if the two of you desire that. We have data that shows what the settings need to be to ensure the system runs, and what the settings should be to keep the overhead at a reasonable level. We will be happy to make any changes to the minimum values that you and Will feel are appropriate.

Thanks for the input.

Carl Love

John Levon wrote:
>On Thu, Oct 14, 2004 at 04:28:30PM -0500, Maynard P. Johnson wrote:
>
>>Carl Love wrote:
>>
>>>Based on the above data, we should set instructions and clocks to a
>>>minimum of 50,000.
>>>The rest of the events can be set at 1000 for a minimum
>>>
>>>Maynard is updating the minimum counts for the 970, Power 4 and Power 5
>>>event files. He will resend the entire 970 patch. In separate messages,
>>>he will send patches to update the Power 4 and Power 5 event files.
>>>
>>Attached is a re-send of the full ppc970 patch, with the event file
>>updated as described above.
>>
>
>I wasn't clear on this. Are you setting new minimums because the old
>ones were *unsafe*, or because they just had high overhead? Overhead is
>not a good reason to set a minimum...
>
>regards
>john
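The 5% target can be checked against the clks rows of the load_v2 table. The sketch below recomputes each slowdown from the measured times and prints (slowdown - 1) x count. In those rows the product stays roughly constant (about 1,700), which suggests overhead falls off roughly as 1/count. Under that assumed K/count model, which is an illustration and not something Oprofile computes, the count needed for an overhead target is about K / target. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* clks measurements from the load_v2 table (times in seconds). */
struct run {
    long count;       /* events per sample */
    double base_sec;  /* run time without Oprofile */
    double prof_sec;  /* run time with Oprofile */
};

static const struct run clks_runs[] = {
    {   1000, 410.0, 1098.0 },
    {  10000, 408.0,  476.0 },
    {  50000, 408.0,  423.0 },
    { 100000, 408.0,  415.0 },
};

#define N_RUNS (sizeof clks_runs / sizeof clks_runs[0])

double slowdown(const struct run *r)
{
    assert(r->base_sec > 0.0);
    return r->prof_sec / r->base_sec;
}

/* Mean of (slowdown - 1) * count over the clks runs. If overhead really
 * scales as K / count (an assumption), this estimates K. */
double fitted_k(void)
{
    double sum = 0.0;

    for (size_t i = 0; i < N_RUNS; i++) {
        double s = slowdown(&clks_runs[i]);
        double k = (s - 1.0) * (double)clks_runs[i].count;

        printf("count %6ld: slowdown %.3f, (slowdown-1)*count = %.0f\n",
               clks_runs[i].count, s, k);
        sum += k;
    }
    return sum / (double)N_RUNS;
}

/* Smallest count whose estimated overhead K / count stays within
 * `target` (0.05 means 5%) under the same assumed model. */
long count_for_overhead(double k, double target)
{
    long count;

    assert(k > 0.0 && target > 0.0);
    count = (long)(k / target);
    if ((double)count * target < k)  /* round up so the target is met */
        count++;
    return count;
}
```

With the table's numbers, K comes out near 1,700 and the 5% count near 34,000, so the 50,000 proposal sits on the conservative side of that estimate. Four data points from one benchmark are a thin basis for the fit, so treat it as a rule of thumb only.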
From: John L. <le...@mo...> - 2004-10-15 00:27:21
On Thu, Oct 14, 2004 at 04:48:11PM -0700, Carl Love wrote:
> Since Oprofile uses the minimum count if no value is specified

Hmm, where? This would be very wrong behaviour. And:

    part = next_part(&cp);

    if (!part) {
            fprintf(stderr, "Invalid count for event %s\n", events[i]);
            exit(EXIT_FAILURE);
    }

> The idea behind the change was to be consistent between the IBM

You should be consistent. But the minimum should be just that: a guideline of the safe minimum (there's a bit of guess work, but hey).

There are situations where the user *wants* such overhead (imagine I want lots of data on a specific few instructions in a crypto routine or whatever; I don't care much for overhead there necessarily, but I might appreciate the firehose being on). We shouldn't police that. We should just make it slightly harder for somebody to accidentally make their machine unusable (i.e. unable to stop oprofile again).

> if the two of you desire that. We have data that shows what the
> settings need to be to ensure the system runs and
> what the setting should be to keep the overhead at a reasonable level.

This data would be cool to document somewhere so you can point PPC users at it.

regards
john
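The point about a missing count being an error, with the minimum acting only as a safety floor, can be sketched as follows. This is a minimal, hypothetical illustration: parse_event_spec, its NAME:COUNT input format and its status codes are invented for the example and are not Oprofile's actual parser (only next_part() and the error message appear in the quoted code).

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum spec_status { SPEC_OK, SPEC_NO_COUNT, SPEC_BAD_COUNT, SPEC_BELOW_MIN };

/* Parse a hypothetical "NAME:COUNT[:more fields]" spec into *count.
 * A missing count is reported as an error instead of silently falling
 * back to the minimum; a count below `minimum` is refused as unsafe.
 * No overhead policy is applied beyond that floor. */
enum spec_status parse_event_spec(const char *spec, long minimum, long *count)
{
    const char *colon;
    char *end;
    long value;

    assert(spec != NULL && count != NULL);
    colon = strchr(spec, ':');
    if (colon == NULL || colon[1] == '\0')
        return SPEC_NO_COUNT;

    errno = 0;
    value = strtol(colon + 1, &end, 10);
    /* Reject non-numeric, out-of-range or non-positive counts; a ':' may
     * follow if further fields are present (they are ignored here). */
    if (errno != 0 || end == colon + 1 || (*end != '\0' && *end != ':') || value <= 0)
        return SPEC_BAD_COUNT;

    if (value < minimum)
        return SPEC_BELOW_MIN;

    *count = value;
    return SPEC_OK;
}
```

For example, "CYCLES" yields SPEC_NO_COUNT, while "CYCLES:500" against a minimum of 1000 yields SPEC_BELOW_MIN. A caller would print a message and exit, as the quoted code does.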
From: Maynard P. J. <may...@us...> - 2004-10-15 14:45:20
John,

Carl is out of the office today, so I'll respond for him . . .

John Levon wrote:
> On Thu, Oct 14, 2004 at 04:48:11PM -0700, Carl Love wrote:
>
>>Since Oprofile uses the minimum count if no value is specified
>
> Hmm, where? This would be very wrong behaviour.

I believe Carl was actually referring to the default count used when the user specifies '--event=default'. For all platforms, this default count is 100000, and for most platforms the default event is some cycles-related event. Carl and I had discussed this fact. Right or wrong, I think the 50000 minimum that Carl suggests for cycles-related events was partly based on this default, as well as on the concern about system performance degradation.

> You should be consistent. But the minimum should be just that: a
> guideline of the safe minimum (there's a bit of guess work, but hey).

Yes, and fortunately, a knowledgeable OProfile user can pretty easily change the minimum counts if they wish.

> There are situations where the user *wants* such overhead (imagine I
> want lots of data on a specific few instructions in a crypto routine or
> whatever; I don't care much for overhead there necessarily, but I might
> appreciate the firehose being on). We shouldn't police that. We should
> just make it slightly harder for somebody to accidentally make their
> machine unusable (i.e. unable to stop oprofile again)
>
>>if the two of you desire that. We have data that shows what the
>>settings need to be to ensure the system runs and
>>what the setting should be to keep the overhead at a reasonable level.
>
> This data would be cool to document somewhere so you can point PPC users
> at it.

I think Carl was referring to the data from his earlier note, which came from a benchmark he wrote himself and which was narrowly focused on just a few events. There is no such performance documentation for all PowerPC events that I'm aware of.
The upshot is this: at least for the cycles-related events, a minimum count of 1000 is probably too low. The test results show a 2.7x degradation factor, which could, in some cases, mean a thrashing machine. On the other hand, 50000 may be overkill and doesn't allow for higher-resolution cycle-based profiling (without having to edit the appropriate file). I suggest 10000 as a good compromise. Will, John -- are you OK with that?

> regards
> john

-- 
Thanks,
Maynard
From: John L. <le...@mo...> - 2004-10-15 15:01:41
On Fri, Oct 15, 2004 at 09:44:55AM -0500, Maynard P. Johnson wrote:
> >>Since Oprofile uses the minimum count if no value is specified
> >
> >Hmm, where? This would be very wrong behaviour.
> I believe Carl was actually referring to the default count used when the
> user specifies '--event=default'. For all platforms, this default count
> is 100000. And for most platforms, the default event is some
> cycles-related event. Carl and I had discussed this fact. Right or
> wrong, I think the 50000 minimum that Carl suggests for cycles-related
> events was partly based on this default, as well as the concern for
> system performance degradation.

OK, but this is a very different value, and it is set differently from the minimum: tags in the event files.

> >You should be consistent. But the minimum should be just that: a
> >guideline of the safe minimum (there's a bit of guess work, but hey).
> Yes, and fortunately, a knowledgeable OProfile user can pretty easily
> change the minimum counts if they wish.

This was part of the motivation for event files in the first place.

> The upshot is this: At least for the cycles-related events, a minimum
> count of 1000 is probably too low. The test results showing a 2.7
> degradation factor could, in some cases, result in a thrashing machine.
> On the other hand, 50000 may be overkill and doesn't allow for higher
> resolution cycle-based profiling (without having to edit the appropriate
> file). I suggest 10000 for a good compromise. Will, John -- are you OK
> with that?

Sounds fine to me

john
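The distinction between the default count and the per-event minimum: tag can be made concrete with a small parser for that tag. This is a hypothetical sketch that assumes an event-file line made of space-separated tag:value fields, such as "event:0x1 counters:0 um:zero minimum:10000 name:CYCLES : Processor cycles". The layout, event code and name are illustrative assumptions, not copied from the 970 patch, and event_line_minimum is an invented helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Return the value of the "minimum:" tag in an event-file line, or -1 if
 * the tag is absent or malformed. The line layout is an assumption. */
long event_line_minimum(const char *line)
{
    const char *tag = "minimum:";
    size_t tag_len = strlen(tag);
    const char *p = line;

    assert(line != NULL);
    while ((p = strstr(p, tag)) != NULL) {
        /* Only accept the tag at the start of the line or after a space,
         * so that something like "xminimum:" is not mistaken for it. */
        if (p == line || p[-1] == ' ') {
            char *end;
            long value = strtol(p + tag_len, &end, 10);

            if (end != p + tag_len && (*end == ' ' || *end == '\0') && value > 0)
                return value;
            return -1;
        }
        p += tag_len;
    }
    return -1;
}
```

A default count, such as the 100000 used with '--event=default', is a separate setting and would not be read from this tag.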
From: William C. <wc...@nc...> - 2004-10-15 17:14:14
John Levon wrote:
> On Fri, Oct 15, 2004 at 09:44:55AM -0500, Maynard P. Johnson wrote:
>
>>>>Since Oprofile uses the minimum count if no value is specified
>>>
>>>Hmm, where? This would be very wrong behaviour.
>>
>>I believe Carl was actually referring to the default count used when the
>>user specifies '--event=default'. For all platforms, this default count
>>is 100000. And for most platforms, the default event is some
>>cycles-related event. Carl and I had discussed this fact. Right or
>>wrong, I think the 50000 minimum that Carl suggests for cycles-related
>>events was partly based on this default, as well as the concern for
>>system performance degradation.
>
> OK, but this is a very different value, and is set differently from the
> minimum: tags in the event files.
>
>>>You should be consistent. But the minimum should be just that: a
>>>guideline of the safe minimum (there's a bit of guess work, but hey).
>>
>>Yes, and fortunately, a knowledgeable OProfile user can pretty easily
>>change the minimum counts if they wish.
>
> This was part of the motivation for event files in the first place.
>
>>The upshot is this: At least for the cycles-related events, a minimum
>>count of 1000 is probably too low. The test results showing a 2.7
>>degradation factor could, in some cases, result in a thrashing machine.
>>On the other hand, 50000 may be overkill and doesn't allow for higher
>>resolution cycle-based profiling (without having to edit the appropriate
>>file). I suggest 10000 for a good compromise. Will, John -- are you OK
>>with that?

I am okay with 10000 for the cycles. Should that also be the minimum for the instruction-retired events? I saw it set to 2000 in a number of places.

The main purpose of the minimum values in the event file is to provide some safety by preventing the event counts from being set so low that the processor gets stuck in the interrupt routine. High overhead is not an issue.

-Will

> Sounds fine to me
>
> john
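The safety concern above is ultimately about interrupt rate: a cycles event with count N overflows once every N cycles, so it raises (clock frequency / N) sampling interrupts per second. The sketch below makes that arithmetic concrete. The 1.65 GHz clock used in the example is purely an assumption for illustration; the thread does not state the test machine's frequency, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* A cycles event overflows once every `count` cycles, so at clock rate
 * `hz` it raises hz / count sampling interrupts per second. */
double interrupts_per_second(double hz, long count)
{
    assert(hz > 0.0 && count > 0);
    return hz / (double)count;
}

/* Print the interrupt rate for the cycles counts discussed in the thread. */
void print_interrupt_rates(double hz)
{
    const long counts[] = { 1000, 10000, 50000, 100000 };

    for (size_t i = 0; i < sizeof counts / sizeof counts[0]; i++)
        printf("count %6ld -> %9.0f interrupts/s\n",
               counts[i], interrupts_per_second(hz, counts[i]));
}
```

At an assumed 1.65 GHz, a count of 1000 means 1.65 million interrupts per second, versus 165,000 at 10000. Whether a given rate actually wedges the machine depends on how long each interrupt takes to service, which this sketch does not model.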
From: Maynard P. J. <may...@us...> - 2004-10-15 17:29:14
William Cohen wrote:
> John Levon wrote:
>
>> On Fri, Oct 15, 2004 at 09:44:55AM -0500, Maynard P. Johnson wrote:
>>
>>> The upshot is this: At least for the cycles-related events, a
>>> minimum count of 1000 is probably too low. The test results showing
>>> a 2.7 degradation factor could, in some cases, result in a thrashing
>>> machine. On the other hand, 50000 may be overkill and doesn't allow
>>> for higher resolution cycle-based profiling (without having to edit
>>> the appropriate file). I suggest 10000 for a good compromise. Will,
>>> John -- are you OK with that?
>
> I am okay with 10000 for the cycles. Should that also be the minimum for
> the instruction retired events too? I saw it set to 2000 in a number of
> places.
>
Yes, that was my intent. Looks like I missed two of them. Do you want me to resend the patch, or do you want to just change those two counts to '10000'? Thanks.

> -Will
>
>>
>> john

-- 
Maynard
From: William C. <wc...@nc...> - 2004-10-15 17:42:52
Maynard P. Johnson wrote:
> William Cohen wrote:
>> I am okay with 10000 for the cycles. Should that also be the minimum
>> for the instruction retired events too? I saw it set to 2000 in a
>> number of places.
>>
> Yes, that was my intent. Looks like I missed two of them. Do you want
> me to resend the patch or do you want to just change those two counts to
> '10000'? Thanks.

Not a big deal; I can correct these.

-Will
From: Maynard P. J. <may...@us...> - 2004-10-15 18:03:22
William Cohen wrote:
> Maynard P. Johnson wrote:
>
>> William Cohen wrote:
>
>>> I am okay with 10000 for the cycles. Should that also be the minimum
>>> for the instruction retired events too? I saw it set to 2000 in a
>>> number of places.
>>>
>> Yes, that was my intent. Looks like I missed two of them. Do you
>> want me to resend the patch or do you want to just change those two
>> counts to '10000'? Thanks.
>
> Not a big deal I can correct these. -Will

Thanks, Will!

> _______________________________________________
> oprofile-list mailing list
> opr...@li...
> https://lists.sourceforge.net/lists/listinfo/oprofile-list

-- 
Maynard Johnson
From: Maynard P. J. <may...@us...> - 2004-10-15 17:14:40
Attachments:
ppc970-patch-for-oprofCVS-10.14.04-v2
John Levon wrote:
> On Fri, Oct 15, 2004 at 09:44:55AM -0500, Maynard P. Johnson wrote:
>
>>The upshot is this: At least for the cycles-related events, a minimum
>>count of 1000 is probably too low. The test results showing a 2.7
>>degradation factor could, in some cases, result in a thrashing machine.
>>On the other hand, 50000 may be overkill and doesn't allow for higher
>>resolution cycle-based profiling (without having to edit the appropriate
>>file). I suggest 10000 for a good compromise. Will, John -- are you OK
>>with that?
>
> Sounds fine to me
>
> john
>
OK, the patch is attached with the minimum counts updated as described above.

P.S. John, I have a doc patch ready to be contributed, too, but I've been waiting until the 970 support is accepted, since the doc patch mentions all three PowerPC platforms we're supporting. So look for that to be contributed as soon as the 970 patch is committed.

Thanks!

-- 
Maynard
From: John L. <le...@mo...> - 2004-10-15 17:21:46
On Fri, Oct 15, 2004 at 12:13:22PM -0500, Maynard P. Johnson wrote:
> diff -paurNX ../diff_fileExclusionFilter oprofile/ChangeLog ../oprof-cvs-10.14.04-970Patched/oprofile/ChangeLog
> --- oprofile/ChangeLog 2004-10-13 20:50:26.000000000 -0500
> +++ ../oprof-cvs-10.14.04-970Patched/oprofile/ChangeLog 2004-10-14 16:12:40.166945264 -0500
> @@ -1,3 +1,12 @@
> +2004-10-14 Maynard Johnson <may...@us...>
> +
> + * events/Makefile.am
> + * libop/op_cpu_type.c
> + * libop/op_cpu_type.h
> + * libop/op_events.c
> + * utils/op_help.c
> + * utils/opcontrol

You forgot to mention what's changed :) Also, the format has trailing colons for each file.

john
From: William C. <wc...@nc...> - 2004-10-15 17:41:57
John Levon wrote:
> On Fri, Oct 15, 2004 at 12:13:22PM -0500, Maynard P. Johnson wrote:
>
>>diff -paurNX ../diff_fileExclusionFilter oprofile/ChangeLog ../oprof-cvs-10.14.04-970Patched/oprofile/ChangeLog
>>--- oprofile/ChangeLog 2004-10-13 20:50:26.000000000 -0500
>>+++ ../oprof-cvs-10.14.04-970Patched/oprofile/ChangeLog 2004-10-14 16:12:40.166945264 -0500
>>@@ -1,3 +1,12 @@
>>+2004-10-14 Maynard Johnson <may...@us...>
>>+
>>+ * events/Makefile.am
>>+ * libop/op_cpu_type.c
>>+ * libop/op_cpu_type.h
>>+ * libop/op_events.c
>>+ * utils/op_help.c
>>+ * utils/opcontrol
>
> You forgot to mention what's changed :) Also, the format has trailing
> colons for each file.
>
> john

I corrected that in the checkin.

-Will