From: John L. <le...@mo...> - 2002-08-15 00:15:43
|
On Thu, Aug 15, 2002 at 12:13:44AM +0100, Dave Jones wrote:

> This is where Andi and I disagree.
> Andi keeps old code around 'in case it's needed later'.
> I'm of the opinion 'rip it out, and re-add as necessary'.

Well, you're lucky, I agree with the latter too. Keep an eye on the
future, sure, but dead code is untested, distracting, pointless code.

regards
john

--
"It is unbecoming for young men to utter maxims." - Aristotle |
From: Philippe E. <ph...@wa...> - 2002-08-14 23:32:46
|
Dave Jones wrote:

>Things are taking shape with the hammer oprofile port.
>It however, still doesn't work due to some issue with the
>NMI handler (I think).
>

You don't receive NMIs at all? You have probably already checked this,
but your NMI handler does swapgs in the same way as the irq entry in
entry.S:

+       testl $3, 8(%rsp)
+       je 1f
+       swapgs
+1:     cld

but the NMI handler of 2.5.31 uses a different way:

        /* NMI could happen inside the critical section of a swapgs,
           so it is needed to use this expensive way to check. */
        movl $MSR_GS_BASE,%ecx
        rdmsr
        xorl %ebx,%ebx
        testl %edx,%edx
        js 1f
        swapgs

Phil |
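The difference between the two checks quoted above can be modelled in plain C. This is only a sketch for illustration, not kernel code: the function names are made up, and it assumes that 8(%rsp) holds the saved CS selector and that %edx holds the high half of MSR_GS_BASE after rdmsr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * IRQ-style check (testl $3, 8(%rsp); je 1f; swapgs):
 * swap GS only when the low two bits of the saved CS selector are
 * non-zero, i.e. the interrupt arrived from a non-kernel privilege level.
 */
static bool irq_style_needs_swapgs(uint64_t saved_cs)
{
    return (saved_cs & 3) != 0;
}

/*
 * NMI-style check (rdmsr; testl %edx,%edx; js 1f; swapgs):
 * skip the swap when the sign bit of the GS base is set, which is
 * taken to mean a kernel address is already loaded.
 */
static bool nmi_style_needs_swapgs(uint64_t gs_base)
{
    return (gs_base >> 63) == 0;
}
```

The interesting case is an NMI that lands in kernel mode before the kernel's own swapgs has run: the CS-based test says "no swap" while the GS base is still a user value, and only the MSR-based test catches that.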
From: Dave J. <da...@co...> - 2002-08-14 23:11:15
|
On Wed, Aug 14, 2002 at 09:44:28PM +0100, John Levon wrote:

> > while for X86_VENDOR_AMD everything but hammer returns equivalent to
> > CPU_NO_GOOD. Shouldn't it return CPU_RTC for X86_VENDOR_AMD that are
> > non-hammer?
>
> is there anything that is x86-64 architecture but isn't AMD/Hammer ?
> :)

There is the much fabled Intel skunkworks project that will produce
a hammer clone if it proves successful enough.

This is where Andi and I disagree.
Andi keeps old code around 'in case it's needed later'.
I'm of the opinion 'rip it out, and re-add as necessary'.

Transmeta have also apparently licensed x86-64 technology, so at some
point there could be various clones, but I don't believe half-arsed
support for things that don't exist yet is really worth it.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk |
From: Dave J. <da...@co...> - 2002-08-14 23:04:55
|
On Wed, Aug 14, 2002 at 02:37:35PM -0400, William Cohen wrote:

> For op_cpu get_cpu_type(void) could be improved. A hard coding of "-1"
> shouldn't be used. If the value is invalid it should be CPU_NO_GOOD.
> Also it seems odd that for non X86_VENDOR_AMD everything is CPU_RTC,
> while for X86_VENDOR_AMD everything but hammer returns equivalent to
> CPU_NO_GOOD. Shouldn't it return CPU_RTC for X86_VENDOR_AMD that are
> non-hammer?

Hmm, point.

> For op_events.c why repeat the Athlon entries for the hammer? If
> the events are identical, why not create a combined entry:

Example purposes only. The hammer events are different, I'm just
not sure I can disclose them yet.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs |
From: William C. <wc...@nc...> - 2002-08-14 21:38:45
|
John Levon wrote:

> On Wed, Aug 14, 2002 at 02:37:35PM -0400, William Cohen wrote:
>
>>For op_cpu get_cpu_type(void) could be improved. A hard coding of "-1"
>>shouldn't be used. If the value is invalid it should be CPU_NO_GOOD.
>
> right
>
>>Also it seems odd that for non X86_VENDOR_AMD everything is CPU_RTC,
>>while for X86_VENDOR_AMD everything but hammer returns equivalent to
>>CPU_NO_GOOD. Shouldn't it return CPU_RTC for X86_VENDOR_AMD that are
>>non-hammer?
>
> is there anything that is x86-64 architecture but isn't AMD/Hammer ?
> :)

I am not sure there are non-Hammer x86-64 processors. However, it is
possible, and likely, that Clawhammer and SledgeHammer have different
values for cpuid. I don't know if the things the performance monitoring
hardware measures are different between the processors.

I know that the Itanium and Itanium 2 performance monitoring hardware
have different restrictions and event codes, which require knowing which
processor implementation is being used, e.g. 32-bit counters on Itanium
vs. 48-bit counters on Itanium 2, and restrictions on which registers can
measure which events. Plus, there is the possibility of future different
implementations. You can look at Intel documents 245320-003 (Itanium)
and 251110-001 (Itanium 2) for the gory details.

Dave, could you get some clarification from AMD whether there are
differences between the Hammer implementations that the performance
monitoring software needs to know about? Is AMD going to be smart about
this and make sure that newer processors have performance monitoring
hardware that is a pure superset of the older hardware? The differences
between Itanium and Itanium 2 performance monitoring hardware make this
impossible.

-Will |
From: John L. <le...@mo...> - 2002-08-14 20:47:52
|
On Wed, Aug 14, 2002 at 02:37:35PM -0400, William Cohen wrote:

> For op_cpu get_cpu_type(void) could be improved. A hard coding of "-1"
> shouldn't be used. If the value is invalid it should be CPU_NO_GOOD.

right

> Also it seems odd that for non X86_VENDOR_AMD everything is CPU_RTC,
> while for X86_VENDOR_AMD everything but hammer returns equivalent to
> CPU_NO_GOOD. Shouldn't it return CPU_RTC for X86_VENDOR_AMD that are
> non-hammer?

is there anything that is x86-64 architecture but isn't AMD/Hammer ?
:)

regards
john |
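The two points agreed on above (a named constant instead of a bare -1, and an RTC fallback for non-Hammer AMD parts) might look roughly like this. It is a sketch only: the enum values, the vendor enum and the function signature are assumptions made for illustration and are not taken from the patch under review.

```c
#include <assert.h>

/* Hypothetical CPU types; only the names CPU_NO_GOOD and CPU_RTC come from the thread. */
enum sketch_cpu_type { CPU_NO_GOOD = -1, CPU_RTC = 0, CPU_HAMMER = 1 };

/* Hypothetical vendor codes standing in for the kernel's X86_VENDOR_* values. */
enum sketch_vendor { SKETCH_VENDOR_UNKNOWN, SKETCH_VENDOR_INTEL, SKETCH_VENDOR_AMD };

/*
 * Return a named constant for every outcome:
 * - unknown vendor: CPU_NO_GOOD rather than a hard-coded -1
 * - AMD Hammer:     CPU_HAMMER
 * - anything else (including non-Hammer AMD): fall back to CPU_RTC
 */
static enum sketch_cpu_type sketch_get_cpu_type(enum sketch_vendor vendor, int is_hammer)
{
    assert(is_hammer == 0 || is_hammer == 1);

    if (vendor == SKETCH_VENDOR_UNKNOWN)
        return CPU_NO_GOOD;
    if (vendor == SKETCH_VENDOR_AMD && is_hammer)
        return CPU_HAMMER;
    return CPU_RTC;
}
```

Whether an unrecognised vendor should really map to CPU_NO_GOOD or also to CPU_RTC is a policy choice the thread leaves open.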
From: William C. <wc...@nc...> - 2002-08-14 19:21:05
|
For op_cpu get_cpu_type(void) could be improved. A hard coding of "-1"
shouldn't be used. If the value is invalid it should be CPU_NO_GOOD.
Also it seems odd that for non X86_VENDOR_AMD everything is CPU_RTC,
while for X86_VENDOR_AMD everything but hammer returns equivalent to
CPU_NO_GOOD. Shouldn't it return CPU_RTC for X86_VENDOR_AMD that are
non-hammer?

For op_events.c why repeat the Athlon entries for the hammer? If
the events are identical, why not create a combined entry:

#define OP_ATHLON_HAMMER (OP_ATHLON | OP_HAMMER)

And then have entries of the form:

{ CTR_ALL, OP_ATHLON_HAMMER, 0xc0, 0, "RETIRED_INSNS", 3000,},

This would reduce the number of entries in the table and decrease
redundancy.

-Will

Dave Jones wrote:

> Things are taking shape with the hammer oprofile port.
> It however, still doesn't work due to some issue with the
> NMI handler (I think).
>
> I'd appreciate comments on how I've gone about this so far..
>
> In short, I did the following so far..
>
> 1, copy module/x86 to module/x86-64
> 2, set up configuration to autodetect hammer and build
>    correct module
> 3, fix up obvious 64 bit problems highlighted by the compiler
> 4, rewrite NMI handler
> 5, rewrite syscall handlers
> 6, add wrappers for things like regs->eip accesses.
>
> do a diff -urN module/x86 module/x86-64 to see the real meat..
>
> Parts of oprofile likely still need auditing and fixing for 64 bit
> issues, but this is where we stand today.. a 50KB diff..
>
> Interested parties can find it at http://www.codemonkey.org.uk/opk8.diff
>
> Again, it doesn't work yet, but it's getting there. After it's working
> I'll go about stripping out some of the cruft that came over from x86/
> that isn't relevant.
>
> Dave
>
> |
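The combined-mask idea suggested above can be sketched as a small table lookup. Everything here other than the names OP_ATHLON, OP_HAMMER, OP_ATHLON_HAMMER and the RETIRED_INSNS row is hypothetical: the bit values, the cut-down struct and the second table row are invented for illustration and do not reflect the real op_events.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical CPU-type bits; the real definitions live in oprofile's headers. */
enum {
    OP_ATHLON = 1 << 0,
    OP_HAMMER = 1 << 1,
    OP_ATHLON_HAMMER = OP_ATHLON | OP_HAMMER
};

/* Cut-down event descriptor: only the fields this sketch needs. */
struct sketch_event {
    unsigned cpu_mask;   /* which CPU types provide the event */
    unsigned char code;  /* hardware event select value */
    const char *name;
    int min_count;
};

static const struct sketch_event sketch_events[] = {
    /* One shared row instead of two identical per-CPU rows. */
    { OP_ATHLON_HAMMER, 0xc0, "RETIRED_INSNS", 3000 },
    /* Made-up row, present only to show a CPU-specific entry. */
    { OP_ATHLON, 0x01, "EXAMPLE_ATHLON_ONLY", 3000 },
};

/* Return the named event if the given CPU type supports it, else NULL. */
static const struct sketch_event *find_event(const char *name, unsigned cpu_type)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof(sketch_events) / sizeof(sketch_events[0]); i++) {
        const struct sketch_event *ev = &sketch_events[i];
        if ((ev->cpu_mask & cpu_type) && strcmp(ev->name, name) == 0)
            return ev;
    }
    return NULL;
}
```

A lookup with OP_HAMMER finds the shared row but not the Athlon-only one, which is the whole saving: rows that differ between the two CPUs stay separate, identical rows are written once.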
From: William C. <wc...@nc...> - 2002-08-14 18:57:39
|
Here is a minor patch that will get rid of a couple of the cast
operations in op_fileio.c.

2002-08-14  William Cohen  <wc...@nc...>

        * libutil/op_fileio.c (op_read_file): Change fprintf specifier.
        (op_write_file): Ditto.

-Will

John Levon wrote:

> On Wed, Aug 14, 2002 at 05:52:38PM +0100, Dave Jones wrote:
>
>>Things are taking shape with the hammer oprofile port.
>>It however, still doesn't work due to some issue with the
>>NMI handler (I think).
>>
>>I'd appreciate comments on how I've gone about this so far..
>
> I had a quick browse and only have nitpicky comments (can't we pick the
> right % instead of those casts for the printfs, can't we have a
> ip_value(regs) macro instead of the IP_REG thing, etc.)
>
> What does your celeron comment refer to btw ? My box seems to work OK :)
>
>>Again, it doesn't work yet, but it's getting there. After it's working
>>I'll go about stripping out some of the cruft that came over from x86/
>>that isn't relevant.
>
> OK. Have you tried a kernel patch to hook into the nmi handler ? This
> would check if it's the set_gate that's at fault etc.
>
> regards
> john
>
> |
From: Dave J. <da...@co...> - 2002-08-14 17:36:03
|
On Wed, Aug 14, 2002 at 06:21:12PM +0100, John Levon wrote:

> I had a quick browse and only have nitpicky comments (can't we pick the
> right % instead of those casts for the printfs, can't we have a
> ip_value(regs) macro instead of the IP_REG thing, etc.)

Noted.

> What does your celeron comment refer to btw ? My box seems to work OK :)

I wasn't sure it had been tested. There was something else too
when I wrote that TODO entry, but I'm at a loss to remember it right now.

> OK. Have you tried a kernel patch to hook into the nmi handler ? This
> would check if it's the set_gate that's at fault etc.

I've been fiddling with it a little this afternoon. I don't trust
the set_gate magic (If I don't get anywhere with this problem I'll
wait until Andi looks it over -- he wrote the one I 'borrowed' from
the x86-64 kernel).

Oh something I also neglected to mention..
Support for hammer in 32 bit mode is also needed at some point.
I've not tried yet, but hopefully the existing nmi handler should
work as is. It just needs the CPU recognition added and the events
updated. Low hanging fruit...

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs |
From: John L. <le...@mo...> - 2002-08-14 17:24:21
|
On Wed, Aug 14, 2002 at 05:52:38PM +0100, Dave Jones wrote:

> Things are taking shape with the hammer oprofile port.
> It however, still doesn't work due to some issue with the
> NMI handler (I think).
>
> I'd appreciate comments on how I've gone about this so far..

I had a quick browse and only have nitpicky comments (can't we pick the
right % instead of those casts for the printfs, can't we have a
ip_value(regs) macro instead of the IP_REG thing, etc.)

What does your celeron comment refer to btw ? My box seems to work OK :)

> Again, it doesn't work yet, but it's getting there. After it's working
> I'll go about stripping out some of the cruft that came over from x86/
> that isn't relevant.

OK. Have you tried a kernel patch to hook into the nmi handler ? This
would check if it's the set_gate that's at fault etc.

regards
john

--
"It is unbecoming for young men to utter maxims." - Aristotle |
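The two nitpicks above (an ip_value(regs) accessor, and the right printf conversion instead of casts) can be illustrated together. This is a hedged sketch and not the code from the diff: struct sketch_regs, ip_t, PRI_IP and format_sample are invented names, and the real struct pt_regs has many more fields.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pt_regs: one field, named per architecture. */
#if defined(__x86_64__)
typedef uint64_t ip_t;
#define PRI_IP PRIx64
struct sketch_regs { ip_t rip; };
#define ip_value(regs) ((regs)->rip)
#else
typedef uint32_t ip_t;
#define PRI_IP PRIx32
struct sketch_regs { ip_t eip; };
#define ip_value(regs) ((regs)->eip)
#endif

/*
 * Format a sample without casts: PRI_IP always matches ip_t,
 * and %zu is the standard conversion for size_t.
 * Returns what snprintf returns (the length the full string needs).
 */
static int format_sample(char *buf, size_t len,
                         const struct sketch_regs *regs, size_t nbytes)
{
    assert(buf != NULL && regs != NULL);
    return snprintf(buf, len, "ip 0x%" PRI_IP ", %zu bytes",
                    ip_value(regs), nbytes);
}
```

Callers only ever write ip_value(regs), so the eip/rip difference stays inside one header, and the format string changes with the type instead of the argument being cast to fit the format.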
From: John L. <le...@mo...> - 2002-08-14 17:06:11
|
On Mon, Aug 12, 2002 at 05:55:01PM +0200, Philippe Elie wrote:

> so 2 entry per cacheline is better, for bz2 test overhead is even less than
> the original oprofile. The problem now is to decide if you apply the new
> layout format for all kernel version or only when really needed, I prefer
> the first solution
>
> pros:
>
> - on UP it doesn't pessimize overhead
>
> - no multiple ABI between module/daemon, actually only the module is
>   compiled using the intended kernel target, adding a dependency of
>   daemon on the target kernel means: you compile for 2.5.31 make install
>   reboot, use the module all things work, you reboot to your usual kernel,
>   try oprofile and dae receives completely wrong data.

I agree, let's keep it simple.

> Index: dae/opd_kernel.c
> ===================================================================
> RCS file: /cvsroot/oprofile/oprofile/dae/opd_kernel.c,v

Looks OK modulo the cleanups we discussed.

regards
john |
From: Dave J. <da...@co...> - 2002-08-14 16:50:14
|
Things are taking shape with the hammer oprofile port.
It however, still doesn't work due to some issue with the
NMI handler (I think).

I'd appreciate comments on how I've gone about this so far..

In short, I did the following so far..

1, copy module/x86 to module/x86-64
2, set up configuration to autodetect hammer and build
   correct module
3, fix up obvious 64 bit problems highlighted by the compiler
4, rewrite NMI handler
5, rewrite syscall handlers
6, add wrappers for things like regs->eip accesses.

do a diff -urN module/x86 module/x86-64 to see the real meat..

Parts of oprofile likely still need auditing and fixing for 64 bit
issues, but this is where we stand today.. a 50KB diff..

Interested parties can find it at http://www.codemonkey.org.uk/opk8.diff

Again, it doesn't work yet, but it's getting there. After it's working
I'll go about stripping out some of the cruft that came over from x86/
that isn't relevant.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs |
From: John H. <joh...@ce...> - 2002-08-13 16:19:59
|
On Tue, 2002-08-13 at 18:04, John Levon wrote:

> On Tue, Aug 13, 2002 at 06:01:59PM +0200, John Hearns wrote:
>
> > If we would like to use op_to_source,
> > does it make sense to run this whilst the program is executing,
> > or should we wait until execution is over?
>
> There is no need to wait, as long as there is a sample file for the
> binary (i.e. you might need to run op_dump first). The profiler does not
> affect the running of binaries at all.
>

Thanks. We're running op_dump every 30 seconds using the 'watch'
command, so that is certainly being done. |
From: John L. <le...@mo...> - 2002-08-13 16:08:01
|
On Tue, Aug 13, 2002 at 06:01:59PM +0200, John Hearns wrote:

> If we would like to use op_to_source,
> does it make sense to run this whilst the program is executing,
> or should we wait until execution is over?

There is no need to wait, as long as there is a sample file for the
binary (i.e. you might need to run op_dump first). The profiler does not
affect the running of binaries at all.

regards
john

--
"It is unbecoming for young men to utter maxims." - Aristotle |
From: John H. <joh...@ce...> - 2002-08-13 16:02:11
|
I do apologise for asking a stupid question.

If we would like to use op_to_source,
does it make sense to run this whilst the program is executing,
or should we wait until execution is over?

John Hearns |
From: William C. <wc...@nc...> - 2002-08-13 02:40:39
|
Cool. I have also started to make some patches to make oprofile work
better on 64-bit platforms.

-Will

Dave Jones wrote:

> On Sat, Aug 03, 2002 at 12:10:06AM +0100, John Levon wrote:
> > > I have been looking through the oprofile data structures and it looks
> > > like a lot of it is coded to assume the addresses all fit in 32-bits.
> > > Are there any thoughts on making oprofile 64-bit clean? Maybe have a
> > > special typedef for eip rather than just using u32?
> >
> > I imagine Dave and Bob will have patches in this area
>
> Yep, I have the beginnings of an x86-64 port at home which does
> clean up usage of eip and so on (x86-64 uses rip instead of eip).
> I'll commit the 'obvious' bits when I get back home.
>
> Dave
>
> |
From: Philippe E. <ph...@wa...> - 2002-08-12 15:51:36
|
Sorry, the last two mails on this subject were wrong: the patch for
2 entries per 1 cacheline was incorrect.

patch1.diff --> 5 entry per 2 cacheline
patch2.diff --> 2 entry per 1 cacheline

timing order is real/user/system

# bz2-test.20020811-141811.out (3 run)
1 374.60 (0.00%) 370.85 (0.00%) 3.64 (0.00%)   no profiling

# bz2-test-300000.20020811-141811.out (3 run)
2 375.11 (0.14%) 371.24 (0.11%) 3.81 (4.67%)   original
2 375.26 (0.21%) 371.75 (0.25%) 3.41 (-4.21%)  5 entry per 2 cacheline
2 375.11 (0.18%) 371.53 (0.16%) 3.51 (2.33%)   2 entry per 1 cacheline

# bz2-test-100000.20020811-141811.out (3 run)
3 379.42 (1.29%) 375.84 (1.35%) 3.53 (-3.02%)  original
3 379.53 (1.35%) 375.59 (1.28%) 3.88 (8.99%)   5 entry per 2 cacheline
3 379.00 (1.22%) 375.23 (1.16%) 3.66 (6.71%)   2 entry per 1 cacheline

# bz2-test-25000.20020811-141811.out (3 run)
4 389.13 (3.88%) 385.23 (3.88%) 3.84 (5.49%)   original
4 388.98 (3.88%) 385.36 (3.92%) 3.56 (-0.00%)  5 entry per 2 cacheline
4 387.70 (3.55%) 384.20 (3.58%) 3.44 (0.29%)   2 entry per 1 cacheline

***

# kernel-compile.20020811-150034.out (3 run)
1 529.66 (0.00%) 492.14 (0.00%) 37.50 (0.00%)  no profiling

# kernel-compile-300000.20020811-150034.out (3 run)
2 535.54 (1.11%) 496.75 (0.94%) 37.09 (-1.09%) original
2 535.74 (1.20%) 497.26 (0.81%) 36.81 (1.88%)  5 entry per 2 cacheline
2 536.12 (1.13%) 498.52 (1.17%) 35.76 (-4.31%) 2 entry per 1 cacheline

# kernel-compile-100000.20020811-150034.out (3 run)
3 542.21 (2.37%) 501.12 (1.82%) 36.79 (-1.89%) original
3 543.51 (2.67%) 502.14 (1.80%) 37.03 (2.49%)  5 entry per 2 cacheline
3 543.32 (2.49%) 501.35 (1.74%) 37.42 (0.13%)  2 entry per 1 cacheline

# kernel-compile-25000.20020811-150034.out (3 run)
4 571.24 (7.85%) 519.83 (5.63%) 38.75 (3.33%)  original
4 573.90 (8.41%) 522.67 (5.96%) 38.31 (6.03%)  5 entry per 2 cacheline
4 571.43 (7.79%) 518.74 (5.27%) 39.46 (5.59%)  2 entry per 1 cacheline

So 2 entries per cacheline is better; for the bz2 test the overhead is
even less than with the original oprofile. The problem now is to decide
whether to apply the new layout format for all kernel versions or only
when really needed. I prefer the first solution.

pros:

- on UP it doesn't pessimize overhead

- no multiple ABI between module/daemon. Currently only the module is
  compiled using the intended kernel target. Adding a dependency of the
  daemon on the target kernel means: you compile for 2.5.31, make
  install, reboot, use the module and all things work; you reboot to
  your usual kernel, try oprofile and dae receives completely wrong
  data.

- this layout is probably easier to use for 64 bit archs.

regards,
Phil |
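The "entries per cacheline" figures compared above follow from simple size arithmetic, sketched below. The two struct layouts mirror the before/after definitions of struct op_sample in the posted diff, but without the packed/aligned attributes; the helper name and the 32-byte cache line are assumptions (32 bytes is inferred from "5 entry per 2 cacheline", it is not stated in the thread).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout before the patch: counter number folded into the high bits of count. */
struct op_sample_old {
    uint16_t count;
    uint16_t pid;
    uint32_t eip;
};

/* Layout after the patch: separate counter field and a 32-bit pid. */
struct op_sample_new {
    uint16_t count;
    uint16_t counter;
    uint32_t pid;
    uint32_t eip;
};

/* Whole samples that fit in nr_lines cache lines of line_bytes bytes each. */
static size_t samples_per_lines(size_t sample_bytes, size_t line_bytes,
                                size_t nr_lines)
{
    assert(sample_bytes > 0);
    return (line_bytes * nr_lines) / sample_bytes;
}
```

With a 12-byte sample and 32-byte lines, one line holds 2 samples (8 bytes unused) and two lines hold 5 (4 bytes unused), which matches the two variants being timed; the old 8-byte sample packs 4 per line with nothing wasted.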
From: Philippe E. <ph...@wa...> - 2002-08-12 15:45:15
|
Philippe Elie wrote: > patch1.diff --> 5 entry per 2 cacheline > patch2.diff --> 2 entry per 1 cacheline > > timing order are real/user/system > > # bz2-test.20020811-141811.out (3 run) > 1 374.60 (0.00%) 370.85 (0.00%) 3.64 (0.00%) no profiling > > # bz2-test-300000.20020811-141811.out (3 run) > 2 375.11 (0.14%) 371.24 (0.11%) 3.81 (4.67%) original > 2 375.26 (0.21%) 371.75 (0.25%) 3.41 (-4.21%) 5 entry per 2 cacheline > 2 374.99 (0.15%) 371.14 (0.05%) 3.78 (13.86%) 2 entry per 1 cacheline > > # bz2-test-100000.20020811-141811.out (3 run) > 3 379.42 (1.29%) 375.84 (1.35%) 3.53 (-3.02%) original > 3 379.53 (1.35%) 375.59 (1.28%) 3.88 (8.99%) 5 entry per 2 cacheline > 3 378.34 (1.05%) 374.90 (1.06%) 3.40 (2.41%) 2 entry per 1 cacheline > > # bz2-test-25000.20020811-141811.out (3 run) > 4 389.13 (3.88%) 385.23 (3.88%) 3.84 (5.49%) original > 4 388.98 (3.88%) 385.36 (3.92%) 3.56 (-0.00%) 5 entry per 2 cacheline > 4 388.11 (3.66%) 384.30 (3.60%) 3.73 (12.35%) 2 entry per 1 cacheline > > *** > > # kernel-compile.20020811-150034.out (3 run) > 1 529.66 (0.00%) 492.14 (0.00%) 37.50 (0.00%) no profiling > > # kernel-compile-300000.20020811-150034.out (3 run) > 2 535.54 (1.11%) 496.75 (0.94%) 37.09 (-1.09%) original > 2 535.74 (1.20%) 497.26 (0.81%) 36.81 (1.88%) 5 entry per 2 cacheline > 2 535.92 (1.10%) 497.72 (0.98%) 36.39 (-2.10%) 2 entry per 1 cacheline > > # kernel-compile-100000.20020811-150034.out (3 run) > 3 542.21 (2.37%) 501.12 (1.82%) 36.79 (-1.89%) original > 3 543.51 (2.67%) 502.14 (1.80%) 37.03 (2.49%) 5 entry per 2 cacheline > 3 543.48 (2.53%) 502.53 (1.95%) 36.44 (-1.96%) 2 entry per 1 cacheline > > # kernel-compile-25000.20020811-150034.out (3 run) > 4 571.24 (7.85%) 519.83 (5.63%) 38.75 (3.33%) original > 4 573.90 (8.41%) 522.67 (5.96%) 38.31 (6.03%) 5 entry per 2 cacheline > 4 573.15 (8.13%) 521.19 (5.74%) 38.48 (3.52%) 2 entry per 1 cacheline > > so 2 entry per cacheline is better, for bz2 test overhead is even less > than > the original oprofile. 
The problem now is to decide if you apply the new > layout format for all kernel version or only when really needed, I prefer > the first solution > > cons: > > - pessimize slighlty the overhead when uneeded > > pros: > > - no multiple ABI between module/daemon, actually only the module is > compiled using the intended kernel target, adding a dependencies of > daemon on the target kernel means: you compile for 2.5.31 make install > reboot, use the module all things work, you reboot to your usual kernel, > try oprofile and dae receive completly wrong data . > > - this layout is probably more easy to use for 64 bits arch. > > regards, > Phil > >------------------------------------------------------------------------ > >Index: dae/opd_kernel.c >=================================================================== >RCS file: /cvsroot/oprofile/oprofile/dae/opd_kernel.c,v >retrieving revision 1.16 >diff -u -r1.16 opd_kernel.c >--- dae/opd_kernel.c 18 Jun 2002 03:24:46 -0000 1.16 >+++ dae/opd_kernel.c 12 Aug 2002 08:17:02 -0000 >@@ -330,7 +330,7 @@ > * If the sample could not be located in a module, it is treated > * as a kernel sample. > */ >-static void opd_handle_module_sample(u32 eip, u16 count) >+static void opd_handle_module_sample(u32 eip, u32 count, u32 counter) > { > struct opd_module * module; > >@@ -347,7 +347,7 @@ > if (module->image != NULL) { > opd_stats[OPD_MODULE]++; > opd_put_image_sample(module->image, >- eip - module->start, count); >+ eip - module->start, count, counter); > } else { > opd_stats[OPD_LOST_MODULE]++; > verbprintf("No image for sampled module %s\n", >@@ -376,16 +376,16 @@ > * Handle a sample in kernel address space or in a module. The sample is > * output to the relevant image file. 
> */ >-void opd_handle_kernel_sample(u32 eip, u16 count) >+void opd_handle_kernel_sample(u32 eip, u32 count, u32 counter) > { > if (eip < kernel_end) { > opd_stats[OPD_KERNEL]++; >- opd_put_image_sample(kernel_image, eip - kernel_start, count); >+ opd_put_image_sample(kernel_image, eip - kernel_start, count, counter); > return; > } > > /* in a module */ >- opd_handle_module_sample(eip, count); >+ opd_handle_module_sample(eip, count, counter); > } > > /** >Index: dae/opd_kernel.h >=================================================================== >RCS file: /cvsroot/oprofile/oprofile/dae/opd_kernel.h,v >retrieving revision 1.3 >diff -u -r1.3 opd_kernel.h >--- dae/opd_kernel.h 18 Jun 2002 03:24:46 -0000 1.3 >+++ dae/opd_kernel.h 12 Aug 2002 08:17:02 -0000 >@@ -17,7 +17,7 @@ > void opd_init_kernel_image(void); > void opd_parse_kernel_range(char const * arg); > void opd_clear_module_info(void); >-void opd_handle_kernel_sample(u32 eip, u16 count); >+void opd_handle_kernel_sample(u32 eip, u32 count, u32 counter); > int opd_eip_is_kernel(u32 eip); > > #endif /* OPD_KERNEL_H */ >Index: dae/opd_proc.c >=================================================================== >RCS file: /cvsroot/oprofile/oprofile/dae/opd_proc.c,v >retrieving revision 1.128 >diff -u -r1.128 opd_proc.c >--- dae/opd_proc.c 6 Aug 2002 04:14:18 -0000 1.128 >+++ dae/opd_proc.c 12 Aug 2002 08:17:03 -0000 >@@ -255,30 +255,6 @@ > > > /** >- * opd_get_count - retrieve counter value >- * @param count raw counter value >- * >- * Returns the counter value. 
>- */ >-inline static u16 opd_get_count(const u16 count) >-{ >- return (count & OP_COUNT_MASK); >-} >- >- >-/** >- * opd_get_counter - retrieve counter type >- * @param count raw counter value >- * >- * Returns the counter number (0-N) >- */ >-inline static u16 opd_get_counter(const u16 count) >-{ >- return OP_COUNTER(count); >-} >- >- >-/** > * opd_put_image_sample - write sample to file > * @param image image for sample > * @param offset (file) offset to write to >@@ -290,12 +266,10 @@ > * > * @count is the raw value passed from the kernel. > */ >-void opd_put_image_sample(struct opd_image * image, u32 offset, u16 count) >+void opd_put_image_sample(struct opd_image * image, u32 offset, u32 count, u32 counter) > { > db_tree_t * sample_file; >- int counter; > >- counter = opd_get_counter(count); > sample_file = &image->sample_files[counter]; > > if (!sample_file->base_memory) { >@@ -306,7 +280,7 @@ > } > } > >- db_insert(sample_file, offset, opd_get_count(count)); >+ db_insert(sample_file, offset, count); > } > > >@@ -325,13 +299,13 @@ > struct opd_proc * proc; > > opd_stats[OPD_SAMPLES]++; >- opd_stats[OPD_SAMPLE_COUNTS] += opd_get_count(sample->count); >+ opd_stats[OPD_SAMPLE_COUNTS] += sample->count; > > verbprintf("DO_PUT_SAMPLE: c%d, EIP 0x%.8x, pid %.6d, count %.6d\n", >- opd_get_counter(sample->count), sample->eip, sample->pid, sample->count); >+ sample->counter, sample->eip, sample->pid, sample->count); > > if (opd_eip_is_kernel(sample->eip)) { >- opd_handle_kernel_sample(sample->eip, sample->count); >+ opd_handle_kernel_sample(sample->eip, sample->count, sample->counter); > return; > } > >@@ -360,7 +334,7 @@ > verb_show_sample(opd_map_offset(&proc->maps[i], sample->eip), > &proc->maps[i], "(LAST MAP)"); > opd_put_image_sample(proc->maps[i].image, >- opd_map_offset(&proc->maps[i], sample->eip), sample->count); >+ opd_map_offset(&proc->maps[i], sample->eip), sample->count, sample->counter); > } > > opd_stats[OPD_PROCESS]++; >@@ -376,7 +350,7 @@ > u32 offset 
= opd_map_offset(&proc->maps[map], sample->eip); > if (proc->maps[map].image != NULL) { > verb_show_sample(offset, &proc->maps[map], ""); >- opd_put_image_sample(proc->maps[map].image, offset, sample->count); >+ opd_put_image_sample(proc->maps[map].image, offset, sample->count, sample->counter); > } > proc->last_map = map; > opd_stats[OPD_PROCESS]++; >Index: dae/opd_proc.h >=================================================================== >RCS file: /cvsroot/oprofile/oprofile/dae/opd_proc.h,v >retrieving revision 1.8 >diff -u -r1.8 opd_proc.h >--- dae/opd_proc.h 6 Aug 2002 03:54:34 -0000 1.8 >+++ dae/opd_proc.h 12 Aug 2002 08:17:03 -0000 >@@ -34,7 +34,7 @@ > }; > > void opd_put_sample(struct op_sample const * sample); >-void opd_put_image_sample(struct opd_image * image, u32 offset, u16 count); >+void opd_put_image_sample(struct opd_image * image, u32 offset, u32 count, u32 counter); > void opd_handle_fork(struct op_note const * note); > void opd_handle_exit(struct op_note const * note); > void opd_handle_exec(u16 pid); >Index: dae/oprofiled.c >=================================================================== >RCS file: /cvsroot/oprofile/oprofile/dae/oprofiled.c,v >retrieving revision 1.93 >diff -u -r1.93 oprofiled.c >--- dae/oprofiled.c 29 Jul 2002 23:40:22 -0000 1.93 >+++ dae/oprofiled.c 12 Aug 2002 08:17:04 -0000 >@@ -614,8 +614,8 @@ > verbprintf("%.6u: EIP: 0x%.8x pid: %.6d count: %.6d\n", > i, buffer[i].eip, buffer[i].pid, buffer[i].count); > >- if (pid_filter && pid_filter != buffer[i].pid) >+ if (pid_filter && (u32)pid_filter != buffer[i].pid) > continue; > if (pgrp_filter && pgrp_filter != getpgid(buffer[i].pid)) > continue; >Index: libop/op_config.h >=================================================================== >RCS file: /cvsroot/oprofile/oprofile/libop/op_config.h,v >retrieving revision 1.5 >diff -u -r1.5 op_config.h >--- libop/op_config.h 18 Jun 2002 03:24:46 -0000 1.5 >+++ libop/op_config.h 12 Aug 2002 08:17:05 -0000 >@@ -42,7 +42,7 @@ > 
/*@{\name module default/min/max settings */
>
> /** 65536 * 32 = 2097152 bytes default */
>-#define OP_DEFAULT_HASH_SIZE 65536
>+#define OP_DEFAULT_HASH_SIZE 65536/2 /* hack we use 2 cache line and want the same buffer size */
> /** maximum number of entry in module samples hash table */
> #define OP_MAX_HASH_SIZE 262144
> /** minimum number of entry in module samples hash table */
>Index: libop/op_hw_config.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/libop/op_hw_config.h,v
>retrieving revision 1.4
>diff -u -r1.4 op_hw_config.h
>--- libop/op_hw_config.h 6 Jun 2002 16:18:15 -0000 1.4
>+++ libop/op_hw_config.h 12 Aug 2002 08:17:05 -0000
>@@ -19,19 +19,19 @@
> #define OP_MAX_COUNTERS 4
>
> /** the number of bits neccessary to store OP_MAX_COUNTERS values */
>-#define OP_BITS 2
>+//#define OP_BITS 2
>
> /** The number of bits available to store count. The 16 value is
> * sizeof_in_bits(op_sample.count) */
>-#define OP_BITS_COUNT (16 - OP_BITS)
>+#define OP_BITS_COUNT 16
>
> /** counter nr mask */
>-#define OP_CTR_MASK ((~0U << (OP_BITS_COUNT + 1)) >> 1)
>+//#define OP_CTR_MASK ((~0U << (OP_BITS_COUNT + 1)) >> 1)
>
> /** top OP_BITS bits of count are used to store counter number */
>-#define OP_COUNTER(x) (((x) & OP_CTR_MASK) >> OP_BITS_COUNT)
>+//#define OP_COUNTER(x) (((x) & OP_CTR_MASK) >> OP_BITS_COUNT)
> /** low bits store the counter value */
>-#define OP_COUNT_MASK ((1U << OP_BITS_COUNT) - 1U)
>+#define OP_COUNT_MAX ((1U << OP_BITS_COUNT) - 1U)
>
> /** maximum number of events between interrupts. Counters are 40 bits, but
> * for convenience we only use 32 bits. The top bit is used for overflow
>Index: libop/op_interface.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/libop/op_interface.h,v
>retrieving revision 1.6
>diff -u -r1.6 op_interface.h
>--- libop/op_interface.h 12 Jul 2002 18:30:36 -0000 1.6
>+++ libop/op_interface.h 12 Aug 2002 08:17:05 -0000
>@@ -31,10 +31,11 @@
>
> /** Data type to transfer samples counts from the module to the daemon */
> struct op_sample {
>- u16 count; /**< samples count; high order bits contains the counter nr */
>- u16 pid; /**< 32 bits but only 16 bits are used currently */
>+ u16 count; /**< samples count */
>+ u16 counter; /**< counter nr */
>+ u32 pid; /**< 32 bits but only 30 bits are used currently */
> u32 eip; /**< eip value where occur interrupt */
>-} __attribute__((__packed__, __aligned__(8)));
>+} /*__attribute__((__packed__, __aligned__(8)))*/;
>
> /** the current kernel-side profiler state */
> enum oprof_state {
>@@ -64,8 +65,8 @@
> u32 len;
> u32 offset;
> u32 hash;
>- u16 pid;
>- u16 type;
>+ u32 pid;
>+ u32 type;
> };
>
> /**
>Index: module/oprofile.c
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/module/oprofile.c,v
>retrieving revision 1.69
>diff -u -r1.69 oprofile.c
>--- module/oprofile.c 21 Jul 2002 16:51:43 -0000 1.69
>+++ module/oprofile.c 12 Aug 2002 08:17:06 -0000
>@@ -115,7 +115,8 @@
> {
> ops->eip = eip;
> ops->pid = pid;
>- ops->count = (1U << OP_BITS_COUNT) * ctr + 1;
>+ ops->count = 1;
>+ ops->counter = ctr;
> }
>
> void regparm3 op_do_profile(uint cpu, struct pt_regs * regs, int ctr)
>Index: module/oprofile.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/module/oprofile.h,v
>retrieving revision 1.38
>diff -u -r1.38 oprofile.h
>--- module/oprofile.h 12 Jul 2002 18:30:37 -0000 1.38
>+++ module/oprofile.h 12 Aug 2002 08:17:07 -0000
>@@ -41,11 +41,12 @@
>
> #define regparm3 __attribute__((regparm(3)))
>
>-#define OP_NR_ENTRY (SMP_CACHE_BYTES/sizeof(struct op_sample))
>+#define OP_NR_ENTRY ((SMP_CACHE_BYTES*2)/sizeof(struct op_sample))
>
> struct op_entry {
> struct op_sample samples[OP_NR_ENTRY];
>-};
>+ int dummy;
>+} __cacheline_aligned_in_smp;
>
> /* per-cpu dynamic data */
> struct _oprof_data {
>@@ -135,7 +136,7 @@
> #define DNAME_STACK_MAX 1024
>
> /* is the count at maximal value ? */
>-#define op_full_count(c) (((c) & OP_COUNT_MASK) == OP_COUNT_MASK)
>+#define op_full_count(c) ((c) == OP_COUNT_MAX)
>
> /* the ctr bit is used to separate the two counters.
> * Simple and effective hash. If you can do better, prove it ...
>Index: utils/op_start
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/utils/op_start,v
>retrieving revision 1.9
>diff -u -r1.9 op_start
>--- utils/op_start 8 Aug 2002 20:51:52 -0000 1.9
>+++ utils/op_start 12 Aug 2002 08:17:07 -0000
>@@ -88,7 +88,7 @@
> --ctrN-unit-mask=val unit mask for ctr N
> --ctrN-kernel=[0|1] whether to count kernel events for ctr N
> --ctrN-user=[0|1] whether to count user events for ctr N
>- Allowed counter for N are [$OP_COUNTERS]
>+ Allowed counters for N are [$OP_COUNTERS]
> --rtc-value=val RTC value (only if RTC is being used)
> --pid-filter=pid Only profile process pid
> --pgrp-filter=pgrp Only profile process tty group pgrp
>
>
>------------------------------------------------------------------------
>
>Index: dae/opd_kernel.c
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/dae/opd_kernel.c,v
>retrieving revision 1.16
>diff -u -r1.16 opd_kernel.c
>--- dae/opd_kernel.c 18 Jun 2002 03:24:46 -0000 1.16
>+++ dae/opd_kernel.c 12 Aug 2002 10:15:28 -0000
>@@ -330,7 +330,7 @@
> * If the sample could not be located in a module, it is treated
> * as a kernel sample.
> */
>-static void opd_handle_module_sample(u32 eip, u16 count)
>+static void opd_handle_module_sample(u32 eip, u32 count, u32 counter)
> {
> struct opd_module * module;
>
>@@ -347,7 +347,7 @@
> if (module->image != NULL) {
> opd_stats[OPD_MODULE]++;
> opd_put_image_sample(module->image,
>- eip - module->start, count);
>+ eip - module->start, count, counter);
> } else {
> opd_stats[OPD_LOST_MODULE]++;
> verbprintf("No image for sampled module %s\n",
>@@ -376,16 +376,16 @@
> * Handle a sample in kernel address space or in a module. The sample is
> * output to the relevant image file.
> */
>-void opd_handle_kernel_sample(u32 eip, u16 count)
>+void opd_handle_kernel_sample(u32 eip, u32 count, u32 counter)
> {
> if (eip < kernel_end) {
> opd_stats[OPD_KERNEL]++;
>- opd_put_image_sample(kernel_image, eip - kernel_start, count);
>+ opd_put_image_sample(kernel_image, eip - kernel_start, count, counter);
> return;
> }
>
> /* in a module */
>- opd_handle_module_sample(eip, count);
>+ opd_handle_module_sample(eip, count, counter);
> }
>
> /**
>Index: dae/opd_kernel.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/dae/opd_kernel.h,v
>retrieving revision 1.3
>diff -u -r1.3 opd_kernel.h
>--- dae/opd_kernel.h 18 Jun 2002 03:24:46 -0000 1.3
>+++ dae/opd_kernel.h 12 Aug 2002 10:15:28 -0000
>@@ -17,7 +17,7 @@
> void opd_init_kernel_image(void);
> void opd_parse_kernel_range(char const * arg);
> void opd_clear_module_info(void);
>-void opd_handle_kernel_sample(u32 eip, u16 count);
>+void opd_handle_kernel_sample(u32 eip, u32 count, u32 counter);
> int opd_eip_is_kernel(u32 eip);
>
> #endif /* OPD_KERNEL_H */
>Index: dae/opd_proc.c
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/dae/opd_proc.c,v
>retrieving revision 1.128
>diff -u -r1.128 opd_proc.c
>--- dae/opd_proc.c 6 Aug 2002 04:14:18 -0000 1.128
>+++ dae/opd_proc.c 12 Aug 2002 10:15:28 -0000
>@@ -255,30 +255,6 @@
>
>
> /**
>- * opd_get_count - retrieve counter value
>- * @param count raw counter value
>- *
>- * Returns the counter value.
>- */
>-inline static u16 opd_get_count(const u16 count)
>-{
>- return (count & OP_COUNT_MASK);
>-}
>-
>-
>-/**
>- * opd_get_counter - retrieve counter type
>- * @param count raw counter value
>- *
>- * Returns the counter number (0-N)
>- */
>-inline static u16 opd_get_counter(const u16 count)
>-{
>- return OP_COUNTER(count);
>-}
>-
>-
>-/**
> * opd_put_image_sample - write sample to file
> * @param image image for sample
> * @param offset (file) offset to write to
>@@ -290,12 +266,10 @@
> *
> * @count is the raw value passed from the kernel.
> */
>-void opd_put_image_sample(struct opd_image * image, u32 offset, u16 count)
>+void opd_put_image_sample(struct opd_image * image, u32 offset, u32 count, u32 counter)
> {
> db_tree_t * sample_file;
>- int counter;
>
>- counter = opd_get_counter(count);
> sample_file = &image->sample_files[counter];
>
> if (!sample_file->base_memory) {
>@@ -306,7 +280,7 @@
> }
> }
>
>- db_insert(sample_file, offset, opd_get_count(count));
>+ db_insert(sample_file, offset, count);
> }
>
>
>@@ -325,13 +299,13 @@
> struct opd_proc * proc;
>
> opd_stats[OPD_SAMPLES]++;
>- opd_stats[OPD_SAMPLE_COUNTS] += opd_get_count(sample->count);
>+ opd_stats[OPD_SAMPLE_COUNTS] += sample->count;
>
> verbprintf("DO_PUT_SAMPLE: c%d, EIP 0x%.8x, pid %.6d, count %.6d\n",
>- opd_get_counter(sample->count), sample->eip, sample->pid, sample->count);
>+ sample->counter, sample->eip, sample->pid, sample->count);
>
> if (opd_eip_is_kernel(sample->eip)) {
>- opd_handle_kernel_sample(sample->eip, sample->count);
>+ opd_handle_kernel_sample(sample->eip, sample->count, sample->counter);
> return;
> }
>
>@@ -360,7 +334,7 @@
> verb_show_sample(opd_map_offset(&proc->maps[i], sample->eip),
> &proc->maps[i], "(LAST MAP)");
> opd_put_image_sample(proc->maps[i].image,
>- opd_map_offset(&proc->maps[i], sample->eip), sample->count);
>+ opd_map_offset(&proc->maps[i], sample->eip), sample->count, sample->counter);
> }
>
> opd_stats[OPD_PROCESS]++;
>@@ -376,7 +350,7 @@
> u32 offset = opd_map_offset(&proc->maps[map], sample->eip);
> if (proc->maps[map].image != NULL) {
> verb_show_sample(offset, &proc->maps[map], "");
>- opd_put_image_sample(proc->maps[map].image, offset, sample->count);
>+ opd_put_image_sample(proc->maps[map].image, offset, sample->count, sample->counter);
> }
> proc->last_map = map;
> opd_stats[OPD_PROCESS]++;
>Index: dae/opd_proc.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/dae/opd_proc.h,v
>retrieving revision 1.8
>diff -u -r1.8 opd_proc.h
>--- dae/opd_proc.h 6 Aug 2002 03:54:34 -0000 1.8
>+++ dae/opd_proc.h 12 Aug 2002 10:15:28 -0000
>@@ -34,7 +34,7 @@
> };
>
> void opd_put_sample(struct op_sample const * sample);
>-void opd_put_image_sample(struct opd_image * image, u32 offset, u16 count);
>+void opd_put_image_sample(struct opd_image * image, u32 offset, u32 count, u32 counter);
> void opd_handle_fork(struct op_note const * note);
> void opd_handle_exit(struct op_note const * note);
> void opd_handle_exec(u16 pid);
>Index: dae/oprofiled.c
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/dae/oprofiled.c,v
>retrieving revision 1.93
>diff -u -r1.93 oprofiled.c
>--- dae/oprofiled.c 29 Jul 2002 23:40:22 -0000 1.93
>+++ dae/oprofiled.c 12 Aug 2002 10:15:28 -0000
>@@ -614,8 +614,8 @@
> verbprintf("%.6u: EIP: 0x%.8x pid: %.6d count: %.6d\n",
> i, buffer[i].eip, buffer[i].pid, buffer[i].count);
>
>- if (pid_filter && pid_filter != buffer[i].pid)
>+ if (pid_filter && (u32)pid_filter != buffer[i].pid)
> continue;
> if (pgrp_filter && pgrp_filter != getpgid(buffer[i].pid))
> continue;
>Index: libop/op_hw_config.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/libop/op_hw_config.h,v
>retrieving revision 1.4
>diff -u -r1.4 op_hw_config.h
>--- libop/op_hw_config.h 6 Jun 2002 16:18:15 -0000 1.4
>+++ libop/op_hw_config.h 12 Aug 2002 10:15:28 -0000
>@@ -19,19 +19,19 @@
> #define OP_MAX_COUNTERS 4
>
> /** the number of bits neccessary to store OP_MAX_COUNTERS values */
>-#define OP_BITS 2
>+//#define OP_BITS 2
>
> /** The number of bits available to store count. The 16 value is
> * sizeof_in_bits(op_sample.count) */
>-#define OP_BITS_COUNT (16 - OP_BITS)
>+#define OP_BITS_COUNT 31
>
> /** counter nr mask */
>-#define OP_CTR_MASK ((~0U << (OP_BITS_COUNT + 1)) >> 1)
>+//#define OP_CTR_MASK ((~0U << (OP_BITS_COUNT + 1)) >> 1)
>
> /** top OP_BITS bits of count are used to store counter number */
>-#define OP_COUNTER(x) (((x) & OP_CTR_MASK) >> OP_BITS_COUNT)
>+//#define OP_COUNTER(x) (((x) & OP_CTR_MASK) >> OP_BITS_COUNT)
> /** low bits store the counter value */
>-#define OP_COUNT_MASK ((1U << OP_BITS_COUNT) - 1U)
>+#define OP_COUNT_MAX ((1U << OP_BITS_COUNT) - 1U)
>
> /** maximum number of events between interrupts. Counters are 40 bits, but
> * for convenience we only use 32 bits. The top bit is used for overflow
>Index: libop/op_interface.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/libop/op_interface.h,v
>retrieving revision 1.6
>diff -u -r1.6 op_interface.h
>--- libop/op_interface.h 12 Jul 2002 18:30:36 -0000 1.6
>+++ libop/op_interface.h 12 Aug 2002 10:15:28 -0000
>@@ -31,10 +31,11 @@
>
> /** Data type to transfer samples counts from the module to the daemon */
> struct op_sample {
>- u16 count; /**< samples count; high order bits contains the counter nr */
>- u16 pid; /**< 32 bits but only 16 bits are used currently */
>+ u32 count; /**< samples count */
>+ u32 counter; /**< counter nr */
>+ u32 pid; /**< 32 bits but only 30 bits are used currently */
> u32 eip; /**< eip value where occur interrupt */
>-} __attribute__((__packed__, __aligned__(8)));
>+} __attribute__((__packed__, __aligned__(16)));
>
> /** the current kernel-side profiler state */
> enum oprof_state {
>@@ -64,8 +65,8 @@
> u32 len;
> u32 offset;
> u32 hash;
>- u16 pid;
>- u16 type;
>+ u32 pid;
>+ u32 type;
> };
>
> /**
>Index: module/oprofile.c
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/module/oprofile.c,v
>retrieving revision 1.69
>diff -u -r1.69 oprofile.c
>--- module/oprofile.c 21 Jul 2002 16:51:43 -0000 1.69
>+++ module/oprofile.c 12 Aug 2002 10:15:29 -0000
>@@ -115,7 +115,8 @@
> {
> ops->eip = eip;
> ops->pid = pid;
>- ops->count = (1U << OP_BITS_COUNT) * ctr + 1;
>+ ops->count = 1;
>+ ops->counter = ctr;
> }
>
> void regparm3 op_do_profile(uint cpu, struct pt_regs * regs, int ctr)
>Index: module/oprofile.h
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/module/oprofile.h,v
>retrieving revision 1.38
>diff -u -r1.38 oprofile.h
>--- module/oprofile.h 12 Jul 2002 18:30:37 -0000 1.38
>+++ module/oprofile.h 12 Aug 2002 10:15:30 -0000
>@@ -45,7 +45,8 @@
>
> struct op_entry {
> struct op_sample samples[OP_NR_ENTRY];
>-};
>+ int dummy;
>+} __cacheline_aligned_in_smp;
>
> /* per-cpu dynamic data */
> struct _oprof_data {
>@@ -135,7 +136,7 @@
> #define DNAME_STACK_MAX 1024
>
> /* is the count at maximal value ? */
>-#define op_full_count(c) (((c) & OP_COUNT_MASK) == OP_COUNT_MASK)
>+#define op_full_count(c) ((c) == OP_COUNT_MAX)
>
> /* the ctr bit is used to separate the two counters.
> * Simple and effective hash. If you can do better, prove it ...
>Index: utils/op_start
>===================================================================
>RCS file: /cvsroot/oprofile/oprofile/utils/op_start,v
>retrieving revision 1.9
>diff -u -r1.9 op_start
>--- utils/op_start 8 Aug 2002 20:51:52 -0000 1.9
>+++ utils/op_start 12 Aug 2002 10:15:30 -0000
>@@ -88,7 +88,7 @@
> --ctrN-unit-mask=val unit mask for ctr N
> --ctrN-kernel=[0|1] whether to count kernel events for ctr N
> --ctrN-user=[0|1] whether to count user events for ctr N
>- Allowed counter for N are [$OP_COUNTERS]
>+ Allowed counters for N are [$OP_COUNTERS]
> --rtc-value=val RTC value (only if RTC is being used)
> --pid-filter=pid Only profile process pid
> --pgrp-filter=pgrp Only profile process tty group pgrp
>
>
|
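For readers following the patches above, here is a minimal sketch of the encoding they remove: before the change, the 16-bit count field carried the counter number in its top OP_BITS bits. The constants mirror the pre-patch op_hw_config.h, but the helper names (pack_sample_count, unpack_counter, unpack_count) are hypothetical, and the mask is simplified to the 16-bit case rather than copied from OP_CTR_MASK.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the pre-patch op_hw_config.h quoted above. */
#define OLD_OP_BITS        2
#define OLD_OP_BITS_COUNT  (16 - OLD_OP_BITS)                /* 14 bits of count */
#define OLD_OP_COUNT_MASK  ((1U << OLD_OP_BITS_COUNT) - 1U)  /* 0x3fff */

/* Hypothetical helper: build the old packed field the way the module did
 * ("(1U << OP_BITS_COUNT) * ctr + 1"), generalised to any count. */
static uint16_t pack_sample_count(unsigned ctr, unsigned count)
{
	assert(ctr < (1U << OLD_OP_BITS));
	assert(count <= OLD_OP_COUNT_MASK);
	return (uint16_t)((ctr << OLD_OP_BITS_COUNT) | count);
}

/* Hypothetical helper: the counter number lives in the top two bits. */
static unsigned unpack_counter(uint16_t packed)
{
	return packed >> OLD_OP_BITS_COUNT;
}

/* Hypothetical helper: the sample count lives in the low 14 bits. */
static unsigned unpack_count(uint16_t packed)
{
	return packed & OLD_OP_COUNT_MASK;
}
```

With separate count and counter fields, as in both patches, the daemon no longer needs the masking step at all, which is why opd_get_count and opd_get_counter disappear from dae/opd_proc.c.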
From: Philippe E. <ph...@wa...> - 2002-08-12 13:10:46
|
patch1.diff --> 5 entry per 2 cacheline
patch2.diff --> 2 entry per 1 cacheline

timing order are real/user/system

# bz2-test.20020811-141811.out (3 run)
1 374.60 (0.00%) 370.85 (0.00%) 3.64 (0.00%) no profiling
# bz2-test-300000.20020811-141811.out (3 run)
2 375.11 (0.14%) 371.24 (0.11%) 3.81 (4.67%) original
2 375.26 (0.21%) 371.75 (0.25%) 3.41 (-4.21%) 5 entry per 2 cacheline
2 374.99 (0.15%) 371.14 (0.05%) 3.78 (13.86%) 2 entry per 1 cacheline
# bz2-test-100000.20020811-141811.out (3 run)
3 379.42 (1.29%) 375.84 (1.35%) 3.53 (-3.02%) original
3 379.53 (1.35%) 375.59 (1.28%) 3.88 (8.99%) 5 entry per 2 cacheline
3 378.34 (1.05%) 374.90 (1.06%) 3.40 (2.41%) 2 entry per 1 cacheline
# bz2-test-25000.20020811-141811.out (3 run)
4 389.13 (3.88%) 385.23 (3.88%) 3.84 (5.49%) original
4 388.98 (3.88%) 385.36 (3.92%) 3.56 (-0.00%) 5 entry per 2 cacheline
4 388.11 (3.66%) 384.30 (3.60%) 3.73 (12.35%) 2 entry per 1 cacheline

***

# kernel-compile.20020811-150034.out (3 run)
1 529.66 (0.00%) 492.14 (0.00%) 37.50 (0.00%) no profiling
# kernel-compile-300000.20020811-150034.out (3 run)
2 535.54 (1.11%) 496.75 (0.94%) 37.09 (-1.09%) original
2 535.74 (1.20%) 497.26 (0.81%) 36.81 (1.88%) 5 entry per 2 cacheline
2 535.92 (1.10%) 497.72 (0.98%) 36.39 (-2.10%) 2 entry per 1 cacheline
# kernel-compile-100000.20020811-150034.out (3 run)
3 542.21 (2.37%) 501.12 (1.82%) 36.79 (-1.89%) original
3 543.51 (2.67%) 502.14 (1.80%) 37.03 (2.49%) 5 entry per 2 cacheline
3 543.48 (2.53%) 502.53 (1.95%) 36.44 (-1.96%) 2 entry per 1 cacheline
# kernel-compile-25000.20020811-150034.out (3 run)
4 571.24 (7.85%) 519.83 (5.63%) 38.75 (3.33%) original
4 573.90 (8.41%) 522.67 (5.96%) 38.31 (6.03%) 5 entry per 2 cacheline
4 573.15 (8.13%) 521.19 (5.74%) 38.48 (3.52%) 2 entry per 1 cacheline

So 2 entries per cache line is better; for the bz2 test the overhead is even lower than with the original oprofile.

The question now is whether to apply the new layout for all kernel versions or only when really needed. I prefer the first solution.

cons:
- slightly pessimizes the overhead when the new layout is not needed

pros:
- no multiple ABIs between module and daemon. Currently only the module is compiled against the intended kernel target. Making the daemon depend on the target kernel as well means: you compile for 2.5.31, make install, reboot, use the module and everything works; then you reboot to your usual kernel, try oprofile, and the daemon receives completely wrong data.
- this layout is probably easier to use on 64-bit architectures.

regards,
Phil
|
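The "5 entry per 2 cacheline" and "2 entry per 1 cacheline" labels follow from the sample sizes in the two patches. The sketch below reproduces that arithmetic under one assumption that the message does not state: a 32-byte cache line, suggested by the "65536 * 32" comment in op_hw_config.h. The struct and function names are hypothetical stand-ins for the three op_sample layouts, not the real definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumption: SMP_CACHE_BYTES is 32 on the machine that was measured. */
#define ASSUMED_CACHE_BYTES 32u

/* Original layout: counter number packed into the count field. */
struct sample_original { uint16_t count; uint16_t pid; uint32_t eip; };

/* patch1.diff layout: 16-bit count and counter, 32-bit pid. */
struct sample_patch1 { uint16_t count; uint16_t counter; uint32_t pid; uint32_t eip; };

/* patch2.diff layout: every field is 32 bits wide. */
struct sample_patch2 { uint32_t count; uint32_t counter; uint32_t pid; uint32_t eip; };

/* Mirrors OP_NR_ENTRY: whole samples that fit in nr_lines cache lines. */
static size_t entries_per_lines(size_t sample_size, size_t nr_lines)
{
	return (ASSUMED_CACHE_BYTES * nr_lines) / sample_size;
}
```

On common ABIs the three structs occupy 8, 12 and 16 bytes, giving 4 entries per line for the original layout, 5 entries per 2 lines for patch1 and 2 entries per line for patch2. The 12-byte layout leaves 4 bytes unused per pair of lines, while the 16-byte layout divides a line exactly.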
From: Dave J. <da...@su...> - 2002-08-12 12:03:40
|
On Sat, Aug 03, 2002 at 12:10:06AM +0100, John Levon wrote:
> > I have been looking through the data oprofile structures and it looks
> > like a lot of it is coded to assume the addresses all fit in 32-bits.
> > Are there any thoughts into making oprofile 64-bit clean. Maybe have a
> > special typedef for eip rather just using u32?
>
> I imagine Dave and Bob will have patches in this area

Yep, I have the beginnings of an x86-64 port at home which cleans up the usage of eip and so on (x86-64 uses rip instead of eip). I'll commit the 'obvious' bits when I get back home.

Dave

--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
|
From: John H. <joh...@ce...> - 2002-08-09 14:44:51
|
Could some kind soul please offer me some help off-list? |
From: Philippe E. <ph...@wa...> - 2002-08-08 20:48:22
|
William Cohen wrote:

hi,

all patches that decrease dependencies on architecture-specific things (like the number of counters here) are useful.

> 2002-08-08 Will Cohen <wc...@nc...>
>
> * utils/op_start: Do not compute quantity of counters from
> CPUTYPE. Get counter names directly from
> /proc/sys/dev/oprofile.

applied with this minor change

@@ -88,7 +88,7 @@
 --ctrN-unit-mask=val unit mask for ctr N
 --ctrN-kernel=[0|1] whether to count kernel events for ctr N
 --ctrN-user=[0|1] whether to count user events for ctr N
- Allowed range for N is [0-$MAX_COUNTER]
+ Allowed counter for N are [$OP_COUNTERS]

the op_start --help was broken by your patch

@@ -145,33 +145,20 @@
 HASH_MAP_DEVICE_FILE="$DIR/ophashmapdev"

 CPUTYPE=`cat /proc/sys/dev/oprofile/cpu_type`
+ OP_COUNTERS=`ls /proc/sys/dev/oprofile/ | grep "^[0-9]\+\$" | tr "\n" " "`

accept only integers; tr suppresses the linefeeds so that OP_COUNTERS can be used in op_start --help

 IS_RTC=0

 case "$CPUTYPE" in
- 0|1|2)
- OP_MAX_COUNTERS=2
- MAX_COUNTER=1
- ;;
- 3|5|6|7)

the cvs tree contains only a 3)

regards,
Phil
|
From: John L. <le...@mo...> - 2002-08-08 19:09:37
|
On Thu, Aug 08, 2002 at 08:52:22PM +0200, Philippe Elie wrote:
> John if you agree I'll commit it. The proper way is probably
> to export a nr_counters file but with the following change it
> seems sufficiently robust.

Seems OK. I haven't looked at the actual patch though, just the description. Please do

regards
john

--
"It is unbecoming for young men to utter maxims."
- Aristotle
|
From: Philippe E. <ph...@wa...> - 2002-08-08 18:48:55
|
William Cohen wrote:

[snip]

John if you agree I'll commit it. The proper way is probably to export a nr_counters file but with the following change it seems sufficiently robust.

>
> CPUTYPE=`cat /proc/sys/dev/oprofile/cpu_type`
>+ OP_COUNTERS=`ls /proc/sys/dev/oprofile/ | grep "[0-9][0-9]*"`
>

OP_COUNTERS=`ls /proc/sys/dev/oprofile/ | grep "^[0-9]\+\$"`

This restricts the match to names that consist of an integer only; the unanchored pattern matches any name that contains a digit.

John, if you agree I'll commit it.

regards,
Phil
|
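The difference between the two grep patterns can be sketched in C as two predicates over a directory entry name. This is only an illustration of the matching rule, not code from op_start; the function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Like grep "[0-9][0-9]*": true if the name contains a digit anywhere. */
static bool contains_digit(const char *name)
{
	return strpbrk(name, "0123456789") != NULL;
}

/* Like grep "^[0-9]\+$": true only if the whole name is one or more digits. */
static bool is_counter_name(const char *name)
{
	if (*name == '\0')
		return false;
	for (; *name != '\0'; name++)
		if (!isdigit((unsigned char)*name))
			return false;
	return true;
}
```

A made-up entry such as "buffer_v2" passes the first predicate but not the second, which is the kind of false match the anchored pattern is meant to exclude from OP_COUNTERS.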
From: William C. <wc...@nc...> - 2002-08-08 16:50:35
|
I would like to localize the processor-specific information and avoid having CPU-specific information in multiple places. op_start is currently written so that it needs to know the processor type to compute the number of counters. I have developed a patch that just gets the counter information from /proc/sys/dev/oprofile, so op_start doesn't need to know the specific processor implementation. It only reads the cpu type to determine if the module is running in RTC mode. This patch should make it a little easier to add additional processors to oprofile.

2002-08-08 Will Cohen <wc...@nc...>

* utils/op_start: Do not compute quantity of counters from CPUTYPE. Get counter names directly from /proc/sys/dev/oprofile.

-Will
|