From: Will C. <wc...@re...> - 2004-02-25 17:07:14
Attachments:
kernel-2.6-oprofileia32e.patch
oprofile-0.8-ia32e.patch

I have been working on oprofile ia32e support. The ia32e performance monitoring hardware is pretty much like the p4. I have a preliminary kernel patch that creates unique identifiers for the ia32e (x86-64/ia32 and x86-64/ia32e-ht), and an oprofile patch that recognizes the associated identifiers.

-Will
From: Philippe E. <ph...@wa...> - 2004-02-25 17:51:58

On Wed, 25 Feb 2004 at 12:05 +0000, Will Cohen wrote:

> I have been working on oprofile ia32e support. The ia32e performance
> monitoring hardware is pretty much like the p4. I have a preliminary
> kernel patch that makes a unique identifiers for the ia32e (x86-64/ia32
> and x86-64/ia32e-ht) and oprofile recognize the associated identifiers.

Will, I applied your patch and diffed events/x86_64/ia32e/ against events/i386/p4/, and events/x86_64/ia32e-ht/ against events/i386/p4-ht/, and there is absolutely no difference, so by "pretty much like the p4" you mean exactly identical from oprofile's point of view (and this is what I got too by reading the documentation). So do we need to treat ia32e specially? The only bad point is the displayed name, and I don't really care about that. I first thought that adding these two lines in op_cpu_type.c would work:

  { "Intel ia32e", "x86-64/ia32e", CPU_P4, 8 },
  { "Intel ia32e with 2 hyper-threads", "x86-64/ia32e-ht", CPU_P4_HT2, 4},

but we call op_get_cpu_type_str() and op_get_cpu_name() with the cpu type taken from the sample file header, and so we will get the wrong string. There is a solution to fix that in a backward compatible way (i.e. use one of the unused fields in the sample file header to select a sub-processor type), and we don't need to bump the sample file version.

John, any feeling about the naming issue? I'm a bit inclined to call ia32e(-ht) a P4(-ht) and to do nothing about ia32e, but I'm prolly a bit too lazy ...

In case we want the kernel patch, is there any box available to test it?

regards,
Phil
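The naming problem described above can be illustrated with a small sketch. This is not oprofile's actual code: the struct layout, the enum values, the helper names `cpu_type_from_name()` and `pretty_name_from_type()`, and the two pre-existing P4 rows are assumptions made up for illustration. Only the two ia32e rows come from the message. The point is that once two identifier strings share one cpu type, a lookup that starts from the type stored in the sample file header can only ever return the first matching row.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for oprofile's cpu type enum. */
typedef enum { CPU_NO_GOOD = -1, CPU_P4, CPU_P4_HT2 } cpu_type_t;

/* Hypothetical descriptor layout, mirroring the four fields in the message. */
struct cpu_descr {
    const char *pretty;        /* name shown to the user */
    const char *name;          /* identifier reported by the kernel driver */
    cpu_type_t cpu;            /* type stored in the sample file header */
    unsigned int nr_counters;
};

static const struct cpu_descr cpu_descrs[] = {
    /* assumed pre-existing rows */
    { "P4 / Xeon", "i386/p4", CPU_P4, 8 },
    { "P4 / Xeon with 2 hyper-threads", "i386/p4-ht", CPU_P4_HT2, 4 },
    /* the two rows proposed in the message */
    { "Intel ia32e", "x86-64/ia32e", CPU_P4, 8 },
    { "Intel ia32e with 2 hyper-threads", "x86-64/ia32e-ht", CPU_P4_HT2, 4 },
};

#define NR_DESCRS (sizeof(cpu_descrs) / sizeof(cpu_descrs[0]))

/* Forward lookup: kernel identifier string -> cpu type. */
cpu_type_t cpu_type_from_name(const char *name)
{
    size_t i;

    for (i = 0; i < NR_DESCRS; i++)
        if (strcmp(cpu_descrs[i].name, name) == 0)
            return cpu_descrs[i].cpu;
    return CPU_NO_GOOD;
}

/* Reverse lookup: cpu type -> displayed name. The first matching row wins,
 * so the ia32e rows are never reached when starting from the type alone. */
const char *pretty_name_from_type(cpu_type_t type)
{
    size_t i;

    for (i = 0; i < NR_DESCRS; i++)
        if (cpu_descrs[i].cpu == type)
            return cpu_descrs[i].pretty;
    return NULL;
}
```

With this table, looking up "x86-64/ia32e" yields `CPU_P4`, and mapping `CPU_P4` back to a displayed name yields the P4 string rather than "Intel ia32e", which is why a sub-processor field in the sample file header is suggested.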
From: William C. <wc...@nc...> - 2004-02-25 19:47:35

Philippe Elie wrote:
> On Wed, 25 Feb 2004 at 12:05 +0000, Will Cohen wrote:
>
>> I have been working on oprofile ia32e support. The ia32e performance
>> monitoring hardware is pretty much like the p4. I have a preliminary
>> kernel patch that makes a unique identifiers for the ia32e (x86-64/ia32
>> and x86-64/ia32e-ht) and oprofile recognize the associated identifiers.
>
> Will I applied your patch diff events/x86_64/ia32e/, events/i386/p4/ and
> events/x86_64/ia32e-ht/, events/i386/p4-ht/ and there is absolutely
> no difference so by "pretty much like the p4" you mean exactly identical
> at oprofile point of view (and this is what I got too by reading the
> documentation) So do we need to treat ia32e specially ? The only bad point
> is the displayed name, I don't really take care about that. I first tough
> than adding these two lines in op_cpu_type.c will work

My understanding is that the events are the same. It seems likely to me that there will be enhancements, as there were going from the Athlon to amd64. However, I don't know that for sure. I haven't seen documentation listing the events for the ia32e. Some of the performance monitoring hardware has changed because pointers go from 32-bit to 64-bit.

> { "Intel ia32e", "x86-64/ia32e", CPU_P4, 8 },
> { "Intel ia32e with 2 hyper-threads", "x86-64/ia32e-ht", CPU_P4_HT2, 4},
>
> but we use op_get_cpu_type_str() and op_get_cpu_name() by getting
> the cpu type from samplefiles header and so we will get the wrong string.
> There is solution to fix that in a backward compatible way (i/e use one
> of unused field in sample file header, use it to select a sub-processor type)
> and we don't need to bump sample file version.
>
> John any feeling about the naming issue ? I'm a bit inclined to call
> ia32e(-ht) a P4(-ht) and to do nothing about ia32e but I'm prolly a bit
> too lazy ...

Using p4 for ia32e has two possible drawbacks:

1) The ia32e is 64-bit. Things might get a bit confusing with both 32-bit and 64-bit p4 sample files. Of course, to be really confusing, the ia32e can run in 32-bit legacy mode.

2) I suspect there may be some differences in the events supplied by the p4 and ia32e. For example, the hammer had a superset of the Athlon events.

> In case we want the kernel patch is there any box available to test it ?

I do have an ia32e machine here to test things out.

-Will
From: Philippe E. <ph...@wa...> - 2004-02-25 20:32:53

On Wed, 25 Feb 2004 at 14:45 +0000, William Cohen wrote:

> Using p4 for ia32e has two possible drawbacks:
>
> 1) The ia32e is 64-bit. things might get a bit confusing have both
> 32-bit and 64-bit p4 sample files. Of course to be really confusing the
> ia32e can run in 32-bit legacy mode.

The sample file format is already 64 bits even on 32-bit platforms. Obviously we will have problems with callgraph... The size of the data passed from the driver to the daemon is given by pointer_size, which is known at runtime.

Afaics our dependencies on pointer size are very small. We blindly use a

  typedef unsigned long long vma_t;

and we narrow down to a 32-bit offset when possible.

The post-profile tools use the bfd_vma type. On such a platform, if you run a 64-bit kernel, libbfd will be compiled to support 64 bits (else it's not possible to link/compile the kernel), so it'll be safe too. The pp tools determine at run time whether the vma fits in 32 or 64 bits.

> 2) I suspect there may be some differences in the events supplied by the
> p4 and ia32e. For example the hammer had a super set of Athlon events.

But appendix D1 of #30083501.pdf says:

  There are no 64-bit mode specific extensions/modifications to event
  counting and imprecise sampling of the Performance Monitoring
  capabilities.

Besides that, the doc describes pebs support extended to 64 bits, which we don't support.

> > In case we want the kernel patch is there any box available to test it ?
>
> I do have an ia32e machine here to test things out out.

Hey, is it working as is? If the model is reported as cpu_family 15 model 3 and oprofile doesn't work as is (driver/daemon/pp tools), then depending on the needed change we will perhaps need separate events files, but for now I prefer we try to re-use the P4 events files. I'm bored with 4 nearly identical events files and unit mask files (for the part in common with ht): let's say you want to modify the branch_retired um, and we will need to modify four files.

A good alternative would be to use your patch but symlink the event files for now, and provide separate events files if needed.

regards,
Phil
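The "narrow down to 32 bits when possible" idea from the message above can be sketched as follows. Only the `vma_t` typedef is taken from the message; `vma_fits_32()` and `vma_width()` are hypothetical helpers written for this illustration and are not functions from oprofile or libbfd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The typedef quoted in the message: always 64 bits wide,
 * regardless of the pointer size of the profiled machine. */
typedef unsigned long long vma_t;

/* Hypothetical helper: true when the address survives a round trip
 * through a 32-bit value, i.e. the upper 32 bits are all zero. */
bool vma_fits_32(vma_t vma)
{
    return vma <= UINT32_MAX;
}

/* Hypothetical helper: number of bytes (4 or 8) needed to represent
 * every address in the array, decided at run time from the data. */
unsigned int vma_width(const vma_t *vmas, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!vma_fits_32(vmas[i]))
            return 8;
    return 4;
}
```

Because the decision is made from the values themselves, the same code handles a 32-bit profile and a 64-bit profile without knowing the pointer size at compile time, which matches the claim that the dependency on pointer size is small.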
From: Will C. <wc...@re...> - 2004-02-25 21:33:54

Philippe Elie wrote:
> On Wed, 25 Feb 2004 at 14:45 +0000, William Cohen wrote:
>
>> Using p4 for ia32e has two possible drawbacks:
>>
>> 1) The ia32e is 64-bit. things might get a bit confusing have both
>> 32-bit and 64-bit p4 sample files. Of course to be really confusing the
>> ia32e can run in 32-bit legacy mode.
>
> The samples file format is already 64 bits even on 32 bits platform.
> Obviously we will got problem with callgraph... The size of data
> passed from the driver to daemon is given by the pointer_size which
> is known at runtime
>
> Afaics our dependencies on pointer size are very small. We blindly use a
>
> typedef unsigned long long vma_t;
>
> and we narrow down to 32 bits offset when possible.
>
> post profile tools use bfd_vma type, on such platform if you run a 64
> bit kernel libbfd will be compiled to support 64 bits (else it's not
> possible to link/compile the kernel) so it'll safe too. pp tools determine
> at run time if the vma fits in 32 or 64 bits.
>
>> 2) I suspect there may be some differences in the events supplied by the
>> p4 and ia32e. For example the hammer had a super set of Athlon events.
>
> but appendix D1 #30083501.pdf:
>
> There are no 64-bit mode specific extensions/modifications to event
> counting and imprecise sampling of the Performance Monitoring
> capabilities.

That doesn't exclude Intel from making the events a pure superset of the P4 events that work in both 32- and 64-bit mode.

> beside that the doc describe pebs support extended to 64 bits which we
> doesn't support.
>
>>> In case we want the kernel patch is there any box available to test it ?
>>
>> I do have an ia32e machine here to test things out out.
>
> hey, is it working as it ?

I have gotten some oprofile data collection, but it isn't working in all cases. The 2.4 kernel used in RHEL 3 works when the processors are not in HT mode, and I have been able to collect data with RHEL 3.

I tried the 2.6 kernel, and oprofile locks up the machine in oprofile_add_sample(). If I comment out the call to oprofile_add_sample() in p4_check_ctrs(), the nmi routine takes samples and the machine runs. It is very strange.

> If the model reported as cpu_family 15 model 3 and oprofile doesn't work
> as it (driver/daemon/pp tools), and depending on the needed change we will
> need perhaps separate events files but fow now I prefer we try to re-use
> P4 events files. I'm bored with 4 near to be identical events file and
> unit mask file (for the common party with ht), let say you want to modify
> branch_retired um and we will need to modify four files.

Yes, that is a good counterargument against having yet another variation of the P4 events. It is tedious to change the same thing in each of the files.

> A good alternative will be to use your patch but symlink the event files
> for now and provide separate events file if it's needed.

Earlier today I sent email to Intel asking whether the P4 and ia32e are truly identical. It would be good to know for sure, so we can make an informed decision about this.

-Will
From: John L. <le...@mo...> - 2004-02-26 01:17:56

On Wed, Feb 25, 2004 at 12:05:43PM -0500, Will Cohen wrote:

> +inline static int __init is_ia32e(void)
> +{
> +	return (test_bit(X86_FEATURE_LM, current_cpu_data.x86_capability));
> +}

What's LM? Is this at all like the Intel-preferred way of testing?

john
From: William C. <wc...@nc...> - 2004-02-26 02:27:18

John Levon wrote:
> On Wed, Feb 25, 2004 at 12:05:43PM -0500, Will Cohen wrote:
>
>> +inline static int __init is_ia32e(void)
>> +{
>> +	return (test_bit(X86_FEATURE_LM, current_cpu_data.x86_capability));
>> +}
>
> What's LM ? Is this at all like the Intel-preferred way of testing ?
>
> john

This flag indicates that the processor has the 64-bit technology. It doesn't indicate that the processor is currently in 64-bit mode. It is very much like the ht feature flag.

I haven't seen an updated version of AP-485, and the current set of patches in the kernel doesn't really provide much detail on identifying the processor. Page 144 of 30083401.pdf mentions the capability flag for "64-bit extensions technology available" for 80000001H, "Extended Processor Signature and Extended Feature Bits." It is an official bit.

-Will
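As a rough userspace illustration of the flag discussed above: the LM capability is reported in bit 29 of EDX for CPUID leaf 80000001H. The sketch below takes the EDX value as a plain argument so that it does not depend on any particular way of executing CPUID. The macro and function names are hypothetical, not kernel identifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 29: long mode (64-bit extensions)
 * is available. The macro name is made up for this sketch. */
#define EXT_FEATURE_LM_BIT 29

/* Returns true when the extended feature word advertises long mode.
 * This says the processor is 64-bit capable, not that it is
 * currently running in 64-bit mode. */
bool cpu_has_long_mode(uint32_t ext_edx)
{
    return ((ext_edx >> EXT_FEATURE_LM_BIT) & 1u) != 0;
}
```

On GCC or Clang for x86, the EDX value could be obtained with `__get_cpuid(0x80000001, &eax, &ebx, &ecx, &edx)` from `<cpuid.h>`. That call is left out of the sketch so it stays compilable on other architectures.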
From: John L. <le...@mo...> - 2004-02-26 16:04:42

On Wed, Feb 25, 2004 at 09:25:16PM -0500, William Cohen wrote:

> The 30083401.pdf page 144 mentions the capability flag for "64-biut
> extensions technology available" for 80000001H, "Extended Processor
> Signature and Extended Feature Bits." It is an official bit.

OK

john
-- 
"Spammers get STABBED by GOD."
	- Ron Echeverri
From: Andi K. <ak...@mu...> - 2004-02-29 11:21:48

John Levon <le...@mo...> writes:

> On Wed, Feb 25, 2004 at 12:05:43PM -0500, Will Cohen wrote:
>
>> +inline static int __init is_ia32e(void)
>> +{
>> +	return (test_bit(X86_FEATURE_LM, current_cpu_data.x86_capability));
>> +}
>
> What's LM ? Is this at all like the Intel-preferred way of testing ?

I don't think it's a good idea to only test for LM (= long mode) here. Future Intel CPUs may have completely different performance counters, but likely will still have LM. And the currently shipping P4Es are Prescotts and have these counters, but long mode is not enabled.

Better to test for family/model (family == 15, model >= 3).

I must admit I never even liked the "x86_64" for hammer, because a hammer can just as well run in 32-bit legacy mode, and then there is no x86-64. I think the event files should be keyed only on the CPU family/model, not on 32bit/64bit. oprofile really should not care about the 32bit/64bitness, and I think it doesn't except for the naming. Call it always i386/cpu, like i386/p4e (= prescott).

-Andi
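The family/model test suggested above can be sketched in userspace as follows. The decoding follows the usual layout of the CPUID leaf 1 EAX signature (base family in bits 8-11, base model in bits 4-7, extended family in bits 20-27, extended model in bits 16-19). The struct and function names are hypothetical, and the signature values used with it are illustrative inputs rather than a list of real parts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the decoded signature. */
struct cpu_sig {
    unsigned int family;
    unsigned int model;
};

/* Decode the CPUID leaf 1 EAX signature into display family/model. */
struct cpu_sig decode_signature(uint32_t eax)
{
    struct cpu_sig sig;
    unsigned int base_family = (eax >> 8) & 0xf;
    unsigned int base_model = (eax >> 4) & 0xf;
    unsigned int ext_family = (eax >> 20) & 0xff;
    unsigned int ext_model = (eax >> 16) & 0xf;

    /* The extended family is only added when the base family is 15. */
    sig.family = base_family;
    if (base_family == 0xf)
        sig.family += ext_family;

    /* The extended model supplies the high nibble for families 6 and 15. */
    sig.model = base_model;
    if (base_family == 0x6 || base_family == 0xf)
        sig.model |= ext_model << 4;

    return sig;
}

/* The test proposed in the message: family == 15, model >= 3. */
bool is_family15_model3_or_later(uint32_t eax)
{
    struct cpu_sig sig = decode_signature(eax);

    return sig.family == 15 && sig.model >= 3;
}
```

Unlike the LM feature bit, this check says nothing about 64-bit capability. It only selects a range of models, which is the property the event files are meant to depend on.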
From: Philippe E. <ph...@wa...> - 2004-02-29 17:03:29

On Sun, 29 Feb 2004 at 12:18 +0000, Andi Kleen wrote:

> John Levon <le...@mo...> writes:
>
> > On Wed, Feb 25, 2004 at 12:05:43PM -0500, Will Cohen wrote:
> >
> >> +inline static int __init is_ia32e(void)
> >> +{
> >> +	return (test_bit(X86_FEATURE_LM, current_cpu_data.x86_capability));
> >> +}
> >
> > What's LM ? Is this at all like the Intel-preferred way of testing ?
>
> I don't think it's a good idea to only test for LM(=long mode) here.
> Future Intel CPUs may have completely different performance counters,
> but likely will still have LM. And the current shipping P4Es are
> Prescotts and have these counters, but long mode is not enabled.
>
> Better test for family/model (family == 15, model >= 3).

Agreed, the current code does this.

> I must admit I never even liked the "x86_64" for hammer, because an
> hammer can as well run in 32bit legacy mode and then there is no
> x86-64. I think the event files should be only keyed on the CPU
> family/mode, not on 32bit/64bit. oprofile really should not
> care about the 32bit/64bitness, and I think it doesn't except
> for the naming. Call it always i386/cpu, like i386/p4e (=prescott)

A bit confusing with the P4 Extreme Edition, no? Since there is no user-visible change in the performance monitor, what about using the existing entry? So rather than reporting the cpu type we should report the PMU type:

-available events for CPU type "Athlon"
+Performance monitoring unit "Athlon" based

-available events for CPU type "PIII"
+available events for performance monitoring unit type "PIII"

The prescott PMU will be reported as a P4 / Xeon PMU.

regards,
Phil
From: William C. <wc...@nc...> - 2004-03-01 16:03:21

Andi Kleen wrote:
> John Levon <le...@mo...> writes:
>
>> On Wed, Feb 25, 2004 at 12:05:43PM -0500, Will Cohen wrote:
>>
>>> +inline static int __init is_ia32e(void)
>>> +{
>>> +	return (test_bit(X86_FEATURE_LM, current_cpu_data.x86_capability));
>>> +}
>>
>> What's LM ? Is this at all like the Intel-preferred way of testing ?
>
> I don't think it's a good idea to only test for LM(=long mode) here.
> Future Intel CPUs may have completely different performance counters,
> but likely will still have LM. And the current shipping P4Es are
> Prescotts and have these counters, but long mode is not enabled.
>
> Better test for family/model (family == 15, model >= 3).

You are right that using the model number will be more reliable. For the time being it appears that the ia32e performance monitoring hardware is the same as the p4's. Future generations of Intel processors may have different performance monitoring hardware but still have some variety of LM.

> I must admit I never even liked the "x86_64" for hammer, because an
> hammer can as well run in 32bit legacy mode and then there is no
> x86-64. I think the event files should be only keyed on the CPU
> family/mode, not on 32bit/64bit. oprofile really should not
> care about the 32bit/64bitness, and I think it doesn't except
> for the naming. Call it always i386/cpu, like i386/p4e (=prescott)

So should Opteron processors have been "i386/amd64"? The amd64 performance monitoring events are a superset of the Athlon's. I haven't tried it, but I would think that those events are still available even when the processor is running in 32-bit legacy mode.

-Will
From: Andi K. <ak...@mu...> - 2004-03-01 16:27:41

> > I must admit I never even liked the "x86_64" for hammer, because an
> > hammer can as well run in 32bit legacy mode and then there is no
> > x86-64. I think the event files should be only keyed on the CPU
> > family/mode, not on 32bit/64bit. oprofile really should not
> > care about the 32bit/64bitness, and I think it doesn't except
> > for the naming. Call it always i386/cpu, like i386/p4e (=prescott)
>
> So should have Opteron processors been "i386/amd64"? The amd64

amd64 is the architecture too, so that would be a bit confusing. i386/k8 would probably be better. But changing it now would probably add even more confusion, so better to keep it for now.

> performance monitoring events are a super set of the athlons. I haven't
> tried it but I would think that those events are still available even
> when the processor is running in 32-bit legacy mode.

They are.

-Andi