From: <al...@fc...> - 2002-11-04 22:34:45
Attachments:
cpu_type.c.diff
|
On 2002-10-31 20:28 you wrote: > Once Linus accepts the last 64-bit thingy patch and releases 2.5.46, > I want to (finally !!) release 0.4. > > Any objections ? If you have pending bugfixes let me know now. Hi, John, Sorry this isn't in sooner, but other activities demanded my attention. The P4 Hyper-Threading detection code in oprofile/module/x86/cpu_type.c does not take into account the situation where the processor is HT capable, but HT is not enabled. In this case, the HT machine is just a fast P4 and we can use NMI-based sampling (i.e. not revert to RTC). Most (probably all, actually) P4s can have HT disabled in the BIOS. Unfortunately, detecting this is not as simple as reading the HT bit in cpuid. All that bit does is mark a cpu as capable of HT. So, the attached patch to oprofile/module/x86/cpu_type.c detects if HT has been enabled or not if the cpu is HT capable, and used RTC only if HT is enabled, for your review. Kind of a roundabout way of getting at the info, but that's how the Intel docs do it. This was tested on a two-way HT capable machine. Best Regards, Alex -- Alex Tsariounov Linux and Open Source Lab Hewlett-Packard Company |
From: John L. <le...@mo...> - 2002-11-05 02:54:22
|
On Mon, Nov 04, 2002 at 03:33:47PM -0700, Alex Tsariounov wrote: > Sorry this isn't in sooner, but other activities demanded my attention. > The P4 Hyper-Threading detection code in oprofile/module/x86/cpu_type.c > does not take into account the situation where the processor is HT > capable, but HT is not enabled. I think this can wait. It looks relatively risky. Will, can you take a look ? There are some minor style issues, but mostly I wonder why we must check all CPUs. Does P4 relax the traditional restriction that SMP machines must be, well, symmetric ? > +static void do_cpu_id(void *data); > + just move it before, and remove the prototype. > /** > * p4_threads - determines the number of logical processor threads in a die > * Do we really need both checks ? > + * A P4 can be capable of HT, but not enabled via BOIS. The check for BIOS > +#ifdef CONFIG_SMP reverse sense of this like : #ifndef CONFIG_SMP return 0; #else ... > + printk(KERN_INFO "oprofile: P4 Hyper-Threading is enabled.\n"); > + else > + printk(KERN_INFO > + "oprofile: P4 Hyper-Threading capable, but not enabled.\n"); probably too chatty. just tell the user if it was disabled to use perfctr, and why. > + cpuid (1, &eax, &ebx, &ecx, &edx); cpuid(1,... > + ((unsigned char*)data)[smp_processor_id()] = > + (unsigned char) ((ebx & 0xff000000) >> 24); Magic numbers make baby jesus cry. I'd prefer this to be done in two separate steps at least. > - if (p4_threads()>1 ) > - return CPU_RTC; > + if (p4_threads()>1 ) { should be: if (p4_threads() > 1) { > + if (p4_ht_enabled()) > + return CPU_RTC; > + else > + return CPU_P4; and you can surely simplify the if/else nesting using && in the if. thanks john -- "I find that life has more appeal Without a driver who's asleep behind the wheel" - The Divine Comedy |
From: Alex T. <al...@fc...> - 2002-11-05 17:22:45
|
On Tue, Nov 05, 2002 at 02:51:05AM +0000, John Levon wrote: > There are some minor style issues, but mostly I wonder why we must check > all CPUs. Does P4 relax the traditional restriction that SMP machines > must be, well, symmetric ? I just followed the example code from Intel at: http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial::codesample_cpucounter_desc&cntType=IDS_EDITORIAL&catCode=CDN There's another white paper on the subject floating around their site somewhere. I would think that HT enabled/disabled would be a system switch, but perhaps there is no restriction on which procs in a system have HT enabled and you can have odd number of active logical processors? > > /** > > * p4_threads - determines the number of logical processor threads in a die > > * > > Do we really need both checks ? Yes, since the first check tells us if the processor is capable of HT; if it is not, then we do not have to smp_call_function() the apic id's of all processors. > > > + cpuid (1, &eax, &ebx, &ecx, &edx); > > cpuid(1,... > > > + ((unsigned char*)data)[smp_processor_id()] = > > + (unsigned char) ((ebx & 0xff000000) >> 24); > > Magic numbers make baby jesus cry. > I'd prefer this to be done in two separate steps at least. Do you mean assign ((unsigned char*)data)[smp_processor_id()] to a temp ptr, then assign ((ebx & 0xff000000) >> 24) to it? The 0xff000000 is there since we need only bits EBX[31:24] for the apic id. I'll also apply all other changes you mention. Regards, Alex |
From: John L. <le...@mo...> - 2002-11-05 17:47:51
|
On Tue, Nov 05, 2002 at 10:16:33AM -0700, Alex Tsariounov wrote: > I just followed the example code from Intel at: > http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial::codesample_cpucounter_desc&cntType=IDS_EDITORIAL&catCode=CDN ok > Do you mean assign ((unsigned char*)data)[smp_processor_id()] to a temp > ptr, then assign ((ebx & 0xff000000) >> 24) to it? The 0xff000000 is > there since we need only bits EBX[31:24] for the apic id. something like that yes > I'll also apply all other changes you mention. please do and resubmit, phil convinced me we may as well ship with this to give it some testing regards john -- "When a man has nothing to say, the worst thing he can do is to say it memorably." - Calvin Trillin |
From: Alex T. <al...@fc...> - 2002-11-05 17:56:44
|
On Tue, Nov 05, 2002 at 05:44:44PM +0000, John Levon wrote: > On Tue, Nov 05, 2002 at 10:16:33AM -0700, Alex Tsariounov wrote: > > > I just followed the example code from Intel at: > > http://cedar.intel.com/cgi-bin/ids.dll/content/content.jsp?cntKey=Generic+Editorial::codesample_cpucounter_desc&cntType=IDS_EDITORIAL&catCode=CDN > > ok > > > Do you mean assign ((unsigned char*)data)[smp_processor_id()] to a temp > > ptr, then assign ((ebx & 0xff000000) >> 24) to it? The 0xff000000 is > > there since we need only bits EBX[31:24] for the apic id. > > something like that yes > > > I'll also apply all other changes you mention. > > please do and resubmit, phil convinced me we may as well ship with this > to give it some testing Ok, will do. Regards, Alex |
From: Alex T. <al...@fc...> - 2002-11-05 19:01:09
Attachments:
cpu_type.c.diff
|
On Tue, Nov 05, 2002 at 10:50:36AM -0700, Alex Tsariounov wrote: > On Tue, Nov 05, 2002 at 05:44:44PM +0000, John Levon wrote: > > please do and resubmit, phil convinced me we may as well ship with this > > to give it some testing > > Ok, will do. Ok, changes applied and modified patch attached. A couple of notes: Phil, I did get it wrong so I reverted back to how Intel does it (the mask calculation); John, your original question of why check all processors: we don't know which processor we are originally running on when the check is performed, so although we could check num_threads +1 processors (or something like that), it's probably simpler to just do them all at that point; also, smp_call_function() doesn't do a selective run; I changed my do_cpu_id(&apic_id) to do_cpu_id(apic_id), although ANSI allows &array, it's probably more intuitive to just use the array name. Regards, Alex |
From: William C. <wc...@nc...> - 2002-11-05 20:58:03
|
I have read over the documentation on hyperthreading. The code follows the psuedo code in the Intel manuals pretty closely. However, you get a warning that the do_cpu_id() function is unused when building a UP kernel. With the -Werror that kills the build. do_cpu_id() in cpu_type.c should only be there when configured for SMP kernels. -Will Alex Tsariounov wrote: > On Tue, Nov 05, 2002 at 10:50:36AM -0700, Alex Tsariounov wrote: > >>On Tue, Nov 05, 2002 at 05:44:44PM +0000, John Levon wrote: >> >>>please do and resubmit, phil convinced me we may as well ship with this >>>to give it some testing >> >>Ok, will do. > > > Ok, changes applied and modified patch attached. > > A couple of notes: > > Phil, I did get it wrong so I reverted back to how Intel > does it (the mask calculation); > > John, your original question of why check all processors: > we don't know which processor we are originally running on > when the check is performed, so although we could check > num_threads +1 processors (or something like that), it's > probably simpler to just do them all at that point; also, > smp_call_function() doesn't do a selective run; > > I changed my do_cpu_id(&apic_id) to do_cpu_id(apic_id), > although ANSI allows &array, it's probably more intuitive > to just use the array name. > > Regards, > Alex > > > ------------------------------------------------------------------------ > > Index: cpu_type.c > =================================================================== > RCS file: /cvsroot/oprofile/oprofile/module/x86/cpu_type.c,v > retrieving revision 1.9 > diff -u -r1.9 cpu_type.c > --- cpu_type.c 2 Oct 2002 19:24:56 -0000 1.9 > +++ cpu_type.c 5 Nov 2002 18:18:42 -0000 > @@ -9,6 +9,9 @@ > * @author Philippe Elie > */ > > +#include <asm/smpboot.h> > +#include <linux/smp.h> > + > #include "oprofile.h" > #include "op_msr.h" > > @@ -40,6 +43,70 @@ > return processor_threads; > } > > +/** > + * do_cpu_id - Call the cpuid instruction and fill in data at passed address > + * > + * We expect that data is pointing to an array of unsigned chars as big as there > + * are cpus in an smp system. > + */ > +static void do_cpu_id(void *data) > +{ > + int eax, ebx, ecx, edx; > + unsigned char * ptr = (unsigned char *) data; > + > + cpuid(1, &eax, &ebx, &ecx, &edx); > + > + /* bits EBX[31:24] define APIC ID */ > + ptr[smp_processor_id()] = (unsigned char) ((ebx & 0xff000000) >> 24); > +} > + > +/** > + * p4_ht_enabled - Determines if Hyper Threading is enabled or not. > + * > + * A P4 can be capable of HT, but not enabled via BIOS. The check for > + * this is unfortunately not simple and involves running cpuid on all > + * logical processors and checking for bits in the APIC_ID fields. > + * As per Intel docs. Returns 1 if enabled, 0 otherwise. > + * > + */ > +static int p4_ht_enabled(void) > +{ > +#ifndef CONFIG_SMP > + return 0; > +#else > + int enabled, threads, i; > + unsigned char mask; > + unsigned char apic_id[smp_num_cpus]; > + unsigned int cpu; > + > + /* First check for capability */ > + threads = p4_threads(); > + if (threads == 1) return 0; > + /* Create mask for low order bits */ > + mask = 0xff; > + i = 1; > + while (i < threads) { > + i *= 2; > + mask <<= 1; > + } > + /* Get APIC_ID from all logial procs and self */ > + smp_call_function(do_cpu_id, apic_id, 1, 1); > + do_cpu_id(apic_id); > + /* If the low order bits are on, then HT is enabled */ > + enabled = 0; > + cpu = 0; > + do { > + if (apic_id[cpu] & ~mask) { > + enabled = 1; > + break; > + } > + cpu++; > + } while (cpu < smp_num_cpus); > + > + return enabled; > +#endif /* CONFIG_SMP */ > +} > + > __init op_cpu get_cpu_type(void) > { > __u8 vendor = current_cpu_data.x86_vendor; > @@ -71,11 +138,14 @@ > return CPU_PPRO; > case 0xf: > if (model <= 3) { > - /* Cannot handle HT P4 hardware */ > - if (p4_threads()>1 ) > + /* Cannot handle enabled HT P4 hardware */ > + if ((p4_threads() > 1) && p4_ht_enabled()) { > + printk(KERN_INFO > + "oprofile: P4 HT enabled, reverting to RTC\n"); > return CPU_RTC; > + } > else > - return CPU_P4; > + return CPU_P4; > } else > /* Do not know what it is */ > return CPU_RTC; |
From: Alex T. <al...@fc...> - 2002-11-05 21:10:16
Attachments:
cpu_type.c.diff
|
On Tue, Nov 05, 2002 at 03:48:51PM -0500, William Cohen wrote: > I have read over the documentation on hyperthreading. The code follows > the psuedo code in the Intel manuals pretty closely. > > However, you get a warning that the do_cpu_id() function is unused when > building a UP kernel. With the -Werror that kills the build. do_cpu_id() > in cpu_type.c should only be there when configured for SMP kernels. > > -Will Oops! Thanks for catching that, I see it. Corrected patch attached. Regards, Alex |
From: John L. <le...@mo...> - 2002-11-06 01:59:07
|
On Tue, Nov 05, 2002 at 02:04:04PM -0700, Alex Tsariounov wrote: > Corrected patch attached. ok applied, let's see how this plays out. I plan to release tomorrow... john |
From: Philippe E. <ph...@wa...> - 2002-11-05 04:52:13
|
Alex Tsariounov wrote: > On 2002-10-31 20:28 you wrote: > >>Once Linus accepts the last 64-bit thingy patch and releases 2.5.46, >>I want to (finally !!) release 0.4. >> >>Any objections ? If you have pending bugfixes let me know now. > > > Hi, John, > > Sorry this isn't in sooner, but other activities demanded my attention. > The P4 Hyper-Threading detection code in oprofile/module/x86/cpu_type.c > does not take into account the situation where the processor is HT > capable, but HT is not enabled. > > In this case, the HT machine is just a fast P4 and we can use NMI-based > sampling (i.e. not revert to RTC). Most (probably all, actually) P4s can > have HT disabled in the BIOS. Unfortunately, detecting this is not as > simple as reading the HT bit in cpuid. All that bit does is mark a cpu > as capable of HT. > > So, the attached patch to oprofile/module/x86/cpu_type.c detects if HT > has been enabled or not if the cpu is HT capable, and used RTC only if > HT is enabled, for your review. > > Kind of a roundabout way of getting at the info, but that's how the > Intel docs do it. This was tested on a two-way HT capable machine. Can the bios disable HT only for one physical processor ? > > Index: cpu_type.c ... > +static int p4_ht_enabled(void) > +{ > +#ifdef CONFIG_SMP > + int enabled, threads; > + unsigned char mask; > + unsigned char apic_id[smp_num_cpus]; > + unsigned int cpu; > + > + /* First check for capability */ > + threads = p4_threads(); > + if (threads == 1) return 0; > + /* Create mask for low order bits */ > + mask = 0xff; > + mask <<= (threads % 2) + 1; seems strange, depending on the number of processor you test the low order bit or the 2 low order bits. If you want an accurate test use those given in 7.6.10 (example 7.2 part 4) of Intel manual vol III else testing bit zero seems sufficient. > + /* Get APIC_ID from all logial procs and self */ > + smp_call_function(do_cpu_id, &apic_id, 1, 1); > + do_cpu_id(&apic_id); > + /* If the low order bits are on, then HT is enabled */ > + enabled = 0; > + cpu = 0; > + do { > + if (apic_id[cpu] & ~mask) { > + enabled = 1; > + break; > + } > + cpu++; > + } while (cpu < smp_num_cpus); > + regards, Phil |
From: Alex T. <al...@fc...> - 2002-11-05 17:46:24
|
On Tue, Nov 05, 2002 at 05:46:35AM +0000, Philippe Elie wrote: > Alex Tsariounov wrote: > >Kind of a roundabout way of getting at the info, but that's how the > >Intel docs do it. This was tested on a two-way HT capable machine. > > Can the bios disable HT only for one physical processor ? No, the BIOS is a system-wide switch for all processors, at least on the machine I have. The patch is just how the sample Intel code does it, so I just followed their example. See the url in my reply to John just earlier as well. > >+ /* Create mask for low order bits */ > >+ mask = 0xff; > >+ mask <<= (threads % 2) + 1; > > seems strange, depending on the number of processor > you test the low order bit or the 2 low order bits. > If you want an accurate test use those given in 7.6.10 > (example 7.2 part 4) of Intel manual vol III else > testing bit zero seems sufficient. Yes, I followed this example. I think that the number of logical processors is not limited to 2 per physical processor. Although our current systems only have two threads per proc, the code make allowances for more logicals, I assume for the near future. The already existing p4_threads() function is equivalent to Intel's LogicalProcessorsPerPackage() function in the code you mention, and my compression into (mask <<= (threads % 2) + 1) is for the following where *LogicalNum is the return of LogicalProcessorsPerPackage(): (Did I get all that wrong?) /* Intel sample code: */ // Calculate the appropriate shifts and mask based on the // number of logical processors. unsigned char i = 1, PHY_ID_MASK = 0xFF, PHY_ID_SHIFT = 0; while (i < *LogicalNum) { i *= 2; PHY_ID_MASK <<= 1; PHY_ID_SHIFT++; } /* ............ and then, later on: */ { unsigned char APIC_ID, LOG_ID, PHY_ID; Sleep(0); // Give OS time to switch CPU APIC_ID = GetAPIC_ID(); LOG_ID = APIC_ID & ~PHY_ID_MASK; PHY_ID = APIC_ID >> PHY_ID_SHIFT; if (LOG_ID != 0) HT_Enabled = 1; } Regards, Alex |