[Syllable-kernel] new scheduler and HyperThreading (was Re: dos2unix)


Daniel Gryniewicz wrote:
> 
> I can easily sync up and give you a new sched02 patch.  I'll do that.
> Note that, after updating CVS today, I'm having tons of warnings on my
> linux box.  These will likely show up with Vanders' experimental GCC, so
> we should probably fix those too.  (The <atheos/typedefs.h> ->
> "inc/typedefs.h" move had me fooled for a minute...)

Cool.  I'll test Vanders' new GCC and see how stable it is for kernel 
builds.  I've been reading _The Unabridged Pentium 4 IA32 Processor 
Genealogy_ at Safari.oreilly.com and I found some good tips on getting 
the best performance with HT.  Are you keeping in mind the differences 
between logical and physical CPUs in your scheduling logic?  Here are 
the techniques that Intel recommends to optimize performance for 
HT-enabled processors.

* The OS scheduler should schedule threads to be executed on logical 
processors within different physical processors before scheduling 
threads to be executed on both of the logical processors within the same 
physical processor.

* Eliminate spin-wait loops wherever possible.  I need to add a PAUSE 
instruction to our spinlock() function to keep the loop from spinning 
too fast on P4 systems and wasting power (on laptops) and cycles that 
could be used by the other logical CPU.  PAUSE is encoded as REP NOP, so 
older CPUs simply execute it as a NOP.

* The OS scheduler should attempt to balance the load on each logical 
processor.

* Attempt to share code and data between threads executing on each 
logical CPU within a physical processor (the L1 data cache and L2/L3 
caches are shared).  Two threads in the same process running the same 
code and accessing the same data set will run faster when executing on 
the same physical processor.  This is related to the optimization of 
giving threads an affinity to prefer running on the same CPU as they 
last executed on, only in this case the logical CPU doesn't matter and 
the affinity should be tied to the physical CPU.

* Eliminate or decrease the amount of code and data sharing between 
threads executing on different physical processors.  The ideal situation 
for things like semaphores is that they should be in separate cache 
lines from each other and from the data they're protecting.  On P6 
processors, cache line size is 32 bytes.  On P4 processors, the L2 and 
L3 cache line size is 128 bytes and the L1 data cache line is 64 bytes.
I know the Linux kernel has some macros to pad data structures based 
on the cache line size of the CPU(s) that you have compiled it for, but 
we haven't really optimized that aspect of Syllable yet.
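
For the cache-line point above, here is a minimal C11 sketch of padding shared objects out to whole cache lines.  It assumes a 128-byte line (the P4 L2/L3 size, which is also a multiple of the 64-byte P4 L1 line and the 32-byte P6 line).  CACHE_LINE_SIZE, struct padded_sem and struct per_cpu_stat are hypothetical names for illustration, not existing Syllable definitions; the idea is similar in spirit to the Linux cache-line alignment macros, but uses plain _Alignas.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed line size for this sketch: 128 bytes covers the P4 L2/L3 line and
   is a multiple of the 64-byte P4 L1D line and the 32-byte P6 line. */
#define CACHE_LINE_SIZE 128

/* Hypothetical semaphore: the count that every waiter writes sits on its own
   cache line, separate from the data it protects. */
struct padded_sem {
    _Alignas(CACHE_LINE_SIZE) int count;          /* hot lock word */
    _Alignas(CACHE_LINE_SIZE) unsigned long data; /* protected data */
};

/* Hypothetical per-CPU counter, one cache line each, so CPUs updating their
   own slot never invalidate each other's lines (no false sharing). */
struct per_cpu_stat {
    _Alignas(CACHE_LINE_SIZE) unsigned long count;
};

/* Compile-time checks that the layout is what we intended. */
static_assert(offsetof(struct padded_sem, data) == CACHE_LINE_SIZE,
              "protected data must start on its own cache line");
static_assert(sizeof(struct per_cpu_stat) == CACHE_LINE_SIZE,
              "each per-CPU slot must fill exactly one cache line");
```

The cost is memory: every padded object occupies at least one full line, so this is only worth it for hot, contended objects.  Also note that an array such as `struct per_cpu_stat stats[NR_CPUS]` is only line-aligned if its storage honors the alignment; static data does, but a kernel heap allocator may not.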

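As a rough sketch of the PAUSE change to spinlock(), here is a test-and-test-and-set lock written with C11 atomics, with PAUSE in the wait loop.  The type demo_spinlock_t and the helper names are hypothetical, not Syllable's actual spinlock API, and the inline asm assumes GCC-style syntax; on non-x86 targets cpu_relax() compiles to nothing.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only. */
typedef struct {
    atomic_int locked; /* 0 = free, 1 = held */
} demo_spinlock_t;

/* PAUSE hint.  "rep; nop" is the PAUSE encoding, so pre-P4 x86 CPUs just
   execute it as a NOP. */
static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

static inline void spin_init(demo_spinlock_t *l)
{
    atomic_init(&l->locked, 0);
}

/* One atomic attempt; returns true if we now hold the lock. */
static inline bool spin_trylock(demo_spinlock_t *l)
{
    return atomic_exchange_explicit(&l->locked, 1, memory_order_acquire) == 0;
}

static inline void spin_lock(demo_spinlock_t *l)
{
    while (!spin_trylock(l)) {
        /* Wait with plain loads (no writes) until the lock looks free,
           executing PAUSE on every iteration so the sibling logical CPU
           gets the shared execution resources. */
        while (atomic_load_explicit(&l->locked, memory_order_relaxed))
            cpu_relax();
    }
}

static inline void spin_unlock(demo_spinlock_t *l)
{
    /* Debug check: unlocking a free lock is a bug. */
    assert(atomic_load_explicit(&l->locked, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->locked, 0, memory_order_release);
}
```

Waiters only retry the atomic exchange once the lock looks free, so the cache line is not written on every spin, and the PAUSE in the inner loop is what keeps a waiting logical CPU from starving its sibling.
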
After I get SMP working on my P4 system, I'll update the startup code to 
store logical CPU information in the ProcessorInfo_s so you'll be able 
to distinguish between logical and physical processors.  In the meantime 
I wanted to bring up the topic so that you can plan ahead.
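
To make the logical/physical distinction concrete for planning, here is a sketch of the kind of data a scheduler could use.  It assumes each logical CPU's initial APIC ID and the logical-processor count come from CPUID leaf 1 (EBX bits 31:24 and 23:16, the latter valid only when the HTT flag in EDX bit 28 is set), and derives the physical package ID by dropping the low ceil(log2(count)) bits of the APIC ID.  struct cpu_info, pick_cpu() and the other names are hypothetical, not the fields that will go into ProcessorInfo_s.  pick_cpu() folds the first and third recommendations into one ranking (least-loaded package, then least-loaded logical CPU), with physical-package affinity as a tie-breaker.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-logical-CPU record for this sketch only. */
struct cpu_info {
    unsigned apic_id;       /* initial APIC ID, CPUID leaf 1 EBX[31:24] */
    unsigned package_id;    /* physical processor this logical CPU is in */
    unsigned run_queue_len; /* runnable threads currently queued on it */
};

/* Decode CPUID leaf 1 EBX.  Use a logical count of 1 if HTT is clear. */
static inline unsigned initial_apic_id(unsigned ebx) { return ebx >> 24; }
static inline unsigned logical_per_package(unsigned ebx) { return (ebx >> 16) & 0xff; }

/* Package ID = APIC ID with the low ceil(log2(logical_count)) bits dropped. */
static inline unsigned package_of(unsigned apic_id, unsigned logical_count)
{
    unsigned bits = 0;
    while ((1u << bits) < logical_count)
        bits++;
    return apic_id >> bits;
}

/* Total queued work on all logical CPUs of package pkg. */
static inline unsigned package_load(const struct cpu_info *cpus, size_t n,
                                    unsigned pkg)
{
    unsigned load = 0;
    for (size_t i = 0; i < n; i++)
        if (cpus[i].package_id == pkg)
            load += cpus[i].run_queue_len;
    return load;
}

/* Choose a logical CPU for a thread that last ran on prev_package:
   1. least-loaded physical package, so an idle package beats the idle
      sibling of a busy logical CPU;
   2. then least-loaded logical CPU (load balancing);
   3. remaining ties go to prev_package (physical-CPU affinity). */
static inline size_t pick_cpu(const struct cpu_info *cpus, size_t n,
                              unsigned prev_package)
{
    assert(n > 0);
    size_t best = 0;
    unsigned best_pkg = package_load(cpus, n, cpus[0].package_id);
    unsigned best_cpu = cpus[0].run_queue_len;
    int best_prev = cpus[0].package_id == prev_package;

    for (size_t i = 1; i < n; i++) {
        unsigned pkg = package_load(cpus, n, cpus[i].package_id);
        unsigned cpu = cpus[i].run_queue_len;
        int prev = cpus[i].package_id == prev_package;
        if (pkg < best_pkg ||
            (pkg == best_pkg && cpu < best_cpu) ||
            (pkg == best_pkg && cpu == best_cpu && prev && !best_prev)) {
            best = i;
            best_pkg = pkg;
            best_cpu = cpu;
            best_prev = prev;
        }
    }
    return best;
}
```

For example, with two HT packages whose logical CPUs have APIC IDs 0, 1 and 6, 7, and one thread already running on logical CPU 0, pick_cpu() returns logical CPU 2 (on the idle package) rather than CPU 1 (the busy CPU's sibling).  Rescanning package load for every candidate makes this O(n²), which is fine for a handful of logical CPUs but worth caching on larger machines.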

-Jake