While reading _Solaris Internals: Core Kernel Components_ and stepping
through Sun's implementation of mutexes, I learned something about
memory barriers (fences) and why exactly we might need to use them to
fix at least some of our SMP bugs. Since I'm busy with other projects
this week and don't have an SMP machine here to test on, I'll write up
what I've found, in lieu of actually poking around the kernel myself. :-)

The relevant sections in the Intel IA-32 software developer's manual
Volume 3 are 7.2 (memory ordering) and 7.4 (serializing instructions),
but they don't explain much other than to state that in all current
Intel processors, loads and stores are executed in order with respect
to each other. The following page, while written for Java VM
implementors, has a lot more useful detail, and covers all the popular
RISC CPUs in addition to x86:
http://gee.cs.oswego.edu/dl/jmm/cookbook.html

The important thing to note is the "StoreLoad" barrier type, which
makes sure all pending writes have been flushed out to memory before
any later read is allowed to proceed. Without it, if we write to one
memory location and then read from another, even if both variables
were declared volatile, the read can be satisfied while our own write
is still sitting in the processor's write buffer, where no other
processor can see it yet. Two CPUs that each set their own flag and
then check the other's can therefore both read the old value. While
the individual CPUs in an SMP configuration keep their caches
up-to-date through bus snooping, this doesn't apply to data that is
still in a processor's write buffer and hasn't been pushed out to the
cache yet.

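To make the store-then-load case concrete, here is a minimal sketch in C11, assuming a compiler that provides <stdatomic.h>. The names flag0, flag1 and try_enter() are made up for illustration and don't come from any kernel. Each side sets its own flag and then checks the other's; the fence in between is the StoreLoad barrier.

```c
#include <stdatomic.h>

/* Hypothetical flags, one per CPU/thread; not taken from any real kernel. */
static atomic_int flag0;
static atomic_int flag1;

/*
 * Announce interest by setting our own flag, then check the other side's.
 * Returns 1 if the other flag was clear, 0 if we backed off.
 */
static int try_enter(atomic_int *mine, atomic_int *theirs)
{
    atomic_store_explicit(mine, 1, memory_order_relaxed);

    /* Full fence (includes StoreLoad): the store above is ordered
     * before the load below, as seen by other processors. */
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(theirs, memory_order_relaxed) != 0) {
        /* The other side got there first; withdraw our claim. */
        atomic_store_explicit(mine, 0, memory_order_relaxed);
        return 0;
    }
    return 1;
}
```

Without the fence, both sides could in principle load a stale 0 and both return 1. This is only a sketch of the ordering problem, not a complete lock (it has no retry or fairness logic).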
"StoreLoad" is the only type of barrier required on current x86
systems (excluding AMD64). The Pentium 4 and AMD64 have a special
instruction for it, "mfence", but the portable way to do a StoreLoad
barrier on x86 is with a lock-prefixed memory access instruction, for
example the do-nothing "lock; xorl $0, (%esp)".

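As a rough illustration, the two variants could be wrapped like this with GCC-style inline assembly. The function names are my own, the __sync_synchronize() fallback for non-x86 targets is an assumption added so the sketch compiles elsewhere, and the 64-bit branch simply uses %rsp in place of %esp.

```c
/* Hypothetical helper: StoreLoad barrier via a lock-prefixed no-op. */
static inline void storeload_barrier(void)
{
#if defined(__x86_64__)
    /* Same trick in 64-bit mode: the stack pointer is %rsp there. */
    __asm__ __volatile__("lock; xorl $0,(%%rsp)" ::: "memory", "cc");
#elif defined(__i386__)
    /* XOR with 0 leaves the word at the top of the stack unchanged;
     * only the lock prefix matters. */
    __asm__ __volatile__("lock; xorl $0,(%%esp)" ::: "memory", "cc");
#else
    /* Assumption: on other targets, fall back to the GCC full-barrier builtin. */
    __sync_synchronize();
#endif
}

/* Hypothetical helper: use mfence where SSE2 is available at compile time. */
static inline void storeload_barrier_mfence(void)
{
#if defined(__SSE2__)
    __asm__ __volatile__("mfence" ::: "memory");
#else
    storeload_barrier();
#endif
}
```

The "memory" clobber also stops the compiler from moving memory accesses across the barrier, and "cc" is listed because xorl modifies the flags register.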
I don't know if this type of memory barrier is necessary to fix any of
the bugs that have been seen, but it's definitely a good thing to know
about. BTW, the Linux macros for this type of memory barrier on x86
are mb(), rmb(), smp_mb(), and smp_rmb(), defined in
"include/asm-i386/system.h". Linux also has a generic barrier() macro
defined in "include/linux/kernel.h" as
'__asm__ __volatile__("": : :"memory")', which is used to tell the
compiler that memory contents may have changed.
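For what it's worth, here is a small userland sketch of what the compiler-only barrier buys you. It redefines barrier() locally with the same empty asm statement, and the variable and function names are invented for the example. Note that this emits no instruction at all, so it constrains the compiler but not the CPU.

```c
/* Local copy of the compiler barrier for a userland sketch (GCC/Clang). */
#define barrier() __asm__ __volatile__("" ::: "memory")

/* Hypothetical flag that something else (another thread, an interrupt
 * handler) is expected to set. Deliberately not volatile. */
static int ready;

/*
 * Poll 'ready' at most max_spins times and return how many times we spun.
 * The barrier forces the compiler to reload 'ready' from memory on each
 * iteration instead of keeping it in a register.
 */
static int spin_until_ready(int max_spins)
{
    int spins = 0;

    while (!ready && spins < max_spins) {
        barrier();
        spins++;
    }
    return spins;
}
```

On an SMP machine this would still need one of the real memory barriers if the ordering of other loads and stores around the flag matters.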
--
Jake