From: Jake H. <jak...@gm...> - 2005-07-04 21:26:55
While reading _Solaris Internals: Core Kernel Components_ and stepping through Sun's implementation of mutexes, I learned something about memory barriers (fences) and why we might need them to fix at least some of our SMP bugs. Since I'm busy with other projects this week and don't have an SMP machine here to test on, I'll write up what I've found, in lieu of actually poking around the kernel myself. :-)

The relevant sections in the Intel IA-32 Software Developer's Manual, Volume 3, are 7.2 (memory ordering) and 7.4 (serializing instructions), but they don't explain much beyond stating that in all current Intel processors, loads and stores are executed in order with respect to each other. The following page, while written for Java VM implementors, has a lot more useful detail, and covers all the popular RISC CPUs in addition to x86:

http://gee.cs.oswego.edu/dl/jmm/cookbook.html

The important thing to note is the "StoreLoad" barrier type, which forces all pending writes out of the processor's write buffer before any later read is performed. Without it, if we write to a memory location and later read it back, even if the variable was declared volatile, we may get our own buffered copy of the data rather than what another processor has written in the meantime. Likewise, a later read of a different location can complete before our write has become visible to the other processors. While the individual CPUs in an SMP configuration keep their caches up to date through bus snooping, this doesn't apply to data that is still in a processor's write buffer and hasn't been pushed out to the cache yet.

"StoreLoad" is the only type of barrier required on current x86 systems (excluding AMD64). The Pentium 4 and AMD64 have a special instruction for it, "mfence", but the portable way to do a StoreLoad barrier on x86 is with a lock-prefixed memory access instruction, for example the do-nothing "lock; xorl $0, (%esp)".

I don't know if this type of memory barrier is necessary to fix any of the bugs that have been seen, but it's definitely a good thing to know about.
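To make the lock-prefixed idiom concrete, here is a minimal sketch of how such a StoreLoad barrier might be wrapped for GCC-compatible compilers. The names storeload_barrier() and try_enter() are hypothetical and do not come from any kernel source. The x86-64 branch and the __sync_synchronize() fallback are assumptions added so the sketch also builds on machines other than 32-bit x86.

```c
/*
 * Hypothetical helper (name invented for this sketch): a StoreLoad
 * barrier built from a lock-prefixed, do-nothing memory access.
 */
static inline void storeload_barrier(void)
{
#if defined(__i386__)
    /* xor with 0 leaves the stack word unchanged; the lock prefix
     * forces earlier stores out before later loads are performed.
     * The "memory" clobber also stops the compiler from reordering. */
    __asm__ __volatile__("lock; xorl $0, (%%esp)" : : : "memory", "cc");
#elif defined(__x86_64__)
    /* Assumption: same idiom, using the 64-bit stack pointer. */
    __asm__ __volatile__("lock; xorl $0, (%%rsp)" : : : "memory", "cc");
#else
    /* Assumption: fall back to the GCC full-barrier builtin. */
    __sync_synchronize();
#endif
}

/*
 * Hypothetical Dekker-style entry check: announce our intent with a
 * store, then load the other side's flag. Without the barrier the
 * load could be performed while our store is still sitting in the
 * write buffer, so both CPUs could see the other's flag as 0.
 * Returns 1 if the other flag was clear, 0 otherwise.
 */
static int try_enter(volatile int *my_flag, volatile int *other_flag)
{
    *my_flag = 1;            /* store */
    storeload_barrier();     /* make the store visible first */
    return *other_flag == 0; /* load */
}
```

Each CPU would call try_enter() with its own flag first and the other CPU's flag second. With the barrier in place, at most one of two racing callers can get 1 back, since whichever store becomes visible second guarantees that its owner's following load sees the other flag set.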
BTW, the Linux macros for this type of memory barrier on x86 are mb(), rmb(), smp_mb(), and smp_rmb(), defined in "include/asm-i386/system.h". Linux also has a generic barrier() macro, defined in "include/linux/kernel.h" as '__asm__ __volatile__("": : :"memory")'. It emits no instructions; it only tells the compiler that memory contents may have changed, so values must be reloaded from memory rather than kept in registers.

-- Jake