From: Linus T. <tor...@tr...> - 2002-06-19 16:57:10
On Wed, 19 Jun 2002, Keith Whitwell wrote:
>
> Our lock has an x86 'lock' modifier, so they're not ordinary reads & writes --
> they're much slower and probably scale with memory bus speed more than cpu speed.

Happily, nobody does the "memory bus lock" any more. All sane CPUs do locked operations as locked cache transactions, but this _does_ mean that you need to keep the lock in regular memory (ie never _ever_ put a lock variable in AGP memory, that's a sure-fire way to kill performance).

With the cache-locked access, the biggest overhead of doing a lock is actually the fact that the lock also needs to serialize the internal CPU write queues etc., and is a memory barrier. On Intel CPUs this _seems_ to be implemented as a pipeline flush, so on Intel the locked access will basically take something like 12-20 cycles. But it's still _CPU_ cycles, not memory cycles, so locking is still fairly cheap.

On AMD Athlons, a lock doesn't flush the pipeline, only serializes the write queue and disables read speculation around it, so on Athlons the lock is even cheaper, on the order of just a couple of cycles.

NOTE NOTE NOTE! This is all assuming that you have the lock in an exclusive cacheline already. If you actually get lock contention between multiple CPUs (or even just moving the lock back and forth with no real "contention" as far as sw is concerned), you get much much worse performance due to the cacheline bouncing back and forth.

That's not likely to be the common case, though. I hope. If you actually pass the lock back and forth, you'll have other performance issues (like having to flush the context between different lockers).

> Presumably we could get rid of the 'lock' in UP systems???

Yes, but it's nasty to do. Either you end up with binaries that only work on UP, or you have a lot of infrastructure to re-write the code at load-time or something.
And you won't win _that_ much, since on UP systems you'll obviously never see the nasty behaviour (ie you'll never see the bouncing between CPUs).

		Linus
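The point about keeping the lock in an exclusively owned cacheline can be illustrated with a minimal C11 sketch. This is not the DRM lock discussed in the thread; it is a generic test-and-set spinlock whose flag is padded to its own cache line, so an uncontended lock/unlock touches only one line that stays in the owning CPU's cache. The 64-byte line size and the names `spin_lock`, `spin_unlock`, `struct demo`, and `spinlock_demo` are assumptions made up for illustration.

```c
#include <pthread.h>
#include <stdalign.h>
#include <stdatomic.h>
#include <stddef.h>

/* Assumption: 64-byte cache lines (common on x86, not guaranteed
   everywhere). alignas pads the struct out to one full line. */
struct spinlock {
    alignas(64) atomic_flag locked;
};

static void spin_lock(struct spinlock *l)
{
    /* Test-and-set is a locked read-modify-write; on x86, compilers
       typically emit XCHG, which is implicitly locked and is also a full
       memory barrier. Every failed attempt is a write, so CPUs spinning
       here keep pulling the cache line away from each other. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ; /* spin until the holder clears the flag */
}

static void spin_unlock(struct spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

/* Hypothetical demo harness: a shared counter protected by the lock. */
struct demo {
    struct spinlock lock;   /* occupies its own 64-byte line */
    long counter;
    long iters;
};

static void *worker(void *arg)
{
    struct demo *d = arg;
    for (long i = 0; i < d->iters; i++) {
        spin_lock(&d->lock);
        d->counter++;           /* critical section */
        spin_unlock(&d->lock);
    }
    return NULL;
}

/* Runs up to 16 threads that each add `iters` to the shared counter
   under the spinlock, and returns the final count. */
long spinlock_demo(int nthreads, long iters)
{
    struct demo d;
    pthread_t tid[16];
    int started = 0;

    atomic_flag_clear(&d.lock.locked);  /* start unlocked */
    d.counter = 0;
    d.iters = iters;

    if (nthreads > 16)
        nthreads = 16;
    /* Stop early if thread creation fails; the count then reflects
       only the threads that actually ran. */
    while (started < nthreads &&
           pthread_create(&tid[started], NULL, worker, &d) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    return d.counter;
}
```

Called from a `main`, `spinlock_demo(2, 50000)` should return 100000, since every increment happens under the lock. With one thread the flag never leaves that CPU's cache; with several threads on different CPUs, the same code shows the bouncing described above, because each CPU's locked XCHG has to pull the line over before it can complete.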