From: Andrew M. <ak...@zi...> - 2001-11-16 20:26:25
Matthew Wilcox is the new owner of fs/locks.c. He'll be interested.

About six months back we had a _big_ problem with Apache throughput. On 8-way x86, Apache throughput almost halved because someone removed the BKL from a path in the file locking code. Apache uses flock()-based synchronisation, and removing the BKL had turned a short spin into a semaphore schedule(), which hurt big-time. I did a bunch of maintenance work against fs/locks.c at the time to set things back right. IIRC I moved the BKL to a lower level in the flock() codepath.

At one point I did have a super-scalable implementation which used a new per-inode spinlock for the exclusion. It worked and was just fine. But Linus and I agreed that it was a larger-than-necessary change, that sys_flock() contention was not a likely scenario, and that sticking with the BKL approach was a safer path.

FWIW, the super-scalable flocking patch against 2.4.0-test10 is at
http://www.zip.com.au/~akpm/threaded-locks-sem.patch

Rick Lindsley wrote:
>
> Thanks for all of your responses. Yes, -fsdevel is probably the right
> place to finish this discussion, but I wanted to start it here in
> lse because it's actually SMP related.
>
> A file-lock-intensive benchmark brought to my attention that the BKL is
> currently used to guard i_flock. Without arguing about the merits of
> this particular benchmark, it seems to me that simply from inspection,
> replacing the BKL here would be a good thing. A per-inode spinlock
> would give better granularity than a global one, which causes
> blockage across the system on every lock attempt by any process.
> I've given some thought to how to improve on that, and come up with:
>
>   a) reducing use of kernel_flag elsewhere
>   b) replacing kernel_flag with another global spinlock
>   c) replacing kernel_flag with a global read/write lock
>   d) replacing kernel_flag with a new lock in struct inode
>   e) revisiting the algorithm, and all locking associated therein
>
> a) is far more work than necessary to fix this problem. b) through d)
> are all possibilities, but since this hasn't shown up before, I'd
> conclude that all the contention this benchmark is seeing really is
> centered right around i_flock. My hunch is that the best solution is
> d), but it's possible that c) could actually provide "enough"
> improvement to allow d) to be postponed. Unfortunately, c) may
> introduce more trouble than it's worth, because in this particular
> example I suspect that i_flock is NOT read-mostly, write-occasionally.
> Upgrading from a read lock to a write lock can't be done atomically,
> so what you gain in performance you may lose in "supportability" as
> the code grows in complexity.
>
> Both b) and c) cause serialization across every cpu in the system by
> using a global lock, but d) would cause serialization *per inode* and
> thus almost guarantee less contention. Assuming, of course, that mucking
> with the inode structure doesn't cause too many other ripples, which is
> why I asked the question. Doing e) almost certainly puts it into the
> 2.5 timeframe, but not 100% certainly, I suppose. Before I dig too deep
> into some test patches I thought I'd test the waters among the folks
> here in LSE.
>
> It's good to hear that the inode is being redesigned for 2.5; a
> spinlock (or two) which guards elements of the inode structure would be
> very helpful in the new design. If there were one to usurp here I'd
> include that in my options, but all we have is semaphores right now.
>
> Rick
>
> _______________________________________________
> Lse-tech mailing list
> Lse...@li...
> https://lists.sourceforge.net/lists/listinfo/lse-tech