Thread: [Lse-tech] Nodeless MCS Lock Implementation

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

Here again is my implementation of MCS locks. What is new this time is
that I've added a nodeless extension to the lock, which can be used as a
drop in replacement for spinlocks. I did this by serializing access to a
single bit lock with an MCS lock, keeping node allocation on the stack
of nmcs_lock(). This causes some additional overhead, but simplifies the
usage. Various performance numbers follow.

Zero contention overhead calculated from userspace test (tight
lock/unlock loop):
10000000 spin lock/unlocks:	2406865us
10000000 mcs lock/unlocks:	3071824us
10000000 nmcs lock/unlocks:	4244046us
mcs performance hit:	127.627599% 
nmcs performance hit:	176.330870% 

High contention performance (5sec w/ 8threads):
spin lock:	593051 acquisitions
mcs lock:	5500393 acquisitions
nmcs lock:	3642682 acquisitions

Also attached is a patch which globally replaces all spinlocks in the
kernel with nodeless mcs locks. Using hackbench as a quick and dirty
benchmark, I got the following numbers on an ~8 way (4 cpu + HT) box.

2.4.17 vanilla
hackbench 10:
Time: 3.693
Time: 3.050
Time: 3.206
hackbench 25:
Time: 15.831
Time: 20.074
Time: 22.694
hackbench 50:
Time: 96.058
Time: 84.713
Time: 94.403
hackbench 75:
Time: 203.603
Time: 184.666
Time: 218.726

2.4.17+nmcs+spinlocks-replaced
hackbench 10:
Time: 3.300
Time: 4.280
Time: 4.305
hackbench 25:
Time: 16.185
Time: 18.584
Time: 15.028
hackbench 50:
Time: 72.532
Time: 71.361
Time: 75.763
hackbench 75:
Time: 153.873
Time: 165.528
Time: 148.546

The patches have been tested on i386 only. I'll start playing w/ ia64
soon, and a patch for proper write ordering will most likely follow.

thanks
-john

Thread: [Lse-tech] Nodeless MCS Lock Implementation

lse-tech