From: James M. L. <llm...@gm...> - 2012-11-14 12:59:02
|
The attached patch replaces the doubly-linked list implementation of sb-concurrency:queue with a simpler singly-linked one. The new queue conses 1/3 as much and is 2X-5X faster in my benchmarks. It passes the sb-concurrency tests, as well as other tests for libraries that use sb-concurrency:queue.

The paper mentioned in sb-concurrency/queue.lisp appears to assume, along with one of its citations, that a dequeue operation for a singly-linked list implementation requires two CAS operations. Garbage-collected languages can get by with one CAS, however.

In addition, the head and tail references stored need not be precise. They move in one direction only. A thread is able to discover the actual head and tail nodes on its own, without having to read a stored reference twice. Barriers are not required -- if a thread gets an old head or tail reference, it merely has a slightly longer journey to the actual head or tail node.

A possible objection to this implementation is that it does not clear the CDRs of old nodes (though the CARs are cleared). This could possibly be done, but I wonder if it's necessary considering that CL:POP is commonly used and has the same behavior.

------- old queue:

CL-USER> (bench-queue:run 8 5000000)

=== SERIAL
Evaluation took:
  0.794 seconds of real time
  0.792049 seconds of total run time (0.668041 user, 0.124008 system)
  [ Run times consist of 0.548 seconds GC time, and 0.245 seconds non-GC time. ]
  99.75% CPU
  2,695,490,254 processor cycles
  120,008,240 bytes consed

=== SINGLE-PRODUCER-MULTIPLE-CONSUMER
Evaluation took:
  1.012 seconds of real time
  5.372336 seconds of total run time (5.216326 user, 0.156010 system)
  [ Run times consist of 0.348 seconds GC time, and 5.025 seconds non-GC time. ]
  530.83% CPU
  3,435,016,666 processor cycles
  120,013,328 bytes consed

=== MULTIPLE-PRODUCER-SINGLE-CONSUMER
Evaluation took:
  1.657 seconds of real time
  5.296331 seconds of total run time (5.080318 user, 0.216013 system)
  [ Run times consist of 1.008 seconds GC time, and 4.289 seconds non-GC time. ]
  319.61% CPU
  5,619,372,273 processor cycles
  119,967,432 bytes consed

------- new queue:

CL-USER> (bench-queue:run 8 5000000)

=== SERIAL
Evaluation took:
  0.170 seconds of real time
  0.168011 seconds of total run time (0.140009 user, 0.028002 system)
  [ Run times consist of 0.044 seconds GC time, and 0.125 seconds non-GC time. ]
  98.82% CPU
  577,716,716 processor cycles
  40,001,536 bytes consed

=== SINGLE-PRODUCER-MULTIPLE-CONSUMER
Evaluation took:
  0.563 seconds of real time
  3.300205 seconds of total run time (3.132195 user, 0.168010 system)
  [ Run times consist of 0.088 seconds GC time, and 3.213 seconds non-GC time. ]
  586.15% CPU
  1,908,838,348 processor cycles
  40,014,064 bytes consed

=== MULTIPLE-PRODUCER-SINGLE-CONSUMER
Evaluation took:
  0.516 seconds of real time
  2.812176 seconds of total run time (2.808176 user, 0.004000 system)
  [ Run times consist of 0.144 seconds GC time, and 2.669 seconds non-GC time. ]
  544.96% CPU
  1,751,178,119 processor cycles
  40,013,808 bytes consed |
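The dequeue scheme described above -- a single CAS that claims a node by storing a sentinel into its CAR, with the stored head allowed to lag behind -- might be sketched roughly as follows. This is an illustrative sketch only, with hypothetical struct and accessor names, not the actual patch:

```lisp
;; Hypothetical sketch, not the patch itself. A queue node is a cons;
;; DEQUEUE claims a node with a single CAS on its CAR. The stored
;; head may lag behind the true head; a reader simply follows CDRs.
(defconstant +dummy+ '+dummy+)   ; marks an already-claimed node

(defstruct queue head tail)

(defun dequeue (queue)
  (loop
    (let* ((head (queue-head queue))  ; possibly stale; that's fine
           (value (car head)))
      (if (eq value +dummy+)
          ;; Node already claimed by some thread: step toward the
          ;; true head, or report an empty queue when out of nodes.
          (let ((next (cdr head)))
            (if next
                (setf (queue-head queue) next)   ; lazily advance
                (return (values nil nil))))
          ;; Try to claim this node; on CAS failure, loop and retry.
          (when (eq value (sb-ext:compare-and-swap (car head)
                                                   value +dummy+))
            (return (values value t)))))))
```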
From: Paul K. <pv...@pv...> - 2012-11-14 15:27:14
|
On 2012-11-14, at 7:58 AM, "James M. Lawrence" <llm...@gm...> wrote:

> The attached patch replaces the doubly-linked list implementation of
> sb-concurrency:queue with a simpler singly-linked one. The new queue
> conses 1/3 as much and is 2X-5X faster in my benchmarks. It passes the
> sb-concurrency tests, as well as other tests for libraries that use
> sb-concurrency:queue.

[...]

> In addition, the head and tail references stored need not be precise.
> They move in one direction only. A thread is able to discover the
> actual head and tail nodes on its own, without having to read a stored
> reference twice. Barriers are not required -- if a thread gets an old
> head or tail reference, it merely has a slightly longer journey to the
> actual head or tail node.

I don't see why it's necessary to acquire conses by CASing +dummy+ in, rather than popping the head with a CAS?

> A possible objection to this implementation is that it does not clear
> the CDRs of old nodes (though the CARs are cleared). This could
> possibly be done, but I wonder if it's necessary considering that
> CL:POP is commonly used and has the same behavior.

The difference is that leaving the CDR as is on CL:POP doesn't leave a trail of references to *all* the subsequently-enqueued conses, but only to the ones in the stack at that time. As the tail-hunting loop shows, it's possible to find the current tail from any cons that's once been enqueued. Thus, any stray reference to a cons from the queue (e.g. due to conservative stack scanning) will prevent GCing not only the conses in the queue at the time the cons was dequeued, but also everything that was enqueued later. This isn't a theoretical issue, and has severely affected users in the past.

It's easy to clear cdrs (though not with NIL, to avoid ABA issues) on dequeue, but the tail-hunting loop isn't as nicely adjusted. Let's see if there's something clever.

> [SPSC/MPSC/SPMC micro-benchmark results]

Actually, I believe the most important thing to benchmark is MPMC performance. We obviously don't want disastrous performance in general, but sb-queue attempts to cover a lot of use cases; if performance is an issue, people should probably use a specialised implementation when they can.

8 cores sounds like a recent single-socket machine, is that right? I currently don't have a lot of free time, but I'll run tests and benchmarks on 2- and 4-socket machines as soon as I can. Reports from a (multiple-socket, ideally) PPC might also be useful.

Paul Khuong |
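The "tail-hunting loop" Paul refers to is the crux of the GC concern: from any cons that was ever enqueued, the current tail remains reachable through the CDR chain. A minimal illustration (hypothetical helper, for exposition only):

```lisp
;; From any node that was ever enqueued, following CDRs reaches the
;; current tail. This is why an uncleared CDR trail lets a single
;; stray reference (e.g. from conservative stack scanning) retain
;; every cons enqueued after that node, indefinitely.
(defun hunt-tail (node)
  (loop (let ((next (cdr node)))
          (if next
              (setf node next)   ; keep chasing the moving tail
              (return node)))))  ; CDR is NIL: this is the tail
```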
From: James M. L. <llm...@gm...> - 2012-11-14 23:09:55
Attachments:
bench-queue.lisp
|
On Wed, Nov 14, 2012 at 10:19 AM, Paul Khuong <pv...@pv...> wrote:

> I don't see why it's necessary to acquire conses by CASing +dummy+
> in, rather than popping the head with a CAS?

I don't understand the question -- I am popping the head with CAS. +dummy+ is a symbol, not a node. DEQUEUE CASes +dummy+ into the head node. There may exist more than one node with +dummy+ because the stored reference to head is not immediately updated.

> The difference is that leaving the CDR as is on CL:POP doesn't leave
> a trail of references to *all* the subsequently-enqueued conses, but
> only to the ones in the stack at that time.

Yes, that was a bad analogy. POP and DEQUEUE are similar, but PUSH conses to head while ENQUEUE conses to tail. I should make some adjustments.

> Actually, I believe the most important thing to benchmark is MPMC
> performance.

Right, I was being lazy. I translated an existing function into SBCL-specific code and stopped there. Below is a general MPMC benchmark. Arguments are: message-count producer-count consumer-count.

> 8 cores sounds like a recent single-socket machine, is that right?

This is a 4-core i7 -- I had shown 8 threads because I was about to show a range of threads, but picked a single medium size instead.

------- old queue:

=== (BENCH-QUEUE:SERIAL 10000000)
Evaluation took:
  1.192 seconds of real time
  1.188075 seconds of total run time (1.032065 user, 0.156010 system)
  [ Run times consist of 0.772 seconds GC time, and 0.417 seconds non-GC time. ]
  99.66% CPU
  4,044,003,488 processor cycles
  240,002,144 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 1 3)
Evaluation took:
  1.342 seconds of real time
  4.988312 seconds of total run time (4.940309 user, 0.048003 system)
  [ Run times consist of 0.128 seconds GC time, and 4.861 seconds non-GC time. ]
  371.68% CPU
  4,553,442,536 processor cycles
  315,922,304 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 3 1)
Evaluation took:
  2.449 seconds of real time
  4.444277 seconds of total run time (4.088255 user, 0.356022 system)
  [ Run times consist of 1.564 seconds GC time, and 2.881 seconds non-GC time. ]
  181.46% CPU
  8,306,514,625 processor cycles
  239,979,280 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 2 2)
Evaluation took:
  2.337 seconds of real time
  4.512282 seconds of total run time (4.184262 user, 0.328020 system)
  [ Run times consist of 1.356 seconds GC time, and 3.157 seconds non-GC time. ]
  193.07% CPU
  7,925,570,060 processor cycles
  240,075,008 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 1 15)
Evaluation took:
  2.267 seconds of real time
  17.141072 seconds of total run time (17.117070 user, 0.024002 system)
  [ Run times consist of 0.120 seconds GC time, and 17.022 seconds non-GC time. ]
  756.11% CPU
  7,691,566,363 processor cycles
  241,826,120 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 15 1)
Evaluation took:
  4.405 seconds of real time
  11.784736 seconds of total run time (11.288705 user, 0.496031 system)
  [ Run times consist of 3.064 seconds GC time, and 8.721 seconds non-GC time. ]
  267.54% CPU
  14,945,190,670 processor cycles
  239,931,448 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 8 8)
Evaluation took:
  5.169 seconds of real time
  16.485030 seconds of total run time (14.656916 user, 1.828114 system)
  [ Run times consist of 3.225 seconds GC time, and 13.261 seconds non-GC time. ]
  318.92% CPU
  17,534,303,287 processor cycles
  239,954,376 bytes consed

------- new queue:

=== (BENCH-QUEUE:SERIAL 10000000)
Evaluation took:
  0.342 seconds of real time
  0.340021 seconds of total run time (0.312019 user, 0.028002 system)
  [ Run times consist of 0.088 seconds GC time, and 0.253 seconds non-GC time. ]
  99.42% CPU
  1,160,856,090 processor cycles
  80,001,192 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 1 3)
Evaluation took:
  1.026 seconds of real time
  2.796175 seconds of total run time (2.684168 user, 0.112007 system)
  [ Run times consist of 0.320 seconds GC time, and 2.477 seconds non-GC time. ]
  272.51% CPU
  3,479,107,673 processor cycles
  80,001,320 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 3 1)
Evaluation took:
  0.682 seconds of real time
  2.224139 seconds of total run time (2.224139 user, 0.000000 system)
  [ Run times consist of 0.124 seconds GC time, and 2.101 seconds non-GC time. ]
  326.10% CPU
  2,313,474,512 processor cycles
  80,001,416 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 2 2)
Evaluation took:
  1.836 seconds of real time
  3.572224 seconds of total run time (3.352210 user, 0.220014 system)
  [ Run times consist of 1.201 seconds GC time, and 2.372 seconds non-GC time. ]
  194.55% CPU
  6,228,006,490 processor cycles
  80,001,216 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 1 15)
Evaluation took:
  1.208 seconds of real time
  8.840552 seconds of total run time (8.776548 user, 0.064004 system)
  [ Run times consist of 0.100 seconds GC time, and 8.741 seconds non-GC time. ]
  731.87% CPU
  4,097,924,610 processor cycles
  80,010,248 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 15 1)
Evaluation took:
  1.158 seconds of real time
  5.940371 seconds of total run time (5.828364 user, 0.112007 system)
  [ Run times consist of 0.444 seconds GC time, and 5.497 seconds non-GC time. ]
  512.95% CPU
  3,927,897,633 processor cycles
  79,974,432 bytes consed

=== (BENCH-QUEUE:CONCURRENT 10000000 8 8)
Evaluation took:
  0.866 seconds of real time
  6.056379 seconds of total run time (6.036378 user, 0.020001 system)
  [ Run times consist of 0.092 seconds GC time, and 5.965 seconds non-GC time. ]
  699.31% CPU
  2,935,998,863 processor cycles
  80,007,568 bytes consed |
From: James M. L. <llm...@gm...> - 2012-11-16 00:04:38
|
Attached is a version which clears CDRs. I was attracted to the idea of threads reading a head or tail slot once and then seeking independently, but that entails the bad idea of leaving CDRs intact. In any case, CDR-clearing and the added checks for the clear flag do not appear to adversely affect the benchmark numbers; indeed they look a little better now.

I don't know what to make of the Ladan-Mozes/Shavit paper cited in queue.lisp. I guess they didn't realize that a singly-linked queue implementation does not need two CAS operations for enqueue, as the purpose of their doubly-linked implementation is to avoid a second CAS. It's possible that I'm missing something. |
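CDR-clearing along the lines discussed -- using a distinguished sentinel rather than NIL, since a NIL CDR already means "current tail" and would reintroduce ABA problems -- might look like this. The names are assumptions for illustration (+DEAD-END+ echoes the constant referenced later in the thread, but this is not the attached patch):

```lisp
;; Illustrative sketch under assumed names. Once the stored head has
;; moved past a dequeued node, cut its CDR so that a stray reference
;; to the node can no longer retain everything enqueued afterwards.
;; NIL cannot serve as the mark because it also identifies the
;; current tail, hence a dedicated sentinel.
(defconstant +dead-end+ '+dead-end+)

(defun retire-node (node)
  (setf (car node) nil            ; drop the payload for the GC
        (cdr node) +dead-end+))   ; cut the trail to later nodes
```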
From: Nikodemus S. <nik...@ra...> - 2012-12-16 18:49:50
|
On 16 November 2012 02:04, James M. Lawrence <llm...@gm...> wrote:

> I don't know what to make of the Ladan-Mozes/Shavit paper cited in
> queue.lisp. I guess they didn't realize that a singly-linked queue
> implementation does not need two CAS operations for enqueue, as the
> purpose of their doubly-linked implementation is to avoid a second
> CAS. It's possible that I'm missing something.

(Catching up on email.)

I certainly didn't when I wrote the current version. :) Embarrassingly, even though it was obvious to me that their tagged pointers were not needed with GC, I didn't follow that train of thought to its proper conclusion.

At a quick read your patch looks good to me. I could niggle about style in a few places, but nothing major. The only real complaint I have is that it would be nice to have a discursive comment explaining the structure of the updates.

Cheers,

-- Nikodemus |
From: James M. L. <llm...@gm...> - 2012-12-17 03:17:08
|
On Sun, Dec 16, 2012 at 1:49 PM, Nikodemus Siivola <nik...@ra...> wrote:

> On 16 November 2012 02:04, James M. Lawrence <llm...@gm...> wrote:
>
>> I don't know what to make of the Ladan-Mozes/Shavit paper cited in
>> queue.lisp. I guess they didn't realize that a singly-linked queue
>> implementation does not need two CAS operations for enqueue, as the
>> purpose of their doubly-linked implementation is to avoid a second
>> CAS. It's possible that I'm missing something.
>
> (Catching up on email.)
>
> I certainly didn't when I wrote the current version. :) Embarrassingly,
> even though it was obvious to me that their tagged pointers were not
> needed with GC, I didn't follow that train of thought to its proper
> conclusion.
>
> At a quick read your patch looks good to me. I could niggle about
> style in a few places, but nothing major. The only real complaint I
> have is that it would be nice to have a discursive comment
> explaining the structure of the updates.

Here's a new patch which might address some niggles.

- added overview
- slots are typed
- ENQUEUE and DEQUEUE make use of inferencing
- visiting all nodes is less stringent -- quit if +DEAD-END+ is encountered, don't bother restarting

The read barrier in the old DEQUEUE is a curiosity to me. Does it reduce spinning? I didn't see much difference when I added a similar barrier to the new DEQUEUE -- it was maybe a little slower. |
From: Nikodemus S. <nik...@ra...> - 2013-01-20 14:37:42
|
On 17 December 2012 05:16, James M. Lawrence <llm...@gm...> wrote:

> Here's a new patch which might address some niggles.
>
> - added overview
> - slots are typed
> - ENQUEUE and DEQUEUE make use of inferencing

Very nice!

> - visiting all nodes is less stringent -- quit if +DEAD-END+ is
> encountered, don't bother restarting

If I read correctly, this means that even if there are N live nodes in the queue, we might visit zero nodes because the first one was dequeued at an inopportune moment? That doesn't seem right. Suggested changes attached.

> The read barrier in the old DEQUEUE is a curiosity to me. Does it
> reduce spinning? I didn't see much difference when I added a similar
> barrier to the new DEQUEUE -- it was maybe a little slower.

Read barriers are nops on x86oid arches; the guarantee they provide always holds on x86oids. (FWIW, http://www.kernel.org/doc/Documentation/memory-barriers.txt is a good summary of the different barriers.)

That particular barrier was introduced by Alastair at 713bb89f472457ec6654732b6b248b17b971f0ff for PPC threading -- which needs barriers. I'm not sure if that barrier was ever strictly speaking necessary, and I don't see anything that would require a barrier in your implementation.

If the patch attached looks good to you, I'll commit it in conjunction with yours.

Cheers,

-- Nikodemus |
From: James M. L. <llm...@gm...> - 2013-01-22 00:21:22
|
Hi,

LIST-QUEUE-CONTENTS doesn't have a clear meaning, of course. It's akin to sending a runner down a magical track in which the finish line is receding away from him while the starting line is chasing him. He can't tell us how long the track was at any given time, he can only tell us how many steps he took and what he saw.

My first implementation of LIST-QUEUE-CONTENTS (basically the same as yours) was to make a dogged runner that doesn't give up when the starting line catches up to him. But there is a bothersome point here: LIST-QUEUE-CONTENTS is not guaranteed to terminate! The runner may be perpetually shoved along by the starting line (corresponding to LIST-QUEUE-CONTENTS spinning), or may remain tantalizingly close to the finish line like Achilles and the Tortoise, or anywhere in between. In practice we know the runner is faster, and perhaps we really can assume that for all present and future hardware, though it still seems bothersome.

The way we both implemented LIST-QUEUE-CONTENTS is to afflict the runner with amnesia when the starting line catches him. But whether or not it catches him is arbitrary -- so on the roll of a die, the runner reports seventeen steps instead of a thousand.

When I changed LIST-QUEUE-CONTENTS, my thought at the time was that LIST-QUEUE-CONTENTS is already a lie: the list it returns is a montage rather than a snapshot. We can't assume that it corresponds to a state that existed in the past or a state that presently exists. What is the point of spinning -- effectively contributing to contention -- for a lie?

However, the proper way to address excessive spinning (instead of how I did it) would be to offer a TRY-LIST-QUEUE-CONTENTS function which is like LIST-QUEUE-CONTENTS except that it may return a failure flag along with the partial contents accumulated. I'm not actually proposing this because the issue is only theoretical. In practice I don't think the spinning matters, as you indicate. The runner might trip a few times at the outset, but once he gets into stride then ZOOM he's outta there.

I have a couple of additional changes. ENQUEUE shouldn't be pessimistic; removing the (cdr tail) check yields a small improvement. And stylistically I should use CONDs, as in the original queue.lisp. Since your LIST-QUEUE-CONTENTS isn't much different from my previous one, do you mind if I re-diff your change against that?

For good measure I should mention that DEQUEUE may be implemented in two ways: one CASes with CAR and the other CASes with QUEUE-HEAD. I found the former to be slightly faster on my machine, and that's the only reason I chose it.

On Sun, Jan 20, 2013 at 9:11 AM, Nikodemus Siivola <nik...@ra...> wrote:

> On 17 December 2012 05:16, James M. Lawrence <llm...@gm...> wrote:
>
>> Here's a new patch which might address some niggles.
>>
>> - added overview
>> - slots are typed
>> - ENQUEUE and DEQUEUE make use of inferencing
>
> Very nice!
>
>> - visiting all nodes is less stringent -- quit if +DEAD-END+ is
>> encountered, don't bother restarting
>
> If I read correctly this means that even if there are N live nodes in
> the queue, we might visit zero nodes because the first one was
> dequeued at an inopportune moment?
>
> That doesn't seem right. Suggested changes attached.
>
>> The read barrier in the old DEQUEUE is a curiosity to me. Does it
>> reduce spinning? I didn't see much difference when I added a similar
>> barrier to the new DEQUEUE -- it was maybe a little slower.
>
> Read barriers are nops on x86oid arches; the guarantee they provide
> holds always on x86oids. (FWIW,
> http://www.kernel.org/doc/Documentation/memory-barriers.txt is a good
> summary of the different barriers.)
>
> That particular barrier was introduced by Alastair at
> 713bb89f472457ec6654732b6b248b17b971f0ff for PPC threading -- which
> needs barriers.
>
> I'm not sure if that barrier was ever strictly speaking necessary, and
> I don't see anything that would require a barrier in your
> implementation.
>
> If the patch attached looks good to you, I'll commit it in conjunction
> with yours.
>
> Cheers,
>
> -- Nikodemus |
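The "dogged runner" being debated -- a contents walker that restarts from the new head whenever the dequeuers overtake it -- can be sketched roughly as follows. All names here are assumptions for illustration (+DUMMY+ marking a claimed node, +DEAD-END+ a cleared CDR); this restarting variant is precisely the one that is not guaranteed to terminate under sustained contention:

```lisp
;; Rough sketch, not the committed code. Sentinels as discussed in
;; the thread: +DUMMY+ marks a claimed node's CAR, +DEAD-END+ a
;; cleared CDR.
(defconstant +dummy+ '+dummy+)
(defconstant +dead-end+ '+dead-end+)

(defun list-queue-contents (queue)
  (loop
    (let ((contents '())
          (node (queue-head queue))
          (overtaken nil))
      ;; Walk toward the tail until the CDR chain ends (NIL) or we
      ;; step onto a node whose trail was cut under us (+DEAD-END+).
      (loop while node
            do (cond ((eq node +dead-end+)
                      (setf overtaken t node nil))  ; start over
                     (t
                      (let ((value (car node)))
                        (unless (eq value +dummy+)  ; skip claimed nodes
                          (push value contents)))
                      (setf node (cdr node)))))
      (unless overtaken
        (return (nreverse contents))))))
```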
From: Nikodemus S. <nik...@ra...> - 2013-01-22 05:38:06
|
On 22 January 2013 02:21, James M. Lawrence <llm...@gm...> wrote:

> LIST-QUEUE-CONTENTS doesn't have a clear meaning, of course. It's akin
> to sending a runner down a magical track in which the finish line is
> receding away from him while the starting line is chasing him. He
> can't tell us how long the track was at any given time, he can only
> tell us how many steps he took and what he saw.

Given that all operations that need to traverse the queue are there primarily for debugging, I would /really/ hate to try to see what was in the queue and be told it was empty when it wasn't.

If these operations don't retry till they succeed, they must fail in a way that allows the user to know what happened. (An error with restarts to retry once or "as long as necessary" would work for me.)

Cheers,

-- Nikodemus |
From: James M. L. <llm...@gm...> - 2013-01-22 12:36:32
|
On Tue, Jan 22, 2013 at 12:38 AM, Nikodemus Siivola <nik...@ra...> wrote:

> On 22 January 2013 02:21, James M. Lawrence <llm...@gm...> wrote:
>
>> LIST-QUEUE-CONTENTS doesn't have a clear meaning, of course. It's akin
>> to sending a runner down a magical track in which the finish line is
>> receding away from him while the starting line is chasing him. He
>> can't tell us how long the track was at any given time, he can only
>> tell us how many steps he took and what he saw.
>
> Given that all operations that need to traverse the queue are there
> primarily for debugging, I would /really/ hate to try to see what was
> in the queue and be told it was empty when it wasn't.
>
> If these operations don't retry till they succeed, they must fail in a
> way that allows the user to know what happened. (An error with
> restarts to retry once or "as long as necessary" would work for me.)

We are already on the same page -- I said the way to prevent unwanted spinning is to return a fail flag along with the partial contents (otherwise we throw away information). But I said I wasn't actually proposing that since spinning isn't a concern in practice. I should have written a shorter email. |
From: Nikodemus S. <nik...@ra...> - 2013-01-22 13:29:06
|
Or I should have read it completely before replying. :)

Sent from my phone - apologies for substandard spelling and formatting.

On Jan 22, 2013 2:36 PM, "James M. Lawrence" <llm...@gm...> wrote:

> On Tue, Jan 22, 2013 at 12:38 AM, Nikodemus Siivola
> <nik...@ra...> wrote:
>
>> On 22 January 2013 02:21, James M. Lawrence <llm...@gm...> wrote:
>>
>>> LIST-QUEUE-CONTENTS doesn't have a clear meaning, of course. It's akin
>>> to sending a runner down a magical track in which the finish line is
>>> receding away from him while the starting line is chasing him. He
>>> can't tell us how long the track was at any given time, he can only
>>> tell us how many steps he took and what he saw.
>>
>> Given that all operations that need to traverse the queue are there
>> primarily for debugging, I would /really/ hate to try to see what was
>> in the queue and be told it was empty when it wasn't.
>>
>> If these operations don't retry till they succeed, they must fail in a
>> way that allows the user to know what happened. (An error with
>> restarts to retry once or "as long as necessary" would work for me.)
>
> We are already on the same page -- I said the way to prevent unwanted
> spinning is to return a fail flag along with the partial contents
> (otherwise we throw away information). But I said I wasn't actually
> proposing that since spinning isn't a concern in practice. I should
> have written a shorter email. |
From: James M. L. <llm...@gm...> - 2013-01-25 01:34:00
|
Here's a final tweak -- a slightly faster ENQUEUE along with some comment/style cleanup, and Nikodemus' rebased change on top. |
From: Nikodemus S. <nik...@ra...> - 2013-01-25 14:42:50
|
Thank you! I'll merge this after the freeze.

Sent from my phone - apologies for substandard spelling and formatting.

On Jan 25, 2013 3:36 AM, "James M. Lawrence" <llm...@gm...> wrote:

> Here's a final tweak -- a slightly faster ENQUEUE along with some
> comment/style cleanup, and Nikodemus' rebased change on top.
>
> _______________________________________________
> Sbcl-devel mailing list
> Sbc...@li...
> https://lists.sourceforge.net/lists/listinfo/sbcl-devel |
From: James M. L. <llm...@gm...> - 2013-02-08 20:04:18
|
At the risk of making this even more complicated than it should have been...

On Mon, Jan 21, 2013 at 7:21 PM, James M. Lawrence <llm...@gm...> wrote:

> For good measure I should mention that DEQUEUE may be implemented in
> two ways: one CASes with CAR and the other CASes with QUEUE-HEAD. I
> found the former to be slightly faster on my machine, and that's the
> only reason I chose it.

This result is a little unexpected, so I ran the benchmarks again with (optimize speed) added. This caused CASing with QUEUE-HEAD to slightly win when producers = consumers, while otherwise it's a tradeoff, with CAS CAR winning by 12% when consumers > producers and CAS QUEUE-HEAD winning by 12% when producers > consumers. (Thanks to Martin Simmons for causing me to re-evaluate this.)

The general rule seems to be that "pessimistic" checking -- testing whether an operation will fail before performing it -- is worse except perhaps during high contention. For example, ENQUEUE became faster when I removed the (null (cdr tail)) check.

Considering that CAS QUEUE-HEAD now wins when producers = consumers, and that it makes DEQUEUE simpler, I think it should be used instead.

I also noticed the new try-walk-queue adds a pessimistic check. The simpler version (sans the funny GO in a lambda) is faster, which appears to be a better defense against restarting. I've taken the liberty of restoring it and rebasing Nikodemus' comments on top.

# 50000 messages

## CAS QUEUE-HEAD

### 2p2c
3028.752 3022.7446 3025.45

### 3p1c
3130.399 3134.422 3149.8293

### 1p3c
3698.7356 3724.058 3668.6968

## CAS CAR

### 2p2c
3295.852 3269.6978 3292.238

### 3p1c
3535.445 3552.5916 3537.678

### 1p3c
3308.7092 3295.0532 3289.5344

## old queue

### 2p2c
4463.223 4440.3125 4424.1255

### 3p1c
4301.402 4279.555 4280.9595

### 1p3c
5060.3525 5047.053 4975.4663 |
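The CAS QUEUE-HEAD alternative being compared might take roughly the following shape. This is a speculative sketch under assumed names -- it models the classic style where the head slot points one node before the first value, which is not necessarily how the committed code arranges things:

```lisp
;; Speculative sketch, not the committed DEQUEUE. Assumes QUEUE-HEAD
;; points at a node whose successor holds the next value, so a
;; single CAS on the QUEUE-HEAD slot both claims the value node and
;; advances the head.
(defun dequeue/cas-head (queue)
  (loop
    (let* ((head (queue-head queue))
           (next (cdr head)))
      (cond ((null next)
             (return (values nil nil)))             ; queue empty
            ((eq head (sb-ext:compare-and-swap (queue-head queue)
                                               head next))
             ;; The CAS succeeded, so NEXT is exclusively ours.
             (let ((value (car next)))
               (setf (car next) nil)                ; help the GC
               (return (values value t))))))))
```

Only the CAS target differs from the CAS-CAR variant; the contention profile shifts because producers and consumers stop competing for the same cons cells and instead compete on the queue structure's own slot.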
From: Nikodemus S. <nik...@ra...> - 2013-02-11 19:33:04
|
On 8 February 2013 22:04, James M. Lawrence <llm...@gm...> wrote:

> At the risk of making this even more complicated than it should have
> been...

Pushed, thank you!

-- nikodemus |
From: James M. L. <llm...@gm...> - 2013-02-22 05:24:50
|
(defun enqueue (value queue)
  (let ((new (cons value nil)))
    (loop (when (eq nil (sb-ext:compare-and-swap (cdr (queue-tail queue))
                                                 nil new))
            (setf (queue-tail queue) new)
            (return value)))))

The CCL version of ENQUEUE fails intermittently. After filing a bug report, I was told that a memory barrier is required. That seems unlikely, but if true then it might apply to SBCL, since both implementations use LOCK CMPXCHG. The rebuttal I gave on the CCL bugtracker follows -- is it correct?

===

It seems to me that the single CAS in ENQUEUE provides necessary and sufficient ordering.

After a thread writes to the tail slot, another thread may read the old tail for some length of time. If so then that's fine -- CAS will fail for some number of iterations, and no harm is done.

Therefore a write to the tail slot is always preceded by a valid read. And a valid read must have been preceded by a completed write. Therefore writes to the tail slot are ordered as a consequence of CAS writes being ordered.

=== |
From: Nikodemus S. <nik...@ra...> - 2013-02-22 13:47:53
|
On 22 February 2013 07:24, James M. Lawrence <llm...@gm...> wrote:

> It seems to me that the single CAS in ENQUEUE provides necessary and
> sufficient ordering.
>
> After a thread writes to the tail slot, another thread may read the
> old tail for some length of time. If so then that's fine -- CAS will
> fail for some number of iterations, and no harm is done.
>
> Therefore a write to the tail slot is always preceded by a valid read.
> And a valid read must have been preceded by a completed write.
> Therefore writes to the tail slot are ordered as a consequence of CAS
> writes being ordered.

Seems correct to me. Comparing disassembly could be instructive. (If CMPXCHG is used without a LOCK prefix, that would do the trick, for example.)

Cheers,

-- Nikodemus |
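Checking the generated code as Nikodemus suggests is a one-liner at the REPL:

```lisp
;; Print the machine code SBCL compiled for ENQUEUE; on x86-64 the
;; compare-and-swap should show up as LOCK CMPXCHG. A missing LOCK
;; prefix would explain intermittent failures on multiprocessors.
(disassemble 'enqueue)
```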
From: Paul K. <pv...@pv...> - 2013-02-25 05:11:56
|
Nikodemus Siivola wrote:

> On 22 February 2013 07:24, James M. Lawrence <llm...@gm...> wrote:
>
>> It seems to me that the single CAS in ENQUEUE provides necessary and
>> sufficient ordering.

[...]

> Seems correct to me. Comparing disassembly could be instructive. (If
> CMPXCHG is used without a LOCK prefix, that would do the trick, for
> example.)

Also, SBCL guarantees that even on non-TSO platforms CAS is a full barrier, right? I don't know if one can expect every other implementation to be this nice.

For anyone else following, the issue was a tiny bug in CCL's PC Lusering logic. It was fixed yesterday: http://trac.clozure.com/ccl/ticket/1058 .

Paul Khuong |