From: AlexF <ale...@gm...> - 2009-12-21 03:37:09
Hello,

I'm having trouble understanding multi-thread behavior on SBCL, and would appreciate it if somebody could take a look at this. I'm seeing infrequent but reproducible instability (inconsistent results) in a multithreaded computation. The usage seems so straightforward that maybe I have a basic misunderstanding of what should happen here?

Here is some code which I believe does straightforward multithreaded computation. There are no special variables, and I think there shouldn't be any race conditions.

    (defun compute-thread (thread-num rows-per-proc min max mul)
      "straightforward computation performed by each thread"
      (let* ((local-min (+ min (* thread-num rows-per-proc)))
             (local-max (1- (+ local-min rows-per-proc)))
             (local-count 0))
        (loop for i from local-min upto local-max
              do (loop for j from min upto max
                       do (incf local-count
                                (let ((xc (* mul i)))
                                  (loop for count from 0 below 100
                                        do (when (>= xc 4)
                                             (return count))
                                           (incf xc)
                                        finally (return 100))))))
        #+nil (format *out* "Thread ~a local-min=~a, local-max=~a local-count=~d~%"
                      thread-num local-min local-max local-count)
        local-count))

    (defun main (num-threads)
      "spawn some processes that perform some computations and sum the results"
      (loop with rows-per-proc = (/ 100 num-threads)
            for thread in (loop for thread-num from 0 below num-threads
                                ;; establish private binding of thread-num for closure
                                collect (let ((thread-num thread-num))
                                          #+sbcl (sb-thread:make-thread
                                                  (lambda ()
                                                    (compute-thread thread-num rows-per-proc -250 250 0.008d0)))
                                          #+ccl (process-run-function
                                                 "a"
                                                 (lambda ()
                                                   (compute-thread thread-num rows-per-proc -250 250 0.008d0)))))
            summing #+sbcl (sb-thread:join-thread thread)
                    #+ccl (join-process thread)))

Here is a function that checks the above computation:

    (defun test (num-threads num-iterations expected-val)
      "check that result of computation is expected-val each time"
      (loop for i from 0 below num-iterations
            do (format t "Run ~a:" i)
               (let ((result (main num-threads)))
                 (format t "result=~a~%" result)
                 (assert (= expected-val result)))))

On SBCL 1.0.33 x86-64:

    CL-USER>
    ; compiling (DEFUN COMPUTE-THREAD ...)
    ; compiling (DEFUN MAIN ...)
    ; compiling (DEFUN TEST ...)
    ; No value
    CL-USER> (main 1)
    300600                  <---- this is the expected result
    CL-USER> (test 20 1000 (main 1))
    Run 0:result=300600
    Run 1:result=300600
    Run 2:result=300600
    Run 3:result=300600
    .
    . (snip, all results are 300600 as expected)
    .
    Run 141:result=300600
    Run 142:result=300600
    Run 143:result=300600
    Run 144:result=300600
    Run 145:result=300602   <----- problem!! (assertion failure occurs)
    ; Evaluation aborted.
    CL-USER>

Digging a little deeper by enabling the per-thread format form in compute-thread (the behavior is nondeterministic, so the failure occurs on a different run):

    Run 294:Thread 2 local-min=-240, local-max=-236 local-count=15030
    Thread 1 local-min=-245, local-max=-241 local-count=15030
    Thread 0 local-min=-250, local-max=-246 local-count=15033   <----- problem! The local sum in this thread is incorrect
    Thread 3 local-min=-235, local-max=-231 local-count=15030
    Thread 4 local-min=-230, local-max=-226 local-count=15030
    Thread 5 local-min=-225, local-max=-221 local-count=15030
    Thread 6 local-min=-220, local-max=-216 local-count=15030
    Thread 7 local-min=-215, local-max=-211 local-count=15030
    Thread 8 local-min=-210, local-max=-206 local-count=15030
    Thread 9 local-min=-205, local-max=-201 local-count=15030
    Thread 10 local-min=-200, local-max=-196 local-count=15030
    Thread 11 local-min=-195, local-max=-191 local-count=15030
    Thread 12 local-min=-190, local-max=-186 local-count=15030
    Thread 13 local-min=-185, local-max=-181 local-count=15030
    Thread 14 local-min=-180, local-max=-176 local-count=15030
    Thread 15 local-min=-175, local-max=-171 local-count=15030
    Thread 16 local-min=-170, local-max=-166 local-count=15030
    Thread 17 local-min=-165, local-max=-161 local-count=15030
    Thread 18 local-min=-160, local-max=-156 local-count=15030
    Thread 19 local-min=-155, local-max=-151 local-count=15030
    result=300603

So there seems to be a race condition or something else happening inside the compute-thread function, even though each thread only touches its own local variables. Can anybody enlighten me on what might be happening? Is there something wrong with my code?

By the way, Clozure Common Lisp (CCL) 1.4 behaved as expected (no assertion failures in 1000 runs).
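
For reference, the expected totals quoted in the post (300600 overall, 15030 per thread with 20 threads) can be cross-checked without any threads. The following is a minimal single-threaded sketch, not part of the original code: the helper names escape-count, row-count, and expected-total are hypothetical. It relies on the observation that the innermost loop depends only on i and mul, never on j, so each row contributes the same value once per j iteration.

```lisp
;; Hypothetical helpers for cross-checking the posted totals; not from the original post.

(defun escape-count (xc)
  "Same innermost loop as in COMPUTE-THREAD: count unit steps until XC >= 4, capped at 100."
  (loop for count from 0 below 100
        do (when (>= xc 4)
             (return count))
           (incf xc)
        finally (return 100)))

(defun row-count (i min max mul)
  "Contribution of row I: the J loop runs (MAX - MIN + 1) times with the same inner result."
  (* (1+ (- max min))
     (escape-count (* mul i))))

(defun expected-total (rows min max mul)
  "Sum of ROW-COUNT over ROWS consecutive rows starting at MIN, computed in one thread."
  (loop for i from min below (+ min rows)
        sum (row-count i min max mul)))

;; With the post's parameters, every row yields an inner count of 6:
;; (expected-total 100 -250 250 0.008d0) => 300600   ; 100 rows * 501 columns * 6
;; (expected-total 5 -250 250 0.008d0)   => 15030    ; one thread's share with 20 threads
```

Comparing row-count for each i against what a thread actually accumulates for that row is one way to narrow down which row, and therefore which inner-loop evaluation, produced the extra counts.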