From: AlexF <ale...@gm...> - 2009-12-21 03:37:09
|
Hello, I'm having trouble understanding multi-thread behavior on SBCL, and would appreciate it if somebody could take a look at this. I'm seeing infrequent but replicatable instabiliby (inconsistency) in a multithreaded computation. The usage seems so straightforward, that maybe I have a basic misunderstanding of what should happen here? Here is some code which I believe does straightforward multithreaded computation. There are no special variables, and I think there shouldn't be any race conditions. (defun compute-thread (thread-num rows-per-proc min max mul) "straightforward computation performed by each thread" (let* ((local-min (+ min (* thread-num rows-per-proc))) (local-max (1- (+ local-min rows-per-proc))) (local-count 0)) (loop for i from local-min upto local-max do (loop for j from min upto max do (incf local-count (let ((xc (* mul i))) (loop for count from 0 below 100 do (when (>= xc 4) (return count)) (incf xc) finally (return 100))) ))) #+nil(format *out* "Thread ~a local-min=~a, local-max=~a local-count=~d~%" thread-num local-min local-max local-count) local-count)) (defun main (num-threads) "spawn some processes that perform some computations and sum the results" (loop with rows-per-proc = (/ 100 num-threads) for thread in (loop for thread-num from 0 below num-threads collect (let ((thread-num thread-num));establish private binding of thread-num for closure #+sbcl (sb-thread:make-thread (lambda () (compute-thread thread-num rows-per-proc -250 250 0.008d0))) #+ccl (process-run-function "a" (lambda () (compute-thread thread-num rows-per-proc -250 250 0.008d0))) )) summing #+sbcl(sb-thread:join-thread thread) #+ccl(join-process thread) )) Here is a function that checks the above computation: (defun test (num-threads num-iterations expected-val) "check that result of computation is expected-val each time" (loop for i from 0 below num-iterations do (format t "Run ~a:" i) (let ((result (main num-threads))) (format t "result=~a~%" result) (assert (= expected-val result))))) On SBCL 1.0.33 x86-64 CL-USER> ; compiling (DEFUN COMPUTE-THREAD ...) ; compiling (DEFUN MAIN ...) ; compiling (DEFUN TEST ...); No value CL-USER> (main 1) 300600 <---- this is the expected result CL-USER> (test 20 1000 (main 1)) Run 0:result=300600 Run 1:result=300600 Run 2:result=300600 Run 3:result=300600 . . (snip, all results are 300600 as expected) . Run 141:result=300600 Run 142:result=300600 Run 143:result=300600 Run 144:result=300600 Run 145:result=300602 <----- problem!! (assertion failure occurs) ; Evaluation aborted. CL-USER> Digging a little deeper by enabling the per-thread format form in compute-thread: (the behavior is nondeterministic so the failure occurs on a different run) Run 294:Thread 2 local-min=-240, local-max=-236 local-count=15030 Thread 1 local-min=-245, local-max=-241 local-count=15030 Thread 0 local-min=-250, local-max=-246 local-count=15033 <----- problem! The local sum in this thread is incorrect Thread 3 local-min=-235, local-max=-231 local-count=15030 Thread 4 local-min=-230, local-max=-226 local-count=15030 Thread 5 local-min=-225, local-max=-221 local-count=15030 Thread 6 local-min=-220, local-max=-216 local-count=15030 Thread 7 local-min=-215, local-max=-211 local-count=15030 Thread 8 local-min=-210, local-max=-206 local-count=15030 Thread 9 local-min=-205, local-max=-201 local-count=15030 Thread 10 local-min=-200, local-max=-196 local-count=15030 Thread 11 local-min=-195, local-max=-191 local-count=15030 Thread 12 local-min=-190, local-max=-186 local-count=15030 Thread 13 local-min=-185, local-max=-181 local-count=15030 Thread 14 local-min=-180, local-max=-176 local-count=15030 Thread 15 local-min=-175, local-max=-171 local-count=15030 Thread 16 local-min=-170, local-max=-166 local-count=15030 Thread 17 local-min=-165, local-max=-161 local-count=15030 Thread 18 local-min=-160, local-max=-156 local-count=15030 Thread 19 local-min=-155, local-max=-151 local-count=15030 result=300603 So there seems to be a race condition or something else happening inside the compute-thread function. Can anybody enlighten me on what might be happening? Is there something wrong with my code? By the way, Clozure Common Lisp (CCL) 1.4 behaved as expected ( no assertion failures in 1000 runs). |
From: AlexF <ale...@gm...> - 2009-12-21 03:57:24
|
Hi David, Thanks for the reply. The error can occur on any thread. Here's another run: . . <snip> . Run 187:Thread 0 local-min=-250, local-max=-246 local-count=15030 Thread 1 local-min=-245, local-max=-241 local-count=15030 Thread 2 local-min=-240, local-max=-236 local-count=15030 Thread 3 local-min=-235, local-max=-231 local-count=15030 Thread 5 local-min=-225, local-max=-221 local-count=15030 Thread 4 local-min=-230, local-max=-226 local-count=15030 Thread 6 local-min=-220, local-max=-216 local-count=15030 Thread 7 local-min=-215, local-max=-211 local-count=15030 Thread 8 local-min=-210, local-max=-206 local-count=15030 Thread 9 local-min=-205, local-max=-201 local-count=15030 Thread 10 local-min=-200, local-max=-196 local-count=15030 Thread 11 local-min=-195, local-max=-191 local-count=15030 Thread 13 local-min=-185, local-max=-181 local-count=15030 Thread 14 local-min=-180, local-max=-176 local-count=15030 Thread 12 local-min=-190, local-max=-186 local-count=15030 Thread 15 local-min=-175, local-max=-171 local-count=15030 Thread 16 local-min=-170, local-max=-166 local-count=15030 Thread 17 local-min=-165, local-max=-161 local-count=15030 Thread 18 local-min=-160, local-max=-156 local-count=15030 Thread 19 local-min=-155, local-max=-151 local-count=15030 result=300600 Run 188:Thread 0 local-min=-250, local-max=-246 local-count=15030 Thread 1 local-min=-245, local-max=-241 local-count=15030 Thread 2 local-min=-240, local-max=-236 local-count=15030 Thread 4 local-min=-230, local-max=-226 local-count=15030 Thread 3 local-min=-235, local-max=-231 local-count=15030 Thread 5 local-min=-225, local-max=-221 local-count=15030 Thread 7 local-min=-215, local-max=-211 local-count=15030 Thread 8 local-min=-210, local-max=-206 local-count=15030 Thread 9 local-min=-205, local-max=-201 local-count=15034 <------------ error on Thread 9 Thread 10 local-min=-200, local-max=-196 local-count=15030 Thread 6 local-min=-220, local-max=-216 local-count=15030 Thread 11 local-min=-195, local-max=-191 local-count=15030 Thread 12 local-min=-190, local-max=-186 local-count=15030 Thread 13 local-min=-185, local-max=-181 local-count=15030 Thread 14 local-min=-180, local-max=-176 local-count=15030 Thread 15 local-min=-175, local-max=-171 local-count=15030 Thread 16 local-min=-170, local-max=-166 local-count=15030 Thread 17 local-min=-165, local-max=-161 local-count=15030 Thread 18 local-min=-160, local-max=-156 local-count=15030 Thread 19 local-min=-155, local-max=-151 local-count=15030 result=300604 On Mon, Dec 21, 2009 at 12:51 PM, David L. Rager <ra...@gm...> wrote: > Hi, > > I just looked at this briefly. Here's a probably-bad question: > > You already mention that the failure occurs on a random run. Is the > failure always in thread 0? Not that I know what it would mean if it > is.... I just find it peculiar that we're witnessing the occurrence > on a border case. > > David > > On Sun, Dec 20, 2009 at 9:36 PM, AlexF <ale...@gm...> wrote: >> Hello, I'm having trouble understanding multi-thread behavior on SBCL, >> and would appreciate it if somebody could take a look at this. >> I'm seeing infrequent but replicatable instabiliby (inconsistency) in >> a multithreaded computation. >> The usage seems so straightforward, that maybe I have a basic >> misunderstanding of what should happen here? >> >> >> >> Here is some code which I believe does straightforward multithreaded >> computation. >> There are no special variables, and I think there shouldn't be any >> race conditions. >> >> >> (defun compute-thread (thread-num rows-per-proc min max mul) >> "straightforward computation performed by each thread" >> (let* ((local-min (+ min (* thread-num rows-per-proc))) >> (local-max (1- (+ local-min rows-per-proc))) >> (local-count 0)) >> (loop for i from local-min upto local-max >> do (loop for j from min upto max >> do (incf local-count >> (let ((xc (* mul i))) >> (loop for count from 0 below 100 >> do (when (>= xc 4) >> (return count)) >> (incf xc) >> finally (return 100))) >> ))) >> #+nil(format *out* "Thread ~a local-min=~a, local-max=~a local-count=~d~%" >> thread-num local-min local-max local-count) >> local-count)) >> >> >> (defun main (num-threads) >> "spawn some processes that perform some computations and sum the results" >> (loop with rows-per-proc = (/ 100 num-threads) >> for thread in >> (loop for thread-num from 0 below num-threads >> collect >> (let ((thread-num thread-num));establish private binding of >> thread-num for closure >> #+sbcl >> (sb-thread:make-thread (lambda () >> (compute-thread thread-num rows-per-proc -250 250 0.008d0))) >> #+ccl >> (process-run-function "a" (lambda () >> (compute-thread thread-num rows-per-proc -250 250 0.008d0))) >> )) >> summing >> #+sbcl(sb-thread:join-thread thread) >> #+ccl(join-process thread) >> )) >> >> >> Here is a function that checks the above computation: >> >> (defun test (num-threads num-iterations expected-val) >> "check that result of computation is expected-val each time" >> (loop for i from 0 below num-iterations do >> (format t "Run ~a:" i) >> (let ((result (main num-threads))) >> (format t "result=~a~%" result) >> (assert (= expected-val result))))) >> >> >> >> On SBCL 1.0.33 x86-64 >> >> CL-USER> >> ; compiling (DEFUN COMPUTE-THREAD ...) >> ; compiling (DEFUN MAIN ...) >> ; compiling (DEFUN TEST ...); No value >> CL-USER> (main 1) >> >> 300600 <---- this is the expected result >> >> >> CL-USER> (test 20 1000 (main 1)) >> Run 0:result=300600 >> Run 1:result=300600 >> Run 2:result=300600 >> Run 3:result=300600 >> . >> . (snip, all results are 300600 as expected) >> . >> Run 141:result=300600 >> Run 142:result=300600 >> Run 143:result=300600 >> Run 144:result=300600 >> Run 145:result=300602 <----- problem!! (assertion failure occurs) >> ; Evaluation aborted. >> CL-USER> >> >> >> >> Digging a little deeper by enabling the per-thread format form in >> compute-thread: >> (the behavior is nondeterministic so the failure occurs on a different run) >> >> Run 294:Thread 2 local-min=-240, local-max=-236 local-count=15030 >> Thread 1 local-min=-245, local-max=-241 local-count=15030 >> Thread 0 local-min=-250, local-max=-246 local-count=15033 <----- >> problem! The local sum in this thread is incorrect >> Thread 3 local-min=-235, local-max=-231 local-count=15030 >> Thread 4 local-min=-230, local-max=-226 local-count=15030 >> Thread 5 local-min=-225, local-max=-221 local-count=15030 >> Thread 6 local-min=-220, local-max=-216 local-count=15030 >> Thread 7 local-min=-215, local-max=-211 local-count=15030 >> Thread 8 local-min=-210, local-max=-206 local-count=15030 >> Thread 9 local-min=-205, local-max=-201 local-count=15030 >> Thread 10 local-min=-200, local-max=-196 local-count=15030 >> Thread 11 local-min=-195, local-max=-191 local-count=15030 >> Thread 12 local-min=-190, local-max=-186 local-count=15030 >> Thread 13 local-min=-185, local-max=-181 local-count=15030 >> Thread 14 local-min=-180, local-max=-176 local-count=15030 >> Thread 15 local-min=-175, local-max=-171 local-count=15030 >> Thread 16 local-min=-170, local-max=-166 local-count=15030 >> Thread 17 local-min=-165, local-max=-161 local-count=15030 >> Thread 18 local-min=-160, local-max=-156 local-count=15030 >> Thread 19 local-min=-155, local-max=-151 local-count=15030 >> result=300603 >> >> So there seems to be a race condition or something else happening >> inside the compute-thread function. >> Can anybody enlighten me on what might be happening? Is there >> something wrong with my code? >> >> By the way, Clozure Common Lisp (CCL) 1.4 behaved as expected ( no >> assertion failures in 1000 runs). >> >> ------------------------------------------------------------------------------ >> This SF.Net email is sponsored by the Verizon Developer Community >> Take advantage of Verizon's best-in-class app development support >> A streamlined, 14 day to market process makes app distribution fast and easy >> Join now and get one step closer to millions of Verizon customers >> http://p.sf.net/sfu/verizon-dev2dev >> _______________________________________________ >> Sbcl-help mailing list >> Sbc...@li... >> https://lists.sourceforge.net/lists/listinfo/sbcl-help >> > |
From: Christophe R. <cs...@ca...> - 2009-12-21 20:10:13
|
AlexF <ale...@gm...> writes: > So there seems to be a race condition or something else happening > inside the compute-thread function. > Can anybody enlighten me on what might be happening? Is there > something wrong with my code? There was nothing that leapt out at me, though I'm not the most experienced of threaded programmers. That said, I was unable to reproduce your problem; I ran several 1000-fold runs, both on x86 and x86-64, with no failures. That's of course not to say that there's necessarily something else wrong: it could be sbcl, as you diagnose. One thing you might want to do, though, if you can, is see whether you can reproduce your problem on other machines, and if so how to characterize the set of failing systems (OS, architecture, kernel version...) Cheers, Christophe |
From: AlexF <ale...@gm...> - 2009-12-28 12:00:51
|
Dear David, Christophe, and Pascal Thanks very much for your replies. I finally had a chance to try to reproduce my threading problem in some other environments: SBCL1.0.33 on Intel Core 2 quad core, 64-bit Ubuntu 9.04: consistent failure (this is the original environment where I reported the problem) CCL1.4 on Intel Core 2 quad core, 64-bit Ubuntu 9.04: consistent failure SBCL 1.0.33Core i7 quad core, 64-bit Ubuntu 9.04: consistent failure CCL1.4 on Core i7 quad core, 64-bit Ubuntu 9.04: consistent failure SBCL1.0.33 on Intel Core 2 duo, 32-bit Ubuntu 9.04: can't reproduce after >10^5 iterations SBCL1.0.33 on Core i7 quad core, 64-bit Ubuntu 9.10: can't reproduce after >10^5 iterations I guess the biggest finding was that after many more runs, I was able to get CCL1.4 to fail, so at least we know that my threading problem is not SBCL-specific. It's not machine or architecture specific. It may be a Ubuntu 9.04+quad-core specific issue (which would be weird...), or it might be the result of a lack of understanding on my part as to what the code should do. So the mystery remains unsolved, and I'll dig deeper into this issue again later, but in the meantime, thanks very much for your help. Thanks, -Alex Fukunaga On Tue, Dec 22, 2009 at 5:42 AM, Pascal Costanza <pc...@p-...> wrote: > I just looked at the code of join-thread: The first thing it does is that it tries to grab the result-lock of the thread. However, that lock isn't taken until relatively late in make-thread, so what could happen is that join-thread successfully gets the lock before the thread has even started (unless there is something else that ensures this doesn't happen which I'm missing). > > Pascal > On Mon, Dec 21, 2009 at 12:36 PM, AlexF <ale...@gm...> wrote: > Hello, I'm having trouble understanding multi-thread behavior on SBCL, > and would appreciate it if somebody could take a look at this. > I'm seeing infrequent but replicatable instabiliby (inconsistency) in > a multithreaded computation. > The usage seems so straightforward, that maybe I have a basic > misunderstanding of what should happen here? > > > > Here is some code which I believe does straightforward multithreaded > computation. > There are no special variables, and I think there shouldn't be any > race conditions. > > > (defun compute-thread (thread-num rows-per-proc min max mul) > "straightforward computation performed by each thread" > (let* ((local-min (+ min (* thread-num rows-per-proc))) > (local-max (1- (+ local-min rows-per-proc))) > (local-count 0)) > (loop for i from local-min upto local-max > do (loop for j from min upto max > do (incf local-count > (let ((xc (* mul i))) > (loop for count from 0 below 100 > do (when (>= xc 4) > (return count)) > (incf xc) > finally (return 100))) > ))) > #+nil(format *out* "Thread ~a local-min=~a, local-max=~a local-count=~d~%" > thread-num local-min local-max local-count) > local-count)) > > > (defun main (num-threads) > "spawn some processes that perform some computations and sum the results" > (loop with rows-per-proc = (/ 100 num-threads) > for thread in > (loop for thread-num from 0 below num-threads > collect > (let ((thread-num thread-num));establish private binding of > thread-num for closure > #+sbcl > (sb-thread:make-thread (lambda () > (compute-thread thread-num rows-per-proc -250 250 0.008d0))) > #+ccl > (process-run-function "a" (lambda () > (compute-thread thread-num rows-per-proc -250 250 0.008d0))) > )) > summing > #+sbcl(sb-thread:join-thread thread) > #+ccl(join-process thread) > )) > > > Here is a function that checks the above computation: > > (defun test (num-threads num-iterations expected-val) > "check that result of computation is expected-val each time" > (loop for i from 0 below num-iterations do > (format t "Run ~a:" i) > (let ((result (main num-threads))) > (format t "result=~a~%" result) > (assert (= expected-val result))))) > > > > On SBCL 1.0.33 x86-64 > > CL-USER> > ; compiling (DEFUN COMPUTE-THREAD ...) > ; compiling (DEFUN MAIN ...) > ; compiling (DEFUN TEST ...); No value > CL-USER> (main 1) > > 300600 <---- this is the expected result > > > CL-USER> (test 20 1000 (main 1)) > Run 0:result=300600 > Run 1:result=300600 > Run 2:result=300600 > Run 3:result=300600 > . > . (snip, all results are 300600 as expected) > . > Run 141:result=300600 > Run 142:result=300600 > Run 143:result=300600 > Run 144:result=300600 > Run 145:result=300602 <----- problem!! (assertion failure occurs) > ; Evaluation aborted. > CL-USER> > > > > Digging a little deeper by enabling the per-thread format form in > compute-thread: > (the behavior is nondeterministic so the failure occurs on a different run) > > Run 294:Thread 2 local-min=-240, local-max=-236 local-count=15030 > Thread 1 local-min=-245, local-max=-241 local-count=15030 > Thread 0 local-min=-250, local-max=-246 local-count=15033 <----- > problem! The local sum in this thread is incorrect > Thread 3 local-min=-235, local-max=-231 local-count=15030 > Thread 4 local-min=-230, local-max=-226 local-count=15030 > Thread 5 local-min=-225, local-max=-221 local-count=15030 > Thread 6 local-min=-220, local-max=-216 local-count=15030 > Thread 7 local-min=-215, local-max=-211 local-count=15030 > Thread 8 local-min=-210, local-max=-206 local-count=15030 > Thread 9 local-min=-205, local-max=-201 local-count=15030 > Thread 10 local-min=-200, local-max=-196 local-count=15030 > Thread 11 local-min=-195, local-max=-191 local-count=15030 > Thread 12 local-min=-190, local-max=-186 local-count=15030 > Thread 13 local-min=-185, local-max=-181 local-count=15030 > Thread 14 local-min=-180, local-max=-176 local-count=15030 > Thread 15 local-min=-175, local-max=-171 local-count=15030 > Thread 16 local-min=-170, local-max=-166 local-count=15030 > Thread 17 local-min=-165, local-max=-161 local-count=15030 > Thread 18 local-min=-160, local-max=-156 local-count=15030 > Thread 19 local-min=-155, local-max=-151 local-count=15030 > result=300603 > > So there seems to be a race condition or something else happening > inside the compute-thread function. > Can anybody enlighten me on what might be happening? Is there > something wrong with my code? > > By the way, Clozure Common Lisp (CCL) 1.4 behaved as expected ( no > assertion failures in 1000 runs). > |
From: Nikodemus S. <nik...@ra...> - 2009-12-28 20:44:08
|
2009/12/28 AlexF <ale...@gm...>: > I guess the biggest finding was that after many more runs, I was able > to get CCL1.4 to fail, so at least we know that my threading problem > is > not SBCL-specific. > It's not machine or architecture specific. > It may be a Ubuntu 9.04+quad-core specific issue (which would be weird...), or > it might be the result of a lack of understanding on my part as to > what the code should do. Yhere is a small bug in your test-case: it works correctly only if the number of threads specified divides 100 exactly. Here is an always-correct version: (defun main (num-threads) "spawn some processes that perform some computations and sum the results" (flet ((in-thread (n rows) #+sbcl (sb-thread:make-thread (lambda () (compute-thread n rows -250 250 0.008d0))) #+ccl (process-run-function "a" (lambda () (compute-thread n rows -250 250 0.008d0))))) (multiple-value-bind (rows-per-proc overflow-rows) (truncate 100 num-threads) (loop for thread in (cons (in-thread num-threads overflow-rows) (loop for thread-num from 0 below num-threads collect (in-thread thread-num rows-per-proc))) summing #+sbcl (sb-thread:join-thread thread) #+ccl (join-process thread))))) That said, the bug is deterministic, so in a system functioning correctly you should always get stable values even then... Can you write a C/pthread only case that reproduces this? I have been so far unable to reproduce the bug on Darwin Core 2 Duo. Haven't tried on a quad core anywhere yet. >> I just looked at the code of join-thread: The first thing it does is that it tries to grab >> the result-lock of the thread. However, that lock isn't taken until relatively late in >> make-thread, so what could happen is that join-thread successfully gets the lock before >> the thread has even started (unless there is something else that ensures this doesn't >> happen which I'm missing). Nope: MAKE-THREAD doesn't return until the thread setup semaphore has been signalled, which isn't done until the new thread has grabbed it's own result lock. Cheers, -- Nikodemus |
From: Gábor M. <me...@re...> - 2009-12-28 21:33:38
|
On Monday 28 December 2009, Nikodemus Siivola wrote: > 2009/12/28 AlexF <ale...@gm...>: > > I guess the biggest finding was that after many more runs, I was > > able to get CCL1.4 to fail, so at least we know that my threading > > problem is > > not SBCL-specific. > > It's not machine or architecture specific. > > It may be a Ubuntu 9.04+quad-core specific issue (which would be > > weird...), or it might be the result of a lack of understanding on > > my part as to what the code should do. > > Yhere is a small bug in your test-case: it works correctly only if > the number of threads specified divides 100 exactly. Here is an > always-correct version: > > (defun main (num-threads) > "spawn some processes that perform some computations and sum the > results" (flet ((in-thread (n rows) > #+sbcl > (sb-thread:make-thread > (lambda () > (compute-thread n rows -250 250 0.008d0))) > #+ccl > (process-run-function > "a" > (lambda () > (compute-thread n rows -250 250 0.008d0))))) > (multiple-value-bind (rows-per-proc overflow-rows) (truncate 100 > num-threads) > (loop for thread in > (cons (in-thread num-threads overflow-rows) > (loop for thread-num from 0 below num-threads > collect (in-thread thread-num > rows-per-proc))) summing > #+sbcl (sb-thread:join-thread thread) > #+ccl (join-process thread))))) > > That said, the bug is deterministic, so in a system functioning > correctly you should always get stable values even then... > > Can you write a C/pthread only case that reproduces this? > > I have been so far unable to reproduce the bug on Darwin Core 2 Duo. > Haven't tried on a quad core anywhere yet. I've run this code a 100 times on a Core Duo with linux 2.6.30 and on a 24 core Core 2 Duo with 2.6.27.29 and failed to reproduce the bug. Gabor |