From: Christopher N. <cjn...@cn...> - 2005-08-21 23:11:05
|
I realize that the multi-threading work is fairly new, but I've come across two bugs while working with sbcl threads. I'm running a CVS checkout from August 21, 2005, IA-32 architecture, Linux 2.6.11 kernel. First, the following simple program breaks: (defparameter print-mutex (sb-thread:make-mutex)) (defun worker (i) (sb-thread:with-mutex (print-mutex) (format t "~D~%" i))) (defun main () (dotimes (i 100) (sb-thread:make-thread #'(lambda () (worker i))))) The output of (main) should be the integers from 0 to 99, not necessarily in order. Instead, a typical run will produce the number 13 seven times, then the number 20 seven times, then the number 27 seven times, then the number 33 twelve times, and so on. Finally, it will print the number 100 a few times. This appears to be because by the time the closure is evaluated to create the new thread, the value of 'i' has been incremented, maybe several times. Consequently, the only safe way to start a thread which takes arguments like this is to set up a barrier system with condition variables to protect the variables until after the thread has started executing. The second bug is more subtle. From the time sbcl is started, at least in my system, I can only spawn a total of 512 threads, even if all of them are reaped. That is, in the lifetime of an sbcl process, you can only successfully call sb-thread:make-thread a total of 512 times, even if you only create short-lived threads one at a time. After that, this error is produced: mmap: Cannot allocate memory debugger invoked on a SIMPLE-ERROR in thread #<THREAD "initial thread" {90034D1}>: Can't create a new thread This can be tested by running the above (main) function several times, with pauses in between to let the worker threads exit: (dotimes (i 6) (main) (sleep 10)) A process status listing will show that the child threads have been reaped, but we still can't create new threads. -- Christopher Neufeld Home page: http://www.cneufeld.ca/neufeld "Don't edit reality for the sake of simplicity" |
From: Nikodemus S. <nik...@ra...> - 2005-08-22 04:32:35
|
On Sun, 21 Aug 2005, Christopher Neufeld wrote: > (defun main () > (dotimes (i 100) > (sb-thread:make-thread #'(lambda () (worker i))))) > The output of (main) should be the integers from 0 to 99, not necessarily > in order. Instead, a typical run will produce the number 13 seven times, > then the number 20 seven times, then the number 27 seven times, then the > number 33 twelve times, and so on. Finally, it will print the number 100 a > few times. This appears to be because by the time the closure is evaluated > to create the new thread, the value of 'i' has been incremented, maybe > several times. Right on mark up till this point. However, the solution is much simpler: (defun main () (dotimes (i 100) (let ((ti i)) (sb-thread:make-thread #'(lambda () (worker ti)))))) ...introduce a separate binding for each thread. (Untested, not having an x86 Linux handy.) Cheers, -- Nikodemus Schemer: "Buddha is small, clean, and serious." Lispnik: "Buddha is big, has hairy armpits, and laughs." |
From: Christopher N. <cjn...@cn...> - 2005-08-22 13:44:29
|
On Mon, 22 Aug 2005 07:32:29 +0300 (EEST), Nikodemus Siivola <nik...@ra...> said: > On Sun, 21 Aug 2005, Christopher Neufeld wrote: >> (defun main () >> (dotimes (i 100) >> (sb-thread:make-thread #'(lambda () (worker i))))) >> The output of (main) should be the integers from 0 to 99, not necessarily >> in order. Instead, a typical run will produce the number 13 seven times, >> then the number 20 seven times, then the number 27 seven times, then the >> number 33 twelve times, and so on. > Right on mark up till this point. However, the solution is much simpler: > (defun main () > (dotimes (i 100) > (let ((ti i)) > (sb-thread:make-thread #'(lambda () (worker ti)))))) > ...introduce a separate binding for each thread. (Untested, not having > an x86 Linux handy.) Actually, this seemed like a natural thing to try, and I tried it in my testing. It did cause the symptoms to disappear. However, as there was no documentation guaranteeing this behaviour, I didn't want to count on it, in case some more subtle bug was going on here. Just because a race condition seems to disappear, doesn't mean it's fixed. Anyway, I'm happy with the behaviour, if it can be documented. The construct I provided seemed natural, and almost did the right thing, but the documentation should reflect that a new closure should be created holding all of the mutable context used in the call. It's clear on reflection why this is the case, but it's a classic gotcha, and one that introduces subtle, non-reproducble errors if not fully grasped. By the way, the serialization I put in with a mutex and condition variable to get around the problem of my variables shifting underneath me actually speeded up the creation of my swarm of 500 threads by more than a factor of two over creating a new closure. This was for my contribution to the threads-flow shootout benchmark. Apparently, for this combination of kernel and sbcl versions, the galloping threads got in each others ways. On Mon, 22 Aug 2005 12:27:01 +0200, =?iso-8859-1?q?G=E1bor_Melis?= <me...@ho...> said: > LIST-ALL-THREADS lists all active threads. Finished threads are not > listed but they may be around waiting for a gc to reap them. The > problem is that creating new threads does not use the normal allocation > mechanism and does not necessarily trigger gc. If instead of sleeping > you called (gc) then the threads would be gc'ed. > Well, at least that's the clumsy expected behaviour that was broken > recently. I checked in 0.9.3.75 that fixes the gc so the above > paragraph stands. OK, thanks. My checkout from last night does not have that fix, I'll check it out later. -- Christopher Neufeld Home page: http://www.cneufeld.ca/neufeld "Don't edit reality for the sake of simplicity" |
From: Nikodemus S. <nik...@ra...> - 2005-08-22 14:33:46
|
On Mon, 22 Aug 2005, Christopher Neufeld wrote: > Anyway, I'm happy with the behaviour, if it can be documented. The > construct I provided seemed natural, and almost did the right thing, but From CLHS / DOTIMES: "It is implementation-dependent whether dotimes establishes a new binding of var on each iteration or whether it establishes a binding for var once at the beginning and then assigns it on any subsequent iterations." Threading is not the only case this matters, so the LET is effectively mandatory in portable code that closes over the iteration variable. Documentation patches welcome. Cheers, -- Nikodemus Schemer: "Buddha is small, clean, and serious." Lispnik: "Buddha is big, has hairy armpits, and laughs." |
From: Christopher N. <cjn...@cn...> - 2005-08-23 15:33:53
|
On Mon, 22 Aug 2005 17:33:34 +0300 (EEST), Nikodemus Siivola <nik...@ra...> said: > On Mon, 22 Aug 2005, Christopher Neufeld wrote: >> Anyway, I'm happy with the behaviour, if it can be documented. The >> construct I provided seemed natural, and almost did the right thing, but > Documentation patches welcome. OK, here's my proposed patch: --- src/code/target-thread.lisp.orig Mon Aug 22 06:19:44 2005 +++ src/code/target-thread.lisp Tue Aug 23 10:47:04 2005 @@ -459,7 +459,28 @@ (defun make-thread (function &key name) #!+sb-doc "Create a new thread of NAME that runs FUNCTION. When the function -returns the thread exits." +returns the thread exits. + +It is important to note that if FUNCTION is a lambda expression +which depends on variables in the lexical environment, that the values +of those variables are not evaluated immediately, but only at some +later time when the thread begins to execute. Consequently, the +following code may print the number 2 twice: + + (let ((i 1)) + (make-thread (lambda () (format t \"~d~%\" i))) + (incf i) + (format t \"~d~%\" i)) + +The correct solution is to create new closures around the call(s) to +MAKE-THREAD, protecting the variables used against changes made by the +calling context: + + (let ((i 1)) + (let ((i-local i)) + (make-thread (lambda () (format t \"~d~%\" i-local)))) + (incf i) + (format t \"~d~%\" i))" #!-sb-thread (declare (ignore function name)) #!-sb-thread (error "Not supported in unithread builds.") #!+sb-thread -- Christopher Neufeld Home page: http://www.cneufeld.ca/neufeld "Don't edit reality for the sake of simplicity" |
From: <me...@ho...> - 2005-08-22 10:30:52
|
On Monday 22 August 2005 01:10, Christopher Neufeld wrote: > I realize that the multi-threading work is fairly new, but I've come > across > two bugs while working with sbcl threads. I'm running a CVS checkout > from > August 21, 2005, IA-32 architecture, Linux 2.6.11 kernel. > > First, the following simple program breaks: > > (defparameter print-mutex (sb-thread:make-mutex)) > > (defun worker (i) > (sb-thread:with-mutex (print-mutex) > (format t "~D~%" i))) > > (defun main () > (dotimes (i 100) > (sb-thread:make-thread #'(lambda () (worker i))))) > > > The output of (main) should be the integers from 0 to 99, not > necessarily > in order. Instead, a typical run will produce the number 13 seven > times, > then the number 20 seven times, then the number 27 seven times, then > the number 33 twelve times, and so on. Finally, it will print the > number 100 a > few times. This appears to be because by the time the closure is > evaluated > to create the new thread, the value of 'i' has been incremented, > maybe several times. Consequently, the only safe way to start a > thread which takes arguments like this is to set up a barrier system > with condition variables to protect the variables until after the > thread has started executing. What you describe is the intended behaviour. As you noticed the binding=20 of i is mutated by dotimes and new threads close over that binding.=20 Rebinding i works around the mutation problem: (defun main () (dotimes (i 100) (let ((i i)) (sb-thread:make-thread #'(lambda () (worker i)))))) > The second bug is more subtle. From the time sbcl is started, at > least in > my system, I can only spawn a total of 512 threads, even if all of > them are > reaped. That is, in the lifetime of an sbcl process, you can only > successfully call sb-thread:make-thread a total of 512 times, even if > you > only create short-lived threads one at a time. After that, this > error is produced: > > > mmap: Cannot allocate memory > > debugger invoked on a SIMPLE-ERROR in thread #<THREAD "initial > thread" {90034D1}>: > Can't create a new thread > > > This can be tested by running the above (main) function several > times, with pauses in between to let the worker threads exit: > > (dotimes (i 6) (main) (sleep 10)) > > > A process status listing will show that the child threads have been > reaped, but we still can't create new threads. LIST-ALL-THREADS lists all active threads. Finished threads are not=20 listed but they may be around waiting for a gc to reap them. The=20 problem is that creating new threads does not use the normal allocation=20 mechanism and does not necessarily trigger gc. If instead of sleeping=20 you called (gc) then the threads would be gc'ed. Well, at least that's the clumsy expected behaviour that was broken=20 recently. I checked in 0.9.3.75 that fixes the gc so the above=20 paragraph stands. A better fix shall wait until after the freeze. Cheers, G=E1bor |