From: Nikodemus S. <nik...@ra...> - 2008-05-14 11:58:26
I'm seeing a fair number of hanging tests when building threaded on
Darwin. For example, this

  sh run-tests.sh threads.pure.lisp clos-add-remove-method.impure.lisp

reliably hangs in the first MAKE-THREAD call in the impure file: the
thread is spawned and starts running, but MAKE-THREAD never returns.
Sticking the same code at the end of threads.pure.lisp shows it
running fine there, and the test also runs fine if threads.pure.lisp
has not been run beforehand...

This particular problem, at least, seems to be due to problems with
SB-POSIX:FORK. (Not sure of the exact mechanism yet.) We have three
different ways of handling thread post-mortem cleanups, but
SB-POSIX:FORK doesn't deal with any of them -- and at a glance I would
expect potential problems with at least the Darwin and "default"
strategies in child processes. Maybe Linux is robust enough that it
doesn't matter there, or maybe I read the code too hastily, but the
freeable thread stacks queue seems like a disaster waiting to happen.

The following patch makes clos-add-remove-method.impure.lisp pass on
threaded Darwin for me, not that threaded tests are too happy there in
general:

diff --git a/tests/run-tests.lisp b/tests/run-tests.lisp
index bcd090d..5d775cd 100644
--- a/tests/run-tests.lisp
+++ b/tests/run-tests.lisp
@@ -79,7 +79,7 @@

 (defun run-in-child-sbcl (load-forms forms)
   (declare (ignorable load-forms))
-  #-win32
+  #-(or win32 (and darwin sb-thread))
   (let ((pid (sb-posix:fork)))
     (cond ((= pid 0)
            (dolist (form forms)
@@ -90,7 +90,7 @@
              (if (sb-posix:wifexited (aref status 0))
                  (sb-posix:wexitstatus (aref status 0))
                  1)))))
-  #+win32
+  #+(or win32 (and darwin sb-thread))
   (process-exit-code
    (sb-ext:run-program
     (first *POSIX-ARGV*)

I'm not sure yet how much time I can spend on this area right now --
definitely not very much on the Darwin specifics -- so this is mostly
a heads-up call for everyone interested.

Apropos, if someone is not seeing this on Darwin... it seems to be a
bit of a heisenbug: commenting out the last test in threads.pure.lisp
also makes things pass for me, and it also occasionally passes when
the CPUs are busy enough, it seems. Gotta love these things.

Cheers,

 -- Nikodemus
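Editorial illustration (not part of the original message, and not SBCL
code): the patch above stops forking the running image and starts a
fresh process instead. The same idea can be sketched in Python; the
helper name run_in_fresh_child is hypothetical, and the sketch assumes
the test body can be handed to a new interpreter as source text.

```python
import subprocess
import sys


def run_in_fresh_child(source):
    """Run `source` in a newly started interpreter and return its exit code.

    Hypothetical helper: the child is a fresh program image, so it does
    not inherit whatever thread bookkeeping the parent has accumulated.
    """
    result = subprocess.run([sys.executable, "-c", source])
    return result.returncode
```

For example, run_in_fresh_child("import sys; sys.exit(3)") returns the
child's exit status to the caller, much as PROCESS-EXIT-CODE does in the
RUN-PROGRAM branch of the patch.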
From: F. <fa...@gm...> - 2008-06-01 00:34:22
When I read you write about fork and threads, I shudder. Fork should
basically not be used when there are active threads. Not just the
Lisp, but the libc and any library that uses threads or mutexes
(possibly including a threaded malloc) will be mightily confused. I'm
not even sure you can use fork when there is but one thread left after
the others die (it might or might not work in practice).

In simple cases (a static set of mutexes) you can survive with
pthread_atfork (assuming Lisp and all libraries play well with it). I
wouldn't bet on it, though.

I don't know much about sbcl and its test suite, so I can't comment
on the particulars.

(PS: oh, and the sbcl implementation of run-program has a lot of scary
race conditions. Playing games with asynchronous signals is
brain-dead.)

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
Science is like sex: sometimes something useful comes out, but that is
not the reason we are doing it -- Richard Feynman
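Editorial illustration (not part of the original message): in the
"static set of mutexes" case, the usual pthread_atfork pattern is to
take every known lock just before the fork and release it on both
sides afterwards, so the child never inherits a lock held by a thread
that no longer exists. A minimal sketch in Python, whose
os.register_at_fork provides comparable before/parent/child hooks for
forks made through os.fork; state_lock and fork_and_check are
hypothetical names, and a single lock stands in for the whole set.

```python
import os
import threading

# One statically known lock: the "simple case" where fork hooks can cope.
state_lock = threading.Lock()


def _before_fork():
    # Take the lock so that no other thread holds it at the instant of fork.
    state_lock.acquire()


def _after_fork():
    # Runs in the parent and in the child; each releases its own copy.
    state_lock.release()


os.register_at_fork(before=_before_fork,
                    after_in_parent=_after_fork,
                    after_in_child=_after_fork)


def fork_and_check():
    """Fork, let the child take the lock, and return the child's exit code."""
    pid = os.fork()
    if pid == 0:
        # Child: if the lock had been inherited in a held state, this
        # acquire would time out and the child would exit with 1.
        ok = state_lock.acquire(timeout=5)
        os._exit(0 if ok else 1)
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status) if os.WIFEXITED(status) else 1
```

This only covers locks the program knows about in advance. Locks
inside libc or third-party libraries are outside its reach, which is
the reason for the scepticism expressed above.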
From: Nikodemus S. <nik...@ra...> - 2008-06-01 01:10:37
On Sun, Jun 1, 2008 at 3:34 AM, Faré <fa...@gm...> wrote:

> When I read you write about fork and threads, I shudder. Fork should
> basically not be used when there are active threads. Not just the
> Lisp, but the libc, and any library that uses threads or mutexes
> (possibly including a threaded malloc) will be mighty confused. I'm
> not even sure you can use fork when there is but one thread left after
> others die (might or might not work in practice).
>
> In simple cases (static set of mutexes) you can survive with
> pthread_atfork (assuming lisp and all libraries play well with it). I
> wouldn't bet on it though.

I'm inclined to agree. I think there are a bunch of not-hard things
we could do to make forking a bit more sane in the presence of Lisp
threads, but yes: to fork robustly, all running threads need to be
informed about the fork -- and then we're off in interrupt land
again. Not worth the effort.

Making sure GC doesn't deadlock, etc., does need to be done, though:
otherwise an unlucky GC just after fork() but before exec() would be
pretty nasty.

The problem I outlined is a lot more modest, though: in the test-suite
case there is only a single thread running when the fork happens, but
there have been other threads earlier -- which is the bit that SBCL
should deal with properly, but doesn't -- with occasionally
spectacular results on Darwin. I don't think this is a real issue on
Linux, though.

> (PS: oh, and the sbcl implementation of run-program has a lot of scary
> race conditions. Playing games with asynchronous signal is braindead.)

If you have something particular in mind, please point it out. I
assume you're referring to SIGCHLD handling? Where's the race?

Cheers,

 -- Nikodemus
From: F. <fa...@gm...> - 2008-06-01 01:49:04
2008/5/31 Nikodemus Siivola <nik...@ra...>:

> an unlucky GC just after fork() but before exec()
> would be pretty nasty.

Avoiding this is possible when you're single-threaded, by looking at
the available space and gc'ing before the fork if there is not enough
-- but that's inherently racy in a multi-threaded world.

> The problem I outlined is a lot more modest, though: in the test-suite
> case there is only a single thread running when the fork happens, but
> there have been other threads earlier -- which is the bit that SBCL
> should deal with properly, but doesn't -- with occasionally
> spectacular results on Darwin. I don't think this is a real issue on
> Linux, though.

Fixing thread cleanup is good (though I don't understand what cleanup
there is to do), but another obvious solution then is DON'T DO IT!
Make sure to put the thread test last, after any forking test -- or
better, do the thread test in a different process, maybe inside a
fork.

>> (PS: oh, and the sbcl implementation of run-program has a lot of scary
>> race conditions. Playing games with asynchronous signal is braindead.)
> If you have something particular in mind, please point it out. I
> assume you're referring to SIGCHLD handling? Where's the race?

I sent email on the list before (or was that only to RmK?). In any
case, it is just NOT POSSIBLE to safely share data between an
interrupted program and an asynchronous signal handler.

Typical condition: the signal arrives before you set up the shared
data structure, and the signal handler has nowhere to store things.
Worse: you need to grab some mutex on the data structure or otherwise
modify it, and the handler pops in right in the middle of that
operation.

"Solution": disable the signal handler (for all threads! Of course, in
practice it is only safe for one thread to handle signals
non-trivially) around any write operation on the shared data
structure. But if you're ready to pay that cost, you may as well do
without a signal handler entirely and just poll using waitpid when you
want to read the child status, or use a signalfd in your event loop,
etc. Or if you really love threads, have the signal handler just wake
up a waiting thread (you may as well use a signalfd, at least on
Linux).

And yes, I ran into this run-program race condition in real life.

[ François-René ÐVB Rideau | Reflection&Cybernethics | http://fare.tunes.org ]
Greenspun's Tenth Rule of Programming: any sufficiently complicated C
or Fortran program contains an ad hoc informally-specified bug-ridden
slow implementation of half of Common Lisp.
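Editorial illustration (not part of the original message, and not how
SBCL's run-program works): the "just poll using waitpid" alternative
needs no SIGCHLD handler and therefore no data shared with one. A
minimal sketch in Python; spawn_child, poll_child and wait_by_polling
are hypothetical names, and the polling interval and timeout are
arbitrary assumptions.

```python
import os
import time


def spawn_child(exit_code):
    """Fork a child that exits at once with `exit_code` (illustration only)."""
    pid = os.fork()
    if pid == 0:
        os._exit(exit_code)
    return pid


def poll_child(pid):
    """Return the child's exit code if it has finished, else None. Never blocks."""
    done_pid, status = os.waitpid(pid, os.WNOHANG)
    if done_pid == 0:
        return None                      # child is still running
    if os.WIFEXITED(status):
        return os.WEXITSTATUS(status)
    return -os.WTERMSIG(status)          # child was killed by a signal


def wait_by_polling(pid, interval=0.01, timeout=5.0):
    """Poll until the child has been reaped, then return its exit code."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        code = poll_child(pid)
        if code is not None:
            return code
        time.sleep(interval)
    raise TimeoutError(f"child {pid} did not exit within {timeout}s")
```

Here the child status is read only when the caller asks for it, for
example wait_by_polling(spawn_child(7)), so there is no window in which
a handler can interrupt a half-finished update. The cost is latency of
up to one polling interval; an event loop would typically poll from its
idle step or wait on a file descriptor instead of sleeping.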