I think I have an idea of what is going on, or at least I can get a little further. In the failure I've been seeing, perform_thread_post_mortem() is sometimes being called in the child after a fork. At that point the thread no longer exists (a fork nukes all existing threads except for the thread doing the fork). perform_thread_post_mortem() then calls:
    gc_assert(!pthread_join(post_mortem->os_thread, NULL));
Since the thread pointed at by post_mortem->os_thread doesn't exist in the child, pthread_join returns a non-zero value (probably ESRCH, which pthread_join returns directly rather than through errno, but I haven't verified that yet).
Anyway, this causes gc_assert() to abort, resulting in the test failure.
To work around this I made two changes:
1. When the thread_post_mortem structure is first created I added a new field called origpid and set it to getpid().
2. When perform_thread_post_mortem() is called, it calls getpid() and compares the result to origpid. If they differ, I skip all of the pthread_* calls in perform_thread_post_mortem(). I do, however, still free the post_mortem structure and call os_invalidate(). I'm not sure if this is entirely the correct thing to do.
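For readers following along, here is a rough sketch of the two changes above. It is not the actual SBCL code: struct post_mortem_sketch, make_post_mortem() and perform_post_mortem() are hypothetical stand-ins, only the origpid field corresponds to what I described, and the os_invalidate() call is left out entirely.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the thread_post_mortem structure.
 * Only origpid mirrors the new field described in step 1. */
struct post_mortem_sketch {
    pthread_t os_thread;  /* thread to be joined during post mortem */
    pid_t origpid;        /* pid of the process that created this record */
};

/* Trivial thread body so the sketch has something to join. */
static void *thread_body(void *arg)
{
    return arg;
}

/* Step 1: remember the creating process when the record is made. */
struct post_mortem_sketch *make_post_mortem(void)
{
    struct post_mortem_sketch *pm = malloc(sizeof *pm);
    assert(pm != NULL);
    pm->origpid = getpid();
    if (pthread_create(&pm->os_thread, NULL, thread_body, NULL) != 0) {
        free(pm);
        return NULL;
    }
    return pm;
}

/* Step 2: only touch the pthread_* routines in the process that owns
 * the thread. Returns 1 if the thread was joined, 0 if the join was
 * skipped because the pid changed (forked child), -1 if the join failed.
 * The record is freed in every case. */
int perform_post_mortem(struct post_mortem_sketch *pm)
{
    int result = 0;
    if (getpid() == pm->origpid) {
        /* pthread_join reports errors through its return value. */
        int rc = pthread_join(pm->os_thread, NULL);
        result = (rc == 0) ? 1 : -1;
    }
    free(pm);
    return result;
}
```

In the parent, perform_post_mortem() joins as before. In a forked child, getpid() no longer matches origpid, so the join is skipped and only the cleanup runs. Whether skipping the join is enough, or whether more of the cleanup should be skipped as well, is exactly the part I'm unsure about.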
The upshot is that I get further in the tests, but now I'm running into a consistent hang in threads.impure.lisp. So... At this point I'm not sure if my "fix" has introduced other issues or if things are getting farther along and new issues are coming up.
So, my plan right now is: I'm backing my change out, rebuilding and I'm going to run threads.impure.lisp by itself and see what I can discover. I'll probably put my change back in a slightly different way when I've had a chance to go through the code in this area a little more (I need to verify that calling things like os_invalidate() when I have a non-existent thread is the right thing to do).
V. Glenn Tarcea
On Feb 27, 2010, at 9:11 AM, Glenn Tarcea wrote:
> Thanks for the pointers. I'm going to poke around more to see what I can discover as well. I work mostly on MacOSX and would like to understand the issues around using threads in SBCL on it.
> V. Glenn Tarcea
> On Feb 27, 2010, at 8:44 AM, Nikodemus Siivola wrote:
>> On 27 February 2010 15:27, Glenn Tarcea <gtarcea@...> wrote:
>>> Thanks for the response. Why does a fork after spawning threads cause an issue on MacOSX?
>>> That is, not the internals to macos, but what issues do we see?
>> I added some mailing list archive links to the relevant bug on Launchpad:
>> Reading them should clarify the issue a bit -- but independent
>> investigation is always good too, maybe my analysis is bogus?
>> -- Nikodemus