Hi,

I am writing this email with Bcc to the ECL mailing list and to the GC developers mailing list. I have just discovered a serious race condition that deadlocks our program and prevents it from exiting. It occurs between the code that runs when dlclose() is called and the code that runs when a POSIX thread exits.

Roughly, we just run ECL, load a bunch of libraries (DLLs) and then quit the program. At exit time two things happen: the libraries are unloaded and the service threads exit. The result is that the program hangs, as shown below:

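1) This thread is a service thread. It is trying to exit, and in the process it acquires the GC lock, but for some reason the thread then calls into dyld, the dynamic linker. The frames dyld_stub_binder_() and fastBindLazySymbol suggest that dyld is lazily binding a symbol the first time it is called, which requires the dyld global lock. I have not yet located where in the GC this happens, but from the symptoms it seems to be close to GC_unregister...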

(gdb) thread 2
(gdb) bt
#0  0x00007fff88009bf2 in __psynch_mutexwait ()
#1  0x00007fff897d31a1 in pthread_mutex_lock ()
#2  0x00007fff84eae623 in dyldGlobalLockAcquire ()
#3  0x00007fff6172a745 in __dyld__ZN26ImageLoaderMachOCompressed20doBindFastLazySymbolEjRKN11ImageLoader11LinkContextEPFvvES5_ ()
#4  0x00007fff61717922 in __dyld__ZN4dyld18fastBindLazySymbolEPP11ImageLoaderm ()
#5  0x00007fff84eae716 in dyld_stub_binder_ ()
#6  0x0000000101d01458 in C.88.15036 ()
#7  0x0000000101c73100 in GC_inner_start_routine (sb=0x1041deeb0, arg=0x102117ea0) at pthread_start.c:67
#8  0x0000000101c6eb1c in GC_call_with_stack_base (fn=0x101c73030 <GC_inner_start_routine>, arg=0x102117ea0) at misc.c:1510
#9  0x0000000101c74565 in GC_start_routine (arg=0x102117ea0) at pthread_support.c:1504
#10 0x00007fff897d48bf in _pthread_start ()
#11 0x00007fff897d7b75 in thread_start ()

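2) This thread is the main one. It is trying to close a bunch of libraries, none of which are related to the thread above. However, when dlclose() is called, dyld runs the collector's image-removal hook, GC_dyld_image_remove(), which calls GC_remove_roots() and blocks waiting for the GC lock. Presumably dlclose() still holds the dyld lock at this point, so each thread is waiting for the lock that the other one holds and neither can proceed.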

(gdb) thread 1
[Switching to thread 1 (process 37491), "com.apple.main-thread"]
0x00007fff88009bf2 in __psynch_mutexwait ()
(gdb) bt
#0  0x00007fff88009bf2 in __psynch_mutexwait ()
#1  0x00007fff897d31a1 in pthread_mutex_lock ()
#2  0x0000000101c74833 in GC_lock () at pthread_support.c:1784
#3  0x0000000101c6c53d in GC_remove_roots (b=0x104f03220, e=0x104f03238) at mark_rts.c:311
#4  0x0000000101c61f20 in GC_dyld_image_remove (hdr=0x104eff000, slide=4377800704) at dyn_load.c:1319
#5  0x00007fff61714bdd in __dyld__ZN4dyld11removeImageEP11ImageLoader ()
#6  0x00007fff6171858d in __dyld__ZN4dyld20garbageCollectImagesEv ()
#7  0x00007fff6171c432 in __dyld_dlclose ()
#8  0x00007fff84eaebd5 in dlclose ()
#9  0x0000000101c2ae8c in dlclose_wrapper [inlined] () at /Users/jjgarcia/devel/ecl/src/c/ffi/libraries.d:432
#10 0x0000000101c2ae8c in ecl_library_close (block=0x103be4e00) at libraries.d:432
#11 0x0000000101c2af79 in ecl_library_close_all () at libraries.d:448
#12 0x0000000101b1a84d in cl_shutdown () at main.d:301
#13 0x0000000101b1a964 in si_exit (narg=4377800704) at main.d:839
#14 0x0000000101b13e47 in main ()


--
Instituto de Física Fundamental, CSIC
c/ Serrano, 113b, Madrid 28006 (Spain) 
http://juanjose.garciaripoll.googlepages.com