From: Owen M. <owe...@bc...> - 2012-10-05 14:55:40
Hello,

I'm using a multiprocessing.Pool to parallelize a set of tasks which record their results into separate HDF5 files. Occasionally (less than 2% of the time) a worker process will hang. According to gdb, the problem occurs while opening the HDF5 file, when it attempts to obtain the associated mutex. Here's part of the backtrace:

    #0 0x00007fb2ceaa716c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
    #1 0x00007fb2be61c215 in H5TS_mutex_lock () from /usr/lib/libhdf5.so.6
    #2 0x00007fb2be32bff0 in H5open () from /usr/lib/libhdf5.so.6
    #3 0x00007fb2b96226a4 in __pyx_pf_6tables_13hdf5Extension_4File__g_new (__pyx_v_self=0x7fb2b04867d0, __pyx_args=<value optimized out>, __pyx_kwds=<value optimized out>) at tables/hdf5Extension.c:2820
    #4 0x00000000004abf62 in ext_do_call (f=0x4cb2430, throwflag=<value optimized out>) at Python/ceval.c:4331

Nothing else is trying to open this file, so can someone suggest why this is occurring? This is a very annoying problem, as there is no way to recover from this error. Consequently the worker process is permanently occupied, which effectively removes one of my processors from the pool.

Regards,
Owen Mackwood
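For anyone trying to reproduce the setup, here is a minimal sketch of the pattern described above. It is not the original code: the names record_result and run_tasks, the timeout value, and the output file names are hypothetical, and a plain-text write stands in for the PyTables call that opens the HDF5 file. It only shows how a hung task would surface in the parent process through a timeout on the result.

```python
import multiprocessing as mp
import os


def record_result(path):
    # Hypothetical stand-in for the real task. In the setup described above,
    # this is where the worker would open its own HDF5 file via PyTables.
    with open(path, "w") as fh:
        fh.write("result for %s\n" % os.path.basename(path))
    return path


def run_tasks(paths, timeout=30.0, processes=2):
    """Run one task per output file; return (finished, timed_out) path lists."""
    finished, timed_out = [], []
    pool = mp.Pool(processes=processes)
    try:
        # Submit every task up front so the workers run in parallel.
        pending = [(p, pool.apply_async(record_result, (p,))) for p in paths]
        for path, res in pending:
            try:
                finished.append(res.get(timeout=timeout))
            except mp.TimeoutError:
                # The parent learns that the task did not finish in time,
                # but a worker blocked inside a C call stays blocked.
                timed_out.append(path)
    finally:
        # terminate() stops all workers, including any that are stuck.
        pool.terminate()
        pool.join()
    return finished, timed_out
```

Called from under an `if __name__ == "__main__":` guard, run_tasks(paths) returns the paths that were written and the paths whose tasks timed out. The timeout by itself does not free a stuck worker, which matches the behaviour reported above; in this sketch only the final pool.terminate() reclaims it.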