(Affects the pthreads, i.e. Linux, implementation of Boost.)
When pools are continually created and destroyed, terminate_all_workers() doesn't properly join all the threads that were created by the pool.
As a result, pthread_create() eventually fails inside worker_thread<pool_type>::create_and_attach().
Because this failure is silent, the next pool created deadlocks: it waits for a condition that will never be signalled, because no thread was created to signal it, and the pool doesn't realise this because it doesn't properly detect creation errors.
On Linux x86-64 (openSUSE 11.4, 8-core Intel CPU), this occurs once 2752 unjoined threads exist.
On Linux amd64 (Debian squeeze, quad-core Phenom), the failure seems to occur after 32768 unjoined threads have been created.
There are two problems:
(a) the resource exception within create_and_attach() is silently trapped. This caused me a lot of head scratching for a while...
(b) the terminate_all_workers() method does not join all exiting threads, because it runs before the thread destruction hook worker_destructed() has had a chance to add the thread to the terminated_workers vector.
I resolved this by counting the number of times the thread creation is called for the pool, and using that to make sure that many threads are waiting to be cleaned up before exiting.
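A minimal sketch of the counting idea, not the actual patch. It uses std::thread rather than the library's own worker classes, and counting_pool, run(), created_ and terminated_ are hypothetical names chosen for illustration. Each worker registers itself on exit (standing in for worker_destructed()), and terminate_all_workers() waits until as many workers have registered as were created before it joins them:

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the pool; only the shutdown bookkeeping is modelled.
class counting_pool {
public:
    explicit counting_pool(std::size_t n) {
        threads_.reserve(n);
        try {
            for (std::size_t i = 0; i < n; ++i) {
                threads_.emplace_back(&counting_pool::run, this, i);
                ++created_;  // count only the threads that really started
            }
        } catch (...) {
            terminate_all_workers();  // join the workers that did start
            throw;                    // and let the caller see the failure
        }
    }

    ~counting_pool() { terminate_all_workers(); }

    std::size_t created() const { return created_; }

    // Returns the number of threads joined by this call.
    std::size_t terminate_all_workers() {
        std::vector<std::size_t> ids;
        {
            std::unique_lock<std::mutex> lock(m_);
            stop_ = true;
            wake_.notify_all();
            // Do not proceed until every created worker has registered itself.
            done_.wait(lock, [this] { return terminated_.size() == created_; });
            ids.swap(terminated_);
            created_ = 0;
        }
        for (std::size_t id : ids) {
            assert(threads_[id].joinable());  // each worker is joined exactly once
            threads_[id].join();
        }
        return ids.size();
    }

private:
    void run(std::size_t id) {
        std::unique_lock<std::mutex> lock(m_);
        wake_.wait(lock, [this] { return stop_; });  // idle until shutdown
        terminated_.push_back(id);  // plays the role of worker_destructed()
        done_.notify_one();
    }

    std::mutex m_;
    std::condition_variable wake_, done_;
    bool stop_ = false;
    std::size_t created_ = 0;
    std::vector<std::size_t> terminated_;
    std::vector<std::thread> threads_;
};
```

With this bookkeeping, repeatedly constructing and destroying a counting_pool should leave no unjoined threads behind, because the join loop cannot start while a worker is still on its way to registering.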
Update: after I fixed this problem, I read bug 2910301 and suspect it is the same problem. I also think the fix supplied is just as valid as what worked for me.
I would still class eating the resource exception as an additional bug: the caller should know if it is not possible to create the pool in the desired manner.
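To illustrate the difference for the caller, here is a rough sketch contrasting the two behaviours. It is not the library's code: create_workers_silently, create_workers_or_throw and the spawn callback are hypothetical, and std::system_error is used as a stand-in for whatever resource exception the real thread constructor raises.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <stdexcept>
#include <string>
#include <system_error>

// Hypothetical: 'spawn' starts one worker and throws if the OS refuses.

// Current behaviour as described above: the error is swallowed, so the
// caller only finds out if it happens to check the returned count.
std::size_t create_workers_silently(std::size_t wanted,
                                    const std::function<void()>& spawn) {
    assert(spawn);
    std::size_t started = 0;
    for (std::size_t i = 0; i < wanted; ++i) {
        try {
            spawn();
            ++started;
        } catch (...) {
            // swallowed: nothing tells the caller the pool is short of workers
        }
    }
    return started;
}

// Suggested behaviour: report how far creation got and propagate the failure.
void create_workers_or_throw(std::size_t wanted,
                             const std::function<void()>& spawn) {
    assert(spawn);
    for (std::size_t i = 0; i < wanted; ++i) {
        try {
            spawn();
        } catch (const std::system_error& e) {
            throw std::runtime_error(
                "pool wanted " + std::to_string(wanted) + " workers, only " +
                std::to_string(i) + " started: " + e.what());
        }
    }
}
```

In the first variant a pool that asked for five workers and got three carries on as if nothing happened; in the second the caller gets an exception at construction time instead of a deadlock later.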
To demonstrate the problem using unpatched 0.2.5:
(Run the example)
g++ -I threadpool-0_2_5-src/threadpool/ -lboost_thread -lpthread example.cpp
After 172 iterations, thread creation failed because 2562 threads had finished but were never joined.
Because the exception is silently caught, the output of 'task' stops after 172.
When I inserted 'throw' in the handler, the program instead aborted after 172 iterations.
On a different computer (amd64, Debian squeeze), I managed 2047 loops instead.
After my patch, this runs to completion.