#7 Eventual pthread resource exhaustion uncaught exception

open
nobody
None
6
2012-05-10
2012-05-10
No

(Affects pthreads, i.e. the Linux implementation of Boost.)
In a situation where pools are continually created and destroyed, terminate_all_workers() doesn't properly join all the threads that were created by the pool.
As a result, pthread_create() eventually fails inside worker_thread<pool_type>::create_and_attach().
Because this failure is silent, the next pool created deadlocks waiting for a condition that will never be signalled: no thread was created to signal it, but the pool doesn't realise this because it doesn't properly detect creation errors.

On Linux x86-64 (openSUSE 11.4, 8-core Intel CPU), this occurs once 2752 unjoined threads exist.
On Linux amd64 (Debian squeeze, quad-core Phenom), there seem to have been 32768 unjoined threads created before it occurred.

There are two problems:
(a) The resource exception within create_and_attach() is silently trapped. This caused me a lot of head scratching for a while...
(b) The terminate_all_workers() method does not join all exiting threads, because it runs before the thread destruction hook worker_destructed() has a chance to add the thread to the terminated_workers vector.

I resolved this by counting the number of times thread creation is called for the pool, and using that count to make sure that many threads are waiting to be cleaned up before terminate_all_workers() exits.
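The idea behind this fix can be sketched roughly as follows. This is a minimal illustration, not the actual patch: counting_pool, worker_main and churn are hypothetical names, and std::thread stands in for the Boost threads the library really uses. As in the library, shutdown only joins workers that have registered themselves as terminated, so it first waits until the number of registered workers matches the number created.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical stand-in for the pool; not the threadpool library's class.
class counting_pool {
public:
    explicit counting_pool(std::size_t n) {
        handles_.reserve(n);
        for (std::size_t i = 0; i < n; ++i) {
            // Handling of a failed creation is omitted in this sketch.
            handles_.emplace_back(&counting_pool::worker_main, this, i);
            std::lock_guard<std::mutex> lock(mutex_);
            ++created_;  // count every worker that was actually started
        }
    }

    ~counting_pool() { terminate_all_workers(); }

    // Returns how many workers were joined by this call.
    std::size_t terminate_all_workers() {
        std::unique_lock<std::mutex> lock(mutex_);
        stop_ = true;
        stop_cv_.notify_all();
        // The fix: do not proceed until every created worker has
        // registered itself in terminated_.
        done_cv_.wait(lock, [this] { return terminated_.size() == created_; });
        std::vector<std::size_t> finished;
        finished.swap(terminated_);
        created_ = 0;
        lock.unlock();

        std::size_t joined = 0;
        for (std::size_t index : finished) {
            assert(index < handles_.size());
            handles_[index].join();  // only registered workers are joined
            ++joined;
        }
        return joined;
    }

private:
    void worker_main(std::size_t index) {
        std::unique_lock<std::mutex> lock(mutex_);
        stop_cv_.wait(lock, [this] { return stop_; });
        // Plays the role of the destruction hook: register as terminated.
        terminated_.push_back(index);
        done_cv_.notify_all();
    }

    std::mutex mutex_;
    std::condition_variable stop_cv_;
    std::condition_variable done_cv_;
    bool stop_ = false;
    std::size_t created_ = 0;
    std::vector<std::size_t> terminated_;
    std::vector<std::thread> handles_;
};

// Repeatedly create and destroy pools; returns the total number of joins.
std::size_t churn(std::size_t iterations, std::size_t workers) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        counting_pool pool(workers);
        total += pool.terminate_all_workers();
    }
    return total;
}
```

With the wait in place, churn(50, 4) should join all 200 workers, so no unjoined threads accumulate across iterations. Without the wait, terminate_all_workers() could swap out terminated_ before some workers had registered, which is the race described in (b).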

Update: after I fixed this problem, I read bug 2910301 and suspect it is the same problem. I also think the fix supplied is just as valid as what worked for me.

I would still class eating the resource exception as an additional bug: the caller should know if it is not possible to create the pool in the desired manner.
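A rough sketch of what reporting the failure to the caller could look like is shown below. It is an assumption-laden illustration, not the library's code: create_workers, spawn_fn and creation_failure_is_reported are hypothetical names, std::thread and std::system_error stand in for the Boost thread and its resource error, and the failure is simulated through an injected factory rather than by actually exhausting threads.

```cpp
#include <cstddef>
#include <functional>
#include <system_error>
#include <thread>
#include <vector>

// Hypothetical factory type so a creation failure can be simulated.
using spawn_fn = std::function<std::thread()>;

// Start n workers, or clean up and re-throw if any creation fails.
std::vector<std::thread> create_workers(std::size_t n, const spawn_fn& spawn) {
    std::vector<std::thread> workers;
    workers.reserve(n);
    try {
        for (std::size_t i = 0; i < n; ++i) {
            workers.push_back(spawn());
        }
    } catch (...) {
        // Join the workers that did start. This assumes they exit on
        // their own; a real pool would have to signal them first.
        for (std::thread& t : workers) {
            t.join();
        }
        throw;  // let the caller see the failure instead of swallowing it
    }
    return workers;
}

// Returns true if a simulated failure on the fail_at-th creation
// reaches the caller as an exception.
bool creation_failure_is_reported(std::size_t fail_at) {
    std::size_t calls = 0;
    spawn_fn flaky = [&calls, fail_at]() -> std::thread {
        if (++calls == fail_at) {
            throw std::system_error(
                std::make_error_code(std::errc::resource_unavailable_try_again),
                "simulated thread creation failure");
        }
        return std::thread([] {});
    };
    try {
        std::vector<std::thread> workers = create_workers(4, flaky);
        for (std::thread& t : workers) {
            t.join();
        }
        return false;
    } catch (const std::system_error&) {
        return true;
    }
}
```

The point of the design is the bare throw in the handler: the caller either gets the full set of workers or an exception, and never a pool that silently has fewer threads than requested.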

To demonstrate the problem using unpatched 0.2.5:

(Run the example)
g++ -I threadpool-0_2_5-src/threadpool/ -lboost_thread -lpthread example.cpp
./a.out

After 172 iterations, thread creation failed because 2562 threads had finished but were never joined.
Because the exception is silently caught, the output of 'task' simply stops after 172.
When I inserted 'throw' in the handler, the program instead aborted after 172 iterations.

On a different computer (amd64, Debian squeeze), I managed 2047 loops instead.

After my patch, this runs to completion.

Discussion

  • Andrew McDonnell

    Of course, ideally it would be better to create a pool once and keep using it. I actually discovered this fault when retrofitting into an existing architecture where it was safer to destroy whatever I was creating each time...

     
  • Andrew McDonnell

    • priority: 5 --> 6
     
  • Anonymous - 2012-07-15

    I think patch 3127011 will fix your problem

     
