#62 DeadlockDetector causes problems

open
nobody
None
5
2008-02-07
2008-02-07
Anders Wallgren
No

From http://forum.hibernate.org/viewtopic.php?t=947246&postdays=0&postorder=asc&highlight=apparent+deadlock+c3p0&start=15:

swaldman wrote:
hi,

please try c3p0-0.9.1-pre11. i think most of these APPARENT DEADLOCK issues are now resolved, though of course you can prove me wrong...

if you have trouble reaching c3p0's website on mchange.com (we're having some network issues), please go to sourceforge.net and search for c3p0. sourceforge hosts the actual releases.

good luck!
steve

We're using c3p0-0.9.1.2, and this problem happens consistently under heavy load (especially when the machine itself is loaded to the point of swapping).

As far as I can tell, after reading the c3p0 code for a while, this is a fundamental problem with the way the DeadlockDetector code is written. The detection algorithm results in false positives under heavy system load.

The detection algorithm is simply this: if no background task completes and no new tasks are posted between consecutive runs of the detector, a deadlock is assumed, and the detector interrupts all the pool threads and starts new ones.
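The heuristic described above can be modeled in a few lines. This is a minimal Java sketch, not c3p0's DeadlockDetector: the class name `ProgressHeuristic`, its counters, and the extra condition that at least one task must still be outstanding are assumptions made for illustration. It shows why a single task that is merely slow (say, a connection acquire on a swapping machine) looks exactly like a deadlocked one.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of the "no progress between runs" heuristic; not c3p0's actual class. */
public class ProgressHeuristic {
    private final AtomicLong posted = new AtomicLong();
    private final AtomicLong completed = new AtomicLong();

    // Snapshot recorded at the previous detector run (-1 means "no run yet").
    private long lastPosted = -1;
    private long lastCompleted = -1;

    public void taskPosted() { posted.incrementAndGet(); }
    public void taskCompleted() { completed.incrementAndGet(); }

    /** Called periodically by a timer; returns true if it would declare an apparent deadlock. */
    public synchronized boolean check() {
        long p = posted.get();
        long c = completed.get();
        boolean outstanding = p > c; // assumption: only flag when some work is pending
        boolean noProgress = (p == lastPosted) && (c == lastCompleted);
        lastPosted = p;
        lastCompleted = c;
        return outstanding && noProgress;
    }

    public static void main(String[] args) {
        ProgressHeuristic h = new ProgressHeuristic();
        h.taskPosted();                 // one slow task is in flight
        System.out.println(h.check());  // false: the first run only records a snapshot
        System.out.println(h.check());  // true: nothing finished or was posted since the last run
    }
}
```

Nothing in such a snapshot distinguishes "stuck forever" from "slower than the detector's polling interval", which is exactly the false positive described here.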

Since the only real problem is a heavily loaded machine, these thread interruptions are unnecessary, and they leak resources because the background tasks do no cleanup when they are interrupted. The load that triggered the false alarm is still there afterwards, so the whole cycle is highly likely to repeat.
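To illustrate why interrupting pool threads can leak resources, here is a generic Java sketch (not c3p0 code). The class and method names are hypothetical, and an `AtomicInteger` counter stands in for half-built Connections. A task that releases its resource only on the normal path leaks when interrupted; one that releases in a `finally` block does not.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class InterruptLeakDemo {
    // Number of simulated resources currently held open.
    static final AtomicInteger open = new AtomicInteger();

    // Releases only on the happy path: an interrupt during the slow step skips the release.
    static void leakyTask(CountDownLatch started) {
        open.incrementAndGet();          // acquire the resource
        started.countDown();
        try {
            Thread.sleep(10_000);        // slow work on a loaded machine
        } catch (InterruptedException e) {
            return;                      // bails out without releasing
        }
        open.decrementAndGet();          // release (never reached if interrupted)
    }

    // Same work, but the release sits in finally, so an interrupt cannot skip it.
    static void safeTask(CountDownLatch started) {
        open.incrementAndGet();
        try {
            started.countDown();
            Thread.sleep(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
        } finally {
            open.decrementAndGet();
        }
    }

    /** Runs one task, interrupts it once it holds its resource, and returns how many resources remain open. */
    static int runAndInterrupt(boolean safe) throws InterruptedException {
        open.set(0);
        CountDownLatch started = new CountDownLatch(1);
        Thread t = new Thread(() -> { if (safe) safeTask(started); else leakyTask(started); });
        t.start();
        started.await();   // wait until the resource is held
        t.interrupt();     // what the detector does to every pool thread
        t.join();
        return open.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("left open (no finally): " + runAndInterrupt(false)); // 1
        System.out.println("left open (finally):    " + runAndInterrupt(true));  // 0
    }
}
```

Whether every c3p0 background task lacks this kind of cleanup is the reporter's reading of the code; the sketch only shows the general mechanism.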

Am I off in the weeds here, or is this what's going on? We're pretty close to just hacking away the deadlock detector code -- the cure is much worse than the disease, it seems.

There's a second problem we've run into that's somewhat self-inflicted, but it exposes another vulnerability in the code. We configure acquireRetryAttempts to zero (meaning retry indefinitely). Under certain conditions this causes resource starvation: for example, if all pool threads are blocked trying to acquire connections, no thread is available to refurbish released connections, and everything grinds to a halt.
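The starvation scenario can be reproduced with a plain fixed-size thread pool. In this hypothetical Java sketch (not c3p0 code), the pool loosely stands in for c3p0's helper threads, the `dbReachable` flag simulates a database that never comes back, and `maxAttempts <= 0` mirrors acquireRetryAttempts = 0. With unbounded retries every thread stays busy and the queued refurbish task never runs; with a bounded retry count the threads free up and it does.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

public class HelperStarvationDemo {
    /**
     * Fills a pool of numThreads with "acquire" tasks that retry up to maxAttempts times
     * (maxAttempts <= 0 means forever), then queues one "refurbish" task.
     * Returns whether the refurbish task ran within timeoutMs.
     */
    static boolean refurbishRuns(int numThreads, int maxAttempts, long timeoutMs) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        AtomicBoolean dbReachable = new AtomicBoolean(false); // the database never recovers here
        CountDownLatch refurbished = new CountDownLatch(1);
        for (int i = 0; i < numThreads; i++) {
            pool.execute(() -> {
                for (int attempt = 1; maxAttempts <= 0 || attempt <= maxAttempts; attempt++) {
                    if (dbReachable.get()) return;          // acquired a connection
                    try {
                        Thread.sleep(10);                   // retry delay
                    } catch (InterruptedException e) {
                        return;                             // pool is shutting down
                    }
                }
            });
        }
        pool.execute(refurbished::countDown);  // refurbish task queued behind the acquires
        boolean ran = refurbished.await(timeoutMs, TimeUnit.MILLISECONDS);
        pool.shutdownNow();                    // interrupt any retry loops still spinning
        return ran;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(refurbishRuns(3, 0, 500)); // false: every thread is stuck retrying
        System.out.println(refurbishRuns(3, 5, 500)); // true: the acquires give up after about 50 ms
    }
}
```

Bounding the retries trades a failed acquire for a pool that can keep doing its other work, which is the configuration workaround mentioned next.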

This is easy to configure around, but it seems like something that should at least be documented as a vulnerability in the code.

Discussion

  • Logged In: NO

    This issue is a showstopper for our project. I agree that the cure is worse than the disease. It's impossible to maintain sufficient uptimes when using c3p0 in a large busy system because of the resource leakage caused by false-alarm deadlocks. Please consider removing this "feature".

  • Logged In: NO

    Hi.

    We have a highly concurrent crawling platform in which we see the same problems: APPARENT DEADLOCK! many times a day, and after a while an OOM, which I think happens because the resources are not cleaned up.

    Kind regards

    //Marcus Herou, tailsweep.com