#85 Concurrency problem in trade* leading to deadlock

open
nobody
tradecommon (9)
5
2011-12-05
2011-12-05
No

I believe the wait/notify logic used by the tradebeans and tradesoap benchmarks is flawed and leads to a deadlock depending on how the JVM schedules threads and handles Object.notify().

Here is how I understand the logic:

(1) The main thread (called M from now on) starts worker threads (W1 and W2 in my example, but there can be an arbitrary number of course).

(2) There is a single object used for synchronization, so all wait() and notify() affect the same monitor.

(3) M starts W1 and W2 and then enters a wait() loop until all work is done (see DaCapoRunner.wait)

(4) The workers first have an initialization phase (see DaCapoTrader.reset). After all workers have finished this phase, the workers start doing real work. So there is a wait() loop until all workers have finished their initialization phase. Only the last thread to be initialized does not enter the wait() loop.

(5) Whenever a worker finishes its initialization phase, it calls notify(). This happens before the actual wait() loop. I assume the intention of this is that all workers start to work at some point.

(6) Whenever a worker has finished a batch of work, it calls notify(). I assume the intention of this is to wake up the main thread in case all work has been done.

(7) The main thread waits until all workers have terminated.

Let's assume as simple scheduling strategy where notify() always wakes up the thread that called wait() last. Then the following thing happens:

* M starts W1 and W2
* M calls wait() -> wait list: M
* W1 does its initialization and then calls notify(). This wakes up M -> wait list is empty.
* W1 calls wait -> wait list: W1
* M runs, but since no work has been done yet it calls wait() immediately -> wait list: W1, M
* W2 does its initialization and then calls notify(). This wakes up M -> wait list: W1
* M runs, but since no work has been done yet it calls wait() immediately -> wait list: W1, M
* Since W2 is the last thread, it does not call wait() but actually starts doing work.
* Whenever W2 finished a batch of work, it calls notify(). This wakes up M -> wait list: W1

As you can see, W1 is never notified. All work is done by W2. Since W1 never terminates, the main thread waits forever and the benchmark is deadlocked.

I think the fix is to change DaCapoTrader.reset() in the following way. Currently, the important parts of the code look like this:

synchronized(consumed) {
consumed[1]++;
if (consumed[1] == threads) {
consumed[1] = 0;
}
consumed.notify();
}
synchronized(consumed) {
while(consumed[1] != 0) {
consumed.wait();
}
}

but it should look like this:

synchronized(consumed) {
consumed[1]++;
if (consumed[1] == threads) {
consumed[1] = 0;
// Wake up all threads after the initialization has been finished.
consumed.notifyAll();
}
}
synchronized(consumed) {
while(consumed[1] != 0) {
consumed.wait();
}
}

As a side node, having to synchronized blocks on the same lock with no code in between is unnecessary, you can combine this to one:

synchronized(consumed) {
consumed[1]++;
if (consumed[1] == threads) {
consumed[1] = 0;
// Wake up all threads after the initialization has been finished.
consumed.notifyAll();
}

while(consumed[1] != 0) {
consumed.wait();
}
}

Discussion

  • Christian Wimmer

    • summary: Concurrency problem in tradebeans and tradesoap leading to d --> Concurrency problem in trade* leading to deadlock
     
  • Andreas Sewe

    Andreas Sewe - 2012-09-11

    I am not sure that the "two synchronized blocks on the same lock with no code in between" are unnecessary/unintentional: A thread leaving the first synchronized block would give the thread(s) just notified a chance to grab the lock before it enters the second synchronized block itself.

    That being said, this piece of code really makes my brain hurt. :-(

     
  • Christian Wimmer

    Regarding "two synchronized blocks on the same lock with no code in between": The VM is allowed to colasece such locks together, and the Java HotSpot server compiler is actualy performing that optimization. This means that it optimized code, there is only one synchronized block. If the original intention was that another thread should grab the lock in between, then this is the wrong way to express it.

     

Get latest updates about Open Source Projects, Conferences and News.

Sign up for the SourceForge newsletter:





No, thanks