

#9 Bad thread dispersion under load...

Status: closed
Owner: Steve Waldman
Labels: None
Priority: 5
Updated: 2005-03-24
Created: 2005-03-22
Creator: Peter Fassev
Private: No

Hello,

I have written a very simple test which starts a set of threads (say
200) that concurrently try to get a connection, execute a query, and
then give the connection back. Each thread counts its iterations.
After 2 minutes the threads are softly stopped (through a global flag)
and a summary is printed once all of them have really exited.

Actually I wanted to test the speed of different pools, but I noticed
that the dispersion is bad when too many threads concurrently try to
get a connection. If there are for instance only 15 available
connections and 200 threads (but it is the same with only 100), only
the first 30 threads are able to execute many iterations (between 100
and 150). The majority of the remaining threads only reach 2-3
iterations.

I am not sure how c3p0 distributes the open connections to the waiting
threads, but I suppose it relies on the JVM's random mechanism, using
wait/sleep and notifyAll/interrupt on some lock. I suppose this
because, after I found some other issues (0.8.4.5 sometimes produces a
lock at the end of an application with Hibernate, 0.8.5.1 closes the
prepared statements - thanks for the quick fix in 0.8.5.2!), I wrote a
simple pool myself, which initially used the wait/notifyAll method to
let an arbitrary one of the waiting threads proceed. With the first
version I had the same bad dispersion - exactly the first 30 threads
were able to do some work (I suppose 2 * 15 available connections,
because of the switch after notify). I think the so-called "arbitrary"
notification/synchronization of the JVM is not so good at choosing the
next thread to enter a lock.
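
Just to illustrate what I mean, a checkout/checkin roughly like the
following is what I suppose happens (the class and the names are only
my example, I do not know the real c3p0 code):

import java.util.LinkedList;

// Naive hand-off: all waiters sleep on the same monitor, and after
// notifyAll() the JVM wakes them in an undefined (not fair) order.
public class NaiveCheckoutExample {

    private final LinkedList available = new LinkedList(); // idle connections

    public synchronized Object checkoutConnection() throws InterruptedException {
        while (available.isEmpty()) {
            wait();                      // every waiting thread blocks here
        }
        return available.removeFirst();  // whichever thread happens to wake first wins
    }

    public synchronized void checkinConnection(Object conn) {
        available.addLast(conn);
        notifyAll();                     // wakes all waiters, in no defined order
    }
}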

But then I changed the notification method of the pool to use a FIFO
queue - a simple ArrayList - instead of notifyAll, and always picked
the first thread in the waiting queue myself. If this thread was not
able to get a connection again, it was put back at the front of the
queue, to be woken first the next time. The result was good: now all
threads executed almost equally - between 18-24 iterations.
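
Simplified, the idea of my FIFO variant looks like this (my real pool
notifies only the chosen thread instead of using notifyAll, but the
ordering logic is the same):

import java.util.LinkedList;

// Fair (FIFO) hand-off: every waiter takes a ticket at the end of a queue
// and may only take a connection when it is at the head of that queue.
public class FifoCheckoutExample {

    private final LinkedList available = new LinkedList(); // idle connections
    private final LinkedList waiters   = new LinkedList(); // tickets, oldest first

    public synchronized Object checkoutConnection() throws InterruptedException {
        Object ticket = new Object();
        waiters.addLast(ticket);
        try {
            // proceed only when a connection is free AND we are first in line
            while (available.isEmpty() || waiters.getFirst() != ticket) {
                wait();
            }
            return available.removeFirst();
        } finally {
            waiters.remove(ticket);
            if (!available.isEmpty() && !waiters.isEmpty()) {
                notifyAll();             // let the next waiter in line re-check
            }
        }
    }

    public synchronized void checkinConnection(Object conn) {
        available.addLast(conn);
        notifyAll();                     // only the head of the queue will proceed
    }
}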

Here is the body of my test (if you want to use it, please write your
own initialization and query):

import java.util.List;

public class PoolLoadTest implements Runnable {

    protected static int threadCount = 200;
    protected static int duration = 2 * 60 * 1000;          // run for 2 minutes

    protected int counter;
    protected int index;
    protected static volatile boolean stopped = false;      // volatile so workers see the stop flag
    protected static volatile int threadsRunning = 0;       // modified under the class lock, read by main
    protected static int[] counters = new int[threadCount];

    /** Creates a new instance of PoolLoadTest */
    public PoolLoadTest(int index) {
        this.index = index;
    }

    public void run() {
        synchronized (PoolLoadTest.class) {
            ++threadsRunning;
        }
        try {
            Thread.sleep(200);           // let all threads start before the real work begins
        } catch (Exception e) {
        }
        try {
            while (!stopped) {
                try {
                    PersistenceService.openSession();        // checks a connection out of the pool
                    List list = GenericDataService.getList(User.class);
                    ++counter;
                } finally {
                    PersistenceService.closeSession(true);   // returns the connection to the pool
                }
            }
        } finally {
            counters[index] = counter;   // record the result before signalling completion
            synchronized (PoolLoadTest.class) {
                --threadsRunning;
            }
        }
    }

    public static void main(String args[]) {
        InitializationService.startUp();
        for (int i = 0; i < threadCount; ++i) {
            Thread t = new Thread(new PoolLoadTest(i), "Thread " + i);
            t.setDaemon(true);
            t.start();
        }
        try {
            Thread.sleep(duration);
        } catch (Exception e) {
        }
        stopped = true;
        while (threadsRunning > 0) {     // wait until every worker has really exited
            try {
                Thread.sleep(100);
            } catch (Exception e) {
            }
        }
        int sum = 0;
        for (int i = 0; i < threadCount; ++i) {
            System.out.println("Thread " + i + ": " + counters[i]);
            sum += counters[i];
        }
        System.out.println("All Threads: " + sum);
        InitializationService.shutDown();
    }
}

I hope this helps you to find the problem.

Best regards,
Peter

Anyway, the speed of c3p0 (the thing I initially wanted to test) is
actually very good. I was really surprised to see this, because the
code looks complicated, and many tasks are executed asynchronously by
separate helper threads. Good job!

Discussion

  • Peter Fassev
    2005-03-22

    Sorry, I meant a deadlock for version 0.8.4.5. It happens rarely,
    so I am not able to give you a test application. I will test this
    with the new 0.8.5.2 version when I have enough time.

    Best regards
    Peter

     
  • Steve Waldman
    2005-03-23

    Peter,

    Wow. This is an interesting issue. I'd never really thought
    to test whether there was any bias in the likelihood a
    wait()ing Thread would acquire a Connection. I didn't have a
    chance to test it today and see for myself, but I'm curious,
    and will get back to you. As you guessed, c3p0 does nothing
    in particular to manage this, and relies upon hypothesized
    randomness in the wait()/notifyAll() mechanism. But an
    "undefined order" does not mean a truly random order, so
    it's unsurprising that under heavy contention there could be
    uneven distribution. As you suggest, it shouldn't be too hard
    to enforce a fair ordering, and I'll think about adding some
    mechanism to do this.

    Thanks for the nice words re: speed. I am embarrassed and
    astonished by the complexity of c3p0, which is really a
    library that performs a trivial and simple function, and I
    initially expected more trivial and simple code. Much of the
    library's complexity owes to Sun's definition of transparent
    pooling, which puts a burden on driver developers that would
    be much more easily borne if managed explicitly and
    cooperatively by applications and drivers. But much of the
    complexity derives from c3p0's obsession with minimizing the
    duration during which contended locks are held. I do hope
    this leads to a fast library in real-world use, and I am
    always greedy for and glad to hear a compliment...

    Anyway, more soon.

    smiles,
    Steve

     
  • Anonymous (not logged in)

    Hello Steve,

    just a thought: although I was able to achieve a good
    distribution - between 17-24 iterations per thread - I think it is
    impossible to get a perfect solution where all threads perform
    equally, due to the fact that synchronization of some kind (over
    the pool, for instance, or the waiting queue) is still needed. So
    there will always be a small dependence on how
    synchronization/notification is handled by the JVM.

    Best regards
    Peter

     
  • Anonymous (not logged in)

    Steve,

    Sorry for my post, but obviously I can't live with things which
    are not perfect... (and this is sometimes a problem, not a
    quality). I have played with my pool, and in the end I was able to
    get the same speed as yours (in my very simple load test only!)
    with a perfect distribution of 20-21 iterations - so it is
    actually possible.

    Sorry again for my quick comment, and please don't take it too
    seriously. As you know, this is not the most important feature of
    a pool.

    Regards
    Peter

     
  • Anonymous (not logged in)

    Steve,

    sorry again, but I think the test I provided is too simple and
    quite misleading, because practically nobody writes a loop where a
    connection is taken and given back 100 times without a break. So I
    have put a simple Thread.yield() within the loop, just after the
    connection is given back to the pool. And the results were much
    better - from 6 to 40 iterations per thread, where most threads
    managed about 17-25. This is a pretty acceptable result.
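
    The changed loop from the test above now looks roughly like this:

    // Same loop as in the test, only with Thread.yield() after the
    // connection has been returned to the pool:
    while (!stopped) {
        try {
            PersistenceService.openSession();
            List list = GenericDataService.getList(User.class);
            ++counter;
        } finally {
            PersistenceService.closeSession(true);
        }
        Thread.yield(); // give other threads a chance before checking out again
    }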

    Thus I must admit that the problem does not lie within the
    synchronize/notifyAll mechanism of the JVM, but comes down to the
    simple fact that my first "stupid" loop just continues after it
    has given the connection back and immediately takes it again from
    the pool. So if both operations, check-in and check-out, are fast
    enough, there will most probably be no thread switch, and we get
    the bad distribution I observed, because the thread effectively
    continues to hold the connection. Actually it comes down to the
    thread time slices (managed by the operating system, not by Java)
    and to whether the threads are switched at the right place.

    Even if the load distribution is still an interesting issue to
    think about, please excuse me if I scared you too much with my too
    hasty and obviously not quite correct conclusions.

    With best regards.
    Peter

     
  • Steve Waldman
    2005-03-24

    Peter,

    Thanks again for the very detailed testing and comments, and
    for calling my attention to this issue, which I frankly had
    never thought about.

    I've made my own test program and have gotten results
    similar to some of your tests. This could become
    maddeningly detailed -- I'm tempted to start analyzing means
    and variances and testing the output against hypothesized
    distribution functions, but thus far I've resisted.
    Qualitatively, I am seeing ranges larger than I would like
    (something like 30 to 40 for 300 threads over two minutes, between
    the "luckiest" and "unluckiest" thread), with the range narrowing
    as the contention increases and drives the numbers down towards
    zero. The bulk of the distribution tends to lie in a range of plus
    or minus 9 from the mean with these parameters, which is broader
    than I would expect.

    I guess from a "bug" perspective, I'd categorize the current
    behavior as less than optimal, but not a huge problem. I'm
    tempted to try to implement a FIFO policy, since as you
    suggested, it needn't be expensive. But it may not happen so
    quickly, since it's not clear that there is a problem users
    are likely to notice in actual use. But it's definitely
    "wrong" somehow that c3p0 is relying on an undefined order
    of notification to be even-handed.
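
    For instance (just a sketch of the general idea, assuming a JDK
    1.5 java.util.concurrent environment, and not anything c3p0
    actually does today), a fair Semaphore already hands out permits
    in FIFO order:

    import java.util.LinkedList;
    import java.util.List;
    import java.util.concurrent.Semaphore;

    // Sketch only: a fair Semaphore grants permits in FIFO order, so waiting
    // threads receive connections roughly in the order they asked for them.
    public class FairHandoffSketch {

        private final Semaphore permits;                        // one permit per pooled connection
        private final LinkedList available = new LinkedList();  // idle connections

        public FairHandoffSketch(List connections) {
            available.addAll(connections);                      // pre-created connections
            permits = new Semaphore(connections.size(), true);  // true => fair / FIFO
        }

        public Object checkoutConnection() throws InterruptedException {
            permits.acquire();                                  // FIFO wait for a free connection
            synchronized (available) {
                return available.removeFirst();
            }
        }

        public void checkinConnection(Object conn) {
            synchronized (available) {
                available.addLast(conn);
            }
            permits.release();                                  // wake the longest-waiting acquirer
        }
    }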

    Thanks again for all the testing and thinking about this
    problem. I really do appreciate the help.

    smiles,
    Steve

     
  • Steve Waldman
    2005-03-24

    • status: open --> closed
     
  • Steve Waldman
    2005-03-24

    • assigned_to: nobody --> swaldman