From: Julian H. <jul...@sb...> - 2003-08-22 05:22:01
Ah, you guys make a fine QA department. :)

Thanks, I am able to reproduce this. Looks like I'm going to give the lock acquisition order a rethink. I don't want to gate everything on CachePool, because it's important that we're able to load 2 aggregations in parallel. (Remember I was loath to put in TOO MUCH synchronization a few months back? This is what I was worried about.)

Julian

> -----Original Message-----
> From: mon...@li...
> [mailto:mon...@li...] On Behalf Of Andreas Voss
> Sent: Wednesday, August 20, 2003 11:04 AM
> To: Julian Hyde
> Cc: mon...@li...; hh...@to...
> Subject: [Mondrian-devel] Deadlock in Mondrian 1.0 Test Case
>
> Hi Julian,
>
> after some fiddling with classpaths, build scripts etc. I was able to
> run the mondrian.test.Main test suite. It's very good that Mondrian has
> all these tests, and it's even better to see that with version 1.0 all
> tests pass successfully (except XML/A). Well, most of the time, because
> sometimes the tests hang with zero CPU usage. I tried to figure out what
> happens, and here is my result.
>
> First I found a bug in the TestCaseForker (fixed in attached
> FoodMartTestCase). The bug caused the main thread to wait() although all
> test threads had finished. The problem was that notify() wakes up any
> thread and not necessarily the wait()ing main thread. I replaced the
> notify/wait construct with Thread.join().
>
> But the tests still hang. In attached deadlock-03.txt you find stack
> traces of the involved threads. Thread #1 and thread #3 are responsible
> for the deadlock. Thread #1 is inside synchronized Aggregation.get() and
> holds a lock on the Aggregation 68. It tries to call pin() on CachePool
> 119 which is locked by thread #3.
>
> Thread #3 is inside synchronized CachePool.unpin() and holds a lock on
> CachePool 199. It tries to call synchronized removeSegment on
> Aggregation 68 which is locked by thread #1.
>
> A solution would be, as you wrote in some comment, to acquire the locks
> in a defined order. It seems that CachePool is a singleton, so we could
> always acquire a lock on the CachePool before working with Aggregations,
> e.g. by
>
> synchronized (CachePool.instance()) {
>     // work with Aggregations here
> }
>
> Anyway, I don't think I can fix this easily, could you please take a
> look at that?
>
> Best Regards,
> Andreas
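
For readers following the thread, here is a minimal sketch of the fixed lock order Andreas suggests: both code paths take the pool lock first and the aggregation lock second, so the circular wait from the stack traces cannot form. Everything in it is an assumption made for illustration. `LockOrderSketch`, `POOL`, `AGGREGATION`, `getAndPin`, `unpinAndRemove` and `runBoth` are hypothetical stand-ins, not Mondrian's actual classes or methods, and the bodies only count operations.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LockOrderSketch {
    // Hypothetical stand-ins for the CachePool singleton and one Aggregation.
    static final Object POOL = new Object();
    static final Object AGGREGATION = new Object();
    static final AtomicInteger operations = new AtomicInteger();

    // Models the path "Aggregation.get() then CachePool.pin()" (thread #1).
    // In the deadlocking version this path took AGGREGATION before POOL.
    static void getAndPin() {
        synchronized (POOL) {             // always the pool lock first
            synchronized (AGGREGATION) {  // then the aggregation lock
                operations.incrementAndGet();
            }
        }
    }

    // Models the path "CachePool.unpin() then removeSegment()" (thread #3).
    // It already took POOL before AGGREGATION, so the order now matches.
    static void unpinAndRemove() {
        synchronized (POOL) {
            synchronized (AGGREGATION) {
                operations.incrementAndGet();
            }
        }
    }

    // Runs both paths concurrently and waits with Thread.join(),
    // the same construct that replaced notify()/wait() in the test forker.
    static int runBoth(int iterations) {
        operations.set(0);
        Thread t1 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) getAndPin();
        });
        Thread t3 = new Thread(() -> {
            for (int i = 0; i < iterations; i++) unpinAndRemove();
        });
        t1.start();
        t3.start();
        try {
            t1.join();
            t3.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
        return operations.get();
    }

    public static void main(String[] args) {
        System.out.println("completed operations: " + runBoth(10000));
    }
}
```

Note that this sketch holds the pool lock for the whole operation, which is exactly the cost Julian raises: it would serialize work on different aggregations, so two aggregations could no longer load in parallel. It shows the ordering rule only, not a proposed fix for Mondrian.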