On 14 May 2011 03:50, Bart Botta <00003b@...> wrote:
> On 18.104.22.168 linux x86-64, I'm getting intermittent failures on
> deadlock-detection.1 (~20% of the time when running threads.pure.lisp)
> and deadlock-detection.4 (~50%).
> Running the tests individually fails less often. Running them in a
> loop, it looks like both fail ~0.5% of the time, with both threads
> returning :deadlock
Wow, that was a brainfart from me. Both can indeed deadlock if the
detection runs in parallel -- it's just rare in practice.
> and very rarely (maybe 1 in 50k or so), I get the
> following error:
> The value NIL is not of type SB-THREAD:THREAD.
> [Condition of type TYPE-ERROR]
This one is interesting!
The proximate cause is a missing WHEN OTHER-THREAD in CHECK-DEADLOCK.
...but underneath, from the backtrace, we see a more interesting issue:
* T1 holds L1, T2 holds L2.
* T1 wants L2, T2 wants L1.
* Both run deadlock detection.
* Both detect the deadlock.
* Both format the error message for stderr.
* Both call FIND-CLASSOID-CELL during the printing.
* Both want the lock on *CLASSOID-CELLS* (call it L3).
* Say that T1 succeeds.
* T2 fails and runs deadlock detection for L3. It finds it owned by T1,
checks the state of T1 and finds that T1 is waiting for L2 -- owned
by the current thread -- a deadlock!
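To make the sequence above concrete, here is a rough model of it in
Python. This is not the SBCL code: the Thread and Lock classes, the
waiting_for slot and this check_deadlock are hypothetical stand-ins,
and the model assumes a thread's "waiting for" record is still set
while it prints the deadlock error. The early return on a free lock
plays the role of the missing WHEN OTHER-THREAD guard.

```python
from typing import Optional


class Lock:
    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional["Thread"] = None  # None means the lock is free


class Thread:
    def __init__(self, name: str) -> None:
        self.name = name
        self.waiting_for: Optional[Lock] = None  # lock this thread is blocked on


def check_deadlock(me: Thread, wanted: Lock) -> bool:
    """Follow lock -> owner -> lock-the-owner-waits-for; True if we reach `me`."""
    seen = set()
    lock: Optional[Lock] = wanted
    while lock is not None and id(lock) not in seen:
        seen.add(id(lock))
        other = lock.owner
        if other is None:
            # Lock has no owner (e.g. just released): the chain ends here.
            # Stand-in for the WHEN OTHER-THREAD guard.
            return False
        if other is me:
            return True
        lock = other.waiting_for
    return False


t1, t2 = Thread("T1"), Thread("T2")
l1, l2, l3 = Lock("L1"), Lock("L2"), Lock("L3")

# T1 holds L1, T2 holds L2; T1 wants L2, T2 wants L1.
l1.owner, l2.owner = t1, t2
t1.waiting_for, t2.waiting_for = l2, l1

# Detection runs in parallel: both threads see the cycle.
both_detect = (check_deadlock(t1, l2), check_deadlock(t2, l1))

# While reporting, T1 grabs L3. Assumption: its waiting_for is still L2.
l3.owner = t1

# T2 fails to get L3 and runs detection again: L3 -> T1 -> L2 -> T2.
nested_report = check_deadlock(t2, l3)

print(both_detect, nested_report)
```

Under those assumptions both first-round checks and the nested check
on L3 report a deadlock, which matches the backtrace: the second
report comes purely from T1's leftover wait record.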
...I need to think about this a bit more, but am committing the fixes
to the other issues forthwith.
Many thanks for the report!