[Nomen-dev] More thoughts on threading
From: Brian H. <bh...@sp...> - 2002-03-29 17:10:06
First, I can detect deadlock in worst-case O(n), with n being the number
of threads in the system. It goes like this- each thread is either
waiting for another thread to release a lock or not blocked and happily
continuing on. Definitional note: if thread A holds a lock, and threads
B, C, and D are waiting for the lock, and after A releases the lock B gets
it, and when B releases the lock C gets it, and after C releases the lock
D gets it, then D is waiting on C, who is waiting on B, who is waiting on
A. A holds the lock, and may be waiting on some other lock and some other
thread.
Each thread has a thread structure representing that thread, containing
(among other things) a pointer to the thread structure the thread is
waiting for. This pointer is NULL if the thread is not waiting for any
lock. So, when I block a thread to wait for another thread, I start
following the pointers, until I either come to an unblocked thread (at
which point we know we haven't hit deadlock) or I return to the current
thread's thread structure (at which point we know we have hit deadlock).
Since this is done each time a thread blocks, we know we have at most one
loop, and if it exists, the current thread has to be part of it. So those
are the only two possible exit conditions, we don't have to worry about
loops which do not contain the current thread- although if we want to be
paranoid we could do a two-fingered approach and detect such loops still
in O(n) (in the two-fingered approach we keep two pointers, one of which
moves forward two steps each time the other moves forward one step; if
the two ever point to the same node, we know we have a loop).
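In Java-like code, the pointer chase (and the paranoid two-fingered
variant) might look like this; the class and field names are my sketch,
not the actual runtime layout:

```java
// Sketch of the per-thread structure and the deadlock checks above.
final class ThreadNode {
    ThreadNode waitingFor; // null if the thread is not blocked

    // Called when `current` is about to block waiting on `holder`.
    // Returns true if blocking would close a cycle, i.e. deadlock.
    static boolean wouldDeadlock(ThreadNode current, ThreadNode holder) {
        for (ThreadNode t = holder; t != null; t = t.waitingFor) {
            if (t == current) {
                return true; // the chain leads back to us: deadlock
            }
        }
        return false; // reached an unblocked thread: safe to block
    }

    // The paranoid "two-fingered" variant: detects a loop anywhere in
    // the chain, even one not containing the current thread.
    static boolean chainHasLoop(ThreadNode start) {
        ThreadNode slow = start, fast = start;
        while (fast != null && fast.waitingFor != null) {
            slow = slow.waitingFor;                // one step
            fast = fast.waitingFor.waitingFor;     // two steps
            if (slow == fast) {
                return true; // the two fingers met inside a loop
            }
        }
        return false; // the fast finger fell off the end: no loop
    }
}
```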
Most of the time, this algorithm will behave as if it's O(1). Generally
you won't have to go more than a few threads before you discover that it's
not a deadlock condition. And most deadlock conditions will only involve
a small handful of threads. Also, optimizations are possible. For
example, if we also stored what lock the thread was blocked on in the
thread structure, we could jump directly to the head of the linked list of
threads waiting on the lock and to the next lock.
This means that deadlocks are not silent death- we can detect them and
deal with them as an error condition. Now the debate becomes what we do
about it- having some thread(s) throw an exception is the obvious answer, but
which threads? The one where we detected the deadlock? All the threads
involved in the deadlock? And do we force the exception to be thrown far
enough that throwing thread(s) release the lock(s) that caused the
deadlock? If we don't, then you might see code like:
synchronized (A) { // Grab a lock on A
    while (true) {
        try {
            synchronized (B) {
                // ...
            }
            break; /* exit the while loop */
        }
        catch (DeadlockDetectedException e) {
            /* If we deadlocked trying to get B, try it again */
        }
    }
}
If trying to get the lock on B deadlocks because whoever holds B is
waiting for a lock on A, you get stuck in an infinite loop. Of course,
the above code is deeply, fundamentally, stupid. I'm not sure I'd have a
problem simply going "Don't do that, then!" The exception should carry a
complete debug trace of the sequence of threads and locks that created
the deadlock, so the developer can add explicit synchronization statements
to remove the deadlock (in the above, synchronize on B before A). This
isn't a perfect solution, but it's orders of magnitude better than what
has gone before.
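For instance, the fix that debug trace would point at is taking the locks
in the same order on every path. Here is a hypothetical, self-contained
sketch (class and lock names are mine, for illustration only):

```java
public class LockOrder {
    static final Object A = new Object();
    static final Object B = new Object();
    static int counter = 0;

    // Every path now acquires B before A, so the wait-for graph can
    // never form a cycle between these two locks.
    static void worker() {
        synchronized (B) {      // B first, as the debug trace suggested
            synchronized (A) {  // then A: same order on every path
                counter++;      // work with both objects
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Thread t1 = new Thread(LockOrder::worker);
        Thread t2 = new Thread(LockOrder::worker);
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        System.out.println("counter = " + counter); // prints "counter = 2"
    }
}
```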
Second, allowing the compiler to determine the lifetime of the locks
actually helps optimization. Consider the Java code:
int[] a;
for (int i = 0; i < a.length; ++i) {
    a[i] = 0;
}
With no optimization and bounds checking, the above code looks like:
/* We assume accessing a.length is atomic */
for (int i = 0; i < a.length; ++i) {
    synchronized (a) {
        if (i >= a.length) {
            throw new IndexOutOfBoundsException();
        } else {
            a[i] = 0;
        }
    }
}
The extra check of i versus the length of a is redundant. But one of the
problems Java has is that another thread might come along and modify a,
changing its length, between the test that i is less than the length of
a and the access of a[i]. What the compiler would like to do is:
synchronized (a) {
    for (int i = 0; i < a.length; ++i) {
        if (i < a.length) {
            a[i] = 0;
        } else {
            throw new IndexOutOfBoundsException();
        }
    }
}
Now the redundant bounds check can be removed in a safe manner. Since the
compiler (runtime, actually) is adding the locks, it can add them far
enough out to allow the optimizer to run.
Thinking about it, I wonder if we could "fix" a deadlock. Consider some
code which does some stuff with object A, then stops mucking with A and
mucks with object B for a while, then stops mucking with B and goes back
and mucks some more with A before returning. It looks like:
// muck with a
// muck with b
// muck with a again
Now, there are two ways the compiler might lock this. Method one is to
hold the lock for a for the entire time:
synchronized (a) {
    // muck with a
    synchronized (b) {
        // muck with b
    }
    // muck with a again
}
Method two is to release the A lock when we don't need it, before we lock
B:
synchronized (a) {
    // muck with a
}
synchronized (b) {
    // muck with b
}
synchronized (a) {
    // muck with a again
}
Unless it leads to deadlock, method one is preferred- it only does two
locks, not three. But it would be possible to change one to the other.
Detecting this would be easy- you could simply have the holding thread
mark the lock when it enters a region where the lock can be broken, and
erase the mark when it exits. The code could look like:
lock(a->lock);
// muck with a
a->lock->breakable = 1;
lock(b->lock);
// muck with b
unlock(b->lock);
a->lock->breakable = 0;
// muck with a again
unlock(a->lock);
So when we detect a deadlock, we go through and find if we can break a
lock that is involved in the deadlock. If we can, we break the lock
(resolving the deadlock), and modify the above code to method two.
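A sketch of the lock record this implies, in Java for consistency with the
earlier examples; the type and field names are mine, not the runtime's:

```java
// Hypothetical lock record with the "breakable" mark described above.
final class BreakableLock {
    Thread owner;      // thread currently holding the lock, or null
    boolean breakable; // true while the owner is in a break-safe region

    // Called by the deadlock detector for each lock in the detected
    // cycle. Returns true if releasing this lock resolved the deadlock.
    // The former owner must then re-acquire the lock before touching
    // the object again, turning method one into method two.
    boolean tryBreak() {
        if (breakable && owner != null) {
            owner = null; // release on the blocked owner's behalf
            return true;
        }
        return false;
    }
}
```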
This is made easier because we only break locks when the thread in
question is already blocked. We simply don't optimize past a lock, any
lock. So if we saw:
for (int i = 0; i < a.length; ++i) {
    synchronized (b) {
        a[i] = b;
    }
}
we could not reduce the bounds check away (by moving it outside the
synchronization).
I actually like this- literally, we are optimizing the lifetime of locks
based upon maximal performance and no deadlocks. We can determine
experimentally where the best bounds are. The biggest problem here is how
this interacts with optimization.
Three, we can completely eliminate some locks. This is 100% just an
optimization. I define that object B is dominated by object A if both
objects are global objects, and all paths from the root references (local
and global variables) to object B go through object A. In other words, if
you can access object B, you must already hold the lock for object A. In
this case, we don't have to lock B at all; its lock is always redundant.
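To illustrate with made-up classes: if the only reference to an inner
object lives inside an outer object and never escapes, every access to
the inner object already holds the outer object's lock, so the inner
lock can be elided. A hypothetical sketch:

```java
final class Inner {
    int value; // needs no lock of its own: dominated by Outer
}

final class Outer {
    // The only path to this Inner goes through Outer, so anyone
    // touching it already holds Outer's lock.
    private final Inner inner = new Inner();

    synchronized void set(int v) {
        inner.value = v; // Outer's lock already guards inner
    }

    synchronized int get() {
        return inner.value;
    }
}
```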
Brian