|
From: David S. <dsc...@vy...> - 2001-03-28 16:48:02
|
> I've simply disabled the first 'read_lock' in the
> DisplayObject::remove() method. I came to this because my application
> always seems to hang at the point where it sets the 'visible' attribute
> of an object to false, and this function is called in response to that
> action.
>
> So, is there something wrong with the locking scheme in cvisual
Yes, there is apparently something wrong.
First of all, some background on the situation: The code in question is
invoked to add or remove an object from the list of visible objects in a
display. The reason it employs locking is that the visual renderer, in a
separate thread, may be iterating through the same list and displaying the
objects. It needs to be protected from encountering deleted objects,
objects that are being modified, etc.
To understand what's happening, we need to look in several places
(1) The insert() and remove() functions themselves
(2) DisplayListIterator::next() in displaylist.h - this is the function used
by the renderer to walk through the list
(3) The code in prim.cpp and arrprim.cpp that calls SetVisible(), which
calls insert() and remove()
This last is important, because it turns out that that code takes *another*
write_lock, on "this". So the sequence for insert goes
lock this
lock head of list
lock first item in list
unlock first item
unlock head
unlock this
The sequence for remove goes
lock this
lock prev
unlock prev
lock next
unlock next
unlock this
The sequence for next is a little harder to see:
lock head
lock first
unlock head
lock second
unlock first
lock third
unlock second
lock fourth
And so on. We need to watch out for deadlocks between next() in the
rendering thread and either insert() or remove() in the other thread.
No deadlocks can occur between insert() and next(). Intuitively, this is
because both functions are walking down the list "with the grain".
Specifically, each one locks a node A only if
(1) It holds no other locks
or
(2) It holds a lock on A.prev and no other nodes
If (1) is the case, this thread has *no* locks and so there can't possibly
be a deadlock while this thread waits for A. If (2) is the case, and A is
locked by the other thread, the other thread cannot be waiting for A.prev
because it can't have both A.prev.prev and A locked.
However, this logic doesn't work for remove(), which locks this->prev
"against the grain". In fact, it's pretty easy to see that there CAN be a
deadlock. To look at just one case:
remove() next()
-------- ------
lock this
lock prev
lock prev unlock prev.prev
lock this (deadlock)
This is fairly embarrassing, and also rather puzzling - why is it working
for everyone except Fred?
Now, Fred's fix removes the lock on prev. It's easy to see that the
deadlock is eliminated, since remove() then runs with the grain of the list:
lock this
lock next
unlock next
unlock this
Alas, this version of remove(), while deadlock proof, is highly suspicious
in another way: it is modifying an object (prev) that isn't locked. To give
a simple example of how this could cause problems, imagine that next() was
written this way:
void DisplayListIterator::next() {
DisplayObject *old = ptr;
ptr = old->next; // (1)
if (old->next) // (2)
old->next->mtx.sync_lock();
old->mtx.sync_unlock();
}
This is a perfectly legitimate way to write it - ptr is locked going in, so
all the accesses are to a locked object. However, imagine this code running
alongside Fred's remove(). I won't give the details, but remove() could
change old->next in between (1) and (2) above, resulting in ptr being set to
one object, but a different object being locked!
Now, the agonizing thing is that Fred's version probably *does* work as
things are implemented now. The prev and next pointers are only used (thank
goodness) in a very small number of places, and from a cursory examination
all of them will probably work given the memory ordering semantics of Intel
SMP. However, I hesitate to leave this land mine around for the next person
who tries to work with (or port) the code.
I think the best solution is for the locking in remove() to work like this:
lock prev
unlock prev
lock this
lock next
unlock next
unlock this
This should be safe because the rendering thread never *modifies* the prev
or this pointers, and multiple Python threads are protected from one another
by the global interpreter lock. Therefore it isn't necessary to have this
locked until we need to actually modify its prev or next pointers.
For this to work, the write locks have to be *removed* from the code that
calls SetVisible (in prim.cpp and arrprim.cpp, and maybe elsewhere?)
insert() and remove() both need to be modified to lock "this" themselves. A
plausible implementation of remove() is something like
void DisplayObject::remove() {
if (prev) {
DisplayObject::read_lock P(prev->mtx);
prev->next = next;
}
DisplayObject::write_lock T(this->mtx);
if (next) {
DisplayObject::read_lock N(next->mtx);
next->prev = prev;
}
prev = next = 0;
getObject().decrement_reference_count();
}
Unfortunately, I have no more time for this today.
> It does bother me that the insert() and remove() methods of
> DisplayObject both create two locks, not releasing the first one before
> acquiring the second even though the first seems no longer to be needed
> when the second is acquired. I tried hacking the code to explicitly
> delete the first lock as soon as possible, but that version still
> deadlocks.
Maybe you are unaware of the syntactic trick being used to implement the
locks? Each lock is removed when it goes out of scope (roughly, when you
hit a } in the code ):
if (prev) {
DisplayObject::read_lock P(prev->mtx);
prev->next = next;
} // <---- unlock P
Dave
|