RE: [Visualpython-users] fix/hack for hanging program

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

> I've simply disabled the first 'read_lock' in the
> DisplayObject::remove() method.  I came to this because my application
> always seems to hang at the point where it sets the 'visible' attribute
> of an object to false, and this function is called in response to that
> action.
>
> So, is there something wrong with the locking scheme in cvisual

Yes, there is apparently something wrong.

First of all, some background on the situation: The code in question is
invoked to add or remove an object from the list of visible objects in a
display.  The reason it employs locking is that the visual renderer, in a
separate thread, may be iterating through the same list and displaying the
objects.  It needs to be protected from encountering deleted objects,
objects that are being modified, etc.

To understand what's happening, we need to look in several places

(1) The insert() and remove() functions themselves
(2) DisplayListIterator::next() in displaylist.h - this is the function used
by the renderer to walk through the list
(3) The code in prim.cpp and arrprim.cpp that calls SetVisible(), which
calls insert() and remove()

This last is important, because it turns out that that code takes *another*
write_lock, on "this".  So the sequence for insert goes

lock this
  lock head of list
    lock first item in list
    unlock first item
  unlock head
unlock this

The sequence for remove goes

lock this
  lock prev
  unlock prev
  lock next
  unlock next
unlock this

The sequence for next is a little harder to see:

lock head
  lock first
  unlock head
  lock second
  unlock first
  lock third
  unlock second
  lock fourth

And so on.  We need to watch out for deadlocks between next() in the
rendering thread and either insert() or remove() in the other thread.

No deadlocks can occur between insert() and next().  Intuitively, this is
because both functions are walking down the list "with the grain".

Specifically, each one locks a node A only if
  (1) It holds no other locks
or
  (2) It holds a lock on A.prev and no other nodes

If (1) is the case, this thread has *no* locks and so there can't possibly
be a deadlock while this thread waits for A.  If (2) is the case, and A is
locked by the other thread, the other thread cannot be waiting for A.prev
because it can't have both A.prev.prev and A locked.

However, this logic doesn't work for remove(), which locks this->prev
"against the grain".  In fact, it's pretty easy to see that there CAN be a
deadlock.  To look at just one case:

remove()             next()
--------             ------
lock this
                     lock prev
  lock prev          unlock prev.prev
                     lock this (deadlock)

This is fairly embarrassing, and also rather puzzling - why is it working
for everyone except Fred?

Now, Fred's fix removes the lock on prev.  It's easy to see that the
deadlock is eliminated, since remove() then runs with the grain of the list:

  lock this
    lock next
    unlock next
  unlock this

Alas, this version of remove(), while deadlock proof, is highly suspicious
in another way: it is modifying an object (prev) that isn't locked.  To give
a simple example of how this could cause problems, imagine that next() was
written this way:

void DisplayListIterator::next() {
  DisplayObject *old = ptr;
  ptr = old->next;                 // (1)
  if (old->next)                   // (2)
    old->next->mtx.sync_lock();
  old->mtx.sync_unlock();
}

This is a perfectly legitimate way to write it - ptr is locked going in, so
all the accesses are to a locked object.  However, imagine this code running
alongside Fred's remove().  I won't give the details, but remove() could
change old->next in between (1) and (2) above, resulting in ptr being set to
one object, but a different object being locked!

Now, the agonizing thing is that Fred's version probably *does* work as
things are implemented now.  The prev and next pointers are only used (thank
goodness) in a very small number of places, and from a cursory examination
all of them will probably work given the memory ordering semantics of Intel
SMP.  However, I hesitate to leave this land mine around for the next person
who tries to work with (or port) the code.

I think the best solution is for the locking in remove() to work like this:

lock prev
unlock prev
lock this
  lock next
  unlock next
unlock this

This should be safe because the rendering thread never *modifies* the prev
or this pointers, and multiple Python threads are protected from one another
by the global interpreter lock.  Therefore it isn't necessary to have this
locked until we need to actually modify its prev or next pointers.

For this to work, the write locks have to be *removed* from the code that
calls SetVisible (in prim.cpp and arrprim.cpp, and maybe elsewhere?)
insert() and remove() both need to be modified to lock "this" themselves.  A
plausible implementation of remove() is something like

void DisplayObject::remove() {
  if (prev) {
    DisplayObject::read_lock P(prev->mtx);
    prev->next = next;
  }
  DisplayObject::write_lock T(this->mtx);
  if (next) {
    DisplayObject::read_lock N(next->mtx);
    next->prev = prev;
  }
  prev = next = 0;
  getObject().decrement_reference_count();
}

Unfortunately, I have no more time for this today.

> It does bother me that the insert() and remove() methods of
> DisplayObject both create two locks, not releasing the first one before
> acquiring the second even though the first seems no longer to be needed
> when the second is acquired.  I tried hacking the code to explicitly
> delete the first lock as soon as possible, but that version still
> deadlocks.

Maybe you are unaware of the syntactic trick being used to implement the
locks?  Each lock is removed when it goes out of scope (roughly, when you
hit a } in the code ):

  if (prev) {
    DisplayObject::read_lock P(prev->mtx);
    prev->next = next;
  }                         // <---- unlock P

Dave

RE: [Visualpython-users] fix/hack for hanging program

This project is no longer active. See vpython.org.

RE: [Visualpython-users] fix/hack for hanging program