Re: exemption-wait question

Vladimir Tzankov writes:
 > On 11/1/10, Don Cohen <don...@is...> wrote:
 > >  > If thread is interrupted while in exemption-wait the interrupt
 > >  > function will be executed with mutex held.
 > > So if you interrupt the thread to go into the debugger, then all other
 > > threads that need to get the mutex (like to return write access) are
 > > stuck while you debug.  And if you then interrupt another to go into
 > > the debugger you can't even get into the debugger cause you're stuck
 > > waiting for the mutex before entering the debugger.
I gather you agree with all of the above.
I think you'll admit that this is not a good thing for cases where you
want most of your threads to keep working (like in a web server) while
you look around in the debugger to see what's going wrong with one thread.

 > > It seems to me that if you interrupt exemption-wait you should not
 > > wait to get the lock.  That's supposed to happen only before you exit
 > > exemption-wait and you should be able to interrupt and go into the
 > > debugger before that.

 > If you are brave enough - unlock the mutex from interrupt function -
 > but be careful to lock it again on exit.
This actually seems reasonable - I interrupt a thread, go into the
debugger, find that it's in exemption-wait and unlock the mutex.
Now other threads can continue to run while I examine the state of
this one.
I'd like to automate the first step so that other threads do not have
to wait while I figure out that I'm in exemption-wait and unlock the
mutex.  Where is the doc that tells me how to do that?
When you interrupt with function NIL to debug, is that the same as
some function that I can call?  I'm guessing something like
(lambda nil (cerror ...))?
Perhaps a function to show some backtrace of a given thread would be
useful.

 > btw: debugging in presence of many threads is quite adventurous. I
 > find it mostly useful for fixing deadlocks - for everything else I
 > prefer logging.
I've noticed that.  
It seems much more sane now that I can control which thread reads 
the input I type.

 > Let's explain why it works this way:
 > Suppose you are stuck in pthread_cond_wait() and you want to interrupt
 > it. According to POSIX standard this function never returns EINTR (if
 > it gets signal). Rather it calls signal handler and continues to wait
 > as if nothing happened. Within signal handler no lisp code can be
 > executed (and actually only a very limited set of C functions is allowed).
 > The only way to implement thread-interrupt while we are waiting in
 > pthread_cond_wait() is to signal the condition/exemption, run the
 > interrupt function and continue to wait. After we signal the exemption
 > - mutex is reacquired automatically by the OS. As final result - lock
 > is held when the interrupt function is executed.
Ok, so you can never look up the stack and find that you're in
exemption-wait.  If you interrupt while in that function you find
yourself ready to execute the thing after it.  And at that point
exemption-wait has returned T.  So it's not so easy to even see that
the thread you're considering debugging is in exemption-wait!
There really ought to be a way to find out what threads are waiting
and for what.
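
To make the "lock is held when you come out of the wait" point concrete, here is
a minimal sketch in Python rather than Lisp.  It assumes threading.Condition is a
fair stand-in for the mutex/exemption pair (it follows the same contract: release
while blocked, reacquire before returning).  The names waiter and demo are made up
for illustration, and the "interrupt function" is only simulated by the code that
runs right after the wait returns.

```python
import threading

lock = threading.Lock()
cond = threading.Condition(lock)   # plays the role of mutex + exemption


def demo():
    ready = threading.Event()
    in_handler = threading.Event()
    leave_handler = threading.Event()

    def waiter():
        with cond:
            ready.set()                    # set while still holding the lock
            cond.wait(timeout=5)           # lock is released only while blocked here
            # Stand-in for the interrupt function: wait() has already
            # reacquired the lock, so everything here runs with it held.
            in_handler.set()
            leave_handler.wait(timeout=5)

    t = threading.Thread(target=waiter)
    t.start()
    ready.wait(timeout=5)
    with cond:                             # succeeds only once waiter is inside wait()
        free_while_waiting = True
        cond.notify()
    in_handler.wait(timeout=5)
    got_it = lock.acquire(blocking=False)  # another thread trying to take the mutex
    if got_it:
        lock.release()
    leave_handler.set()                    # let the simulated handler finish
    t.join(timeout=5)
    return free_while_waiting, not got_it


if __name__ == "__main__":
    print(demo())
```

The second value reports whether another thread was locked out while the
simulated handler ran, which is the situation that stalls the other threads
during debugging.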

 > > > On non-local exit from the function - all unwind-protect forms will
 > > > be executed and mutex in the above example will be unlocked. In
 > > > case of normal exit from interrupt function - exemption-wait will
 > > > continue to wait.
 > > "The function" means the one running the code above?
 > "interrupt function" means the function passed as :function to
 > thread-interrupt to be executed in the context of the thread.
So I interrupt the thread with function NIL, which does something like
a call to cerror after the return from exemption-wait.  I then try to
return from something outside the exemption-wait loop and get stuck
back in it.  The typical fight between debugger and unwind protect.
Whereas, if I did not use the :test argument to exemption-wait, but
wrote my own loop, I could return from that loop and escape the wait.
Sounds like a reason to write my own loop.
I suppose all of this also applies to the case of interrupt with
function T to terminate the thread?
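
Here is roughly what I mean by writing my own loop, again as a hedged Python
sketch and not CLISP code.  The class EscapableWait and its abandon method are
hypothetical names; the escape here is a plain flag, whereas in Lisp it would be
a return or restart chosen from the debugger.  The point is only that a
hand-written loop can have an exit that a built-in test loop does not offer.

```python
import threading


class EscapableWait:
    """Hand-written wait loop with an extra way out (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._ready = False
        self._abandoned = False

    def wait(self):
        """Return True if the condition became ready, False if abandoned."""
        with self._cond:
            while not self._ready:
                if self._abandoned:        # the exit a built-in test loop lacks
                    return False
                self._cond.wait(timeout=0.05)
            return True

    def make_ready(self):
        with self._cond:
            self._ready = True
            self._cond.notify_all()

    def abandon(self):
        with self._cond:
            self._abandoned = True
            self._cond.notify_all()


def demo():
    w = EscapableWait()
    out = []
    t = threading.Thread(target=lambda: out.append(w.wait()))
    t.start()
    w.abandon()                            # escape the wait without satisfying it
    t.join(timeout=5)
    return out


if __name__ == "__main__":
    print(demo())
```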

 > >  > With native OS preemptive threads - we do not have control on the
 > >  > scheduler. We may influence it with mutex and exemptions.
 > > This seems strange.  Wouldn't one normally wish to control which
 > > threads run when more are ready to run than there are processors
 > > available to run them?  I'm not blaming you, of course!
 > > (Unless you have a lot more to do with the design than I imagine.)
 > Not sure I understand. You control threads being executed via mutexes
 > and exemptions - that's all.
I argued that I could implement what I need to control whether a
thread is runnable by doing something in the interrupt function that
blocks, and that this was all I needed to write my own scheduler.
Did you believe that argument?
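
A rough sketch of the blocking primitive I have in mind, in Python for
illustration.  Python has no thread-interrupt, so this assumes the worker calls a
checkpoint cooperatively; in my proposal the same wait would instead happen
inside the interrupt function.  Gate, block, unblock and checkpoint are all
made-up names.

```python
import threading


class Gate:
    """Per-thread gate that a scheduler can close and reopen (illustrative)."""

    def __init__(self):
        self._open = threading.Event()
        self._open.set()                   # threads start out runnable

    def block(self):
        self._open.clear()                 # the thread stops at its next checkpoint

    def unblock(self):
        self._open.set()                   # the thread resumes

    def checkpoint(self, timeout=None):
        # Blocks while the gate is closed; returns False on timeout.
        return self._open.wait(timeout)


def demo():
    gate = Gate()
    done = []

    def work():
        gate.checkpoint()                  # held here until unblocked
        done.append("ran")

    gate.block()
    t = threading.Thread(target=work)
    t.start()
    t.join(timeout=0.1)
    stalled = t.is_alive() and not done    # still parked at the checkpoint
    gate.unblock()
    t.join(timeout=5)
    return stalled, done


if __name__ == "__main__":
    print(demo())
```

A scheduler thread holding one such gate per worker could then decide which
workers are allowed to proceed, which is all the control I was arguing for.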

 > > > > BTW, it seems to me that there should also be some way to
 > > > > control which of several threads waiting to run should be 
 > > > > scheduled next.  Why is that either not possible or a bad idea?
 > > > Use distinct exemptions for this (and share the mutex if needed).
 > > Just off hand I don't see how this solves the problem.
 > > Suppose each thread had a priority (integer) - how would you arrange
 > > to always resume a thread of highest priority among those ready to
 > > run?
 > What do you mean by "thread ready to run"? All threads are running all
 > the time until they exit - just some of them are blocked in calls like
 > mutex-lock, exemption-wait, read(), etc and do not consume cpu cycles.
 > Only OS kernel knows when a thread is in such blocked state and why.
By ready to run I mean not blocked as described above.
I'd like to add another kind of blocking controlled directly by my
program, e.g. function (block thread) and (unblock thread).
So you're telling me that all this is just not supported by the OS
thread implementations.  I take it that if you interrupt in a blocked
read you at least see some reading function on the stack, right?  How
about mutex-lock?  How about any others?

 > Looks like you want to process some prioritized tasks. If so - I would
 > do something like:
 > 1. define a job
 > 2. maintain a collection of jobs sorted by priority. Guard it with a mutex
 > and signal an exemption when a new highest priority item arrives.
 > 3. create (pool of) thread(s) - for jobs execution. wait with
 > exemption on the collection of jobs and execute when available (when
 > collection is not empty).
 > 4. In case all threads are busy executing jobs and new highest
 > priority job arrives and you are absolutely sure you want to execute
 > it immediately - spawn a new thread or implement kind of job
 > cancellation in order to release thread from the pool.
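
For reference, steps 1 to 3 of the quoted proposal can be sketched as follows.
This is a Python illustration under assumptions, not CLISP code: a
threading.Condition stands in for the mutex plus exemption, JobQueue and worker
are hypothetical names, and the sketch signals on every submit instead of only
when a new highest priority item arrives.  Step 4 (spawning an extra thread or
cancelling a job) is left out.

```python
import heapq
import itertools
import threading


class JobQueue:
    """Priority-ordered job collection guarded by one lock (illustrative)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._heap = []
        self._seq = itertools.count()      # tie-breaker keeps equal priorities FIFO
        self._closed = False

    def submit(self, priority, job):
        with self._cond:
            # heapq is a min-heap, so negate to pop the highest priority first
            heapq.heappush(self._heap, (-priority, next(self._seq), job))
            self._cond.notify()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()

    def take(self):
        """Return the highest priority job, or None once closed and empty."""
        with self._cond:
            while not self._heap and not self._closed:
                self._cond.wait()
            if not self._heap:
                return None
            return heapq.heappop(self._heap)[2]


def worker(queue, results):
    while True:
        job = queue.take()
        if job is None:
            return
        results.append(job())


def demo():
    q = JobQueue()
    results = []
    for p in (1, 5, 3):
        q.submit(p, lambda p=p: p)         # each job just returns its priority
    q.close()
    t = threading.Thread(target=worker, args=(q, results))
    t.start()
    t.join(timeout=5)
    return results


if __name__ == "__main__":
    print(demo())
```

Note that a job, once taken, runs to completion here; nothing in this scheme
suspends a running job when a more urgent one shows up.
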
This is not as good as my proposed scheduler.  In particular, when
a running job is no longer as worthy as some other job waiting to run
I'd like to suspend its thread -- so it can be later resumed when it
is again worth running.  I also want to be able to notify the
scheduler (a thread) when some operation blocks or unblocks so that
it can block or unblock others in compensation.  It would also be good
for the scheduler to be able to read the run time of each thread so
that it can tell, e.g., that lisp is now using two of the four CPUs so
we should try to block all but the two most urgent threads.

I view this as an inadequacy of the OS thread implementations.
I'm guessing that some thread implementations support the sort of
thing I'm describing and some don't, so it's probably only an
inadequacy of some of them.