On Sun, Mar 28, 2010 at 11:31 AM, Nikodemus Siivola wrote:
> On 27 March 2010 21:09, Tobias C. Rittweiler <tcr@...> wrote:
> I can't seem to write a short reply, so sorry about the long one --
> not as well organized as I would have liked.
I cut a little of it, and mid-posted my replies.
> * It is still not obvious to me that WITHOUT-INTERRUPTS should snuffle
> deadlines. I'm not strictly opposed to it, assuming a specific concern
> is covered somehow. (Saying convincingly "It can't work like that,
> ever." counts.)
> * Stuff about critical sections, errors, and unwinds.
> * Stuff about API design of condition variables and related constructs.
> ...my excuse is that I'm a bit feverish. :)
>> Now it's not totally obvious how right or wrong that really
>> is. Deadlines are not interrupts. OTOH, W/O-INTERRUPT is, I think, used
>> with the assumption that it means "pseudo atomic except for GC" and in
>> particular that the user cannot unwind during its extent.
> I don't quite agree. WITHOUT-INTERRUPTS means without deferrable
> interrupts, which means without most POSIX signals and without
> INTERRUPT-THREAD. You're reading too much into it, I think.
> W/O-I says that events from outside that particular thread of
> execution will not cause actions inside that thread of execution.
> A deadline is a local and controlled event -- just like the user calling ERROR.
> Now, that said, I do think you're on to something here. I just
> disagree about the exact nature of the problem and the solution.
I'm inclined to agree with Tobias here: WITHOUT-INTERRUPTS is often
used to prevent any kind of async unwind, which is why no GC hooks are
run when it's in effect.
Furthermore, what differentiates a deadline from a timer interrupt is
that it can only happen in a few places where it's safe and handled
explicitly. But the async nature is the same. In this sense I fail to
see how a deadline can be a "local event".
This is only to say what I believe the status quo is.
> Most of the time anyone uses WITHOUT-INTERRUPTS, having an ERROR occur
> will make things hard to reason about. It's not something you want to
> happen. ...but I don't see how W/O-I should or could do anything about
> it in general.
> The thing is, while the CL condition system is great, it is a tricky
> tool for "low level systems code", where you want to be able to reason
> about most everything that can happen during the execution of a block
> of code: you don't want condition handlers running random code during
> your critical section, so...
> * You check types before going into the critical section, because the
> error handler for a type-error could cause recursive entry here, or
> otherwise mess up the invariants. Similarly for any other error
> conditions that can occur.
> * You use W/O-INTERRUPTS because you don't want a C-c or a timer to
> execute random code while you're there.
> ...and yeah deadlines are just as bad as type errors or asynch
> interrupts here. Maybe the convenient thing is to extend
> W/O-INTERRUPTS to snuffle them for the duration.
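The "check types before going into the critical section" advice can be sketched in portable Common Lisp. This is only an illustration: *IN-CRITICAL-SECTION*, *TOTAL*, ADD-TO-TOTAL and HANDLER-SAW-CRITICAL-SECTION-P are hypothetical names, and the special variable is a stand-in for a real critical section, not any SBCL mechanism.

```lisp
(defvar *in-critical-section* nil
  "Hypothetical flag: T while the toy critical section is active.")

(defvar *total* 0
  "Toy shared state updated inside the critical section.")

(defun add-to-total (n)
  ;; Validate up front, so a TYPE-ERROR handler can only run
  ;; before the critical section has been entered.
  (check-type n integer)
  (let ((*in-critical-section* t))
    (incf *total* n)))

(defun handler-saw-critical-section-p (value)
  "Return the value of *IN-CRITICAL-SECTION* observed by a TYPE-ERROR
handler while calling ADD-TO-TOTAL, or :NO-ERROR if no handler ran."
  (let ((seen :no-error))
    (handler-case
        (handler-bind ((type-error
                         (lambda (c)
                           (declare (ignore c))
                           ;; Runs at the point of the error, before any unwind.
                           (setf seen *in-critical-section*))))
          (add-to-total value))
      (type-error () nil))
    seen))
```

With a bad argument the handler runs while the flag is still NIL, that is, outside the section. This does nothing for errors that cannot be checked in advance, such as heap exhaustion or a deadline.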
> However, the original use-case of WITH-DEADLINE was something along the lines of
>   (defun handle-client-request (request)
>     (with-deadline (:seconds 1.0)
>       ...))
> that is, have a single deadline around a large body of code, ensuring
> that it will not hang due to a rare deadlock, lost DB connection, or
> It appears to me that W/O-I indiscriminately snuffling deadlines could
> break this assumption, and that it still would not help with unwinding
> issues in general.
By the rationale I gave above, I think that with-deadline is just a
safe with-timeout and should be subject to without-interrupts, while
the uses of without-interrupts must be inspected for breakage caused
by this change.
> Independent of W/O-I and deadlines, currently in CONDITION-WAIT there
> is an ALLOW-WITH-INTERRUPTS around the call to GET-MUTEX for a reason:
> I don't think that allowing CONDITION-WAIT to hang forever while
> waiting for another thread to release the mutex is right in the
> presence of deadlines or timeouts. So sometimes it will *not* get the
> mutex back, in which case returning is clearly wrong -- so we need to
> ERROR/unwind. Even if we said no to deadlines or timeouts we could run
> out of heap or stack, which are also SERIOUS-CONDITIONs. (If this
> seems academic now, consider eg. David's heap growing stuff, which
> makes running out of heap a minor inconvenience.)
There is no bulletproof recovery possible from heap and control stack
exhaustion, that's why the corruption warning and --lose-on-corruption
>> onto GET-MUTEX. _However_, the problem is that unwindability does not
>> compose. E.g. in a simplified pseudish implementation of CONDITION-WAIT,
>> re. SIMPLE-CONDITION-WAIT and
>> is not a safe way to specify a timeout for waiting on the
>> condition-variable as we may unwind without having reacquired the lock.
> In both your example and here:
> (with-mutex (mtx)
>   (handler-case (condition-wait cvar mtx)
>     (serious-condition ())))
> the problem is not the serious condition itself, but the _handler_
> that unwinds into the middle of the critical section. The unwind would be
> a non-issue if it went out of the critical section. ...a non-issue
> aside from the handler running random code inside our critical
> section, re-entrancy issues, etc.
Re: re-entrancy: that's a very important point. The condition system has
power but lacks control.
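To make the "handler unwinds into the middle of the critical section" point concrete, here is a portable toy model. Everything in it is hypothetical: *LOCK-HELD* and WITH-TOY-LOCK stand in for a mutex and WITH-MUTEX, and TOY-CONDITION-WAIT imitates a CONDITION-WAIT that gives up the lock and then fails before reacquiring it.

```lisp
(defvar *lock-held* nil
  "Hypothetical stand-in for a mutex: T while the toy lock is 'held'.")

(defmacro with-toy-lock (&body body)
  "Toy WITH-MUTEX: mark the lock held around BODY, release on any exit."
  `(progn
     (setf *lock-held* t)
     (unwind-protect (progn ,@body)
       (setf *lock-held* nil))))

(defun toy-condition-wait ()
  ;; Imitates a wait that released the lock and then could not get it
  ;; back (for example because a deadline expired).
  (setf *lock-held* nil)
  (error "gave up before the lock could be reacquired"))

(defun handler-inside ()
  "Handler is inside the section: control resumes in the section body."
  (with-toy-lock
    (handler-case (toy-condition-wait)
      (serious-condition () nil))
    ;; Still lexically inside the critical section, but is the lock held?
    *lock-held*))

(defun handler-outside ()
  "Handler is outside the section: the unwind leaves the section first."
  (handler-case
      (with-toy-lock
        (toy-condition-wait)
        *lock-held*)
    (serious-condition () :unwound-out)))
```

HANDLER-INSIDE returns NIL: the code after the handler runs as if it were protected by the lock, but the lock is gone. HANDLER-OUTSIDE never gets that far, because the unwind goes out of the section.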
> And this issue is not limited to asynch interrupts or deadlines -- it
> applies to all calls to ERROR and SIGNAL. Even if we eliminate all
> predictable errors (check types, etc outside the critical section), we
> can still run out of heap or stack, etc.
> I haven't quite thought this through, but I suspect that:
> (0) Users writing handlers must take care to unwind to a consistent
> state. This is generally only an issue for handlers inside critical sections.
> (1) There are two kinds of critical sections of interest here:
> re-entrant and non re-entrant.
> (2) It is safe to call ERROR or SIGNAL inside a re-entrant section,
> assuming the state exposed is a consistent one.
> (3) It is never safe to call ERROR or SIGNAL inside a non re-entrant
> section, OR the handlers for that run must be aware of the critical
> section -- and the possible consistency violations in effect during
> the error situation.
> (4) UNWIND-PROTECT cleanup clauses inside critical sections must be
> written aware of possible consistency violations due to unwinding from
> Numbers 2 and 4 are not too problematic, IMO -- aside from needing to
> write re-entrant code in the first place.
> Number 3 is unnecessarily hard to write, and in this particular case
> (CONDITION-WAIT, that is) it is compounded by user and system code
> interacting with the same objects inside the same critical section.
> Basically, I suspect the condition variable API should look more like
> (with-condition-variable (queue mutex)
>   (condition-wait queue mutex)
>   ...)
> and errors from inside CONDITION-WAIT should force an unwind from the
> critical section before signaling the error.
> While condition variables are a poster child for this problem, it is
> more general. Perhaps something we should package up into
> HANDLER-UNWIND, and for efficiency's sake make SIGNAL and friends
> directly aware of it:
> (defvar *handler-unwind* nil)
>
> (defmacro handler-unwind (&body body)
>   "Executes BODY as an implicit PROGN.
> Any call to ERROR, CERROR, or SIGNAL during dynamic scope of BODY will unwind
> from out of BODY before the condition is signalled. On normal exit returns the
> values of BODY.
> Intended for unwinding from non-reentrant critical sections before outer
> handlers run."
>   `(flet ((call-with-unwind ()
>             (progn ,@body)))
>      (if *handler-unwind*
>          ;; Already inside a HANDLER-UNWIND: the outermost one unwinds.
>          (call-with-unwind)
>          (let ((cont (cons nil nil)))
>            (declare (dynamic-extent cont))
>            (block handler-unwind
>              (tagbody
>                 (dx-flet ((unwind (fun args)
>                             ;; Stash the pending call, then leave BODY.
>                             (setf (car cont) fun
>                                   (cdr cont) args)
>                             (go :unwind)))
>                   (let ((*handler-unwind* #'unwind))
>                     (return-from handler-unwind (call-with-unwind))))
>               :unwind
>                 ;; BODY has been exited; only now signal the condition.
>                 (return-from handler-unwind
>                   (apply (car cont) (cdr cont)))))))))
>
> ;;; %SIGNAL is what SIGNAL now does
> (defun signal (datum &rest arguments)
>   (if *handler-unwind*
>       (funcall *handler-unwind* #'%signal (cons datum arguments))
>       (apply #'%signal datum arguments)))
>
> ;;; Similarly for ERROR and CERROR.
I see how this would mitigate reentrancy issues, but would render the
handlers established within the dynamic extent of a handler-unwind
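For experimenting with the idea without redefining CL:SIGNAL, a portable toy version could look like the sketch below. It is not the proposed SBCL implementation: *UNWINDER*, WITH-UNWINDING-SECTION, SECTION-ERROR and DEMO are hypothetical names, and only errors raised through SECTION-ERROR get the unwind-first treatment.

```lisp
(defvar *unwinder* nil
  "Hypothetical: when non-NIL, a function that exits the enclosing section.")

(defun section-error (datum &rest args)
  "Like ERROR, but first unwind out of any enclosing WITH-UNWINDING-SECTION."
  (if *unwinder*
      ;; Re-enter SECTION-ERROR after the unwind, so nested sections are
      ;; exited one by one before ERROR is finally called.
      (funcall *unwinder* (lambda () (apply #'section-error datum args)))
      (apply #'error datum args)))

(defmacro with-unwinding-section (&body body)
  "Run BODY; errors from SECTION-ERROR are signalled only after BODY is exited."
  (let ((outer (gensym "OUTER"))
        (inner (gensym "INNER"))
        (pending (gensym "PENDING")))
    `(block ,outer
       (let ((,pending nil))
         (block ,inner
           (let ((*unwinder* (lambda (thunk)
                               (setf ,pending thunk)
                               (return-from ,inner))))
             ;; Normal exit: return the values of BODY.
             (return-from ,outer (progn ,@body))))
         ;; Reached only when the unwinder exited BODY.
         (funcall ,pending)))))

(defun demo (signaller)
  "Record the order in which an outer handler and a cleanup form run."
  (let ((events '()))
    (handler-case
        (handler-bind ((error (lambda (c)
                                (declare (ignore c))
                                (push :outer-handler events))))
          (with-unwinding-section
            (unwind-protect (funcall signaller "boom")
              (push :cleanup events))))
      (error () nil))
    (nreverse events)))
```

(demo #'error) runs the outer HANDLER-BIND handler while the section is still active, before the cleanup; (demo #'section-error) runs the cleanup first and the handler afterwards. In this toy, any handler established inside the section is skipped entirely for SECTION-ERROR, since the section is exited with RETURN-FROM before anything is signalled.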
> Anyways, on to condition variables in specific.
> This strikes me as an excellent example of how and why condition
> variables are a terrible design for Lisp (haven't done enough POSIX
> threading in C to comment on things there): we need to document a
> crapload of caveats for the user to be able to safely use them.
> I strongly suspect most of the problems come from there being two
> moving parts -- the mutex and waitqueue in the interface. In most sane
> synchronization constructs the API has only a single moving part --
> the complexity is inside. With cvars the complexity is all over the place.
> So, to get back on topic. I think allowing W/O-I to snuffle deadlines
> is worth considering, but I am concerned about getting into situations
> where WITH-DEADLINE no longer does what it was supposed to do.
> WITHOUT-DEADLINE seems like a more reasonable tool, in the sense that
> it will not change the semantics of existing code.
I'd vote for with-interrupts affecting deadlines and for existing code to be audited.
> -- Nikodemus