Time flies. With #lisp becoming immune to async unwind, it's now
sbcl-devel's time to think.
The following might become part of the internals manual if the patch
goes in, hence the repetition from earlier posts.
* Issues with asynchronous unwinds (AUs)
Consider the following example:
(setq fd (open ...))
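The line above is normally wrapped in an UNWIND-PROTECT whose cleanup closes the stream. Here is a minimal runnable sketch of that idiom to make the failure points concrete. FIRST-LINE, its PATH argument and the READ-LINE body are hypothetical stand-ins for the elided parts, and the comments mark where each of the problems discussed next can strike:

```lisp
;; Hypothetical example: FIRST-LINE and its READ-LINE body stand in
;; for the elided parts of the original snippet.
(defun first-line (path)
  (let (fd)
    (unwind-protect
         (progn
           ;; An AU that hits after OPEN returns but before SETQ stores
           ;; the stream leaves it open with nobody to close it.
           (setq fd (open path))
           ;; Foreign code called here (say, a database update) could
           ;; be unwound out of in the middle.
           (read-line fd nil nil))
      ;; An AU that hits inside this cleanup can skip the CLOSE.
      (when fd
        (close fd)))))
```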
There are several ways things can go wrong:
** (SETQ FD (OPEN ...)) can be interrupted and unwound just after OPEN
returns but before SETQ is done
** foreign code may not like being aborted: imagine a Berkeley DB
update call in the second ...; it will not like being unwound in the middle
** cleanups may not be run: if the cleanup is interrupted it can be
unwound without closing the fd
** AUs can clobber each other
Suppose we have a thread that can be terminated cleanly:
(defun greet ()
  (unwind-protect
       (loop (write-line "Hello, world") (sleep 1))
    (write-line "Good bye")))
(let ((thread (make-thread #'greet)))
  (sleep (random 6))
  (terminate-thread thread))
So far so good. But what if two other threads try to terminate it at
the same time? The second terminate can hit while the first unwind
is still in progress. This is not a problem since the target of the
two throws is the same. Now, what happens if there is another
AU with a different target?
(defun greet-impatiently ()
  (handler-case (with-timeout ... (greet)) (timeout () ...)))
Let's try terminating it:
(let ((thread (make-thread #'greet-impatiently)))
  (sleep (random 6))
  (terminate-thread thread))
There are several possible outcomes, but the most interesting is when
TERMINATE-THREAD starts unwinding, then the timeout hits and does another
NLX to the HANDLER-CASE, and the thread termination request is
lost. Note that the UNWIND-PROTECT in GREET is not strictly needed for
this scenario to occur.
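The clobbering can be observed without threads: a THROW started while another THROW is unwinding replaces it. A minimal portable sketch follows, in which the catch tags are hypothetical stand-ins for the target of the termination request and for the HANDLER-CASE in GREET-IMPATIENTLY. It redirects the unwind to an outer tag on purpose. In the asynchronous scenario the timeout's target lies inside the termination's target, but synchronously throwing to an exit point that is being passed over is not portable, because its extent ends as soon as the transfer of control begins (CLHS 5.2).

```lisp
(defun lost-termination ()
  "Return :LATE: the first THROW's value (:TERMINATED) is lost because
a second THROW started in a cleanup replaces the unwind in progress."
  ;; Hypothetical tags: TIMEOUT-HANDLER stands in for the HANDLER-CASE,
  ;; TERMINATE for the target of the termination request.
  (catch 'timeout-handler
    (catch 'terminate
      (unwind-protect
           ;; The "termination" starts unwinding towards TERMINATE...
           (throw 'terminate :terminated)
        ;; ...and the "timeout" hits during the unwind and steers it away.
        (throw 'timeout-handler :late)))))
```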
In general it is very hard to write reliable programs when multiple
AUs are in play and can cancel or steer away an ongoing unwind (see
http://www.lisp.org/HyperSpec/Issues/iss152-writeup.html).
By definition an interruption is a function that is invoked
** AU unsafe zone
A thread is said to be in an AU unsafe zone iff execution is within a
WITHOUT-ASYNCHRONOUS-UNWIND form or an UNWIND-PROTECT cleanup, or it is
unwinding.
Note that there can be multiple simultaneous unwinds:
(throw 'aaa 'a)
(throw 'bbb 'b))))
The implementation (on x86 only for the time being) keeps track of
the outermost unwind-protect block that's being unwound to.
If an AU occurs in an unsafe zone then the current AU handlers get to
decide what happens.
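To picture the definition, here is a portable sketch that tracks the first two kinds of unsafe zone with a special variable. Everything in it is hypothetical: *UNSAFE-DEPTH*, WITHOUT-AU-SKETCH (standing in for WITHOUT-ASYNCHRONOUS-UNWIND), UNWIND-PROTECT-SKETCH and UNSAFE-ZONE-P are made up for illustration. The ongoing-unwind case is left out, and this is not how the patch detects unsafe zones (it searches the stack for cleanup frames).

```lisp
(defvar *unsafe-depth* 0
  "Hypothetical: number of dynamically enclosing AU unsafe regions.")

(defmacro without-au-sketch (&body body)
  "Hypothetical stand-in for WITHOUT-ASYNCHRONOUS-UNWIND: run BODY with
the unsafe depth incremented."
  `(let ((*unsafe-depth* (1+ *unsafe-depth*)))
     ,@body))

(defmacro unwind-protect-sketch (protected-form &body cleanup-forms)
  "Hypothetical UNWIND-PROTECT whose cleanup forms run in an unsafe zone."
  `(unwind-protect ,protected-form
     (without-au-sketch ,@cleanup-forms)))

(defun unsafe-zone-p ()
  "True iff execution is inside one of the regions above."
  (plusp *unsafe-depth*))
```

A counter keeps UNSAFE-ZONE-P cheap, at the price of only seeing regions entered through these macros.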
** AU handlers
Interruptions are run in a pretty normal dynamic environment with
all the condition handlers that were set up in the thread. When an
NLX occurs that leaves the interruption, UNSAFE-ASYNCHRONOUS-UNWIND
(a SERIOUS-CONDITION) is signalled if the interrupted thread is in
an unsafe zone.
If this condition were handled by the normal condition handlers, a
simple HANDLER-CASE for SERIOUS-CONDITION could unknowingly unwind
the stack when that is unsafe. Hence, UNSAFE-ASYNCHRONOUS-UNWIND is
signalled with a different set of condition handlers active. These
handlers can be established by ASYNCHRONOUS-UNWIND-HANDLER-BIND or
from within the interruption with PUSH-INTERRUPTION-UNWIND-HANDLERS,
POP-INTERRUPTION-UNWIND-HANDLERS and the setfable
INTERRUPTION-UNWIND-HANDLERS. (Bleh. I can see no way around this.)
These handlers are run, protected by a WITHOUT-INTERRUPTS, when the
interruption is about to be left (from an UNWIND-PROTECT cleanup
around the interruption, in fact). The *only* restarts available are
RETRY-LATER, ABORT (aborts the unwind) and CONTINUE (forces the
unsafe unwind to continue). The AU handler may do an NLX by any of
the usual suspects (THROW, RETURN-FROM, GO), but if it signals a
condition, only the AU handlers are there to help.
Behind the scenes, interruptions are run by INVOKE-INTERRUPTION, which
signals an UNSAFE-ASYNCHRONOUS-UNWIND on an NLX from the interruption
if the interrupted thread is in an unsafe zone.
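The handler protocol can be sketched portably. All names here are hypothetical: UNSAFE-AU-SKETCH mirrors UNSAFE-ASYNCHRONOUS-UNWIND, *AU-HANDLERS* mirrors the separate handler set, and CONSULT-AU-HANDLERS mirrors what runs when the interruption is left. The AU handlers live in their own list and are called directly, so an ordinary HANDLER-CASE for SERIOUS-CONDITION never sees the condition, and only the three restarts are offered. The sketch does not model WITHOUT-INTERRUPTS or actually cancelling or forcing an unwind; it only reports which restart was chosen.

```lisp
(define-condition unsafe-au-sketch (serious-condition) ()
  (:documentation "Hypothetical stand-in for UNSAFE-ASYNCHRONOUS-UNWIND."))

(defvar *au-handlers* '()
  "Hypothetical: functions of one argument (the condition), kept apart
from the ordinary HANDLER-BIND/HANDLER-CASE handlers.")

(defun consult-au-handlers ()
  "Offer only RETRY-LATER, ABORT and CONTINUE, call each AU handler in
turn and return the name of the restart that was invoked.  A handler
declines by returning normally; if all decline, RETRY-LATER is used,
as the default AU handler does."
  (let ((condition (make-condition 'unsafe-au-sketch)))
    (restart-case
        (progn
          (dolist (handler *au-handlers*)
            (funcall handler condition))
          (invoke-restart 'retry-later))
      (retry-later () :retry-later)
      (abort () :abort)
      (continue () :continue))))
```

Calling the handlers directly rather than through SIGNAL is what keeps the ordinary handlers out of the picture; the patch gets the same effect by signalling with a different set of handlers active.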
** The RETRY-LATER restart
The default AU handler invokes this restart.
This is similar to the ABORT restart, but it sets up a timer to run
the whole interruption again in a short time (~0.1s,
randomized). Alternatively, the implementation could poll for
interruptions whenever the unsafe zone may have been exited. This is
not done because detecting unsafe zones is currently expensive: the
stack has to be searched for cleanup frames.
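A single-threaded sketch of RETRY-LATER, under stated assumptions: RETRY-DELAY and RUN-WITH-RETRY are hypothetical names, a delay of 0.1 s scaled by a random factor between 0.5 and 1.5 is only one reading of "~0.1s, randomized", the interruption signals that it had to back off by returning :RETRY-LATER, and a sleeping loop stands in for the timer.

```lisp
(defun retry-delay (&optional (base 0.1) (jitter 0.5))
  "Hypothetical: BASE seconds scaled by a random factor in
[1 - JITTER, 1 + JITTER).  JITTER must be positive."
  (* base (+ (- 1 jitter) (random (* 2 jitter)))))

(defun run-with-retry (interruption &key (max-attempts 20))
  "Hypothetical stand-in for the RETRY-LATER timer: call INTERRUPTION,
and while it returns :RETRY-LATER, sleep a randomized delay and run the
whole interruption again.  Return its final value, or :GAVE-UP."
  (loop repeat max-attempts
        do (let ((result (funcall interruption)))
             (unless (eq result :retry-later)
               (return result))
             (sleep (retry-delay)))
        finally (return :gave-up)))
```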
** Foreign code
Foreign code is to be wrapped in WITHOUT-ASYNCHRONOUS-UNWIND by
default. System calls are different: they can be interrupted
without ill effects.
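The default wrapping could be done at definition time. A sketch with hypothetical names throughout: DEFINE-FOREIGN-WRAPPER is made up, WITHOUT-AU is a local stand-in for WITHOUT-ASYNCHRONOUS-UNWIND (which only exists with the patch), and FAKE-DB-UPDATE is an ordinary Lisp function playing the part of a foreign call:

```lisp
(defvar *au-unsafe* nil
  "Hypothetical flag: true while asynchronous unwinds must be deferred.")

(defmacro without-au (options &body body)
  "Hypothetical stand-in for WITHOUT-ASYNCHRONOUS-UNWIND."
  (declare (ignore options))
  `(let ((*au-unsafe* t))
     ,@body))

(defmacro define-foreign-wrapper (name lambda-list &body body)
  "Hypothetical: define NAME so that BODY, typically a foreign call,
always runs inside an AU unsafe zone."
  `(defun ,name ,lambda-list
     (without-au ()
       ,@body)))

;; Made-up "foreign" call that reports whether it ran protected.
(define-foreign-wrapper fake-db-update (key)
  (list key *au-unsafe*))
```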
Consider the slightly modified example:
(without-asynchronous-unwind ()
  (setq fd (open ...)))
This is bullet-proof with respect to OPEN/CLOSE. The problems outlined
above are gone: the fd cannot be lost by unwinding just after OPEN
returns but before the SETQ is completed, and the cleanup, with the
CLOSE within it, is guaranteed to run without much disturbance.
However, an unwind due to a timeout cannot happen in an AU unsafe zone,
which means OPEN, and maybe CLOSE, should take a timeout argument.
Also note that WITH-TIMEOUT doesn't work in cleanup forms (!), so this
will not return in one second if CLOSE runs into problems:
One can work around that by:
but more care is needed, as it allows an ongoing unwind to be lost.
* An alternative: double cross implementation
It can be argued that an AU becomes a problem only if it would cross
a WITHOUT-ASYNCHRONOUS-UNWIND or a cleanup, or cancel/steer away an
ongoing unwind (i.e. it crosses both the border of an interruption and
an unsafe zone delimiter).
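The double-cross criterion can be stated as a small predicate over a model of the dynamic context. The model and the name DOUBLE-CROSS-P are hypothetical: frames are listed innermost first, :INTERRUPTION marks where an interruption was entered, :UNSAFE marks an unsafe zone delimiter, and anything else is a catch tag. The ongoing-unwind case is left out, and the sketch assumes that only unsafe zones outside the interruption matter.

```lisp
(defun double-cross-p (frames target)
  "Hypothetical model. FRAMES lists the dynamic context innermost first:
:INTERRUPTION is an interruption border, :UNSAFE an unsafe zone
delimiter, anything else a catch tag.  True iff unwinding to TARGET
crosses an interruption border and then an unsafe zone delimiter."
  (let ((left-interruption nil))
    (dolist (frame frames nil)
      (cond ((eql frame target) (return nil))
            ((eq frame :interruption) (setf left-interruption t))
            ((and (eq frame :unsafe) left-interruption) (return t))))))
```

For frames (:INTERRUPTION INNER :UNSAFE) and target INNER, the interrupted thread is in an unsafe zone, so the scheme described earlier would still call the AU handlers, while the double-cross rule lets the unwind through.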
Ultimately, it comes down to user expectations of what can go wrong
and when. It's unclear if such a weakening of the semantic guarantees
is worthwhile. In the above example, if CLOSE has a HANDLER-CASE for
ERROR, then an AU can still make it fail without closing the fd.
** detection of cleanups is slow (walks the stack)
** detection of cleanups is racy
And here is the patch: http://retes.hu/~mega/au.patch
It has just reached a state where one can experiment with C-c-ing:
(without-asynchronous-unwind () (sleep 5))
(unwind-protect t (sleep 5))
and get reasonable behaviour, so it must be ready for comments.