Raymond Wiker writes:
> Christophe Rhodes writes:
> > Raymond Wiker <Raymond.Wiker@...> writes:
> > > The actual problem appears to be that run-program in some
> > > cases is unable to detect that an external program has terminated. In
> > > the case of make-target-contrib, the sb-grovel thingy compiles a C
> > > program into an executable named "a.out" (funnily enough :-) and runs
> > > this. sbcl starts to spin, using ~90% of CPU (the rest taken by "top",
> > > in my case), and if I use "ps", I cannot see that "a.out" is running
> > > at all.
> > >
> > > Does this sound familiar to anyone?
> > Vaguely familiar, yes, though I've never seen it break during
> > make-target-contrib.
> This seems quite repeatable on my PowerBook (12" AlBook),
> MacOSX 10.2.8, although it seems to be a Heisenbug - the operation
> sometimes succeeds.
> > This looks like the same thing as bug #190 from the BUGS file, which
> > I've seen on linux/ppc. I've never been able to reproduce it on
> > demand or get very far into debugging it, though.
> It doesn't seem to be related to buffering between the
> subprocess and SBCL - the call from asdf sets up a subprocess to send
> its stdout to a file. Could be that it fills up an stderr descriptor,
> though :-)
> If it's not a buffering issue, then it may be that the wait3
> operation fails. As I said earlier, the process it should be waiting
> for is nowhere to be seen. It may also be spinning somewhere else; I
> can't see any reason that sbcl should use a lot of cpu in wait3 :-)
> > Sorry not to be any more helpful :-/
> Helpful enough - at least I know that others have seen a
> problem in this area (actually, I saw something about this on the
> mailing list earlier, and I *should* have checked BUGS before
I've been looking a bit closer at this problem, and it seems
to be related to timing. Consider this little test program:
(defun test-run-program (command &optional (sleep-interval 0) (iterations 50))
  (dotimes (n iterations)
    (sb-ext:run-program "/bin/sh" (list "-c" command)
                        :input nil :output *standard-output*)
    ;; Pause between iterations; SLEEP-INTERVAL is in seconds.
    (sleep sleep-interval)
    (format *standard-output* "~&Done.~%")))
If I run this with command = "uname -a", sbcl locks up (actually,
starts spinning) within a few iterations. I suspect that it is
spinning within the following, snipped from src/code/run-program.lisp,
and called from run-program:
(defun process-wait (proc &optional check-for-stopped)
  "Wait for PROC to quit running for some reason. Returns PROC."
  ;; [...]
  (case (process-status proc)
    ;; [...]
    (when (zerop (car (process-cookie proc)))
      ;; [...]
The process status is updated by a sigchld handler, which
is set up by run-program.
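To make the mechanism concrete, here is a minimal C sketch of the general POSIX pattern: a SIGCHLD handler records that a child's status changed, and a waiter loops until it sees that record. This is not SBCL's code; the names on_sigchld, child_status_changed and run_child_and_wait are hypothetical and exist only for illustration. Note that the waiter depends entirely on the handler running: if the signal were never delivered, the loop would never finish.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set by the handler and polled by the waiter (hypothetical name). */
static volatile sig_atomic_t child_status_changed = 0;

static void on_sigchld(int sig)
{
    (void)sig;
    child_status_changed = 1;   /* only async-signal-safe work here */
}

/* Hypothetical helper: fork a child that exits with EXIT_CODE, wait
   until the SIGCHLD handler reports a status change, and return the
   child's exit status, or -1 on error. */
int run_child_and_wait(int exit_code)
{
    struct sigaction sa, old_sa;
    sigset_t block, old_mask, wait_mask;
    pid_t pid;
    int status;

    sa.sa_handler = on_sigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGCHLD, &sa, &old_sa) == -1)
        return -1;

    /* Block SIGCHLD so it cannot slip in between the flag test and
       the call to sigsuspend(). */
    sigemptyset(&block);
    sigaddset(&block, SIGCHLD);
    sigprocmask(SIG_BLOCK, &block, &old_mask);
    wait_mask = old_mask;
    sigdelset(&wait_mask, SIGCHLD);

    child_status_changed = 0;
    pid = fork();
    if (pid == 0)
        _exit(exit_code);

    status = -1;
    if (pid > 0) {
        /* The waiter: sleeps until the handler has set the flag. */
        while (!child_status_changed)
            sigsuspend(&wait_mask);
        if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
            status = WEXITSTATUS(status);
        else
            status = -1;
    }

    /* Restore the caller's signal mask and SIGCHLD disposition. */
    sigprocmask(SIG_SETMASK, &old_mask, NULL);
    sigaction(SIGCHLD, &old_sa, NULL);
    return status;
}
```

Calling run_child_and_wait(7) from a small main() should return 7 once the child has exited and the handler has run.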
I ran it with sleep-interval .25 and iterations 1000, and it
locked up after 509 iterations.
I'm now running it with sleep-interval .3, and it looks as if
it may actually run to completion --- yes, it does, and it even succeeds
if I try again :-)
Looking at src/runtime/interrupt.c, it seems that only 1
signal can be preserved while in a sb-sys:without-interrupts region -
if more than 1 signal is delivered, then the previous signal data will
be overwritten by the latest. This is one possible explanation; I have
instrumented interrupt.c to detect this situation and print out the
old and new values for signal and handler, but I haven't built sbcl
with this yet.
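The suspected failure mode can be modelled in a few lines of C. This is a toy model under my reading of the paragraph above, not the actual data structures in interrupt.c; struct pending_slot, defer_signal and take_pending are hypothetical names. The overwritten counter plays the role of the instrumentation described above: it notes each time a deferred signal is replaced before it was delivered.

```c
#include <signal.h>

/* Hypothetical one-slot record of a signal deferred while interrupts
   are disabled. */
struct pending_slot {
    int has_pending;   /* nonzero if a signal is waiting for delivery */
    int signal;        /* the most recently deferred signal number */
    int overwritten;   /* how many deferred signals were replaced */
};

void slot_init(struct pending_slot *s)
{
    s->has_pending = 0;
    s->signal = 0;
    s->overwritten = 0;
}

/* A signal arrives inside the protected region: remember it for
   later.  With only one slot, an earlier undelivered signal is
   silently replaced. */
void defer_signal(struct pending_slot *s, int sig)
{
    if (s->has_pending)
        s->overwritten++;      /* the previous record is lost */
    s->signal = sig;
    s->has_pending = 1;
}

/* Leaving the protected region: return the single signal to deliver,
   or 0 if nothing is pending. */
int take_pending(struct pending_slot *s)
{
    int sig;

    if (!s->has_pending)
        return 0;
    sig = s->signal;
    s->has_pending = 0;
    return sig;
}
```

If SIGCHLD and then SIGALRM are deferred before the region is left, take_pending returns only SIGALRM, and the SIGCHLD is never handled.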
I had a hunch that adding a call to
get-processes-status-changes before or after the call to
serve-all-events may make this code slightly more robust - a missing
sigchld should not stop us from retrieving the child status via
get-processes-status-changes. Unfortunately, this does not appear to
be the case :-(
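For reference, the general POSIX idea behind that hunch, asking the kernel directly which children have exited instead of relying on one SIGCHLD per child, can be sketched in C as follows. This is not what get-processes-status-changes does internally, and given the result above it is clearly not a fix for this bug; reap_exited_children and spawn_and_poll are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Reap every child that has already exited, without blocking.
   Returns the number of children reaped by this call. */
int reap_exited_children(void)
{
    int reaped = 0;
    int status;
    pid_t pid;

    for (;;) {
        pid = waitpid(-1, &status, WNOHANG);
        if (pid > 0) {
            reaped++;          /* one more exited child collected */
            continue;
        }
        if (pid == -1 && errno == EINTR)
            continue;          /* interrupted by a signal: retry */
        break;                 /* 0: still running; -1: no children */
    }
    return reaped;
}

/* Hypothetical demo: start N short-lived children, then poll until
   all of them are reaped or about five seconds have passed.
   Returns the number reaped, or -1 if fork() fails. */
int spawn_and_poll(int n)
{
    struct timespec pause = { 0, 1000000 };   /* 1 ms between polls */
    int i, tries, total = 0;

    for (i = 0; i < n; i++) {
        pid_t pid = fork();
        if (pid == -1)
            return -1;
        if (pid == 0)
            _exit(0);          /* child: exit immediately */
    }
    for (tries = 0; total < n && tries < 5000; tries++) {
        total += reap_exited_children();
        if (total < n)
            nanosleep(&pause, NULL);
    }
    return total;
}
```

Because the loop keeps calling waitpid until nothing more is ready, several children that exit close together are all collected, however many signals were actually delivered.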
This is on a 12" PowerBook G4, 867 MHz G4, 640 MB RAM, MacOSX
Raymond Wiker Mail: Raymond.Wiker@...
Senior Software Engineer Web: http://www.fast.no/
Fast Search & Transfer ASA Phone: +47 23 01 11 60
P.O. Box 1677 Vika Fax: +47 35 54 87 99
NO-0120 Oslo, NORWAY Mob: +47 48 01 11 60
Try FAST Search: http://alltheweb.com/