Thiemo Seufer <ths@...> writes:
> Christophe Rhodes wrote:
>> Peter Van Eynde <pvaneynd@...> writes:
>> Wow, the patch looks lovely. I take it that with this patch, the
>> system can build itself reliably, pass (some or most of) its
>> regression tests, and so on?
> I haven't done any testing beyond a package build yet. This seems
> to work reliably, at least on a faster machine than the Debian buildds,
> and the build includes running some tests in contrib. Four of the
> POSIX file access tests there fail; the rest look fine.
OK, I've committed your patch to CVS HEAD. Thank you.
Before crying complete victory, though, it would be good to
stress-test the system a bit more thoroughly, because...
> The C->lisp call forgot to save the global pointer, which means it must
> fail on return as soon as the lisp part clobbers the contents of the
> $gp register.
OK. I don't *think* this is responsible for the previous instability,
because the primary call to lisp never returns. It is true that a
nested call_into_lisp is possible, if a Unix signal handler runs lisp
code, but in the normal course of an sbcl build there should be no
such signal received. So, while this fix is of course valuable, it
might not be the only thing wrong.
> It also didn't reserve the callee's argument register
> space on the stack, but I guess lisp never tries to take the address of a
> C function argument, so this bug never triggered. Another place which
> can cause random segfaults is the broken branch/jump emulation, but I
> think the only use of it is in the debugger (which means sbcl crashes
> for breakpoints on branches/jumps which span an offset larger than
> +0 / +32k, once the breakpoint is removed again).
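For reference, here is a sketch of how standard MIPS32 branch and jump targets are computed, which is what any branch/jump emulation has to reproduce. The function names are hypothetical and are not sbcl's; the encodings are the standard MIPS32 ones. The 16-bit branch field is a signed count of instructions relative to the delay slot, so it must be sign-extended before use, and it reaches about ±32k instructions (±128 KB) in either direction.

```c
#include <stdint.h>

/* Hypothetical helper: target of a MIPS I-type conditional branch
   (beq, bne, blez, ...). The low 16 bits are a signed word offset
   relative to the address of the delay slot, i.e. pc + 4. */
static uint32_t mips_branch_target(uint32_t pc, uint32_t insn)
{
    int32_t offset = (int32_t)(insn & 0xFFFFu);
    if (offset & 0x8000)          /* sign-extend the 16-bit field */
        offset -= 0x10000;
    return pc + 4u + (uint32_t)(offset * 4);  /* words -> bytes */
}

/* Hypothetical helper: target of a MIPS J-type jump (j, jal). The 26-bit
   index, shifted left by 2, replaces the low 28 bits of the delay-slot
   address; the top 4 bits of pc + 4 are kept. */
static uint32_t mips_jump_target(uint32_t pc, uint32_t insn)
{
    return ((pc + 4u) & 0xF0000000u) | ((insn & 0x03FFFFFFu) << 2);
}
```

For example, `beq $0, $0, -1` (encoding 0x1000FFFF) at address 0x1000 branches back to 0x1000, while a J-type jump can only reach targets within the same 256 MB region as its delay slot.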
Again, none of this is likely to be the cause of the intermittent
segfaults, though of course I could be wrong. (I hope I'm wrong!)
My thought, before enough Real Work came along that I abandoned the issue,
was that it was likely due to insufficient cache flushing; I'll be
happy if you tell me that this is in fact not the case, and that the
new sbcl works perfectly everywhere. A good stress-test of the system
is a number of consecutive builds; if you have access to the hardware
to do this overnight on a couple of machines, or somesuch, that would
be good evidence one way or the other.
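On MIPS the instruction cache is not kept coherent with ordinary stores, so code that has just been written as data (for example when code objects are created or moved) must be written back from the data cache and invalidated in the instruction cache before it is executed. A missed flush can show up as intermittent, hard-to-reproduce crashes. A minimal sketch, assuming GCC or Clang, whose `__builtin___clear_cache` builtin performs that synchronization over an address range; `install_code` is a hypothetical helper, not part of sbcl.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not sbcl's): copy machine code into buf and make
   it safe to execute. Assumes GCC or Clang. */
static void install_code(unsigned char *buf, const unsigned char *code,
                         size_t len)
{
    /* The bytes are written through the data cache. */
    memcpy(buf, code, len);
    /* Synchronize the caches over [buf, buf + len) so instruction fetch
       sees the new bytes. On targets with coherent caches, such as x86,
       this is effectively a no-op. */
    __builtin___clear_cache((char *)buf, (char *)buf + len);
}
```

To actually jump into the copied code, `buf` would also have to be memory the process may execute, for example a region obtained from `mmap` with `PROT_EXEC`.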
In any case, thank you very much for the patch.