And I think these are even real fixes, not just cheezy workarounds! :)
There were several problems with the debugger code and stack handling
that caused the PPC to have errors on backtrace. This was annoyingly
present in Slime, whose automatic backtraces often threw you into the
TTY debugger with the hated message "-2834122 is not of type (MOD 0
These patches solve (at least) two cases. I refer to them as the
(/ 1 0) case and the (asdjfahsdi 1 0) case.
For the (/ 1 0) case, what was happening was that it would call
TWO-ARG-/, tail-call INTEGER-/-INTEGER, and call the SIGNED-TRUNCATE
SB!DI::FIND-ESCAPED-FRAME gets called when the PC is in an assembly
routine, and it has a branch of execution to handle that case.
Unfortunately, it tries to use $LRA to find out who called the assembly
routine. On PPC, with the "fast call" assembly routines this will not
work! Since INTEGER-/-INTEGER had not set $LRA, it was still pointing
back into whoever called TWO-ARG-/.
On the PPC, we BLA into fast call assembly routines, which sets LR to
the next instruction after the BLA. The routine returns with a simple
BLR. So to fix this behavior, I added a FIND-PC-FROM-ASSEMBLY-FUNCTION
function. This has architecture-specific code in it to find its way out
of an assembly routine properly. Right now it only has PPC code; other
architectures fall back onto the old LRA method. I guess this should
probably be
split out into the arch-vm.lisp files, possibly, but this was easier at
This seems to fix this particular case. I also kept my "workaround"
approach in there, which was to set pc-offset to 0 if it is not in the
code object after adjustment. It's actually tighter now than before,
since /any/ pc-offset outside the code object will be caught, not just
one that's (< 0 pc-offset most-positive-fixnum) like before. I added a
somewhat lengthy report that says to mail sbcl-devel about the cause of
this message, so perhaps finding and fixing other cases will be easier.
As a side benefit, the backtraces won't crash in Slime either.
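To make the two pieces above concrete, here is a toy model in portable
Common Lisp. It is only a sketch: TOY-CODE, TOY-RETURN-ADDRESS and
TOY-PC-OFFSET are made-up names, the register plist stands in for the
real saved context, and none of this is the actual debug-int.lisp code.
It just shows the idea of taking the return address from LR on PPC (LRA
elsewhere) and forcing any offset outside the code object to 0.

```lisp
;;; Toy model only: every name here is hypothetical, not SBCL internals.
(defstruct toy-code
  (start 0 :type integer)   ; address of the first instruction
  (size 0 :type integer))   ; length of the code object in bytes

(defun toy-return-address (arch registers)
  "Pick the register that holds the caller's return address.
REGISTERS is a plist such as (:lr #x1010 :lra #x800)."
  (case arch
    ;; PPC fast call routines are entered with BLA, so LR is what counts.
    (:ppc (getf registers :lr))
    ;; Everything else keeps the old LRA-based lookup.
    (t (getf registers :lra))))

(defun toy-pc-offset (arch registers code)
  "Return the offset into CODE as the first value.  The second value is
T if the computed offset fell outside CODE and was replaced by 0."
  (let ((offset (- (toy-return-address arch registers)
                   (toy-code-start code))))
    (if (< -1 offset (toy-code-size code))
        (values offset nil)
        (values 0 t))))
```

With a code object made by (make-toy-code :start #x1000 :size #x100) and
registers (:lr #x1010 :lra #x800), the :ppc path gives 16 and NIL, while
any other architecture keyword uses the stale LRA, gets a negative
offset, and so returns 0 and T.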
The second case, the (hdfausdhf 1 0) case, was a problem with
undefined_tramp. This was really broken on PPC. Thanks to Raymond Toy
for noticing that the PPC undefined_tramp looked rather incomplete
compared to the Sparc one.
The PPC undefined_tramp was not doing any setup of $CODE. There were
instructions there to do it, but they were never executed. It actually
looked a bit like the Sparc version, but the Sparc has a weird execution
pattern that seems to do the stuff in the proper order. If the PPC
tramp was copied from Sparc, it's possible this was overlooked.
I made it run the fixup code and branch back into the trap. This
actually made the ("undefined function") frame show up for the first
time! Unfortunately, if the undefined_tramp was tail-called, that was
the /only/ frame you saw. (This was the problem I found with Slime
having a low debug level.)
Through quite a bit of digging (curse Apple's gdb; it can't pass traps!)
I discovered that, when tail called, $CFP is right, but $CSP is some
random value. So on the first entry into the undefined_tramp, I set
$CSP to $CFP. undefined_tramp should (AFAICT) always be entered as a
function call, and since normal functions set up their $CSP based on the
extant $CFP, it should be safe for me to as well. As a bonus, if $CSP
is the same as $CFP, the C interrupt code will do some stack stuff so I
don't have to.
With all of that stuff, I can't get PPC to print a backtrace error
report anymore! Also, some examples that I had that did backtrace
successfully but had "bogus stack frame"s now don't. I'm sure there's
more stuff wrong, but I can't find it at the moment.
For other architectures, some code to properly find the PC that called a
fast assembly routine should be added to FIND-PC-FROM-ASSEMBLY-FUNCTION
in debug-int.lisp. Also, if they ever have backtrace issues in their
undefined_tramp, it might be worthwhile to see if it only happens on a
tail-call, in which case it might be a similar issue to the one above.
(Some decent test code for undefined_tramp issues is:
  (defun bar (x y) (zot x y))
  (defun bar2 (x y) (zot2 x y) (+ x y))
Compile with high speed and low debug to ensure that tail-call
optimization is enabled. If BAR generates a broken backtrace but
BAR2 doesn't, it may be an issue like the one described above.)
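For anyone who wants to paste something in directly, here is one way to
spell that out. The OPTIMIZE declarations are my guess at "high speed
and low debug", and UNDEFINED-CALLEE is a hypothetical helper that only
confirms each function really reaches its undefined callee. It unwinds
the stack, so to look at the backtrace itself, call (bar 1 0) and
(bar2 1 0) at the REPL and ask the debugger.

```lisp
;;; ZOT and ZOT2 are deliberately left undefined.
(defun bar (x y)
  (declare (optimize (speed 3) (debug 0)))
  (zot x y))                 ; tail call to an undefined function

(defun bar2 (x y)
  (declare (optimize (speed 3) (debug 0)))
  (zot2 x y)                 ; not a tail call: (+ x y) still has to run
  (+ x y))

;;; Hypothetical helper, not part of the patch.
(defun undefined-callee (fn)
  "Call FN on 1 and 0.  Return the name of the undefined function it
hit, or NIL if the call returned normally."
  (handler-case (progn (funcall fn 1 0) nil)
    (undefined-function (c) (cell-error-name c))))
```

(undefined-callee #'bar) should give ZOT and (undefined-callee #'bar2)
should give ZOT2. Expect the compiler to complain about the undefined
functions; that is the point.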
Also, these issues /are/ present in CMUCL as far as I can tell. I
haven't forwarded this to cmucl-imp, as it's an SBCL-specific patch, but
they should probably be told about it.
(Although I'm pretty sure rtoy is going to make sure that they are:
<bdowning> Good call rtoym!
<rtoym> Amazing! I guessed right!
<rtoym> Anyway, I'm being selfish. I hope that cmucl/ppc will eventually
get these fixes too. :-) )
-bcd, happy that his backtraces finally work
*** Brian Downing <bdowning at lavos dot net>