From: Gábor M. <me...@re...> - 2009-03-12 10:44:47
On Wednesday 04 March 2009, Max-Gerd Retzlaff wrote:
> Hello,
>
> I've tried to run a somewhat complex simulation program that
> I've written last year on a current SBCL 1.0.26 Linux/amd64.
> Sadly it just stopped running after about 700.000 computations
> (of a good 24.000.000 computations of the test case), see
> the attached file corruption-warning.txt for the error messages.
>
>
> Luckily, during one run a still open Slime debugger window
> remained after it died:
>
> Backtrace:
>   0: (SB-KERNEL::CONTROL-STACK-EXHAUSTED-ERROR)
>   1: ("foreign function: #x421442")
>   2: ("foreign function: #x421520")
>   3: ((FLET #:BODY-FUN-[GETHASH3]1045))
>   4: (SB-IMPL::GETHASH3 NO-INTERSECTION #<HASH-TABLE :TEST EQ :COUNT
>      876 {1000217811}> NIL)
>   5: ((FLET SB-THREAD::WITH-RECURSIVE-SYSTEM-SPINLOCK-THUNK))
>   6: ((FLET
>      #:WITHOUT-INTERRUPTS-BODY-[CALL-WITH-RECURSIVE-SYSTEM-SPINLOCK]366))
>   7: (SB-THREAD::CALL-WITH-RECURSIVE-SYSTEM-SPINLOCK ..)
>   8: (SB-KERNEL:FIND-CLASSOID-CELL NO-INTERSECTION)[:EXTERNAL]
>   9: (SB-KERNEL:FIND-CLASSOID NO-INTERSECTION NIL)
>  10: (MAKE-CONDITION NO-INTERSECTION)[:EXTERNAL]
>  11: (SIGNAL NO-INTERSECTION)[:EXTERNAL]
>  12: (COMPUTE-QS (17.67897 26.5144) 68.18379451681811)
>  13: (DECOMPOSITION ..)
>
> So SIGNALling my custom condition NO-INTERSECTION has been
> a problem, though at that time it had already been SIGNALled
> some hundred thousand times.. That has given me a hint to
> SIGNAL, conditions, and HANDLER-CASE (as that is used to
> catch those conditions).
>
>
> As I've known that SBCL 1.0.15 is running my application
> without problems I started to binary search on the SBCL
> binary releases available on the homepage (a nice thing!):
>
>   1.0.15 and 1.0.17 caused no problems,
>   1.0.18, 1.0.19, 1.0.20, 1.0.23, and 1.0.26 did.
>
> Looking at the release notes of 1.0.18 at
> http://sbcl.sourceforge.net/all-news.html#1.0.18
> the following comment caught my eye:
>   - optimization: simple uses of HANDLER-CASE and HANDLER-BIND
>     no longer cons.
>
> HANDLER-BIND is defined in src/code/defboot.lisp so I've looked
> through the CVS logs of that file. Right after the tag
> sbcl_1_0_17 there is an interesting commit:
>
> -- zipp --
> Revision 1.59 - (view) (download) (annotate) - [select for diffs]
> Fri May 30 11:32:15 2008 UTC (9 months ago) by demoss
> Branch: MAIN
> Changes since 1.58: +88 -61 lines
> Diff to previous 1.58
>
> 1.0.17.8: use dynamic-extent in HANDLER-CASE and HANDLER-BIND
>
> * Hairier then I would have liked due to need not to leak the stack
>   allocation policy to user code. See my email to sbcl-devel:
>   "Future of sb-c:stack-allocate-dynamic-extent" for related
>   discussion.
>
> * Also eliminate one redundant FLOAT-WAIT by splitting HANDLER-BIND
>   into two parts, and using the more primitive one -- one that
>   doesn't inject FLOAT-WAIT on its -- to implement HANDLER-CASE.
> -- zapp --
>
>
> I've looked at the patch itself, compared the revision 1.58
> of defboot.lisp with the current one of SBCL 1.0.26, and brutally
> replaced the current definitions %HANDLER-BIND, HANDLER-BIND, and
> HANDLER-CASE by the old definitions HANDLER-BIND and HANDLER-CASE.
> (See attached file:
>
> defboot.lisp_SBCL-1.0.25-with-HANDLER-BIND-HANDLER-CASE-of-SBCL-1.0.17.patch )
>
> Afterwards I recompiled SBCL and fired up my application, e voila: to
> my surprise it really worked and produced correct results! Nice!
>
>
> Well ...
>
> I haven't looked closer at the changes to HANDLER-BIND and -CASE.
> I don't know what the real problem is. Obviously, it can't be
> completely wrong as the patch is applied to SBCL since 30th of May,
> 2008.
>
> But it'd be quite hard to get down to the single call that provoked
> that error in my case. And even more to actually get a simple test
> case that triggers it. The simulation has a main loop containing
> that HANDLER-BIND and within it a quite complex and nested
> intersection algorithm is called.
> Somewhere in that is a certain call
> to SIGNAL that makes problems, perhaps only in certain circumstances.
>
>
> Hopefully this report still helps, mainly because of the
> backtrace and as it points to a single "evil" patch of last year.
> The search itself took me two or three hours.. (and I'm happy
> enough that I can now run the program with a current SBCL.)
>
> Bye and thanks for your work,
> Max

corruption-warning.txt has "control stack exhausted" followed by
"memory fault" in the same thread. Did you handle
control-stack-exhausted manually from the debugger?

The patch you have found stack-allocates stuff in
handler-{bind,case}, which means you run out of stack faster if you
have many of these nested.

In the other mail, you say that there are no HANDLER-CASEs. Can you
verify this again, and also that there are no nested HANDLER-BINDs?

Cheers, Gabor
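
To make "nested" concrete, here is a minimal sketch. It is not taken from the simulation: NO-INTERSECTION is redefined as a bare condition, and the names NESTED-HANDLERS and FLAT-HANDLERS are made up for illustration. In the first function every recursion level establishes another HANDLER-BIND while the outer ones are still active, so with the dynamic-extent change each level's handler data should stay on the control stack until the recursion unwinds. In the second function only one handler is active at any time, however many conditions are signalled.

```lisp
;; Hypothetical stand-in for the NO-INTERSECTION condition in the report.
(define-condition no-intersection (condition) ())

;; Nested case: each level of RECURSE wraps the next one in HANDLER-BIND,
;; so DEPTH + 1 handlers are active when SIGNAL is finally called.
;; Every handler declines by returning normally, so SIGNAL tries them all.
(defun nested-handlers (depth)
  (let ((calls 0))
    (labels ((recurse (n)
               (handler-bind ((no-intersection
                                (lambda (c)
                                  (declare (ignore c))
                                  (incf calls))))
                 (if (zerop n)
                     (signal 'no-intersection)
                     (recurse (1- n))))))
      (recurse depth)
      calls)))

;; Flat case: the HANDLER-BIND is entered and left once per iteration,
;; so exactly one handler is active whenever SIGNAL is called.
(defun flat-handlers (iterations)
  (let ((calls 0))
    (dotimes (i iterations calls)
      (handler-bind ((no-intersection
                       (lambda (c)
                         (declare (ignore c))
                         (incf calls))))
        (signal 'no-intersection)))))
```

(NESTED-HANDLERS 3) returns 4, because all four active handlers see the one signal; (FLAT-HANDLERS 1000) returns 1000, one handler call per signal. If the intersection code resembles the first shape, stack use grows with the nesting depth.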