From: Russell M. <rus...@ya...> - 2004-10-05 12:47:00
I was able to build sbcl-0.8.14 on NetBSD 2.0_BETA from source using
CLISP 2.33. It seems to work (woohoo!).

Also, I ran the tests, and they seemed to get pretty far, but
eventually died like so:

//running exhaust.impure.lisp test
; in: LAMBDA NIL
; (SB-KERNEL:FLOAT-WAIT)
;
; note: deleting unreachable code
; compilation unit finished
; printed 1 note
unhandled SIMPLE-ERROR in thread 3819: segmentation violation at #X0
unhandled condition in --disable-debugger mode, quitting
test exhaust.impure.lisp failed, expected 104 return code, got 1

Not sure if this is a known issue, so I report it here.

I plan to keep using sbcl on NetBSD going forward, building from
source, reporting bugs, fixing what I can (probably not much). Also,
I would be willing to build and make available NetBSD binaries going
forward. Would this be useful?

-russ
From: Christophe R. <cs...@ca...> - 2004-10-05 13:36:58
Russell McManus <rus...@ya...> writes:

> I was able to build sbcl-0.8.14 on NetBSD 2.0_BETA from source using
> CLISP 2.33. It seems to work (woohoo!).

Excellent.

> Also, I ran the tests, and they seemed to get pretty far, but
> eventually died like so:
>
> //running exhaust.impure.lisp test
> unhandled SIMPLE-ERROR in thread 3819: segmentation violation at #X0
> unhandled condition in --disable-debugger mode, quitting
> test exhaust.impure.lisp failed, expected 104 return code, got 1
>
> Not sure if this is a known issue, so I report it here.

I think at least Richard Kreuter knows about it. (I'm not sure that
he knows how to solve it).

It's a little painful to debug, but the idea is that the system
mprotect()s various different pages on the regular stack at
different times, to ensure that stack exhaustion is detected. It is
of course possible that you're finding something which doesn't quite
work in NetBSD's SA_SIGINFO signal handling, or it could be a subtle
bug in sbcl's memory_fault_handler() itself.

> I plan to keep using sbcl on NetBSD going forward, building from
> source, reporting bugs, fixing what I can (probably not much). Also,
> I would be willing to build and make available NetBSD binaries going
> forward. Would this be useful?

Having binaries available would be good, I think.

Cheers,

Christophe
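For readers unfamiliar with the guard-page idea described above, here is a minimal sketch in C. It is not SBCL's runtime code: `guard_page`, `fault_handler` and `run_guard_page_demo` are names made up for this illustration, and the page is an ordinary anonymous mapping rather than part of the real control stack. It shows only the core mechanism: a page is made inaccessible with mprotect(), an SA_SIGINFO handler running on an alternate signal stack checks whether the fault address lies in that page, and the handler unprotects the page so the faulting access can be restarted.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANON and sigaltstack on glibc */
#endif
#include <assert.h>
#include <signal.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* All names below are hypothetical; this is not SBCL's runtime code. */
static char *guard_page;
static size_t page_size;
static volatile sig_atomic_t guard_hits;

/* SA_SIGINFO-style handler: si_addr holds the address that faulted. */
static void fault_handler(int sig, siginfo_t *info, void *context)
{
    uintptr_t addr = (uintptr_t)info->si_addr;
    uintptr_t base = (uintptr_t)guard_page;
    (void)sig;
    (void)context;
    if (addr < base || addr >= base + page_size)
        _exit(1);               /* not the guard page: a genuine crash */
    guard_hits++;
    /* Unprotect the page so the faulting store can be restarted. A real
     * runtime would also re-arm protection elsewhere and raise an error
     * in the language runtime instead of silently continuing. */
    mprotect(guard_page, page_size, PROT_READ | PROT_WRITE);
}

/* Returns how many times the guard page was hit, or -1 on setup failure. */
int run_guard_page_demo(void)
{
    stack_t alt;
    struct sigaction sa;
    size_t alt_size = 64 * 1024;
    int rc;

    page_size = (size_t)sysconf(_SC_PAGESIZE);
    guard_hits = 0;

    /* The handler must not run on the stack that has just overflowed,
     * so give it a separate stack. */
    memset(&alt, 0, sizeof alt);
    alt.ss_sp = malloc(alt_size);
    if (alt.ss_sp == NULL)
        return -1;
    alt.ss_size = alt_size;
    rc = sigaltstack(&alt, NULL);
    assert(rc == 0);

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGSEGV, &sa, NULL);
    assert(rc == 0);
    rc = sigaction(SIGBUS, &sa, NULL);  /* some systems report SIGBUS */
    assert(rc == 0);
    (void)rc;

    /* Map one page, then revoke all access to turn it into a guard. */
    guard_page = mmap(NULL, page_size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANON, -1, 0);
    if (guard_page == MAP_FAILED)
        return -1;
    if (mprotect(guard_page, page_size, PROT_NONE) != 0)
        return -1;

    /* This store faults once; after the handler returns it is retried. */
    *(volatile char *)guard_page = 42;

    return (int)guard_hits;
}
```

Calling `run_guard_page_demo()` from a test program should report a single hit on the guard page. The sketch leaves the handlers and the alternate stack installed and never unmaps the page; a real runtime has to manage all of that, and has to decide what to run after the fault, which is the part under discussion in this thread.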
From: Richard M K. <kr...@pr...> - 2004-10-07 00:13:45
Christophe Rhodes <cs...@ca...> writes:

> Russell McManus <rus...@ya...> writes:
>
>> I was able to build sbcl-0.8.14 on NetBSD 2.0_BETA from source using
>> CLISP 2.33. It seems to work (woohoo!).

Note: the particular bug (whatever it is) is involved with signal
handling code that's changed somewhat from SBCL 0.8.14 to 0.8.15,
described below. The same sort of error occurs in SBCL 0.8.15,
though.

>> Also, I ran the tests, and they seemed to get pretty far, but
>> eventually died like so:
>>
>> //running exhaust.impure.lisp test
>> unhandled SIMPLE-ERROR in thread 3819: segmentation violation at #X0
>> unhandled condition in --disable-debugger mode, quitting
>> test exhaust.impure.lisp failed, expected 104 return code, got 1
>>
>> Not sure if this is a known issue, so I report it here.
>
> I think at least Richard Kreuter knows about it. (I'm not sure that
> he knows how to solve it).

Both of these are true, but I'm still working on it. This is the only
repeatable failure I've found in the SBCL tests on NetBSD, fwiw.

> It's a little painful to debug, but the idea is that the system
> mprotect()s various different pages on the regular stack at
> different times, to ensure that stack exhaustion is detected. It is
> of course possible that you're finding something which doesn't quite
> work in NetBSD's SA_SIGINFO signal handling, or it could be a subtle
> bug in sbcl's memory_fault_handler() itself.

The following things work for me: (1) the signal handler setup
succeeds and the handler runs on the alternate signal stack, (2)
mprotect is being called at the right times and is working, according
to pmap. However, I always get segfaults somewhere after the end of
call_into_lisp (in src/runtime/x86-assem.S) before reaching the Lisp
debugger. The page containing the fault address is the then-writable
control_stack_guard_page.

I think the problem is in the setup for the call into Lisp: the eax
register is zero at the end of call_into_lisp on NetBSD, but contains
a non-zero address at the same point of execution on GNU/Linux (it's
supposed to contain a pointer to the lexenv for the Lisp function to
be called, IIUC). Tracing backwards, the word that's copied into eax
at/around line 216 of x86-assem.S is zero, but I'm not sure who's
supposed to be stashing the desired value into that address.

Hope that helps,

Richard
From: Richard M K. <kr...@pr...> - 2004-10-07 00:25:29
Richard M Kreuter <kr...@pr...> writes:

> Christophe Rhodes <cs...@ca...> writes:
>> Russell McManus <rus...@ya...> writes:
>
> Note: the particular bug (whatever it is) is involved with signal
> handling code that's changed somewhat from SBCL 0.8.14 to 0.8.15,
> described below.

Oops. I wrote a description of the intended stack exhaustion handling
behavior, but then cut it out. In case it's useful to somebody, I
added it to the sbcl-internals wiki:

http://sbcl-internals.cliki.net/stack%20exhaustion

--
Richard
From: Nikodemus S. <tsi...@cc...> - 2004-10-10 14:21:39
On Wed, 6 Oct 2004, Richard M Kreuter wrote:

> I think the problem is in the setup for the call into Lisp: the eax
> register is zero at the end of call_into_lisp on NetBSD, but contains
> a non-zero address at the same point of execution on GNU/Linux (it's
> supposed to contain a pointer to the lexenv for the Lisp function to
> be called, IIUC). Tracing backwards, the word that's copied into eax
> at/around line 216 of x86-assem.S is zero, but I'm not sure who's
> supposed to be stashing the desired value into that address.

Looking at the commit message for 0.8.15.7

  "arrange_return_to_lisp_function wasn't restoring esp properly. Not
   sure it ever makes a difference in practice, but fix it anyway."

makes me guess arrange_r_t_l_f is the culprit, or alternatively we're
not getting the right/all info out of the signal context (see
x86-bsd-os.h and x86-bsd-os.c).

This is just guessing, though.

Cheers,

 -- Nikodemus

Schemer: "Buddha is small, clean, and serious."
Lispnik: "Buddha is big, has hairy armpits, and laughs."
From: Richard M K. <kr...@pr...> - 2004-10-16 03:46:17
> On Wed, 6 Oct 2004, Richard M Kreuter wrote:
>
>> I think the problem is in the setup for the call into Lisp...

I think I figured it out: NetBSD restores esp from the uesp mcontext
slot. Dunno if that's a bug in NetBSD, but the following small patch
fixes the stack overflow handling here, and a build of tonight's
anoncvs with this patch passes all the tests.

Regards,
Richard

--- sbcl/src/runtime/interrupt.c  2004-10-02 20:57:14.000000000 -0400
+++ sbcl-0.8.15.11/src/runtime/interrupt.c  2004-10-15 22:24:38.000000000 -0400
@@ -679,7 +679,11 @@
     *os_context_pc_addr(context) = call_into_lisp;
     *os_context_register_addr(context,reg_ECX) = 0;
     *os_context_register_addr(context,reg_EBP) = sp-2;
+#ifdef __NetBSD__ /* NetBSD restores esp from uesp, evidently. */
+    *os_context_register_addr(context,reg_UESP) = sp-14;
+#else
     *os_context_register_addr(context,reg_ESP) = sp-14;
+#endif
 #else
     /* this much of the calling convention is common to all
        non-x86 ports */
--- sbcl/src/runtime/x86-bsd-os.c  2004-07-20 16:20:15.000000000 -0400
+++ sbcl-0.8.15.11/src/runtime/x86-bsd-os.c  2004-10-15 17:30:43.000000000 -0400
@@ -69,6 +69,8 @@
         return CONTEXT_ADDR_FROM_STEM(ESI);
     case 14:
         return CONTEXT_ADDR_FROM_STEM(EDI);
+    case 16:
+        return CONTEXT_ADDR_FROM_STEM(UESP);
     default:
         return 0;
     }
--- sbcl/src/runtime/x86-lispregs.h  2000-10-20 19:30:35.000000000 -0400
+++ sbcl-0.8.15.11/src/runtime/x86-lispregs.h  2004-10-15 17:30:52.000000000 -0400
@@ -34,8 +34,9 @@
 #define reg_EBP REG(10)
 #define reg_ESI REG(12)
 #define reg_EDI REG(14)
+#define reg_UESP REG(16)
 
-#define REGNAMES "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"
+#define REGNAMES "EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI", "UESP"
 
 /* classification of registers
  *
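To make the reasoning behind the patch concrete, here is a toy model in C of what appears to have gone wrong. Everything in it is hypothetical: `struct saved_context`, `resume_sp` and `arrange_return` are invented for this sketch and do not reflect NetBSD's actual mcontext layout or SBCL's interrupt.c. The model only captures the claim made in the message above: if the kernel reloads the stack pointer from the uesp slot when the handler returns, then a handler that rewrites only the esp slot changes the program counter but leaves the stack pointer where it was.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified model of a saved x86 signal context. Real
 * mcontext layouts differ per OS; only the two stack-pointer slots and
 * the program counter matter for this illustration. */
struct saved_context {
    uintptr_t pc;
    uintptr_t esp;
    uintptr_t uesp;
};

/* Which slot is consulted or written. */
enum sp_slot { SLOT_ESP, SLOT_UESP };

/* Models the kernel choosing a stack pointer on return from the handler. */
uintptr_t resume_sp(const struct saved_context *ctx, enum sp_slot kernel_reads)
{
    return kernel_reads == SLOT_UESP ? ctx->uesp : ctx->esp;
}

/* Models the handler redirecting execution: set a new pc and store the
 * new stack pointer in the chosen slot. */
void arrange_return(struct saved_context *ctx, uintptr_t target_pc,
                    uintptr_t new_sp, enum sp_slot handler_writes)
{
    ctx->pc = target_pc;
    if (handler_writes == SLOT_UESP)
        ctx->uesp = new_sp;
    else
        ctx->esp = new_sp;
}

/* Walks through the unpatched and patched behaviour on a kernel that is
 * assumed to read the uesp slot. Returns 1 if the model behaves as the
 * thread describes. */
int check_model(void)
{
    struct saved_context ctx = { 0x1000, 0x2000, 0x2000 };

    /* Unpatched: only esp is written, so the resumed sp is stale. */
    arrange_return(&ctx, 0x4000, 0x1ff0, SLOT_ESP);
    assert(ctx.pc == 0x4000);
    if (resume_sp(&ctx, SLOT_UESP) != 0x2000)
        return 0;

    /* Patched: uesp is written, so the resumed sp is the intended one. */
    arrange_return(&ctx, 0x4000, 0x1ff0, SLOT_UESP);
    return resume_sp(&ctx, SLOT_UESP) == 0x1ff0;
}
```

In this model, a kernel that reads the esp slot works with the unpatched code, which fits the report that the same point of execution behaves differently on GNU/Linux. Whether the real difference lies exactly there is what the patch assumes, not something this sketch can confirm.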