May 2 07:14:39 shit kernel: lisp.run[19883]: segfault at 1 ip 08068794 sp bf8d26d0 error 6 in lisp.run[8048000+18c000]
I am using CLISP for a project. Unfortunately I can't reveal the code for it, but I can describe the application and list the modules used.
Basically, CLISP is loaded as a web application, using the following switches:
#!/usr/bin/clisp -q -q -q -q -norc
then the code...
This is actually a stub that loads the FAS files and executes the actual application.
Modules used are:
asdf Revision: 1.110, with a small modification to suppress certain annoying messages
cl-ppcre 1.3.2
puri 1.5.1
acl-compat dated 2006-01-22 in the Changelog
There is one other module loaded that is part of the project, but it does not segfault anywhere else.
The only thing I can think of that could cause a similar effect is STDOUT/STDERR going away, as would happen when a web page containing lots of data is aborted.
If there is something I can do to debug this, please let me know so I can home in on the cause and get you the information you really need to fix the problem. It's not fatal, just annoying, and the information here says to report segfaults... ;-) And it should be cosmetically cleaned up.
I also notice that many segfaults occur during the build phase when building CLISP 2.44.1...
The exact build method can be found at the following URL (it's a SlackBuild), in case you would like to look at how I build it. One small note on that: I had to increase the ulimit to 32768, or else the build fails. You may want to update the size shown in the configure-time warning to that value.
ftp://ftp.uglyplace.org/pkg_dreams/Slackware-12.0.0/repos/development/clisp/2.44.1/src/
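For anyone reproducing the build, the limit can be checked before starting. The report does not say which ulimit was raised; the sketch below assumes it is the stack size (what `ulimit -s` reports, in KiB), and both helper names are made up for illustration.

```c
#include <stdbool.h>
#include <sys/resource.h>

/* Hypothetical helper: true if a soft limit given in bytes is at least
   required_kib KiB. RLIM_INFINITY means "unlimited" and always passes. */
bool limit_is_at_least_kib(rlim_t soft_limit_bytes, unsigned long required_kib)
{
    if (soft_limit_bytes == RLIM_INFINITY)
        return true;
    return soft_limit_bytes / 1024 >= (rlim_t)required_kib;
}

/* Assumption: the limit in question is RLIMIT_STACK, i.e. `ulimit -s`. */
bool stack_limit_ok(unsigned long required_kib)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_STACK, &rl) != 0)
        return false; /* could not query the limit */
    return limit_is_at_least_kib(rl.rlim_cur, required_kib);
}
```

Calling `stack_limit_ok(32768)` before the build would then tell you whether the current shell already has the value mentioned above.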
Logged In: YES
user_id=2049191
Originator: YES
OOoooo... interesting (thanx for the debugging tip!)
[1]>
Program received signal SIGHUP, Hangup.
0xb7dd3cce in __read_nocancel () from /lib/libc.so.6
(gdb) continue
Continuing.
*** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
Breakpoint 1, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
39 spvw_sigsegv.d: No such file or directory.
in spvw_sigsegv.d
(gdb) bt 20
#0 sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
#1 0x08067b9b in sigsegv_handler (fault_address=0x0, serious=1)
at spvw_sigsegv.d:55
#2 0x0812d357 in sigsegv_handler ()
#3 <signal handler called>
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
Will dig as deep as I can, thanx!
Logged In: YES
user_id=5735
Originator: NO
1. you might want to set a breakpoint in quit_on_signal
2. please take a look at http://clisp.podval.org/impnotes/faq.html#faq-debug and tell me what's missing there.
Logged In: YES
user_id=2049191
Originator: YES
[1]>
Program received signal SIGHUP, Hangup.
0xb7e11cce in __read_nocancel () from /lib/libc.so.6
(gdb) step
Single stepping until exit from function __read_nocancel,
which has no line number information.
quit_on_signal (sig=1) at spvw_sigterm.d:46
46 local void quit_on_signal (int sig) {
(gdb) print sig
$1 = 1
(gdb) step
Breakpoint 18, quit_on_signal (sig=1) at spvw_sigterm.d:47
47 if (quit_on_signal_in_progress) { /* quit without much ado */
(gdb) print quit_on_signal_in_progress
$2 = false
(gdb) step
54 quit_on_signal_in_progress = true;
(gdb) step
55 pushSTACK(Symbol_value(S(error_output))); fresh_line(&STACK_0);
(gdb) call pushSTACK(Symbol_value(S(error_output)))
No symbol "pushSTACK" in current context.
(gdb) print STACK_0
No symbol "STACK_0" in current context.
(gdb) print error_output
No symbol "error_output" in current context.
(gdb) step
*** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
Breakpoint 16, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
39 fputs("\n",stderr);
(gdb)
It seems the line it explodes on is a macro, as I could not print the variables or even call the functions manually...
I'll look at what it unrolled to, and take another look.
Seems the segfault is on this:
spvw_sigterm.d:55
55 pushSTACK(Symbol_value(S(error_output))); fresh_line(&STACK_0);
I'll see what I can dig up tonight, since I got some time to dig.
Logged In: YES
user_id=5735
Originator: NO
pushSTACK, STACK, Symbol_value, S are all macros, defined in lispbibl.d (or clisp.h in a binary installation).
please examine the value of STACK:
p STACK
p STACK[-1] ;; this is STACK_0
xout STACK[-1]
Logged In: YES
user_id=2049191
Originator: YES
Sorry about the slow reply. This is the debug data.
[1]>
Program received signal SIGHUP, Hangup.
0xb7e6bcce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$1 = (gcv_object_t *) 0x0
(gdb) p STACK[-1]
Cannot access memory at address 0xfffffffc
(gdb) xout STACK[-1]
Cannot access memory at address 0xfffffffc
(gdb)
looks like STACK is a null pointer :-) Yep, that'll cause it to go up in flames!
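The 0xfffffffc in the gdb message follows directly from the null pointer. A minimal sketch, assuming a 32-bit target and 4-byte stack slots (which is what that address implies for STACK[-1] with STACK at 0x0); `slot_address` is a made-up helper, not a CLISP function:

```c
#include <stdint.h>

/* Hypothetical helper: the address of STACK[index] on a 32-bit target.
   Assumes each stack slot is 4 bytes wide. */
uint32_t slot_address(uint32_t stack_base, int32_t index)
{
    const uint32_t slot_size = 4u;
    /* Unsigned arithmetic wraps modulo 2^32, just like the address
       computation does, so index -1 from base 0 wraps to the top. */
    return (uint32_t)(stack_base + (uint32_t)index * slot_size);
}
```

With a base of 0 and an index of -1 this wraps around to the last word of the address space, which is never mapped for user code, hence the fault.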
Logged In: YES
user_id=5735
Originator: NO
thanks. I cannot imagine how this could happen though.
start clisp under gdb, when you get the prompt, interrupt clisp to get to the gdb prompt, then do
(gdb) p STACK
(gdb) watch STACK
(gdb) continue
then send SIGHUP and see what happens.
Logged In: YES
user_id=2049191
Originator: YES
I apologize again for the late reply, had to do my day job...
I can actually see why it is breaking ... very interesting output, I hit ^C and STACK is a null pointer X-D
Obviously it is not initialized for some reason... and the sigsegv catcher is helping it along after something is done on the stack...
Here's the trace as directed:
[1]>
Program received signal SIGINT, Interrupt.
0xb7da9cce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$1 = (gcv_object_t *) 0x0
(gdb) watch STACK
Watchpoint 19: STACK
(gdb) continue
Continuing.
Watchpoint 19 deleted because the program has left the block in
which its expression is valid.
0xb7da9cce in __read_nocancel () from /lib/libc.so.6
(gdb) step
Single stepping until exit from function __read_nocancel,
which has no line number information.
Program received signal SIGHUP, Hangup.
0xb7da9cce in __read_nocancel () from /lib/libc.so.6
(gdb) step
Single stepping until exit from function __read_nocancel,
which has no line number information.
quit_on_signal (sig=1) at spvw_sigterm.d:46
46 spvw_sigterm.d: No such file or directory.
in spvw_sigterm.d
(gdb) step
Breakpoint 18, quit_on_signal (sig=1) at spvw_sigterm.d:47
47 in spvw_sigterm.d
(gdb) step
54 in spvw_sigterm.d
(gdb) step
55 in spvw_sigterm.d
(gdb) step
*** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
Breakpoint 16, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
39 spvw_sigsegv.d: No such file or directory.
in spvw_sigsegv.d
(gdb)
And now for something completely different...
Here is the trace after I hit one key! I typed a '('...
[1]>
Program received signal SIGINT, Interrupt.
0xb7dfacce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$2 = (gcv_object_t *) 0x0
(gdb) watch STACK
Watchpoint 20: STACK
(gdb) c
Continuing.
Watchpoint 20: STACK
Old value = (gcv_object_t *) 0x0
New value = (gcv_object_t *) 0xb7b4609c
0xb7dfacce in __read_nocancel () from /lib/libc.so.6
(gdb) c
Continuing.
Watchpoint 20 deleted because the program has left the block in
which its expression is valid.
0xb7d5c55e in __ctype_get_mb_cur_max () from /lib/libc.so.6
(gdb) c
Continuing.
[1]> (format t "Hello world!")
Hello world!
NIL
[2]>
Program received signal SIGINT, Interrupt.
0xb7dfacce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$3 = (gcv_object_t *) 0x0
(gdb)
Note how it went back to a NULL pointer! ACK!
Yep, something is misbehaving...
Logged In: YES
user_id=2049191
Originator: YES
Would you like me to patch this and send a diff?
It's a fairly obvious bug ;-)
Logged In: YES
user_id=5735
Originator: NO
I would love to see your patch, but I doubt this is something simple or obvious: nobody has seen this before.
e.g., I am not interested in sweeping the problem under the rug by checking STACK in quit_on_signal.
Logged In: YES
user_id=2049191
Originator: YES
OK, that took away my second option...
I'm thinking of forcing it to point at something legal to begin with at the beginning of the program...
Here are the details of my build, so that you can see if anything is amiss... One never knows...
Perhaps you can spot some detail, perhaps it's a bad option, or a compiler corner case being tripped up.
GNU CLISP 2.44.1 (2008-02-23) (built 3419009049) (memory 3419009665)
Software: GNU C 4.1.2
gcc -g -O2 -Igllib -W -Wswitch -Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing-declarations -Wno-sign-compare -O2 -fexpensive-optimizations -falign-functions=4 -DUNICODE -DDYNAMIC_FFI -I. -x none /usr/lib/libreadline.so -lncurses -ldl /usr/lib/libavcall.a /usr/lib/libcallback.a -L/usr/lib -lsigsegv -L/usr/lib -lc
SAFETY=0 HEAPCODES LINUX_NOEXEC_HEAPCODES GENERATIONAL_GC SPVW_BLOCKS SPVW_MIXED TRIVIALMAP_MEMORY
libsigsegv 2.4
libreadline 5.2
Features:
(READLINE REGEXP SYSCALLS I18N LOOP COMPILER CLOS MOP CLISP ANSI-CL COMMON-LISP
LISP=CL INTERPRETER SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI
GETTEXT UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
C Modules: (clisp i18n syscalls regexp readline)
Installation directory: /usr/lib/clisp-2.44.1/
User language: ENGLISH
Machine: I686 (I686) shit.apartment [192.168.123.2]
Logged In: YES
user_id=5735
Originator: NO
"-O2" makes it undebuggable. whatever you have done is thus a waste of time.
please rebuild without ANY optimizations.
PS. setting STACK to "something legal" is done long before the first prompt is printed.
Logged In: YES
user_id=2049191
Originator: YES
Ok
I removed the optimization, but configure put -O2 back in on its own... Is there a switch to pass to configure to explicitly turn off optimization without adding any of the debug code to the binary?
Logged In: YES
user_id=5735
Originator: NO
./configure --with-debug CFLAGS=''
or
CFLAGS='' ./configure --with-debug
Logged In: YES
user_id=5735
Originator: NO
please do this:
make lispbibl.h
grep HAVE_SAVED_STACK lispbibl.h
then get to the crash (where STACK is NULL)
and examine saved_STACK
thanks.
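The thread does not say why saved_STACK is of interest. One possible reading, consistent with the watchpoint trace earlier (NULL while blocked in read, non-NULL once input arrives, NULL again at the next prompt), is that the live pointer is parked in a saved copy during blocking calls. The toy model below only illustrates that pattern. Every name in it is invented, and it is neither CLISP's code nor the fix that went into CVS.

```c
#include <stddef.h>

typedef int toy_object;              /* stand-in for a Lisp object slot */

static toy_object toy_slots[16];     /* toy stack storage */
toy_object *toy_stack = NULL;        /* the pointer a debugger would print */
toy_object *toy_saved_stack = NULL;  /* copy kept while in a blocking call */

/* Point the toy stack at valid storage, as startup code would. */
void toy_init(void)
{
    toy_stack = &toy_slots[1];
    toy_saved_stack = NULL;
}

/* Entering a blocking call: park the pointer in the saved copy. */
void toy_begin_blocking_call(void)
{
    toy_saved_stack = toy_stack;
    toy_stack = NULL;
}

/* Leaving the blocking call: restore the pointer from the saved copy. */
void toy_end_blocking_call(void)
{
    toy_stack = toy_saved_stack;
    toy_saved_stack = NULL;
}

/* What a signal handler could consult if it runs during a blocking call. */
toy_object *toy_usable_stack(void)
{
    return toy_stack != NULL ? toy_stack : toy_saved_stack;
}
```

In this model a handler that fires between `toy_begin_blocking_call()` and `toy_end_blocking_call()` sees `toy_stack == NULL` even though initialization ran correctly, which is why the saved copy is the value worth inspecting at the crash.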
Logged In: YES
user_id=5735
Originator: NO
thank you for your bug report.
the bug has been fixed in the CVS tree.
you can either wait for the next release (recommended)
or check out the current CVS tree (see http://clisp.cons.org)
and build CLISP from the sources (be advised that between
releases the CVS tree is very unstable and may not even build
on your platform).