CLISP - an ANSI Common Lisp / Bugs / #470 segfault on SIGHUP

Andrew Kroll - 2008-05-06

Logged In: YES
user_id=2049191
Originator: YES

OOoooo... interesting (thanx for the debugging tip!)

[1]>
Program received signal SIGHUP, Hangup.
0xb7dd3cce in __read_nocancel () from /lib/libc.so.6
(gdb) continue
Continuing.

*** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
Breakpoint 1, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
39 spvw_sigsegv.d: No such file or directory.
in spvw_sigsegv.d
(gdb) bt 20
#0 sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
#1 0x08067b9b in sigsegv_handler (fault_address=0x0, serious=1)
at spvw_sigsegv.d:55
#2 0x0812d357 in sigsegv_handler ()
#3 <signal handler called>
Backtrace stopped: previous frame inner to this frame (corrupt stack?)

Will dig as deep as I can, thanx!


Andrew Kroll - 2008-05-06

status: pending-works-for-me --> open-works-for-me

Sam Steingold - 2008-05-06

Logged In: YES
user_id=5735
Originator: NO

1. you might want to set a breakpoint in quit_on_signal
2. please take a look at http://clisp.podval.org/impnotes/faq.html#faq-debug and tell me what's missing there.


Sam Steingold - 2008-05-06

status: open-works-for-me --> pending-works-for-me

Andrew Kroll - 2008-05-07

Logged In: YES
user_id=2049191
Originator: YES

[1]>
Program received signal SIGHUP, Hangup.
0xb7e11cce in __read_nocancel () from /lib/libc.so.6
(gdb) step
Single stepping until exit from function __read_nocancel,
which has no line number information.
quit_on_signal (sig=1) at spvw_sigterm.d:46
46 local void quit_on_signal (int sig) {
(gdb) print sig
$1 = 1
(gdb) step

Breakpoint 18, quit_on_signal (sig=1) at spvw_sigterm.d:47
47 if (quit_on_signal_in_progress) { /* quit without much ado */
(gdb) print quit_on_signal_in_progress
$2 = false
(gdb) step
54 quit_on_signal_in_progress = true;
(gdb) step
55 pushSTACK(Symbol_value(S(error_output))); fresh_line(&STACK_0);
(gdb) call pushSTACK(Symbol_value(S(error_output)))
No symbol "pushSTACK" in current context.
(gdb) print STACK_0
No symbol "STACK_0" in current context.
(gdb) print error_output
No symbol "error_output" in current context.
(gdb) step

*** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
Breakpoint 16, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
39 fputs("\n",stderr);
(gdb)

It seems the line it explodes on is a macro, since I could not print the variables or even call the functions manually...
I'll look at what it expands to, and take another look.
Seems the segfault is on this:
spvw_sigterm.d:55
55 pushSTACK(Symbol_value(S(error_output))); fresh_line(&STACK_0);
I'll see what I can dig up tonight, since I got some time to dig.


Andrew Kroll - 2008-05-07

status: pending-works-for-me --> open-works-for-me

Sam Steingold - 2008-05-07

Logged In: YES
user_id=5735
Originator: NO

pushSTACK, STACK, Symbol_value, S are all macros, defined in lispbibl.d (or clisp.h in a binary installation).
please examine the value of STACK:
p STACK
p STACK[-1] ;; this is STACK_0
xout STACK[-1]


Andrew Kroll - 2008-05-08

Logged In: YES
user_id=2049191
Originator: YES

Sorry about the slow reply. This is the debug data.
[1]>
Program received signal SIGHUP, Hangup.
0xb7e6bcce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$1 = (gcv_object_t *) 0x0
(gdb) p STACK[-1]
Cannot access memory at address 0xfffffffc
(gdb) xout STACK[-1]
Cannot access memory at address 0xfffffffc
(gdb)

looks like STACK is a null pointer :-) Yep, that'll cause it to go up in flames!


Sam Steingold - 2008-05-08

Logged In: YES
user_id=5735
Originator: NO

thanks. I cannot imagine how this could happen though.
start clisp under gdb, when you get the prompt, interrupt clisp to get to the gdb prompt, then do
(gdb) p STACK
(gdb) watch STACK
(gdb) continue
then send SIGHUP and see what happens.


Sam Steingold - 2008-05-08

status: open-works-for-me --> pending-works-for-me

Andrew Kroll - 2008-05-10

status: pending-works-for-me --> open-works-for-me

Andrew Kroll - 2008-05-10

Logged In: YES
user_id=2049191
Originator: YES

I apologize again for the late reply; I had to do my day job...

I can actually see why it is breaking ... very interesting output, I hit ^C and STACK is a null pointer X-D
Obviously it is not initialized for some reason... and the sigsegv catcher is helping it along after something is done on the stack...

Here's the trace as directed:

[1]>
Program received signal SIGINT, Interrupt.
0xb7da9cce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$1 = (gcv_object_t *) 0x0
(gdb) watch STACK
Watchpoint 19: STACK
(gdb) continue
Continuing.

Watchpoint 19 deleted because the program has left the block in
which its expression is valid.
0xb7da9cce in __read_nocancel () from /lib/libc.so.6
(gdb) step
Single stepping until exit from function __read_nocancel,
which has no line number information.

Program received signal SIGHUP, Hangup.
0xb7da9cce in __read_nocancel () from /lib/libc.so.6
(gdb) step
Single stepping until exit from function __read_nocancel,
which has no line number information.
quit_on_signal (sig=1) at spvw_sigterm.d:46
46 spvw_sigterm.d: No such file or directory.
in spvw_sigterm.d
(gdb) step

Breakpoint 18, quit_on_signal (sig=1) at spvw_sigterm.d:47
47 in spvw_sigterm.d
(gdb) step
54 in spvw_sigterm.d
(gdb) step
55 in spvw_sigterm.d
(gdb) step

*** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
Breakpoint 16, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
39 spvw_sigsegv.d: No such file or directory.
in spvw_sigsegv.d
(gdb)

And now for something completely different...
Here is the trace after I hit one key! I typed a '('...

[1]>
Program received signal SIGINT, Interrupt.
0xb7dfacce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$2 = (gcv_object_t *) 0x0
(gdb) watch STACK
Watchpoint 20: STACK
(gdb) c
Continuing.
Watchpoint 20: STACK

Old value = (gcv_object_t *) 0x0
New value = (gcv_object_t *) 0xb7b4609c
0xb7dfacce in __read_nocancel () from /lib/libc.so.6
(gdb) c
Continuing.

Watchpoint 20 deleted because the program has left the block in
which its expression is valid.
0xb7d5c55e in __ctype_get_mb_cur_max () from /lib/libc.so.6
(gdb) c
Continuing.
[1]> (format t "Hello world!")
Hello world!
NIL
[2]>
Program received signal SIGINT, Interrupt.
0xb7dfacce in __read_nocancel () from /lib/libc.so.6
(gdb) p STACK
$3 = (gcv_object_t *) 0x0
(gdb)

Note how it went back to a NULL pointer! ACK!
Yep, something is misbehaving...


Andrew Kroll - 2008-05-10

Logged In: YES
user_id=2049191
Originator: YES

Would you like me to patch this and send a diff?
It's a fairly obvious bug ;-)


Sam Steingold - 2008-05-11

status: open-works-for-me --> pending-works-for-me

Sam Steingold - 2008-05-11

Logged In: YES
user_id=5735
Originator: NO

I would love to see your patch, but I doubt this is something simple or obvious: nobody has seen this before.
e.g., I am not interested in sweeping the problem under the rug by checking STACK in quit_on_signal.


Andrew Kroll - 2008-05-11

Logged In: YES
user_id=2049191
Originator: YES

OK, that took away my second option...
I'm thinking of forcing it to point at something legal to begin with at the beginning of the program...

Here are the details of my build, so that you can see if anything is amiss... One never knows...
Perhaps you can spot some detail, perhaps it's a bad option, or a compiler corner case being tripped up.

GNU CLISP 2.44.1 (2008-02-23) (built 3419009049) (memory 3419009665)
Software: GNU C 4.1.2
gcc -g -O2 -Igllib -W -Wswitch -Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing-declarations -Wno-sign-compare -O2 -fexpensive-optimizations -falign-functions=4 -DUNICODE -DDYNAMIC_FFI -I. -x none /usr/lib/libreadline.so -lncurses -ldl /usr/lib/libavcall.a /usr/lib/libcallback.a -L/usr/lib -lsigsegv -L/usr/lib -lc
SAFETY=0 HEAPCODES LINUX_NOEXEC_HEAPCODES GENERATIONAL_GC SPVW_BLOCKS SPVW_MIXED TRIVIALMAP_MEMORY
libsigsegv 2.4
libreadline 5.2
Features:
(READLINE REGEXP SYSCALLS I18N LOOP COMPILER CLOS MOP CLISP ANSI-CL COMMON-LISP
LISP=CL INTERPRETER SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI
GETTEXT UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
C Modules: (clisp i18n syscalls regexp readline)
Installation directory: /usr/lib/clisp-2.44.1/
User language: ENGLISH
Machine: I686 (I686) shit.apartment [192.168.123.2]


Andrew Kroll - 2008-05-11

status: pending-works-for-me --> open-works-for-me

Sam Steingold - 2008-05-11

Logged In: YES
user_id=5735
Originator: NO

"-O2" makes it undebuggable. whatever you have done is thus a waste of time.
please rebuild without ANY optimizations.
PS. setting STACK to "something legal" is done long before the first prompt is printed.


Sam Steingold - 2008-05-11

status: open-works-for-me --> pending-works-for-me

Andrew Kroll - 2008-05-11

Logged In: YES
user_id=2049191
Originator: YES

Ok
I removed the optimization flags, but configure put the -O2 back in on its own... Is there a switch to pass to configure that explicitly turns off optimization without adding any of the debug code to the binary?


Andrew Kroll - 2008-05-11

status: pending-works-for-me --> open-works-for-me

Sam Steingold - 2008-05-11

Logged In: YES
user_id=5735
Originator: NO

./configure --with-debug CFLAGS=''
or
CFLAGS='' ./configure --with-debug


Sam Steingold - 2008-05-19

status: open-works-for-me --> pending-works-for-me

Sam Steingold - 2008-05-22

Logged In: YES
user_id=5735
Originator: NO

please do this:
make lispbibl.h
grep HAVE_SAVED_STACK lispbibl.h
then get to the crash (where STACK is NULL)
and examine saved_STACK
thanks.


Sam Steingold - 2008-05-22

Logged In: YES
user_id=5735
Originator: NO

thank you for your bug report.
the bug has been fixed in the CVS tree.
you can either wait for the next release (recommended)
or check out the current CVS tree (see http://clisp.cons.org)
and build CLISP from the sources (be advised that between
releases the CVS tree is very unstable and may not even build
on your platform).

