#470 segfault on SIGHUP

segfault
closed-fixed
clisp (524)
5
2008-05-22
2008-05-03
No

May 2 07:14:39 shit kernel: lisp.run[19883]: segfault at 1 ip 08068794 sp bf8d26d0 error 6 in lisp.run[8048000+18c000]

I am using CLISP for a project. Unfortunately I can't reveal the code for it, but I can describe the application and list the modules used.

Basically, CLISP is loaded as a web application, using the following switches:

#!/usr/bin/clisp -q -q -q -q -norc
then the code...

This is actually a stub that loads the FAS files and executes the actual application.

Modules used are:

asdf Revision: 1.110, with a small modification to not print certain annoyance messages
cl-ppcre 1.3.2
puri 1.5.1
acl-compat dated 2006-01-22 in the Changelog

There is one other module loaded that is part of the project, but it does not segfault any place else.

The only thing I can think of that can cause a similar effect is STDOUT/STDERR going away, as would happen when a web page containing lots of data is aborted.

If there is something I can do to debug this, please let me know so I can home in on the cause and get to you the information you really need to fix the problem. It's not fatal, just annoying, and the information here says to report segfaults... ;-) And it should be cosmetically cleaned up.

I also notice when building CLISP 2.44.1 that there are many segfaults caused during the build phase...

The exact build method can be found at the following URL (it's a SlackBuild), in case you would like to look at how I build it. A small note on that, too: I had to increase the ulimit to 32768, or else the build fails. You may want to update the size to that value in the warning shown when configuring.

ftp://ftp.uglyplace.org/pkg_dreams/Slackware-12.0.0/repos/development/clisp/2.44.1/src/

Discussion

(Page 2 of 3)
  • Andrew Kroll

    Andrew Kroll - 2008-05-06

    Logged In: YES
    user_id=2049191
    Originator: YES

    OOoooo... interesting (thanx for the debugging tip!)

    [1]>
    Program received signal SIGHUP, Hangup.
    0xb7dd3cce in __read_nocancel () from /lib/libc.so.6
    (gdb) continue
    Continuing.

    *** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
    Breakpoint 1, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
    39 spvw_sigsegv.d: No such file or directory.
    in spvw_sigsegv.d
    (gdb) bt 20
    #0 sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
    #1 0x08067b9b in sigsegv_handler (fault_address=0x0, serious=1)
    at spvw_sigsegv.d:55
    #2 0x0812d357 in sigsegv_handler ()
    #3 <signal handler called>
    Backtrace stopped: previous frame inner to this frame (corrupt stack?)

    Will dig as deep as I can, thanx!

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-06
    • status: pending-works-for-me --> open-works-for-me
     
  • Sam Steingold

    Sam Steingold - 2008-05-06

    Logged In: YES
    user_id=5735
    Originator: NO

    1. you might want to set a breakpoint in quit_on_signal
    2. please take a look at http://clisp.podval.org/impnotes/faq.html#faq-debug and tell me what's missing there.

     
  • Sam Steingold

    Sam Steingold - 2008-05-06
    • status: open-works-for-me --> pending-works-for-me
     
  • Andrew Kroll

    Andrew Kroll - 2008-05-07

    Logged In: YES
    user_id=2049191
    Originator: YES

    [1]>
    Program received signal SIGHUP, Hangup.
    0xb7e11cce in __read_nocancel () from /lib/libc.so.6
    (gdb) step
    Single stepping until exit from function __read_nocancel,
    which has no line number information.
    quit_on_signal (sig=1) at spvw_sigterm.d:46
    46 local void quit_on_signal (int sig) {
    (gdb) print sig
    $1 = 1
    (gdb) step

    Breakpoint 18, quit_on_signal (sig=1) at spvw_sigterm.d:47
    47 if (quit_on_signal_in_progress) { /* quit without much ado */
    (gdb) print quit_on_signal_in_progress
    $2 = false
    (gdb) step
    54 quit_on_signal_in_progress = true;
    (gdb) step
    55 pushSTACK(Symbol_value(S(error_output))); fresh_line(&STACK_0);
    (gdb) call pushSTACK(Symbol_value(S(error_output)))
    No symbol "pushSTACK" in current context.
    (gdb) print STACK_0
    No symbol "STACK_0" in current context.
    (gdb) print error_output
    No symbol "error_output" in current context.
    (gdb) step

    *** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
    Breakpoint 16, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
    39 fputs("\n",stderr);
    (gdb)

    It seems the line it explodes on is a macro, as I could not print the variables or even call the functions manually.
    I'll look at what it expands to and take another look.
    The segfault seems to be on this line:
    spvw_sigterm.d:55
    55 pushSTACK(Symbol_value(S(error_output))); fresh_line(&STACK_0);
    I'll see what I can dig up tonight, since I have some time to dig.

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-07
    • status: pending-works-for-me --> open-works-for-me
     
  • Sam Steingold

    Sam Steingold - 2008-05-07

    Logged In: YES
    user_id=5735
    Originator: NO

    pushSTACK, STACK, Symbol_value, S are all macros, defined in lispbibl.d (or clisp.h in a binary installation).
    please examine the value of STACK:
    p STACK
    p STACK[-1] ;; this is STACK_0
    xout STACK[-1]

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-08

    Logged In: YES
    user_id=2049191
    Originator: YES

    Sorry about the slow reply. This is the debug data.
    [1]>
    Program received signal SIGHUP, Hangup.
    0xb7e6bcce in __read_nocancel () from /lib/libc.so.6
    (gdb) p STACK
    $1 = (gcv_object_t *) 0x0
    (gdb) p STACK[-1]
    Cannot access memory at address 0xfffffffc
    (gdb) xout STACK[-1]
    Cannot access memory at address 0xfffffffc
    (gdb)

    Looks like STACK is a null pointer :-) Yep, that'll cause it to go up in flames!

     
  • Sam Steingold

    Sam Steingold - 2008-05-08

    Logged In: YES
    user_id=5735
    Originator: NO

    thanks. I cannot imagine how this could happen though.
    start clisp under gdb; when you get the prompt, interrupt clisp to get to the gdb prompt, then do
    (gdb) p STACK
    (gdb) watch STACK
    (gdb) continue
    then send SIGHUP and see what happens.

     
  • Sam Steingold

    Sam Steingold - 2008-05-08
    • status: open-works-for-me --> pending-works-for-me
     
  • Andrew Kroll

    Andrew Kroll - 2008-05-10
    • status: pending-works-for-me --> open-works-for-me
     
  • Andrew Kroll

    Andrew Kroll - 2008-05-10

    Logged In: YES
    user_id=2049191
    Originator: YES

    I apologize again for the late reply; I had to do my day job...

    I can actually see why it is breaking... very interesting output: I hit ^C and STACK is a null pointer X-D
    Obviously it is not initialized for some reason... and the sigsegv catcher is helping it along after something is done on the stack...

    Here's the trace as directed:

    [1]>
    Program received signal SIGINT, Interrupt.
    0xb7da9cce in __read_nocancel () from /lib/libc.so.6
    (gdb) p STACK
    $1 = (gcv_object_t *) 0x0
    (gdb) watch STACK
    Watchpoint 19: STACK
    (gdb) continue
    Continuing.

    Watchpoint 19 deleted because the program has left the block in
    which its expression is valid.
    0xb7da9cce in __read_nocancel () from /lib/libc.so.6
    (gdb) step
    Single stepping until exit from function __read_nocancel,
    which has no line number information.

    Program received signal SIGHUP, Hangup.
    0xb7da9cce in __read_nocancel () from /lib/libc.so.6
    (gdb) step
    Single stepping until exit from function __read_nocancel,
    which has no line number information.
    quit_on_signal (sig=1) at spvw_sigterm.d:46
    46 spvw_sigterm.d: No such file or directory.
    in spvw_sigterm.d
    (gdb) step

    Breakpoint 18, quit_on_signal (sig=1) at spvw_sigterm.d:47
    47 in spvw_sigterm.d
    (gdb) step
    54 in spvw_sigterm.d
    (gdb) step
    55 in spvw_sigterm.d
    (gdb) step

    *** - handle_fault error2 ! address = 0x0 not in [0x2025b004,0x203c566c) !
    Breakpoint 16, sigsegv_handler_failed (address=0x0) at spvw_sigsegv.d:39
    39 spvw_sigsegv.d: No such file or directory.
    in spvw_sigsegv.d
    (gdb)

    And now for something completely different...
    Here is the trace after I hit one key! I typed a '('...

    [1]>
    Program received signal SIGINT, Interrupt.
    0xb7dfacce in __read_nocancel () from /lib/libc.so.6
    (gdb) p STACK
    $2 = (gcv_object_t *) 0x0
    (gdb) watch STACK
    Watchpoint 20: STACK
    (gdb) c
    Continuing.
    Watchpoint 20: STACK

    Old value = (gcv_object_t *) 0x0
    New value = (gcv_object_t *) 0xb7b4609c
    0xb7dfacce in __read_nocancel () from /lib/libc.so.6
    (gdb) c
    Continuing.

    Watchpoint 20 deleted because the program has left the block in
    which its expression is valid.
    0xb7d5c55e in __ctype_get_mb_cur_max () from /lib/libc.so.6
    (gdb) c
    Continuing.
    [1]> (format t "Hello world!")
    Hello world!
    NIL
    [2]>
    Program received signal SIGINT, Interrupt.
    0xb7dfacce in __read_nocancel () from /lib/libc.so.6
    (gdb) p STACK
    $3 = (gcv_object_t *) 0x0
    (gdb)

    Note how it went back to a NULL pointer! ACK!
    Yep, something is misbehaving...

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-10

    Logged In: YES
    user_id=2049191
    Originator: YES

    Would you like me to patch this and send a diff?
    It's a fairly obvious bug ;-)

     
  • Sam Steingold

    Sam Steingold - 2008-05-11
    • status: open-works-for-me --> pending-works-for-me
     
  • Sam Steingold

    Sam Steingold - 2008-05-11

    Logged In: YES
    user_id=5735
    Originator: NO

    I would love to see your patch, but I doubt this is something simple or obvious: nobody has seen this before.
    e.g., I am not interested in sweeping the problem under the rug by checking STACK in quit_on_signal.

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-11

    Logged In: YES
    user_id=2049191
    Originator: YES

    OK, that took away my second option...
    I'm thinking of forcing it to point at something legal at the beginning of the program...

    Here are the details of my build, so that you can see if anything is amiss... One never knows...
    Perhaps you can spot some detail, perhaps it's a bad option, or a compiler corner case being tripped up.

    GNU CLISP 2.44.1 (2008-02-23) (built 3419009049) (memory 3419009665)
    Software: GNU C 4.1.2
    gcc -g -O2 -Igllib -W -Wswitch -Wcomment -Wpointer-arith -Wimplicit -Wreturn-type -Wmissing-declarations -Wno-sign-compare -O2 -fexpensive-optimizations -falign-functions=4 -DUNICODE -DDYNAMIC_FFI -I. -x none /usr/lib/libreadline.so -lncurses -ldl /usr/lib/libavcall.a /usr/lib/libcallback.a -L/usr/lib -lsigsegv -L/usr/lib -lc
    SAFETY=0 HEAPCODES LINUX_NOEXEC_HEAPCODES GENERATIONAL_GC SPVW_BLOCKS SPVW_MIXED TRIVIALMAP_MEMORY
    libsigsegv 2.4
    libreadline 5.2
    Features:
    (READLINE REGEXP SYSCALLS I18N LOOP COMPILER CLOS MOP CLISP ANSI-CL COMMON-LISP
    LISP=CL INTERPRETER SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI
    GETTEXT UNICODE BASE-CHAR=CHARACTER PC386 UNIX)
    C Modules: (clisp i18n syscalls regexp readline)
    Installation directory: /usr/lib/clisp-2.44.1/
    User language: ENGLISH
    Machine: I686 (I686) shit.apartment [192.168.123.2]

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-11
    • status: pending-works-for-me --> open-works-for-me
     
  • Sam Steingold

    Sam Steingold - 2008-05-11

    Logged In: YES
    user_id=5735
    Originator: NO

    "-O2" makes it undebuggable. whatever you have done is thus a waste of time.
    please rebuild without ANY optimizations.
    PS. setting STACK to "something legal" is done long before the first prompt is printed.

     
  • Sam Steingold

    Sam Steingold - 2008-05-11
    • status: open-works-for-me --> pending-works-for-me
     
  • Andrew Kroll

    Andrew Kroll - 2008-05-11

    Logged In: YES
    user_id=2049191
    Originator: YES

    OK.
    I removed the optimization, but configure added -O2 on its own... Is there a switch to pass to configure to explicitly turn off optimization without putting any of the debug code in the binary?

     
  • Andrew Kroll

    Andrew Kroll - 2008-05-11
    • status: pending-works-for-me --> open-works-for-me
     
  • Sam Steingold

    Sam Steingold - 2008-05-11

    Logged In: YES
    user_id=5735
    Originator: NO

    ./configure --with-debug CFLAGS=''
    or
    CFLAGS='' ./configure --with-debug

     
  • Sam Steingold

    Sam Steingold - 2008-05-19
    • status: open-works-for-me --> pending-works-for-me
     
  • Sam Steingold

    Sam Steingold - 2008-05-22

    Logged In: YES
    user_id=5735
    Originator: NO

    please do this:
    make lispbibl.h
    grep HAVE_SAVED_STACK lispbibl.h
    then get to the crash (where STACK is NULL)
    and examine saved_STACK
    thanks.
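
A minimal sketch of why `saved_STACK` is worth examining here, assuming (as the name and the `HAVE_SAVED_STACK` macro suggest, though the thread does not spell it out) that the live `STACK` pointer is stashed in `saved_STACK` around system calls. The names `object_t`, `begin_system_call_sketch`, `end_system_call_sketch`, `on_hup` and `run_demo` are hypothetical. This is not CLISP's code and not the fix that went into CVS; it only models a handler that finds the live pointer NULL during a blocking call and can still locate the stack through the saved copy.

```c
#include <signal.h>
#include <stddef.h>

typedef long object_t;                 /* hypothetical stand-in for gcv_object_t */

static object_t stack_area[16];        /* toy Lisp stack */

/* Model: the "live" pointer is only valid while Lisp code runs; around a
   system call it is stashed in saved_STACK (an assumption of this sketch). */
static object_t *volatile STACK = NULL;
static object_t *volatile saved_STACK = NULL;

static volatile sig_atomic_t live_was_null = 0;
static volatile sig_atomic_t recovered_ok = 0;

static void begin_system_call_sketch(void) {
  saved_STACK = STACK;                 /* stash the live pointer */
  STACK = NULL;                        /* models "STACK is not valid right now" */
}

static void end_system_call_sketch(void) {
  STACK = saved_STACK;                 /* restore the live pointer */
  saved_STACK = NULL;
}

/* Handler that records what it sees instead of dereferencing STACK blindly. */
static void on_hup(int sig) {
  (void)sig;
  object_t *sp = STACK;
  live_was_null = (sp == NULL);
  if (sp == NULL)
    sp = saved_STACK;                  /* fall back to the saved copy */
  recovered_ok = (sp != NULL);
}

/* Returns 1 if the handler saw a NULL live pointer, found the saved copy,
   and the live pointer was restored after the simulated system call. */
int run_demo(void) {
  STACK = stack_area + 1;
  signal(SIGHUP, on_hup);

  begin_system_call_sketch();
  raise(SIGHUP);                       /* stands in for SIGHUP arriving during read() */
  end_system_call_sketch();

  return live_was_null && recovered_ok && STACK == stack_area + 1;
}
```

Calling `run_demo()` from a small `main` exercises the same window the gdb sessions above hit: the signal arrives while the process is blocked in a system call, which is exactly when the live pointer cannot be trusted.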

     
  • Sam Steingold

    Sam Steingold - 2008-05-22

    Logged In: YES
    user_id=5735
    Originator: NO

    thank you for your bug report.
    the bug has been fixed in the CVS tree.
    you can either wait for the next release (recommended)
    or check out the current CVS tree (see http://clisp.cons.org)
    and build CLISP from the sources (be advised that between
    releases the CVS tree is very unstable and may not even build
    on your platform).

     