#371 2.40 sigsegv on sparc

segfault
open
Bruno Haible
clisp (525)
5
2006-10-12
2006-10-12
Peter Van Eynde
No

Building 2.40 with:

./configure debian/build --prefix=/usr --fsstnd=debian \
  --with-dynamic-ffi --with-dynamic-modules \
  --with-module=bindings/glibc --with-module=clx/new-clx \
  --with-debug

results in a crash in lisp.run. Using gdb, I see:

(gdb) run ./lisp.run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
Starting program: /home/pvaneynd/clisp/clisp-2.40.orig/debian/build/lisp.run ./lisp.run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
STACK depth: 98206

Program received signal SIGSEGV, Segmentation fault.
0x0016effc in hash_lookup_builtin (ht={one_o = 3777369}, obj={one_o = 3783721}, allowgc=false, KVptr_=0xefffe458, Iptr_=0xefffe454) at hashtabl.d:1610
1610 while (!eq(*Nptr,nix)) { /* track "list" : "list" finished -> not found */
(gdb) print Nptr
$2 = (gcv_object_t *) 0x43f1c16c
(gdb) print /x ht
$4 = {one_o = 0x39a359}
(gdb) frame
#0 0x0016effc in hash_lookup_builtin (ht={one_o = 3777369}, obj={one_o = 3783721}, allowgc=false, KVptr_=0xefffe458, Iptr_=0xefffe454) at hashtabl.d:1610
1610 while (!eq(*Nptr,nix)) { /* track "list" : "list" finished -> not found */
(gdb) backtrace
#0 0x0016effc in hash_lookup_builtin (ht={one_o = 3777369}, obj={one_o = 3783721}, allowgc=false, KVptr_=0xefffe458, Iptr_=0xefffe454) at hashtabl.d:1610
#1 0x00173690 in gethash (obj=Cannot access memory at address 0x43f1c16c
) at hashtabl.d:2416
#2 0x00284d54 in register_foreign_variable (address=0x357100, name_asciz=0x32cd00 "ffi_user_pointer", flags=0, size=4) at foreign.d:185
#3 0x002a0edc in init_ffi () at foreign.d:4422
#4 0x0004675c in main (argc=17, argv=0xefffe6c4) at spvw.d:3345
(gdb) print flags
$5 = 2 '\002'
(gdb) print hashindex
$6 = 1357776787
(gdb) print kvtable
$7 = {one_o = 3777329}
(gdb) print kvt_data
$8 = (gcv_object_t *) 0x39a348

If there are any gdb commands I can execute to help
debug this problem, just ask.

Discussion

  • Sam Steingold
    2006-10-12

    Logged In: YES
    user_id=5735

    One thing you could do is build with g++
    to check for GC safety
    (see http://clisp.cons.org/impnotes/gc-safety.html):
    CC=g++ ./configure --with-debug build-g-gxx
    Thanks.

     
  • Sam Steingold
    2006-10-12

    Logged In: YES
    user_id=5735

    you didn't specify it, but, presumably, this is linux/sparc
    (not solaris).

     
  • Logged In: YES
    user_id=7267

    I did the rebuild with g++-3.3 (on sparc/linux as you
    guessed) and after fixing a trivial casting problem I get:

    (gdb) run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
    Starting program: /home/pvaneynd/clisp/clisp-2.40.orig/debian/build/lisp.run -B . -N locale -E 1:1 -Efile UTF-8 -Eterminal UTF-8 -norc -m 1800KW -x "(and (load \"init.lisp\") (sys::%saveinitmem) (ext::exit)) (ext::exit t)"
    STACK depth: 230302

    Program received signal SIGSEGV, Segmentation fault.
    0x002a12bc in hash_lookup_builtin (ht={one_o = 5853545, allocstamp = 67855}, obj={one_o = 5859937, allocstamp = 67855}, allowgc=false, KVptr_=0xefffe410, Iptr_=0xefffe40c) at hashtabl.d:1610
    1610 while (!eq(*Nptr,nix)) { /* track "list" : "list" finished -> not found */
    Warning: the current language does not match this frame.

    Also, as the failure happens very early, I guess there has
    been no GC yet.

     
  • Sam Steingold
    2006-10-13

    Logged In: YES
    user_id=5735

    debug_gcsafety detects GC-safety bugs _before_ the GC that
    crashes. Obviously, this is not such a bug. Too bad.
    How about setting a breakpoint in hash_lookup_builtin and,
    _before_ the segfault (I assume that this is the first time
    you enter hash_lookup_builtin), doing:
    (gdb) xout ht
    (gdb) zout ht
    (gdb) print TheHashtable_(ht)
    and examining the slots?

     
  • Logged In: YES
    user_id=7267

    Breakpoint 16, hash_lookup_builtin (ht={one_o = 5853545, allocstamp = 67855}, obj={one_o = 5859937, allocstamp = 67855}, allowgc=false, KVptr_=0xefffe410, Iptr_=0xefffe40c) at hashtabl.d:1571
    1571 GCTRIGGER_IF(allowgc, GCTRIGGER2(ht,obj));
    (gdb) xout ht
    #(CL::HASH-TABLE size=3 maxcount=1 mincount=0 free=
    test=CL::EQUAL
    KV=#(CL::NIL #(#<UNBOUND> #<UNBOUND> #<UNBOUND>) 0 0
    #<UNBOUND> #<UNBOUND> #<UNBOUND>)){one_o = 5853545,
    allocstamp = 67855}
    (gdb) zout ht
    #S(HASH-TABLE :TEST EXT::FASTHASH-EQUAL)
    {one_o = 5853545, allocstamp = 68046}
    (gdb) print TheHashtable_(ht)

    Program received signal SIGABRT, Aborted.
    0x5026f910 in kill () from /lib/libc.so.6
    The program being debugged was signaled while in a function
    called from GDB.
    GDB remains in the frame where the signal was received.
    To change this behavior use "set unwindonsignal on"
    Evaluation of the expression containing the function
    (TheHashtable_) will be abandoned.

    It seems it aborted:

    (gdb) backtrace
    #0 0x5026f910 in kill () from /lib/libc.so.6
    #1 0x5027093c in abort () from /lib/libc.so.6
    #2 0x000cbedc in ngci_pointable (obj={one_o = 5853545, allocstamp = 67855}) at lispbibl.d:6927
    #3 0x0002c54c in TheHashtable_ (x={one_o = 0, allocstamp = 0}) at spvw_debug.d:454
    #4 <function called from gdb>
    #5 hash_lookup_builtin (ht={one_o = 5853545, allocstamp = 67855}, obj={one_o = 5859937, allocstamp = 67855}, allowgc=false, KVptr_=0xefffe410, Iptr_=0xefffe40c) at hashtabl.d:1571
    #6 0x002a95d4 in gethash (obj={one_o = 5859937, allocstamp = 67855}, ht={one_o = 5853545, allocstamp = 67855}, allowgc=false) at hashtabl.d:2416
    #7 0x0045250c in register_foreign_variable (address=0x55298c, name_asciz=0x50f838 "ffi_user_pointer", flags=0, size=4) at foreign.d:185
    #8 0x004832a8 in init_ffi () at foreign.d:4422
    #9 0x000c4418 in main (argc=16, argv=0xefffe6d4) at spvw.d:3345
    (gdb) down
    #2 0x000cbedc in ngci_pointable (obj={one_o = 5853545, allocstamp = 67855}) at lispbibl.d:6927
    6927 abort();
    (gdb) list
    6922 return obj.one_o;
    6923 }
    6924 static inline aint ngci_pointable (object obj) {
    6925 if (!(gcinvariant_symbol_p(obj)
    6926 || obj.allocstamp == alloccount || nonimmsubrp(obj)))
    6927 abort();
    6928 nonimmprobe(obj.one_o);
    6929 return obj.one_o;
    6930 }
    6931 static inline aint ngci_pointable (gcv_object_t obj) {
    (gdb) print obj
    $1 = {one_o = 5853545, allocstamp = 67855}
    (gdb) print alloccount
    $2 = 68046
    (gdb) print gcinvariant_symbol_p(obj)
    $3 = false

    nonimmprobe is a macro and seems difficult to reproduce in gdb,
    but I can check whether the pointer is readable:

    (gdb) print obj.one_o
    $14 = 5853545
    (gdb) print /x obj.one_o
    $15 = 0x595169
    (gdb) x /xw obj.one_o
    0x595169: 0x59516904

    I will retry now with 2.41.

     
  • Sam Steingold
    2006-10-16

    Logged In: YES
    user_id=5735

    This abort means nothing: zout calls PRIN1, which conses and
    thus invalidates ht.
    Note that ht has the right allocstamp before zout.

     
  • Sam Steingold
    2008-03-31

    Logged In: YES
    user_id=5735
    Originator: NO

    Apparently this is a bug in some versions of gcc on Solaris.
    What is your gcc version?
    See http://permalink.gmane.org/gmane.lisp.clisp.general/12179:
    In case anyone cares, I tried building clisp 2.44.1 using gcc 3.4.3
    on sparc. The sources appear to build and an image is created.
    However, when running make check, the check eventually gets a segfault
    that crashes clisp.

    I didn't use any special libraries and only had libsigsegv available.
    More info available if anyone wants to take a look.

    Using the same sources, I rebuilt using gcc 3.3.3, and make check
    finishes just fine. This works for me, so the fact that 3.4.3 doesn't
    work is not so important to me.

     
  • Raymond Toy
    2008-04-01

    Logged In: YES
    user_id=28849
    Originator: NO

    It's not clear if the bug I reported in that link is in clisp or in gcc. I didn't investigate the cause.

     
  • Sam Steingold
    2008-04-02

    Logged In: YES
    user_id=5735
    Originator: NO

    The crash with _some_ gcc versions is in a very basic part of CLISP.
    If you can read assembly, it would be nice if you could figure out which part of hashtabl.d is miscompiled and file a gcc bug report.

     
  • Raymond Toy
    2008-04-02

    Logged In: YES
    user_id=28849
    Originator: NO

    The crash I reported isn't in the same place, and it's sparc/solaris, not sparc/linux.

    make check is running, and here is the last output before crashing:

    (PROGN (DEFGENERIC TESTGF00 (&REST ARGS &KEY) (:METHOD (&REST ARGS))) (TESTGF00 'A 'B))
    [SIMPLE-KEYWORD-ERROR]: #:COMPILED-FORM-180-1: illegal keyword/value pair A, B in argument list.
    The allowed keywords are NIL

    Here is part of the backtrace. I don't think this is the same bug.

    #0 top_of_back_trace_frame (bt=0x669400fb) at debug.d:1104
    #1 0x00034bc8 in unwind_upto (upto_frame=0xffbe9464) at eval.d:624
    #2 0x00043d40 in invoke_handlers (cond=0x1a952901) at eval.d:697
    #3 0x000be904 in C_clcs_signal (argcount=0, rest_args_pointer=0x1ba2e4) at error.d:775
    #4 0x00037234 in funcall_subr (fun=0x1a0a7a, args_on_stack=0) at eval.d:5179
    #5 0x000bbb54 in signal_and_debug (condition=0x1a952901) at error.d:204
    #6 0x000bbdd4 in end_error (stackptr=0x1ba2c4, start_driver_p=true) at error.d:317
    #7 0x000bbf6c in error (errortype=keyword_error,
    errorstring=0x139a70 "~S: illegal keyword/value pair ~S, ~S in argument list.\nThe allowed keywords are ~S") at error.d:349
    #8 0x000bf380 in error_key_badkw (fun=0x1a951241, key=0x1a9095a9, val=0x1a9095d9, kwlist=0x1a68d9)
    at error.d:1317
    #9 0x00036764 in match_cclosure_key (closure=0x1a952419, argcount=1, key_args_pointer=0x1ba2bc,
    rest_args_pointer=0x1ba2bc) at eval.d:2803
    #10 0x0004361c in apply_closure (closure=0x1a952419, args_on_stack=0, args=0x1a68d9) at eval.d:4747
    #11 0x0003d538 in interpret_bytecode_ (closure_in=0x1a94e491, codeptr=0x1a950b48,
    byteptr_in=0x1a950b6f "") at eval.d:7737
    #12 0x00042e58 in apply_closure (closure=0x1a94e491, args_on_stack=0, args=0x669429e3)
    at eval.d:4770
    #13 0x0003d538 in interpret_bytecode_ (closure_in=0x1a94e491, codeptr=0x1a94f2c8, byteptr_in=0x0)
    at eval.d:7737
    #14 0x0003e764 in eval1 (form=0x1ba2b4) at eval.d:3866
    #15 0x0003f0f4 in eval (form=0x66944853) at eval.d:2908

     
  • Sam Steingold
    2009-10-20

    same crash:

    Program received signal SIGSEGV, Segmentation fault.
    0x0016dfb4 in hash_lookup_builtin (ht=..., obj=..., allowgc=false,
    KVptr_=0xffef14d4, Iptr_=0xffef14d0) at ../src/hashtabl.d:1417
    1417 while (!eq(*Nptr,nix)) { /* track "list" : "list" finished -> not found */
    (gdb) where
    #0 0x0016dfb4 in hash_lookup_builtin (ht=..., obj=..., allowgc=false,
    KVptr_=0xffef14d4, Iptr_=0xffef14d0) at ../src/hashtabl.d:1417
    #1 0x0017244c in gethash (obj=..., ht=..., allowgc=false)
    at ../src/hashtabl.d:2219
    #2 0x0027dd34 in register_foreign_inttype (name_asciz=0x2ec938 "ssize_t",
    size=4, signed_p=true) at ../src/foreign.d:274
    #3 0x0029aab0 in init_ffi () at ../src/foreign.d:4604
    #4 0x00046ff4 in main (argc=18, argv=0xffef1784) at ../src/spvw.d:3841

    with the current CVS head on
    Linux titan 2.6.24.4 #1 SMP Sat Apr 12 20:33:06 UTC 2008 sparc64 GNU/Linux
    with
    gcc (Debian 4.3.4-5) 4.3.4

     
  • Sam Steingold
    2009-10-20

    Note that I observe the crash with "gcc -g -O0".

     
  • Sam Steingold
    2009-10-20

    same crash in 2.32 & 2.34