Causing a stack overflow twice in a row leads to a
segfault in clisp 2.33.2 on x86 Linux, built from the
Gentoo ebuild. gcc is version 3.3.5, and glibc is version
2.3.4. My CFLAGS are '-mcpu=i686 -O2 -pipe'. A gzipped
core file is attached. Note that this error does *not*
happen with a clean, debug-enabled build. Should I
send this bug report to the Gentoo ebuild maintainer
instead?
This is how I produced the problem:
$ clisp -q -q
[1]> (defun f (n) (if (zerop n) 0 (f (1- n))))
F
[2]> (f 10000)
*** - Program stack overflow. RESET
[3]> (f 10000)
Segmentation fault (core dumped)
This is my system configuration:
$ uname -a
Linux ballpoint 2.6.10-gentoo-r6_dr #1 Sun Mar 6
13:56:17 PST 2005 i686 AMD Athlon(tm) 64 Processor
3200+ AuthenticAMD GNU/Linux
$
$ gcc -v
Reading specs from
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.5/specs
Configured with:
/var/tmp/portage/gcc-3.3.5-r1/work/gcc-3.3.5/configure
--enable-version-specific-runtime-libs --prefix=/usr
--bindir=/usr/i686-pc-linux-gnu/gcc-bin/3.3.5
--includedir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.5/include
--datadir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3.5
--mandir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3.5/man
--infodir=/usr/share/gcc-data/i686-pc-linux-gnu/3.3.5/info
--with-gxx-include-dir=/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.5/include/g++-v3
--host=i686-pc-linux-gnu --disable-altivec
--disable-nls --enable-__cxa_atexit
--enable-clocale=gnu --with-system-zlib
--disable-checking --disable-werror
--disable-libunwind-exceptions --enable-shared
--enable-threads=posix --disable-multilib
--disable-libgcj --enable-languages=c,c++
Thread model: posix
gcc version 3.3.5 (Gentoo Linux 3.3.5-r1, ssp-3.3.2-3,
pie-8.7.7.1)
$
$ clisp --version
GNU CLISP 2.33.2 (2004-06-02) (built 3322071952)
(memory 3322072066)
Software: GNU C 3.3.5 (Gentoo Linux 3.3.5-r1,
ssp-3.3.2-3, pie-8.7.7.1) ANSI C program
Features: (PCRE CLX-ANSI-COMMON-LISP CLX SYSCALLS
REGEXP CLOS LOOP COMPILER CLISP ANSI-CL COMMON-LISP
LISP=CL INTERPRETER SOCKETS GENERIC-STREAMS
LOGICAL-PATHNAMES SCREEN FFI GETTEXT UNICODE
BASE-CHAR=CHARACTER PC386 UNIX)
Installation directory: /usr/lib/clisp/
User language: ENGLISH
Machine: I686 (I686) ballpoint.Stanford.EDU [128.12.51.95]
Logged In: YES
user_id=887335
I got an error from the file attachment. I'll try to
attach it again.
Logged In: YES
user_id=5735
There have been patches on clisp-devel recently which were
supposed to fix this or something similar.
the patches have to be applied both to clisp and libsigsegv.
presumably, Bruno will review them and apply to clisp and
libsigsegv...
Logged In: YES
user_id=377168
[ME too], with an even simpler case:
(f -1) -> stack overflow, RESET
(cl::barf) (or other errors) -> core dump
Debian (April 2005 Hoary/Ubuntu) on Linux-386, both
clisp-2.33.2 from Debian as well as clisp-cvs (a few days
old), using libsigsegv-dev 2.1-1 packaged for Debian by Will
Newton.
I'll have to locate those sigsegv patches and see how I can
put them into my current Debian system (replacing the Debian
pre-built package).
Logged In: YES
user_id=377168
I asked Bruno Haible and he remembers/knows of no patches.
Furthermore, libsigsegv-cvs is unchanged since 2.1 (what I
have installed) w.r.t. i386 (mach and MacOSX changed), thus
Bruno suspects a bug in CLISP: maybe STACK is in a register
and not restored properly (I'll have to check whether my
build and also the Ubuntu/Debian clisp-2.33.2 build uses a
register variable for STACK).
Summary: the crash bug is still in cvs-clisp-2005-05-18, as
well as in Ubuntu's clisp-2.33.2 Debian package.
Logged In: YES
user_id=5735
patches are in this thread:
<http://thread.gmane.org/gmane.lisp.clisp.general/9405>
Logged In: YES
user_id=377168
Today's experimental results:
SAFETY=3 fixes the crash, while it is still present with SAFETY=2.
Now, where to look next?
Note that with SAFETY=2, STACK_register is not used, so that
should not be the culprit this time.
Well, actually, STACK_register was not used in my default
build anyway, since I'm using gcc-3.3 by default and
lispbibl.d disables it for GNUC_MINOR<4.
Logged In: YES
user_id=377168
As I noticed that SAFETY=3 disables generational GC, I tried
again with normal SAFETY settings but -DNO_GENERATIONAL_GC.
The bug disappears.
BTW, I'm still using the old sigsegv (i.e. without some
patches that should not affect i386 anyway).
I tried normal settings and -DDEBUG_SPVW. It crashed as
usual. I was surprised that the only debug output was, right
after program start:
STACK depth: 114415
SP depth: 67108956
I had expected some more messages from using that option.
Here's another way to crash:
[1]> (defun fact(n)(if (zerop n) 1 (* n (fact (1- n)))))
FACT
[2]> (fact -1)
*** - Program stack overflow. RESET
[3]> (room) ; or call (ext:gc)
Segmentation fault
which shows that the memory is corrupt -- somewhere
Here again, I'm surprised there's no output from DEBUG_SPVW.
Logged In: YES
user_id=377168
I now tried the following:
win32-native build using MS-VC 6.0 (instead of Linux/gcc)
clisp/cvs (from Friday, 1st of July)
libsigsegv/cvs (I'll have to use the cvs one on Linux also)
A. Build without libsigsegv:
(fact -1) -> *** - Lisp stack overflow. RESET, but no crash
(compile *) (f -1) -> crash & window requester "unknown
software exception" (0xc00000fd)
B. Build with libsigsegv (and working generational GC)
Lisp stack overflow is detected in both interpreted and
compiled mode, everything is fine.
Note that this version says
*** - Program stack overflow. RESET
in both cases, not "Lisp stack".
I.e., the bug is not present in the MS-VC/win32 version of
CLISP.
I believe the crash in the compiled function + nolibsigsegv
case is not new and due to weaker stack bounds checking with
compiled code (possibly still somewhat surprising w.r.t.
what I was used to from the Amiga, where CLISP sort of never
crashed in years).
Logged In: YES
user_id=377168
The bug is still present at least on Linux/i686:
[1]> (defun fact(n)(if (zerop n) 1 (* n (fact (1- n)))))
FACT
[2]> (fact -1)
Segmentation fault
Even in the interpreter! (identical crash when compiling first).
./lisp.run -B. --version
GNU CLISP 2.37 (2006-01-02) (built 2006-02-04 18:02:32)
Software: GNU C 4.0.2 20050808 (prerelease) (Ubuntu
4.0.1-4ubuntu9) gcc -W -Wswitch -Wcomment -Wpointer-arith
-Wimplicit -Wreturn-type -Wmissing-declarations
-Wno-sign-compare -O2 -fexpensive-optimizations
-DDYNAMIC_FFI -DDYNAMIC_MODULES -I. -x none libcharset.a
libavcall.a libcallback.a -lreadline -lncurses -ldl
-lsigsegv -L/usr/X11R6/lib -lX11
SAFETY=0 HEAPCODES LINUX_NOEXEC_HEAPCODES GENERATIONAL_GC
SPVW_BLOCKS SPVW_MIXED TRIVIALMAP_MEMORY
libsigsegv 2.2
Features: (CLISP ANSI-CL COMMON-LISP LISP=CL INTERPRETER
SOCKETS GENERIC-STREAMS LOGICAL-PATHNAMES SCREEN FFI GETTEXT
BASE-CHAR=CHARACTER PC386 UNIX)
C Modules: (clisp)
Installation directory: ./
User language: ENGLISH
Machine: I686 (I686) localhost.localdomain [127.0.0.1]
libsigsegv is the one from Peter van Eynde's people.debian.org,
version 2.2-2breezy
Logged In: YES
user_id=377168
Here's a work-around that does not disable generational GC:
Compile with -DCONS_HEAP_GROWS_UP
Note it can only make a difference on machines using
SPVW_MIXED_BLOCKS where TRIVIALMAP_MEMORY is defined (e.g.
Linux or MS-VC) -- see spvw.d
Logged In: YES
user_id=377168
The suggested work-arounds -DCONS_HEAP_GROWS_UP or
-DSAFETY=3 or -DNO_GENERATIONAL_GC have nothing to do with
the error. A simple clisp -m14MB also helps to make the Lisp
stack much larger in some configurations, so CLISP runs into a
(non-deadly) Lisp stack overflow instead of an (up to now
deadly) C program stack overflow.
I've identified the cause of the exit on the second stack
overflow:
* spvw_sigsegv.d (stackoverflow_handler) [UNIX]: libsigsegv doc
says to restore normal signal mask prior to leaving handler.
Expect a fix in CVS ASAP.
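
For readers wondering why a forgotten signal mask only bites on the second overflow, here is a minimal sketch of the mechanism. It is not the code from spvw_sigsegv.d and it does not use libsigsegv. As assumptions made purely for illustration, it uses SIGUSR1 in place of a real stack overflow, plain POSIX calls, and hypothetical names (on_signal, recover_point, count_deliveries). The handler leaves via siglongjmp, the way a RESET unwinds to the top level. If the mask is not restored first, the signal stays blocked and the next one is never delivered to the handler. For a real SIGSEGV raised by a fault while blocked, the process is typically killed, which would match the crash on the second overflow.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* All names here are hypothetical; none come from CLISP or libsigsegv. */
static sigjmp_buf recover_point;                 /* stand-in for the RESET target */
static sigset_t normal_mask;                     /* mask in effect outside the handler */
static volatile sig_atomic_t deliveries;         /* number of times the handler ran */
static volatile sig_atomic_t restore_in_handler; /* 1 = restore the mask before leaving */

static void on_signal(int sig)
{
    (void)sig;
    deliveries++;
    if (restore_in_handler) {
        /* Undo the automatic blocking of SIGUSR1 before jumping out. */
        sigprocmask(SIG_SETMASK, &normal_mask, NULL);
    }
    /* Leave the handler without returning from it. */
    siglongjmp(recover_point, 1);
}

/* Raise SIGUSR1 twice and return how often the handler actually ran. */
int count_deliveries(int restore_mask)
{
    struct sigaction sa, ignore, old_sa;
    sigset_t entry_mask;
    volatile int round;
    int rc;

    deliveries = 0;
    restore_in_handler = restore_mask;

    /* Remember the caller's mask and make sure SIGUSR1 starts unblocked. */
    sigprocmask(SIG_SETMASK, NULL, &entry_mask);
    normal_mask = entry_mask;
    sigdelset(&normal_mask, SIGUSR1);
    sigprocmask(SIG_SETMASK, &normal_mask, NULL);

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    rc = sigaction(SIGUSR1, &sa, &old_sa);
    assert(rc == 0);
    (void)rc;

    for (round = 0; round < 2; round++) {
        /* savemask = 0: siglongjmp will NOT restore the signal mask. */
        if (sigsetjmp(recover_point, 0) == 0) {
            raise(SIGUSR1);
        }
    }

    /* Discard a still-pending SIGUSR1, then restore the caller's state. */
    memset(&ignore, 0, sizeof ignore);
    ignore.sa_handler = SIG_IGN;
    sigemptyset(&ignore.sa_mask);
    sigaction(SIGUSR1, &ignore, NULL);
    sigaction(SIGUSR1, &old_sa, NULL);
    sigprocmask(SIG_SETMASK, &entry_mask, NULL);

    return (int)deliveries;
}
```

Under these assumptions, count_deliveries(0) should report a single delivery (the second signal stays blocked), while count_deliveries(1) should report two.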
Logged In: YES
user_id=377168
thank you for your bug report.
the bug has been fixed in the CVS tree.
you can either wait for the next release (recommended)
or check out the current CVS tree (see http://clisp.cons.org)
and build CLISP from the sources (be advised that between
releases the CVS tree is very unstable and may not even build
on your platform).