#568 seg fault from cvs MT clisp due to show-stack

segfault
closed-fixed
5
6 days ago
2010-09-28
Don Cohen
No

The segfault doesn't happen if I just run clisp and call show-stack in the main thread, but it does after doing this:
[1]> (defun show-socket-addrs(socket)
  (multiple-value-bind
      (local-host local-port)
      (socket:socket-stream-local socket)
    (multiple-value-bind
        (remote-host remote-port)
        (socket:socket-stream-peer socket)
      (format t "~&Connection: ~S:~D -- ~S:~D~%"
              remote-host remote-port local-host local-port))))
SHOW-SOCKET-ADDRS
[2]> (defun debug-server()
  (let ((server (socket:socket-server 8217 :interface "localhost")))
    (unwind-protect
        (loop
          (let ((socket (socket:socket-accept server :buffered nil)))
            (show-socket-addrs socket)
            (let ((tlist (loop for x in (mt:list-threads) as i from 1
                               when (mt:thread-active-p x) collect (cons i x)))
                  ans)
              (print tlist socket)
              (print "debug which thread (enter the number)" socket)
              (when (setf ans (cdr (assoc (read socket) tlist)))
                (mt:thread-interrupt
                 ans
                 :function
                 (lambda nil
                   (let ((*standard-input* socket)
                         (*standard-output* socket)
                         (*debug-io* socket)
                         (*error-output* socket)
                         (*trace-output* socket)
                         (*query-io* socket))
                     (unwind-protect
                         (break "debug")
                       (close socket)))))))))
      (socket:socket-server-close server))))
DEBUG-SERVER
[3]> (mt:make-thread #'debug-server :name "debug-server")
#<THREAD "debug-server">
[4]>

then in another shell:
$ telnet localhost 8217
Trying 127.0.0.1...
Connected to localhost.localdomain (127.0.0.1).
Escape character is '^]'.

((1 . #<THREAD "debug-server">) (2 . #<THREAD "main thread">))
"debug which thread (enter the number)" 1
1

** - Continuable Error
debug
If you continue (by typing 'continue'): Return from BREAK loop
Break 1 [1]> (ext:show-stack)
(ext:show-stack)

... 670 more lines
<21/5> #<SYSTEM-FUNCTION MAKE-THREAD>
- Connection closed by foreign host.
[2010-09-28 12:54:50 root@collabrium /home/metasearch]
$

The connection is closed because the clisp process has crashed; its terminal now shows:
Segmentation fault
[2010-09-28 12:54:51 root@collabrium /home/metasearch]
$

Discussion

  • The problem is similar to bug#1506316.
    It is caused by debug.d:show_stack():1508 not properly detecting the bottom of the stack.
    In this case STACK_start == FRAME, but cmpSTACKop is defined as >, so the stack is traversed beyond its end.
    Shouldn't the comparison be reversed: (FRAME cmpSTACKop (gcv_object_t*)STACK_start)? Would this work if STACK_DOWN is defined?

    The problem does not occur in the first thread of the process, since its stack is allocated differently (at least page-aligned, in spvw.d:init_memory()).
    show-stack in this thread ends with the following:
    <20/3> #<SYSTEM-FUNCTION SYSTEM::DRIVER>
    - #<ADDRESS #x00000000>
    - #<ADDRESS #x00000000>
    - #<ADDRESS #x00000000>
    20

    Why are there 3 nullobjs at the end? By design there should be just 2. It looks like this works only because of uninitialized memory.
    With a thread created by mt:make-thread, show-stack shows:
    - #<FUNCTION :LAMBDA NIL (LOOP (SLEEP 1))>
    - #<ADDRESS #x00000000>
    - #<ADDRESS #x00000000>
    <17/5> #<SYSTEM-FUNCTION MAKE-THREAD>
    and segfaults here (the two nullobjs are the bottom of the stack, and beyond them there is some uninitialized memory).

     
  • thank you for your bug report.
    the bug has been fixed in the CVS tree.
    you can either wait for the next release (recommended)
    or check out the current CVS tree (see http://clisp.cons.org)
    and build CLISP from the sources (be advised that between
    releases the CVS tree is very unstable and may not even build
    on your platform).

     
    • status: open --> closed-fixed
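
The boundary condition described in the first comment (a stop test written with a strict comparison, which does not fire when the cursor is exactly equal to the stack boundary) can be illustrated with a small sketch. This is not CLISP source and not the actual fix: the function names, the index-based model of the stack, and the guard slot are all hypothetical, chosen only to show how strict and non-strict stop tests differ when the cursor reaches the boundary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not CLISP code: slots[top .. bottom-1] are the valid
   stack slots, and slots[bottom] is one past the bottom of the stack.
   The caller must allocate at least bottom+1 slots so that the strict
   walker's extra read stays inside the array in this demonstration. */

/* Stops only when the cursor is strictly past the boundary, so the slot
   at index `bottom` itself is still visited (one slot too many). */
size_t walk_strict(const int *slots, size_t top, size_t bottom, int *last)
{
    size_t visited = 0;
    assert(top <= bottom);
    for (size_t i = top; !(i > bottom); i++) {
        *last = slots[i];          /* reads slots[bottom] on the final pass */
        visited++;
    }
    return visited;
}

/* Stops as soon as the cursor reaches the boundary, so only valid slots
   are visited. */
size_t walk_inclusive(const int *slots, size_t top, size_t bottom, int *last)
{
    size_t visited = 0;
    assert(top <= bottom);
    for (size_t i = top; !(i >= bottom); i++) {
        *last = slots[i];
        visited++;
    }
    return visited;
}
```

With four valid slots followed by one guard slot, walk_strict visits five slots and ends on the guard value, while walk_inclusive visits four. When the cursor starts exactly at the boundary (top == bottom), walk_strict still reads one slot and walk_inclusive reads none. In a real stack there is no guard slot, so the extra read lands in whatever memory follows.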