#520 Terminal kill under 2.44 debian might lead to hungry zombie

Milestone: segfault
Status: closed-fixed
Owner: Sam Steingold
Priority: 5
Updated: 2010-04-11
Created: 2009-05-22
Creator: Big Mike
Private: No

If you start clisp in a terminal and then kill the terminal window, sometimes a renegade "lisp.run" process is left behind and goes to 100% CPU usage.
I am having a hard time reproducing it reliably, but the following steps have led to the problem:

start a terminal
type clisp
kill the terminal window with the mouse

open another terminal
top
watch the zombie run at 100% cpu until you kill it...
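
The check in the last step can also be scripted instead of watching top. The following is a minimal Python sketch, not part of the original report: the helper names (`parse_ps`, `find_runaways`, `current_runaways`) and the 50% threshold are assumptions chosen for illustration, and it relies on the procps-style `ps -eo pid,ppid,pcpu,comm` output. Note that `ps` reports CPU use averaged over the process lifetime, so a fresh runaway can take a while to cross the threshold.

```python
import subprocess


def parse_ps(output):
    """Parse `ps -eo pid,ppid,pcpu,comm` text into a list of dicts (header skipped)."""
    rows = []
    for line in output.strip().splitlines()[1:]:
        parts = line.split(None, 3)
        if len(parts) < 4:
            continue  # ignore malformed lines
        pid, ppid, pcpu, comm = parts
        rows.append({
            "pid": int(pid),
            "ppid": int(ppid),
            "pcpu": float(pcpu),
            "comm": comm.strip(),
        })
    return rows


def find_runaways(rows, name="lisp.run", min_cpu=50.0):
    """Return the rows for processes called `name` using at least `min_cpu` percent CPU."""
    return [r for r in rows if r["comm"] == name and r["pcpu"] >= min_cpu]


def current_runaways():
    """Run ps on the local machine and report suspicious lisp.run processes."""
    out = subprocess.run(
        ["ps", "-eo", "pid,ppid,pcpu,comm"],
        capture_output=True, text=True, check=True,
    ).stdout
    return find_runaways(parse_ps(out))
```

Calling `current_runaways()` after killing the terminal window returns an empty list when no stray process was left behind.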

Discussion

  • Sam Steingold
    2009-05-22

    I cannot reproduce this.
    If you can, please compile clisp with debugging symbols
    (http://clisp.podval.org/impnotes/faq.html#faq-debug)
    and attach gdb to the run-away clisp and send us the backtrace.
    If you cannot recompile clisp, please use strace and ltrace to figure out what clisp is doing.
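
The diagnostics requested above can be collected non-interactively. This is a hedged Python sketch rather than an official CLISP procedure: the function names are hypothetical, it assumes gdb, strace and ltrace are installed, and attaching to a process normally requires being the same user or root. The command-line options used (`gdb -p PID -batch -ex CMD`, `strace -p PID`, `ltrace -p PID`) are standard ones.

```python
import subprocess


def gdb_backtrace_cmd(pid, max_frames=None):
    """Build a gdb command that attaches to `pid`, prints a backtrace and exits.

    -p attaches to the running process; -batch makes gdb exit after the -ex
    commands have run. `bt N` limits the output to the innermost N frames.
    """
    bt = "bt" if max_frames is None else "bt %d" % max_frames
    return ["gdb", "-p", str(pid), "-batch", "-ex", bt]


def strace_cmd(pid):
    """Build a command that traces the system calls of the running process."""
    return ["strace", "-p", str(pid)]


def ltrace_cmd(pid):
    """Build a command that traces the library calls of the running process."""
    return ["ltrace", "-p", str(pid)]


def capture_backtrace(pid, max_frames=None, timeout=120):
    """Run gdb against `pid` and return whatever it printed on stdout."""
    result = subprocess.run(
        gdb_backtrace_cmd(pid, max_frames),
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout
```

For a process stuck in deep recursion, passing a frame limit such as `capture_backtrace(pid, max_frames=100)` keeps the output manageable.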

     
  • Sam Steingold
    2009-06-16

    This does not look too informative: the mprotect calls point in the direction of GC.
    I am afraid I won't be able to do much unless I get a Lisp backtrace,
    i.e., you need to build clisp --with-debug and attach gdb to the zombie.

    Bruno, could you please take a look at Don's straces?

     
  • Sam Steingold
    2009-06-16

    • labels: --> memory management
    • milestone: --> segfault
    • assigned_to: nobody --> haible
     
  • Brian De Wolf
    2010-04-07

    I've recently run across this and can contribute debugging information. For the processes I've looked at, their backtraces are huge (tens of thousands of frames) and they consist of a repeat of this:

    ...
    #36462 0x0000000000507ea0 in check_variable_value_replacement ()
    #36463 0x000000000045cc42 in interpret_bytecode_ ()
    #36464 0x0000000000465de4 in funcall_closure ()
    #36465 0x0000000000465ebd in funcall ()
    #36466 0x0000000000502f90 in C_invoke_debugger ()
    #36467 0x0000000000455545 in funcall_subr ()
    #36468 0x0000000000465ea7 in funcall ()
    #36469 0x00000000005028b0 in signal_and_debug ()
    #36470 0x0000000000504149 in end_error ()
    #36471 0x0000000000505b90 in prepare_error ()
    #36472 0x00000000005041be in error ()
    #36473 0x0000000000507ea0 in check_variable_value_replacement ()
    #36474 0x000000000045cc42 in interpret_bytecode_ ()
    #36475 0x0000000000465de4 in funcall_closure ()
    #36476 0x0000000000465ebd in funcall ()
    #36477 0x0000000000502f90 in C_invoke_debugger ()
    #36478 0x0000000000455545 in funcall_subr ()
    #36479 0x0000000000465ea7 in funcall ()
    #36480 0x00000000005028b0 in signal_and_debug ()
    #36481 0x0000000000504149 in end_error ()
    #36482 0x0000000000505a19 in OS_error ()
    #36483 0x000000000048c3e5 in low_listen_unbuffered_handle ()
    #36484 0x000000000048a945 in listen_char_unbuffered ()
    #36485 0x000000000048ac86 in listen_char_terminal3 ()
    #36486 0x00000000004a0c59 in listen_char ()
    #36487 0x0000000000501562 in read_form ()
    #36488 0x000000000050229a in C_read_eval_print ()
    #36489 0x0000000000455545 in funcall_subr ()
    #36490 0x0000000000465eeb in funcall ()
    #36491 0x000000000045e130 in interpret_bytecode_ ()
    #36492 0x0000000000465de4 in funcall_closure ()
    #36493 0x0000000000465ebd in funcall ()
    #36494 0x000000000046bf9e in C_driver ()
    #36495 0x000000000045e276 in interpret_bytecode_ ()
    #36496 0x0000000000465de4 in funcall_closure ()
    #36497 0x0000000000465ebd in funcall ()
    #36498 0x000000000045ea86 in interpret_bytecode_ ()
    #36499 0x0000000000465de4 in funcall_closure ()
    #36500 0x0000000000465ebd in funcall ()
    #36501 0x000000000045ea86 in interpret_bytecode_ ()
    #36502 0x0000000000465de4 in funcall_closure ()
    #36503 0x0000000000465ebd in funcall ()
    #36504 0x000000000045ea86 in interpret_bytecode_ ()
    #36505 0x0000000000465de4 in funcall_closure ()
    #36506 0x0000000000465ebd in funcall ()
    #36507 0x00000000005011d2 in driver ()
    #36508 0x000000000044db69 in main_actions ()
    #36509 0x0000000000450eae in main ()

    The omitted frames just repeat the pattern visible at the beginning of this excerpt.
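
With tens of thousands of frames, the repetition is easier to confirm mechanically than by paging through gdb. The sketch below is an illustration only: `parse_frames` and `find_cycle` are hypothetical helpers, and the regular expression assumes gdb's usual `#N 0xADDR in name ()` frame layout. It extracts the function names and reports the shortest group of frames that repeats from the innermost frame. In the excerpt above the repeating group runs from #36462 to #36472, i.e. 11 frames.

```python
import re

# Matches lines such as "#36462 0x0000000000507ea0 in check_variable_value_replacement ()".
# The address part is optional because gdb omits it for some frames.
FRAME_RE = re.compile(r"^\s*#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([A-Za-z_]\w*)\s*\(")


def parse_frames(lines):
    """Return the function name of every frame line, innermost first.

    Lines that are not frames (pager prompts, "..." markers) are skipped.
    """
    names = []
    for line in lines:
        match = FRAME_RE.match(line)
        if match:
            names.append(match.group(1))
    return names


def find_cycle(names, min_repeats=2):
    """Return the shortest period p such that the first p frames repeat
    at least `min_repeats` times in a row, or None if there is no such p."""
    for p in range(1, len(names) // min_repeats + 1):
        if all(names[i] == names[i + p] for i in range(p * (min_repeats - 1))):
            return p
    return None
```

Feeding it the full backtrace text, for example `find_cycle(parse_frames(text.splitlines()))`, gives the length of the repeating group, which can then be compared with the frames shown here.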

    It seems to occur at random. I think it is more likely when there is input sitting on the line, but it's hard to say.

    I'm running Gentoo Linux, kernel 2.6.31-gentoo-r6, with CLISP version 2.48. Let me know if there's anything else I can provide.

     
  • Sam Steingold
    2010-04-11

    • assigned_to: haible --> sds
    • status: open --> closed-fixed
     
  • Sam Steingold
    2010-04-11

    Thank you for your bug report.
    The bug has been fixed in the CVS tree.
    You can either wait for the next release (recommended)
    or check out the current CVS tree (see http://clisp.cons.org)
    and build CLISP from the sources (be advised that between
    releases the CVS tree is very unstable and may not even build
    on your platform).