From: Jeff C. <je...@jk...> - 2010-10-07 16:05:52
Forgot to cc the list.

-----Original Message-----
From: Attila Lendvai <att...@gm...>
To: Jeff Cunningham <je...@jk...>
Subject: Re: [Sbcl-devel] Started getting memory fault in core dumps?
Date: Thu, 7 Oct 2010 15:03:05 +0200

> Any ideas?

if you start a swank server, then this might be related:

https://bugs.launchpad.net/slime/+bug/444427

although not probable if it only happened after some upgrades...


Maybe low probability but good hunch, Attila. After browsing through that bug report I tried unsetting the swank *log-output* and it seems to fix it:

  (defun start-the-servers()
    (setf swank:*log-output* nil)
    (handler-bind ((serious-condition (lambda (c)
                                        (declare (ignore c))
                                        (sb-debug:backtrace))))
      (rest-of-code)))

Now the question is why? Or maybe better, why now?

Do I injure my code in any way from just leaving the swank:*log-output* set to NIL there?

Regards,
Jeff
From: Jeff C. <je...@jk...> - 2010-10-07 16:06:26
Forgot to cc the list.

-------- Forwarded Message --------
From: Jeff Cunningham <je...@jk...>
To: Nikodemus Siivola <nik...@ra...>
Subject: Re: [Sbcl-devel] Started getting memory fault in core dumps?
Date: Thu, 07 Oct 2010 11:48:59 -0400

Thank you for the response. I didn't know about sbcl-help so I just subscribed. Comments interspersed below:

--Jeff

-----Original Message-----
From: Nikodemus Siivola <nik...@ra...>

On 7 October 2010 14:29, Jeff Cunningham wrote:

(Moving to sbcl-help.)

[lengthy explanation snipped...]

Any number of things could be causing this.

* Finalizers from before the core is saved trying to release foreign memory allocated in the previous session. The :DONT-SAVE keyword argument to FINALIZE is for preventing that from happening. Happily, this is easy to rule out:

    (setf sb-impl::**finalizer-store** nil)

  right at the start of START-THE-SERVERS. If it makes the error go away, then it's a stale finalizer and all you need to do is to identify where it comes from, and add :DONT-SAVE to it.

I tried this right away, of course, but it made no difference.

* Something, somewhere, may be e.g. grabbing a VECTOR-SAP (the actual memory address of a vector) and trying to reuse it for the second invocation.

* Something, somewhere, could be allocating foreign memory before the core is saved, and blindly assuming that the foreign memory will remain valid.

These two (and related ones) are a bit trickier to identify. The best way is probably to try to identify just when the memory fault occurs -- which is likely to make identifying the underlying problem a lot easier.

First try to get a backtrace from the error:

  (defun start-the-servers ()
    (handler-bind ((serious-condition (lambda (c)
                                        (declare (ignore c))
                                        (sb-debug:backtrace))))
      (stuff)))

with luck, this will set you on the right track.

Here is the backtrace I get when I try to launch the core after building in the backtrace:

starting servers
CORRUPTION WARNING in SBCL pid 13997(tid 140737353971456):
Memory fault at ae03e000 (pc=0x1000b7f955, sp=0x7ffff4c72670)
The integrity of this image is possibly compromised.
Continuing with fingers crossed.
0: (SB-DEBUG::MAP-BACKTRACE #<CLOSURE (LAMBDA #) {1004E1A009}>)[:EXTERNAL]
1: (BACKTRACE 1152921504606846975 #<SYNONYM-STREAM :SYMBOL *TERMINAL-IO* {10002DD531}>)
2: (SIGNAL #<SB-SYS:MEMORY-FAULT-ERROR {1004E19A81}>)[:EXTERNAL]
3: (ERROR SB-SYS:MEMORY-FAULT-ERROR)[:EXTERNAL]
4: (SB-SYS:MEMORY-FAULT-ERROR)
5: ("foreign function: call_into_lisp")
6: ("foreign function: post_signal_tramp")
7: (SB-IMPL::OUTPUT-BYTES/UTF-8 #<SB-SYS:FD-STREAM for "standard error" {1000349001}> ";; Swank started at port: " NIL 0 26)
8: ((LAMBDA (&REST REST)) #<SB-SYS:FD-STREAM for "standard error" {1000349001}> ";; Swank started at port: " NIL 0 26)
9: ((LAMBDA (&REST REST)) #<SB-SYS:FD-STREAM for "standard error" {1000349001}> ";; Swank started at port: " NIL 0 26)[:OPTIONAL]
10: (SB-IMPL::FD-SOUT #<SB-SYS:FD-STREAM for "standard error" {1000349001}> ";; Swank started at port: " 0 26)
11: (SB-IMPL::%WRITE-STRING ";; Swank started at port: " #<SB-SYS:FD-STREAM for "standard error" {1000349001}> 0 NIL)
12: ((LAMBDA (STREAM &OPTIONAL (#:FORMAT-ARG1072 (ERROR ??? :COMPLAINT "required argument missing" :CONTROL-STRING "~&;; Swank started at port: ~D.~%" :OFFSET 29)) &REST SB-FORMAT::ARGS)) #<SB-SYS:FD-STREAM for "standard error" {1000349001}> 4005)
13: (FORMAT #<SB-SYS:FD-STREAM for "standard error" {1000349001}> #<FUNCTION (LAMBDA #) {100207E6A9}>)[:EXTERNAL]
14: (SWANK::SIMPLE-ANNOUNCE-FUNCTION 4005)
15: (SWANK::SETUP-SERVER 4005 #<FUNCTION SWANK::SIMPLE-ANNOUNCE-FUNCTION> :SPAWN T "iso-latin-1-unix")
16: (WEBSERVER::START-THE-SERVERS)
17: (SB-INT:CALL-HOOKS "initialization" (#<FUNCTION WEBSERVER::START-THE-SERVERS>))[:EXTERNAL]
18: ((LABELS SB-IMPL::RESTART-LISP))
Error: Problem running initialization hook #<FUNCTION WEBSERVER::START-THE-SERVERS>:
Unhandled memory fault at #x2AAAAE03E000.

It looks like it has something to do with a foreign function call. Unfortunately, there are a number of these to track down.

Regards,
Jeff
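
The :DONT-SAVE argument to FINALIZE mentioned in the quoted advice can be sketched roughly as follows. This is only an illustration under stated assumptions: MAKE-FOREIGN-BUFFER and its list-based handle are hypothetical names, not code from this thread, and the sketch assumes SBCL's SB-EXT:FINALIZE together with SB-ALIEN:MAKE-ALIEN and SB-ALIEN:FREE-ALIEN.

```lisp
;;; Sketch only. MAKE-FOREIGN-BUFFER and the list-based handle are
;;; hypothetical names used for illustration.
(defun make-foreign-buffer (size)
  "Allocate SIZE bytes of foreign memory and return a handle that owns it."
  (let* ((pointer (sb-alien:make-alien sb-alien:unsigned-char size))
         (handle (list pointer)))
    ;; The closure captures POINTER, never HANDLE: a finalizer that
    ;; references its own object would keep that object alive forever.
    ;; :DONT-SAVE T drops the finalizer when a core is saved, so the
    ;; restarted image does not try to free an address that belonged
    ;; to the previous session.
    (sb-ext:finalize handle
                     (lambda () (sb-alien:free-alien pointer))
                     :dont-save t)
    handle))
```

A finalizer registered without :DONT-SAVE would survive into the saved core and could later run against memory that no longer exists, which is the stale-finalizer case described above.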
From: Nikodemus S. <nik...@ra...> - 2010-10-07 17:23:01
On 7 October 2010 19:05, Jeff Cunningham <je...@jk...> wrote:

> (defun start-the-servers()
>   (setf swank:*log-output* nil)
>   (handler-bind ((serious-condition (lambda (c)
>                                       (declare (ignore c))
>                                       (sb-debug:backtrace))))
>     (rest-of-code)))
>
> Now the question is why? Or maybe better, why now?

No idea why this did not cause trouble previously. It should have.

My best guess is that your updates added mmap randomization, and that previously the mmap'ed buffer used by sb-sys:*stderr* (which is what *log-output* points to) just happened to have the same address in each invocation. When SBCL is started the order and size of mmap calls are predictable, and if the OS hands out memory linearly then things could very well have previously worked out by accident.

> Do I injure my code in any way from just leaving the swank:*log-output*
> set to NIL there?

No, you don't. And if you update Slime from CVS, you don't need to do that anymore. I just added an appropriate *save-hook* to swank-sbcl.lisp so this is now done automagically.

Thanks to Attila for pointing out the issue!

Cheers,

 -- Nikodemus
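
The thread does not show the code that went into swank-sbcl.lisp. A minimal sketch of the same idea, assuming SBCL's SB-EXT:*SAVE-HOOKS* (the list of functions SBCL calls before writing a core), might look like the following. CLEAR-SWANK-LOG-OUTPUT is a hypothetical name chosen for illustration, not necessarily what Slime uses.

```lisp
;;; Sketch only: clear Swank's log stream before a core is saved, so the
;;; restarted image does not write through an fd-stream whose buffer
;;; belonged to the previous session.
;;; CLEAR-SWANK-LOG-OUTPUT is a hypothetical name.
(defun clear-swank-log-output ()
  ;; Look the symbol up at run time so this file loads even when
  ;; Swank has not been loaded into the image.
  (let* ((package (find-package "SWANK"))
         (symbol (and package (find-symbol "*LOG-OUTPUT*" package))))
    (when (and symbol (boundp symbol))
      (setf (symbol-value symbol) nil))))

;; Register the function so it runs before the core is saved.
#+sbcl
(pushnew 'clear-swank-log-output sb-ext:*save-hooks*)
```

With a hook like this in place, the explicit (setf swank:*log-output* nil) at the top of START-THE-SERVERS should no longer be needed, since the variable is already NIL when the saved core starts.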
From: Jeff C. <je...@jk...> - 2010-10-07 17:55:48
-----Original Message-----
From: Nikodemus Siivola <nik...@ra...>

No, you don't. And if you update Slime from CVS, you don't need to do that anymore. I just added an appropriate *save-hook* to swank-sbcl.lisp so this is now done automagically.

Thanks to Attila for pointing out the issue!

Cheers,

 -- Nikodemus


Great! Thanks for taking care of that. I'm going to check it out right now and make sure it works with your fix.

Thanks!
--Jeff