From: Gábor M. <me...@re...> - 2009-01-03 16:28:18
On Tuesday 16 December 2008, Gábor Melis wrote:
> On Tuesday 16 December 2008, Juho Snellman wrote:
> > "Attila Lendvai" <att...@gm...> writes:
> > > dear list,
> > >
> > > our webapp stopped responding and slime was unable to connect, so
> > > i've attached a gdb to see what's going on. please take a look at
> > > the attached script.
> > >
> > > the backtrace of two threads are different, the others are very
> > > similar. i've slightly edited the script to move the interesting
> > > stuff towards the beginning.
> > >
> > > to me it looks like a gc hang, but i hope some more trained eyes
> > > can extract more info from it.
> >
> > Can't say that the backtrace makes a lot of sense...
> >
> > All threads except one are waiting to be restarted after having
> > been stopped for GC. The one thread that isn't is instead blocked
> > on a lock that should already have been acquired before any of the
> > other threads were stopped. And from the backtrace I can't see what
> > other thread could possibly be holding the lock.
> >
> > The only explanation I can see is that gc_stop_the_world is getting
> > called twice without an intervening gc_start_the_world. But that
> > should be impossible.
>
> I'm just thinking aloud.
>
> There is a scav_lose frame at the bottom of the backtrace of the
> thread in gc_stop_the_world. If it's real, then possibly the
> dereferencing of the pointer in scav_lose triggers a memory fault and
> the Lisp handler tries to run GC again. To go further out on this
> branch, let's say that the
>
>   (unless (eq sb!thread:*current-thread*
>               (sb!thread:mutex-value *already-in-gc*)) ...)
>
> check in SUB-GC fails because only one of the two references has
> been scavenged, and stop_the_world ends up being called again.
>
> It's a _very_ long shot, though. Especially because scav_lose should
> only be called with pointers into the heap by scavenge(), like this:
>
>   (scavtab[widetag_of(object)])(object_ptr, object)
>
> but why isn't scav_lose simply taking widetag_of(object)???
>
>   static long
>   scav_lose(lispobj *where, lispobj object)
>   {
>       lose("no scavenge function for object 0x%08x (widetag 0x%x)\n",
>            (unsigned long)object,
>            widetag_of(*(lispobj*)native_pointer(object)));
>
>       return 0; /* bogus return value to satisfy static type checking */
>   }

Fixed scav_lose in 1.0.24.8. Hopefully, this will make backtraces from
similar failures clearer.
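Why the question matters: scav_lose is only reached through the scavtab dispatch, so the widetag that selected it can be computed from `object` itself. Dereferencing native_pointer(object) is unnecessary and can fault on exactly the kind of bogus pointer that led here. The following is a standalone C mock, not the actual 1.0.24.8 change. lispobj, the 8-bit WIDETAG_MASK, widetag_of, scav_immediate, scavenge_one and last_report are simplified hypothetical stand-ins. Unlike SBCL's lose(), which reports and aborts, this version records the message so it can be inspected.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins, not SBCL's real definitions. */
typedef uintptr_t lispobj;
#define WIDETAG_MASK 0xff /* assumption: low 8 bits hold the widetag */

static unsigned widetag_of(lispobj obj) { return (unsigned)(obj & WIDETAG_MASK); }

/* Last failure message; stands in for lose(), which would abort. */
static char last_report[128];

typedef long (*scav_fn)(lispobj *where, lispobj object);
static scav_fn scavtab[256];

static long scav_lose(lispobj *where, lispobj object)
{
    (void)where;
    /* Use the widetag of the object word itself: the same value that
       indexed scavtab. Nothing is dereferenced, so a bad object cannot
       cause a second memory fault while we are already failing. */
    snprintf(last_report, sizeof last_report,
             "no scavenge function for object 0x%lx (widetag 0x%x)",
             (unsigned long)object, widetag_of(object));
    return 0;
}

/* Hypothetical handler for an immediate object: nothing to scavenge. */
static long scav_immediate(lispobj *where, lispobj object)
{
    (void)where;
    (void)object;
    return 1; /* words consumed */
}

static void init_scavtab(void)
{
    memset(last_report, 0, sizeof last_report); /* clear any old report */
    for (int i = 0; i < 256; i++)
        scavtab[i] = scav_lose; /* default: unknown widetag */
    scavtab[0x00] = scav_immediate;
}

/* Mirrors the dispatch quoted above:
   (scavtab[widetag_of(object)])(object_ptr, object) */
static long scavenge_one(lispobj *object_ptr)
{
    assert(object_ptr != NULL);
    lispobj object = *object_ptr;
    return (scavtab[widetag_of(object)])(object_ptr, object);
}
```

With this shape, a word whose widetag has no handler, such as 0x1234ab, produces a report naming widetag 0xab, and the report is built without touching memory the word might point to.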