From: Daniel F. <drf...@gm...> - 2007-12-08 21:38:16
On Dec 8, 2007 2:23 AM, Nikodemus Siivola <nik...@ra...> wrote:
> On Dec 8, 2007 2:51 AM, Daniel Farina <drf...@gm...> wrote:
>
> > I have recently determined that SBCL seems to be having an issue with
> > garbage collection grabbing a recursive lock. I have found this bug
> > affects every version I have tested from the latest to sbcl 1.0.0.
> >
> > The hardware is a core 2 duo (/proc/cpuinfo for one of these machines
> > appended) running GNU/Linux. I have determined that on my two to three
> > hour test set that this bug may occur one in two times. Unfortunately
> > it's a total blocker at that point, so it's debilitating.
>
> Just a sanity check: nothing's binding *CURRENT-THREAD* by any chance,
> using GET-MUTEX with an explicit NEW-OWNER, or frobbing MUTEX-OWNER/VALUE?
>

I don't write any odd threaded code here that would do such a thing. I
suppose it's possible that one of the libraries I use (most obviously
relevant: drakma, hunchentoot, bordeaux-threads for a portable
WITH-MUTEX) might.

> Does the issue manifest if you use (restrict-compiler-policy 'safety 3)?
>

I'll give this a try, but any sort of verification is going to take
about a day to run at least a few trials. I'll get started on this and
let you know.

> There is also a known (and fixed) issue in SBCLs in the range 1.0.7.?
> - 1.0.9.38 that could
> manifest in this manner -- stay away from those. I suggest you run with 1.0.12.
>
> > I'm poking around in the gc, interrupt, signal, and thread sections of
> > SBCL, but it takes a long time to confirm or deny if anything I'm doing
>
> Are you saying you're looking for the problem, or are you saying you're running
> a patched SBCL?

Patching SBCL: only in the last day or so (around the time I wrote this
email). Prior to that I had been trying to see if I could find a
revision of SBCL that worked to try and identify a regression-causing
changeset.

> At any rate, what's in *FEATURES*? To verify that you have a sane build, please
> send the results of (DISASSEMBLE 'SB-THREAD:GET-MUTEX).
>

*FEATURES*:

(:OS-PROVIDES-DLADDR :SB-THREAD :ANSI-CL :COMMON-LISP :SBCL :SB-DOC
 :SB-TEST :SB-LDB :SB-PACKAGE-LOCKS :SB-UNICODE :SB-EVAL
 :SB-SOURCE-LOCATIONS :IEEE-FLOATING-POINT :X86 :UNIX :ELF :LINUX
 :LARGEFILE :GENCGC :STACK-GROWS-DOWNWARD-NOT-UPWARD
 :C-STACK-IS-CONTROL-STACK :COMPARE-AND-SWAP-VOPS
 :UNWIND-TO-FRAME-AND-CALL-VOP :STACK-ALLOCATABLE-CLOSURES
 :ALIEN-CALLBACKS :LINKAGE-TABLE :OS-PROVIDES-DLOPEN :OS-PROVIDES-PUTWC
 :OS-PROVIDES-SUSECONDS-T)

(DISASSEMBLE 'SB-THREAD:GET-MUTEX):

This is long, so I have attached it. WARNING: It comes from a patched
version of SBCL that has some minor changes to SUB-GC. When I get back
home tonight I will have time to build an unaltered SBCL if this
disassembly seems bad.

> > has an effect. Send any patches that you think will affect the bug or
> > test cases that you think may be able to reproduce the bug (I may
> > get around to writing such a test case eventually).
> >
> > Some more obvious causes for failure:
> >
> > • without-interrupts could be not working entirely properly
> > • without-interrupts may not be called in the right places to
> >   guard everything that needs to be guarded from interrupts
>
> I don't see how this would be the cause. (Which of course doesn't mean that
> it can't be -- I just don't find it likely.) SUB-GC and GET-MUTEX are
> the obvious
> candidates.

Okay, I had been staring at the SUB-GC procedure for a while and had
been wondering if everything is being guarded properly even though I
could not identify a principled reason for failure there after close
inspection.

> > Without further ado, here are three separate condition dumps:
>
> Which versions of SBCL are these from?

These vary, unfortunately. The error tended to be similar, so I just
tried to get a nice orthogonal set of errors. I can generate you new
condition dumps from whatever version you wish.

fdr