From: Liam H. <ln...@he...> - 2009-03-16 22:49:21
Yes, indeed you are correct. Paul Khuong alerted me to this problem
on #lisp over the weekend. I have confirmed with Viktor that my patch
solves the problem:
http://repo.or.cz/w/gsll.git?a=commitdiff;h=40e1bfcee70b79a35827c49faf54b7d47d31a694;hp=b009f3e6347a96c325bab2facecc8ae790defc05

Thanks again for digging into this problem; I had completely
overlooked the stuffing of a pointer into the C struct outside of the
with-pinned-objects.

Liam

2009/3/16 Gábor Melis <me...@re...>:
> On Sunday 15 March 2009, Liam Healy wrote:
>> [I have added Viktor, who originally reported the problem, to the cc
>> as I do not know if he's on the sbcl-devel mailing list.]
>>
>> On Sat, Mar 14, 2009 at 3:48 PM, Gábor Melis <me...@re...> wrote:
>> > On Saturday 14 March 2009, Liam Healy wrote:
>> >> Hi,
>> >>
>> >> A user of GSLL got a SB-SYS:MEMORY-FAULT-ERROR with SBCL using
>> >> large (3000x3000) matrices on the GSL function that computes a
>> >> singular value decomposition (gsl_linalg_SV_decomp). When I tried
>> >> to repeat his test case, I got the error for a smaller size matrix
>> >> (1000x1000) where he reported that he got a matrix with all zeros
>> >> (which is not the right answer). This is the error I got:
>> >>
>> >> debugger invoked on a SB-SYS:MEMORY-FAULT-ERROR in thread #<THREAD
>> >> "initial thread" RUNNING {10039ADBF1}>:
>> >> Unhandled memory fault at #x0.
>> >> Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.
>> >> restarts (invokable by number or by possibly-abbreviated name):
>> >> 0: [ABORT] Exit debugger, returning to top level.
>> >> (SB-SYS:MEMORY-FAULT-ERROR)
>> >> 0]
>> >> WARNING: Starting a select without a timeout while interrupts are
>> >> disabled.
>> >>
>> >> If the matrix is small enough (say 500x500), it works fine. I am
>> >> using 1.0.18.debian (amd64) and he is using 1.0.25 (amd64) and
>> >> 1.0.17 (32 bit). How should we approach debugging this problem?
>> >
>> > Try to isolate things a little. What does (without-gcing (test n)
>> > nil) do? What does (defparameter *x* (test n)) (without-gcing
>> > (print *x*) nil) do? Does it still crash if you replace the svd
>> > call with a dummy one that just creates matrices of the appropriate
>> > size?
>>
>> He reports:
>> - (without-gcing (test n)) always works; suddenly both CPU cores are
>> busy.
>> - (defparameter *x* (test n)) (without-gcing (print *x*) nil)
>> crashes or produces wrong results.
>> - If I don't call svd it doesn't crash.
>>
>> (BTW his use of the term "crash" seems to mean the
>> memory-fault-error, not that SBCL quits entirely. At least I have
>> only observed that error when trying to
>> reproduce his problem, not crashing as I would use the term.)
>>
>> When I tried to reproduce his problem I noticed that the memory fault
>> is associated with printing, see
>> http://common-lisp.net/pipermail/gsll-devel/2009q1/000250.html
>> where you notice that output-pretty-object etc. is in the backtrace.
>>
>> > Does make-marray rely on ffi?
>>
>> No. It makes a CL array and puts it into a CLOS object with some
>> other stuff.
>>
>> > Can it be that some large alien value is stack allocated somewhere?
>>
>> I'm not sure what you mean. Internally in GSL it wouldn't surprise
>> me that something is malloced, but I guess you are asking if on the
>> Lisp side there is a foreign allocation. The answer is no; basically
>> I generate a pointer (vector-sap) from the marrays and then call the
>> foreign function (using with-pinned-objects).
>>
>> Thanks,
>> Liam
>>
>> >> The thread begins here:
>> >> http://common-lisp.net/pipermail/gsll-devel/2009q1/000249.html
>> >> GSLL repository: http://repo.or.cz/w/gsll.git
>> >>
>> >> Thanks,
>> >> Liam
>
> I've reproduced it locally. Look at the macroexpansion of defmfun
> sv-decomposition, it has a bunch of LETs that look like this:
>
> (LET ((#:G943 (MPOINTER A)))
>
> These are from with-pinned-objects. A quick check into what mpointer is
> reveals that:
>
> (mpointer (make-marray 'double-float :dimensions (list 5 3)))
> => #.(SB-SYS:INT-SAP #X080855E8)
>
> You are pinning the sap object and it's not doing what you think it is.
> Pin the underlying vector and not its sap.
>
> Cheers, Gabor
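
For readers of the archive, the distinction Gábor draws between pinning a SAP and pinning the vector behind it can be sketched in a few lines. This is only an illustration under stated assumptions, not the GSLL patch linked above: it is SBCL-specific, the function name sum-doubles-via-sap is hypothetical, and it reads the memory directly through the SAP instead of calling into GSL.

```lisp
;;; SBCL-only sketch. SUM-DOUBLES-VIA-SAP is a hypothetical name used
;;; for illustration; it is not part of GSLL or SBCL.

;; Problematic pattern (do not use): the SAP is computed first and then
;; the SAP object itself is pinned. That does not pin VEC, so the GC
;; remains free to move the vector and leave the address stale.
;;
;;   (let ((sap (sb-sys:vector-sap vec)))
;;     (sb-sys:with-pinned-objects (sap)
;;       ...use sap...))

(defun sum-doubles-via-sap (vec)
  "Sum the elements of VEC by reading them through a raw address."
  (declare (type (simple-array double-float (*)) vec))
  ;; Pin the vector itself, then take its address inside the pinned
  ;; extent. The address must not be stored anywhere that outlives
  ;; this form.
  (sb-sys:with-pinned-objects (vec)
    (let ((sap (sb-sys:vector-sap vec)))
      (loop for i below (length vec)
            ;; SAP-REF-DOUBLE takes a byte offset; a double is 8 bytes.
            sum (sb-sys:sap-ref-double sap (* i 8))))))

;; Usage:
;; (sum-doubles-via-sap
;;  (make-array 4 :element-type 'double-float
;;                :initial-contents '(1d0 2d0 3d0 4d0)))
;; => 10.0d0
```

The same reasoning applies to a pointer stored into a C struct: the store and every foreign call that reads the struct would need to happen inside the extent in which the vector is pinned.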