From: Malcolm R. <mal...@gm...> - 2009-07-30 17:23:29
Okay, the paste is back at http://paste.lisp.org/display/84446#2 ,
hopefully in a form which won't result in any legal tonnes of bricks on
my head.

Responding to Nikodemus:

> In pretty much everywhere in SBCL where this is called at runtime, an
> error should be signalled if the call fails. Do you have a toplevel
> handler that might be silently handling these (or eg. IGNORE-ERRORS
> somewhere)?

I do have IGNORE-ERRORS, basically because the data I have to work with
is at times wildly inconsistent. My first attempt at the code worked
fine until it found some entry where numbers that were supposed to be
floats were "NaN" or "Infinity", which caused parse-integer to error
out. I decided that, rather than add ad hoc fixes for every bit of
random broken data I find, I would wrap the number parsing in
ignore-errors and, if anything bad happened, just use 0.0d0 instead.
Hence:

    (:float
     `(progn
        (dbg :feat-extract "i=~A float feature #~A:~A~%"
             i ,(first spec) ,(third spec))
        (setf (aref ,feature-vector i)
              (or (ignore-errors
                    (coerce-double
                      (parse-number:parse-real-number (pop ,input-vector))))
                  0.0d0))
        (incf i)))

and similar code for integers. I appreciate this is in no way the Right
Thing to do, but I have only just looked at the chapter of PCL which
covers conditions. I'll try to be less lazy about this and see if the
source of the problem becomes more apparent.

I think I can be pretty sure that I'm not recursing in error handling,
though: the only thing I have in my code related to errors is a bunch
of ignore-errors calls, and they are all inside a
(or (ignore-errors ...) 0.0d0) form.

You say that it could be due to allocating too many stream buffers, but
surely my use of WITH-OPEN-FILE is fairly textbook. I will investigate
the csv-parser library, as it is only one .lisp file I think. I'm not
entirely sure what I should look out for, though.

I'll report back if I can reproduce the problem after I've removed the
ignore-errors from my code.

Malcolm

> (SETF *BREAK-ON-SIGNALS* 'ERROR)
>
> should help with in locating the problem. If you see these messages
> and are not getting any conditions that would help narrow it down,
> then the next task is to make sure that is is indeed from
> os_validate...
>
>> [ ... snip, prints this line a couple thousand times ... ]
>>
>> mmap: Cannot allocate memory
>> mmap: Cannot allocate memory
>> INFO: Control stack guard page unprotected
>
> At this point the stack has overflown. Again, you should be getting a
> STORAGE-CONDITION for this.
>
>> mmap: Cannot allocate memory
>> mmap: Cannot allocate memory
>> fatal error encountered in SBCL pid 37763:
>> Unhandled memory fault. Exiting.
>
> I would guess that this is from overflowing the stack some more after
> the guard page has been triggered.
>
> From the description of your program my first guesses as to proximate
> causes would be that somehow a huge number of stream buffers are
> allocated -- and they either remain in use, or are somehow not
> released. Getting a better idea of where the mmap failures are coming
> from would go a long way towards verifying this.
>
> As for the stack overflow: maybe you are recursing in an error handler? Eg.
>
> (defun log-condition (c)
>   (handler-bind ((error (lambda (c) (log-condition c))))
>     (format *standard-output* "ERROR: ~A~%" c)))
>
> if the error occurs due to not being able to allocate an IO buffer,
> then trying to log the error almost certainly causes the same error
> again -- ad infinitum, till you run out of stack space, try to report
> that, and finally die.
>
> So, first task: find out where the mmap is first failing. If it is in
> the stream system, as I am guessing, then we need to figure out why
> SBCL is using so many buffers and why are they not getting released.
>
> Cheers,
>
> -- Nikodemus
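
For the integer case mentioned in the message, a narrower alternative to
a blanket IGNORE-ERRORS might look like the sketch below. It uses only
standard Common Lisp: PARSE-INTEGER signals a PARSE-ERROR on malformed
input, so HANDLER-CASE can catch just that condition and let anything
else (for example a memory-related condition) reach the debugger. The
function name PARSE-INTEGER-OR-DEFAULT is hypothetical and not part of
the original code. The float case would presumably be analogous, but
the condition type signalled by the parse-number library is not stated
in this thread and would need to be checked first.

```lisp
;; Hypothetical helper, not from the original paste.
(defun parse-integer-or-default (string &optional (default 0))
  "Return the integer in STRING, or DEFAULT if STRING is not a valid integer.
Only PARSE-ERROR is handled; every other condition propagates to the caller."
  (handler-case (parse-integer string)
    ;; Malformed input such as \"NaN\" or \"Infinity\" ends up here.
    (parse-error () default)))
```

Called as (parse-integer-or-default "NaN"), this returns the default
instead of signalling, while an unrelated failure inside the call would
no longer be silently turned into a default value.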