From: Daniel B. <da...@te...> - 2002-06-07 02:16:29
|
I found what I believe is a GC bug while rebuilding on Tru64, a.k.a. OSF/1.
This seems to apply only to ports using the exact GC; the gencgc does
something different.

- interrupt_maybe_gc establishes that the gc trigger has been hit.  It
  clears the trigger and calls the Lisp function MAYBE-GC.

- MAYBE-GC calls SUB-GC, which, if *GC-INHIBIT* is set, returns immediately
  without resetting the trigger - as it would have done had it actually
  GCed.

- So, after that we cons until we run straight off the end of dynamic
  space.

I've checked in a fix for the immediate problem, which makes
interrupt_maybe_gc check whether a GC has happened, and resets the trigger
itself if it hadn't, but I think that's probably more of a kludge than a
fix.  Doing TRT is going to require understanding how gencgc works as well
as gc, though, as the code in gc.lisp talks to both.

Oh, I also checked in the port to Tru64, which passes tests, builds itself,
etc.

-dan

-- 
http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources
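To make the failure mode concrete, here is a toy model of the trigger bookkeeping described above. This is an illustrative Python sketch, not the actual C/Lisp runtime code; all names, and the threshold of 1000, are invented for the example:

```python
# Toy model of the GC-trigger logic described in the message -- not SBCL source.
class Runtime:
    def __init__(self):
        self.bytes_allocated = 0
        self.gc_trigger = 1000   # allocation threshold; None = trigger disarmed
        self.gc_inhibit = False  # models the Lisp *GC-INHIBIT* variable

    def sub_gc(self):
        """Models SUB-GC: returns immediately when *GC-INHIBIT* is set."""
        if self.gc_inhibit:
            return False         # returns without re-arming the trigger
        self.bytes_allocated = 0
        self.gc_trigger = 1000   # an actual GC re-arms the trigger
        return True

    def interrupt_maybe_gc(self):
        """Models interrupt_maybe_gc, including the checked-in workaround."""
        if self.gc_trigger is not None and self.bytes_allocated >= self.gc_trigger:
            self.gc_trigger = None        # clear the trigger
            if not self.sub_gc():
                # The workaround: if no GC actually happened, re-arm the
                # trigger ourselves.  Without this, the trigger stays
                # disarmed and allocation runs unchecked until dynamic
                # space is exhausted.
                self.gc_trigger = self.bytes_allocated + 1000

rt = Runtime()
rt.gc_inhibit = True
rt.bytes_allocated = 1000
rt.interrupt_maybe_gc()
print(rt.gc_trigger)  # re-armed rather than left disarmed
```

With the re-arming line deleted, `rt.gc_trigger` stays `None` after the inhibited collection, which is the runaway-consing scenario the message describes.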
From: Nicolas N. <ne...@ma...> - 2007-09-04 08:38:48
|
Hello,

as far as I can tell, I can reliably break any of CMUCL, SBCL, GCL, ECL
with suitable calls of the following function (to get fast results, adapt
the parameters so that the data structure needs something like 15-25% of
the available heap):

(defun test (&optional (n1 1000) (n2 50) (n3 1000000))
  "Allocates n1 times a list of n2 double-float arrays of size n3."
  (let ((space ()))
    (format t "~&Calculation should need ~D MB.~%~%"
            (floor (/ (* 8 n2 n3) 1000000)))
    (loop for i from 1 upto n1 do
      (format t "~&~%i=~D~%" i)
      (time
       (progn
         (setq space ())
         (setq space
               (loop repeat n2 collect
                 (make-array n3 :element-type 'double-float
                                :initial-element 0.0d0)))
         (room))))))

Is there anyone on this list who cannot reproduce this?  Since a GC bug
involving only lists was reported on cll, I suspect that most long-running
applications can run into this bug.  Could one of the developers comment
on this, please?

Thank you very much,
Nicolas

P.S.: Up to now, I could not break Allegro CL using this technique.

P.S.2: Here again an excerpt from a session:

neuss@ma-patru:~$ sbcl --dynamic-space-size 1000
This is SBCL 1.0.9.5, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses.  See the CREDITS and COPYING files in the
distribution for more information.

; loading system definition from /usr/local/lib/sbcl/sb-grovel/sb-grovel.asd
; into #<PACKAGE "ASDF1">
; registering #<SYSTEM SB-GROVEL {100255AE71}> as SB-GROVEL
* (defun test (&optional (n1 1000) (n2 50) (n3 1000000))
    "Allocates n1 times a list of n2 double-float arrays of size n3."
    (let ((space ()))
      (format t "~&Calculation should need ~D MB.~%~%"
              (floor (/ (* 8 n2 n3) 1000000)))
      (loop for i from 1 upto n1 do
        (format t "~&~%i=~D~%" i)
        (time (progn (setq space ())
                     (setq space (loop repeat n2 collect
                                   (make-array n3 :element-type 'double-float
                                                  :initial-element 0.0d0)))
                     (room))))))

TEST
* (test 1000 100 200000)
Calculation should need 160 MB.

...

i=11
Heap exhausted during allocation: 1212416 bytes available, 1600016 requested.

debugger invoked on a SB-KERNEL::HEAP-EXHAUSTED-ERROR in thread
#<THREAD "initial thread" {10023D8CD1}>:
  Heap exhausted: 1212416 bytes available, 1600016 requested.

PROCEED WITH CAUTION!

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(SB-KERNEL::HEAP-EXHAUSTED-ERROR 1212416 1600016)
0]
From: Harald Hanche-O. <ha...@ma...> - 2007-09-04 09:11:38
|
Funny, on my ppc mac I got this result:

* (test 1000 100 200000)
Calculation should need 160 MB.

;;; nothing out of the ordinary, until ...

i=65
Dynamic space usage is:   539,684,672 bytes.
Read-only space usage is:       1,872 bytes.
Static space usage is:          1,272 bytes.
Control stack usage is:         1,472 bytes.
Binding stack usage is:           336 bytes.
Garbage collection is currently enabled.

Breakdown for dynamic space:

debugger invoked on a TYPE-ERROR: The value 539695592 is not of type FIXNUM.

Type HELP for debugger help, or (SB-EXT:QUIT) to exit from SBCL.

restarts (invokable by number or by possibly-abbreviated name):
  0: [ABORT] Exit debugger, returning to top level.

(SB-VM::REPORT-SPACE-TOTAL
 (:DYNAMIC (513602576 322 SB-VM::SIMPLE-ARRAY-DOUBLE-FLOAT)
           (12281680 11028 SB-VM::CODE)
           (3386496 79221 SB-KERNEL:INSTANCE)
           (3146584 55129 SIMPLE-VECTOR)
           (3004648 375581 CONS)
           (1130112 33036 SIMPLE-BASE-STRING)
           (697992 29083 SYMBOL)
           (537960 5447 SB-VM::SIMPLE-CHARACTER-STRING)
           (356744 16080 BIGNUM)
           (350144 19126 SB-VM::CLOSURE)
           (311128 4 SB-VM::SIMPLE-ARRAY-UNSIGNED-BYTE-29)
           ...)
 0.05)
0] ^R back
0: (SB-VM::REPORT-SPACE-TOTAL
    (:DYNAMIC (513602576 322 SB-VM::SIMPLE-ARRAY-DOUBLE-FLOAT)
              (12281680 11028 SB-VM::CODE)
              (3386496 79221 SB-KERNEL:INSTANCE)
              (3146584 55129 SIMPLE-VECTOR)
              (3004648 375581 CONS)
              (1130112 33036 SIMPLE-BASE-STRING)
              (697992 29083 SYMBOL)
              (537960 5447 SB-VM::SIMPLE-CHARACTER-STRING)
              (356744 16080 BIGNUM)
              (350144 19126 SB-VM::CLOSURE)
              (311128 4 SB-VM::SIMPLE-ARRAY-UNSIGNED-BYTE-29)
              ...)
    0.05)
1: (SB-VM:MEMORY-USAGE)
2: (ROOM :DEFAULT)
...

* (lisp-implementation-version)
"1.0.9.16"

- Harald
From: Harald Hanche-O. <ha...@ma...> - 2007-09-09 15:38:44
|
In the function sb-vm::report-space-total (file room.lisp), we find the
local declaration

  (declare (fixnum total-objects total-bytes cutoff-point
                   reported-objects reported-bytes))

However, as I discovered when trying out the test code in the "GC bug"
thread, at least one of these numbers (total-bytes, probably) can be too
large to fit in a fixnum.

I imagine the point of the declaration is to avoid consing, which might be
harmful in a tight memory situation when you run (room) to find out what
is happening.  But getting thrown willy-nilly into the debugger is surely
not much better?  Should that declaration just be removed?  At least on 32
bit architectures?  I don't know how big a fixnum is on 64 bit machines,
but perhaps it is big enough that the declaration is still safe there.

- Harald
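The numbers do line up with a fixnum overflow: on SBCL's 32-bit ports, fixnums are 30 bits wide (two low tag bits), so most-positive-fixnum is 2^29 - 1 = 536870911, and the value 539695592 from the backtrace above exceeds that. A quick arithmetic check (Python used purely for the arithmetic; the fixnum widths stated are for SBCL's usual tagging scheme):

```python
# 32-bit SBCL ports use 30-bit fixnums (two low tag bits):
most_positive_fixnum_32 = 2**29 - 1
print(most_positive_fixnum_32)  # 536870911

reported_total = 539695592      # the value from the TYPE-ERROR above
print(reported_total > most_positive_fixnum_32)  # True: overflows a fixnum

# On 64-bit ports most-positive-fixnum is 2^62 - 1, so a byte count of
# this magnitude fits comfortably and the declaration is effectively safe.
most_positive_fixnum_64 = 2**62 - 1
print(reported_total < most_positive_fixnum_64)  # True
```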
From: Juho S. <js...@ik...> - 2007-09-04 19:10:22
|
Nicolas Neuss <ne...@ma...> writes:
> Is there anyone on this list who cannot reproduce this? Since there was
> reported a GC bug on cll involving only lists, I suspect that most
> long-running applications can run into this bug.

The cll bug report and this are basically completely different things.

The cll bug was about an essentially infinitely large linked list.  As
long as the conservative gc accidentally holds on to even one cons in the
list, the whole remaining chain cannot be collected.  I consider this to
be a basically uninteresting issue, in that a conservative gc can't really
avoid it happening.  One could reduce the likelihood of it happening by
being less conservative (e.g. object granularity instead of allocation
region granularity like the current sbcl design).

The issue you're reporting has to do with the generational nature of the
garbage collector.  Basically, data is being accumulated into older
generations and not getting collected from there.  SBCL can't do a full GC
to free up more memory, since that will require N+2*M memory, where N is
the heap usage before GC and M is the usage after it.  It also doesn't
have the capability of collecting just some older generation.

There are various ways in which this could be solved.  The one that can be
instantly applied is to reduce the number of generations (reducing it to 0
will basically turn the collector into a non-generational one):

  (define-alien-variable gencgc-oldest-gen-to-gc char)
  (setf gencgc-oldest-gen-to-gc 0) ;; Or 1, or however many you want

Proper solutions might be to write an incremental gc, to extend the
existing gc to allow it to temporarily use more heap during collection, or
to modify the gc to allow it to make a full collection without using as
much extra heap.

-- 
Juho Snellman
From: Nicolas N. <ne...@ma...> - 2007-09-05 08:29:09
|
Juho Snellman <js...@ik...> writes:
> There are various ways in which this could be solved. The one that can
> be instantly applied is to reduce the number of generations (reducing
> it to 0 will basically turn the collector into non-generational):
>
> (define-alien-variable gencgc-oldest-gen-to-gc char)
> (setf gencgc-oldest-gen-to-gc 0) ;; Or 1, or however many you want

Very nice.  It works for my test case.  I think I'll use this solution at
least when starting large calculations.

Thanks,
Nicolas
From: Nicolas N. <ne...@ma...> - 2007-09-04 20:14:55
|
Juho Snellman <js...@ik...> writes:
> Nicolas Neuss <ne...@ma...> writes:
> > Is there anyone on this list who cannot reproduce this? Since there was
> > reported a GC bug on cll involving only lists, I suspect that most
> > long-running applications can run into this bug.
>
> The cll bug report and this are basically completely different things.

First, thank you for the answer.

> The cll bug was about essentially an infinitely large linked list. As
> long as the conservative gc accidentally holds on to even one cons in
> the list, the whole remaining chain can not be collected. I consider
> this to be basically an uninteresting issue in that a conservative gc
> can't really avoid it happening. One could reduce the likelihood of it
> happening by being less conservative (e.g. object granularity instead
> of allocation region granularity like the current sbcl design).

OK, probably I have to read more about GC to understand this properly.  At
the moment, I feel uneasy that such a reasonable implementation of
infinite streams can break the GC.

> The issue you're reporting has to do with the generational nature of
> the garbage collector. Basically data is being accumulated into older
> generations, and not getting collected from there. SBCL can't do a
> full GC to free up more memory, since that will require N+2*M memory
> where N is the heap usage before GC and M is the usage after it. It
> also doesn't have the capability of collecting just some older
> generation.

This would mean that (if N=M, as in the test function) my data structures
should be able to use about 1/3 of the available heap, no?  If I interpret
my experiments correctly, the limit is lower, something like 1/5 or even
1/6.

> There are various ways in which this could be solved. The one that can
> be instantly applied is to reduce the number of generations (reducing
> it to 0 will basically turn the collector into non-generational):
>
> (define-alien-variable gencgc-oldest-gen-to-gc char)
> (setf gencgc-oldest-gen-to-gc 0) ;; Or 1, or however many you want

OK, I'll try that tomorrow.  Thanks.

> Proper solutions might be to write an incremental gc, to extend the
> existing gc to allow it to temporarily use more heap during
> collection, or to modify the gc to allow it to make a full collection
> without using as much extra heap.

Out of interest: what would be the best of those?  Is it known what
Allegro is doing?

Yours, Nicolas
From: Nicolas N. <ne...@ma...> - 2007-09-04 20:43:41
|
Nicolas Neuss <ne...@ma...> writes:
> ... Is it known what Allegro is doing?

To answer myself here: this is very well explained in the Allegro CL
documentation: "... The Allegro CL garbage collector is a two-space,
generation-scavenging system. ..."

Nicolas
From: Juho S. <js...@ik...> - 2007-09-04 20:51:08
|
Nicolas Neuss <ne...@ma...> writes:
> > The issue you're reporting has to do with the generational nature of
> > the garbage collector. Basically data is being accumulated into older
> > generations, and not getting collected from there. SBCL can't do a
> > full GC to free up more memory, since that will require N+2*M memory
> > where N is the heap usage before GC and M is the usage after it.

[Err... That should of course be N+M, for those definitions of N and M.  I
was thinking of different ones while writing the formula...]

> This would mean that (if N=M as in the test function) my data structures
> should be able to use about 1/3 of the available heap, no? If I interpret
> my experiments correctly the limit is lower, something like 1/5 or even
> 1/6.

I'm not quite sure what you mean.  You can certainly fill up half of the
heap with your own data structure and then successfully run a full GC.  Is
the question about running multiple iterations of the memory-consuming
code?  If so, the answer is what I was trying to explain in the previous
message: the data sets of previous iterations might still be using up heap
in the older generations, even if they are logically garbage.

> > Proper solutions might be to write an incremental gc, to extend the
> > existing gc to allow it to temporarily use more heap during
> > collection, or to modify the gc to allow it to make a full collection
> > without using as much extra heap.
>
> Out of interest: what would be the best of those?

Depends on what you're measuring, really.  A full incremental gc might be
complete overkill for this, but desirable for other reasons.  The middle
option would probably be easy to implement, but wouldn't help if available
address space is scarce.  The goodness of the third option would depend on
the design; that description is rather broad.

> Is it known what Allegro is doing?

That's probably not very interesting.  Allegro is hardly a gold standard
of gc implementations.

-- 
Juho Snellman
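The corrected formula makes the bound easy to work out: a full collection of a heap with N bytes in use, of which M survive, needs N+M bytes in total, so with N = M (the test rebuilds a same-sized structure each pass) live data can occupy at most about half the dynamic space in the ideal case. A back-of-envelope check (illustrative only; the 1000 MB figure is the --dynamic-space-size from the session above):

```python
# Back-of-envelope: a full GC needs N + M bytes, where N is the heap usage
# before the collection and M is the usage after it (the survivors).
heap_mb = 1000  # dynamic space, as in the session transcript

def max_live_mb(heap_mb):
    # With N = M, the constraint N + M <= heap gives M <= heap / 2.
    return heap_mb / 2

print(max_live_mb(heap_mb))  # 500.0 MB of live data, in the ideal case
```

In practice the usable fraction is lower, as the thread discusses, because older generations can keep holding logically dead data from earlier iterations.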
From: Nicolas N. <ne...@ma...> - 2007-09-05 08:16:33
|
Juho Snellman <js...@ik...> writes:
> Nicolas Neuss <ne...@ma...> writes:
> > > The issue you're reporting has to do with the generational nature of
> > > the garbage collector. Basically data is being accumulated into older
> > > generations, and not getting collected from there. SBCL can't do a
> > > full GC to free up more memory, since that will require N+2*M memory
> > > where N is the heap usage before GC and M is the usage after it.
>
> [Err... That should of course be N+M for those definitions of N and M.
> I was thinking of different ones while writing the formula...]

Ah, ok.

> > This would mean that (if N=M as in the test function) my data structures
> > should be able to use about 1/3 of the available heap, no? If I interpret
> > my experiments correctly the limit is lower, something like 1/5 or even
> > 1/6.
>
> I'm not quite sure what you mean. You can certainly fill up half of
> the heap with your own data structure and then successfully run a full
> GC. Is the question about running multiple iterations of the memory
> consuming code? If so, the answer is what I was trying to explain in
> the previous message: the data sets of previous iterations might still
> be using up heap in the older generations, even if they are logically
> garbage.

OK, now I understand this point as well.  I tried including a (gc :full t)
in the loop, and it does remove the problem.  Maybe this could also be an
option for my actual application.

Thanks, Nicolas
From: Daniel H. <dhe...@te...> - 2007-09-04 23:44:47
|
On Tue, 4 Sep 2007, Juho Snellman wrote:
> Nicolas Neuss <ne...@ma...> writes:
>>> Proper solutions might be to write an incremental gc, to extend the
>>> existing gc to allow it to temporarily use more heap during
>>> collection, or to modify the gc to allow it to make a full collection
>>> without using as much extra heap.
>>
>> Out of interest: what would be the best of those?
>
> Depends on what you're measuring, really. A full incremental gc might
> be complete overkill for this, but desirable for other reasons. The
> middle option would probably be easy to implement, but wouldn't help
> if available address space is scarce. The goodness of the third option
> would depend on the design, that description is rather broad.
>
>> Is it known what Allegro is doing?
>
> That's probably not very interesting. Allegro is hardly a gold
> standard of gc implementations.

FWIW, Microsoft appears to be using a mark/compact strategy in their .NET
runtime.

http://www.csharphelp.com/archives2/archive297.html
http://msdn2.microsoft.com/en-us/library/ms973837.aspx
http://drowningintechnicaldebt.com/blogs/royashbrook/archive/2007/06/22/top-20-net-garbage-collection-gc-articles.aspx

The "Finalization and Performance" section of the second link is somewhat
interesting.

- Daniel
From: Daniel H. <dhe...@te...> - 2007-09-05 01:04:48
|
On Tue, 4 Sep 2007, Daniel Herring wrote:
> FWIW, Microsoft appears to be using a mark/compact strategy in their .NET
> runtime.
>
> http://www.csharphelp.com/archives2/archive297.html
> http://msdn2.microsoft.com/en-us/library/ms973837.aspx
> http://drowningintechnicaldebt.com/blogs/royashbrook/archive/2007/06/22/top-20-net-garbage-collection-gc-articles.aspx

Last post on this - better links.

Richard Jones wrote a book on the topic and maintains a good list of GC
links:
http://www.cs.kent.ac.uk/people/staff/rej/gc.html

In particular, he links a good survey paper (67 pages) on the different GC
techniques:
ftp://ftp.cs.utexas.edu/pub/garbage/bigsurv.ps

Here's a detailed link on the .NET CLR GC:
http://vineetgupta.spaces.live.com/blog/cns!8DE4BDC896BEE1AD!1104.entry

- Daniel
From: Daniel B. <da...@te...> - 2002-06-07 11:26:18
|
Daniel Barlow <da...@te...> writes:
> I've checked in a fix for the immediate problem, which makes
> interrupt_maybe_gc check whether a GC has happened, and resets the
> trigger itself if it hadn't

"... is the wrong answer" -- Angus Deayton

This will cause the segv_handler to get called on every page of allocation
inside the body of a without-gcing form - which is (a) slow, and (b)
unnecessary, as the without-gcing macro calls maybe-gc when it's done
anyway - which should re-enable the trigger.

Clearly there is a bug somewhere else ...

-dan

-- 
http://ww.telent.net/cliki/ - Link farm for free CL-on-Unix resources