From: Nikodemus S. <nik...@ra...> - 2012-04-28 15:31:22
Below is a small test-case. Run it with --dynamic-space-size 8Gb, and observe how the resident memory size keeps growing. The issue is that we don't release memory back to the OS after nursery collections -- and indeed doing that would be pretty expensive.

As far as I can tell this is actually not a regression. Stas (and someone else too, IIRC), however, has reported similar behaviour as a regression since

  commit 6b1b11a6c51e1c29aee947f1fde7f91651ca3763
  Author: Nikodemus Siivola <nik...@ra...>
  Date:   Sat Mar 31 00:56:44 2012 +0300

      gencgc: reclaim space more aggressively

...but I so far lack a test-case that demonstrates this. So, if you have a test-case that demonstrates a regression where SBCL's RES grows for no apparent good cause, please let me know!

The test-case below behaves apparently identically with current HEAD and with versions prior to the commit above. SBCLs *between* the commit above and

  commit 31103f174118c5e30087b26447cf33515627f9c4
  Author: Nikodemus Siivola <nik...@ra...>
  Date:   Sat Apr 14 11:08:45 2012 +0300

      gencgc: tune the recent "more aggressive GC" changes

misbehave terribly on this test, though -- but versions after the latter commit behave effectively identically to, say, 1.0.55.

Re. the misbehaviour with the test-case below: I have a couple of tentative fixes for it. One is to remap after every N nursery collections if there have been no intervening larger collections. Another is to trigger a remap depending on the number of bytes released by GCs. Both seem reasonable heuristics, but I'm looking into actually keeping track of the exact number of pages eligible for release to the OS, which is a bit hairier, but a much better basis for the decision than heuristics.
Cheers,

-- nikodemus

(require :sb-posix)

(defun xterm-status-hook (proc)
  (when (member (process-status proc) '(:exited :signaled))
    (quit :unix-status 0 :recklessly-p t)))

(run-program "xterm" (list "-e" (format nil "top -b -p~S | tee top.log" (sb-posix:getpid)))
             :search t :wait nil :status-hook 'xterm-status-hook)

(setf (bytes-consed-between-gcs) (* 400 1024 1024))

(defun test (n)
  (loop repeat n count (evenp (length (make-list 100000)))))

(loop
  (write-char #\.)
  (finish-output)
  (test 10000))

Cheers,

-- Nikodemus
From: Martin C. <cra...@co...> - 2012-04-29 02:14:23
Nikodemus Siivola wrote on Sat, Apr 28, 2012 at 06:31:16PM +0300:
> Below a small test-case. Run it with --dynamic-space-size 8Gb, and
> observe how the resident memory size keeps growing. The issue is that
> we don't release memory back to the OS after nursery collections --
> and indeed doing that would be pretty expensive.
>
> Though this, as far as I can tell, is actually not a regression, Stas
> (and someone else too, IIRC), however, has reported similar behaviour
> as a regression since
>
> commit 6b1b11a6c51e1c29aee947f1fde7f91651ca3763
> Author: Nikodemus Siivola <nik...@ra...>
> Date:   Sat Mar 31 00:56:44 2012 +0300
>
>     gencgc: reclaim space more aggressively
>
> ...but I so far lack a test-case that demonstrates this.
>
> So, if you have a test-case that demonstrates a regression in SBCL's
> RES growing for no apparent good cause, please let me know!

We have always had both the Lisp and C heaps grow in RSS, over hours of contiguous CPU time. Very annoying, since there's no performance increase to reward us for it. That can be reduced by running madvise(MADV_DONTNEED) on every GC, but that can be prohibitively expensive (see below).

The real problem is that the unchanged behavior has large spikes. small_generation_limit = 0 isn't only smaller, it doesn't have the spikes.

I am certain that the recent SBCL versions from April have not made this worse; I ran our RSS sizing with all kinds of recent versions. It might also be that we overrode those parameters -- I can't check right now, and can't find out before Monday.

I would be most interested in code that releases memory more often. small_generation_limit = 0 does that for us. While it slows us down by 3% on single-memory-bank machines, it is almost 30% on other machines with several memory banks (different kernel, too, so YMMV).

I tinkered with trying not to release pages that were right in front of the steamroller, and with some other hacks trying to figure out which pages have been release candidates for a while.
But, as you say below, this quickly got to requiring quite a bit more code.

I then went off and first fixed this issue for the C heap, tinkering with a bunch of malloc implementations and tuning parameters. glibc is really bad for this. Newish versions of tcmalloc, and Doug Lea's malloc with the appropriate parameters tuned, paid off.

Martin

> The test-case below behaves apparently identically with current HEAD
> and ones prior to the commit above. SBCL's *between* the commit above
> and
>
> commit 31103f174118c5e30087b26447cf33515627f9c4
> Author: Nikodemus Siivola <nik...@ra...>
> Date:   Sat Apr 14 11:08:45 2012 +0300
>
>     gencgc: tune the recent "more aggressive GC" changes
>
> misbehave terribly on this test, though -- but ones after the latter one
> behave effectively identically to say 1.0.55.
>
> Re. the misbehaviour with the test-case below. I have a couple of
> tentative fixes for it. One is to remap after every N nursery
> collections if there have been no intervening larger collections.
> Another is to trigger a remap depending on the number of bytes
> released by GCs. Both seem reasonable heuristics, but I'm looking into
> actually keeping track of the exact number of pages eligible for
> release to the OS, which is a bit hairier, but a much better basis for
> the decision than heuristics.
>
> Cheers,
>
> -- nikodemus
>
> (require :sb-posix)
>
> (defun xterm-status-hook (proc)
>   (when (member (process-status proc) '(:exited :signaled))
>     (quit :unix-status 0 :recklessly-p t)))
>
> (run-program "xterm" (list "-e" (format nil "top -b -p~S | tee top.log" (sb-posix:getpid)))
>              :search t :wait nil :status-hook 'xterm-status-hook)
>
> (setf (bytes-consed-between-gcs) (* 400 1024 1024))
>
> (defun test (n)
>   (loop repeat n count (evenp (length (make-list 100000)))))
>
> (loop
>   (write-char #\.)
>   (finish-output)
>   (test 10000))
>
> Cheers,
>
> -- Nikodemus

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/
From: Nikodemus S. <nik...@ra...> - 2012-04-29 18:05:23
On 29 April 2012 04:53, Martin Cracauer <cra...@co...> wrote:
> Nikodemus Siivola wrote on Sat, Apr 28, 2012 at 06:31:16PM +0300:
>> Below a small test-case. Run it with --dynamic-space-size 8Gb, and
>> observe how the resident memory size keeps growing. The issue is that
>> we don't release memory back to the OS after nursery collections --
>> and indeed doing that would be pretty expensive.
>>
>> Though this, as far as I can tell, is actually not a regression, Stas
>> (and someone else too, IIRC), however, has reported similar behaviour
>> as a regression since

Turns out the regression was older than that:

  https://bugs.launchpad.net/sbcl/+bug/991293

Fixed in HEAD -- the issue shown by the test-case upthread still exists, though in many cases it is somewhat ameliorated by this fix as well.

Cheers,

-- nikodemus
From: Martin C. <cra...@co...> - 2012-04-30 15:59:39
Nikodemus Siivola wrote on Sun, Apr 29, 2012 at 09:05:17PM +0300:
> On 29 April 2012 04:53, Martin Cracauer <cra...@co...> wrote:
>
> > Nikodemus Siivola wrote on Sat, Apr 28, 2012 at 06:31:16PM +0300:
> >> Below a small test-case. Run it with --dynamic-space-size 8Gb, and
> >> observe how the resident memory size keeps growing. The issue is that
> >> we don't release memory back to the OS after nursery collections --
> >> and indeed doing that would be pretty expensive.
> >>
> >> Though this, as far as I can tell, is actually not a regression, Stas
> >> (and someone else too, IIRC), however, has reported similar behaviour
> >> as a regression since
>
> Turns out the regression was older than that:
>
> https://bugs.launchpad.net/sbcl/+bug/991293
>
> Fixed in HEAD -- the issue shown by the test-case upthread still
> exists, though in many cases somewhat ameliorated by this fix as well.

We override that manually -- that's why the change didn't affect us. We have it at 10 MB (for 400 MB consed between regular GCs). I don't know whether that value is the result of guessing or whether I measured different values.

The problem is, the Lisp heap still grows strongly in the number of dirty pages over hours of CPU time :-) The search for the silver bullet continues.

Martin

-- 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Martin Cracauer <cra...@co...> http://www.cons.org/cracauer/