From: Benjamin L. <ben...@cm...> - 2008-11-19 23:45:11
|
Dear SBCL people,

I use SBCL frequently at Carnegie Mellon. We use it mostly for our knowledge representation system, Scone, developed by Scott Fahlman, an old CMUCL Lisper.

I recently came across a new problem, and I'm not seeing anything in the mailing list history.

We're running on an x86-64 Linux machine with 8GB of RAM. One person in our group was getting error #1 below when he used more than around 1GB of memory. So, I tried recompiling SBCL (1.0.22) a few times with greater values for GENCGC-PAGE-SIZE. Each time I increased it, he'd get error #1 eventually, each time after more memory had been used. After bumping this up to 32768, he was getting error #1 at around 6GB of RAM used.

I tried increasing it once more, first to 64k, then back down to 49152. When I compile with GENCGC-PAGE-SIZE set to either, compilation terminates with error #2. (The main part of this error message seems to be this: GC invariant lost, file "gencgc.c", line 4503).

Is it possible to increase GENCGC-PAGE-SIZE even more? If so, is that desirable? What exactly is GENCGC-PAGE-SIZE?

Ben Lambert

Error #1:

fatal error encountered in SBCL pid 18893:
An mprotect call failed with ENOMEM. This probably means that the maximum amount
of separate memory mappings was exceeded. To fix the problem, either increase
the maximum with e.g. 'echo 262144 > /proc/sys/vm/max_map_count' or recompile
SBCL with a larger value for GENCGC-PAGE-SIZE in 'src/target/parms.lisp'.
Welcome to LDB, a low-level debugger for the Lisp runtime environment.

Error #2:

beginning GENESIS, creating core "output/cold-sbcl.core"
obj/from-xc/src/code/show.lisp-obj
....
obj/from-xc/src/code/late-defbangmethod.lisp-obj
obj/from-xc/src/pcl/walk.lisp-obj
[building initial core file in "output/cold-sbcl.core":
writing 8192 bytes [2 pages] from #<SB!FASL::GSPACE :READ-ONLY>
writing 4096 bytes [1 page] from #<SB!FASL::GSPACE :STATIC>
writing 51998720 bytes [12695 pages] from #<SB!FASL::GSPACE :DYNAMIC>
/(DESCRIPTOR-BITS INITIAL-FUN)=#X1002993D19
done]
* //testing for consistency of first and second GENESIS passes
//header files match between first and second GENESIS -- good

real 13m0.960s
user 4m32.852s
sys 0m9.967s
//entering make-target-2.sh
//doing warm init - compilation phase
fatal error encountered in SBCL pid 16153:
GC invariant lost, file "gencgc.c", line 4503

--
Benjamin Lambert
Graduate Student of Computer Science
Carnegie Mellon University
www.cs.cmu.edu/~belamber
|
From: Gábor M. <me...@re...> - 2008-11-20 08:12:49
|
On Thursday 20 November 2008, Benjamin Lambert wrote:
> Dear SBCL people,
>
> I use SBCL frequently at Carnegie Mellon. We use it mostly for our
> knowledge representation system, Scone, developed by Scott Fahlman,
> an old CMUCL Lisper.
>
> I recently came across a new problem, and I'm not seeing anything in
> the mailing list history.
>
> We're running on an x86-64 Linux machine with 8GB of RAM. One person
> in our group was getting error #1 below when he used more than around
> 1GB of memory. So, I tried recompiling SBCL (1.0.22) a few times
> with greater values for GENCGC-PAGE-SIZE. Each time I increased it,
> he'd get error #1 eventually, each time after more memory had been
> used. After bumping this up to 32768, he was getting error #1 at
> around 6GB of RAM used.
>
> I tried increasing it once more, first to 64k, then back down to
> 49152. When I compile with GENCGC-PAGE-SIZE set to either,
> compilation terminates with error #2. (The main part of this error
> message seems to be this: GC invariant lost, file "gencgc.c", line
> 4503).
>
> Is it possible to increase GENCGC-PAGE-SIZE even more? If so, is
> that desirable? What exactly is GENCGC-PAGE-SIZE?
>
> Ben Lambert
>
>
> Error #1:
>
> fatal error encountered in SBCL pid 18893:
> An mprotect call failed with ENOMEM. This probably means that the maximum amount
> of separate memory mappings was exceeded. To fix the problem, either increase
> the maximum with e.g. 'echo 262144 > /proc/sys/vm/max_map_count' or recompile
> SBCL with a larger value for GENCGC-PAGE-SIZE in 'src/target/parms.lisp'.
> Welcome to LDB, a low-level debugger for the Lisp runtime environment.
>
> Error #2:
>
> beginning GENESIS, creating core "output/cold-sbcl.core"
> obj/from-xc/src/code/show.lisp-obj
> ....
> obj/from-xc/src/code/late-defbangmethod.lisp-obj
> obj/from-xc/src/pcl/walk.lisp-obj
> [building initial core file in "output/cold-sbcl.core":
> writing 8192 bytes [2 pages] from #<SB!FASL::GSPACE :READ-ONLY>
> writing 4096 bytes [1 page] from #<SB!FASL::GSPACE :STATIC>
> writing 51998720 bytes [12695 pages] from #<SB!FASL::GSPACE :DYNAMIC>
> /(DESCRIPTOR-BITS INITIAL-FUN)=#X1002993D19
> done]
> * //testing for consistency of first and second GENESIS passes
> //header files match between first and second GENESIS -- good
>
> real 13m0.960s
> user 4m32.852s
> sys 0m9.967s
> //entering make-target-2.sh
> //doing warm init - compilation phase
> fatal error encountered in SBCL pid 16153:
> GC invariant lost, file "gencgc.c", line 4503

A quick look into gencgc.c reveals that the heap size should be divisible by PAGE_BYTES (that, with gencgc, is equal to GENCGC_PAGE_BYTES).

    /* Compute the number of pages needed for the dynamic space.
     * Dynamic space size should be aligned on page size. */
    page_table_pages = dynamic_space_size/PAGE_BYTES;
    gc_assert(dynamic_space_size == npage_bytes(page_table_pages));

> --
> Benjamin Lambert
> Graduate Student of Computer Science
> Carnegie Mellon University
> www.cs.cmu.edu/~belamber
|
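The assertion quoted in the message above fails whenever the dynamic space size is not a whole multiple of the page size. The following is a minimal C sketch of that check, not SBCL's code: the helper names `pages_for` and `is_page_aligned` are hypothetical, and the 8 GiB heap used in the usage note is an assumption for illustration, not a documented SBCL default.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: number of whole pages that fit into `size` bytes. */
uint64_t pages_for(uint64_t size, uint64_t page_bytes)
{
    assert(page_bytes != 0);
    return size / page_bytes;
}

/* Same shape as the quoted gc_assert: the size must equal
 * (number of whole pages) * (page size), i.e. be a multiple of the page size.
 * Returns 1 when aligned, 0 otherwise. */
int is_page_aligned(uint64_t size, uint64_t page_bytes)
{
    return size == pages_for(size, page_bytes) * page_bytes;
}
```

Under these assumptions, `is_page_aligned(8ULL << 30, 65536)` returns 1, while `is_page_aligned(8ULL << 30, 49152)` returns 0, because 49152 is 3 * 16384 and a power-of-two heap size cannot be a multiple of it.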
From: Benjamin L. <ben...@cm...> - 2008-11-20 22:25:00
|
>> * //testing for consistency of first and second GENESIS passes
>> //header files match between first and second GENESIS -- good
>>
>> real 13m0.960s
>> user 4m32.852s
>> sys 0m9.967s
>> //entering make-target-2.sh
>> //doing warm init - compilation phase
>> fatal error encountered in SBCL pid 16153:
>> GC invariant lost, file "gencgc.c", line 4503
>
> A quick look into gencgc.c reveals that the heap size should be
> divisible by PAGE_BYTES (that, with gencgc, is equal to
> GENCGC_PAGE_BYTES).
>
> /* Compute the number of pages needed for the dynamic space.
> * Dynamic space size should be aligned on page size. */
> page_table_pages = dynamic_space_size/PAGE_BYTES;
> gc_assert(dynamic_space_size == npage_bytes(page_table_pages));

I wasn't sure exactly which parameter specifies PAGE_BYTES, but I tried compiling again with the following in backend-parms.lisp:

(setf *backend-page-size* 65536)
(def!constant gencgc-page-size 65536)

Compilation finishes OK with these parameters.

But then another error shows up when running (below). This time, I noticed that when I started SBCL the process was only allocated 4GB of memory (previously was 8GB). And this time the error appeared when the memory consumption was close to 4GB.

ERROR:

Heap exhausted during garbage collection: 256 bytes available, 272 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 2000 000 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 2000 000 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 2000 000 0 0 0.0000
3: 18702 19623 0 0 2316 1559 0 0 0 253304736 647264 367 902928 0 1 1.4445
4: 65533 64042 0 0 10721 6663 1731 1539 10 1350800224 2780320 7 78330752 9602 1 1.3438
5: 56837 57662 0 0 21457 15654 1569 1745 25 2643916288 5376512 2000000 22023 0 0.5391
6: 0 0 0 0 581 0 0 0 0 38076416 0 2000 000 535 0 0.0000
Total bytes allocated=4286097664
GC control variables:
*GC-INHIBIT* = false
*GC-PENDING* = true
fatal error encountered in SBCL pid 23666:
Heap exhausted, game over.
|
From: Nikodemus S. <nik...@ra...> - 2008-12-01 15:59:11
|
On Fri, Nov 21, 2008 at 12:24 AM, Benjamin Lambert <ben...@cm...> wrote:
> (setf *backend-page-size* 65536)
> (def!constant gencgc-page-size 65536)
>
> Compilation finishes OK with these parameters
>
> But then another error shows up when running (below). This time, I
> noticed that when I started SBCL the process was only allocated 4GB of
> memory (previously was 8GB). And this time the error appeared when the
> memory consumption was close to 4GB.

Just to check: how are you running SBCL? Are you providing an explicit

  --dynamic-space-size

argument? If yes, is it in the right place (it needs to appear before any lisp-side command line arguments like --eval or --load.)

Also, if the expected size of the dataset is ~8Gb, you should reserve more than that -- say 10-16Gb -- using --dynamic-space-size, since copying GC can in the worst case require 2xlive-set memory.

Cheers,

-- Nikodemus
|
From: Benjamin L. <ben...@cm...> - 2008-12-03 19:29:27
|
>> (setf *backend-page-size* 65536)
>> (def!constant gencgc-page-size 65536)
>>
>> Compilation finishes OK with these parameters
>>
>> But then another error shows up when running (below). This time, I
>> noticed that when I started SBCL the process was only allocated 4GB of
>> memory (previously was 8GB). And this time the error appeared when the
>> memory consumption was close to 4GB.
>
> Just to check: how are you running SBCL? Are you providing an explicit
>
> --dynamic-space-size
>
> argument? If yes, is it in the right place (it needs to appear before
> any lisp-side command line arguments like --eval or --load.)
>
> Also, if the expected size of the dataset is ~8Gb, you should reserve
> more than that -- say 10-16Gb -- using --dynamic-space-size, since
> copying GC can in the worst case require 2xlive-set memory.

Thanks - We weren't using that flag before, but tried it again with the command line argument "sbcl --dynamic-space-size 16000", and got the same error as last time. I'll paste the error again, below.

Ben

ERROR:

Heap exhausted during garbage collection: 256 bytes available, 272 requested.
Gen StaPg UbSta LaSta LUbSt Boxed Unboxed LB LUB !move Alloc Waste Trig WP GCs Mem-age
0: 0 0 0 0 0 0 0 0 0 0 0 2000 000 0 0 0.0000
1: 0 0 0 0 0 0 0 0 0 0 0 2000 000 0 0 0.0000
2: 0 0 0 0 0 0 0 0 0 0 0 2000 000 0 0 0.0000
3: 18702 19623 0 0 2316 1559 0 0 0 253304736 647264 367 902928 0 1 1.4445
4: 65533 64042 0 0 10721 6663 1731 1539 10 1350800224 2780320 7 78330752 9602 1 1.3438
5: 56837 57662 0 0 21457 15654 1569 1745 25 2643916288 5376512 2000000 22023 0 0.5391
6: 0 0 0 0 581 0 0 0 0 38076416 0 2000 000 535 0 0.0000
Total bytes allocated=4286097664
GC control variables:
*GC-INHIBIT* = false
*GC-PENDING* = true
fatal error encountered in SBCL pid 23666:
Heap exhausted, game over.

>
>
> Cheers,
>
> -- Nikodemus
|
From: Nikodemus S. <nik...@ra...> - 2008-12-04 07:19:46
|
On Wed, Dec 3, 2008 at 9:29 PM, Benjamin Lambert <ben...@cm...> wrote:
> Thanks - We weren't using that flag before, but tried it again with the
> command line argument "sbcl --dynamic-space-size 16000", and got the same
> error as last time. I'll paste the error again, below.

One thing that I find highly suspicious is the total of Alloc+Waste for all generations:

  (+ (+ 253304736 1350800224 2643916288 38076416) ; Alloc
     (+ 647264 2780320 5376512))                  ; Waste

  ; => 4294901760, which is #xFFFF0000

No immediate ideas, but it certainly looks like a Clue.

As a sanity check, can you evaluate the following in the image (after having started it up as usual, and after having started it up with --dynamic-space-size 16000)?

  (define-alien-variable dynamic-space-size unsigned-long)

  dynamic-space-size

I don't really expect any surprises there, though.

Cheers,

-- Nikodemus
|
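The sum in the message above can be re-checked outside Lisp. The sketch below repeats the arithmetic in C with 64-bit integers; the array and function names are hypothetical, and the figures are copied from the Alloc and Waste columns of generations 3 to 6 in the GC report.

```c
#include <stddef.h>
#include <stdint.h>

/* Alloc and Waste columns for generations 3..6, copied from the GC report. */
static const uint64_t alloc_bytes[4] = {
    253304736ULL, 1350800224ULL, 2643916288ULL, 38076416ULL
};
static const uint64_t waste_bytes[4] = {
    647264ULL, 2780320ULL, 5376512ULL, 0ULL
};

/* Adds up `count` 64-bit values. */
uint64_t sum_u64(const uint64_t *values, size_t count)
{
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += values[i];
    }
    return total;
}

/* Total heap usage as the message computes it: Alloc plus Waste. */
uint64_t alloc_plus_waste(void)
{
    return sum_u64(alloc_bytes, 4) + sum_u64(waste_bytes, 4);
}
```

`alloc_plus_waste()` evaluates to 4294901760, which is 2^32 minus 65536: exactly 4 GiB less one 64 KiB page. The Alloc column alone sums to 4286097664, matching the "Total bytes allocated" line of the report.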
From: Benjamin L. <ben...@cm...> - 2008-12-04 20:50:30
|
On Dec 4, 2008, at 2:19 AM, Nikodemus Siivola wrote:

> On Wed, Dec 3, 2008 at 9:29 PM, Benjamin Lambert <ben...@cm...> wrote:
>
>> Thanks - We weren't using that flag before, but tried it again with the
>> command line argument "sbcl --dynamic-space-size 16000", and got the same
>> error as last time. I'll paste the error again, below.
>
> One thing that I find highly suspicious is the total of Alloc+Waste
> for all generations:
>
> (+ (+ 253304736 1350800224 2643916288 38076416) ; Alloc
> (+ 647264 2780320 5376512)) ; Waste
>
> ; => 4294901760, which is #xFFFF0000
>
> No immediate ideas, but it certainly looks like a Clue.
>
> As a sanity check, can you evaluate the following in the image (after
> having started it up as usual, and after having started it up with
> --dynamic-space-size 16000)?
>
> (define-alien-variable dynamic-space-size unsigned-long)
>
> dynamic-space-size

Below is what I get from evaluating this. I guess the number you're looking for is 3892314112.

~> sbcl --dynamic-space-size 16000
This is SBCL 1.0.22, an implementation of ANSI Common Lisp.
More information about SBCL is available at <http://www.sbcl.org/>.

SBCL is free software, provided as is, with absolutely no warranty.
It is mostly in the public domain; some portions are provided under
BSD-style licenses. See the CREDITS and COPYING files in the
distribution for more information.
* (define-alien-variable dynamic-space-size unsigned-long)

#<SB-ALIEN-INTERNALS:HEAP-ALIEN-INFO (SB-SYS:FOREIGN-SYMBOL-SAP '"dynamic_space_size" T) (UNSIGNED 64)>
* dynamic-space-size

3892314112
*

>
>
> I don't really expect any surprises there, though.
>
> Cheers,
>
> -- Nikodemus
|
From: Nikodemus S. <nik...@ra...> - 2008-12-04 09:07:35
Attachments:
0001-export-page-sizes-to-C-with-LU-suffix.patch
|
On Thu, Dec 4, 2008 at 9:19 AM, Nikodemus Siivola <nik...@ra...> wrote:
> (+ (+ 253304736 1350800224 2643916288 38076416) ; Alloc
> (+ 647264 2780320 5376512)) ; Waste
>
> ; => 4294901760, which is #xFFFF0000

...which is (ldb (byte 32 0) (lognot (- gencgc-page-size 1))) ; using your settings

and we have

  dynamic_space_size &= ~(PAGE_BYTES-1);

in the runtime, where PAGE_BYTES is a literal constant without an L suffix. I don't see it on this box, but presumably the C compiler is allowed to truncate the dynamic-space-size to 32 bits there?

The attached patch should fix this. Does it make a difference for you?

Cheers,

-- Nikodemus
|
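The masking expression quoted in the message above can lose the upper half of a 64-bit size if the mask is computed in 32 bits. The C sketch below illustrates one way this can happen, under a loudly stated assumption: that the page-size constant reaches C as an unsigned 32-bit value. The thread does not show how the generated header actually declared it, and a plain signed `int` literal would instead be sign-extended and keep the high bits. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* Assumption: the page size arrives in C as an unsigned 32-bit constant. */
#define PAGE_BYTES_U32 65536u
/* The same value as a 64-bit constant, which is the idea behind a wider suffix. */
#define PAGE_BYTES_U64 UINT64_C(65536)

/* Rounds `size` down to a page boundary with a 32-bit mask.
 * ~(65536u - 1) is 0xFFFF0000u; it is zero-extended to 64 bits before
 * the AND, so bits 32..63 of `size` are cleared along with the low 16. */
uint64_t align_down_narrow(uint64_t size)
{
    return size & ~(PAGE_BYTES_U32 - 1);
}

/* Rounds `size` down to a page boundary with a 64-bit mask.
 * Only the low 16 bits are cleared. */
uint64_t align_down_wide(uint64_t size)
{
    return size & ~(PAGE_BYTES_U64 - 1);
}
```

Assuming `--dynamic-space-size 16000` means 16000 * 2^20 = 16777216000 bytes, `align_down_narrow` yields 3892314112, the value reported from the running image earlier in the thread, while `align_down_wide` leaves 16777216000 unchanged.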
From: Benjamin L. <ben...@cm...> - 2008-12-04 20:53:15
|
On Dec 4, 2008, at 4:07 AM, Nikodemus Siivola wrote:

> On Thu, Dec 4, 2008 at 9:19 AM, Nikodemus Siivola <nik...@ra...> wrote:
>
>> (+ (+ 253304736 1350800224 2643916288 38076416) ; Alloc
>> (+ 647264 2780320 5376512)) ; Waste
>>
>> ; => 4294901760, which is #xFFFF0000
>
> ...which is (ldb (byte 32 0) (lognot (- gencgc-page-size 1))) ; using your settings
>
> and we have
>
> dynamic_space_size &= ~(PAGE_BYTES-1);
>
> in the runtime, where PAGE_BYTES is a literal constant without an L
> suffix. I don't see it on this box, but presumably the C compiler is
> allowed to truncate the dynamic-space-size to 32 bits there?

Hm - I'm not sure about that.

> The attached patch should fix this. Does it make a difference for you?

I apologize, I haven't applied a "patch" file in a few years. I tried applying the patch to the 1.0.22 source, and I get the output below. It says "failed" a couple times, so I'm not sure if it really worked. Am I patching the correct source tree, and/or using the correct command?

sbcl-1.0.22-source> patch -p1 < 0001-export-page-sizes-to-C-with-LU-suffix.patch
patching file contrib/sb-sprof/sb-sprof.lisp
patching file doc/internals-notes/GENCGC-PORTING-NOTES
patching file package-data-list.lisp-expr
patching file src/code/linux-os.lisp
patching file src/code/room.lisp
patching file src/code/toplevel.lisp
patching file src/compiler/alpha/backend-parms.lisp
patching file src/compiler/early-backend.lisp
patching file src/compiler/generic/genesis.lisp
Hunk #3 FAILED at 2693.
1 out of 9 hunks FAILED -- saving rejects to file src/compiler/generic/genesis.lisp.rej
patching file src/compiler/generic/vm-ir2tran.lisp
patching file src/compiler/hppa/backend-parms.lisp
patching file src/compiler/mips/backend-parms.lisp
patching file src/compiler/ppc/backend-parms.lisp
patching file src/compiler/sparc/backend-parms.lisp
patching file src/compiler/x86-64/backend-parms.lisp
Hunk #1 FAILED at 33.
1 out of 1 hunk FAILED -- saving rejects to file src/compiler/x86-64/backend-parms.lisp.rej
patching file src/compiler/x86/backend-parms.lisp
patching file src/runtime/gc.h
patching file src/runtime/gencgc.c
Hunk #1 succeeded at 464 (offset -43 lines).
patching file src/runtime/linux-os.c
patching file src/runtime/thread.h
sbcl-1.0.22-source>

>
>
> Cheers,
>
> -- Nikodemus
> <0001-export-page-sizes-to-C-with-LU-suffix.patch>
|
From: Nikodemus S. <nik...@ra...> - 2008-12-04 23:44:05
|
On Thu, Dec 4, 2008 at 10:53 PM, Benjamin Lambert <ben...@cm...> wrote:
> applying the patch to the 1.0.22 source, and I get the output below. It
> says "failed" a couple times, so I'm not sure if it really worked. Am I
> patching the correct source tree, and/or using the correct command?

The command was correct, and it did not apply cleanly.

Sorry, this was my bad: the patch was against 1.0.23.16 (probably applies cleanly to anything after 1.0.23.10, though). So you need to get the CVS version -- or if you don't want to, I can cook up a smaller patch against 1.0.22 tomorrow.

Cheers,

-- Nikodemus
|
From: Benjamin L. <ben...@cm...> - 2008-12-12 20:13:06
|
On Dec 4, 2008, at 6:44 PM, Nikodemus Siivola wrote:

> On Thu, Dec 4, 2008 at 10:53 PM, Benjamin Lambert <ben...@cm...> wrote:
>
>> applying the patch to the 1.0.22 source, and I get the output below. It
>> says "failed" a couple times, so I'm not sure if it really worked. Am I
>> patching the correct source tree, and/or using the correct command?
>
> The command was correct, and it did not apply cleanly.
>
> Sorry, this was my bad: the patch was against 1.0.23.16 (probably
> applies cleanly to anything after 1.0.23.10, though). So you need to
> get the CVS version -- or if you don't want to, I can cook up a
> smaller patch against 1.0.22 tomorrow.

I succeeded in applying the patch, and it seems to fix the problems we've run into. Now we can allocate memory until both the RAM and Swap space fill up! I suppose you can't help beyond that :). So, I guess the patch is a success assuming it doesn't cause any other problems. Thanks Nikodemus!

This is with *backend-page-bytes* set to 65536, in src/compiler/x86-64/backend-parms.lisp. Is there a performance tradeoff for changing this?

The original error message also suggested changing this value in the OS, 'echo 262144 > /proc/sys/vm/max_map_count', is one solution preferable to the other?

Thanks
|
From: Nikodemus S. <nik...@ra...> - 2008-12-13 10:37:53
|
On Fri, Dec 12, 2008 at 10:12 PM, Benjamin Lambert <ben...@cm...> wrote:
> I succeeded in applying the patch, and it seems to fix the problems we've
> run into. Now we can allocate memory until both the RAM and Swap space fill
> up! I suppose you can't help beyond that :). So, I guess the patch is a

Good to hear, I'll merge it to CVS asap.

> This is with *backend-page-bytes* set to 65536, in
> src/compiler/x86-64/backend-parms.lisp. Is there a performance tradeoff for
> changing this?

Yes. Increasing the page size reduces GC write-protection granularity and makes GC conservatism more expensive, but also increases allocation area size. The exact effects on performance are application dependent: e.g. an application that is almost purely functional might run faster, but an application that touches old generations here and there (but not everywhere) might slow down.

> The original error message also suggested changing this value in the OS,
> 'echo 262144 > /proc/sys/vm/max_map_count' , is one solution preferable to
> the other?

Depends. I have very little idea what kind of global performance effects toggling max_map_count has. Maybe none, or maybe fork() suddenly becomes much more expensive?

Cheers,

-- Nikodemus
|
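Both remedies named in the original error message act on the same ratio: how many separately protected regions the heap can break into versus how many mappings the kernel allows. The C sketch below gives a rough worst-case bound, assuming each GC page can end up as its own mapping when neighbouring pages carry different protections. The function names are hypothetical, the bound ignores the process's other mappings, and it is not a description of how SBCL or the kernel actually counts.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case number of separately protected regions: one per GC page. */
uint64_t max_gc_pages(uint64_t heap_bytes, uint64_t page_bytes)
{
    assert(page_bytes != 0);
    return heap_bytes / page_bytes;
}

/* Returns 1 if even the worst case stays within the mapping limit,
 * 0 if the limit could be exceeded. */
int fits_map_limit(uint64_t heap_bytes, uint64_t page_bytes,
                   uint64_t max_map_count)
{
    return max_gc_pages(heap_bytes, page_bytes) <= max_map_count;
}
```

For an assumed 8 GiB heap, 4096-byte pages give up to 2097152 regions, well above the 262144 suggested in the error message, while 65536-byte pages give at most 131072. Raising the page size shrinks the numerator; raising max_map_count raises the ceiling.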