From: Cedric St-J. <ced...@gm...> - 2008-08-02 18:13:14
1. I sometimes have to call (sb-ext:gc :full t) repeatedly in order to
free "everything". I've seen jumps from 200 MB to 50 MB between the
first and 4th call (yes, I've cared about the +++ *** et al.). On
x86-64 this is presumably an artifact of the generational GC, but I
would like to know if there is some rule to it. (gc) is expensive; is
there any way to know if I've likely cleaned everything that could be
cleaned? Or is there any way to avoid hitting the "conservative" part
of the GC?

What I do right now is, in my main loop,

  (when (< threshold (dynamic-usage))
    (clean-some-old-structures)
    (gc :full t))

And this doesn't work very well, I'm afraid. In my experience, having
3 calls to the gc works better, but it makes me cringe, as well as
being slow.

2. The man page for SBCL 1.0.14 says that the code for the weak hash
table has suffered from "code rot". I use it and it seems to work
fine, and the actual source code doesn't have any comment predicting
doom.

3. It'd be awesome if weak pointers were print-readable. The weak
hash table already is...

Many thanks for SBCL!

Cedric
From: Tim J. <tim...@gm...> - 2018-01-20 06:06:19
1. Background
*************

TL;DR: I want to make the SBCL GC more memory-efficient and make
memory failures more predictable. I have an idea that might help, but
I have some questions before I start coding.

As mentioned on the help list
(https://groups.google.com/forum/#!topic/sbcl-help-archive/NrZhwc_UfzU),
the mark/copy method used by the GC has been causing me some problems.
In essence, the issue is that this approach can result in an effective
memory efficiency as low as 50%, and more usually 70-80%, with a
fairly high degree of unpredictability about when and whether you are
going to run out of memory. My PC is currently at 32 GiB and the
motherboard is maxed out. There seems to be a price cliff above
32 GiB, especially if you want the option to go from 64 GiB to
128 GiB, which I might possibly need to do.

I have spent a few days looking into what can be done. More than that,
actually. I hope my questions reflect that.

2. Idea
*******

Basically the idea is to compact the used storage in situ. There would
be no "from" and "to" areas. Doing this naively, however, requires a
lot of storage to keep track of things: the stop-and-copy code uses
A/B areas in part to provide somewhere for the forwarding pointers to
go. Using techniques such as those from the excellent book "Compact
Data Structures: A Practical Approach" by Gonzalo Navarro, I think I
have a way to keep this overhead to less than 3% (1) while retaining
good CPU efficiency. If you have to use a full pointer per object then
you are probably no better off, and maybe worse off, than with A/B
areas, except that it is more predictable. It may not work, so I won't
go into any more detail at this stage.

(1) Actually it may be less than that, in that it would replace the
hash-table mark bits in fullcgc.c.

3. Questions
************

3.1 Interface between the GC and the rest of the code

The interface between the GC and the rest of the system is quite
loose. For example, there are well over 100 source files containing
the string "gencgc":

  > grep -irl gencgc . | grep "\.h\|\.c\|\.lisp\|\.sh" | wc -l
  121

Having looked at these files, they are often non-trivial references;
in many cases the non-GC code makes quite invasive assumptions about
how the GC works, and the code often depends on one version of the GC
or another. All this makes it potentially a lot of work to plug in a
new GC module. Adding support back into the x86 system for cheneygc
would require adding support for the relocatable heap and for threads
(or losing thread support), but this is the least of the problem,
which is going through all these other modules and fixing their
dependence on gencgc.

Adding a new GC would be an opportunity to make the GC interface more
plug-and-play, with a defined interface as discussed in the HACKING
document, under "misc".

*Question*: does this sound like a good idea? Do people have any
specific ideas on this, beyond a) what is in the documentation with
the source code on coding standards etc., and b) an assumed desire to
support multithreaded GC and more incremental GC in the future?

3.2 My impression of what is needed in principle to add support for
threads in cheneygc is basically to add in the thread_mutex calls.
Stop/start-the-world is handled in common code already. Are there any
other large or complex pieces of work?

3.3 What is needed to make relocatable storage work with
src/runtime/cheneygc.c? Shared.lisp says they don't work together, but
I am not clear why. I am guessing something like gencgc-space-setup in
src/compiler/generic/parms.lisp is needed, plus related changes.

3.4 Some specific detailed questions:

3.4.1 In cheneygc.c, where do the variables DYNAMIC_0_SPACE_START,
DYNAMIC_0_SPACE_END, DYNAMIC_1_SPACE_START and DYNAMIC_1_SPACE_END get
set up?

3.4.2 I see a lot of non-#define variables with names in CAPS, e.g.
DYNAMIC_SPACE_START. Why is this? Normally in C, caps are for defined
names.

3.4.3 preserve_pointer(...) looks to me like it is used to ensure that
variables on the stack etc. don't have to be changed to accommodate
GC-initiated moves. Why not update the pointers (maybe because the
values are also in registers)? If this is needed, the implication is
that there will be some fragmentation at a page-ish level.

3.4.4 The file fullcgc.c does some sort of full GC. But not really, as
per these comments (gencgc.c):

  // "full" [sic] GC
  /* This is a full mark-and-sweep of all generations without compacting
   * and without returning free space to the allocator. The intent is to
   * break chains of objects causing accidental reachability.
   * Subsequent GC cycles will compact and reclaims space as usual. */

I think this code exists to zero out unused objects, to avoid
spuriously thinking that other objects are used, as per the comment.
And I think this is done in preparation for an actual full GC and only
then. But this doesn't seem to make sense: doing a full mark will not
find any spurious usage?

----

I don't expect anyone to rush off and do a lot of research on these
points, but if anyone does know answers, or even has hints off the top
of their head, it would save me a lot of time. If you are uncertain
about something, saying "I think" or "as far as I can tell" is fine.

As an interesting side note, the OpenJDK GC is about 150,000 lines of
code, though it does include 5 different collectors, some of which are
real-time. SBCL's 5-10 kloc GC is more feasible to deal with, even
with less documentation, IMHO.

Thanks,
Tim Josling
From: Douglas K. <do...@go...> - 2018-01-20 14:52:31
> Basically the idea is to compact the used storage in situ.

What you're describing is (probably) a mark-and-compact algorithm. If
you intend not to use additional storage (aside from overhead
structures) beyond that in use at the start of GC, then you have to
decide what is garbage before starting to compact. There's no simple
way to decide what's garbage without a mark bit. Have you thought
about where to put the mark bits? This is currently a problem for cons
cells, though I have some changes pending which allow simpler marking
of cons cells.

> Having looked at these files, they are often non trivial references;
> in many cases the non-GC code makes quite invasive assumptions about
> how the GC works, and the code often depends on one version of GC or
> another. All this makes it potentially a lot of work to plug in a new
> GC module. Adding support back into the X86 system for cheneygc
> would require adding in support for the relocatable heap and for
> threads (or lose support for threads) but this is the least of the
> problem, which is going through all these other modules and fixing
> their dependence on gencgc.

I think that everyone who's ever said "I'm going to 'just' write a new
GC" eventually stumbles upon a certain reality: in every
garbage-collected language there is a tie that binds the compiler to
the GC, in the sense that the compiler has to know how to emit code
that cooperates with the GC. I also think that claims of far-reaching
dependence on GC internals by non-GC code are not really true to the
degree that they render things impossibly non-modular. Anything
relying so heavily on knowledge of GC is probably part of GC. A
difference you'll see between us and others is that we lack, let's
say, a "gc/" subdirectory in which everything that knows anything
about GC could reside. Oh well.

> 3.2 My impression of what is needed in principle to add support for
> threads in cheneygc is basically to add in the thread_mutex calls.
> Stop/start the world are handled in common code already. Any other
> large or complex pieces of work?

Just ignore cheneygc as if it's not even there, except in as much as
don't break it. (And I'm not sure what "adding back in" means; x86
never supported it.)

> 3.3. What is needed to make relocatable storage work with
> src/runtime/cheneygc.c? Shared.lisp says they don't work together
> but I am not clear why. I am guessing something like
> gencgc-space-setup in src/compiler/generic/parms.lisp is needed plus
> related changes.

I suspect there is nothing tricky about making relocation work for
cheneygc. The address range chosen by the OS should be split into two
spaces. The originally desired space should be relocated into
whichever half of the actual range is appropriate, based on whether it
was originally supposed to be the lower or upper space. (i.e. when
dumped, the space is either 0 or 1, and the dumper doesn't care which)

> 3.4.1 In cheneygc.c, where do the variables DYNAMIC_0_SPACE_START,
> DYNAMIC_0_SPACE_END, DYNAMIC_1_SPACE_START and DYNAMIC_1_SPACE_END
> get set up?

Lots of things are in header files generated by first genesis.

> 3.4.2 I see a lot of non-'define' variables with names in CAPS, e.g.
> DYNAMIC_SPACE_START. Why is this? Normally in C, caps are for
> defined names.

Prior to space relocation being implemented, these were manifest
constants.

> 3.4.3 preserve_pointer(...) looks to me like it is used to ensure
> that variables on the stack etc don't have to be changed to
> accommodate GC-initiated moves. Why not update the pointers (maybe
> the values are also in registers)? If this is needed the implication
> is that there will be some fragmentation at a page-ish level.

Our compiler does not emit stack maps. In the conservative GC, it does
not know which stack words and registers contain pointers. Imagine a
floating-point value in a general-purpose register whose low bits have
the pointer nature, spilled to memory. (Additional reason: on 32-bit
x86 it was deemed impossible to partition the register set into
pointer/non-pointer, so there's that problem too.)

> 3.4.4 file fullcgc.c does some sort of full GC. But not really - as
> per these comments (gencgc.c)
>
> I think this code exists to zero out unused objects to avoid
> spuriously thinking that other objects are used as per the comment.
> I think this is done in preparation for an actual full GC and only
> then. But this doesn't seem to make sense. Doing a full mark will
> not find any spurious usage?

fullcgc exists because without it we otherwise never do a *truly* full
GC. In the standard gencgc algorithm, objects that were dumped to the
image never die. This is not a feature; it's an artifact of the
implementation. The consequence of the artifact is that we
accidentally enliven things that your running program will never
actually use. Consider (defvar *a* (make-massively-long-array)), where
somewhere in your program you (setq *a* (make-other-array)) - in the
absence of a dynamic binding. The massively long array is unreachable
(assume no other path to it, of course). And suppose further that the
array pointed to a zillion other things. The mitigating strategy for
this problem is to zero-fill the known-dead image-backed objects. So
while it's possible to try to deal with that problem without having
implemented essentially a separate kind of GC, that's not how it's
done, because mixing the copying GC with non-copying of generation 6
was actually more tricky than it sounds.

Doug
From: Nikodemus S. <nik...@ra...> - 2008-08-03 18:24:36
On Sat, Aug 2, 2008 at 9:13 PM, Cedric St-Jean <ced...@gm...> wrote:
> 1. I sometimes have to call (sb-ext:gc :full t) repeatedly in order
> to free "everything". I've seen jumps from 200 MB to 50 MB between
> the first and 4th call (yes, I've cared about the +++ *** et al.).
> On x86-64, this is presumably an artifact of the generational GC,
> but I would like to know if there is some rule to it. (gc) is
> expensive, is there any way to know if I've likely cleaned
> everything that could be cleaned? Or is there any way to avoid
> hitting the "conservative" part of the GC?
>
> What I do right now is, in my main loop,
>
> (when (< threshold (dynamic-usage))
>   (clean-some-old-structures)
>   (gc :full t))
>
> And this doesn't work very well, I'm afraid. In my experience,
> having 3 calls to the gc works better, but it makes me cringe, as
> well as being slow.

What happens if you don't have any explicit calls to GC at all? Do you
run out of heap? That is, calling GC manually is definitely unusual --
though it is a known problem that some allocation patterns are prone
to cause problems without it.

Depending on what your control flow is like, you might be able to use
dynamic-extent allocation for some of the things that are now causing
trouble. Alternatively, you might be able to use *AFTER-GC-HOOKS* (or
better yet, *BEFORE-GC-HOOKS*, which don't exist at the moment...) to
run CLEAN-SOME-OLD-STRUCTURES without having to manually check heap
usage. Hard to say without more details.

> 2. The man page for SBCL 1.0.14 says that the code for the weak hash
> table has suffered from "code rot". I use it and it seems to work
> fine, and the actual source code doesn't have any comment predicting
> doom.

That comment is outdated and has been removed from the current version
of the manpage -- weak hash tables should be fine nowadays.

> 3. It'd be awesome if weak pointers were print-readable. The weak
> hash table already is...

Possibly.

Cheers,

-- Nikodemus