On May 15, 2013 2:42 PM, "Craig Lanning" <clanning@isc8.com> wrote:
> On Wed, 2013-05-15 at 13:15 -0400, Alastair Bridgewater wrote:
> >
> > On May 14, 2013 6:33 PM, "Craig Lanning" <clanning@isc8.com> wrote:
> > >
> > > On Tue, 2013-05-14 at 18:09 +0300, Nikodemus Siivola wrote:
> > > > Then the correct answer is ALLOCATION/WITH-FIXED-ALLOCATION;
> > > > everything else is more or less negotiable. Ie. the compiler needs
> > to
> > > > know how to emit allocation sequences.
> > >
> > > Now, that was an eye opener.
> > >
> > > Ultimately, what I'd like to do is define a set of functions and
> > global
> > > variables that can be define either in Lisp for a GC implemented in
> > Lisp
> > > or in C for a GC implemented in C (and brought into Lisp via the
> > alien
> > > facility).
> > >
> > > The company I work for has a very special use case that I think can
> > be
> > > made easier by rewriting the GC.  I read the opinions about why the
> > GC
> > > is in C instead of Lisp.  I understand them, agree with a few, and
> > > disagree with others.  I think that the changes I need to do to the
> > GC
> > > will be easier to implement if I can do it in Lisp instead of C.
> >
> > Are you able to share any details about the sort of changes that you
> > have in mind?  Or about the use case, even in general terms?
> I can give a generic description of what I want to do.  I just can't
> tell you the specific reason why the application needs to do this.
> Basically, the application has a certain object organization that it
> needs to be able to keep localized.  If we need to drop one of our main
> objects so that we can load another one, it would be advantageous to
> know that all allocation related to that object is contained within a
> known memory block.  Then we just declare the block empty and zero it.
> I started my Lisp career working on Symbolics Lisp Machines.  One thing
> that they had was a concept of areas.  An area was an allocation block
> that could be constrained somehow.  For the LispM's some areas held only
> cons cells, others held symbol names.  Areas could also be excluded from
> collection.  The aspect I'm interested in is that the user could create
> an area, designate whether it should be handled by the GC, what objects
> it would hold, and when the contents of the area are no longer needed,
> just flush the contents without running any GC.

If you can guarantee that your memory areas will always have sufficient space for all of the allocation that you need to do, are willing to run WITHOUT-INTERRUPTS while allocating into that space, and the space will contain no external references other than to static space, then I have a nasty, nasty hack in mind that will run on a stock system (hint: there are only two words used for the basic allocation and overflow check, and the region is in a known global location on single-thread systems and in the thread structure on multi-thread systems, easily poked at via SAP functions either way).

The requirement to always have sufficient space is so that you don't trip an allocation overflow "trap" (though they are actually straight calls on x86oids, but on PPC they are a TWI instruction that the trap handler can easily recognize).  To alleviate this, we would need to define a way to hook the overflow trap on a per-region basis to point to the "correct" GC.  Note that it should be plausible to run the trap handler in Lisp with a bit of care.

The requirement to run WITHOUT-INTERRUPTS is because an interrupt (unix signal) handler will almost certainly allocate memory, and you don't want that in your custom memory space.  To alleviate this, modify the interrupt handling logic to detect that a thread is running with a custom allocation region, and to somehow bind a normal, empty gencgc region in its place.  And modify your macro to arrange to close the old gencgc region before setting up your custom region, otherwise the GC could lose track of the old region. If you also have to enable scavenging of your allocation regions then this becomes more complex, as you would need to update whatever tracks the size of the allocated data in your region when binding the gencgc region into place.

The requirement to not have any pointers to non-static heap data is because gencgc doesn't know to scavenge your allocation regions.  This should be straightforward to arrange if you keep track of the regions, by arranging to scavenge all of the regions at some point after all conservative roots have been scanned (somewhere near the scavenging of the binding stacks should suffice; it should be before scavenging newspace).

Depending on which of these restrictions you can live with, this could run anywhere from an afternoon to a month, as a zero'th order estimate. The easy restriction to lift is the one about scavenging custom allocation regions.  I'd actually have to think through the details for the other two.  There may be a dependency involved (you might need some of the bits for removing WITHOUT-INTERRUPTS before you can have a lisp-side overflow handler).

[ snip ]

> > I hope that this brain-dump helps, and would love to be kept "in the
> > loop" if you decide to go forward with writing a new GC for SBCL in
> > Lisp.  Or even a new GC for SBCL in C.
> The "brain-dump" does help.  You mentioned a few things that I hadn't
> run across yet.  I certainly will keep posting info to the list.  I'm
> interested to run any GC performance tests on whatever I end up
> building.  I intend to provide the SBCL team with a GC chapter for the
> Internals manual at the very least.  As I work on the GC changes, any
> code that is generally useful, SBCL is welcome to have.  I will try to
> make sure that any of our "proprietary" changes are really nothing more
> than specific configuration of the general GC that I build.
> Based on the info from Nikodemus and Alastair, it looks like this will
> be a longer term project than I had originally thought.  Fortunately,
> the application will still work with the existing GC so we're ok for the
> time being.  Changing the GC would just make the application run more
> quickly and more efficiently.
> Craig

Now you've got me thinking about pluggable allocation regions and scavenging strategies... And I already have enough of a project list.

-- Alastair