Re: [Sbcl-devel] Feasibility of a concurrent GC for SBCL

SourceForge Headquarters 225 Broadway Suite 1600 San Diego, CA 92101 +1 (858) 454-5900

On Thu, May 8, 2014 at 11:36 PM, Vsevolod Dyomkin <vse...@gm...> wrote:
> We develop a CPU-heavy application, and in our production setup and
> experiments there's a huge loss in CPU utilization because of GC pauses with
> the increase of the number of processor cores (for instance, going from 8 to
> 16 cores, basically, doesn't increase throughput of the application, because
> all of the gain is lost in GC). So this topic became quite important to me.

A concurrent GC executes simultaneously with your program. A parallel
GC utilizes multiple threads when performing its work. Some GCs are
both parallel and concurrent. SBCL's is neither, as you well know. As
Martin pointed out, if you're interested in throughput, you want a
parallel stop-the-world GC. Concurrent GCs trade some throughput for
decreased latency, i.e. they will not stop your program (our at least
pauses will be very brief) but both the GC and the program will run a
little bit slower.

> Has anyone explored this direction before?

Yes, once upon a time I tried to introduce parallelism into SBCL's
GENCGC, following the architecture described in this paper:
<http://research.microsoft.com/en-us/um/people/simonpj/papers/parallel-gc/par-gc-ismm08.pdf>.
In that paper the authors describe how they've parallelized a
stop-the-world copying GC that's pretty similar to SBCL's.

> What are the main obstacles
> (besides the obvious need to devote a hefty amount of time to it) in
> implementing one of the existing concurrent GC approaches in SBCL?

The main obstacle for me, besides a general lack of
knowledge/guidance, was that SBCL's copying code is much more
complicated that GHC's. Whereas GHC's GC has a single function that
copies some sort of hunk from old to new space, SBCL's has various
copying functions for dozens of object types, some with
parallelism-unfriendly optimization tricks. It took a while to nudge
all of those copiers into apparent correctness. In hindsight, it was a
big mistake not to simplify the copiers before making them concurrent.

The other challenges are getting a balanced distribution of work
across the GC threads and, possibly, solving the page table bottleneck
described in the paper. I definitely bumped into the former. I very
likely bumped into the latter as well, but I didn't know how to
measure it. Some years later, I came across Simon Marlow -- the main
author of the aforementioned paper -- at a GC workshop and he told me
he had used a thread profiler tool from Intel, whose name I can't
recall.

In the end, the bottom line was: I reproduced the 20-30% hit on
single-thread GC time when the synchronization mechanism was enabled;
running with two threads recovered that overhead and on some
benchmarks even yielded a little bit of speedup compared to the
original GC; but, there was little to no speedup beyond 2 threads. :-(

> Maybe, someone wanted to approach it, but didn't have motivation?

Well, motivation was pretty low when I was stuck chasing heisenbugs. :-)

> I'd be grateful for any pointers and can provide more detail here or in the
> personal communication if needed.

I can try and dig out my thesis and (outdated) code if you're interested.

-- 
Luís Oliveira
http://kerno.org/~luis/

Re: [Sbcl-devel] Feasibility of a concurrent GC for SBCL

Common Lisp compiler and runtime

Re: [Sbcl-devel] Feasibility of a concurrent GC for SBCL