RE: [Squeak-VMdev] GC improvements


>> Tim - is there any chance that we can get these changes into the 3.8
>> VMMaker? This stuff will be critical for the next Tweak version and
>> having it in the official 3.8 would heavily simplify migration.
> Perhaps you could chat with John about the GC monitoring code he
> suggested recently. There is a degree of overlap that you might be able
> to mutually remove, making my life much simpler.

These change sets are attached.

I did add a
primitiveSetGCSemaphore

I'm not sure about having to run Smalltalk code on each signal, because
of the frequency of invocation. If you look at the monitor change set
you'll see the instance variable gcActivity and some hacked code that
looks every 100 ms. The VM was altered to invoke the different
tenure/compact logic if the semaphore is set, so I drop in a dummy one
(Semaphore new) to trigger the new logic. As implied in Andreas' earlier
note, the semaphore signal allows you to do active tinkering.
See calculateGoals, but watch out for the commented-out code which I've
been tinkering with, let alone the "true ifTrue: [^self]."

Right now that code attempts to tenure if it feels the marking has
become excessive because of Root Table scanning, and ensures that after
growing N bytes we do a full GC, which is the other part of the new VM
logic to avoid doing a GC every time we start to run low on space.
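
As a rough illustration of that tenure check, here is a sketch in Python
rather than the code from the change set. The function name, its
parameters and the 0.6 threshold are all assumptions made up for the
example; what calculateGoals actually does may differ.

```python
def should_tenure(marked_via_roots, marked_via_root_table,
                  max_root_table_share=0.6):
    """Return True when root table scanning dominates young space marking.

    max_root_table_share is a hypothetical tuning knob: the largest
    fraction of marked objects we accept as coming from the root table
    before asking for a tenure.
    """
    total_marked = marked_via_roots + marked_via_root_table
    if total_marked == 0:
        return False  # nothing was marked, so nothing looks excessive
    return marked_via_root_table / total_marked > max_root_table_share
```

With a check like this, a young space GC where most of the marking comes
from the root table asks for a tenure, while one dominated by the
interpreter roots does not.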

Plus I added this:
" A VM change will consider that after a tenure, if the young space is
less than 4MB, then growth will happen to make young space greater than
4MB plus a calculated slack. Then after we've tenured N MB we will do a
full GC, versus doing a full GC on every grow operation; this will
trigger a shrink if required. For example we'll tenure at 75% and be
biased to grow to 16MB before doing a full GC."
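
The grow-then-full-GC policy described above can be sketched roughly as
follows. This is Python for illustration only, not the VM code: the
names are hypothetical, the slack is passed in because how it is
calculated is not spelled out here, and the 16MB budget is just the
example figure from the text.

```python
MB = 1024 * 1024


def young_space_after_tenure(young_bytes, slack_bytes, minimum=4 * MB):
    """Size young space should have after a tenure.

    If young space fell below the minimum, grow it to the minimum plus
    the caller-supplied slack; otherwise leave it alone.
    """
    if young_bytes < minimum:
        return minimum + slack_bytes
    return young_bytes


class FullGCTrigger:
    """Ask for a full GC only after a budget of bytes has been tenured."""

    def __init__(self, tenure_budget=16 * MB):
        self.tenure_budget = tenure_budget
        self.tenured_since_full_gc = 0

    def note_tenure(self, bytes_tenured):
        """Record a tenure; return True when a full GC is now due."""
        self.tenured_since_full_gc += bytes_tenured
        if self.tenured_since_full_gc >= self.tenure_budget:
            self.tenured_since_full_gc = 0  # the full GC resets the budget
            return True
        return False
```

The point of the second part is that the full GC is driven by how much
has been tenured, not by each individual grow operation.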

>
> The Problem:
>
> Last weekend I built a new VM which has instrumentation to describe
> exactly what the GC is doing, to trigger a semaphore when a GC
> finishes, and to allow you to poke at more interesting things that
> control GC activity.
>
> What I found was an issue which we hadn't realized was there. Well, I'm
> sure people have seen it, but they don't know why...
> What happens is that as we tenure objects we decrease the young space
> from 4MB towards zero.
>
> Now, as indicated in the table below, if conditions are right (a couple
> of cases in the macrobenchmarks) the number of objects we can allocate
> decreases towards zero, and we actually don't tenure anymore once the
> survivors fall below 2000.
> The rate at which young space GC activity occurs goes from say 8 per
> second towards 1000 per second. Mind you, on fast machines the young
> space ms accumulation count doesn't move much because the time taken
> for each GC is under 1 millisecond, or 0, skewing those statistics and
> hiding the GC time.
>
> AllocationCount   Survivors
>            4000        5400
>            3209        3459
>            2269        2790
>            1760        1574
>            1592        2299
>            1105        1662
>             427        2355
>             392        2374
>             123        1472
>              89        1478
>              79           2
>              78           2
>              76           2
>              76           2
>
> Note how we allocate 76 objects, do a young space GC, then have two
> survivors; finally we reach the 200K minimum GC threshold and do a
> full GC followed by growing young space. However this process is very
> painful. It's also why the low space dialog doesn't appear in a timely
> manner: we are approaching the 200K limit and trying really hard, by
> doing thousands of young space GCs, to avoid going over that limit. If
> conditions are right, then we get close but not close enough...
>
> What will change in the future.
>
> a) A GC monitoring class (new) will look at mark/sweep/Root table  
> counts and decide when to do a tenure operation if iterating
> over the root table objects takes too many iterations. A better  
> solution would be to remember old objects and which slot has the young  
> reference but that is harder to do.
>
> b) A VM change will consider that after a tenure if the young space is  
> less than 4MB then growth will happen to make young space greater than  
> 4MB plus a calculated slack. Then after we've tenured N MB we will do  
> a full GC, versus doing a full GC on every grow operation, this will  
> trigger a shrink if required.  For example we'll tenure at 75% and be
> biased to grow to 16MB before doing a full GC.
>
> c) To solve hitting the hard boundary when we cannot allocate more
> space, we need to rethink when the low space semaphore is signaled and
> the rate of young space GC activity. Signaling the semaphore earlier
> will allow a user to take action before things grind to a halt. I'm
> not quite sure how to do that yet.
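
To put rough numbers on the problem in the quoted note: if the program
keeps allocating at a steady rate, the young space GC rate is that rate
divided by the allocations each GC allows. The sketch below is Python
for illustration; the steady rate of 32,000 objects per second is an
assumption obtained by pairing the "8 per second" figure with the first
table row, not something that was measured.

```python
def young_gcs_per_second(allocations_per_second, allocations_per_gc):
    """GC frequency when every young space GC allows a fixed allocation count."""
    if allocations_per_gc <= 0:
        raise ValueError("allocations_per_gc must be positive")
    return allocations_per_second / allocations_per_gc


# Assumed steady allocation rate: 8 GCs/second at 4000 allocations per GC.
ASSUMED_RATE = 8 * 4000

# AllocationCount column from the table in the quoted note.
ALLOCATION_COUNTS = [4000, 3209, 2269, 1760, 1592, 1105, 427,
                     392, 123, 89, 79, 78, 76, 76]

rates = [young_gcs_per_second(ASSUMED_RATE, count)
         for count in ALLOCATION_COUNTS]

for count, rate in zip(ALLOCATION_COUNTS, rates):
    print(f"{count:5d} allocations per GC -> {rate:7.1f} young GCs/second")
```

Under that assumption the last rows already mean a few hundred young
space GCs per second, each finding almost nothing to collect.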

Some older notes:
> I've been getting a few GC test data results; the change sets to build
> a VM lurk on my idisk as per my note to the squeak mailing list.
>
> OmniBrowser/Monticello SUnits from Colin Putney
>
> Before any changes what I see is (averages):
> 8139 marked objects per young space GC, where 2426 marked via  
> interpreter roots, and 5713 by remember table for 6703 iterations
> 4522 swept objects in young space
> 714 survivors
>
> After changes where we bias towards growth (more likely to tenure on  
> excessive marking), and ensure young space stays largish,
> versus heading towards zero I see (again averages)
>
> 4652 marked objects per young space GC, where 2115 marked via  
> interpreter roots, and 2526 by remember table for 6678 iterations
> 4238 swept objects in young space.
> 368 survivors
>
> This of course translates into fewer CPU cycles needed for young space
> GC work.
>
>
> Jerry Bell sent me some Croquet testing data
>
> Seems Croquet starts at about 30MB and grows upwards to 200MB when you  
> invoke a teapot and look about.
>
> Jerry has to confirm what he did and whether it was repeated mostly the
> same way, but it did do 65,000 to 70,000 young space GCs and it appears
> we reduced the young space GC time by 40 seconds. This does result in
> more full GC work (5) since I tenure about 16MB before doing a full
> GC, but that accounts for only an extra second of real time...
>
> Marking in the original case averages 20,808 per young space GC.
> After the alterations it's 11,386, making GC work faster.
>
> I'll also note growing to the 195MB takes 49 seconds versus the
> original 57.
>
> Anyway I've got to get my head around the numbers and decide where to
> take the active tuning logic.
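
For anyone who wants the before/after figures in these notes as
percentages, a small Python sketch follows. The numbers are copied from
the notes above; the helper name and the labels are just for the
example.

```python
def percent_reduction(before, after):
    """Percentage by which `after` is smaller than `before`."""
    return 100.0 * (before - after) / before


# (before, after) pairs taken from the notes above.
MEASUREMENTS = {
    "SUnits: marked objects per young space GC": (8139, 4652),
    "SUnits: marked via remember table": (5713, 2526),
    "SUnits: survivors": (714, 368),
    "Croquet: marked objects per young space GC": (20808, 11386),
    "Croquet: seconds to grow to 195MB": (57, 49),
}

for label, (before, after) in MEASUREMENTS.items():
    print(f"{label}: {percent_reduction(before, after):.1f}% less")
```

Marking per young space GC drops by a bit over 40% in both the SUnit
and the Croquet runs.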

--
===========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
===========================================================================