From: John M M. <jo...@sm...> - 2005-01-19 06:30:44
On Jan 18, 2005, at 9:55 PM, Andreas Raab wrote:

> Hi John,
>
> Thanks, got it - the VM-dev list is just very slow. Some comments:
>
> * Using TheGCSemaphore makes the VM unusable with older images due to
> splObj size mismatch - I'd want to change this to use an external
> semaphore index (this is what I used for "my" primitive) to be able to
> run 3.6 images on 3.8 VMs.

I stole the set-semaphore handling from another semaphore's usage, which is
why there is the check for

	[(self fetchClassOf: (self splObj: TheGCSemaphore)) =
		(self splObj: ClassSemaphore)]

I have no problem changing this to use an external semaphore index. The check,
I'd guess, would then become a check for a non-zero value in that global.

> * One of the truly important situations which is not covered in these
> measures is when we have to run multiple compaction cycles due to lack
> of forwarding blocks. I believe this has killed me in the past and
> taking GC stats should definitely include this tad of information
> (dunno how to measure, to be honest...)
>
> * The logic in incrementalGC for growing, namely:
>
> 	(((self sizeOfFree: freeBlock) < growHeadroom) and:
> 		[(self fetchClassOf: (self splObj: TheGCSemaphore)) =
> 			(self splObj: ClassSemaphore)]) ifTrue:
> 		[growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock).
> 		self growObjectMemory: growSize].
>
> looks odd. Questions:
> - What has TheGCSemaphore to do with growing?
> - Why do we grow when having less than growHeadroom space?
>   (all we need here is enough space to accommodate the next round
>   of allocations + IGC - I don't see a logic here)
> - Why is the grow size inconsistent with, e.g., sufficientSpaceAfterGC:?
> - Why do it at all? :-)
>   (no, quite seriously, I don't see what good the logic actually does)

Looking at TheGCSemaphore allows me to turn the new logic on or off, so I can
run before/after tests with the same VM just by setting TheGCSemaphore to nil
or to a Semaphore.

This code is where the problem I described earlier comes in:

> What I found was an issue which we hadn't realized is there; well, I'm
> sure people have seen it, but don't know why...
> What happens is that as we are tenuring objects, we are decreasing the
> young space from 4MB to zero.
>
> Now, as indicated in the table below, if conditions are right (a couple
> of cases in the macrobenchmarks), the number of objects we can allocate
> decreases to zero, and we actually don't tenure anymore once the
> survivors fall below 2000.
> The rate at which young-space GC activity occurs goes from, say, 8 per
> second towards 1000 per second. Mind you, on fast machines the
> young-space ms accumulation count doesn't move much, because the time
> taken for each collection is under 1 millisecond, or 0, skewing those
> statistics and hiding the GC time.
>
>    AllocationCount   Survivors
>        4000            5400
>        3209            3459
>        2269            2790
>        1760            1574
>        1592            2299
>        1105            1662
>         427            2355
>         392            2374
>         123            1472
>          89            1478
>          79               2
>          78               2
>          76               2
>          76               2
>
> Note how we allocate 76 objects, do a young-space GC, then have two
> survivors; finally we reach the 200K minimum GC threshold and do a full
> GC followed by growing young space. However, this process is very
> painful.

By saying we want some slack,

	growHeadroom*3/2 - (self sizeOfFree: freeBlock)

we avoid the above problem. In the Smalltalk code there is a post-check that
says if we've grown N MB between full GCs, then it's time to do another one;
this prevents uncontrolled growth. I could add that check in the VM? Should we?
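For illustration, a minimal workspace-style sketch of that kind of image-side
post-check, assuming VM parameter 3 reports the total object-memory size in
bytes; growthLimit, the 8MB figure, and the one-second polling interval are
made-up values, not the actual tuning-process settings:

	| lastFullGCHeapSize growthLimit |
	growthLimit := 8 * 1024 * 1024.	"illustrative: allow ~8MB of growth between full GCs"
	lastFullGCHeapSize := Smalltalk vmParameterAt: 3.	"assumed: parameter 3 = total object memory in bytes"
	[[(Delay forSeconds: 1) wait.
	  (Smalltalk vmParameterAt: 3) - lastFullGCHeapSize > growthLimit ifTrue:
		["Heap grew past the limit since the last full GC; collect and reset the baseline."
		 Smalltalk garbageCollect.
		 lastFullGCHeapSize := Smalltalk vmParameterAt: 3]] repeat]
		forkAt: Processor userBackgroundPriority

A policy like this can live entirely in the image, which is the point made
below about keeping extra code out of the VM.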
If the GC tuning process stops running, then the VM will grow to the maximum
virtual memory size. I did not add the check to the VM code since I wanted to
minimize the code there.

What this also exposed is a tendency for the image to grow. From my notes to
Jerry Bell about Croquet:

"In your testing the image uses about 200MB. The regular image/VM ran upwards
in 6MB chunks when building the teapot in the 58 seconds, then usually used
500K of memory as active young space. Now, if you choose to allocate memory
and then tenure to reduce the amount of mark/sweep work (by reducing the
number of objects being managed), this churns more memory; then at some point
you are forced to collect it all, making it an expensive, noticeable
operation. The activeRun changed things a bit to force growth a bit faster,
then to tenure on excessive marking, which appears to have gotten rid of 42
seconds of incremental GC time because we are looking at fewer objects on
each young-space mark/sweep."

So I'd suggest building a VM, running some before/after tests, and observing
memory usage and clock time to complete a known task.

> * Measuring statMarkCount, statCompMoveCount, statSweepCount,
> statMkFwdCount etc. seems to be excessive - is there really any need to
> add extra instructions to these tight loops? I'd rather live without
> these insns in the midst of time-critical GC code.

Well, I wanted to collect data. I wonder, though, whether adding these new
instructions makes any measurable difference - maybe integer unit 47 on that
CPU now gets used? Somehow I'd rather leave them in, unless you can show they
are an issue. Remember, we don't have any way to collect that type of data
right now.

> Other than this it looks good. So I'd propose that:
> a) We use "my" GC signaling code in order to keep the VMs compatible.
> b) Add a counter for multiple compaction cycles (if we know how to
>    measure that)
> c) Either remove the growing code from IGC or add a comment explaining
>    what the point of it is and why the parameters have been chosen the
>    way they have been chosen
> d) Get rid of the counters in the inner loops of the GC code.
>
> Opinions? (I'd be happy to integrate John's code on top of what I just
> posted)
>
> Cheers,
>   - Andreas

--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
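As a rough sketch of the external-semaphore-index signalling proposed in (a)
above: the image registers a Semaphore through registerExternalObject: and
hands the resulting index to the VM, which can then raise it via its
signalSemaphoreWithIndex: machinery after each collection. The selector
setGCSemaphoreIndex: is only a placeholder for whichever primitive ends up
carrying the index; it is not an existing method:

	| gcSema index |
	gcSema := Semaphore new.
	index := Smalltalk registerExternalObject: gcSema.	"returns the external object index"
	Smalltalk setGCSemaphoreIndex: index.	"placeholder primitive call, not a real selector"
	[[gcSema wait.
	  "a real client would read and log the GC statistics here"] repeat]
		forkAt: Processor userInterruptPriority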