|
From: Andreas R. <and...@gm...> - 2005-01-19 01:28:54
Attachments:
GCImprovements-ar.cs
|
Folks,

After some talking to Ian today he convinced me to finally fix the one
remaining issue with weak arrays, namely that weak arrays in old space hold
on strongly to values in young space and therefore prevent those values
from being garbage collected. Fixing this actually turned out to be almost
trivial - it's about ten lines of code or so and I really don't understand
why I didn't fix it in the original version (but what the heck...)

Since this exercise was so easy, I also added a number of GC-related
primitives, all of which would have saved my day at some point in the past:

* primitiveIsYoung: Answers the question whether an object currently lives
in young or in old space.

* primitiveIsRoot: Answers the question whether any given object is
currently a root for young space.

* primitiveRootTable: Answers a snapshot of the current root table. Useful
for examining the roots if the analysis requires complex other operations
during which the root table itself might be modified. Note that since this
primitive can cause a GC there is a small chance that it will give an
inaccurate answer.

* primitiveRootTableAt: Answers a single element of the root table (by
one-based index). This primitive can be used to quickly scan the root table
for certain objects.

* primitiveSetGCSemaphore: Indicates a semaphore (index) to be signaled
whenever a garbage collection occurs.

I can see at least two uses for the GC semaphore: running cleanup actions
(for example after full GCs have occurred) and dynamic parameter tuning for
the GC algorithm itself.

Tim - is there any chance that we can get these changes into the 3.8
VMMaker? This stuff will be critical for the next Tweak version and having
it in the official 3.8 would heavily simplify migration.

Cheers,
- Andreas
|
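The image-level semantics of the proposed primitives can be modeled with a
toy two-generation heap (a sketch in Python, purely illustrative - all
names and behavior here are assumptions based on the descriptions above,
not actual VM code):

```python
# Toy model of a two-generation heap, sketching what the proposed
# primitives would answer. Everything here is an assumption derived
# from the mail's descriptions, not Squeak VM source.

class ToyHeap:
    def __init__(self):
        self.old = set()      # objects living in old space
        self.young = set()    # objects living in young space
        self.roots = []       # old-space objects referencing young ones

    def allocate(self, obj):
        self.young.add(obj)   # new objects are born in young space

    def store(self, holder, obj):
        # an old-space object storing a young pointer becomes a root
        if holder in self.old and obj in self.young \
                and holder not in self.roots:
            self.roots.append(holder)

    # primitiveIsYoung: does the object currently live in young space?
    def is_young(self, obj):
        return obj in self.young

    # primitiveIsRoot: is the object currently a root for young space?
    def is_root(self, obj):
        return obj in self.roots

    # primitiveRootTable: a snapshot (copy) of the current root table
    def root_table(self):
        return list(self.roots)

    # primitiveRootTableAt: one element of the root table (1-based)
    def root_table_at(self, index):
        return self.roots[index - 1]

heap = ToyHeap()
heap.old.add("holder")
heap.allocate("baby")
heap.store("holder", "baby")
print(heap.is_young("baby"), heap.is_root("holder"), heap.root_table_at(1))
```

Note that `root_table` answers a copy, mirroring the snapshot semantics
described above: the caller can examine it even while the live table
changes underneath.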
|
From: Tim R. <ti...@su...> - 2005-01-19 01:54:29
|
In message <028001c4fdc6$34760c00$0301000a@R22>
"Andreas Raab" <and...@gm...> wrote:
> Tim - is there any chance that we can get these changes into the 3.8
> VMMaker? This stuff will be critical for the next Tweak version and having
> it in the official 3.8 would heavily simplify migration.
Perhaps you could chat with John about the GC monitoring code he
suggested recently. There is a degree of overlap that you might be able
to mutually remove, making my life much simpler.
Aaaannnnnd, how is this going to relate to the 64bit code? I'm still
waiting for some answers about that before doing anything much to
vmmaker.
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
"Bother," said Pooh, reading his bank statement from Barings.
|
|
From: Andreas R. <and...@gm...> - 2005-01-19 02:29:13
|
>> Tim - is there any chance that we can get these changes into the 3.8
>> VMMaker? This stuff will be critical for the next Tweak version and
>> having it in the official 3.8 would heavily simplify migration.
> Perhaps you could chat with John about the GC monitoring code he
> suggested recently. There is a degree of overlap that you might be able
> to mutually remove, making my life much simpler.
I haven't seen that code ... but what I am proposing here should allow us
to run tuning code from the image instead of the VM.
> Aaaannnnnd, how is this going to relate to the 64bit code? I'm still
> waiting for some answers about that before doing anything much to
> vmmaker.
Just asked Ian about this today (I'll defer to him for an ultimate answer).
Cheers,
- Andreas
|
|
From: Andreas R. <and...@gm...> - 2005-01-19 05:55:26
|
Hi John,
Thanks, got it - the VM-dev list is just very slow. Some comments:
* Using TheGCSemaphore makes the VM unusable with older images due to splObj
size mismatch - I'd want to change this to use an external semaphore index
(this is what I used for "my" primitive) to be able to run 3.6 images on 3.8
VMs.
* One of the truly important situations which is not covered in these
measures is when we have to run multiple compaction cycles due to lack of
forwarding blocks. I believe this has killed me in the past and taking GC
stats should definitely include this tad of information (dunno how to
measure to be honest...)
* The logic in incrementalGC for growing, namely:
(((self sizeOfFree: freeBlock) < growHeadroom) and:
[(self fetchClassOf: (self splObj: TheGCSemaphore)) =
(self splObj: ClassSemaphore)]) ifTrue:
[growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock).
self growObjectMemory: growSize].
looks odd. Questions:
- What has TheGCSemaphore to do with growing?
- Why do we grow when having less than growHeadroom space?
(all we need here is enough space to accommodate the next round of
allocations + IGC - I don't see a logic here)
- Why is the grow size inconsistent with, e.g.,
sufficientSpaceAfterGC:?
- Why do it all? :-)
(no, quite seriously, I don't see what good the logic actually does)
* Measuring statMarkCount, statCompMoveCount, statSweepCount, statMkFwdCount
etc. seems excessive - is there really any need to add extra
instructions to these tight loops? I'd rather live without these insns in
the midst of time-critical GC code.
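For reference, the growing rule quoted above translates roughly into the
following (a sketch in Python; the TheGCSemaphore enable-check is reduced
to a boolean and sizes are plain byte counts - an illustration, not the
actual Slang):

```python
def grow_amount(free_bytes, grow_headroom, new_logic_enabled):
    """Mirror of the quoted incrementalGC growing rule: if free space
    has fallen below growHeadroom (and the new behavior is enabled),
    grow the object memory so free space ends up at 1.5 * growHeadroom."""
    if new_logic_enabled and free_bytes < grow_headroom:
        return grow_headroom * 3 // 2 - free_bytes
    return 0

# e.g. with 4MB of headroom and only 1MB free, grow by 5MB,
# leaving 6MB (= 1.5 * headroom) free afterwards:
print(grow_amount(1_000_000, 4_000_000, True))
```

Writing it this way makes the two questions above concrete: the trigger is
"free < growHeadroom" and the target is "free = 1.5 * growHeadroom", both
unrelated to what sufficientSpaceAfterGC: computes.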
Other than this it looks good. So I'd propose that:
a) We use "my" GC signaling code in order to keep the VMs compatible.
b) Add a counter for multiple compaction cycles (if we can figure out how)
c) Either remove the growing code from IGC or add a comment explaining what
the point of it is and why the parameters have been chosen the way they
have been chosen
d) Get rid of the counters in the inner loops of the GC code.
Opinions? (I'd be happy to integrate John's code on top of what I just
posted)
Cheers,
- Andreas
----- Original Message -----
From: "John M McIntosh" <jo...@sm...>
To: "Ian Piumarta" <ian...@hp...>; "Andreas Raab"
<and...@gm...>
Cc: "Squeak VM Developers" <squ...@li...>; "Tim
Rowledge" <ti...@su...>
Sent: Tuesday, January 18, 2005 8:32 PM
Subject: Re: [Squeak-VMdev] Re: GC improvements
> Well I tried to reply to the list, but that hasn't arrived yet.
>
> There is a primitive to set the GC semaphore,
> some VM changes for monitoring (more statistical data),
> some VM changes around the memory growth versus GC activity decision.
>
> The logic to tweak things lurks in the image, not the VM.
>
> Let's see where my message went. Otherwise the changesets are on my
> idisk.
>
>
> On Jan 18, 2005, at 6:28 PM, Andreas Raab wrote:
>
>>>> Tim - is there any chance that we can get these changes into the 3.8
>>>> VMMaker? This stuff will be critical for the next Tweak version and
>>>> having
>>>> it in the official 3.8 would heavily simplify migration.
>>> Perhaps you could chat with John about the GC monitoring code he
>>> suggested recently. There is a degree of overlap that you might be able
>>> to mutually remove, making my lif emuch simpler.
>>
>> I haven't seen that code ... but what I am proposing here should allow
>> us to run tuning code from the image instead of the VM.
>>
>>> Aaaannnnnd, how is this going to relate to the 64bit code? I'm still
>>> waiting for some answers about that before doing anything much to
>>> vmmaker.
>>
>> Just asked Ian about this today (I'll defer to him for an ultimate
>> answer).
>>
>> Cheers,
>> - Andreas
>>
>>
>>
>> _______________________________________________
>> Squeak-VMdev mailing list
>> Squ...@li...
>> https://lists.sourceforge.net/lists/listinfo/squeak-vmdev
>>
>>
> --
> ===========================================================================
> John M. McIntosh <jo...@sm...> 1-800-477-2659
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
> ===========================================================================
>
|
|
From: John M M. <jo...@sm...> - 2005-01-19 06:30:44
|
On Jan 18, 2005, at 9:55 PM, Andreas Raab wrote:

> Hi John,
>
> Thanks, got it - the VM-dev list is just very slow. Some comments:
>
> * Using TheGCSemaphore makes the VM unusable with older images due to
> splObj size mismatch - I'd want to change this to use an external
> semaphore index (this is what I used for "my" primitive) to be able to
> run 3.6 images on 3.8 VMs.

I stole the set semaphore from another usage for some other semaphore,
which is why there is the check for

> [(self fetchClassOf: (self splObj: TheGCSemaphore)) =
> (self splObj: ClassSemaphore)])

I have no problem changing this to use an external semaphore index. The
check then, I'd guess, would be to check for a non-zero value in that
global.

> * One of the truly important situations which is not covered in these
> measures is when we have to run multiple compaction cycles due to lack
> of forwarding blocks. I believe this has killed me in the past and
> taking GC stats should definitely include this tad of information
> (dunno how to measure to be honest...)
>
> * The logic in incrementalGC for growing, namely:
>
> (((self sizeOfFree: freeBlock) < growHeadroom) and:
> [(self fetchClassOf: (self splObj: TheGCSemaphore)) =
> (self splObj: ClassSemaphore)]) ifTrue:
> [growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock).
> self growObjectMemory: growSize].
>
> looks odd. Questions:
> - What has TheGCSemaphore to do with growing?
> - Why do we grow when having less than growHeadroom space?
> (all we need here is enough space to accommodate the next round of
> allocations + IGC - I don't see a logic here)
> - Why is the grow size inconsistent with, e.g., sufficientSpaceAfterGC:?
> - Why do it all? :-)
> (no, quite seriously, I don't see what good the logic actually does)

Looking at TheGCSemaphore allows me to turn the new logic on or off, so I
can run before/after tests using the same VM by just setting
TheGCSemaphore to nil or to a Semaphore.
This code is where the problem comes in:

> What I found was an issue which we hadn't realized is there, well I'm
> sure people have seen it, but don't know why...
> What happens is that as we are tenuring objects we are decreasing the
> young space from 4MB to zero.
>
> Now as indicated in the table below, if conditions are right (a couple
> of cases in the macrobenchmarks) the number of objects we can allocate
> decreases to zero, and we actually don't tenure anymore once the
> survivors fall below 2000. The rate at which young space GC activity
> occurs goes from say 8 per second towards 1000 per second; mind, on
> fast machines the young space ms accumulation count doesn't move much
> because the time taken to do this is under 1 millisecond, or 0, skewing
> those statistics and hiding the GC time.
>
> AllocationCount  Survivors
>     4000           5400
>     3209           3459
>     2269           2790
>     1760           1574
>     1592           2299
>     1105           1662
>      427           2355
>      392           2374
>      123           1472
>       89           1478
>       79              2
>       78              2
>       76              2
>       76              2
>
> Note how we allocate 76 objects, do a young space GC, then have two
> survivors; finally we reach the 200K minimum GC threshold and do a full
> GC followed by growing young space. However this process is very
> painful.

By saying we want some slack - growHeadroom*3/2 - (self sizeOfFree:
freeBlock) - we avoid the above problem.

In the Smalltalk code there is a post check to say if we've grown N MB
between full GCs then it's time to do another one; this prevents
uncontrolled growth. I could add that check in the VM? Should we? If the
GC tuning process stops running then the VM will grow to the maximum
virtual memory size. I did not add it to the VM code since I wanted to
minimize the code there.

What this also exposed is a tendency for the image to grow. From my notes
to Jerry Bell about Croquet:

"In your testing the image uses about 200MB. Of which the regular
image/vm ran upwards in 6MB chunks when building the teapot in the 58
seconds, then usually using 500K of memory as active young space.
Now if you choose to allocate memory then tenure to reduce the amount of
mark/sweep work by reducing the number of objects being managed, this
churns more memory; then at some point you are forced to collect it all,
making it an expensive, noticeable operation. The activeRun changed things
a bit to force growth a bit faster, then tenure on excessive marking,
which appears to have gotten rid of 42 seconds of incremental GC time
because we are looking at fewer objects on each young space mark/sweep."

So I'd suggest building a VM, running some before/after tests, and
observing memory usage and clock time to complete a known task of work.

> * Measuring statMarkCount, statCompMoveCount, statSweepCount,
> statMkFwdCount etc. seems excessive - is there really any need to
> add extra instructions to these tight loops? I'd rather live without
> these insns in the midst of time-critical GC code.

Well, I wanted to collect data. I wonder though if adding these new
instructions makes any measurable difference - maybe integer unit 47 on
that CPU now gets used? Somehow I'd rather leave them, unless you can show
they are issues? Remember we don't have any way to collect that type of
data right now.

> Other than this it looks good. So I'd propose that:
> a) We use "my" GC signaling code in order to keep the VMs compatible.
> b) Add a counter for multiple compaction cycles (if we can figure out
> how)
> c) Either remove the growing code from IGC or add a comment explaining
> what the point of it is and why the parameters have been chosen the
> way they have been chosen
> d) Get rid of the counters in the inner loops of the GC code.
>
> Opinions? (I'd be happy to integrate John's code on top of what I just
> posted)
>
> Cheers,
> - Andreas

--
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
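The shrinking-young-space dynamic shown in John's table can be reproduced
with a crude simulation (Python; every parameter here is invented for
illustration, not a measured value - the point is only the shape of the
curve):

```python
# Crude model of the dynamic described above: each scavenge tenures its
# survivors into old space, eating into the fixed young space, so the
# number of objects allocatable between scavenges heads towards a small
# residue and scavenges become ever more frequent. Tenuring stops once
# survivors fall below the threshold (as in the mail, "below 2000").
# All parameters are invented for illustration.

def simulate(young_bytes=4_000_000, obj_size=100, survival_rate=0.3,
             tenure_threshold=2000, cycles=30):
    history = []
    for _ in range(cycles):
        allocations = young_bytes // obj_size  # objects until next scavenge
        survivors = int(allocations * survival_rate)
        if survivors > tenure_threshold:
            # tenuring moves survivors to old space, shrinking young space
            young_bytes -= survivors * obj_size
        history.append(allocations)
    return history

hist = simulate()
# allocations per scavenge: starts high, collapses, then sticks at a
# small value once tenuring stops - scavenge frequency explodes
print(hist[0], hist[-1])
```

This reproduces the qualitative behavior in the table: the allocation
count per cycle collapses, and once survivors drop under the threshold the
system gets stuck doing very frequent, very cheap scavenges that hide the
accumulated GC time.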
|
From: Andreas R. <and...@gm...> - 2005-01-19 08:26:21
|
Hi John,
> I have no problem changing this to use a external semaphore index.
> The check then, I'd guess, would be to check for a non-zero value in
> that global.
I prefer this solution.
>> * The logic in incrementalGC for growing namely:
>
> This code is where the problem with:
>> What I found was an issue which we hadn't realized is there, well I'm
>> sure people have seen it, but don't know why...
>> What happens is that as we are tenuring objects we are decreasing the
>> young space from 4MB to Zero.
Ah, I see. Yes that'd be really bad.
>> Note how we allocate 76 objects, do a young space GC, then have two
>> survivors, finally we reach the 200K minimum GC
>> threshold and do a full GC followed by growing young space. However this
>> process is very painful.
>
> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree:
> freeBlock) we avoid the above problem.
I see. Yes this makes sense. (btw, I'm not sure if these parameter choices
are best but I guess since they aren't worse than what's there they must be
good enough ;-)
> In the smalltalk code is a post check to say if we've grown N MB between
> full GCs then it's time to do another one, this prevents uncontrolled
> growth.
> I could add that check in the VM? Should we? If the GC tuning process
> stops running then the VM will grow to the maximum Virtual memory size.
> I did not add it to the VM code since I wanted to minimize the code
> there.
With this aggressive growth strategy I think we should have a (VM) parameter
which controls when to run a full GC depending on how much we've grown.
Having a GC tuner sit there all the time and watch things fly by doesn't
look like the best solution to me.
On the other hand it seems like in this case we probably should be following
the same strategy in sufficientSpaceAfterGC:, shouldn't we? Here, we just GC
and then grow, and it seems that if we're okay with aggressive growth we
should grow here, too.
>> * Measuring statMarkCount, statCompMoveCount, statSweepCount,
>> statMkFwdCount etc. seem to be excessive - is there really any need to
>> add extra instructions to these tight loops? I'd rather live without
>> these insns in the midst of time-critical GC code.
>
> Well I wanted to collect data, I wonder tho if adding these new
> instructions it makes any measurable difference, maybe integer unit 47
> on that cpu now gets used? Somehow I'd rather leave them, unless you can
> show they are issues?
Well, the major reason why I don't like the insns there is that it is so
hard to measure the difference (otherwise I would have just done it). If you
know how to get accurate measures here (say to be able to spot a speed
difference of 1% reliably) let me know.
> Remember we don't have anyway to collect that type of data right now.
If you look at them, all can be computed by other means, say:
statMarkCount:
Number of marked objects ==
Number of roots + Number of survivors
statSweepCount
Number of objects in young space before GC ==
Number of survivors of last GC + allocationsBetweenGC
statMkFwdCount
Number of objects for which fwdBlocks were created ==
Number of survivors
statCompMoveCount
Number of chunks touched in incr. compaction ==
statSweepCount
So there really isn't any need to count them one by one (if the result of
counting were different from the above formulas it would be time to get a
new CPU which gets addition right ;-)
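The identities above can be written down directly (a sketch; the function
and variable names are made up to mirror the stat names, and the sample
values are the averages John quotes elsewhere in the thread):

```python
# Sketch of Andreas' point: the per-loop counters can be derived from
# quantities the VM already tracks, instead of being incremented inside
# the tight GC loops. Names mirror the stats under discussion.

def derived_gc_stats(num_roots, survivors_last_gc, survivors_this_gc,
                     allocations_between_gcs):
    stat_mark_count = num_roots + survivors_this_gc
    stat_sweep_count = survivors_last_gc + allocations_between_gcs
    stat_mk_fwd_count = survivors_this_gc
    stat_comp_move_count = stat_sweep_count  # chunks touched == swept
    return {
        "statMarkCount": stat_mark_count,
        "statSweepCount": stat_sweep_count,
        "statMkFwdCount": stat_mk_fwd_count,
        "statCompMoveCount": stat_comp_move_count,
    }

# using the averaged numbers John reports (2426 roots, 714 survivors):
print(derived_gc_stats(2426, 714, 714, 4000))
```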
Cheers,
- Andreas
|
|
From: Andreas R. <and...@gm...> - 2005-01-19 18:41:26
|
John,
>>> Note how we allocate 76 objects, do a young space GC, then have two
>>> survivors, finally we reach the 200K minimum GC
>>> threshold and do a full GC followed by growing young space. However
>>> this process is very painful.
>>
>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree:
>> freeBlock) we avoid the above problem.
>
> I see. Yes this makes sense. (btw, I'm not sure if these parameter choices
> are best but I guess since they aren't worse than what's there they must
> be good enough ;-)
I take this back. The longer I'm looking at these changes the more
questionable they look to me. With them, unless you do a manual full GC at
some point you keep growing and growing and growing until you just run out
of memory. I *really* don't like this.
The current machinery may be inefficient in some borderline situations but
it works very well in the default situations. With these tenuring changes
we risk making the default behavior of the system one in which we grow
endlessly (say, if you run a web server or something like this), for
example:
queue := Array new: 20000.
index := 0.
[true] whileTrue:[
(index := index + 1) > queue size ifTrue:[index := 1].
queue at: index put: Object new.
].
You keep this guy looping and the only question is *when* you are running
out of memory (depending on the size of the object you stick into the
queue), not if. Compare this to the obscure circumstances in which we get a
(hardly noticeable) slowdown with the current behavior. So I think some way
for bounding growths like in the above is absolutely required before even
considering that change.
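The failure mode of the queue loop under growth-biased tenuring can be
sketched as a toy simulation (Python; all numbers are invented - the point
is only the shape of the two curves, unbounded versus bounded by a
"full GC after N bytes of growth" trigger, which is the mechanism John
describes later in the thread):

```python
# Toy model: the queue constantly overwrites its slots, so tenured
# objects die in old space where only a full GC reclaims them. A
# growth-biased policy with no full-GC trigger grows every cycle;
# adding "full GC after N bytes of growth" bounds the heap.
# All parameters are invented for illustration.

def run(cycles, full_gc_after_growth=None):
    old_live = 1_000_000        # permanently live old-space bytes
    old_dead = 0                # tenured-then-overwritten garbage
    grown_since_full_gc = 0
    peak = 0
    for _ in range(cycles):
        tenured = 50_000        # bytes tenured per scavenge
        old_dead += tenured     # the queue overwrote them: now garbage
        grown_since_full_gc += tenured
        if (full_gc_after_growth is not None
                and grown_since_full_gc >= full_gc_after_growth):
            old_dead = 0        # full GC reclaims the dead tenured bytes
            grown_since_full_gc = 0
        peak = max(peak, old_live + old_dead)
    return peak

unbounded = run(1000)                              # grows every cycle
bounded = run(1000, full_gc_after_growth=16_000_000)
print(unbounded, bounded)
```

Without the trigger the peak is proportional to how long the loop runs;
with it, the peak is capped near the live size plus the growth threshold.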
> > statMarkCount:
> Actually this is the number of times around the marking loop,
> I don't think it's the same as the survivor count plus roots.
That's right, the number of times around the loop is essentially
fieldCount(roots+survivors). But my point still stands that it is easily
computed and that we really don't need to explicitly count that loop.
Cheers,
- Andreas
|
|
From: John M M. <jo...@sm...> - 2005-01-19 21:47:04
|
On Jan 19, 2005, at 10:41 AM, Andreas Raab wrote:

> I take this back. The longer I'm looking at these changes the more
> questionable they look to me. With them, unless you do a manual full
> GC at some point you keep growing and growing and growing until you
> just run out of memory. I *really* don't like this.
>
> The current machinery may be inefficient in some borderline situations
> but it works very well in the default situations. With these tenuring
> changes we risk making the default behavior of the system one in which
> we grow endlessly (say, if you run a web server or something like
> this), for example:
>
> queue := Array new: 20000.
> index := 0.
> [true] whileTrue:[
>     (index := index + 1) > queue size ifTrue:[index := 1].
>     queue at: index put: Object new.
> ].
>
> You keep this guy looping and the only question is *when* you are
> running out of memory (depending on the size of the object you stick
> into the queue), not if. Compare this to the obscure circumstances in
> which we get a (hardly noticeable) slowdown with the current behavior.
> So I think some way for bounding growth like in the above is
> absolutely required before even considering that change.

You miss the point that after N MB of memory growth, we do a full GC
event. The logic to do the full GC is either in the Smalltalk code
running as active memory monitoring, or can be moved into the image. I'm
not growing the image endlessly.
I've attached two jpegs of before/after memory end boundary charts taken
while a person was working in a Croquet world - not a borderline case.
Also two jpegs from a Seaside application (again not a borderline case)
which were generated by doing

  wget --recursive --no-parent --delete-after --non-verbose \
    http://localhost/seaside/alltests

from 4 simultaneous threads.

You'll note how the Seaside application using the historical logic grows
to 64MB - perhaps it will grow forever? Using the modified logic we
actually cycle between 24MB and 45MB. Lastly, hitting the boundary
condition where you trigger 1000's of incremental GC events happens by
just running the macrobenchmarks.

As a reminder, here is a summary of the information I calculated last
November running the OmniBrowser/Monticello SUnits from Colin Putney.

Before any changes what I see is (averages):

  8139 marked objects per young space GC, where 2426 are marked via
    interpreter roots and 5713 by the remember table, for 6703 iterations
  4522 swept objects in young space
  714 survivors

After changes where we bias towards growth (more likely to tenure on
excessive marking) and ensure young space stays largish, versus heading
towards zero, I see (again averages):

  4652 marked objects per young space GC, where 2115 are marked via
    interpreter roots and 2526 by the remember table, for 6678 iterations
  4238 swept objects in young space
  368 survivors

This of course translates into fewer CPU cycles needed for young space GC
work.

Jerry Bell sent me some Croquet testing data. It seems Croquet starts at
about 30MB and grows upwards to 200MB when you invoke a teapot and look
about. Jerry has to confirm what he did and if it was repeated mostly the
same, but it did do 65,000 to 70,000 young space GCs and it appears we
reduced the young space GC time by 40 seconds. This does result in more
full GC work (5) since I tenure about 16MB before doing a full GC, but
that accounts only for an extra second of real time...
Marking in the original case averages 20,808 per young space GC; after the
alterations it's 11,386, making GC work faster. I'll also note growing to
the 195MB takes 49 seconds versus the original 57.

>>> statMarkCount:
>> Actually this is the number of times around the marking loop,
>> I don't think it's the same as the survivor count plus roots.
>
> That's right, the number of times around the loop is essentially
> fieldCount(roots+survivors). But my point still stands that it is
> easily computed and that we really don't need to explicitly count that
> loop.

Fine, compute them.

--
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
|
|
From: Andreas R. <and...@gm...> - 2005-01-19 22:39:25
|
John,

> You miss the point that after N MB of memory growth, we do a full GC
> event.
>
> The logic to do the full GC is either in the smalltalk code running as
> active memory monitoring, or can be moved into the image.

My criticism here is that the *default* without running that extra code
is not sensible. In other words, without active monitoring and memory
tuning you will indeed run out of memory. And that's just not acceptable
- there are plenty of people who will want to run a VM with as little
overhead as possible.

> I'm not growing the image endlessly. I've attached two jpegs of
> before/after memory end boundary charts when a person was working
> in a croquet world, not a borderline case.

You're making my point ;-) Without the extra GC monitoring code (which
people may not have, may not be aware of, may not want to run) the system
would indeed rapidly grow beyond reasonable limits.

> Also two jpegs from a seaside application (again not a borderline case)
> which were generated by doing:

Yup. Again making my point - there need to be sensible defaults in the VM
before this strategy can make sense. You may want to be able to "tweak it
away", e.g., set it to "unreasonable limits" to be able to have manual
control from inside the image, but the VM needs to react sensibly even
without that extra tuning code. Bottom line: If we want these changes, we
need a sensible mechanism in the VM to avoid unbounded memory growth.

> Jerry has to confirm what he did and if it was repeated mostly the
> same, but it did do 65,000 to 70,000 young space GCs and it appears
> we reduced the young space GC time by 40 seconds. This does result in
> more full GC work (5) since I tenure about 16MB before doing a full GC,
> but that accounts only for an extra second of real time...
Aggressive tenuring would probably work around some of the problems we've
seen in the past (see
http://croqueteer.blogspot.com/2005/01/need-for-speed.html) though in
general it seems as if Croquet performs significantly better if you allow
for more allocationsBetweenGCs - it seems as if the working set of Croquet
is larger than your average Squeak working set. This also explains the
stats you are getting - you're tenuring like mad because youngSpace is too
small, and then at some point when you hit fullGC you are "back to
normal". This probably shouldn't be "fixed" by tenuring but rather by
tweaking allocationsBetweenGCs.

Also notice that the stats make it look like with the aggressive growth
logic we are doing more fullGCs than without it (this isn't totally clear
from looking at the pictures but I would expect more spikes in the before
part if it had more fullGCs). This is not necessarily a good thing - I
would rather spend a few percent more on IGC than having to run fullGCs in
Croquet.

Cheers,
- Andreas
|
|
From: John M M. <jo...@sm...> - 2005-01-19 22:57:41
|
On Jan 19, 2005, at 2:38 PM, Andreas Raab wrote:

> My criticism here is that the *default* without running that extra
> code is not sensible. In other words, without active monitoring and
> memory tuning you will indeed run out of memory. And that's just not
> acceptable - there are plenty of people who will want to run a VM with
> as little overhead as possible.

Well, the VM change won't get triggered unless you supply a GC semaphore
as part of doing active monitoring and memory tuning. No active
monitoring and memory tuning, then no new logic is run and we default to
the historical behavior. My original point was that the growth part is in
the VM, the GC trigger companion to the logic is in the image. The VM
part will always run; the image part could choke. Thus a suggestion to
move it into the VM. However I'm leery about doing that since perhaps you
don't want to, say, do a GC after N MB of growth, but rather do it after
building a teapot, or after N http requests; having it in Smalltalk makes
it much easier to tinker with.

I will note that in VisualWorks the oldspace GC logic is a separate
process (or two) and if they die your VW application will soon run out of
memory, since those Smalltalk processes control how the oldspace
incremental logic is run. So leaving the logic in Smalltalk isn't an
outrageous suggestion, but it's not fail-safe.

What I could do is integrate your changes with mine over the weekend,
unless you want to take on that task?
|
|
From: Andreas R. <and...@gm...> - 2005-01-20 05:33:06
|
John,

> Well the VM change won't get triggered unless you supply a GC semaphore
> as part of doing active monitoring and memory tuning. No active
> monitoring and memory tuning, then no new logic is run and we default
> to the historical behavior. My original point was that the growth part
> is in the VM, the GC trigger companion to the logic is in the image.
> The VM part will always run; the image part could choke. Thus a
> suggestion to move it into the VM. However I'm leery about doing that
> since perhaps you don't want to, say, do a GC after N MB of growth, but
> rather do it after building a teapot, or after N http requests; having
> it in Smalltalk makes it much easier to tinker with.

Okay, now I see your point. I was confused by the intention of having the
- seemingly unrelated - GC semaphore be the indicator for having the image
in control of memory growth behavior (thus my question about what the GC
sema has to do with growing the OM). This makes sense now, and although
I'd still like to have more reasonable behavior without a monitoring
process, I'm fine with having the change done in this way. Effectively
this means that older images running on VMs with these changes exhibit
precisely the previous behavior, and running the GC monitor on top of a
new VM puts the monitor in control. Sounds good.

> I will note that in VisualWorks the oldspace GC logic is a separate
> process (or two) and if they die your VW application will soon run out
> of memory since those Smalltalk processes control how oldspace
> incremental logic is run. So leaving the logic in Smalltalk isn't an
> outrageous suggestion, but it's not fail-safe.

I didn't think it would be outrageous - the thing I worried about is
running an image on top of a VM which exhibits unbounded growth behavior.
(And I don't care whether VW dies if the controller process dies ;-)

> What I could do is integrate your changes with mine over the weekend,
> unless you want to take on that task?

If you can find the time this would be great (I may or may not get around
to it). To summarize:

* Use gcSemaIndex instead of TheGCSemaphore for compatibility
* Trigger growth logic in tenuring upon presence of the GC sema
* Remove the stats from the inner loops of the GC logic.

Is this it?

Cheers,
- Andreas
|
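The compatibility scheme being agreed on here - historical behavior by
default, new behavior only once the image registers an external GC
semaphore index - might be gated roughly as follows (a sketch; the names
`gc_sema_index` and the signaling hook are assumptions modeled on the
discussion, not the actual interp.c code):

```python
# Sketch of the gating discussed above: the growth-biased logic is only
# active when the image has registered an external GC semaphore index
# via the new primitive. Names are assumptions, not Squeak VM source.

class VMSketch:
    def __init__(self):
        self.gc_sema_index = 0   # 0 = no semaphore registered
        self.signals = []        # stand-in for signaled semaphore indices

    def primitive_set_gc_semaphore(self, index):
        self.gc_sema_index = index

    def new_logic_enabled(self):
        # old (3.6) images never call the primitive, so they keep the
        # historical behavior; a GC monitor process enables the new logic
        return self.gc_sema_index != 0

    def on_gc(self):
        # after a collection, wake the in-image monitor if one exists
        if self.gc_sema_index != 0:
            self.signals.append(self.gc_sema_index)

vm = VMSketch()
print(vm.new_logic_enabled())        # historical behavior by default
vm.primitive_set_gc_semaphore(42)    # the GC monitor registers itself
vm.on_gc()
print(vm.new_logic_enabled(), vm.signals)
```

Using an external index (rather than a new splObj slot) is exactly what
keeps older images loadable: they simply never set it.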
|
From: John M M. <jo...@ma...> - 2005-01-20 19:07:25
|
On Jan 19, 2005, at 9:32 PM, Andreas Raab wrote:

>> What I could do is integrate your changes with mine over the weekend,
>> unless you want to take on that task?
>
> If you can find the time this would be great (I may or may not get
> around to it). To summarize:
> * Use gcSemaIndex instead of TheGCSemaphore for compatibility
> * Trigger growth logic in tenuring upon presence of the GC sema
> * Remove the stats from the inner loops of the GC logic.
> Is this it?
>
> Cheers,
>   - Andreas

Ok, but I'll make one change: I'll see about a flag to turn the new behavior on, versus re-using the semaphore. That way you don't mix the two desires - one of getting IGC/GC notification, the other of triggering the new behavior.

========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd.  http://www.smalltalkconsulting.com
========================================================================
|
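John's decoupling amounts to two independent VM switches instead of one overloaded signal. A tiny sketch, with invented variable names (these are not the actual interp.c globals):

```c
#include <assert.h>

/* Two independent knobs, so GC notification and the new growth
 * behavior can be enabled separately. Names are illustrative only. */
static int gcSemaIndex = 0;  /* nonzero: signal this semaphore index on GC */
static int biasToGrow  = 0;  /* nonzero: enable the new growth/fullGC logic */

int shouldSignalGCSemaphore(void) { return gcSemaIndex != 0; }
int shouldRunGrowthLogic(void)    { return biasToGrow != 0; }
```

Either feature can now be used without dragging in the other, which is exactly the "don't mix both desires" point.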
|
From: Andreas R. <and...@gm...> - 2005-01-28 22:56:23
|
Hi John,

> Ok, I consolidated all of this, added a prim to turn the biasGrowth
> behavior on where the grow and post-fullGC logic is in the interp.c
> code, versus being partially in the image. Also a prim to set the
> threshold for doing the fullGC if we have grown by N bytes. Thus you
> can turn it off/on and set the boundary. Left in the other prims as
> agreed.

Thanks, this sounds good.

> a) The 10 entries I have for FinalizationDependents have sizes of
> #(0 2 0 3 2 55825 nil nil nil nil). You will note the one with 55,825
> entries. This is actually a 98K or so WeakIdentityKeyDictionary of
> CompiledMethods.

This is not the case in any image I am using. You must be using a non-standard image; all the regular ones that I've checked have #(0 2 nil nil nil nil nil nil nil nil) (0: Sockets; 2: Files).

> which leads to signaling the FinalizationSemaphore which wakes up the
> FinalizationProcess which calls finalizeValues on each WeakArray
> instance (and subclass instance). That has some issues.

Correct. But this has always been the case and is not a result of the recent changes.

> PS Isn't it a bit ugly to iterate over all the elements of Weak things
> looking for null entries, versus passing up a cluestick?

Yes, it is. But remember: passing a cluestick requires an (unbounded) amount of memory, which is why I never considered it.

Cheers,
  - Andreas
|
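The cost John is complaining about can be made concrete. The sketch below is an illustrative C model (not the actual finalizeValues code): the GC nils out weak slots whose referents died, and the image side then has to scan every slot to find them, because the VM passes up no list of which slots changed - the "cluestick" Andreas rejects, since such a list could grow without bound.

```c
#include <assert.h>
#include <stddef.h>

/* Model of a finalizeValues-style pass: a full O(n) scan of the weak
 * collection, even when few (or no) entries were actually freed.
 * NULL stands in for a slot the GC has nilled. */
size_t countFreedSlots(void **weakSlots, size_t n) {
    size_t freed = 0;
    for (size_t i = 0; i < n; i++) {   /* every slot is visited */
        if (weakSlots[i] == NULL)
            freed++;
    }
    return freed;
}
```

With a 55,825-entry dictionary, every finalization cycle pays for 55,825 slot checks regardless of how many entries died - which is why the oversized registry in the next message shows up as a CPU hog.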
|
From: John M M. <jo...@ma...> - 2005-02-01 02:04:26
|
On Jan 28, 2005, at 2:55 PM, Andreas Raab wrote:

> Hi John,
>
>> Ok, I consolidated all of this, added a prim to turn the biasGrowth
>> behavior on where the grow and post-fullGC logic is in the interp.c
>> code, versus being partially in the image. Also a prim to set the
>> threshold for doing the fullGC if we have grown by N bytes. Thus you
>> can turn it off/on and set the boundary. Left in the other prims as
>> agreed.
>
> Thanks, this sounds good.
>
>> a) The 10 entries I have for FinalizationDependents have sizes of
>> #(0 2 0 3 2 55825 nil nil nil nil). You will note the one with 55,825
>> entries. This is actually a 98K or so WeakIdentityKeyDictionary of
>> CompiledMethods.
>
> This is not the case in any image I am using. You must be using a
> non-standard image; all the regular ones that I've checked have
> #(0 2 nil nil nil nil nil nil nil nil) (0: Sockets; 2: Files).

Well, no, it's not a non-standard image; rather it's a working image floating about that I have VMMaker in. The problem is that I filed in your code from your note of Nov 11th, 2004 (below), when you were talking about issues with the finalization process and CPU performance. Seems I filed this in and saved the image (duh! well, that was dumb). For some reason the macrobenchmark triggers a weak-object GC event based on the new changes, which then grinds through the 48K WeakIdentityKeyDictionary, and as you point out that's CPU intensive. Smalltalk removeKey: #CPUHog of course fixes things.

> I've seen this problem myself. It is easiest to see what happens when
> you have a process browser open and turn on the CPU watcher - this
> will show that the finalization process takes a huge amount of
> resources.
>
> But why? Most likely (this was the case I have experienced) you have
> created some weak collection with what I consider "automatic
> finalization", e.g., WeakRegistry and friends register themselves to
> get notified when a weak reference got freed. If this registry grows
> very large it can take significant amounts of time to do the
> finalization, and if your code is then weak-reference heavy you may
> spend a lot of time in finalization.
>
> Here is an example illustrating the problem:
>
>     | hog |
>     hog := WeakIdentityKeyDictionary new.
>     CompiledMethod allInstancesDo: [:cm | hog at: cm put: 42.0].
>     Smalltalk at: #CPUHog put: hog.
>     WeakArray addWeakDependent: hog.
|
|
From: John M M. <jo...@sm...> - 2005-01-19 23:03:57
|
On Jan 19, 2005, at 2:38 PM, Andreas Raab wrote:

> John,
>
>> Jerry has to confirm what he did and if it was repeated mostly the
>> same, but it did do 65,000 to 70,000 young space GCs and it appears
>> we reduced the young space GC time by 40 seconds. This does result in
>> more full GC work (5) since I tenure about 16MB before doing a full
>> GC, but that accounts for only an extra second of real time...
>
> Aggressive tenuring would probably work around some of the problems
> we've seen in the past (see
> http://croqueteer.blogspot.com/2005/01/need-for-speed.html) though in
> general, it seems as if Croquet performs significantly better if you
> allow for more allocationsBetweenGCs - it seems as if the working set
> of Croquet is larger than your average Squeak working set. This also
> explains the stats you are getting - you're tenuring like mad because
> youngSpace is too small, and then at some point when you hit fullGC
> you are "back to normal". This probably shouldn't be "fixed" by
> tenuring but rather by tweaking the allocationsBetweenGCs.

The problem with increasing allocationsBetweenGCs is that the average number of objects examined goes up, since the number of objects that might still be live is bounded by allocationsBetweenGCs. This increases the incremental GC time by milliseconds, which impacts timing for Delays.

> Also notice that the stats look like with the aggressive growth logic
> we are doing more fullGCs than without it (this isn't totally clear
> from looking at the pictures, but I would expect more spikes in the
> "before" part if it had more fullGCs). This is not necessarily a good
> thing - I would rather spend a few percent more on IGC than having to
> run fullGCs in Croquet.

In this example the historical implementation took 57 seconds to build a teapot; after the change it takes 49 seconds, even if it took a second or two to manage the full GCs. Grabbing back 8 seconds is a good thing.
>
> Cheers,
> - Andreas
|
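The tradeoff John and Andreas are arguing over fits a back-of-envelope model. The sketch below assumes, purely for illustration, that each incremental GC examines every object allocated since the previous one, so the per-IGC pause grows linearly with allocationsBetweenGCs while the number of IGCs falls inversely with it; the names and constants are invented, not taken from the VM.

```c
#include <assert.h>

/* Toy model: total IGC work stays roughly constant, but the knob
 * trades pause length (Delay jitter) against pause frequency. */
typedef struct {
    long igcCount;       /* number of incremental GCs over the run */
    long pausePerIgcUs;  /* modeled pause per IGC, in microseconds */
} IgcModel;

IgcModel modelRun(long totalAllocations, long allocationsBetweenGCs,
                  long usPerObjectExamined) {
    IgcModel m;
    m.igcCount = totalAllocations / allocationsBetweenGCs;
    m.pausePerIgcUs = allocationsBetweenGCs * usPerObjectExamined;
    return m;
}
```

Raising allocationsBetweenGCs tenfold gives a tenth as many pauses, each ten times longer - good for throughput (Andreas's point), bad for Delay accuracy (John's point).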
|
From: Andreas R. <and...@gm...> - 2005-01-20 05:33:29
|
[Re: Croquet and GC]

> The problem with increasing allocationsBetweenGCs is that the average
> number of objects examined goes up, since the number of objects that
> might still be live is bounded by allocationsBetweenGCs. This
> increases the incremental GC time by milliseconds, which impacts
> timing for Delays.

Sure does. But on the other hand, an increased number of allocationsBetweenGCs causes GC to happen less often and is therefore less time consuming (I'm talking about things like >30% of time spent in IGC here). The best delay accuracy doesn't help if the system is busy running GC cycles ;-)

> In this example the historical implementation took 57 seconds to build
> a teapot; after the change it takes 49 seconds, even if it took a
> second or two to manage the full GCs. Grabbing back 8 seconds is a
> good thing.

In this particular example (building a teapot), yes. In other situations the tradeoffs are very different. If you are running a game you cannot afford a hiccup every fifteen seconds, even if that may make your game run at 57fps instead of 49fps. [Besides, I have to admit that I consider "building a teapot morph" a very atypical example of Croquet use - it is an artifact of our current lack of replication and will go away in the not too distant future, so using it to discuss how Croquet behaves is just totally missing the point.]

Cheers,
  - Andreas
|