Archive (messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 |  |  |  |  | 54 | 3 |  | 23 | 33 | 14 | 1 |  |
| 2003 |  |  |  |  | 5 |  |  | 15 | 4 |  |  |  |
| 2004 | 1 |  | 26 | 130 | 5 |  | 21 | 3 | 24 | 10 | 37 | 2 |
| 2005 | 30 | 15 | 4 | 1 | 1 | 1 | 1 | 2 | 2 |  | 2 |  |
| 2006 |  |  |  |  |  |  |  |  |  | 1 | 2 | 10 |
| 2007 | 1 |  |  |  |  |  |  |  |  |  |  |  |
From: Craig L. <cr...@ne...> - 2005-02-01 20:33:04
|
> squeak.hpl.hp.com
Ahhh... nice. :)
thanks,
-C
--
Craig Latta
improvisational musical informaticist
cr...@ne...
www.netjam.org
[|] Proceed for Truth!
|
|
From: Andreas R. <and...@gm...> - 2005-02-01 12:28:23
|
> I'm unable to tell if it is publicly accessible since I'm registered -
> so obviously it lets me in. The SF stuff is presumably still there -
> haven't looked in ages - but by now must be well out of date. I wonder
> how we get it actually removed? And we'll need to change the mailing
> list as well since I imagine that would disappear too.
Well, we just fry the CVS repository. I guess you should be able to do this
when you log into a shell (we do have shell access). In this case we could
just sit out the whole mess for now - if SF throws us out we leave earlier,
otherwise we'll leave later.
Cheers,
 - Andreas
|
|
From: Andreas R. <and...@gm...> - 2005-02-01 12:25:50
|
John - All the code is up at squeak.hpl.hp.com for now. I think we should start redirecting people to this place. Cheers, - Andreas ----- Original Message ----- From: "John M McIntosh" <jo...@ma...> To: "Squeak VM Developers" <squ...@li...> Sent: Tuesday, February 01, 2005 9:26 AM Subject: [Squeak-VMdev] Fwd: VM building > So, did we resolve where the subversion based source code is and takedown > of the SourceForge stuff? > Tom here wants to build a VM, so what do I tell him? > > > Begin forwarded message: > >> From: Tom Rushworth <tb...@li...> >> Date: January 31, 2005 4:18:52 PM PST >> To: John M McIntosh <jo...@sm...> >> Subject: Re: VM building (was Re: possible idiot question) >> >> John, >> >> On 31-Jan-05, at 2:59 PM, John M McIntosh wrote: >> >>> >>> On Jan 31, 2005, at 9:10 AM, Tom Rushworth wrote: >>> [snip] >>>> >>>> I'm actually using darcs for my source code control, and like it very >>>> much. What I don't like is that Xcode seems to >>>> mix data that looks more like appearance than dependency info into the >>>> file(s). I haven't been very discriminating >>>> in what files I checkin though, so it may be possible that xcode has >>>> the dependency info in a separate file, I haven't >>>> looked too closely. >>> >>> We've moved to subversion http://subversion.tigris.org/ >> >> I tried subversion before darcs, since Xcode supports it, but couldn't >> get it to work even after half a day >> of fiddling. darcs worked out of the box. Oh well, I guess I'll have >> to go back to poking at subversion. >> Once I get it working, can I point at a server somewhere to get the >> platform tree? >>> >>> >>> -- >>> ====================================================================== >>> ===== >>> John M. McIntosh <jo...@sm...> 1-800-477-2659 >>> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com >>> ====================================================================== >>> ===== >>> >>> >> -- >> Tom Rushworth >> >> > -- > ======================================================================== > === > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================== > === > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Squeak-VMdev mailing list > Squ...@li... > https://lists.sourceforge.net/lists/listinfo/squeak-vmdev > |
|
From: Tim R. <ti...@su...> - 2005-02-01 02:16:23
|
In message <bcb...@ma...>
John M McIntosh <jo...@ma...> wrote:
> So, did we resolve where the subversion based source code is and
> takedown of the SourceForge stuff?
The SVN server is currently http://squeak.hpl.hp.com/svn/squeak/trunk/platforms
I'm unable to tell if it is publicly accessible since I'm registered -
so obviously it lets me in. The SF stuff is presumably still there -
haven't looked in ages - but by now must be well out of date. I wonder
how we get it actually removed? And we'll need to change the mailing
list as well since I imagine that would disappear too.
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful random insult:- Can't find log base two of 65536 without a calculator.
|
|
From: John M M. <jo...@ma...> - 2005-02-01 02:04:26
|
On Jan 28, 2005, at 2:55 PM, Andreas Raab wrote:
> Hi John,
>
>> Ok, I consolidated all of this, added a prim to turn the biasGrowth
>> behavior on where the grow and post fullGC logic is in the interp.c
>> code, versus being partially in the image. Also a prim to set the
>> threshold for doing the fullGC if we have grown by N bytes. Thus you
>> can turn off/on and set the boundary. Left in the other prims as
>> agreed.
>
> Thanks, this sounds good.
>
>> a) The 10 entries I have for FinalizationDependents have sizes of #(0
>> 2 0 3 2 55825 nil nil nil nil). You will note the one with 55,825
>> entries. This is actually a 98K or so WeakIdentityKeyDictionary of
>> CompiledMethods.
>
> This is not the case in any image I am using. You must be using a
> non-standard image; all the regular ones that I've checked have #(0 2
> nil nil nil nil nil nil nil nil) (0: Sockets; 2: Files).
Well, no, it's not a non-standard image; rather it's a working image floating
about that I have VMMaker in. The problem is that I filed in your code from
your note of Nov 11th, 2004 (below) when you were talking about issues with
the finalization process and CPU performance. Seems I filed this in and saved
the image (duh! well, that was dumb). For some reason the macrobenchmark
triggers a weak object GC event based on the new changes, which then grinds
through the 48K WeakIdentityKeyDictionary, and as you point out that's CPU
intensive. Smalltalk removeKey: #CPUHog of course fixes things.
> I've seen this problem myself. It is easiest to see what happens when
> you have a process browser open and turn on the cpu watcher - this
> will show that the finalization process takes a huge amount of
> resources.
>
> But why? Most likely (this was the case I have experienced) you have
> created some weak collection with what I consider "automatic
> finalization", e.g., WeakRegistry and friends register themselves to
> get notified when a weak reference got freed. If this registry grows
> very large it can take significant amounts of time to do the
> finalization and if your code is then weak reference heavy you may
> spend a lot of time in finalization.
>
> Here is an example illustrating the problem:
> | hog proc |
> hog := WeakIdentityKeyDictionary new.
> CompiledMethod allInstancesDo:[:cm| hog at: cm put: 42.0].
> Smalltalk at: #CPUHog put: hog.
> WeakArray addWeakDependent: hog.
--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
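The weak-reference "CPU hog" example quoted above, pulled out into a self-contained snippet for readability. Class and message names are standard Squeak; `#CPUHog` is just the illustrative global used in the thread, and the cleanup line restates John's `removeKey:` remark rather than anything new:

```smalltalk
| hog |
"Build a large weak dictionary and register it for finalization
 notification. After this, every incremental GC that clears a weak
 reference wakes the finalization process, which then scans (and
 rehashes) all ~50,000 entries - the slowdown described above."
hog := WeakIdentityKeyDictionary new.
CompiledMethod allInstancesDo: [:cm | hog at: cm put: 42.0].
Smalltalk at: #CPUHog put: hog.
WeakArray addWeakDependent: hog.

"As noted above, dropping the only strong reference undoes the damage:"
Smalltalk removeKey: #CPUHog
```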
|
From: John M M. <jo...@ma...> - 2005-02-01 00:26:09
|
So, did we resolve where the subversion based source code is and takedown of the SourceForge stuff? Tom here wants to build a VM, so what do I tell him? Begin forwarded message: > From: Tom Rushworth <tb...@li...> > Date: January 31, 2005 4:18:52 PM PST > To: John M McIntosh <jo...@sm...> > Subject: Re: VM building (was Re: possible idiot question) > > John, > > On 31-Jan-05, at 2:59 PM, John M McIntosh wrote: > >> >> On Jan 31, 2005, at 9:10 AM, Tom Rushworth wrote: >> [snip] >>> >>> I'm actually using darcs for my source code control, and like it >>> very much. What I don't like is that Xcode seems to >>> mix data that looks more like appearance than dependency info into >>> the file(s). I haven't been very discriminating >>> in what files I checkin though, so it may be possible that xcode has >>> the dependency info in a separate file, I haven't >>> looked too closely. >> >> We've moved to subversion http://subversion.tigris.org/ > > I tried subversion before darcs, since Xcode supports it, but couldn't > get it to work even after half a day > of fiddling. darcs worked out of the box. Oh well, I guess I'll have > to go back to poking at subversion. > Once I get it working, can I point at a server somewhere to get the > platform tree? >> >> >> -- >> ====================================================================== >> ===== >> John M. McIntosh <jo...@sm...> 1-800-477-2659 >> Corporate Smalltalk Consulting Ltd. >> http://www.smalltalkconsulting.com >> ====================================================================== >> ===== >> >> > -- > Tom Rushworth > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-28 22:56:23
|
Hi John,
> Ok, I consolidated all of this, added a prim to turn the biasGrowth
> behavior on where the grow and post fullGC logic is in the interp.c code,
> versus being partially in the image. Also a prim to set the threshold for
> doing the fullGC if we have grown by N bytes. Thus you can turn off/on
> and set the boundary. Left in the other prims as agreed.
Thanks, this sounds good.
> a) The 10 entries I have for FinalizationDependents have sizes of #(0 2 0
> 3 2 55825 nil nil nil nil). You will note the one with 55,825 entries.
> This is actually a 98K or so WeakIdentityKeyDictionary of CompiledMethods.
This is not the case in any image I am using. You must be using a
non-standard image; all the regular ones that I've checked have
#(0 2 nil nil nil nil nil nil nil nil) (0: Sockets; 2: Files).
> which leads to signaling the FinalizationSemaphore which wakes up the
> FinalizationProcess which calls finalizeValues on each WeakArray
> instance (and subclass instance) That has some issues.
Correct. But this has always been the case and is not a result of the recent
changes.
> PS Isn't it a bit ugly to iterate over all the elements of Weak things
> looking for null entries, versus passing up a cluestick?
Yes it is. But remember: passing a cluestick requires an (unbounded) amount
of memory, which is why I never considered it.
Cheers,
 - Andreas
|
|
From: John M M. <jo...@ma...> - 2005-01-28 08:55:29
|
(Sent to list too) Ok, I consolidated all of this, added a prim to turn the biasGrowth behavior on where the grow and post fullGC logic is in the interp.c code, versus being partially in the image. Also a prim to set the threshold for doing the fullGC if we have grown by N bytes. Thus you can turn off/on and set the boundary. Left in the other prims as agreed. However when I when to cross check performance and impact I immediately ran into a performance problem which took some time to figure out. If you take the freecell game and select game 1 (the benchmark) then instead of completing in about 4 seconds, it took 85 seconds. Ick. Well it's going to be a long night, now which of my GC changes, Andreas' changes or Ian's weak object changes, or my changes to drawing/flushing/locking causes this. Many hours later... So what is happening, also someone could confirm these details. a) The 10 entries I have for FinalizationDependents have sizes of #(0 2 0 3 2 55825 nil nil nil nil). You will note the one with 55,825 entries. This actually a 98K or so WeakIdenityKeyDictionary of CompiledMethods. b) On every animated card move this code in markAndTrace: is triggered to increment weakRootCount (self isWeakNonInt: oop) ifTrue: [ "Set lastFieldOffset before the weak fields in the receiver" lastFieldOffset := (self nonWeakFieldsOf: oop) << 2. "And remember as weak root" weakRootCount := weakRootCount + 1. weakRoots at: weakRootCount put: oop. ] ifFalse: [ "Do it the usual way" lastFieldOffset _ self lastPointerOf: oop. ]. which later triggers 1 to: weakRootCount do:[:i| self finalizeReference: (weakRoots at: i)]. which leds to signaling the FinalizationSemaphore which wakes up the FinalizationProcess which calls finalizeValues on each WeakArray instance (and subclass instance) That has some issues. One is that it iterates over the 98K WeakIdenityKeyDictionary and then cheerfully does a rehash even if it didn't do anything. First should we only do the rehash if we actually tamper with the data? Aka this suggestion. finalizeValues "remove all nil keys and rehash the receiver afterwards" | assoc hit | hit _ false. 1 to: array size do:[:i| assoc _ array at: i. (assoc notNil and:[assoc key == nil]) ifTrue:[array at: i put: nil. hit _ true]. ]. hit ifTrue: [self rehash]. Well adding that makes it a bit faster, still hunting thru 98K elements every card animation move isn't fun. PS Isn't it a bit ugly to iterate over all the elements of Weak things looking for null entries, versus passing up a cluestick? I'll note VisualAge had issues years ago, they would pass up the OOps, one at a time, finalization on large numbers took *forever*. Then they moved to passing up N elements, where when N overflowed you were toast. Can't recall what it is now, but they got past losing entries. On Jan 20, 2005, at 11:07 AM, John M McIntosh wrote: > On Jan 19, 2005, at 9:32 PM, Andreas Raab wrote: > >>> What I could do is integrate your changes with mine over the >>> weekend, and unless you want to take on that task? >> >> If you can find the time this would be great (I may or may not get >> around to it). >> To summarize: >> * Use gcSemaIndex instead of TheGCSemaphore for compatibility >> * Trigger growth logic in tenuring upon presence of gc sema >> * Remove the stats from the inner loops of the GC logic. >> Is this it? >> >> Cheers, >> - Andreas > > Ok, but I'll make one change. I'll see about a flag to turn the new > behavior on versus re-using the > semaphore. 
That way you don't mix both desires, one of getting IGC/GC > notification, the other of triggering the new behavior. > > ======================================================================= > ==== > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================= > ==== -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
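For reference, the finalizeValues change proposed in the message above, reformatted as a method body. The legacy `_` assignments are written as `:=`, and the comment is expanded to describe the added guard; the logic is otherwise exactly as posted:

```smalltalk
finalizeValues
	"Remove all associations whose keys have been garbage collected,
	 and rehash the receiver only if something was actually removed."
	| assoc hit |
	hit := false.
	1 to: array size do: [:i |
		assoc := array at: i.
		(assoc notNil and: [assoc key == nil])
			ifTrue: [array at: i put: nil. hit := true]].
	hit ifTrue: [self rehash]
```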
|
From: John M M. <jo...@ma...> - 2005-01-20 19:07:25
|
On Jan 19, 2005, at 9:32 PM, Andreas Raab wrote:
>> What I could do is integrate your changes with mine over the weekend,
>> and unless you want to take on that task?
>
> If you can find the time this would be great (I may or may not get
> around to it).
> To summarize:
> * Use gcSemaIndex instead of TheGCSemaphore for compatibility
> * Trigger growth logic in tenuring upon presence of gc sema
> * Remove the stats from the inner loops of the GC logic.
> Is this it?
>
> Cheers,
> - Andreas
Ok, but I'll make one change. I'll see about a flag to turn the new behavior
on versus re-using the semaphore. That way you don't mix both desires, one of
getting IGC/GC notification, the other of triggering the new behavior.
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
|
From: Andreas R. <and...@gm...> - 2005-01-20 05:33:29
|
[Re: Croquet and GC]
> The problem with making allocationsBetweenGCs is the average objects
> explored goes up since the number of objects that might live is
> allocationsBetweenGCs. This increases the incremental GC time by
> milliseconds which impacts timing for Delays.
Sure does. But on the other hand, an increased number of allocationsBetweenGCs
causes GC to happen less often and is therefore less time consuming (I'm
talking about things like >30% time spent in IGC here). The best delay
accuracy doesn't help if the system is busy running GC cycles ;-)
> In this example the historical implementation took 57 seconds to build a
> teapot, after the change it takes 49 seconds, even if it took a second or
> two to manage the full GCs. I've grabbed back 8 seconds; that is a good
> thing.
In this particular example (building a teapot) yes. In other situations the
tradeoffs are very different. If you are running a game you cannot afford a
hiccup every fifteen seconds even if that may make your game run with 57fps
instead of 49fps. [Besides, I have to admit that I consider "building a
teapot morph" a very atypical example of Croquet use - it is an artifact of
our current lack of replication and will go away in the not too distant
future, so using it to discuss how Croquet behaves is just totally missing
the point].
Cheers,
 - Andreas
|
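For completeness, a small sketch of how these knobs are reachable from the image through Squeak's `Smalltalk vmParameterAt:` / `vmParameterAt:put:` primitives. The parameter indices used below (9 and 10 for incremental GC count and time, 5 for allocations between GCs) are quoted from memory and should be verified against the getVMParameters comment in your image:

```smalltalk
"Read a few GC statistics. The indices are assumptions - check them
 against the SystemDictionary>>getVMParameters comment before relying
 on this."
Transcript
	show: 'incremental GCs: ', (Smalltalk vmParameterAt: 9) printString; cr;
	show: 'ms in incremental GC: ', (Smalltalk vmParameterAt: 10) printString; cr.

"If a large share of run time is going to incremental GC, allow more
 allocations between incremental GCs (a read-write parameter)."
Smalltalk vmParameterAt: 5 put: 8000.
```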
|
From: Andreas R. <and...@gm...> - 2005-01-20 05:33:06
|
John, > Well the VM change won't get triggered unless you supply a GC semaphore > as part of doing active monitoring and memory tuning. No active > monitoring and memory tuning then no new logic is run and we default to > the historical behavior. My original point was that the growth part is in > the VM, the GC trigger companion to the logic is in the image. The VM > part will always run, the image part could choke. Thus a suggestion to > move it into the VM. However I'm leery about doing that since perhaps > you don't want to say do a GC after N MB of growth, rather do it after a > build a teapot, or after N http requests, having it in smalltalk makes it > much easier to tinker with. Okay, now I see your point. I was confused by the intention of having the -seemingly unrelated- GC semaphore be the indicator for having the image in control of memory growth behavior (thus my question about what the GC sema has to do with growing the OM). This makes sense now, and although I'd still like to be able to have more reasonable behavior without a monitoring process I'm fine with having the change done in this way. Effectively this means that older images running on VMs with these changes exhibit precisely the previous behavior and running the GC monitor on top of a new VM would put the monitor in control. Sounds good. > I will note that in VisualWorks the oldspace GC logic is a separate > process (or two) and if they die your VW application will soon run out of > memory since those Smalltalk processes control how oldspace incremental > logic is run. So leaving the logic in Smalltalk isn't an outrageous > suggestion, but it's not fail safe. I didn't think it would be outrageous - the thing I worried about is running an image on top of a VM which exposes unbounded growth behavior. (and I don't care whether VW dies if the controller process dies ;-) > What I could do is integrate your changes with mine over the weekend, and > unless you want to take on that task? If you can find the time this would be great (I may or may not get around to it). To summarize: * Use gcSemaIndex instead of TheGCSemaphore for compatibility * Trigger growth logic in tenuring upon presence of gc sema * Remove the stats from the inner loops of the GC logic. Is this it? Cheers, - Andreas |
|
From: John M M. <jo...@sm...> - 2005-01-19 23:03:57
|
On Jan 19, 2005, at 2:38 PM, Andreas Raab wrote: > John, >> Jerry has to confirm what he did and if it was repeated mostly the >> same, but it did do 65,000 to 70,000 young space GC and it appears >> we reduced the young space GC time by 40 seconds. This does result in >> more full GC work (5) since I tenure about 16MB before doing a Full >> GC, >> but that accounts only for an extra second of real time... > > Agressive tenuring would probably work around some of the problems > we've seen in the past (see > http://croqueteer.blogspot.com/2005/01/need-for-speed.html) though in > general, it seems as if Croquet performs significantly better if you > allow for more allocationsBetweenGCs - it seems as if the working set > of Croquet is larger than your average Squeak working set. This also > explains the stats you are getting - you're tenuring like mad because > youngSpace is too small and then at some point where you hit fullGC > you are "back to normal". This probably shouldn't be "fixed" by > tenuring but rather by tweaking the allocationsBetweenGCs. The problem with making allocationsBetweenGCs is the average objects explored goes up since the number of objects that might live is allocationsBetweenGCs. This increases the incremental GC time by milliseconds which impacts timing for Delays. > > Also notice that the stats look like with the agressive growths logic > we are doing more fullGCs than without them (this isn't totally clear > from looking at the pictures but I would expect more spikes in the > before part if it had more fullGCs). This is not necessarily a good > thing - I would rather spend a few percent more on IGC than having to > run fullGCs in Croquet. In this example the historical implementation took 57 seconds to build a teapot, after the change it takes 49 seconds, even if it took a second or two to manage the full GCs. I'm grabbed back 8 seconds that is a good thing. > > Cheers, > - Andreas > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: John M M. <jo...@sm...> - 2005-01-19 22:57:41
|
On Jan 19, 2005, at 2:38 PM, Andreas Raab wrote:
> John,
>
>> You miss the point that after N MB of memory growth, we do a full GC
>> event.
>>
>> The logic to do the full GC is either in the smalltalk code running as
>> active memory monitoring, or can be moved into the image.
>
> My criticism here is that the *default* without running that extra
> code is not sensible. In other words, without active monitoring and
> memory tuning you will indeed run out of memory. And that's just not
> acceptable - there are plenty of people who will want to run a VM with
> as little overhead as possible.
Well, the VM change won't get triggered unless you supply a GC semaphore as
part of doing active monitoring and memory tuning. With no active monitoring
and memory tuning, none of the new logic is run and we default to the
historical behavior. My original point was that the growth part is in the VM,
the GC trigger companion to the logic is in the image. The VM part will
always run, the image part could choke. Thus a suggestion to move it into the
VM. However, I'm leery about doing that since perhaps you don't want to do a
GC after N MB of growth, but rather after building a teapot, or after N http
requests; having it in Smalltalk makes it much easier to tinker with.
I will note that in VisualWorks the oldspace GC logic is a separate process
(or two) and if they die your VW application will soon run out of memory,
since those Smalltalk processes control how the oldspace incremental logic is
run. So leaving the logic in Smalltalk isn't an outrageous suggestion, but
it's not fail-safe.
What I could do is integrate your changes with mine over the weekend, and
unless you want to take on that task?
--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
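A minimal sketch of the kind of image-side "post check" John describes (force a full GC once the heap has grown by N MB). The 16 MB threshold echoes the figure John mentions elsewhere in the thread; the vmParameter index (3 = end of memory) and the one-second polling interval are assumptions, and this is not the actual tuning code from the changeset:

```smalltalk
[ | lastEnd growthLimit |
  "Threshold and index are illustrative assumptions, not the real tuner."
  growthLimit := 16 * 1024 * 1024.        "force a full GC after ~16 MB of growth"
  lastEnd := Smalltalk vmParameterAt: 3.  "assumed: 3 = end of memory (heap size)"
  [true] whileTrue: [
      (Smalltalk vmParameterAt: 3) - lastEnd > growthLimit
          ifTrue: [
              Smalltalk garbageCollect.
              lastEnd := Smalltalk vmParameterAt: 3].
      (Delay forSeconds: 1) wait]
] forkAt: Processor userBackgroundPriority
```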
|
From: Andreas R. <and...@gm...> - 2005-01-19 22:39:25
|
John, > You miss the point that after N MB of memory growth, we do a full GC > event. > > The logic to do the full GC is either in the smalltalk code running as > active memory monitoring, or can be moved into the image. My criticism here is that the *default* without running that extra code is not sensible. In other words, without active monitoring and memory tuning you will indeed run out of memory. And that's just not acceptable - there are plenty of people who will want to run a VM with as little overhead as possible. > I'm not growing the image endlessly. I've attached two jpegs of > before/after memory end boundary charts when a person was working > in a croquet world, not a borderline case. You're making my point ;-) Without the extra GC monitoring code (which people may not have, may not be aware about, may not want to run) the system would indeed rapidly grow beyound reasonable limits. > Also two jpegs from a seaside application (again not a borderline case) > which were generated by doing: Yup. Again making my point - there need to be sensible defaults in the VM before this strategy can make sense. You may want to be able to "tweak it away", e.g., set it to "unreasonable limits" to be able to have manual control from inside the image, but the VM needs to react sensibly even without that extra tuning code. Bottom line: If we want these changes, we need a sensible mechanism in the VM to avoid unbounded memory growth. > Jerry has to confirm what he did and if it was repeated mostly the > same, but it did do 65,000 to 70,000 young space GC and it appears > we reduced the young space GC time by 40 seconds. This does result in > more full GC work (5) since I tenure about 16MB before doing a Full GC, > but that accounts only for an extra second of real time... Agressive tenuring would probably work around some of the problems we've seen in the past (see http://croqueteer.blogspot.com/2005/01/need-for-speed.html) though in general, it seems as if Croquet performs significantly better if you allow for more allocationsBetweenGCs - it seems as if the working set of Croquet is larger than your average Squeak working set. This also explains the stats you are getting - you're tenuring like mad because youngSpace is too small and then at some point where you hit fullGC you are "back to normal". This probably shouldn't be "fixed" by tenuring but rather by tweaking the allocationsBetweenGCs. Also notice that the stats look like with the agressive growths logic we are doing more fullGCs than without them (this isn't totally clear from looking at the pictures but I would expect more spikes in the before part if it had more fullGCs). This is not necessarily a good thing - I would rather spend a few percent more on IGC than having to run fullGCs in Croquet. Cheers, - Andreas |
|
From: John M M. <jo...@ma...> - 2005-01-19 21:54:37
|
Resent but without the attachments. Begin forwarded message: > From: John M McIntosh <jo...@sm...> > Date: January 19, 2005 1:46:52 PM PST > To: "Andreas Raab" <and...@gm...> > Cc: "Squeak VM Developers" <squ...@li...>, "Tim > Rowledge" <ti...@su...>, "Ian Piumarta" > <ian...@hp...> > Subject: Re: [Squeak-VMdev] Re: GC improvements > > > On Jan 19, 2005, at 10:41 AM, Andreas Raab wrote: > >> John, >> >>>>> Note how we allocate 76 objects, do a young space GC, then have >>>>> two survivors, finally we reach the 200K minimum GC >>>>> threshold and do a full GC followed by growing young space. >>>>> However this process is very painful. >>>> >>>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree: >>>> freeBlock) we avoid the above problem. >>> >>> I see. Yes this makes sense. (btw, I'm not sure if these parameter >>> choices are best but I guess since they aren't worse than what's >>> there they must be good enough ;-) >> >> I take this back. The longer I'm looking at these changes the more >> questionable they look to me. With them, unless you do a manual full >> GC at some point you keep growing and growing and growing until you >> just run out of memory. I *really* don't like this. >> >> The current machinery may be inefficient in some borderline >> situations but it works very well with the default situations. With >> these tenuring changes we risk to make the default behavior of the >> system to be one in which we grow endlessly (say, if you run a web >> server or something like this), for example: >> >> queue := Array new: 20000. >> index := 0. >> [true] whileTrue:[ >> (index := index + 1) > queue size ifTrue:[index := 1]. >> queue at: index put: Object new. >> ]. >> >> You keep this guy looping and the only question is *when* you are >> running out of memory (depending on the size of the object you stick >> into the queue), not if. Compare this to the obscure circumstances in >> which we get a (hardly noticable) slowdown with the current behavior. >> So I think some way for bounding growths like in the above is >> absolutely required before even considering that change. > > > You miss the point that after N MB of memory growth, we do a full GC > event. > > The logic to do the full GC is either in the smalltalk code running as > active memory monitoring, or can be moved into the image. I'm not > growing the image endlessly. I've attached two jpegs of before/after > memory end boundary charts when a person was working in a croquet > world, not a borderline case. > Also two jpegs from a seaside application (again not a borderline > case) which were generated by doing: > > " wget --recursive --no-parent --delete-after --non-verbose \ > http://localhost/seaside/alltests > from 4 simultaneous threads." > > You'll note how the seaside application using the historical logic > grows to 64MB, perhaps it will grow forever? > Using the modified logic we actually cycle between 24MB and 45MB. > > Lastly hitting the boundary condition where you trigger 1000's of > incremental GC events is triggered by just running the > macrobenchmarks. 
> > As a reminder here is a summary of the information I calculated last > nov > > OMniBrowser/Monnticello SUnits from Colin Putney > > Before any changes what I see is (averages) > 8139 marked objects per young space GC, where 2426 marked via > interpreter roots, and 5713 by remember table for 6703 iterations > 4522 swept objects in young space > 714 survivors > > After changes where we bias towards growth (more likely to tenure on > excessive marking), and ensure young space stays largish, > versus heading towards zero I see (again averages) > > 4652 marked objects per young space GC, where 2115 marked via > interpreter roots, and 2526 by remember table for 6678 iterations > 4238 swept objects in young space. > 368 survivors > > This of course translates into fewer CPU cycles needed fpr youngspace > GC work > > > Jerry Bell send me some Croquet testing data > > Seems Croquet starts at about 30MB and grows upwards to 200MB when you > invoke a teapot and look about. > > Jerry has to confirm what he did and if it was repeated mostly the > same, but it did do 65,000 to 70,000 young space GC and it appears > we reduced the young space GC time by 40 seconds. This does result in > more full GC work (5) since I tenure about 16MB before doing a Full > GC, but that accounts only for an extra second of real time... > > Marking in the original case is average 20,808 per young gc > After alterations it's 11,386, making GC work faster > > I'll also note growing to the 195mb takes 49 seconds versus the > original 57. > > >> >>> > statMarkCount: >>> Actually this is the number of times around the marking loop, >>> I don't think it's same as the survivor count plus roots. >> >> That's right, the number of times around the loop is essentially >> fieldCount(roots+survivors). But my point still stands that it is >> easily computed and that we really don't need to explicitly count >> that loop. > > Fine compute them. > > > >> >> Cheers, >> - Andreas >> >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by: IntelliVIEW -- Interactive >> Reporting >> Tool for open source databases. Create drag-&-drop reports. Save time >> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. >> Download a FREE copy at http://www.intelliview.com/go/osdn_nl >> _______________________________________________ >> Squeak-VMdev mailing list >> Squ...@li... >> https://lists.sourceforge.net/lists/listinfo/squeak-vmdev >> >> > -- > ======================================================================= > ==== > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================= > ==== > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: John M M. <jo...@sm...> - 2005-01-19 21:47:04
|
On Jan 19, 2005, at 10:41 AM, Andreas Raab wrote: > John, > >>>> Note how we allocate 76 objects, do a young space GC, then have two >>>> survivors, finally we reach the 200K minimum GC >>>> threshold and do a full GC followed by growing young space. However >>>> this process is very painful. >>> >>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree: >>> freeBlock) we avoid the above problem. >> >> I see. Yes this makes sense. (btw, I'm not sure if these parameter >> choices are best but I guess since they aren't worse than what's >> there they must be good enough ;-) > > I take this back. The longer I'm looking at these changes the more > questionable they look to me. With them, unless you do a manual full > GC at some point you keep growing and growing and growing until you > just run out of memory. I *really* don't like this. > > The current machinery may be inefficient in some borderline situations > but it works very well with the default situations. With these > tenuring changes we risk to make the default behavior of the system to > be one in which we grow endlessly (say, if you run a web server or > something like this), for example: > > queue := Array new: 20000. > index := 0. > [true] whileTrue:[ > (index := index + 1) > queue size ifTrue:[index := 1]. > queue at: index put: Object new. > ]. > > You keep this guy looping and the only question is *when* you are > running out of memory (depending on the size of the object you stick > into the queue), not if. Compare this to the obscure circumstances in > which we get a (hardly noticable) slowdown with the current behavior. > So I think some way for bounding growths like in the above is > absolutely required before even considering that change. You miss the point that after N MB of memory growth, we do a full GC event. The logic to do the full GC is either in the smalltalk code running as active memory monitoring, or can be moved into the image. I'm not growing the image endlessly. I've attached two jpegs of before/after memory end boundary charts when a person was working in a croquet world, not a borderline case. Also two jpegs from a seaside application (again not a borderline case) which were generated by doing: " wget --recursive --no-parent --delete-after --non-verbose \ http://localhost/seaside/alltests from 4 simultaneous threads." You'll note how the seaside application using the historical logic grows to 64MB, perhaps it will grow forever? Using the modified logic we actually cycle between 24MB and 45MB. Lastly hitting the boundary condition where you trigger 1000's of incremental GC events is triggered by just running the macrobenchmarks. As a reminder here is a summary of the information I calculated last nov OMniBrowser/Monnticello SUnits from Colin Putney Before any changes what I see is (averages) 8139 marked objects per young space GC, where 2426 marked via interpreter roots, and 5713 by remember table for 6703 iterations 4522 swept objects in young space 714 survivors After changes where we bias towards growth (more likely to tenure on excessive marking), and ensure young space stays largish, versus heading towards zero I see (again averages) 4652 marked objects per young space GC, where 2115 marked via interpreter roots, and 2526 by remember table for 6678 iterations 4238 swept objects in young space. 
368 survivors This of course translates into fewer CPU cycles needed fpr youngspace GC work Jerry Bell send me some Croquet testing data Seems Croquet starts at about 30MB and grows upwards to 200MB when you invoke a teapot and look about. Jerry has to confirm what he did and if it was repeated mostly the same, but it did do 65,000 to 70,000 young space GC and it appears we reduced the young space GC time by 40 seconds. This does result in more full GC work (5) since I tenure about 16MB before doing a Full GC, but that accounts only for an extra second of real time... Marking in the original case is average 20,808 per young gc After alterations it's 11,386, making GC work faster I'll also note growing to the 195mb takes 49 seconds versus the original 57. > >> > statMarkCount: >> Actually this is the number of times around the marking loop, >> I don't think it's same as the survivor count plus roots. > > That's right, the number of times around the loop is essentially > fieldCount(roots+survivors). But my point still stands that it is > easily computed and that we really don't need to explicitly count that > loop. Fine compute them. > > Cheers, > - Andreas > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Squeak-VMdev mailing list > Squ...@li... > https://lists.sourceforge.net/lists/listinfo/squeak-vmdev > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-19 18:41:26
|
John,
>>> Note how we allocate 76 objects, do a young space GC, then have two
>>> survivors, finally we reach the 200K minimum GC
>>> threshold and do a full GC followed by growing young space. However
>>> this process is very painful.
>>
>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree:
>> freeBlock) we avoid the above problem.
>
> I see. Yes this makes sense. (btw, I'm not sure if these parameter choices
> are best but I guess since they aren't worse than what's there they must
> be good enough ;-)
I take this back. The longer I'm looking at these changes the more
questionable they look to me. With them, unless you do a manual full GC at
some point you keep growing and growing and growing until you just run out
of memory. I *really* don't like this.
The current machinery may be inefficient in some borderline situations but
it works very well with the default situations. With these tenuring changes
we risk to make the default behavior of the system to be one in which we
grow endlessly (say, if you run a web server or something like this), for
example:
queue := Array new: 20000.
index := 0.
[true] whileTrue:[
(index := index + 1) > queue size ifTrue:[index := 1].
queue at: index put: Object new.
].
You keep this guy looping and the only question is *when* you are running
out of memory (depending on the size of the object you stick into the
queue), not if. Compare this to the obscure circumstances in which we get a
(hardly noticeable) slowdown with the current behavior. So I think some way
for bounding growth like in the above is absolutely required before even
considering that change.
> > statMarkCount:
> Actually this is the number of times around the marking loop,
> I don't think it's same as the survivor count plus roots.
That's right, the number of times around the loop is essentially
fieldCount(roots+survivors). But my point still stands that it is easily
computed and that we really don't need to explicitly count that loop.
Cheers,
- Andreas
|
|
From: <gor...@bl...> - 2005-01-19 12:21:54
|
Hi! "Andreas Raab" <and...@gm...> wrote: > Hi - > > >> Good question. If we're actually moving to the svn server at hplabs > >> along with the mirroring previously discussed then it would be nice to > >> get it all setup for public access, announced, and the SF stuff killed > >> off for the 3.8 release There's way too much chaos and confusion around > >> without adding more. > > Talked to Ian about this yesterday. Result: All of the active code is now in > SVN at squeak.hpl.hp.com. Proposal: Let's toast sf.net and get away from the > sux0rs. Goodie. > > So... how about either setting up the Svn server there - or an Svn > > mirror or whatever works in the Svn-world? > > Mirrors sound good. I like squeak.hpl.hp.com more than SqF.net since the hpl > machine is in our cubicles (== easy to reboot if needed; just give me a > buzz) but mirroring would allow anyone to use whatever seems appropriate. So > lets get rid of SF (the sooner the better) and use SVN for real. Ok, I will find out how to set up a Svn mirror and then do it. I assume a mirror will be readonly to start with. > Cheers, > - Andreas regards, Göran |
|
From: Andreas R. <and...@gm...> - 2005-01-19 10:24:11
|
Hi - >> Good question. If we're actually moving to the svn server at hplabs >> along with the mirroring previously discussed then it would be nice to >> get it all setup for public access, announced, and the SF stuff killed >> off for the 3.8 release There's way too much chaos and confusion around >> without adding more. Talked to Ian about this yesterday. Result: All of the active code is now in SVN at squeak.hpl.hp.com. Proposal: Let's toast sf.net and get away from the sux0rs. > So... how about either setting up the Svn server there - or an Svn > mirror or whatever works in the Svn-world? Mirrors sound good. I like squeak.hpl.hp.com more than SqF.net since the hpl machine is in our cubicles (== easy to reboot if needed; just give me a buzz) but mirroring would allow anyone to use whatever seems appropriate. So lets get rid of SF (the sooner the better) and use SVN for real. Cheers, - Andreas |
|
From: <gor...@bl...> - 2005-01-19 09:06:09
|
Hi guys! Tim Rowledge <ti...@su...> wrote: > Good question. If we're actually moving to the svn server at hplabs > along with the mirroring previously discussed then it would be nice to > get it all setup for public access, announced, and the SF stuff killed > off for the 3.8 release There's way too much chaos and confusion around > without adding more. I definitely agree. And another thing - Cees has set us up with (the bomb! :) couldn't resist) a new virtual Debian server and also a paypal account that he, I and Avi has access to at the moment. Well, some of you guys already know this because you have given money to it. :) In fact - the account now holds IIRC more than $400 after one very large gift of $300. So the money funding the server is now pretty much in place for a long while. Now - I logged in yesterday and checked the server, it has 2Gb free and when I sucked down some upgrades it averaged 1200kB/s. I intend to move SM over there and BFAV is already in the process of moving. So... how about either setting up the Svn server there - or an Svn mirror or whatever works in the Svn-world? regards, Göran |
|
From: John M M. <jo...@ma...> - 2005-01-19 09:01:44
|
Oops wrong email address, try again to vmdev Begin forwarded message: > From: John M McIntosh <jo...@sm...> > Date: January 19, 2005 12:47:00 AM PST > To: "Andreas Raab" <and...@gm...> > Cc: "Squeak VM Developers" <squ...@li...>, "Tim > Rowledge" <ti...@su...>, "Ian Piumarta" > <ian...@hp...> > Subject: Re: [Squeak-VMdev] Re: GC improvements > > > On Jan 19, 2005, at 12:26 AM, Andreas Raab wrote: > >> With this aggressive growth strategy I think we should have a (VM) >> parameter which controls when to run a full GC depending on how much >> we've grown. Having a GC tuner sit there all the time and watch >> things fly by doesn't look like the best solution to me. >> >> On the other hand it seems like in this case we probably should be >> following the same strategy in sufficientSpaceAfterGC:, shouldn't we? >> Here, we just GC and then grow and it seems that if we're okay with >> aggressive growths me grow here, too. > > Ok, do you want to move the logic to the VM, the change is after we > grown N MB (settable value) we then do a full GC. > >> If you look at them, all can be computed by other means, say: >> >> statMarkCount: >> Number of marked objects == >> Number of roots + Number of survivors > > Actually this is the number of times around the marking loop, I don't > think it's same as the survivor count plus roots. > > > I think these below look ok > >> >> statSweepCount >> Number of objects in young space before GC == >> Number of survivors of last GC + allocationsBetweenGC >> >> statMkFwdCount >> Number of objects for which fwdBlocks were created == >> Number of survivors >> >> statCompMoveCount >> Number of chunks touched in incr. compaction == >> statSweepCount >> >> So there really isn't any need to count them one by one (if the >> result of counting would be different from the above formulaes it's >> time to get a new CPU which gets addition right ;-) >> >> Cheers, >> - Andreas >> >> > -- > ======================================================================= > ==== > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================= > ==== > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-19 08:26:21
|
Hi John,
> I have no problem changing this to use a external semaphore index.
> The check then I'd guess would check to check for a non-zero value in
> that global.
I prefer this solution.
>> * The logic in incrementalGC for growing namely:
>
> This code is where the problem with:
>> What I found was an issue which we hadn't realized is there, well I'm
>> sure people have seen it, but don't know why...
>> What happens is that as we are tenuring objects we are decreasing the
>> young space from 4MB to Zero.
Ah, I see. Yes that'd be really bad.
>> Note how we allocate 76 objects, do a young space GC, then have two
>> survivors, finally we reach the 200K minimum GC
>> threshold and do a full GC followed by growing young space. However this
>> process is very painful.
>
> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree:
> freeBlock) we avoid the above problem.
I see. Yes this makes sense. (btw, I'm not sure if these parameter choices
are best but I guess since they aren't worse than what's there they must be
good enough ;-)
> In the smalltalk code is a post check to say if we've grown N MB between
> full GCs then it's time to do another one, this prevents uncontrolled
> growth.
> I could add that check in the VM? Should we? If the GC tuning process
> stops running then the VM will grow to the maximum Virtual memory size.
> I did not add it to the VM code since I wanted to minimize the code
> there.
With this aggressive growth strategy I think we should have a (VM) parameter
which controls when to run a full GC depending on how much we've grown.
Having a GC tuner sit there all the time and watch things fly by doesn't
look like the best solution to me.
On the other hand it seems like in this case we probably should be following
the same strategy in sufficientSpaceAfterGC:, shouldn't we? Here, we just GC
and then grow, and it seems that if we're okay with aggressive growth we
grow here, too.
>> * Measuring statMarkCount, statCompMoveCount, statSweepCount,
>> statMkFwdCount etc. seem to be excessive - is there really any need to
>> add extra instructions to these tight loops? I'd rather live without
>> these insns in the midst of time-critical GC code.
>
> Well I wanted to collect data, I wonder tho if adding these new
> instructions it makes any measurable difference, maybe integer unit 47
> on that cpu now gets used? Somehow I'd rather leave them, unless you can
> show they are issues?
Well, the major reason why I don't like the insns there is that it is so
hard to measure the difference (otherwise I would have just done it). If you
know how to get accurate measures here (say to be able to spot a speed
difference of 1% reliably) let me know.
> Remember we don't have anyway to collect that type of data right now.
If you look at them, all can be computed by other means, say:
statMarkCount:
Number of marked objects ==
Number of roots + Number of survivors
statSweepCount
Number of objects in young space before GC ==
Number of survivors of last GC + allocationsBetweenGC
statMkFwdCount
Number of objects for which fwdBlocks were created ==
Number of survivors
statCompMoveCount
Number of chunks touched in incr. compaction ==
statSweepCount
So there really isn't any need to count them one by one (if the result of
counting would be different from the above formulae, it's time to get a new
CPU which gets addition right ;-)
Cheers,
- Andreas
|
|
From: John M M. <jo...@sm...> - 2005-01-19 06:30:44
|
On Jan 18, 2005, at 9:55 PM, Andreas Raab wrote: > Hi John, > > Thanks, got it - the VM-dev list is just very slow. Some comments: > > * Using TheGCSemaphore makes the VM unusable with older images due to > splObj size mismatch - I'd want to change this to use an external > semaphore index (this is what I used for "my" primitive) to be able to > run 3.6 images on 3.8 VMs. I stole the set semaphore from another usage for some other semaphore, which is why the check for > [(self fetchClassOf: (self splObj: TheGCSemaphore)) = > (self splObj: ClassSemaphore)]) I have no problem changing this to use a external semaphore index. The check then I'd guess would check to check for a non-zero value in that global. > > * One of the truly important situations which is not covered in these > measures is when we have to run multiple compaction cycles due to lack > of forwarding blocks. I believe this has killed me in the past and > taking GC stats should definitely include this tad of information > (dunno how to measure to be honest...) > > * The logic in incrementalGC for growing namely: > > (((self sizeOfFree: freeBlock) < growHeadroom) and: > [(self fetchClassOf: (self splObj: TheGCSemaphore)) = > (self splObj: ClassSemaphore)]) ifTrue: > [growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock) > self growObjectMemory: growSize]. > > looks odd. Questions: > - What has TheGCSemaphore to do with growing? > - Why do we grow when having less than growHeadroom space? > (all we need here is enough space to accomodate the next round > of allocations + IGC - I don't see a logic here) > - Why is the grow size inconsistent with, e.g., > sufficientSpaceAfterGC:? > - Why do it all? :-) > (no, quite seriously, I don't see what good the logic actually > does) Looking at TheGCSemaphore allows me to turn on or off the new logic so I can run before/after tests using the same VM by just setting the TheGCSemaphore to nil or to a Semaphore. This code is where the problem with: > What I found was an issue which we hadn't realized is there, well I'm > sure people have seen it, but don't know why... > What happens is that as we are tenuring objects we are decreasing the > young space from 4MB to Zero. > > Now as indicated in the table below if conditions are right (a couple > of cases in the macrobenchmarks) why as you see the > number of objects we can allocate decreases to zero, and we actually > don't tenure anymore once the survivors fall below 2000. > The rate at which young space GC activity occurs goes from say 8 per > second towards 1000 per second, mind on fast machines > the young space ms accumulation count doesn't move much because the > time taken to do this is under 1 millisecond, or 0, skewing > those statistics and hiding the GC time.AllocationCount Survivors > 4000 5400 > 3209 3459 > 2269 2790 > 1760 1574 > 1592 2299 > 1105 1662 > 427 2355 > 392 2374 > 123 1472 > 89 1478 > 79 2 > 78 2 > 76 2 > 76 2 > > Note how we allocate 76 objects, do a young space GC, then have two > survivors, finally we reach the 200K minimum GC > threshold and do a full GC followed by growing young space. However > this process is very painful. By saying we want some slack growHeadroom*3/2 - (self sizeOfFree: freeBlock) we avoid the above problem. In the smalltalk code is a post check to say if we've grown N MB between full GCs then it's time to do another one, this prevents uncontrolled growth. I could add that check in the VM? Should we? 
If the GC tuning process stops running then the VM will grow to the maximum Virtual memory size. I did not add it to the VM code since I wanted to minimize the code there. What this also exposed is a tendency for the image to grow. In my notes to Jerry Bell about Croquet: "In your testing the image uses about 200MB. Of which the regular image/vm ran upwards in 6MB chunks when building the teapot in the 58 seconds, then usually using 500K of memory as active young space. Now if you choose to allocate memory then tenure to reduce the amount of mark/sweep work by reducing the number of objects being manged this churns more memory then at some point you are forced to collect it all, making it an expensive noticeable operation.. The activeRun changed things a bit to force growth a bit faster, then tenure on excessive marking which appears to have gotten rid of 42 seconds of incremental GC time because we are looking at fewer objects on each young space mark/sweep." Soo I'd suggest building a VM and running some before after tests and observe memory usage and clock time to complete a known task of work. > > * Measuring statMarkCount, statCompMoveCount, statSweepCount, > statMkFwdCount etc. seem to be excessive - is there really any need to > add extra instructions to these tight loops? I'd rather live without > these insns in the midst of time-critical GC code. Well I wanted to collect data, I wonder tho if adding these new instructions it makes any measurable difference, maybe integer unit 47 on that cpu now gets used? Somehow I'd rather leave them, unless you can show they are issues? Remember we don't have anyway to collect that type of data right now. > > Other than this it looks good. So I'd propose that: > a) We use "my" GC signaling code in order to keep the VMs compatible. > b) Add a counter for multiple compaction cycles (if we know how that > is) > c) Either remove the growing code from IGC or add a comment explaining > what the point of it is and why the parameters have been choosen the > way they have been choosen > d) Get rid of of the counters in the inner loops of the GC code. > > Opinions? (I'd be happy to integrate John's code on top of what I just > posted) > > Cheers, > - Andreas -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-19 05:55:26
|
Hi John,
Thanks, got it - the VM-dev list is just very slow. Some comments:
* Using TheGCSemaphore makes the VM unusable with older images due to splObj
size mismatch - I'd want to change this to use an external semaphore index
(this is what I used for "my" primitive) to be able to run 3.6 images on 3.8
VMs.
* One of the truly important situations which is not covered in these
measures is when we have to run multiple compaction cycles due to lack of
forwarding blocks. I believe this has killed me in the past and taking GC
stats should definitely include this tad of information (dunno how to
measure to be honest...)
* The logic in incrementalGC for growing, namely:
(((self sizeOfFree: freeBlock) < growHeadroom) and:
[(self fetchClassOf: (self splObj: TheGCSemaphore)) =
(self splObj: ClassSemaphore)]) ifTrue:
[growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock).
self growObjectMemory: growSize].
looks odd. Questions:
- What has TheGCSemaphore to do with growing?
- Why do we grow when having less than growHeadroom space?
(all we need here is enough space to accommodate the next round
allocations + IGC - I don't see a logic here)
- Why is the grow size inconsistent with, e.g.,
sufficientSpaceAfterGC:?
- Why do it at all? :-)
(no, quite seriously, I don't see what good the logic actually does)
* Measuring statMarkCount, statCompMoveCount, statSweepCount, statMkFwdCount
etc. seems to be excessive - is there really any need to add extra
instructions to these tight loops? I'd rather live without these insns in
the midst of time-critical GC code.
Other than this it looks good. So I'd propose that:
a) We use "my" GC signaling code in order to keep the VMs compatible.
b) Add a counter for multiple compaction cycles (if we know how to do that)
c) Either remove the growing code from IGC or add a comment explaining what
the point of it is and why the parameters have been chosen the way they
have been chosen
d) Get rid of the counters in the inner loops of the GC code.
Opinions? (I'd be happy to integrate John's code on top of what I just
posted)
Cheers,
- Andreas
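
For proposal (a), the image-side registration would presumably look
something like the following minimal sketch. Smalltalk
registerExternalObject: is the standard 3.x hook for handing a Semaphore
to the VM by index; the setter selector and its primitive binding shown
here are hypothetical stand-ins for whatever the index-taking variant of
John's primitiveSetGCSemaphore ends up being called in the merged code.

	installGCSemaphore
		"Register a semaphore the VM will signal after each GC cycle and
		hand its external index to the VM."
		| sema index |
		sema := Semaphore new.
		index := Smalltalk registerExternalObject: sema.
		self primSetGCSemaphoreIndex: index.
		^sema

	primSetGCSemaphoreIndex: anInteger
		"Hypothetical primitive declaration; on a VM without the
		GC-signaling support this simply fails."
		<primitive: 'primitiveSetGCSemaphore'>
		^self primitiveFailed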
----- Original Message -----
From: "John M McIntosh" <jo...@sm...>
To: "Ian Piumarta" <ian...@hp...>; "Andreas Raab"
<and...@gm...>
Cc: "Squeak VM Developers" <squ...@li...>; "Tim
Rowledge" <ti...@su...>
Sent: Tuesday, January 18, 2005 8:32 PM
Subject: Re: [Squeak-VMdev] Re: GC improvements
> Well I tried to reply to the list, but that hasn't arrived yet.
>
> There is a primitive to set the GC semaphore,
> some VM changes for monitoring (more statistical data),
> some VM changes around the memory growth versus GC activity decision.
>
> The logic to tweak things lurks in the image, not the VM.
>
> Let's see where my message went. Otherwise the changesets are on my
> idisk.
>
>
> On Jan 18, 2005, at 6:28 PM, Andreas Raab wrote:
>
>>>> Tim - is there any chance that we can get these changes into the 3.8
>>>> VMMaker? This stuff will be critical for the next Tweak version and
>>>> having
>>>> it in the official 3.8 would heavily simplify migration.
>>> Perhaps you could chat with John about the GC monitoring code he
>>> suggested recently. There is a degree of overlap that you might be able
>>> to mutually remove, making my life much simpler.
>>
>> I haven't seen that code ... but what I am proposing here should allow
>> us to run tuning code from the image instead of the VM.
>>
>>> Aaaannnnnd, how is this going to relate to the 64bit code? I'm still
>>> waiting for some answers about that before doing anything much to
>>> vmmaker.
>>
>> Just asked Ian about this today (I'll defer to him for an ultimate
>> answer).
>>
>> Cheers,
>> - Andreas
>>
>>
>>
>> -------------------------------------------------------
>> The SF.Net email is sponsored by: Beat the post-holiday blues
>> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
>> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
>> _______________________________________________
>> Squeak-VMdev mailing list
>> Squ...@li...
>> https://lists.sourceforge.net/lists/listinfo/squeak-vmdev
>>
>>
> --
> ========================================================================
> ===
> John M. McIntosh <jo...@sm...> 1-800-477-2659
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
> ========================================================================
> ===
>
|
|
From: John M M. <jo...@ma...> - 2005-01-19 03:34:40
|
>> Tim - is there any chance that we can get these changes into the 3.8
>> VMMaker? This stuff will be critical for the next Tweak version and
>> having it in the official 3.8 would heavily simplify migration.
>
> Perhaps you could chat with John about the GC monitoring code he
> suggested recently. There is a degree of overlap that you might be able
> to mutually remove, making my life much simpler.

These change sets are attached. I did add a primitiveSetGCSemaphore. I'm
not sure about having to run Smalltalk code on each signal because of
the frequency of invocation; if you look at the monitor change set you
will see the instance variable gcActivity and some hacked code to look
every 100 ms. The VM was altered to invoke the different tenure/compact
logic if the semaphore is set, so I drop in a dummy one (Semaphore new)
to trigger the new logic. As implied in Andreas' earlier note, the
semaphore signal allows you to do active tinkering.

See calculateGoals - watch out for the commented-out code which I've
been tinkering with..., let alone the "true ifTrue: [^self]." Right now
that code attempts to tenure if it feels the marking has become
excessive because of root table scanning, and ensures that after growing
N bytes we do a full GC, which is the other part of the new VM logic to
avoid doing a GC every time we start to run low on space.

Plus I added this:

"A VM change will consider that after a tenure, if the young space is
less than 4MB, then growth will happen to make young space greater than
4MB plus a calculated slack. Then after we've tenured N MB we will do a
full GC, versus doing a full GC on every grow operation; this will
trigger a shrink if required. For example we'll tenure at 75% and be
biased to grow to 16MB before doing a full GC."

> The Problem:
>
> Last weekend I built a new VM which has instrumentation to describe
> exactly what the GC is doing, also to trigger a semaphore when a GC
> finishes, and to allow you to poke at more interesting things that
> control GC activity.
>
> What I found was an issue which we hadn't realized is there - well, I'm
> sure people have seen it, but don't know why...
> What happens is that as we are tenuring objects we are decreasing the
> young space from 4MB to zero.
>
> Now, as indicated in the table below, if conditions are right (a couple
> of cases in the macrobenchmarks) the number of objects we can allocate
> decreases towards zero, and we actually stop tenuring once the
> survivors fall below 2000.
> The rate of young space GC activity goes from, say, 8 per second
> towards 1000 per second. Mind you, on fast machines the young space ms
> accumulation count barely moves, because each collection takes under
> 1 millisecond (often reported as 0), skewing those statistics and
> hiding the GC time.
>
> AllocationCount  Survivors
>            4000       5400
>            3209       3459
>            2269       2790
>            1760       1574
>            1592       2299
>            1105       1662
>             427       2355
>             392       2374
>             123       1472
>              89       1478
>              79          2
>              78          2
>              76          2
>              76          2
>
> Note how we allocate 76 objects, do a young space GC, then have two
> survivors; finally we reach the 200K minimum GC threshold and do a full
> GC followed by growing young space. However this process is very
> painful. It's also why the low space dialog doesn't appear in a timely
> manner: we are attempting to approach the 200K limit, trying really
> hard by doing thousands of young space GCs to avoid going over it. If
> conditions are right, we get close but not close enough...
>
> What will change in the future:
> a) A GC monitoring class (new) will look at mark/sweep/root table
> counts and decide when to do a tenure operation if iterating over the
> root table objects takes too many iterations. A better solution would
> be to remember old objects and which slot has the young reference, but
> that is harder to do.
>
> b) A VM change will consider that after a tenure, if the young space is
> less than 4MB, then growth will happen to make young space greater than
> 4MB plus a calculated slack. Then after we've tenured N MB we will do a
> full GC, versus doing a full GC on every grow operation; this will
> trigger a shrink if required. For example we'll tenure at 75% and be
> biased to grow to 16MB before doing a full GC.
>
> c) To solve hitting the hard boundary when we cannot allocate more
> space, we need to rethink when the low space semaphore is signaled and
> the rate of young space GC activity; signaling the semaphore earlier
> will allow a user to take action before things grind to a halt. I'm not
> quite sure how to do that yet.

Some older notes:

> I've been getting a few GC test data results; the change sets to build
> a VM lurk on my iDisk, as per my note to the squeak mailing list.
>
> OmniBrowser/Monticello SUnits from Colin Putney
>
> Before any changes what I see is (averages):
> 8139 marked objects per young space GC, where 2426 are marked via
> interpreter roots and 5713 via the remember table, for 6703 iterations
> 4522 swept objects in young space
> 714 survivors
>
> After changes where we bias towards growth (more likely to tenure on
> excessive marking) and ensure young space stays largish, versus heading
> towards zero, I see (again averages):
>
> 4652 marked objects per young space GC, where 2115 are marked via
> interpreter roots and 2526 via the remember table, for 6678 iterations
> 4238 swept objects in young space
> 368 survivors
>
> This of course translates into fewer CPU cycles needed for young space
> GC work.
>
> Jerry Bell sent me some Croquet testing data.
>
> Seems Croquet starts at about 30MB and grows upwards to 200MB when you
> invoke a teapot and look about.
>
> Jerry has to confirm what he did and whether it was repeated mostly the
> same, but it did do 65,000 to 70,000 young space GCs and it appears we
> reduced the young space GC time by 40 seconds. This does result in more
> full GC work (5 full GCs), since I tenure about 16MB before doing a
> full GC, but that accounts for only an extra second of real time...
>
> Marking in the original case averages 20,808 per young space GC; after
> the alterations it's 11,386, making GC work faster.
>
> I'll also note growing to the 195MB takes 49 seconds versus the
> original 57.
>
> Anyway, I've got to get my head around the numbers and decide where to
> take the active tuning logic.

--
========================================================================
===
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===
|
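Point (a) above - an image-side monitor that watches mark/sweep/root
table counts and decides when to tenure - might be sketched as follows.
This is a minimal illustration, not the attached changeset: the selector
names, markingThreshold, markedObjectsSinceLastCheck and the lastIGCCount
instance variable are hypothetical; vmParameterAt: 9 (incremental GC
count) is recalled from the 3.x getVMParameters comment and should be
checked; and Smalltalk forceTenure is assumed to be the tenure-forcing
wrapper such changesets add - it may not exist under that name in a
stock 3.x image.

	runMonitor
		"Crude polling loop in the spirit of the gcActivity hack: wake up
		every 100 ms rather than running Smalltalk code on every GC signal."
		[true] whileTrue:
			[self checkGCActivity.
			(Delay forMilliseconds: 100) wait]

	checkGCActivity
		"Force a tenure when the marking work per incremental GC looks
		excessive, i.e. the root table is being re-scanned over and over
		for only a handful of survivors."
		| igcCount marked |
		igcCount := Smalltalk vmParameterAt: 9.	"incremental GCs since startup (check index)"
		marked := self markedObjectsSinceLastCheck.	"hypothetical: from the new VM instrumentation"
		igcCount > lastIGCCount ifTrue:
			[marked / (igcCount - lastIGCCount) > self markingThreshold
				ifTrue: [Smalltalk forceTenure]].	"assumed tenure-forcing wrapper"
		lastIGCCount := igcCount

The 100 ms polling interval matches the gcActivity hack described above;
polling trades a little latency for not having to run image-side code on
every single GC signal.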