Archive (messages per month):

| Year | Jan | Feb | Mar | Apr | May | Jun | Jul | Aug | Sep | Oct | Nov | Dec |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2002 |  |  |  |  | 54 | 3 |  | 23 | 33 | 14 | 1 |  |
| 2003 |  |  |  |  | 5 |  |  | 15 | 4 |  |  |  |
| 2004 | 1 |  | 26 | 130 | 5 |  | 21 | 3 | 24 | 10 | 37 | 2 |
| 2005 | 30 | 15 | 4 | 1 | 1 | 1 | 1 | 2 | 2 |  | 2 |  |
| 2006 |  |  |  |  |  |  |  |  |  | 1 | 2 | 10 |
| 2007 | 1 |  |  |  |  |  |  |  |  |  |  |  |
From: Craig L. <cr...@ne...> - 2005-02-01 20:33:04
|
> squeak.hpl.hp.com
Ahhh... nice. :)
thanks,
-C
--
Craig Latta
improvisational musical informaticist
cr...@ne...
www.netjam.org
[|] Proceed for Truth!
|
|
From: Andreas R. <and...@gm...> - 2005-02-01 12:28:23
|
> I'm unable to tell if it is publicly accessible since I'm registered -
> so obviously it lets me in. The SF stuff is presumably still there -
> haven't looked in ages - but by now must be well out of date. I wonder
> how we get it actually removed? And we'll need to change the mailing
> list as well since I imagine that would disappear too.
Well, we just fry the CVS repository. I guess you should be able to do this
when you log into a shell (we do have shell access). In this case we could
just sit out the whole mess for now - if SF throws us out we leave earlier,
otherwise we'll leave later.
Cheers,
 - Andreas
|
|
From: Andreas R. <and...@gm...> - 2005-02-01 12:25:50
|
John - All the code is up at squeak.hpl.hp.com for now. I think we should start redirecting people to this place. Cheers, - Andreas ----- Original Message ----- From: "John M McIntosh" <jo...@ma...> To: "Squeak VM Developers" <squ...@li...> Sent: Tuesday, February 01, 2005 9:26 AM Subject: [Squeak-VMdev] Fwd: VM building > So, did we resolve where the subversion based source code is and takedown > of the SourceForge stuff? > Tom here wants to build a VM, so what do I tell him? > > > Begin forwarded message: > >> From: Tom Rushworth <tb...@li...> >> Date: January 31, 2005 4:18:52 PM PST >> To: John M McIntosh <jo...@sm...> >> Subject: Re: VM building (was Re: possible idiot question) >> >> John, >> >> On 31-Jan-05, at 2:59 PM, John M McIntosh wrote: >> >>> >>> On Jan 31, 2005, at 9:10 AM, Tom Rushworth wrote: >>> [snip] >>>> >>>> I'm actually using darcs for my source code control, and like it very >>>> much. What I don't like is that Xcode seems to >>>> mix data that looks more like appearance than dependency info into the >>>> file(s). I haven't been very discriminating >>>> in what files I checkin though, so it may be possible that xcode has >>>> the dependency info in a separate file, I haven't >>>> looked too closely. >>> >>> We've moved to subversion http://subversion.tigris.org/ >> >> I tried subversion before darcs, since Xcode supports it, but couldn't >> get it to work even after half a day >> of fiddling. darcs worked out of the box. Oh well, I guess I'll have >> to go back to poking at subversion. >> Once I get it working, can I point at a server somewhere to get the >> platform tree? >>> >>> >>> -- >>> ====================================================================== >>> ===== >>> John M. McIntosh <jo...@sm...> 1-800-477-2659 >>> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com >>> ====================================================================== >>> ===== >>> >>> >> -- >> Tom Rushworth >> >> > -- > ======================================================================== > === > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================== > === > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Squeak-VMdev mailing list > Squ...@li... > https://lists.sourceforge.net/lists/listinfo/squeak-vmdev > |
|
From: Tim R. <ti...@su...> - 2005-02-01 02:16:23
|
In message <bcb...@ma...>
John M McIntosh <jo...@ma...> wrote:
> So, did we resolve where the subversion based source code is and
> takedown of the SourceForge stuff?
The SVN server is currently http://squeak.hpl.hp.com/svn/squeak/trunk/platforms
I'm unable to tell if it is publicly accessible since I'm registered -
so obviously it lets me in. The SF stuff is presumably still there -
haven't looked in ages - but by now must be well out of date. I wonder
how we get it actually removed? And we'll need to change the mailing
list as well since I imagine that would disappear too.
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful random insult:- Can't find log base two of 65536 without a calculator.
|
|
From: John M M. <jo...@ma...> - 2005-02-01 02:04:26
|
On Jan 28, 2005, at 2:55 PM, Andreas Raab wrote:
> Hi John,
>
>> Ok, I consolidated all of this, added a prim to turn the biasGrowth
>> behavior on where the grow and post fullGC logic is in the interp.c
>> code, versus being partially in the image. Also a prim to set the
>> threshold for doing the fullGC if we have grown by N bytes. Thus you
>> can turn off/on and set the boundary. Left in the other prims as
>> agreed.
>
> Thanks, this sounds good.
>
>> a) The 10 entries I have for FinalizationDependents have sizes of #(0
>> 2 0 3 2 55825 nil nil nil nil). You will note the one with 55,825
>> entries. This is actually a 98K or so WeakIdentityKeyDictionary of
>> CompiledMethods.
>
> This is not the case in any image I am using. You must be using a
> non-standard image; all the regular ones that I've checked have #(0 2
> nil nil nil nil nil nil nil nil) (0: Sockets; 2: Files).
Well, no, it's not a non-standard image; rather it's a working image floating
about that I have VMMaker in. The problem is that I filed in your code from
your note of Nov 11th, 2004 (below) when you were talking about issues with
the finalization process and CPU performance. Seems I filed this in and saved
the image (duh! well, that was dumb). For some reason the macrobenchmark
triggers a weak object GC event based on the new changes, which then grinds
through the 48K WeakIdentityKeyDictionary, and as you point out that's CPU
intensive. Smalltalk removeKey: #CPUHog of course fixes things.
> I've seen this problem myself. It is easiest to see what happens when
> you have a process browser open and turn on the cpu watcher - this
> will show that the finalization process takes a huge amount of
> resources.
>
> But why? Most likely (this was the case I have experienced) you have
> created some weak collection with what I consider "automatic
> finalization", e.g., WeakRegistry and friends register themselves to
> get notified when a weak reference got freed. If this registry grows
> very large it can take significant amounts of time to do the
> finalization and if your code is then weak reference heavy you may
> spend a lot of time in finalization.
>
> Here is an example illustrating the problem:
> | hog proc |
> hog := WeakIdentityKeyDictionary new.
> CompiledMethod allInstancesDo:[:cm| hog at: cm put: 42.0].
> Smalltalk at: #CPUHog put: hog.
> WeakArray addWeakDependent: hog.
--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
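The weak-reference "CPU hog" example quoted above, pulled out into a self-contained snippet for readability. Class and message names are standard Squeak; `#CPUHog` is just the illustrative global used in the thread, and the cleanup line restates John's `removeKey:` remark rather than anything new:

```smalltalk
| hog |
"Build a large weak dictionary and register it for finalization
 notification. After this, every incremental GC that clears a weak
 reference wakes the finalization process, which then scans (and
 rehashes) all ~50,000 entries - the slowdown described above."
hog := WeakIdentityKeyDictionary new.
CompiledMethod allInstancesDo: [:cm | hog at: cm put: 42.0].
Smalltalk at: #CPUHog put: hog.
WeakArray addWeakDependent: hog.

"As noted above, dropping the only strong reference undoes the damage:"
Smalltalk removeKey: #CPUHog
```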
|
From: John M M. <jo...@ma...> - 2005-02-01 00:26:09
|
So, did we resolve where the subversion based source code is and takedown of the SourceForge stuff? Tom here wants to build a VM, so what do I tell him? Begin forwarded message: > From: Tom Rushworth <tb...@li...> > Date: January 31, 2005 4:18:52 PM PST > To: John M McIntosh <jo...@sm...> > Subject: Re: VM building (was Re: possible idiot question) > > John, > > On 31-Jan-05, at 2:59 PM, John M McIntosh wrote: > >> >> On Jan 31, 2005, at 9:10 AM, Tom Rushworth wrote: >> [snip] >>> >>> I'm actually using darcs for my source code control, and like it >>> very much. What I don't like is that Xcode seems to >>> mix data that looks more like appearance than dependency info into >>> the file(s). I haven't been very discriminating >>> in what files I checkin though, so it may be possible that xcode has >>> the dependency info in a separate file, I haven't >>> looked too closely. >> >> We've moved to subversion http://subversion.tigris.org/ > > I tried subversion before darcs, since Xcode supports it, but couldn't > get it to work even after half a day > of fiddling. darcs worked out of the box. Oh well, I guess I'll have > to go back to poking at subversion. > Once I get it working, can I point at a server somewhere to get the > platform tree? >> >> >> -- >> ====================================================================== >> ===== >> John M. McIntosh <jo...@sm...> 1-800-477-2659 >> Corporate Smalltalk Consulting Ltd. >> http://www.smalltalkconsulting.com >> ====================================================================== >> ===== >> >> > -- > Tom Rushworth > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-28 22:56:23
|
Hi John,
> Ok, I consolidated all of this, added a prim to turn the biasGrowth
> behavior on where the grow and post fullGC logic is in the interp.c code,
> versus being partially in the image. Also a prim to set the threshold for
> doing the fullGC if we have grown by N bytes. Thus you can turn off/on
> and set the boundary. Left in the other prims as agreed.
Thanks, this sounds good.
> a) The 10 entries I have for FinalizationDependents have sizes of #(0 2 0
> 3 2 55825 nil nil nil nil). You will note the one with 55,825 entries.
> This is actually a 98K or so WeakIdentityKeyDictionary of CompiledMethods.
This is not the case in any image I am using. You must be using a
non-standard image; all the regular ones that I've checked have
#(0 2 nil nil nil nil nil nil nil nil) (0: Sockets; 2: Files).
> which leads to signaling the FinalizationSemaphore which wakes up the
> FinalizationProcess which calls finalizeValues on each WeakArray
> instance (and subclass instance) That has some issues.
Correct. But this has always been the case and is not a result of the recent
changes.
> PS Isn't it a bit ugly to iterate over all the elements of Weak things
> looking for null entries, versus passing up a cluestick?
Yes it is. But remember: passing a cluestick requires an (unbounded) amount
of memory, which is why I never considered it.
Cheers,
 - Andreas
|
|
From: John M M. <jo...@ma...> - 2005-01-28 08:55:29
|
(Sent to list too) Ok, I consolidated all of this, added a prim to turn the biasGrowth behavior on where the grow and post fullGC logic is in the interp.c code, versus being partially in the image. Also a prim to set the threshold for doing the fullGC if we have grown by N bytes. Thus you can turn off/on and set the boundary. Left in the other prims as agreed. However when I when to cross check performance and impact I immediately ran into a performance problem which took some time to figure out. If you take the freecell game and select game 1 (the benchmark) then instead of completing in about 4 seconds, it took 85 seconds. Ick. Well it's going to be a long night, now which of my GC changes, Andreas' changes or Ian's weak object changes, or my changes to drawing/flushing/locking causes this. Many hours later... So what is happening, also someone could confirm these details. a) The 10 entries I have for FinalizationDependents have sizes of #(0 2 0 3 2 55825 nil nil nil nil). You will note the one with 55,825 entries. This actually a 98K or so WeakIdenityKeyDictionary of CompiledMethods. b) On every animated card move this code in markAndTrace: is triggered to increment weakRootCount (self isWeakNonInt: oop) ifTrue: [ "Set lastFieldOffset before the weak fields in the receiver" lastFieldOffset := (self nonWeakFieldsOf: oop) << 2. "And remember as weak root" weakRootCount := weakRootCount + 1. weakRoots at: weakRootCount put: oop. ] ifFalse: [ "Do it the usual way" lastFieldOffset _ self lastPointerOf: oop. ]. which later triggers 1 to: weakRootCount do:[:i| self finalizeReference: (weakRoots at: i)]. which leds to signaling the FinalizationSemaphore which wakes up the FinalizationProcess which calls finalizeValues on each WeakArray instance (and subclass instance) That has some issues. One is that it iterates over the 98K WeakIdenityKeyDictionary and then cheerfully does a rehash even if it didn't do anything. First should we only do the rehash if we actually tamper with the data? Aka this suggestion. finalizeValues "remove all nil keys and rehash the receiver afterwards" | assoc hit | hit _ false. 1 to: array size do:[:i| assoc _ array at: i. (assoc notNil and:[assoc key == nil]) ifTrue:[array at: i put: nil. hit _ true]. ]. hit ifTrue: [self rehash]. Well adding that makes it a bit faster, still hunting thru 98K elements every card animation move isn't fun. PS Isn't it a bit ugly to iterate over all the elements of Weak things looking for null entries, versus passing up a cluestick? I'll note VisualAge had issues years ago, they would pass up the OOps, one at a time, finalization on large numbers took *forever*. Then they moved to passing up N elements, where when N overflowed you were toast. Can't recall what it is now, but they got past losing entries. On Jan 20, 2005, at 11:07 AM, John M McIntosh wrote: > On Jan 19, 2005, at 9:32 PM, Andreas Raab wrote: > >>> What I could do is integrate your changes with mine over the >>> weekend, and unless you want to take on that task? >> >> If you can find the time this would be great (I may or may not get >> around to it). >> To summarize: >> * Use gcSemaIndex instead of TheGCSemaphore for compatibility >> * Trigger growth logic in tenuring upon presence of gc sema >> * Remove the stats from the inner loops of the GC logic. >> Is this it? >> >> Cheers, >> - Andreas > > Ok, but I'll make one change. I'll see about a flag to turn the new > behavior on versus re-using the > semaphore. 
That way you don't mix both desires, one of getting IGC/GC > notification, the other of triggering the new behavior. > > ======================================================================= > ==== > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================= > ==== -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
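For reference, the finalizeValues change proposed in the message above, reformatted as a method body. The legacy `_` assignments are written as `:=`, and the comment is expanded to describe the added guard; the logic is otherwise exactly as posted:

```smalltalk
finalizeValues
	"Remove all associations whose keys have been garbage collected,
	 and rehash the receiver only if something was actually removed."
	| assoc hit |
	hit := false.
	1 to: array size do: [:i |
		assoc := array at: i.
		(assoc notNil and: [assoc key == nil])
			ifTrue: [array at: i put: nil. hit := true]].
	hit ifTrue: [self rehash]
```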
|
From: John M M. <jo...@ma...> - 2005-01-20 19:07:25
|
On Jan 19, 2005, at 9:32 PM, Andreas Raab wrote:
>> What I could do is integrate your changes with mine over the weekend,
>> and unless you want to take on that task?
>
> If you can find the time this would be great (I may or may not get
> around to it).
> To summarize:
> * Use gcSemaIndex instead of TheGCSemaphore for compatibility
> * Trigger growth logic in tenuring upon presence of gc sema
> * Remove the stats from the inner loops of the GC logic.
> Is this it?
>
> Cheers,
> - Andreas
Ok, but I'll make one change. I'll see about a flag to turn the new behavior
on versus re-using the semaphore. That way you don't mix both desires, one of
getting IGC/GC notification, the other of triggering the new behavior.
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
|
From: Andreas R. <and...@gm...> - 2005-01-20 05:33:29
|
[Re: Croquet and GC]
> The problem with making allocationsBetweenGCs is the average objects
> explored goes up since the number of objects that might live is
> allocationsBetweenGCs. This increases the incremental GC time by
> milliseconds which impacts timing for Delays.
Sure does. But on the other hand, an increased number of allocationsBetweenGCs
causes GC to happen less often and is therefore less time consuming (I'm
talking about things like >30% time spent in IGC here). The best delay
accuracy doesn't help if the system is busy running GC cycles ;-)
> In this example the historical implementation took 57 seconds to build a
> teapot, after the change it takes 49 seconds, even if it took a second or
> two to manage the full GCs. I've grabbed back 8 seconds; that is a good
> thing.
In this particular example (building a teapot) yes. In other situations the
tradeoffs are very different. If you are running a game you cannot afford a
hiccup every fifteen seconds even if that may make your game run with 57fps
instead of 49fps. [Besides, I have to admit that I consider "building a
teapot morph" a very atypical example of Croquet use - it is an artifact of
our current lack of replication and will go away in the not too distant
future, so using it to discuss how Croquet behaves is just totally missing
the point].
Cheers,
 - Andreas
|
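For completeness, a small sketch of how these knobs are reachable from the image through Squeak's `Smalltalk vmParameterAt:` / `vmParameterAt:put:` primitives. The parameter indices used below (9 and 10 for incremental GC count and time, 5 for allocations between GCs) are quoted from memory and should be verified against the getVMParameters comment in your image:

```smalltalk
"Read a few GC statistics. The indices are assumptions - check them
 against the SystemDictionary>>getVMParameters comment before relying
 on this."
Transcript
	show: 'incremental GCs: ', (Smalltalk vmParameterAt: 9) printString; cr;
	show: 'ms in incremental GC: ', (Smalltalk vmParameterAt: 10) printString; cr.

"If a large share of run time is going to incremental GC, allow more
 allocations between incremental GCs (a read-write parameter)."
Smalltalk vmParameterAt: 5 put: 8000.
```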
|
From: Andreas R. <and...@gm...> - 2005-01-20 05:33:06
|
John, > Well the VM change won't get triggered unless you supply a GC semaphore > as part of doing active monitoring and memory tuning. No active > monitoring and memory tuning then no new logic is run and we default to > the historical behavior. My original point was that the growth part is in > the VM, the GC trigger companion to the logic is in the image. The VM > part will always run, the image part could choke. Thus a suggestion to > move it into the VM. However I'm leery about doing that since perhaps > you don't want to say do a GC after N MB of growth, rather do it after a > build a teapot, or after N http requests, having it in smalltalk makes it > much easier to tinker with. Okay, now I see your point. I was confused by the intention of having the -seemingly unrelated- GC semaphore be the indicator for having the image in control of memory growth behavior (thus my question about what the GC sema has to do with growing the OM). This makes sense now, and although I'd still like to be able to have more reasonable behavior without a monitoring process I'm fine with having the change done in this way. Effectively this means that older images running on VMs with these changes exhibit precisely the previous behavior and running the GC monitor on top of a new VM would put the monitor in control. Sounds good. > I will note that in VisualWorks the oldspace GC logic is a separate > process (or two) and if they die your VW application will soon run out of > memory since those Smalltalk processes control how oldspace incremental > logic is run. So leaving the logic in Smalltalk isn't an outrageous > suggestion, but it's not fail safe. I didn't think it would be outrageous - the thing I worried about is running an image on top of a VM which exposes unbounded growth behavior. (and I don't care whether VW dies if the controller process dies ;-) > What I could do is integrate your changes with mine over the weekend, and > unless you want to take on that task? If you can find the time this would be great (I may or may not get around to it). To summarize: * Use gcSemaIndex instead of TheGCSemaphore for compatibility * Trigger growth logic in tenuring upon presence of gc sema * Remove the stats from the inner loops of the GC logic. Is this it? Cheers, - Andreas |
|
From: John M M. <jo...@sm...> - 2005-01-19 23:03:57
|
On Jan 19, 2005, at 2:38 PM, Andreas Raab wrote: > John, >> Jerry has to confirm what he did and if it was repeated mostly the >> same, but it did do 65,000 to 70,000 young space GC and it appears >> we reduced the young space GC time by 40 seconds. This does result in >> more full GC work (5) since I tenure about 16MB before doing a Full >> GC, >> but that accounts only for an extra second of real time... > > Agressive tenuring would probably work around some of the problems > we've seen in the past (see > http://croqueteer.blogspot.com/2005/01/need-for-speed.html) though in > general, it seems as if Croquet performs significantly better if you > allow for more allocationsBetweenGCs - it seems as if the working set > of Croquet is larger than your average Squeak working set. This also > explains the stats you are getting - you're tenuring like mad because > youngSpace is too small and then at some point where you hit fullGC > you are "back to normal". This probably shouldn't be "fixed" by > tenuring but rather by tweaking the allocationsBetweenGCs. The problem with making allocationsBetweenGCs is the average objects explored goes up since the number of objects that might live is allocationsBetweenGCs. This increases the incremental GC time by milliseconds which impacts timing for Delays. > > Also notice that the stats look like with the agressive growths logic > we are doing more fullGCs than without them (this isn't totally clear > from looking at the pictures but I would expect more spikes in the > before part if it had more fullGCs). This is not necessarily a good > thing - I would rather spend a few percent more on IGC than having to > run fullGCs in Croquet. In this example the historical implementation took 57 seconds to build a teapot, after the change it takes 49 seconds, even if it took a second or two to manage the full GCs. I'm grabbed back 8 seconds that is a good thing. > > Cheers, > - Andreas > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: John M M. <jo...@sm...> - 2005-01-19 22:57:41
|
On Jan 19, 2005, at 2:38 PM, Andreas Raab wrote:
> John,
>
>> You miss the point that after N MB of memory growth, we do a full GC
>> event.
>>
>> The logic to do the full GC is either in the smalltalk code running as
>> active memory monitoring, or can be moved into the image.
>
> My criticism here is that the *default* without running that extra
> code is not sensible. In other words, without active monitoring and
> memory tuning you will indeed run out of memory. And that's just not
> acceptable - there are plenty of people who will want to run a VM with
> as little overhead as possible.
Well, the VM change won't get triggered unless you supply a GC semaphore as
part of doing active monitoring and memory tuning. With no active monitoring
and memory tuning, none of the new logic is run and we default to the
historical behavior. My original point was that the growth part is in the VM,
the GC trigger companion to the logic is in the image. The VM part will
always run, the image part could choke. Thus a suggestion to move it into the
VM. However, I'm leery about doing that since perhaps you don't want to do a
GC after N MB of growth, but rather after building a teapot, or after N http
requests; having it in Smalltalk makes it much easier to tinker with.
I will note that in VisualWorks the oldspace GC logic is a separate process
(or two) and if they die your VW application will soon run out of memory,
since those Smalltalk processes control how the oldspace incremental logic is
run. So leaving the logic in Smalltalk isn't an outrageous suggestion, but
it's not fail-safe.
What I could do is integrate your changes with mine over the weekend, and
unless you want to take on that task?
--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
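A minimal sketch of the kind of image-side "post check" John describes (force a full GC once the heap has grown by N MB). The 16 MB threshold echoes the figure John mentions elsewhere in the thread; the vmParameter index (3 = end of memory) and the one-second polling interval are assumptions, and this is not the actual tuning code from the changeset:

```smalltalk
[ | lastEnd growthLimit |
  "Threshold and index are illustrative assumptions, not the real tuner."
  growthLimit := 16 * 1024 * 1024.        "force a full GC after ~16 MB of growth"
  lastEnd := Smalltalk vmParameterAt: 3.  "assumed: 3 = end of memory (heap size)"
  [true] whileTrue: [
      (Smalltalk vmParameterAt: 3) - lastEnd > growthLimit
          ifTrue: [
              Smalltalk garbageCollect.
              lastEnd := Smalltalk vmParameterAt: 3].
      (Delay forSeconds: 1) wait]
] forkAt: Processor userBackgroundPriority
```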
|
From: Andreas R. <and...@gm...> - 2005-01-19 22:39:25
|
John, > You miss the point that after N MB of memory growth, we do a full GC > event. > > The logic to do the full GC is either in the smalltalk code running as > active memory monitoring, or can be moved into the image. My criticism here is that the *default* without running that extra code is not sensible. In other words, without active monitoring and memory tuning you will indeed run out of memory. And that's just not acceptable - there are plenty of people who will want to run a VM with as little overhead as possible. > I'm not growing the image endlessly. I've attached two jpegs of > before/after memory end boundary charts when a person was working > in a croquet world, not a borderline case. You're making my point ;-) Without the extra GC monitoring code (which people may not have, may not be aware about, may not want to run) the system would indeed rapidly grow beyound reasonable limits. > Also two jpegs from a seaside application (again not a borderline case) > which were generated by doing: Yup. Again making my point - there need to be sensible defaults in the VM before this strategy can make sense. You may want to be able to "tweak it away", e.g., set it to "unreasonable limits" to be able to have manual control from inside the image, but the VM needs to react sensibly even without that extra tuning code. Bottom line: If we want these changes, we need a sensible mechanism in the VM to avoid unbounded memory growth. > Jerry has to confirm what he did and if it was repeated mostly the > same, but it did do 65,000 to 70,000 young space GC and it appears > we reduced the young space GC time by 40 seconds. This does result in > more full GC work (5) since I tenure about 16MB before doing a Full GC, > but that accounts only for an extra second of real time... Agressive tenuring would probably work around some of the problems we've seen in the past (see http://croqueteer.blogspot.com/2005/01/need-for-speed.html) though in general, it seems as if Croquet performs significantly better if you allow for more allocationsBetweenGCs - it seems as if the working set of Croquet is larger than your average Squeak working set. This also explains the stats you are getting - you're tenuring like mad because youngSpace is too small and then at some point where you hit fullGC you are "back to normal". This probably shouldn't be "fixed" by tenuring but rather by tweaking the allocationsBetweenGCs. Also notice that the stats look like with the agressive growths logic we are doing more fullGCs than without them (this isn't totally clear from looking at the pictures but I would expect more spikes in the before part if it had more fullGCs). This is not necessarily a good thing - I would rather spend a few percent more on IGC than having to run fullGCs in Croquet. Cheers, - Andreas |
|
From: John M M. <jo...@ma...> - 2005-01-19 21:54:37
|
Resent but without the attachments. Begin forwarded message: > From: John M McIntosh <jo...@sm...> > Date: January 19, 2005 1:46:52 PM PST > To: "Andreas Raab" <and...@gm...> > Cc: "Squeak VM Developers" <squ...@li...>, "Tim > Rowledge" <ti...@su...>, "Ian Piumarta" > <ian...@hp...> > Subject: Re: [Squeak-VMdev] Re: GC improvements > > > On Jan 19, 2005, at 10:41 AM, Andreas Raab wrote: > >> John, >> >>>>> Note how we allocate 76 objects, do a young space GC, then have >>>>> two survivors, finally we reach the 200K minimum GC >>>>> threshold and do a full GC followed by growing young space. >>>>> However this process is very painful. >>>> >>>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree: >>>> freeBlock) we avoid the above problem. >>> >>> I see. Yes this makes sense. (btw, I'm not sure if these parameter >>> choices are best but I guess since they aren't worse than what's >>> there they must be good enough ;-) >> >> I take this back. The longer I'm looking at these changes the more >> questionable they look to me. With them, unless you do a manual full >> GC at some point you keep growing and growing and growing until you >> just run out of memory. I *really* don't like this. >> >> The current machinery may be inefficient in some borderline >> situations but it works very well with the default situations. With >> these tenuring changes we risk to make the default behavior of the >> system to be one in which we grow endlessly (say, if you run a web >> server or something like this), for example: >> >> queue := Array new: 20000. >> index := 0. >> [true] whileTrue:[ >> (index := index + 1) > queue size ifTrue:[index := 1]. >> queue at: index put: Object new. >> ]. >> >> You keep this guy looping and the only question is *when* you are >> running out of memory (depending on the size of the object you stick >> into the queue), not if. Compare this to the obscure circumstances in >> which we get a (hardly noticable) slowdown with the current behavior. >> So I think some way for bounding growths like in the above is >> absolutely required before even considering that change. > > > You miss the point that after N MB of memory growth, we do a full GC > event. > > The logic to do the full GC is either in the smalltalk code running as > active memory monitoring, or can be moved into the image. I'm not > growing the image endlessly. I've attached two jpegs of before/after > memory end boundary charts when a person was working in a croquet > world, not a borderline case. > Also two jpegs from a seaside application (again not a borderline > case) which were generated by doing: > > " wget --recursive --no-parent --delete-after --non-verbose \ > http://localhost/seaside/alltests > from 4 simultaneous threads." > > You'll note how the seaside application using the historical logic > grows to 64MB, perhaps it will grow forever? > Using the modified logic we actually cycle between 24MB and 45MB. > > Lastly hitting the boundary condition where you trigger 1000's of > incremental GC events is triggered by just running the > macrobenchmarks. 
> > As a reminder here is a summary of the information I calculated last > nov > > OMniBrowser/Monnticello SUnits from Colin Putney > > Before any changes what I see is (averages) > 8139 marked objects per young space GC, where 2426 marked via > interpreter roots, and 5713 by remember table for 6703 iterations > 4522 swept objects in young space > 714 survivors > > After changes where we bias towards growth (more likely to tenure on > excessive marking), and ensure young space stays largish, > versus heading towards zero I see (again averages) > > 4652 marked objects per young space GC, where 2115 marked via > interpreter roots, and 2526 by remember table for 6678 iterations > 4238 swept objects in young space. > 368 survivors > > This of course translates into fewer CPU cycles needed fpr youngspace > GC work > > > Jerry Bell send me some Croquet testing data > > Seems Croquet starts at about 30MB and grows upwards to 200MB when you > invoke a teapot and look about. > > Jerry has to confirm what he did and if it was repeated mostly the > same, but it did do 65,000 to 70,000 young space GC and it appears > we reduced the young space GC time by 40 seconds. This does result in > more full GC work (5) since I tenure about 16MB before doing a Full > GC, but that accounts only for an extra second of real time... > > Marking in the original case is average 20,808 per young gc > After alterations it's 11,386, making GC work faster > > I'll also note growing to the 195mb takes 49 seconds versus the > original 57. > > >> >>> > statMarkCount: >>> Actually this is the number of times around the marking loop, >>> I don't think it's same as the survivor count plus roots. >> >> That's right, the number of times around the loop is essentially >> fieldCount(roots+survivors). But my point still stands that it is >> easily computed and that we really don't need to explicitly count >> that loop. > > Fine compute them. > > > >> >> Cheers, >> - Andreas >> >> >> >> ------------------------------------------------------- >> This SF.Net email is sponsored by: IntelliVIEW -- Interactive >> Reporting >> Tool for open source databases. Create drag-&-drop reports. Save time >> by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. >> Download a FREE copy at http://www.intelliview.com/go/osdn_nl >> _______________________________________________ >> Squeak-VMdev mailing list >> Squ...@li... >> https://lists.sourceforge.net/lists/listinfo/squeak-vmdev >> >> > -- > ======================================================================= > ==== > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================= > ==== > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: John M M. <jo...@sm...> - 2005-01-19 21:47:04
|
On Jan 19, 2005, at 10:41 AM, Andreas Raab wrote: > John, > >>>> Note how we allocate 76 objects, do a young space GC, then have two >>>> survivors, finally we reach the 200K minimum GC >>>> threshold and do a full GC followed by growing young space. However >>>> this process is very painful. >>> >>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree: >>> freeBlock) we avoid the above problem. >> >> I see. Yes this makes sense. (btw, I'm not sure if these parameter >> choices are best but I guess since they aren't worse than what's >> there they must be good enough ;-) > > I take this back. The longer I'm looking at these changes the more > questionable they look to me. With them, unless you do a manual full > GC at some point you keep growing and growing and growing until you > just run out of memory. I *really* don't like this. > > The current machinery may be inefficient in some borderline situations > but it works very well with the default situations. With these > tenuring changes we risk to make the default behavior of the system to > be one in which we grow endlessly (say, if you run a web server or > something like this), for example: > > queue := Array new: 20000. > index := 0. > [true] whileTrue:[ > (index := index + 1) > queue size ifTrue:[index := 1]. > queue at: index put: Object new. > ]. > > You keep this guy looping and the only question is *when* you are > running out of memory (depending on the size of the object you stick > into the queue), not if. Compare this to the obscure circumstances in > which we get a (hardly noticable) slowdown with the current behavior. > So I think some way for bounding growths like in the above is > absolutely required before even considering that change. You miss the point that after N MB of memory growth, we do a full GC event. The logic to do the full GC is either in the smalltalk code running as active memory monitoring, or can be moved into the image. I'm not growing the image endlessly. I've attached two jpegs of before/after memory end boundary charts when a person was working in a croquet world, not a borderline case. Also two jpegs from a seaside application (again not a borderline case) which were generated by doing: " wget --recursive --no-parent --delete-after --non-verbose \ http://localhost/seaside/alltests from 4 simultaneous threads." You'll note how the seaside application using the historical logic grows to 64MB, perhaps it will grow forever? Using the modified logic we actually cycle between 24MB and 45MB. Lastly hitting the boundary condition where you trigger 1000's of incremental GC events is triggered by just running the macrobenchmarks. As a reminder here is a summary of the information I calculated last nov OMniBrowser/Monnticello SUnits from Colin Putney Before any changes what I see is (averages) 8139 marked objects per young space GC, where 2426 marked via interpreter roots, and 5713 by remember table for 6703 iterations 4522 swept objects in young space 714 survivors After changes where we bias towards growth (more likely to tenure on excessive marking), and ensure young space stays largish, versus heading towards zero I see (again averages) 4652 marked objects per young space GC, where 2115 marked via interpreter roots, and 2526 by remember table for 6678 iterations 4238 swept objects in young space. 
368 survivors This of course translates into fewer CPU cycles needed fpr youngspace GC work Jerry Bell send me some Croquet testing data Seems Croquet starts at about 30MB and grows upwards to 200MB when you invoke a teapot and look about. Jerry has to confirm what he did and if it was repeated mostly the same, but it did do 65,000 to 70,000 young space GC and it appears we reduced the young space GC time by 40 seconds. This does result in more full GC work (5) since I tenure about 16MB before doing a Full GC, but that accounts only for an extra second of real time... Marking in the original case is average 20,808 per young gc After alterations it's 11,386, making GC work faster I'll also note growing to the 195mb takes 49 seconds versus the original 57. > >> > statMarkCount: >> Actually this is the number of times around the marking loop, >> I don't think it's same as the survivor count plus roots. > > That's right, the number of times around the loop is essentially > fieldCount(roots+survivors). But my point still stands that it is > easily computed and that we really don't need to explicitly count that > loop. Fine compute them. > > Cheers, > - Andreas > > > > ------------------------------------------------------- > This SF.Net email is sponsored by: IntelliVIEW -- Interactive Reporting > Tool for open source databases. Create drag-&-drop reports. Save time > by over 75%! Publish reports on the web. Export to DOC, XLS, RTF, etc. > Download a FREE copy at http://www.intelliview.com/go/osdn_nl > _______________________________________________ > Squeak-VMdev mailing list > Squ...@li... > https://lists.sourceforge.net/lists/listinfo/squeak-vmdev > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-19 18:41:26
|
John,
>>> Note how we allocate 76 objects, do a young space GC, then have two
>>> survivors, finally we reach the 200K minimum GC
>>> threshold and do a full GC followed by growing young space. However
>>> this process is very painful.
>>
>> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree:
>> freeBlock) we avoid the above problem.
>
> I see. Yes this makes sense. (btw, I'm not sure if these parameter choices
> are best but I guess since they aren't worse than what's there they must
> be good enough ;-)
I take this back. The longer I'm looking at these changes the more
questionable they look to me. With them, unless you do a manual full GC at
some point you keep growing and growing and growing until you just run out
of memory. I *really* don't like this.
The current machinery may be inefficient in some borderline situations but
it works very well with the default situations. With these tenuring changes
we risk to make the default behavior of the system to be one in which we
grow endlessly (say, if you run a web server or something like this), for
example:
queue := Array new: 20000.
index := 0.
[true] whileTrue:[
(index := index + 1) > queue size ifTrue:[index := 1].
queue at: index put: Object new.
].
You keep this guy looping and the only question is *when* you are running
out of memory (depending on the size of the object you stick into the
queue), not if. Compare this to the obscure circumstances in which we get a
(hardly noticeable) slowdown with the current behavior. So I think some way
for bounding growth like in the above is absolutely required before even
considering that change.
> > statMarkCount:
> Actually this is the number of times around the marking loop,
> I don't think it's same as the survivor count plus roots.
That's right, the number of times around the loop is essentially
fieldCount(roots+survivors). But my point still stands that it is easily
computed and that we really don't need to explicitly count that loop.
Cheers,
- Andreas
|
|
From: <gor...@bl...> - 2005-01-19 12:21:54
|
Hi! "Andreas Raab" <and...@gm...> wrote: > Hi - > > >> Good question. If we're actually moving to the svn server at hplabs > >> along with the mirroring previously discussed then it would be nice to > >> get it all setup for public access, announced, and the SF stuff killed > >> off for the 3.8 release There's way too much chaos and confusion around > >> without adding more. > > Talked to Ian about this yesterday. Result: All of the active code is now in > SVN at squeak.hpl.hp.com. Proposal: Let's toast sf.net and get away from the > sux0rs. Goodie. > > So... how about either setting up the Svn server there - or an Svn > > mirror or whatever works in the Svn-world? > > Mirrors sound good. I like squeak.hpl.hp.com more than SqF.net since the hpl > machine is in our cubicles (== easy to reboot if needed; just give me a > buzz) but mirroring would allow anyone to use whatever seems appropriate. So > lets get rid of SF (the sooner the better) and use SVN for real. Ok, I will find out how to set up a Svn mirror and then do it. I assume a mirror will be readonly to start with. > Cheers, > - Andreas regards, Göran |
|
From: Andreas R. <and...@gm...> - 2005-01-19 10:24:11
|
Hi - >> Good question. If we're actually moving to the svn server at hplabs >> along with the mirroring previously discussed then it would be nice to >> get it all setup for public access, announced, and the SF stuff killed >> off for the 3.8 release There's way too much chaos and confusion around >> without adding more. Talked to Ian about this yesterday. Result: All of the active code is now in SVN at squeak.hpl.hp.com. Proposal: Let's toast sf.net and get away from the sux0rs. > So... how about either setting up the Svn server there - or an Svn > mirror or whatever works in the Svn-world? Mirrors sound good. I like squeak.hpl.hp.com more than SqF.net since the hpl machine is in our cubicles (== easy to reboot if needed; just give me a buzz) but mirroring would allow anyone to use whatever seems appropriate. So lets get rid of SF (the sooner the better) and use SVN for real. Cheers, - Andreas |
|
From: <gor...@bl...> - 2005-01-19 09:06:09
|
Hi guys! Tim Rowledge <ti...@su...> wrote: > Good question. If we're actually moving to the svn server at hplabs > along with the mirroring previously discussed then it would be nice to > get it all setup for public access, announced, and the SF stuff killed > off for the 3.8 release There's way too much chaos and confusion around > without adding more. I definitely agree. And another thing - Cees has set us up with (the bomb! :) couldn't resist) a new virtual Debian server and also a paypal account that he, I and Avi has access to at the moment. Well, some of you guys already know this because you have given money to it. :) In fact - the account now holds IIRC more than $400 after one very large gift of $300. So the money funding the server is now pretty much in place for a long while. Now - I logged in yesterday and checked the server, it has 2Gb free and when I sucked down some upgrades it averaged 1200kB/s. I intend to move SM over there and BFAV is already in the process of moving. So... how about either setting up the Svn server there - or an Svn mirror or whatever works in the Svn-world? regards, Göran |
|
From: John M M. <jo...@ma...> - 2005-01-19 09:01:44
|
Oops wrong email address, try again to vmdev Begin forwarded message: > From: John M McIntosh <jo...@sm...> > Date: January 19, 2005 12:47:00 AM PST > To: "Andreas Raab" <and...@gm...> > Cc: "Squeak VM Developers" <squ...@li...>, "Tim > Rowledge" <ti...@su...>, "Ian Piumarta" > <ian...@hp...> > Subject: Re: [Squeak-VMdev] Re: GC improvements > > > On Jan 19, 2005, at 12:26 AM, Andreas Raab wrote: > >> With this aggressive growth strategy I think we should have a (VM) >> parameter which controls when to run a full GC depending on how much >> we've grown. Having a GC tuner sit there all the time and watch >> things fly by doesn't look like the best solution to me. >> >> On the other hand it seems like in this case we probably should be >> following the same strategy in sufficientSpaceAfterGC:, shouldn't we? >> Here, we just GC and then grow and it seems that if we're okay with >> aggressive growths me grow here, too. > > Ok, do you want to move the logic to the VM, the change is after we > grown N MB (settable value) we then do a full GC. > >> If you look at them, all can be computed by other means, say: >> >> statMarkCount: >> Number of marked objects == >> Number of roots + Number of survivors > > Actually this is the number of times around the marking loop, I don't > think it's same as the survivor count plus roots. > > > I think these below look ok > >> >> statSweepCount >> Number of objects in young space before GC == >> Number of survivors of last GC + allocationsBetweenGC >> >> statMkFwdCount >> Number of objects for which fwdBlocks were created == >> Number of survivors >> >> statCompMoveCount >> Number of chunks touched in incr. compaction == >> statSweepCount >> >> So there really isn't any need to count them one by one (if the >> result of counting would be different from the above formulaes it's >> time to get a new CPU which gets addition right ;-) >> >> Cheers, >> - Andreas >> >> > -- > ======================================================================= > ==== > John M. McIntosh <jo...@sm...> 1-800-477-2659 > Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com > ======================================================================= > ==== > > -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-19 08:26:21
|
Hi John,
> I have no problem changing this to use a external semaphore index.
> The check then I'd guess would check to check for a non-zero value in
> that global.
I prefer this solution.
>> * The logic in incrementalGC for growing namely:
>
> This code is where the problem with:
>> What I found was an issue which we hadn't realized is there, well I'm
>> sure people have seen it, but don't know why...
>> What happens is that as we are tenuring objects we are decreasing the
>> young space from 4MB to Zero.
Ah, I see. Yes that'd be really bad.
>> Note how we allocate 76 objects, do a young space GC, then have two
>> survivors, finally we reach the 200K minimum GC
>> threshold and do a full GC followed by growing young space. However this
>> process is very painful.
>
> By saying we want some slack growHeadroom*3/2 - (self sizeOfFree:
> freeBlock) we avoid the above problem.
I see. Yes this makes sense. (btw, I'm not sure if these parameter choices
are best but I guess since they aren't worse than what's there they must be
good enough ;-)
> In the smalltalk code is a post check to say if we've grown N MB between
> full GCs then it's time to do another one, this prevents uncontrolled
> growth.
> I could add that check in the VM? Should we? If the GC tuning process
> stops running then the VM will grow to the maximum Virtual memory size.
> I did not add it to the VM code since I wanted to minimize the code
> there.
With this aggressive growth strategy I think we should have a (VM) parameter
which controls when to run a full GC depending on how much we've grown.
Having a GC tuner sit there all the time and watch things fly by doesn't
look like the best solution to me.
On the other hand it seems like in this case we probably should be following
the same strategy in sufficientSpaceAfterGC:, shouldn't we? Here, we just GC
and then grow, and it seems that if we're okay with aggressive growth we
grow here, too.
>> * Measuring statMarkCount, statCompMoveCount, statSweepCount,
>> statMkFwdCount etc. seem to be excessive - is there really any need to
>> add extra instructions to these tight loops? I'd rather live without
>> these insns in the midst of time-critical GC code.
>
> Well I wanted to collect data, I wonder tho if adding these new
> instructions it makes any measurable difference, maybe integer unit 47
> on that cpu now gets used? Somehow I'd rather leave them, unless you can
> show they are issues?
Well, the major reason why I don't like the insns there is that it is so
hard to measure the difference (otherwise I would have just done it). If you
know how to get accurate measures here (say to be able to spot a speed
difference of 1% reliably) let me know.
> Remember we don't have anyway to collect that type of data right now.
If you look at them, all can be computed by other means, say:
statMarkCount:
Number of marked objects ==
Number of roots + Number of survivors
statSweepCount
Number of objects in young space before GC ==
Number of survivors of last GC + allocationsBetweenGC
statMkFwdCount
Number of objects for which fwdBlocks were created ==
Number of survivors
statCompMoveCount
Number of chunks touched in incr. compaction ==
statSweepCount
So there really isn't any need to count them one by one (if the result of
counting would be different from the above formulae, it's time to get a new
CPU which gets addition right ;-)
Cheers,
- Andreas
|
|
From: John M M. <jo...@sm...> - 2005-01-19 06:30:44
|
On Jan 18, 2005, at 9:55 PM, Andreas Raab wrote: > Hi John, > > Thanks, got it - the VM-dev list is just very slow. Some comments: > > * Using TheGCSemaphore makes the VM unusable with older images due to > splObj size mismatch - I'd want to change this to use an external > semaphore index (this is what I used for "my" primitive) to be able to > run 3.6 images on 3.8 VMs. I stole the set semaphore from another usage for some other semaphore, which is why the check for > [(self fetchClassOf: (self splObj: TheGCSemaphore)) = > (self splObj: ClassSemaphore)]) I have no problem changing this to use a external semaphore index. The check then I'd guess would check to check for a non-zero value in that global. > > * One of the truly important situations which is not covered in these > measures is when we have to run multiple compaction cycles due to lack > of forwarding blocks. I believe this has killed me in the past and > taking GC stats should definitely include this tad of information > (dunno how to measure to be honest...) > > * The logic in incrementalGC for growing namely: > > (((self sizeOfFree: freeBlock) < growHeadroom) and: > [(self fetchClassOf: (self splObj: TheGCSemaphore)) = > (self splObj: ClassSemaphore)]) ifTrue: > [growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock) > self growObjectMemory: growSize]. > > looks odd. Questions: > - What has TheGCSemaphore to do with growing? > - Why do we grow when having less than growHeadroom space? > (all we need here is enough space to accomodate the next round > of allocations + IGC - I don't see a logic here) > - Why is the grow size inconsistent with, e.g., > sufficientSpaceAfterGC:? > - Why do it all? :-) > (no, quite seriously, I don't see what good the logic actually > does) Looking at TheGCSemaphore allows me to turn on or off the new logic so I can run before/after tests using the same VM by just setting the TheGCSemaphore to nil or to a Semaphore. This code is where the problem with: > What I found was an issue which we hadn't realized is there, well I'm > sure people have seen it, but don't know why... > What happens is that as we are tenuring objects we are decreasing the > young space from 4MB to Zero. > > Now as indicated in the table below if conditions are right (a couple > of cases in the macrobenchmarks) why as you see the > number of objects we can allocate decreases to zero, and we actually > don't tenure anymore once the survivors fall below 2000. > The rate at which young space GC activity occurs goes from say 8 per > second towards 1000 per second, mind on fast machines > the young space ms accumulation count doesn't move much because the > time taken to do this is under 1 millisecond, or 0, skewing > those statistics and hiding the GC time.AllocationCount Survivors > 4000 5400 > 3209 3459 > 2269 2790 > 1760 1574 > 1592 2299 > 1105 1662 > 427 2355 > 392 2374 > 123 1472 > 89 1478 > 79 2 > 78 2 > 76 2 > 76 2 > > Note how we allocate 76 objects, do a young space GC, then have two > survivors, finally we reach the 200K minimum GC > threshold and do a full GC followed by growing young space. However > this process is very painful. By saying we want some slack growHeadroom*3/2 - (self sizeOfFree: freeBlock) we avoid the above problem. In the smalltalk code is a post check to say if we've grown N MB between full GCs then it's time to do another one, this prevents uncontrolled growth. I could add that check in the VM? Should we? 
If the GC tuning process stops running then the VM will grow to the maximum Virtual memory size. I did not add it to the VM code since I wanted to minimize the code there. What this also exposed is a tendency for the image to grow. In my notes to Jerry Bell about Croquet: "In your testing the image uses about 200MB. Of which the regular image/vm ran upwards in 6MB chunks when building the teapot in the 58 seconds, then usually using 500K of memory as active young space. Now if you choose to allocate memory then tenure to reduce the amount of mark/sweep work by reducing the number of objects being manged this churns more memory then at some point you are forced to collect it all, making it an expensive noticeable operation.. The activeRun changed things a bit to force growth a bit faster, then tenure on excessive marking which appears to have gotten rid of 42 seconds of incremental GC time because we are looking at fewer objects on each young space mark/sweep." Soo I'd suggest building a VM and running some before after tests and observe memory usage and clock time to complete a known task of work. > > * Measuring statMarkCount, statCompMoveCount, statSweepCount, > statMkFwdCount etc. seem to be excessive - is there really any need to > add extra instructions to these tight loops? I'd rather live without > these insns in the midst of time-critical GC code. Well I wanted to collect data, I wonder tho if adding these new instructions it makes any measurable difference, maybe integer unit 47 on that cpu now gets used? Somehow I'd rather leave them, unless you can show they are issues? Remember we don't have anyway to collect that type of data right now. > > Other than this it looks good. So I'd propose that: > a) We use "my" GC signaling code in order to keep the VMs compatible. > b) Add a counter for multiple compaction cycles (if we know how that > is) > c) Either remove the growing code from IGC or add a comment explaining > what the point of it is and why the parameters have been choosen the > way they have been choosen > d) Get rid of of the counters in the inner loops of the GC code. > > Opinions? (I'd be happy to integrate John's code on top of what I just > posted) > > Cheers, > - Andreas -- ======================================================================== === John M. McIntosh <jo...@sm...> 1-800-477-2659 Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com ======================================================================== === |
|
From: Andreas R. <and...@gm...> - 2005-01-19 05:55:26
|
Hi John,
Thanks, got it - the VM-dev list is just very slow. Some comments:
* Using TheGCSemaphore makes the VM unusable with older images due to splObj
size mismatch - I'd want to change this to use an external semaphore index
(this is what I used for "my" primitive) to be able to run 3.6 images on 3.8
VMs.
* One of the truly important situations which is not covered in these
measures is when we have to run multiple compaction cycles due to lack of
forwarding blocks. I believe this has killed me in the past and taking GC
stats should definitely include this tad of information (dunno how to
measure to be honest...)
* The logic in incrementalGC for growing, namely:
(((self sizeOfFree: freeBlock) < growHeadroom) and:
[(self fetchClassOf: (self splObj: TheGCSemaphore)) =
(self splObj: ClassSemaphore)]) ifTrue:
[growSize _ growHeadroom*3/2 - (self sizeOfFree: freeBlock).
self growObjectMemory: growSize].
looks odd. Questions:
- What has TheGCSemaphore to do with growing?
- Why do we grow when having less than growHeadroom space?
(all we need here is enough space to accommodate the next round
allocations + IGC - I don't see a logic here)
- Why is the grow size inconsistent with, e.g.,
sufficientSpaceAfterGC:?
- Why do it at all? :-)
(no, quite seriously, I don't see what good the logic actually does)
* Measuring statMarkCount, statCompMoveCount, statSweepCount, statMkFwdCount
etc. seems to be excessive - is there really any need to add extra
instructions to these tight loops? I'd rather live without these insns in
the midst of time-critical GC code.
Other than this it looks good. So I'd propose that:
a) We use "my" GC signaling code in order to keep the VMs compatible.
b) Add a counter for multiple compaction cycles (if we know how to do that)
c) Either remove the growing code from IGC or add a comment explaining what
the point of it is and why the parameters have been chosen the way they
have been chosen
d) Get rid of the counters in the inner loops of the GC code.
Opinions? (I'd be happy to integrate John's code on top of what I just
posted)
Cheers,
- Andreas
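
For proposal (a), the image-side registration would presumably look
something like the following minimal sketch. Smalltalk
registerExternalObject: is the standard 3.x hook for handing a Semaphore
to the VM by index; the setter selector and its primitive binding shown
here are hypothetical stand-ins for whatever the index-taking variant of
John's primitiveSetGCSemaphore ends up being called in the merged code.

	installGCSemaphore
		"Register a semaphore the VM will signal after each GC cycle and
		hand its external index to the VM."
		| sema index |
		sema := Semaphore new.
		index := Smalltalk registerExternalObject: sema.
		self primSetGCSemaphoreIndex: index.
		^sema

	primSetGCSemaphoreIndex: anInteger
		"Hypothetical primitive declaration; on a VM without the
		GC-signaling support this simply fails."
		<primitive: 'primitiveSetGCSemaphore'>
		^self primitiveFailed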
----- Original Message -----
From: "John M McIntosh" <jo...@sm...>
To: "Ian Piumarta" <ian...@hp...>; "Andreas Raab"
<and...@gm...>
Cc: "Squeak VM Developers" <squ...@li...>; "Tim
Rowledge" <ti...@su...>
Sent: Tuesday, January 18, 2005 8:32 PM
Subject: Re: [Squeak-VMdev] Re: GC improvements
> Well I tried to reply to the list, but that hasn't arrived yet.
>
> There is a primitive to set the GC semaphore,
> some VM changes for monitoring (more statistical data),
> some VM changes around the memory growth versus GC activity decision.
>
> The logic to tweak things lurks in the image, not the VM.
>
> Let's see where my message went. Otherwise the changesets are on my
> idisk.
>
>
> On Jan 18, 2005, at 6:28 PM, Andreas Raab wrote:
>
>>>> Tim - is there any chance that we can get these changes into the 3.8
>>>> VMMaker? This stuff will be critical for the next Tweak version and
>>>> having
>>>> it in the official 3.8 would heavily simplify migration.
>>> Perhaps you could chat with John about the GC monitoring code he
>>> suggested recently. There is a degree of overlap that you might be able
>>> to mutually remove, making my life much simpler.
>>
>> I haven't seen that code ... but what I am proposing here should allow
>> us to run tuning code from the image instead of the VM.
>>
>>> Aaaannnnnd, how is this going to relate to the 64bit code? I'm still
>>> waiting for some answers about that before doing anything much to
>>> vmmaker.
>>
>> Just asked Ian about this today (I'll defer to him for an ultimate
>> answer).
>>
>> Cheers,
>> - Andreas
>>
>>
>>
>> -------------------------------------------------------
>> The SF.Net email is sponsored by: Beat the post-holiday blues
>> Get a FREE limited edition SourceForge.net t-shirt from ThinkGeek.
>> It's fun and FREE -- well, almost....http://www.thinkgeek.com/sfshirt
>> _______________________________________________
>> Squeak-VMdev mailing list
>> Squ...@li...
>> https://lists.sourceforge.net/lists/listinfo/squeak-vmdev
>>
>>
> --
> ========================================================================
> ===
> John M. McIntosh <jo...@sm...> 1-800-477-2659
> Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
> ========================================================================
> ===
>
|
|
From: John M M. <jo...@ma...> - 2005-01-19 03:34:40
|
>> Tim - is there any chance that we can get these changes into the 3.8
>> VMMaker? This stuff will be critical for the next Tweak version and
>> having it in the official 3.8 would heavily simplify migration.
>
> Perhaps you could chat with John about the GC monitoring code he
> suggested recently. There is a degree of overlap that you might be able
> to mutually remove, making my life much simpler.

These change sets are attached. I did add a primitiveSetGCSemaphore. I'm
not sure about having to run Smalltalk code on each signal because of
the frequency of invocation; if you look at the monitor change set you
will see the instance variable gcActivity and some hacked code to look
every 100 ms. The VM was altered to invoke the different tenure/compact
logic if the semaphore is set, so I drop in a dummy one (Semaphore new)
to trigger the new logic. As implied in Andreas' earlier note, the
semaphore signal allows you to do active tinkering.

See calculateGoals - watch out for the commented-out code which I've
been tinkering with..., let alone the "true ifTrue: [^self]." Right now
that code attempts to tenure if it feels the marking has become
excessive because of root table scanning, and ensures that after growing
N bytes we do a full GC, which is the other part of the new VM logic to
avoid doing a GC every time we start to run low on space.

Plus I added this:

"A VM change will consider that after a tenure, if the young space is
less than 4MB, then growth will happen to make young space greater than
4MB plus a calculated slack. Then after we've tenured N MB we will do a
full GC, versus doing a full GC on every grow operation; this will
trigger a shrink if required. For example we'll tenure at 75% and be
biased to grow to 16MB before doing a full GC."

> The Problem:
>
> Last weekend I built a new VM which has instrumentation to describe
> exactly what the GC is doing, also to trigger a semaphore when a GC
> finishes, and to allow you to poke at more interesting things that
> control GC activity.
>
> What I found was an issue which we hadn't realized is there - well, I'm
> sure people have seen it, but don't know why...
> What happens is that as we are tenuring objects we are decreasing the
> young space from 4MB to zero.
>
> Now, as indicated in the table below, if conditions are right (a couple
> of cases in the macrobenchmarks) the number of objects we can allocate
> decreases towards zero, and we actually stop tenuring once the
> survivors fall below 2000.
> The rate of young space GC activity goes from, say, 8 per second
> towards 1000 per second. Mind you, on fast machines the young space ms
> accumulation count barely moves, because each collection takes under
> 1 millisecond (often reported as 0), skewing those statistics and
> hiding the GC time.
>
> AllocationCount  Survivors
>            4000       5400
>            3209       3459
>            2269       2790
>            1760       1574
>            1592       2299
>            1105       1662
>             427       2355
>             392       2374
>             123       1472
>              89       1478
>              79          2
>              78          2
>              76          2
>              76          2
>
> Note how we allocate 76 objects, do a young space GC, then have two
> survivors; finally we reach the 200K minimum GC threshold and do a full
> GC followed by growing young space. However this process is very
> painful. It's also why the low space dialog doesn't appear in a timely
> manner: we are attempting to approach the 200K limit, trying really
> hard by doing thousands of young space GCs to avoid going over it. If
> conditions are right, we get close but not close enough...
>
> What will change in the future:
> a) A GC monitoring class (new) will look at mark/sweep/root table
> counts and decide when to do a tenure operation if iterating over the
> root table objects takes too many iterations. A better solution would
> be to remember old objects and which slot has the young reference, but
> that is harder to do.
>
> b) A VM change will consider that after a tenure, if the young space is
> less than 4MB, then growth will happen to make young space greater than
> 4MB plus a calculated slack. Then after we've tenured N MB we will do a
> full GC, versus doing a full GC on every grow operation; this will
> trigger a shrink if required. For example we'll tenure at 75% and be
> biased to grow to 16MB before doing a full GC.
>
> c) To solve hitting the hard boundary when we cannot allocate more
> space, we need to rethink when the low space semaphore is signaled and
> the rate of young space GC activity; signaling the semaphore earlier
> will allow a user to take action before things grind to a halt. I'm not
> quite sure how to do that yet.

Some older notes:

> I've been getting a few GC test data results; the change sets to build
> a VM lurk on my iDisk, as per my note to the squeak mailing list.
>
> OmniBrowser/Monticello SUnits from Colin Putney
>
> Before any changes what I see is (averages):
> 8139 marked objects per young space GC, where 2426 are marked via
> interpreter roots and 5713 via the remember table, for 6703 iterations
> 4522 swept objects in young space
> 714 survivors
>
> After changes where we bias towards growth (more likely to tenure on
> excessive marking) and ensure young space stays largish, versus heading
> towards zero, I see (again averages):
>
> 4652 marked objects per young space GC, where 2115 are marked via
> interpreter roots and 2526 via the remember table, for 6678 iterations
> 4238 swept objects in young space
> 368 survivors
>
> This of course translates into fewer CPU cycles needed for young space
> GC work.
>
> Jerry Bell sent me some Croquet testing data.
>
> Seems Croquet starts at about 30MB and grows upwards to 200MB when you
> invoke a teapot and look about.
>
> Jerry has to confirm what he did and whether it was repeated mostly the
> same, but it did do 65,000 to 70,000 young space GCs and it appears we
> reduced the young space GC time by 40 seconds. This does result in more
> full GC work (5 full GCs), since I tenure about 16MB before doing a
> full GC, but that accounts for only an extra second of real time...
>
> Marking in the original case averages 20,808 per young space GC; after
> the alterations it's 11,386, making GC work faster.
>
> I'll also note growing to the 195MB takes 49 seconds versus the
> original 57.
>
> Anyway, I've got to get my head around the numbers and decide where to
> take the active tuning logic.

--
========================================================================
===
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===
|
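Point (a) above - an image-side monitor that watches mark/sweep/root
table counts and decides when to tenure - might be sketched as follows.
This is a minimal illustration, not the attached changeset: the selector
names, markingThreshold, markedObjectsSinceLastCheck and the lastIGCCount
instance variable are hypothetical; vmParameterAt: 9 (incremental GC
count) is recalled from the 3.x getVMParameters comment and should be
checked; and Smalltalk forceTenure is assumed to be the tenure-forcing
wrapper such changesets add - it may not exist under that name in a
stock 3.x image.

	runMonitor
		"Crude polling loop in the spirit of the gcActivity hack: wake up
		every 100 ms rather than running Smalltalk code on every GC signal."
		[true] whileTrue:
			[self checkGCActivity.
			(Delay forMilliseconds: 100) wait]

	checkGCActivity
		"Force a tenure when the marking work per incremental GC looks
		excessive, i.e. the root table is being re-scanned over and over
		for only a handful of survivors."
		| igcCount marked |
		igcCount := Smalltalk vmParameterAt: 9.	"incremental GCs since startup (check index)"
		marked := self markedObjectsSinceLastCheck.	"hypothetical: from the new VM instrumentation"
		igcCount > lastIGCCount ifTrue:
			[marked / (igcCount - lastIGCCount) > self markingThreshold
				ifTrue: [Smalltalk forceTenure]].	"assumed tenure-forcing wrapper"
		lastIGCCount := igcCount

The 100 ms polling interval matches the gcActivity hack described above;
polling trades a little latency for not having to run image-side code on
every single GC signal.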