From: Andreas R. <and...@gm...> - 2004-04-08 13:13:02
|
Hi Craig,

Since no one has commented on this yet, I'll take a stab:

> - Incorporate method-lookup change to support remote message-sending, in anticipation of Squat's support for minimal snapshots and inter-system communication.

What kind of method-lookup change? What is needed here, and what are the implications for both speed and the potential impact on the JIT?

> - Allocate header format five for Squat's method dictionary marking, and version bits in method trailers for Squat's module system.

I'm slightly hesitant to reserve an object format for something like method dictionary marking. Can you elaborate on why this would be needed? I would somewhat expect that a well-known class could do the same, and we only have so many object format types. Version bits in method trailers shouldn't be needed if I understand the CompiledMethod changes correctly - you might just add another iVar to CMs.

> The Squat homepage is http://netjam.org/squat/. The release page has links to all the relevant code.

Hm... something a little more specific would be nice ;-)

Cheers,
- Andreas
|
|
From: Ian P. <ian...@in...> - 2004-04-08 12:37:34
|
On 08 Apr 2004, at 11:46, Andreas Raab wrote:

> Yup, that's pretty much what I had in mind, e.g.,
>
> * when we go into a primitive we set a disableGC flag

The last two (maybe three) attempts at jitter did precisely this (the flag was called by exactly that name). It was less important for the primitives than it was for internal mechanisms (such as flushing a volatile context into the heap during return, which is a real pain if you have to keep every single pointer remappable over every context allocation when flushing).

Cheers,
Ian
|
|
From: Andreas R. <and...@gm...> - 2004-04-08 10:08:21
|
> I'd suggest that the VM changes are pretty small but ought to include
> the ability to pass back an error value (fortunately I have ancient
> code to do that sitting somewhere) so the image knows what the problem
> was and does the smart thing.
Yes, that is a "must do" in my understanding. Thanks for reminding. I think
we might just extend #primitiveFail to include an "error reason", so you
would use it via:
interpreterProxy->primitiveFail(ERROR_BAD_ARGUMENT);
or somesuch.
Cheers,
- Andreas
|
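A rough sketch of how a named primitive might use the proposed call. The error constant and the extra argument to primitiveFail() are part of the proposal above, not the interpreterProxy API as it stands, and the primitive itself is invented for illustration:

#include "sqVirtualMachine.h"

extern struct VirtualMachine *interpreterProxy;

#define ERROR_BAD_ARGUMENT 2    /* hypothetical error code */

/* Answer whether the SmallInteger argument is even; fail with a reason otherwise. */
int primitiveIsEven(void)
{
    int arg = interpreterProxy->stackValue(0);

    if (!(interpreterProxy->isIntegerObject(arg))) {
        interpreterProxy->primitiveFail(ERROR_BAD_ARGUMENT);  /* proposed form: fail *and* say why */
        return 0;
    }
    interpreterProxy->pop(2);    /* receiver and argument */
    interpreterProxy->pushBool(!(interpreterProxy->integerValueOf(arg) & 1));
    return 0;
}

The image-side fallback code could then map ERROR_BAD_ARGUMENT onto a sensible exception instead of a bare primitive failure.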
|
From: Andreas R. <and...@gm...> - 2004-04-08 09:46:57
|
John,

Yup, that's pretty much what I had in mind, e.g.,

* when we go into a primitive we set a disableGC flag
* if there is allocation we fall through in sufficientSpaceToAllocate: (possibly growing mem if needed but not GCing)
* upon primitive return we do the allocation check and GC if necessary

That's all. We would effectively continue to run everything else as it is today, and the red zone we have for signaling low-space, as well as the ability to grow/shrink, will be able to deal with the remaining situations. Effectively, all we need to do is to make sure we have "enough headroom for the primitive", and I would be surprised if we *ever* had more than 1k of allocation per primitive except in #new: - and those primitives might be marked "gc-safe" to begin with (e.g., resetting the disableGC flag and dealing with remapping).

This would add an extra check at the end of primitive returns, but to me this is acceptable if I consider all the potential and yet undiscovered GC hazards we have right now. And heck, we might be able to hack this right away... really there is no need to wait for V4 to get this going.

And yes, the point would be to get away from the messy remappings - I was recently reviewing some primitive code and, not surprisingly, I found three potential GC problems in the one method I was looking at. I think GC problems are the single biggest issue we have for writing prims, followed by argument passing and stack imbalance.

Cheers,
- Andreas

----- Original Message -----
From: "John M McIntosh" <jo...@sm...>
To: <squ...@li...>
Sent: Thursday, April 08, 2004 6:25 AM
Subject: Re: [Squeak-VMdev] Versiojn 4 changes

> Well, we only trigger a GC because of allocation count, or some other condition, or if in fact we've run out of memory. Certainly I think you could change that to allow growth of the image if the VM supports it, but not to trigger GC activity.
>
> If growth fails and we can't find the memory then exit to shell, I'd guess. I seem to recall we aren't very good at post-checking object allocation in primitives and handling failure cases, so failure (write the stack to stdout) is ok.
>
> Could then get rid of the remapping logic, I'd guess, that handles the current messy details of oops moving during allocation in prims.
>
> On Apr 7, 2004, at 6:46 PM, Andreas Raab wrote:
>
> > Which reminds me about something totally unrelated but potentially *hugely* helpful:
> >
> > How about if we disable GC in primitives?
> >
> > This idea came back recently when we were talking about chasing GC problems - I don't even want to know how many places we have that aren't GC safe. And I wonder if it's even worthwhile to do this in primitives. If it is, we could still have a flag that basically "turns GC back on" (and this could be the default for quick-indexed primitives). Or maybe we just turn it off for any kind of named primitives.
> >
> > Thoughts?
> >
> > - Andreas
> >
> > ----- Original Message -----
> > From: "Ian Piumarta" <ian...@in...>
> > To: "Andreas Raab" <and...@gm...>
> > Cc: <squ...@li...>
> > Sent: Thursday, April 08, 2004 3:02 AM
> > Subject: Re: [Squeak-VMdev] Versiojn 4 changes
> >
> > > On 08 Apr 2004, at 02:39, Andreas Raab wrote:
> > >
> > > > Which reminds me of something else we were talking about in the past: Passing primitive arguments as C arguments instead of the Smalltalk stack.
> > >
> > > Which reminds me of something else Dan & I talked about in the past: evaluating arguments from right to left. Saves an awful lot of tedious peeking into the middle of the stack to pick up the receiver. (Combined with the above, potentially wins Really Big for 386 too. OTOH, the tradeoffs for register architectures are a little more complex.)
> > >
> > > Cheers,
> > > Ian
|
|
From: <gor...@bl...> - 2004-04-08 07:01:18
|
John M McIntosh <jo...@sm...> wrote:

> Could then get rid of the remapping logic, I'd guess, that handles the current messy details of oops moving during allocation in prims.

Aargh! And I just learned how to use that stuff! :) :)

(My GtkPlugin creates an Array of ByteArrays for my little callback mechanism etc.; it sure took me a while to get all those pushes and pops in the right order...)

regards, Göran
|
|
From: Tim R. <ti...@su...> - 2004-04-08 05:23:45
|
Does anyone recall the claimed reason for 'internalising' the method lookup/activation stuff? It seems pretty pointless to me, sitting here staring at it. I feel sure it must be giving the poor C compiler conniptions to have so much cruft in a single loop, probably leading to register allocation indigestion.

So far as I can see, the only cost to having them external is the internalize/externalize SP/IP code, and that's a couple of storage instructions. The only code I can see particularly benefitting from being 'internal' is the quick-return prims which use internalPop:thenPush: - but changing them to external status would only involve referring to the global SP/IP, and IIRC all of us are using the global vars in an array now, which speeds that up.

Maybe I'll try it out in my copious spare time.

tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful random insult:- A Neanderthal brain in a Cro-Magnon body.
|
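For readers not steeped in the interpreter source, a toy illustration of what 'internal' versus 'external' means here: the interpreter caches the stack/instruction pointers in locals inside its main loop and writes them back to the globals around anything that needs the real values. All names and the structure are invented for the sketch; this is not the actual interpret() code:

#include <stdio.h>

static int stack[64];
static int *stackPointer;              /* the "external" SP that helpers and primitives see */

static void externalHelper(void)       /* e.g. a primitive or a GC entry point */
{
    printf("top of stack = %d\n", *stackPointer);
}

static void interpret(void)
{
    int *localSP = stackPointer;       /* internalize: keep SP in a register-friendly local */

    *++localSP = 42;                   /* bytecodes push and pop through the local copy */

    stackPointer = localSP;            /* externalize: the "couple of storage instructions" ... */
    externalHelper();                  /* ... so code outside the loop sees a valid SP */
    localSP = stackPointer;            /* re-internalize on return */
    (void)localSP;
}

int main(void)
{
    stackPointer = stack;              /* SP points at the slot below the first pushed element */
    interpret();
    return 0;
}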
|
From: John M M. <jo...@sm...> - 2004-04-08 04:40:53
|
Tim just posted a note about the Processor yield prim call on the mailing list, which reminded me of a change I did a few years back to collect Process dispatch time as part of the VM scheduler. The current process watcher can only estimate that value now, and it requires quite a bit of overhead, whereas it could be performed at little cost within the VM yield logic.

--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
|
From: John M M. <jo...@sm...> - 2004-04-08 04:25:56
|
Well, we only trigger a GC because of allocation count, or some other condition, or if in fact we've run out of memory. Certainly I think you could change that to allow growth of the image if the VM supports it, but not to trigger GC activity.

If growth fails and we can't find the memory then exit to shell, I'd guess. I seem to recall we aren't very good at post-checking object allocation in primitives and handling failure cases, so failure (write the stack to stdout) is ok.

Could then get rid of the remapping logic, I'd guess, that handles the current messy details of oops moving during allocation in prims.

On Apr 7, 2004, at 6:46 PM, Andreas Raab wrote:

> Which reminds me about something totally unrelated but potentially *hugely* helpful:
>
> How about if we disable GC in primitives?
>
> This idea came back recently when we were talking about chasing GC problems - I don't even want to know how many places we have that aren't GC safe. And I wonder if it's even worthwhile to do this in primitives. If it is, we could still have a flag that basically "turns GC back on" (and this could be the default for quick-indexed primitives). Or maybe we just turn it off for any kind of named primitives.
>
> Thoughts?
>
> - Andreas
>
> ----- Original Message -----
> From: "Ian Piumarta" <ian...@in...>
> To: "Andreas Raab" <and...@gm...>
> Cc: <squ...@li...>
> Sent: Thursday, April 08, 2004 3:02 AM
> Subject: Re: [Squeak-VMdev] Versiojn 4 changes
>
> > On 08 Apr 2004, at 02:39, Andreas Raab wrote:
> >
> > > Which reminds me of something else we were talking about in the past: Passing primitive arguments as C arguments instead of the Smalltalk stack.
> >
> > Which reminds me of something else Dan & I talked about in the past: evaluating arguments from right to left. Saves an awful lot of tedious peeking into the middle of the stack to pick up the receiver. (Combined with the above, potentially wins Really Big for 386 too. OTOH, the tradeoffs for register architectures are a little more complex.)
> >
> > Cheers,
> > Ian

--
========================================================================
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
|
|
From: Ian P. <ian...@in...> - 2004-04-08 02:32:21
|
On 08 Apr 2004, at 04:20, Tim Rowledge wrote:

> In message <6F7...@in...> Ian Piumarta <ian...@in...> wrote:
>
> > On 08 Apr 2004, at 02:39, Andreas Raab wrote:
> >
> > > Which reminds me of something else we were talking about in the past: Passing primitive arguments as C arguments instead of the Smalltalk stack.
>
> That could save time in the primitive code, but what would it cost in the calling code? Outside of a translator, how would we load up the registers (and think of the platform differences in which registers and how many etc)?

Why did I say "the tradeoffs for register architectures are more complex"?

> You mean having rcvr as TOS at prim call time? What benefit does that have?

You don't need to know how many arguments are passed to know where the receiver is. If you want to do anything more interesting than dumb interpretation, chances are you end up having the receiver right where you need it just when you discover it's time to dynamically bind. Makes inlining statically-bound sends, not to mention deleting that useless frame pointer, a whole lot easier. Too lazy to think of any more excuses.

(No, there's nothing special about TOS on x86. The only thing there was that R-2-L evaluation + the C ABI for arguments gives you zero-copy callout to prims.)

Ian
|
|
From: Tim R. <ti...@su...> - 2004-04-08 02:21:50
|
In message <6F7...@in...>
Ian Piumarta <ian...@in...> wrote:
> On 08 Apr 2004, at 02:39, Andreas Raab wrote:
>
> > Which reminds of something else we were talking about in the past:
> > Passing
> > primitive arguments as C arguments instead of the Smalltalk stack.
>
That could save time in the primitive code, but what would it cost in
the calling code? Outside of a translator, how would we load up the
registers (and think of the platform differences in which registers and
how many etc)? Wouldn't it end up with primitiveResponse looking like
switch(numArgs) {
case 1: (prim)(*sp); break;
case 2: (prim)(*sp, *(sp - 1)); break;
etc.
which surely wouldn't net much benefit?
> Which reminds me of something else Dan & I talked about in the past:
> evaluating arguments from right to left.
You mean having rcvr as TOS at prim call time? What benefit does that
have ? Is there some specialness about the TOS value on x86?
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful Latin Phrases:- Utinam coniurati te in foro interficiant! = May
conspirators assassinate you in the mall!
|
|
From: Tim R. <ti...@su...> - 2004-04-08 02:12:50
|
In message <066701c41d0b$4c545ee0$b2d0fea9@R22>
"Andreas Raab" <and...@gm...> wrote:
> Which reminds about something totally unrelated but potentially *hugely*
> helpful:
>
> How about if we disable GC in primitives?
From my recent audit to find the places that needed looking at for
messing with the 'interrupt check right now', I found that very few numbered
prims can trigger a GC. Most of those that could, ought to be rewritten
to fail and let the image work it out. The nastiest case I can remember
is when sending a message and trying to allocate a context; run out of
memory there and there probably isn't much that can be done. Perhaps
automatically send an email to RamChipsRUs.com?
I'd suggest that the VM changes are pretty small but ought to include
the ability to pass back an error value (fortunately I have ancient
code to do that sitting somewhere) so the image knows what the problem
was and does the smart thing.
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Old programmers never die; they just branch to a new address.
|
|
From: Tim R. <ti...@su...> - 2004-04-08 02:02:44
|
In message <6FE...@in...>
Ian Piumarta <ian...@in...> wrote:
>
> Current version:
>
> lwz r3,0xfffc(r27)
> lwz r4,0(r27)
> and r28,r3,r4
> andi. r9,r28,0x1
> beq <fail>
> srawi r5,r3,1
> srawi r0,r4,1
> add r4,r5,r0
> rlwinm r2,r4,1,0,30
> xor. r9,r4,r2
> blt <fail>
> ori r6,r2,0x1
> stwu r6,0xfffc(r27)
> <dispatch>
>
Just out of interest I extracted the RISC OS equivalent and I'm pleased
to see that the latest compiler does a reasonable job. Probably why it
makes a VM 20% faster than the old one...
LDR R0,[R5] get rcvr & arg from stack
LDR R2,[R5,#-4]
AND R1,R2,R0 do tag test
TST R1,#1
BEQ fail smallint test
MOV R1,R2,ASR #1 shift dn arg
ADD R0,R1,R0,ASR #1 add shifted arg to shifted rcvr
EORS R1,R0,R0,LSL #1 eor result with itself shifted up
BMI fail result test
MOV R1, #1 odd- put 1 in R0?
ORR R1,R0,R0,LSL #1 or in 1 to result shifted up
STR R0,[R5,#-4]! push & modify sp in one go
LDRB R7,[R6,R1]! fetch next byte - odd use R0 for 1
B dispatch
It's hard to see much that could be dropped (in the context of an
interpreter). If we could keep the top couple of items in registers it
would save two loads and a store. Even a translator would have to do
most of this I think; no load or store or bytecode fetch I guess?
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
People who deal with bits should expect to get bitten. - Jon Bentley
|
|
From: Andreas R. <and...@gm...> - 2004-04-08 01:46:30
|
Which reminds me about something totally unrelated but potentially *hugely* helpful:

How about if we disable GC in primitives?

This idea came back recently when we were talking about chasing GC problems - I don't even want to know how many places we have that aren't GC safe. And I wonder if it's even worthwhile to do this in primitives. If it is, we could still have a flag that basically "turns GC back on" (and this could be the default for quick-indexed primitives). Or maybe we just turn it off for any kind of named primitives.

Thoughts?

- Andreas

----- Original Message -----
From: "Ian Piumarta" <ian...@in...>
To: "Andreas Raab" <and...@gm...>
Cc: <squ...@li...>
Sent: Thursday, April 08, 2004 3:02 AM
Subject: Re: [Squeak-VMdev] Versiojn 4 changes

> On 08 Apr 2004, at 02:39, Andreas Raab wrote:
>
> > Which reminds me of something else we were talking about in the past: Passing primitive arguments as C arguments instead of the Smalltalk stack.
>
> Which reminds me of something else Dan & I talked about in the past: evaluating arguments from right to left. Saves an awful lot of tedious peeking into the middle of the stack to pick up the receiver. (Combined with the above, potentially wins Really Big for 386 too. OTOH, the tradeoffs for register architectures are a little more complex.)
>
> Cheers,
> Ian
|
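A toy model of the scheme sketched in this sub-thread: allocations made while a disableGC flag is set never trigger a collection, and the check is deferred to primitive return. The flag name comes from the discussion; everything else (the counters, functions and thresholds) is invented for illustration and bears no resemblance to the real object memory:

#include <stdio.h>
#include <stdlib.h>

static int disableGC = 0;
static long bytesAllocated = 0;
static const long gcThreshold = 4000;

static void incrementalGC(void)
{
    printf("GC after %ld bytes\n", bytesAllocated);   /* stand-in for a real collection */
    bytesAllocated = 0;
}

static void *allocateChunk(long byteSize)
{
    bytesAllocated += byteSize;
    if (!disableGC && bytesAllocated > gcThreshold)
        incrementalGC();                /* today: oops may move under a primitive's feet here */
    return malloc((size_t)byteSize);
}

static void somePrimitive(void)
{
    void *buf = allocateChunk(1024);    /* with the flag set, no GC can happen here */
    free(buf);
}

static void callPrimitive(void)
{
    disableGC = 1;                      /* "when we go into a primitive we set a disableGC flag" */
    somePrimitive();
    disableGC = 0;
    if (bytesAllocated > gcThreshold)   /* the extra check at the end of the primitive return */
        incrementalGC();
}

int main(void)
{
    int i;
    for (i = 0; i < 10; i++)
        callPrimitive();
    return 0;
}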
|
From: Ian P. <ian...@in...> - 2004-04-08 01:02:44
|
On 08 Apr 2004, at 02:39, Andreas Raab wrote:

> Which reminds me of something else we were talking about in the past: Passing primitive arguments as C arguments instead of the Smalltalk stack.

Which reminds me of something else Dan & I talked about in the past: evaluating arguments from right to left. Saves an awful lot of tedious peeking into the middle of the stack to pick up the receiver. (Combined with the above, potentially wins Really Big for 386 too. OTOH, the tradeoffs for register architectures are a little more complex.)

Cheers,
Ian
|
|
From: John M M. <jo...@sm...> - 2004-04-08 00:53:02
|
Ned, I looked at this a year back in terms of checking for large integers
that fit into 32 bits, since there are routines to load smallish large
integers into 32 bits.
This made calculations on absolute values between roughly 512M and 2 billion really
fast. However, this is rare in practice; some thought is required to watch out
for large positive/negative integers so as not to screw up the signedness, and as
Andreas pointed out to me, small integer math is something like 99.999ish percent, with
perhaps some float mixed in, followed by the large integer stuff requiring a send.
Mmmm, I'm not sure I have a change set; perhaps some C code, if you want to
look at it?
On Apr 7, 2004, at 3:54 PM, Ned Konz wrote:
> On Tuesday 06 April 2004 3:48 pm, Ian Piumarta wrote:
>> On 06 Apr 2004, at 23:50, Yoshiki Ohshima wrote:
>>> how about using "01" for OOP
>>> and "00" for SmallInteger? Some processor's addressing mode let us
>>> access the word-aligned memory with such pointer, while "no-tag" for
>>> SmallInteger may save some bit-operations.
>>
>> Just another data point: some Smalltalk implementations put the
>> SmallInteger tag in the topmost bit. This makes SI tag and overflow
>> checks after arithmetic simpler: addition and subtraction work
>> in-place, plus you can just look at the sign flag after the operation
>> instead of "mask + test-zero" or "shift + xor + sign-test".
>>
>> On architectures where you can set the sign flag during move this can
>> also often eliminate any need to mask and test on the tag bit; after a
>> move you can "trap" immediately on (non-)SI oops.
>
> On a related note, does it seem wasteful to anyone but me that we do
> the
> following in primBytecodeAdd:
>
> int rcvr;
> int arg;
>
> rcvr = longAt(localSP - (1 * 4));
> arg = longAt(localSP - (0 * 4));
> if (((rcvr & arg) & 1) != 0) /* areIntegers: rcvr and: arg */
> {
> result = ((rcvr >> 1)) + ((arg >> 1));
> if ((result ^ (result << 1)) >= 0) /* isIntegerValue: result */
> {
> /* begin internalPop:thenPush: */
> longAtput(localSP -= (2 - 1) * 4, ((result << 1) | 1));
> /* begin fetchNextBytecode */
> currentBytecode = byteAt(++localIP);
> goto l9;
> }
> }
> else
> {
> /* Try to add them as a float */
> /* If success, we're done. Get the next bytecode and loop */
> }
>
> /* otherwise, do a normal send */
>
>
> For a total operation count (not counting the stack load/store) of:
> 1 AND
> 2 bit tests
> 2 right shifts
> 2 left shifts
> 1 OR
> 1 XOR
> 1 ADD
>
> when for the majority of additions (those that don't overflow 31 bits)
> we only
> have to add the top two values (resetting one low bit first so that we
> don't
> get a carry from B0 to B1).
>
> Seems like we could save the shifts in most cases by looking at the
> top two
> bits of the receiver and argument; if the sign bits are different or
> the high
> bits (B30) are both the same as the sign bits we aren't going to get
> any
> overflow.
>
> --
> Ned Konz
> http://bike-nomad.com
> GPG key ID: BEEA7EFE
>
--
========================================================================
===
John M. McIntosh <jo...@sm...> 1-800-477-2659
Corporate Smalltalk Consulting Ltd. http://www.smalltalkconsulting.com
========================================================================
===
|
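For reference, a rough sketch of the kind of 32-bit fast path John describes, written as a plugin-style primitive. The primitive and its name are invented for illustration; signed32BitValueOf() and signed32BitIntegerFor() are the existing interpreterProxy calls for loading and creating "smallish" large integers:

#include "sqVirtualMachine.h"

extern struct VirtualMachine *interpreterProxy;

/* Add two integers that fit in 32 bits without dropping into LargeInteger
   arithmetic in the image; fail when an argument or the result is out of range. */
int primitiveAdd32(void)
{
    int rcvr, arg;
    long long sum;

    arg  = interpreterProxy->signed32BitValueOf(interpreterProxy->stackValue(0));
    rcvr = interpreterProxy->signed32BitValueOf(interpreterProxy->stackValue(1));
    if (interpreterProxy->failed())
        return 0;                           /* not 32-bit integers: let the image handle it */
    sum = (long long)rcvr + (long long)arg;
    if (sum < -2147483647LL - 1 || sum > 2147483647LL) {
        interpreterProxy->primitiveFail();  /* the positive/negative edge cases John mentions */
        return 0;
    }
    interpreterProxy->popthenPush(2, interpreterProxy->signed32BitIntegerFor((int)sum));
    return 0;
}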
|
From: Andreas R. <and...@gm...> - 2004-04-08 00:39:19
|
> While it probably won't impact speed on a decent implementation of the CPU (the additional branch will be predicted correctly) it won't increase speed either (you haven't reduced the overall number of data hazards the pipeline has to deal with).

Which reminds me of something else we were talking about in the past: passing primitive arguments as C arguments instead of the Smalltalk stack. I remember that you said this would be advantageous for various reasons, and since it requires changing the plugins and a lot of other stuff, V4 might be a chance to give this "more serious thoughts".

Cheers,
- Andreas
|
|
From: Ian P. <ian...@in...> - 2004-04-08 00:27:01
|
On 08 Apr 2004, at 00:54, Ned Konz wrote:
> On a related note, does it seem wasteful to anyone but me that we do
> the
> following in primBytecodeAdd:
Probably not. ;)
You introduce an additional branch into the critical path, by checking
both operands for overflow instead of checking just the result.
> Seems like we could save the shifts in most cases by looking at the
> top two
> bits of the receiver and argument; if the sign bits are different or
> the high
> bits (B30) are both the same as the sign bits we aren't going to get
> any
> overflow.
You end up with exactly the same number of instructions anyway.
Current version:
lwz r3,0xfffc(r27)
lwz r4,0(r27)
and r28,r3,r4
andi. r9,r28,0x1
beq <fail>
srawi r5,r3,1
srawi r0,r4,1
add r4,r5,r0
rlwinm r2,r4,1,0,30
xor. r9,r4,r2
blt <fail>
ori r6,r2,0x1
stwu r6,0xfffc(r27)
<dispatch>
Nedified version:
lwz r3,0xfffc(r27)
lwz r4,0(r27)
xor. r0,r3,r4
blt <fail>
rlwinm r5,r3,1,0,30
xor. r2,r5,r3
blt <fail>
rlwinm r6,r4,1,0,30
xor. r2,r6,r4
blt <fail>
add r7,r3,r4
addi r3,r7,0xffff
stwu r3,0xfffc(r27)
<dispatch>
While it probably won't impact speed on a decent implementation of the
CPU (the additional branch will be predicted correctly) it won't
increase speed either (you haven't reduced the overall number of data
hazards the pipeline has to deal with).
Cheers,
Ian
|
|
From: Ned K. <ne...@bi...> - 2004-04-07 22:55:00
|
On Tuesday 06 April 2004 3:48 pm, Ian Piumarta wrote:
> On 06 Apr 2004, at 23:50, Yoshiki Ohshima wrote:
> > how about using "01" for OOP
> > and "00" for SmallInteger? Some processor's addressing mode let us
> > access the word-aligned memory with such pointer, while "no-tag" for
> > SmallInteger may save some bit-operations.
>
> Just another data point: some Smalltalk implementations put the
> SmallInteger tag in the topmost bit. This makes SI tag and overflow
> checks after arithmetic simpler: addition and subtraction work
> in-place, plus you can just look at the sign flag after the operation
> instead of "mask + test-zero" or "shift + xor + sign-test".
>
> On architectures where you can set the sign flag during move this can
> also often eliminate any need to mask and test on the tag bit; after a
> move you can "trap" immediately on (non-)SI oops.
On a related note, does it seem wasteful to anyone but me that we do the
following in primBytecodeAdd:
int rcvr;
int arg;
int result;

rcvr = longAt(localSP - (1 * 4));
arg = longAt(localSP - (0 * 4));
if (((rcvr & arg) & 1) != 0) /* areIntegers: rcvr and: arg */
{
    result = ((rcvr >> 1)) + ((arg >> 1));
    if ((result ^ (result << 1)) >= 0) /* isIntegerValue: result */
    {
        /* begin internalPop:thenPush: */
        longAtput(localSP -= (2 - 1) * 4, ((result << 1) | 1));
        /* begin fetchNextBytecode */
        currentBytecode = byteAt(++localIP);
        goto l9;
    }
}
else
{
    /* Try to add them as a float */
    /* If success, we're done. Get the next bytecode and loop */
}

/* otherwise, do a normal send */
For a total operation count (not counting the stack load/store) of:
1 AND
2 bit tests
2 right shifts
2 left shifts
1 OR
1 XOR
1 ADD
when for the majority of additions (those that don't overflow 31 bits) we only
have to add the top two values (resetting one low bit first so that we don't
get a carry from B0 to B1).
Seems like we could save the shifts in most cases by looking at the top two
bits of the receiver and argument; if the sign bits are different or the high
bits (B30) are both the same as the sign bits we aren't going to get any
overflow.
--
Ned Konz
http://bike-nomad.com
GPG key ID: BEEA7EFE
|
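For comparison, a standalone C sketch of a no-shift formulation of the same tag-and-overflow check: clear one tag bit, add the tagged values in place, and apply the usual operand/result sign test. This is only an illustration of the idea being discussed, not the interpreter's generated code, and it relies on two's-complement wrap-around via the unsigned add:

#include <stdio.h>

/* oops are 31-bit SmallIntegers tagged with a low bit of 1 (oop = 2 * value + 1) */
static int addSmallIntegers(int rcvr, int arg, int *taggedResult)
{
    int t = rcvr - 1;                              /* clear one tag bit: t == 2 * value(rcvr) */
    int sum = (int)((unsigned)t + (unsigned)arg);  /* 2 * (a + b) + 1, still correctly tagged */

    /* overflow iff both addends have the same sign and the result's sign differs */
    if (((t ^ sum) & (arg ^ sum)) < 0)
        return 0;                                  /* does not fit in a SmallInteger */
    *taggedResult = sum;
    return 1;
}

int main(void)
{
    int r;
    if (addSmallIntegers(2 * 5 + 1, 2 * 7 + 1, &r))
        printf("5 + 7 = %d\n", (r - 1) / 2);       /* untag: prints 12 */
    if (!addSmallIntegers(2 * 0x3FFFFFFF + 1, 2 * 1 + 1, &r))
        printf("overflow detected\n");             /* SmallInteger maxVal + 1 does not fit */
    return 0;
}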
|
From: Tim R. <ti...@su...> - 2004-04-07 19:59:39
|
In message <5F1...@in...>
Ian Piumarta <ian...@in...> wrote:
>
> Oh, just go install NetBSD on your ARM and never look back. ;)
Been there, done that, wiped up the vomit. I would love to have a good
kernel (stipulating for now that my understanding is that netBSD has
such) under RISC OS's GUI & applications. However, I'm not willing to pay
the price of a completely horrible user experience in order to get the
(noticeable but not huge) improvement in kernel cleanliness. It's not as if I
haven't ever tried; I think my *nix experience goes back to 82 or
thereabouts. It's just that I've never felt that it was something a human
being should be subjected to if there is any plausible alternative. Er, and
note that after getting reacquainted with windows after six years I wouldn't
consider it to be such.
Besides, despite it being a truism that security through obscurity is
no security, I have to say that being effectively immune to viruses is
quite an advantage these days.
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
May the bugs of many programs nest on your hard drive.
|
|
From: Dan I. <Da...@Sq...> - 2004-04-07 17:32:28
|
[forgot to cc the list on my reply...]

Hi, Ian (and all) -

Ian wrote...

> Dispatch bytecodes through a pointer to the bytecode table (identical to what gnuification generates for the inner loop at present anyway) and on creation of a float result push it onto the float stack and switch the dispatch pointer to the "floating bytecode set". Arithmetic selectors continue to manipulate the float set until something non-arithmetic comes along, triggering a pop and box of the float stack onto the regular stack and a switch back to the regular dispatch pointer before continuing with whatever bytecode we're up to.

Yes, this is a very nice way to handle it.

> No compiler changes needed.
>
> Anton Ertl did something related (but different) in his vmgen, where parallel bytecode sets are used to represent the state of caching the topmost stack value in a register.

Yes, and so did my Apple Smalltalk back in 1985. But your suggestion generalizes to a very nice way to integrate "volatile contexts". Best of all (at least at this point), I don't think we need to add anything to the V4 project to get into this.

> With a little work this could maybe even be made to look fairly pretty in the source (with the parallel implementations generated automagically from the same source methods with compile-time conditionalised sections) and extended to work for SIs too (or even matrices if they were ever to become a primitive type known to the arithmetic selectors directly).
>
> (Of course, the right solution is to generate and execute in native code and do minimal dataflow analysis and method splitting to keep everything unboxed and in registers as much as possible. But I digress...)

No, no... please go on!... ;-)

- Dan
|
|
From: David P. R. <dp...@re...> - 2004-04-07 16:56:22
|
The only reason to do compiler changes might be to reorder code to increase the likelihood that you'd stay in the "math mode" for a long time. This is like compilers trying to reorder code to get maximum benefit from the CPU pipeline and registers by moving loads earlier and stores later within basic blocks.

The generic strategy of an alternative interpreter that handles certain streams of operations optimistically and then backs up to retry with the standard one benefits most when there is a really fast, really common case. Integer calculations would also benefit, by the way, from avoiding checks to see if the intermediates are bigger than small integers, and so you could get very effective integer loops.

At 11:44 AM 4/7/2004, Ian Piumarta wrote:

> On 07 Apr 2004, at 17:21, Andreas Raab wrote:
>
> > From: "David P. Reed" <dp...@re...>
> >
> > > Probably the biggest other win would be around making it much more efficient to use floating point (which we do in tea-times as well as in the 3D stuff). Since floats are put on the heap, it might be worth looking at the techniques we used in MACLISP interpretation to put intermediate floats in a "number stack" that was much more efficiently allocated and freed (allocate = push onto the temporary number stack). Coupled with compiling sequences of math operations and tests into a "math mode" byte code stream that checks types on the inputs and then just runs a different byte code interpreter without any further type checking, this could speed up math a lot. It's a kind of optimistic or speculative execution concept.
>
> I think you could do this implicitly, at least for the special arithmetic selectors.
>
> Dispatch bytecodes through a pointer to the bytecode table (identical to what gnuification generates for the inner loop at present anyway) and on creation of a float result push it onto the float stack and switch the dispatch pointer to the "floating bytecode set". Arithmetic selectors continue to manipulate the float set until something non-arithmetic comes along, triggering a pop and box of the float stack onto the regular stack and a switch back to the regular dispatch pointer before continuing with whatever bytecode we're up to.
>
> No compiler changes needed.
>
> Anton Ertl did something related (but different) in his vmgen, where parallel bytecode sets are used to represent the state of caching the topmost stack value in a register.
>
> With a little work this could maybe even be made to look fairly pretty in the source (with the parallel implementations generated automagically from the same source methods with compile-time conditionalised sections) and extended to work for SIs too (or even matrices if they were ever to become a primitive type known to the arithmetic selectors directly).
>
> (Of course, the right solution is to generate and execute in native code and do minimal dataflow analysis and method splitting to keep everything unboxed and in registers as much as possible. But I digress...)
>
> Cheers,
> Ian
|
|
From: Ian P. <ian...@in...> - 2004-04-07 15:44:21
|
On 07 Apr 2004, at 17:21, Andreas Raab wrote:

> From: "David P. Reed" <dp...@re...>
>
> > Probably the biggest other win would be around making it much more efficient to use floating point (which we do in tea-times as well as in the 3D stuff). Since floats are put on the heap, it might be worth looking at the techniques we used in MACLISP interpretation to put intermediate floats in a "number stack" that was much more efficiently allocated and freed (allocate = push onto the temporary number stack). Coupled with compiling sequences of math operations and tests into a "math mode" byte code stream that checks types on the inputs and then just runs a different byte code interpreter without any further type checking, this could speed up math a lot. It's a kind of optimistic or speculative execution concept.

I think you could do this implicitly, at least for the special arithmetic selectors.

Dispatch bytecodes through a pointer to the bytecode table (identical to what gnuification generates for the inner loop at present anyway) and on creation of a float result push it onto the float stack and switch the dispatch pointer to the "floating bytecode set". Arithmetic selectors continue to manipulate the float set until something non-arithmetic comes along, triggering a pop and box of the float stack onto the regular stack and a switch back to the regular dispatch pointer before continuing with whatever bytecode we're up to.

No compiler changes needed.

Anton Ertl did something related (but different) in his vmgen, where parallel bytecode sets are used to represent the state of caching the topmost stack value in a register.

With a little work this could maybe even be made to look fairly pretty in the source (with the parallel implementations generated automagically from the same source methods with compile-time conditionalised sections) and extended to work for SIs too (or even matrices if they were ever to become a primitive type known to the arithmetic selectors directly).

(Of course, the right solution is to generate and execute in native code and do minimal dataflow analysis and method splitting to keep everything unboxed and in registers as much as possible. But I digress...)

Cheers,
Ian
|
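A toy rendition of the switchable-dispatch mechanism itself: two parallel handler tables indexed by the same bytecodes, with the active table pointer swapped to change "mode". Everything here (the bytecodes, the handlers, the explicit toggle) is invented for the sketch; in the proposal the switch would happen when an arithmetic bytecode produces or consumes a Float, and leaving float mode would also box the float stack:

#include <stdio.h>

enum { BC_TOGGLE = 0, BC_HELLO = 1, BC_HALT = 2, NUM_BYTECODES = 3 };

typedef void (*Handler)(void);

static Handler regularTable[NUM_BYTECODES];
static Handler floatTable[NUM_BYTECODES];
static Handler *dispatch;                 /* the active bytecode table */
static int running = 1;

static void helloRegular(void)   { printf("regular set\n"); }
static void helloFloat(void)     { printf("floating set\n"); }
static void halt(void)           { running = 0; }
static void enterFloatMode(void) { dispatch = floatTable; }
static void leaveFloatMode(void) { dispatch = regularTable; }   /* would also pop-and-box here */

int main(void)
{
    unsigned char program[] = { BC_HELLO, BC_TOGGLE, BC_HELLO, BC_TOGGLE, BC_HELLO, BC_HALT };
    int ip = 0;

    regularTable[BC_TOGGLE] = enterFloatMode;
    regularTable[BC_HELLO]  = helloRegular;
    regularTable[BC_HALT]   = halt;
    floatTable[BC_TOGGLE]   = leaveFloatMode;
    floatTable[BC_HELLO]    = helloFloat;
    floatTable[BC_HALT]     = halt;

    dispatch = regularTable;
    while (running)
        dispatch[program[ip++]]();        /* same bytecode, different handler depending on mode */
    return 0;
}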
|
From: Andreas R. <and...@gm...> - 2004-04-07 15:21:18
|
I'm forwarding David's reply here - I don't know if there are any implications for the stuff we're talking about wrt. V4, but if anyone has an insight into these areas it might be worthwhile to keep some of this in mind.

Cheers,
- Andreas

----- Original Message -----
From: "David P. Reed" <dp...@re...>
To: "Andreas Raab" <and...@gm...>; <dav...@be...>; <al...@sq...>
Sent: Wednesday, April 07, 2004 4:37 PM
Subject: Re: Q: Lowest level VM changes

> I don't think we know what is best at this time. It's clear that the inter-teaparty message send has a common-case fast-path. But it's too early to guess what the change should be.
>
> Probably the biggest other win would be around making it much more efficient to use floating point (which we do in tea-times as well as in the 3D stuff). Since floats are put on the heap, it might be worth looking at the techniques we used in MACLISP interpretation to put intermediate floats in a "number stack" that was much more efficiently allocated and freed (allocate = push onto the temporary number stack). Coupled with compiling sequences of math operations and tests into a "math mode" byte code stream that checks types on the inputs and then just runs a different byte code interpreter without any further type checking, this could speed up math a lot. It's a kind of optimistic or speculative execution concept. This coupled with the matrix stuff would make Croquet a kick-ass math interpreter.
>
> At 05:26 PM 4/6/2004, Andreas Raab wrote:
>
> > Hi,
> >
> > I am just in an extremely low-level discussion with some people about the benefits of various kinds of lowest-level VM changes, and I was wondering if there is anything in Croquet where certain modifications of the VM could make huge differences. If you have anything where you say "oh, it would be a *huge* improvement to have support for X, Y, or Z", this would be a very good time to voice it. Note that I am not making any promises here - just that I might be able to throw something in that helps us support what you think is needed.
> >
> > - Andreas
|
|
From: <gor...@bl...> - 2004-04-07 09:09:48
|
Avi Bryant <av...@be...> wrote:

[SNIP]

> The real question, I guess, is whether transactional systems are considered an important enough set of applications to use a whole header bit to optimize them. Since I mostly work on business applications, which usually need transactional behavior, my bias is that they are important enough, but obviously not everyone will think so.
>
> Avi

It seems to me that a transactional engine like KATS, for example, could be used in many more situations than merely business applications. It doesn't even have to be persistent objects, but rather multi-Process applications that use a transactional model instead of Monitors, Queues, Semaphores etc.

regards, Göran
|
|
From: Avi B. <av...@be...> - 2004-04-07 07:58:59
|
On Apr 7, 2004, at 12:22 AM, Craig Latta wrote:

> I added a few things to the swiki page, I'll repeat them here for the sake of discussion. Under the heading "Facilitate a number of anticipated extensions to Squeak":

In a similar vein: since I added something to the swiki about an immutability bit, I'll bring it up here as well. As I think Göran already mentioned, the most interesting use is in transactional systems: these need to be able to find out, for some group of objects, which of them has changed state since some point in the past. Currently, the usual pattern is to store a copy of every object in the group when the transaction begins, and then scan through them when the transaction ends looking for changes. This is, of course, generally pretty slow, especially if you're trying to manage large chunks of the image transactionally. Having an immutability bit allows for a much more efficient approach, and has some other nice uses too (I very much like, for example, that objects in the literal frame of a VisualWorks method get marked as immutable).

The real question, I guess, is whether transactional systems are considered an important enough set of applications to use a whole header bit to optimize them. Since I mostly work on business applications, which usually need transactional behavior, my bias is that they are important enough, but obviously not everyone will think so.

Avi
|
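A toy sketch, in C, of the difference being described - copy-and-compare versus letting a write barrier on a per-object bit record changes. The header layout, the bit, and every name here are invented for illustration; the Squeak object header of the time had no such bit, and a real design would signal the image on the store rather than track the change inside the VM:

#include <stdio.h>

#define IMMUTABLE_BIT 0x1u              /* hypothetical spare header bit */

typedef struct Obj {
    unsigned int header;                /* toy stand-in for an object header word */
    int slot;
    struct Obj *changedLink;            /* toy list of changed objects */
} Obj;

static Obj *changedList = NULL;

static void beginTransaction(Obj *group[], int n)
{
    int i;
    changedList = NULL;
    for (i = 0; i < n; i++)
        group[i]->header |= IMMUTABLE_BIT;      /* no copying of object state needed */
}

/* what a store check could do on the first write to a marked object */
static void storeSlot(Obj *o, int value)
{
    if (o->header & IMMUTABLE_BIT) {
        o->header &= ~IMMUTABLE_BIT;            /* record the change exactly once */
        o->changedLink = changedList;
        changedList = o;
    }
    o->slot = value;
}

static void commit(void)
{
    Obj *o;
    for (o = changedList; o != NULL; o = o->changedLink)
        printf("changed object, slot = %d\n", o->slot);
}

int main(void)
{
    Obj a = { 0, 1, NULL }, b = { 0, 2, NULL }, c = { 0, 3, NULL };
    Obj *group[3];

    group[0] = &a; group[1] = &b; group[2] = &c;
    beginTransaction(group, 3);
    storeSlot(&b, 42);                  /* only b turns up at commit; no scan over a and c */
    commit();
    return 0;
}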