Re: [Squeak-VMdev] Fw: Q: Lowest level VM changes

On 07 Apr 2004, at 17:21, Andreas Raab wrote:

> From: "David P. Reed" <dp...@re...>
>
>> Probably the biggest other win would be around making it much more
>> efficient to use floating point (which we do in tea-times as well as
>> in the 3D stuff).  Since floats are put on the heap, it might be worth
>> looking at the techniques we used in MACLISP interpretation to put
>> intermediate floats in a "number stack" that was much more efficiently
>> allocated and freed (allocate = push onto the temporary number stack).
>> Coupled with compiling sequences of math operations and tests into a
>> "math mode" byte code stream that checks types on the inputs and then
>> just runs a different byte code interpreter without any further type
>> checking, this could speed up math a lot.  It's a kind of optimistic
>> or speculative execution concept.

I think you could do this implicitly, at least for the special 
arithmetic selectors.

Dispatch bytecodes through a pointer to the bytecode table (identical 
to what gnuification generates for the inner loop at present anyway) 
and on creation of a float result push it onto the float stack and 
switch the dispatch pointer to the "floating bytecode set".  Arithmetic 
selectors continue to manipulate the float set until something 
non-arithmetic comes along, triggering a pop and box of the float stack 
onto the regular stack and a switch back to the regular dispatch 
pointer before continuing with whatever bytecode we're up to.

No compiler changes needed.
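
A minimal sketch of the scheme described above, in C. Everything here
is hypothetical and for illustration only: the opcodes, the VM struct,
the handler names and the bump-allocated "heap" are assumptions and do
not correspond to Squeak's real bytecode set or object memory. The
sketch also simplifies in two ways: the regular stack holds only boxed
floats, and in the float set the right operand is the unboxed value on
the float stack while the left operand is unboxed from the regular
stack.

```c
#include <assert.h>

/* Hypothetical bytecodes for this sketch; not Squeak's real encoding. */
enum { OP_PUSH_LIT, OP_ADD, OP_MUL, OP_STORE, OP_HALT, OP_COUNT };

typedef struct VM VM;
typedef void (*Handler)(VM *vm);

struct VM {
    const unsigned char *pc;
    const double *literals;   /* float literals, treated as already boxed */
    const double *stack[16];  /* regular stack: pointers to boxed floats */
    int sp;
    double fstack[16];        /* float stack: raw, unboxed doubles */
    int fsp;
    double heap[16];          /* bump allocator standing in for the object heap */
    int boxes;                /* number of floats boxed so far */
    const Handler *dispatch;  /* pointer to the active bytecode table */
    double result;
    int running;
};

static void push(VM *vm, const double *p) { assert(vm->sp < 16); vm->stack[vm->sp++] = p; }
static const double *pop(VM *vm) { assert(vm->sp > 0); return vm->stack[--vm->sp]; }
static void fpush(VM *vm, double d) { assert(vm->fsp < 16); vm->fstack[vm->fsp++] = d; }
static double fpop(VM *vm) { assert(vm->fsp > 0); return vm->fstack[--vm->fsp]; }

/* Boxing = allocating a float on the (mock) heap. */
static const double *box(VM *vm, double d) {
    assert(vm->boxes < 16);
    vm->heap[vm->boxes] = d;
    return &vm->heap[vm->boxes++];
}

static void rPushLit(VM *vm) { push(vm, &vm->literals[*vm->pc++]); }
static void rStore(VM *vm)   { vm->result = *pop(vm); }
static void rHalt(VM *vm)    { vm->running = 0; }
static void rAdd(VM *vm);
static void rMul(VM *vm);
static void fAdd(VM *vm);
static void fMul(VM *vm);
static void fFlush(VM *vm);

/* Two parallel bytecode sets, indexed by opcode. */
static const Handler regularTable[OP_COUNT] = { rPushLit, rAdd, rMul, rStore, rHalt };
static const Handler floatTable[OP_COUNT]   = { fFlush,   fAdd, fMul, fFlush, fFlush };

/* Regular set: a float result goes to the float stack and switches sets. */
static void rAdd(VM *vm) {
    double b = *pop(vm);
    double a = *pop(vm);
    fpush(vm, a + b);
    vm->dispatch = floatTable;
}
static void rMul(VM *vm) {
    double b = *pop(vm);
    double a = *pop(vm);
    fpush(vm, a * b);
    vm->dispatch = floatTable;
}

/* Float set: arithmetic stays unboxed; no allocation happens here. */
static void fAdd(VM *vm) { double b = fpop(vm); double a = *pop(vm); fpush(vm, a + b); }
static void fMul(VM *vm) { double b = fpop(vm); double a = *pop(vm); fpush(vm, a * b); }

/* Float set, non-arithmetic bytecode: pop and box, switch back, retry. */
static void fFlush(VM *vm) {
    push(vm, box(vm, fpop(vm)));
    vm->dispatch = regularTable;
    regularTable[vm->pc[-1]](vm);  /* pc[-1] is the opcode just fetched */
}

static double run(const unsigned char *code, const double *literals, int *boxes) {
    VM vm = {0};
    vm.pc = code;
    vm.literals = literals;
    vm.dispatch = regularTable;
    vm.running = 1;
    while (vm.running) {
        unsigned char op = *vm.pc++;
        vm.dispatch[op](&vm);      /* dispatch through the table pointer */
    }
    *boxes = vm.boxes;
    return vm.result;
}
```

For a * (b + (c * d)) with literals 2, 3, 4, 5, encoded as four
OP_PUSH_LIT bytecodes followed by OP_MUL, OP_ADD, OP_MUL, OP_STORE,
OP_HALT, run() returns 46.0 and boxes a single float (the final
result) instead of three. The float stack never grows beyond one
entry in this simplified version, so it behaves much like a cached
top-of-stack value.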

Anton Ertl did something related (but different) in his vmgen, where 
parallel bytecode sets are used to represent the state of caching the 
topmost stack value in a register.
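
To illustrate the general idea of parallel bytecode sets encoding a
caching state, here is a small hand-written sketch in C. It is not
vmgen output and makes no claim about how vmgen implements this; the
opcodes and the run_cached() function are hypothetical. One set
handles the state "nothing cached", the other the state "top of stack
held in a local variable" (which a C compiler may keep in a register).

```c
#include <assert.h>

/* Hypothetical opcodes for an integer stack machine. */
enum { OP_LIT, OP_ADD, OP_END };

static long run_cached(const long *code) {
    long stack[16];
    int sp = 0;
    long tos = 0;    /* cached top of stack, valid only when cached != 0 */
    int cached = 0;  /* selects which of the two bytecode sets is active */

    for (;;) {
        long op = *code++;
        if (!cached) {
            /* Set 0: every stack item lives in memory. */
            switch (op) {
            case OP_LIT:
                tos = *code++;
                cached = 1;
                break;
            case OP_ADD:
                assert(sp >= 2);
                tos = stack[sp - 2] + stack[sp - 1];
                sp -= 2;
                cached = 1;
                break;
            case OP_END:
                assert(sp >= 1);
                return stack[sp - 1];
            default:
                assert(0);
                return 0;
            }
        } else {
            /* Set 1: the topmost item is in tos, the rest in memory. */
            switch (op) {
            case OP_LIT:
                assert(sp < 16);
                stack[sp++] = tos;  /* spill the old top, cache the new one */
                tos = *code++;
                break;
            case OP_ADD:
                assert(sp >= 1);
                tos = stack[--sp] + tos;  /* one memory access instead of three */
                break;
            case OP_END:
                return tos;
            default:
                assert(0);
                return 0;
            }
        }
    }
}
```

The same opcode has a different implementation in each set, and each
implementation knows statically where its operands are, which is the
property the float-stack scheme above relies on as well.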

With a little work this could maybe even be made to look fairly pretty 
in the source (with the parallel implementations generated 
automagically of the same source methods with compile-time 
conditionalised sections) and extended to work for SIs too (or even 
matrices if they were ever to become a primitive type known to the 
arithmetic selectors directly).

(Of course, the right solution is to generate and execute in native 
code and do minimal dataflow analysis and method splitting to keep 
everything unboxed and in registers as much as possible.  But I 
digress...)

Cheers,
Ian