From: Ian P. <ian...@in...> - 2004-04-07 15:44:21
On 07 Apr 2004, at 17:21, Andreas Raab wrote:

> From: "David P. Reed" <dp...@re...>
>
>> Probably the biggest other win would be around making it much more
>> efficient to use floating point (which we do in tea-times as well as
>> in the 3D stuff). Since floats are put on the heap, it might be worth
>> looking at the techniques we used in MACLISP interpretation to put
>> intermediate floats in a "number stack" that was much more efficiently
>> allocated and freed (allocate = push onto the temporary number stack).
>> Coupled with compiling sequences of math operations and tests into a
>> "math mode" byte code stream that checks types on the inputs and then
>> just runs a different byte code interpreter without any further type
>> checking, this could speed up math a lot. It's a kind of optimistic or
>> speculative execution concept.

I think you could do this implicitly, at least for the special arithmetic selectors. Dispatch bytecodes through a pointer to the bytecode table (identical to what gnuification generates for the inner loop at present anyway). On creation of a float result, push it onto the float stack and switch the dispatch pointer to the "floating bytecode set". Arithmetic selectors continue to manipulate the float stack until something non-arithmetic comes along, which triggers a pop and box of the float stack onto the regular stack and a switch back to the regular dispatch pointer before continuing with whatever bytecode we're up to. No compiler changes needed.

Anton Ertl did something related (but different) in his vmgen, where parallel bytecode sets are used to represent the state of caching the topmost stack value in a register.
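
As a rough illustration of the dispatch-pointer switch described above, here is a minimal Python sketch. It is not Squeak VM code: the names (`VM`, `BoxedFloat`, `arith_floating`, `flush_then_redispatch`), the five opcodes, and the `(opcode, argument)` instruction format are all invented for the example, and plain Python ints stand in for SIs. `BoxedFloat` only counts allocations so the saving is visible.

```python
import operator


class BoxedFloat:
    """Hypothetical stand-in for a heap-allocated Float object."""
    allocations = 0  # number of boxes created so far

    def __init__(self, value):
        BoxedFloat.allocations += 1
        self.value = value


def unbox(obj):
    """Return the raw number inside a BoxedFloat; ints pass through."""
    return obj.value if isinstance(obj, BoxedFloat) else obj


# Hypothetical opcodes; each instruction is an (opcode, argument) pair.
PUSH, ADD, SUB, MUL, RET = range(5)
ARITH = {ADD: operator.add, SUB: operator.sub, MUL: operator.mul}


class VM:
    def __init__(self, code, literals):
        self.code, self.literals = code, literals
        self.pc = 0
        self.stack = []     # regular stack: boxed objects and ints
        self.fstack = []    # float stack: raw, unboxed floats
        self.result = None
        self.running = True
        # Two parallel bytecode tables, selected through self.dispatch.
        self.regular = {PUSH: self.push, RET: self.ret}
        self.floating = {PUSH: self.flush_then_redispatch,
                         RET: self.flush_then_redispatch}
        for op in ARITH:
            self.regular[op] = self.arith_regular
            self.floating[op] = self.arith_floating
        self.dispatch = self.regular

    def run(self):
        while self.running:
            op, arg = self.code[self.pc]
            self.pc += 1
            self.dispatch[op](op, arg)
        return self.result

    def push(self, op, arg):
        self.stack.append(self.literals[arg])

    def ret(self, op, arg):
        self.result = self.stack.pop()
        self.running = False

    def arith_regular(self, op, arg):
        right, left = self.stack.pop(), self.stack.pop()
        value = ARITH[op](unbox(left), unbox(right))
        if isinstance(value, float):
            # Float result: keep it unboxed and switch bytecode sets.
            self.fstack.append(value)
            self.dispatch = self.floating
        else:
            self.stack.append(value)

    def arith_floating(self, op, arg):
        # The logical top of stack lives unboxed on the float stack.
        right = self.fstack.pop()
        left = unbox(self.stack.pop())
        self.fstack.append(float(ARITH[op](left, right)))

    def flush_then_redispatch(self, op, arg):
        # Non-arithmetic bytecode: box the float, switch back, continue.
        self.stack.append(BoxedFloat(self.fstack.pop()))
        self.dispatch = self.regular
        self.regular[op](op, arg)


def demo():
    """Evaluate 10.0 - (1.5 + (2.0 * 3.0)) and count boxes made at run time."""
    literals = [BoxedFloat(v) for v in (10.0, 1.5, 2.0, 3.0)]
    code = [(PUSH, 0), (PUSH, 1), (PUSH, 2), (PUSH, 3),
            (MUL, None), (ADD, None), (SUB, None), (RET, None)]
    before = BoxedFloat.allocations
    result = VM(code, literals).run()
    return result.value, BoxedFloat.allocations - before
```

Calling `demo()` runs three consecutive arithmetic bytecodes and boxes only the final result; an interpreter that boxed every intermediate would allocate three times. In this simple form any `PUSH` flushes the float stack, so it never holds more than one value; keeping longer expressions unboxed would need a float-aware push in the floating set as well.
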

With a little work this could maybe even be made to look fairly pretty in the source (with the parallel implementations generated automagically from the same source methods with compile-time conditionalised sections) and extended to work for SIs too (or even matrices if they were ever to become a primitive type known to the arithmetic selectors directly).

(Of course, the right solution is to generate and execute in native code and do minimal dataflow analysis and method splitting to keep everything unboxed and in registers as much as possible. But I digress...)

Cheers,
Ian