From: David P. R. <dp...@re...> - 2004-04-07 16:56:22

The only reason to do compiler changes might be to reorder code to
increase the likelihood that you'd stay in the "math mode" for a long
time. This is similar to the way compilers reorder code to get maximum
benefit from the CPU pipeline and registers, moving loads earlier and
stores later within basic blocks. The generic strategy of an
alternative interpreter that handles certain streams of operations
optimistically, then backs up and retries with the standard one,
benefits most when there is a really fast, really common case. Integer
calculations would also benefit, by the way, from avoiding the checks
that intermediates haven't grown beyond the SmallInteger range, so you
could get very efficient integer loops.

At 11:44 AM 4/7/2004, Ian Piumarta wrote:
>On 07 Apr 2004, at 17:21, Andreas Raab wrote:
>
>>From: "David P. Reed" <dp...@re...>
>>
>>>Probably the biggest other win would be around making it much more
>>>efficient to use floating point (which we do in tea-times as well as
>>>in the 3D stuff). Since floats are put on the heap, it might be
>>>worth looking at the techniques we used in MACLISP interpretation to
>>>put intermediate floats in a "number stack" that was much more
>>>efficiently allocated and freed (allocate = push onto the temporary
>>>number stack). Coupled with compiling sequences of math operations
>>>and tests into a "math mode" byte code stream that checks types on
>>>the inputs and then just runs a different byte code interpreter
>>>without any further type checking, this could speed up math a lot.
>>>It's a kind of optimistic or speculative execution concept.
>
>I think you could do this implicitly, at least for the special
>arithmetic selectors.
>
>Dispatch bytecodes through a pointer to the bytecode table (identical
>to what gnuification generates for the inner loop at present anyway)
>and, on creation of a float result, push it onto the float stack and
>switch the dispatch pointer to the "floating bytecode set". Arithmetic
>selectors continue to manipulate the float stack until something
>non-arithmetic comes along, triggering a pop and box of the float
>stack onto the regular stack and a switch back to the regular dispatch
>pointer before continuing with whatever bytecode we're up to.
>
>No compiler changes needed.
>
>Anton Ertl did something related (but different) in his vmgen, where
>parallel bytecode sets are used to represent the state of caching the
>topmost stack value in a register.
>
>With a little work this could maybe even be made to look fairly pretty
>in the source (with the parallel implementations generated
>automagically from the same source methods with compile-time
>conditionalised sections) and extended to work for SIs too (or even
>matrices, if they were ever to become a primitive type known to the
>arithmetic selectors directly).
>
>(Of course, the right solution is to generate and execute in native
>code and do minimal dataflow analysis and method splitting to keep
>everything unboxed and in registers as much as possible. But I
>digress...)
>
>Cheers,
>Ian
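
To make the dispatch-pointer scheme concrete, here is a minimal sketch
in C of a dual-table interpreter with an unboxed float stack. It is an
illustration under assumed names, not the Squeak inner loop: the
opcodes, the tables, and the flush_floats helper are all invented for
the sketch, "boxing" is simulated by a plain copy, and a real VM would
dispatch a far larger bytecode set against heap-allocated Float
objects.

#include <stdio.h>

enum { OP_LIT, OP_ADD, OP_MUL, OP_PRINT, OP_HALT, NUM_OPS };
typedef void (*Handler)(unsigned char op);

static void reg_lit(unsigned char), reg_add(unsigned char),
            reg_mul(unsigned char), reg_print(unsigned char),
            reg_halt(unsigned char);
static void flt_lit(unsigned char), flt_add(unsigned char),
            flt_mul(unsigned char), flt_other(unsigned char);

/* Two parallel bytecode tables; `dispatch` selects the active one. */
static const Handler regular_table[NUM_OPS] =
    { reg_lit, reg_add, reg_mul, reg_print, reg_halt };
static const Handler float_table[NUM_OPS] =
    { flt_lit, flt_add, flt_mul, flt_other, flt_other };

static const unsigned char *pc;   /* program counter     */
static const double *lits;        /* float literal pool  */
static const Handler *dispatch;   /* active bytecode set */
static int running;

static double stack[32];  static int sp;   /* regular ("boxed") stack */
static double fstack[32]; static int fsp;  /* unboxed float stack     */

/* Box every unboxed intermediate back onto the regular stack
 * (boxing simulated by a plain copy) and switch back to the
 * regular bytecode set. */
static void flush_floats(void) {
    for (int i = 0; i < fsp; i++) stack[sp++] = fstack[i];
    fsp = 0;
    dispatch = regular_table;
}

/* Regular set: creating a float result enters "math mode". */
static void reg_lit(unsigned char op) {
    (void)op;
    fstack[fsp++] = lits[*pc++];  /* push unboxed ...             */
    dispatch = float_table;       /* ... and switch bytecode sets */
}
/* In a real VM these would tag-check and box; elided here. */
static void reg_add(unsigned char op)   { (void)op; sp--; stack[sp-1] += stack[sp]; }
static void reg_mul(unsigned char op)   { (void)op; sp--; stack[sp-1] *= stack[sp]; }
static void reg_print(unsigned char op) { (void)op; printf("%g\n", stack[--sp]); }
static void reg_halt(unsigned char op)  { (void)op; running = 0; }

/* Float set: arithmetic with no type checks and no allocation. */
static void flt_lit(unsigned char op) { (void)op; fstack[fsp++] = lits[*pc++]; }
static void flt_add(unsigned char op) { (void)op; fsp--; fstack[fsp-1] += fstack[fsp]; }
static void flt_mul(unsigned char op) { (void)op; fsp--; fstack[fsp-1] *= fstack[fsp]; }

/* First non-arithmetic bytecode: flush, switch back, and
 * re-dispatch the same bytecode through the regular table. */
static void flt_other(unsigned char op) {
    flush_floats();
    regular_table[op](op);
}

static void run(const unsigned char *code, const double *pool) {
    pc = code; lits = pool;
    dispatch = regular_table;
    sp = fsp = 0; running = 1;
    while (running) {
        unsigned char op = *pc++;
        dispatch[op](op);   /* indirect through the *current* table */
    }
}

int main(void) {
    static const double pool[] = { 2.5, 4.0, 10.0 };
    static const unsigned char code[] = {
        OP_LIT, 0, OP_LIT, 1, OP_ADD,  /* 2.5 + 4.0, all unboxed    */
        OP_LIT, 2, OP_MUL,             /* ... * 10.0, still unboxed */
        OP_PRINT, OP_HALT              /* PRINT triggers the flush  */
    };
    run(code, pool);                   /* prints 65 */
    return 0;
}

The switch between sets is nearly free on the fast path because, as
Ian notes for the gnuified inner loop, every bytecode is already
dispatched indirectly through the table pointer; entering and leaving
"math mode" is just a store to that pointer.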