From: Tim R. <ti...@su...> - 2004-03-17 05:55:45
I've been messing in the area of primitive calling and found a small (5% - pMac) to big (150% - RISC OS) improvement in macrobenchmark performance by removing the timing stuff that currently surrounds primitive calls (see primitiveResponse). I moved the timing to primitiveExternalCall on the dodgy-sounding but surprisingly practical grounds that numbered prims are quick and named ones are slow.

Of course, with no timing check, those numbered prims (like primRelinquishCPU) that _are_ cycle-hogs cause problems with Delay triggering. For example, I found that 1ms Delays were taking 87ms to get triggered, because primRelinquishCPU on RISC OS involves letting other apps take control. Combine this with the use of the interruptCheckCounter, and the x-thousand quickChecks between full calls to checkForInterrupts can naturally stretch to a long time. Yes, the counter gradually comes down, but it can take a while.

The obvious (and quite effective) hack is to have the long-running prim set interruptCheckCounter to 0, which is adequate unless there is some psychopathic code in use that involves no real message sends nor backward branches for a long period. I suspect that the writer of code like that deserves late Delay triggering; along with keelhauling.

The problem that concerns me with this approach is that it involves calling checkForInterrupts for every prim we tag as long-running, at the price of an ioMSecs() call and, perhaps worse, the fairly fast ramping up of interruptCheckCounterFeedBackReset. Imagine a loop calling a suspected long-runner prim that turns out to go quickly most of the time; each time around we go to checkForInterrupts and add 10 to the feedback reset value. Once out of that loop we may take a while to drag it down again, and suffer delayed Delays in the meantime. The actual runtime cost of excess checkForInterrupts calls is mostly ioMSecs and a few tests.
I suppose we could consider more sophisticated handling of the feedback - perhaps checking the most recent interval between 'now' and 'lastTick' for being a multiple of interruptChecksEveryNms and aggressively reducing the feedback reset. Given the apparently large differences between platforms' costs for time checking, perhaps the best answer is to use a macro so that we can do the right thing for each machine. It might be worth changing checkForInterrupts to take the 'now' value as an arg, so that macros that need to get the time can reuse it.

To tag the primitives that need this timer check, I suggest that some Slang equivalent of a pragma be tossed in; we can automagically add the macro reference to each exit. There are, however, cases where the potential slowness is also very platform-dependent, and we ought to handle that. For example, getNextEvent could be very slow on RISC OS if it allows some other app to run and that app goes off and calculates pi to a gazillion places before returning. Some prims could also become long-running if they trigger a GC.

Summary: it seems worthwhile to avoid timing all prims, since so many are cheaper than the timing code itself. Your thoughts on what checks various platforms & circumstances need are solicited. Oh, and some idea of what situation originally led to the prim timing code being added would be interesting, if anyone remembers.

tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful Latin Phrases:- Ne auderis delere orbem rigidum meum! = Don't you dare erase my hard disk!