From: Tim R. <ti...@su...> - 2004-03-17 05:55:45
I've been messing in the area of primitive calling and found a small (5% - pMac) to big (150% - RISC OS) improvement in macrobenchmark performance by removing the timing stuff that currently surrounds primitive calls (see primitiveResponse). I moved the timing to primitiveExternalCall on the dodgy-sounding but surprisingly practical grounds that numbered prims are quick and named ones are slow.

Of course, with no timing check, those numbered prims (like primRelinquishCPU) that _are_ cycle-hogs cause problems with Delay triggering. For example, I found that 1ms Delays were taking 87ms to get triggered, because primRelinquishCPU on RISC OS involves letting other apps take control. Combine this with the use of the interruptCheckCounter, and the x-thousand quickChecks between full calls to checkForInterrupts can naturally stretch to a long time. Yes, the counter gradually comes down, but it can take a while.

The obvious (and quite effective) hack is to have the long-running prim set interruptCheckCounter to 0, which is adequate unless there is some psychopathic code in use that involves no real message sends nor backward branches for a long period. I suspect that the writer of code like that deserves late Delay triggering; along with keelhauling.

The problem that concerns me with this approach is that it involves calling checkForInterrupts for every prim we tag as long-running, at the price of an ioMSecs() call and, perhaps worse, the fairly fast ramping up of interruptCheckCounterFeedBackReset. Imagine a loop calling a suspected long-runner prim that turns out to go quickly most of the time; each time around we go to checkForInterrupts and add 10 to the feedback reset value. Once out of that loop we may take a while to drag it down again, and suffer delayed Delays in the meantime. The actual runtime cost of excess checkForInterrupts calls is mostly ioMSecs and a few tests.
I suppose we could consider more sophisticated handling of the feedback - perhaps checking the most recent interval between 'now' and 'lastTick' for being a multiple of interruptChecksEveryNms and aggressively reducing the feedback reset. Given the apparently large differences between platforms' costs for time checking, perhaps the best answer is to use a macro so that we can do the right thing for each machine. It might be worth changing checkForInterrupts to take the 'now' value as an arg, so that macros that need to get the time can reuse it.

To tag the primitives that need this timer check, I suggest that some Slang equivalent of a pragma be tossed in; we can automagically add the macro reference to each exit. There are, however, cases where the potential slowness is also very platform-dependent, and we ought to handle that. For example, getNextEvent could be very slow on RISC OS if it allows some other app to run and that app goes off and calculates pi to a gazillion places before returning. Some prims could also become long-running if they trigger a GC.

Summary: it seems worthwhile to avoid timing all prims, since so many are cheaper than the timing code itself. Your thoughts on what checks various platforms & circumstances need are solicited. Oh, and some idea of what situation originally led to the prim timing code being added would be interesting, if anyone remembers.

tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful Latin Phrases:- Ne auderis delere orbem rigidum meum! = Don't you dare erase my hard disk!