From: Tim R. <ti...@su...> - 2004-03-17 05:55:45
|
I've been messing in the area of primitive calling and found a small (5% - pMac) to big (150% - RISC OS) improvement in macrobenchmark performance by removing the timing stuff that currently surrounds primitive calls. (See primitiveResponse.) I moved the timing to primitiveExternalCall on the dodgy sounding but surprisingly practical grounds that numbered prims are quick and named ones are slow.

Of course, with no timing check, those numbered prims (like primRelinquishCPU) that _are_ cycle-hogs cause problems with Delay triggering. For example, I found that 1mS Delays were taking 87mS to get triggered because the primRelinquishCPU on RISC OS involves letting other apps take control. Combine this with the use of the interruptCheckCounter and naturally x-thousand quickChecks between full checkForInterrupts can stretch to a long time. Yes, it gradually goes down but can take a while.

The obvious (and quite effective) hack is to have the long running prim set interruptCheckCounter to 0, which is adequate unless there is some psychopathic code in use that involves no real message sends nor backward branches for a long period. I suspect that the writer of code like that deserves late Delay triggering; along with keelhauling.

The problem that concerns me with this approach is that it involves calling checkForInterrupts for every prim that we tag as long-running, at a price of an ioMSecs() and, perhaps worse, the fairly fast ramping up of interruptCheckCounterFeedBackReset. Imagine a loop calling a suspect long-runner prim that turns out to go quickly most of the time; each time around we go to checkForInterrupts and add 10 to the feedbackreset value. Once out of that loop we may take a while to drag it down again and suffer delayed Delays in the meantime. The actual runtime cost of excess checkForInterrupts is mostly ioMSecs and a few tests.

I suppose we could consider more sophisticated handling of the feedback - perhaps checking the most recent interval between 'now' and 'lastTick' for being a multiple of interruptChecksEveryNms and aggressively reducing the feedbackreset.

Given the apparent large differences between platforms' costs for time checking, perhaps the best answer is to use a macro so that we can do the right thing for each machine. It might be worth changing checkForInterrupts to take the 'now' value as an arg so that macros that need to get the time can reuse it?

To tag the primitives that need this timer check, I suggest that some Slang equivalent of a pragma be tossed in. We can automagically add the macro reference to each exit. There are however cases where the potential slowness is also very platform dependent and we ought to handle that. For example, getNextEvent could be very slow on RISC OS if it allows some other app to run and that app goes off and calculates pi to a gazillion places before returning. Some prims could be made long running if they trigger a GC.

Summary: it seems worthwhile to avoid timing all prims since so many are smaller than the timer code. Your thoughts on what checks various platforms & circumstances need are solicited. Oh, and some idea of what situation originally led to the prim timing code being added would be interesting if anyone remembers.

tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful Latin Phrases:- Ne auderis delere orbem rigidum meum! = Don't you dare erase my hard disk!
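
[Editor's sketch] The ramp-up worry in the message above can be illustrated with a small C model. This is only a sketch under stated assumptions, not the VM's actual code: the `VMState` struct, the helper `feedbackAfterFastTaggedPrimLoop`, the decrement of 12 and the floor of 1000 are all hypothetical. Only the "+10 per early check" step comes from the message, and passing `now` as an argument follows the suggestion made there.

```c
#include <stdint.h>

/* Hypothetical stand-in for the interpreter state named in the thread. */
typedef struct {
    int32_t interruptCheckCounter;
    int32_t feedbackReset;   /* interruptCheckCounterFeedBackReset */
    int32_t lastTick;        /* clock value at the previous full check */
    int32_t checksEveryNms;  /* interruptChecksEveryNms */
} VMState;

/* FEEDBACK_UP is the "+10" from the message; the other two are assumptions. */
enum { FEEDBACK_UP = 10, FEEDBACK_DOWN = 12, FEEDBACK_FLOOR = 1000 };

/* Full check, taking 'now' as an argument so callers can reuse the time. */
void checkForInterrupts(VMState *vm, int32_t now) {
    if (now - vm->lastTick < vm->checksEveryNms) {
        /* Arrived too early: stretch the countdown. */
        vm->feedbackReset += FEEDBACK_UP;
    } else if (vm->feedbackReset - FEEDBACK_DOWN >= FEEDBACK_FLOOR) {
        vm->feedbackReset -= FEEDBACK_DOWN;
    } else {
        vm->feedbackReset = FEEDBACK_FLOOR;
    }
    vm->interruptCheckCounter = vm->feedbackReset;
    vm->lastTick = now;
}

/* Cheap countdown done on sends and backward branches. */
void quickCheckForInterrupts(VMState *vm, int32_t now) {
    if (--vm->interruptCheckCounter <= 0) {
        checkForInterrupts(vm, now);
    }
}

/* A loop over a tagged "long-running" prim that in fact returns at once:
   each pass zeroes the counter, so every pass reaches the full check
   while the clock has not moved. Returns the resulting feedback value. */
int32_t feedbackAfterFastTaggedPrimLoop(int iterations) {
    VMState vm = { 1000, 1000, 0, 3 };
    for (int i = 0; i < iterations; i++) {
        vm.interruptCheckCounter = 0;    /* the "obvious hack" */
        quickCheckForInterrupts(&vm, 0); /* clock still reads 0 */
    }
    return vm.feedbackReset;
}
```

In this model, 500 passes through the loop push the feedback value from 1000 to 6000, and each later on-time check only pulls it back by the assumed 12, which is the slow recovery the message is concerned about.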
|
From: Ian P. <ian...@in...> - 2004-03-17 06:47:16
|
Hi Tim,

> I've been messing in the area of primitive calling and found a small
> (5% - pMac) to big (150% - RISC OS) improvement in macrobenchmark
> performance by removing the timing stuff that currently surrounds
> primitive calls. (See primitiveResponse) I moved the timing to
> primitiveExternalCall on the dodgy sounding but surprisingly practical
> grounds that numbered prims are quick

Hmmm. Like primitives 127, 128 and 130, for example? ;)

How about just changing it all to do The Right Thing.

> To tag the primitives that need this timer check, I suggest that some
> Slang equivalent of a pragma be tossed in. We can automagically add the
> macro reference to each exit.

That sounds like a close approximation to me. (j5 had a completely local set of [numbered] primitive implementations, written by hand from scratch, in which I very carefully identified those that could potentially run for more than 1ms and avoided the timer check in all others.)

> There are however cases where the
> potential slowness is also very platform dependent and we ought to
> handle that. For example, getNextEvent could be very slow on RISC OS if
> it allows some other app to run and that app goes off and calculates pi
> to a gazillion places before returning. Some prims could be made long
> running if they trigger a GC.

In those cases, just fall back on setting interruptCheckCounter to zero in your ioGetNextEvent() thing. (Like you already do, when the user presses the interruptKeyCode, right? ;)

> Oh, and some idea of what
> situation originally lead to the prim timing code being added would be
> interesting if anyone remembers.

It had (IIRC) a lot to do with timing a 1ms Delay a zillion times, storing the results into a Bag, and then asking for its sortedCounts.

Cheers,
Ian
|
From: Tim R. <ti...@su...> - 2004-03-18 04:36:05
|
In message <F11...@in...>
Ian Piumarta <ian...@in...> wrote:
> Hmmm. Like primitives 127, 128 and 130, for example?
Excellent choices for discussion.
127 (showdisplayrect for those not able to look it up right now) has a
potentially huge time range. Depends on the size of the rectangle and
what the platform does with it. It invokes (usually) ioShowDisplay and
ioForceDisplayUpdate but might not. The latter does nothing on mac, an
ioProcessEvents on RISC OS, a variety of things on *nix and an
UpdateWindow on win32 if various defer related things are right. It's
_probably_ a good bet that it will take a fairly long time.
128 (arraybecome) is a pretty good bet for averaging a long time.
130 (fullgc) is a certainty on most systems; I'd say that any system
that can fullgc in well under a mS is not going to be worried about an
extra checkForInterrupts.
But we also have the normally very quick Float prims that can trigger a
GC - so some easy detection of a GC could be useful so that it can
trigger a checkForInterrupts. My guess is that it will be rare but we
should be able to cope with it.
[snip]
>
> In those cases, just fall back on setting interruptCheckCounter to zero
> in your ioGetNextEvent() thing. (Like you already do, when the user
> presses the interruptKeyCode, right? ;)
Easy for the things that are already platform specific (interesting
aside on interruptKey though; the event loop in the image is trying to
handle the semaphore itself. Is the vm expected to not signal it
anymore?) but not quite so simple for 'portable' prims IF we want to
try to do optimal checking for the platform. Anything beyond merely
setting interruptCheckCounter to 0 for example.
Well, absent any strong suggestions from you guys, I'll see what running
with just zeroing the counter does. I'll even try it on my winXP
laptop if I can find a way to persuade the f^&$%^$ing thing to talk to
the network...(offline advice on settings etc welcomed)
tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Strange OpCodes: BH: Branch and Hang
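
[Editor's sketch] One way the "easy detection of a GC" mentioned in the message above might look, in C. Everything here is hypothetical and for illustration only: the `gcCount` counter, the `runPrimitiveDetectingGC` wrapper and the two stand-in primitives are not taken from the VM. The only idea borrowed from the thread is that a forced check is requested by setting interruptCheckCounter to 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical VM globals; only interruptCheckCounter is named in the thread. */
int32_t interruptCheckCounter = 1000;
uint32_t gcCount = 0;  /* assumed: incremented by every collection */

/* Request a full interrupt check at the next quick check. */
void forceInterruptCheck(void) {
    interruptCheckCounter = 0;
}

/* Stand-in for the collector; a real one would do the work before counting. */
void incrementalGC(void) {
    gcCount++;
}

/* Stand-in for a Float prim that finds room and returns quickly. */
void quickFloatPrim(void) {
}

/* Stand-in for the same prim on the rare pass where allocation runs a GC. */
void allocatingFloatPrim(void) {
    incrementalGC();
}

/* Run a normally quick primitive; if a collection happened while it ran,
   assume it took a long time and force an interrupt check.
   Returns true when a check was forced. */
bool runPrimitiveDetectingGC(void (*prim)(void)) {
    uint32_t before = gcCount;
    prim();
    if (gcCount != before) {
        forceInterruptCheck();
        return true;
    }
    return false;
}
```

A simpler variant would put the `forceInterruptCheck()` call inside the collector itself, so that no per-primitive wrapper is needed.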
|
From: Tim R. <ti...@su...> - 2004-03-26 04:56:38
|
I've found a relatively simple and, I think, plausible solution. I factored out the forcing of an interrupt check but implemented it for now as setting the interruptCheckCounter to -1000. In checkForInterrupts I also factored out a test (and implemented it as 'is iCC < -100') so that forcing a check no longer affects the feedback reset value. That at least stops it climbing to ludicrous values. There is a small potential cost in the check being done a bit more frequently. If we need to, the force and test can be macroized to let platforms have more choice, viz. Mac using a cheap timer or whatever.

I've also, for now, changed the nextPollTick increment to 100 mS instead of 500, to give some chance of a timely response to an interruptKey press during a heavy non-interactive process. We could quite reasonably make this settable via vmParameter or a platform macro if that is a problem.

I made interruptChecksEveryNms settable via vmParameter. I don't think the default of 3mS is very helpful towards achieving 1mS Delay response unless one wants to rely on enough forced interrupt checks being made, so I changed it to 1mS.

Surprisingly few numbered prims need to force a check. Most that do involve some GC so I put the forcing there. Some platform support functions will be time consuming and the force can be added in those places easily. For example, on RISC OS a getNextEvent that results in polling the OS events is likely to take a while if some other app sucks up the cycles. I have the impression from Ian that X11 clipboard fetching can be traumatically long winded, so he can add a force to that code. This strikes me as better than imposing a forced check on every platform for all these things (io* and so on).

I've also altered the ioMSecs() wrap handling code in checkForInterrupts. I think it was broken previously, though given that the timer probably wraps every six days or so it is hardly a major problem.

As I mentioned on the general list, I'm reasonably certain that Ned's recursive interrupt problem is not exacerbated by these changes.

tim
--
Tim Rowledge, ti...@su..., http://sumeru.stanford.edu/tim
Useful Latin Phrases:- Recedite, plebes! Gero rem imperialem! = Stand aside plebeians! I am on imperial business.
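
[Editor's sketch] A minimal C model of the force/test split described in the message above. The values -1000 and -100 and the 1mS interval come from the message; the `VM` struct, the decrement of 12, the floor of 1000 and the choice to advance `lastTick` on a forced check are assumptions made for illustration, not the VM's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the interpreter state named in the thread. */
typedef struct {
    int32_t interruptCheckCounter;
    int32_t feedbackReset;   /* interruptCheckCounterFeedBackReset */
    int32_t lastTick;        /* clock value at the previous full check */
    int32_t checksEveryNms;  /* interruptChecksEveryNms */
} VM;

/* -1000 and -100 are from the message; the feedback constants are assumed. */
enum { FORCED_VALUE = -1000, FORCED_THRESHOLD = -100 };
enum { FEEDBACK_UP = 10, FEEDBACK_DOWN = 12, FEEDBACK_FLOOR = 1000 };

/* The factored-out "force": a counter value the normal countdown
   is not expected to reach on its own. */
void forceInterruptCheck(VM *vm) {
    vm->interruptCheckCounter = FORCED_VALUE;
}

/* The factored-out test: 'is iCC < -100'. */
bool interruptCheckForced(const VM *vm) {
    return vm->interruptCheckCounter < FORCED_THRESHOLD;
}

void checkForInterrupts(VM *vm, int32_t now) {
    if (!interruptCheckForced(vm)) {
        /* Only unforced checks tune the feedback value. */
        if (now - vm->lastTick < vm->checksEveryNms) {
            vm->feedbackReset += FEEDBACK_UP;
        } else if (vm->feedbackReset - FEEDBACK_DOWN >= FEEDBACK_FLOOR) {
            vm->feedbackReset -= FEEDBACK_DOWN;
        } else {
            vm->feedbackReset = FEEDBACK_FLOOR;
        }
    }
    vm->interruptCheckCounter = vm->feedbackReset;
    vm->lastTick = now;  /* assumption: forced checks also advance lastTick */
}
```

In this model, a loop that forces a check on every pass leaves the feedback value where it started, while an unforced early check still adds 10. Because the force and the test are separate functions, either could later be replaced by a platform macro, as the message suggests.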