Since the switch to using Perf counters, timing below 10ms has been
seriously impacted - many of the machines we use fail the tests,
causing a fallback to the old method, which was always 10ms or
worse. The problem is compounded for explicit sleeps in
Tcl_Sleep: before, those worked correctly, but now the code loops
until the time is right, and if you are limited to 10ms granularity
that means a Tcl_Sleep(1) might take anywhere from 11 to 20 or so ms
to complete.
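Roughly the shape of the problem, as an illustration only (this is not
the actual Tcl_Sleep code, just the sleep-and-recheck pattern against a
coarse clock):

/* Illustration only, not the Tcl source: sleep, then re-check a clock
 * with ~10ms granularity and sleep again until it has advanced far
 * enough.  With a 1ms request the re-check can add up to a full coarse
 * tick on top of the sleep, which is where the 11-20ms comes from. */
#include <windows.h>

static void SleepAtLeast(DWORD ms)
{
    DWORD start = GetTickCount();       /* coarse, ~10-15ms granularity */

    Sleep(ms);
    while (GetTickCount() - start < ms) {
        Sleep(1);                       /* loop until the coarse clock agrees */
    }
}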
After researching this it is clear (to me at least) that the PerfCounter
approach, whilst seemingly attractive, is quite flawed - especially on MP
machines.
The attached diffs change the approach to use the multi-media
timers, which are not subject to OEM vagaries, and on all the modern
systems I have access to (which is quite a few!) they seem to work well.
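For reference, the multimedia timer calls involved look roughly like
this (just a sketch, not the attached diff itself):

/* Sketch only -- not the attached diff.  Ask for 1ms timer resolution
 * and read the millisecond tick from timeGetTime().
 * Needs <mmsystem.h> and linking against winmm.lib. */
#include <windows.h>
#include <mmsystem.h>

static DWORD baseTick;

static void MMTimerInit(void)
{
    timeBeginPeriod(1);              /* request 1ms timer granularity */
    baseTick = timeGetTime();        /* millisecond tick since boot */
}

static void MMTimerCleanup(void)
{
    timeEndPeriod(1);                /* must balance every timeBeginPeriod(1) */
}

static DWORD MMTimerElapsedMs(void)
{
    /* timeGetTime() wraps after ~49.7 days; unsigned subtraction is
     * still correct across a single wrap. */
    return timeGetTime() - baseTick;
}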
We have been using this patch in production for over three months
with only good results :-)
Matt
cvs diff output
Logged In: YES
user_id=99768
Matt,
I'm not quite willing yet to give up on the
performance counter - simply because it's
the only timing reference with sub-millisecond
precision that we have. As I read your patch,
it rolls back to a state where
[clock clicks -microseconds] returns something
precise only to the millisecond - and moreover,
on most machines, accurate only to the video
frame. For profiling, that's a horrible price to pay.
I'm most definitely willing - indeed, eager - to
add the multimedia timer approach on systems
where the performance counter does not pass
the tests. Doing that would at least give us a
consistent timing reference on those machines.
But I'm more concerned about the MP machines
that "fail" the tests. My understanding of the
situation is that machines on which the performance
counter is unreliable are actually rare, and the tests
are quite over-conservative - excluding all MP
machines except the GenuineIntel hyperthreaded ones.
I suspect that a few extra tests for machines on which
the perf counter is "safe" may well cover your
production machines. I've actually encountered
only a couple of machines on which the vendor got
it wrong, and they are rather antiques now.
Have you tried simply patching out the checks for
perf counter frequency (the block of code conditioned
on "#if !defined(_WIN64)")? If so, what were the results?
My suspicion is that on modern machines, that may well
Just Work - and in that case, we'd win simply by making
the checks within that block more permissive.
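To be concrete, the kind of sanity check I have in mind is shaped
roughly like this (an illustrative sketch, not the real tclWinTime.c
block):

/* Illustrative sketch, not the real tclWinTime.c block: decide whether
 * QueryPerformanceCounter can be trusted as the high-resolution
 * reference on this machine. */
#include <windows.h>

static int PerfCounterLooksUsable(void)
{
    LARGE_INTEGER freq;
    SYSTEM_INFO si;

    if (!QueryPerformanceFrequency(&freq) || freq.QuadPart == 0) {
        return 0;                    /* no usable counter at all */
    }
    GetSystemInfo(&si);
    if (si.dwNumberOfProcessors == 1) {
        return 1;                    /* single CPU: counter is coherent */
    }

    /* The debatable part: which multiprocessor boxes keep their
     * counters in lockstep.  A permissive rule accepts them all; the
     * conservative rule you are hitting rejects most of them. */
    return 0;
}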
Logged In: YES
user_id=1333796
I don't think you quite follow - on *all* the MP hyperthreaded systems I
have tested from Dell and HP the criteria fail and it falls back to 10ms
granularity.
Also, PerfCounters are plain flawed on MP boxes anyway, due to the fact
that unless your process is locked onto one CPU the numbers returned by
PerfCounters depend on which CPU your code is executing on at that
moment!
If you want sub-1ms timing for profiling purposes that is entirely a
different issue from the primary timing sub-system.
When we profile we have an extension that works like the tcl [time]
command, but does two things to make the results predictable: 1. it
locks the process onto a cpu (non-HT), and 2. it uses perf-counters for
the extra resolution - this yields excellent results for profiling...
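In outline it does something like this (a sketch, not our actual
extension code):

/* Sketch, not our actual extension: pin the current thread to one CPU
 * so successive QueryPerformanceCounter readings come from the same
 * counter, then time the body with full resolution. */
#include <windows.h>

static double TimeBodyMicroseconds(void (*body)(void))
{
    LARGE_INTEGER freq, start, end;
    DWORD_PTR oldMask;

    oldMask = SetThreadAffinityMask(GetCurrentThread(), 1);  /* CPU 0 only */
    QueryPerformanceFrequency(&freq);

    QueryPerformanceCounter(&start);
    body();
    QueryPerformanceCounter(&end);

    SetThreadAffinityMask(GetCurrentThread(), oldMask);      /* restore */
    return (double) (end.QuadPart - start.QuadPart) * 1e6
            / (double) freq.QuadPart;
}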
Another thing to consider: if you, like I do, ship commercial server
processes that heavily use [after], it is not acceptable to have the timing
of an [after 1] be 10ms on some systems and not others - it undermines
the entire design of the application.
So in considering this patch, I ask you to separate your (valid) concerns
about good profiling from the more general issues of a highly time-
dependent event loop.
Matt
Logged In: YES
user_id=99768
Let me make sure that we're reading from the same page here,
because I think this is going at cross purposes. I *do*
understand that the tests in the _WIN64 block fail. They are
too conservative.
I quite agree that an unpredictable 1-10 ms delay in [after
1] is unacceptable in any case, and I'm looking into using
the multimedia timer to mitigate that - bringing it down to
1 ms resolution. That can be done regardless of whether we
use the perf counter. So yes, your immediate problem *will*
get fixed, in something close to the way you request.
One possibility is to go to a two-loop PLL - phaselock the
MM timer derived reference to the system clock (this may be
done for us, I'm checking up on that), and then phaselock
the perf-counter-derived reference to the MM timer. That
gives single-processor (or even multiple-core-per-chip) machines
the best of both worlds.
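Very roughly, the second of those two loops might look like this (a
sketch of the idea only, nothing like final code):

/* Sketch of the idea only: once a second, compare the perf-counter-
 * derived clock with the multimedia timer and nudge the conversion
 * factor so the two track each other instead of drifting apart.
 * Needs <mmsystem.h> and winmm.lib; locking and startup omitted. */
#include <windows.h>
#include <mmsystem.h>

static double usPerTick;     /* microseconds per perf-counter tick */
static LONGLONG perfBase;    /* perf counter at last calibration */
static DWORD mmBase;         /* timeGetTime() at last calibration */

static void CalibrationInit(void)
{
    LARGE_INTEGER freq, now;

    QueryPerformanceFrequency(&freq);
    usPerTick = 1e6 / (double) freq.QuadPart;
    QueryPerformanceCounter(&now);
    perfBase = now.QuadPart;
    mmBase = timeGetTime();
}

static void Recalibrate(void)        /* call roughly once a second */
{
    LARGE_INTEGER now;
    DWORD mmNow = timeGetTime();
    double perfUs, mmUs;

    QueryPerformanceCounter(&now);
    perfUs = (double) (now.QuadPart - perfBase) * usPerTick;
    mmUs = (double) (mmNow - mmBase) * 1000.0;

    /* Pull the tick rate a fraction of the way toward agreement with
     * the MM timer, so the derived clock converges smoothly rather
     * than stepping. */
    if (perfUs > 0.0) {
        usPerTick *= 1.0 + 0.1 * (mmUs - perfUs) / perfUs;
    }
    perfBase = now.QuadPart;
    mmBase = mmNow;
}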
You have more experience with modern MT servers than I do -
but the limited work that I've done on Dell servers suggests
that the board-level integrators actually did better than
MS's documentation indicates. My understanding is that the
multiple CPUs actually derive their clocks from the same
reference, and get their reset pulse at the same time, so
that even though the counters are separate registers, they
actually increment in lockstep. That's why I described the
tests as "overconservative." This was not always true -
Compaq got it spectacularly wrong in the 486 era - but I
have contacts that report good results with patching out the
test on modern systems. That's why I'm trying to identify
how to make the tests more permissive.
And, well, the perf counter is just too useful on today's
typical desktop machines for me to give it up entirely. I
still think we can get the best of both worlds - no
unpredictable 10 ms delays *and* the high-resolution counter
in places where it works.
Logged In: YES
user_id=1333796
As long as I can get 1ms timing when the OS is capable of it, I am happy.
However, I would warn you that I did some tests on our DL380 MP boxes
(HP/Compaq) and the perf-counter values were different across the CPUs.
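For what it's worth, the check was shaped roughly like this (a sketch,
not the exact test program I ran):

/* Sketch, not the exact test program: bracket a reading on CPU i
 * between two readings on CPU 0.  If the counters run in lockstep the
 * middle reading must land between the outer two; a reading far
 * outside that bracket means the CPUs' counters disagree. */
#include <windows.h>
#include <stdio.h>

static LONGLONG ReadCounterOnCpu(DWORD cpu)
{
    LARGE_INTEGER count;

    SetThreadAffinityMask(GetCurrentThread(), (DWORD_PTR) 1 << cpu);
    Sleep(0);                        /* let the scheduler move us */
    QueryPerformanceCounter(&count);
    return count.QuadPart;
}

int main(void)
{
    SYSTEM_INFO si;
    DWORD i;

    GetSystemInfo(&si);
    for (i = 1; i < si.dwNumberOfProcessors; i++) {
        LONGLONG before = ReadCounterOnCpu(0);
        LONGLONG other  = ReadCounterOnCpu(i);
        LONGLONG after  = ReadCounterOnCpu(0);

        printf("cpu %lu: %s (cpu0 %I64d, cpu%lu %I64d, cpu0 %I64d)\n",
                (unsigned long) i,
                (other >= before && other <= after) ? "in step" : "OUT OF STEP",
                before, (unsigned long) i, other, after);
    }
    return 0;
}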
Matt