#451 windows: timing granularity is poor on many systems

open
5
2005-11-11
2005-11-11
Matt Newman
No

Since the switch to using Perf counters, timing below 10ms has been
seriously impacted - many of the machines we use fail the tests,
causing a fallback to the old method, which was always 10ms or
worse. This problem is compounded for explicit sleeps in
Tcl_Sleep: before the switch it worked correctly, but now it loops
until the time is right, which, if you are limited to 10ms granularity,
means a Tcl_Sleep(1) might take anywhere from 11 to 20 or so ms
to complete.

After researching this it is clear (to me at least) that the PerfCounter
approach, whilst seemingly attractive, is quite flawed - esp. on MP
machines.

The attached diffs change the approach to use the multi-media
timers, which are not subject to OEM vagaries and on all the modern
systems I have access to (which is quite a few!) seem to work well.

We have been using this patch in production for over three months
with only good results :-)

Matt
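
For readers unfamiliar with the multimedia timers mentioned above, here is a minimal sketch of reading a millisecond tick from them. It is not the attached patch: the helper names (fine_timing_begin, fine_timing_end, tick_ms, elapsed_ms, check_wraparound) are made up for illustration, and the non-Windows branch is only a stand-in so the file compiles elsewhere. The Windows branch uses timeBeginPeriod, timeGetTime and timeEndPeriod from winmm.

```c
/* Sketch only: millisecond ticks from the Windows multimedia timer. */
#ifndef _WIN32
#define _POSIX_C_SOURCE 199309L   /* clock_gettime for the stand-in branch */
#endif

#include <assert.h>
#include <stdint.h>

#ifdef _WIN32
#include <windows.h>
#include <mmsystem.h>             /* link with winmm.lib */

/* Ask the OS for 1 ms timer resolution; returns 1 on success. */
int fine_timing_begin(void) { return timeBeginPeriod(1) == TIMERR_NOERROR; }

/* Every successful timeBeginPeriod must be matched by timeEndPeriod. */
void fine_timing_end(void) { timeEndPeriod(1); }

/* Milliseconds since system start; a 32-bit value that wraps around. */
uint32_t tick_ms(void) { return (uint32_t)timeGetTime(); }
#else
#include <time.h>

/* Stand-in (assumption): a monotonic clock truncated to 32-bit ms. */
int fine_timing_begin(void) { return 1; }
void fine_timing_end(void) { }
uint32_t tick_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint32_t)((uint64_t)ts.tv_sec * 1000u
                      + (uint64_t)ts.tv_nsec / 1000000u);
}
#endif

/* Elapsed ms between two ticks. Unsigned subtraction gives the right
 * answer even when the 32-bit tick has wrapped once in between. */
uint32_t elapsed_ms(uint32_t start, uint32_t end)
{
    return (uint32_t)(end - start);
}

/* Small self-check of the wraparound arithmetic. */
void check_wraparound(void)
{
    assert(elapsed_ms(0xFFFFFFF0u, 0x00000010u) == 0x20u);
}
```

Whether a 1 ms period is actually granted depends on the hardware and OS; timeGetDevCaps reports the range a given system supports.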

Discussion

  • Matt Newman

    Matt Newman - 2005-11-11

    cvs diff output

     
  • Donal K. Fellows

    • labels: --> 06. Time Measurement
    • assigned_to: nobody --> kennykb
     
  • Kevin B KENNY

    Kevin B KENNY - 2005-11-11

    Matt,

    I'm not quite willing yet to give up on the
    performance counter - simply because it's
    the only timing reference with sub-millisecond
    precision that we have. As I read your patch,
    it rolls back to a state where
    [clock clicks -microseconds] returns something
    precise only to the millisecond - and moreover,
    on most machines, accurate only to the video
    frame. For profiling, that's a horrible price to pay.

    I'm most definitely willing - indeed, eager - to
    add the multimedia timer approach on systems
    where the performance counter does not pass
    the tests. Doing that would at least give us a
    consistent timing reference on those machines.

    But I'm more concerned about the MP machines
    that "fail" the tests. My understanding of the
    situation is that machines on which the performance
    counter is unreliable are actually rare, and the tests
    are quite over-conservative-- excluding all MP
    machines except for GenuineIntel hyperthread.
    I suspect that a few extra tests for machines on which
    the perf counter is "safe" may well cover your
    production machines. I've actually encountered
    only a couple of machines on which the vendor got
    it wrong, and they are rather antiques now.

    Have you tried simply patching out the checks for
    perf counter frequency (the block of code conditioned
    on "#if !defined(_WIN64)")? If so, what were the results?
    My suspicion is that on modern machines, that may well
    Just Work - and in that case, we'd win simply by making
    the checks within that block more permissive.

     
  • Matt Newman

    Matt Newman - 2005-11-11

    I don't think you quite follow - on *all* the MP hyperthreaded systems I
    have tested from Dell and HP the criteria fail and it falls back to 10ms
    granularity.

    Also PerfCounters is plain flawed on MP boxes anyway, because unless
    your process is locked onto one CPU the numbers returned by
    PerfCounters depend on which CPU your code is executing on at that
    moment!

    If you want sub-1ms timing for profiling purposes that is entirely a
    different issue from the primary timing sub-system.

    When we profile, we have an extension that works like the Tcl [time]
    command, but does two things to make the results predictable: (1) it
    locks the process onto a CPU (non-HT) and (2) it uses perf-counters for
    the extra resolution. This yields excellent results for profiling...

    Another thing to consider: if, like me, you ship commercial server
    processes that heavily use [after], it is not acceptable to have the
    timing of an [after 1] be 10ms on some systems and not others - it
    undermines the entire design of the application.

    So in considering this patch, I ask you to separate your (valid) concerns
    about good profiling from the more general issues of a highly
    time-dependent event loop.

    Matt
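
The cross-CPU objection above can be illustrated with a toy model. Everything here is hypothetical: the two-CPU layout, the 500-tick skew and the function names are assumptions chosen for illustration, not measurements from any real machine.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CPUS 2

/* Hypothetical skew of each CPU's counter against "true" time, in ticks. */
static const int64_t cpu_offset[NUM_CPUS] = { 0, -500 };

/* What a counter read on the given CPU would return at a given true time. */
int64_t read_counter(int cpu, int64_t true_ticks)
{
    assert(cpu >= 0 && cpu < NUM_CPUS);
    return true_ticks + cpu_offset[cpu];
}

/* Interval as seen by code that samples the counter at start and end,
 * possibly on different CPUs if the scheduler moved the thread. */
int64_t measured_interval(int start_cpu, int64_t start_true,
                          int end_cpu, int64_t end_true)
{
    return read_counter(end_cpu, end_true)
         - read_counter(start_cpu, start_true);
}
```

With these made-up offsets, an interval that really lasts 200 ticks reads as 200 when both samples come from the same CPU, and as -300 when the thread starts on CPU 0 and ends on CPU 1. Pinning the thread (on Windows, for example with SetThreadAffinityMask) keeps both samples on one counter.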

     
  • Kevin B KENNY

    Kevin B KENNY - 2005-11-11

    Let me make sure that we're reading from the same page here,
    because I think this is going at cross purposes. I *do*
    understand that the tests in the _WIN64 block fail. They are
    too conservative.

    I quite agree that an unpredictable 1-10 ms delay in [after
    1] is unacceptable in any case, and I'm looking into using
    the multimedia timer to mitigate that - bringing it down to
    1 ms resolution. That can be done regardless of whether we
    use the perf counter. So yes, your immediate problem *will*
    get fixed, in something close to the way you request.

    One possibility is to go to a two-loop PLL - phaselock the
    MM timer derived reference to the system clock (this may be
    done for us, I'm checking up on that), and then phaselock
    the perf-counter-derived reference to the MM timer. That
    gives single-processor (or even multiple-core-per-chip)
    machines the best of both worlds.

    You have more experience with modern MT servers than I do -
    but the limited work that I've done on Dell servers suggests
    that the board-level integrators actually did better than
    MS's documentation indicates. My understanding is that the
    multiple CPUs actually derive their clocks from the same
    reference, and get their reset pulse at the same time, so
    that even though the counters are separate registers, they
    actually increment in lockstep. That's why I described the
    tests as "overconservative." This was not always true -
    Compaq got it spectacularly wrong in the 486 era - but I
    have contacts that report good results with patching out the
    test on modern systems. That's why I'm trying to identify
    how to make the tests more permissive.

    And, well, the perf counter is just too useful on today's
    typical desktop machines for me to give it up entirely. I
    still think we can get the best of both worlds - no
    unpredictable 10 ms delays *and* the high-resolution counter
    in places where it works.
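
The phase-lock idea in the comment above can be sketched as a single software stage that disciplines a fast counter to a slower reference clock. This is an illustration only, not the code in Tcl: the PhaseLock type, the function names, the gains and the 10% clamp are all assumptions. Two such stages could in principle be cascaded (system clock to MM timer, MM timer to perf counter) as the comment suggests.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    double  base_ms;       /* disciplined time at the last update        */
    int64_t base_ticks;    /* fast-counter reading at the last update    */
    double  ticks_per_ms;  /* current estimate of the fast counter rate  */
} PhaseLock;

void pll_init(PhaseLock *p, int64_t ticks, double ref_ms,
              double nominal_ticks_per_ms)
{
    assert(nominal_ticks_per_ms > 0.0);
    p->base_ms = ref_ms;
    p->base_ticks = ticks;
    p->ticks_per_ms = nominal_ticks_per_ms;
}

/* Disciplined time for a fast-counter reading taken now. */
double pll_now_ms(const PhaseLock *p, int64_t ticks)
{
    return p->base_ms + (double)(ticks - p->base_ticks) / p->ticks_per_ms;
}

/* Feed one (counter, reference time) pair into the loop. */
void pll_update(PhaseLock *p, int64_t ticks, double ref_ms)
{
    const double kp = 0.5;   /* hypothetical phase gain      */
    const double ki = 0.5;   /* hypothetical frequency gain  */
    double predicted = pll_now_ms(p, ticks);
    double span_ms = predicted - p->base_ms;
    double err = ref_ms - predicted;   /* > 0: our clock runs slow */

    if (span_ms > 0.0) {
        /* Nudge the rate estimate; clamp so one bad sample cannot
         * change it by more than 10%. */
        double adj = ki * err / span_ms;
        if (adj > 0.1) adj = 0.1;
        if (adj < -0.1) adj = -0.1;
        p->ticks_per_ms *= 1.0 - adj;
    }
    /* Slew the phase part of the way toward the reference. */
    p->base_ms = predicted + kp * err;
    p->base_ticks = ticks;
}
```

Between updates, pll_now_ms interpolates with the fast counter, so readings keep sub-millisecond resolution while the long-term rate follows the reference.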

     
  • Matt Newman

    Matt Newman - 2005-11-11

    As long as I can get 1ms timing if the OS is capable, then I am happy.

    However I would warn you that I did some tests on our DL380 MP boxes
    (HP/Compaq) and perf-counter values were different across the cpus.

    Matt