From: Henry N. <hen...@ar...> - 2010-09-17 01:56:28
Hello Ron,

please see bug #1780633 on sf.net:
https://sourceforge.net/tracker/?func=detail&aid=1780633&group_id=98788&atid=622063

The reporter said: "Every 5 seconds, the system clock is increased by
8 hours, 35 minutes and 45 seconds."

(8*60+35)*60+45 = 30945 seconds. That is the same leap you have seen!

Henry

On 17.09.2010 02:47, Henry Nestler wrote:
> Hello Ron,
>
> thanks for tracing all this, and many thanks for pointing out the div64
> bug. It would be nice if you would open a bug report on sf.net, so we
> don't forget to change co_div64 some time. Currently I have no
> idea for a better function.
>
> I don't assume that this is the problem, because the rounding error is
> corrected later by multiplying and storing the rest in the variable
> timestamp_reminder. I mean this line:
>   cmon->timestamp_reminder = timestamp_diff - (jiffies *
>       cmon->timestamp_freq.quad);
>
> A debug version is available from here:
> http://www.henrynestler.com/colinux/testing/devel-0.7.8/20100916-jiffies
>
> I have changed the casts from "long long" to "unsigned long long" and
> removed the casts we don't need. So we gain one more bit and have
> no negative values.
>
> Old:
>   long long timestamp_diff;
>   timestamp_diff += 100 * (((long long)timestamp.quad) - ((long
>       long)cmon->timestamp.quad));
>
> New:
>   unsigned long long timestamp_diff;
>   timestamp_diff += 100 * (timestamp.quad - cmon->timestamp.quad);
>
> Henry
>
> On 16.09.2010 19:06, Ron Avriel wrote:
>> Hi,
>>
>> Any update on this issue? The server leaped again by an almost
>> identical value (30949 seconds).
>> Is it possible to at least have a debug version with log prints in
>> case of a large leap?
>> I also suggest replacing co_div64() - see below.
>>
>> Thanks,
>> Ron
>>
>>
>> From: ra...@ho... <mailto:ra...@ho...>
>> To: col...@li... <mailto:col...@li...>
>> Date: Sun, 12 Sep 2010 14:29:25 +0000
>> Subject: Re: [coLinux-users] Very large time offset in coLinux
>>
>> Hi Henry,
>>
>> One of our servers leaped forward again. The interesting part is that
>> the leap is almost identical to a previous leap.
>> Last time it leaped forward by 30944 seconds, and this time by 30961
>> seconds.
>> Performance frequency is 3579545.
>>
>> Since these two leaps are very close, I have a feeling it's not a
>> random error, but rather a calculation error.
>> It's possible that Windows/Linux were under load at the time of the leap.
>>
>> I went over some of the code and found that co_div64() isn't accurate
>> (!), although I couldn't explain the leap by this bug.
>>
>> For example,
>> co_div64(0x100000000,0x10000000) returns 15 instead of 16.
>> co_div64(0x1000000000000,0x10000000) returns 983055 instead of 1048576.
>>
>> I'm sure you'll find more accurate algorithms.
>>
>> Could you also go over the relevant code and see if you notice any
>> overflow or signed/unsigned error that can explain the leap with the
>> above data?
>> Would it be possible to get a debug version to get more
>> information next time the problem occurs?
>>
>> Thanks in advance,
>> Ron
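
For reference, a minimal sketch of an exact unsigned 64-bit division that could stand in where a native 64-bit divide is not available. This is not the coLinux co_div64() and not a proposed patch: the name div64_sketch and its remainder out-parameter are hypothetical, and the routine is plain shift-and-subtract (restoring) division, chosen for clarity rather than speed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (assumption, not part of coLinux): exact unsigned
 * 64-bit division by shift-and-subtract. Returns the quotient and, if
 * 'remainder' is not NULL, stores the remainder there.
 */
static uint64_t div64_sketch(uint64_t dividend, uint64_t divisor,
                             uint64_t *remainder)
{
    uint64_t quotient = 0;
    uint64_t rem = 0;
    int bit;

    assert(divisor != 0);

    /* Walk the dividend from its most significant bit downwards. */
    for (bit = 63; bit >= 0; bit--) {
        /* Remember the bit that the shift below pushes out of rem. */
        uint64_t carry = rem >> 63;

        /* Bring the next dividend bit into the running remainder. */
        rem = (rem << 1) | ((dividend >> bit) & 1);

        /*
         * With a carry the true value is 2^64 + rem, which is always
         * >= divisor; the wrapped unsigned subtraction still leaves
         * the correct remainder.
         */
        if (carry || rem >= divisor) {
            rem -= divisor;
            quotient |= (uint64_t)1 << bit;
        }
    }

    if (remainder != NULL)
        *remainder = rem;
    return quotient;
}
```

With the two inputs quoted in the thread, div64_sketch(0x100000000, 0x10000000, NULL) gives 16 and div64_sketch(0x1000000000000, 0x10000000, NULL) gives 1048576. The remainder output plays the same role as the rest that the quoted code keeps in timestamp_reminder: carrying it into the next calculation keeps truncation from accumulating, provided the quotient itself is exact.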