Hello Ron,

thanks for tracing all this, and many thanks for pointing out the div64 bug. It would be nice if you could open a bug report on sf.net, so we don't forget to change co_div64 at some point. At the moment I have no idea for a better function.

I don't think that is the problem, because the rounding error is corrected later: we multiply the result back and store the remainder in the variable timestamp_reminder. I mean this line:
cmon->timestamp_reminder = timestamp_diff - (jiffies * cmon->timestamp_freq.quad);
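
To illustrate the remainder-carrying idea in the line above, here is a minimal standalone sketch. The struct, its field names, and the function name are hypothetical and not taken from the coLinux sources; plain C division stands in for co_div64, and the factor 100 is taken from the timestamp_diff lines in this mail.

```c
#include <stdint.h>

/* Hypothetical stand-in for the monitor state; names are illustrative only. */
struct tick_state {
    uint64_t last;       /* previous counter reading */
    uint64_t remainder;  /* scaled ticks not yet converted to jiffies */
};

/*
 * Convert counter progress into jiffies and carry the unconverted rest
 * into the next call, so truncation in the division is not lost.
 */
static uint64_t ticks_to_jiffies(struct tick_state *st, uint64_t now,
                                 uint64_t freq)
{
    /* Scale by 100 and add what was left over from the previous call. */
    uint64_t diff = st->remainder + 100 * (now - st->last);

    /* Plain C division here; the real code would use co_div64 (assumption). */
    uint64_t jiffies = diff / freq;

    /* Same shape as the timestamp_reminder line: keep the rest for later. */
    st->remainder = diff - jiffies * freq;
    st->last = now;
    return jiffies;
}
```

With this scheme the sum of all returned jiffies, multiplied by freq, plus the stored remainder always equals 100 times the ticks seen so far. A quotient that is too small only delays jiffies until a later call. The sketch does assume the quotient is never larger than the exact one; an overshoot would make the unsigned subtraction wrap around to a very large value.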

A debug version is available from here:
http://www.henrynestler.com/colinux/testing/devel-0.7.8/20100916-jiffies

I have changed the casts from "long long" to "unsigned long long" and removed the casts we don't need. This gives us one more bit and no negative values.

Old:
long long timestamp_diff;
timestamp_diff += 100 * (((long long)timestamp.quad) - ((long long)cmon->timestamp.quad));

New:
unsigned long long timestamp_diff;
timestamp_diff += 100 * (timestamp.quad - cmon->timestamp.quad);
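
As a rough illustration of what the change from signed to unsigned means for the subtraction, consider the sketch below. The helper names diff_signed and diff_unsigned are hypothetical and exist only for this example; they mirror the casts in the Old and New lines.

```c
#include <stdint.h>

/* Old style: cast both readings to signed before subtracting. */
static int64_t diff_signed(uint64_t now, uint64_t last)
{
    return (int64_t)now - (int64_t)last;
}

/* New style: subtract as unsigned; the result is taken modulo 2^64. */
static uint64_t diff_unsigned(uint64_t now, uint64_t last)
{
    return now - last;
}
```

For a counter that only moves forward both variants agree, and the unsigned one stays correct even if the counter wraps past 2^64. If a reading were ever smaller than the previous one, the signed variant would give a small negative number, while the unsigned variant would give a value close to 2^64.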

Henry

On 16.09.2010 19:06, Ron Avriel wrote:
Hi,

Any update on this issue? The server leaped again, by an almost identical value (30949 seconds).
Is it possible to at least have a debug version with log prints in case of a large leap?
I also suggest replacing co_div64() - see below.

Thanks,
Ron


From: ravriel@hotmail.com
To: colinux-users@lists.sourceforge.net
Date: Sun, 12 Sep 2010 14:29:25 +0000
Subject: Re: [coLinux-users] Very large time offset in coLinux

Hi Henry,

One of our servers leaped forward again. The interesting part is that the leap is almost identical to a previous leap.
Last time it leaped forward by 30944 seconds, and this time by 30961 seconds.
Performance frequency is 3579545.

Since these two leaps are very close, I have a feeling it's not a random error, but rather a calculation error.
It's possible that Windows/Linux were under load at the time of the leap.

I went over some of the code and found that co_div64() isn't accurate (!), although I couldn't explain the leap by this bug.

For example,
co_div64(0x100000000,0x10000000) returns 15 instead of 16.
co_div64(0x1000000000000,0x10000000) returns 983055 instead of 1048576.

I'm sure you'll find more accurate algorithms.
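
One exact approach is bitwise shift-and-subtract (restoring) division, which needs no native 64-bit divide instruction. The sketch below is only an illustration: the function name div64_shift_subtract is made up, it is not the coLinux implementation, and returning 0 for a zero divisor is an arbitrary choice for the example.

```c
#include <stdint.h>

/*
 * Exact unsigned 64-bit division by shift-and-subtract.
 * Hypothetical helper, not part of coLinux.
 */
static uint64_t div64_shift_subtract(uint64_t n, uint64_t d)
{
    uint64_t q = 0;  /* quotient, built one bit at a time */
    uint64_t r = 0;  /* running remainder, always < d after each step */
    int i;

    if (d == 0)
        return 0;    /* arbitrary choice for this sketch */

    for (i = 63; i >= 0; i--) {
        /*
         * Bring down the next bit of n. Before this step r is at most
         * n >> (i + 1), which is below 2^63, so the shift cannot overflow.
         */
        r = (r << 1) | ((n >> i) & 1);
        if (r >= d) {
            r -= d;
            q |= (uint64_t)1 << i;
        }
    }
    return q;
}
```

For the two examples above, div64_shift_subtract(0x100000000, 0x10000000) gives 16 and div64_shift_subtract(0x1000000000000, 0x10000000) gives 1048576. The loop always runs 64 iterations, so it is slower than a hardware divide, but the result is exact.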

Could you also go over the relevant code and see if you notice any overflow or signed/unsigned error that can explain the leap with the above data?
Would it be possible to get a debug version, so we have more information the next time the problem occurs?

Thanks in advance,
Ron