From: Ron A. <ra...@ho...> - 2010-10-21 14:46:22
Hello Henry,

Thanks a lot for the fix. I installed it and will let it run for some time. On average, the glitch occurs every 4-5 days. I'll update you on the results.

Thanks again,
Ron

Date: Wed, 20 Oct 2010 02:26:49 +0200
Subject: Re: [coLinux-users] coLinux 30945 seconds leap mystery solved

Hello Ron,

Many thanks for your work! A skip for negative glitches is now coded: we simply ignore this time reading and wait for the next call of this function. Please see SVN r1539:
http://colinux.svn.sourceforge.net/viewvc/colinux?view=revision&revision=1539

This should fix Bug#1780633.

Binaries for the devel version 0.7.9 are here:
http://www.henrynestler.com/colinux/testing/devel-0.7.9/20101019-jiffies/

An update for the released version 0.7.8 is here:
http://www.henrynestler.com/colinux/testing/stable-0.7.8/20101019-jiffies/linux-stable-svn1528-jiffies-fix.zip

Henry

On 10.10.2010 16:50, Ron Avriel wrote:

Hi,

After analyzing several +30945-second leaps using the coLinux debug version, I managed to solve the mystery. The root cause of the problem is a hardware glitch when reading the timer. The glitch occurs when the low 24 bits of the counter wrap around and the carry into the upper bits should happen, but it doesn't. For example:

Instead of reading 0x16242A000004, a value of 0x162429000004 was read.
Instead of reading 0x13510E000005, a value of 0x13510D000005 was read.

Each bad read is short by exactly 0x1000000 (2^24) ticks, which causes an error of -4.68 seconds (for a counter frequency of 3579545 Hz).

The code in callback_return_jiffies(), together with an inaccuracy in co_div64(), causes some interesting results: the first time the glitch occurs, the time leaps forward by 24 seconds. When subsequent glitches occur, time leaps forward by 30945 and 30956 seconds alternately. The results can be verified with the simulation below.

Problem resolution: It seems this clock reading problem is well known, and there are several workarounds for it. For example, see the thread http://www.mail-archive.com/fre...@fr.../msg34826.html and http://support.microsoft.com/kb/274323.
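The size of the glitch in the two example reads can be checked with a short calculation. The sketch below is illustrative only and is not part of coLinux: it assumes the 3579545 Hz counter frequency quoted in the report, and the helper names missed_carry_ticks() and looks_like_missed_carry() are hypothetical. It computes how far each bad read lags the expected value and tests for the "missed carry" signature (identical low 24 bits, exactly 2^24 ticks short).

```python
COUNTER_FREQ_HZ = 3579545  # counter frequency quoted in the report (assumption for this sketch)
LOW_BITS = 24              # width of the low part whose carry was lost


def missed_carry_ticks(expected: int, observed: int) -> int:
    """Hypothetical helper: how many ticks a glitched read lags the expected value."""
    return expected - observed


def looks_like_missed_carry(expected: int, observed: int) -> bool:
    """Hypothetical helper: True if the low 24 bits agree and the read is exactly 2**24 ticks short."""
    low_mask = (1 << LOW_BITS) - 1
    same_low_bits = (expected & low_mask) == (observed & low_mask)
    return same_low_bits and missed_carry_ticks(expected, observed) == 1 << LOW_BITS


# The two example reads from the report: (expected, observed).
examples = [
    (0x16242A000004, 0x162429000004),
    (0x13510E000005, 0x13510D000005),
]

for expected, observed in examples:
    lag = missed_carry_ticks(expected, observed)
    print(f"{observed:#x}: lag {lag:#x} ticks, "
          f"error {-lag / COUNTER_FREQ_HZ:.3f} s, "
          f"missed carry: {looks_like_missed_carry(expected, observed)}")
```

Both reads lag by 0x1000000 ticks, i.e. 16777216 / 3579545, about 4.687 seconds, which matches the -4.68 seconds quoted above (truncated rather than rounded).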
In any case, I think that a simple check to verify that the clock did not shift backward since the previous read is a must. In addition, if the clock changed by more than a threshold, it should be read again in a loop until the change is below the threshold.
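The proposed check (reject a reading that went backwards, and re-read while the step exceeds a threshold) can be sketched as below. This is a minimal illustration under stated assumptions, not the code committed in SVN r1539: read_counter_checked(), the one-second MAX_STEP_TICKS threshold and the MAX_RETRIES bound are all hypothetical choices, and the counter is simulated by a plain Python callable.

```python
from typing import Callable, Optional

COUNTER_FREQ_HZ = 3579545         # counter frequency quoted in the report
MAX_STEP_TICKS = COUNTER_FREQ_HZ  # hypothetical threshold: at most 1 s between reads
MAX_RETRIES = 5                   # hypothetical bound on re-reads


def read_counter_checked(read_counter: Callable[[], int],
                         previous: int) -> Optional[int]:
    """Hypothetical sketch: read a free-running counter, rejecting glitched values.

    A reading below `previous` (the clock went backwards) or more than
    MAX_STEP_TICKS ahead of it is treated as suspect and read again.
    Returns None if no plausible value is obtained within MAX_RETRIES reads,
    so the caller can skip this update and try again on its next call.
    """
    for _ in range(MAX_RETRIES):
        now = read_counter()
        delta = now - previous
        if 0 <= delta <= MAX_STEP_TICKS:
            return now
    return None


# Usage with a simulated counter: the first read has the missed carry,
# the re-read returns a consistent value just past the 24-bit wrap.
readings = iter([0x162429000004, 0x16242A000010])
value = read_counter_checked(lambda: next(readings), 0x162429FFFFF0)
print(hex(value) if value is not None else "skipped")
```

The retry bound is a deliberate design choice in this sketch: an unbounded "read until below threshold" loop would spin forever if a legitimately long gap between calls exceeded the threshold, so after a few attempts the reading is skipped instead, similar in spirit to ignoring a negative glitch and waiting for the next call.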