From: Ron A. <ra...@ho...> - 2010-09-01 11:37:55
|
Hi,

I have several coLinux instances running on different Windows hosts. Occasionally, the time in coLinux jumps forward by several hours, and by up to a day!
The time on the Windows hosts always remains correct.
This has happened on multiple coLinux servers. Some of them ran NTP, but it seems that NTP will not adjust the clock if the offset is too large. It also happened on a server that did not run NTP.

Note that the large offset of 9 hours or more appeared all at once. It is not a small offset that accumulates.

How can this happen? Can anyone explain how time is kept in coLinux? I know that VMware and Hyper-V have special tools for keeping the VM clock in sync. Is there anything similar in coLinux?
What are the recommendations for keeping time in coLinux?

Thanks for a great product,
Ron
From: Henry N. <hen...@ar...> - 2010-09-01 23:42:19
|
In which direction does the clock go wrong? Did you lose 9 hours, or is it 9 hours in the future?

coLinux uses a Windows timer running at 10 Hz (100 ms) as its time base. While Linux is not running, we count the lost ticks. The next time Linux runs, the internal Linux timer "jiffies" is adjusted by the count of lost timer ticks. For this, the clock difference between the last time Linux ran and the next Linux entry is converted into a difference in "jiffies". The time base function on the Windows side is "KeQueryPerformanceCounter".

I don't know why some machines show such a big difference. One possibility is that the Windows timer frequency is not stable, for example if the clock frequency changes from one call to the next. Then we would compute a wrong timer difference.
The "clock frequency" is not the CPU frequency. It is the parameter we get from KeQueryPerformanceCounter, and it should stay the same for as long as the Windows machine is running. See
http://www.osronline.com/ddkx/kmarch/k105_3waa.htm

Another idea would be an overflow in the clock calculation on machines with a very high "clock frequency".

Can you reproduce it? Maybe we can add some debug messages to check the calculation whenever the time difference from one Linux entry to the next is more than a few seconds. Normally it is less than 100 ms, unless Windows goes into suspend (to RAM or to disk) and wakes up later.

--
Henry N.
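The debug check suggested above could look roughly like the following. This is only a sketch under assumptions, not coLinux code: the function names and the 5-second threshold are hypothetical, and the counter frequency is taken to be constant, as the Windows documentation promises.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical threshold: gaps longer than this are worth a debug message. */
#define WARP_THRESHOLD_SECONDS 5

/*
 * Whole seconds elapsed between two performance-counter readings.
 * freq is the counter frequency in ticks per second (assumed constant).
 */
uint64_t gap_seconds(uint64_t prev_count, uint64_t now_count, uint64_t freq)
{
    assert(freq != 0);
    assert(now_count >= prev_count);
    return (now_count - prev_count) / freq;
}

/* Non-zero when the gap between two Linux entries looks like a "time warp". */
int is_time_warp(uint64_t prev_count, uint64_t now_count, uint64_t freq)
{
    return gap_seconds(prev_count, now_count, freq) > WARP_THRESHOLD_SECONDS;
}
```

A caller would log both raw counter values and the frequency whenever is_time_warp() fires, so that a later jump can be checked against the numbers that produced it.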
From: Shai V. <sva...@gm...> - 2010-09-02 07:39:21
|
Hi Ron,

Do you use an AMD processor? If so, can you please be more specific about which one? I *think* that this could be related to an issue I've seen with RDTSC, which reports the clock frequency, usually in accordance with QueryPerformanceFrequency (could be a different API, not sure), but has some problems with AMD processors. There should be a workaround; I'll have to dig more into this.
Another possibility is that RDTSC is not updated properly while the system is asleep (power management). Can you try again with power management disabled and see if the issue is resolved?

Thanks,
- Shai

--
Shai Vaingast
Author of Beginning Python Visualization
http://www.apress.com/book/view/1430218436

------------------------------------------------------------------------------
This SF.net Dev2Dev email is sponsored by:

Show off your parallel programming skills.
Enter the Intel(R) Threading Challenge 2010.
http://p.sf.net/sfu/intel-thread-sfd
_______________________________________________
coLinux-users mailing list
coL...@li...
https://lists.sourceforge.net/lists/listinfo/colinux-users
From: Ron A. <ra...@ho...> - 2010-09-02 11:07:05
|
Hi Henry and Shai,

Thanks for your answers. Here is some more information:

So far, the time has always jumped forward, into the future.
On one coLinux instance it jumped forward 30944 seconds, and on another it jumped more than 24 hours (I don't have the exact number).
The processor is an Intel Pentium 1.4 GHz.
QueryPerformanceFrequency on the server returns 3579545.

The problem has already occurred several times on different servers, but it can take days or weeks until it happens.

I think RDTSC is not a good choice for time measurement, because its rate can change. Are you sure it's used in coLinux?

It seems to me that the problem is related to a faulty overflow calculation. I'd check in that direction.
I also think that special handling is required if the time difference between coLinux wakeups is too large. Perhaps in that case you should get the real time from Windows, as you do on startup?

Thanks,
Ron
From: Henry N. <hen...@ar...> - 2010-09-02 22:42:58
|
Hello Ron,

we don't use RDTSC directly. We use the Windows API, and that should work without problems. KeQueryPerformanceCounter is the only function with such a small time interval. Of course, I have also seen updates for a broken KeQueryPerformanceCounter:
http://support.microsoft.com/kb/896256

Following the time is not the problem. If the time difference is more than one second, for example after you suspend Windows, we set the time absolutely. The problem is: where can we get an exact timestamp from if KeQueryPerformanceCounter is wrong?

There are two calculations. One is on the Windows side, where we calculate the difference in "jiffies" in callback_return_jiffies:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/kernel/monitor.c?view=markup#l366

co_os_get_timestamp is here:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/kernel/lowlevel/time.c?view=markup#l31

On the Linux side, we increment the jiffies and the time in the function co_handle_jiffies:
http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/patch/base-2.6.33.diff?view=markup#l2563

I will add more debug output for cases of "time warps".

--
Henry
From: Ron A. <ra...@ho...> - 2010-09-03 08:34:56
|
Hi Henry,

Can you please explain why you multiply by 100 in

    timestamp_diff += 100 * (((long long)timestamp.quad) - ((long long)cmon->timestamp.quad)); /* HZ value */

http://colinux.svn.sourceforge.net/viewvc/colinux/branches/devel/src/colinux/os/winnt/kernel/lowlevel/time.c?view=markup#l31

If this function is called every 100 ms, as you previously wrote, then you get 10 jiffies per call. Shouldn't it be 1 jiffy per 100 ms? Shouldn't you multiply by 10 instead?

I guess your calculation is right, but I can't figure it out.

Thanks,
Ron
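Since an overflow has been suggested as a possible cause, the headroom of the quoted multiplication can be estimated with a few lines of C. This is a rough sketch, not coLinux code: max_gap_seconds() is a hypothetical helper, and it looks only at the single multiply by 100 in a signed 64-bit variable, not at the accumulated timestamp_diff or at any later step.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Largest gap, in whole seconds, for which scale * (counter delta) still fits
 * in a signed 64-bit integer, given a counter frequency in ticks per second.
 */
int64_t max_gap_seconds(int64_t freq, int64_t scale)
{
    assert(freq > 0 && scale > 0);
    int64_t max_delta = INT64_MAX / scale;  /* largest delta that survives the multiply */
    return max_delta / freq;
}
```

With the frequency reported in this thread (3579545) and a scale of 100, the result corresponds to a gap of hundreds of years, so this particular multiplication on its own is unlikely to explain a jump of a few hours. Other parts of the calculation are not covered by this estimate.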
From: Shai V. <sva...@gm...> - 2010-09-03 09:43:16
|
Hi Henry and Ron,

Just to confirm: I have reproduced this problem on my PC as well. It appears to be related to the selected Power Options settings (Control Panel, Power Options). I believe this can also happen if you have a variable-frequency CPU, as the time calculation is performed using QueryPerformanceCounter, which counts CPU cycles and not time.

Ron, can you verify this by disabling power options on your host?

Thanks,
- Shai
From: Ron A. <ra...@ho...> - 2010-09-05 15:16:01
|
Hi Shai,

My Windows servers run with standard power options (always on, no hibernation).
The frequency returned from QueryPerformanceFrequency "cannot change while the system is running" (from MSDN). If it did change, that could explain the problem, but since the Windows host is running OK, I assume that the root cause is different.

Ron
From: Henry N. <hen...@ar...> - 2010-09-06 20:22:45
|
Hello Ron,

CONFIG_HZ=100 is set in the coLinux kernel config. That is 10 ms per jiffy. It does not matter how often, or at what intervals, the function callback_return_jiffies is called on the Windows side. The variable "timestamp_diff" counts the jiffies for the time coLinux was not active.

A separate time base is a free-running 100 ms timer (10 Hz) for all coLinux daemons on the Windows side. This interval wakes up any sleeping Windows task for coLinux. So we can guarantee that coLinux runs at least every 100 ms when idle, or more often when fully active.

Henry
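The arithmetic behind the factor of 100 can be sketched as follows. This is a minimal illustration, not the coLinux source: struct tick_state and delta_to_jiffies() are hypothetical names, and carrying the leftover between calls is an assumption of the sketch. The idea is jiffies = delta * HZ / freq, where delta is a difference of performance-counter readings and freq is the counter frequency.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HZ 100  /* jiffies per second, matching CONFIG_HZ=100 */

/* Hypothetical state: scaled counter ticks not yet turned into jiffies. */
struct tick_state {
    uint64_t remainder;
};

/*
 * Converts a performance-counter delta into whole jiffies.
 * The part that does not make up a full jiffy is kept for the next call.
 */
uint64_t delta_to_jiffies(struct tick_state *st, uint64_t delta, uint64_t freq)
{
    assert(st != NULL && freq != 0);
    uint64_t scaled = st->remainder + delta * HZ;
    uint64_t jiffies = scaled / freq;
    st->remainder = scaled - jiffies * freq;
    return jiffies;
}
```

The factor is HZ because it converts seconds into jiffies; it has nothing to do with how often the function is called. A gap of one second (delta equal to freq) yields 100 jiffies, and a gap of about 100 ms yields about 10.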
From: Ron A. <ra...@ho...> - 2010-09-12 14:29:31
|
Hi Henry,

One of our servers leaped forward again. The interesting part is that the leap is almost identical to a previous one.
Last time it leaped forward by 30944 seconds, and this time by 30961 seconds.
The performance frequency is 3579545.

Since these two leaps are very close, I have a feeling it's not a random error, but rather a calculation error.
It's possible that Windows/Linux were loaded at the time of the leap.

I went over some of the code and found that co_div64() isn't accurate (!), although I couldn't explain the leap by this bug.

For example,
co_div64(0x100000000,0x10000000) returns 15 instead of 16.
co_div64(0x1000000000000,0x10000000) returns 983055 instead of 1048576.

I'm sure you'll find more accurate algorithms.

Could you also go over the relevant code and see if you notice any overflow or signed/unsigned error that can explain the leap with the above data?
Would it be possible to get a debug version to collect more information the next time the problem occurs?

Thanks in advance,
Ron
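For comparison, an exact unsigned 64-bit division can be written with plain shifts and subtractions. The sketch below is not the coLinux co_div64() and is not proposed as its replacement; ref_div64() is a hypothetical name for a baseline that the two examples above can be checked against.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Reference unsigned 64-bit division by shift-and-subtract
 * (restoring long division, one quotient bit per iteration).
 */
uint64_t ref_div64(uint64_t num, uint64_t den)
{
    assert(den != 0);
    uint64_t quot = 0;
    uint64_t rem = 0;
    for (int bit = 63; bit >= 0; bit--) {
        /* Remember the bit that is about to be shifted out of rem. */
        uint64_t carry = rem >> 63;
        rem = (rem << 1) | ((num >> bit) & 1);
        if (carry || rem >= den) {
            rem -= den;
            quot |= (uint64_t)1 << bit;
        }
    }
    return quot;
}
```

In ordinary user-space code, `num / den` gives the same result. A loop like this is mainly of interest where the compiler's native 64-bit division cannot be used.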
From: Ron A. <ra...@ho...> - 2010-09-16 17:07:05
|
Hi,

Any update on this issue? The server leaped again by an almost identical amount (30949 seconds).
Is it possible to at least have a debug version with log prints in case of a large leap?
I also suggest replacing co_div64(), as described in my previous message.

Thanks,
Ron
From: Henry N. <hen...@ar...> - 2010-09-17 00:47:29
|
Hello Ron,

thanks for tracing all this, and many thanks for pointing out the div64 bug. It would be nice if you would open a bug report on sf.net, so we don't forget to change co_div64 at some point. Currently I have no idea for a better function.

I don't assume that is the problem, because the rounding error is adjusted later by multiplying and storing the rest in the variable timestamp_reminder. I mean this line:

    cmon->timestamp_reminder = timestamp_diff - (jiffies * cmon->timestamp_freq.quad);

A debug version is available from here: http://www.henrynestler.com/colinux/testing/devel-0.7.8/20100916-jiffies

I have changed the casts from "long long" to "unsigned long long" and removed the casts where we don't need them. So we would have one bit more and no negative values.

Old:
    long long timestamp_diff;
    timestamp_diff += 100 * (((long long)timestamp.quad) - ((long long)cmon->timestamp.quad));

New:
    unsigned long long timestamp_diff;
    timestamp_diff += 100 * (timestamp.quad - cmon->timestamp.quad);

Henry |
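The remainder-carry scheme described above can be sketched as follows. This is a simplified illustration and not the coLinux source: tick_state and elapsed_jiffies are hypothetical names, the factor 100 is taken from the quoted line and read here as jiffies per second (an assumption), and the compiler's native 64-bit division stands in for co_div64().

```c
#include <stdint.h>

/* Hypothetical state, loosely modelled on the fields named in the message. */
typedef struct {
    uint64_t freq;       /* performance counter ticks per second */
    uint64_t last;       /* previous counter reading */
    uint64_t remainder;  /* scaled ticks not yet converted into jiffies */
} tick_state;

/*
 * Convert the counter advance since the last call into jiffies.
 * Whatever the division truncates is kept in st->remainder and added
 * to the next difference, so no time is lost across calls.
 * Assumes 100 * (now - last) fits in 64 bits.
 */
uint64_t elapsed_jiffies(tick_state *st, uint64_t now)
{
    uint64_t diff = st->remainder + 100 * (now - st->last);
    uint64_t jiffies = diff / st->freq;

    st->remainder = diff - jiffies * st->freq;
    st->last = now;
    return jiffies;
}
```

With freq = 3579545, summing the results of many small steps gives the same total as one division over the whole interval, which is the compensation effect described above. One property of the unsigned variant is worth keeping in mind: if a counter reading is ever smaller than the previous one, now - last wraps around to a very large value and produces a huge forward step. Nothing in this thread establishes that as the cause of the leaps.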
From: Henry N. <hen...@ar...> - 2010-09-17 01:56:28
|
Hello Ron, please see bug #1780633 on sf.net: https://sourceforge.net/tracker/?func=detail&aid=1780633&group_id=98788&atid=622063 The reporter said: "Every 5 seconds, the system clock is increased by 8 hours, 35 minutes and 45 seconds." (8*60+35)*60+45 = 30945 seconds. That is the same leap you have seen! Henry |
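The comparison above is simple arithmetic, written out here as a small sketch. The helper names hms_to_seconds and leap_deviation are made up for illustration, and the leap values are the ones reported earlier in this thread.

```c
#include <stdlib.h>

/* Convert hours, minutes and seconds into a number of seconds. */
long hms_to_seconds(long hours, long minutes, long seconds)
{
    return (hours * 60 + minutes) * 60 + seconds;
}

/* Absolute distance between an observed leap and a reference leap. */
long leap_deviation(long observed, long reference)
{
    return labs(observed - reference);
}
```

hms_to_seconds(8, 35, 45) returns 30945. The three leaps reported in this thread (30944, 30961 and 30949 seconds) are 1, 16 and 4 seconds away from that value.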
From: Ron A. <ra...@ho...> - 2010-09-20 06:30:05
|
Hello Henry,

Thanks a lot for the debug version. I installed it, and I'm waiting for the leap to happen again.

It's amazing that the exact same time offset was also seen by another user. What's so special about this 30945 value? Too bad it doesn't happen here every five seconds. It would have been solved by now.

Re the SF bug - our servers run the latest Windows 2003, no virtual machines. Our processor is an Intel Pentium 4 1.4 GHz. However, cat /etc/adjtime shows:

    0.000000 1162000000 0.000000
    1162000000
    UTC

I got the same output from other machines as well. Is that 1162000000 OK?

BTW, from the output in the debug readme I saw that your server has the same clock frequency as our leaping servers: freq=3579545. I hope it will help in solving this problem.

Thanks, Ron |
From: Henry N. <hen...@ar...> - 2010-09-20 20:55:23
Attachments:
co_div64-vs-div64_32.c
|
Hello Ron,

On 20.09.2010 08:29, Ron Avriel wrote:
> Hello Henry,
>
> Thanks a lot for debug version. I installed it, and I'm waiting for
> the leap again.
>
> It's amazing that the exact same time offset was also seen by another
> user. What's so special about this 30945 value?
> Too bad it doesn't happen here every five seconds. It would have been
> solved by now.

If we knew what it is, we would change it. But currently it is a ghost on some computers only. ;-)

> Re the SF bug - our servers run latest Windows 2003, no virtual
> machines. Our processor is Intel Pentium 4 1.4 GHz.
> However cat /etc/adjtime
> 0.000000 1162000000 0.000000
> 1162000000
> UTC
>
> I got the same output from other machines as well. Is that
> 1162000000 OK?

Yes. The middle field says when you last set the adjustment. It's the count of seconds since 1/1/1970:

    TZ= LANG= date --date="1970-01-01 +1162000000sec"
    Sat Oct 28 01:46:40 UTC 2006

> BTW, from the output in the debug readme I saw that your server has
> the same clock frequency as our leaping servers: freq=3579545.
> I hope it will help solving this problem.

I have tested co_div64 against div64_32 by simulating timediffs from 5 to 10*3579545, and have compared the jiffies result and the remainder for every timediff. Within that range the results are all exactly the same. See the attached file. The output under Windows is the same as under Linux 32 bit:

    co_div64: 15 f
    co_div64: 983055 f000f
    div64_32: 16 10, rem:0 0
    div64_32: 1048576 100000, rem:0 0
    done

-- Henry N. |
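A comparison of the kind described above could look roughly like the sketch below. The attached file is not reproduced in this thread, so this is not its content: count_mismatches, native_div and lossy_div are hypothetical names, and lossy_div is a deliberately broken divider invented only to show that a function can agree with the reference over the tested range and still fail for larger dividends. It says nothing about how co_div64() actually fails.

```c
#include <stdint.h>

typedef uint64_t (*div_fn)(uint64_t dividend, uint32_t divisor, uint32_t *rem);

/* Reference: the compiler's own 64-bit division and modulo. */
uint64_t native_div(uint64_t dividend, uint32_t divisor, uint32_t *rem)
{
    *rem = (uint32_t)(dividend % divisor);
    return dividend / divisor;
}

/* Hypothetical broken divider: silently drops the upper 32 bits. */
uint64_t lossy_div(uint64_t dividend, uint32_t divisor, uint32_t *rem)
{
    uint32_t low = (uint32_t)dividend;

    *rem = low % divisor;
    return low / divisor;
}

/*
 * Count the dividends in [first, last] (visited in steps of "step")
 * for which the candidate's quotient or remainder differs from the
 * reference. "step" and "freq" must be nonzero.
 */
uint64_t count_mismatches(div_fn candidate, uint32_t freq,
                          uint64_t first, uint64_t last, uint64_t step)
{
    uint64_t bad = 0;
    uint64_t diff;

    for (diff = first; diff <= last; diff += step) {
        uint32_t rem_c, rem_n;
        uint64_t q_c = candidate(diff, freq, &rem_c);
        uint64_t q_n = native_div(diff, freq, &rem_n);

        if (q_c != q_n || rem_c != rem_n)
            bad++;
    }
    return bad;
}
```

Over the range tested above (5 to 10*3579545, all below 2^32) lossy_div produces no mismatches, yet it is wrong for a dividend such as 0x100000000. A test range limited to small timediffs therefore cannot rule out errors for larger values.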