From: Gregory S. <gas...@wi...> - 2007-04-19 01:13:50
|
Hi list,

I'd like to pause UML's wall clock (and jiffies count) when I issue a "stop" to the VM, and resume it when I issue "go", so that the VM's notion of time only reflects its running time. I've been poking around a bit, but I've only just started learning about the internals of the Linux kernel, so it's slow going.

My first (admittedly naive) approach was to set the clock using do_settimeofday() in drivers/mconsole_kern.c:mconsole_stop(), but this seems to cause strange behavior for time-sensitive programs inside the VM. From what I've read in the past few days, this approach is seriously unenlightened.

I feel like the best way to implement the desired functionality would be to somehow stop timer interrupts from reaching the kernel. However, I haven't been able to figure out how to do this yet. If anyone has any pointers or suggestions, I'd really appreciate the help.

Thanks,
Greg |
From: Jeff D. <jd...@ad...> - 2007-04-19 02:50:47
|
On Wed, Apr 18, 2007 at 08:13:46PM -0500, Gregory Smith wrote:
> I'd like to pause UML's wall clock (and jiffies count) when I
> issue a "stop" to the VM, and resume it when I issue "go", so that
> the VMs notion of time only reflects its running time.

Disable CONFIG_UML_REAL_TIME_CLOCK. That will turn off UML's attempts to catch up with the real world when it's been asleep for a while.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Gregory S. <gas...@wi...> - 2007-04-19 03:44:35
|
Hi Jeff,

On Wed, Apr 18, 2007 at 10:48:47PM -0400, Jeff Dike wrote:
> Disable CONFIG_UML_REAL_TIME_CLOCK. That will turn off UML's attempts
> to catch up with the real world when it's been asleep for a while.

This measure seems to be, on its own, insufficient. For example, the UML binary in the following example was built with that option disabled:

--------------- snip
uml:~# date
Wed Apr 18 22:31:40 CDT 2007

// mconsole stop, wait ~30 sec, mconsole go

uml:~# date
Wed Apr 18 22:32:14 CDT 2007
--------------- snip

With UML paused for ~30 seconds, it comes back with the same time and date as the host. Whatever is causing this also causes programs running inside UML to wake up prematurely if the UML instance has been paused and resumed.

Greg |
From: Jeff D. <jd...@ad...> - 2007-04-19 04:28:53
|
On Wed, Apr 18, 2007 at 10:44:27PM -0500, Gregory Smith wrote:
> On Wed, Apr 18, 2007 at 10:48:47PM -0400, Jeff Dike wrote:
> > Disable CONFIG_UML_REAL_TIME_CLOCK. That will turn off UML's attempts
> > to catch up with the real world when it's been asleep for a while.
>
> This measure seems to be, on its own, insufficient. For example, the
> UML binary in the following example was built with that option
> disabled:
>
> --------------- snip
> uml:~# date
> Wed Apr 18 22:31:40 CDT 2007
>
> // mconsole stop, wait ~30 sec, mconsole go
>
> uml:~# date
> Wed Apr 18 22:32:14 CDT 2007

Do you really care about gettimeofday? That is tied to the host, in the sense that the UML gtod calls the host's gtod. However, sleeps inside UML should behave as you want. If you can demonstrate otherwise, please do.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Gregory S. <gas...@wi...> - 2007-04-19 04:59:55
|
On Thu, Apr 19, 2007 at 12:26:53AM -0400, Jeff Dike wrote:
> Do you really care about gettimeofday? That is tied to the host, in
> the sense that the UML gtod calls the host's gtod. However, sleeps
> inside UML should behave as you want. If you can demonstrate
> otherwise, please do.

Right, it's the sleeps I care about; gettimeofday is trivial.

Here is an example. I just coded up two programs -- one to run in UML, and one to run in the host. The UML program simply calls nanosleep() to sleep for 10 seconds. The host program sends "stop" to the mconsole through a socket, sleeps for 15 seconds, and then sends "go".

---------- snip
1176957979.606483 [UML] Trying to nanosleep for 10 seconds
1176957980.633799 [HOST] Sent "stop" to UML
1176957995.634926 [HOST] Sent "go" to UML
1176957995.655234 [UML] Woke up with 0.000000000 seconds remaining
---------- snip

Thus, from a UML-internal perspective, the process only slept for about a second, but it thinks it has slept at least 10.

I should make a comment about my kernels at this point, since this is clearly unexpected behavior.

* Host kernel: Linux 2.6.17-hrt-dyntick5-skas3-v9-pre9
* UML kernel: Linux 2.6.18.3

For the host, the hrt-dyntick5 patch is for high-resolution timers. I'm fairly certain that I saw the same behavior exemplified above /without/ this patch applied, but I could double-check if you think it might be a contributing factor.

Thanks for your quick responses, by the way.

Greg |
From: Jeff D. <jd...@ad...> - 2007-04-19 14:48:24
|
On Wed, Apr 18, 2007 at 11:59:52PM -0500, Gregory Smith wrote:
> Thus, from a UML-internal perspective, the process only slept for
> about a second, but it thinks it has slept at least 10.

You're right. I can start a sleep, pause UML, and if I unpause it before the sleep is supposed to expire, it wakes up right on schedule.

This needs looking at; stay tuned.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Jeff D. <jd...@ad...> - 2007-04-19 16:19:28
|
On Wed, Apr 18, 2007 at 11:59:52PM -0500, Gregory Smith wrote:
> Thus, from a UML-internal perspective, the process only slept for
> about a second, but it thinks it has slept at least 10.

It looks like jiffies doesn't really control anything any more. The timers are driven from xtime, which UML updates from the host gettimeofday.

To fix this, I guess I would just increase xtime every tick by the amount that it should be bumped in the absence of being paused.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Gregory S. <gas...@wi...> - 2007-04-19 05:11:43
|
On Wed, Apr 18, 2007 at 11:59:52PM -0500, Gregory Smith wrote:
> ---------- snip
> 1176957979.606483 [UML] Trying to nanosleep for 10 seconds
> 1176957980.633799 [HOST] Sent "stop" to UML
> 1176957995.634926 [HOST] Sent "go" to UML
> 1176957995.655234 [UML] Woke up with 0.000000000 seconds remaining
> ---------- snip

Ignore the misleading "seconds remaining" part. I need to read my manpages better. Suffice it to say that nanosleep() did not return EINTR.

Greg |
From: Jeff D. <jd...@ad...> - 2007-04-19 17:45:20
|
Try the patch below. I had to totally disassociate the UML's gettimeofday from the host's.

Jeff

--
Work email - jdike at linux dot intel dot com

Index: linux-2.6.21-mm/arch/um/kernel/time.c
===================================================================
--- linux-2.6.21-mm.orig/arch/um/kernel/time.c	2007-04-08 19:18:52.000000000 -0400
+++ linux-2.6.21-mm/arch/um/kernel/time.c	2007-04-19 13:43:55.000000000 -0400
@@ -34,8 +34,8 @@ unsigned long long sched_clock(void)
 	return (unsigned long long)jiffies_64 * (1000000000 / HZ);
 }
 
-static unsigned long long prev_nsecs[NR_CPUS];
 #ifdef CONFIG_UML_REAL_TIME_CLOCK
+static unsigned long long prev_nsecs[NR_CPUS];
 static long long delta[NR_CPUS];	/* Deviation per interval */
 #endif
 
@@ -94,7 +94,12 @@ irqreturn_t um_timer(int irq, void *dev)
 
 	do_timer(1);
 
+#ifdef CONFIG_UML_REAL_TIME_CLOCK
 	nsecs = get_time();
+#else
+	nsecs = (unsigned long long) xtime.tv_sec * BILLION + xtime.tv_nsec +
+		BILLION / HZ;
+#endif
 	xtime.tv_sec = nsecs / NSEC_PER_SEC;
 	xtime.tv_nsec = nsecs - xtime.tv_sec * NSEC_PER_SEC;
 
@@ -127,13 +132,18 @@ void time_init(void)
 	nsecs = os_nsecs();
 	set_normalized_timespec(&wall_to_monotonic, -nsecs / BILLION,
 				-nsecs % BILLION);
+	set_normalized_timespec(&xtime, nsecs / BILLION, nsecs % BILLION);
 	late_time_init = register_timer;
 }
 
 void do_gettimeofday(struct timeval *tv)
 {
+#ifdef CONFIG_UML_REAL_TIME_CLOCK
 	unsigned long long nsecs = get_time();
-
+#else
+	unsigned long long nsecs = (unsigned long long) xtime.tv_sec * BILLION +
+		xtime.tv_nsec;
+#endif
 	tv->tv_sec = nsecs / NSEC_PER_SEC;
 	/* Careful about calculations here - this was originally done as
 	 * (nsecs - tv->tv_sec * NSEC_PER_SEC) / NSEC_PER_USEC |
From: Gregory S. <gas...@wi...> - 2007-04-19 18:31:53
|
Yes, that seems to do the trick. Thanks very much!

Greg

On Thu, Apr 19, 2007 at 01:43:19PM -0400, Jeff Dike wrote:
> Try the patch below. I had to totally disassociate the UML's
> gettimeofday from the host's.
>
> Jeff
>
> [patch quoted in full in the original; snipped here] |
From: Gregory S. <gas...@wi...> - 2007-04-20 02:13:52
|
Hi Jeff,

Thanks again for the patch. I ended up doing something slightly different, but reading through the code and looking at your changes helped me arrive at a solution.

The primary problem I face with your patch is that the timer ticks are not interrupting at even intervals. While things /do/ work the way we discussed, the clock was slipping all over the place. Sleeps inside the VM were in sync and preserved when the VM was paused, but their real-time value was intrinsically unreliable. So, I think that syncing to the host clock in /some/ way is a good thing.

In my original solution, where I used {get,set}timeofday to manage the clock (with UML_REAL_TIME_CLOCK disabled), the real problem (I think) is that local_offset was overflowing. Because (in my case) the UML clock always lags the host clock, I simply switched the semantics of local_offset to be "how far behind" instead of "how far ahead". This seems to work (hack though it may be).

The problem I had with my original solution is that, although some sleeps worked, others didn't. I haven't seen this problem since fixing local_offset.

I'll be working with this more over the weekend, and I'll let you know if things break. For now, I need to divert my energies towards homework. Thanks again for your quick responses!

Greg |
From: Jeff D. <jd...@ad...> - 2007-04-20 03:53:15
|
On Thu, Apr 19, 2007 at 09:13:46PM -0500, Gregory Smith wrote:
> The primary problem I face with your patch is that the timer ticks are
> not interrupting at even intervals. While things /do/ work the way we
> discussed, the clock was slipping all over the place. Sleeps inside
> the VM were in sync and preserved when the VM was paused, but their
> real-time value was intrinsically unreliable.

I saw the same thing, if I understand what you mean. I was seeing very irregular intervals, including the host clock going backwards on occasion.

Although, now that I'm thinking about it, some of the irregularity might be due to how UML handles signals. They're not actually blocked: when signals are disallowed, a flag is set saying so, and signals still arrive as usual, but go no further than setting another flag recording that they happened. When signals are re-enabled, that flag is checked and the actual handling happens then.

This might make timers less regular, but it doesn't explain the host gettimeofday going backwards.

> I'll be working with this more over the weekend, and I'll let you know
> if things break. For now, I need to divert my energies towards
> homework. Thanks again for your quick responses!

You did point out a real bug, and I'll be sending the fix to mainline.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Blaisorblade <bla...@ya...> - 2007-05-17 18:44:34
|
On Friday 20 April 2007, Jeff Dike wrote:
> On Thu, Apr 19, 2007 at 09:13:46PM -0500, Gregory Smith wrote:
> > The primary problem I face with your patch is that the timer ticks are
> > not interrupting at even intervals. While things /do/ work the way we
> > discussed, the clock was slipping all over the place. Sleeps inside
> > the VM were in sync and preserved when the VM was paused, but their
> > real-time value was intrinsically unreliable.
>
> I saw the same thing, if I understand what you mean. I was seeing
> very irregular intervals, including the host clock going backwards on
> occasion.
>
> Although, now that I'm thinking about it, some of the irregularity
> might be due to how UML handles signals. They're not actually
> blocked. When signals aren't allowed, a flag is set saying that, the
> signals arrive as usual, except they don't go any further, except to
> set another flag that says that they happened. When signals are
> enabled, this flag is checked, and the actual handling happens.
>
> This might make timers less regular, but it doesn't explain the host
> gettimeofday going backwards.

That may be partially related to timer sources used for sub-HZ resolution being unreliable (say, the TSC), or to problems in old host timer code. In the *2004* Ottawa Linux Symposium proceedings I read (in a paper by John Stultz) that sub-sources are used only to interpolate between jiffies ticks, which means that at each timer tick the host clock may go backwards; the fix (a rewrite of the timing subsystem) is related to HRT, and I don't know its current status, nor which host kernel you are using.

--
Inform me of my mistakes, so I can add them to my list!
Paolo Giarrusso, aka Blaisorblade
http://www.user-mode-linux.org/~blaisorblade |