From: Bram M. (Syzop) <sy...@vu...> - 2008-05-07 14:01:12
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

I'm experiencing the following problem:
I upgraded from 2.6.20.1 to 2.6.25 on both the host and the uml's.
Now, after some time (unsure how soon), the uml's appear to hang.
It seems though, that they are not completely frozen, but just very very
very slow (or rather.. 99% unresponsive).

top:
~ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
~ 5434 virt 20 0 128m 89m 89m R 99 4.5 269:40.81 linux
..so consuming nearly 100% cpu.

When typing a letter at the console (I run the umls in a screen), it goes
slow, sometimes it takes up to a minute or so... so I can hardly login
(actually it does process/buffer my line, but by the time the username is
entered and it prompts for the password the login time of 60s is exceeded).
Also, there are no errors (like kernel warnings) displayed on the console.

When pinging I get:
PING slave (192.168.22.11) 56(84) bytes of data.
~From slave (192.168.22.1) icmp_seq=2 Destination Host Unreachable
~From slave (192.168.22.1) icmp_seq=3 Destination Host Unreachable
- -more more-
64 bytes from slave (192.168.22.11): icmp_seq=26 ttl=64 time=111 ms
- -yes, just one.. then delay... and then..-
ping: sendmsg: No buffer space available
ping: sendmsg: No buffer space available
- -more-

When doing a version request using uml_mconsole I get a response after a
delay of like 25 seconds, then quick subsequent requests work too, then they
no longer do for like 34 seconds, then a reply, etc etc etc.

I'm not sure if this is actually correct (I know the 'linux' image
corresponds to the running slave kernel but I'm unsure about the backtrace
it shows), but here's some gdb stuff:
- -gdb-
srv1:/home/virt# gdb linux 5434
GNU gdb 6.4.90-debian
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i486-linux-gnu"...Using host libthread_db library
"/lib/tls/i686/cmov/libthread_db.so.1".

Attaching to program: /home/virt/linux, process 5434
0x0809647a in update_xtime_cache ()
(gdb) bt
#0 0x0809647a in update_xtime_cache ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x08096477 in update_xtime_cache ()
(gdb) bt
#0 0x08096477 in update_xtime_cache ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0809645a in update_xtime_cache ()
(gdb) bt
#0 0x0809645a in update_xtime_cache ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0809645d in update_xtime_cache ()
(gdb) bt
#0 0x0809645d in update_xtime_cache ()
(gdb)

..so each time I ctrl+c after a few secs to see where it's at, it's in
there..

Any ideas what this could be?
Or any help on how to get additional / useful info?

TIA,

Bram.

- --
Bram Matthys
Software developer/IT consultant sy...@vu...
PGP key: www.vulnscan.org/pubkey.asc
PGP fp: 8DD4 437E 9BA8 09AA 0A8D 1811 E1C3 D65F E6ED 2AA2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)

iD8DBQFIIbX846ioc5305a8RAtiMAJ9/MJaGN/6k+711lFVxoX9sUgt5vACgxXgr
qyheRL6nPMJmat49fDS828k=
=0Ra0
-----END PGP SIGNATURE-----
From: Nix <ni...@es...> - 2008-05-07 20:24:13
|
On 7 May 2008, Bram Matthys said:
> Or any help on how to get additional / useful info?

Set

CONFIG_DEBUG_INFO=y
CONFIG_FRAME_POINTER=y

in your kernel, and recompile. `bt' will then show heaps more info.

--
`If you are having a "ua luea luea le ua le" kind of day, I can only
assume that you are doing no work due [to] incapacitating nausea caused
by numerous lazy demons.' --- Frossie
From: Bram M. (Syzop) <sy...@vu...> - 2008-05-08 10:03:08
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Nix wrote:
| On 7 May 2008, Bram Matthys said:
|> Or any help on how to get additional / useful info?
|
| Set
|
| CONFIG_DEBUG_INFO=y
| CONFIG_FRAME_POINTER=y
|
| in your kernel, and recompile. `bt' will then show heaps more info.

Thanks, I'll get back with the results once it hangs again.

I just noticed something odd at one of the uml's that didn't hang.
That uml is still running the .25 kernel (without the debugging info):
it says login timeout all the time, this might be why...
When I type 'date' every second I get this:

root@vsrv:~# date
Fri Sep 5 15:49:31 UTC 2008
root@vsrv:~# date
Thu Sep 4 03:51:46 UTC 2008
root@vsrv:~# date
Tue Sep 9 01:27:15 UTC 2008
root@vsrv:~# date
Fri Sep 5 02:27:58 UTC 2008
root@vsrv:~# date
Tue Sep 2 06:22:48 UTC 2008
root@vsrv:~# date
Tue Sep 9 03:21:44 UTC 2008
root@vsrv:~# date
Tue Sep 2 22:51:05 UTC 2008
root@vsrv:~# date
Sun Sep 7 09:18:25 UTC 2008
root@vsrv:~# date
Thu Sep 11 03:24:52 UTC 2008
root@vsrv:~# date
Thu Sep 11 20:24:11 UTC 2008

So it seems to hop both forward and backward.. heavily..
There's no ntp stuff running on the UML btw.
Date/Time on the main server is correct.

I did do this on the main server a few days ago:

/etc/init.d/ntp stop
hwclock --systohc
/etc/init.d/ntp start

due to these kernel messages (on the main):
'set_rtc_mmss: can't update from 90 to 21'
..which went away after that.
but could that really be related?
(the hw clock was off by 1 hour or so, but linux time was ok)

Bram.

- --
Bram Matthys
Software developer/IT consultant sy...@vu...
PGP key: www.vulnscan.org/pubkey.asc
PGP fp: 8DD4 437E 9BA8 09AA 0A8D 1811 E1C3 D65F E6ED 2AA2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)

iD8DBQFIIs+/46ioc5305a8RAttsAKCwIWDpGjbH7PEaX37e7BE/sIfX8gCgkaC8
frzs6nPC35fMoJ+p78AA4h0=
=3e1T
-----END PGP SIGNATURE-----
From: Jeff D. <jd...@ad...> - 2008-05-09 15:46:40
|
On Wed, May 07, 2008 at 04:00:28PM +0200, Bram Matthys (Syzop) wrote:
> I'm experiencing the following problem:
> I upgraded from 2.6.20.1 to 2.6.25 on both the host and the uml's.
> Now, after some time (unsure how soon), the uml's appear to hang.
> It seems though, that they are not completely frozen, but just very very
> very slow (or rather.. 99% unresponsive).
>
> top:
> ~ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> ~ 5434 virt 20 0 128m 89m 89m R 99 4.5 269:40.81 linux
> ..so consuming nearly 100% cpu.

Are you using CONFIG_NO_HZ?

There have been some recent time-related fixes. Can you try the two
patches below and see if they help?

				Jeff

--
Work email - jdike at linux dot intel dot com

Index: linux-2.6.22/arch/um/os-Linux/time.c
===================================================================
--- linux-2.6.22.orig/arch/um/os-Linux/time.c	2008-03-18 12:32:19.000000000 -0400
+++ linux-2.6.22/arch/um/os-Linux/time.c	2008-03-24 12:46:26.000000000 -0400
@@ -11,6 +11,7 @@
 #include "kern_constants.h"
 #include "os.h"
 #include "user.h"
+#include "kern_util.h"
 
 int set_interval(void)
 {
@@ -58,12 +59,17 @@ static inline long long timeval_to_ns(co
 long long disable_timer(void)
 {
 	struct itimerval time = ((struct itimerval) { { 0, 0 }, { 0, 0 } });
+	int remain, max = UM_NSEC_PER_SEC / UM_HZ;
 
 	if (setitimer(ITIMER_VIRTUAL, &time, &time) < 0)
 		printk(UM_KERN_ERR "disable_timer - setitimer failed, "
 		       "errno = %d\n", errno);
 
-	return timeval_to_ns(&time.it_value);
+	remain = timeval_to_ns(&time.it_value);
+	if (remain > max)
+		remain = max;
+
+	return remain;
 }
 
 long long os_nsecs(void)
@@ -74,12 +80,51 @@ long long os_nsecs(void)
 	return timeval_to_ns(&tv);
 }
 
+extern void alarm_handler(int sig, struct sigcontext *sc);
+
 #ifdef UML_CONFIG_NO_HZ
 static int after_sleep_interval(struct timespec *ts)
 {
 	return 0;
 }
+
+static void deliver_alarm(void)
+{
+	alarm_handler(SIGVTALRM, NULL);
+}
+
+static unsigned long long sleep_time(unsigned long long nsecs)
+{
+	return nsecs;
+}
+
 #else
+unsigned long long last_tick;
+unsigned long long skew;
+
+static void deliver_alarm(void)
+{
+	unsigned long long this_tick = os_nsecs();
+	int one_tick = UM_NSEC_PER_SEC / UM_HZ;
+
+	if (last_tick == 0)
+		last_tick = this_tick - one_tick;
+
+	skew += this_tick - last_tick;
+
+	while (skew >= one_tick) {
+		alarm_handler(SIGVTALRM, NULL);
+		skew -= one_tick;
+	}
+
+	last_tick = this_tick;
+}
+
+static unsigned long long sleep_time(unsigned long long nsecs)
+{
+	return nsecs > skew ? nsecs - skew : 0;
+}
+
 static inline long long timespec_to_us(const struct timespec *ts)
 {
 	return ((long long) ts->tv_sec * UM_USEC_PER_SEC) +
@@ -102,6 +147,8 @@ static int after_sleep_interval(struct t
 	 */
 	if (start_usecs > usec)
 		start_usecs = usec;
+
+	start_usecs -= skew / UM_NSEC_PER_USEC;
 	tv = ((struct timeval) { .tv_sec = start_usecs / UM_USEC_PER_SEC,
 				 .tv_usec = start_usecs % UM_USEC_PER_SEC });
 	interval = ((struct itimerval) { { 0, usec }, tv });
@@ -113,8 +160,6 @@ static int after_sleep_interval(struct t
 }
 #endif
 
-extern void alarm_handler(int sig, struct sigcontext *sc);
-
 void idle_sleep(unsigned long long nsecs)
 {
 	struct timespec ts;
@@ -126,10 +171,12 @@ void idle_sleep(unsigned long long nsecs
 	 */
 	if (nsecs == 0)
 		nsecs = UM_NSEC_PER_SEC / UM_HZ;
+
+	nsecs = sleep_time(nsecs);
 	ts = ((struct timespec) { .tv_sec = nsecs / UM_NSEC_PER_SEC,
 				  .tv_nsec = nsecs % UM_NSEC_PER_SEC });
 
 	if (nanosleep(&ts, &ts) == 0)
-		alarm_handler(SIGVTALRM, NULL);
+		deliver_alarm();
 	after_sleep_interval(&ts);
 }
Index: linux-2.6.22/arch/um/kernel/time.c
===================================================================
--- linux-2.6.22.orig/arch/um/kernel/time.c	2008-04-10 12:53:32.000000000 -0400
+++ linux-2.6.22/arch/um/kernel/time.c	2008-04-14 10:30:00.000000000 -0400
@@ -75,7 +75,7 @@ static irqreturn_t um_timer(int irq, voi
 
 static cycle_t itimer_read(void)
 {
-	return os_nsecs();
+	return os_nsecs() / 1000;
 }
 
 static struct clocksource itimer_clocksource = {
@@ -83,7 +83,7 @@ static struct clocksource itimer_clockso
 	.rating = 300,
 	.read = itimer_read,
 	.mask = CLOCKSOURCE_MASK(64),
-	.mult = 1,
+	.mult = 1000,
 	.shift = 0,
 	.flags = CLOCK_SOURCE_IS_CONTINUOUS,
 };
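For readers following the non-NO_HZ branch of the first patch: the deliver_alarm() it adds accumulates elapsed host time into skew and delivers one tick per whole tick period, so ticks missed during a long sleep are caught up in a burst. Below is a minimal user-space sketch of that bookkeeping only, with the kernel's alarm_handler() call replaced by a counter. The names struct tick_state and deliver_ticks(), and the HZ value of 100, are assumptions made for illustration and are not part of the patch.

```c
#include <assert.h>

#define NSEC_PER_SEC 1000000000ULL
#define HZ 100ULL /* assumed tick rate, for illustration only */

/* Hypothetical container for the two globals the patch introduces. */
struct tick_state {
	unsigned long long last_tick; /* time of the previous delivery, ns */
	unsigned long long skew;      /* elapsed time not yet turned into ticks, ns */
};

/*
 * Account for the time that passed since the previous call and return how
 * many ticks would be delivered. 'now' is the current time in nanoseconds.
 */
unsigned int deliver_ticks(struct tick_state *st, unsigned long long now)
{
	const unsigned long long one_tick = NSEC_PER_SEC / HZ;
	unsigned int delivered = 0;

	if (st->last_tick == 0) {
		/* First call: pretend exactly one tick period has elapsed. */
		assert(now >= one_tick);
		st->last_tick = now - one_tick;
	}

	st->skew += now - st->last_tick;

	/* One tick per whole period; the remainder stays in skew. */
	while (st->skew >= one_tick) {
		delivered++;
		st->skew -= one_tick;
	}

	st->last_tick = now;
	return delivered;
}
```

With a 10 ms tick, calls at 50 ms, 85 ms and 90 ms deliver 1, 3 and 1 ticks, with 5 ms carried over after the second call. Note that the difference is unsigned, so this sketch assumes that 'now' never runs backwards.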
From: Bram M. (Syzop) <sy...@vu...> - 2008-05-10 08:42:23
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Sorry, this was supposed to go to the list...
UPDATE: after 12+ hours still hung.

Jeff Dike wrote:
| On Wed, May 07, 2008 at 04:00:28PM +0200, Bram Matthys (Syzop) wrote:
|> I'm experiencing the following problem:
|> I upgraded from 2.6.20.1 to 2.6.25 on both the host and the uml's.
|> Now, after some time (unsure how soon), the uml's appear to hang.
|> It seems though, that they are not completely frozen, but just very very
|> very slow (or rather.. 99% unresponsive).
|>
|> top:
|> ~ PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
|> ~ 5434 virt 20 0 128m 89m 89m R 99 4.5 269:40.81 linux
|> ..so consuming nearly 100% cpu.
|
| Are you using CONFIG_NO_HZ?
|
| There have been some recent time-related fixes. Can you try the two
| patches below and see if they help?

Thanks for your reply.

$ grep HZ .config
CONFIG_HZ=100
# CONFIG_NO_HZ is not set

I've applied your patch against my 2.6.25 (vanilla)...
patching file arch/um/os-Linux/time.c
patching file arch/um/kernel/time.c
Hunk #1 succeeded at 74 (offset -1 lines).
Hunk #2 succeeded at 82 (offset -1 lines).
and recompiled etc..

I saw Vincent's issue, and when I set the time like 5 seconds back.. the UML
freezes and uses 100% cpu and doesn't respond at all. This is however not
entirely the same as what I had, because I still had it somewhat responsive...

Anyway, applied your patches and recompiled, booted etc.. hangs again when
I set the time 5s back. I also tested with 1s backwards... same...
This was a quick test, I don't know if it becomes responsive after like
several hours...

Attaching to program: /home/virt/linux, process 25109
0x080978bf in update_wall_time () at kernel/time/timekeeping.c:475
475 clock->error -= clock->xtime_interval << (TICK_LENGTH_SHIFT - clock->shift);
(gdb) bt
#0 0x080978bf in update_wall_time () at kernel/time/timekeeping.c:475
#1 0x08086bb5 in do_timer (ticks=1) at kernel/timer.c:929
#2 0x08099793 in tick_periodic (cpu=0) at kernel/time/tick-common.c:66
#3 0x080997b8 in tick_handle_periodic (dev=0x8355420) at kernel/time/tick-common.c:82
#4 0x0805c143 in um_timer (irq=0, dev=0x0) at arch/um/kernel/time.c:70
#5 0x0809fac0 in handle_IRQ_event (irq=0, action=0x11449460) at kernel/irq/handle.c:140
#6 0x0809fb6a in __do_IRQ (irq=0) at kernel/irq/handle.c:236
#7 0x08059d65 in do_IRQ (irq=0, regs=0x834fe98) at arch/um/kernel/irq.c:335
#8 0x0805c0c6 in timer_handler (sig=26, regs=0x834fe98) at arch/um/kernel/time.c:28
#9 0x0806c3d9 in real_alarm_handler (sc=0x0) at arch/um/os-Linux/signal.c:93
#10 0x0806c410 in alarm_handler (sig=26, sc=0x0) at arch/um/os-Linux/signal.c:108
#11 0x0806cee4 in deliver_alarm () at arch/um/os-Linux/time.c:116
#12 0x0806d0f1 in idle_sleep (nsecs=<value optimized out>) at arch/um/os-Linux/time.c:180
#13 0x0805ab13 in default_idle () at arch/um/kernel/process.c:248
#14 0x0805ab56 in cpu_idle () at arch/um/kernel/process.c:256
#15 0x082b379a in rest_init () at init/main.c:453
#16 0x0804879a in start_kernel () at init/main.c:650
#17 0x0804a12c in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:46
#18 0x0806b671 in run_kernel_thread (fn=0x804a100 <start_kernel_proc>, arg=0x0, jmp_ptr=0x83551e0)
~ at arch/um/os-Linux/process.c:267
#19 0x0805a892 in new_thread_handler () at arch/um/kernel/process.c:151
#20 0x00000000 in ?? ()
(gdb) c
Continuing.
^C
Program received signal SIGINT, Interrupt.
0x0809785c in update_wall_time () at kernel/time/timekeeping.c:464
464 clock->cycle_last += clock->cycle_interval;
(gdb) bt
#0 0x0809785c in update_wall_time () at kernel/time/timekeeping.c:464
#1 0x08086bb5 in do_timer (ticks=1) at kernel/timer.c:929
#2 0x08099793 in tick_periodic (cpu=0) at kernel/time/tick-common.c:66
#3 0x080997b8 in tick_handle_periodic (dev=0x8355420) at kernel/time/tick-common.c:82
#4 0x0805c143 in um_timer (irq=0, dev=0x0) at arch/um/kernel/time.c:70
#5 0x0809fac0 in handle_IRQ_event (irq=0, action=0x11449460) at kernel/irq/handle.c:140
#6 0x0809fb6a in __do_IRQ (irq=0) at kernel/irq/handle.c:236
#7 0x08059d65 in do_IRQ (irq=0, regs=0x834fe98) at arch/um/kernel/irq.c:335
#8 0x0805c0c6 in timer_handler (sig=26, regs=0x834fe98) at arch/um/kernel/time.c:28
#9 0x0806c3d9 in real_alarm_handler (sc=0x0) at arch/um/os-Linux/signal.c:93
#10 0x0806c410 in alarm_handler (sig=26, sc=0x0) at arch/um/os-Linux/signal.c:108
#11 0x0806cee4 in deliver_alarm () at arch/um/os-Linux/time.c:116
#12 0x0806d0f1 in idle_sleep (nsecs=<value optimized out>) at arch/um/os-Linux/time.c:180
#13 0x0805ab13 in default_idle () at arch/um/kernel/process.c:248
#14 0x0805ab56 in cpu_idle () at arch/um/kernel/process.c:256
#15 0x082b379a in rest_init () at init/main.c:453
#16 0x0804879a in start_kernel () at init/main.c:650
#17 0x0804a12c in start_kernel_proc (unused=0x0) at arch/um/kernel/skas/process.c:46
#18 0x0806b671 in run_kernel_thread (fn=0x804a100 <start_kernel_proc>, arg=0x0, jmp_ptr=0x83551e0)
~ at arch/um/os-Linux/process.c:267
#19 0x0805a892 in new_thread_handler () at arch/um/kernel/process.c:151
#20 0x00000000 in ?? ()
(gdb)

I also saw this on my console (which does not react either btw), not sure
when it appeared.. at or very shortly after/before the time setting:
Stub registers -
~ 0 - 621a
~ 1 - 13
~ 2 - 621a
~ 3 - 6215
~ 4 - 8
~ 5 - bfae182c
~ 6 - 0
~ 7 - 7b
~ 8 - 7b
~ 9 - 0
~ 10 - 0
~ 11 - ffffffff
~ 12 - 1000be
~ 13 - 73
~ 14 - 200246
~ 15 - bfae1810
~ 16 - 7b
wait_stub_done : failed to wait for SIGTRAP, pid = 26141, n = 26141, errno =
0, status = 0x1c7f

The old 2.6.20.1 uml's react fine when setting time backwards, btw (well..
within reasonable limits)

Bram.

- --
Bram Matthys
Software developer/IT consultant sy...@vu...
PGP key: www.vulnscan.org/pubkey.asc
PGP fp: 8DD4 437E 9BA8 09AA 0A8D 1811 E1C3 D65F E6ED 2AA2
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)

iD8DBQFIJV/E46ioc5305a8RAsiwAJ4wjzYWngQWdfQ+EdGuJgXFyu5PYQCeIPbe
tqYB/w+brTtcjK0dLpoe/yY=
=P50g
-----END PGP SIGNATURE-----
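Both backtraces above stop inside update_wall_time(), and the hang was triggered by stepping the clock backwards. One possible explanation, offered only as an assumption since nothing in the thread confirms it at this point: if elapsed time is computed as an unsigned difference between the current clock reading and the previous one, a reading that moves backwards wraps around to an enormous value, and a loop that consumes that value one interval at a time would run for a practically unbounded number of iterations. The sketch below only demonstrates the arithmetic; elapsed_cycles() and catchup_steps() are hypothetical helpers, not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: elapsed cycles as an unsigned 64-bit difference. */
uint64_t elapsed_cycles(uint64_t now, uint64_t last)
{
	return now - last; /* wraps modulo 2^64 when now < last */
}

/*
 * Number of fixed-size intervals a catch-up loop would have to step
 * through to consume the elapsed cycles.
 */
uint64_t catchup_steps(uint64_t now, uint64_t last, uint64_t interval)
{
	assert(interval > 0);
	return elapsed_cycles(now, last) / interval;
}
```

A clock that advances from 990 to 1000 with an interval of 10 needs a single step. A clock that goes from 1000 back to 995 yields an elapsed value of 2^64 - 5, which would take on the order of 10^18 steps at the same interval.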
From: Jeff D. <jd...@ad...> - 2008-05-13 15:40:55
|
On Sat, May 10, 2008 at 10:41:40AM +0200, Bram Matthys (Syzop) wrote:
> I also saw this on my console (which does not react either btw), not sure
> when it appeared.. at or very shortly after/before the time setting:
> Stub registers -
> ~ 0 - 621a
> ~ 1 - 13
> ~ 2 - 621a
> ~ 3 - 6215
> ~ 4 - 8
> ~ 5 - bfae182c
> ~ 6 - 0
> ~ 7 - 7b
> ~ 8 - 7b
> ~ 9 - 0
> ~ 10 - 0
> ~ 11 - ffffffff
> ~ 12 - 1000be
> ~ 13 - 73
> ~ 14 - 200246
> ~ 15 - bfae1810
> ~ 16 - 7b
> wait_stub_done : failed to wait for SIGTRAP, pid = 26141, n = 26141, errno =
> 0, status = 0x1c7f

For this one, try this patch:

Index: linux-2.6.22/arch/um/os-Linux/skas/process.c
===================================================================
--- linux-2.6.22.orig/arch/um/os-Linux/skas/process.c	2008-04-14 10:44:33.000000000 -0400
+++ linux-2.6.22/arch/um/os-Linux/skas/process.c	2008-05-13 11:37:35.000000000 -0400
@@ -55,7 +55,7 @@ static int ptrace_dump_regs(int pid)
  * Signals that are OK to receive in the stub - we'll just continue it.
  * SIGWINCH will happen when UML is inside a detached screen.
  */
-#define STUB_SIG_MASK (1 << SIGVTALRM)
+#define STUB_SIG_MASK ((1 << SIGVTALRM) | (1 << SIGWINCH))
 
 /* Signals that the stub will finish with - anything else is an error */
 #define STUB_DONE_MASK (1 << SIGTRAP)

I doubt it will fix the time problem. I'm going to chase Vincent's
problem on the assumption that you're seeing the same thing. When I
figure that out, we'll see how true that is.

> The old 2.6.20.1 uml's react fine when setting time backwards, btw (well..
> within reasonable limits)

UML got its timekeeping redone as part of the tickless work and I'm
still shaking out bugs...

				Jeff

--
Work email - jdike at linux dot intel dot com
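The "status = 0x1c7f" in the quoted console output can be decoded with the standard wait-status macros. In the usual Linux encoding, a low byte of 0x7f marks a stopped child and the next byte holds the stopping signal, here 0x1c = 28, which is SIGWINCH on Linux/x86. That is consistent with the patch above adding SIGWINCH to STUB_SIG_MASK. A small sketch follows; stop_signal() and sig_in_mask() are illustrative names and not UML functions.

```c
#include <assert.h>
#include <signal.h>
#include <sys/wait.h>

/*
 * Return the stopping signal encoded in a wait status, or -1 if the
 * status does not describe a stopped child.
 */
int stop_signal(int status)
{
	if (!WIFSTOPPED(status))
		return -1;
	return WSTOPSIG(status);
}

/*
 * Nonzero if 'sig' is covered by 'mask', using the same one-bit-per-signal
 * scheme (1 << sig) as the STUB_SIG_MASK definition in the patch.
 */
int sig_in_mask(unsigned long mask, int sig)
{
	assert(sig > 0 && sig < (int)(8 * sizeof(mask)));
	return (mask & (1UL << sig)) != 0;
}
```

For example, stop_signal(0x1c7f) yields 28. A mask built only from SIGVTALRM does not cover SIGWINCH, while a mask built from both signals does.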
From: Sakari A. <sak...@sa...> - 2008-05-14 08:41:29
|
Bram Matthys (Syzop) wrote:
> Thanks for your reply.
>
> $ grep HZ .config
> CONFIG_HZ=100
> # CONFIG_NO_HZ is not set
>
> I've applied your patch against my 2.6.25 (vanilla)...
> patching file arch/um/os-Linux/time.c
> patching file arch/um/kernel/time.c
> Hunk #1 succeeded at 74 (offset -1 lines).
> Hunk #2 succeeded at 82 (offset -1 lines).
> and recompiled etc..
>
> I saw Vincent's issue, and when I set the time like 5 seconds back.. the UML
> freezes and uses 100% cpu and doesn't respond at all. This is however not
> entirely the same as what I had, because I still had it somewhat responsive...

Hi,

I think I have experienced the same problem. I don't have time now to
investigate it further, but I have some info which may or may not be
useful in debugging. So this is mainly just FYI.

I have three UML instances running on a host. First, they all were
unresponsive simultaneously using all CPU time they could get. After a
while they became responsive again. I could log in through SSH. The
funny thing is that the date command showed correct date and time (as
far as I remember, can't test it now as they are hung again) while the
time in bash prompt was constant showing the time around the initial
hang, which is the same on all three instances.

I think there was some NTP related activity on the host while this
happened. The time the UMLs were showing was 23:xx:xx, don't know
exactly. :(

---
May 13 22:58:56 retiisi ntpd[20630]: synchronized to 192.26.119.7, stratum 2
May 13 22:58:56 retiisi ntpd[20630]: time reset -5.151310 s
May 13 22:58:56 retiisi ntpd[20630]: kernel time sync enabled 0001
May 13 22:58:52 retiisi kernel: set_rtc_mmss: can't update from 1 to 58
May 13 22:58:56 retiisi last message repeated 4 times
May 13 22:59:27 retiisi kernel: set_rtc_mmss: can't update from 1 to 59
May 13 22:59:40 retiisi last message repeated 13 times
May 13 22:59:41 retiisi kernel: set_rtc_mmss: can't update from 2 to 59
May 13 22:59:59 retiisi last message repeated 18 times
May 13 23:02:29 retiisi ntpd[20630]: synchronized to 192.26.119.7, stratum 2
---

The host is 2.6.24 (skas4) and the clients are vanilla 2.6.24. Oh dear,
I seem to have CONFIG_NO_HZ enabled...

--
Sakari Ailus
sak...@sa...
From: Jeff D. <jd...@ad...> - 2008-05-19 16:16:52
|
On Wed, May 14, 2008 at 11:41:18AM +0300, Sakari Ailus wrote:
> I have three UML instances running on a host. First, they all were
> unresponsive simultaneously using all CPU time they could get. After a
> while they became responsive again. I could log in through SSH. The
> funny thing is that the date command showed correct date and time (as
> far as I remember, can't test it now as they are hung again) while the
> time in bash prompt was constant showing the time around the initial
> hang, which is the same on all three instances.

I reproduced and debugged a similar problem, resulting in the patch
below. See if it makes any difference for you...

				Jeff

--
Work email - jdike at linux dot intel dot com

Index: 2.6/stable/arch/um/os-Linux/time.c
===================================================================
--- 2.6.orig/stable/arch/um/os-Linux/time.c	2008-05-14 14:55:56.000000000 -0400
+++ 2.6/stable/arch/um/os-Linux/time.c	2008-05-14 15:30:48.000000000 -0400
@@ -66,12 +66,21 @@ long long disable_timer(void)
 	return timeval_to_ns(&time.it_value);
 }
 
+static long long last_time;
+
 long long os_nsecs(void)
 {
 	struct timeval tv;
+	long long ret;
 
 	gettimeofday(&tv, NULL);
-	return timeval_to_ns(&tv);
+	ret = timeval_to_ns(&tv);
+
+	if((last_time != 0) && (last_time > ret))
+		ret = last_time;
+
+	last_time = ret;
+	return ret;
 }
 
 #ifdef UML_CONFIG_NO_HZ
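The patch above makes os_nsecs() refuse to go backwards: whenever the host's gettimeofday() returns a value earlier than the last one handed out, the previous value is returned again. A minimal stand-alone sketch of that clamping follows, with the raw reading passed in as a parameter so the behaviour can be tried without touching the system clock. The function name clamp_monotonic() and the non-negative check are assumptions for illustration, not part of the patch.

```c
#include <assert.h>

/* Last value handed out; 0 means nothing has been returned yet. */
static long long last_time;

/*
 * Return 'raw_ns' unless it is earlier than the previously returned
 * value, in which case repeat the previous value. The result therefore
 * never decreases from one call to the next.
 */
long long clamp_monotonic(long long raw_ns)
{
	long long ret = raw_ns;

	assert(raw_ns >= 0); /* wall-clock nanoseconds assumed non-negative */

	if (last_time != 0 && last_time > ret)
		ret = last_time;

	last_time = ret;
	return ret;
}
```

Feeding it 1000, then 995, then 1005 returns 1000, 1000 and 1005: the backward step is hidden, and time resumes once the raw clock passes the last value returned. One consequence of this design is that the guest clock stands still for as long as the host clock was stepped back.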
From: Bram M. (Syzop) <sy...@vu...> - 2008-05-31 09:17:11
|
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi Jeff,

Sorry for my late reply.

Just tried, and I think your patch fixed it (at least a test with 'date'
setting the time 2 seconds back on the host no longer causes it to hang).

If it causes any trouble in the next few days I'll let you know.

Regards,

Bram.

Jeff Dike wrote:
| On Wed, May 14, 2008 at 11:41:18AM +0300, Sakari Ailus wrote:
|> I have three UML instances running on a host. First, they all were
|> unresponsive simultaneously using all CPU time they could get. After a
|> while they became responsive again. I could log in through SSH. The
|> funny thing is that the date command showed correct date and time (as
|> far as I remember, can't test it now as they are hung again) while the
|> time in bash prompt was constant showing the time around the initial
|> hang, which is the same on all three instances.
|
| I reproduced and debugged a similar problem, resulting in the patch
| below. See if it makes any difference for you...
|
| Jeff
|
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.5 (MingW32)

iD8DBQFIQReI46ioc5305a8RAqQpAKC0j9HcBGFxRgaT2yvXPC9E3W1FXACgg6q6
LAYiR6Z0MMuwIZfTax6ojhc=
=tSsa
-----END PGP SIGNATURE-----