From: Rennie d. <ren...@gm...> - 2006-05-17 17:20:13
|
I'm trying to build a UML guest system based on Gentoo and the 2.6.16.16 kernel. It seems to work, except that it always hangs on shutdown at the message " * Deactivating swap ...". This is after syslog and sshd have been shut down, so there's no log information and I can't get a ps output. My skills with gdb are limited (is it possible to get something like a ps listing from gdb or uml_mconsole?), but the UML kernel seems to be spending some of its time in an idle loop and some in the stack trace below. I can kill the UML system with uml_mconsole, but it's annoying to have to do that all the time.

Does anyone know why this is happening or how I can fix it?

Thanks,
Rennie deGraaf

Host kernel: 2.6.16-1.2096_FC4 (unmodified Fedora Core 4 kernel)
Host system: Fedora Core 4
Guest kernel: 2.6.16.16 (unmodified)
Guest system: Gentoo

Kernel stack trace:

Program received signal SIGINT, Interrupt.
sig_handler (sig=29) at arch/um/os-Linux/signal.c:42
42 in arch/um/os-Linux/signal.c
(gdb) bt
#0 sig_handler (sig=29) at arch/um/os-Linux/signal.c:42
#1 <signal handler called>
#2 0x002ed402 in __kernel_vsyscall ()
#3 0x003c7513 in __write_nocancel () from /lib/libc.so.6
#4 0x08083119 in file_io (fd=11, buf=0xa0b3adc, len=64, io_proc=0x805c440, copy_user_proc=0x805fe50 <copy_to_user_proc>) at arch/um/os-Linux/file.c:333
#5 0x0807f7ca in do_ubd_request (q=0x9502bc8) at arch/um/drivers/ubd_kern.c:1054
#6 0x0816641b in generic_unplug_device (q=0x9502bc8) at block/ll_rw_blk.c:1611
#7 0x080d4a58 in swap_unplug_io_fn (unused_bdi=0x82644f8, page=Variable "page" is not available.) at include/linux/blkdev.h:629
#8 0x080e56e6 in block_sync_page (page=0x94a13a0) at include/linux/blkdev.h:629
#9 0x080bd562 in sync_page (word=0x94a13a0) at mm/filemap.c:165
#10 0x0820f225 in __wait_on_bit (wq=0x94b0600, q=0xa0b3b8c, action=0x80bd530 <sync_page>, mode=2) at kernel/wait.c:162
#11 0x080bdd51 in wait_on_page_bit (page=Variable "page" is not available.) at mm/filemap.c:455
#12 0x080d5995 in try_to_unuse (type=0) at include/linux/pagemap.h:189
#13 0x080d6354 in sys_swapoff (specialfile=0x804f078 "\030\uffff\215\uffff") at mm/swapfile.c:1155
#14 0x080647ca in handle_syscall (r=0x96682d8) at arch/um/kernel/skas/syscall.c:42
#15 0x080897be in userspace (regs=0x96682d8) at arch/um/os-Linux/skas/process.c:151
#16 0x08064458 in fork_handler (sig=10) at arch/um/kernel/skas/process_kern.c:96
#17 <signal handler called>
#18 0x002ed402 in __kernel_vsyscall ()
#19 0x080890f4 in map (mm_idp=0xa8741bc, virt=32277, len=157712640, r=3359623, w=1, x=1, phys_fd=6, offset=14423702927018972, done=1, data=0xa103aec) at arch/um/os-Linux/skas/mem.c:209
#20 0x0806495a in do_ops (mmu=0x94f8740, ops=0xa103ab8, last=136695296, finished=156206912, flush=0xa103aec) at arch/um/kernel/skas/tlb.c:31
#21 0x00000001 in ?? ()
#22 0x094f8740 in ?? ()
#23 0x0a103ab8 in ?? ()
#24 0x0825ce00 in init_sighand ()
#25 0x094f8740 in ?? ()
#26 0x0a100000 in ?? ()
#27 0x08089a99 in switch_threads (me=0x94f8c24, next=0x825baec) at arch/um/os-Linux/skas/process.c:485
#28 0x0806428d in switch_to_skas (prev=0x94f8740, next=0x825ce00) at arch/um/kernel/skas/process_kern.c:35
#29 0x0805f8fa in _switch_to (prev=0x333df7, next=0x825ce00, last=0x94f8740) at arch/um/kernel/process_kern.c:123
#30 0x0820e569 in schedule () at kernel/sched.c:1600
#31 0x0809aa57 in do_wait (pid=-1, options=4, infop=0x0, stat_addr=0xbf501fc8, ru=0x0) at kernel/exit.c:1495
#32 0x0809ae61 in sys_wait4 (pid=-1, stat_addr=0xbf501fc8, options=0, ru=0x0) at kernel/exit.c:1571
#33 0x0809ae95 in sys_waitpid (pid=-1, stat_addr=0xbf501fc8, options=0) at kernel/exit.c:1586
#34 0x080647ca in handle_syscall (r=0x94f8918) at arch/um/kernel/skas/syscall.c:42
#35 0x080897be in userspace (regs=0x94f8918) at arch/um/os-Linux/skas/process.c:151
#36 0x08064458 in fork_handler (sig=10) at arch/um/kernel/skas/process_kern.c:96
#37 <signal handler called>
#38 0x002ed402 in __kernel_vsyscall ()
#39 0x08089a99 in switch_threads (me=0x8e0db04, next=0xa0b3a34) at arch/um/os-Linux/skas/process.c:485
#40 0x0806428d in switch_to_skas (prev=0x8e0d620, next=0x9668100) at arch/um/kernel/skas/process_kern.c:35
#41 0x0805f8fa in _switch_to (prev=0x94f8bdc, next=0x9668100, last=0x8e0d620) at arch/um/kernel/process_kern.c:123
#42 0x0820e569 in schedule () at kernel/sched.c:1600
#43 0x0820f113 in schedule_timeout (timeout=Variable "timeout" is not available.) at kernel/timer.c:1136
#44 0x080f6ce5 in do_select (n=11, fds=0x8e13be0, timeout=0x8e13c38) at fs/select.c:276
#45 0x080f7007 in core_sys_select (n=11, inp=0xbf67f4d0, outp=0x0, exp=0x0, timeout=0x8e13c38) at fs/select.c:353
#46 0x080f715d in sys_select (n=11, inp=0xbf67f4d0, outp=0x0, exp=0x0, tvp=0xbf67f4c8) at fs/select.c:398
#47 0x080647ca in handle_syscall (r=0x8e0d7f8) at arch/um/kernel/skas/syscall.c:42
#48 0x080897be in userspace (regs=0x8e0d7f8) at arch/um/os-Linux/skas/process.c:151
#49 0x0806437f in new_thread_handler (sig=10) at arch/um/kernel/skas/process_kern.c:66
#50 <signal handler called>
#51 0x002ed402 in __kernel_vsyscall ()
#52 0x00000000 in ?? ()
(gdb)
|
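On the question of getting something like a ps listing: uml_mconsole has a sysrq request, and magic SysRq "t" asks a kernel to dump its task list. The following is only a sketch of wrapping that from the host. It assumes the guest kernel was built with magic SysRq support; the umid "gentoo" used in the comment is made up, and where the dump appears (the mconsole client or the guest console) may depend on the UML version.

```python
import subprocess


def mconsole_argv(umid, *request):
    """Build the argument list for one uml_mconsole request.

    umid is the guest's umid (hypothetical here, e.g. "gentoo").
    """
    if not umid:
        raise ValueError("umid must be a non-empty string")
    return ["uml_mconsole", umid, *request]


def show_tasks(umid):
    """Send magic SysRq 't' (task dump) to the guest.

    Returns the CompletedProcess; the task dump itself may show up on
    the guest console instead of in the captured output.
    """
    return subprocess.run(
        mconsole_argv(umid, "sysrq", "t"),
        capture_output=True,
        text=True,
        check=False,
    )
```

Calling show_tasks("gentoo") from the host while the guest sits at "Deactivating swap" would show which tasks still exist and in what state, without needing ps inside the guest.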
From: Jeff D. <jd...@ad...> - 2006-05-17 17:42:53
|
On Wed, May 17, 2006 at 11:12:22AM -0600, Rennie deGraaf wrote:
> but the UML
> kernel seems to be spending some of its time in an idle loop and some in
> the stack trace below.

Are you positive that it's hung, and not just taking a while to flush out dirty data? The fact that you see either the idle loop or writeout code suggests that it's doing writeout and isn't hung.

Jeff
|
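One rough way to tell slow writeout from a hang, assuming a shell on the guest is still reachable, is to sample the Dirty and Writeback counters in /proc/meminfo a few times. If the sum keeps shrinking, data is still being flushed; if it stays flat, that points away from plain writeout. The helper names below are made up for this sketch.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: value in kB}."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def writeout_pending_kb(text):
    """Return Dirty + Writeback in kB (fields missing from text count as 0)."""
    info = parse_meminfo(text)
    return info.get("Dirty", 0) + info.get("Writeback", 0)
```

In the guest this would be fed with open("/proc/meminfo").read(), sampled every few seconds.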
From: Rennie d. <ren...@gm...> - 2006-05-17 19:06:48
|
Jeff Dike wrote:
> On Wed, May 17, 2006 at 11:12:22AM -0600, Rennie deGraaf wrote:
>
>> but the UML
>> kernel seems to be spending some of its time in an idle loop and some in
>> the stack trace below.
>
> Are you positive that it's hung, and not just taking a while to flush
> out dirty data? The fact that you see either the idle loop or writeout
> code suggests that it's doing writeout and isn't hung.

That had occurred to me, but if so, then it's not flushing properly, since I've left it for half an hour without it finishing, I only have a 250 MB swap file, and all I did was boot and halt immediately. Immediately before halting, free (on the UML system) reported only 528 kB of swap in use. The swap file is sparse; du reports that only 62560 kB are allocated. The swap device (as reported from inside UML) is /dev/ubdb.

While this is going on, my CPU is pegged at 100%. The host system reports that the first linux process is using about 65%, and the third is using 25%. (Other processes on my system are using the remaining 10%.) top also reports that almost 50% of the CPU time is spent in the kernel (of the host system).

If I bring my system up without swap (by not specifying ubd1 on the command line), it shuts down properly. I tried loading UML with the argument "mem=128M" (as opposed to not specifying "mem=..."); if it never used the swap file, it shut down properly, but once I wrote and ran a program to force the system to swap (free reported 692 kB of swap in use at shutdown), it hung up again.

I discovered that running "swapoff -a" also hangs. I can still log into the system via ssh, so I obviously don't have a total lock-up. I can attach gdb to it, giving me the following stack trace:

#0 0x40000802 in _dl_sysinfo_int80 () from /lib/ld-linux.so.2
#1 0x400daf42 in swapoff () from /lib/tls/libc.so.6
#2 0x080498aa in ?? ()
#3 0x0804f078 in ?? ()
#4 0xbf643698 in ?? ()
#5 0x400caa31 in getopt_long () from /lib/tls/libc.so.6
#6 0x0804a204 in ?? ()
#7 0x00000002 in ?? ()
#8 0xbf6456d4 in ?? ()
#9 0x0804d209 in _IO_stdin_used ()
#10 0x0804ec80 in ?? ()
#11 0x00000000 in ?? ()
(gdb)

(Sorry, I didn't compile my system with debugging enabled.) When I try to resume the program after generating the stack trace, it immediately exits (with status -1), but swapon -s reports the same swap usage as before.

Rennie
|
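The sparse-file observation (a 250 MB swap file with only 62560 kB allocated according to du) can be checked on the host with a stat call. This is just a sketch: it assumes st_blocks is counted in 512-byte units, as it is on Linux, and the function name is invented here.

```python
import os


def file_sizes_kb(path):
    """Return (apparent_kb, allocated_kb) for the file at path.

    apparent is what ls -l shows; allocated is roughly what du shows.
    Assumes st_blocks is in 512-byte units (true on Linux).
    """
    st = os.stat(path)
    apparent_kb = st.st_size // 1024
    allocated_kb = st.st_blocks * 512 // 1024
    return apparent_kb, allocated_kb
```

A file whose allocated size is well below its apparent size is sparse, so writes into its holes make the host filesystem allocate blocks on the fly.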
From: Stephane B. <bor...@ni...> - 2006-05-18 11:58:52
|
On Wed, May 17, 2006 at 11:12:22AM -0600, Rennie deGraaf <ren...@gm...> wrote a message of 120 lines which said:

> I'm trying to build a UML guest system based on Gentoo and the
> 2.6.16.16 kernel. It seems to work, except that it always hangs on
> shutdown at the message " * Deactivating swap ...".

Exactly the same problem for me.
|
From: Anthony B. <br...@st...> - 2006-05-19 15:56:05
|
> On Wed, May 17, 2006 at 11:12:22AM -0600,
> Rennie deGraaf <ren...@gm...> wrote
> a message of 120 lines which said:
>
> > I'm trying to build a UML guest system based on Gentoo and the
> > 2.6.16.16 kernel. It seems to work, except that it always hangs on
> > shutdown at the message " * Deactivating swap ...".
>
> Exactly the same problem for me.

I have also seen this issue with two of my instances. However, it hasn't been consistent, and I haven't seen it with my other instances. I'm running a custom-compiled version of 2.6.15.1-bs1 on all of the instances in question.

Tony
|
From: Blaisorblade <bla...@ya...> - 2006-05-21 19:27:00
|
On Wednesday 17 May 2006 19:43, Jeff Dike wrote:
> On Wed, May 17, 2006 at 11:12:22AM -0600, Rennie deGraaf wrote:
> > but the UML
> > kernel seems to be spending some of its time in an idle loop and some in
> > the stack trace below.
>
> Are you positive that it's hung, and not just taking a while to flush
> out dirty data? The fact that you see either the idle loop or writeout
> code suggests that it's doing writeout and isn't hung.

When I do swapoff -a it also tends to hang (not the whole kernel, but the sys_swapoff code) - it frees most of the content and then keeps running to free the last few pages. Killing swapoff and retrying doesn't progress any further.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
___________________________________
Yahoo! Messenger with Voice: call from PC to phone at exclusive rates
http://it.messenger.yahoo.com
|
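The "frees most of the content and then stops progressing" behaviour could be confirmed by sampling the Used column of /proc/swaps while swapoff runs. A minimal sketch, assuming the device stays listed in /proc/swaps during swapoff; the function names and the sampling window are choices made for this example only.

```python
def parse_swaps(text):
    """Parse /proc/swaps-style text into {device: used_kb}."""
    used = {}
    for line in text.splitlines()[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) >= 4:
            used[fields[0]] = int(fields[3])
    return used


def is_stalled(samples):
    """True if every sample in the window is the same non-zero value."""
    return len(samples) >= 2 and samples[-1] > 0 and len(set(samples)) == 1
```

Collecting parse_swaps(open("/proc/swaps").read()) every few seconds and passing the last handful of values for /dev/ubdb to is_stalled would separate "slowly draining" from "stuck on the last few pages".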
From: Jeff D. <jd...@ad...> - 2006-05-21 21:59:23
|
On Sun, May 21, 2006 at 09:26:50PM +0200, Blaisorblade wrote:
> When I do swapoff -a it also tends to hang (not the whole kernel, but the
> sys_swapoff code) - it frees most of the content and then keeps running to
> free the last few pages. Killing swapoff and retrying doesn't progress any
> further.

This suggests a hang in the I/O path - like a request being issued, but the process behind it never being notified of it finishing.

Jeff
|
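To make the suspected failure mode concrete, here is a toy model in Python. It is not UML code and says nothing about where the real bug is: a worker "completes" every request except one, for which the completion signal is dropped, and the waiter on that request never wakes up unless it uses a timeout. All names are invented for the illustration.

```python
import threading


class Request:
    """A pretend block I/O request with a completion flag."""

    def __init__(self, sector):
        self.sector = sector
        self.done = threading.Event()


def io_thread(requests, drop_sector):
    """Service every request, but never signal completion for drop_sector."""
    for req in requests:
        if req.sector != drop_sector:
            req.done.set()


def find_stuck(requests, timeout):
    """Return the sectors whose completion was not seen within timeout seconds."""
    return [r.sector for r in requests if not r.done.wait(timeout)]


def demo():
    """Issue four requests, lose the completion for sector 2, report what is stuck."""
    requests = [Request(sector) for sector in range(4)]
    worker = threading.Thread(target=io_thread, args=(requests, 2))
    worker.start()
    worker.join()
    return find_stuck(requests, timeout=0.1)
```

Here demo() returns [2]: three requests finish normally and one waiter would block forever without the timeout, which matches the picture of swapoff getting through most pages and then sitting on the rest.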
From: Blaisorblade <bla...@ya...> - 2006-05-22 09:16:42
|
On Sunday 21 May 2006 23:59, Jeff Dike wrote:
> On Sun, May 21, 2006 at 09:26:50PM +0200, Blaisorblade wrote:
> > When I do swapoff -a it also tends to hang (not the whole kernel, but the
> > sys_swapoff code) - it frees most of the content and then keeps running
> > to free the last few pages. Killing swapoff and retrying doesn't progress
> > any further.
>
> This suggests a hang in the I/O path - like a request being issued, but the
> process behind it never being notified of it finishing.

The stack trace suggests that too (from memory), but why does it progress up to a certain point and then become unable to do anything? There's something more involved - a lock on a page, or on a swap entry (though I don't know of any locks on swap entries), maybe?

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
From: Anthony B. <br...@st...> - 2006-05-22 19:52:43
|
Quoting Blaisorblade <bla...@ya...>:

> The stack trace suggests that too (from memory), but why does it progress
> up to a certain point and then become unable to do anything? There's
> something more involved - a lock on a page, or on a swap entry (though I
> don't know of any locks on swap entries), maybe?

I don't have anything technical to add, but I suspect that I have been encountering this for a couple of years. The symptom goes back to 2.6.7 (my first 2.6 kernel). However, it has stopped occurring with all but two of my production instances.

Fortunately, this is a relatively minor issue that I've designed my systems to work around (kill the guest instance if it hangs at this point).

Tony
|
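A workaround of the kind Tony describes could look roughly like the sketch below: run the guest under a deadline and kill it if it has not exited by then. This is a guess at the general shape, not Tony's actual setup; the function name and any command line passed to it are assumptions, and since UML shows up as several host processes, killing only the one that was launched may not be enough on every setup.

```python
import subprocess


def run_with_deadline(argv, deadline_s):
    """Run argv and wait at most deadline_s seconds for it to exit.

    Returns the exit status, or None if the process was killed because
    it was still running at the deadline.
    """
    try:
        return subprocess.run(argv, timeout=deadline_s, check=False).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run has already killed and reaped the child here.
        return None
```

A caller that gets None back knows the guest was stuck and can log that before restarting it.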