From: antoine <an...@na...> - 2005-09-13 23:10:53
|
Hello list, I am back testing things, some initial results: * Some of the latest kernels I've built for x86 stop early in the boot. Here is a 2.6.14-rc1 TT guest: read(255, "./kernel.bin root=/dev/ubda mem="..., 330) = 130 rt_sigprocmask(SIG_BLOCK, [INT CHLD], [], 8) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID| SIGCHLD, child_tidptr=0xb7e70928) = 922 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 rt_sigprocmask(SIG_BLOCK, [CHLD], [], 8) = 0 rt_sigaction(SIGINT, {0x8078320, [], 0}, {SIG_DFL}, 8) = 0 waitpid(-1, Checking for /proc/mm...found Checking for the skas3 patch in the host...found UML running in SKAS3 mode Checking PROT_EXEC mmap in /tmp...OK Kernel virtual memory size shrunk to 28311552 bytes [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0) = 922 rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 --- SIGCHLD (Child exited) @ 0 (0) --- waitpid(-1, 0xbf8c7e6c, WNOHANG) = -1 ECHILD (No child processes) sigreturn() = ? (mask now []) rt_sigaction(SIGINT, {SIG_DFL}, {0x8078320, [], 0}, 8) = 0 rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0 read(255, "", 330) = 0 exit_group(1) = ? # uname -a Linux localhost 2.6.13.1-skas3-v9-pre7 #3 Sat Sep 10 20:35:26 BST 2005 i686 AMD Athlon(tm) XP 3200+ unknown GNU/Linux What's this about shrinking vm size? (reducing the mem gets rid of this warning) - Google found some dead links. I also tried mode=tt and mode=skas0 with the same result. I've also had kernels booting up to the point of mounting root and then spinning at 100% cpu usage. * Next one: Not sure if I am supposed to be able to strace a TT kernel, but when I do (this is on another system that breaks) here is what I get (end of long log only). Kernel panic - not syncing: Kernel mode fault at addr 0x8c2420, ip 0x8c2420 [42949374.400000] ReiserFS: ubda: Using r5 hash to sort names [42949374.400000] VFS: Mounted root (reiserfs filesystem) readonly. 
waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGALRM}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGALRM) = 0 --- SIGCHLD (Child exited) @ 0 (0) --- waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && 
WSTOPSIG(s) == SIGIO}], WSTOPPED) = 2037 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_CONT, 2037, 0, SIGIO) = 0 waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGUSR1}], WSTOPPED) = 2039 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_ATTACH, 2065, 0, 0) = 0 ptrace(PTRACE_CONT, 2065, 0, SIG_0) = 0 waitpid(2065, NULL, WSTOPPED) = 2065 --- SIGCHLD (Child exited) @ 0 (0) --- ptrace(PTRACE_GETREGS, 2039, 0, 0xbfdc2f50) = 0 kill(2039, SIGKILL) = 0 ptrace(PTRACE_KILL, 2039, 0, 0xbfdc2f50) = 0 [42949374.410000] Kernel panic - not syncing: Kernel mode fault at addr 0x1a8420, ip 0x1a8420 [42949374.410000] [42949374.410000] EIP: 0073:[<001a8420>] CPU: 0 Not tainted ESP: 007b:b022310c EFLAGS: 00010296 [42949374.410000] Not tainted [42949374.410000] EAX: a03bae68 EBX: 00000001 ECX: 00000005 EDX: 00000000 [42949374.410000] ESI: 00000008 EDI: b022354c EBP: b02233ec DS: 007b ES: 007b [42949374.410000] b0222c30: [<a0040d93>] show_regs+0x113/0x140 [42949374.410000] b0222c50: [<a001948c>] panic_exit+0x2c/0x50 [42949374.410000] b0222c60: [<a005471d>] notifier_call_chain+0x2d/0x50 [42949374.410000] b0222c80: [<a0044e72>] panic+0x72/0x110 [42949374.410000] b0222ca0: [<a00189b4>] segv+0x274/0x2b0 [42949374.410000] b0222d90: [<a0018c9e>] segv_handler+0x8e/0x90 [42949374.410000] b0222dc0: [<a001c297>] sig_handler_common_tt +0xb7/0x150 [42949374.410000] b0222e20: [<a003cd48>] sig_handler+0x18/0x20 [42949374.410000] b0222e30: [<001a8420>] 0x1a8420 [42949374.410000] b02233f0: [<a00166e2>] change_signals+0x62/0x90 [42949374.410000] b0223490: [<a0016742>] unblock_signals+0x12/0x20 [42949374.410000] b02234a0: [<a015260b>] generic_unplug_device +0x1b/0x20 [42949374.410000] b02234b0: [<a015262d>] blk_backing_dev_unplug +0x1d/0x20 [42949374.410000] b02234c0: [<a00878a2>] sync_buffer+0x42/0x50 [42949374.410000] b02234d0: [<a023a966>] __wait_on_bit+0x66/0x70 [42949374.410000] b02234f0: [<a023a9f4>] out_of_line_wait_on_bit +0x84/0x90 [42949374.410000] b0223580: [<a0087948>] 
__wait_on_buffer+0x38/0x40 [42949374.410000] b0223590: [<a00e357e>] search_by_key+0xee/0xe10 [42949374.410000] b02236d0: [<a00c979e>] search_by_entry_key+0x2e/0x230 [42949374.410000] b0223710: [<a00c9d60>] reiserfs_find_entry+0x90/0x130 [42949374.410000] b0223770: [<a00c9e7b>] reiserfs_lookup+0x7b/0x170 [42949374.410000] b0223860: [<a00940bc>] real_lookup+0xbc/0xe0 [42949374.410000] b0223880: [<a0094464>] do_lookup+0x94/0xa0 [42949374.410000] b02238b0: [<a0094c9c>] __link_path_walk+0x82c/0x1070 [42949374.410000] b02239d0: [<a0095522>] link_path_walk+0x42/0xf0 [42949374.410000] b0223a50: [<a00958c5>] path_lookup+0xa5/0x1e0 [42949374.410000] b0223ab0: [<a0090d28>] open_exec+0x28/0xf0 [42949374.410000] b0223b30: [<a0091e24>] do_execve+0x44/0x220 [42949374.410000] b0223b60: [<a00118d8>] execve1+0x38/0x80 [42949374.410000] b0223b90: [<a0011942>] um_execve+0x22/0x60 [42949374.410000] b0223bb0: [<a00111bc>] run_init_process+0x4c/0x80 [42949374.410000] b0223be0: [<a00112c4>] init+0xd4/0x170 [42949374.410000] b0223c00: [<a003ccf9>] run_kernel_thread+0x49/0x50 [42949374.410000] b0223cd0: [<a001a7cb>] new_thread_handler+0x14b/0x180 [42949374.410000] b0223d20: [<001a8420>] 0x1a8420 [42949374.410000] [42949374.410000] Failed to restore terminal state - errno = 1 tracing thread pid = 2033 # uname -a Linux mamba 2.6.12-skas3-v9-pre4 #2 Thu Jun 23 16:28:29 GMT i686 AMD Athlon(tm) XP 2000+ AuthenticAMD GNU/Linux I tried the same filesystem as ext3 but that made no difference. Guest is 2.6.14-rc1 Same kernel in skas3/skas0 works occasionally! But when it does not: [42949374.340000] VFS: Mounted root (ext3 filesystem) readonly. [42949384.250000] BUG: soft lockup detected on CPU#0! 
[42949384.250000] [42949384.250000] EIP: 0073:[<400007c0>] CPU: 0 Not tainted ESP: 007b:bffdde70 EFLAGS: 00000202 [42949384.250000] Not tainted [42949384.250000] EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000 [42949384.250000] ESI: 00000000 EDI: 00000000 EBP: 00000000 DS: 007b ES: 007b [42949384.250000] b1b071f8: [<a0046574>] show_regs+0x214/0x220 [42949384.250000] b1b07228: [<a006e6d7>] softlockup_tick+0x57/0x60 [42949384.250000] b1b07248: [<a00582a7>] do_timer+0x47/0xd0 [42949384.250000] b1b07258: [<a00181c4>] um_timer+0x14/0x50 [42949384.250000] b1b07268: [<a006e873>] handle_IRQ_event+0x33/0x80 [42949384.250000] b1b07298: [<a006e915>] __do_IRQ+0x55/0xb0 [42949384.250000] b1b072c8: [<a0012180>] do_IRQ+0x30/0x40 [42949384.250000] b1b072d8: [<a0018113>] timer_irq+0x113/0x170 [42949384.250000] b1b07308: [<a00184d0>] timer_handler+0x70/0x90 [42949384.250000] b1b07328: [<a001fef3>] sig_handler_common_skas +0x93/0xf0 [42949384.250000] b1b07358: [<a0040d9c>] alarm_handler+0x5c/0x70 [42949384.250000] b1b07378: [<002e4420>] 0x2e4420 [42949384.250000] b1b07668: [<a0018d3c>] flush_tlb_kernel_range_common +0xbc/0x170 [42949384.250000] b1b07698: [<a0018f6e>] flush_tlb_kernel_vm+0x2e/0x30 [42949384.250000] b1b076a8: [<a0019508>] segv+0x258/0x2b0 [42949384.250000] b1b07798: [<a001984f>] segv_handler+0xaf/0x100 [42949384.250000] b1b077c8: [<a001fef3>] sig_handler_common_skas +0x93/0xf0 [42949384.250000] b1b077f8: [<a0040d35>] sig_handler+0x35/0x40 [42949384.250000] b1b07808: [<002e4420>] 0x2e4420 [42949384.250000] b1b07b20: [<a0143dc6>] snprintf+0x26/0x30 [42949384.250000] b1b07b40: [<a0019cfd>] set_cmdline+0x9d/0x100 [42949384.250000] b1b07b70: [<a001194b>] execve1+0x7b/0x80 [42949384.250000] b1b07ba0: [<a0011972>] um_execve+0x22/0x60 [42949384.250000] b1b07bc0: [<a00111bc>] run_init_process+0x4c/0x80 [42949384.250000] b1b07bf0: [<a00112b8>] init+0xc8/0x170 [42949384.250000] b1b07c10: [<a0040cc9>] run_kernel_thread+0x49/0x50 [42949384.250000] b1b07ce0: [<a001f593>] 
new_thread_handler+0xc3/0x120
[42949384.250000] b1b07d20: [<002e4420>] 0x2e4420
[42949384.250000]

* Good points:
pcap works really well.
I just wish there were a way to easily figure out which libraries need to be included in the chroot to make it work (beyond libpcap).

* Some other small issues:
when building IPv6 & pcap, I get:
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/../../../libc.a(in6_addr.o)(.rodata+0x10): multiple definition of `in6addr_loopback'
(This has been the case with the last few releases)

Hope this helps. As usual, let me know what I can do to help.
Cheers
Antoine
|
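Since the traces in this message arrive as one long run of text, a small script can help pull the call chain back out. The following is only a minimal sketch: the regular expression is an assumption based on the frame layout visible in the logs above (`stack address: [<ip>] symbol+offset/size`), and `parse_frames` is a hypothetical helper name, not part of any UML tool.

```python
import re

# Matches one frame such as "b0222c30: [<a0040d93>] show_regs+0x113/0x140".
# Some frames in the logs were wrapped by the mailer, leaving a space before
# the "+offset" part, so optional whitespace is allowed there.
FRAME_RE = re.compile(
    r"(?P<sp>[0-9a-f]{8}): \[<(?P<ip>[0-9a-f]{8})>\] "
    r"(?P<sym>[A-Za-z_]\w*)\s*\+(?P<off>0x[0-9a-f]+)/(?P<size>0x[0-9a-f]+)"
)


def parse_frames(text):
    """Return (symbol, offset, size) for every symbolised frame in text.

    Frames with no symbol (for example "[<001a8420>] 0x1a8420") are skipped.
    """
    return [
        (m["sym"], int(m["off"], 16), int(m["size"], 16))
        for m in FRAME_RE.finditer(text)
    ]


if __name__ == "__main__":
    sample = (
        "[42949374.410000] b0222ca0: [<a00189b4>] segv+0x274/0x2b0 "
        "[42949374.410000] b0222dc0: [<a001c297>] sig_handler_common_tt +0xb7/0x150 "
        "[42949374.410000] b0222e30: [<001a8420>] 0x1a8420"
    )
    for symbol, offset, size in parse_frames(sample):
        print(f"{symbol} +{offset:#x} of {size:#x}")
```

Printing one frame per line this way makes it easier to see where the fault interrupts the normal call chain.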
From: antoine <an...@na...> - 2005-09-14 20:00:48
|
On some setups, when trying to bring up a pcap interface I get (in dmesg):
[42949414.170000] dev_ip_addr - device not assigned an IP address
[42949414.170000] pcap_open : pcap_compile failed - 'syntax error'
[42949414.170000] device eth1 entered promiscuous mode
[42949414.170000] dev_ip_addr - device not assigned an IP address
[42949414.170000] pcap_open : pcap_compile failed - 'syntax error'

Any clues? What's this syntax error?
Thanks
Antoine
|
From: Blaisorblade <bla...@ya...> - 2005-09-14 20:08:25
|
On Wednesday 14 September 2005 22:19, antoine wrote:
> On some setups, when trying to bring up a pcap interface I get (in
> dmesg):
> [42949414.170000] dev_ip_addr - device not assigned an IP address

I'd probably worry more about this one, especially because you say "on some setups". It should refer to the host interface (I don't know which one).

> [42949414.170000] pcap_open : pcap_compile failed - 'syntax error'

This could be bogus.

> [42949414.170000] device eth1 entered promiscuous mode
> [42949414.170000] dev_ip_addr - device not assigned an IP address
> [42949414.170000] pcap_open : pcap_compile failed - 'syntax error'
>
> Any clues? What's this syntax error?
> Thanks
> Antoine

The error would be in the PCAP filter expression (there's a parameter for that), assuming the message is correct.

If you find no other explanation, I'll check whether UML is passing a bogus string there. Check the rest first, though.
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
From: antoine <an...@na...> - 2005-09-14 20:25:32
|
> > [42949414.170000] dev_ip_addr - device not assigned an IP address

Yes, I was wondering about this one. The device has got an IP assigned and it is up...

> I'd care more to this one probably. Especially because you say "on some
> setups". It should refer to the host interface (don't know which one).
> > [42949414.170000] pcap_open : pcap_compile failed - 'syntax error'
> This could be bogus.

Actually it isn't. This forced me to check my syntax...

> > [42949414.170000] device eth1 entered promiscuous mode
> > [42949414.170000] dev_ip_addr - device not assigned an IP address
> > [42949414.170000] pcap_open : pcap_compile failed - 'syntax error'
> It'd be on the PCAP filter expression (there's a parameter for that), assuming
> the message is correct.
>
> If you find no other explaination, I'll look if UML is passing a bogus string
> there - however check the rest first.

Thanks. Found it: the startup scripts on the broken setups were passing a 'null' filter expression (from a database) rather than leaving it empty.
Works fine now. Sorry for the line noise...
Antoine
|
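For anyone generating UML command lines from a database, the failure described above can be avoided by normalising the filter value before it reaches the pcap transport. This is a minimal sketch only: `normalize_filter` is a hypothetical helper, and the assumption that a missing value is rendered as the text "null" (or "None") comes from the description in this message, not from any UML or libpcap behaviour.

```python
def normalize_filter(value):
    """Map database placeholders for "no filter" to an empty expression."""
    if value is None:
        return ""
    text = str(value).strip()
    # Assumption: the database layer renders a missing value as "null"/"None".
    # Neither word is a meaningful capture filter, so treat both as "no filter".
    if text.lower() in ("null", "none"):
        return ""
    return text


if __name__ == "__main__":
    for raw in (None, "null", "  ", "tcp port 80"):
        print(repr(raw), "->", repr(normalize_filter(raw)))
```

A real filter such as `tcp port 80` passes through untouched, so only the placeholder values change.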
From: Blaisorblade <bla...@ya...> - 2005-09-16 19:26:36
Attachments:
uml-fix-conflict-libc-ipv6
|
On Wednesday 14 September 2005 01:25, antoine wrote:
> Hello list,
> I am back testing things, some initial results:
> * Some of the latest kernels I've built for x86 stop early in the boot.
> Here is a 2.6.14-rc1 TT guest:
> waitpid(-1, Checking for /proc/mm...found
> Checking for the skas3 patch in the host...found
> UML running in SKAS3 mode
> Checking PROT_EXEC mmap in /tmp...OK
> Kernel virtual memory size shrunk to 28311552 bytes

This is running in SKAS3 mode (see the message). However, TT may still be the problem, in the sense that you'd probably better disable it.

> [{WIFEXITED(s) && WEXITSTATUS(s) == 1}], 0) = 922
> rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
> --- SIGCHLD (Child exited) @ 0 (0) ---
> waitpid(-1, 0xbf8c7e6c, WNOHANG) = -1 ECHILD (No child processes)
> sigreturn() = ? (mask now [])
> rt_sigaction(SIGINT, {SIG_DFL}, {0x8078320, [], 0}, 8) = 0
> rt_sigprocmask(SIG_BLOCK, NULL, [], 8) = 0
> read(255, "", 330) = 0
> exit_group(1) = ?
> # uname -a
> Linux localhost 2.6.13.1-skas3-v9-pre7 #3 Sat Sep 10 20:35:26 BST 2005
> i686 AMD Athlon(tm) XP 3200+ unknown GNU/Linux
> What's this about shrinking vm size? (reducing the mem gets rid of this
> warning)

By how much? If you tried to pass it 1G of mem, it's OK that it complains (when TT mode is enabled and HIGHMEM is disabled, that doesn't work; disabling TT should fix this).

> - Google found some dead links.

Hmm, problems with memory layout on the host. Disabling TT mode at compile time will likely avoid it.

> I also tried mode=tt and mode=skas0 with the same result.
> I've also had kernels booting up to the point of mounting root and then
> spinning at 100% cpu usage.

I had something like this here too, but some random fixes resolved it. We'll see, though.

> * Next one:
> Not sure if I am supposed to be able to strace a TT kernel,

IIRC you are supposed to be able to, but you will only get uninteresting things (because to get the real stuff you should use the ptrace proxy, via the debug=<pid> thing.
See the webpage about how to apply this to strace).

> but when I
> do (this is on another system that breaks) here is what I get (end of
> long log only).
> Kernel panic - not syncing: Kernel mode fault at addr 0x8c2420, ip
> 0x8c2420
> [42949374.400000] ReiserFS: ubda: Using r5 hash to sort names
> [42949374.400000] VFS: Mounted root (reiserfs filesystem) readonly.
> waitpid(-1, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGALRM}], WSTOPPED) =
> 2037
> [42949374.410000] Kernel panic - not syncing: Kernel mode fault at addr
> 0x1a8420, ip 0x1a8420
> [42949374.410000]
> [42949374.410000] EIP: 0073:[<001a8420>] CPU: 0 Not tainted ESP:
> 007b:b022310c EFLAGS: 00010296
> [42949374.410000] Not tainted
> [42949374.410000] EAX: a03bae68 EBX: 00000001 ECX: 00000005 EDX:
> 00000000
> [42949374.410000] ESI: 00000008 EDI: b022354c EBP: b02233ec DS: 007b ES:
> 007b

It seems change_signals is the culprit, which is impossible. Or it re-enabled SIGSEGV, which is impossible too, since the process gets killed when SIGSEGV is blocked.
> [42949374.410000] b02233f0: [<a00166e2>] change_signals+0x62/0x90 > [42949374.410000] b0223490: [<a0016742>] unblock_signals+0x12/0x20 > [42949374.410000] b02234a0: [<a015260b>] generic_unplug_device > +0x1b/0x20 > [42949374.410000] b02234b0: [<a015262d>] blk_backing_dev_unplug > +0x1d/0x20 > [42949374.410000] b02234c0: [<a00878a2>] sync_buffer+0x42/0x50 > [42949374.410000] b02234d0: [<a023a966>] __wait_on_bit+0x66/0x70 > [42949374.410000] b02234f0: [<a023a9f4>] out_of_line_wait_on_bit > +0x84/0x90 > [42949374.410000] b0223580: [<a0087948>] __wait_on_buffer+0x38/0x40 > [42949374.410000] b0223590: [<a00e357e>] search_by_key+0xee/0xe10 > [42949374.410000] b02236d0: [<a00c979e>] search_by_entry_key+0x2e/0x230 > [42949374.410000] b0223710: [<a00c9d60>] reiserfs_find_entry+0x90/0x130 > [42949374.410000] b0223770: [<a00c9e7b>] reiserfs_lookup+0x7b/0x170 > [42949374.410000] b0223860: [<a00940bc>] real_lookup+0xbc/0xe0 > [42949374.410000] b0223880: [<a0094464>] do_lookup+0x94/0xa0 > [42949374.410000] b02238b0: [<a0094c9c>] __link_path_walk+0x82c/0x1070 > [42949374.410000] b02239d0: [<a0095522>] link_path_walk+0x42/0xf0 > [42949374.410000] b0223a50: [<a00958c5>] path_lookup+0xa5/0x1e0 > [42949374.410000] b0223ab0: [<a0090d28>] open_exec+0x28/0xf0 > [42949374.410000] b0223b30: [<a0091e24>] do_execve+0x44/0x220 > [42949374.410000] b0223b60: [<a00118d8>] execve1+0x38/0x80 > [42949374.410000] b0223b90: [<a0011942>] um_execve+0x22/0x60 > [42949374.410000] b0223bb0: [<a00111bc>] run_init_process+0x4c/0x80 > [42949374.410000] b0223be0: [<a00112c4>] init+0xd4/0x170 > [42949374.410000] b0223c00: [<a003ccf9>] run_kernel_thread+0x49/0x50 > [42949374.410000] b0223cd0: [<a001a7cb>] new_thread_handler+0x14b/0x180 > [42949374.410000] b0223d20: [<001a8420>] 0x1a8420 > [42949374.410000] > [42949374.410000] Failed to restore terminal state - errno = 1 > tracing thread pid = 2033 > # uname -a > Linux mamba 2.6.12-skas3-v9-pre4 #2 Thu Jun 23 16:28:29 GMT i686 AMD > Athlon(tm) XP 2000+ 
AuthenticAMD GNU/Linux
> I tried the same filesystem as ext3 but that made no difference.
> Guest is 2.6.14-rc1
> Same kernel in skas3/skas0 works occasionally! But when it does not:
> [42949374.340000] VFS: Mounted root (ext3 filesystem) readonly.
> [42949384.250000] BUG: soft lockup detected on CPU#0!

Does the box lock up thereafter? Jeff has been seeing these for some time, but IIRC the box didn't lock up. (Jeff, what's the actual situation?)

Also, the warning is given when a certain thread, which exists only for this purpose, is not allowed to run for more than 10 seconds (see the description in commit 8446f1d391f3d27e6bf9c43d4cbcdac0ca720417). That is, a really bad load (or scheduler problems, including the fact that we are not preemptible) could cause this. When this thread is started, this message is printed:

softlockup thread 0 started up.

> * Good points:
> pcap works really well.
> I just wished there was a way to easily figure out which libraries need
> to be included in the chroot to make it work (beyond lipcap)

Idea: try using ltrace with a focus on dlopen (from libdl).

> * Some other small issues:
> when building IPv6 & pcap, I get:
> /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/../../../libc.a(in6_addr.o)(.rodat
>a+0x10): multiple definition of `in6addr_loopback'
> (This has been the case with the last few releases)

It's a name conflict with libc... ugh, which means adding another -D like -Derrno=kernel_errno in the same place. It's trivial, but it's getting boring.
The fix is attached.

A question: is it OK for you to be mentioned in the patch changelog? It was suggested to do so to give a bit of reward to testers. I just don't know whether I should, and secondly whether I should include your email.

> Hope this helps, as usual - let me know what I can do to help
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
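The ltrace suggestion above could be turned into a library list roughly as follows. This is a sketch under assumptions: it expects ltrace output saved to a file (for instance from something like `ltrace -e dlopen <program>`), it assumes each call is logged in the usual `dlopen("name", flags) = result` form, and the library names in the sample are illustrative rather than taken from a real run. `dlopened_libraries` is a hypothetical helper name.

```python
import re

# Assumed ltrace line shape: dlopen("libfoo.so.1", 1) = 0x804b018
DLOPEN_RE = re.compile(r'dlopen\("([^"]+)"')


def dlopened_libraries(ltrace_output):
    """Return the distinct library names passed to dlopen, in first-seen order."""
    seen = []
    for name in DLOPEN_RE.findall(ltrace_output):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    # Illustrative sample only; real output depends on the program traced.
    sample = (
        'dlopen("libnss_files.so.2", 1) = 0x804b018\n'
        'dlopen("libnss_dns.so.2", 1) = 0x804c2a0\n'
        'dlopen("libnss_files.so.2", 1) = 0x804b018\n'
    )
    for lib in dlopened_libraries(sample):
        print(lib)
```

Libraries linked in the ordinary way would still need to be listed separately; this only covers those loaded at run time through dlopen.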
From: antoine <an...@na...> - 2005-09-16 19:43:16
|
> > UML running in SKAS3 mode > > Checking PROT_EXEC mmap in /tmp...OK > > Kernel virtual memory size shrunk to 28311552 bytes > This is running in SKAS3 mode - see message. However, the problem may be TT in > the meaning you'd better probably disable it. I enabled both to make it easier to test each mode, oh well... Noted. > > What's this about shrinking vm size? (reducing the mem gets rid of this > > warning) > By how much? If you tried to pass it 1G of mem it's ok it complains (when TT > mode is enabled and HIGHMEM disabled it doesn't work - disabling TT should > fix this). HIGHMEM was disabled, but mem=512M so this shouldn't matter, right? It stopped complaining at about mem=200M (roughly IIRC) > > # uname -a > > Linux mamba 2.6.12-skas3-v9-pre4 #2 Thu Jun 23 16:28:29 GMT i686 AMD > > Athlon(tm) XP 2000+ AuthenticAMD GNU/Linux > > I tried the same filesystem as ext3 but that made no difference. > > Guest is 2.6.14-rc1 > > > Same kernel in skas3/skas0 works occasionally! But when it does not: > > > [42949374.340000] VFS: Mounted root (ext3 filesystem) readonly. > > [42949384.250000] BUG: soft lockup detected on CPU#0! > Does the box locks up thereafter? Jeff has been seeing these for some time, > but IIRC the box didn't lock up. (Jeff, what's the actual situation?). The host is fine... so far. > > * Good points: > > pcap works really well. > > I just wished there was a way to easily figure out which libraries need > > to be included in the chroot to make it work (beyond lipcap) > Idea: try using ltrace with focus on dlopen (from libdl). I'll do that and post the results. > > * Some other small issues: > > when building IPv6 & pcap, I get: > > /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/../../../libc.a(in6_addr.o)(.rodat > >a+0x10): multiple definition of `in6addr_loopback' > > > (This has been the case with the last few releases) > It's a name conflict with libc... ugh, means adding another -D like > -Derrno=kernel_errno in the same place. 
It's trivial, but it's getting boring.
> Attached the fix.
>
> A question: is it ok to be mentioned in the patch changelog? It was suggested
> to do so to give a bit of reward to testers - just I don't know if I should
> or not, and secondly if I should put your email or not.

Anything for credits ;-)
BTW, I can't contribute patches (I can code in C, but not kernel code) - however I can provide free hosting, disk space, etc. to all the UML hackers that ask.
Feel free to include my email; in the worst case it will train my spam filters!
Antoine
|
From: Blaisorblade <bla...@ya...> - 2005-09-17 15:43:39
|
On Friday 16 September 2005 22:12, antoine wrote:
> > > UML running in SKAS3 mode
> > > Checking PROT_EXEC mmap in /tmp...OK
> > > Kernel virtual memory size shrunk to 28311552 bytes
> >
> > This is running in SKAS3 mode - see message. However, the problem may be
> > TT in the meaning you'd better probably disable it.
> I enabled both to make it easier to test each mode, oh well... Noted.
> > > What's this about shrinking vm size? (reducing the mem gets rid of this
> > > warning)
> > By how much? If you tried to pass it 1G of mem it's ok it complains (when
> > TT mode is enabled and HIGHMEM disabled it doesn't work - disabling TT
> > should fix this).
>
> HIGHMEM was disabled, but mem=512M so this shouldn't matter, right?

I've read from Jeff that the actual limit is 480M (some space must be saved for vmalloc(), i.e. kernel pageable memory).

> It stopped complaining at about mem=200M (roughly IIRC)

Well, that's too low, so yes, there's some problem, especially because memory is shrunk down to 28M...

Which distro is the host running? A RH/FC host could be a cause (I don't know the current situation with these).

> > > # uname -a
> >
> > Does the box locks up thereafter? Jeff has been seeing these for some
> > time, but IIRC the box didn't lock up. (Jeff, what's the actual
> > situation?).
> The host is fine... so far.

Sorry, I was talking about UML. But I assume it locks up. Was SMP enabled? In that case, there are a couple of things in -bs1 (just published) which may help.

> > > * Good points:
> > > pcap works really well.
> > > I just wished there was a way to easily figure out which libraries need
> > > to be included in the chroot to make it work (beyond lipcap)
> > Idea: try using ltrace with focus on dlopen (from libdl).
> I'll do that and post the results.
> > > * Some other small issues:
> > > when building IPv6 & pcap, I get:
> > > /usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.6/../../../libc.a(in6_addr.o)(.r
> > >odat a+0x10): multiple definition of `in6addr_loopback'
> > >
> > > (This has been the case with the last few releases)
>
> Anything for credits ;-)
> BTW, I can't contribute patches (I can code in C, but not kernel code) -
> however I can provide free hosting, disk space, etc to all the uml
> hackers that ask.
> Feel free include my email, in the worst case it will train my spam
> filters!
> Antoine
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
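To put the numbers from this exchange side by side, here is a small worked conversion. It is only a sketch, and it assumes that the "M" in mem=512M and in the 480M limit means MiB (1024 * 1024 bytes); the figures themselves are the ones quoted in the thread.

```python
MIB = 1024 * 1024  # assumption: "M" in the thread means MiB


def to_mib(size_bytes):
    """Convert a byte count to MiB."""
    return size_bytes / MIB


shrunk = 28311552        # "Kernel virtual memory size shrunk to 28311552 bytes"
requested = 512 * MIB    # mem=512M from the command line
limit = 480 * MIB        # limit attributed to Jeff in this message

print(to_mib(shrunk))       # 27.0, i.e. the "28M" mentioned above
print(requested > limit)    # True: 512M exceeds the quoted 480M limit
```

So the request was above the quoted limit, but the shrunk size is far below both, which matches the remark that something else is wrong.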
From: Antoine M. <an...@na...> - 2005-09-17 18:11:51
|
> > It stopped complaining at about mem=200M (roughly IIRC) > Well, that's too low, so yes, there's some problem. Especially because memory > is shrunk down to 28M... > > Which is the host distro? Having a RH/Fc could be a cause (I don't know the > current situation with these). That box is a Mandriva system, why would the distribution be relevant? I thought things like memory allocation would depend on the kernel (or glibc malloc?) - host kernel is 2.6.13.1 > > The host is fine... so far. > Sorry, was talking about UML. But I assume it locks up. Not always. > Was SMP enabled? No. > In > that case, there's a couple of things in -bs1 (just published) which may help > it. I'll try that. > > > > * Good points: > > > > pcap works really well. > > > > I just wished there was a way to easily figure out which libraries need > > > > to be included in the chroot to make it work (beyond lipcap) > > > > Idea: try using ltrace with focus on dlopen (from libdl). > > > I'll do that and post the results. Not tried it yet, but I found this which may be of interest - literally thousands of these stacktraces in the logs. I believe the interface that pcap was bound to was restarted - but I am not sure this is the cause. 
After that the interface refuses to be brought back up (inside the guest), only a guest reboot does it, and dmesg has this message too: [43003437.440000] dev_ip_addr - device not assigned an IP address [43003437.440000] pcap_dispatch failed - recvfrom: Network is down Here is the repeated message: [42977981.360000] Badness in local_bh_enable at kernel/softirq.c:140 [42977981.360000] a037f6c0: [<a001c4de>] dump_stack+0x1e/0x20 [42977981.360000] a037f6e0: [<a0053fcc>] local_bh_enable+0x6c/0x90 [42977981.360000] a037f710: [<a0230caa>] __dev_remove_pack+0x7a/0xa0 [42977981.360000] a037f720: [<a0293b93>] packet_notifier+0xd3/0x130 [42977981.360000] a037f750: [<a005df9e>] notifier_call_chain+0x1e/0x40 [42977981.360000] a037f780: [<a0231689>] dev_close+0x89/0xa0 [42977981.360000] a037f7b0: [<a0039058>] uml_net_interrupt+0x68/0x70 [42977981.360000] a037f7d0: [<a006e006>] handle_IRQ_event+0x36/0x90 [42977981.360000] a037f800: [<a006e0fc>] __do_IRQ+0x9c/0xf0 [42977981.360000] a037f820: [<a0016fdf>] do_IRQ+0x2f/0x40 [42977981.360000] a037f830: [<a0017534>] sigio_handler+0xd4/0x110 [42977981.360000] a037f860: [<a0023c78>] sig_handler_common_skas +0xa8/0x130 [42977981.360000] a037f890: [<a0040b9f>] sig_handler+0x2f/0x40 [42977981.360000] a037f8b0: [<ffffe420>] ifaddrs+0x5fa477e0/0x4 [42977981.360000] a037fbb0: [<a0019a3a>] default_idle+0x5a/0x80 [42977981.360000] a037fbe0: [<a0023724>] init_idle_skas+0x24/0x30 [42977981.360000] a037fbf0: [<a00015d1>] start_kernel+0x181/0x1c0 [42977981.360000] a037fc00: [<a002375b>] start_kernel_proc+0x2b/0x30 [42977981.360000] a037fc10: [<a00194bf>] run_kernel_thread+0x2f/0x40 [42977981.360000] a037fcd0: [<a0023401>] new_thread_handler+0xb1/0x110 [42977981.360000] a037fd20: [<ffffe420>] ifaddrs+0x5fa477e0/0x4 [42977981.360000] Another query, unrelated (I think): what does this mean? [42949410.920000] uml_net_start_xmit: failed(-1) Last, is it possible that there is a connection leak in the socket code? 
I've experienced problems running tomcat and ntop in UML guests (but not apache...): after a while the processes are still running but do not respond to SYN packets, although netstat and lsof still list the processes as listening on the port. I think I'll prepare a root_fs instance for you to try out.
The problem with ntop could be linked to the example above of the interface disappearing for a short while, and as for Java it could be many things...

Cheers
Antoine
|
From: Jeff D. <jd...@ad...> - 2005-09-17 19:47:53
|
On Sat, Sep 17, 2005 at 07:15:19PM +0100, Antoine Martin wrote:
> Here is the repeated message:
> [42977981.360000] Badness in local_bh_enable at kernel/softirq.c:140
> [42977981.360000] a037f6c0: [<a001c4de>] dump_stack+0x1e/0x20
> [42977981.360000] a037f6e0: [<a0053fcc>] local_bh_enable+0x6c/0x90
> [42977981.360000] a037f710: [<a0230caa>] __dev_remove_pack+0x7a/0xa0
> [42977981.360000] a037f720: [<a0293b93>] packet_notifier+0xd3/0x130
> [42977981.360000] a037f750: [<a005df9e>] notifier_call_chain+0x1e/0x40
> [42977981.360000] a037f780: [<a0231689>] dev_close+0x89/0xa0
> [42977981.360000] a037f7b0: [<a0039058>] uml_net_interrupt+0x68/0x70

Looks like dev_close shouldn't be called from an interrupt, since it enables interrupts.

Jeff
|
From: Antoine M. <an...@na...> - 2005-09-17 18:30:22
|
> Sorry, was talking about UML. But I assume it locks up. Was SMP enabled? In
> that case, there's a couple of things in -bs1 (just published) which may help
> it.

Initial testing (on just one x86 box so far) reveals that the kernel stops booting right after:
Checking PROT_EXEC mmap in /tmp...OK
only when using:
2.6.12-bs11
2.6.13-bs1
but not when using 2.6.12-bb12

Hope that helps...
Antoine
|
From: Blaisorblade <bla...@ya...> - 2005-09-18 11:31:49
|
On Saturday 17 September 2005 20:34, Antoine Martin wrote:
> > Sorry, was talking about UML. But I assume it locks up. Was SMP enabled?
> > In that case, there's a couple of things in -bs1 (just published) which
> > may help it.
> Initial testing reveals (one just one x86 box so far), that the kernel
> stops booting right after:

Add stderr=1 to see the rest...

> Checking PROT_EXEC mmap in /tmp...OK
> only when using:
> 2.6.12-bs11
> 2.6.13-bs1
> but not using 2.6.12-bb12

Since the only difference between -bb12 and -bs11 is SKAS0, in which modes are you running those kernels? Use the "UML running in XXX mode" message to make sure.

> Hope that helps...
> Antoine
--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
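When comparing several kernels, the mode check asked for above can be scripted against captured boot output. This is a minimal sketch: `uml_mode` is a hypothetical helper, and it assumes every mode announces itself with the same "UML running in XXX mode" wording as the SKAS3 line seen earlier in the thread.

```python
import re

# Pattern taken from the boot message quoted in this thread.
MODE_RE = re.compile(r"UML running in (\w+) mode")


def uml_mode(boot_output):
    """Return the mode named in the boot banner, or None if it is absent."""
    match = MODE_RE.search(boot_output)
    return match.group(1) if match else None


if __name__ == "__main__":
    boot = (
        "Checking for /proc/mm...found\n"
        "Checking for the skas3 patch in the host...found\n"
        "UML running in SKAS3 mode\n"
        "Checking PROT_EXEC mmap in /tmp...OK\n"
    )
    print(uml_mode(boot))
```

A result of None would mean the kernel stopped before printing the banner, or printed it in a form this pattern does not cover.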