From: Boaz H. <bha...@pa...> - 2012-03-13 22:39:07
|
For a while now my UMLs have been constantly crashing in __module_text_address,
which makes no sense, because if I do

gdb> list *(__module_text_address+0xd)

I get:

0x6005614e is in __module_text_address (/media/usr0/export/dev/bharrosh/git/pub/linux-open-osd/kernel/module.c:3469).
3464     * module doesn't get freed during this.
3465     */
3466    struct module *__module_text_address(unsigned long addr)
3467    {
3468        struct module *mod = __module_address(addr);
3469        if (mod) {
3470            /* Make sure it's within the text section. */
3471            if (!within(addr, mod->module_init, mod->init_text_size)
3472                && !within(addr, mod->module_core, mod->core_text_size))
3473                mod = NULL;

It cannot be crashing at line 3469; I suspect it is crashing inside __module_address(addr) at line 3468.

Below it is crashing as part of the console operation, which is the most common case, but it can also
crash in __module_text_address as part of other stack traces, such as networking. My feeling is that it is
always related to some UML driver that actually operates as part of the host, but I can't be sure.

I'm running 3.3-rc4, but I have been hit by this since 3.0. I tried to bisect
it at the time, but I could not find a perfectly good point
even as far back as 2.6.37. So I suspected that something was wrong with my UML image file
or my host. But now I have upgraded both host and image to FC15 (was FC12/FC13) and I get
exactly the same crashes. It has come to the point that I can't complete any kind
of heavy operation anymore, and I have abandoned UML for VMs for now. But I'm
very sorry to see UML go.

Can anyone give me some insight into what I should try in order to debug this?

BTW:
How do I debug UML under gdb? It forks like mad, and if I do

gdb> set detach-on-fork off

it just freezes. And if I do

gdb> attach <some-vmlinux-child-process>

(trying to attach to any process but the top parent), it returns "access not permitted".
I guess UML is a debugger of sorts and it can't be double-debugged.

Thanks
Boaz

Kernel panic - not syncing: Kernel mode fault at addr 0x54, ip 0x6015c233

Modules linked in: md5 objlayoutdriver nfsd exofs exportfs libore async_xor async_tx xor cryptomgr aead crc32c crypto_hash crypto_algapi iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi osd scsi_mod libosd nfs lockd auth_rpcgss nfs_acl sunrpc ipv6 [last unloaded: scsi_wait_scan]
Pid: 1223, comm: bash Not tainted 3.3.0-rc4-pnfs+
RIP: 0033:[<000000387aed78a3>]
RSP: 0000007fbfee78c8  EFLAGS: 00000206
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: ffffffffffffffff
RDX: 0000007fbfee7890 RSI: 0000000000005403 RDI: 0000000000000000
RBP: 0000000000000000 R08: 0000007fbfee7940 R09: 0000000000000010
R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000002
R13: 00000000000000c8 R14: 0000000000000000 R15: 0000000000000000
Call Trace:
602ab598:  [<6001697c>] panic_exit+0x2f/0x45
602ab5b8:  [<60046678>] notifier_call_chain+0x32/0x5e
602ab5e8:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab5f8:  [<600466c6>] atomic_notifier_call_chain+0x13/0x15
602ab608:  [<601e90a4>] panic+0x112/0x1ea
602ab640:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab660:  [<6005614e>] __module_text_address+0xd/0x56
602ab678:  [<60059324>] is_module_text_address+0x9/0x11
602ab688:  [<6004019b>] __kernel_text_address+0x21/0x47
602ab6a8:  [<600154de>] show_trace+0x8e/0x95
602ab6b0:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab6d8:  [<60028683>] show_regs+0x2b/0x30
602ab6f8:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab708:  [<60016713>] segv_handler+0x0/0x81
602ab718:  [<600603d1>] handle_irq_event_percpu+0xfd/0x119
602ab758:  [<6006300f>] rcu_sched_qs+0x74/0x79
602ab7d8:  [<6001678a>] segv_handler+0x77/0x81
602ab808:  [<60013c26>] sigio_handler+0x58/0x5d
602ab828:  [<60023e9d>] sig_handler_common+0x84/0x98
602ab890:  [<60018efc>] line_chars_in_buffer+0x0/0x4c
602ab8b0:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab8d0:  [<60016fe8>] virt_to_pte+0x4a/0x6a
602ab928:  [<6001141c>] _einittext+0x1a0d/0x2c91
602ab938:  [<60010760>] _einittext+0xd51/0x2c91
602aba18:  [<6001141c>] _einittext+0x1a0d/0x2c91
602abb58:  [<60023f8d>] sig_handler+0x2d/0x38
602abb78:  [<60023bc3>] hard_handler+0x6b/0x9d
602abc48:  [<60018efc>] line_chars_in_buffer+0x0/0x4c
602abc68:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
|
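The range check in the listing quoted in the message above can be sketched in user space. The Python below is only an illustration of what lines 3468-3473 do with the result of __module_address(); the `Module` record, the half-open interpretation of `within`, and all addresses used with it are assumptions made for the example, not values from the crashing system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Module:
    """Hypothetical stand-in for the struct module fields used in the listing."""
    module_init: int      # start address of the init region
    init_text_size: int   # size of the text part of the init region
    module_core: int      # start address of the core region
    core_text_size: int   # size of the text part of the core region


def within(addr: int, start: int, size: int) -> bool:
    """True if addr lies in the half-open range [start, start + size)."""
    return start <= addr < start + size


def module_text_address(addr: int, mod: Optional[Module]) -> Optional[Module]:
    """Keep mod only if addr falls inside its init text or core text."""
    if mod is not None:
        if (not within(addr, mod.module_init, mod.init_text_size)
                and not within(addr, mod.module_core, mod.core_text_size)):
            mod = None
    return mod
```

Note that nothing in this check dereferences anything other than `mod` itself, which fits the suspicion that the fault happens elsewhere than at line 3469.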
From: Richard W. <ri...@no...> - 2012-03-13 23:59:13
|
On 13.03.2012 23:38, Boaz Harrosh wrote:
> Since a while now my UMLs are constantly crashing in __module_text_address
>
> which makes no sense because if I do gdb> list *(__module_text_address+0xd)
> I get:
>
> 0x6005614e is in __module_text_address (/media/usr0/export/dev/bharrosh/git/pub/linux-open-osd/kernel/module.c:3469).
> 3464     * module doesn't get freed during this.
> 3465     */
> 3466    struct module *__module_text_address(unsigned long addr)
> 3467    {
> 3468        struct module *mod = __module_address(addr);
> 3469        if (mod) {
> 3470            /* Make sure it's within the text section. */
> 3471            if (!within(addr, mod->module_init, mod->init_text_size)
> 3472                && !within(addr, mod->module_core, mod->core_text_size))
> 3473                mod = NULL;
>
> It can not be crashing in line 3469, I suspect it's crashing inside __module_address(addr); at line 3468
>
> Below it's crashing as part of the console operation, which is the most common one, but it can crash in
> __module_text_address as part of other stack traces like networking and so on. From my feel it's always
> related to some UML driver that actually operates as part of the host. But I can't be sure.
>
> I'm running with 3.3-rc4 but I'm hit with this since 3.0, I tried to bisect
> this at the time, but I found out that I could not find a perfectly good point
> even as far as 2.6.37. So I suspected there is something wrong with my uml-image file
> or my host. But now I upgraded both host and image to FC15 (was FC12/FC13) and I get
> the same exact crashes. It came to a situation that I can't complete any kind
> of heavy operation anymore and have abandoned UML for VMs for now. But I'm
> very sorry to see UML go.
>
> Can anyone help me with some insight on what I should try, to debug this thing.
>

What exactly triggers the crash?
IOW, how can I reproduce it?

> BTW:
> How to debug UML under gdb, it forks like mad and if I do:
> gdb> set detach-on-fork off
> It will just freeze. And if I do
> gdb> attach <some-vmlinux-child-process>
> (Try to attach to any but the top parent process)
> Will return "access not permitted". I guess UML is a debugger of sorts and it can't be
> double debugged.

For example:
$ gdb linux
(gdb) handle SIGSEGV noprint nostop pass
(gdb) set args <your kernel args>
(gdb) run

UML "forks like mad" because it creates a thread for each process
within UML...

Thanks,
//richard
|
From: Boaz H. <bha...@pa...> - 2012-03-14 00:15:53
|
On 03/13/2012 04:58 PM, Richard Weinberger wrote:
>
> What exactly triggers the crash?
> IOW, how can I reproduce it?
>

It's totally random, but always the same crash.

I guess if you don't have it then you don't. The most reliable way for me to get
it is a simple "halt". I am never able to shut down properly; it reliably crashes
like this:

Kernel panic - not syncing: Kernel mode fault at addr 0x54, ip 0x6015c233

Modules linked in: nfs lockd auth_rpcgss nfs_acl sunrpc ipv6 [last unloaded: scsi_wait_scan]
Pid: 1210, comm: autofs Not tainted 3.3.0-rc4-pnfs+
RIP: 0033:[<000000387aed28d0>]
RSP: 0000007fbf9876d8  EFLAGS: 00000246
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: ffffffffffffffff
RDX: 0000000000000001 RSI: 0000000040008000 RDI: 0000000000000001
RBP: 0000000040008000 R08: 00000000ffffffff R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 000000387b1928e0
R13: 0000000000000001 R14: 000000000093eb80 R15: 000000000093eb80
Call Trace:
602ab598:  [<6001697c>] panic_exit+0x2f/0x45
602ab5b8:  [<60046678>] notifier_call_chain+0x32/0x5e
602ab5e8:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab5f8:  [<600466c6>] atomic_notifier_call_chain+0x13/0x15
602ab608:  [<601e90a4>] panic+0x112/0x1ea
602ab640:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab660:  [<6005614e>] __module_text_address+0xd/0x56
602ab678:  [<60059324>] is_module_text_address+0x9/0x11
602ab688:  [<6004019b>] __kernel_text_address+0x21/0x47
602ab6a8:  [<600154de>] show_trace+0x8e/0x95
602ab6b0:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab6d8:  [<60028683>] show_regs+0x2b/0x30
602ab6f8:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab708:  [<60016713>] segv_handler+0x0/0x81
602ab718:  [<600603d1>] handle_irq_event_percpu+0xfd/0x119
602ab758:  [<6006300f>] rcu_sched_qs+0x74/0x79
602ab7d8:  [<6001678a>] segv_handler+0x77/0x81
602ab808:  [<60013c26>] sigio_handler+0x58/0x5d
602ab828:  [<60023e9d>] sig_handler_common+0x84/0x98
602ab890:  [<60018eb8>] line_write_room+0x0/0x44
602ab8b0:  [<6015c233>] do_raw_spin_lock+0x12/0xdb
602ab8d0:  [<6015435a>] prio_tree_insert+0x4c/0x23b
602ab928:  [<6001141c>] _einittext+0x1a0d/0x2c91
602ab938:  [<60010760>] _einittext+0xd51/0x2c91
602aba18:  [<6001141c>] _einittext+0x1a0d/0x2c91
602abb58:  [<60023f8d>] sig_handler+0x2d/0x38
602abb78:  [<60023bc3>] hard_handler+0x6b/0x9d
602abc48:  [<60018eb8>] line_write_room+0x0/0x44
602abc68:  [<6015c233>] do_raw_spin_lock+0x12/0xdb

I'd give anything to get your setup that works, though I have done everything by the letter:
- Newly installed FC15 host
- FC15 image copied over from a KVM, with the tty0 /etc fixes
- Command line:
  ./vmlinux ubd0=Fedora15-AMD64-root_fs-1 ubd1=swap_file-1 eth0=tuntap,,,172.17.132.231 mem=384M

It'll crash every time.

>> BTW:
>> How to debug UML under gdb, it forks like mad and if I do:
>> gdb> set detach-on-fork off
>> It will just freeze. And if I do
>> gdb> attach <some-vmlinux-child-process>
>> (Try to attach to any but the top parent process)
>> Will return "access not permitted". I guess UML is a debugger of sorts and it can't be
>> double debugged.
>
> e.g:
> $ gdb linux
> (gdb) handle SIGSEGV noprint nostop pass
> (gdb) set args <your kernel args>
> (gdb) run
>
> UML "forks like mad" because it creates an thread for each process
> within UML...
>

OK, then I have a problem, because I get the "access not permitted" on attaching to any of these
forks but the topmost parent (as sudo).

> Thanks,
> //richard

Thanks for your reply
Boaz
|
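A fault address as small as the 0x54 reported in both panics is often what a member access through a NULL structure pointer looks like: the faulting address is then simply the member's offset within the structure. The sketch below only illustrates that arithmetic with ctypes. The structure layout is invented so that a lock word lands at offset 0x54; it is not the layout of any real kernel or UML structure, and it does not prove that this is what happens here.

```python
import ctypes


class HypotheticalLine(ctypes.Structure):
    # Invented layout: 0x54 bytes of other members, followed by a lock word.
    _fields_ = [
        ("other_members", ctypes.c_char * 0x54),
        ("lock", ctypes.c_uint32),
    ]


def member_fault_address(base: int, struct_type, member: str) -> int:
    """Address touched when `member` is accessed through a pointer `base`."""
    return base + getattr(struct_type, member).offset


# Taking the lock through a NULL pointer would touch address 0x54.
null_deref = member_fault_address(0, HypotheticalLine, "lock")
```

Read this way, a panic with "addr 0x54" inside do_raw_spin_lock would be consistent with a spinlock embedded in a structure that was reached through a NULL pointer, but that remains a hypothesis until the source line is checked.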
From: Richard W. <ri...@no...> - 2012-03-14 00:22:24
|
On 14.03.2012 01:15, Boaz Harrosh wrote:
> I guess if you don't have it then you don't. The most reliable way for me to get
> it is a simple "halt". I'm not able to ever shut down properly it reliably crashes
> like:
>
> Kernel panic - not syncing: Kernel mode fault at addr 0x54, ip 0x6015c233

What is at addr 0x6015c233?
"addr2line -e vmlinux 0x6015c233" shows it.

Does it also happen with a vanilla kernel?

>
> OK then I have a problem because I get the "access not permitted" on attaching to any of these
> forks but the top most parent. (as sudo)

You cannot attach to them because they are already being ptrace()'ed by
the UML main thread.

Thanks,
//richard
|
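On Linux a process can have only one ptrace tracer at a time, which is why a second debugger is refused. One way to confirm who is already tracing a process is the TracerPid field in /proc/<pid>/status. The following is a minimal sketch of that check; the helper names are made up for illustration and are not part of UML or gdb.

```python
from pathlib import Path
from typing import Optional


def parse_tracer_pid(status_text: str) -> Optional[int]:
    """Return the TracerPid value from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        key, _, value = line.partition(":")
        if key == "TracerPid":
            return int(value.strip())
    return None


def tracer_of(pid: int) -> Optional[int]:
    """PID of the process tracing `pid`; 0 means it is not being traced."""
    return parse_tracer_pid(Path(f"/proc/{pid}/status").read_text())
```

Calling `tracer_of()` on one of the vmlinux child processes would be expected to return the PID of the UML main thread, while the top parent should report 0 unless gdb is already attached to it.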
From: Boaz H. <bha...@pa...> - 2012-03-14 00:51:56
|
On 03/13/2012 05:22 PM, Richard Weinberger wrote:
> On 14.03.2012 01:15, Boaz Harrosh wrote:
>> I guess if you don't have it then you don't. The most reliable way for me to get
>> it is a simple "halt". I'm not able to ever shut down properly it reliably crashes
>> like:
>>
>> Kernel panic - not syncing: Kernel mode fault at addr 0x54, ip 0x6015c233
>
> What is at addr 0x6015c233?
> "addr2line -e vmlinux 0x6015c233" shows it.
>

linux-open-osd/arch/um/drivers/line.c:46

> Does it also happen with a vanilla kernel?
>

Yes. You mean git checkout v3.2? Yes, it's the same all the way back to 2.6.39 or so.

I just did that, and I get the same as above:
addr2line -e .build_um/vmlinux 0x600179b8
linux-open-osd/arch/um/drivers/line.c:46

>>
>> OK then I have a problem because I get the "access not permitted" on attaching to any of these
>> forks but the top most parent. (as sudo)
>
> You cannot attach to them because they are already being ptrace()'ed by
> the UML main thread.

Right, that's what I thought. So if I set a breakpoint and it is hit in another fork, will it
still trigger? I guess yes. But if anything goes wrong in any of the other forks, is there a way
for me to break that fork into the debugger and see what happened, like a bt?

My current theory is that Fedora became UML-unfriendly. Is there some random-exec-mem thingy
I need to turn off?

>
> Thanks,
> //richard

Thanks, Richard, for your help
Boaz
|
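The addr2line lookups in this exchange can be scripted when there are many panics to go through. The sketch below pulls the faulting ip out of a panic line, lists the frames of a call trace, and builds the same addr2line invocation that was typed by hand above. The function names and regular expressions are assumptions written for this log format; only `addr2line -e <vmlinux> <address>` comes from the thread.

```python
import re
from typing import List, Optional, Tuple

PANIC_RE = re.compile(
    r"Kernel mode fault at addr (0x[0-9a-fA-F]+), ip (0x[0-9a-fA-F]+)")
FRAME_RE = re.compile(r"\[<([0-9a-fA-F]+)>\]\s+(\S+)")


def fault_ip(panic_line: str) -> Optional[str]:
    """Return the ip from a 'Kernel mode fault' line, or None if absent."""
    match = PANIC_RE.search(panic_line)
    return match.group(2) if match else None


def trace_frames(trace_text: str) -> List[Tuple[str, str]]:
    """Return (address, symbol+offset) pairs found in a call trace."""
    return FRAME_RE.findall(trace_text)


def addr2line_command(vmlinux: str, address: str) -> List[str]:
    """Build the argument list for: addr2line -e <vmlinux> <address>."""
    return ["addr2line", "-e", vmlinux, address]
```

The returned list can be passed to `subprocess.run()` on a machine that has the matching vmlinux with debug info. The address has to come from the same build as the binary, which is why the vanilla build above reports a different address for the same source line.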
From: Stian S. <st...@ni...> - 2012-03-14 07:29:16
|
> On 13.03.2012 23:38, Boaz Harrosh wrote:
>> Since a while now my UMLs are constantly crashing in
>> __module_text_address
>>
>> which makes no sense because if I do gdb> list
>> *(__module_text_address+0xd)
>> I get:
>>
>> 0x6005614e is in __module_text_address
>> (/media/usr0/export/dev/bharrosh/git/pub/linux-open-osd/kernel/module.c:3469).
>> 3464     * module doesn't get freed during this.
>> 3465     */
>> 3466    struct module *__module_text_address(unsigned long addr)
>> 3467    {
>> 3468        struct module *mod = __module_address(addr);
>> 3469        if (mod) {
>> 3470            /* Make sure it's within the text section. */
>> 3471            if (!within(addr, mod->module_init,
>> mod->init_text_size)
>> 3472                && !within(addr, mod->module_core, mod->core_text_size))
>> 3473                mod = NULL;

The source listing and the place where it actually crashes can be off as long as you use CFLAGS
with -On where n is greater than 0. So the first thing you can test is to
compile UML with CFLAGS="-g -O0". It might be that it doesn't crash
anymore then; if so, you can try to play with different gcc optimization
flags in order to find out what triggers it.

Other sources of the crash can be gcc bugs, or some special CPU
optimization bug (either in gcc or in the UML source).

Your backtrace also includes a spin-lock right before the panic, which
could be the source of the issue. If I remember right, the kernel has
some debug options that can be enabled for spin-locks.

Stian Skjelstad
|
From: Richard W. <ri...@no...> - 2012-03-14 08:44:58
|
On 14.03.2012 08:28, Stian Skjelstad wrote:
>> On 13.03.2012 23:38, Boaz Harrosh wrote:
>>> Since a while now my UMLs are constantly crashing in
>>> __module_text_address
>>>
>>> which makes no sense because if I do gdb> list
>>> *(__module_text_address+0xd)
>>> I get:
>>>
>>> 0x6005614e is in __module_text_address
>>> (/media/usr0/export/dev/bharrosh/git/pub/linux-open-osd/kernel/module.c:3469).
>>> 3464     * module doesn't get freed during this.
>>> 3465     */
>>> 3466    struct module *__module_text_address(unsigned long addr)
>>> 3467    {
>>> 3468        struct module *mod = __module_address(addr);
>>> 3469        if (mod) {
>>> 3470            /* Make sure it's within the text section. */
>>> 3471            if (!within(addr, mod->module_init,
>>> mod->init_text_size)
>>> 3472                && !within(addr, mod->module_core, mod->core_text_size))
>>> 3473                mod = NULL;
>
> source listing vs where it crashes can be off aslong as you use CFLAGS
> with -On where n is greater than 0. So the first thing you can test is to
> compile UML with CFLAGS="-g -O0". It might be that it doesn't crash
> anymore then, if so, you can try to play with different gcc optimization
> flags in order to find out what triggers it.

Building the kernel with -O0 is also dangerous.
The kernel relies on certain functions being inlined in any case.

Thanks,
//richard
|
From: Richard W. <ri...@no...> - 2012-03-14 08:23:37
|
On 14.03.2012 01:51, Boaz Harrosh wrote:
> Yes, you mean git checkout v3.2 Yes it's the same all the way back to 2.6.39 or so
>
> I just did that I get the same as above: addr2line -e .build_um/vmlinux 0x600179b8
> linux-open-osd/arch/um/drivers/line.c:46

Please give this patch a try:
https://lkml.org/lkml/2012/3/10/163

> My current theory is that Fedora became UML unfriendly is there some random-exec-mem thingy
> I need to turn off?
>

No. The UML tty driver is broken like hell and newer distros seem to trigger it.
I thought only FC16 with systemd was affected. But as you are using FC15, I
was wrong...

Thanks,
//richard
|
From: Boaz H. <bha...@pa...> - 2012-03-14 22:28:53
|
On 03/14/2012 01:23 AM, Richard Weinberger wrote:
> On 14.03.2012 01:51, Boaz Harrosh wrote:
>> Yes, you mean git checkout v3.2 Yes it's the same all the way back to 2.6.39 or so
>>
>> I just did that I get the same as above: addr2line -e .build_um/vmlinux 0x600179b8
>> linux-open-osd/arch/um/drivers/line.c:46
>
> Please give this patch a try:
> https://lkml.org/lkml/2012/3/10/163
>

Yes, with this patch I'm not crashing on halt anymore. Cheers.

Let me test some more and see if I have other problems. I crashed once in some
networking stack trace but did not save it; let me play with it some more.

But surely it needs to go upstream. I'll run with it out of tree for now.

>> My current theory is that Fedora became UML unfriendly is there some random-exec-mem thingy
>> I need to turn off?
>>
>
> No. UML tty driver is broken like hell and newer distros seem to trigger it.
> I thought only FC16 with systemd is affected. But as you using FC15, I
> was wrong...
>

I have an administrative question, if I may. This came up now with my FC15
setups. (I did not have these problems with my old FC12 setup.)

In a default setup (make defconfig) I have:

CONFIG_XTERM_CHAN=y
CONFIG_CON_CHAN=xterm

If I leave it at that, then very early in the boot an xterm X11 window comes up.
The terminal is on the host (I see my host name and environment, not the UML one)
and the boot process just stops. Even if I close the window, the boot stays stuck;
I can only do "sudo killall vmlinux" from the host.

If I change to "CONFIG_CON_CHAN not set" or just "CONFIG_CON_CHAN=nothing", then the
boot process complains but continues to a login prompt like it used to.

Do you know how to fix that?

Also, vi thinks I only have 25 lines. Do you know where in FC I can say that the console
has more lines, or better yet, make it auto-adapt to my host console? I guess it might be
related to my first question. I did not use to have these problems with my old setup.

> Thanks,
> //richard

Thanks a million for being so patient with me
Boaz
|
From: Richard W. <ri...@no...> - 2012-03-14 22:40:40
|
On 14.03.2012 23:28, Boaz Harrosh wrote:
> But surly it needs to go upstream. I'll run with it out of tree for now.

As of now it's not ready for upstream.
I'll submit it as a -stable patch as soon as possible.

> I have an administrative question. (If I may). This came up now with my FC15
> setups. (I did not have these problem with my old FC12 setup)
>
> In a default setup (make defconfig) I have:
>
> CONFIG_XTERM_CHAN=y
> CONFIG_CON_CHAN=xterm
>
> If I leave it at that then very early in the boot an xterm X11 window comes up
> the terminal is on the host (I see my host name and environment, not the UML one)
> and the boot process just stops. Even if I close the window the boot is just stuck
> I can only do "sudo killall vmlinux" from the host.
>
> If I change "CONFIG_CON_CHAN not set" or just "CONFIG_CON_CHAN=nothing" then the
> boot process complains but continues to a login prompt like I use to.
>
> Do you know how to fix that?
>
> Also in vi, it thinks I only have 25 lines. Do you know in FC where I say console
> have more lines, or better yet auto adopt to my host console. I guess it might be
> related to my first Q. I did not use to have these problems with my old setup

This is most likely because the tty driver still has some problems.
Can you please provide the exact kernel command line and output?

Thanks,
//richard
|
From: Boaz H. <bha...@pa...> - 2012-05-25 07:35:31
Attachments:
0001-TTY-tty_port-questions.patch
|
On 03/15/2012 12:40 AM, Richard Weinberger wrote:
> On 14.03.2012 23:28, Boaz Harrosh wrote:
>> But surly it needs to go upstream. I'll run with it out of tree for now.
>
> As of now it's not ready for upstream.
> I'll submit it as -stable patch as soon as possible.
>

Richard, hi

I've been running with your tty patch for a while now on a 3.3 kernel,
and it let me work.

Now, based on the 3.4 tree, the patch no longer merges; there are too many changes.
So I am back to the same crashes as before. Do you have a rough
patch for a 3.4 base that I can use?

Attached is the original patch, good for a 3.3-based tree.

Thanks in advance
Boaz
|
From: Richard R. W. <ri...@si...> - 2012-05-25 08:05:31
|
----- Original Message -----
> I've been running with your tty patch for a while now, and was able
> to work. 3.3 Kernel
>
> Now based on 3.4 tree the patch no longer merges, too many changes.
> But am back to the same crashes as before. Do you have a rough
> patch for 3.4 base I can use.

Okay, I'll send an updated version.
The tty subsystem is currently being changed heavily.

Thanks,
//richard
|
From: Boaz H. <bha...@pa...> - 2012-05-25 09:20:06
|
On 05/25/2012 10:40 AM, Richard RW. Weinberger wrote:
> ----- Original Message -----
>> I've been running with your tty patch for a while now, and was able
>> to work. 3.3 Kernel
>>
>> Now based on 3.4 tree the patch no longer merges, too many changes.
>> But am back to the same crashes as before. Do you have a rough
>> patch for 3.4 base I can use.
>
> Okay, I'll send an updated version.
> Currently the tty sub-system is heavily changed.
>

Thanks. Yes, I noticed a bunch of stuff moved around, nothing trivial that
I could fix myself.

> Thanks,
> //richard

What's keeping this patch from hitting mainline? Too much ugliness?
Is there a way to work without it (somehow)?

Thanks
Boaz
|
From: Richard W. <ri...@no...> - 2012-05-25 09:27:00
|
On 25.05.2012 11:19, Boaz Harrosh wrote:
> Whats keeping this patch from hitting mainline? to much ugliness?
> Is there a way to work without it (somehow)
>

UML's tty driver is a corner case.
(It is a console but does not support virtual consoles, etc...)

The TTY folks are currently moving most drivers to the new tty_port interface.
UML also has to use tty_port.
But at the time I wrote my tty patch, tty_port was not fully ready for
consoles.

Anyway, I'll check what the current state of tty_port is and rewrite
UML's TTY mess^Wdriver again. :)

Thanks,
//richard
|