From: richard -r. w. <ric...@gm...> - 2011-05-19 17:00:41
|
Hi,

Please also CC use...@li..., so you can reach many more UML users. :)

2011/5/19 Toralf Förster <tor...@gm...>:
> I got a segfault as soon as I try to access the phpmyadmin web page of the UML
> instance at https://<uml_hostname>/phpmyadmin/ :

Hmm, strange.
phpmyadmin works fine on my UML test bed.
Can you bisect the issue?

--
Thanks,
//richard
|
From: Toralf F. <tor...@gm...> - 2011-05-19 17:20:59
|
richard -rw- weinberger wrote at 19:00:35
> Hi,
>
> Please also CC use...@li...,
> so you can reach many more UML users. :)
>
> 2011/5/19 Toralf Förster <tor...@gm...>:
> > I got a segfault as soon as I try to access the phpmyadmin web page of
> > the UML instance at https://<uml_hostname>/phpmyadmin/ :
>
> Hmm, strange.
> phpmyadmin works fine on my UML test bed.
> Can you bisect the issue?

Errm, automatic bisecting doesn't work, because the issue can't be reproduced by a simple "wget https://...". When I use konqueror, I'm asked up to 6-10 times to confirm a cookie or something else before the crash occurs.

And if I use "lynx -accept_all_cookies https://n22_uml/phpmyadmin/", then I'm able to log in, so I suspected it has something to do with HTTP frames. But shutting down the UML instance wasn't possible after such a try either, so probably something other than the HTTP frames themselves triggers the issue ...

In short: it is not phpmyadmin (3.4.0) itself, but it triggers the bug (in fact, sometimes I could even see the phpmyadmin login window in Firefox before the crash happened).

Because manual interaction is therefore needed (or do you know an automated way for konqueror/ff/...?), it would at least be helpful if the bisecting could be narrowed down to a given path or something else.

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: richard -r. w. <ric...@gm...> - 2011-05-19 17:25:58
|
2011/5/19 Toralf Förster <tor...@gm...>:
> Errm, automatic bisecting doesn't work, because the issue can't be reproduced by a
> simple "wget https://...". When I use konqueror, I'm asked up to 6-10 times
> to confirm a cookie or something else before the crash occurs.
>
> And if I use "lynx -accept_all_cookies https://n22_uml/phpmyadmin/", then I'm
> able to log in, so I suspected it has something to do with HTTP frames. But
> shutting down the UML instance wasn't possible after such a try either, so probably
> something other than the HTTP frames themselves triggers the issue ...
>
> In short: it is not phpmyadmin (3.4.0) itself, but it triggers the bug (in
> fact, sometimes I could even see the phpmyadmin login window in Firefox
> before the crash happened).
>
> Because manual interaction is therefore needed (or do you know an automated
> way for konqueror/ff/...?), it would at least be helpful if the bisecting
> could be narrowed down to a given path or something else.

BTW: Haven't you had such an issue a few months ago?
Does it work without https?
Maybe mod_ssl triggers the bug...

--
Thanks,
//richard
|
From: Toralf F. <tor...@gm...> - 2011-05-19 20:18:31
|
richard -rw- weinberger wrote at 19:00:35
> Can you bisect the issue?

tfoerste@n22 ~/devel/linux-2.6 $ git bisect bad
2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c is the first bad commit
commit 2e12978a9f7a7abd54e8eb9ce70a7718767b8b2c
Author: Lai Jiangshan <la...@cn...>
Date:   Wed Dec 22 14:18:50 2010 +0800

    futex,plist: Pass the real head of the priority list to plist_del()

    Some plist_del()s in kernel/futex.c are passed a faked head of the
    priority list.

    It does not fail because the current code does not require the real head
    in plist_del(). The current code of plist_del() just uses the head for
    checking, so it will not cause a bad result even when we use a faked head.

    But it is undocumented usage:

    /**
     * plist_del - Remove a @node from plist.
     *
     * @node: &struct plist_node pointer - entry to be removed
     * @head: &struct plist_head pointer - list head
     */

    The document says that the @head is the "list head" head of the priority
    list.

    In futex code, several places use "plist_del(&q->list, &q->list.plist);",
    they pass a fake head. We need to fix them all.

    Thanks to Darren Hart for many suggestions.

    Acked-by: Darren Hart <dv...@li...>
    Signed-off-by: Lai Jiangshan <la...@cn...>
    LKML-Reference: <4D1...@cn...>
    Signed-off-by: Steven Rostedt <ro...@go...>

:040000 040000 78d47de377f8da1c131007a17ca915fbd13f7ff6 ffac93205aaf22fda0667d6395c8da7c7bf692e4 M kernel

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: Steven R. <ro...@go...> - 2011-05-19 20:55:59
|
On Thu, May 19, 2011 at 10:18:16PM +0200, Toralf Förster wrote:
>
> richard -rw- weinberger wrote at 19:00:35
> > Can you bisect the issue?

Is this bug fully reproducible? If not, then you may have had a git
bisect good, when it should have been git bisect bad.

The futex/plist should not be affecting rwsem.

-- Steve
|
From: Toralf F. <tor...@gm...> - 2011-05-20 07:37:27
|
Steven Rostedt wrote at 22:43:43
> Is this bug fully reproducible? If not, then you may have had a git
> bisect good, when it should have been git bisect bad.

Yes, I bisected it again to the same commit.

Furthermore, I explicitly checked out that revision and tested it: the issue exists. I then reverted exactly that commit on top of the checked-out tree and tested again: the issue went away.

Then I recompiled the buggy version with CONFIG_DEBUG_INFO=y. Here's the output:

...
Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
08324b44: [<0829e78b>] dump_stack+0x22/0x24
08324b5c: [<0829e7f0>] panic+0x63/0x167
08324b84: [<080603d2>] segv+0x1e2/0x2b0
08324c3c: [<080604e1>] segv_handler+0x41/0x60
08324c5c: [<08070c54>] sig_handler_common+0x44/0xb0
08324cd8: [<08070e32>] sig_handler+0x42/0x50
08324ce8: [<0807106c>] handle_signal+0x5c/0xa0
08324d0c: [<08073408>] hard_handler+0x18/0x20
08324d1c: [<b7715400>] 0xb7715400

EIP: 0073:[<400008d2>] CPU: 0 Tainted: G W ESP: 007b:4ef22270 EFLAGS: 00200206
Tainted: G W
EAX: ffffffda EBX: 081efe10 ECX: 00000081 EDX: 00000001
ESI: 083f6758 EDI: 081efe0c EBP: 080a88a8 DS: 007b ES: 007b
08324af8: [<080780bd>] show_regs+0xed/0x120
08324b14: [<0806071c>] panic_exit+0x2c/0x50
08324b24: [<0809fc1c>] notifier_call_chain+0x4c/0x70
08324b4c: [<0809fc93>] atomic_notifier_call_chain+0x23/0x30
08324b5c: [<0829e818>] panic+0x8b/0x167
08324b84: [<080603d2>] segv+0x1e2/0x2b0
08324c3c: [<080604e1>] segv_handler+0x41/0x60
08324c5c: [<08070c54>] sig_handler_common+0x44/0xb0
08324cd8: [<08070e32>] sig_handler+0x42/0x50
08324ce8: [<0807106c>] handle_signal+0x5c/0xa0
08324d0c: [<08073408>] hard_handler+0x18/0x20
08324d1c: [<b7715400>] 0xb7715400

The file /var/log/messages of the UML says:

2011-05-20T09:33:03.455+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:03.455+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd14: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd2c: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd54: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd64: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bd7c: [<080a9faf>] futex_wake+0xbf/0x100
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5bda4: [<080abb1d>] do_futex+0xcd/0x6c0
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5be08: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:03.455+02:00 n22_uml kernel: 19e5be60: [<0807ffc1>] mm_release+0xd1/0x130
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5be8c: [<08083dad>] exit_mm+0x1d/0x100
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5beb8: [<08085b73>] do_exit+0xc3/0x660
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bf14: [<080861e9>] sys_exit+0x19/0x20
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bf20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bf68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bfe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:03.457+02:00 n22_uml kernel: 19e5bffc: [<00000000>] 0x0
2011-05-20T09:33:03.457+02:00 n22_uml kernel:
2011-05-20T09:33:03.457+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473e8 ]---
2011-05-20T09:33:03.831+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:03.831+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d14: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d2c: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d54: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d64: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99d7c: [<080a9faf>] futex_wake+0xbf/0x100
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99da4: [<080abb1d>] do_futex+0xcd/0x6c0
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99e08: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:03.831+02:00 n22_uml kernel: 19d99e60: [<0807ffc1>] mm_release+0xd1/0x130
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99e8c: [<08083dad>] exit_mm+0x1d/0x100
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99eb8: [<08085b73>] do_exit+0xc3/0x660
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99f14: [<080861e9>] sys_exit+0x19/0x20
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99f20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99f68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99fe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:03.832+02:00 n22_uml kernel: 19d99ffc: [<00000000>] 0x0
2011-05-20T09:33:03.832+02:00 n22_uml kernel:
2011-05-20T09:33:03.832+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473e9 ]---
2011-05-20T09:33:03.951+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:03.951+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bd78: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bd90: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bdb8: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bdc8: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bde0: [<080ab702>] futex_requeue+0x362/0x6b0
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5be64: [<080abceb>] do_futex+0x29b/0x6c0
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bec8: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:03.951+02:00 n22_uml kernel: 19e5bf20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:03.955+02:00 n22_uml kernel: 19e5bf68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:03.955+02:00 n22_uml kernel: 19e5bfe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:03.955+02:00 n22_uml kernel: 19e5bffc: [<00000000>] 0x0
2011-05-20T09:33:03.955+02:00 n22_uml kernel:
2011-05-20T09:33:03.955+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473ea ]---
2011-05-20T09:33:04.000+02:00 n22_uml sshd[738]: Server listening on 0.0.0.0 port 22.
2011-05-20T09:33:06.100+02:00 n22_uml kernel: ------------[ cut here ]------------
2011-05-20T09:33:06.100+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d14: [<0829e78b>] dump_stack+0x22/0x24
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d2c: [<0808205a>] warn_slowpath_common+0x5a/0x80
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d54: [<080820a3>] warn_slowpath_null+0x23/0x30
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d64: [<080a9eb8>] wake_futex+0x28/0x60
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0d7c: [<080a9faf>] futex_wake+0xbf/0x100
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0da4: [<080abb1d>] do_futex+0xcd/0x6c0
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0e08: [<080ac184>] sys_futex+0x74/0x140
2011-05-20T09:33:06.100+02:00 n22_uml kernel: 19ef0e60: [<0807ffc1>] mm_release+0xd1/0x130
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0e8c: [<08083dad>] exit_mm+0x1d/0x100
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0eb8: [<08085b73>] do_exit+0xc3/0x660
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0f14: [<080861e9>] sys_exit+0x19/0x20
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0f20: [<08060d16>] handle_syscall+0xa6/0xb0
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0f68: [<08074cf1>] userspace+0x361/0x500
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0fe8: [<0805e0cb>] fork_handler+0x5b/0x70
2011-05-20T09:33:06.104+02:00 n22_uml kernel: 19ef0ffc: [<00000000>] 0x0
2011-05-20T09:33:06.104+02:00 n22_uml kernel:
2011-05-20T09:33:06.104+02:00 n22_uml kernel: ---[ end trace 95fb08f635a473eb ]---
2011-05-20T09:33:09.000+02:00 n22_uml cron[851]: (CRON) STARTUP (V5.0)
2011-05-20T09:33:10.112+02:00 n22_uml kernel: Virtual console 1 assigned device '/dev/pts/5'

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: richard -r. w. <ric...@gm...> - 2011-05-20 07:56:10
|
2011/5/20 Toralf Förster <tor...@gm...>:
> ...
> Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b

Looks like a NULL-pointer bug.
What code is at address 80a9f6b?
Use "objdump -d -S | less" to find it.
Please note, kernel binary and log message have to match!

> The file /var/log/messages of the UML says :
>
> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: ------------[ cut here ]------------
> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()

Is this really 2.6.39?
Line 789 contains no WARN*().
http://lxr.linux.no/#linux+v2.6.39/kernel/futex.c#L789

--
Thanks,
//richard
|
From: Darren H. <dv...@li...> - 2011-05-20 16:30:43
|
On 05/20/2011 12:56 AM, richard -rw- weinberger wrote:
> 2011/5/20 Toralf Förster <tor...@gm...>:
>> ...
>> Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
>
> Looks like a NULL-pointer bug.
> What code is at address 80a9f6b?
> Use "objdump -d -S | less" to find it.
> Please note, kernel binary and log message have to match!
>
>> The file /var/log/messages of the UML says :
>>
>> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: ------------[ cut here ]------------
>> 2011-05-20T09:33:03.455+02:00 n22_uml kernel: WARNING: at kernel/futex.c:789 wake_futex+0x28/0x60()
>
> Is this really 2.6.39?
> Line 789 contains no WARN*().
> http://lxr.linux.no/#linux+v2.6.39/kernel/futex.c#L789
>

I suspect Toralf is hitting the WARN_ON in __unqueue_futex:

    if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
                || plist_node_empty(&q->list)))

Toralf, can you instrument that and let us know which of the conditions is
triggering the WARN_ON? Something like the following should be adequate
to get you the line number. I suspect it is plist_node_empty, given the
git bisect results you reported.

diff --git a/kernel/futex.c b/kernel/futex.c
index abd5324..7f31bca 100644
--- a/kernel/futex.c
+++ b/kernel/futex.c
@@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
 {
     struct futex_hash_bucket *hb;

-    if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
-                || plist_node_empty(&q->list)))
+    if (WARN_ON(!q->lock_ptr))
+        return;
+    if (!spin_is_locked(q->lock_ptr))
+        return;
+    if (plist_node_empty(&q->list))
         return;

     hb = container_of(q->lock_ptr, struct futex_hash_bucket, lock);

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
|
From: richard -r. w. <ric...@gm...> - 2011-05-20 06:44:47
|
Hi,

a few more questions/ideas. :)

2011/5/19 Toralf Förster <tor...@gm...>:
> FWIW I got :
>
> * Starting local

What is this "Starting local",
was UML crashing while starting your distro?

> Kernel panic - not syncing: Segfault with no mm
> 08335ed4: [<082b0b3b>] dump_stack+0x22/0x24
> 08335eec: [<082b0ba0>] panic+0x63/0x167
> 08335f14: [<080614af>] segv+0x27f/0x2f0
> 08335fcc: [<08061561>] segv_handler+0x41/0x60
> 08335fec: [<08071da4>] sig_handler_common+0x44/0xb0
>
>
> EIP: 0000:[<00000000>] CPU: 0 Not tainted EFLAGS: 00000000
> Not tainted
> EAX: 00000000 EBX: 00000000 ECX: 00000000 EDX: 00000000
> ESI: 00000000 EDI: 00000000 EBP: 00000000 DS: 0000 ES: 0000
> 08335e88: [<0807935d>] show_regs+0xed/0x120
> 08335ea4: [<0806179c>] panic_exit+0x2c/0x50
> 08335eb4: [<080a2b9c>] notifier_call_chain+0x4c/0x70
> 08335edc: [<080a2c13>] atomic_notifier_call_chain+0x23/0x30
> 08335eec: [<082b0bc8>] panic+0x8b/0x167
> 08335f14: [<080614af>] segv+0x27f/0x2f0
> 08335fcc: [<08061561>] segv_handler+0x41/0x60
> 08335fec: [<08071da4>] sig_handler_common+0x44/0xb0
>
>
> and gdb gives in another session to reproduce the bug this:
>
> (gdb) c
> Continuing.

GDB stopped here and UML got SIGSEGV after you continued?
GDB has to ignore SIGSEGV. UML uses this signal to handle page faults.
type: handle SIGSEGV noprint nostop pass

I fear the backtrace is garbage.

Can you reproduce the issue using the default config?
Are you using hostfs?
What exactly is the output when it crashes? (Without GDB)

Your host's kernel ring buffer should contain a line like this one after the crash:
linux[123]: segfault at 0 ip xxx sp xxx error 4 in linux[xxx+yyy]

Please share this line with me.

--
Thanks,
//richard
|
From: Toralf F. <tor...@gm...> - 2011-05-20 07:43:20
|
richard -rw- weinberger wrote at 08:44:39
> > * Starting local
>
> What is this "Starting local",
> was UML crashing while starting your distro?

No, that's the last init script of the Gentoo I'm using within UML.

> I fear the backtrace is garbage.

Yep, maybe this helps more:

Program received signal SIGSEGV, Segmentation fault.
0x080a9f6b in futex_wake (uaddr=<value optimized out>, flags=<value optimized out>, nr_wake=<value optimized out>, bitset=4294967295) at kernel/futex.c:958
958         plist_for_each_entry_safe(this, next, head, list) {
(gdb) bt
#0  0x080a9f6b in futex_wake (uaddr=<value optimized out>, flags=<value optimized out>, nr_wake=<value optimized out>, bitset=4294967295) at kernel/futex.c:958
#1  0x080abb1d in do_futex (uaddr=0x81efe10, op=0, val=0, timeout=0x0, uaddr2=0x81efe0c, val2=0, val3=4294967295) at kernel/futex.c:2610
#2  0x080ac184 in sys_futex (uaddr=0x81efe10, op=129, val=1, utime=0x83f89b8, uaddr2=0x81efe0c, val3=134908072) at kernel/futex.c:2678
#3  0x08060d16 in handle_syscall (r=0x19545290) at arch/um/kernel/skas/syscall.c:35
#4  0x08074cf1 in handle_trap (regs=0x19545290) at arch/um/os-Linux/skas/process.c:201
#5  userspace (regs=0x19545290) at arch/um/os-Linux/skas/process.c:417
#6  0x0805e0cb in fork_handler () at arch/um/kernel/process.c:181
#7  0x00000000 in ?? ()

> Can you reproduce the issue using the default config?

Yes.

> Are you using hostfs?

Yes, but the issue is independent of that.

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: richard -r. w. <ric...@gm...> - 2011-05-20 08:39:10
|
2011/5/20 richard -rw- weinberger <ric...@gm...>:
> Use "objdump -d -S | less" to find it.

Ick, I meant "objdump -d -S vmlinux | less"

--
Thanks,
//richard
|
From: Toralf F. <tor...@gm...> - 2011-05-20 08:42:25
|
richard -rw- weinberger wrote at 09:56:02
> 2011/5/20 Toralf Förster <tor...@gm...>:
> > ...
> > Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
>
> Looks like a NULL-pointer bug.
> What code is at address 80a9f6b?
> Use "objdump -d -S | less" to find it.

        if (unlikely(ret != 0))
 80a9f3a:  85 c0           test   %eax,%eax
 80a9f3c:  75 ca           jne    80a9f08 <futex_wake+0x18>
                goto out;

        hb = hash_futex(&key);
 80a9f3e:  8d 45 e8        lea    -0x18(%ebp),%eax
 80a9f41:  e8 aa f6 ff ff  call   80a95f0 <hash_futex>
 80a9f46:  89 c2           mov    %eax,%edx
        spin_lock(&hb->lock);
        head = &hb->chain;

        plist_for_each_entry_safe(this, next, head, list) {
 80a9f48:  8b 48 08        mov    0x8(%eax),%ecx
 80a9f4b:  83 c2 08        add    $0x8,%edx
 80a9f4e:  8d 41 f4        lea    -0xc(%ecx),%eax
 80a9f51:  39 ca           cmp    %ecx,%edx
 80a9f53:  8b 70 0c        mov    0xc(%eax),%esi
 80a9f56:  74 6a           je     80a9fc2 <futex_wake+0xd2>
 80a9f58:  89 d9           mov    %ebx,%ecx
 80a9f5a:  83 ee 0c        sub    $0xc,%esi
 80a9f5d:  89 d3           mov    %edx,%ebx
 80a9f5f:  89 fa           mov    %edi,%edx
 80a9f61:  89 cf           mov    %ecx,%edi
 80a9f63:  eb 12           jmp    80a9f77 <futex_wake+0x87>
 80a9f65:  8d 76 00        lea    0x0(%esi),%esi
 80a9f68:  8d 46 0c        lea    0xc(%esi),%eax
 80a9f6b:  8b 4e 0c        mov    0xc(%esi),%ecx
 80a9f6e:  39 c3           cmp    %eax,%ebx
 80a9f70:  74 4e           je     80a9fc0 <futex_wake+0xd0>
 80a9f72:  89 f0           mov    %esi,%eax
 80a9f74:  8d 71 f4        lea    -0xc(%ecx),%esi
                if (match_futex (&this->key, &key)) {
 80a9f77:  83 f8 e4        cmp    $0xffffffe4,%eax
 80a9f7a:  74 ec           je     80a9f68 <futex_wake+0x78>
 80a9f7c:  8b 48 1c        mov    0x1c(%eax),%ecx
 80a9f7f:  3b 4d e8        cmp    -0x18(%ebp),%ecx
 80a9f82:  75 e4           jne    80a9f68 <futex_wake+0x78>
/*
 * Return 1 if two futex_keys are equal, 0 otherwise.
 */

> Is this really 2.6.39?

No, but I didn't want to change the subject line. The bisected version is:
v2.6.38-rc8-1-g2e12978

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: Toralf F. <tor...@gm...> - 2011-05-20 08:58:30
|
richard -rw- weinberger wrote at 10:39:03
> 2011/5/20 richard -rw- weinberger <ric...@gm...>:
> > Use "objdump -d -S | less" to find it.
>
> Ick, I meant "objdump -d -S vmlinux | less"

Well (BTW nearly similar to the result of "objdump -d -S linux"), but anyway
here it is:

        spin_lock(&hb->lock);
        head = &hb->chain;

        plist_for_each_entry_safe(this, next, head, list) {
 80a9f48:  8b 48 08        mov    0x8(%eax),%ecx
 80a9f4b:  83 c2 08        add    $0x8,%edx
 80a9f4e:  8d 41 f4        lea    -0xc(%ecx),%eax
 80a9f51:  39 ca           cmp    %ecx,%edx
 80a9f53:  8b 70 0c        mov    0xc(%eax),%esi
 80a9f56:  74 6a           je     80a9fc2 <futex_wake+0xd2>
 80a9f58:  89 d9           mov    %ebx,%ecx
 80a9f5a:  83 ee 0c        sub    $0xc,%esi
 80a9f5d:  89 d3           mov    %edx,%ebx
 80a9f5f:  89 fa           mov    %edi,%edx
 80a9f61:  89 cf           mov    %ecx,%edi
 80a9f63:  eb 12           jmp    80a9f77 <futex_wake+0x87>
 80a9f65:  8d 76 00        lea    0x0(%esi),%esi
 80a9f68:  8d 46 0c        lea    0xc(%esi),%eax
 80a9f6b:  8b 4e 0c        mov    0xc(%esi),%ecx
 80a9f6e:  39 c3           cmp    %eax,%ebx
 80a9f70:  74 4e           je     80a9fc0 <futex_wake+0xd0>
 80a9f72:  89 f0           mov    %esi,%eax
 80a9f74:  8d 71 f4        lea    -0xc(%ecx),%esi
                if (match_futex (&this->key, &key)) {
 80a9f77:  83 f8 e4        cmp    $0xffffffe4,%eax
 80a9f7a:  74 ec           je     80a9f68 <futex_wake+0x78>
 80a9f7c:  8b 48 1c        mov    0x1c(%eax),%ecx
 80a9f7f:  3b 4d e8        cmp    -0x18(%ebp),%ecx
 80a9f82:  75 e4           jne    80a9f68 <futex_wake+0x78>
/*
 * Return 1 if two futex_keys are equal, 0 otherwise.
 */

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: richard -r. w. <ric...@gm...> - 2011-05-20 09:02:46
|
2011/5/20 Toralf Förster <tor...@gm...>:
>> Ick, I meant "objdump -d -S vmlinux | less"
> Well (BTW nearly similar to the result of "objdump -d -S linux"), but anyway
> here it is:

I hope it's _exactly_ the same result.
vmlinux and linux are hard linked...

--
Thanks,
//richard
|
From: Toralf F. <tor...@gm...> - 2011-05-20 09:19:45
|
richard -rw- weinberger wrote at 11:02:35
> I hope it's _exactly_ the same result. vmlinux and linux are hard linked...

tfoerste@n22 ~/devel/linux-2.6 $ ls -lid vmlinux linux
2034205 -rwxr-xr-x 2 tfoerste users 35624874 May 20 09:28 linux
2034205 -rwxr-xr-x 2 tfoerste users 35624874 May 20 09:28 vmlinux

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3
|
From: Steven R. <ro...@go...> - 2011-05-20 16:22:55
|
On Fri, 2011-05-20 at 12:04 -0400, Steven Rostedt wrote:
> On Fri, 2011-05-20 at 08:55 -0700, Darren Hart wrote:
>
> > I suspect Toralf is hitting the WARN_ON in __unqueue_futex:
> >
> >     if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> >                 || plist_node_empty(&q->list)))
> >
> > Toralf, can you instrument that and let us know which of the conditions is
> > triggering the WARN_ON? Something like the following should be adequate
> > to get you the line number. I suspect it is plist_node_empty, given the
> > git bisect results you reported.
> >
> >
> > diff --git a/kernel/futex.c b/kernel/futex.c
> > index abd5324..7f31bca 100644
> > --- a/kernel/futex.c
> > +++ b/kernel/futex.c
> > @@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
> >  {
> >      struct futex_hash_bucket *hb;
> >
> > -    if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> > -                || plist_node_empty(&q->list)))
> > +    if (WARN_ON(!q->lock_ptr))
> > +        return;
> > +    if (!spin_is_locked(q->lock_ptr))
> > +        return;
> > +    if (plist_node_empty(&q->list))
> >          return;
> >
>
> Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?
>
> I think UML is UP, and that spin_is_locked() will always return false.
>

Could you apply these patches:

2092e6be WARN_ON_SMP(): Allow use in if() statements on UP
29096202 futex: Fix WARN_ON() test for UP

on top of this commit, and see if the problem goes away? What could have
happened is that you have two bugs, with one of them already fixed. If the git
bisect stumbled on this bug, it will show this one, even though this code
was fixed later on. If you apply the above two patches and it works
again, then this isn't the bug you are looking for.

-- Steve
|
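The effect Steven describes can be modelled in user space. The sketch below is not kernel code; it assumes, as stated in the thread, that spin_is_locked() evaluates to false on a uniprocessor build, and it assumes the fix makes the lock check an SMP-only warning while keeping the empty-node check unconditional. All names (fake_lock, held_lock, warn_on, warn_on_smp_up, unqueue_before_fix, unqueue_after_fix) are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for a spinlock; not the kernel type. */
struct fake_lock { int held; };

struct fake_lock held_lock = { 1 };  /* a lock that really is held */
int warnings;                        /* number of simulated warnings */

/* Assumption from the thread: on UP, spin_is_locked() is always false,
 * whatever the real state of the lock. */
static bool up_spin_is_locked(const struct fake_lock *lock)
{
    (void)lock;
    return false;
}

/* Simulated WARN_ON(): reports and returns the condition. */
static bool warn_on(bool cond)
{
    if (cond) {
        warnings++;
        fprintf(stderr, "WARNING: check failed\n");
    }
    return cond;
}

/* Simulated UP flavour of WARN_ON_SMP(): never warns, never true. */
static bool warn_on_smp_up(bool cond)
{
    (void)cond;
    return false;
}

/* Before the fix: one combined check. On UP the lock test is always
 * "not locked", so this warns and bails out on every call. */
bool unqueue_before_fix(const struct fake_lock *lock, bool node_empty)
{
    if (warn_on(!lock || !up_spin_is_locked(lock) || node_empty))
        return false;   /* returns without unqueueing */
    return true;
}

/* After the fix (as assumed here): lock checks only matter on SMP,
 * the empty-node check still applies everywhere. */
bool unqueue_after_fix(const struct fake_lock *lock, bool node_empty)
{
    if (warn_on_smp_up(!lock || !up_spin_is_locked(lock)) ||
        warn_on(node_empty))
        return false;
    return true;
}
```

With a held lock and a queued node, unqueue_before_fix() warns and returns early, which is the "return without doing the proper work" behaviour discussed in this thread, while unqueue_after_fix() proceeds.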
From: Steven R. <ro...@go...> - 2011-05-20 16:22:54
|
On Fri, 2011-05-20 at 08:55 -0700, Darren Hart wrote:

> I suspect Toralf is hitting the WARN_ON in __unqueue_futex:
>
>     if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
>                 || plist_node_empty(&q->list)))
>
> Toralf, can you instrument that and let us know which of the conditions is
> triggering the WARN_ON? Something like the following should be adequate
> to get you the line number. I suspect it is plist_node_empty, given the
> git bisect results you reported.
>
>
> diff --git a/kernel/futex.c b/kernel/futex.c
> index abd5324..7f31bca 100644
> --- a/kernel/futex.c
> +++ b/kernel/futex.c
> @@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
>  {
>      struct futex_hash_bucket *hb;
>
> -    if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
> -                || plist_node_empty(&q->list)))
> +    if (WARN_ON(!q->lock_ptr))
> +        return;
> +    if (!spin_is_locked(q->lock_ptr))
> +        return;
> +    if (plist_node_empty(&q->list))
>          return;
>

Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?

I think UML is UP, and that spin_is_locked() will always return false.

-- Steve
|
From: Darren H. <dv...@li...> - 2011-05-20 18:10:18
|
On 05/20/2011 09:04 AM, Steven Rostedt wrote:
> On Fri, 2011-05-20 at 08:55 -0700, Darren Hart wrote:
>
>> I suspect Toralf is hitting the WARN_ON in __unqueue_futex:
>>
>>     if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
>>                 || plist_node_empty(&q->list)))
>>
>> Toralf, can you instrument that and let us know which of the conditions is
>> triggering the WARN_ON? Something like the following should be adequate
>> to get you the line number. I suspect it is plist_node_empty, given the
>> git bisect results you reported.
>>
>>
>> diff --git a/kernel/futex.c b/kernel/futex.c
>> index abd5324..7f31bca 100644
>> --- a/kernel/futex.c
>> +++ b/kernel/futex.c
>> @@ -782,8 +782,11 @@ static void __unqueue_futex(struct futex_q *q)
>>  {
>>      struct futex_hash_bucket *hb;
>>
>> -    if (WARN_ON(!q->lock_ptr || !spin_is_locked(q->lock_ptr)
>> -                || plist_node_empty(&q->list)))
>> +    if (WARN_ON(!q->lock_ptr))
>> +        return;
>> +    if (!spin_is_locked(q->lock_ptr))
>> +        return;
>> +    if (plist_node_empty(&q->list))
>>          return;
>>
>

Whoops, there should have been WARN_ON's in all the if blocks... duh.

> Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?

Hrm, I thought he said he was on 2.6.39-rc-something. Those patches went
in pre 2.6.39-rc1 according to gitk.

>
> I think UML is UP, and that spin_is_locked() will always return false.
>
> -- Steve
>
>

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel
|
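The instrumentation idea, with the correction Darren mentions here (a warning in every if block), can be sketched in user space as follows. This is only an illustration under stated assumptions: the WARN_ON macro below is a simplified local stand-in that records __LINE__, and warn_at, last_warn_line, enum reason and check_unqueue are hypothetical names, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdio.h>

int last_warn_line;  /* line number of the most recent simulated warning */

/* Simplified stand-in for the kernel's WARN_ON(): prints the line of the
 * failing check and returns the condition so it can sit inside if (). */
static bool warn_at(bool cond, int line)
{
    if (cond) {
        last_warn_line = line;
        fprintf(stderr, "WARNING at line %d\n", line);
    }
    return cond;
}
#define WARN_ON(cond) warn_at((cond), __LINE__)

enum reason {
    REASON_NONE,
    REASON_NO_LOCK_PTR,
    REASON_NOT_LOCKED,
    REASON_NODE_EMPTY
};

/* One WARN_ON per condition: the reported line number now identifies
 * which of the three checks fired, instead of one shared line. */
enum reason check_unqueue(bool has_lock_ptr, bool locked, bool node_empty)
{
    if (WARN_ON(!has_lock_ptr))
        return REASON_NO_LOCK_PTR;
    if (WARN_ON(!locked))
        return REASON_NOT_LOCKED;
    if (WARN_ON(node_empty))
        return REASON_NODE_EMPTY;
    return REASON_NONE;
}
```

Because each check sits on its own source line, the "WARNING: at kernel/futex.c:NNN" style message would differ per condition, which is what the instrumentation is meant to achieve.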
From: Steven R. <ro...@go...> - 2011-05-20 17:41:24
|
On Fri, 2011-05-20 at 10:35 -0700, Darren Hart wrote:

> > Wait! This is where we need the WARN_ON_SMP(), do we have that patch in?
>
> Hrm, I thought he said he was on 2.6.39-rc-something. Those patches went
> in pre 2.6.39-rc1 according to gitk.

A git bisect can easily stumble on this where the fix is not made.

-- Steve
|
From: richard -r. w. <ric...@gm...> - 2011-05-20 16:25:02
|
2011/5/20 Toralf Förster <tor...@gm...>:
>
> richard -rw- weinberger wrote at 09:56:02
>> 2011/5/20 Toralf Förster <tor...@gm...>:
>> > ...
>> > Kernel panic - not syncing: Kernel mode fault at addr 0x0, ip 0x80a9f6b
>>
>> Looks like a NULL-pointer bug.
>> What code is at address 80a9f6b?
>> Use "objdump -d -S | less" to find it.
>         if (unlikely(ret != 0))
>  80a9f3a:  85 c0           test   %eax,%eax
>  80a9f3c:  75 ca           jne    80a9f08 <futex_wake+0x18>
>                 goto out;
>
>         hb = hash_futex(&key);
>  80a9f3e:  8d 45 e8        lea    -0x18(%ebp),%eax
>  80a9f41:  e8 aa f6 ff ff  call   80a95f0 <hash_futex>
>  80a9f46:  89 c2           mov    %eax,%edx
>         spin_lock(&hb->lock);
>         head = &hb->chain;
>
>         plist_for_each_entry_safe(this, next, head, list) {
>  80a9f48:  8b 48 08        mov    0x8(%eax),%ecx
>  80a9f4b:  83 c2 08        add    $0x8,%edx
>  80a9f4e:  8d 41 f4        lea    -0xc(%ecx),%eax
>  80a9f51:  39 ca           cmp    %ecx,%edx
>  80a9f53:  8b 70 0c        mov    0xc(%eax),%esi
>  80a9f56:  74 6a           je     80a9fc2 <futex_wake+0xd2>
>  80a9f58:  89 d9           mov    %ebx,%ecx
>  80a9f5a:  83 ee 0c        sub    $0xc,%esi
>  80a9f5d:  89 d3           mov    %edx,%ebx
>  80a9f5f:  89 fa           mov    %edi,%edx
>  80a9f61:  89 cf           mov    %ecx,%edi
>  80a9f63:  eb 12           jmp    80a9f77 <futex_wake+0x87>
>  80a9f65:  8d 76 00        lea    0x0(%esi),%esi
>  80a9f68:  8d 46 0c        lea    0xc(%esi),%eax
>  80a9f6b:  8b 4e 0c        mov    0xc(%esi),%ecx

A NULL pointer dereference happens here in futex_wake().
Steve, any ideas?

--
Thanks,
//richard
|
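One possible reading of the faulting instruction, offered as a sketch rather than a confirmed diagnosis: in the quoted disassembly, "lea -0xc(%ecx),%esi" turns a list pointer into the address of its enclosing entry, and "mov 0xc(%esi),%ecx" at 80a9f6b reads through that same list pointer again. If the list pointer loaded on the previous iteration was NULL, the read lands on address 0x0, which matches the "fault at addr 0x0" in the panic message. The arithmetic below only illustrates that; LIST_OFFSET and the function names are hypothetical, and the 0xc offset is taken from this particular 32-bit build.

```c
#include <stdint.h>

/* Offset of the embedded list pointer inside its entry, as read off the
 * disassembly (lea -0xc / mov 0xc). Assumption: the 32-bit build from
 * the thread; other builds may use a different offset. */
#define LIST_OFFSET 0xcu

/* What "lea -0xc(%ecx),%esi" computes: entry = list pointer - 0xc. */
uint32_t entry_from_list_ptr(uint32_t list_ptr)
{
    return list_ptr - LIST_OFFSET;
}

/* Address read by "mov 0xc(%esi),%ecx" when %esi holds the entry. */
uint32_t address_loaded(uint32_t entry)
{
    return entry + LIST_OFFSET;
}
```

For a NULL list pointer, entry_from_list_ptr(0) wraps to 0xfffffff4 and address_loaded() of that value is 0, so the load faults at address zero.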
From: Steven R. <ro...@go...> - 2011-05-20 17:19:20
|
On Fri, 2011-05-20 at 18:24 +0200, richard -rw- weinberger wrote: > 2011/5/20 Toralf Förster <tor...@gm...>: > > > Here in futex_wake() happens a NULL pointer dereference. > Steve, any ideas? > Yes, if this is from the bisect, and does not contain the two commits that I've posted in another email. Without those fixes, the futex code will error and return without doing the proper work and cause all sorts of bugs. -- Steve |
From: Toralf F. <tor...@gm...> - 2011-05-20 17:10:28
|
Steven Rostedt wrote at 18:11:36 > > Wait! This is where we need the WARN_ON_SMP(), do we have that patch in? > > > > I think UML is UP, and that spin_is_locked() will always return false. > > Could you apply these patches: > > 2092e6be WARN_ON_SMP(): Allow use in if() statements on UP > 29096202 futex: Fix WARN_ON() test for UP > > On top of this commit, and see if the problem goes away. What could have > happened, is that you have two bugs, with one of them fixed. If the git > bisect stumbled on this bug, it will show this one, even though later > on, this code was fixed. If you apply the above two patches and it works > again, then this isn't the bug you are looking for. > > -- Steve Right - applying those 2 commits on top of v2.6.38-rc8-1-g2e12978 works - the issue is gone now. And yes - now I have to look for the other of the two bugs, which was introduced between the tags v2.6.38 and v2.6.39, I think. Is there an easy way to check whether a given checked-out git tree contains a specific commit id? -- MfG/Sincerely Toralf Förster pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3 |
From: Steven R. <ro...@go...> - 2011-05-20 17:44:09
|
On Fri, 2011-05-20 at 19:10 +0200, Toralf Förster wrote: > Is there an easy way to check whether a given checked-out git tree contains > a specific commit id? You can try: git log --pretty=oneline | grep <SHA1> -- Steve |
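Besides grepping the log, git can answer the "is this commit in my tree?" question directly through its exit status. A minimal sketch, assuming a git new enough to have `git merge-base --is-ancestor` (as far as I know this arrived in git 1.8.0, i.e. after this thread was written); tree_contains is a hypothetical helper name, not a git command:

```shell
# tree_contains is a hypothetical helper: it succeeds (exit status 0) if
# commit $1 is reachable from $2, which defaults to the checked-out HEAD.
tree_contains() {
    git merge-base --is-ancestor "$1" "${2:-HEAD}"
}

# Usage, with the fix commit named in this thread:
#   if tree_contains 29096202; then echo "futex fix present"; fi
```

On an older git, `git branch --contains <SHA1>` or the `git log | grep` approach above should give the same answer, just in a less script-friendly form.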
From: Steven R. <ro...@go...> - 2011-05-20 17:46:16
|
On Fri, 2011-05-20 at 19:10 +0200, Toralf Förster wrote: > Steven Rostedt wrote at 18:11:36 > > > Wait! This is where we need the WARN_ON_SMP(), do we have that patch in? > > > > > > I think UML is UP, and that spin_is_locked() will always return false. > > > > Could you apply these patches: > > > > 2092e6be WARN_ON_SMP(): Allow use in if() statements on UP > > 29096202 futex: Fix WARN_ON() test for UP > > > > On top of this commit, and see if the problem goes away. What could have > > happened, is that you have two bugs, with one of them fixed. If the git > > bisect stumbled on this bug, it will show this one, even though later > > on, this code was fixed. If you apply the above two patches and it works > > again, then this isn't the bug you are looking for. > > > > -- Steve > Right - applying those 2 commits on top of v2.6.38-rc8-1-g2e12978 works - the > issue is gone now. > > And yes - now I have to look for the other of the two bugs, which was introduced > between the tags v2.6.38 and v2.6.39, I think. Another thing you could do is check out 29096202 and see if it works. If it does, you can use that commit as your "git bisect good" starting point, and you should not be affected by this bug again. -- Steve |
From: Toralf F. <tor...@gm...> - 2011-05-20 22:54:04
|
Steven Rostedt wrote at 18:11:36
> Could you apply these patches:
>
> 2092e6be WARN_ON_SMP(): Allow use in if() statements on UP
> 29096202 futex: Fix WARN_ON() test for UP
>
> On top of this commit, and see if the problem goes away. What could have
> happened, is that you have two bugs, with one of them fixed. If the git
> bisect stumbled on this bug, it will show this one, even though later
> on, this code was fixed. If you apply the above two patches and it works
> again, then this isn't the bug you are looking for.

I bisected it again and, at every step, applied those 2 commits if commit
2e12978 was in the source too.

Furthermore, it was necessary to use a fresh instance of Firefox every time
to reproduce the issue, which has now somehow changed: as soon as I pointed
Firefox to https://n22_uml/phpmyadmin/, the UML system was no longer
reachable (neither ping nor ssh into it was possible), and no crash occurred
any longer. Furthermore, a previously opened ssh session to that UML hangs
completely.

Bisecting gave:

git bisect bad
d123375425d7df4b6081a631fc1203fceafa59b2 is the first bad commit
commit d123375425d7df4b6081a631fc1203fceafa59b2
Author: Thomas Gleixner <tg...@li...>
Date:   Wed Jan 26 21:32:01 2011 +0100

    rwsem: Remove redundant asmregparm annotation

    Peter Zijlstra pointed out, that the only user of asmregparm (x86) is
    compiling the kernel already with -mregparm=3. So the annotation of
    the rwsem functions is redundant. Remove it.

    Signed-off-by: Thomas Gleixner <tg...@li...>
    Cc: Peter Zijlstra <pe...@in...>
    Cc: David Howells <dho...@re...>
    Cc: Benjamin Herrenschmidt <be...@ke...>
    Cc: Matt Turner <mat...@gm...>
    Cc: Tony Luck <ton...@in...>
    Cc: Heiko Carstens <hei...@de...>
    Cc: Paul Mundt <le...@li...>
    Cc: David Miller <da...@da...>
    Cc: Chris Zankel <ch...@za...>
    LKML-Reference: <alpine.LFD.2.00.1101262130450.31804@localhost6.localdomain6>
    Signed-off-by: Thomas Gleixner <tg...@li...>

:040000 040000 f373822625e4f5d03d89997cc9f06ef0e21c6d08 272479d3450a4924f3ad2d06a058d77c577ec0d4 M include
:040000 040000 9294321acb9db51e4db72b8e7c95fbd1531a7f26 393fc63299ae482792439384485618a492619787 M lib

/enjoy :-)

--
MfG/Sincerely
Toralf Förster
pgp finger print: 7B1A 07F4 EC82 0F90 D4C2 8936 872A E508 7DB6 9DA3 |
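The "apply the two fixes at every bisect step, but only where the broken commit is present" procedure described above can be scripted around the manual test. A rough sketch, not necessarily what was actually typed: needs_fix is a hypothetical helper name, and it assumes a git that supports `git merge-base --is-ancestor`. The commit ids in the usage comment are the ones named in this thread.

```shell
# needs_fix is a hypothetical helper: it succeeds if the checked-out tree
# contains the commit that introduced a known bug ($1) but does not yet
# contain the commit that fixed it ($2).
needs_fix() {
    git merge-base --is-ancestor "$1" HEAD &&
        ! git merge-base --is-ancestor "$2" HEAD
}

# One bisect step might then look like this:
#   if needs_fix 2e12978 29096202; then
#       git cherry-pick --no-commit 2092e6be 29096202   # temporary fixes
#   fi
#   ... build, boot the UML instance, test by hand ...
#   git reset --hard        # drop the temporary fixes again
#   git bisect good         # or: git bisect bad
```

Using --no-commit keeps the fixes out of the history, so `git reset --hard` restores exactly the revision that bisect selected before it is marked good or bad.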