From: Jiaying Z. <jia...@go...> - 2008-07-03 07:53:59
|
Hello,

I found since 2.6.25 kernels, uml crashes when it calls down() on a semaphore with zero counter. Here is some example code:

    static struct semaphore test_sem;
    static DECLARE_WAIT_QUEUE_HEAD(sleep_queue);   // queue used only for the timed sleep

    static int testfunc(void *arg)
    {
            interruptible_sleep_on_timeout(&sleep_queue, 5 * HZ);  // after some short period
            up(&test_sem);                                          // up the semaphore
            return 0;
    }

    static int parent_func(unsigned argc, char **argv)
    {
            sema_init(&test_sem, 0);                       // init semaphore with zero counter
            kernel_thread(testfunc, target, CLONE_FILES);  // create a thread that will up the semaphore
            down_interruptible(&test_sem);                 // SHOULD wait here until testfunc ups the semaphore
            return 0;
    }

Our kernel module has used this kind of code to synchronize different kernel threads. It runs fine on a real machine and on older uml kernels, but crashes on 2.6.25.4 uml. I tried the latest 2.6.25.9 kernel and still saw the same problem. It seems to have something to do with uml's signal handling. Does anyone know what change in the 2.6.25 uml code may cause the problem?

Thanks a lot!
Jiaying |
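The pattern above relies on ordinary counting-semaphore semantics: with the count initialized to zero, down_interruptible() must block until another thread calls up(). As a user-space analogy only, assuming a Linux host with POSIX threads and unnamed semaphores, sem_init(), sem_wait() and sem_post() can stand in for sema_init(), down_interruptible() and up(); child() and run_handoff() are hypothetical names, not code from the original module.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

static sem_t test_sem;       /* user-space stand-in for the module's test_sem */
static int child_finished;   /* written by the child before it posts */

/* Plays the role of testfunc(): sleep briefly, then "up" the semaphore. */
static void *child(void *arg)
{
    struct timespec delay = { 0, 100L * 1000 * 1000 };   /* 100 ms */

    (void)arg;
    nanosleep(&delay, NULL);
    child_finished = 1;
    sem_post(&test_sem);                 /* analogous to up(&test_sem) */
    return NULL;
}

/* Plays the role of parent_func(). Returns 1 when the wait ended only after
 * the child had run, which is what down() on a zero count should guarantee. */
static int run_handoff(void)
{
    pthread_t tid;
    int rc, seen;

    rc = sem_init(&test_sem, 0, 0);      /* count starts at zero */
    assert(rc == 0);
    child_finished = 0;
    rc = pthread_create(&tid, NULL, child, NULL);
    assert(rc == 0);

    do {                                 /* blocks, like down_interruptible() */
        rc = sem_wait(&test_sem);
    } while (rc != 0 && errno == EINTR); /* retry if a signal interrupted the wait */
    assert(rc == 0);

    seen = child_finished;   /* sem_post/sem_wait order the child's write before this read */
    pthread_join(tid, NULL);
    sem_destroy(&test_sem);
    return seen;
}
```

run_handoff() should return 1: the parent cannot get past sem_wait() until the child has set its flag and posted.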
From: Jeff D. <jd...@ad...> - 2008-07-03 13:56:55
|
On Thu, Jul 03, 2008 at 12:53:46AM -0700, Jiaying Zhang wrote:
> I found since 2.6.25 kernels, uml crashes when it calls down() on a
> semaphore with zero counter.

What's the stack trace?

Can you bisect it?

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Mattia D. <mal...@li...> - 2008-07-20 15:20:37
|
On Fri, Jul 18, 2008 at 04:53:42PM -0400, Jeff Dike wrote:
> On Thu, Jul 17, 2008 at 12:55:09PM +0800, Jiaying Zhang wrote:
> > The patch below solves the 2.6.25 uml crash problem for me. Looks like the
> > problem should go away in the 2.6.26 kernel because down_interruptible has
> > changed to the C code since 2.6.26. But I got kernel panic while booting
> > the 2.6.26 kernel :(.
> >
> > --- linux-2.6.25.4/lib/semaphore-sleepers.c	2008-05-15 23:00:12.000000000 +0800
> > +++ linux-2.6.25.4-new/lib/semaphore-sleepers.c	2008-07-17 12:20:47.000000000 +0800
> > @@ -48,12 +48,12 @@
> >   * we cannot lose wakeup events.
> >   */
> >
> > -void __up(struct semaphore *sem)
> > +asmregparm void __up(struct semaphore *sem)
> >  {
> >  	wake_up(&sem->wait);
> >  }
>
> You continue to ignore a few important facts:
>
> 1 - There are a ton of semaphores in UML
> 2 - They all work, except for yours
>
> Therefore, a patch which changes all semaphores across all
> architectures for which asmregparm has meaning can't possibly be the
> correct fix.
>
> However, you might have treated this as an important clue, and looked
> at whether your broken semaphore has a different set of declarations
> in force than those in the rest of the kernel.

Jeff,
it's not entirely clear to me why, but that patch fixes a segfault that
I experience when booting uml 2.6.25 built with gcc-4.3 on a 2.6.25
host (I also applied your ICE workaround patch).

I'm booting a debian sid image that I usually run before uploading the
new uml package in debian. I've got no fancy modules written by me and
the segfault is 100% reproducible with that debian image (a different
image, a gentoo one, doesn't crash).

I'll provide more info tomorrow; I'll try to further trace the crash
with gdb.

cheers
--
mattia
:wq! |
From: Jiaying Z. <jia...@go...> - 2008-07-04 01:06:36
|
The stack trace isn't very helpful. Here it is.

EIP: 0073:[<d84156c5>] CPU: 0 Not tainted ESP: 007b:0be3ea78 EFLAGS: 00210206
    Not tainted
EAX: 0be548d8 EBX: 08325b54 ECX: 08325b58 EDX: 0be548cc
ESI: 00000001 EDI: 080598c6 EBP: 0be3ea98 DS: 007b ES: 007b
08323b6c:  [<0806a718>] show_regs+0xc4/0xc9
08323b98:  [<080594b3>] segv+0x20e/0x226
08323c3c:  [<080592a0>] segv_handler+0x4f/0x54
08323c5c:  [<0806537b>] sig_handler_common+0x63/0x72
08323cd4:  [<080653b8>] sig_handler+0x2e/0x3e
08323cec:  [<080654dd>] handle_signal+0x4d/0x7a
08323d0c:  [<08066ebf>] hard_handler+0xf/0x14
08323d1c:  [<b7fff420>] 0xb7fff420

Kernel panic - not syncing: Kernel mode fault at addr 0xd84156c5, ip 0xd84156c5

EIP: 0073:[<40146334>] CPU: 0 Not tainted ESP: 007b:bfaf2378 EFLAGS: 00200246
    Not tainted
EAX: ffffffda EBX: 00000003 ECX: c134fd09 EDX: 08050368
ESI: 4002b7c0 EDI: 40029180 EBP: bfaf24c8 DS: 007b ES: 007b
08323ad8:  [<0806a718>] show_regs+0xc4/0xc9
08323b04:  [<080596ed>] panic_exit+0x23/0x39
08323b18:  [<080849d0>] notifier_call_chain+0x21/0x4d
08323b38:  [<08084a72>] __atomic_notifier_call_chain+0x17/0x19
08323b54:  [<08084a89>] atomic_notifier_call_chain+0x15/0x17
08323b70:  [<0807116f>] panic+0x4f/0xd1
08323b8c:  [<080594c1>] segv+0x21c/0x226
08323c3c:  [<080592a0>] segv_handler+0x4f/0x54
08323c5c:  [<0806537b>] sig_handler_common+0x63/0x72
08323cd4:  [<080653b8>] sig_handler+0x2e/0x3e
08323cec:  [<080654dd>] handle_signal+0x4d/0x7a
08323d0c:  [<08066ebf>] hard_handler+0xf/0x14
08323d1c:  [<b7fff420>] 0xb7fff420

Segmentation fault

Jiaying |
From: Jeff D. <jd...@ad...> - 2008-07-20 15:44:40
|
On Mon, Jul 21, 2008 at 12:20:22AM +0900, Mattia Dongili wrote:
> it's not entirely clear to me why, but that patch fixes a segfault that
> I experience when booting uml 2.6.25 built with gcc-4.3 on a 2.6.25
> host (I also applied your ICE workaround patch).

Hmmm, get a stack trace from it and let's see what's going on.

Presumably, you're not doing kernel development, just building a stock UML?

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Jiaying Z. <jia...@go...> - 2008-07-10 02:25:41
|
Hi Jeff,

Do you have any thought about what the problem might be?
Thanks a lot!

Jiaying |
From: Mattia D. <mal...@li...> - 2008-07-21 12:54:39
|
On Sun, Jul 20, 2008 at 11:44:20AM -0400, Jeff Dike wrote:
> On Mon, Jul 21, 2008 at 12:20:22AM +0900, Mattia Dongili wrote:
> > it's not entirely clear to me why, but that patch fixes a segfault that
> > I experience when booting uml 2.6.25 built with gcc-4.3 on a 2.6.25
> > host (I also applied your ICE workaround patch).
>
> Hmmm, get a stack trace from it and let's see what's going on.
>
> Presumably, you're not doing kernel development, just building a stock UML?

nope, not doing kernel development on that; it's a stock UML. The added
patches are just small customizations for debian:
http://svn.debian.org/viewsvn/pkg-uml/trunk/src/user-mode-linux/debian/patches/

patch #1 is not used, #2 and #3 are trivial changes, #4 is the gcc-4.3
ICE workaround and #5 is Jiaying's patch we are discussing.

The configuration is this:
http://svn.debian.org/viewsvn/pkg-uml/trunk/src/user-mode-linux/config.i386?rev=310&view=markup

on top of this I enabled the debug info to be built:
CONFIG_PRINTK_TIME=y
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_INFO=y
CONFIG_FRAME_POINTER=y

I also just reconfirmed that with Jiaying's patch it doesn't happen.

Program received signal SIGILL, Illegal instruction.
0x00000000 in ?? ()
(gdb) bt
#0  0x00000000 in ?? ()
#1  0x080702e5 in __wake_up_common (q=0x16d50e88, mode=3, nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4145
#2  0x08070323 in __wake_up_locked (q=0x16d50e88, mode=3) at kernel/sched.c:4174
#3  0x082556da in __down (sem=0x16d50e80) at lib/semaphore-sleepers.c:88
#4  0x0825401a in __down_failed () at arch/um/sys-i386/../../x86/lib/semaphore_32.S:42
#5  0x081072b7 in flush_commit_list (s=0x16dfba00, jl=0x16d50e80, flushall=1) at include/asm/arch/semaphore_32.h:99
#6  0x081077a3 in flush_async_commits (work=0x18936124) at fs/reiserfs/journal.c:3507
#7  0x08082a24 in run_workqueue (cwq=0x16ee9080) at kernel/workqueue.c:276
#8  0x08082cdf in worker_thread (__cwq=0x16ee9080) at kernel/workqueue.c:321
#9  0x0808538f in kthread (_create=0x17c679b4) at kernel/kthread.c:80
#10 0x08068f2b in run_kernel_thread (fn=0x8085347 <kthread>, arg=0x17c679b4, jmp_ptr=0x16e28bb4) at arch/um/os-Linux/process.c:267
#11 0x0805ae87 in new_thread_handler () at arch/um/kernel/process.c:151
#12 0x00000000 in ?? ()
(gdb) l
178      * area at compile-time..
179      */
180     static __always_inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
181     {
182             int d0, d1;
183             __asm__ __volatile__(
184                     "rep ; stosl\n\t"
185                     "testb $2,%b3\n\t"
186                     "je 1f\n\t"
187                     "stosw\n"
(gdb) up
#1  0x080702e5 in __wake_up_common (q=0x16d50e88, mode=3, nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4145
4145                    if (curr->func(curr, mode, sync, key) &&
(gdb) print *curr
$3 = {flags = 255, private = 0x0, func = 0, task_list = {next = 0x16d50e88, prev = 0x0}}

it looks like there is no func here...

(gdb) l
4140            wait_queue_t *curr, *next;
4141
4142            list_for_each_entry_safe(curr, next, &q->task_list, task_list) {
4143                    unsigned flags = curr->flags;
4144
4145                    if (curr->func(curr, mode, sync, key) &&
4146                                    (flags & WQ_FLAG_EXCLUSIVE) && !--nr_exclusive)
4147                            break;
4148            }
4149    }
(gdb) up
#2  0x08070323 in __wake_up_locked (q=0x16d50e88, mode=3) at kernel/sched.c:4174
4174            __wake_up_common(q, mode, 1, 0, NULL);
(gdb) l
4169    /*
4170     * Same as __wake_up but called with the spinlock in wait_queue_head_t held.
4171     */
4172    void __wake_up_locked(wait_queue_head_t *q, unsigned int mode)
4173    {
4174            __wake_up_common(q, mode, 1, 0, NULL);
4175    }
4176
4177    /**
4178     * __wake_up_sync - wake up threads blocked on a waitqueue.
(gdb) up
#3  0x082556da in __down (sem=0x16d50e80) at lib/semaphore-sleepers.c:88
88              wake_up_locked(&sem->wait);
(gdb) l
83
84                      spin_lock_irqsave(&sem->wait.lock, flags);
85                      tsk->state = TASK_UNINTERRUPTIBLE;
86              }
87              remove_wait_queue_locked(&sem->wait, &wait);
88              wake_up_locked(&sem->wait);
89              spin_unlock_irqrestore(&sem->wait.lock, flags);
90              tsk->state = TASK_RUNNING;
91      }
92
(gdb) print *sem
$4 = {count = {counter = 5833}, sleepers = 0, wait = {lock = {raw_lock = {<No data fields>}}, task_list = {next = 0xf, prev = 0xf}}}

Any other useful information I could provide?
--
mattia
:wq! |
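The __wake_up_common() loop listed above calls each queued entry's func pointer and stops once nr_exclusive exclusive waiters have been woken, so an entry with func = 0 sends execution to address 0, which is consistent with the SIGILL at 0x00000000. Below is a user-space sketch of the same control flow, assuming a singly linked list is an acceptable stand-in for the kernel's list_head; struct waiter, mark_woken() and wake_up_common() are hypothetical names, and the NULL check is an addition the kernel loop does not make.

```c
#include <assert.h>
#include <stddef.h>

#define WQ_FLAG_EXCLUSIVE 0x01

/* Simplified stand-in for the kernel's wait_queue_t (singly linked here). */
struct waiter {
    unsigned flags;
    int (*func)(struct waiter *w, unsigned mode);
    int woken;               /* set by the callback in this sketch */
    struct waiter *next;
};

/* Hypothetical wake callback: marks the waiter and reports success. */
static int mark_woken(struct waiter *w, unsigned mode)
{
    (void)mode;
    w->woken = 1;
    return 1;
}

/* Calls every waiter's func and stops after nr_exclusive exclusive waiters
 * were woken (nr_exclusive == 0 means "wake them all"). Returns -1 instead
 * of jumping through a NULL func, 0 otherwise. */
static int wake_up_common(struct waiter *head, unsigned mode, int nr_exclusive)
{
    struct waiter *curr, *next;

    assert(nr_exclusive >= 0);
    for (curr = head; curr != NULL; curr = next) {
        unsigned flags = curr->flags;

        next = curr->next;   /* "safe" iteration: fetch next before the call */
        if (curr->func == NULL)
            return -1;
        if (curr->func(curr, mode) && (flags & WQ_FLAG_EXCLUSIVE) &&
            !--nr_exclusive)
            break;
    }
    return 0;
}
```

With nr_exclusive set to 1, a non-exclusive waiter followed by two exclusive ones wakes the first two and leaves the third queued.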
From: Jeff D. <jd...@ad...> - 2008-07-10 17:02:25
|
On Thu, Jul 10, 2008 at 10:25:29AM +0800, Jiaying Zhang wrote:
> Do you have any thought about what the problem might be?
> Thanks a lot!

Yeah, my first thought is that your code is buggy.

Since 2.6.25 seems OK, you can bisect between then and now to see
either what caused the bug or what is triggering the crash on an
existing bug.

The other thing you can do is gdb the UML and see if gdb gives you a
better stack trace.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Mattia D. <mal...@li...> - 2008-08-02 05:54:20
|
On Mon, Jul 21, 2008 at 09:54:26PM +0900, Mattia Dongili wrote:
> On Sun, Jul 20, 2008 at 11:44:20AM -0400, Jeff Dike wrote:
> > On Mon, Jul 21, 2008 at 12:20:22AM +0900, Mattia Dongili wrote:
> > > it's not entirely clear to me why, but that patch fixes a segfault that
> > > I experience when booting uml 2.6.25 built with gcc-4.3 on a 2.6.25
> > > host (I also applied your ICE workaround patch).
> >
> > Hmmm, get a stack trace from it and let's see what's going on.

Hi Jeff,

FWIW I can't reproduce this on 2.6.26.

cheers
--
mattia
:wq! |
From: Jeff D. <jd...@ad...> - 2008-08-04 16:44:48
|
On Sat, Aug 02, 2008 at 02:54:08PM +0900, Mattia Dongili wrote:
> FWIW I can't reproduce this on 2.6.26.

Thanks for letting me know. Too bad it's still a mystery though.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Jiaying Z. <jia...@go...> - 2008-07-14 09:07:06
|
On Fri, Jul 11, 2008 at 1:02 AM, Jeff Dike <jd...@ad...> wrote:
> On Thu, Jul 10, 2008 at 10:25:29AM +0800, Jiaying Zhang wrote:
> > Do you have any thought about what the problem might be?
> > Thanks a lot!
>
> Yeah, my first thought is that your code is buggy.
>
> Since 2.6.25 seems OK, you can bisect between then and now to see
> either what caused the bug or what is triggering the crash on an
> existing bug.

The 2.6.24 kernels are OK, but I have seen this problem with all of the
2.6.25 kernels I have tried. There have been a lot of changes between
2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
to this problem.

> The other thing you can do is gdb the UML and see if gdb gives you a
> better stack trace.

Here is the trace from gdb on uml.

Program received signal SIGTERM, Terminated.
0xb7fff410 in ?? ()
(gdb) bt
#0  0xb7fff410 in ?? ()
#1  0x08323afc in cpu0_irqstack ()
#2  0xfffffffe in ?? ()
#3  0x0000000f in ?? ()
#4  0x464850c6 in kill () from /lib/tls/i686/cmov/libc.so.6
#5  0x0806624b in os_dump_core () at arch/um/os-Linux/util.c:92
#6  0x08059703 in panic_exit (self=0x83254f4, unused1=0, unused2=0x8340a80) at arch/um/kernel/um_arch.c:233
#7  0x080849d0 in notifier_call_chain (nl=0x0, val=0, v=0x8340a80, nr_to_call=0, nr_calls=0x0) at kernel/notifier.c:70
#8  0x08084a72 in __atomic_notifier_call_chain (nh=0x8340a60, val=0, v=0x8340a80, nr_to_call=-1, nr_calls=0x0) at kernel/notifier.c:159
#9  0x08084a89 in atomic_notifier_call_chain (nh=0x8340a60, val=0, v=0x8340a80) at kernel/notifier.c:168
#10 0x0807116f in panic (fmt=0x82d2039 "Kernel mode fault at addr 0x%lx, ip 0x%lx") at kernel/panic.c:101
#11 0x080594c1 in segv (fi={error_code = 6, cr2 = 98596, trap_no = 14}, ip=136845739, is_user=0, regs=0x8323c6c) at arch/um/kernel/trap.c:206
#12 0x080592a0 in segv_handler (sig=11, regs=0x8323c6c) at arch/um/kernel/trap.c:152
#13 0x0806537b in sig_handler_common (sig=11, sc=0x8323d24) at arch/um/os-Linux/signal.c:48
#14 0x080653b8 in sig_handler (sig=11, sc=0x8323d24) at arch/um/os-Linux/signal.c:80
#15 0x080654dd in handle_signal (sig=<value optimized out>, sc=0x8323d24) at arch/um/os-Linux/signal.c:157
#16 0x08066ebf in hard_handler (sig=11) at arch/um/os-Linux/sys-i386/signal.c:12
#17 <signal handler called>
#18 __down_interruptible (sem=0x9f68978) at include/linux/list.h:50
#19 0x0828091a in __down_failed_interruptible () at arch/um/sys-i386/../../x86/lib/semaphore_32.S:63
#20 0x08220a89 in ddsnap_create (target=0xa829080, argc=4, argv=0x9f6f290) at include/asm/arch/semaphore_32.h:120
#21 0x0821b160 in dm_table_add_target (t=0x9f6f178, type=0xa82414c "ddsnap", start=165497564, len=204800, params=0xa82415c "/dev/ubdc") at drivers/md/dm-table.c:772

Looks like the problem happens when __down_interruptible is called.
I checked the semaphore passed to __down_interruptible under gdb
and found it was corrupted:

(gdb) f 18
#18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
50              prev->next = new;
(gdb) p sem
$15 = (struct semaphore *) 0x9f68d08
(gdb) p *sem
$16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock = {raw_lock = {<No data fields>}}, task_list = {
    next = 0x9f68d5c, prev = 0x18124}}}

But the semaphore looks correct before calling down_interruptible:

(gdb) f 20
#20 0x082209fd in ddsnap_create (target=0xa829080, argc=4, argv=0x9f733a8) at include/asm/arch/semaphore_32.h:120
120             __asm__ __volatile__(
(gdb) p info->identify_sem
$28 = {count = {counter = -1}, sleepers = 0, wait = {lock = {raw_lock = {<No data fields>}}, task_list = {
    next = 0x9f0ca14, prev = 0x9f0ca14}}}

I found that starting with the 2.6.25 kernel, the declaration of
__down_failed_interruptible changed from fastcall to extern asmregparm.
Could it be related to this problem?

Jiaying |
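Frame #18 stops at include/linux/list.h:50, the prev->next = new store of the list insertion that queues the waiter, and the corrupted semaphore's task_list.prev is 0x18124. That is 98596 in decimal, the same number the segv frame reports as cr2, although the two gdb sessions appear to come from separate runs. Below is a user-space sketch of a list insertion that checks that its neighbours still point at each other before writing, similar in spirit to the kernel's CONFIG_DEBUG_LIST checks; checked_list_add() and checked_list_add_tail() are hypothetical names.

```c
#include <assert.h>
#include <stdio.h>

/* Minimal copy of the kernel's circular doubly linked list node. */
struct list_head {
    struct list_head *next, *prev;
};

static void init_list_head(struct list_head *head)
{
    head->next = head;
    head->prev = head;
}

/* Insert new_entry between prev and next, which must be adjacent.
 * Returns -1 without writing anything if they disagree. */
static int checked_list_add(struct list_head *new_entry,
                            struct list_head *prev, struct list_head *next)
{
    if (next->prev != prev || prev->next != next) {
        fprintf(stderr, "list_add corruption: prev=%p next=%p\n",
                (void *)prev, (void *)next);
        return -1;
    }
    next->prev = new_entry;
    new_entry->next = next;
    new_entry->prev = prev;
    prev->next = new_entry;  /* the store gdb stopped on at list.h:50 */
    return 0;
}

/* Tail insertion, the way an exclusive waiter is queued. */
static int checked_list_add_tail(struct list_head *new_entry,
                                 struct list_head *head)
{
    assert(head != NULL);
    return checked_list_add(new_entry, head->prev, head);
}
```

The check itself still dereferences prev and next, so a wild pointer such as 0x18124 would fault anyway; it only helps when the neighbours are valid memory that no longer agree with each other.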
From: Jeff D. <jd...@ad...> - 2008-07-14 14:46:23
|
On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> The 2.6.24 kernels are OK, but I have seen this problem with all of the
> 2.6.25 kernels I have tried. There have been a lot of changes between
> 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> to this problem.

So bisect it.

> Looks like the problem happens when __down_interruptible is called.
> I checked the semaphore passed to __down_interruptible under gdb
> and found it was corrupted:
> (gdb) f 18
> #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> 50              prev->next = new;
> (gdb) p sem
> $15 = (struct semaphore *) 0x9f68d08
> (gdb) p *sem
> $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> {raw_lock = {<No data fields>}}, task_list = {
>     next = 0x9f68d5c, prev = 0x18124}}}
>
> But the semaphore looks correct before calling down_interruptible:

What's the problem with debugging this, then? You step through the
code starting when the semaphore is good and see exactly when it gets
corrupted.

Jeff

--
Work email - jdike at linux dot intel dot com |
From: Jiaying Z. <jia...@go...> - 2008-07-16 09:52:44
|
On Mon, Jul 14, 2008 at 10:46 PM, Jeff Dike <jd...@ad...> wrote:
> On Mon, Jul 14, 2008 at 05:06:49PM +0800, Jiaying Zhang wrote:
> > The 2.6.24 kernels are OK, but I have seen this problem with all of the
> > 2.6.25 kernels I have tried. There have been a lot of changes between
> > 2.6.24 kernels and 2.6.25 kernels. I am not sure which one may lead
> > to this problem.
>
> So bisect it.

The problem seems to be related to the changes that got rid of fastcall
in the 2.6.25 kernels. I found that the problem started to happen with
commit 82f74e7159749cc511ebf5954a7b9ea6ad634949 ("x86: unify
include/asm-x86/linkage_[32|64].h"). After that, several commits related
to __down_interruptible were checked in, but they did not solve the
crashing problem I saw. In particular, I thought commit
d50efc6c40620b2e11648cac64ebf4a824e40382 ("x86: fix UML and -regparm=3")
would solve the problem, because it adds the asmregparm macro, which is
the same as fastcall, and uses that macro for the
__down_failed_interruptible declaration. Unfortunately, I tried that
version of the git tree and saw the same problem.

> > Looks like the problem happens when __down_interruptible is called.
> > I checked the semaphore passed to __down_interruptible under gdb
> > and found it was corrupted:
> > (gdb) f 18
> > #18 __down_interruptible (sem=0x9f68d08) at include/linux/list.h:50
> > 50              prev->next = new;
> > (gdb) p sem
> > $15 = (struct semaphore *) 0x9f68d08
> > (gdb) p *sem
> > $16 = {count = {counter = -268435295}, sleepers = 4, wait = {lock =
> > {raw_lock = {<No data fields>}}, task_list = {
> >     next = 0x9f68d5c, prev = 0x18124}}}
> >
> > But the semaphore looks correct before calling down_interruptible:
>
> What's the problem with debugging this, then? You step through the
> code starting when the semaphore is good and see exactly when it gets
> corrupted.

Yes. It looks like the corruption happens when
__down_failed_interruptible() calls __down_interruptible(), and it has
something to do with the 2.6.25 x86 gcc attribute changes.

Jiaying |
From: Jiaying Z. <jia...@go...> - 2008-07-17 04:55:19
|
The patch below solves the 2.6.25 uml crash problem for me. Looks like the
problem should go away in the 2.6.26 kernel because down_interruptible has
changed to the C code since 2.6.26. But I got kernel panic while booting
the 2.6.26 kernel :(.

--- linux-2.6.25.4/lib/semaphore-sleepers.c	2008-05-15 23:00:12.000000000 +0800
+++ linux-2.6.25.4-new/lib/semaphore-sleepers.c	2008-07-17 12:20:47.000000000 +0800
@@ -48,12 +48,12 @@
  * we cannot lose wakeup events.
  */

-void __up(struct semaphore *sem)
+asmregparm void __up(struct semaphore *sem)
 {
 	wake_up(&sem->wait);
 }

-void __sched __down(struct semaphore *sem)
+asmregparm void __sched __down(struct semaphore *sem)
 {
 	struct task_struct *tsk = current;
 	DECLARE_WAITQUEUE(wait, tsk);
@@ -90,7 +90,7 @@ void __sched __down(struct semaphore *se
 	tsk->state = TASK_RUNNING;
 }

-int __sched __down_interruptible(struct semaphore *sem)
+asmregparm int __sched __down_interruptible(struct semaphore *sem)
 {
 	int retval = 0;
 	struct task_struct *tsk = current;
@@ -153,7 +153,7 @@ int __sched __down_interruptible(struct
  * single "cmpxchg" without failure cases,
  * but then it wouldn't work on a 386.
  */
-int __down_trylock(struct semaphore *sem)
+asmregparm int __down_trylock(struct semaphore *sem)
 {
 	int sleepers;
 	unsigned long flags;

Jiaying |
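The expectation that the problem goes away in 2.6.26 rests on down_interruptible() being written in C there, so presumably no hand-written assembly caller has to agree with the C callee about where the semaphore pointer lives. As a user-space illustration only, here is a counting semaphore written entirely in C, loosely following the down()/down_trylock()/up() contract (down_trylock() returns 0 on success, as in the kernel). A pthread mutex stands in for the spinlock and a condition variable for the wait list; every csem_* name is hypothetical and this is not the kernel's implementation.

```c
#include <assert.h>
#include <limits.h>
#include <pthread.h>

struct csem {
    pthread_mutex_t lock;    /* plays the role of the kernel spinlock */
    pthread_cond_t wait;     /* replaces the explicit wait list */
    unsigned int count;
};

static void csem_init(struct csem *sem, unsigned int count)
{
    pthread_mutex_init(&sem->lock, NULL);
    pthread_cond_init(&sem->wait, NULL);
    sem->count = count;
}

/* Sleep until the count is positive, then take one unit. */
static void csem_down(struct csem *sem)
{
    pthread_mutex_lock(&sem->lock);
    while (sem->count == 0)              /* loop: wakeups may be spurious */
        pthread_cond_wait(&sem->wait, &sem->lock);
    sem->count--;
    pthread_mutex_unlock(&sem->lock);
}

/* Kernel convention: 0 on success, 1 if the semaphore was not available. */
static int csem_down_trylock(struct csem *sem)
{
    int busy;

    pthread_mutex_lock(&sem->lock);
    busy = (sem->count == 0);
    if (!busy)
        sem->count--;
    pthread_mutex_unlock(&sem->lock);
    return busy;
}

static void csem_up(struct csem *sem)
{
    pthread_mutex_lock(&sem->lock);
    assert(sem->count != UINT_MAX);      /* overflow would wrap the count */
    sem->count++;
    pthread_cond_signal(&sem->wait);     /* wake one sleeper, if any */
    pthread_mutex_unlock(&sem->lock);
}
```

Because every path is ordinary C called from C, the compiler sees one consistent prototype on both sides of each call.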
From: Jeff D. <jd...@ad...> - 2008-07-18 20:53:53
|
On Thu, Jul 17, 2008 at 12:55:09PM +0800, Jiaying Zhang wrote:
> The patch below solves the 2.6.25 uml crash problem for me. Looks like the
> problem should go away in the 2.6.26 kernel because down_interruptible has
> changed to the C code since 2.6.26. But I got kernel panic while booting
> the 2.6.26 kernel :(.
>
> --- linux-2.6.25.4/lib/semaphore-sleepers.c	2008-05-15 23:00:12.000000000 +0800
> +++ linux-2.6.25.4-new/lib/semaphore-sleepers.c	2008-07-17 12:20:47.000000000 +0800
> @@ -48,12 +48,12 @@
>   * we cannot lose wakeup events.
>   */
>
> -void __up(struct semaphore *sem)
> +asmregparm void __up(struct semaphore *sem)
>  {
>  	wake_up(&sem->wait);
>  }

You continue to ignore a few important facts:

1 - There are a ton of semaphores in UML
2 - They all work, except for yours

Therefore, a patch which changes all semaphores across all
architectures for which asmregparm has meaning can't possibly be the
correct fix.

However, you might have treated this as an important clue, and looked
at whether your broken semaphore has a different set of declarations
in force than those in the rest of the kernel.

Jeff

--
Work email - jdike at linux dot intel dot com |