From: uml@s.rhythm.cx - 2006-02-01 03:23:42
|
I tried upgrading my UML from 2.4.28-1um to 2.6.15.2-bs1. After bringing the UML up under the newer kernel, I noticed that random processes within the UML would segfault. For instance, I tried SSHing to it over and over, and in about 1 in 5 attempts my shell or something else during the login procedure would segfault before getting to a prompt.

/lib/tls has been moved to /lib/tls.disabled, and I stopped and restarted the UML after making that change; it didn't help.

Nothing distinct was printed to the guest's syslog or kernel log except "line_write_room: tty0: no room left in buffer", but the occurrence of those messages didn't seem to correlate with the segfaults.

I reverted to 2.4.28-1um and everything is fine again.

The only other changes I made were switching from devfs to regular device files in 2.6, and booting the 2.6 kernel with more memory (this 2.4 image I have has TT enabled and won't start with more than mem=200M).

Any ideas? Any other info I can post?

Thanks
|
From: Blaisorblade <bla...@ya...> - 2006-02-01 14:28:37
|
On Wednesday 01 February 2006 04:23, uml@s.rhythm.cx wrote:
> I tried upgrading my UML running 2.4.28-1um to 2.6.15.2-bs1. After bringing
> the UML up under the later kernel, I noticed random processes within the
> UML would segfault. For instance, I tried SSHing to it over and over and in
> about 1 in 5 attempts my shell or something else during the login procedure
> would segfault before getting to a prompt.

> /lib/tls has been moved to /lib/tls.disabled and I stopped & restarted the
> UML after making that change, didn't help.

> Nothing distinct was printed to the guest's syslog or kernel log except
> "line_write_room: tty0: no room left in buffer"

That's indeed harmless... but didn't the ssh log contain anything?

> , but the occurrence of those
> messages didn't seem to correlate with the segfaults.

> I reverted back to 2.4.28-1um and everything is fine again.

> The only other changes I made were switching from devfs to regular device
> files in 2.6, and booting the 2.6 kernel with more memory (this 2.4 image I
> have has TT enabled and won't start with more than mem=200M).

> Any ideas? Any other info I can post?

I've never seen such a report. It would help to know which guest distro you run, which mode (SKAS0, TT, or SKAS3), the command line, and the kernel config (which can be obtained with the --showconfig switch).

> Thanks

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
From: Allen C. <al...@us...> - 2006-02-01 20:28:16
|
On Tuesday 31 January 2006 22:23, uml@s.rhythm.cx wrote:
> I tried upgrading my UML running 2.4.28-1um to 2.6.15.2-bs1. After
> bringing the UML up under the later kernel, I noticed random
> processes within the UML would segfault. For instance, I tried
> SSHing to it over and over and in about 1 in 5 attempts my shell or
> something else during the login procedure would segfault before
> getting to a prompt.

I've had similar problems in every UML guest kernel starting with 2.6.11. The initial ssh login would segfault every now and then, but the UML appears to run fine otherwise. The same setup would work fine in a 2.4.* guest kernel and in 2.6.* guest kernels up to 2.6.10.

After a little experimentation, it appears that the problem only occurs in a tcsh shell and the segfault occurs on command lines containing backquotes, and even then, the segfault only occurs roughly 10% of the time. The problem can be reproduced by repeatedly executing the following command inside the UML guest:

  tcsh -c 'echo `hostname`'

Unfortunately, I have not found a solution other than to avoid tcsh and/or backquoted commands from inside tcsh scripts.
|
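Because the failure is intermittent (roughly 10% of runs), it helps to count it rather than eyeball it. Below is a minimal sketch of a repeat-and-count harness in C. It is not part of the thread; the function name count_segfaults and its interface are made up for illustration. It runs a command directly with execvp() (no intermediate shell) and counts how many runs were terminated by SIGSEGV.

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not from the thread: run argv[0] with the given
 * argument vector `iterations` times and return how many runs were
 * killed by SIGSEGV.  Returns -1 if fork() or waitpid() fails.
 */
static int count_segfaults(char *const argv[], int iterations)
{
	int i, segv = 0;

	for (i = 0; i < iterations; i++) {
		int status;
		pid_t pid = fork();

		if (pid < 0)
			return -1;
		if (pid == 0) {
			/* Child: replace ourselves with the command. */
			execvp(argv[0], argv);
			_exit(127);	/* only reached if exec failed */
		}
		if (waitpid(pid, &status, 0) < 0)
			return -1;
		/* Count only deaths by SIGSEGV, not ordinary exit codes. */
		if (WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV)
			segv++;
	}
	return segv;
}
```

Called from a small main() inside the guest with the argument vector { "tcsh", "-c", "echo `hostname`", NULL } and a few thousand iterations, a non-zero result on an affected kernel would correspond to the segfaults described above. Note that only a crash of tcsh itself is counted; a crash in a grandchild process would not show up in the wait status.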
From: uml@s.rhythm.cx - 2006-02-01 21:10:57
|
On Wed, Feb 01, 2006 at 03:28:06PM -0500, Allen Chan wrote:
> On Tuesday 31 January 2006 22:23, uml@s.rhythm.cx wrote:
> > I tried upgrading my UML running 2.4.28-1um to 2.6.15.2-bs1. After
> > bringing the UML up under the later kernel, I noticed random
> > processes within the UML would segfault. For instance, I tried
> > SSHing to it over and over and in about 1 in 5 attempts my shell or
> > something else during the login procedure would segfault before
> > getting to a prompt.
>
> I've had similar problems in every UML guest kernel starting with
> 2.6.11. The initial ssh login would segfault every now and then, but
> the UML appears to run fine otherwise. The same setup would work
> fine in a 2.4.* guest kernel and in 2.6.* guest kernels up to 2.6.10.
>
> After a little experimentation, it appears that the problem only
> occurs in a tcsh shell and the segfault occurs on command lines
> containing backquotes, and even then, the segfault only occurs
> roughly 10% of the time. The problem can be reproduced by repeatedly
> executing the following command inside the UML guest:
> tcsh -c 'echo `hostname`'

Ah, yes, mine is doing the exact same thing. I made it core dump, but the core it left behind is corrupt or something. gdb says it segfaulted while doing read() in libc, but there are thousands of bogus frames on the stack, so who knows...
|
From: Jeff D. <jd...@ad...> - 2006-02-02 16:21:58
|
On Wed, Feb 01, 2006 at 03:28:06PM -0500, Allen Chan wrote:
> The problem can be reproduced by repeatedly
> executing the following command inside the UML guest:
> tcsh -c 'echo `hostname`'

Nice test case!

This appears to be a skas0 bug - the skas0 segfault handler is segfaulting for some reason, and this is causing the process to segfault.

				Jeff
|
From: Allen C. <al...@us...> - 2006-02-02 17:01:35
|
On Thursday 02 February 2006 11:23, Jeff Dike wrote:
> On Wed, Feb 01, 2006 at 03:28:06PM -0500, Allen Chan wrote:
> > The problem can be reproduced by repeatedly
> > executing the following command inside the UML guest:
> > tcsh -c 'echo `hostname`'
>
> Nice test case!  This appears to be a skas0 bug - the skas0
> segfault handler is segfaulting for some reason, and this is
> causing the process to segfault.

This issue may not be limited to skas0, as I'm encountering these symptoms while using skas3 patched hosts, both on my own machines and on a UML guest hosted at linode.com.
|
From: uml@s.rhythm.cx - 2006-02-02 18:18:05
|
On Thu, Feb 02, 2006 at 12:01:26PM -0500, Allen Chan wrote:
> This issue may not be limited to skas0, as I'm encountering these
> symptoms while using skas3 patched hosts, both on my own machines and
> on a UML guest hosted at linode.com.

I'm also seeing this on skas3.
|
From: Jeff D. <jd...@ad...> - 2006-02-08 20:12:11
|
On Thu, Feb 02, 2006 at 01:17:49PM -0500, uml@s.rhythm.cx wrote:
> I'm also seeing this on skas3.

Can you try out the patch below?  It fixes it on i386, but I'm still seeing
segfaulting on x86_64.

				Jeff

Index: linux-2.6.15/arch/um/sys-i386/signal.c
===================================================================
--- linux-2.6.15.orig/arch/um/sys-i386/signal.c	2006-02-06 23:24:24.000000000 -0500
+++ linux-2.6.15/arch/um/sys-i386/signal.c	2006-02-07 18:10:58.000000000 -0500
@@ -127,7 +127,7 @@ static inline unsigned long twd_fxsr_to_
 }
 
 int copy_sc_to_user_skas(struct sigcontext *to, struct _fpstate *to_fp,
-			 struct pt_regs *regs)
+			 struct pt_regs *regs, unsigned long sp)
 {
 	struct sigcontext sc;
 	struct _fpstate * fp = (struct _fpstate *)regs->regs.skas.fp;
@@ -141,7 +141,7 @@ int copy_sc_to_user_skas(struct sigconte
 	sc.edi = REGS_EDI(regs->regs.skas.regs);
 	sc.esi = REGS_ESI(regs->regs.skas.regs);
 	sc.ebp = REGS_EBP(regs->regs.skas.regs);
-	sc.esp = REGS_SP(regs->regs.skas.regs);
+	sc.esp = sp;
 	sc.ebx = REGS_EBX(regs->regs.skas.regs);
 	sc.edx = REGS_EDX(regs->regs.skas.regs);
 	sc.ecx = REGS_ECX(regs->regs.skas.regs);
@@ -213,7 +213,7 @@ int copy_sc_from_user_tt(struct sigconte
 }
 
 int copy_sc_to_user_tt(struct sigcontext *to, struct _fpstate *fp,
-		       struct sigcontext *from, int fpsize)
+		       struct sigcontext *from, int fpsize, unsigned long sp)
 {
 	struct _fpstate *to_fp, *from_fp;
 	int err;
@@ -221,11 +221,18 @@ int copy_sc_to_user_tt(struct sigcontext
 	to_fp = (fp ? fp : (struct _fpstate *) (to + 1));
 	from_fp = from->fpstate;
 	err = copy_to_user(to, from, sizeof(*to));
+
+	/* The SP in the sigcontext is the updated one for the signal
+	 * delivery.  The sp passed in is the original, and this needs
+	 * to be restored, so we stick it in separately.
+	 */
+	err |= copy_to_user(&SC_SP(to), &sp, sizeof(sp));
+
 	if(from_fp != NULL){
 		err |= copy_to_user(&to->fpstate, &to_fp, sizeof(to->fpstate));
 		err |= copy_to_user(to_fp, from_fp, fpsize);
 	}
-	return(err);
+	return err;
 }
 
 #endif
@@ -240,11 +247,11 @@ static int copy_sc_from_user(struct pt_r
 }
 
 static int copy_sc_to_user(struct sigcontext *to, struct _fpstate *fp,
-			   struct pt_regs *from)
+			   struct pt_regs *from, unsigned long sp)
 {
 	return(CHOOSE_MODE(copy_sc_to_user_tt(to, fp, UPT_SC(&from->regs),
-					      sizeof(*fp)),
-			   copy_sc_to_user_skas(to, fp, from)));
+					      sizeof(*fp), sp),
+			   copy_sc_to_user_skas(to, fp, from, sp)));
 }
 
 static int copy_ucontext_to_user(struct ucontext *uc, struct _fpstate *fp,
@@ -255,7 +262,7 @@ static int copy_ucontext_to_user(struct
 	err |= put_user(current->sas_ss_sp, &uc->uc_stack.ss_sp);
 	err |= put_user(sas_ss_flags(sp), &uc->uc_stack.ss_flags);
 	err |= put_user(current->sas_ss_size, &uc->uc_stack.ss_size);
-	err |= copy_sc_to_user(&uc->uc_mcontext, fp, &current->thread.regs);
+	err |= copy_sc_to_user(&uc->uc_mcontext, fp, &current->thread.regs, sp);
 	err |= copy_to_user(&uc->uc_sigmask, set, sizeof(*set));
 	return(err);
 }
@@ -288,6 +295,7 @@ int setup_signal_stack_sc(unsigned long
 {
 	struct sigframe __user *frame;
 	void *restorer;
+	unsigned long save_sp = PT_REGS_SP(regs);
 	int err = 0;
 
 	stack_top &= -8UL;
@@ -299,9 +307,19 @@ int setup_signal_stack_sc(unsigned long
 	if(ka->sa.sa_flags & SA_RESTORER)
 		restorer = ka->sa.sa_restorer;
 
+	/* Update SP now because the page fault handler refuses to extend
+	 * the stack if the faulting address is too far below the current
+	 * SP, which frame now certainly is.  If there's an error, the original
+	 * value is restored on the way out.
+	 * When writing the sigcontext to the stack, we have to write the
+	 * original value, so that's passed to copy_sc_to_user, which does
+	 * the right thing with it.
+	 */
+	PT_REGS_SP(regs) = (unsigned long) frame;
+
 	err |= __put_user(restorer, &frame->pretcode);
 	err |= __put_user(sig, &frame->sig);
-	err |= copy_sc_to_user(&frame->sc, NULL, regs);
+	err |= copy_sc_to_user(&frame->sc, NULL, regs, save_sp);
 	err |= __put_user(mask->sig[0], &frame->sc.oldmask);
 	if (_NSIG_WORDS > 1)
 		err |= __copy_to_user(&frame->extramask, &mask->sig[1],
@@ -319,7 +337,7 @@ int setup_signal_stack_sc(unsigned long
 	err |= __put_user(0x80cd, (short __user *)(frame->retcode+6));
 
 	if(err)
-		return(err);
+		goto err;
 
 	PT_REGS_SP(regs) = (unsigned long) frame;
 	PT_REGS_IP(regs) = (unsigned long) ka->sa.sa_handler;
@@ -329,7 +347,11 @@ int setup_signal_stack_sc(unsigned long
 
 	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
 		ptrace_notify(SIGTRAP);
-	return(0);
+	return 0;
+
+err:
+	PT_REGS_SP(regs) = save_sp;
+	return err;
 }
 
 int setup_signal_stack_si(unsigned long stack_top, int sig,
@@ -338,6 +360,7 @@ int setup_signal_stack_si(unsigned long
 {
 	struct rt_sigframe __user *frame;
 	void *restorer;
+	unsigned long save_sp = PT_REGS_SP(regs);
 	int err = 0;
 
 	stack_top &= -8UL;
@@ -349,13 +372,16 @@ int setup_signal_stack_si(unsigned long
 	if(ka->sa.sa_flags & SA_RESTORER)
 		restorer = ka->sa.sa_restorer;
 
+	/* See comment above about why this is here */
+	PT_REGS_SP(regs) = (unsigned long) frame;
+
 	err |= __put_user(restorer, &frame->pretcode);
 	err |= __put_user(sig, &frame->sig);
 	err |= __put_user(&frame->info, &frame->pinfo);
 	err |= __put_user(&frame->uc, &frame->puc);
 	err |= copy_siginfo_to_user(&frame->info, info);
 	err |= copy_ucontext_to_user(&frame->uc, &frame->fpstate, mask,
-				     PT_REGS_SP(regs));
+				     save_sp);
 
 	/*
 	 * This is movl $,%eax ; int $0x80
@@ -369,9 +395,8 @@ int setup_signal_stack_si(unsigned long
 	err |= __put_user(0x80cd, (short __user *)(frame->retcode+5));
 
 	if(err)
-		return(err);
+		goto err;
 
-	PT_REGS_SP(regs) = (unsigned long) frame;
 	PT_REGS_IP(regs) = (unsigned long) ka->sa.sa_handler;
 	PT_REGS_EAX(regs) = (unsigned long) sig;
 	PT_REGS_EDX(regs) = (unsigned long) &frame->info;
@@ -379,7 +404,11 @@ int setup_signal_stack_si(unsigned long
 
 	if ((current->ptrace & PT_DTRACE) && (current->ptrace & PT_PTRACED))
 		ptrace_notify(SIGTRAP);
-	return(0);
+	return 0;
+
+err:
+	PT_REGS_SP(regs) = save_sp;
+	return err;
 }
 
 long sys_sigreturn(struct pt_regs regs)
Index: linux-2.6.15/arch/um/sys-x86_64/signal.c
===================================================================
--- linux-2.6.15.orig/arch/um/sys-x86_64/signal.c	2005-10-28 12:58:12.000000000 -0400
+++ linux-2.6.15/arch/um/sys-x86_64/signal.c	2006-02-08 15:08:19.000000000 -0500
@@ -55,7 +55,8 @@ static int copy_sc_from_user_skas(struct
 }
 
 int copy_sc_to_user_skas(struct sigcontext *to, struct _fpstate *to_fp,
-			 struct pt_regs *regs, unsigned long mask)
+			 struct pt_regs *regs, unsigned long mask,
+			 unsigned long sp)
 {
 	struct faultinfo * fi = &current->thread.arch.faultinfo;
 	int err = 0;
@@ -70,7 +71,11 @@ int copy_sc_to_user_skas(struct sigconte
 	err |= PUTREG(regs, RDI, to, rdi);
 	err |= PUTREG(regs, RSI, to, rsi);
 	err |= PUTREG(regs, RBP, to, rbp);
-	err |= PUTREG(regs, RSP, to, rsp);
+	/* Must use original RSP, which is passed in, rather than what's in
+	 * the pt_regs, because that's already been updated to point at the
+	 * signal frame.
+	 */
+	err |= __put_user(sp, &to->rsp);
 	err |= PUTREG(regs, RBX, to, rbx);
 	err |= PUTREG(regs, RDX, to, rdx);
 	err |= PUTREG(regs, RCX, to, rcx);
@@ -102,7 +107,7 @@ int copy_sc_to_user_skas(struct sigconte
 
 #ifdef CONFIG_MODE_TT
 int copy_sc_from_user_tt(struct sigcontext *to, struct sigcontext *from,
-                         int fpsize)
+			 int fpsize)
 {
 	struct _fpstate *to_fp, *from_fp;
 	unsigned long sigs;
@@ -120,7 +125,7 @@ int copy_sc_from_user_tt(struct sigconte
 }
 
 int copy_sc_to_user_tt(struct sigcontext *to, struct _fpstate *fp,
-		       struct sigcontext *from, int fpsize)
+		       struct sigcontext *from, int fpsize, unsigned long sp)
 {
 	struct _fpstate *to_fp, *from_fp;
 	int err;
@@ -128,11 +133,17 @@ int copy_sc_to_user_tt(struct sigcontext
 	to_fp = (fp ? fp : (struct _fpstate *) (to + 1));
 	from_fp = from->fpstate;
 	err = copy_to_user(to, from, sizeof(*to));
+	/* The SP in the sigcontext is the updated one for the signal
+	 * delivery.  The sp passed in is the original, and this needs
+	 * to be restored, so we stick it in separately.
+	 */
+	err |= copy_to_user(&SC_SP(to), &sp, sizeof(sp));
+
 	if(from_fp != NULL){
 		err |= copy_to_user(&to->fpstate, &to_fp, sizeof(to->fpstate));
 		err |= copy_to_user(to_fp, from_fp, fpsize);
 	}
-	return(err);
+	return err;
 }
 
 #endif
@@ -148,11 +159,12 @@ static int copy_sc_from_user(struct pt_r
 }
 
 static int copy_sc_to_user(struct sigcontext *to, struct _fpstate *fp,
-			   struct pt_regs *from, unsigned long mask)
+			   struct pt_regs *from, unsigned long mask,
+			   unsigned long sp)
 {
 	return(CHOOSE_MODE(copy_sc_to_user_tt(to, fp, UPT_SC(&from->regs),
-					      sizeof(*fp)),
-			   copy_sc_to_user_skas(to, fp, from, mask)));
+					      sizeof(*fp), sp),
+			   copy_sc_to_user_skas(to, fp, from, mask, sp)));
 }
 
 struct rt_sigframe
@@ -170,6 +182,7 @@ int setup_signal_stack_si(unsigned long
 {
 	struct rt_sigframe __user *frame;
 	struct _fpstate __user *fp = NULL;
+	unsigned long save_sp = PT_REGS_RSP(regs);
 	int err = 0;
 	struct task_struct *me = current;
 
@@ -193,14 +206,25 @@ int setup_signal_stack_si(unsigned long
 		goto out;
 	}
 
+	/* Update SP now because the page fault handler refuses to extend
+	 * the stack if the faulting address is too far below the current
+	 * SP, which frame now certainly is.  If there's an error, the original
+	 * value is restored on the way out.
+	 * When writing the sigcontext to the stack, we have to write the
+	 * original value, so that's passed to copy_sc_to_user, which does
+	 * the right thing with it.
+	 */
+	PT_REGS_RSP(regs) = (unsigned long) frame;
+
 	/* Create the ucontext.  */
 	err |= __put_user(0, &frame->uc.uc_flags);
 	err |= __put_user(0, &frame->uc.uc_link);
 	err |= __put_user(me->sas_ss_sp, &frame->uc.uc_stack.ss_sp);
-	err |= __put_user(sas_ss_flags(PT_REGS_SP(regs)),
+	err |= __put_user(sas_ss_flags(save_sp),
 			  &frame->uc.uc_stack.ss_flags);
 	err |= __put_user(me->sas_ss_size, &frame->uc.uc_stack.ss_size);
-	err |= copy_sc_to_user(&frame->uc.uc_mcontext, fp, regs, set->sig[0]);
+	err |= copy_sc_to_user(&frame->uc.uc_mcontext, fp, regs, set->sig[0],
+			       save_sp);
 	err |= __put_user(fp, &frame->uc.uc_mcontext.fpstate);
 	if (sizeof(*set) == 16) {
 		__put_user(set->sig[0], &frame->uc.uc_sigmask.sig[0]);
@@ -217,10 +241,10 @@ int setup_signal_stack_si(unsigned long
 		err |= __put_user(ka->sa.sa_restorer, &frame->pretcode);
 	else
 		/* could use a vstub here */
-		goto out;
+		goto restore_sp;
 
 	if (err)
-		goto out;
+		goto restore_sp;
 
 	/* Set up registers for signal handler */
 	{
@@ -238,10 +262,12 @@ int setup_signal_stack_si(unsigned long
 	PT_REGS_RSI(regs) = (unsigned long) &frame->info;
 	PT_REGS_RDX(regs) = (unsigned long) &frame->uc;
 	PT_REGS_RIP(regs) = (unsigned long) ka->sa.sa_handler;
-
-	PT_REGS_RSP(regs) = (unsigned long) frame;
  out:
-	return(err);
+	return err;
+
+restore_sp:
+	PT_REGS_RSP(regs) = save_sp;
+	return err;
 }
 
 long sys_rt_sigreturn(struct pt_regs *regs)
|
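The comment added by the patch explains the ordering problem: the page fault handler refuses to extend the stack when the faulting address is too far below the current SP, and the signal frame is written before SP is moved down to it. The following is a toy model of that ordering in plain C, not UML code. All names here (model_regs, STACK_SLACK, deliver_old, deliver_patched) and the slack value are assumptions for illustration, and the model only covers the case where the frame lands on a stack page that is not yet mapped, which is the only case in which the check matters.

```c
#include <stdbool.h>

/* Hypothetical slack: how far below SP an access may land and still be
 * treated as legitimate stack growth.  The real value is arch specific. */
#define STACK_SLACK 32UL

struct model_regs {
	unsigned long sp;
};

/* Simplified stand-in for the fault handler's "is this stack growth?" test. */
static bool fault_may_grow_stack(const struct model_regs *regs,
				 unsigned long addr)
{
	return addr + STACK_SLACK >= regs->sp;
}

/* Writing the frame works only if the target is writable at all and the
 * fault handler agrees to map the page the frame lands on. */
static bool write_frame(const struct model_regs *regs, unsigned long frame,
			bool writable)
{
	return writable && fault_may_grow_stack(regs, frame);
}

/* Old ordering: the frame is written while SP still has its old value. */
static bool deliver_old(struct model_regs *regs, unsigned long frame,
			bool writable, unsigned long *sc_sp)
{
	if (!write_frame(regs, frame, writable))
		return false;		/* process gets a segfault */
	*sc_sp = regs->sp;
	regs->sp = frame;
	return true;
}

/* Patched ordering: move SP first, record the original SP in the saved
 * context, and put SP back if anything goes wrong. */
static bool deliver_patched(struct model_regs *regs, unsigned long frame,
			    bool writable, unsigned long *sc_sp)
{
	unsigned long save_sp = regs->sp;

	regs->sp = frame;
	if (!write_frame(regs, frame, writable)) {
		regs->sp = save_sp;	/* error path restores SP */
		return false;
	}
	*sc_sp = save_sp;		/* sigreturn must see the original SP */
	return true;
}
```

With sp = 0x8000 and a frame several hundred bytes lower, deliver_old() fails the growth check in this model while deliver_patched() succeeds and still records 0x8000 as the SP to restore, which mirrors why the patch passes save_sp down to copy_sc_to_user().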
From: Allen C. <al...@us...> - 2006-02-08 23:10:20
|
On Wednesday 08 February 2006 15:13, Jeff Dike wrote:
> Can you try out the patch below?  It fixes it on i386, but I'm
> still seeing segfaulting on x86_64.

The patch works great on my vanilla 2.6.15.3 guest kernel compiled on i386 running on a skas3 host. No segfaults after several thousand iterations of backquoted commands inside tcsh.

Thanks.
|
From: Blaisorblade <bla...@ya...> - 2006-02-08 23:34:39
|
On Thursday 09 February 2006 00:11, Allen Chan wrote:
> On Wednesday 08 February 2006 15:13, Jeff Dike wrote:
> > Can you try out the patch below?  It fixes it on i386, but I'm
> > still seeing segfaulting on x86_64.
>
> The patch works great on my vanilla 2.6.15.3 guest kernel compiled on
> i386 running on a skas3 host. No segfaults after several thousand
> iterations of backquoted commands inside tcsh.
> Thanks.

Ok, I'll include this in my -bs2, together with your last batch of patches for mainline.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade
|
From: uml@s.rhythm.cx - 2006-02-09 02:32:17
|
On Wed, Feb 08, 2006 at 03:13:19PM -0500, Jeff Dike wrote:
> On Thu, Feb 02, 2006 at 01:17:49PM -0500, uml@s.rhythm.cx wrote:
> > I'm also seeing this on skas3.
>
> Can you try out the patch below?  It fixes it on i386, but I'm still seeing
> segfaulting on x86_64.

Yup, that fixed it. 2.6.15.2-bs1 + your patch on the guest; 2.6.12.6+skas3 on an i386 host. I don't have any x86_64 hosts though...

Thanks very much.
|