From: Matt Z. <md...@de...> - 2003-12-20 01:13:31
|
There definitely seems to be something awry here, even in skas mode. I am
getting a lot of ENOSYS errors inside UML:

/usr/lib/apt/methods/http: error while loading shared libraries: libc.so.6: cannot map zero-fill pages: Error 38

dpkg: error processing /var/cache/apt/archives/debhelper_4.0.2_all.deb (--unpack):
 failed to rmdir/unlink `/usr/share/man/man1/dh_compress.1.gz.dpkg-tmp': Function not implemented

basename: write error: Function not implemented

etc.

Downgrading to user-mode-linux 2.4.22-6um-1 still has the problem,
downgrading further to 2.4.22-5um-1 (the old binary) fixes it. The
following things changed from 2.4.22-5um-1 to 2.4.22-6um-1, most of which
were outside of UML itself:

1. -6um and later were built with a glibc using 2.6 kernel headers
   (linux-kernel-headers in Debian unstable). This also required a patch to
   get UML to compile:

@@ -35628,7 +35631,7 @@
+ */
+static void disable_lcall(void)
+{
-+	struct modify_ldt_ldt_s ldt;
++	struct user_desc ldt;
+	int err;
+
+	bzero(&ldt, sizeof(ldt));
@@ -36784,7 +36787,7 @@
+	nregs = sizeof(dummy->u_debugreg)/sizeof(dummy->u_debugreg[0]);
+	for(i = 0; i < nregs; i++){
+		if((i == 4) || (i == 5)) continue;
-+		if(ptrace(PTRACE_POKEUSR, pid, &dummy->u_debugreg[i],
++		if(ptrace(PTRACE_POKEUSER, pid, &dummy->u_debugreg[i],
+			  regs[i]) < 0)
+			printk("write_debugregs - ptrace failed on "
+			       "register %d, errno = %d\n", errno);
@@ -36799,7 +36802,7 @@
+	dummy = NULL;
+	nregs = sizeof(dummy->u_debugreg)/sizeof(dummy->u_debugreg[0]);
+	for(i = 0; i < nregs; i++){
-+		regs[i] = ptrace(PTRACE_PEEKUSR, pid,
++		regs[i] = ptrace(PTRACE_PEEKUSER, pid,
+				 &dummy->u_debugreg[i], 0);
+	}
+}

2. CONFIG_UML_REAL_TIME_CLOCK=y was added to the configuration

3. Various newer versions of the compiler toolchain were used, with the
   exception of gcc (the Debian package has been explicitly building using
   gcc 2.95 for a long time)

4. The Debian kernel source tree was updated

So at this point, I don't know whether something in the development
environment has broken UML, or if UML has broken itself somehow. I haven't
tried patching 2.4.22-5um to compile with linux-kernel-headers yet, so I
don't know whether a new binary of that older code would also work. I don't
know whether I'm dealing with one bug or several.

There are a lot of factors to try to eliminate, so to help with the
investigation, I'd like to know if anyone else is seeing problems similar to
http://bugs.debian.org/{224431,224502}, the ENOSYS errors shown above, or
anything else strange with 2.4.22-6um or -7um, or after updating any
toolchain components. I'd especially like to hear from Debian unstable
users, especially anyone building from non-Debian kernel sources.

Thanks...

--
 - mdz
|
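For readers matching up the messages above: "Error 38" from the dynamic loader and "Function not implemented" from dpkg and basename are the same errno, since 38 is ENOSYS on i386 Linux. A minimal sketch for checking this on a host, assuming Linux with glibc; describe_errno is a hypothetical helper name used only for illustration and is not part of UML or glibc:

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format "<number> (<message>)" for an errno value
 * into buf (at most len bytes) and return buf. */
static const char *describe_errno(int err, char *buf, size_t len)
{
    snprintf(buf, len, "%d (%s)", err, strerror(err));
    return buf;
}
```

Calling describe_errno(ENOSYS, buf, sizeof(buf)) from a small main() and printing the result gives the number and text side by side, which makes it easier to see that the three error messages share one cause.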
From: Jeff D. <jd...@ad...> - 2003-12-20 16:58:56
|
On Fri, Dec 19, 2003 at 05:13:23PM -0800, Matt Zimmerman wrote:
> There definitely seems to be something awry here, even in skas mode. I am
> getting a lot of ENOSYS errors inside UML:
>
> /usr/lib/apt/methods/http: error while loading shared libraries: libc.so.6: cannot map zero-fill pages: Error 38
>
> dpkg: error processing /var/cache/apt/archives/debhelper_4.0.2_all.deb (--unpack):
>  failed to rmdir/unlink `/usr/share/man/man1/dh_compress.1.gz.dpkg-tmp': Function not implemented
>
> basename: write error: Function not implemented
>
> etc.
>
> Downgrading to user-mode-linux 2.4.22-6um-1 still has the problem,
> downgrading further to 2.4.22-5um-1 (the old binary) fixes it. The
> following things changed from 2.4.22-5um-1 to 2.4.22-6um-1, most of which
> were outside of UML itself:

These changes look OK. I looked through the -5 to -6 diffs and saw nothing
that would cause stuff to start returning -ENOSYS. The only thing that
seemed remotely relevant is the unistd.h change, which fixes a bug in the
case of an error in an internally called system call. You can try reverting
that and see what happens.

Beyond that, if you can find a system call that reproducibly fails, then
just step through it, and see where the -ENOSYS comes from. This shouldn't
be too difficult.

				Jeff
|
From: Matt Z. <md...@de...> - 2003-12-21 00:47:56
|
On Sat, Dec 20, 2003 at 12:14:50PM -0500, Jeff Dike wrote:
> On Fri, Dec 19, 2003 at 05:13:23PM -0800, Matt Zimmerman wrote:
> > There definitely seems to be something awry here, even in skas mode. I am
> > getting a lot of ENOSYS errors inside UML:
> >
> > /usr/lib/apt/methods/http: error while loading shared libraries: libc.so.6: cannot map zero-fill pages: Error 38
> >
> > dpkg: error processing /var/cache/apt/archives/debhelper_4.0.2_all.deb (--unpack):
> >  failed to rmdir/unlink `/usr/share/man/man1/dh_compress.1.gz.dpkg-tmp': Function not implemented
> >
> > basename: write error: Function not implemented
> >
> > etc.
> >
> > Downgrading to user-mode-linux 2.4.22-6um-1 still has the problem,
> > downgrading further to 2.4.22-5um-1 (the old binary) fixes it. The
> > following things changed from 2.4.22-5um-1 to 2.4.22-6um-1, most of which
> > were outside of UML itself:
>
> These changes look OK. I looked through the -5 to -6 diffs and saw nothing
> that would cause stuff to start returning -ENOSYS. The only thing that
> seemed remotely relevant is the unistd.h change, which fixes a bug in the
> case of an error in an internally called system call. You can try reverting
> that and see what happens.
>
> Beyond that, if you can find a system call that reproducibly fails, then
> just step through it, and see where the -ENOSYS comes from. This shouldn't
> be too difficult.

Thanks for looking at it. It's sounding more and more like the new
glibc/linux-kernel-headers packages have broken UML. Maybe something is
getting the host's kernel headers when it needs the ones from the UML build
tree?

--
 - mdz
|
From: Matt Z. <md...@de...> - 2003-12-21 00:53:04
|
On Sat, Dec 20, 2003 at 12:14:50PM -0500, Jeff Dike wrote:
> Beyond that, if you can find a system call that reproducibly fails, then
> just step through it, and see where the -ENOSYS comes from. This shouldn't
> be too difficult.

Unfortunately, I can't seem to find a system call that reproducibly fails.
There's something weird and racy going on. With exactly the same UML
invocation, twice in a row, the first will panic:

Kernel panic: Segfault with no mm
In idle task - not syncing

and the second will boot successfully. Even within an invocation, things
aren't normal:

sh-2.05a# apt-get source -b hello
apt-get: error while loading shared libraries: /lib/libm.so.6: cannot read file data: Error 38
sh-2.05a#
sh-2.05a# apt-get source -b hello
Reading Package Lists... Done
Building Dependency Tree... Done

And of course, I haven't been able to get this to happen in the debugger
(yet).

--
 - mdz
|
From: Matt Z. <md...@de...> - 2003-12-21 01:07:03
|
On Sat, Dec 20, 2003 at 04:52:57PM -0800, Matt Zimmerman wrote:
> sh-2.05a# apt-get source -b hello
> apt-get: error while loading shared libraries: /lib/libm.so.6: cannot read
> file data: Error 38
> sh-2.05a#
> sh-2.05a# apt-get source -b hello
> Reading Package Lists... Done
> Building Dependency Tree... Done
>
> And of course, I haven't been able to get this to happen in the debugger
> (yet).

Well, with enough attempts, I'm able to get it to happen in gdb. However, I
can't get it down to a single syscall, and can't do it reliably. I tried
setting a breakpoint at the point in syscall_kern.c where it can return
-ENOSYS, but it never hits it.

It doesn't help that apparently when I send SIGINT to try to get back to
gdb, it kills UML. This might be a gdb 6.0 problem.

--
 - mdz
|
From: Matt Z. <md...@de...> - 2003-12-28 09:33:25
|
On Sat, Dec 20, 2003 at 05:06:56PM -0800, Matt Zimmerman wrote:
> On Sat, Dec 20, 2003 at 04:52:57PM -0800, Matt Zimmerman wrote:
>
> > sh-2.05a# apt-get source -b hello
> > apt-get: error while loading shared libraries: /lib/libm.so.6: cannot read
> > file data: Error 38
> > sh-2.05a#
> > sh-2.05a# apt-get source -b hello
> > Reading Package Lists... Done
> > Building Dependency Tree... Done
> >
> > And of course, I haven't been able to get this to happen in the debugger
> > (yet).
>
> Well, with enough attempts, I'm able to get it to happen in gdb. However, I
> can't get it down to a single syscall, and can't do it reliably. I tried
> setting a breakpoint at the point in syscall_kern.c where it can return
> -ENOSYS, but it never hits it.

It looks like in the case where it breaks, the system call number is 0, so
it is passing the test in execute_syscall_skas, and instead invoking
sys_ni_syscall. Here is the regs struct in one instance:

$16 = {regs = {tt = {syscall = 2, sc = 0xbfffd68c}, skas = {regs = {2,
        3221214860, 14, 14, 3221214860, 3221214492, 4294967258, 43, 43, 0, 0,
        4, 1074631684, 35, 2097815, 3221214444, 43}, fp = {
        0 <repeats 27 times>}, xfp = {2098047, 0 <repeats 31 times>,
        4294967295, 2734743551, 16401, 0, 0, 3413842944, 16404,
        0 <repeats 88 times>, 2726428672}, fault_addr = 0, fault_type = 1,
      trap_type = 0, syscall = 0, is_user = 1}}}

The process isn't invoking syscall 0 (in this case it was actually
__NR_select (82)). syscall matches ORIG_EAX, though, so I guess something
is going wrong earlier, maybe in move_registers?

--
 - mdz
|
From: Matt Z. <md...@de...> - 2003-12-28 09:51:21
|
On Sun, Dec 28, 2003 at 01:33:17AM -0800, Matt Zimmerman wrote:
> On Sat, Dec 20, 2003 at 05:06:56PM -0800, Matt Zimmerman wrote:
>
> > On Sat, Dec 20, 2003 at 04:52:57PM -0800, Matt Zimmerman wrote:
> >
> > > sh-2.05a# apt-get source -b hello
> > > apt-get: error while loading shared libraries: /lib/libm.so.6: cannot read
> > > file data: Error 38
> > > sh-2.05a#
> > > sh-2.05a# apt-get source -b hello
> > > Reading Package Lists... Done
> > > Building Dependency Tree... Done
> > >
> > > And of course, I haven't been able to get this to happen in the debugger
> > > (yet).
> >
> > Well, with enough attempts, I'm able to get it to happen in gdb. However, I
> > can't get it down to a single syscall, and can't do it reliably. I tried
> > setting a breakpoint at the point in syscall_kern.c where it can return
> > -ENOSYS, but it never hits it.
>
> It looks like in the case where it breaks, the system call number is 0, so
> it is passing the test in execute_syscall_skas, and instead invoking
> sys_ni_syscall. Here is the regs struct in one instance:
>
> $16 = {regs = {tt = {syscall = 2, sc = 0xbfffd68c}, skas = {regs = {2,
>         3221214860, 14, 14, 3221214860, 3221214492, 4294967258, 43, 43, 0, 0,
>         4, 1074631684, 35, 2097815, 3221214444, 43}, fp = {
>         0 <repeats 27 times>}, xfp = {2098047, 0 <repeats 31 times>,
>         4294967295, 2734743551, 16401, 0, 0, 3413842944, 16404,
>         0 <repeats 88 times>, 2726428672}, fault_addr = 0, fault_type = 1,
>       trap_type = 0, syscall = 0, is_user = 1}}}
>
> The process isn't invoking syscall 0 (in this case it was actually __NR_select
> (82)). syscall matches ORIG_EAX, though, so I guess something is going wrong
> earlier, maybe in move_registers?

This is making less and less sense. handle_trap has this code:

	syscall_nr = PT_SYSCALL_NR(regs->skas.regs);
	UPT_SYSCALL_NR(regs) = syscall_nr;
	if(syscall_nr < 1){
		relay_signal(SIGTRAP, regs);
		return;
	}

As I understand it, PT_SYSCALL_NR refers to ORIG_EAX, and UPT_SYSCALL_NR
refers to skas.syscall. i.e., syscall=0 can't happen. So either things are
not as they seem, or something is happening to regs between here and
execute_syscall_skas.

Maybe there is some disconnect between uml_pt_regs and pt_regs? I can't
think how, though. The structs are identical in asm/ptrace.h. In fact, the
only differences are these:

--- ptrace.h-2.4	2003-12-28 01:44:17.000000000 -0800
+++ ptrace.h-2.6	2003-12-28 01:44:20.000000000 -0800
@@ -49,15 +49,14 @@
 #define PTRACE_GETFPXREGS         18
 #define PTRACE_SETFPXREGS         19
 
-#define PTRACE_SETOPTIONS         21
+#define PTRACE_OLDSETOPTIONS         21
 
-/* options set using PTRACE_SETOPTIONS */
-#define PTRACE_O_TRACESYSGOOD     0x00000001
+#define PTRACE_GET_THREAD_AREA    25
+#define PTRACE_SET_THREAD_AREA    26
 
 #ifdef __KERNEL__
 #define user_mode(regs) ((VM_MASK & (regs)->eflags) || (3 & (regs)->xcs))
 #define instruction_pointer(regs) ((regs)->eip)
-extern void show_regs(struct pt_regs *);
 #endif
 #endif

--
 - mdz
|
From: Matt Z. <md...@de...> - 2003-12-28 09:52:31
|
On Sun, Dec 28, 2003 at 01:51:15AM -0800, Matt Zimmerman wrote:
> 	syscall_nr = PT_SYSCALL_NR(regs->skas.regs);
> 	UPT_SYSCALL_NR(regs) = syscall_nr;
> 	if(syscall_nr < 1){
> 		relay_signal(SIGTRAP, regs);
> 		return;
> 	}

By the way, why is the test < 1 here, and < 0 in execute_syscall_skas?
Shouldn't they match?

--
 - mdz
|
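To illustrate why the lower bound matters, here is a minimal sketch of a table-driven dispatcher. It assumes, as described earlier in the thread, that slot 0 of the table is the not-implemented stub. All names (demo_dispatch, demo_table, NR_DEMO_SYSCALLS, demo_path) are hypothetical and simplified; this is not UML's execute_syscall_skas. With a "< 0" lower bound, a syscall number of 0 passes the range check and reaches the stub, so -ENOSYS comes back without the explicit range-check return ever executing. That would be consistent with a breakpoint on that return never being hit.

```c
#include <errno.h>

#define NR_DEMO_SYSCALLS 3

typedef long (*demo_syscall_fn)(void);

/* Stand-in for the "not implemented" stub that occupies unused slots. */
static long demo_ni_syscall(void) { return -ENOSYS; }

/* Stand-in for a real system call; the return value is arbitrary. */
static long demo_real_syscall(void) { return 1234; }

static const demo_syscall_fn demo_table[NR_DEMO_SYSCALLS] = {
    demo_ni_syscall,   /* slot 0: not implemented */
    demo_real_syscall, /* slot 1 */
    demo_ni_syscall,   /* slot 2: not implemented */
};

/* Records which branch produced the result. */
enum demo_path { DEMO_PATH_RANGE_CHECK, DEMO_PATH_TABLE };

static long demo_dispatch(int nr, enum demo_path *path)
{
    /* Lower bound is "< 0", so nr == 0 is treated as a valid index. */
    if (nr < 0 || nr >= NR_DEMO_SYSCALLS) {
        *path = DEMO_PATH_RANGE_CHECK;
        return -ENOSYS;
    }
    *path = DEMO_PATH_TABLE;
    return demo_table[nr]();
}
```

In this sketch both 0 and -1 yield -ENOSYS, but only -1 does so through the range check; the caller cannot tell the two apart from the return value alone.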
From: Jeff D. <jd...@ad...> - 2004-01-06 02:39:28
|
md...@de... said:
> By the way, why is the test < 1 here, and < 0 in execute_syscall_skas?

No idea :-)

> Shouldn't they match?

Yup.

				Jeff
|
From: Matt Z. <md...@de...> - 2003-12-28 10:12:48
|
On Sun, Dec 28, 2003 at 01:51:15AM -0800, Matt Zimmerman wrote:
> This is making less and less sense. handle_trap has this code:
>
> 	syscall_nr = PT_SYSCALL_NR(regs->skas.regs);
> 	UPT_SYSCALL_NR(regs) = syscall_nr;
> 	if(syscall_nr < 1){
> 		relay_signal(SIGTRAP, regs);
> 		return;
> 	}
>
> As I understand it, PT_SYSCALL_NR refers to ORIG_EAX, and UPT_SYSCALL_NR
> refers to skas.syscall. i.e., syscall=0 can't happen. So either things are
> not as they seem, or something is happening to regs between here and
> execute_syscall_skas.

[a few hundred printf's later]

So, the sequence of events in handle_trap is this:

1. UPT_SYSCALL_NR(regs) == 78

2. ptrace(PTRACE_POKEUSER,...)

3. UPT_SYSCALL_NR(regs) == 78 (still OK)

4. ptrace(PTRACE_SYSCALL,...)

5. UPT_SYSCALL_NR(regs) == 78 (still OK)

6. waitpid(pid,...)

7. UPT_SYSCALL_NR(regs) == 0 (boom)

I have no idea why.

--
 - mdz
|
From: Matt Z. <md...@de...> - 2003-12-28 11:30:49
|
On Sun, Dec 28, 2003 at 02:12:40AM -0800, Matt Zimmerman wrote:
> So, the sequence of events in handle_trap is this:
>
> 1. UPT_SYSCALL_NR(regs) == 78
>
> 2. ptrace(PTRACE_POKEUSER,...)
>
> 3. UPT_SYSCALL_NR(regs) == 78 (still OK)
>
> 4. ptrace(PTRACE_SYSCALL,...)
>
> 5. UPT_SYSCALL_NR(regs) == 78 (still OK)
>
> 6. waitpid(pid,...)
>
> 7. UPT_SYSCALL_NR(regs) == 0 (boom)
>
> I have no idea why.

I added some code to dump the regs struct before and after waitpid, and it
turns out that in fact, the syscall element is the only one which is
different; the rest of the structure is untouched. Corruption seems
unlikely, and the waitpid call certainly shouldn't be touching this...could
another thread be clobbering it somehow?

--
 - mdz
|
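The before/after comparison described above can be sketched as a byte-wise diff of two snapshots of a structure. This is only an illustration under stated assumptions: struct demo_regs is a hypothetical, much smaller stand-in for UML's register structure, and first_diff_offset is an illustrative helper, not code from the thread or from UML.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the real register structure. */
struct demo_regs {
    unsigned long regs[17];
    unsigned long fault_addr;
    int fault_type;
    int trap_type;
    long syscall;
    int is_user;
};

/* Return the byte offset of the first difference between two snapshots of
 * len bytes, or -1 if they are identical. Take both snapshots with memcpy()
 * from a structure that was zeroed with memset(), so that padding bytes
 * compare equal. */
static long first_diff_offset(const void *before, const void *after,
                              size_t len)
{
    const unsigned char *a = before;
    const unsigned char *b = after;
    size_t i;

    for (i = 0; i < len; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}
```

Comparing the returned offset against offsetof(struct demo_regs, syscall) and the offsets of the neighbouring fields shows which member changed. A diff confined to a single field, as reported above, points away from a stray buffer overrun and toward something writing that field specifically.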
From: Matt Z. <md...@de...> - 2003-12-30 18:43:39
|
On Sun, Dec 28, 2003 at 03:30:42AM -0800, Matt Zimmerman wrote:
> On Sun, Dec 28, 2003 at 02:12:40AM -0800, Matt Zimmerman wrote:
>
> > So, the sequence of events in handle_trap is this:
> >
> > 1. UPT_SYSCALL_NR(regs) == 78
> >
> > 2. ptrace(PTRACE_POKEUSER,...)
> >
> > 3. UPT_SYSCALL_NR(regs) == 78 (still OK)
> >
> > 4. ptrace(PTRACE_SYSCALL,...)
> >
> > 5. UPT_SYSCALL_NR(regs) == 78 (still OK)
> >
> > 6. waitpid(pid,...)
> >
> > 7. UPT_SYSCALL_NR(regs) == 0 (boom)
> >
> > I have no idea why.
>
> I added some code to dump the regs struct before and after waitpid, and it
> turns out that in fact, the syscall element is the only one which is
> different; the rest of the structure is untouched. Corruption seems
> unlikely, and the waitpid call certainly shouldn't be touching this...could
> another thread be clobbering it somehow?

It turns out that this problem seems to be due to compiler incompatibility.
UML had been built with gcc 2.95 due to old breakage, and when built with
gcc 3.3 (as glibc is), everything starts working again. My suspicion is
that this is due to certain recent changes in pthreads.

--
 - mdz
|
From: Jeff D. <jd...@ad...> - 2004-01-06 02:39:07
|
md...@de... said:
> It turns out that this problem seems to be due to compiler
> incompatibility. UML had been built with gcc 2.95 due to old breakage,
> and when built with gcc 3.3 (as glibc is), everything starts working
> again. My suspicion is that this is due to certain recent changes in
> pthreads.

Is it your opinion that there's no problem in UML itself? Having a field
in the sigcontext getting magically munged is somewhat worrying. I'd be
happier knowing what exactly was happening so I can be sure this wasn't
exposing some subtle UML bug.

				Jeff
|
From: Nuno S. <nun...@vg...> - 2004-01-06 07:40:20
|
Hi!

Jeff Dike wrote:
> md...@de... said:
>
>>It turns out that this problem seems to be due to compiler
>>incompatibility. UML had been built with gcc 2.95 due to old breakage,
>>and when built with gcc 3.3 (as glibc is), everything starts working
>>again. My suspicion is that this is due to certain recent changes in
>>pthreads.
>
>
> Is it your opinion that there's no problem in UML itself? Having a field
> in the sigcontext getting magically munged is somewhat worrying. I'd be
> happier knowing what exactly was happening so I can be sure this wasn't
> exposing some subtle UML bug.
>

I've been doing some tests and I'd say that the problem is something
regarding the NPTL+TLS+__thread features of recent libc6 (2.3.2 and
2.3.3cvs)... But I'm clueless about the fix :-)

I'll set up a chroot where I can play around with glibc setups.

OARS, why doesn't linux (the UML executable) run under /lib/ld-linux.so?
Example:

puma:/tmp# ldd /usr/bin/scp
        libutil.so.1 => /lib/tls/i686/cmov/libutil.so.1 (0x4002f000)
        libz.so.1 => /usr/lib/libz.so.1 (0x40032000)
        libnsl.so.1 => /lib/tls/i686/cmov/libnsl.so.1 (0x40044000)
        libcrypto.so.0.9.7 => /usr/lib/i686/cmov/libcrypto.so.0.9.7 (0x40059000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x40156000)
        libdl.so.2 => /lib/tls/i686/cmov/libdl.so.2 (0x4028f000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
puma:/tmp# ldd ./linux
        libutil.so.1 => /lib/tls/i686/cmov/libutil.so.1 (0x40018000)
        libc.so.6 => /lib/tls/i686/cmov/libc.so.6 (0x4001b000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
puma:/tmp#

scp has a superset of linux's shared libs and runs OK:

puma:/tmp# /lib/ld-linux.so.2 --library-path /lib:/usr/lib /usr/bin/scp
usage: scp [-pqrvBC1246] [-F config] [-S program] [-P port]
           [-c cipher] [-i identity] [-l limit] [-o option]
           [[user@]host1:]file1 [...] [[user@]host2:]file2
puma:/tmp#

But linux does not run:

puma:/tmp# /lib/ld-linux.so.2 --library-path /lib:/usr/lib /tmp/linux
Checking for the skas3 patch in the host...found
Checking for /proc/mm...found
Mapping memory: Invalid argument
puma:/tmp#

If linux could run under the ld-linux.so helper, testing would be easier :)

Regards,
Nuno Silva
|
From: Matt Z. <md...@de...> - 2004-01-06 08:03:05
|
On Tue, Jan 06, 2004 at 07:41:39AM +0000, Nuno Silva wrote:
> Jeff Dike wrote:
> >md...@de... said:
> >Is it your opinion that there's no problem in UML itself? Having a field
> >in the sigcontext getting magically munged is somewhat worrying. I'd be
> >happier knowing what exactly was happening so I can be sure this wasn't
> >exposing some subtle UML bug.
> >
>
> I've been doing some tests and I'd say that the problem is something
> regarding the NPTL+TLS+__thread features of recent libc6 (2.3.2 and
> 2.3.3cvs)... But I'm clueless about the fix :-)

I suspect that the problem lies in this direction, because it's the only
relevant news that I've heard from glibc in recent months, but I wouldn't
expect NPTL to relate directly because I'm running on 2.4 (as are several
others who have seen the problem).

This is the issue where i386 support was dropped and i486-specific
instructions used, right?

--
 - mdz
|
From: Nuno S. <nun...@vg...> - 2004-01-06 08:18:51
|
Matt Zimmerman wrote:
> On Tue, Jan 06, 2004 at 07:41:39AM +0000, Nuno Silva wrote:
>
>
>>Jeff Dike wrote:
>>
>>>md...@de... said:
>>>Is it your opinion that there's no problem in UML itself? Having a field
>>>in the sigcontext getting magically munged is somewhat worrying. I'd be
>>>happier knowing what exactly was happening so I can be sure this wasn't
>>>exposing some subtle UML bug.
>>>
>>
>>I've been doing some tests and I'd say that the problem is something
>>regarding the NPTL+TLS+__thread features of recent libc6 (2.3.2 and
>>2.3.3cvs)... But I'm clueless about the fix :-)
>
>
> I suspect that the problem lies in this direction, because it's the only
> relevant news that I've heard from glibc in recent months, but I wouldn't
> expect NPTL to relate directly because I'm running on 2.4 (as are several
> others who have seen the problem).
>
> This is the issue where i386 support was dropped and i486-specific
> instructions used, right?
>

Yes, AFAIK nptl requires 486 asm.

I just finished testing debian-unstable with glibc with nptl, tls and
__thread support in chroot and everything works fine except UML :)

I'm recompiling glibc with profiling to get a usable backtrace right now.
With this setup linux gets a SIGSTOP (automagically) and after SIGCONT it
just segfaults... In a few moments I'll have a backtrace :-)

Regards,
Nuno Silva
|
From: Nuno S. <nun...@vg...> - 2004-01-06 08:46:10
|
>> >> I suspect that the problem lies in this direction, because it's the only >> relevant news that I've heard from glibc in recent months, but I wouldn't >> expect NPTL to relate directly because I'm running on 2.4 (as are several >> others who have seen the problem). >> >> This is the issue where i386 support was dropped and i486-specific >> instructions used, right? >> > > Yes, AFAIK nptl requires 486 asm. > > I just finished testing debian-unstable with glibc with nptl, tls and > __thread support in chroot and everything works fine except UML :) > Another suspect is sysenter/sysexit support present in recent libc when used with 2.6.0... Hmmm :) Regards, Nuno Silva |
From: Nuno S. <nun...@vg...> - 2004-01-06 09:18:47
Attachments:
straced-linux.txt
|
Nuno Silva wrote:

[...]

>
> I've been doing some tests and I'd say that the problem is something
> regarding the NPTL+TLS+__thread features of recent libc6 (2.3.2 and
> 2.3.3cvs)... But I'm clueless about the fix :-)
>
> I'll setup a chroot where I can play around with glibc setups.
>

OK, just finished the chroot: Debian unstable with glibc-2.3.3cvs
hand-compiled (../libc/configure --with-tls --with-__thread
--enable-add-ons=nptl --prefix=/ --enable-kernel=2.6.0), with Debian's
libc removed.

Everything runs OK with the new glibc, tested: bash, apt-get, perl, mc,
strace, gcc, etc.

However linux (UML's executable) doesn't run:

puma:/uml# ls -la /proc/mm
--w--w--w-    1 root     root            0 Jan  6 08:33 /proc/mm
puma:/uml# uname -a
Linux puma 2.6.0 #2 Mon Jan 5 09:25:45 WET 2004 i686 unknown unknown GNU/Linux
puma:/uml# ./linux
Checking for the skas3 patch in the host...found
Checking for /proc/mm...found

[1]+  Stopped                 ./linux
puma:/uml# fg
./linux
Segmentation fault
puma:/uml#

Now with strace:
(see attached file)

Now with gdb:

puma:/uml# gdb ./linux
GNU gdb 6.0-debian
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-linux"...
(gdb) run
Starting program: /uml/linux
Detaching after fork from child process 26378.
Checking for the skas3 patch in the host...found
Checking for /proc/mm...found
Detaching after fork from child process 26379.

Program received signal SIGSTOP, Stopped (signal).
0x400f18dc in clone () from /lib/libc.so.6
(gdb) bt
#0  0x400f18dc in clone () from /lib/libc.so.6
#1  0x4014f000 in ?? ()
#2  0x00000007 in ?? ()
#3  0x00001000 in ?? ()
#4  0xa00a7b7f in start_userspace (cpu=26379) at process.c:113
#5  0xa00a88e2 in start_uml_skas () at process_kern.c:162
#6  0xa00a6bcb in linux_main (argc=0, argv=0xa0000000) at um_arch.c:387
#7  0xa000de0e in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at arch/um/main.c:146
(gdb) cont
Continuing.

Program received signal SIGSTOP, Stopped (signal).
0x400f18dc in clone () from /lib/libc.so.6
(gdb) bt
#0  0x400f18dc in clone () from /lib/libc.so.6
#1  0x4014f000 in ?? ()
#2  0x00000007 in ?? ()
#3  0x00001000 in ?? ()
#4  0xa00a7b7f in start_userspace (cpu=26379) at process.c:113
#5  0xa00a88e2 in start_uml_skas () at process_kern.c:162
#6  0xa00a6bcb in linux_main (argc=0, argv=0xa0000000) at um_arch.c:387
#7  0xa000de0e in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at arch/um/main.c:146
(gdb) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xa001305a in panic (fmt=0xbfffc000 "") at panic.c:67
67      panic.c: No such file or directory.
        in panic.c
(gdb) bt
#0  0xa001305a in panic (fmt=0xbfffc000 "") at panic.c:67
#1  0xa00a7bdd in start_userspace (cpu=-1073758208) at process.c:127
#2  0xa00a88e2 in start_uml_skas () at process_kern.c:162
#3  0xa00a6bcb in linux_main (argc=0, argv=0xa0000000) at um_arch.c:387
#4  0xa000de0e in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at arch/um/main.c:146
(gdb) cont
Continuing.

Program terminated with signal SIGSEGV, Segmentation fault.
The program no longer exists.
(gdb) quit
puma:/uml#

In my tree, arch/um/kernel/skas/process.c:113 is the pid=clone(..., in

void start_userspace(int cpu)
{
        void *stack;
        unsigned long sp;
        int pid, status, n;

        stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if(stack == MAP_FAILED)
                panic("start_userspace : mmap failed, errno = %d", errno);
        sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *);

        pid = clone(userspace_tramp, (void *) sp,
                    CLONE_FILES | CLONE_VM | SIGCHLD, NULL);

I hope this makes sense to someone because I'm off to sleep a few hours :-)

Regards,
Nuno Silva
|
From: Matt Z. <md...@de...> - 2004-01-06 17:13:09
|
On Tue, Jan 06, 2004 at 09:20:06AM +0000, Nuno Silva wrote:
> Nuno Silva wrote:
>
> [...]
>
> >
> >I've been doing some tests and I'd say that the problem is something
> >regarding the NPTL+TLS+__thread features of recent libc6 (2.3.2 and
> >2.3.3cvs)... But I'm clueless about the fix :-)
> >
> >I'll setup a chroot where I can play around with glibc setups.
> >
>
> OK, just finished the chroot: debian unstable with glibc-2.3.3cvs
> hand-compiled (../libc/configure --with-tls --with-__thread
> --enable-add-ons=nptl --prefix=/ --enable-kernel=2.6.0) and removed the
> debian's libc.
>
> Everything runs OK with the new glibc, tested: bash, apt-get, perl, mc,
> strace, gcc, etc.
>
> However linux (UML's executable) doesn't run:
> puma:/uml# ls -la /proc/mm
> --w--w--w- 1 root root 0 Jan 6 08:33 /proc/mm
> puma:/uml# uname -a
> Linux puma 2.6.0 #2 Mon Jan 5 09:25:45 WET 2004 i686 unknown unknown
> GNU/Linux
> puma:/uml# ./linux
> Checking for the skas3 patch in the host...found
> Checking for /proc/mm...found
>
> [1]+ Stopped ./linux
> puma:/uml# fg
> ./linux
> Segmentation fault
> puma:/uml#

This looks quite different from what I and others were seeing in #224431,
which was random ENOSYS errors because the system call number in the regs
struct was being mysteriously cleared during a context switch on the host.

> Now with strace:
> (see attached file)
>
> Now with gdb:
> puma:/uml# gdb ./linux
> GNU gdb 6.0-debian
> Copyright 2003 Free Software Foundation, Inc.
> GDB is free software, covered by the GNU General Public License, and you are
> welcome to change it and/or distribute copies of it under certain
> conditions.
> Type "show copying" to see the conditions.
> There is absolutely no warranty for GDB. Type "show warranty" for details.
> This GDB was configured as "i386-linux"...
> (gdb) run
> Starting program: /uml/linux
> Detaching after fork from child process 26378.
> Checking for the skas3 patch in the host...found > Checking for /proc/mm...found > Detaching after fork from child process 26379. > > Program received signal SIGSTOP, Stopped (signal). > 0x400f18dc in clone () from /lib/libc.so.6 > (gdb) bt > #0 0x400f18dc in clone () from /lib/libc.so.6 > #1 0x4014f000 in ?? () > #2 0x00000007 in ?? () > #3 0x00001000 in ?? () > #4 0xa00a7b7f in start_userspace (cpu=26379) at process.c:113 > #5 0xa00a88e2 in start_uml_skas () at process_kern.c:162 > #6 0xa00a6bcb in linux_main (argc=0, argv=0xa0000000) at um_arch.c:387 > #7 0xa000de0e in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at > arch/um/main.c:146 > (gdb) cont > Continuing. > > Program received signal SIGSTOP, Stopped (signal). > 0x400f18dc in clone () from /lib/libc.so.6 > (gdb) bt > #0 0x400f18dc in clone () from /lib/libc.so.6 > #1 0x4014f000 in ?? () > #2 0x00000007 in ?? () > #3 0x00001000 in ?? () > #4 0xa00a7b7f in start_userspace (cpu=26379) at process.c:113 > #5 0xa00a88e2 in start_uml_skas () at process_kern.c:162 > #6 0xa00a6bcb in linux_main (argc=0, argv=0xa0000000) at um_arch.c:387 > #7 0xa000de0e in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at > arch/um/main.c:146 > (gdb) cont > Continuing. > > Program received signal SIGSEGV, Segmentation fault. > 0xa001305a in panic (fmt=0xbfffc000 "") at panic.c:67 > 67 panic.c: No such file or directory. > in panic.c > (gdb) bt > #0 0xa001305a in panic (fmt=0xbfffc000 "") at panic.c:67 > #1 0xa00a7bdd in start_userspace (cpu=-1073758208) at process.c:127 > #2 0xa00a88e2 in start_uml_skas () at process_kern.c:162 > #3 0xa00a6bcb in linux_main (argc=0, argv=0xa0000000) at um_arch.c:387 > #4 0xa000de0e in main (argc=1, argv=0xbffffa44, envp=0xbffffa4c) at > arch/um/main.c:146 > (gdb) cont > Continuing. > > Program terminated with signal SIGSEGV, Segmentation fault. > The program no longer exists. 
> (gdb) quit > puma:/uml# > > In my tree, arch/um/kernel/skas/process.c:113 is the pid=clone(..., in > > void start_userspace(int cpu) > { > void *stack; > unsigned long sp; > int pid, status, n; > > stack = mmap(NULL, PAGE_SIZE, PROT_READ | PROT_WRITE | PROT_EXEC, > MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); > if(stack == MAP_FAILED) > panic("start_userspace : mmap failed, errno = %d", errno); > sp = (unsigned long) stack + PAGE_SIZE - sizeof(void *); > > pid = clone(userspace_tramp, (void *) sp, > CLONE_FILES | CLONE_VM | SIGCHLD, NULL); > > > I hope this makes sense to someone because I'm off to sleep a few hours :-) > > Regards, > Nuno Silva > > > > > execve("./linux", ["./linux"], [/* 25 vars */]) = 0 > uname({sys="Linux", node="puma", ...}) = 0 > brk(0) = 0xa01f6000 > open("/etc/ld.so.preload", O_RDONLY) = -1 ENOENT (No such file or directory) > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000 > open("/lib/tls/i686/mmx/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) > stat64("/lib/tls/i686/mmx", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/tls/i686/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) > stat64("/lib/tls/i686", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/tls/mmx/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) > stat64("/lib/tls/mmx", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/tls/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) > stat64("/lib/tls", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/i686/mmx/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) > stat64("/lib/i686/mmx", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/i686/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or directory) > stat64("/lib/i686", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/mmx/libutil.so.1", O_RDONLY) = -1 ENOENT (No such file or 
directory) > stat64("/lib/mmx", 0xbffff16c) = -1 ENOENT (No such file or directory) > open("/lib/libutil.so.1", O_RDONLY) = 3 > read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0@\16\0\000"..., 512) = 512 > fstat64(3, {st_mode=S_IFREG|0755, st_size=92994, ...}) = 0 > mmap2(NULL, 10672, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x40018000 > mmap2(0x4001a000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x1) = 0x4001a000 > close(3) = 0 > open("/lib/libc.so.6", O_RDONLY) = 3 > read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0PZ\1\000"..., 512) = 512 > fstat64(3, {st_mode=S_IFREG|0755, st_size=19666162, ...}) = 0 > mmap2(NULL, 1252652, PROT_READ|PROT_EXEC, MAP_PRIVATE, 3, 0) = 0x4001b000 > mmap2(0x40142000, 36864, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 3, 0x127) = 0x40142000 > mmap2(0x4014b000, 7468, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0x4014b000 > close(3) = 0 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4014d000 > set_thread_area({entry_number:-1 -> 6, base_addr:0x4014d070, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0 > rt_sigprocmask(SIG_SETMASK, [IO], NULL, 8) = 0 > getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0 > brk(0) = 0xa01f6000 > brk(0xa0217000) = 0xa0217000 > brk(0) = 0xa0217000 > rt_sigaction(SIGINT, {0xa000dc90, [], SA_NOMASK|SA_ONESHOT}, NULL, 8) = 0 > rt_sigaction(SIGTERM, {0xa000dc90, [], SA_NOMASK|SA_ONESHOT}, NULL, 8) = 0 > rt_sigaction(SIGHUP, {0xa000dc90, [], SA_NOMASK|SA_ONESHOT}, NULL, 8) = 0 > fstat64(1, {st_mode=S_IFREG|0644, st_size=3126, ...}) = 0 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4014e000 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4014f000 > clone(child_stack=0x4014ffd8, flags=0|SIGCHLD) = 26355 > --- SIGCHLD (Child exited) @ 0 (0) --- > rt_sigprocmask(SIG_UNBLOCK, 
[], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > waitpid(26355, [WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP], WUNTRACED) = 26355 > ptrace(0x34 /* PTRACE_??? */, 26355, 0, 0xbffff8a0) = 0 > ptrace(PTRACE_GETREGS, 26355, 0, 0xa01d8c60) = 0 > ptrace(PTRACE_GETFPXREGS, 26355, 0, 0xa01d8d40) = 0 > ptrace(PTRACE_CONT, 26355, 0, SIG_0) = 0 > waitpid(26355, [WIFEXITED(s) && WEXITSTATUS(s) == 1], 0) = 26355 > --- SIGCHLD (Child exited) @ 0 (0) --- > munmap(0x4014f000, 4096) = 0 > access("/proc/mm", W_OK) = 0 > brk(0) = 0xa0217000 > uname({sys="Linux", node="puma", ...}) = 0 > gettimeofday({1073378804, 532883}, NULL) = 0 > getpid() = 26354 > open("/tmp/vm_file-AukWKw", O_RDWR|O_CREAT|O_EXCL, 0600) = 3 > unlink("/tmp/vm_file-AukWKw") = 0 > fchmod(3, 0777) = 0 > _llseek(3, 33554432, [33554432], SEEK_SET) = 0 > write(3, "\0", 1) = 1 > fcntl64(3, F_SETFD, FD_CLOEXEC) = 0 > mmap2(0xa0800000, 25165824, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_FIXED, 3, 0x800) = 0xa0800000 > mkdir("/root/.uml/", 0777) = -1 EEXIST (File exists) > open("/root/.uml/uPjYnh", O_RDWR|O_CREAT|O_EXCL, 0600) = 4 > close(4) = 0 > unlink("/root/.uml/uPjYnh") = 0 > mkdir("/root/.uml/uPjYnh", 0777) = 0 > 
open("/root/.uml/uPjYnh/pid", O_RDWR|O_CREAT|O_EXCL|O_LARGEFILE, 0644) = 4 > write(4, "26354\n", 6) = 6 > close(4) = 0 > mprotect(0xa0196000, 8192, PROT_READ|PROT_WRITE|PROT_EXEC) = 0 > write(1, "Checking for the skas3 patch in "..., 79Checking for the skas3 patch in the host...found > Checking for /proc/mm...found > ) = 79 > mmap2(NULL, 4096, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x4014f000 > clone(child_stack=0x4014ffd8, flags=CLONE_VM|CLONE_FILES|SIGCHLD) = 26356 > --- SIGCHLD (Child exited) @ 0 (0) --- > --- SIGSTOP (Stopped (signal)) @ 0 (0) --- > --- SIGSTOP (Stopped (signal)) @ 0 (0) --- > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > waitpid(26356, [WIFEXITED(s) && WEXITSTATUS(s) == 0], WUNTRACED) = 26356 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > 
rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [], [IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [ALRM VTALRM IO], NULL, 8) = 0 > rt_sigprocmask(SIG_UNBLOCK, [ALRM VTALRM], [ALRM VTALRM IO], 8) = 0 > rt_sigprocmask(SIG_BLOCK, [IO], NULL, 8) = 0 > --- SIGSEGV (Segmentation fault) @ 0 (0) --- > +++ killed by SIGSEGV +++ -- - mdz |
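[Editor's sketch, not part of the original message.] The quoted start_userspace() maps a private stack and hands its top to clone(); the strace shows the matching clone(... CLONE_VM|CLONE_FILES|SIGCHLD) call followed by waitpid(). The fragment below is a minimal stand-alone illustration of that pattern, not UML's code: the names run_on_mmap_stack, sketch_child and SKETCH_STACK_SIZE are made up here, the stack is 64 KiB rather than one page, and it is not executable.

```c
#define _GNU_SOURCE             /* needed for clone() in <sched.h> */
#include <assert.h>
#include <sched.h>
#include <signal.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical size; the quoted UML code maps a single page instead. */
#define SKETCH_STACK_SIZE (64 * 1024)

/* Child entry point: its return value becomes the child's exit status. */
static int sketch_child(void *arg)
{
    return *(int *)arg;
}

/*
 * Map a private stack, run sketch_child() on it via clone(), and return the
 * child's exit status, or -1 on any failure.
 */
int run_on_mmap_stack(int exit_code)
{
    char *stack;
    int status;
    pid_t pid;

    assert(exit_code >= 0 && exit_code <= 255);

    stack = mmap(NULL, SKETCH_STACK_SIZE, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED)
        return -1;

    /* The stack grows downwards here, so clone() gets the top of the mapping. */
    pid = clone(sketch_child, stack + SKETCH_STACK_SIZE,
                CLONE_FILES | CLONE_VM | SIGCHLD, &exit_code);
    if (pid < 0) {
        munmap(stack, SKETCH_STACK_SIZE);
        return -1;
    }

    /* SIGCHLD as the termination signal lets a plain waitpid() reap the child. */
    if (waitpid(pid, &status, 0) != pid) {
        munmap(stack, SKETCH_STACK_SIZE);
        return -1;
    }

    munmap(stack, SKETCH_STACK_SIZE);
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling run_on_mmap_stack(7) from a small main() should return 7 on a Linux host. If the child dies from a signal instead, as in the trace above, WIFEXITED() is false and the sketch reports -1.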
From: Nuno S. <nun...@vg...> - 2004-01-09 07:20:49
|
Matt Zimmerman wrote:
>>puma:/uml# fg
>>./linux
>>Segmentation fault
>>puma:/uml#
>
>
> This looks quite different from what I and others were seeing in #224431,
> which was random ENOSYS errors because the system call number in the regs
> struct was being mysteriously cleared during a context switch on the host.

Yes, probably not the same issue.

I've run some new tests and I'd say that my problem relates to NPTL in
glibc. (I don't have a fix yet :( )

I'll open another thread :-)

Regards,
Nuno Silva |
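[Editor's sketch, not part of the original message.] The quoted explanation says the ENOSYS errors came from the system call number in the regs struct being cleared. The toy dispatcher below only illustrates why a clobbered number would surface as "Function not implemented": everything in it (toy_regs, toy_table, toy_dispatch, the handlers, and the choice to leave slot 0 empty) is hypothetical and is not UML's or the host kernel's real table.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef long (*toy_handler)(long arg);

static long toy_getpid(long arg) { (void)arg; return 1234; }  /* made-up value */
static long toy_double(long arg) { return 2 * arg; }

/* Hypothetical table: slot 0 is deliberately left without a handler. */
static const toy_handler toy_table[] = { NULL, toy_getpid, toy_double };

#define TOY_NR_SYSCALLS (sizeof(toy_table) / sizeof(toy_table[0]))

/* Stand-in for the saved register state that carries the syscall number. */
struct toy_regs {
    long syscall_nr;
    long arg;
};

/* Dispatch on the saved number; anything without a handler yields -ENOSYS. */
long toy_dispatch(const struct toy_regs *regs)
{
    assert(regs != NULL);

    if (regs->syscall_nr < 0 ||
        (size_t)regs->syscall_nr >= TOY_NR_SYSCALLS ||
        toy_table[regs->syscall_nr] == NULL)
        return -ENOSYS;

    return toy_table[regs->syscall_nr](regs->arg);
}
```

With syscall_nr set to 2 the call reaches toy_double(); zeroing syscall_nr before the dispatch, which is what the description above amounts to, makes the same request come back as -ENOSYS even though the caller did nothing wrong.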
From: Matt Z. <md...@de...> - 2004-01-08 07:07:25
|
On Mon, Jan 05, 2004 at 09:58:32PM -0500, Jeff Dike wrote:
> md...@de... said:
> > It turns out that this problem seems to be due to compiler
> > incompatibility. UML had been built with gcc 2.95 due to old breakage,
> > and when built with gcc 3.3 (as glibc is), everything starts working
> > again. My suspicion is that this is due to certain recent changes in
> > pthreads.
>
> Is it your opinion that there's no problem in UML itself? Having a field
> in the sigcontext getting magically munged is somewhat worrying. I'd be
> happier knowing what exactly was happening so I can be sure this wasn't
> exposing some subtle UML bug.

I was not able to come to a satisfactory conclusion as to the origin of the
problem, and once I found a solution, I stopped looking. I'm copying
debian-gcc and debian-glibc in case they're interested.

Summary for debian-{gcc,glibc}:

UML built with gcc-2.95 fails to run correctly on a current unstable system
with a 2.4 kernel. The symptoms are very strange. This started to happen
recently; UML had been building with gcc 2.95 successfully for over 9 months
with no problems.

Details are here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=224431

--
 - mdz |
From: Matt Z. <md...@de...> - 2004-01-12 18:36:32
|
On Wed, Jan 07, 2004 at 11:07:18PM -0800, Matt Zimmerman wrote:
> On Mon, Jan 05, 2004 at 09:58:32PM -0500, Jeff Dike wrote:
>
> > md...@de... said:
> > > It turns out that this problem seems to be due to compiler
> > > incompatibility. UML had been built with gcc 2.95 due to old breakage,
> > > and when built with gcc 3.3 (as glibc is), everything starts working
> > > again. My suspicion is that this is due to certain recent changes in
> > > pthreads.
> >
> > Is it your opinion that there's no problem in UML itself? Having a field
> > in the sigcontext getting magically munged is somewhat worrying. I'd be
> > happier knowing what exactly was happening so I can be sure this wasn't
> > exposing some subtle UML bug.
>
> I was not able to come to a satisfactory conclusion as to the origin of the
> problem, and once I found a solution, I stopped looking. I'm copying
> debian-gcc and debian-glibc in case they're interested.
>
> Summary for debian-{gcc,glibc}:
>
> UML built with gcc-2.95 fails to run correctly on a current unstable system
> with a 2.4 kernel. The symptoms are very strange; This started to happen recently; UML had been building
> with gcc 2.95 successfully for over 9 months now with no problems.
>
> Details are here: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=224431

By the way, the original reason why I started building UML with gcc-2.95 was
because building with 3.x broke the slirp transport like so:

Kernel panic: read of switch_pipe failed, errno = 9

errno 9 is EBADF. I never did find the real cause of that bug, but it has
resurfaced now that I am building with gcc 3.3 again to fix the other, worse
bug. I would be interested to know if anyone else has run into it. More
information is here:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=176485&archive=yes

--
 - mdz |
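[Editor's sketch, not part of the original message.] The panic above reports errno = 9 from a read on a pipe, and the message identifies that as EBADF. One ordinary way to get EBADF from read() is to use a descriptor that has already been closed; the fragment below reproduces just that, under the assumption of a single-threaded Linux process. It says nothing about why the slirp transport's descriptor went bad, and errno_from_read_on_closed_fd is a name invented for this note.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Open a pipe, close its read end, then read from the stale descriptor.
 * Returns the errno left behind by read(), or -1 if the pipe could not be
 * created in the first place.
 */
int errno_from_read_on_closed_fd(void)
{
    int fds[2];
    char byte;
    ssize_t n;
    int saved;

    if (pipe(fds) != 0)
        return -1;

    close(fds[0]);                  /* fds[0] is now a dangling descriptor */

    errno = 0;
    n = read(fds[0], &byte, 1);     /* expected to fail */
    saved = errno;

    close(fds[1]);
    assert(n == -1);                /* a read on a closed fd must not succeed */
    return saved;
}
```

On Linux the returned value should be EBADF, which is 9 there and matches the number in the panic message.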
From: Bill A. <all...@ma...> - 2004-01-13 15:25:57
|
On Mon, Jan 12, 2004 at 10:36:23AM -0800, Matt Zimmerman wrote:
> By the way, the original reason why I started building UML with gcc-2.95 was
> because building with 3.x broke the slirp transport like so:
>
> Kernel panic: read of switch_pipe failed, errno = 9
>
> errno 9 is EBADF. I never did find the real cause of that bug, but it has
> resurfaced now that I am building with gcc 3.3 again to fix the other, worse
> bug. I would be interested to know if anyone else has run into it. More
> information is here:

For what it's worth, with the current Debian UML package I get:

$ linux ubd0=uml root=/dev/ubd0 eth0=slirp|& less
...
Netdevice 0 : SLIRP backend - command line: 'slirp'
mconsole (version 2) initialized on /home/bill/.uml/wksNEk/mconsole
Partition check:
 ubda: unknown partition table
Initializing stdio console driver
NET4: Linux TCP/IP 1.0 for NET4.0
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 2048 bind 4096)
Linux IP multicast router 0.06 plus PIM-SM
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Kernel panic: Segfault with no mm

Without the eth0=slirp parameter, there is no kernel panic.
So at least the error message has changed.

The host kernel has the skas patch from the Debian package applied.

Cheers,
--
Bill. <bal...@de...>

Imagine a large red swirl here. |
From: Jeff D. <jd...@ad...> - 2004-01-13 17:09:38
|
all...@ma... said:
> Kernel panic: Segfault with no mm
> Without the eth0=slirp parameter, there are no kernel panic.

Can someone get a stack trace from this?

Jeff |
From: Bill A. <all...@ma...> - 2004-01-13 17:44:40
|
On Tue, Jan 13, 2004 at 12:30:22PM -0500, Jeff Dike wrote:
> all...@ma... said:
> > Kernel panic: Segfault with no mm
> > Without the eth0=slirp parameter, there are no kernel panic.
>
> Can someone get a stack trace from this?

Here is what I get:

NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.

Program received signal SIGSEGV, Segmentation fault.
walk_init_root (name=0xf4a33ee8 <Address 0xf4a33ee8 out of bounds>, nd=0xa08f7b74) at atomic.h:107
107 __asm__ __volatile__(
(gdb) bt
#0 walk_init_root (name=0xf4a33ee8 <Address 0xf4a33ee8 out of bounds>, nd=0xa08f7b74) at atomic.h:107
(gdb) c
Continuing.

Breakpoint 1, segv (address=4104339216, ip=2685837537, is_write=2, is_user=0, sc=0xf4a33f10) at trap_kern.c:124
124 if(!is_user && (address >= start_vm) && (address < end_vm)){
(gdb) bt
#0 segv (address=4104339216, ip=2685837537, is_write=2, is_user=0, sc=0xf4a33f10) at trap_kern.c:124
(gdb) bt
#0 segv (address=4104339216, ip=2685837537, is_write=2, is_user=0, sc=0xf4a33f10) at trap_kern.c:124
(gdb) c
Continuing.

Breakpoint 2, panic (fmt=0xa08f4000 "") at panic.c:58
58 machine_paniced = 1;
(gdb) bt
#0 panic (fmt=0xa08f4000 "") at panic.c:58
(gdb) c
Continuing.
Kernel panic: Segfault with no mm

Program exited normally.

Cheers,
--
Bill. <bal...@de...>

Imagine a large red swirl here. |
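[Editor's sketch, not part of the original message.] gdb prints the segv() arguments address and ip in decimal but the pointers in hex, which makes them hard to compare by eye. Converting them shows that address=4104339216 is 0xf4a33f10, the same value as sc and 40 bytes above the name pointer gdb flagged as out of bounds, and that ip=2685837537 is 0xa016a0e1. This is only arithmetic on the numbers in the trace, not a diagnosis; parse_addr is a helper invented for this note.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse an address as gdb printed it. Base 0 lets strtoul() accept both the
 * decimal form ("4104339216") and the hex form ("0xf4a33f10").
 */
uint32_t parse_addr(const char *text)
{
    assert(text != NULL);
    return (uint32_t)strtoul(text, NULL, 0);
}

/* Non-zero when a decimal and a hex rendering name the same 32-bit address. */
int same_addr(const char *a, const char *b)
{
    return parse_addr(a) == parse_addr(b);
}
```

For the trace above, same_addr("4104339216", "0xf4a33f10") is true, and parse_addr("0xf4a33f10") - parse_addr("0xf4a33ee8") comes to 40.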