From: Matt Z. <md...@de...> - 2004-01-13 18:52:06
|
On Tue, Jan 13, 2004 at 12:30:22PM -0500, Jeff Dike wrote:

> all...@ma... said:
> > Kernel panic: Segfault with no mm
> > Without the eth0=slirp parameter, there are no kernel panic.
>
> Can someone get a stack trace from this?

Certainly (this is from 2.4.23-1um). Seems like something goes weird with
procfs, but I've no idea why this only happens with slirp (and newer gcc for
that matter).

(gdb) bt
#0  panic (fmt=0x0) at panic.c:58
#1  0xa00d769b in segv (address=8, ip=2685922193, is_write=0, is_user=0, sc=0x0) at trap_kern.c:144
#2  0xa00d7af5 in segv_handler (sig=11, regs=0xa0350274) at trap_user.c:67
#3  0xa00df411 in sig_handler_common_skas (sig=11, sc_ptr=0x58) at trap_user.c:33
#4  0xa00d7c05 in sig_handler (sig=0, sc=
      {gs = 0, __gsh = 0, fs = 0, __fsh = 0, es = 43, __esh = 0, ds = 43,
       __dsh = 0, edi = 2687843188, esi = 2688540673, ebp = 2687843100,
       esp = 2687843028, ebx = 2687843188, edx = 2687843188,
       ecx = 2687827968, eax = 0, trapno = 14, err = 4, eip = 2684641864,
       cs = 35, __csh = 0, eflags = 2163202, esp_at_signal = 2687843028,
       ss = 43, __ssh = 0, fpstate = 0x0, oldmask = 0, cr2 = 8})
    at trap_user.c:103
#5  <signal handler called>
#6  0xa0046248 in link_path_walk (name=0xa03fe001 "dev", nd=0xa0353b74) at namei.c:462
#7  0xa004674e in path_walk (name=0x0, nd=0xa0353b74) at namei.c:659
#8  0xa0046919 in path_lookup (path=0xa03fe000 "/dev", flags=2687843188, nd=0xa0353b74) at namei.c:748
#9  0xa0047754 in sys_mkdir (pathname=0x0, mode=448) at namei.c:1345
#10 0xa000eafa in prepare_namespace () at init/do_mounts.c:917
#11 0xa000e613 in init (unused=0x0) at init/main.c:580
#12 0xa00d22f9 in run_kernel_thread (fn=0xa000e600 <init>, arg=0x0,
#13 0xa00de930 in new_thread_handler (sig=10) at process_kern.c:70
#14 <signal handler called>
(gdb) i sym 2685922193
kill + 17 in section .text
(gdb) i line *2685922193
Line 155 of "proc_fs.h" starts at address 0xa0178e94 <svc_proc_register+68>
   and ends at 0xa01c3913.
(gdb) up 6
#6  0xa0046248 in link_path_walk (name=0xa03fe001 "dev", nd=0xa0353b74) at namei.c:462
462             inode = nd->dentry->d_inode;
(gdb) print *nd
$1 = {dentry = 0x0, mnt = 0x0, last = {
    name = 0x8124 <Address 0x8124 out of bounds>, len = 1, hash = 2686526256},
  flags = 16, last_type = 1}

--
 - mdz |
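For anyone reading the trace above: gdb prints the sigcontext registers in decimal but the frame addresses in hex, which makes them hard to compare by eye. Below is a minimal Python sketch of the conversion. The `Dentry` layout is an assumption on my part (a guess at the first three fields of the 2.4 `struct dentry`, with `atomic_t` modeled as a plain int); it is not stated anywhere in this thread.

```python
import ctypes

# Decimal values copied from the sigcontext dump in frame #4.
sigcontext = {
    "eip": 2684641864,
    "esp": 2687843028,
    "edi": 2687843188,
    "cr2": 8,
}
segv_ip = 2685922193  # the 'ip' argument gdb shows for segv() in frame #1


def as_hex(value: int) -> str:
    """Format a 32-bit register value the way gdb prints frame addresses."""
    return f"0x{value & 0xFFFFFFFF:08x}"


class Dentry(ctypes.Structure):
    # ASSUMPTION: guessed leading fields of the 2.4 struct dentry, with
    # atomic_t d_count modeled as a plain int. Not taken from the thread.
    _fields_ = [
        ("d_count", ctypes.c_int),
        ("d_flags", ctypes.c_uint),
        ("d_inode", ctypes.c_void_p),
    ]


if __name__ == "__main__":
    for name, value in sigcontext.items():
        print(f"{name:>3} = {as_hex(value)}")
    print(f"segv ip = {as_hex(segv_ip)}")
    # With nd->dentry == NULL, reading d_inode touches address 0 + offset.
    print("offset of d_inode:", Dentry.d_inode.offset)
```

Under those assumptions, eip decodes to 0xa0046248, which is the address of frame #6, and edi decodes to 0xa0353b74, the value of nd. The assumed d_inode offset equals the faulting address 8 (cr2), which would be consistent with nd->dentry being NULL in the `print *nd` output.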
From: Jeff D. <jd...@ad...> - 2004-01-16 02:17:21
|
md...@de... said:
> Certainly (this is from 2.4.23-1um). Seems like something goes weird
> with procfs, but I've no idea why this only happens with slirp (and
> newer gcc for that matter).

Just tried it, boots fine here, but then I'm not playing with new gccs.

Can you send me the binary that crashes? I assume it's statically linked...

I'm thinking that something funky is happening in the initcalls, and the slirp
initcall happens to push the funkiness over the edge.

Jeff |
From: Matt Z. <md...@de...> - 2004-01-16 02:38:38
|
On Thu, Jan 15, 2004 at 09:38:34PM -0500, Jeff Dike wrote:

> md...@de... said:
> > Certainly (this is from 2.4.23-1um). Seems like something goes weird
> > with procfs, but I've no idea why this only happens with slirp (and
> > newer gcc for that matter).
>
> Just tried it, boots fine here, but then I'm not playing with new gccs.
>
> Can you send me the binary that crashes? I assume it's statically linked...
>
> I'm thinking that something funky is happening in the initcalls, and the slirp
> initcall happens to push the funkiness over the edge.

http://people.debian.org/~mdz/temp/linux-176485

--
 - mdz |
From: Jeff D. <jd...@ad...> - 2004-01-16 19:42:53
|
% file linux-176485
linux-176485: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped

Having it stripped is inconvenient. Can you compress a debuggable binary
and put it someplace I can grab it?

Jeff |
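As a side note on the "stripped" verdict above: roughly speaking, a binary counts as stripped when its section table has no symbol table section. The following is a minimal Python sketch of that check for ELF32 little-endian images only (the kind `file` reports here). The function name `has_symtab` is made up for illustration, and the exact heuristic that `file` applies is an assumption, not something confirmed in this thread.

```python
import struct
import sys

ELF32_HEADER = struct.Struct("<16sHHIIIIIHHHHHH")  # 52-byte ELF32 file header
ELF32_SECTION = struct.Struct("<10I")              # 40-byte ELF32 section header
SHT_SYMTAB = 2                                     # section type of .symtab


def has_symtab(image: bytes) -> bool:
    """Return True if an ELF32 little-endian image has a SHT_SYMTAB section."""
    if len(image) < ELF32_HEADER.size:
        raise ValueError("too short to be an ELF file")
    fields = ELF32_HEADER.unpack_from(image, 0)
    ident = fields[0]
    # ident[4] is EI_CLASS (1 = 32-bit), ident[5] is EI_DATA (1 = LSB).
    if ident[:4] != b"\x7fELF" or ident[4] != 1 or ident[5] != 1:
        raise ValueError("not an ELF32 little-endian image")
    shoff, shentsize, shnum = fields[6], fields[11], fields[12]
    for index in range(shnum):
        offset = shoff + index * shentsize
        sh_type = ELF32_SECTION.unpack_from(image, offset)[1]
        if sh_type == SHT_SYMTAB:
            return True
    return False


if __name__ == "__main__":
    # Reads the whole file into memory; fine for a sketch, wasteful for
    # a very large binary.
    with open(sys.argv[1], "rb") as handle:
        print("not stripped" if has_symtab(handle.read()) else "stripped")
```

Run it with the path of the binary as the only argument. It only looks at section types, so it says nothing about whether DWARF debug information is present.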
From: Matt Z. <md...@de...> - 2004-01-16 19:49:40
|
On Fri, Jan 16, 2004 at 03:04:21PM -0500, Jeff Dike wrote:

> % file linux-176485
> linux-176485: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
>
> Having it stripped is inconvenient. Can you compress a debuggable binary
> and put it someplace I can grab it?

http://people.debian.org/~mdz/temp/linux-176485.gz

(ETA ~10 minutes; it's rather enormous)

--
 - mdz |
From: Matt Z. <md...@de...> - 2004-01-17 00:42:30
|
On Fri, Jan 16, 2004 at 11:49:29AM -0800, Matt Zimmerman wrote:

> On Fri, Jan 16, 2004 at 03:04:21PM -0500, Jeff Dike wrote:
>
> > % file linux-176485
> > linux-176485: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), statically linked, stripped
> >
> > Having it stripped is inconvenient. Can you compress a debuggable binary
> > and put it someplace I can grab it?
>
> http://people.debian.org/~mdz/temp/linux-176485.gz
>
> (ETA ~10 minutes; it's rather enormous)

As mentioned on IRC, if I change slirp_init to return early and skip the
printk's at the end, UML boots and runs fine; the interface even comes up OK.
So if there is corruption happening, maybe it's down below somewhere (or maybe
it's weirder still).

--
 - mdz |
From: Matt Z. <md...@de...> - 2003-12-21 01:29:32
|
On Sat, Dec 20, 2003 at 12:14:50PM -0500, Jeff Dike wrote:

> The only thing that seemed remotely relevant is the unistd.h change, which
> fixes a bug in the case of an error in an internally called system call.
> You can try reverting that and see what happens.

FWIW, I tried reverting this and it made no difference.

--
 - mdz |
From: Nick Craig-W. <nc...@ax...> - 2003-12-20 19:05:22
|
On Fri, Dec 19, 2003 at 05:13:23PM -0800, Matt Zimmerman wrote:

> There definitely seems to be something awry here, even in skas mode.
> I am getting a lot of ENOSYS errors inside UML:

FYI I had no end of trouble building 2.4.22-7 with Debian testing (with
gcc 2.95). I got it to compile OK (with lots of warnings), but I couldn't
get it to run reliably.

I've changed to building it in a woody chroot and it runs perfectly now!

> 2. CONFIG_UML_REAL_TIME_CLOCK=y was added to the configuration

Watch out if running on a CPU faster than 2 GHz: the UML will hang quite
early on. See the list archives for a patch.

Sorry, not a very scientific report!

--
Nick Craig-Wood
nc...@ax... |
From: BlaisorBlade <bla...@ya...> - 2003-12-21 15:55:21
|
At 01:47 on Sunday 21 December 2003, Matt Zimmerman wrote:

> Thanks for looking at it. It's sounding more and more like the new
> glibc/linux-kernel-headers packages have broken UML.

> Maybe something is
> getting the host's kernel headers when it needs the ones from the UML build
> tree?

Something is getting the host's headers, and it must get them rather than the
ones from the UML build tree.

Every UML arch file whose name ends in _user.c (plus quite a lot of other
ones, listed in USER_OBJS in the Makefiles) is built against the host headers,
since that is the code interacting with the host.

But you need to actually compile UML against 2.4 host headers to see if this
is the reason; debugging it can be worse than an unnoticed wrong pointer or
buffer overrun, since probably a macro went in silently and changed the
semantics of the sources...

--
cat <<EOSIGN
Paolo Giarrusso, aka Blaisorblade
Linux Kernel 2.4.21/2.6.0-test on an i686; Linux registered user n. 292729
EOSIGN |
From: Matt Z. <md...@de...> - 2003-12-21 22:40:09
|
On Sun, Dec 21, 2003 at 04:58:44PM +0100, BlaisorBlade wrote:

> At 01:47 on Sunday 21 December 2003, Matt Zimmerman wrote:
> > Thanks for looking at it. It's sounding more and more like the new
> > glibc/linux-kernel-headers packages have broken UML.
>
> > Maybe something is
> > getting the host's kernel headers when it needs the ones from the UML build
> > tree?
> Something is getting the host's headers and it must get them, not the ones
> from UML build tree.
>
> Every UML arch file with its name ending in _user.c(+ quite a lot of other
> ones, listed in USER_OBJS in Makefiles) are built against the host headers,
> since they are the code interacting with the host.

Yes, I understand that. Was my sentence unclear? I was suggesting that
it was possible that the host's kernel headers were being used in a
situation where the UML tree kernel headers _should_ be used.

> But you need to actually compile UML against 2.4 host headers to see if this
> is the reason; debugging it can be worst than an unnoticed wrong pointer or
> buffer overrun, since probably a macro went in silently and it changed the
> semantics of sources...

...which is one thing that I have tested. I substituted 2.4.22 kernel
headers for the 2.6 ones provided by linux-kernel-headers and rebuilt UML,
and the problem persists.

I have heard reports that building the same source on Debian woody (glibc
2.2.5-11.5 with 2.4 kernel headers) works, however. I am going to be
testing this myself shortly. If it works, the cause is most likely
somewhere in glibc.

--
 - mdz |
From: Matt Z. <md...@de...> - 2003-12-21 23:16:45
|
On Sun, Dec 21, 2003 at 02:40:01PM -0800, Matt Zimmerman wrote:

> I have heard reports that building the same source on Debian woody (glibc
> 2.2.5-11.5 with 2.4 kernel headers) works, however. I am going to be
> testing this myself shortly. If it works, the cause is most likely
> somewhere in glibc.

I have just verified this myself. Building user-mode-linux 2.4.22-7um-1 on
woody works fine (even when running on unstable), but building it on
unstable does not.

The one built on unstable randomly sees ENOSYS from certain system calls,
such as select, read and mmap.

I would appreciate any suggestions for how to track this problem down
further. I suspect that changes in glibc/linux-kernel-headers have broken
things. I tried replacing the headers in linux-kernel-headers (asm,
asm-generic and linux) with the ones from Linux 2.4.22, but this did not
help.

--
 - mdz |
From: Jeff D. <jd...@ad...> - 2003-12-22 00:09:52
|
md...@de... said:
> I have just verified this myself. Building user-mode-linux
> 2.4.22-7um-1 on woody works fine (even when running on unstable), but
> building it on unstable does not.

Conversely, does an unstable-built UML run on woody?

> The one built on unstable randomly sees ENOSYS from certain system
> calls, such as select, read and mmap.

Only those, or are there others that you can tell are failing? Offhand, I
don't see any commonality between those three, in terms of their interactions
with the host.

> I would appreciate any suggestions for how to track this problem down
> further.

The randomness is strange. It suggests that somehow interrupts are getting
in the way. One possibility would be host system calls returning ENOSYS
instead of EINTR. I don't see much possibility that that's what's actually
happening, but that's the sort of thing I'd think about.

Jeff |
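To make the EINTR-versus-ENOSYS point concrete: well-behaved callers retry a call that was merely interrupted, but treat any other errno as a real failure, so an ENOSYS arriving where an EINTR was expected would surface as exactly the kind of sporadic error described. The sketch below illustrates that pattern in Python. Both `call_retrying_eintr` and `make_flaky` are hypothetical names invented for this illustration; `make_flaky` is only a stand-in for a system call and has nothing to do with UML itself.

```python
import errno
import os


def call_retrying_eintr(syscall, *args, max_retries=100):
    """Call syscall(*args), retrying only when it fails with EINTR."""
    for _ in range(max_retries):
        try:
            return syscall(*args)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise  # ENOSYS and friends are reported as real failures
    raise OSError(errno.EINTR, "still interrupted after retries")


def make_flaky(errors, result):
    """Hypothetical stand-in for a syscall: fail with each errno, then succeed."""
    pending = list(errors)

    def fake_syscall():
        if pending:
            code = pending.pop(0)
            raise OSError(code, os.strerror(code))
        return result

    return fake_syscall


if __name__ == "__main__":
    # Two interruptions are absorbed silently.
    print(call_retrying_eintr(make_flaky([errno.EINTR, errno.EINTR], "ok")))
    # The same interruptions mis-reported as ENOSYS reach the caller.
    try:
        call_retrying_eintr(make_flaky([errno.EINTR, errno.ENOSYS], "ok"))
    except OSError as exc:
        print("failed:", errno.errorcode[exc.errno])
```

The retry cap is a design choice for the sketch, so that a call which never stops being interrupted cannot spin forever.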
From: Matt Z. <md...@de...> - 2003-12-22 04:09:07
|
On Sun, Dec 21, 2003 at 07:25:47PM -0500, Jeff Dike wrote:

> md...@de... said:
> > I have just verified this myself. Building user-mode-linux
> > 2.4.22-7um-1 on woody works fine (even when running on unstable), but
> > building it on unstable does not.
>
> Conversely, does a unstable-built UML run on woody?

The unstable-built UML is broken on woody as well. So far, my most
reproducible test case (not 100%, but close) is to start up a netcat
listener, and connect to it with input from /dev/zero, i.e. just push a
bunch of data over a TCP connection. What happens is this:

rootstrap:~# nc -v -l -p 1234 >/dev/null </dev/null &
[2] 138
rootstrap:~# listening on [any] 1234 ...

rootstrap:~# nc -v -v localhost 1234 </dev/zero
connect to [127.0.0.1] from localhost [127.0.0.1] 1028
localhost [127.0.0.1] 1234 (?) open
select fuxored : Function not implemented
too many output retries : Broken pipe
 sent 27820032, rcvd 0
[2]+  Exit 1                  nc -v -l -p 1234 >/dev/null </dev/null

The relevant netcat source code isn't doing anything unusual:

    rr = select (16, ding2, 0, 0, timer2);	/* here it is, kiddies */
    if (rr < 0) {
	if (errno != EINTR) {		/* might have gotten ^Zed, etc ?*/
	    holler ("select fuxored");
	    close (fd);
	    return (1);
	}
    }	/* select fuckup */

so select is returning ENOSYS, but, as can be seen from the transfer
statistics, it succeeds many times before it fails.

Some other times, a program will simply hang (sometimes even stalling the
boot process), or segfault.

> > The one built on unstable randomly sees ENOSYS from certain system
> > calls, such as select, read and mmap.
>
> Only those, or are there others that you can tell are failing? Offhand, I
> don't see any commonality between those three, in terms of their interactions
> with the host.

Those are the ones that I have been able to easily identify. select came
from the netcat test you see above. mmap was evident from the APT HTTP
method:

  /usr/lib/apt/methods/http: error while loading shared libraries: libc.so.6: cannot map zero-fill pages: Error 38

(that error is from dl-load.c in glibc, and as far as I can tell indicates
that mmap gave ENOSYS).

basename from coreutils seemed to see write(2) failing:

  basename: write error: Function not implemented

I also saw unlink do it, in dpkg:

  dpkg: error processing /var/cache/apt/archives/debhelper_4.0.2_all.deb (--unpack):
   failed to rmdir/unlink `/usr/share/man/man1/dh_compress.1.gz.dpkg-tmp': Function not implemented

apt occasionally blows up read()ing from a socket as well:

(none):~# apt-get update
Get:1 http://debian woody/main Packages [1774kB]
Err http://debian woody/main Packages
  Error reading from server - read (38 Function not implemented)
Get:2 http://debian woody/main Release [95B]
Fetched 95B in 0s (259B/s)
Failed to fetch http://debian/dists/woody/main/binary-i386/Packages  Error reading from server - read (38 Function not implemented)
Reading Package Lists... Done
Building Dependency Tree... Done
E: Some index files failed to download, they have been ignored, or old ones used instead.

> > I would appreciate any suggestions for how to track this problem down
> > further.
>
> The randomness is strange. It suggests that somehow interrupts are getting
> in the way. One possibility would be host system calls returning ENOSYS
> instead of EINTR. I don't see much possibility that that's what's actually
> happening, but that's the sort of thing I'd think about.

Can you think of any way that userland changes could produce that kind of
effect? I don't think I would know where to look. My kernel didn't change,
and the problem seems to occur on different host kernels.

I tried running UML under strace; this produces an impressive amount of
output, but made it much more difficult to reproduce the bug. I finally got
it to happen under strace, and I have a 226M logfile (7M gzipped) from the
session, if you're interested in taking a look. I've put it up at
http://people.debian.org/~mdz/temp/uml.strace.gz. I don't see any host
system calls returning ENOSYS; the only failures are some very
innocuous-looking EINTRs and a few EAGAINs that look like they're
associated with a terminal device.

--
 - mdz |
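A log of that size is easier to check mechanically than by eye. The following Python sketch tallies failed calls by (syscall, errno) while streaming a gzip-compressed strace log. It assumes the usual strace line shape for failures, "name(args) = -1 ERRNO (description)", and deliberately ignores "<unfinished ...>" and "resumed" lines; the function names are made up for this illustration, and the sample lines in the demo are invented, not taken from the actual log.

```python
import collections
import gzip
import re

# Matches e.g.  read(3, 0xbffff000, 4096) = -1 EINTR (Interrupted system call)
FAILED_CALL = re.compile(r"(\w+)\(.*\)\s+=\s+-1\s+(E[A-Z0-9]+)")


def tally_failures(lines):
    """Count (syscall, errno) pairs among strace lines that report a failure."""
    counts = collections.Counter()
    for line in lines:
        match = FAILED_CALL.search(line)
        if match:
            counts[match.group(1), match.group(2)] += 1
    return counts


def tally_gzip_log(path):
    """Stream a gzip-compressed strace log without unpacking it to disk."""
    with gzip.open(path, "rt", errors="replace") as handle:
        return tally_failures(handle)


if __name__ == "__main__":
    # Invented sample lines, only to show the expected input shape.
    sample = [
        "select(16, [3], NULL, NULL, {0, 0}) = 1 (in [3], left {0, 0})",
        "read(3, 0xbffff000, 4096) = -1 EINTR (Interrupted system call)",
        "[pid   412] read(0, 0x8054f00, 1) = -1 EAGAIN (Resource temporarily unavailable)",
    ]
    for (syscall, err), count in sorted(tally_failures(sample).items()):
        print(f"{syscall:10} {err:8} {count}")
```

Calling `tally_gzip_log` on a downloaded copy of the log would then show at a glance whether any ENOSYS appears on the host side at all.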
From: Daniel J. <da...@de...> - 2003-12-22 05:49:25
|
On Sun, Dec 21, 2003 at 08:08:57PM -0800, Matt Zimmerman wrote:

> On Sun, Dec 21, 2003 at 07:25:47PM -0500, Jeff Dike wrote:
>
> > md...@de... said:
> > > I have just verified this myself. Building user-mode-linux
> > > 2.4.22-7um-1 on woody works fine (even when running on unstable), but
> > > building it on unstable does not.
> >
> > Conversely, does a unstable-built UML run on woody?
>
> The unstable-built UML is broken on woody as well. So far, my most
> reproducible test case so far (not 100%, but close) is to start up a netcat
> listener, and connect to it with input from /dev/zero, i.e. just push a
> bunch of data over a TCP connection. What happens is this:

...

Matt, could you try with different compilers? This sounds a lot more
to me like a compiler bug than a libc one (but that's just a hunch).

--
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer |
From: Matt Z. <md...@de...> - 2003-12-22 09:09:04
|
On Mon, Dec 22, 2003 at 12:49:14AM -0500, Daniel Jacobowitz wrote:

> On Sun, Dec 21, 2003 at 08:08:57PM -0800, Matt Zimmerman wrote:
> > On Sun, Dec 21, 2003 at 07:25:47PM -0500, Jeff Dike wrote:
> >
> > > md...@de... said:
> > > > I have just verified this myself. Building user-mode-linux
> > > > 2.4.22-7um-1 on woody works fine (even when running on unstable), but
> > > > building it on unstable does not.
> > >
> > > Conversely, does a unstable-built UML run on woody?
> >
> > The unstable-built UML is broken on woody as well. So far, my most
> > reproducible test case so far (not 100%, but close) is to start up a netcat
> > listener, and connect to it with input from /dev/zero, i.e. just push a
> > bunch of data over a TCP connection. What happens is this:
>
> Matt, could you try with different compilers? This sounds a lot more
> to me like a compiler bug than a libc one (but that's just a hunch).

user-mode-linux builds with gcc-2.95 explicitly, which I don't think has had
substantial changes this year.

I should note that UML is statically linked as well.

--
 - mdz |
From: Adam H. <ad...@do...> - 2004-01-05 17:50:20
|
On Mon, 22 Dec 2003, Matt Zimmerman wrote:

> user-mode-linux builds with gcc-2.95 explicitly, which I don't think has had
> substantial changes this year.
>
> I should note that UML is statically linked as well.

Try a dynamic skas-only build (on unstable), running on unstable, then woody.

If this works, then it's probably a bug in libc itself. |
From: Matt Z. <md...@de...> - 2004-01-05 18:10:27
|
On Mon, Jan 05, 2004 at 11:51:24AM -0600, Adam Heath wrote:

> On Mon, 22 Dec 2003, Matt Zimmerman wrote:
>
> > user-mode-linux builds with gcc-2.95 explicitly, which I don't think has had
> > substantial changes this year.
> >
> > I should note that UML is statically linked as well.
>
> try a dynamic skas-only build(on unstable), running on unstable, then woody.
>
> If this works, then it's probably a bug in libc itself.

Building with gcc-3.3 fixed the problem.

--
 - mdz |
From: BlaisorBlade <bla...@ya...> - 2003-12-24 17:34:46
|
At 23:40 on Sunday 21 December 2003, Matt Zimmerman wrote:

> > > Thanks for looking at it. It's sounding more and more like the new
> > > glibc/linux-kernel-headers packages have broken UML.
> > > Maybe something is
> > > getting the host's kernel headers when it needs the ones from the UML
> > > build tree?
> >
> > Something is getting the host's headers and it must get them, not the
> > ones from UML build tree.
> >
> > Every UML arch file with its name ending in _user.c(+ quite a lot of
> > other ones, listed in USER_OBJS in Makefiles) are built against the host
> > headers, since they are the code interacting with the host.
>
> Yes, I understand that. Was my sentence unclear?

No, I just don't think that situation can happen. Maybe the problem is simply
that glibc is built against 2.6 (i.e. you should try not only replacing the
headers, but also rebuilding glibc after that). Maybe this could be related to
new things in glibc and 2.6, i.e. vsyscall or NPTL (I am just shooting in the
dark; I don't even know what vsyscall is).

I even checked whether any of the header-generator programs use changed
headers, but what I saw is that every definition stayed the same (except
#define PTRACE_OLDSETOPTIONS 21
which has the same number but was named PTRACE_SETOPTIONS in 2.4).

> I was suggesting that
> it was possible that the host's kernel headers were being used in a
> situation where the UML tree kernel headers _should_ be used.

I understood this; I just think this doesn't happen...

--
cat <<EOSIGN
Paolo Giarrusso, aka Blaisorblade
Linux Kernel 2.4.21/2.6.0-test on an i686; Linux registered user n. 292729
EOSIGN |