From: Jeff D. <jd...@ka...> - 2000-12-20 02:03:33
|
xr...@ho... said:
> Unfortunately, running under heavy load (50 users simulated with
> WebLoad) consistently causes the system to hang within a minute,
> mostly with the following kernel panic: Kernel panic: Double fault on
> 0xbedffc44 - panicing because it wasn't fixed the first time.

Can you tell us exactly how you generate the load?

> The host system is running debian woody with a 2.2.18 kernel (the same
> results were also observed on 2.2.17) and the host filesystem is
> reiserfs. Is this a known bug? any workarounds?

Can you run it on an ext2 filesystem? I saw a problem on reiserfs (an older version) which caused UML to randomly crash. This sounds more consistent than that, but having the same thing happen on ext2 would make me more convinced that it's my problem.

Also, it might turn out not to be a real problem. That panic is there to trap things which look like page faults but aren't really (like privileged instructions). It's possible that a page could get faulted in, swapped out, and faulted back in again, triggering that panic. I don't think so, but you might try removing that panic and seeing if the system still behaves. If something is still going wrong, it will probably hang, and when you look at it under gdb, you will see continuous faults on the same address.

Jeff
|
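The check Jeff describes can be pictured roughly as follows. This is only an illustrative sketch in Python, not UML's actual fault handler: the `FaultTracker` class, its `record` method and the `max_repeats` parameter are hypothetical names invented here. It flags a fault that arrives on the same address as the fault handled immediately before it.

```python
class FaultTracker:
    """Flags a fault that repeats on the address handled immediately before it."""

    def __init__(self, max_repeats=1):
        self.last_address = None        # address of the most recent fault
        self.repeats = 0                # consecutive repeats of that address
        self.max_repeats = max_repeats  # hypothetical threshold for "not fixed"

    def record(self, address):
        """Record one fault; return True if it looks like an unfixed fault."""
        if address == self.last_address:
            self.repeats += 1
        else:
            self.last_address = address
            self.repeats = 0
        return self.repeats >= self.max_repeats


tracker = FaultTracker()
faults = [0xBEDFFC44, 0x100D5310, 0xBEDFFC44, 0xBEDFFC44]
flags = [tracker.record(address) for address in faults]
print(flags)  # only the last fault repeats the one directly before it
```

A check this simple cannot tell a genuinely unfixed fault from the benign case mentioned above, where a page is faulted in, swapped out, and faulted in again, which is why counting how often the condition recurs is more informative than a single hit.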
From: Myrtle M. <xr...@ho...> - 2000-12-25 00:16:26
|
>Can you tell us exactly how you generate the load?

I'm using a simple perl-asp page (the asp engine from the debian package "libapache-asp-perl") which interfaces with a mysql database through perl-dbi (I can post the ASP page as well if it would help).

The following SQL commands will generate the database I use:

CREATE DATABASE /*!32312 IF NOT EXISTS*/ test;
USE test;
CREATE TABLE test (
  id int(11) NOT NULL auto_increment,
  string varchar(20),
  PRIMARY KEY (id)
);
GRANT SELECT,INSERT,DELETE ON test.* TO test@localhost IDENTIFIED BY 'test';

The asp performs the following SQL query:

SELECT * FROM test WHERE string='%';

I use WebLoad (a commercial http load generator) to generate the load, however I've managed to recreate the problem with the following perl script (running it on a separate machine, not the uml or the uml host):

-------------------------------------------------------
#!/usr/bin/perl
use Getopt::Long;
use HTTP::Request::Common;
use LWP;

# Defaults; override with --url, --num and --count
my $num = 50;
my $url = "http://umlhost/test.asp";
my $count = 0;

GetOptions ("url=s", \$url, "num=i", \$num, "count=i", \$count);

# Fork $num - 1 children so that $num processes issue requests;
# each child waits up to 5 seconds before starting
for (1 .. $num - 1) {
    my $pid = fork;
    if (!$pid) {
        sleep(rand(5));
        last;
    }
}

# Request the page in a loop; stop after $count requests if --count was given
$ua = LWP::UserAgent->new;
while (1) {
    $ua->request(GET $url);
    last if ($count && !(--$count));
}
---------------------------------------------------------

>Can you run it on an ext2 filesystem? I saw a problem on reiserfs (an older
>version) which caused UML to randomly crash. This sounds more consistent than
>that, but having the same thing happen on ext2 would make me more convinced
>that it's my problem.

I tried it on an ext2 (host) filesystem and got the same results.

>Also, it might turn out not to be a real problem. That panic is there to trap
>things which look like page faults but aren't really (like privileged
>instructions). It's possible that a page could get faulted in, swapped out,
>and faulted back in again, triggering that panic. I don't think so, but you
>might try removing that panic and seeing if the system still behaves. If
>something is still going wrong, it will probably hang, and when you look at it
>under gdb, you will see continuous faults on the same address.

I changed the panic to printk, but the system still gets stuck (the message is printed a few thousand times first, though). I removed the panic altogether, and the system still crashes.

When running under gdb (after removing the panic), the debugger is interrupted with the following message (I don't need to press ^C - it stops by itself):

warning: Couldn't get registers.
Program received signal SIGSTOP, Stopped (signal).
warning: Couldn't get registers.
warning: Couldn't get registers.
warning: Couldn't get registers.
warning: Couldn't get registers.
0x100d5344 in ?? ()
(gdb) symbol-file /usr/src/uml/linux/linux
Reading symbols from /usr/src/uml/linux/linux...done.
warning: Couldn't get registers.
warning: Couldn't get registers.
(gdb) backtrace
#0 0x100d5344 in sigprocmask () at eth_kern.c:383
Error accessing memory address 0x100d5310: No such process.

I had "top" running on the console. At the moment the uml crashed it was displaying:

7:15pm up 1 min, 1 user, load average: 3.94, 1.00, 0.33
56 processes: 44 sleeping, 12 running, 0 zombie, 0 stopped
CPU states: 78.2% user, 20.8% system, 1.0% nice, 0.0% idle
Mem: 128876K av, 127824K used, 1052K free, 0K shrd, 1032K buff
Swap: 0K av, 0K used, 0K free 7820K cached
Detaching pid 1094
  PID USER     PRI NI SIZE  RSS SHARE STAT LIB %CPU %MEM TIME COMMAND
  608 www-data -25  0 6728 6728  3292 S      0  0.1  5.2 0:00 apache
  611 www-data -30  0 6716 6716  3284 S      0  0.0  5.2 0:00 apache
  606 www-data -30  0 6672 6672  3236 S      0  0.0  5.1 0:00 apache
  607 www-data -30  0 6664 6664  3232 S      0  0.0  5.1 0:00 apache

(the "Detaching pid 1094" isn't part of the "top" display, it's a message printed just before the uml crashes. Process 1094 is a mysql process).

Sorry for the slow response; it took a while to perform all the tests.

-Tal
_________________________________________________________________
Get your FREE download of MSN Explorer at http://explorer.msn.com
|
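For readers who would rather not install LWP, the load script above can be approximated in Python. This is a rough sketch and not a line-for-line port: it uses threads instead of `fork`, every client (not just the children) waits a random delay before starting, and the names `fetch`, `worker`, `run_load`, `max_delay` and `do_fetch` are invented for this example. The `num` and `count` parameters mirror the script's `--num` and `--count` options.

```python
import random
import threading
import time
import urllib.request


def fetch(url, timeout=10.0):
    """Issue one GET request and discard the body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()


def worker(url, count, max_delay, do_fetch, totals):
    """Request url repeatedly; count == 0 means loop until interrupted."""
    time.sleep(random.uniform(0, max_delay))  # stagger client start-up
    sent = 0
    while count == 0 or sent < count:
        try:
            do_fetch(url)
        except OSError:
            pass  # keep going on connection errors, timeouts and HTTP error statuses
        sent += 1
    totals.append(sent)


def run_load(url, num=50, count=0, max_delay=5.0, do_fetch=fetch):
    """Start num concurrent clients; return the total number of requests attempted."""
    totals = []
    threads = [
        threading.Thread(target=worker, args=(url, count, max_delay, do_fetch, totals))
        for _ in range(num)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sum(totals)
```

Calling `run_load("http://umlhost/test.asp", num=50)` would loop until interrupted, like the Perl script with its default `--count` of 0. Passing `do_fetch` makes it possible to dry-run the client logic without touching the network.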
From: Jeff D. <jd...@ka...> - 2000-12-25 22:29:50
|
xr...@ho... said:
> however I've managed to recreate the problem with the following perl
> script (running it on a separate machine, not the uml or the uml
> host)

OK, thanks. I'll see if I can reproduce it here.

> I tried it on an ext2 (host) filesystem and got the same results

I figured you probably would. I just wanted to check that.

> I changed the panic to printk, but the system still gets stuck (the
> message is printed a few thousand times first, though).

If it wasn't really a problem, it would only have happened once.

Jeff
|
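One way to apply that "only once" test to a saved console log is to count the messages per address. The sketch below assumes the printk keeps the wording of the original panic ("Double fault on" followed by a hex address), which the thread does not confirm; `count_double_faults` is a hypothetical helper and the sample text is made up from messages quoted earlier, not a real capture.

```python
import re
from collections import Counter

# Assumed message format: "Double fault on 0x<hex address>"
FAULT_RE = re.compile(r"Double fault on (0x[0-9a-fA-F]+)")


def count_double_faults(console_text):
    """Return a Counter mapping each faulting address to how often it was reported."""
    return Counter(FAULT_RE.findall(console_text))


# Made-up sample: three reports for one address plus an unrelated line
sample = "\n".join(["Double fault on 0xbedffc44"] * 3 + ["Detaching pid 1094"])
counts = count_double_faults(sample)
print(counts)
```

An address that shows up once would fit the benign swap-out case; one that shows up thousands of times, as reported here, would not.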
From: Jeff D. <jd...@ka...> - 2000-12-26 05:49:46
|
I can't get it to happen. I ran the test with both an empty table and one with a few thousand rows including one with a '%' in it. My configuration was 128M, no swap, the same as the one you were using. The kernel was my CVS, basically test12.

> Detaching pid 1094

Did this happen when running it on reiserfs? I'm asking because this is pretty diagnostic of the reiserfs bug that I encountered. That is basically impossible unless something is wrong with the host. Do you always see that just before the panic?

How long does it normally take to get it to crash?

Jeff
|
From: Myrtle M. <xr...@ho...> - 2001-01-11 19:11:17
Attachments:
index.asp
|
Hi,

I've found time to run some load tests again, this time with the latest CVS version (2.4.0) on a 2.2.18 host (ext2 filesystem), and can still consistently crash UML.

I've attached the perl-asp file I use on the server side to generate the crash (the UML is running on the debian rootfs, 64M memory and 64M swap, Apache 1.3.14, mysql 2.23.28, libapache-mod-perl 1.24.01, libdbi-perl 1.14 and nodeworks' perl-asp 2.03). For networking I'm using um_eth_serv. The host machine is configured with "ifconfig tap0 192.168.66.1 arp up" and ip forwarding enabled and the UML with "ifconfig eth0 192.168.66.2 up" and 192.168.66.1 as the default route.

I'm generating load with siege 1.01. The command I'm using is "siege -c 50 -u 'http://192.168.66.2/test/index.asp?nowrite=1' -v -t 100 -d 1" (where 192.168.66.2 is the IP of the UML).

The crash occurs within 30 seconds. The panic messages vary. I've gotten:
"Detaching pid xxxxx"
"Seg fault in signals"
"Kernel panic: Double fault on 0xbedffc44 - panicing because it wasn't fixed the first time"

The crash is completely consistent; I haven't managed to pass a minute under load without crashing. If you can't recreate it, try running the load from a different machine (than the one running UML) - otherwise when the UML takes more CPU the load will go down.

TIA,
-Tal
|
From: William S. <wst...@po...> - 2002-12-31 20:25:38
|
Good day, Myrtle,

Could you repeat the Apache/PHP/MySql test on a current kernel? I have a 2.4.19-uml45 precompiled and ready to go at http://www.stearns.org/uml/ if you don't have one. It would be great if you could confirm or deny that the original problem is fixed so we could get one off the todo list.

Cheers,
- Bill

On Thu, 11 Jan 2001, Myrtle Meep wrote:

> I've found time to run some load tests again, this time
> with the latest CVS version (2.4.0) on a 2.2.18 host (ext2 filesystem), and
> can still consistently crash UML.
> I've attached the perl-asp file I use on the server side to generate the
> crash (the UML is running on the debian rootfs, 64M memory and 64M swap,
> Apache 1.3.14, mysql 2.23.28, libapache-mod-perl 1.24.01, libdbi-perl 1.14
> and nodeworks' perl-asp 2.03). For networking I'm using um_eth_serv. The
> host machine is configured with "ifconfig tap0 192.168.66.1 arp up" and ip
> forwarding enabled and the UML with "ifconfig eth0 192.168.66.2 up" and
> 192.168.66.1 as the default route.
> I'm generating load with siege 1.01 ;
> The command I'm using is
> "siege -c 50 -u 'http://192.168.66.2/test/index.asp?nowrite=1' -v -t 100 -d
> 1" (where 192.168.66.2 is the IP of the UML).
> The crash occurs within 30 seconds. The panic messages vary. I've gotten:
> "Detaching pid xxxxx"
> "Seg fault in signals"
> "Kernel panic: Double fault on 0xbedffc44 - panicing because it wasn't fixed
> the first time"
>
> The crash is completely consistent; I haven't managed to pass a minute under
> load without crashing.
> If you can't recreate it, try running the load from a different machine
> (than the one running UML) - otherwise
> when the UML takes more CPU the load will go down.

---------------------------------------------------------------------------
My desk has a security flaw. If I bang my forehead at it for two days continuously I can make a hole in it.
Wuss, bang harder.
-- Slashdot debate on a Unix security issue
--------------------------------------------------------------------------
William Stearns (wst...@po...). Mason, Buildkernel, named2hosts,
and ipfwadm2ipchains are at: http://www.stearns.org
--------------------------------------------------------------------------
|
From: Net L. <net...@li...> - 2002-12-31 20:42:48
|
On Tue, 31 Dec 2002, William Stearns wrote:

>
> Could you repeat the Apache/PHP/MySql test on a current kernel?
> I have a 2.4.19-uml45 precompiled and ready to go at
> http://www.stearns.org/uml/ if you don't have one. It would be great if

Hi Bill,

I notice that you have a few UML kernels on your site with a 'netfilter' designation. I'm assuming that this means that they were built with netfilter support? Does this imply that the precompiled kernels at user-mode-linux-sf.net are not built with netfilter support? I'm just wondering what the difference(s) is/are, that's all, since I'd love to have a source of more recent precompiled UML kernels for my production boxes.

thanks,
Lonni

--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Lonni J Friedman net...@li...
Linux Step-by-step & TyGeMo http://netllama.ipfox.com
|
From: William S. <wst...@po...> - 2002-12-31 21:15:29
|
Good day, Lonni,

On Tue, 31 Dec 2002, Net Llama! wrote:

> On Tue, 31 Dec 2002, William Stearns wrote:
> >
> > Could you repeat the Apache/PHP/MySql test on a current kernel?
> > I have a 2.4.19-uml45 precompiled and ready to go at
> > http://www.stearns.org/uml/ if you don't have one. It would be great if
>
> I notice that you have a few UML kernels on your site with a 'netfilter'
> designation. I'm assuming that this means that they were built with
> netfilter support? Does this imply that the precompiled kernels at

They were. Specifically, I pulled down the netfilter patch-o-matic collection and applied the "base" and "extra" modules to the kernel before compiling, so these have some extra support over and above the stock kernel.

> user-mode-linux-sf.net are not built with netfilter support? I'm just

I have no idea.

> wondering what the difference(s) is/are, that's all, since i'd love to
> have a source of more recent precompiled UML kernels for my production
> boxes.

All the kernels I use will make their way up to that directory within a day or two of compilation.

Cheers,
- Bill

---------------------------------------------------------------------------
A 'good' landing is one from which you can walk away. A 'great' landing is one after which they can use the plane again.
-- Rules of the Air, #8
(Courtesy of "C. Bensend" <be...@be...>)
--------------------------------------------------------------------------
William Stearns (wst...@po...). Mason, Buildkernel, named2hosts,
and ipfwadm2ipchains are at: http://www.stearns.org
--------------------------------------------------------------------------
|
From: Jeff D. <jd...@ka...> - 2001-01-11 20:03:41
|
xr...@ho... said:
> The crash occurs within 30 seconds. The panic messages vary. I've
> gotten:
> "Detaching pid xxxxx"

Was this running on reiserfs, by any chance?

> The crash is completely consistent; I haven't managed to pass a minute
> under load without crashing. If you can't recreate it, try running
> the load from a different machine (than the one running UML) -
> otherwise when the UML takes more CPU the load will go down.

Thanks for the info, I'll try to reproduce it. I'm about on my way to Australia and maybe I'll get some time there to try. Otherwise, it will be in a couple of weeks.

Jeff
|
From: Myrtle M. <xr...@ho...> - 2001-01-12 13:22:52
|
>xr...@ho... said:
> > The crash occurs within 30 seconds. The panic messages vary. I've
> > gotten:
> > "Detaching pid xxxxx"
>
>Was this running on reiserfs, by any chance?

This happens on both ext2 and reiserfs.

> > The crash is completely consistent; I haven't managed to pass a minute
> > under load without crashing. If you can't recreate it, try running
> > the load from a different machine (than the one running UML) -
> > otherwise when the UML takes more CPU the load will go down.
>
>Thanks for the info, I'll try to reproduce it. I'm about on my way to
>Australia and maybe I'll get some time there to try. Otherwise, it will be in
>a couple of weeks.

Have fun in Australia :)

-Tal
|
From: Jeff D. <jd...@ka...> - 2001-01-12 15:59:55
|
Just to check one more thing:

xr...@ho... said:
> This happens on both ext2 and reiserfs.

Local ext2? Not over nfs?

I ask because some versions of nfs seem to have the same sort of mmap problems that I saw with reiserfs.

Jeff
|
From: Myrtle M. <xr...@ho...> - 2001-01-14 09:55:35
|
Yes, local ext2 (and local reiserfs). I haven't tried running over nfs yet.

-Tal

> Just to check one more thing:
>
>xr...@ho... said:
> > This happens on both ext2 and reiserfs.
>
>Local ext2? Not over nfs?
>
>I ask because some versions of nfs seems to have the same sort of mmap
>problems that I saw with reiserfs.
>
> Jeff
|
From: RHS L. U. <jd...@ka...> - 2001-01-14 17:02:21
|
xr...@ho... said:
> Yes, local ext2 (and local reiserfs). I haven't tried running over nfs
> yet.

OK, just checking... That 'detaching pid nnn' message is typical of what I saw when the underlying fs had problems with mmap.

> Have fun in Australia :)

Thanks! I'm planning on it :-) I'm leaving for the airport in about an hour...

Jeff
|
From: Jeff D. <jd...@ka...> - 2001-02-05 04:34:34
|
Could you guys grab my latest CVS (or the uml-patch-latest patch) and try your MySQL stress tests on it? I found a couple ways of crashing UML by banging on it with Apache/perl/MySQL and fixed one of them. I'm working on the other now. Thanks, Jeff |