From: Burton M. S. I. <bst...@at...> - 2002-09-17 22:38:36
|
P3/800, some absurd amount of ram (700MB?) running 2.4.19. I was running uml 2.4.18.15um-0, updated to 2.4.19.5um-0, both from the rpms @ SourceForge. Yes, I did remember to update /lib/modules.

With 2.4.19, I do get more work done, but eventually it dies. It seems to be related to activity, in that it usually happens when I'm sshed in typing or watching an rpm build/install. Then I flip over to the uml machine's console and the screen is black. The whole host (Linux native) is shutdown. Have to press reset and reboot.

I have two umls, ur-tigger and ur-tigger2 (ur-tigger is the build)(ur-tigger2 the test install). ur-tigger2 uses a cow, ur-tigger doesn't.

Let's see... plenty of available memory:

17:20:29 tigger [Linux] user=bstrauss pwd=/shared/work/linux/ntop/html
$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  594694144 52817920 541876224        0  8601600 20836352
Swap: 542826496        0 542826496
MemTotal:       580756 kB
MemFree:        529176 kB
MemShared:           0 kB
Buffers:          8400 kB
Cached:          20348 kB
SwapCached:          0 kB
Active:           8328 kB
Inactive:        32436 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       580756 kB
LowFree:        529176 kB
SwapTotal:      530104 kB
SwapFree:       530104 kB

Basically I use UML to build and test rpms - other than ntop and a few system tasks, nothing else runs on it. Seriously:

  PID TTY      STAT   TIME COMMAND
    1 ?        S      0:03 init
    2 ?        SW     0:00 [keventd]
    3 ?        SW     0:00 [kapmd]
    4 ?        SWN    0:00 [ksoftirqd_CPU0]
    5 ?        SW     0:00 [kswapd]
    6 ?        SW     0:00 [bdflush]
    7 ?        SW     0:00 [kupdated]
   10 ?        SW     0:00 [khubd]
   12 ?        SW     0:00 [kjournald]
  139 ?        SW     0:00 [kjournald]
  531 ?        S      0:00 /sbin/dhclient -1 -q -lf /var/lib/dhcp/dhclient-eth0.leases -pf /var/run/
  596 ?        S      0:00 syslogd -m 0
  601 ?        S      0:00 klogd -2
  621 ?        S      0:00 portmap
  699 ?        SL     0:00 ntpd -U ntp
  753 ?        S      0:00 /usr/sbin/sshd
  985 ?        S      0:00  \_ /usr/sbin/sshd
  986 pts/0    S      0:00      \_ -bash
 1061 pts/0    R      0:00          \_ ps -axf
  794 ?        S      0:00 gpm -t ps/2 -m /dev/mouse
  812 ?        S      0:00 crond
  862 ?        S      0:00 xfs -droppriv -daemon
  900 ?        S      0:00 /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /usr/share
  953 ?        S      0:00  \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /usr/s
  954 ?        S      0:00      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  956 ?        S      0:00      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  957 ?        S      0:03      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  958 ?        S      0:00      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  959 ?        S      0:00      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  960 ?        S      0:00      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  961 ?        S      0:00      \_ /usr/bin/ntop -i eth0,eth1 -p /usr/share/ntop/protocol.list -P /u
  936 ?        S      0:00 /usr/sbin/atd
  968 tty1     S      0:00 /sbin/mingetty tty1
  969 tty2     S      0:00 /sbin/mingetty tty2
  970 tty3     S      0:00 /sbin/mingetty tty3

No load on the native system:

  5:33pm  up 25 min,  2 users,  load average: 0.29, 0.45, 0.21
45 processes: 44 sleeping, 1 running, 0 zombie, 0 stopped
CPU states:  0.1% user,  0.5% system,  0.0% nice, 99.2% idle
Mem:   580756K av,  576360K used,    4396K free,       0K shrd,    4032K buff
Swap:  530104K av,       0K used,  530104K free                  537408K cached

No load on UML:

  6:34pm  up 2 min,  1 user,  load average: 0.01, 0.02, 0.00
16 processes: 15 sleeping, 1 running, 0 zombie, 0 stopped
Mem:   127444K av,   72424K used,   55020K free,       0K shrd,     608K buff
Swap:  131064K av,       0K used,  131064K free                   66332K cached

The UML machine seems to have plenty of memory:

[ntop@ur-tigger2 ntop]$ cat /proc/meminfo
        total:    used:    free:  shared: buffers:  cached:
Mem:  130502656 74047488 56455168        0   622592 67923968
Swap: 134209536        0 134209536
MemTotal:       127444 kB
MemFree:         55132 kB
MemShared:           0 kB
Buffers:           608 kB
Cached:          66332 kB
SwapCached:          0 kB
Active:           6480 kB
Inactive:        61932 kB
HighTotal:           0 kB
HighFree:            0 kB
LowTotal:       127444 kB
LowFree:         55132 kB
SwapTotal:      131064 kB
SwapFree:       131064 kB

17:26:30 tigger [Linux] user=bstrauss pwd=/home/ntop/uml/ur-tigger
$ ls -lsAh
4.0k -rwxrwx--x    1 bstrauss ntop         3.1k Sep 17 09:34 control
4.0k drw-rw----    2 bstrauss ntop         4.0k Jul 18 04:02 mnt
633M -rw-rw----    1 bstrauss ntop         1.0G Sep 17 12:15 rootfs
 12M -rw-rw----    1 bstrauss ntop         128M Sep 17 12:12 swapfs

17:27:05 tigger [Linux] user=bstrauss pwd=/home/ntop/uml/ur-tigger2
$ ls -lsAh
8.0M -rw-r--r--    1 bstrauss bstrauss     898M Sep 17 17:05 cow
4.0k drwxrwx---    2 bstrauss ntop         4.0k Jul 18 04:02 mnt
588M -rw-r--r--    1 bstrauss bstrauss     1.0G Sep 17 16:55 rootfs
 12M -rwxrwx---    1 bstrauss ntop         128M Jul 30 16:00 swapfs

UML machine has enough disk:

[ntop@ur-tigger2 ntop]$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/ubd/0           1008M  464M  493M  49% /
none                   62M     0   62M   0% /dev/shm
/                      11G  4.0G  6.8G  37% /tigger

The UML startup...

17:31:10 tigger [Linux] user=bstrauss pwd=/home/ntop/uml/ur-tigger2
$ /etc/init.d/ur-tigger2 start
tracing thread pid = 1487
Linux version 2.4.19-5um (jd...@um...) (gcc version 2.96 20000731 (Red Hat Linux 7.1 2.96-81)) #2 Mon Sep 16 15:41:15 EDT 2002
On node 0 totalpages: 32768
zone(0): 32768 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: umid=ur-tigger2 ubd0=cow,rootfs root=/dev/ubd/0 ubd7=swapfs mem=128m eth0=tuntap,,,192.168.42.32 con1=tty:/dev/tty4
Calibrating delay loop... 831.85 BogoMIPS
Memory: 127444k available
Dentry cache hash table entries: 16384 (order: 5, 131072 bytes)
Inode cache hash table entries: 8192 (order: 4, 65536 bytes)
Mount-cache hash table entries: 2048 (order: 2, 16384 bytes)
Buffer-cache hash table entries: 8192 (order: 3, 32768 bytes)
Page-cache hash table entries: 32768 (order: 5, 131072 bytes)
Checking for host processor cmov support...Yes
Checking for host processor xmm support...No
Checking that ptrace can change system call numbers...OK
Checking that host ptys support output SIGIO...Yes
Checking that host ptys support SIGIO on close...No, enabling workaround
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Initializing RT netlink socket
Starting kswapd
VFS: Diskquotas version dquot_6.4.0 initialized
Journalled Block Device driver loaded
devfs: v1.12a (20020514) Richard Gooch (rg...@at...)
devfs: boot_options: 0x1
Installing knfsd (copyright (C) 1996 ok...@mo...).
pty: 256 Unix98 ptys configured
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
Universal TUN/TAP device driver 1.5 (C)1999-2002 Maxim Krasnyansky
SCSI subsystem driver Revision: 1.00
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP
IP: routing cache hash table of 1024 buckets, 8Kbytes
TCP: Hash tables configured (established 8192 bind 8192)
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
Netdevice 0 : TUN/TAP backend - IP = 192.168.42.32
Initializing software serial port version 1
mconsole (version 2) initialized on /home/bstrauss/.uml/ur-tigger2/mconsole
Creating "cow" as COW file for "rootfs"
Partition check:
 ubda: unknown partition table
 ubdh: unknown partition table
UML Audio Relay
Initializing stdio console driver
VFS: Mounted root (ext2 filesystem) readonly.
Mounted devfs on /dev
INIT: version 2.78 booting
Welcome to Red Hat Linux
Press 'I' to enter interactive startup.
Mounting proc filesystem: [ OK ]
Configuring kernel parameters: [ OK ]
hwclock is unable to get I/O port access: the iopl(3) call failed.
Setting clock : Tue Sep 17 18:31:24 EDT 2002 [ OK ]
Activating swap partitions: [ OK ]
Setting hostname ur-tigger2.gateway.2wire.net: [ OK ]
Your system appears to have shut down uncleanly
Press Y within 2 seconds to force file system integrity check...y
Checking root filesystem
/dev/ubd/0: 34724/131072 files (0.5% non-contiguous), 122752/262144 blocks
[/sbin/fsck.ext2 -- /] fsck.ext2 -a -f /dev/ubd/0
[ OK ]
Remounting root filesystem in read-write mode: [ OK ]
Finding module dependencies: [ OK ]
Checking filesystems
Checking all file systems.
[ OK ]
Mounting local filesystems: [ OK ]
Enabling local filesystem quotas: [ OK ]
Enabling swap space: [ OK ]
INIT: Entering runlevel: 3
Entering non-interactive startup
Updating /etc/fstab [ OK ]
Checking for new hardware [ OK ]
Setting network parameters: [ OK ]
Bringing up interface lo: [ OK ]
Bringing up interface eth0: [ OK ]
Starting umlnet: [ OK ]
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]
Initializing random number generator: [ OK ]
Mounting other filesystems: [ OK ]
Starting sshd: [ OK ]
Starting crond: [ OK ]
Starting anacron: [ OK ]
Starting atd: [ OK ]
**************************************************
**************************************************
*** ur-tigger2 is up... ***
**************************************************
**************************************************

Please help with any ideas?

-----Burton |
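A note on reading the host figures in this post: the first /proc/meminfo dump shows about 529 MB free, while the later `top` snapshot (with UML running) shows only 4396K free and 537408K cached. The sketch below is one rough way to turn a meminfo dump into a "free plus buffers plus cache" figure. It assumes the 2.4-style `Name: value kB` lines shown above, the function names are illustrative, and counting all of `Cached` as reclaimable is only an approximation, since not every cached page can be dropped on demand.

```python
# Host figures copied from the /proc/meminfo dump in the post above.
HOST_MEMINFO = """\
MemTotal: 580756 kB
MemFree: 529176 kB
Buffers: 8400 kB
Cached: 20348 kB
SwapTotal: 530104 kB
SwapFree: 530104 kB
"""


def parse_meminfo(text):
    """Return {field: kB} for the 'Name: 12345 kB' lines of /proc/meminfo."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        # Keep only "Name: <int> kB" lines. This skips the 2.4 byte-count
        # summary rows ("Mem: ..." and "Swap: ..."), which carry no unit.
        if sep and len(parts) == 2 and parts[1] == "kB" and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def roughly_available_kb(fields):
    """Rough estimate: free memory plus buffers plus page cache, in kB."""
    return fields["MemFree"] + fields.get("Buffers", 0) + fields.get("Cached", 0)


if __name__ == "__main__":
    print(roughly_available_kb(parse_meminfo(HOST_MEMINFO)), "kB")
```

On a live host the same functions could be fed the contents of /proc/meminfo instead of the embedded sample.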
From: James S. <mi...@st...> - 2002-09-17 22:48:51
|
Hi,

I would see this as a problem with the host or the host kernel, not UML. UML should only be a trigger for the bug: after all, you should not be able to crash a Linux machine as a non-root user, and that appears to be what is happening.

> P3/800, some absurd amount of ram (700MB?) running 2.4.19. I was running
> uml 2.4.18.15um-0, updated to 2.4.19.5um-0, both from the rpms @
> SourceForge. Yes, I did remember to update /lib/modules
>
> With 2.4.19, I do get more work done, but eventually it dies. It seems to be
> related to activity, in that it usually happens when I'm sshed in typing or
> watching an rpm build/install. Then I flip over to the uml machine's
> console and the screen is black. The whole host (Linux native) is shutdown.
> Have to press reset and reboot. |
From: David C. <da...@da...> - 2002-09-17 22:52:59
|
Burton M. Strauss III wrote:
> With 2.4.19, I do get more work done, but eventually it dies. It seems to be
> related to activity, in that it usually happens when I'm sshed in typing or
> watching an rpm build/install. Then I flip over to the uml machine's
> console and the screen is black. The whole host (Linux native) is shutdown.
> Have to press reset and reboot.

You are aware that UML uses /tmp, rather than actual host memory? You might want to ensure that it is mounted /tmpfs, and there is sufficient space for all your UMLs.

I've found that UML throws up a load of host kernel problems - I had lots of lock ups and swap problems with vanilla 2.4.19, and things seemed slightly better when patched with -rmap14a, although I did experience a lock up. I'm currently running 2.4.19-ck7-rmap14a as a host kernel, and it seems stable.

Unless you can get some host debugging output, such as compiling SysRq support into it, then you're probably not going to get too far in fixing this. It's certainly a host issue, rather than UML - UML just seems to bash the host enough to encourage these bugs to show themselves.

David
--
David Coulson http://davidcoulson.net/
d...@vi... http://journal.davidcoulson.net/ |
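The /tmp suggestion can be checked without guessing. The following is a minimal sketch that reads text in the /proc/mounts layout (device, mount point, filesystem type, options) and reports what is mounted at /tmp. The sample mount table and the function name are made up for illustration and are not taken from the poster's machine.

```python
# Hypothetical mount table in /proc/mounts format (not from the original post).
SAMPLE_MOUNTS = """\
/dev/hda1 / ext3 rw 0 0
none /proc proc rw 0 0
none /dev/shm tmpfs rw 0 0
none /tmp tmpfs rw 0 0
"""


def mount_fstype(mounts_text, mount_point="/tmp"):
    """Return the filesystem type mounted at mount_point, or None.

    None means nothing is mounted exactly there, i.e. the directory simply
    lives on whatever filesystem holds its parent.
    """
    fstype = None
    for line in mounts_text.splitlines():
        fields = line.split()
        # Columns: device, mount point, fstype, options, dump, pass
        if len(fields) >= 3 and fields[1] == mount_point:
            fstype = fields[2]  # a later mount on the same point wins
    return fstype


if __name__ == "__main__":
    print("/tmp is mounted as:", mount_fstype(SAMPLE_MOUNTS))
```

On a real host, the text would come from reading /proc/mounts rather than from the embedded sample.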
From: Burton M. S. I. <bst...@ac...> - 2002-09-17 23:41:02
|
Actually, no I wasn't aware of the /tmpfs. It is mounted, but there's only about 280M in there. Still, I only run one uml at a time and they're only 64m. I can try bumping it up...

This is vanilla 2.4.19 -- I've stayed away from patching the kernel...

No, I don't have SysRq defined.

Where would I get the 2.4.19-ck7-rmap14a patch?

-----Burton

-----Original Message-----
From: David Coulson [mailto:da...@da...]
Sent: Tuesday, September 17, 2002 5:53 PM
To: Burton M. Strauss III
Cc: use...@li...
Subject: Re: [uml-user] UML kills my machine ... black screen dead

Burton M. Strauss III wrote:
> With 2.4.19, I do get more work done, but eventually it dies. It seems to be
> related to activity, in that it usually happens when I'm sshed in typing or
> watching an rpm build/install. Then I flip over to the uml machine's
> console and the screen is black. The whole host (Linux native) is shutdown.
> Have to press reset and reboot.

You are aware that UML uses /tmp, rather than actual host memory? You might want to ensure that it is mounted /tmpfs, and there is sufficient space for all your UMLs.

I've found that UML throws up a load of host kernel problems - I had lots of lock ups and swap problems with vanilla 2.4.19, and things seemed slightly better when patched with -rmap14a, although I did experience a lock up. I'm currently running 2.4.19-ck7-rmap14a as a host kernel, and it seems stable.

Unless you can get some host debugging output, such as compiling SysRq support into it, then you're probably not going to get too far in fixing this. It's certainly a host issue, rather than UML - UML just seems to bash the host enough to encourage these bugs to show themselves.

David
--
David Coulson http://davidcoulson.net/
d...@vi... http://journal.davidcoulson.net/ |
From: David C. <da...@da...> - 2002-09-17 23:47:09
|
Burton M. Strauss III wrote:
> Actually, no I wasn't aware of the /tmpfs. It is mounted, but there's only
> about 280M in there. Still, I only run one uml at a time and they're only
> 64m. I can try bumping it up...

Is /tmp mounted as tmpfs, or as a regular disk-based filesystem? If your UML is set to use 64M of memory, it won't consume more than that within the /tmp directory. I don't know why I said /tmpfs - I should have said "You might want to ensure that /tmp is mounted tmpfs".

> This is vanilla 2.4.19 -- I've stayed away from patching the kernel...
>
> No, I don't have SysRq defined.

Might be worth trying that, assuming the kernel is in a state to actually work when it goes wrong. If you don't have any useful console output, and can't get anything interesting out of the logs, you're left with few options.

> Where would I get the 2.4.19-ck7-rmap14a patch?

http://members.optusnet.com.au/ckolivas/kernel/

Specifically, you want:
http://members.optusnet.com.au/ckolivas/kernel/rmapck7_2.4.19.patch.bz2
and
http://members.optusnet.com.au/ckolivas/kernel/pe2_pe1.diff

David
--
David Coulson http://davidcoulson.net/
d...@vi... http://journal.davidcoulson.net/ |
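The sizing argument here is simple arithmetic: each instance needs at most its `mem=` size in /tmp, so the sum over all running instances has to fit in the free space. A minimal sketch follows, assuming the `mem=` value carries an optional k/m/g suffix as in the `mem=128m` command line posted at the start of the thread. The helper names are invented for this example, and the 280M figure is the tmpfs size reported above.

```python
import re

_UNIT_BYTES = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def uml_mem_bytes(cmdline):
    """Return the mem= size from a UML command line in bytes, or None if absent."""
    match = re.search(r"(?:^|\s)mem=(\d+)([kKmMgG]?)(?=\s|$)", cmdline)
    if match is None:
        return None
    number, unit = match.groups()
    return int(number) * _UNIT_BYTES[unit.lower()]


def fits_in_tmp(cmdlines, tmp_free_bytes):
    """Return (total bytes requested via mem=, whether that total fits)."""
    total = sum(uml_mem_bytes(cmdline) or 0 for cmdline in cmdlines)
    return total, total <= tmp_free_bytes


if __name__ == "__main__":
    # Command line as printed in the boot log earlier in the thread.
    cmdline = ("umid=ur-tigger2 ubd0=cow,rootfs root=/dev/ubd/0 ubd7=swapfs "
               "mem=128m eth0=tuntap,,,192.168.42.32 con1=tty:/dev/tty4")
    tmp_free = 280 * 1024 ** 2  # "about 280M" of tmpfs, per the reply above
    print(fits_in_tmp([cmdline], tmp_free))
    print(fits_in_tmp([cmdline] * 3, tmp_free))
```

With those numbers one 128M instance fits comfortably, while three at once would not, which is why the total across all running UMLs matters more than any single one.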
From: Adrian P. <a.p...@me...> - 2002-09-22 08:20:03
|
>>>>> "David" == David Coulson <da...@da...> writes:

>> Where would I get the 2.4.19-ck7-rmap14a patch?

And for those interested :-

http://www.surriel.com/patches/

* sep 17 After some interruptions for work related stuff, 2.5 and procps here is rmap 14b, especially recommended for SMP systems: 2.4.19-rmap14b and the incremental rmap14a-rmap14b.

I've just rebooted one of the machines with ck7-rmap14a plus rmap14a-rmap14b to see how this runs,

Sincerely,

Adrian Phillips
--
Your mouse has moved. Windows NT must be restarted for the change to take effect. Reboot now? [OK] |
From: David C. <da...@da...> - 2002-09-22 10:34:43
|
Adrian Phillips wrote:
> I've just rebooted one of the machines with ck7-rmap14a plus
> rmap14a-rmap14b to see how this runs,

I tried that, and the box locked up after around 6hrs. I've got no idea if this was down to the new patch, or because of the existing kernel.

David
--
David Coulson http://davidcoulson.net/
d...@vi... http://journal.davidcoulson.net/ |
From: Adrian P. <a.p...@me...> - 2002-09-23 05:01:42
|
>>>>> "David" == David Coulson <da...@da...> writes:

David> Adrian Phillips wrote:
>> I've just rebooted one of the machines with ck7-rmap14a plus
>> rmap14a-rmap14b to see how this runs,

David> I tried that, and the box locked up after around 6hrs. I've
David> got no idea if this was down to the new patch, or because
David> of the existing kernel.

Oh okay. I'll leave it running, see what happens.

Thanks

Adrian
--
Your mouse has moved. Windows NT must be restarted for the change to take effect. Reboot now? [OK] |
From: Adrian P. <a.p...@me...> - 2002-09-18 05:21:49
|
>>>>> "David" == David Coulson <da...@da...> writes: David> I've found that UML throws up a load of host kernel David> problems - I had lots of lock ups and swap problems with David> vanilla 2.4.19, and things seemed slightly better when David> patched with -rmap14a, although I did experience a lock David> up. I'm currently running 2.4.19-ck7-rmap14a as a host David> kernel, and it seems stable. I've had this happen also with 2.4.19-rc1 and later. This is interesting to know, thanks for the heads up on this David. May I ask which options you are using of the following :- CONFIG_PREEMPT CONFIG_LOLAT (I'm not sure whether the latter does anything - it seems to allow control of latency using sysctl), Sincerely, Adrian Phillips -- Your mouse has moved. Windows NT must be restarted for the change to take effect. Reboot now? [OK] |
From: Adrian P. <a.p...@me...> - 2002-09-18 05:24:36
|
>>>>> "Adrian" == Adrian Phillips <a.p...@me...> writes: Adrian> CONFIG_PREEMPT CONFIG_LOLAT Adrian> (I'm not sure whether the latter does anything - it seems Adrian> to allow control of latency using sysctl), Whoops, misread the code. LOLAT does make significant changes, LOLAT_SYSCTL is the sysctl interface. Sincerely, Adrian Phillips -- Your mouse has moved. Windows NT must be restarted for the change to take effect. Reboot now? [OK] |
From: David C. <da...@da...> - 2002-09-18 12:42:05
|
Adrian Phillips wrote: > I've had this happen also with 2.4.19-rc1 and later. This is > interesting to know, thanks for the heads up on this David. May I ask > which options you are using of the following :- > > CONFIG_PREEMPT > CONFIG_LOLAT I enabled them both - It's probably not going to hurt any, and my box seems to be stable at the moment. It might be worth doing some benchmarking with UML to see if the pre-emptive kernel patch and/or the low latency patch on the host make any difference, since they should improve things in high I/O situations. However, my priority at the moment is to set up a host which doesn't fall over. David -- David Coulson http://davidcoulson.net/ d...@vi... http://journal.davidcoulson.net/ |
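When comparing host kernels as in this exchange, it helps to confirm which options a given build actually has set. Below is a small sketch that scans kernel `.config` text for the options discussed in the thread. The sample config is invented for illustration, `CONFIG_LOLAT` and `CONFIG_LOLAT_SYSCTL` are assumed to be the names used by the low-latency patch mentioned above, and `CONFIG_MAGIC_SYSRQ` is the mainline SysRq option.

```python
# Hypothetical .config excerpt (not taken from anyone's actual kernel build).
SAMPLE_CONFIG = """\
CONFIG_PREEMPT=y
CONFIG_LOLAT=y
# CONFIG_LOLAT_SYSCTL is not set
CONFIG_MAGIC_SYSRQ=y
"""


def config_options(config_text, names):
    """Map each option name to its value ('y', 'm', ...) or None if not set."""
    values = dict.fromkeys(names)
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # "# CONFIG_FOO is not set" means the option is off
        key, sep, value = line.partition("=")
        if sep and key in values:
            values[key] = value
    return values


if __name__ == "__main__":
    wanted = ["CONFIG_PREEMPT", "CONFIG_LOLAT",
              "CONFIG_LOLAT_SYSCTL", "CONFIG_MAGIC_SYSRQ"]
    for name, value in config_options(SAMPLE_CONFIG, wanted).items():
        print(name, "=", value if value is not None else "(not set)")
```

Running this against the `.config` of each host kernel under test would make it easier to say which combination of patches and options a lock-up was seen with.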