From: Joern B. <jb...@bw...> - 2004-04-09 10:27:31
Hi,

I have a strange problem with lagging network connections to my vservers.

Here is a typical ping to one of the uml vservers, sent from the uml host itself:

64 bytes from 217.146.142.73: icmp_seq=44 ttl=64 time=0.2 ms
64 bytes from 217.146.142.73: icmp_seq=45 ttl=64 time=0.2 ms
64 bytes from 217.146.142.73: icmp_seq=46 ttl=64 time=0.2 ms
64 bytes from 217.146.142.73: icmp_seq=47 ttl=64 time=235.6 ms
64 bytes from 217.146.142.73: icmp_seq=48 ttl=64 time=0.4 ms
64 bytes from 217.146.142.73: icmp_seq=49 ttl=64 time=0.2 ms
64 bytes from 217.146.142.73: icmp_seq=50 ttl=64 time=0.4 ms
64 bytes from 217.146.142.73: icmp_seq=51 ttl=64 time=0.2 ms
64 bytes from 217.146.142.73: icmp_seq=52 ttl=64 time=0.4 ms
64 bytes from 217.146.142.73: icmp_seq=53 ttl=64 time=2876.6 ms
64 bytes from 217.146.142.73: icmp_seq=54 ttl=64 time=1876.7 ms
64 bytes from 217.146.142.73: icmp_seq=55 ttl=64 time=876.7 ms
64 bytes from 217.146.142.73: icmp_seq=56 ttl=64 time=1.8 ms
64 bytes from 217.146.142.73: icmp_seq=57 ttl=64 time=0.3 ms
64 bytes from 217.146.142.73: icmp_seq=58 ttl=64 time=0.4 ms

As you can see, most of the time the pings are just fine. But every 10 to 20 seconds there is a lag: sometimes only 500 ms, sometimes 3000 ms and sometimes even 10 seconds long.

You can imagine it's really no fun to work on a lagging vserver via ssh. :-(

The uml host itself has no networking problems with the outside world. Pings to everywhere are just as they are supposed to be.

I use the tap devices for networking.

vserver2:/etc/init.d# ifconfig tap1
tap1      Link encap:Ethernet  HWaddr 00:FF:7F:4C:C5:2F
          inet addr:217.146.142.84  Bcast:217.146.142.255  Mask:255.255.255.255
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:3478644 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2961598 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1503132252 (1.3 GiB)  TX bytes:299835570 (285.9 MiB)

The uml, as well as the uml host, is Debian Woody 3.0.
Kernels in use:

On the host: Vanilla 2.4.23 with the host-skas-patch.

On the uml:

Linux version 2.4.23-1um (root@vserver1) (gcc version 2.95.4 20011002 (Debian prerelease)) #5 Sun Dec 21 04:26:57 CET 2003
On node 0 totalpages: 16384
zone(0): 16384 pages.
zone(1): 0 pages.
zone(2): 0 pages.
Kernel command line: ubd0=/dev/vserver/ns1 ubd1=/dev/vserver/swap_ns1 eth0=tuntap,,,217.146.142.84 umid=ns1 mem=64M root=/dev/ubd0
Calibrating delay loop... 3135.26 BogoMIPS
Memory: 60772k available
Dentry cache hash table entries: 8192 (order: 4, 65536 bytes)
Inode cache hash table entries: 4096 (order: 3, 32768 bytes)
Mount cache hash table entries: 512 (order: 0, 4096 bytes)
Buffer cache hash table entries: 1024 (order: 0, 4096 bytes)
Page-cache hash table entries: 16384 (order: 4, 65536 bytes)
Checking for host processor cmov support...Yes
Checking for host processor xmm support...No
Checking that ptrace can change system call numbers...OK
Checking that host ptys support output SIGIO...Yes
Checking that host ptys support SIGIO on close...No, enabling workaround
Checking for /dev/anon on the host...Not available (open failed with errno 2)
POSIX conformance testing by UNIFIX
Linux NET4.0 for Linux 2.4
Based upon Swansea University Computer Society NET3.039
Sangoma WANPIPE Router v1.1 (c) 1995-2000 Sangoma Technologies Inc.
Initializing RT netlink socket
Starting kswapd
VFS: Disk quotas vdquot_6.5.1
Journalled Block Device driver loaded
devfs: v1.12c (20020818) Richard Gooch (rg...@at...)
devfs: boot_options: 0x1
pty: 2048 Unix98 ptys configured
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
loop: loaded (max 8 devices)
PPP generic driver version 2.4.2
Equalizer1996: $Revision: 1.2.1 $ $Date: 1996/09/22 13:52:00 $ Simon Janes (si...@nc...)
Universal TUN/TAP device driver 1.5 (C)1999-2002 Maxim Krasnyansky
ipddp.c:v0.01 8/28/97 Bradford W. Johnson <joh...@ma...>
ipddp0: Appletalk-IP Encap. mode by Bradford W. Johnson <joh...@ma...>
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
md: autorun ...
md: ... autorun DONE.
Initializing software serial port version 1
mconsole (version 2) initialized on /root/.uml/ns1/mconsole
Partition check:
ubda: unknown partition table
ubdb: unknown partition table
ubd : Synchronous mode
Initializing stdio console driver
Netdevice 0 : TUN/TAP backend - IP = 217.146.142.84
NET4: Linux TCP/IP 1.0 for NET4.0
IP Protocols: ICMP, UDP, TCP, IGMP
IP: routing cache hash table of 512 buckets, 4Kbytes
TCP: Hash tables configured (established 4096 bind 4096)
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
Linux IP multicast router 0.06 plus PIM-SM
ip_conntrack version 2.1 (474 buckets, 3792 max) - 292 bytes per conntrack
NET4: Unix domain sockets 1.0/SMP for Linux NET4.0.
IPv6 v0.8 for NET4.0
IPv6 over IPv4 tunneling driver
NET4: AppleTalk 0.18a for Linux NET4.0
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: recovery complete.
EXT3-fs: mounted filesystem with ordered data mode.
VFS: Mounted root (ext3 filesystem) readonly.
Mounted devfs on /dev
Adding Swap: 262136k swap-space (priority -1)
EXT3 FS 2.4-0.9.19, 19 August 2002 on ubd(98,0), internal journal
Virtual console 1 assigned device '/dev/ptyp1'
* insmod tun
insmod: tun: no module by that name found
* ifconfig tap1 217.146.142.84 netmask 255.255.255.255 up
* bash -c echo 1 > /proc/sys/net/ipv4/ip_forward
* route add -host 217.146.142.73 dev tap1
* bash -c echo 1 > /proc/sys/net/ipv4/conf/tap1/proxy_arp
* arp -Ds 217.146.142.73 eth0 pub
* route del -host 217.146.142.73 dev tap1
* bash -c echo 0 > /proc/sys/net/ipv4/conf/tap1/proxy_arp
* arp -i eth0 -d 217.146.142.73 pub
* route add -host 217.146.142.73 dev tap1
* bash -c echo 1 > /proc/sys/net/ipv4/conf/tap1/proxy_arp
* arp -Ds 217.146.142.73 eth0 pub
eth0: no IPv6 routers present

Any idea what could cause those lags? Any hint for further troubleshooting approaches?

Thanks!

Joern
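To make the lag pattern in Joern's ping output easier to see, here is a minimal Python sketch (not from the thread) that picks slow replies out of iputils-style ping output and estimates when each slow reply actually arrived. It assumes the default one-second interval between requests; the function names and the 100 ms threshold are arbitrary choices for illustration.

```python
import re

# Matches iputils-style reply lines such as
# "64 bytes from 217.146.142.73: icmp_seq=53 ttl=64 time=2876.6 ms"
REPLY_RE = re.compile(r"icmp_seq=(\d+)\s.*?time=([\d.]+)\s*ms")


def find_lags(lines, threshold_ms=100.0):
    """Return (icmp_seq, rtt_ms) for every reply slower than threshold_ms."""
    lags = []
    for line in lines:
        m = REPLY_RE.search(line)
        if m is None:
            continue  # header, statistics or error lines
        seq, rtt_ms = int(m.group(1)), float(m.group(2))
        if rtt_ms > threshold_ms:
            lags.append((seq, rtt_ms))
    return lags


def arrival_times(lags, interval_s=1.0):
    """Estimate when each reply arrived, in seconds on the icmp_seq timeline.

    Assumes request N was sent at N * interval_s. Replies that share an
    arrival time were most likely released together after a single stall.
    """
    return [round(seq * interval_s + rtt_ms / 1000.0, 1) for seq, rtt_ms in lags]
```

Applied to the output above, icmp_seq 53, 54 and 55 (2876.6, 1876.7 and 876.7 ms) all map to an arrival time of about 55.9 s. That suggests the three replies were held back and then delivered together after a stall of roughly three seconds, rather than each packet being slow on its own.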
From: roland <for...@gm...> - 2004-04-09 11:04:46
Hi Joern!

Is the uml 100% idle all the time? What's going on on the host at the same time? uml is just another "process" on the host, and the scheduler (especially the I/O scheduler) of the 2.4 kernel series is not really the best. 2.6 is MUCH better, so you could try 2.6 on the HOST and compare whether that makes a difference.

Could you run "vmstat 1" on the host and inside the uml while pinging?

Could you also ping in the other direction and put the results "side by side", to see if there is a relation?

While writing this, I searched the mailing-list archive and found a reference:

http://sourceforge.net/mailarchive/message.php?msg_id=6285243

So you could probably generate "some more" I/O on your host or in your UML (dd if=....) and study the ping "behaviour"?

regards
roland

----- Original Message -----
From: "Joern Bredereck" <jb...@bw...>
To: <use...@li...>
Sent: Friday, April 09, 2004 12:27 PM
Subject: [uml-user] Network lags
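roland's "vmstat 1" suggestion is easiest to act on if the seconds with swap traffic are picked out automatically. Below is a minimal Python sketch, assuming procps-style vmstat output in which one header line names the columns (r, b, swpd, ..., si, so, ...). Columns are looked up by name, so it should cope with both older and newer column layouts; the function name is hypothetical.

```python
def swap_activity(vmstat_lines):
    """Return (sample_no, si, so, blocked) for every vmstat sample with swap I/O.

    sample_no counts data rows from 0; si/so are the swap-in/swap-out columns
    and blocked is the "b" column (processes in uninterruptible sleep).
    """
    cols = None
    sample_no = 0
    hits = []
    for line in vmstat_lines:
        fields = line.split()
        if "si" in fields and "so" in fields:
            # Column header line; vmstat repeats it periodically.
            cols = {name: i for i, name in enumerate(fields)}
            continue
        if cols is None or not fields or not all(f.isdigit() for f in fields):
            continue  # "procs ---memory---" banner or anything else
        si = int(fields[cols["si"]])
        so = int(fields[cols["so"]])
        blocked = int(fields[cols["b"]])
        if si or so:
            hits.append((sample_no, si, so, blocked))
        sample_no += 1
    return hits
```

Running it over the host's output captured while a lag happens shows directly whether the host was swapping at that moment.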
From: roland <for...@gm...> - 2004-04-09 12:41:16
hi again,

I did some testing, and I'm able to reproduce this with a 2.6-based uml on a 2.6 host after starting some loops like

  while true; do find /; done >/dev/zero 2>&1 &
  while true; do dd if=/dev/urandom of=/somepath/test.dat; done >/dev/zero 2>&1 &
  while true; do dd if=/dev/hda of=/dev/zero; done >/dev/zero 2>&1 &

and thus stressing the I/O on the host very much. All seems to run fine for some time, BUT: from time to time (I didn't examine whether this is periodic, but all runs fine at least >95% of the time) I get the same lags when pinging the uml.

Since I'm stressing the I/O very heavily, the lags I see are much worse than Joern's. I sometimes get lags >60 s and even error messages from ping: "sendmsg: no buffer space available".

OK, my generated disk I/O uses most of the buffers, but shouldn't it be the host kernel's job to leave some buffers reserved and "schedule" that appropriately? I'm not an expert regarding scheduling, but this is what I expect from a kernel.

Does anybody run a more recent (>2.6.0) HOST kernel and is also able to reproduce this? Maybe this is a "scheduling question" or a question for LKML? Maybe I'm just stupid doing such "nasty things" I shouldn't do, and I'm expecting too much? :D

At least, I would be interested in getting to know:
- why this happens
- how to "tune" this, so that it doesn't happen
- if this is expected behaviour on "heavily loaded systems" or probably a uml or host-kernel bug

regards
roland

----- Original Message -----
From: "roland" <for...@gm...>
To: "Joern Bredereck" <jb...@bw...>; <use...@li...>
Sent: Friday, April 09, 2004 1:10 PM
Subject: Re: [uml-user] Network lags
From: Jeff D. <jd...@ad...> - 2004-04-09 15:49:15
for...@gm... said:
> at least, i would be interested in getting to know:
> - why this happens
> - how to "tune" this, that it doesn`t happen
> - if this is expected behavour on "heavy loaded systems" or probably a uml
> or host-kernel bug

As you said, you're making life hard for the host.

My guess as to what's happening is this.

You're generating a ton of IO which is pulling the host's memory into page cache. This causes UML to get swapped out because it's not doing anything except answering a ping once in a while. When it does have to answer a ping, it needs to be swapped back in, which might be hard since you're sucking down a lot of IO bandwidth.

So, I don't think this says too much about whether anything's wrong with either the host or UML.

One thing you might try is doing the same IO with O_DIRECT. That won't pollute the page cache.

Jeff
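Jeff's O_DIRECT suggestion can be tried with a small writer like the following, a minimal Python sketch that stands in for roland's dd if=/dev/urandom loop (it writes zeros instead of random data). It assumes a Linux host, a 4096-byte alignment (the real requirement depends on the device and filesystem), and that a filesystem which cannot do O_DIRECT reports EINVAL. In that case it falls back to a different technique: buffered writes followed by fsync() and posix_fadvise(POSIX_FADV_DONTNEED), which asks the kernel to drop the file's cached pages afterwards. The function names are hypothetical.

```python
import errno
import mmap
import os

BLOCK = 4096  # assumed alignment; the real requirement depends on device/filesystem


def _open(path, direct):
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if direct:
        flags |= os.O_DIRECT
    return os.open(path, flags, 0o644)


def write_bypassing_cache(path, nblocks, block=BLOCK):
    """Write nblocks * block zero bytes to path; return True if O_DIRECT was used."""
    # Anonymous mappings are page-aligned and zero-filled, which satisfies
    # the buffer-alignment rule O_DIRECT imposes on user memory.
    buf = mmap.mmap(-1, block)
    try:
        if hasattr(os, "O_DIRECT"):
            try:
                fd = _open(path, direct=True)
                try:
                    for _ in range(nblocks):
                        os.write(fd, buf)
                    return True
                finally:
                    os.close(fd)
            except OSError as e:
                if e.errno != errno.EINVAL:
                    raise  # only "O_DIRECT not supported here" triggers the fallback
        # Fallback: normal page-cache writes, then ask the kernel to drop them.
        fd = _open(path, direct=False)
        try:
            for _ in range(nblocks):
                os.write(fd, buf)
            os.fsync(fd)  # dirty pages must be written before they can be dropped
            if hasattr(os, "posix_fadvise"):
                os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)
        finally:
            os.close(fd)
        return False
    finally:
        buf.close()
```

The buffer comes from an anonymous mmap because O_DIRECT needs an aligned user buffer; an ordinary bytes object gives no alignment guarantee.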
From: roland <for...@gm...> - 2004-04-09 18:27:30
Hi!

I think I found an interesting "scheduling issue", which led me to change the subject of the thread. Joern reported network lags, I have seen some other mails reporting this issue, I have experienced such lags myself, and I remember reading mails saying that the first pings after bringing up the network interfaces in uml tend to have slow response times.

Just for fun, I switched to the deadline I/O scheduler on the host (kernel parameter "elevator=deadline"), and after a first try, things seem to behave better with that scheduler! I generated the same I/O load on the host and hogged the CPU like hell (loadavg was >40), and I NEVER saw the network lags again. I also CPU- and I/O-hogged the uml itself, which resulted in a big increase of the ping times (from an average of 0.2 ms to 600 ms), but I NEVER again saw the strange uml responsiveness behaviour (ping times >90 s, "sendmsg: no buffer space available") that I saw with the standard AS (anticipatory) scheduler. There were _some_ pings >2000 ms, too, but this didn't happen often and was not as severe as with AS and an idle uml. The uml seems to be more responsive in general with the deadline scheduler.

I cannot tell whether there is a bug in AS or whether this is a "uml<->AS issue" that causes uml network lags, but at least some more testing should be done to confirm what I found. So I would like to encourage other people to do some more tests, especially those who experience network lags!

regards
roland

ps: funny: I have solved my "problem" by switching to a different scheduler, while Joern has (probably) solved his by switching to a later uml release. So is it a scheduler or a uml issue? Or both?

pps:
> This causes UML to get swapped out because it's not doing anything
> except answering a ping once in a while.

mhhh, as far as I remember, I didn't see (in vmstat) a single byte of uml or any other process being swapped out when running my test. General question: does heavy use of buffers lead to processes being swapped out?
----- Original Message -----
From: "Jeff Dike" <jd...@ad...>
To: "roland" <for...@gm...>
Cc: "Joern Bredereck" <jb...@bw...>; <use...@li...>; <use...@li...>; <mi...@el...>
Sent: Friday, April 09, 2004 6:26 PM
Subject: Re: [uml-user] Network lags
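roland picked the scheduler with the elevator=deadline boot parameter, which applies to every disk. Later 2.6 kernels also allow switching per disk at runtime through /sys/block/<dev>/queue/scheduler, which lists the available schedulers with the active one in square brackets (for example "noop anticipatory [deadline] cfq"). That file does not exist on the early 2.6 kernels discussed here. A minimal Python sketch for reading it; the device name hda is only an example.

```python
from pathlib import Path


def parse_scheduler(text):
    """Split the contents of /sys/block/<dev>/queue/scheduler.

    Returns (active, available); the active scheduler is the one shown in
    square brackets, e.g. "noop [deadline] cfq" -> ("deadline", [...]).
    """
    active = None
    available = []
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            token = token[1:-1]
            active = token
        available.append(token)
    return active, available


def current_scheduler(dev="hda"):
    """Read the scheduler file for one block device (dev is an example name)."""
    path = Path("/sys/block") / dev / "queue" / "scheduler"
    return parse_scheduler(path.read_text())
```

On such kernels, switching is done by writing a scheduler name to the same file as root, e.g. echo deadline > /sys/block/hda/queue/scheduler.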
From: BlaisorBlade <bla...@ya...> - 2004-04-10 12:28:47
On Friday 9 April 2004 at 20:33, roland wrote:
> Hi !
> I think i found an interesting "scheduling issue" which led me to switch
> the subject of the thread.

1) Well, you switched from AS to deadline and got a significant I/O performance improvement, which made UML faster since it was disk-bound. The problem with AS is that it uses a heuristic which assumes sequential reads. But "find /" is very seeky, so it is slow with AS. So if you remove this background thread, you should get significantly different results.

Also, AS has had some changes in later kernels, so you should probably upgrade your host kernel to the latest version. A security hole (a local exploit to become root) has also been fixed since 2.6.0.

2) In your last email, you speak about hogging the UML CPU. If Jeff guessed correctly what happens, then keeping UML from being idle will decrease the worst-case ping response times.

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
From: roland <for...@gm...> - 2004-04-13 21:19:44
hi!

> 1) Well, you switched from AS to deadline and got a significant IO performance
> improvement, which made UML faster since it was Disk-bound. The problem with
> AS is that it uses an heuristic, which assumes sequential reads.

mmhh, I didn't really notice a big improvement in I/O performance in general. I just noticed that my uml was more responsive to ping (no more "sendmsg: no buffer space available", no more network lags of 60 s). btw: my uml didn't access the disk at all while answering the pings (why should it? All that network stuff happens in uml kernel space).

> But "find /" is very seeky, so it is slow with AS. So if you remove this bg
> thread, you should get significantly different results.

No, that doesn't really matter. OK, it took somewhat longer until I got network lags with my uml and "sendmsg: ..." ping errors, but the errors were the same and happened regardless of the 10 parallel "find /" threads I started on the host.

> Also, AS has had some changes in later kernels, so you should probably upgrade
> your host kernel to the latest version. A security hole (a local exploit to

That's a point; I will try that if I find some time. Does somebody have more details on what exactly was changed? (I don't know much about BitKeeper, but maybe I can view the changelog for every single file, like in CVS?)

> 2) In your last email, you speak about hogging the UML CPU. If Jeff guessed
> correctly what happens, then avoiding that UML is idle will decrease the
> worst-case ping response times.

I think he guessed wrong. It doesn't matter whether my uml is idle or not: there is no difference if I run a vmstat (and some more) inside my uml (i.e. it is not idle) while pinging. The effect I experience doesn't seem to be caused by the uml being swapped out and in again, because I didn't see a single byte being swapped out/in on the host. So this effect remains... strange to me.
regards
roland

----- Original Message -----
From: "BlaisorBlade" <bla...@ya...>
To: "roland" <for...@gm...>; "Jeff Dike" <jd...@ad...>
Cc: "Joern Bredereck" <jb...@bw...>; <use...@li...>; <use...@li...>; <mi...@el...>
Sent: Saturday, April 10, 2004 2:28 PM
Subject: Re: [uml-devel] uml responsivenes or "AS vs DEADLINE Scheduler on uml-host" - was: [uml-user] Network lags
From: Henrik N. <um...@he...> - 2004-04-14 01:03:00
On Tue, 13 Apr 2004, roland wrote:

> mmhh - i didn`t really recognize that i got a big improvement of i/o
> performance in general. i just recognized, that my uml was more
> responsive to ping (no more "sendmsg: no buffer space available,no
> network lags of values like 60s anymore). btw: my uml didn`t access the
> disk at all, while answering the pings (why should it? all that
> network-stuff happens in uml-kernel space)

Heavy host I/O can cause the inactive portions of your UMLs to be swapped out.

> thats a point. will try that if i find some time. does somebody have some more details
> what was changed exactly? (i dont know much about bitkeeper - but maybe i can view the
> changelog like in cvs, for every single file?)

You can view which changesets apply to a given file, and from there the log messages of those changesets, much as in CVS, but it is not as simple an operation.

> i think he guessed wrong. it doesnt matter if my uml is idle or not - no difference
> if i run a vmstat (and some more) inside my uml( i.e. it is not idle)

Only running vmstat is effectively idle. You need to be running something which really uses the CPU within your UML to keep it from being idle. A simple "perl -e '$i++ while(1);'" should do nicely as a CPU hog.

> while pinging. the effect i experience doesnt`t seem to be an effect
> which is caused by a uml being swapped out and in again - because i
> didn`t see a single byte being swapped out/in on the host. so this
> effect remains....strange for me.

What did "vmstat 1" on the host report for the period around your strange ping results?

Are you using TT or SKAS? If TT, then swapping of UML does not show up in swap in/out, as the UML is then running from memory-mapped files and not using the host swap. I think SKAS uses the host swap for the UML memory, but I am not sure.

Regards
Henrik
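Henrik's question about what "vmstat 1" shows on the host can also be answered from a small program. The following is only a sketch under stated assumptions, not part of any UML tool: it assumes a Linux 2.6-style /proc/vmstat with the cumulative `pswpin`/`pswpout` page counters (as far as I know, 2.4 hosts report this differently, via a `swap` line in /proc/stat), and all function names are hypothetical. Sampling twice, a second apart, gives roughly the information in vmstat's si/so columns. Per Henrik's note, pages of a TT-mode UML are file-backed and would not appear in these counters.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Cumulative swap counters from /proc/vmstat (pages since boot). */
struct swap_counters {
    unsigned long pswpin;   /* pages swapped in  */
    unsigned long pswpout;  /* pages swapped out */
};

/* Parse the text of /proc/vmstat. Returns 0 if both counters were found. */
int parse_vmstat(const char *text, struct swap_counters *out)
{
    assert(text != NULL && out != NULL);
    int found = 0;
    for (const char *line = text; line != NULL && *line != '\0'; ) {
        unsigned long v;
        if (sscanf(line, "pswpin %lu", &v) == 1) {
            out->pswpin = v;
            found |= 1;
        } else if (sscanf(line, "pswpout %lu", &v) == 1) {
            out->pswpout = v;
            found |= 2;
        }
        line = strchr(line, '\n');      /* advance to the next line */
        if (line != NULL)
            line++;
    }
    return found == 3 ? 0 : -1;
}

/* Read the live counters from the host (assumes /proc/vmstat fits in buf). */
int read_swap_counters(struct swap_counters *out)
{
    static char buf[16384];
    FILE *f = fopen("/proc/vmstat", "r");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_vmstat(buf, out);
}

/* Pages swapped in plus out between two samples; nonzero means the host swapped. */
unsigned long swap_activity(const struct swap_counters *before,
                            const struct swap_counters *after)
{
    return (after->pswpin - before->pswpin) + (after->pswpout - before->pswpout);
}
```

To use it, call `read_swap_counters()`, `sleep(1)`, call it again and print `swap_activity()`; a nonzero value that coincides with a ping spike would support the swapping theory.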
From: <sko...@up...> - 2004-04-10 11:04:36
> You're generating a ton of IO which is pulling the host's memory into page
> cache. This causes UML to get swapped out because it's not doing anything
> except answering a ping once in a while. When it does have to answer a ping,
> it needs to be swapped back in, which might be hard since you're sucking down
> a lot of IO bandwidth.

Is there any way for the UML kernel to allocate a non-swappable memory region in the host's memory? This could be especially important for dedicated host machines that only run UMLs.
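At the system-call level, what Sven asks for comes down to mapping some anonymous memory and asking the host to pin it with POSIX `mlock()`. This is a minimal, hypothetical sketch (the helper names are made up, and it is not how UML actually allocates its physical memory). `mlock()` needs `CAP_IPC_LOCK` (root) or, on newer kernels, a large enough `RLIMIT_MEMLOCK`; when it fails, the region simply stays ordinary swappable memory.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Map len bytes of anonymous memory and try to pin it in RAM.
 * Returns the mapping or NULL; *locked is 1 if mlock() succeeded,
 * 0 if it failed (e.g. EPERM or ENOMEM), in which case the region
 * can still be swapped out like any other memory. */
void *alloc_pinned(size_t len, int *locked)
{
    assert(locked != NULL);
    *locked = 0;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *locked = (mlock(p, len) == 0);
    return p;
}

/* Undo alloc_pinned(); munlock() on a never-locked region is harmless. */
void free_pinned(void *p, size_t len)
{
    munlock(p, len);
    munmap(p, len);
}
```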
From: Jeff D. <jd...@ad...> - 2004-04-10 13:30:23
On Sat, Apr 10, 2004 at 01:06:51PM +0200, Sven Köhler wrote:
> is there any way for the UML-kernel to allocate a non-swappable
> memore-region in the host's memory? this could be especially important
> for dedicated host-machines that are only to run UMLs.

Yes, but there's no way I'm going to support anything like that.

What I am going to do is make it possible to manage the host's memory so that it doesn't swap, and the UMLs in effect are mlocked.

Jeff
From: <sko...@up...> - 2004-04-10 13:58:19
>>is there any way for the UML-kernel to allocate a non-swappable
>>memore-region in the host's memory? this could be especially important
>>for dedicated host-machines that are only to run UMLs.
>
> Yes, but there's no way I'm going to support anything like that.

What's so bad about it?

> What I am going to do is make it possible to manage the host's memory
> so that it doesn't swap, and the UMLs in effect are mlocked.

Well, what do you mean by managing the host memory? Will it be some kind of tool, or a built-in option of the UML kernel?
From: Jeff D. <jd...@ad...> - 2004-04-10 16:28:55
On Sat, Apr 10, 2004 at 04:00:25PM +0200, Sven Köhler wrote:
> what's so bad about it?

It can put the host at risk. There's a VM system for a reason, and it needs swap in order to be able to back out of memory shortages. mlocking large amounts of memory subverts that and increases the possibility of deadlocking.

There are only a few good reasons for mlocking, and performance isn't one of them:
  - when something would break horribly if a page got swapped, i.e. pending DMA or directIO into it
  - security - gpg wants to mlock a page so it can be sure that secrets on it won't be written to disk

Those only involve a page at a time, and don't possibly endanger the system. The amounts of mlocked memory you're talking about are so large that they could endanger the system.

> well, what do you mean with managing the host memory? will be some kind
> tool or built-in option of the UML-kernel?

Both. There is pluggable memory in UML, and a tool on the host to manage memory by moving memory between UMLs and between UMLs and the host.

So, this daemon would be watching memory use on the host and the UMLs, and if it saw the host running short, it would find an idle UML and take some memory away from it:

    uml_mconsole <idle_umid> config mem=-32M

This would free the memory to the host, and pull it away from swap. If the host was fine, but another UML was short of memory, it would then give that memory to the busy UML:

    uml_mconsole <busy_umid> config mem+=32M

Jeff
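The policy Jeff describes can be sketched as a single decision step. Everything here is hypothetical: the `uml_state` struct, the low-water thresholds and the fixed 32M step are illustrative choices, and the daemon itself did not exist yet when this was written. Only the two command shapes (`uml_mconsole <umid> config mem=-32M` and `config mem+=32M`) come from Jeff's mail; the sketch just builds those strings instead of running them.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define STEP_MB 32  /* hypothetical transfer size, as in Jeff's example */

/* Hypothetical per-UML view the daemon might maintain. */
struct uml_state {
    const char *umid;   /* UML identifier passed to uml_mconsole */
    int idle;           /* nonzero if the UML is currently idle */
    long free_mb;       /* free memory inside the UML */
};

/* One rebalancing step. On a decision, writes a uml_mconsole command
 * line into cmd and returns 1; returns 0 if nothing should be done. */
int plan_step(long host_free_mb, long host_low_mb,
              const struct uml_state *umls, int n, long uml_low_mb,
              char *cmd, size_t cmdlen)
{
    assert(cmd != NULL && (umls != NULL || n == 0));
    if (host_free_mb < host_low_mb) {
        /* Host is short: take memory away from an idle UML that can spare it. */
        for (int i = 0; i < n; i++)
            if (umls[i].idle && umls[i].free_mb >= STEP_MB) {
                snprintf(cmd, cmdlen, "uml_mconsole %s config mem=-%dM",
                         umls[i].umid, STEP_MB);
                return 1;
            }
        return 0;
    }
    if (host_free_mb - STEP_MB < host_low_mb)
        return 0;  /* host is fine, but not by enough to give memory away */
    /* Host is fine: give memory to a busy UML that is running short. */
    for (int i = 0; i < n; i++)
        if (!umls[i].idle && umls[i].free_mb < uml_low_mb) {
            snprintf(cmd, cmdlen, "uml_mconsole %s config mem+=%dM",
                     umls[i].umid, STEP_MB);
            return 1;
        }
    return 0;
}
```

Returning a command string rather than executing it keeps the decision logic separate from how the daemon talks to each UML's management console.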
From: <sko...@up...> - 2004-04-10 21:20:39
>>what's so bad about it?
>
> It can put the host at risk. There's a VM system for a reason, and it needs
> swap in order to be able to back out of memory shortages. mlocking large
> amounts of memory subverts that and increases the possibility of deadlocking.

Good point, but mlocking 128M should be like running the host with 128M less. So if I could tell the kernel on the append line not to touch a certain memory area, would there still be any VM issues? Of course that would need further patching of the host kernel so that UML can access and manage the reserved memory.

>>well, what do you mean with managing the host memory? will be some kind
>>tool or built-in option of the UML-kernel?
>
> Both. There is pluggable memory in UML, and a tool on the host to manage
> memory by moving memory between UMLs and between UMLs and the host.

That sounds good too, but what I have in mind is a more static thing. Even if the host goes mad and eats all the memory, I still want to be sure that the UML kernels have their memory region and that it won't get swapped out.
From: Henrik N. <um...@he...> - 2004-04-10 23:30:03
On Sat, 10 Apr 2004, Sven Köhler wrote:

> good point, but mlocking 128M should be like running the host with 128M
> less. so if i could tell the kernel not to touch a certain memory-area
> on the append-line, should there still be any VM issues?
> of course that would need further patching of the host-kernel so that
> UML can access and manage the reserved memory.

Just add your own patch to UML to mlock the memory if you seriously think this is a good idea.

Myself, I would build a host with ample memory and no swap partition if swapping of the host worried me. (Note: /dev/anon is required to stop UMLs from being swapped out.)

> sound good too, but what i have in mind is a more static thing. even if
> the host goes mad and eats all the memory, i still want to be sure that
> the UML-kernels have their memory region and this won't get swapped out.

See the first paragraph above... or better yet, don't allow anything to run on the host which may go berserk and eat all the memory or cause heavy I/O.

Regards
Henrik
From: BlaisorBlade <bla...@ya...> - 2004-04-12 14:53:28
At 19:07 on Saturday 10 April 2004, Jeff Dike wrote:
> On Sat, Apr 10, 2004 at 04:00:25PM +0200, Sven Köhler wrote:
> > what's so bad about it?
>
> It can put the host at risk. There's a VM system for a reason, and it
> needs swap in order to be able to back out of memory shortages. mlocking
> large amounts of memory subverts that and increases the possibility of
> deadlocking.

Well, that is true, but reducing memory on the host also increases the possibility of OOM (not deadlock - only processes killed; you can deadlock only with the OOM killer, but with late 2.4 kernels you can disable it).

What is true is that mlock()ing cache memory is meaningless: each file should be cached either on the host or on the guest, and it is better to reduce the cache size by taking memory away from UML.

If we use humfs and mmap(), then the cache exists only on the host (and is mmap'ed on the guest). So we could mlock the memory actually used by UML (for instance the UML kernel, which should never be swapped).

-- 
Paolo Giarrusso, aka Blaisorblade
Linux registered user n. 292729
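Paolo's idea of pinning only the part of UML's memory that must stay resident (such as the UML kernel itself), rather than all of it, corresponds to calling `mlock()` on a sub-range of a larger mapping. A minimal sketch, assuming the caller already knows the offset and length of that region; the helper names are hypothetical. Because locking works in whole pages, the range is widened outward to page boundaries first. Linux rounds the start address down by itself, but POSIX allows an implementation to require a page-aligned address, so aligning explicitly is the portable choice.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Widen [addr, addr + len) outward to whole pages of size page. */
void page_span(uintptr_t addr, size_t len, size_t page,
               uintptr_t *start, size_t *span)
{
    assert(page != 0 && (page & (page - 1)) == 0);  /* power of two */
    uintptr_t mask = ~(uintptr_t)(page - 1);
    uintptr_t lo = addr & mask;
    uintptr_t hi = (addr + len + page - 1) & mask;
    *start = lo;
    *span = (size_t)(hi - lo);
}

/* Pin only [base + off, base + off + len) of a larger mapping, leaving
 * the rest swappable. Returns mlock()'s result: 0 on success, -1 on error. */
int lock_subrange(void *base, size_t off, size_t len)
{
    uintptr_t start;
    size_t span;
    page_span((uintptr_t)base + off, len, (size_t)sysconf(_SC_PAGESIZE),
              &start, &span);
    return mlock((void *)start, span);
}
```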
From: roland <for...@gm...> - 2004-04-15 00:24:43
hi !

> what's so bad about it?

Same question from me: what's so bad about mlocking? I would like to bring Matthew Bloch's patch back into the discussion, since it gives an OPTION to mlock a UML. See the thread at:
http://marc.theaimsgroup.com/?l=user-mode-linux-devel&w=2&r=1&s=Locking+user-mode+kernel+RAM+into+host+p&q=b

btw: I like options ;)

From Matthew's mail:

>The 2.4 kernel does appear to swap out
>applications when (e.g.) a large file copy is underway .
..
>I like the sound of the more dynamic memory arrangement and manager daemon for
>UMLs, sounds like a good way to squeeze more customers onto a host machine.
>But if for the moment the only thing it should do is to keep the host from
>swapping, I know from a few our customers that mlocking has given more
>predictable behaviour from guest kernels, especially if the host load spikes
>for a few minutes.

This is a statement from a person who has some advanced practical experience with UML!

>Anyone else who's running lots of UMLs with interactive sessions may have
>noticed the following symptoms which this patch cures:
>
> * irregular ping times to UMLs-- huge delay for the first ping, a few
> normal, a couple of long delays etc.;
> * long gaps in interactive response (having to "wake it up" with some
> keypresses), especially after connecting for the first time in a few
> hours.

As far as I remember, these problems have been reported repeatedly, so all users who experience weird UML lags could probably fix their problem just by activating that option. Furthermore, including the patch would "honour" the work of the person who wrote it. Personally, I find that important for an open-source project, because it is a form of "encouragement" to develop patches and gives credit to others. :)

Yes, it is somewhat "controversial" whether mlocking is good or bad, but since an option is an option, the "professional user" can decide for himself.

>It can put the host at risk.

I think someone who runs UML KNOWS what he is doing, and is able to calculate the amount of RAM on his host and reserve free RAM for I/O buffering or whatever else...

>mlocking large amounts of memory subverts that and increases the possibility of deadlocking.

Right! But this option could be documented with a "warning": use with care! This could endanger your system! Furthermore, that patch doesn't seem to be such a "big issue", because it's just a few lines of code being changed in UML...

I would like to give a (late) vote for inclusion of that patch.

regards
roland
From: David C. <li...@ed...> - 2004-04-15 11:57:36
> >I like the sound of the more dynamic memory arrangement and manager daemon for
> >UMLs, sounds like a good way to squeeze more customers onto a host machine.
> >But if for the moment the only thing it should do is to keep the host from
> >swapping, I know from a few our customers that mlocking has given more
> >predictable behaviour from guest kernels, especially if the host load spikes
> >for a few minutes.

mlocking would have the effect of being able to "squeeze" fewer customers onto a physical machine. Without mlocking, pages can be swapped out of memory onto disk, so your available memory is limited only by whatever size of swap partition you can afford. This way, even if you only have 256MB of physical memory, you can run ten UMLs with 64MB each, as an untested example figure. If you mlock those 64MB for each UML, you can only physically run four, not including space for host kernel and user-space memory. Obviously there are huge performance penalties for constantly swapping out pages, so more physical memory or faster swap does help.

> >Anyone else who's running lots of UMLs with interactive sessions may have
> >noticed the following symptoms which this patch cures:
> >
> > * irregular ping times to UMLs-- huge delay for the first ping, a few
> > normal, a couple of long delays etc.;

The huge delay is often related to ARP, a fundamental part of 802.3/true Ethernet.

> as far as i remember this problems have been reported repeatedly - so all users who experience
> weird uml lags could probably fix their problem just by activating that option.

The problem with "just activating that option" is that many people will do it blindly, unaware of the serious implications it could have for their host. Whilst mlocking an area of memory may not affect you now, or in the next ten minutes, it could cause trouble in three weeks when you start doing something memory-intensive on the host. In the last three years of running a remote server I have only had to turn up on site to fix it twice. Both were very silly things, one of them something I'd "just activated" to see what it would do. It cost me a rather expensive journey halfway across the country to fix.

> furthermore inclusion
> of the patch would "honour" the work of the person, who did that patch. personally, i find such thing
> important for an opensource project, because it is some form of "encouragement" to develop patches and
> giving credits to others. :)

It is definitely good to see a community working on patches for a project, but it must be recognised that the end result should be a professional, stable and reliable product. I am sure nobody has any problem with people posting patches on their own sites or on the mailing list to allow people to carry out certain tasks, but modifications that contain potentially unstable code should, in my opinion, be left out of a mainstream release. Note that I speak for myself here, not for UML or Jeff Dike.

> >It can put the host at risk.
> i think someone who runs a uml KNOWS what he does - and is able to calculate the amount of ram on his host
> and reserve free ram for i/o buffering or whatelse....

The idea behind UML is to create an environment separate from the host; constructing a UML kernel that can interfere so badly with the host is the total opposite.

> furthermore - that patch doesn`t seem to be such "big issue" because it`s just some lines of code being changed
> in uml.....

That statement is asinine. I could just change "some lines" in my shell scripts to read "rm -rf /" instead. Is that not a big issue? I do not understand kernel design very well at all, but changing "some lines" of source can have serious implications.

I agree with Jeff, mlocking is bad.

David
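David's capacity argument is simple arithmetic, sketched here with his own example figures (256MB of RAM, 64MB per UML). The helper names and the `host_reserve_mb` parameter (memory set aside for the host kernel and its processes, which his "four" leaves out) are illustrative assumptions. With mlock only physical RAM counts; without it, swap raises the ceiling, at the performance cost he mentions.

```c
/* How many UMLs of uml_mb each fit in budget_mb after reserving
 * host_reserve_mb for the host itself. */
long umls_that_fit(long budget_mb, long host_reserve_mb, long uml_mb)
{
    if (uml_mb <= 0 || budget_mb <= host_reserve_mb)
        return 0;
    return (budget_mb - host_reserve_mb) / uml_mb;
}

/* mlocked UMLs can only live in physical RAM. */
long max_locked_umls(long ram_mb, long host_reserve_mb, long uml_mb)
{
    return umls_that_fit(ram_mb, host_reserve_mb, uml_mb);
}

/* Swappable UMLs can spill into swap (slowly). */
long max_swappable_umls(long ram_mb, long swap_mb,
                        long host_reserve_mb, long uml_mb)
{
    return umls_that_fit(ram_mb + swap_mb, host_reserve_mb, uml_mb);
}
```

With no reserve, `max_locked_umls(256, 0, 64)` gives David's four; reserving 32MB for the host drops that to three, while 512MB of swap lets twelve swappable UMLs of 64MB fit on paper.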
From: Dan L. <ar...@co...> - 2004-04-15 16:11:35
David Cannings wrote:
>mlocking would have the effect of being able to "squeeze" less customers
>onto a physical machine. Without mlocking, pages can be swapped out of
>memory onto disk thus your available memory is limited only by whatever size
>swap partition you can afford.
>snip<

I normally stay quiet, but this is an area in which I have a vested interest with my UML usage. I have a machine with 8 gigs of RAM, and I'm running 4 UML sessions (along with a couple of VMware sessions for Windows PDC/BDC/etc.), and I see my UMLs swap out quite often. I run 512MB on each of them, and from 'free' I can see I have enough cache/buffer to spare that this shouldn't happen. It does it though. In a situation where you don't want your UMLs to swap out, an option would be nice. Just testing on another box with 2 gigs of RAM, I turned off swap, and now it's running gorgeously. I can't do that on my production system, though, because there are batch processes that occasionally use memory into the swap region.

>The problem with "just activating that option" is that many people will do
>it blindly, unaware of the serious implications it could have to their host.

It's best not to leave an option out because you think someone will use it without realizing what it is. If that were the case, the vanilla kernel should take out the packet generator, or kernel debugging. Both can have serious implications for either the local machine or another machine on the network. Heck, just selecting certain things wrong when configuring the Linux kernel can have serious implications. That's what the help portion of menuconfig is really for: to alert the user to how to properly handle that option.

>Whilst mlocking an area of memory may not affect you now, or in the next ten
>minutes, it could cause trouble in three weeks when you start doing
>something on the host that is memory intensive.

If it's mlocked, your host will use swap for the other processes once they have used the RAM that's still available. But it's predictable. If you have 1 gig of RAM and you mlock a 512MB UML, you always have 512MB left for the kernel/processes. If I'm not mistaken, VMware does it that way, or at least locks a certain amount of RAM in some way.

>In the last three years of
>running a remote server I have only had to turn up on site to fix it twice.
>Both were very silly things, one that I'd "just activated" to see what it
>would do. It cost me a rather expensive journey half way across the country
>to fix.

There is one thing I learned very early on: don't experiment with machines that you can't touch. I've been burned by that too many times, and it's a mantra I live by now. When I'm doing something new, I do it here. If it works, I implement it in Texas, 2 states east. If I turned on a "just activated" something in Texas, can you imagine what would happen if it broke?

>It is definitely good to see a community working on patches for a project
>but it must be recognised that the end result should be a professional,
>stable and reliable product.

And so it shall be, with or without mlock, or any other features that are implemented for that matter.

>I am sure nobody has any problem with people
>posting patches on their own sites or on the mailing list to allow people to
>carry out certain tasks but modifications that contain potentially unstable
>code should, in my opinion, be left out of a mainstream release. Note that
>I speak for myself here, not UML or Jeff Dike.

That's why any new code needs to be tested, run through the wringer so to speak, bug-fixed, and the cycle repeated as necessary. That goes for UML in general.

>The idea behind UML is to create a separate environment from the host,
>constructing a UML kernel that can interfere so badly with the host is the
>total opposite.

I fail to see how allocating all of your memory at the beginning is a bad thing. It's just a different philosophy of VM. It won't put your host at risk, unless your host runs the UML along with something else that will take ($TOTAL_RAM - $UMLRAM) + $SWAPSIZE. It's better than the alternative: running a host without a swap partition to keep it from swapping out.

>That statement is asinine. I could just change "some lines" in my shell
>scripts to read "rm -rf /" instead. Is that not a big issue? I do not
>understand kernel design very well at all but changing "some lines" of
>source can have serious implications.

Let's not get silly. Of course, as a competent script writer, you are going to realize that rm -rf / is a bad thing. The same goes for a coder realizing something is bad. If people had had this type of mindset back when initrd/ramdisk/software RAID were being put into the Linux kernel, they would never have made it into the official kernel. Options are good, even if they go against your convictions.

--Dan

-----------
This email message and any attachment may contain Confidential or Protected Health Information. If you are not the intended recipient please notify us immediately at 480-648-4545 and delete the message.
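Dan's "predictable" argument can be written down as a one-line check. A sketch under his stated model, with hypothetical names: with `uml_locked_mb` pinned, everything else on the host (kernel plus processes) has at most `(total_ram_mb - uml_locked_mb) + swap_mb` to live in, so the host is only at risk when that other load exceeds the headroom. Real kernels also need some free memory for page cache and their own allocations, so treat the result as optimistic.

```c
/* Memory left for everything that is not a locked UML: the unlocked
 * part of RAM plus all of swap, i.e. ($TOTAL_RAM - $UMLRAM) + $SWAPSIZE. */
long host_headroom_mb(long total_ram_mb, long uml_locked_mb, long swap_mb)
{
    return (total_ram_mb - uml_locked_mb) + swap_mb;
}

/* 1 if the rest of the host's load no longer fits in that headroom. */
int host_at_risk(long total_ram_mb, long uml_locked_mb,
                 long swap_mb, long other_mb)
{
    return other_mb > host_headroom_mb(total_ram_mb, uml_locked_mb, swap_mb);
}
```

With Dan's figures (1 gig of RAM, a 512MB UML locked, no swap), the rest of the host has 512MB; a 600MB workload would then be at risk unless swap is added.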
From: roland <for...@gm...> - 2004-04-15 21:05:34
damn - a really controversial topic! :) These are two completely different opinions. Personally, I don't really NEED that mlock patch very urgently, so I'm not really pissed off if it doesn't go into UML. But I would be sad if it were dropped completely.

Let's take an example: if I drive a car, I can kill other people or myself by accident. Does anybody stop driving cars, or do manufacturers stop building cars or limit their speed because of that fact? No. Cars give people an option to move quickly from one location to another, and it's up to them to decide whether that's worth putting other people's lives or their own at risk.

Maybe this isn't the best comparison, but I really don't see why a feature which is useful for some people shouldn't be included in a piece of software when it helps fix a problem for them. Just excluding that feature because of some "theory" that it COULD "hurt" others? That's the same as if the developers of "dd" disabled writing to files from that tool because somebody could accidentally fill up a whole hard drive with it (dd if=/dev/zero of=test.dat ...).

What about the fact that this patch actually seems to solve a problem and adds flexibility to UML? As I already said, there could be a big "WARNING" attached to that option, so everybody who uses it is made aware of the bells and whistles and the potential problems resulting from it. Something like this?:

+static int __init uml_mlock_setup(char *line, int *add)
+{
+	physmem_lock = 1;
+	return 0;
+}
+
+__uml_setup("mlock", uml_mlock_setup,
+"mlock\n"
+"    WARNING! Using this option could put your host at risk!\n"
+"    First read the documentation before using it!\n"
+"    \n"
+"    This option requests that the user-mode kernel's memory be locked\n"
+"    into the host kernel's RAM so that it cannot be swapped out.\n"
+"    This option requires that the user-mode kernel is run with root\n"
+"    privileges, which it drops after the mapping has been done.\n"
+);

To satisfy both sides, a good compromise could be for that patch to be maintained separately or (better) made a compile option which is OFF by default.

regards
roland
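Outside the UML tree, the behaviour described in that option's help text (a flag set by a bare `mlock` option, memory locked once the mapping exists, root privileges dropped afterwards) can be sketched as plain C. This is a hypothetical stand-alone illustration, not Matthew's actual patch: only the `physmem_lock` flag name comes from the snippet above, while `parse_mlock_option()` and `lock_then_drop()` are made-up helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Set when the "mlock" option is seen (flag name borrowed from the patch). */
static int physmem_lock;

/* Scan command-line words for a bare "mlock" option. */
void parse_mlock_option(int argc, char **argv)
{
    for (int i = 1; i < argc; i++)
        if (strcmp(argv[i], "mlock") == 0)
            physmem_lock = 1;
}

/* Called once the physical-memory region is mapped: lock it if requested,
 * then drop back to the invoking user's real ids. The group is dropped
 * first, while the process still has the privilege to change it.
 * Returns 0 on success, -1 if locking or dropping privileges failed. */
int lock_then_drop(void *physmem, size_t len)
{
    assert(physmem != NULL);
    if (physmem_lock && mlock(physmem, len) != 0)
        return -1;
    if (setgid(getgid()) != 0 || setuid(getuid()) != 0)
        return -1;
    return 0;
}
```

The ordering is the point of the design: the lock has to be taken while the process still runs as root, which is why the help text says privileges are dropped only after the mapping has been done. Pages locked before the drop stay locked afterwards.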
From: Matthew B. <ma...@by...> - 2004-04-16 15:12:36
David Cannings wrote:
>>>Anyone else who's running lots of UMLs with interactive sessions may have
>>>noticed the following symptoms which this patch cures:
>>>
>>> * irregular ping times to UMLs-- huge delay for the first ping, a few
>>> normal, a couple of long delays etc.;
>
> The huge delay is often related to ARP, a fundamental part of 802.3/true
> ethernet.

Indeed, but when logging back on to an ssh session etc. it's very
irritating for customers to find their machine takes seconds to "wake
up" while the 2.4 kernel swaps stuff back in.

[snip]
> The problem with "just activating that option" is that many people will do
> it blindly, unaware of the serious implications it could have to their host.
> Whilst mlocking an area of memory may not affect you now, or in the next ten
> minutes, it could cause trouble in three weeks when you start doing
> something on the host that is memory intensive. In the last three years of
> running a remote server I have only had to turn up on site to fix it twice.
> Both were very silly things, one that I'd "just activated" to see what it
> would do. It cost me a rather expensive journey half way across the country
> to fix.

Since writing that post back in October, we've deployed the mlock patch
across two rackfuls of hosts, and we've never had a host kernel "just
crash" (well, we have, but that was down to something else <g>): the
longest uptime on one group of our hosts is 100 days. Yes, it was
experimental, but we did a bit of testing first.

From this experience, I would disagree that using lots of mlocked
memory inherently destabilises the machine. I'm sure it *can*, but the
same goes for a lot of things.

[snip]
> The idea behind UML is to create a separate environment from the host,
> constructing a UML kernel that can interfere so badly with the host is the
> total opposite.

It's very easy for a UML kernel to trash the host, and I'm not sure
anyone knows what "the idea behind UML" is :-)

I can see Jeff's point about the mlock patch, even though it solved a
major problem for us when deploying UML in a hosting environment. Jeff's
proposed new memory management features will give us more flexibility
than my simple on/off mlock patch, and will allow him to wash his hands
of these grubby management decisions.

cheerio,

--
Matthew Bloch Bytemark Hosting
http://www.bytemark-hosting.co.uk/
phone UK: 0845 004 3 004 US: 1-877 BYTEMAR
Dedicated Linux hosts from 15ukp ($26) per month
|
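For readers unfamiliar with the mechanism being argued about: mlockall(2) asks the host kernel to keep all of a process's pages resident in RAM, so they are never swapped out. That is why, as described in this thread, an mlocked UML does not have to "wake up" from swap. It is also why it competes with everything else on the host for memory. The sketch below is not Matthew's patch, which lives inside the UML kernel and is not posted here. It only demonstrates the underlying system call from an ordinary process via Python's ctypes. It assumes Linux on x86 or ARM, where MCL_CURRENT=1 and MCL_FUTURE=2; other architectures such as PowerPC use different values. An unprivileged user with a small RLIMIT_MEMLOCK should expect it to fail with EPERM or ENOMEM.

```python
import ctypes
import errno
import os

# Assumption: Linux on x86/ARM, where <sys/mman.h> defines these values.
# Other architectures (e.g. PowerPC, SPARC, Alpha) use different numbers.
MCL_CURRENT = 1  # lock every page currently mapped
MCL_FUTURE = 2   # also lock pages mapped later (heap growth, new mmaps)

# dlopen(NULL): on Linux this exposes libc symbols such as mlockall().
_libc = ctypes.CDLL(None, use_errno=True)


def lock_all_memory(future=True):
    """Try to pin this process's memory in RAM so it cannot be swapped out.

    Returns (True, 0) on success, or (False, errno) if the kernel refused,
    e.g. EPERM (no CAP_IPC_LOCK) or ENOMEM (RLIMIT_MEMLOCK too small).
    """
    flags = MCL_CURRENT | (MCL_FUTURE if future else 0)
    if _libc.mlockall(flags) == 0:
        return True, 0
    return False, ctypes.get_errno()


def unlock_all_memory():
    """Undo lock_all_memory(); returns True on success."""
    return _libc.munlockall() == 0


if __name__ == "__main__":
    ok, err = lock_all_memory()
    if ok:
        print("memory locked; these pages will not be swapped out")
        unlock_all_memory()
    else:
        print("mlockall failed:", errno.errorcode.get(err, err), os.strerror(err))
```

The return-tuple design makes the failure case explicit. On a shared host, a refused lock is the normal outcome unless an administrator has deliberately granted the privilege or raised the limit.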
From: roland <for...@gm...> - 2004-04-18 10:48:27
|
Hey,

if the mlock patch doesn't go into UML officially, why not update it to
the latest 2.4.x and 2.6.x UML, upload it to the SourceForge patch
section at http://sourceforge.net/tracker/?group_id=429&atid=300429 and
"maintain" it there? That way others could find it, download it, use it
and give comments.

regards
roland

----- Original Message -----
From: "Matthew Bloch" <ma...@by...>
To: <use...@li...>
Cc: <use...@li...>
Sent: Friday, April 16, 2004 5:12 PM
Subject: Re: [uml-user] Network lags
|
From: Joern B. <jb...@bw...> - 2004-04-09 14:06:57
|
Hello Roland,

On Fri, 9 Apr 2004, roland wrote:

> Is the uml 100% idle all the time?

The UML vserver is completely idle.

> what`s going on on the host at the same time?

8 UML instances are running... that's basically all.

> uml is just another "process" on the host - and the scheduler
> (especially I/O) of the 2.4 kernel series is not the really best. 2.6 is
> MUCH better - so you probably could try 2.6 on HOST and compare, if that
> makes a difference?

I did some more testing and found out that it seems to be a problem with
the UML kernel.

By switching to the prebuilt UML kernel from
http://kernels.usermodelinux.org/kernels/linux-2.4.24-pre3-djc4-1um/mods/
I could get rid of the lags.

Anyway, as I want to find out which settings in my self-built 2.4.23
kernel are wrong, I'm currently trying several modified kernel setups.

I will let you know when I find out where the problem came from.

> http://sourceforge.net/mailarchive/message.php?msg_id=6285243
>
> so, you probably can do "some more" I/O on your host or on your UML (dd
> if=....) and study the ping "behaviour" ?

The pings didn't change while I put the host under stress with some dd
disk I/O.

My guess is that there are some problems in the TCP/IP or netfilter
stack. I don't think it's an I/O issue.

Thanks for your help!

Joern
|
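Roland's suggestion of studying the ping "behaviour" while the host is under load is easier with a small filter over the ping output. Below is a minimal sketch. It assumes the Linux ping line format quoted at the start of this thread ("icmp_seq=N ttl=64 time=X ms") and the default one-second send interval, and its 100 ms lag threshold is an arbitrary choice. Estimating when each reply actually arrived is the useful step. In the original capture, seq 53, 54 and 55 (2876.6, 1876.7 and 876.7 ms) all work out to roughly the same arrival time. That pattern suggests the packets were held back and then delivered in one burst, rather than each one being independently slow.

```python
import re

# Matches Linux ping lines such as:
#   64 bytes from 217.146.142.73: icmp_seq=53 ttl=64 time=2876.6 ms
PING_LINE = re.compile(r"icmp_seq=(\d+)\s+ttl=\d+\s+time=([\d.]+)\s*ms")


def parse_ping(lines):
    """Return a list of (seq, rtt_ms) tuples from ping output lines."""
    samples = []
    for line in lines:
        m = PING_LINE.search(line)
        if m:
            samples.append((int(m.group(1)), float(m.group(2))))
    return samples


def find_lags(samples, threshold_ms=100.0):
    """Return the samples whose round-trip time exceeds threshold_ms."""
    return [(seq, rtt) for seq, rtt in samples if rtt > threshold_ms]


def arrival_times(samples, interval_s=1.0):
    """Estimate when each reply arrived, in seconds after the first request.

    Assumes ping sent one request every interval_s seconds (the default),
    so request `seq` left at about (seq - first_seq) * interval_s.
    Replies with nearly equal arrival times were delivered in a burst.
    """
    if not samples:
        return []
    first_seq = samples[0][0]
    return [(seq, (seq - first_seq) * interval_s + rtt / 1000.0)
            for seq, rtt in samples]


if __name__ == "__main__":
    output = """\
64 bytes from 217.146.142.73: icmp_seq=52 ttl=64 time=0.4 ms
64 bytes from 217.146.142.73: icmp_seq=53 ttl=64 time=2876.6 ms
64 bytes from 217.146.142.73: icmp_seq=54 ttl=64 time=1876.7 ms
64 bytes from 217.146.142.73: icmp_seq=55 ttl=64 time=876.7 ms
64 bytes from 217.146.142.73: icmp_seq=56 ttl=64 time=1.8 ms"""
    samples = parse_ping(output.splitlines())
    print("lags:", find_lags(samples))
    for seq, t in arrival_times(samples):
        print(f"seq {seq}: reply at ~{t:.3f} s")
```

Piping `ping <uml-address>` into a script like this during a `dd` run, and then again on an idle host, gives two directly comparable lists of lags.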
From: roland <for...@gm...> - 2004-04-10 00:41:21
|
> By switching to the prebuild uml kernel from http://kernels.usermodelinux.org/kernels/linux-2.4.24-pre3-djc4-1um/mods/
> I could get rid of the lags.
>
> Anyway as I want to find out, which setting in my self build 2.4.23
> kernel are wrong, I'm currently trying several modified kernel setups.
>
> I will let you know, when I found out >>>>where that problem came from.

Nice to see that there are people out there really digging into a problem,
rather than just being satisfied that the problem has been "worked around"
for them (as most people tend to be) :)

----- Original Message -----
From: "Joern Bredereck" <jb...@bw...>
To: <use...@li...>
Sent: Friday, April 09, 2004 4:06 PM
Subject: Re: [uml-user] Network lags
|
From: Jeff D. <jd...@ad...> - 2004-04-09 15:53:38
|
jb...@bw... said:
> As you can see, most of the times, the pings are just fine. But every
> 10 to 20 seconds there is a lag. Sometime only 500 ms, sometimes 3000
> ms and sometimes even 10 seconds long.

As Roland said, vmstat on the host is a good start. Compare an episode of
good pinging with a bad one to see if anything is happening on the host.

Do the same inside the UML, for the same reason.

Then running tcpdump on the host's tap device and on the UML's eth0, and
comparing packet timestamps, might give some clue as to where the pings
are spending their time.

Jeff
|
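Jeff's timestamp comparison can be sketched as follows. The sketch assumes you have captured the same pings on the host (for example `tcpdump -n -i tap1 icmp`) and inside the UML (`tcpdump -n -i eth0 icmp`) and saved the text output. The regular expression expects a leading HH:MM:SS.ffffff timestamp and "echo request"/"echo reply" followed by "seq N", as printed by current tcpdump versions. The 2004-era output format may differ and need a different pattern. Differences across the two captures are only meaningful if the UML's clock tracks the host's closely. Captures that span midnight are not handled.

```python
import re

# Assumed tcpdump -n line format, e.g.:
# 10:27:31.123456 IP 217.146.142.84 > 217.146.142.73: ICMP echo request, id 1, seq 47, length 64
LINE = re.compile(
    r"^(\d+):(\d+):(\d+(?:\.\d+)?)\s.*?echo (request|reply)\b.*?\bseq (\d+)"
)


def parse_capture(lines):
    """Map ("request"|"reply", seq) -> seconds since midnight."""
    seen = {}
    for line in lines:
        m = LINE.search(line)
        if not m:
            continue
        h, mi, s, kind, seq = m.groups()
        t = int(h) * 3600 + int(mi) * 60 + float(s)
        seen.setdefault((kind, int(seq)), t)  # keep the first sighting only
    return seen


def legs(host, uml):
    """Split each ping's round trip into three legs, in milliseconds.

    to_uml:  request on host tap  -> request on UML eth0
    in_uml:  request on UML eth0  -> reply on UML eth0
    to_host: reply on UML eth0    -> reply on host tap
    Only sequence numbers seen in all four places are reported.
    """
    result = {}
    for (kind, seq), t_req_host in host.items():
        if kind != "request":
            continue
        needed = [("request", seq), ("reply", seq)]
        if not all(k in uml for k in needed) or ("reply", seq) not in host:
            continue
        result[seq] = (
            (uml[("request", seq)] - t_req_host) * 1000.0,
            (uml[("reply", seq)] - uml[("request", seq)]) * 1000.0,
            (host[("reply", seq)] - uml[("reply", seq)]) * 1000.0,
        )
    return dict(sorted(result.items()))


if __name__ == "__main__":
    host = parse_capture(open("host-tap1.txt"))  # hypothetical file names
    uml = parse_capture(open("uml-eth0.txt"))
    for seq, (a, b, c) in legs(host, uml).items():
        print(f"seq {seq}: to_uml {a:.1f} ms, in_uml {b:.1f} ms, to_host {c:.1f} ms")
```

A lag that shows up in `to_uml` or `to_host` points at delivery between the host tap device and the UML. A lag in `in_uml` points at the UML itself, including its network stack. Either way it narrows down which of Joern's suspects (kernel build, TCP/IP or netfilter, host I/O) deserves attention.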
From: Joern B. <jb...@bw...> - 2004-04-09 15:58:46
|
On Fri, 9 Apr 2004, Jeff Dike wrote:

> > As you can see, most of the times, the pings are just fine. But every
> > 10 to 20 seconds there is a lag. Sometime only 500 ms, sometimes 3000
> > ms and sometimes even 10 seconds long.
>
> As Roland said, vmstat on the host is a good start. Compare an episode of
> good pinging with a bad one to see if anything is happening on the host.
>
> Same thing inside the UML for the same reason.

As posted earlier, I could solve the problem by switching to the following
precompiled kernel:

http://kernels.usermodelinux.org/kernels/linux-2.4.23-rc3-djc3-6um/

This one works perfectly with all 8 UMLs that are running on that
particular host. There are no lags anymore.

The only thing that bothers me is that I couldn't find out what was wrong
with my self-built kernel.

Anyway, thanks for your help!

Joern
|
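Tracking down what was wrong with a self-built kernel, when a prebuilt one works, usually starts with comparing the two `.config` files. The sketch below is a minimal version of that comparison. It assumes you have both configuration files; the thread does not say whether the prebuilt kernel's configuration was published, so it may need to be obtained separately. It treats "# CONFIG_FOO is not set" and an option missing from the file as the same thing. Later kernel source trees include a `scripts/diffconfig` script for the same job.

```python
import re
import sys

SET = re.compile(r"^(CONFIG_\w+)=(.*)$")
UNSET = re.compile(r"^# (CONFIG_\w+) is not set$")


def parse_config(lines):
    """Return {option: value} for a kernel .config; unset options map to 'n'."""
    opts = {}
    for raw in lines:
        line = raw.strip()
        m = SET.match(line)
        if m:
            opts[m.group(1)] = m.group(2)
            continue
        m = UNSET.match(line)
        if m:
            opts[m.group(1)] = "n"
    return opts


def diff_configs(old, new):
    """Return sorted (option, old_value, new_value) tuples for options that differ.

    An option missing from one file is treated as 'n' (not set).
    """
    changes = []
    for opt in sorted(set(old) | set(new)):
        a, b = old.get(opt, "n"), new.get(opt, "n")
        if a != b:
            changes.append((opt, a, b))
    return changes


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python diffcfg.py <self-built .config> <prebuilt .config>
    with open(sys.argv[1]) as f_old, open(sys.argv[2]) as f_new:
        for opt, a, b in diff_configs(parse_config(f_old), parse_config(f_new)):
            print(f"{opt}: {a} -> {b}")
```

Each differing option can then be flipped in the self-built tree one at a time, with the ping test repeated after each rebuild, to find the setting responsible for the lags.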