From: David L. <da...@la...> - 2006-03-27 08:46:44
|
I foolishly attempted to start up 25 uml instances on one system (dual Opteron 252s with 8G of ram, each uml instance getting 256M)

what I found was that they seem to be getting in each other's way a LOT (just on system boot). vmstat on the host is showing almost all of the cpu time (80%+) being spent in the system, not in userspace (which surprised me)

so before I spend much time gathering info to try and debug this I wanted to ask what the current limits are. if the limits should just be cpu and ram, then I'll do more digging to find out what's happening in my case.

I definitely have some oddities among the 25: two of them did not get far enough to use their cow files (20K each according to ls -ls), and when I connect to them via uml_mconsole it hangs when I issue any command (including help)

I'm working from home so I won't see the consoles until I get in on Monday, so I'll give more details then unless I'm told that I am being silly to try to start this many on one system.

David Lang |
From: David L. <da...@la...> - 2006-03-27 23:42:56
|
On Mon, 27 Mar 2006, David Lang wrote:

> I foolishly attempted to startup 25 uml instances on one system (dual 252
> opterons with 8G of ram, each um instance getting 256M)
>
> what I found was that they seem to be getting in each others way a LOT (just
> on system boot), vmstat on the host is showing almost all of the cpu time
> (80%+) being spent in the system, not in userspace (which surprised me)
>
> so before I spend much time gathering info to try and debug this I wanted to
> ask what the current limits are, and if the limits should just be cpu and
> ram, then I'll do more digging to find out what's happening in my case.

well, I reduced the count to 19 instances, and upped the ram on each one to 400M (they were hitting oom with only 256m each)

almost an hour later the machines still haven't finished booting, with top looking basically the same for the last half hour or so.

top - 16:42:18 up 4 days, 23:11, 25 users,  load average: 16.79, 16.65, 16.23
Tasks: 44193 total,  14 running, 139 sleeping, 44040 stopped,   0 zombie
Cpu0 :  2.3% us, 97.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1 :  3.2% us, 96.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.1% si
Mem:   8186088k total,  8145620k used,    40468k free,    11436k buffers
Swap:  2048276k total,        0k used,  2048276k free,  4959636k cached

so far it looks to me like ram is Ok, but the high system percentage looks strange to me. the system closest to finishing its boot has used a little over 10 min of cpu time (>5x the normal wall clock time for the boot) so I am running into contention at some point here.

I upped the pid_max value to 128000 to give me some headroom there (each of the first 18 uml instances will end up running ~3600 processes when they finish booting)

what could I do to assist in tracking down what is causing the contention?

David Lang |
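The pid_max headroom mentioned in this message comes down to simple arithmetic. The following is a minimal sketch, assuming every guest process shows up as one process on the host; `fits_pid_max` is a hypothetical helper name. The figures 18, 3600, and 128000 are taken from the message, and 32768 is the usual default of /proc/sys/kernel/pid_max.

```shell
#!/bin/sh
# fits_pid_max INSTANCES PROCS_PER_INSTANCE PID_MAX
# Hypothetical helper: succeeds when the expected number of host
# processes stays below the given pid_max value.
fits_pid_max() {
    [ $(( $1 * $2 )) -lt "$3" ]
}

# 18 instances x ~3600 processes each = 64800 host processes.
if fits_pid_max 18 3600 32768; then
    echo "default pid_max is enough"
else
    echo "default pid_max is too small"
fi

# Compare against the value the running host actually uses, if readable.
if [ -r /proc/sys/kernel/pid_max ]; then
    current=$(cat /proc/sys/kernel/pid_max)
    if fits_pid_max 18 3600 "$current"; then
        echo "current pid_max ($current) leaves headroom"
    else
        echo "current pid_max ($current) is too small"
    fi
fi
```

With the numbers from the message the default limit falls short, which is consistent with the decision to raise it to 128000.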
From: Christopher S. A. <ca...@th...> - 2006-03-27 23:59:07
|
David Lang wrote:
> what could I do to assist in tracking down what is causing the contention?

Are you running a skas3-patched host kernel? You didn't mention if you were running in 32bit mode. I don't believe there's a skas patch for 64bit kernels (yet). IMO, skas3 is required for what you're after.

http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/

> I upped the pid_max value to 128000 to give me some headroom there (each
> of the first 18 uml instances will end up running ~3600 processes when
> they finish booting)

Sounds to me like you're running those UMLs in TT mode. If you can't/aren't going to patch your host with skas3, at least run a recent 2.6-um kernel in skas0 mode, which doesn't require a host kernel patch.

The other two things are:

Use the cfq disk scheduler. elevator=cfq on your kernel command line will do that, as long as it's compiled into your host's kernel.

Use a tmpfs mount for TMPDIR, as UML will use that to store its memory file.

-Chris |
From: David L. <da...@la...> - 2006-03-28 00:17:11
|
On Mon, 27 Mar 2006, Christopher S. Aker wrote:

> David Lang wrote:
>> what could I do to assist in tracking down what is causing the contention?
>
> Are you running a skas3-patched host kernel? You didn't mention if you were
> running in 32bit mode. I don't believe there's a skas patch for 64bit
> kernels (yet). IMO, skas3 is required what you're after.
>
> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/

sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 (which I understood included the skas patch)

>> I upped the pid_max value to 128000 to give me some headroom there (each
>> of the first 18 uml instances will end up running ~3600 processes when
>> they finish booting)
>
> Sounds to me like you're running those UMLs in TT mode. If you can't/aren't
> going to patch your host with skas3, at least run a recent 2.6-um kernel in
> skas0 mode, which doesn't require a host kernel patch.

they are running in skas mode, statically compiled. the um's are 32-bit 2.6.16, with TT mode disabled to enable static linking. the systems finish the boot sequence after using about 12 min of cpu time each.

> The other two things are:
>
> Use the cfq disk scheduler. elevator=cfq on your kernel command line will do
> that, as long as it's compiled into your host's kernel.
>
> Use tmpfs mount for TMPDIR, as UML will use that to store its memory file.

very little disk activity is taking place during this time. these are all COW root images from a ~300M base image

ls -ls shows

305356 -rw-rw-rw- 1 root  root  1073741825 2006-03-24 18:15 root_fs.basebuild2
  9580 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1a-b
 14880 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1a-p
 12632 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1b-b
  5512 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1b-p
  5584 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1c-b
 12180 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1c-p
 14716 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1d-b
 16164 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1d-p
 12916 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1e-b
  9504 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:13 root_fs.methane1e-p
 10896 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 12:19 root_fs.methane1z-b
  5780 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1z-p
  6516 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 15:36 root_fs.methane2a-p
  7980 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:14 root_fs.methane2b-p
  6056 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 15:36 root_fs.methane2c-p
  5968 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 15:36 root_fs.methane2d-p
  7704 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane2e-p
  7640 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:14 root_fs.methane2z-p
 19364 -rw-rw-rw- 1 dlang staff 1074016257 2006-03-27 17:15 root_fs.router

the command line to start these is

./linux-2.6.16-32 umid=methane1a-p ubd=memmap ubd0=root_fs.methane1a-p,root_fs.basebuild2 con0=tty:/dev/con-m1ap con=pty ssl=pty eth0=tuntap,m1ap-e0 eth1=tuntap,m1ap-e1 eth2=tuntap,m1ap-e2 eth3=tuntap,m1ap-e3 mem=400m stderr=1 </dev/con-m1ap >/dev/con-m1ap 2>/dev/con-m1ap &

David Lang |
From: Jeff D. <jd...@ad...> - 2006-03-28 02:26:34
|
On Mon, Mar 27, 2006 at 03:42:52PM -0800, David Lang wrote: > well, I reduced the count to 19 instances, and upped the ram on each one > to 400M (they were hitting oom with only 256m each) The UMLs were OOMing, not the host? And if you run one, it doesn't OOM? Make sure you have enough space in the tmpfs that they are using for their memory files. > top - 16:42:18 up 4 days, 23:11, 25 users, load average: 16.79, 16.65, > 16.23 > Tasks: 44193 total, 14 running, 139 sleeping, 44040 stopped, 0 zombie > Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si > Cpu1 : 3.2% us, 96.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.1% si Can you oprofile the system to see what's happening in that 97% system time? I did some oprofiling on x86_64 and a lot of time is spent in the scheduler - much more so than on i386. This looks like you've managed to confuse some part of the host into thrashing. > Mem: 8186088k total, 8145620k used, 40468k free, 11436k buffers > Swap: 2048276k total, 0k used, 2048276k free, 4959636k cached > > > so far it looks to me like ram is Ok, but the high system percentage looks > strange to me. the system closest to finishing it's boot has used a little > over 10 min of cpu time (>5x the normal wall clock time for the boot) so > I am running into contention at some point here. > > I upped the pid_max value to 128000 to give me some headroom there (each > of the first 18 uml instances will end up running ~3600 processes when > they finish booting) You mean they each have 3600 processes running in them after they've booted? Or they've run 3600 processes in the course of booting (and a smaller number will be running after they have booted)? The first is crazy, the second less so - my FC5 filesystem runs ~2000 processes during boot. FWIW, I've booted ~50 UMLs simultaneously on my laptop without any problem. Jeff |
From: David L. <da...@la...> - 2006-03-28 04:42:17
|
On Mon, 27 Mar 2006, Jeff Dike wrote:

> Date: Mon, 27 Mar 2006 21:27:39 -0500
> From: Jeff Dike <jd...@ad...>
> To: David Lang <da...@la...>
> Cc: use...@li...
> Subject: Re: [uml-user] what are the current limits on how many uml's on one
>     host?
>
> On Mon, Mar 27, 2006 at 03:42:52PM -0800, David Lang wrote:
>> well, I reduced the count to 19 instances, and upped the ram on each one
>> to 400M (they were hitting oom with only 256m each)
>
> The UMLs were OOMing, not the host? And if you run one, it doesn't
> OOM?

the OOM was happening inside the UML. with mem=256m they had trouble (individually or in multiples), with mem=512m they would grow to 368m, so I defined them as 400m each and solved that problem

> Make sure you have enough space in the tmpfs that they are using for
> their memory files.

hmm, I didn't define a tmpfs

>> top - 16:42:18 up 4 days, 23:11, 25 users, load average: 16.79, 16.65,
>> 16.23
>> Tasks: 44193 total, 14 running, 139 sleeping, 44040 stopped, 0 zombie
>> Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
>> Cpu1 : 3.2% us, 96.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.1% si
>
> Can you oprofile the system to see what's happening in that 97% system
> time?
>
> I did some oprofiling on x86_64 and a lot of time is spent in the
> scheduler - much more so than on i386. This looks like you've managed
> to confuse some part of the host into thrashing.

I did have oprofile compiled into my host kernel, unfortunately I just had the vmlinuz not the vmlinux, so I did the following

#cd /usr/src/linux-2.6.16
#cp /proc/config.gz .
#gunzip config.gz
#mv config .config
#make
#opcontrol --setup --vmlinux=vmlinux
#opcontrol --reset ; opcontrol --start; sleep 60; opcontrol --stop
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Reading module info.
oprofiled: /proc/modules not readable, can't process module samples.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Stopping profiling.
#opreport >oprofile

which produced the following report

Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
  3016151 97.4122 vmlinux
    48905  1.5795 linux-2.6.16-32
     7542  0.2436 vm_file-FY3CEc (deleted)
     7371  0.2381 oprofiled
     7335  0.2369 vm_file-C7DfCD (deleted)
     5639  0.1821 vm_file-g6ctgq (deleted)
     1180  0.0381 vm_file-3MLZxy (deleted)
      720  0.0233 mysqld
      611  0.0197 vm_file-KmrmJQ (deleted)
      321  0.0104 libc-2.3.2.so
      192  0.0062 bash
      116  0.0037 ld-2.3.2.so
       60  0.0019 vm_file-4oS0Fo (deleted)
       45  0.0015 vm_file-mu2rMT (deleted)
       33  0.0011 ISO8859-1.so
       18 5.8e-04 vm_file-8GxKww (deleted)
       15 4.8e-04 ntpd
        6 1.9e-04 libpthread-0.60.so
        5 1.6e-04 apache
        4 1.3e-04 grep
        2 6.5e-05 uml_switch
        1 3.2e-05 cat
        1 3.2e-05 rm
        1 3.2e-05 libdl-2.3.2.so
        1 3.2e-05 expr
        1 3.2e-05 tr

doing opreport -l vmlinux produces >700 lines of output, the top ones are

CPU: AMD64 processors, speed 2605.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1683585  55.8190  eligible_child
974622   32.3134  do_wait
212888    7.0583  __write_lock_failed
14138     0.4687  get_exec_dcookie
13944     0.4623  main_timer_handler
13599     0.4509  __sched_text_start
13145     0.4358  do_gettimeoffset_pm
5767      0.1912  wait_task_inactive
5589      0.1853  ia32_syscall
5129      0.1701  __switch_to
4566      0.1514  copy_user_generic_c
4096      0.1358  wait_task_stopped
3688      0.1223  sys32_ptrace
3673      0.1218  ptrace_get_task_struct
2801      0.0929  try_to_wake_up
2463      0.0817  recalc_task_prio
2066      0.0685  gs_change
1691      0.0561  ptrace_stop
1627      0.0539  pmtimer_mark_offset
1526      0.0506  arch_ptrace
1464      0.0485  effective_prio
1422      0.0471  do_gettimeofday
1255      0.0416  ptrace_check_attach
1244      0.0412  putreg32
1199      0.0398  do_notify_parent_cldstop
1193      0.0396  find_pid
1145      0.0380  retint_restore_args
1092      0.0362  getreg32

I've only generated an oprofile report once in the
past, so feel free to tell me other things I should have done. now, this was not with idle=poll on the boot line (I'd have to reboot and restart everything for that, I'll do that after I do everything I can without doing that since it will take an hour or two to get back into this steady state) >> Mem: 8186088k total, 8145620k used, 40468k free, 11436k buffers >> Swap: 2048276k total, 0k used, 2048276k free, 4959636k cached >> >> >> so far it looks to me like ram is Ok, but the high system percentage looks >> strange to me. the system closest to finishing it's boot has used a little >> over 10 min of cpu time (>5x the normal wall clock time for the boot) so >> I am running into contention at some point here. >> >> I upped the pid_max value to 128000 to give me some headroom there (each >> of the first 18 uml instances will end up running ~3600 processes when >> they finish booting) > > You mean they each have 3600 processes running in them after they've > booted? Or they've run 3600 processes in the course of booting (and a > smaller number will be running after they have booted)? > > The first is crazy, the second less so - my FC5 filesystem runs ~2000 > processes during boot. it's the first, after the system itself boots it starts 3600 user processes. as the systems all finally booted I ended up with >50,000 processes showing in the host (I was actually glad to see this, the current production systems these would be simulating tend to die when they hit ~10,000 processes) > FWIW, I've booted ~50 UMLs simultaneously on my laptop without any > problem. good to know logging into work from home I see that with a dozen (more or less) of these systems running it's hit steady state. 

top - 20:55:54 up  2:47,  2 users,  load average: 4.46, 4.38, 4.38
Tasks: 28782 total,   2 running,  95 sleeping, 28685 stopped,   0 zombie
Cpu0 :  4.0% us, 90.0% sy,  0.0% ni,  5.9% id,  0.0% wa,  0.0% hi,  0.0% si
Cpu1 :  3.8% us, 90.1% sy,  0.0% ni,  5.9% id,  0.1% wa,  0.0% hi,  0.1% si
Mem:   8186088k total,  5506436k used,  2679652k free,     7908k buffers
Swap:  2048276k total,        0k used,  2048276k free,  3398960k cached

the load average during boot was ~13 or so for the first 45 min or so, but it's still sitting at 90% system on each cpu. each UML is running ~3600 processes that are sitting listening for network connections, plus heartbeat, which attempts to send one udp packet every 2 seconds from each box over each of three interfaces.

David Lang |
From: David L. <da...@la...> - 2006-03-28 06:20:54
|
rebooting my system with idle=poll and profile=2 I get the following profile while the UML instances are trying to start up (again a 60 second sleep between start and end; this is somewhat longer than 60 seconds since the machine is so bogged down)

CPU: AMD64 processors, speed 2605.96 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
3527977  41.2259  eligible_child
2601815  30.4033  do_wait
1857112  21.7011  __write_lock_failed
47511     0.5552  __sched_text_start
38076     0.4449  main_timer_handler
28270     0.3303  do_gettimeoffset_pm

readprofile (same 60 sec sleep) results in

sort -rn profile.out |head
150120 total                      0.0353
 59483 do_wait                   63.6182
 50935 eligible_child           282.9722
 33870 wait_task_stopped         54.2788
  1006 thread_return              4.6147
   289 ia32_syscall               4.2500
   252 ptrace_stop                0.6462
   252 do_notify_parent_cldstop   0.9000
   190 schedule                   0.1203

I'll do another run after things finish stabilizing.

David Lang |
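To put one number on how concentrated this profile is, the percentage column of the top few symbols can be summed. The following is a minimal sketch, assuming input in the `samples % symbol name` layout that opreport -l printed in this thread; `top_share` is a hypothetical helper name, not part of oprofile, and the three rows fed to it are copied from the message.

```shell
#!/bin/sh
# top_share N
# Hypothetical helper: read "samples percent symbol" rows on stdin and
# print the summed percentage of the first N rows. Lines whose first
# field is not a plain integer (headers, CPU info) are skipped.
top_share() {
    awk -v n="$1" '
        BEGIN { seen = 0; sum = 0 }
        $1 ~ /^[0-9]+$/ && seen < n + 0 { sum += $2; seen++ }
        END { printf "%.4f\n", sum }
    '
}

# The three hottest symbols from the idle=poll run.
top_share 3 <<'EOF'
samples  %        symbol name
3527977  41.2259  eligible_child
2601815  30.4033  do_wait
1857112  21.7011  __write_lock_failed
EOF
```

On a live system the same helper could be fed directly, for example `opreport -l vmlinux | top_share 3`. For the rows above the three symbols together account for a little over 93% of the samples.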
From: David L. <da...@la...> - 2006-03-28 19:07:52
|
after things stabilize the results look basically the same (the exact numbers vary slightly, but the top symbols remain the same)

should I forward this to linux-kernel as well?

David Lang

On Mon, 27 Mar 2006, David Lang wrote:

> rebooting my system with idle=poll and profile=2 I get the following profile
> while the UML instances are trying to startup (again a 60 second sleep
> between start and end, this is somewhat longer than 60 seconds since the
> machine is so bogged down)
>
> CPU: AMD64 processors, speed 2605.96 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
> mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 3527977  41.2259  eligible_child
> 2601815  30.4033  do_wait
> 1857112  21.7011  __write_lock_failed
> 47511     0.5552  __sched_text_start
> 38076     0.4449  main_timer_handler
> 28270     0.3303  do_gettimeoffset_pm
>
> readprofile (same 60 sec sleep) results in
>
> sort -rn profile.out |head
> 150120 total                      0.0353
>  59483 do_wait                   63.6182
>  50935 eligible_child           282.9722
>  33870 wait_task_stopped         54.2788
>   1006 thread_return              4.6147
>    289 ia32_syscall               4.2500
>    252 ptrace_stop                0.6462
>    252 do_notify_parent_cldstop   0.9000
>    190 schedule                   0.1203
>
> I'll do another run after things finish stabilizing.
>
> David Lang |
From: Blaisorblade <bla...@ya...> - 2006-03-28 20:03:33
|
On Tuesday 28 March 2006 01:58, Christopher S. Aker wrote:
> David Lang wrote:
> > I upped the pid_max value to 128000 to give me some headroom there (each
> > of the first 18 uml instances will end up running ~3600 processes when
> > they finish booting)
>
> Sounds to me like you're running those UMLs in TT mode.

Process proliferation happens even if they work in SKAS0 mode.

--
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade |
From: David L. <da...@la...> - 2006-03-28 20:29:00
|
On Tue, 28 Mar 2006, Blaisorblade wrote: > On Tuesday 28 March 2006 01:58, Christopher S. Aker wrote: >> David Lang wrote: > >> > I upped the pid_max value to 128000 to give me some headroom there (each >> > of the first 18 uml instances will end up running ~3600 processes when >> > they finish booting) > >> Sounds to me like you're running those UMLs in TT mode. > > Process proliferation happens even if they work in SKAS0 mode. should I be applying the SKAS3 patches? (I can do this for testing, but as I go into production I'll need to use the vanilla kernels, are these getting close to being merged?) David Lang |
From: Blaisorblade <bla...@ya...> - 2006-03-28 20:41:33
|
On Tuesday 28 March 2006 22:28, David Lang wrote:
> On Tue, 28 Mar 2006, Blaisorblade wrote:
>
> should I be applying the SKAS3 patches? (I can do this for testing, but as
> I go into production I'll need to use the vanilla kernels, are these
> getting close to being merged?)

No, we need to redesign them first; there are also various ideas (some only prototyped, some going to be included in -mm) that can improve performance further, so when they are ready we'll merge the final form. |
From: Blaisorblade <bla...@ya...> - 2006-03-28 21:00:44
|
On Tuesday 28 March 2006 02:16, David Lang wrote:
> On Mon, 27 Mar 2006, Christopher S. Aker wrote:
> > David Lang wrote:
> >> what could I do to assist in tracking down what is causing the
> >> contention?
>
> > Are you running a skas3-patched host kernel? You didn't mention if you
> > were running in 32bit mode. I don't believe there's a skas patch for
> > 64bit kernels (yet). IMO, skas3 is required what you're after.
>
> > http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
>
> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 (which
> I understood included the skas patch)

No, it doesn't if you don't patch it. Add the patch, but you can't run them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but you can then also pass "noprocmm" to force half SKAS3.

> > Sounds to me like you're running those UMLs in TT mode. If you
> > can't/aren't going to patch your host with skas3, at least run a recent
> > 2.6-um kernel in skas0 mode, which doesn't require a host kernel patch.
>
> they are running in skas mode, staticly compiled. the um's are 32-bit
> 2.6.16 TT mode disabled to enable static linking. the systems finish the
> boot sequence after useing about 12 min of cpu time each.
>
> > Use tmpfs mount for TMPDIR, as UML will use that to store its memory
> > file.
>
> very little disk activity is takeing place during this time these are all
> COW root images from a ~300M base image

He's talking about UML's ram, not about disk images - that's mmapped from a file in $TMPDIR (normally /tmp). |
From: David L. <da...@la...> - 2006-03-28 22:09:02
|
On Tue, 28 Mar 2006, Blaisorblade wrote:

>>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
>
>> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 (which
>> I understood included the skas patch)
>
> No, it doesn't if you don't patch it. Add the patch, but you can't run them in
> full SKAS3; you can pass "mode=skas0" to force skas0 mode, but you can then
> also pass "noprocmm" to force half SKAS3.

Ok, I've gone through and read the docs on blaisorblade's pages about SKAS, and I'm still not understanding things.

the 2.6.16 kernel includes a SKAS option in the configs (it only shows if you have TT mode enabled)

when they boot up the uml's report that they are starting in skas0 mode

however, the discussions you have up don't seem to match the behavior of the resulting system (the discussions talk about TT vs SKAS mode, is this TT vs SKAS3 mode?)

it sounds as if I need to apply the SKAS patches and then pass "noprocmm" to get the 'half SKAS3' mode

is this correct?

>>> Sounds to me like you're running those UMLs in TT mode. If you
>>> can't/aren't going to patch your host with skas3, at least run a recent
>>> 2.6-um kernel in skas0 mode, which doesn't require a host kernel patch.
>
>> they are running in skas mode, staticly compiled. the um's are 32-bit
>> 2.6.16 TT mode disabled to enable static linking. the systems finish the
>> boot sequence after useing about 12 min of cpu time each.
>
>>> Use tmpfs mount for TMPDIR, as UML will use that to store its memory
>>> file.
>
>> very little disk activity is takeing place during this time these are all
>> COW root images from a ~300M base image
>
> He's talking about UML's ram, not about disk images - that's mmapped from a
> file in $TMPDIR (normally /tmp).

Ok, I'll define $TMPDIR to be /dev/shm (which debian mounts a tmpfs on) and try this again

David Lang |
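Before pointing $TMPDIR at a tmpfs, it may be worth checking that the mount can hold every guest's memory file. This is a minimal sketch, assuming the worst case where each guest eventually touches all of its mem= allocation; `needed_mb` and `free_mb` are hypothetical helper names, and the figures (19 instances, 400M each) come from earlier in the thread. Note that a tmpfs mounted without a size= option defaults to half of the host's RAM.

```shell
#!/bin/sh
# needed_mb COUNT MEM_MB
# Hypothetical helper: space in MB for COUNT memory files of MEM_MB each.
needed_mb() {
    echo $(( $1 * $2 ))
}

# free_mb DIR
# Free space in MB on the filesystem holding DIR (POSIX df -P layout,
# where field 4 of the second line is the available space in KB).
free_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

dir=${TMPDIR:-/tmp}
need=$(needed_mb 19 400)
have=$(free_mb "$dir")
if [ "${have:-0}" -ge "$need" ]; then
    echo "$dir has room: ${have} MB free, ${need} MB needed"
else
    echo "$dir is too small: ${have:-0} MB free, ${need} MB needed"
fi
```

Run with TMPDIR=/dev/shm to check the mount proposed in the message above.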
From: Blaisorblade <bla...@ya...> - 2006-03-28 22:18:22
|
On Wednesday 29 March 2006 00:09, David Lang wrote:
> On Tue, 28 Mar 2006, Blaisorblade wrote:
> >>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
> >>
> >> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16
> >> (which I understood included the skas patch)
> >
> > No, it doesn't if you don't patch it. Add the patch, but you can't run
> > them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but
> > you can then also pass "noprocmm" to force half SKAS3.
>
> Ok, I've gone through and read the docs on blaisorblade's pages about
> SKAS, and I'm still not understanding things.
>
> the 2.6.16 kernel includes a SKAS option in the configs (it only shows if
> you have TT mode enabled)

Because otherwise it's auto-enabled.
We're talking of guest support, in case it's not clear.

However:
*) for ages, SKAS meant SKAS3. And SKAS3 requires a patch on the host.
*) Now SKAS also includes SKAS0; SKAS0 was born some time ago, doesn't require special host support, and is not as fast as SKAS3, but a lot faster than TT mode.
*) there are 3 differences between SKAS0 and SKAS3, and they can be individually enabled; /proc/mm doesn't work on an x86_64 host, but with "noprocmm" you enable the others, which are more important for performance. That's what I called in the previous mail "half SKAS3" (you won't find references to this term anywhere else).

> when they boot up the uml's report that they are starting in skas0 mode

Exactly, matches with the above.

> however, the discussions you have up don't seem to match the bahavior of
> the resulting system (the discussions talk about TT vs SKAS mode, is this
> TT vs SKAS3 mode?)

Can't find the exact quote, however "SKAS" could have meant "SKAS3" specifically...

For instance, in full SKAS3 you wouldn't get thousands of processes on the host, and instead each UML would start about 5 processes.

So "Chris Aker" said "I think you are running in TT mode", but he forgot that SKAS0 produces similar results (but SKAS0 currently starts a single thread per guest process, even if the guest process is creating more threads).

> it sounds as if I need to apply the SKAS patches and then pass "noprocmm"
> to get the 'half SKAS3' mode
> is this correct?

Yes. |
From: David L. <da...@la...> - 2006-03-28 22:33:32
|
On Wed, 29 Mar 2006, Blaisorblade wrote: > On Wednesday 29 March 2006 00:09, David Lang wrote: >> On Tue, 28 Mar 2006, Blaisorblade wrote: >>>>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/ >>>> >>>> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 >>>> (which I understood included the skas patch) >>> >>> No, it doesn't if you don't patch it. Add the patch, but you can't run >>> them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but >>> you can then also pass "noprocmm" to force half SKAS3. >> >> Ok, I've gone through and read the docs on blaisorblade's pages about >> SKAS, and I'm still not understanding things. > >> the 2.6.16 kernel includes a SKAS option in the configs (it only shows if >> you have TT mode enabled) > > Because otherwise it's auto-enabled. > We're talking of guest support, in case it's not clear. > > However: > *) for ages, SKAS meant SKAS3. And SKAS3 requires a patch on the host. > *) Now SKAS includes also SKAS0; SKAS0 was born some time ago, doesn't require > special host support, and is not as fast as SKAS3, but a lot faster than TT > mode. these are the pieces I was missing, thanks. I just downloaded the 2.6.16-bb1 patchset, does it include the SKAS3 patches or should I install the skas-2.6.16-v9-pre9 patchset as well? David Lang |
From: Blaisorblade <bla...@ya...> - 2006-03-28 22:34:35
|
On Tuesday 28 March 2006 01:42, David Lang wrote:
> On Mon, 27 Mar 2006, David Lang wrote:
> > I foolishly attempted to startup 25 uml instances on one system (dual 252
> > opterons with 8G of ram, each um instance getting 256M)
>
> > what I found was that they seem to be getting in each others way a LOT
> > (just on system boot), vmstat on the host is showing almost all of the
> > cpu time (80%+) being spent in the system, not in userspace (which
> > surprised me)
>
> > so before I spend much time gathering info to try and debug this I wanted
> > to ask what the current limits are, and if the limits should just be cpu
> > and ram, then I'll do more digging to find out what's happening in my
> > case.
>
> well, I reduced the count to 19 instances, and upped the ram on each one
> to 400M (they were hitting oom with only 256m each)
>
> almost an hour later the machines still haven't finished booting with top
> looking basicly the same for the last half hour or so.
>
> top - 16:42:18 up 4 days, 23:11, 25 users, load average: 16.79, 16.65, 16.23
> Tasks: 44193 total, 14 running, 139 sleeping, 44040 stopped, 0 zombie
> Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
> Cpu1 : 3.2% us, 96.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.1% si
> Mem: 8186088k total, 8145620k used, 40468k free, 11436k buffers
> Swap: 2048276k total, 0k used, 2048276k free, 4959636k cached
>
> so far it looks to me like ram is Ok, but the high system percentage looks
> strange to me. the system closest to finishing it's boot has used a little
> over 10 min of cpu time (>5x the normal wall clock time for the boot) so
> I am running into contention at some point here.

I know that it's maybe a bad workaround, but what about sequential startup both of UMLs and of the jobs inside them?

I'd run "vmstat 1" to watch for an increase in context switches - an excessive amount of them is likely to burn you out.

Look below (I'm selecting the context switch count with awk) - the low numbers (~1000-2000) are with the system running only a CPU-hog in background, the high ones (~100 000) are when I run inside UML:

$ while :; do /bin/true; done
$ vmstat 1|awk '{print $12}'
cs
2714 113208 92306 109654 82226 1478 1235 84262 114143 115037 112424

But with apache benchmark on a UML, I can get higher numbers:

$ ab2 -t 30 -v 1 Sarge/apache2-default/

cs
5170 945 1089 977 756 1083 24879 106316 119013 121471 99706 122907 127361 108613 130089 126837 123382 116747 130797 131173 129478
cs
124357 124892 102280 127380 129434 109804 113807 123767 133519 125085 119384 129109 129997 126418 60348 892 675 743 851 655 780 |
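The context-switch rate watched here with vmstat can also be read straight from the kernel's cumulative counter. The following is a minimal sketch, assuming a Linux host where /proc/stat carries a `ctxt` line; `read_ctxt` and `cs_per_sec` are hypothetical helper names.

```shell
#!/bin/sh
# read_ctxt
# Print the cumulative context-switch count from /proc/stat (Linux).
read_ctxt() {
    awk '$1 == "ctxt" { print $2 }' /proc/stat
}

# cs_per_sec BEFORE AFTER SECONDS
# Hypothetical helper: average switches per second between two readings.
cs_per_sec() {
    echo $(( ($2 - $1) / $3 ))
}

# Sample the counter twice, one second apart, and report the rate.
if [ -r /proc/stat ]; then
    before=$(read_ctxt)
    sleep 1
    after=$(read_ctxt)
    echo "context switches/s: $(cs_per_sec "$before" "$after" 1)"
fi
```

A longer interval smooths the figure out; the one-second window here simply mirrors "vmstat 1".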
From: David L. <da...@la...> - 2006-03-28 22:47:04
|
On Wed, 29 Mar 2006, Blaisorblade wrote: > On Tuesday 28 March 2006 01:42, David Lang wrote: >> yOn Mon, 27 Mar 2006, David Lang wrote: >>> I foolishly attempted to startup 25 uml instances on one system (dual 252 >>> opterons with 8G of ram, each um instance getting 256M) > >>> what I found was that they seem to be getting in each others way a LOT >>> (just on system boot), vmstat on the host is showing almost all of the >>> cpu time (80%+) being spent in the system, not in userspace (which >>> surprised me) > >>> so before I spend much time gathering info to try and debug this I wanted >>> to ask what the current limits are, and if the limits should just be cpu >>> and ram, then I'll do more digging to find out what's happening in my >>> case. > >> well, I reduced the count to 19 instances, and upped the ram on each one >> to 400M (they were hitting oom with only 256m each) > >> almost an hour later the machines still haven't finished booting with top >> looking basicly the same for the last half hour or so. > >> top - 16:42:18 up 4 days, 23:11, 25 users, load average: 16.79, 16.65, >> 16.23 Tasks: 44193 total, 14 running, 139 sleeping, 44040 stopped, 0 >> zombie Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, >> 0.0% si Cpu1 : 3.2% us, 96.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, >> 0.1% si Mem: 8186088k total, 8145620k used, 40468k free, 11436k >> buffers Swap: 2048276k total, 0k used, 2048276k free, 4959636k >> cached > >> so far it looks to me like ram is Ok, but the high system percentage looks >> strange to me. the system closest to finishing it's boot has used a little >> over 10 min of cpu time (>5x the normal wall clock time for the boot) so >> I am running into contention at some point here. > > I know that it's maybe a bad workaround, but what about sequential startup > both of UMLs and of the jobs inside them? 

I'll try it for a test and let you know how it works.

> I'd run "vmstat 1" to watch for an increase in context switches - an excessive
> amount of them is likely to burn you out.

I'll check for this, but this would surprise me. inside the umls the only thing that is actively running is heartbeat (linux-ha.org). even with a dozen copies running (one per uml) this should only generate a small amount of traffic (18 udp packets sent per second to the broadcast addresses for all 12 boxes combined)

David Lang
|
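The sequential startup suggested above could be sketched roughly as follows. The function name start_sequentially, the launcher argument, and the fixed delay are all assumptions for illustration; nothing in the thread says how the instances are actually launched or how to tell that one has finished booting.

```shell
#!/bin/sh
# Start instances one at a time, pausing between launches.
#   $1      seconds to wait after each launch (assumed knob)
#   $2      command that starts one instance, given its name (hypothetical)
#   $3...   instance names
start_sequentially() {
    delay=$1
    launcher=$2
    shift 2
    for name in "$@"; do
        # Stop at the first launcher failure instead of piling up more load.
        "$launcher" "$name" || return 1
        sleep "$delay"
    done
}
```

With a hypothetical launcher script, a call might look like `start_sequentially 60 ./start-uml.sh uml01 uml02 uml03`. A fixed delay is the crudest possible pacing; watching the host's context-switch rate settle before starting the next instance would be a closer fit to the advice quoted above.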
From: Blaisorblade <bla...@ya...> - 2006-03-28 22:43:47
|
On Wednesday 29 March 2006 00:33, David Lang wrote:
> On Wed, 29 Mar 2006, Blaisorblade wrote:
> > On Wednesday 29 March 2006 00:09, David Lang wrote:
> >> On Tue, 28 Mar 2006, Blaisorblade wrote:
> >>>>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
> >>>>
> >>>> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16
> >>>> (which I understood included the skas patch)
> >>>
> >>> No, it doesn't if you don't patch it. Add the patch, but you can't run
> >>> them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but
> >>> you can then also pass "noprocmm" to force half SKAS3.
> >>
> >> Ok, I've gone through and read the docs on blaisorblade's pages about
> >> SKAS, and I'm still not understanding things.
> >>
> >> the 2.6.16 kernel includes a SKAS option in the configs (it only shows
> >> if you have TT mode enabled)
> >
> > Because otherwise it's auto-enabled.
> > We're talking of guest support, in case it's not clear.
> >
> > However:
> > *) for ages, SKAS meant SKAS3. And SKAS3 requires a patch on the host.
> > *) Now SKAS includes also SKAS0; SKAS0 was born some time ago, doesn't
> > require special host support, and is not as fast as SKAS3, but a lot
> > faster than TT mode.
>
> these are the pieces I was missing, thanks.
>
> I just downloaded the 2.6.16-bb1 patchset, does it include the SKAS3
> patches or should I install the skas-2.6.16-v9-pre9 patchset as well?

It includes skas as well, but remember you must do mrproper between building UML and host kernel or use O= with two different output directories (as I do):

mkdir ../BUILD
OUT=../BUILD/um-linux-2.6.16-build
mkdir $OUT
make ARCH=um SUBARCH=i386 O=$OUT menuconfig
make ARCH=um SUBARCH=i386 O=$OUT

OUT_HOST=../BUILD/64-linux-2.6.16-build
mkdir $OUT_HOST
make O=$OUT_HOST menuconfig
make O=$OUT_HOST

Bye
|
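To tell a patched host from an unpatched one, a rough check is whether /proc/mm exists. This rests on the assumption that the SKAS3 host patch is what provides that file (the "noprocmm" option mentioned above hints at this); it says nothing about which mode a given guest will actually pick. The function name skas_host_check is hypothetical:

```shell
#!/bin/sh
# Report whether the host appears to carry the SKAS3 patch.
# Assumption: a SKAS3-patched host kernel exposes /proc/mm.
# The path is a parameter so the check can be tried against any file.
skas_host_check() {
    if [ -e "${1:-/proc/mm}" ]; then
        echo "skas3-patched"
    else
        echo "unpatched"
    fi
}

# Intended use on the host:
#   skas_host_check
```

On an unpatched host a guest can still run in skas0 mode, as described above, so "unpatched" here is a statement about the host only.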
From: David L. <da...@la...> - 2006-03-30 00:11:47
|
On Wed, 29 Mar 2006, Blaisorblade wrote:

>> these are the pieces I was missing, thanks.
>>
>> I just downloaded the 2.6.16-bb1 patchset, does it include the SKAS3
>> patches or should I install the skas-2.6.16-v9-pre9 patchset as well?
>
> It includes skas as well, but remember you must do mrproper between building
> UML and host kernel or use O= with two different output directories (as I
> do):

I thought that what I had downloaded was the patches between 2.6.16 and 2.6.16-bb1, but instead it was the 2.6.16-bb1 precompiled kernel and modules (downloaded from http://www.user-mode-linux.org/~blaisorblade/binaries/2.6.16-bb1/).

I see guest patches at http://www.user-mode-linux.org/~blaisorblade/patches/guest/uml-2.6.16-bb1/

is there a similar set of patches for the host? or is this for both?

David Lang
|