From: David L. <da...@la...> - 2006-03-28 04:42:17
On Mon, 27 Mar 2006, Jeff Dike wrote:

> Date: Mon, 27 Mar 2006 21:27:39 -0500
> From: Jeff Dike <jd...@ad...>
> To: David Lang <da...@la...>
> Cc: use...@li...
> Subject: Re: [uml-user] what are the current limits on how many uml's on
>     one host?
>
> On Mon, Mar 27, 2006 at 03:42:52PM -0800, David Lang wrote:
>> well, I reduced the count to 19 instances, and upped the ram on each one
>> to 400M (they were hitting oom with only 256m each)
>
> The UMLs were OOMing, not the host? And if you run one, it doesn't
> OOM?

The OOM was happening inside the UMLs. With mem=256m they had trouble
(individually or in multiples); with mem=512m they would grow to 368m, so I
defined them as 400m each and that solved the problem.

> Make sure you have enough space in the tmpfs that they are using for
> their memory files.

Hmm, I didn't define a tmpfs.

>> top - 16:42:18 up 4 days, 23:11, 25 users, load average: 16.79, 16.65,
>> 16.23
>> Tasks: 44193 total, 14 running, 139 sleeping, 44040 stopped, 0 zombie
>> Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
>> Cpu1 : 3.2% us, 96.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.1% si
>
> Can you oprofile the system to see what's happening in that 97% system
> time?
>
> I did some oprofiling on x86_64 and a lot of time is spent in the
> scheduler - much more so than on i386. This looks like you've managed
> to confuse some part of the host into thrashing.

I did have oprofile compiled into my host kernel. Unfortunately I only had
the vmlinuz, not the vmlinux, so I did the following:

#cd /usr/src/linux-2.6.16
#cp /proc/config.gz .
#gunzip config.gz
#mv config .config
#make
#opcontrol --setup --vmlinux=vmlinux
#opcontrol --reset ; opcontrol --start; sleep 60; opcontrol --stop

Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Reading module info.
oprofiled: /proc/modules not readable, can't process module samples.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Stopping profiling.

#opreport >oprofile

which produced the following report:

Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
  3016151 97.4122 vmlinux
    48905  1.5795 linux-2.6.16-32
     7542  0.2436 vm_file-FY3CEc (deleted)
     7371  0.2381 oprofiled
     7335  0.2369 vm_file-C7DfCD (deleted)
     5639  0.1821 vm_file-g6ctgq (deleted)
     1180  0.0381 vm_file-3MLZxy (deleted)
      720  0.0233 mysqld
      611  0.0197 vm_file-KmrmJQ (deleted)
      321  0.0104 libc-2.3.2.so
      192  0.0062 bash
      116  0.0037 ld-2.3.2.so
       60  0.0019 vm_file-4oS0Fo (deleted)
       45  0.0015 vm_file-mu2rMT (deleted)
       33  0.0011 ISO8859-1.so
       18 5.8e-04 vm_file-8GxKww (deleted)
       15 4.8e-04 ntpd
        6 1.9e-04 libpthread-0.60.so
        5 1.6e-04 apache
        4 1.3e-04 grep
        2 6.5e-05 uml_switch
        1 3.2e-05 cat
        1 3.2e-05 rm
        1 3.2e-05 libdl-2.3.2.so
        1 3.2e-05 expr
        1 3.2e-05 tr

Running opreport -l vmlinux produces >700 lines of output; the top entries
are:

CPU: AMD64 processors, speed 2605.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1683585 55.8190  eligible_child
 974622 32.3134  do_wait
 212888  7.0583  __write_lock_failed
  14138  0.4687  get_exec_dcookie
  13944  0.4623  main_timer_handler
  13599  0.4509  __sched_text_start
  13145  0.4358  do_gettimeoffset_pm
   5767  0.1912  wait_task_inactive
   5589  0.1853  ia32_syscall
   5129  0.1701  __switch_to
   4566  0.1514  copy_user_generic_c
   4096  0.1358  wait_task_stopped
   3688  0.1223  sys32_ptrace
   3673  0.1218  ptrace_get_task_struct
   2801  0.0929  try_to_wake_up
   2463  0.0817  recalc_task_prio
   2066  0.0685  gs_change
   1691  0.0561  ptrace_stop
   1627  0.0539  pmtimer_mark_offset
   1526  0.0506  arch_ptrace
   1464  0.0485  effective_prio
   1422  0.0471  do_gettimeofday
   1255  0.0416  ptrace_check_attach
   1244  0.0412  putreg32
   1199  0.0398  do_notify_parent_cldstop
   1193  0.0396  find_pid
   1145  0.0380  retint_restore_args
   1092  0.0362  getreg32

I've only generated an oprofile report once in the
past, so feel free to tell me other things I should have done.

Note that this was not with idle=poll on the boot line. I'd have to reboot
and restart everything for that, so I'll do it after I've done everything I
can without rebooting, since it takes an hour or two to get back into this
steady state.

>> Mem: 8186088k total, 8145620k used, 40468k free, 11436k buffers
>> Swap: 2048276k total, 0k used, 2048276k free, 4959636k cached
>>
>> so far it looks to me like RAM is OK, but the high system percentage
>> looks strange to me. The system closest to finishing its boot has used
>> a little over 10 min of CPU time (>5x the normal wall-clock time for
>> the boot), so I am running into contention at some point here.
>>
>> I upped the pid_max value to 128000 to give me some headroom there (each
>> of the first 18 uml instances will end up running ~3600 processes when
>> they finish booting)
>
> You mean they each have 3600 processes running in them after they've
> booted? Or they've run 3600 processes in the course of booting (and a
> smaller number will be running after they have booted)?
>
> The first is crazy, the second less so - my FC5 filesystem runs ~2000
> processes during boot.

It's the first: after the system itself boots, it starts 3600 user
processes. Once the systems had all booted I ended up with >50,000
processes showing in the host. (I was actually glad to see this; the
current production systems these would be simulating tend to die when they
hit ~10,000 processes.)

> FWIW, I've booted ~50 UMLs simultaneously on my laptop without any
> problem.

Good to know.

Logging in to work from home, I see that with a dozen (more or less) of
these systems running it has hit steady state.
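[As a sanity check on the pid_max headroom discussed above — a sketch using
the numbers from this thread; under the modes in use here each guest
process shows up as a host process, as the Tasks counts indicate, and
raising pid_max via sysctl needs root:]

```shell
# ~19 UML instances, each ending up with ~3600 guest processes,
# each of which appears as a process on the host.
instances=19
procs_per_uml=3600
echo "host processes needed: $((instances * procs_per_uml))"
# prints: host processes needed: 68400

# so pid_max=128000 leaves comfortable headroom:
#   sysctl -w kernel.pid_max=128000
# (equivalently: echo 128000 > /proc/sys/kernel/pid_max)
```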
top - 20:55:54 up 2:47, 2 users, load average: 4.46, 4.38, 4.38
Tasks: 28782 total, 2 running, 95 sleeping, 28685 stopped, 0 zombie
Cpu0 : 4.0% us, 90.0% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 3.8% us, 90.1% sy, 0.0% ni, 5.9% id, 0.1% wa, 0.0% hi, 0.1% si
Mem: 8186088k total, 5506436k used, 2679652k free, 7908k buffers
Swap: 2048276k total, 0k used, 2048276k free, 3398960k cached

The load average during boot was ~13 or so for the first 45 minutes, but
the box is still sitting at 90% system time on each CPU. Each UML is
running ~3600 processes that sit listening for network connections, plus
heartbeat, which attempts to send one UDP packet every 2 seconds from each
box over each of three interfaces.

David Lang
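[Editorial footnote: the heartbeat numbers in the last paragraph imply a
tiny aggregate packet rate, which is consistent with the oprofile data
pointing at the child-wait path rather than networking as the source of
the system time. A quick sketch, assuming all 19 instances running; awk is
used for the fractional division:]

```shell
# 19 UMLs x 3 interfaces, one UDP packet per interface every 2 seconds.
awk 'BEGIN { umls = 19; ifaces = 3; interval_s = 2
             printf "%.1f packets/sec aggregate\n", umls * ifaces / interval_s }'
# prints: 28.5 packets/sec aggregate
```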