From: David L. <da...@la...> - 2006-03-28 04:42:17
On Mon, 27 Mar 2006, Jeff Dike wrote:

> Date: Mon, 27 Mar 2006 21:27:39 -0500
> From: Jeff Dike <jd...@ad...>
> To: David Lang <da...@la...>
> Cc: use...@li...
> Subject: Re: [uml-user] what are the current limits on how many uml's on
>     one host?
>
> On Mon, Mar 27, 2006 at 03:42:52PM -0800, David Lang wrote:
>> well, I reduced the count to 19 instances, and upped the ram on each one
>> to 400M (they were hitting oom with only 256m each)
>
> The UMLs were OOMing, not the host? And if you run one, it doesn't
> OOM?

The OOM was happening inside the UMLs. With mem=256m they had trouble
(individually or in multiples); with mem=512m they would grow to 368m, so I
defined them as 400m each and that solved the problem.

> Make sure you have enough space in the tmpfs that they are using for
> their memory files.

Hmm, I didn't define a tmpfs.

>> top - 16:42:18 up 4 days, 23:11, 25 users, load average: 16.79, 16.65,
>> 16.23
>> Tasks: 44193 total, 14 running, 139 sleeping, 44040 stopped, 0 zombie
>> Cpu0 : 2.3% us, 97.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.0% si
>> Cpu1 : 3.2% us, 96.7% sy, 0.0% ni, 0.0% id, 0.0% wa, 0.0% hi, 0.1% si
>
> Can you oprofile the system to see what's happening in that 97% system
> time?
>
> I did some oprofiling on x86_64 and a lot of time is spent in the
> scheduler - much more so than on i386. This looks like you've managed
> to confuse some part of the host into thrashing.

I did have oprofile compiled into my host kernel. Unfortunately I only had
the vmlinuz, not the vmlinux, so I did the following:

#cd /usr/src/linux-2.6.16
#cp /proc/config.gz .
#gunzip config.gz
#mv config .config
#make
#opcontrol --setup --vmlinux=vmlinux
#opcontrol --reset ; opcontrol --start; sleep 60; opcontrol --stop

Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Reading module info.
oprofiled: /proc/modules not readable, can't process module samples.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Stopping profiling.

#opreport >oprofile

which produced the following report:

Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
  samples|      %|
------------------
  3016151 97.4122 vmlinux
    48905  1.5795 linux-2.6.16-32
     7542  0.2436 vm_file-FY3CEc (deleted)
     7371  0.2381 oprofiled
     7335  0.2369 vm_file-C7DfCD (deleted)
     5639  0.1821 vm_file-g6ctgq (deleted)
     1180  0.0381 vm_file-3MLZxy (deleted)
      720  0.0233 mysqld
      611  0.0197 vm_file-KmrmJQ (deleted)
      321  0.0104 libc-2.3.2.so
      192  0.0062 bash
      116  0.0037 ld-2.3.2.so
       60  0.0019 vm_file-4oS0Fo (deleted)
       45  0.0015 vm_file-mu2rMT (deleted)
       33  0.0011 ISO8859-1.so
       18 5.8e-04 vm_file-8GxKww (deleted)
       15 4.8e-04 ntpd
        6 1.9e-04 libpthread-0.60.so
        5 1.6e-04 apache
        4 1.3e-04 grep
        2 6.5e-05 uml_switch
        1 3.2e-05 cat
        1 3.2e-05 rm
        1 3.2e-05 libdl-2.3.2.so
        1 3.2e-05 expr
        1 3.2e-05 tr

Running opreport -l vmlinux produces >700 lines of output; the top entries
are:

CPU: AMD64 processors, speed 2605.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit
mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1683585 55.8190  eligible_child
 974622 32.3134  do_wait
 212888  7.0583  __write_lock_failed
  14138  0.4687  get_exec_dcookie
  13944  0.4623  main_timer_handler
  13599  0.4509  __sched_text_start
  13145  0.4358  do_gettimeoffset_pm
   5767  0.1912  wait_task_inactive
   5589  0.1853  ia32_syscall
   5129  0.1701  __switch_to
   4566  0.1514  copy_user_generic_c
   4096  0.1358  wait_task_stopped
   3688  0.1223  sys32_ptrace
   3673  0.1218  ptrace_get_task_struct
   2801  0.0929  try_to_wake_up
   2463  0.0817  recalc_task_prio
   2066  0.0685  gs_change
   1691  0.0561  ptrace_stop
   1627  0.0539  pmtimer_mark_offset
   1526  0.0506  arch_ptrace
   1464  0.0485  effective_prio
   1422  0.0471  do_gettimeofday
   1255  0.0416  ptrace_check_attach
   1244  0.0412  putreg32
   1199  0.0398  do_notify_parent_cldstop
   1193  0.0396  find_pid
   1145  0.0380  retint_restore_args
   1092  0.0362  getreg32

I've only generated an oprofile report once in the
past, so feel free to tell me other things I should have done.

Note that this was not with idle=poll on the boot line. I'd have to reboot
and restart everything for that, so I'll do it after I've done everything I
can without rebooting, since it takes an hour or two to get back into this
steady state.

>> Mem: 8186088k total, 8145620k used, 40468k free, 11436k buffers
>> Swap: 2048276k total, 0k used, 2048276k free, 4959636k cached
>>
>> so far it looks to me like RAM is OK, but the high system percentage
>> looks strange to me. The system closest to finishing its boot has used
>> a little over 10 min of CPU time (>5x the normal wall-clock time for
>> the boot), so I am running into contention at some point here.
>>
>> I upped the pid_max value to 128000 to give me some headroom there (each
>> of the first 18 uml instances will end up running ~3600 processes when
>> they finish booting)
>
> You mean they each have 3600 processes running in them after they've
> booted? Or they've run 3600 processes in the course of booting (and a
> smaller number will be running after they have booted)?
>
> The first is crazy, the second less so - my FC5 filesystem runs ~2000
> processes during boot.

It's the first: after the system itself boots, it starts 3600 user
processes. Once the systems had all booted I ended up with >50,000
processes showing in the host. (I was actually glad to see this; the
current production systems these would be simulating tend to die when they
hit ~10,000 processes.)

> FWIW, I've booted ~50 UMLs simultaneously on my laptop without any
> problem.

Good to know.

Logging in to work from home, I see that with a dozen (more or less) of
these systems running it has hit steady state.
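[As a sanity check on the pid_max headroom discussed above — a sketch using
the numbers from this thread; under the modes in use here each guest
process shows up as a host process, as the Tasks counts indicate, and
raising pid_max via sysctl needs root:]

```shell
# ~19 UML instances, each ending up with ~3600 guest processes,
# each of which appears as a process on the host.
instances=19
procs_per_uml=3600
echo "host processes needed: $((instances * procs_per_uml))"
# prints: host processes needed: 68400

# so pid_max=128000 leaves comfortable headroom:
#   sysctl -w kernel.pid_max=128000
# (equivalently: echo 128000 > /proc/sys/kernel/pid_max)
```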
top - 20:55:54 up 2:47, 2 users, load average: 4.46, 4.38, 4.38
Tasks: 28782 total, 2 running, 95 sleeping, 28685 stopped, 0 zombie
Cpu0 : 4.0% us, 90.0% sy, 0.0% ni, 5.9% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 3.8% us, 90.1% sy, 0.0% ni, 5.9% id, 0.1% wa, 0.0% hi, 0.1% si
Mem: 8186088k total, 5506436k used, 2679652k free, 7908k buffers
Swap: 2048276k total, 0k used, 2048276k free, 3398960k cached

The load average during boot was ~13 or so for the first 45 minutes, but
the box is still sitting at 90% system time on each CPU. Each UML is
running ~3600 processes that sit listening for network connections, plus
heartbeat, which attempts to send one UDP packet every 2 seconds from each
box over each of three interfaces.

David Lang
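[Editorial footnote: the heartbeat numbers in the last paragraph imply a
tiny aggregate packet rate, which is consistent with the oprofile data
pointing at the child-wait path rather than networking as the source of
the system time. A quick sketch, assuming all 19 instances running; awk is
used for the fractional division:]

```shell
# 19 UMLs x 3 interfaces, one UDP packet per interface every 2 seconds.
awk 'BEGIN { umls = 19; ifaces = 3; interval_s = 2
             printf "%.1f packets/sec aggregate\n", umls * ifaces / interval_s }'
# prints: 28.5 packets/sec aggregate
```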