Thread: [uml-user] what are the current limits on how many uml's on one host?

user-mode-linux-user

[uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-27 08:46:44

I foolishly attempted to start up 25 uml instances on one system (dual 252
opterons with 8G of ram, each um instance getting 256M)

what I found was that they seem to be getting in each other's way a LOT
(just on system boot), vmstat on the host is showing almost all of the cpu
time (80%+) being spent in the system, not in userspace (which surprised
me)

so before I spend much time gathering info to try and debug this I wanted 
to ask what the current limits are, and if the limits should just be cpu 
and ram, then I'll do more digging to find out what's happening in my 
case.

I definitely have some oddities among the 25, two of the 25 did not get
far enough to use their cow files (20K each according to ls -ls) and when 
I connect to them via uml_mconsole it hangs when I issue any command 
(including help)

I'm working from home so I won't see the consoles until I get in on 
Monday, so I'll give more details then unless I'm told that I am being
silly to try to start this many on one system.

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-27 23:42:56

On Mon, 27 Mar 2006, David Lang wrote:

> I foolishly attempted to start up 25 uml instances on one system (dual 252
> opterons with 8G of ram, each um instance getting 256M)
>
> what I found was that they seem to be getting in each other's way a LOT (just
> on system boot), vmstat on the host is showing almost all of the cpu time 
> (80%+) being spent in the system, not in userspace (which surprised me)
>
> so before I spend much time gathering info to try and debug this I wanted to 
> ask what the current limits are, and if the limits should just be cpu and 
> ram, then I'll do more digging to find out what's happening in my case.

well, I reduced the count to 19 instances, and upped the ram on each one 
to 400M (they were hitting oom with only 256m each)

almost an hour later the machines still haven't finished booting with top 
looking basically the same for the last half hour or so.

top - 16:42:18 up 4 days, 23:11, 25 users,  load average: 16.79, 16.65, 16.23
Tasks: 44193 total,  14 running, 139 sleeping, 44040 stopped,   0 zombie
  Cpu0 :  2.3% us, 97.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
  Cpu1 :  3.2% us, 96.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.1% si
Mem:   8186088k total,  8145620k used,    40468k free,    11436k buffers
Swap:  2048276k total,        0k used,  2048276k free,  4959636k cached


so far it looks to me like ram is Ok, but the high system percentage looks 
strange to me. the system closest to finishing its boot has used a little
over 10 min of cpu time (>5x the normal wall clock time for the boot) so 
I am running into contention at some point here.

I upped the pid_max value to 128000 to give me some headroom there (each 
of the first 18 uml instances will end up running ~3600 processes when 
they finish booting)
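
The arithmetic behind raising pid_max can be sketched as below. This is only an illustration: the function names are made up for this note, it assumes roughly one host process per guest process (which later messages in the thread describe for skas0 and TT mode), and 32768 is the usual default pid_max rather than a value read from this host.

```shell
# Estimate how many host PIDs a set of UML guests will need.
# Assumption: one host process per guest process (skas0 / TT mode behaviour).
# usage: pids_needed INSTANCES PROCS_PER_INSTANCE
pids_needed() {
    echo $(( $1 * $2 ))
}

# Succeeds if the estimate fits under the given pid_max.
# usage: pid_headroom_ok NEEDED PID_MAX
pid_headroom_ok() {
    [ "$1" -lt "$2" ]
}

# Figures from this message: 18 guests at ~3600 processes each.
needed=$(pids_needed 18 3600)
echo "estimated host processes: $needed"
if pid_headroom_ok "$needed" 32768; then
    echo "fits under the usual default pid_max of 32768"
else
    echo "exceeds 32768; raise /proc/sys/kernel/pid_max"
fi
```

On a real host the second argument would come from `cat /proc/sys/kernel/pid_max` instead of a hard-coded number.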

what could I do to assist in tracking down what is causing the contention?

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Christopher S. A. <ca...@th...> - 2006-03-27 23:59:07

David Lang wrote:
> what could I do to assist in tracking down what is causing the contention?

Are you running a skas3-patched host kernel?  You didn't mention if you 
were running in 32bit mode.  I don't believe there's a skas patch for 
64bit kernels (yet).  IMO, skas3 is required for what you're after.

http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/

 > I upped the pid_max value to 128000 to give me some headroom there (each
 > of the first 18 uml instances will end up running ~3600 processes when
 > they finish booting)

Sounds to me like you're running those UMLs in TT mode.  If you 
can't/aren't going to patch your host with skas3, at least run a recent 
2.6-um kernel in skas0 mode, which doesn't require a host kernel patch.

The other two things are:

Use the cfq disk scheduler.  elevator=cfq on your kernel command line 
will do that, as long as it's compiled into your host's kernel.

Use tmpfs mount for TMPDIR, as UML will use that to store its memory file.

-Chris

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 00:17:11

On Mon, 27 Mar 2006, Christopher S. Aker wrote:

> David Lang wrote:
>> what could I do to assist in tracking down what is causing the contention?
>
> Are you running a skas3-patched host kernel?  You didn't mention if you were 
> running in 32bit mode.  I don't believe there's a skas patch for 64bit 
> kernels (yet).  IMO, skas3 is required for what you're after.
>
> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/

sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 (which 
I understood included the skas patch)

>> I upped the pid_max value to 128000 to give me some headroom there (each
>> of the first 18 uml instances will end up running ~3600 processes when
>> they finish booting)
>
> Sounds to me like you're running those UMLs in TT mode.  If you can't/aren't 
> going to patch your host with skas3, at least run a recent 2.6-um kernel in 
> skas0 mode, which doesn't require a host kernel patch.

they are running in skas mode, statically compiled. the um's are 32-bit
2.6.16 with TT mode disabled to enable static linking. the systems finish the
boot sequence after using about 12 min of cpu time each.

> The other two things are:
>
> Use the cfq disk scheduler.  elevator=cfq on your kernel command line will do 
> that, as long as it's compiled into your host's kernel.
>
> Use tmpfs mount for TMPDIR, as UML will use that to store its memory file.

very little disk activity is taking place during this time. these are all
COW root images from a ~300M base image

ls -ls shows
305356 -rw-rw-rw-  1 root  root  1073741825 2006-03-24 18:15 root_fs.basebuild2
   9580 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1a-b
  14880 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1a-p
  12632 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1b-b
   5512 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1b-p
   5584 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1c-b
  12180 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1c-p
  14716 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1d-b
  16164 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1d-p
  12916 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1e-b
   9504 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:13 root_fs.methane1e-p
  10896 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 12:19 root_fs.methane1z-b
   5780 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane1z-p
   6516 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 15:36 root_fs.methane2a-p
   7980 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:14 root_fs.methane2b-p
   6056 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 15:36 root_fs.methane2c-p
   5968 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 15:36 root_fs.methane2d-p
   7704 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.methane2e-p
   7640 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:14 root_fs.methane2z-p
  19364 -rw-rw-rw-  1 dlang staff 1074016257 2006-03-27 17:15 root_fs.router


the command line to start these is

./linux-2.6.16-32 umid=methane1a-p ubd=memmap ubd0=root_fs.methane1a-p,root_fs.basebuild2 con0=tty:/dev/con-m1ap con=pty ssl=pty eth0=tuntap,m1ap-e0 eth1=tuntap,m1ap-e1 eth2=tuntap,m1ap-e2 eth3=tuntap,m1ap-e3 mem=400m stderr=1 </dev/con-m1ap >/dev/con-m1ap 2>/dev/con-m1ap &

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Jeff D. <jd...@ad...> - 2006-03-28 02:26:34

On Mon, Mar 27, 2006 at 03:42:52PM -0800, David Lang wrote:
> well, I reduced the count to 19 instances, and upped the ram on each one 
> to 400M (they were hitting oom with only 256m each)

The UMLs were OOMing, not the host?  And if you run one, it doesn't
OOM?

Make sure you have enough space in the tmpfs that they are using for
their memory files.
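
A rough way to check this is sketched below. The function names are invented for the example, and the estimate is a worst case that assumes every guest eventually touches all of its mem= allocation. Note also that a tmpfs mounted without a size= option is normally limited to half of physical RAM, so on an 8G host the default would be well short of 19 x 400M.

```shell
# Worst-case space needed for UML memory files, in MB.
# usage: tmpfs_needed_mb INSTANCES MEM_MB_PER_INSTANCE
tmpfs_needed_mb() {
    echo $(( $1 * $2 ))
}

# Print the space available (in MB) on the filesystem holding DIR.
# Uses POSIX "df -Pk": line 2, field 4 is the available space in 1K blocks.
# usage: avail_mb DIR
avail_mb() {
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# Figures from this thread: 19 guests with mem=400m each.
needed=$(tmpfs_needed_mb 19 400)
dir=${TMPDIR:-/tmp}
echo "need up to ${needed} MB; ${dir} has $(avail_mb "$dir") MB available"
```

If the available figure is smaller than the estimate, remounting the tmpfs with a larger size= option (or using fewer or smaller guests) would be the things to try.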

> top - 16:42:18 up 4 days, 23:11, 25 users,  load average: 16.79, 16.65, 16.23
> Tasks: 44193 total,  14 running, 139 sleeping, 44040 stopped,   0 zombie
>  Cpu0 :  2.3% us, 97.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
>  Cpu1 :  3.2% us, 96.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.1% si

Can you oprofile the system to see what's happening in that 97% system
time?

I did some oprofiling on x86_64 and a lot of time is spent in the
scheduler - much more so than on i386.  This looks like you've managed
to confuse some part of the host into thrashing.

> Mem:   8186088k total,  8145620k used,    40468k free,    11436k buffers
> Swap:  2048276k total,        0k used,  2048276k free,  4959636k cached
> 
> 
> so far it looks to me like ram is Ok, but the high system percentage looks 
> strange to me. the system closest to finishing its boot has used a little
> over 10 min of cpu time (>5x the normal wall clock time for the boot) so 
> I am running into contention at some point here.
> 
> I upped the pid_max value to 128000 to give me some headroom there (each 
> of the first 18 uml instances will end up running ~3600 processes when 
> they finish booting)

You mean they each have 3600 processes running in them after they've
booted?  Or they've run 3600 processes in the course of booting (and a
smaller number will be running after they have booted)?

The first is crazy, the second less so - my FC5 filesystem runs ~2000
processes during boot.

FWIW, I've booted ~50 UMLs simultaneously on my laptop without any
problem.

				Jeff

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 04:42:17

On Mon, 27 Mar 2006, Jeff Dike wrote:

> Date: Mon, 27 Mar 2006 21:27:39 -0500
> From: Jeff Dike <jd...@ad...>
> To: David Lang <da...@la...>
> Cc: use...@li...
> Subject: Re: [uml-user] what are the current limits on how many uml's on one
>     host?
> 
> On Mon, Mar 27, 2006 at 03:42:52PM -0800, David Lang wrote:
>> well, I reduced the count to 19 instances, and upped the ram on each one
>> to 400M (they were hitting oom with only 256m each)
>
> The UMLs were OOMing, not the host?  And if you run one, it doesn't
> OOM?

the OOM was happening inside the UML, with mem=256m they had trouble 
(individually or in multiples), with mem=512m they would grow to 368m so I 
defined them as 400m each and solved that problem

> Make sure you have enough space in the tmpfs that they are using for
> their memory files.

hmm, I didn't define a tmpfs

>> top - 16:42:18 up 4 days, 23:11, 25 users,  load average: 16.79, 16.65, 16.23
>> Tasks: 44193 total,  14 running, 139 sleeping, 44040 stopped,   0 zombie
>>  Cpu0 :  2.3% us, 97.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
>>  Cpu1 :  3.2% us, 96.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.1% si
>
> Can you oprofile the system to see what's happening in that 97% system
> time?
>
> I did some oprofiling on x86_64 and a lot of time is spent in the
> scheduler - much more so than on i386.  This looks like you've managed
> to confuse some part of the host into thrashing.

I did have oprofile compiled into my host kernel, unfortunately I just had
the vmlinuz not the vmlinux so I did the following

#cd /usr/src/linux-2.6.16
#cp /proc/config.gz .
#gunzip config.gz
#mv config .config
#make
#opcontrol --setup --vmlinux=vmlinux
#opcontrol --reset ; opcontrol --start; sleep 60; opcontrol --stop
Using default event: CPU_CLK_UNHALTED:100000:0:1:1
Using 2.6+ OProfile kernel interface.
Reading module info.
oprofiled: /proc/modules not readable, can't process module samples.
Using log file /var/lib/oprofile/oprofiled.log
Daemon started.
Profiler running.
Stopping profiling.
#opreport >oprofile

which produced the following report

Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
CPU_CLK_UNHALT...|
   samples|      %|
------------------
   3016151 97.4122 vmlinux
     48905  1.5795 linux-2.6.16-32
      7542  0.2436 vm_file-FY3CEc (deleted)
      7371  0.2381 oprofiled
      7335  0.2369 vm_file-C7DfCD (deleted)
      5639  0.1821 vm_file-g6ctgq (deleted)
      1180  0.0381 vm_file-3MLZxy (deleted)
       720  0.0233 mysqld
       611  0.0197 vm_file-KmrmJQ (deleted)
       321  0.0104 libc-2.3.2.so
       192  0.0062 bash
       116  0.0037 ld-2.3.2.so
        60  0.0019 vm_file-4oS0Fo (deleted)
        45  0.0015 vm_file-mu2rMT (deleted)
        33  0.0011 ISO8859-1.so
        18 5.8e-04 vm_file-8GxKww (deleted)
        15 4.8e-04 ntpd
         6 1.9e-04 libpthread-0.60.so
         5 1.6e-04 apache
         4 1.3e-04 grep
         2 6.5e-05 uml_switch
         1 3.2e-05 cat
         1 3.2e-05 rm
         1 3.2e-05 libdl-2.3.2.so
         1 3.2e-05 expr
         1 3.2e-05 tr

doing opreport -l vmlinux produces >700 lines of output, the top ones are

CPU: AMD64 processors, speed 2605.97 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
1683585  55.8190  eligible_child
974622   32.3134  do_wait
212888    7.0583  __write_lock_failed
14138     0.4687  get_exec_dcookie
13944     0.4623  main_timer_handler
13599     0.4509  __sched_text_start
13145     0.4358  do_gettimeoffset_pm
5767      0.1912  wait_task_inactive
5589      0.1853  ia32_syscall
5129      0.1701  __switch_to
4566      0.1514  copy_user_generic_c
4096      0.1358  wait_task_stopped
3688      0.1223  sys32_ptrace
3673      0.1218  ptrace_get_task_struct
2801      0.0929  try_to_wake_up
2463      0.0817  recalc_task_prio
2066      0.0685  gs_change
1691      0.0561  ptrace_stop
1627      0.0539  pmtimer_mark_offset
1526      0.0506  arch_ptrace
1464      0.0485  effective_prio
1422      0.0471  do_gettimeofday
1255      0.0416  ptrace_check_attach
1244      0.0412  putreg32
1199      0.0398  do_notify_parent_cldstop
1193      0.0396  find_pid
1145      0.0380  retint_restore_args
1092      0.0362  getreg32

I've only generated an oprofile report once in the past, so feel free to 
tell me other things I should have done.

now, this was not with idle=poll on the boot line (I'd have to reboot and 
restart everything for that, I'll do that after I do everything I can 
without doing that since it will take an hour or two to get back into this 
steady state)

>> Mem:   8186088k total,  8145620k used,    40468k free,    11436k buffers
>> Swap:  2048276k total,        0k used,  2048276k free,  4959636k cached
>>
>>
>> so far it looks to me like ram is Ok, but the high system percentage looks
>> strange to me. the system closest to finishing its boot has used a little
>> over 10 min of cpu time (>5x the normal wall clock time for the boot) so
>> I am running into contention at some point here.
>>
>> I upped the pid_max value to 128000 to give me some headroom there (each
>> of the first 18 uml instances will end up running ~3600 processes when
>> they finish booting)
>
> You mean they each have 3600 processes running in them after they've
> booted?  Or they've run 3600 processes in the course of booting (and a
> smaller number will be running after they have booted)?
>
> The first is crazy, the second less so - my FC5 filesystem runs ~2000
> processes during boot.

it's the first, after the system itself boots it starts 3600 user 
processes. as the systems all finally booted I ended up with >50,000 
processes showing in the host (I was actually glad to see this, the 
current production systems these would be simulating tend to die when they 
hit ~10,000 processes)

> FWIW, I've booted ~50 UMLs simultaneously on my laptop without any
> problem.

good to know

logging into work from home I see that with a dozen (more or less) of 
these systems running it's hit steady state.

top - 20:55:54 up  2:47,  2 users,  load average: 4.46, 4.38, 4.38
Tasks: 28782 total,   2 running,  95 sleeping, 28685 stopped,   0 zombie
  Cpu0 :  4.0% us, 90.0% sy,  0.0% ni,  5.9% id,  0.0% wa,  0.0% hi,  0.0% si
  Cpu1 :  3.8% us, 90.1% sy,  0.0% ni,  5.9% id,  0.1% wa,  0.0% hi,  0.1% si
Mem:   8186088k total,  5506436k used,  2679652k free,     7908k buffers
Swap:  2048276k total,        0k used,  2048276k free,  3398960k cached

the load average during boot was ~13 or so for the first 45 min or so, but it's
still sitting at 90% system on each cpu.

each UML is running ~3600 processes that are sitting listening for network 
connections, and heartbeat, which attempts to send one udp packet every 2 
seconds from each box over each of three interfaces.

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 06:20:54

rebooting my system with idle=poll and profile=2 I get the following 
profile while the UML instances are trying to startup (again a 60 second 
sleep between start and end, this is somewhat longer than 60 seconds since
the machine is so bogged down)

CPU: AMD64 processors, speed 2605.96 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
samples  %        symbol name
3527977  41.2259  eligible_child
2601815  30.4033  do_wait
1857112  21.7011  __write_lock_failed
47511     0.5552  __sched_text_start
38076     0.4449  main_timer_handler
28270     0.3303  do_gettimeoffset_pm

readprofile (same 60 sec sleep) results in

sort -rn profile.out |head
150120 total                                      0.0353
  59483 do_wait                                   63.6182
  50935 eligible_child                           282.9722
  33870 wait_task_stopped                         54.2788
   1006 thread_return                              4.6147
    289 ia32_syscall                               4.2500
    252 ptrace_stop                                0.6462
    252 do_notify_parent_cldstop                   0.9000
    190 schedule                                   0.1203


I'll do another run after things finish stabilizing.

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 19:07:52

after things stabilize the results look basically the same (the exact
numbers vary slightly, but the top symbols remain the same)

should I forward this to linux-kernel as well?

David Lang

On Mon, 27 Mar 2006, David Lang wrote:

> rebooting my system with idle=poll and profile=2 I get the following profile 
> while the UML instances are trying to startup (again a 60 second sleep 
> between start and end, this is somewhat longer than 60 seconds since the
> machine is so bogged down)
>
> CPU: AMD64 processors, speed 2605.96 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
> samples  %        symbol name
> 3527977  41.2259  eligible_child
> 2601815  30.4033  do_wait
> 1857112  21.7011  __write_lock_failed
> 47511     0.5552  __sched_text_start
> 38076     0.4449  main_timer_handler
> 28270     0.3303  do_gettimeoffset_pm
>
> readprofile (same 60 sec sleep) results in
>
> sort -rn profile.out |head
> 150120 total                                      0.0353
> 59483 do_wait                                   63.6182
> 50935 eligible_child                           282.9722
> 33870 wait_task_stopped                         54.2788
>  1006 thread_return                              4.6147
>   289 ia32_syscall                               4.2500
>   252 ptrace_stop                                0.6462
>   252 do_notify_parent_cldstop                   0.9000
>   190 schedule                                   0.1203
>
>
> I'll do another run after things finish stabilizing.
>
> David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Blaisorblade <bla...@ya...> - 2006-03-28 20:03:33

On Tuesday 28 March 2006 01:58, Christopher S. Aker wrote:
> David Lang wrote:

>  > I upped the pid_max value to 128000 to give me some headroom there (each
>  > of the first 18 uml instances will end up running ~3600 processes when
>  > they finish booting)

> Sounds to me like you're running those UMLs in TT mode.

Process proliferation happens even if they work in SKAS0 mode.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 20:29:00

On Tue, 28 Mar 2006, Blaisorblade wrote:

> On Tuesday 28 March 2006 01:58, Christopher S. Aker wrote:
>> David Lang wrote:
>
>> > I upped the pid_max value to 128000 to give me some headroom there (each
>> > of the first 18 uml instances will end up running ~3600 processes when
>> > they finish booting)
>
>> Sounds to me like you're running those UMLs in TT mode.
>
> Process proliferation happens even if they work in SKAS0 mode.

should I be applying the SKAS3 patches? (I can do this for testing, but as 
I go into production I'll need to use the vanilla kernels, are these 
getting close to being merged?)

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Blaisorblade <bla...@ya...> - 2006-03-28 20:41:33

On Tuesday 28 March 2006 22:28, David Lang wrote:
> On Tue, 28 Mar 2006, Blaisorblade wrote:

> should I be applying the SKAS3 patches? (I can do this for testing, but as
> I go into production I'll need to use the vanilla kernels, are these
> getting close to being merged?)

No, we need to redesign them first; also there are various ideas (some only
prototyped, some going to be included in -mm) that could improve performance
further, so when they're ready we'll merge the final form.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Blaisorblade <bla...@ya...> - 2006-03-28 21:00:44

On Tuesday 28 March 2006 02:16, David Lang wrote:
> On Mon, 27 Mar 2006, Christopher S. Aker wrote:
> > David Lang wrote:
> >> what could I do to assist in tracking down what is causing the
> >> contention?

> > Are you running a skas3-patched host kernel?  You didn't mention if you
> > were running in 32bit mode.  I don't believe there's a skas patch for
> > 64bit kernels (yet).  IMO, skas3 is required for what you're after.

> > http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/

> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 (which
> I understood included the skas patch)

No, it doesn't if you don't patch it. Add the patch, but you can't run them in 
full SKAS3; you can pass "mode=skas0" to force skas0 mode, but you can then 
also pass "noprocmm" to force half SKAS3.

> > Sounds to me like you're running those UMLs in TT mode.  If you
> > can't/aren't going to patch your host with skas3, at least run a recent
> > 2.6-um kernel in skas0 mode, which doesn't require a host kernel patch.

> they are running in skas mode, statically compiled. the um's are 32-bit
> 2.6.16 with TT mode disabled to enable static linking. the systems finish the
> boot sequence after using about 12 min of cpu time each.

> > Use tmpfs mount for TMPDIR, as UML will use that to store its memory
> > file.

> very little disk activity is taking place during this time. these are all
> COW root images from a ~300M base image

He's talking about UML's ram, not about disk images - that's mmapped from a 
file in $TMPDIR (normally /tmp).
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 22:09:02

On Tue, 28 Mar 2006, Blaisorblade wrote:

>>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
>
>> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16 (which
>> I understood included the skas patch)
>
> No, it doesn't if you don't patch it. Add the patch, but you can't run them in
> full SKAS3; you can pass "mode=skas0" to force skas0 mode, but you can then
> also pass "noprocmm" to force half SKAS3.

Ok, I've gone through and read the docs on blaisorblade's pages about 
SKAS, and I'm still not understanding things.

the 2.6.16 kernel includes a SKAS option in the configs (it only shows if 
you have TT mode enabled)

when they boot up the uml's report that they are starting in skas0 mode

however, the discussions you have up don't seem to match the behavior of
the resulting system (the discussions talk about TT vs SKAS mode, is this 
TT vs SKAS3 mode?)

it sounds as if I need to apply the SKAS patches and then pass "noprocmm" 
to get the 'half SKAS3' mode

is this correct?

>>> Sounds to me like you're running those UMLs in TT mode.  If you
>>> can't/aren't going to patch your host with skas3, at least run a recent
>>> 2.6-um kernel in skas0 mode, which doesn't require a host kernel patch.
>
>> they are running in skas mode, statically compiled. the um's are 32-bit
>> 2.6.16 with TT mode disabled to enable static linking. the systems finish the
>> boot sequence after using about 12 min of cpu time each.
>
>>> Use tmpfs mount for TMPDIR, as UML will use that to store its memory
>>> file.
>
>> very little disk activity is taking place during this time. these are all
>> COW root images from a ~300M base image
>
> He's talking about UML's ram, not about disk images - that's mmapped from a
> file in $TMPDIR (normally /tmp).

Ok, I'll define $TMPDIR to be /dev/shm (which debian mounts a tmpfs on) 
and try this again

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Blaisorblade <bla...@ya...> - 2006-03-28 22:18:22

On Wednesday 29 March 2006 00:09, David Lang wrote:
> On Tue, 28 Mar 2006, Blaisorblade wrote:
> >>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
> >>
> >> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16
> >> (which I understood included the skas patch)
> >
> > No, it doesn't if you don't patch it. Add the patch, but you can't run
> > them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but
> > you can then also pass "noprocmm" to force half SKAS3.
>
> Ok, I've gone through and read the docs on blaisorblade's pages about
> SKAS, and I'm still not understanding things.

> the 2.6.16 kernel includes a SKAS option in the configs (it only shows if
> you have TT mode enabled)

Because otherwise it's auto-enabled.
We're talking about guest support, in case it's not clear.

However:
*) for ages, SKAS meant SKAS3. And SKAS3 requires a patch on the host.
*) Now SKAS also includes SKAS0; SKAS0 was born some time ago, doesn't require
special host support, and is not as fast as SKAS3, but a lot faster than TT 
mode.
*) there are 3 differences between SKAS0 and SKAS3, and they can be 
individually enabled; /proc/mm doesn't work on x86_64 host, but with 
"noprocmm" you enable the others, which are more important for performance. 
That's what I called in the previous mail "half SKAS3" (you won't find 
references to this term anywhere else).

> when they boot up the uml's report that they are starting in skas0 mode

Exactly, matches with the above.

> however, the discussions you have up don't seem to match the behavior of
> the resulting system (the discussions talk about TT vs SKAS mode, is this
> TT vs SKAS3 mode?)

Can't find the exact quote, however "SKAS" could have meant "SKAS3" 
specifically...

For instance, in full SKAS3 you wouldn't get thousands of processes on the host,
and instead each UML would start about 5 processes. So "Chris Aker" said "I 
think you are running in TT mode", but he forgot that SKAS0 produces similar 
results (but SKAS0 currently starts a single thread per guest process, even 
if the guest process is creating more threads).

> it sounds as if I need to apply the SKAS patches and then pass "noprocmm"
> to get the 'half SKAS3' mode

> is this correct?
Yes.
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 22:33:32

On Wed, 29 Mar 2006, Blaisorblade wrote:

> On Wednesday 29 March 2006 00:09, David Lang wrote:
>> On Tue, 28 Mar 2006, Blaisorblade wrote:
>>>>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
>>>>
>>>> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16
>>>> (which I understood included the skas patch)
>>>
>>> No, it doesn't if you don't patch it. Add the patch, but you can't run
>>> them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but
>>> you can then also pass "noprocmm" to force half SKAS3.
>>
>> Ok, I've gone through and read the docs on blaisorblade's pages about
>> SKAS, and I'm still not understanding things.
>
>> the 2.6.16 kernel includes a SKAS option in the configs (it only shows if
>> you have TT mode enabled)
>
> Because otherwise it's auto-enabled.
> We're talking of guest support, in case it's not clear.
>
> However:
> *) for ages, SKAS meant SKAS3. And SKAS3 requires a patch on the host.
> *) Now SKAS includes also SKAS0; SKAS0 was born some time ago, doesn't require
> special host support, and is not as fast as SKAS3, but a lot faster than TT
> mode.

these are the pieces I was missing, thanks.

I just downloaded the 2.6.16-bb1 patchset, does it include the SKAS3 
patches or should I install the skas-2.6.16-v9-pre9 patchset as well?

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Blaisorblade <bla...@ya...> - 2006-03-28 22:34:35

On Tuesday 28 March 2006 01:42, David Lang wrote:
> On Mon, 27 Mar 2006, David Lang wrote:
> > I foolishly attempted to start up 25 uml instances on one system (dual 252
> > opterons with 8G of ram, each um instance getting 256M)

> > what I found was that they seem to be getting in each other's way a LOT
> > (just on system boot), vmstat on the host is showing almost all of the
> > cpu time (80%+) being spent in the system, not in userspace (which
> > surprised me)

> > so before I spend much time gathering info to try and debug this I wanted
> > to ask what the current limits are, and if the limits should just be cpu
> > and ram, then I'll do more digging to find out what's happening in my
> > case.

> well, I reduced the count to 19 instances, and upped the ram on each one
> to 400M (they were hitting oom with only 256m each)

> almost an hour later the machines still haven't finished booting with top
> looking basically the same for the last half hour or so.

> top - 16:42:18 up 4 days, 23:11, 25 users,  load average: 16.79, 16.65, 16.23
> Tasks: 44193 total,  14 running, 139 sleeping, 44040 stopped,   0 zombie
> Cpu0 :  2.3% us, 97.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Cpu1 :  3.2% us, 96.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.1% si
> Mem:   8186088k total,  8145620k used,    40468k free,    11436k buffers
> Swap:  2048276k total,        0k used,  2048276k free,  4959636k cached

> so far it looks to me like ram is Ok, but the high system percentage looks
> strange to me. the system closest to finishing its boot has used a little
> over 10 min of cpu time (>5x the normal wall clock time for the boot) so
> I am running into contention at some point here.

I know that it's maybe a bad workaround, but what about sequential startup 
both of UMLs and of the jobs inside them?
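
A sequential startup on the host could be tried with a small wrapper like the sketch below. Most specifics in it are assumptions for illustration only: the ./linux binary path, the umlN umid names, the cowN and root_fs file names, and the 60-second pause do not come from this thread (only the 19 instances and 400M each do). It defaults to a dry run that just prints the command lines, and console setup is left out entirely.

```shell
#!/bin/sh
# Sketch: start UML instances one at a time instead of all at once.
# Hypothetical values: binary path, umid scheme, file names and delay.

UML_BIN=${UML_BIN:-./linux}   # path to the UML binary (assumption)
ROOT_FS=${ROOT_FS:-root_fs}   # shared backing file for the COW files (assumption)
COUNT=${COUNT:-19}            # number of instances, as in the thread
MEM=${MEM:-400M}              # memory per instance, as in the thread
DELAY=${DELAY:-60}            # seconds to wait between starts (assumption)
DRY_RUN=${DRY_RUN:-1}         # 1 = only print the commands, 0 = run them

# Print the command line for instance number $1.
uml_cmdline() {
    printf '%s umid=uml%s mem=%s ubd0=cow%s,%s\n' \
        "$UML_BIN" "$1" "$MEM" "$1" "$ROOT_FS"
}

# Start (or print) the instances one after another.
start_sequentially() {
    i=1
    while [ "$i" -le "$COUNT" ]; do
        if [ "$DRY_RUN" = 1 ]; then
            uml_cmdline "$i"
        else
            # Word splitting is intended: the arguments contain no spaces.
            $(uml_cmdline "$i") &
            # Give this instance time to boot before starting the next.
            sleep "$DELAY"
        fi
        i=$((i + 1))
    done
}
```

Calling start_sequentially with DRY_RUN=1 prints one command line per instance; a fixed sleep is the crudest possible pacing, and waiting for each guest to actually finish booting would be a better signal if one is available.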

I'd run "vmstat 1" to watch for an increase in context switches - an excessive 
number of them is likely to burn you out.

Look below (I'm selecting the context switch count with awk) - the low 
numbers (~1000-2000) are with the system running only a CPU hog in the 
background, the high ones (~100 000) are when I run it inside UML:

$ while :; do /bin/true; done

vmstat 1|awk '{print $12}'

cs
2714
113208
92306
109654
82226
1478
1235
84262
114143
115037
112424
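
The position of the cs column can differ between vmstat versions, so a variant that finds it by name in the header row may be more robust than a hard-coded $12. The sketch below does that and flags samples above a limit; the function name cs_watch and the default limit of 50000 are assumptions picked for illustration, not values from this thread.

```shell
#!/bin/sh
# Sketch: print the context-switch column from vmstat output on stdin,
# locating the column by its "cs" header instead of a fixed field number.
# $1 is an optional limit; samples above it are marked HIGH.
cs_watch() {
    awk -v limit="${1:-50000}" '
        # Until the header row with "cs" has been seen, only look for it.
        col == 0 {
            for (i = 1; i <= NF; i++)
                if ($i == "cs") col = i
            next
        }
        # Data rows: skip repeated headers, print numeric samples only.
        $col ~ /^[0-9]+$/ {
            mark = ($col + 0 > limit + 0) ? " HIGH" : ""
            print $col mark
        }
    '
}
```

Usage would be along the lines of: vmstat 1 | cs_watch 50000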

But with apache benchmark on a UML, I can get higher numbers:
$ ab2  -t 30 -v 1 Sarge/apache2-default/

cs
5170
945
1089
977
756
1083
24879
106316
119013
121471
99706
122907
127361
108613
130089
126837
123382
116747
130797
131173
129478

cs
124357
124892
102280
127380
129434
109804
113807
123767
133519
125085
119384
129109
129997
126418
60348
892
675
743
851
655
780
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-28 22:47:04

On Wed, 29 Mar 2006, Blaisorblade wrote:

> On Tuesday 28 March 2006 01:42, David Lang wrote:
>> On Mon, 27 Mar 2006, David Lang wrote:
>>> I foolishly attempted to startup 25 uml instances on one system (dual 252
>>> opterons with 8G of ram, each um instance getting 256M)
>
>>> what I found was that they seem to be getting in each others way a LOT
>>> (just on system boot), vmstat on the host is showing almost all of the
>>> cpu time (80%+) being spent in the system, not in userspace (which
>>> surprised me)
>
>>> so before I spend much time gathering info to try and debug this I wanted
>>> to ask what the current limits are, and if the limits should just be cpu
>>> and ram, then I'll do more digging to find out what's happening in my
>>> case.
>
>> well, I reduced the count to 19 instances, and upped the ram on each one
>> to 400M (they were hitting oom with only 256m each)
>
>> almost an hour later the machines still haven't finished booting with top
>> looking basically the same for the last half hour or so.
>
>> top - 16:42:18 up 4 days, 23:11, 25 users,  load average: 16.79, 16.65, 16.23
>> Tasks: 44193 total,  14 running, 139 sleeping, 44040 stopped,   0 zombie
>> Cpu0 :  2.3% us, 97.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
>> Cpu1 :  3.2% us, 96.7% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.1% si
>> Mem:   8186088k total,  8145620k used,    40468k free,    11436k buffers
>> Swap:  2048276k total,        0k used,  2048276k free,  4959636k cached
>
>> so far it looks to me like ram is Ok, but the high system percentage looks
>> strange to me. the system closest to finishing its boot has used a little
>> over 10 min of cpu time (>5x the normal wall clock time for the boot) so
>> I am running into contention at some point here.
>
> I know that it's maybe a bad workaround, but what about sequential startup
> both of UMLs and of the jobs inside them?

I'll try it for a test and let you know how it works

> I'd run "vmstat 1" to watch for an increase in context switches - an excessive
> number of them is likely to burn you out.

I'll check for this, but this would surprise me. inside the uml's the only 
thing that is actively running is heartbeat (linux-ha.org). even with a 
dozen copies running (one per uml) this should only generate a small 
amount of traffic (18 udp packets sent per second to the broadcast 
addresses for all 12 boxes combined)

David Lang

Re: [uml-user] what are the current limits on how many uml's on one host?

From: Blaisorblade <bla...@ya...> - 2006-03-28 22:43:47

On Wednesday 29 March 2006 00:33, David Lang wrote:
> On Wed, 29 Mar 2006, Blaisorblade wrote:
> > On Wednesday 29 March 2006 00:09, David Lang wrote:
> >> On Tue, 28 Mar 2006, Blaisorblade wrote:
> >>>>> http://www.user-mode-linux.org/~blaisorblade/patches/skas3-2.6/
> >>>>
> >>>> sorry, the host is a dual Opteron 252 with 8G of ram running 2.6.16
> >>>> (which I understood included the skas patch)
> >>>
> >>> No, it doesn't if you don't patch it. Add the patch, but you can't run
> >>> them in full SKAS3; you can pass "mode=skas0" to force skas0 mode, but
> >>> you can then also pass "noprocmm" to force half SKAS3.
> >>
> >> Ok, I've gone through and read the docs on blaisorblade's pages about
> >> SKAS, and I'm still not understanding things.
> >>
> >> the 2.6.16 kernel includes a SKAS option in the configs (it only shows
> >> if you have TT mode enabled)
> >
> > Because otherwise it's auto-enabled.
> > We're talking of guest support, in case it's not clear.
> >
> > However:
> > *) for ages, SKAS meant SKAS3. And SKAS3 requires a patch on the host.
> > *) Now SKAS includes also SKAS0; SKAS0 was born some time ago, doesn't
> > require special host support, and is not as fast as SKAS3, but a lot
> > faster than TT mode.
>
> these are the pieces I was missing, thanks.
>
> I just downloaded the 2.6.16-bb1 patchset, does it include the SKAS3
> patches or should I install the skas-2.6.16-v9-pre9 patchset as well?

It includes skas as well, but remember you must do mrproper between building 
UML and host kernel or use O= with two different output directories (as I 
do):

mkdir ../BUILD

OUT=../BUILD/um-linux-2.6.16-build
mkdir $OUT
make ARCH=um SUBARCH=i386 O=$OUT menuconfig
make ARCH=um SUBARCH=i386 O=$OUT 

OUT_HOST=../BUILD/64-linux-2.6.16-build
mkdir $OUT_HOST
make O=$OUT_HOST menuconfig
make O=$OUT_HOST

Bye
-- 
Inform me of my mistakes, so I can keep imitating Homer Simpson's "Doh!".
Paolo Giarrusso, aka Blaisorblade (Skype ID "PaoloGiarrusso", ICQ 215621894)
http://www.user-mode-linux.org/~blaisorblade

Re: [uml-user] what are the current limits on how many uml's on one host?

From: David L. <da...@la...> - 2006-03-30 00:11:47

On Wed, 29 Mar 2006, Blaisorblade wrote:

>> these are the pieces I was missing, thanks.
>>
>> I just downloaded the 2.6.16-bb1 patchset, does it include the SKAS3
>> patches or should I install the skas-2.6.16-v9-pre9 patchset as well?
>
> It includes skas as well, but remember you must do mrproper between building
> UML and host kernel or use O= with two different output directories (as I
> do):

I thought that what I had downloaded was the patches between 2.6.16 and 
2.6.16-bb1, but instead it was the 2.6.16-bb1 precompiled kernel and 
modules. (downloaded from http://www.user-mode-linux.org/~blaisorblade/binaries/2.6.16-bb1/)

I see guest patches at 
http://www.user-mode-linux.org/~blaisorblade/patches/guest/uml-2.6.16-bb1/ 
is there a similar set of patches for the host? or is this for both?

David Lang