From: David S. <ope...@to...> - 2008-11-27 17:46:44
James MacLean wrote:
> David Sommerseth wrote:
>>
>>
>> James MacLean wrote:
>>> Hi Folks,
>>>
>>> I have parsed around a bit but have not come up with a solid
>>> suggestion to increase performance in the following environment :
>>>
>>> . +150 clients always on, always via COAX modem 15Mb/s down 1.5Mb/s up.
>>> . OpenVPN-2.0.9 and 2.1rc13 tested, setup as single server
>>> . Server Kernel 2.6.25.4
>>> . Server 64bit
>>> . Server CPU % rarely goes above 30
>>> . Server is fed over a 10G link
>>>
>>> Currently we get what appears to be only between 5 and 6 MB/s average
>>> using this setup.
>>>
>>> If only activity is over a single tunnel we can get the expected max
>>> (about 14Mb/s to the remote site) for the COAX sites. Once traffic
>>> builds during the day, that number drops.
>>>
>>> We know if we hit it locally we can get 160Mb/s. We also know that
>>> when we hit it locally and are getting the 160Mb/s, the COAX tunnels
>>> do suffer, dropping to almost 1/2 of their normal tunnel throughput
>>> of almost 14Mb/s.
>>>
>>> So in my small mind, I am thinking we are seeing around 48Mb/s
>>> (6MB/s*8) used, but that we should be able to get over 150Mb/s. CPU
>>> isn't hurting. Almost feels like there is a governor slowing down the
>>> traffic :).
>>>
>>> Important settings from latest config :
>>>
>>> verb 1
>>> dev tap
>>> tun-mtu 1500
>>> tun-mtu-extra 32
>>> mssfix 1468
>>> proto udp
>>> ca SSCert.pem
>>> cert servercert.pem
>>> key serverkey.pem
>>> dh dh1024.pem
>>> tls-auth ./tlspass
>>> keepalive 30 63
>>> ping-timer-rem
>>> persist-tun 1
>>> persist-key 1
>>> cipher none
>>> tcp-queue-limit 4096
>>> sndbuf 131072
>>> rcvbuf 131072
>>>
>>>
>>> Anyone have any words of wisdom :) ?
>>>
>>
>> Have you tried different ciphers and/or cipher key sizes? I know you
>> say the server does not suffer from too high a load, but it could be
>> inefficiency in the cipher algorithm. If that's the case it might be
>> an OpenSSL issue as well. It's a shot in the dark, but it would be
>> good to rule this one out. The default is Blowfish, so I really do
>> not expect an improvement.
>>
>> Do you know if threads are enabled in your OpenVPN setup?
>> (compile/configure setting). I believe the default is not to use
>> threads.
>>
>> Does the performance drop if you have 150+ clients connected while
>> being passive (not sending any traffic over the tunnel) and only
>> having 1 client sending traffic?
>>
>>
>> kind regards,
>>
>> David Sommerseth
> Hi David,
>
> I had hoped that "cipher none" would have the least overhead. Perhaps
> there is a better one to try?
Hehe ... no, "cipher none" should indeed have the least overhead. I would
be very surprised if anything goes through OpenSSL at that point. I
probably don't need to say anything about the security implications of
running like that. Anyway, for testing and debugging - a good approach!
> Threads are enabled in the build, but I only ever see one in the running
> program. Maybe 64bit is showing it differently or "ps axms" and "ps
> -eLf" are not the way to display them ?
ps -eLf should display all threads, afaik.
I'm not sure how the threads are really implemented, but when I dig into
the code it seems to be initialised as a single thread. I cannot find
traces in the code indicating that multiple threads are implemented,
although the code does seem to be getting ready for it.
Correct me if my suspicion is wrong, but it looks like the core behaviour
of threaded and non-threaded binaries is almost the same, with no thread
being spawned per connection. If that is the case, I'm not sure the
threaded model has any performance benefit - unless OpenSSL encryption
runs in its own separate thread (I have not investigated this).
> Performance seems fine if they are doing nothing. We can get the full
> expected bandwidth from a single client, or even a small number of clients.
>
> But when the general use of the tunnels comes up, that's when they
> appear to suffer.
>
> I regret I do not have much in depth info, but I'm really not sure which
> direction I should be aiming :).
Hmm ... that seems to indicate a drastic performance drop once too many
clients are using the tunnels at the same time.
When I look at the code - which is quite complex in the part where clients
connect - it seems that OpenVPN has its own way of scheduling when and how
to handle the clients. It might be that you've found a limit in that
implementation.
This code is taken from mudp.c:

  /* per-packet event loop */
  while (true)
    {
      perf_push (PERF_EVENT_LOOP);

      /* set up and do the io_wait() */
      multi_get_timeout (&multi, &multi.top.c2.timeval);
      io_wait (&multi.top, p2mp_iow_flags (&multi));
      MULTI_CHECK_SIG (&multi);

      /* check on status of coarse timers */
      multi_process_per_second_timers (&multi);

      /* timeout? */
      if (multi.top.c2.event_set_status == ES_TIMEOUT)
        {
          multi_process_timeout (&multi, MPP_PRE_SELECT|MPP_CLOSE_ON_SIGNAL);
        }
      else
        {
          /* process I/O */
          multi_process_io_udp (&multi);
          MULTI_CHECK_SIG (&multi);
        }

      perf_pop ();
    }
This seems to me to be the main loop. Here the OpenVPN server listens for
traffic on the network connections and processes each packet, no matter
which client sent it - then analyses the packet and lets a connection
"object" take care of further processing. This is just a wild guess, as I
have only spent 10-15 minutes looking through the code. But a lot of
processing magic happens in multi_process_io_udp(), and a couple of levels
deeper a scheduling function is called.
If this really is true, it might be that this model works very well up to
a good number of clients, until you reach a limit around 150+, where the
cost of this rescheduling becomes too high. If the scheduling is not
efficient enough (a small "sleep" in between while waiting for I/O,
inefficient or too many code jumps, etc.), you will not see the load on
the server increase much - but you will most probably feel the performance
loss on the client side. With few active clients this goes better, as the
internal scheduler has fewer clients to switch between.
In addition, I see that the code path is quite long, with a lot of jumps
between a lot of functions, and this of course also adds some penalty -
even though each function seems to be optimised.
This is of course a way to avoid forking or starting a new thread per
client, where each client would work independently and be task-switched by
the OS. But to be honest, I think the OS scheduler might be much more
efficient at scheduling and context switches than a separate userspace one.
Can anyone with deeper knowledge than me verify or correct this? I would
like to understand this part of the code much better.
kind regards,
David Sommerseth