#1 very poor performance

Status: closed
Owner: nobody
Priority: 5
Updated: 2014-08-20
Created: 2008-04-23
Creator: nyc_863
Private: No

I tried mod_qos with httpd 2.2.8 and found very poor performance. Without mod_qos the server had been handling maybe 2000 requests per minute from about 2000 client IPs and was 97% idle.

With mod_qos it was 100% busy. The requests were being processed without error, but so slowly that all 4096 slots filled up almost immediately and I could no longer even reach server-status.

It was difficult for me to debug where the slowdown originated, but I'm wondering whether it might be the regular-expression matching against headers?

Or is it some function of the much larger working set of IP addresses compared to a test environment?

My config looked like:

#
# mod_qos
#
QS_ErrorPage /front/xxx

SetEnvIf Remote_Addr 192.168.1.30 QS_VipRequest=yes

QS_VipIPHeaderName mod-qos-login
QS_ClientPrefer
QS_ClientEntries 100000

# restricts max concurrent requests for any location which has no
# individual rule:
QS_LocRequestLimitDefault 100
#QS_LocRequestLimitMatch "^/r0/download.*" 30

# allows the application to nominate VIP users by sending a
# "mod-qos-vip" HTTP response header:
QS_VipHeaderName mod-qos-vip
QS_SessionKey na&5san-sB.F4_0a=%D200ahLK1

# limits the connections for this virtual host:
QS_SrvMaxConn 4096

# disables HTTP keep-alive once the server reaches 2048 connections:
QS_SrvMaxConnClose 2048
QS_SrvConnTimeout 5

# allows max 5 connections from a single IP address:
QS_SrvMaxConnPerIP 5

# disables connection restrictions for certain clients:
#QS_SrvMaxConnExcludeIP 172.18.3.32

QS_SetEnvStatus 503 QS_Block
QS_ClientEventPerSecLimit 5
QS_ClientEventBlockCount 20 600

# don't allow a client IP to access /about 20 or
# more times within 10 minutes:
#SetEnvIf Request_URI /about QS_Block=yes
#QS_ClientEventBlockCount 20

# don't allow more than 20 "403" status code responses
# (forbidden) for a client within 10 minutes:
#QS_SetEnvStatus 403 QS_Block

By the way, I could not get "error pages" to work when an IP was blocked due to QS_Block.

The client would get HTTP 500 internal server errors instead of any custom error page.
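For reference, the error-page part of my config is just the QS_ErrorPage line above; a fallback I'm considering, so a blocked client at least sees a custom page rather than the bare 500, is to also map the 500 through Apache's core ErrorDocument directive (the ErrorDocument line is my own assumption, not something from the mod_qos docs, and may not apply if the connection is simply closed):

# page mod_qos should return when it denies a request:
QS_ErrorPage /front/xxx
# fallback for the 500 responses, using Apache's core ErrorDocument directive:
ErrorDocument 500 /front/xxx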

Discussion

  • nyc_863, 2008-04-23

    I'm looking back at the access_log from when the server ground to a halt.

    The time stamps are out of order, which indicates a high degree of variability in
    request processing time; some requests are taking 30 seconds or more to be logged.

    And the field I added to the access_log that shows concurrent users rises, but it
    does not approach 4096; I see values from 20 to 300 being reported there.

    So basically the server becomes very, VERY slow.

     
  • user_id=954354 (not the original submitter)

    This has nothing to do with regular expressions. I rather guess that you reached one of the connection restrictions during the measurement, e.g. QS_SrvMaxConnClose or QS_LocRequestLimitDefault (see the sketch at the end of this comment for ruling these out one at a time). In addition, QS_ClientEventPerSecLimit slows requests down (it stops request processing for up to 5 seconds) in order to enforce the defined limitation (but I don't know your SetEnvIf settings for QS_Event, nor do I know your test clients/server application).

    HTTP keep-alive always has two aspects: using keep-alive provides high performance to a few clients, while closing connections allows other clients to send their HTTP requests but usually causes high server load and bad system performance.

    I've tested the provided configuration (only 20 clients, but processing about 1800 requests/second) and measured a difference of about 3 percent.

    About QS_ClientEventBlockCount: mod_qos denies access at the connection level (before the client is able to cause any further events; see also http://mod-qos.sourceforge.net/mod_qos_seq.gif). The server can't provide an error message but closes the TCP connection.
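    To narrow down which restriction was hit, something like the following could be tried, changing one directive at a time between measurements (the values are only examples for the test, not recommended settings):

        # raise the per-location default so it cannot be the bottleneck:
        QS_LocRequestLimitDefault 1000

        # temporarily disable the keep-alive shutdown and the per-IP connection limit:
        #QS_SrvMaxConnClose 2048
        #QS_SrvMaxConnPerIP 5

        # temporarily disable event rate limiting and blocking
        # (QS_ClientEventPerSecLimit delays matching requests to enforce its limit):
        #QS_ClientEventPerSecLimit 5
        #QS_ClientEventBlockCount 20 600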

     
    • status: open --> pending
     
  • nyc_863, 2008-05-03

    Thanks for the feedback.

    About QS_ClientEventBlockCount: why would the client get an internal server error 500 if mod_qos denies access at the connection level? The error 500 is getting reported to the client _by Apache_. If Apache were terminating the connection, the client would get "connection reset by server" or a similar connection-level error rather than an HTTP response?

    Back to the performance question. I'll test the config again but will raise all the limits so that none of them can possibly be hit.

    I have some doubt that any limits caused the server to choke, because the client request stream was a very wide array of real IPs making real requests; there would be no reason for any of them to hit any of the limits in the config (unless I have misunderstood one of the config variables). AND the behavior of the server under load was to suck down 100% user CPU while it processed everyone's requests very slowly.

    If delays to "slow requests down" were being triggered, the server should be full but largely idle in terms of CPU? vmstat showed 100% user-space CPU :(

    I'll restart a production test with no limits at all to first verify that mod_qos can inspect the request stream without impact, then gradually introduce limits if that works ok, and come back with findings.
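    For reference, the baseline I plan to start from (mod_qos loaded and watching the request stream, but with every restriction from my config above commented out) would look roughly like this:

        QS_ErrorPage /front/xxx
        QS_VipHeaderName mod-qos-vip
        QS_ClientEntries 100000

        # all restrictions disabled for the baseline run:
        #QS_LocRequestLimitDefault 100
        #QS_SrvMaxConn 4096
        #QS_SrvMaxConnClose 2048
        #QS_SrvMaxConnPerIP 5
        #QS_ClientEventPerSecLimit 5
        #QS_ClientEventBlockCount 20 600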

     
  • nyc_863, 2008-05-03

    • status: pending --> open
     
  • user_id=954354

    Well, I have to thank you for giving me feedback about your experience using mod_qos.

    The only directive requiring CPU is QS_ClientEntries. Every new IP address (when first seen) is inserted into a sorted list, which requires several milliseconds (my Intel/Linux based laptop requires 30 to 50 ms to sort a list with 100000 entries). This could be an issue when starting 2000 test clients simultaneously. What happens if you lower the QS_ClientEntries setting (fewer IP entries)? Are you able to get a pstack output of the busiest processes (probably difficult to determine unless your server runs in MPM worker mode using many threads)?
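    If the client store turns out to be the bottleneck, a first test could be as simple as this (the value is only an example; size it roughly to the number of distinct client IPs you expect):

        # smaller sorted client store -> cheaper insertion when an IP is seen
        # for the first time
        QS_ClientEntries 10000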

     
    • status: open --> pending
     
  • user_id=1312539

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).

     
    • status: pending --> closed