#14 event counts get out of sync and incorrectly kill requests

open
nobody
5
2011-11-22
2011-11-22
Jon Foster
No

The QS_EventRequestLimit counts aren't accurate. I have a couple of lines in my config:

BrowserMatchNoCase "googlebot" QS_Cond=google
QS_EventRequestLimit QS_Cond=^google$ 20

Using netstat, I can watch connections from the Googlebot network and compare them to the rule counts reported on the mod_qos status page. When Apache is first started, the counts behave as expected, going up and down with the connection counts. At some point the counts start "ratcheting" up: when netstat shows that all connections have been cleared for a period of time, the rule count doesn't return to 0. The value it shows when idle keeps going up until Apache is restarted. The values eventually hit the limit, and the rule starts blocking all requests that match the event.
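The netstat side of this comparison can be sketched as a small script. This is only an illustration: the function name `count_established`, the sample output, and the address ranges (documentation placeholders, not Googlebot's real network) are assumptions. It counts ESTABLISHED TCP connections whose remote address falls in a given network, based on the column layout of `netstat -tn` on Linux.

```python
import ipaddress

# Hypothetical sample of `netstat -tn` output; the addresses are documentation ranges.
SAMPLE = """\
Active Internet connections (w/o servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 192.0.2.10:80           203.0.113.5:40112       ESTABLISHED
tcp        0      0 192.0.2.10:80           203.0.113.77:51200      TIME_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.9:33810      ESTABLISHED
tcp6       0      0 ::ffff:192.0.2.10:80    ::ffff:203.0.113.9:4242 ESTABLISHED
"""


def count_established(netstat_output, network):
    """Count ESTABLISHED TCP connections whose remote address is inside `network`."""
    net = ipaddress.ip_network(network)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP connection line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        if fields[5] != "ESTABLISHED":
            continue
        # The foreign address is "host:port"; split on the last colon only.
        host = fields[4].rsplit(":", 1)[0]
        try:
            addr = ipaddress.ip_address(host)
        except ValueError:
            continue
        # Unwrap IPv4-mapped IPv6 addresses such as ::ffff:203.0.113.9.
        if addr.version == 6 and addr.ipv4_mapped is not None:
            addr = addr.ipv4_mapped
        if addr in net:
            count += 1
    return count


print(count_established(SAMPLE, "203.0.113.0/24"))  # prints 2
```

Running this periodically against live `netstat -tn` output and recording the value next to the counter shown on the status page would make the drift easier to document. Connections and requests are not the same thing, so the two numbers are only expected to track each other roughly.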

I've watched the access logs while also watching netstat and the status page. During the quiet periods I don't see any logged hits that match the rules. I've also changed the environment variable used to trigger the rule, using "SetEnvIf Remote_Addr" to set the variable by IP address, and I get the same results.

It seems the counts don't decrement when the traffic matching the rule arrives fast enough, but that is just a rough observation.

We're using Gentoo x86_64 Linux (up to date), Apache 2.2.21 and mod_qos 9.74.

Discussion

  • Jon Foster

    Jon Foster - 2011-11-22
    • priority: 5 --> 9
     
  • Pascal Buchbinder

    Which MPM are you using (worker? prefork?)? Do you have a MaxRequestsPerChild limit set?

     
  • Pascal Buchbinder

    • priority: 9 --> 5
     
  • Jon Foster

    Jon Foster - 2011-11-23

    We're using prefork. MaxRequestsPerChild is set to 1000.

     
  • Pascal Buchbinder

    I could not reproduce the problem yet.

     
  • Jon Foster

    Jon Foster - 2011-11-30

    We are getting about 330,000 hits an hour. Perhaps it has something to do with the volume of requests or how quickly the events that we're trying to track are firing. On average we have seen around 10 simultaneous connections from Google. Most of the hits that I've seen come from a range of about 512 IP addresses. Normally all of the concurrent connections come from different IPs.

    I realize we're counting events here and not connections. Just throwing out info about everything I can think of with this particular issue in hopes of sparking some idea.

    For my info: What triggers the start and end of an event?

     
  • Jon Foster

    Jon Foster - 2011-11-30

    Another note, if it makes any difference: the machine in question is a dual quad-core Intel Xeon. I don't know if the number of physical chips and/or cores might contribute to the problem.

     
  • Pascal Buchbinder

    > For my info: What triggers the start and end of an event?
    The QS_EventRequestLimit directive may be used to limit the number of concurrent requests that have the specified environment variable set. Detection (counter increment) starts at the header parser hook, and the counter is decremented again at the logger hook (when all data has been sent to the client); see http://opensource.adnovum.ch/mod_qos/mod_qos_seq.gif.
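    As an illustration only (this is a toy model, not mod_qos code), the increment/decrement pairing described above can be sketched as follows. The names `EventLimiter` and `simulate` and the `skip_every` parameter are hypothetical. The sketch assumes that some requests never reach the decrement step, purely to show how that would produce the "ratcheting" symptom from the report; it says nothing about whether this is the actual cause here.

```python
class EventLimiter:
    """Toy model of a concurrent-request counter for one event variable."""

    def __init__(self, limit):
        self.limit = limit
        self.count = 0  # requests currently counted as in flight

    def header_parser(self):
        """Request start: increment, then report whether the request is allowed."""
        self.count += 1
        return self.count <= self.limit

    def logger(self):
        """Request end: decrement the counter."""
        self.count -= 1


def simulate(requests, limit=20, skip_every=0):
    """Run strictly sequential requests, so real concurrency never exceeds 1.

    If skip_every > 0, every skip_every-th allowed request is assumed
    (hypothetically) to miss its decrement. Returns (idle_count, denied).
    """
    limiter = EventLimiter(limit)
    denied = 0
    for i in range(1, requests + 1):
        allowed = limiter.header_parser()
        if not allowed:
            denied += 1
        leak = allowed and skip_every > 0 and i % skip_every == 0
        if not leak:
            limiter.logger()
    return limiter.count, denied


print(simulate(2000, limit=20, skip_every=0))   # (0, 0): counter returns to zero
print(simulate(2000, limit=20, skip_every=50))  # (20, 1000): idle count stuck at the limit
```

    In the second run the idle value climbs by one for each missed decrement until it reaches the limit, after which every matching request is denied even though nothing is actually in flight.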

    The things I know about your setup:
    - One mod_qos directive (QS_EventRequestLimit)
    - MPM prefork
    - MaxRequestsPerChild is set to 1000
    - about 90 requests per second (some hitting the event rule)
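
    The "about 90 requests per second" figure follows from the reported 330,000 hits an hour. A quick check of the arithmetic (the helper name `per_second` is just for illustration):

```python
def per_second(hits_per_hour):
    """Convert an hourly hit count to an average per-second rate."""
    return hits_per_hour / 3600


rate = per_second(330_000)
print(f"{rate:.1f} requests/s")  # prints 91.7 requests/s
```

    This is an average over all traffic; the share of those requests that match the event rule is not known from the thread.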

    Information that could be helpful to reproduce the error:
    - Do you use VirtualHosts? (have you defined QS_EventRequestLimit globally?)
    - Do you get errors in the log? Have you the ErrorDocument directive set for those errors?
    - Has KeepAlive been enabled?
    - What's the handler serving the request? (local htdocs? mod_proxy? php? ...?)
    - anything else?

     
  • Jon Foster

    Jon Foster - 2011-12-01

    > Information that could be helpful to reproduce the error:
    > - Do you use VirtualHosts? (have you defined QS_EventRequestLimit globally?)
    Yes, we have many virtual hosts. All mod_qos settings are done globally, not per host.

    > - Do you get errors in the log? Have you the ErrorDocument directive set for those errors?
    I don't see any errors for mod_qos or for the IP range in question except for "access denied, invalid request line: can't parse uri" or "access denied, QS_EventRequestLimit rule: ...". That first error hasn't happened for the traffic we're targeting. We have ErrorDocument for 403 & 404 errors.

    > - Has KeepAlive been enabled?
    Yes.

    > - What's the handler serving the request? (local htdocs? mod_proxy? php? ...?)
    Mostly PHP with local for non-dynamic content.

    > - anything else?
    Here are some of the other features that we are using:
    QS_SrvMaxConnExcludeIP
    QS_SetEnvIf
    QS_ErrorResponseCode 503
    QS_VipUser
    QS_ClientPrefer
    QS_SrvMaxConnClose
    QS_SrvMaxConnPerIP
    QS_SrvMinDataRate
    QS_ClientEntries 400000

    I included the setting's value where I thought it might be of interest. We use QS_SetEnvIf to check for QS_VipRequest or QS_IsVipRequest and, if either is set, delete the environment variables that would trigger the events.

     
