
RepeatedlySlow rule triggers too soon, blocks spiders

A.J. A.J.
2024-09-19
2024-09-29
  • A.J. A.J. - 2024-09-19

    First, thank you very much for creating this very useful module!

    My issue is that legitimate spiders like Googlebot are being blocked by mod_qos, even if the number of requests they send is well below the limit I've set.

    Here are my rules, which are based on your suggestions for some basic DoS protection:

    # Allows max 100 connections from a single ip address:
    QS_SrvMaxConnPerIP 100
    
    # Don't allow a client IP to access a "handler" (not a static resource like
    # a jpg, gif, ..) more than 30 times within two seconds:
    QS_ClientEventLimitCount          30 2 SlowRequest
    SetEnvIf              Request_URI / SlowRequest=1
    SetEnvIf              Request_URI .*\.(jpg)|(jpeg)|(gif)|(png)|(js)|(css)|(ico) !SlowRequest
    
    # Deny a client IP for 10 minutes if it violates the rule above 2 times:
    QS_ClientEventLimitCount          2 600 RepeatedlySlow
    QS_SetEnvIf      SlowRequest_Counter=20 RepeatedlySlow=1
    
    # Send error code 429 Too Many Requests on tripping the limit.
    QS_ErrorResponseCode 429
    

    In my Apache error logs, the first appearance of the block looks like this:

    [qos:error] [client 66.249.64.166:63615] mod_qos(067): access denied, QS_ClientEventLimitCount rule: event=RepeatedlySlow, max=2, current=2, age=3, c=66.249.64.166
    

    That message is repeated many times, as mod_qos keeps blocking the spider for 10-minute periods.

    When I carefully examine my access logs, I can find no evidence that the limit I've set is exceeded. In fact, this IP address is requesting no more than 5-6 pages per second at peak. In addition, I find no error log entries indicating that the "SlowRequest" event has occurred. If I understand correctly, there should be multiple SlowRequest events before RepeatedlySlow is triggered.

    I have tried restarting Apache in the hope that some persistent in-memory state would be cleared, but this hasn't helped.

    Can you suggest what might be going on or how I can troubleshoot this further?

    Thanks for any help!

     
  • Pascal Buchbinder

    I don't know why you decided on such an aggressive two-stage solution. I suggest starting with a simple rule: identify the critical resources (e.g. a slow application) which you want to protect and define a rule for those; see the sketch below.

    About the two counters: RepeatedlySlow comes first, because your QS_SetEnvIf raises it as soon as SlowRequest_Counter reaches 20 (and the block happens the second time it is raised within 10 minutes), while the SlowRequest rule itself only triggers at 30.
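
    For example, a minimal single-rule sketch could look like this (the /app/ path and the limits are placeholders, not taken from your setup, and would need to be adapted to the resources you actually want to protect):

    # limit requests to one (hypothetical) slow application path only:
    SetEnvIf                 Request_URI ^/app/ SlowApp=1
    # max 30 requests per client IP within 2 seconds:
    QS_ClientEventLimitCount 30 2 SlowApp
    # answer violations with 429 Too Many Requests:
    QS_ErrorResponseCode     429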

     
  • A.J. A.J. - 2024-09-25

    Thanks for the reply.

    My aggressive solution is a response to the behavior of some overly aggressive spiders. I've excluded static resources like images, so the idea is to enforce a reasonable rate (30 requests within two seconds) on all other requests to my application.

    I did indeed make a typo in my rules: the 20 should be 30. That explains why I saw log messages for the "RepeatedlySlow" rule without first seeing any "SlowRequest" entries.
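
    For reference, the corrected line would be:

    QS_SetEnvIf      SlowRequest_Counter=30 RepeatedlySlow=1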

    However, it's still a mystery why that rule was being tripped, as I'm not seeing even a rate of 20 requests per two seconds in my logs. In fact I'm seeing about half of that.

    Is there any way I can turn on a counter in the logs to see exactly how the module is counting requests? I see that I can add some extra variables to my Apache LogFormat, which I tried, but they don't shed any light on the current rate of requests. Thanks!

     
    • Pascal Buchbinder

      You can log the events and counts as part of your access log entries by adding them to the format definition, e.g. %{SlowRequest}e and %{SlowRequest_Counter}e.
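
      A sketch, assuming the common log format as the starting point (merge the two %{...}e fields into whatever LogFormat you already use):

      LogFormat "%h %l %u %t \"%r\" %>s %b event=%{SlowRequest}e count=%{SlowRequest_Counter}e" qoslog
      CustomLog logs/access_log qoslog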

       
