Is this the right syntax?

2012-03-08
2013-05-23
  • Dittawat Thaiseoboard

    I need to block IPs that generate 404 errors. Below is my config:

       QS_ClientEventBlockCount 20 300
       QS_SetEnvIfStatus        404               QS_Block
       QS_SetEnvIfStatus        NullConnection    QS_Block
       SetEnvIf     Remote_Addr 66.249.     !QS_Block

    I need to bypass Googlebot to make sure it does not get blocked. Is this config valid?

    Thank you

     
  • Pascal Buchbinder

    SetEnvIf is processed when reading the HTTP request while QS_SetEnvIfStatus sets the event at the response. You may unset QS_Block for Googlebot using the QS_SetEnvIf directive.

    QS_ClientEventBlockCount 20 300
    QS_SetEnvIfStatus 404 QS_Block
    QS_SetEnvIfStatus NullConnection QS_Block 
    SetEnvIf Remote_Addr ^66\.249\. QS_Googlebot
    QS_SetEnvIf QS_Googlebot QS_Block !QS_Block
    

    see also http://opensource.adnovum.ch/mod_qos/mod_qos_seq.gif

    By the way: how do you know Google's IP addresses? Why does the bot cause multiple 404 responses?

     
  • Dittawat Thaiseoboard

    Hello,

    Googlebot has used the 66.249.0.0/16 range for a while (all Googlebot requests come from that range). Multiple 404s for Googlebot can occur when a page on our site has been deleted but Google does not know it yet and keeps requesting the same (deleted) page.

    I have a further question.

    If I set many lines of SetEnvIf Remote_Addr xx.xx QS_VipRequest=yes (maybe around 100 lines), does this affect performance? (I use a VPS.) I need to whitelist a set of IP ranges.

     
  • Pascal Buchbinder

    I would expect that a regular expression like

    ^66\.249
    

    is quite fast (but still consumes CPU cycles which can't be used by other threads).

    How do you find out which IP addresses are from Google? Do you analyse the User-Agent header (if so, why not define a User-Agent rule)? And why not just block 404 requests (even if they are from Google), as long as you don't have invalid links within your content? I guess the crawler will resume scanning your site later even if it gets temporarily blocked.
    ….just my two cents
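
    A User-Agent rule along those lines might look like the sketch below. It assumes the crawler sends the string "Googlebot" in its User-Agent header, and reuses the QS_Googlebot variable name from the earlier example. Keep in mind that any client can send this header, so it is weaker than an address check.

        # Assumption: the crawler identifies itself as "Googlebot" in User-Agent
        BrowserMatchNoCase Googlebot QS_Googlebot
        # Unset QS_Block again when both QS_Googlebot and QS_Block are set
        QS_SetEnvIf QS_Googlebot QS_Block !QS_Block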

     
  • Dittawat Thaiseoboard

    I got an idea from you: I will use a regex on the User-Agent string.

    If I set many lines of SetEnvIf Remote_Addr xx.xx QS_VipRequest=yes (maybe around 100 lines, to whitelist all IP ranges in my country), does this affect performance?

    Thank you very much

     
  • Pascal Buchbinder

    I've tested 100 rules

    SetEnvIfPlus Remote_Addr ^62\.1\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.2\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.3\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.4\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.5\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.6\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.7\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.8\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.9\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.10\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.11\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.12\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.13\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.14\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.15\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.16\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.17\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.18\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.19\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.20\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.21\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.22\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.23\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.24\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.25\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.26\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.27\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.28\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.29\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.30\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.31\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.32\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.33\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.34\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.35\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.36\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.37\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.38\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.39\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.40\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.41\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.42\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.43\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.44\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.45\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.46\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.47\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.48\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.49\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.50\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.51\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.52\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.53\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.54\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.55\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.56\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.57\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.58\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.59\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.60\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.61\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.62\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.63\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.64\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.65\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.66\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.67\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.68\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.69\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.70\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.71\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.72\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.73\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.74\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.75\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.76\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.77\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.78\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.79\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.80\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.81\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.82\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.83\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.84\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.85\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.86\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.87\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.88\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.89\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.90\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.91\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.92\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.93\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.94\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.95\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.96\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.97\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.98\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.99\. QS_Googlebot
    SetEnvIfPlus Remote_Addr ^62\.100\. QS_Googlebot
    

    on an Intel U2300 1.20GHz CPU running a 32-bit Apache server. It took 0.04 milliseconds to iterate through all rules. I would not expect any performance impact.
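
    If the list gets hard to maintain, several prefixes can probably be folded into one expression using alternation. This is only a sketch: the prefixes below are placeholders, and I have not measured whether it is any faster than separate lines.

        # Placeholder prefixes; replace them with the ranges you actually want to whitelist
        SetEnvIf Remote_Addr ^62\.(1|2|3)\. QS_VipRequest=yes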

     
  • Dittawat Thaiseoboard

    Many thanks again for your test :)

    I'm sorry if I'm asking too many questions, as I am very new to httpd.

    1. What's the difference between SetEnvIfPlus and SetEnvIf?
    2. I need to block some bad bots completely at the first connection attempt (while I have "QS_ClientEventBlockCount 20"). Is the syntax below okay?

       SetEnvIfNoCase    User-Agent "MJ12bot" QS_Block
       QS_ClientEventBlockCount 20 900
       QS_SetEnvIfStatus        404               QS_Block
       QS_SetEnvIfStatus        NullConnection    QS_Block

    Please note that I need to block MJ12bot at the first connection (not after counting up to 20 connections).

     
  • Pascal Buchbinder

    1. What's the difference between SetEnvIfPlus and SetEnvIf?

    The code of mod_setenvif (standard Apache module) and mod_setenvifplus (an optional module I have developed, which is why it was faster for me to equip it with some profiling code) is very similar, and both should have the same performance impact.

    I would configure the following directives to block every request from a client sending the string MJ12bot within the User-Agent header:

    BrowserMatchNoCase MJ12bot QS_DenyMJ12
    <Location />
      QS_DenyEvent +DenyMJ12 deny QS_DenyMJ12
    </Location>
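
    Put together with the 404 rules from your post, the whole thing might look like the sketch below. As far as I understand it, setting QS_Block only adds to the counter used by QS_ClientEventBlockCount, while QS_DenyEvent rejects the matching request right away, which is why the bot gets its own variable. The counter values are the ones you posted.

        # Count 404 responses and null connections per client IP;
        # block the IP after 20 such events within 900 seconds
        QS_ClientEventBlockCount 20 900
        QS_SetEnvIfStatus        404            QS_Block
        QS_SetEnvIfStatus        NullConnection QS_Block

        # Deny MJ12bot on its first request, without any counting
        BrowserMatchNoCase MJ12bot QS_DenyMJ12
        <Location />
          QS_DenyEvent +DenyMJ12 deny QS_DenyMJ12
        </Location>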
    
     
