I need to block IP addresses that cause 404 errors. Below is my config:
QS_ClientEventBlockCount 20 300
QS_SetEnvIfStatus 404 QS_Block
QS_SetEnvIfStatus NullConnection QS_Block
SetEnvIf Remote_Addr 66.249. !QS_Block
I need to bypass Googlebot, to make sure Googlebot does not get blocked. Is this config valid?
Thank you
SetEnvIf is processed when reading the HTTP request while QS_SetEnvIfStatus sets the event at the response. You may unset QS_Block for Googlebot using the QS_SetEnvIf directive.
see also http://opensource.adnovum.ch/mod_qos/mod_qos_seq.gif
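Expressed as configuration, the suggestion above might look like the sketch below. The GoogleAddr variable name is my own invention, and the exact QS_SetEnvIf semantics should be verified against the mod_qos documentation:

SetEnvIf Remote_Addr ^66\.249\. GoogleAddr
QS_ClientEventBlockCount 20 300
QS_SetEnvIfStatus 404 QS_Block
QS_SetEnvIf GoogleAddr QS_Block !QS_Block

The last line clears QS_Block again whenever both GoogleAddr and QS_Block are set, so 404 responses served to the Googlebot address range are not counted towards the block limit.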
By the way: how do you know Google's IP addresses? And why does the bot cause multiple 404 responses?
Hello,
Googlebot has used the 66.249.0.0/16 range for a while (all Googlebot requests come from that range). Multiple 404s from Googlebot can occur when a page on our site was deleted but Google does not know it yet and keeps requesting the deleted page.
I have a further question:
If I set many lines of SetEnvIf Remote_Addr xx.xx QS_VipRequest=yes (maybe around 100 lines), does this affect performance? (I use a VPS.) I need to whitelist a set of IP ranges.
I would expect that a regular expression like
is quite fast (but still consumes CPU cycles which can't be used by other threads).
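The expression itself did not survive in the archived post; purely as an illustration (the address prefixes below are documentation placeholders, not real ranges), a single combined rule could look like:

SetEnvIf Remote_Addr ^(192\.0\.2\.|198\.51\.100\.|203\.0\.113\.) QS_VipRequest=yes

Collapsing many prefixes into one alternation means the regex engine is invoked once per request instead of once per rule.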
How do you find out which IP addresses belong to Google? Do you analyse the User-Agent header (if so, why not define a User-Agent rule)? And why not simply block 404 requests (even those from Google), as long as you don't have invalid links within your content? I guess the crawler will resume scanning your site later even if it gets temporarily blocked.
...just my two cents
I got an idea from you; I will use a regex on the User-Agent string instead.
If I set many lines of SetEnvIf Remote_Addr xx.xx QS_VipRequest=yes (maybe around 100 lines, to whitelist all IP ranges in my country), does this affect performance?
Thank you very much
I've tested 100 rules
on an Intel U2300 1.20GHz CPU running a 32-bit Apache server. It took 0.04 milliseconds to iterate through all rules. I would not expect any performance impact.
Many thanks again for your test :)
I'm sorry if I am asking too many questions, as I am very new to httpd.
1. What's the difference between SetEnvIfPlus and SetEnvIf?
2. I need to totally block some bad bots on the first connection attempt (while I have "QS_ClientEventBlockCount 20"). Is the syntax below okay?
SetEnvIfNoCase User-Agent "MJ12bot" QS_Block
QS_ClientEventBlockCount 20 900
QS_SetEnvIfStatus 404 QS_Block
QS_SetEnvIfStatus NullConnection QS_Block
Please note that I need to block MJ12bot at the first connection (not after counting up to 20 connections).
The code of mod_setenvif (the standard Apache module) and mod_setenvifplus (an optional module I have developed, which is why it was faster for me to equip it with some profiling code) is very similar, and both should have the same performance impact.
I would configure the following directives to block every request from a client sending the string MJ12bot within the User-Agent header:
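The directives themselves are missing from the archived post. As a sketch using only standard Apache 2.2 access control (mod_setenvif plus mod_authz_host), so the bot is denied on its very first request, independent of the mod_qos event counter, one possibility would be:

SetEnvIfNoCase User-Agent "MJ12bot" bad_bot
Order Allow,Deny
Allow from all
Deny from env=bad_bot

This rejects every request whose User-Agent header contains MJ12bot with a 403, without waiting for QS_ClientEventBlockCount to reach its threshold.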