Thank you, Ryan.
I don’t want to use too many resources and slow down the server by doing remote host lookups.
As the User-Agent and Accept headers can be faked, that leaves only a list of known IP addresses of legitimate bots to use for verification.
Supposing I can work out such a list, I would need a rule that checks the client against several allowed IPs.
Reading here and there, I found that it’s apparently possible to use a regex to match a range of IPs.
I also saw “@pm”, which apparently lets you give a rule a list of arguments, so that the rule triggers if any one of them matches. Is that correct?
So, for a rule to match any one of the specified IP-range regexes, I thought the following might work:
SecRule REMOTE_ADDR "pm@
Is that correct?
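Not quite, as far as I understand the operators: @pm does a case-insensitive phrase (substring) match against a list of plain strings, so its arguments are never treated as regular expressions. Because it matches substrings, "@pm 192.0.2.1" would also match 192.0.2.10 or 192.0.2.100. For addresses and ranges, ModSecurity 2.7 and later provide @ipMatch, which takes a comma-separated list of single IPs and CIDR blocks; on older versions a single @rx with alternation can cover several ranges. A minimal sketch, assuming placeholder documentation ranges (192.0.2.0/24, 198.51.100.0/24) instead of real crawler ranges, and arbitrary rule ids:

```apache
# Sketch only: 192.0.2.0/24 and 198.51.100.0/24 are documentation ranges used
# as placeholders, NOT real search-engine ranges. Rule ids are arbitrary.

# ModSecurity 2.7+: @ipMatch accepts single addresses and CIDR blocks.
SecRule REMOTE_ADDR "@ipMatch 192.0.2.0/24,198.51.100.0/24" \
    "id:1000001,phase:1,t:none,pass,nolog,ctl:ruleRemoveById=960015"

# Older versions: one anchored regex with alternation covering both /24 ranges.
SecRule REMOTE_ADDR "@rx ^(?:192\.0\.2|198\.51\.100)\.[0-9]{1,3}$" \
    "id:1000002,phase:1,t:none,pass,nolog,ctl:ruleRemoveById=960015"
```

Either rule removes 960015 for the current transaction only, and only when the client address falls inside one of the listed ranges.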
On 11/9/13 9:36 AM, "TGWM" <firstname.lastname@example.org> wrote:
Ok, thanks for your help, Harald :)
From: Reindl Harald [mailto:email@example.com]
Sent: Saturday, 9 November 2013 21:35
Subject: Re: [mod-security-users] Whitelisting Google and other bots
A missing User-Agent or Accept header is not an attack. Think about what a rule
gains you, and the answer to whether it is safe to disable it becomes obvious.
In our infrastructure, which hosts some hundred domains and is regularly security
audited both internally and by third parties, we disabled them years ago.
Wow, lots of items here to comment on -
1) The rules in https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/master/base_rules/modsecurity_crs_21_protocol_anomalies.conf are mainly geared towards identifying non-browser requests, as all real browsers send the following headers -
These rules are useful for -
A) Tagging clients as scripts/bots - in the IP persistent collection and then treating them differently, or
B) Using these signatures to contribute to a transactional/session anomaly score rather than blocking on them directly.
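Both uses can be sketched with standard ModSecurity actions. The example below assumes SecDataDir points at a writable directory (required for persistent collections); the rule ids, the variable name ip.suspected_bot, the one-hour expiry and the +2 increment are hypothetical choices, not CRS defaults, while tx.anomaly_score is assumed to be the transaction variable the CRS 2.x rules add to:

```apache
# Sketch only: ids, ip.suspected_bot, the expiry and the score increment are
# hypothetical. Requires SecDataDir for the persistent IP collection.
SecAction "id:1000010,phase:1,t:none,pass,nolog,initcol:ip=%{REMOTE_ADDR}"

# A) Tag the client in the IP collection when the Accept header is missing,
#    keeping the tag for one hour so later rules can treat it differently.
SecRule &REQUEST_HEADERS:Accept "@eq 0" \
    "id:1000011,phase:1,t:none,pass,nolog,setvar:ip.suspected_bot=1,expirevar:ip.suspected_bot=3600"

# B) Add to the transactional anomaly score instead of blocking outright.
SecRule &REQUEST_HEADERS:User-Agent "@eq 0" \
    "id:1000012,phase:1,t:none,pass,nolog,setvar:tx.anomaly_score=+2"
```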
2) Consider using Anomaly Scoring Mode vs. Traditional Mode -
The benefit here is that the rules in the 21 Protocol Anomalies file will not block on their own but still contribute to a transactional anomaly score. The other benefit is that you can adjust the Inbound Blocking Threshold to a level that is tolerable for your site. Some sites are more risk-averse and use a much lower setting than others. Also note that you can adjust the logging behaviour of these rules -
If you want these individual rules to contribute to the anomaly score, but you do not want them to clutter up the Apache Error Log, then use this setting -
This makes the individual rules log only to the audit log file. These are considered "Reference Events" and can be reviewed if a transaction is blocked. In this mode, only one or two alerts show up in the Apache error log - the Correlated Event alerts.
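A rough sketch of what this can look like in a CRS 2.x setup file, assuming the "nolog,auditlog" logging option and the tx.inbound_anomaly_score_level threshold variable from that era's modsecurity_crs_10_setup.conf (check the names against your copy); the SecAction id is a placeholder:

```apache
# Anomaly Scoring Mode: individual rules "pass" instead of blocking, and
# "nolog,auditlog" sends their alerts to the audit log only, not the error log.
SecDefaultAction "phase:1,pass,nolog,auditlog"
SecDefaultAction "phase:2,pass,nolog,auditlog"

# Inbound blocking threshold (placeholder id). A transaction is blocked once its
# score reaches this value, so a lower value is the more risk-averse setting.
SecAction "id:900002,phase:1,t:none,nolog,pass,setvar:tx.inbound_anomaly_score_level=10"
```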
3) Back to your specific use-case - if you want to disable certain signatures for legitimate Search Engine bots, then you do need to correlate a few items, as you stated. A few items to consider -
A) In order to use REMOTE_HOST, Apache must have the HostnameLookups directive enabled - http://httpd.apache.org/docs/2.2/mod/core.html#hostnamelookups. Most sites don't do this for performance reasons. What you could do is check the User-Agent header and, if it says something like "googlebot", fire off a Lua script that does DNS lookups and verifies that the IP address really resolves to Google. See this past mail-list thread - http://marc.info/?l=mod-security-users&m=132794762123753
B) If you decide that the client is a real search engine bot, then you can use ctl:ruleRemoveById=960015 to remove that rule for the current transaction only.
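Steps A and B can be chained roughly as follows. The script path /etc/modsecurity/verify_bot.lua, the rule ids and the variable tx.verified_bot are hypothetical; the Lua script itself is not shown and is assumed to do the reverse and forward DNS check on REMOTE_ADDR and set tx.verified_bot to 1 on success. The exec action runs a .lua script internally rather than as an external program:

```apache
# Sketch only: script path, ids and tx.verified_bot are hypothetical.
# A) Only clients claiming to be Googlebot pay for the DNS lookups.
SecRule REQUEST_HEADERS:User-Agent "@contains googlebot" \
    "id:1000020,phase:1,t:none,t:lowercase,pass,nolog,exec:/etc/modsecurity/verify_bot.lua"

# B) Only when the script confirmed the IP, drop 960015 for this transaction.
SecRule TX:VERIFIED_BOT "@eq 1" \
    "id:1000021,phase:1,t:none,pass,nolog,ctl:ruleRemoveById=960015"
```

Checking the User-Agent first keeps the lookup cost limited to requests that claim to be a bot, which addresses the concern about slowing the server down.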
4) As Reindl mentioned, you don't want to blindly whitelist Search Engine bots, as they could in fact be tricked into sending actual attacks -
Hope this info helps.
ModSecurity Project Leader
OWASP ModSecurity CRS Project Leader