Re: [mod-security-users] Whitelisting Google and other bots
From: TGWM <tg...@gm...> - 2013-11-10 04:18:37
Thank you, Ryan. I don't want to use too many resources and slow down the server by doing remote host lookups. Since the User-Agent and Accept headers can be faked, that leaves only a list of known IP addresses of legitimate bots to use for verification.

Supposing I can work out such a list, I would need a rule that checks against several allowed IPs. Reading here and there, I found it is apparently possible to use a regex to match a range of IPs. I also saw @pm, which apparently allows a rule to take a list of arguments and trigger if any one of them matches. Is that correct? So, in order for a rule to match any one of the specified IP-range regexes, I thought the following might work:

SecRule REMOTE_ADDR "@pm ^192\.168\.[0-1]{1}\.[0-9]{1,3}$|^193\.168\.[0-1]{1}\.[0-9]{1,3}$|^194\.168\.[0-1]{1}\.[0-9]{1,3}$"

Is that correct?

From: Ryan Barnett [mailto:RBa...@tr...]
Sent: Saturday, November 9, 2013 22:28
To: mod...@li...
Subject: Re: [mod-security-users] Whitelisting Google and other bots

On 11/9/13 9:36 AM, "TGWM" <tg...@gm...> wrote:

Ok, thanks for your help, Harald :)

-----Original Message-----
From: Reindl Harald [mailto:h.r...@th...]
Sent: Saturday, November 9, 2013 21:35
To: mod...@li...
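For comparison, note that @pm does literal phrase matching, not regex matching (regexes belong to the @rx operator). ModSecurity 2.7+ also ships an @ipMatch operator that takes a comma-separated list of addresses or CIDR networks directly, which avoids the regex entirely. A minimal sketch of the same whitelist with @ipMatch (the rule id is an arbitrary placeholder; the /23 networks cover third octets 0-1 as in the regexes above):

```apache
# Sketch only: match the client address against a list of CIDR networks.
# id 1000001 is a placeholder - pick one from your own local id range.
SecRule REMOTE_ADDR "@ipMatch 192.168.0.0/23,193.168.0.0/23,194.168.0.0/23" \
    "id:1000001,phase:1,t:none,nolog,pass"
```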
Subject: Re: [mod-security-users] Whitelisting Google and other bots

> a missing user-agent or accept-header is not an attack; think about what
> such a rule gains you, and the answer to whether it is safe to disable it
> follows logically; in our infrastructure, hosting some hundred domains and
> regularly security-audited internally and by third parties, these rules
> were disabled years ago

Wow, lots of items here to comment on -

1) The rules in https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/master/base_rules/modsecurity_crs_21_protocol_anomalies.conf are mainly geared towards identifying non-browser requests, as all real browsers send the following headers:

- Host
- User-Agent
- Accept

The usefulness of these rules lies in:

A) Tagging clients as scripts/bots in the IP persistent collection and then treating them differently, or
B) Using these signatures to contribute to a transactional/session anomaly score rather than blocking on them by themselves.

2) Consider using Anomaly Scoring Mode vs. Traditional Mode - http://blog.spiderlabs.com/2010/11/advanced-topic-of-the-week-traditional-vs-anomaly-scoring-detection-modes.html

The benefit here is that those 21 Protocol Anomalies rules will not block on their own but still contribute to a transactional anomaly score. The other benefit is that you can adjust the Inbound Blocking Threshold to a setting that is tolerable for your site; some sites are more risk-averse and use a much lower setting than others.

Also note that you can adjust the logging aspect of these rules - https://github.com/SpiderLabs/owasp-modsecurity-crs/blob/master/modsecurity_crs_10_setup.conf.example#L42-66

If you want these individual rules to contribute to the anomaly score, but you do not want them to clutter up the Apache error log, then use this setting -

SecDefaultAction "phase:1,pass,nolog,auditlog"

This will have the individual rules log only to the audit log file. These are considered "Reference Events" and can be reviewed if a transaction is blocked.
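The Inbound Blocking Threshold mentioned above is configured as a transaction variable in the CRS setup file. A hedged sketch, following the CRS 2.x conventions (the variable name is from the CRS 2.x setup file and may differ in other versions; the rule id is a placeholder):

```apache
# Sketch only: set the inbound anomaly score threshold at which the CRS
# blocking rule fires. Lower = stricter, higher = more tolerant.
# id 1000002 is a placeholder.
SecAction \
  "id:1000002, \
  phase:1, \
  t:none, \
  nolog, \
  pass, \
  setvar:tx.inbound_anomaly_score_level=5"
```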
In this mode, only one or two alerts would show up in the Apache error log - these are the Correlated Event alerts.

3) Back to your specific use-case - if you want to disable certain signatures for legitimate search engine bots, then you do need to correlate a few items, as you stated. A few items to consider:

A) In order to use REMOTE_HOST, Apache must have the HostnameLookups directive enabled - http://httpd.apache.org/docs/2.2/mod/core.html#hostnamelookups. Most sites don't do this for performance reasons. What you could do instead is check the User-Agent header field and, if it says something like "googlebot", fire off a Lua script that does nslookups and verifies that the IP address really does resolve to Google. See this past mailing-list thread - http://marc.info/?l=mod-security-users&m=132794762123753

B) If you decide that the client is a real search engine bot, then you can use ctl:ruleRemoveById=960015 to remove that rule for the current transaction only.

4) As Reindl mentioned - you don't want to blindly whitelist search engine bots, as they could in fact be tricked into sending actual attacks - http://arstechnica.com/security/2013/11/google-crawler-tricked-into-performing-sql-injection-attacks-using-decade-old-technique/

Hope this info helps.

--
Ryan Barnett
Trustwave SpiderLabs
ModSecurity Project Leader
OWASP ModSecurity CRS Project Leader
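Putting 3A and 3B together, a hedged sketch of what such a conditional exception could look like. It assumes HostnameLookups is On so REMOTE_HOST is populated; the rule id is a placeholder, and since reverse DNS alone can be spoofed, the forward-confirming lookup described in the linked Lua thread is still advisable:

```apache
# Sketch only: if the client claims to be Googlebot AND its reverse DNS
# name ends in .googlebot.com, drop rule 960015 for this transaction.
# id 1000003 is a placeholder.
SecRule REQUEST_HEADERS:User-Agent "@contains googlebot" \
    "id:1000003,phase:1,t:lowercase,nolog,pass,chain"
    SecRule REMOTE_HOST "@endsWith .googlebot.com" \
        "ctl:ruleRemoveById=960015"
```

The ctl action only affects the current transaction, so the protocol-anomaly rule stays active for all other clients.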