Re: [mod-security-users] mod_security causing Apache 1.3.33 to hang

Ivan Ristic wrote:

>   Windows because of the smaller stack size. If I recall
>   correctly PCRE uses recursion for subexpressions internally,
>   which leads to stack space consumption when the regex
>   is applied to a long string.

For performance reasons, all regular expressions should be simplified as 
much as possible.  Under the wrong circumstances, they can end up using 
lots of resources.  For instance, quantifiers should be non-greedy (lazy) 
whenever possible.  The greedy expression /<.+>/ will match "<head>", but 
given "<head> blah blah blah blah blah ..." the ".+" first runs all the 
way to the end of the string and then backtracks one character at a time 
looking for the last ">".  It will also match all of 
"<head><title>HTML Injection Attack</title></head>" as a single match, 
even though it would be sufficient to stop at "<head>" if you're just 
trying to reject HTML tags of any kind.  So a more efficient version, 
which stops at the first ">" instead of overshooting and backtracking, 
is the non-greedy /<.+?>/.
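
Python's re module is also a backtracking engine with the same lazy "?" 
quantifier, so it offers a quick, if approximate, way to see the 
difference between the two expressions on the example string (PCRE's 
internals differ in detail):

```python
import re

greedy = re.compile(r"<.+>")
lazy = re.compile(r"<.+?>")

text = "<head><title>HTML Injection Attack</title></head>"

# Greedy: .+ first consumes everything up to the end of the string,
# then gives characters back until it can match the LAST ">".
print(greedy.search(text).group())  # the whole string

# Lazy: .+? takes one character at a time and stops at the FIRST ">".
print(lazy.search(text).group())    # <head>
```

Neither version helps when there is no ">" at all: both will walk to the 
end of the input before giving up.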

But still, any filter that looks for one or two characters followed by 
".+" or even ".+?" is likely to be a resource hog on input that starts 
out looking like a match but isn't, such as a stray "<" with no closing 
">" after it.  To cut down on this, add as much detail to an expression 
as possible.  Using character classes to restrict the set of characters 
that can match can both cut down on false positives and significantly 
reduce the backtracking on each string.  For instance, if an HTML tag 
cannot start with a number, then the expression /<\s*[^\d\s].+?>/ will 
keep the regex engine from scanning a string such as "if x < 5, then 
z = 0 blah blah blah...." all the way to the end.  (Whitespace has to be 
excluded from the class too; with a plain [^\d], the \s* can give the 
space back and [^\d] will match the space itself.)  We've added more 
detail before the ".+?" part.
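
A small check with Python's re module (again only an approximation of 
PCRE) shows where the extra detail pays off: it decides whether the 
engine ever gets as far as the ".+?" from a given "<".  The variable 
names are just for illustration:

```python
import re

text = "if x < 5, then z = 0 blah blah blah...."

# Prefix with a plain [^\d] class: when \s* gives the space back,
# [^\d] matches the space itself, so the engine goes on to try .+?
# against the rest of the string from that "<".
loose_prefix = re.compile(r"<\s*[^\d]")

# With whitespace excluded as well, the prefix fails right at "< 5",
# so .+? is never attempted from that position.
strict_prefix = re.compile(r"<\s*[^\d\s]")

# The full expression still matches an ordinary tag.
full = re.compile(r"<\s*[^\d\s].+?>")

print(repr(loose_prefix.search(text).group()))  # '< '
print(strict_prefix.search(text))               # None
print(full.search("<head> blah").group())       # <head>
```

One caveat: because ".+?" needs at least one character, this expression 
won't catch one-letter tags such as "<b>"; a negated class, as in 
/<\s*[^\d\s][^>]*>/, would, and it can never run past the first ">".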

This might be a bad example since most HTML engines will just ignore a 
number at the beginning of a tag, but then again, an HTML tag (being an 
enclosure of a string of just about any size) is just too fungible to 
efficiently identify and flag with a filter directive anyway.  A better 
approach would be to sanitize your input so that HTML tags are made 
impossible by escaping the tag symbols themselves.  But you can't just 
do this for every input ever passed into Apache, as some inputs probably 
shouldn't be altered this way if they're never going to be displayed on 
a web page.  Ideally, the script that handles this input should do its 
own sanitizing.  I'm not sure if you can use mod_security to do this, 
but maybe you can try something like:

SecFilterSelective THE_REQUEST "vulnerable-script-name" chain
SecFilterSelective ARG_SANITIZEME "(<|>)" "exec:html_escape.pl"

But I don't think the exec'd script gets passed the data or inserts 
anything back into the string.  Ideally "html_escape.pl" would be passed 
the "ARG_SANITIZEME" content on STDIN and then mod_security would 
replace "ARG_SANITIZEME" with the output of "html_escape.pl".  That 
would be a true external filter, similar to how procmail works.  Ivan, 
correct me if I'm wrong, but I don't think mod_security can do what I'm 
suggesting would be the right technique.  Actually, ideally you could 
do this:

SecFilterSelective THE_REQUEST "vulnerable-script-name" chain
SecFilterSelective ARG_SANITIZEME s/</&lt;/g
SecFilterSelective ARG_SANITIZEME s/>/&gt;/g

But I don't believe that would work in mod_security either.  Is this 
something that could be added in future versions?  Or maybe even a new 
directive specifically for HTML-escaping input?  Something like:

SecFilterSelective THE_REQUEST "vulnerable-script-name" chain
SecFilterHTMLEscape ARG_SANITIZEME
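
All three variants come down to the same small transformation.  Here is 
a rough Python sketch of what the hypothetical html_escape.pl filter (or 
a built-in escaping directive) would have to do, assuming the 
procmail-style protocol described above: the raw argument value comes in 
on STDIN and the escaped value goes back out on STDOUT.  The names 
escape_tags and filter_stream are made up for illustration, and nothing 
here reflects an actual mod_security interface:

```python
import io
from typing import TextIO


def escape_tags(value: str) -> str:
    """Neutralise HTML tag delimiters in a request argument.

    '&' is escaped first so the entities produced for '<' and '>'
    are not escaped a second time.
    """
    return (value.replace("&", "&amp;")
                 .replace("<", "&lt;")
                 .replace(">", "&gt;"))


def filter_stream(src: TextIO, dst: TextIO) -> None:
    """Hypothetical external filter: read the raw value, write it escaped."""
    dst.write(escape_tags(src.read()))


# As a standalone filter this would be filter_stream(sys.stdin, sys.stdout);
# here in-memory streams stand in for the pipe.
out = io.StringIO()
filter_stream(io.StringIO("<script>alert(1)</script>"), out)
print(out.getvalue())  # &lt;script&gt;alert(1)&lt;/script&gt;
```

Python's standard library already has this as html.escape(value, 
quote=False), which escapes the same three characters in the same order.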

I think it would be extremely useful to be able to modify request 
content in this way rather than just flagging it.

Tom