Hi Felipe,
On Thu, Apr 5, 2018 at 6:33 AM, Felipe Zimmerle <fe...@zi...> wrote:
> Hi,
>
> There are few things in the code that can be improved. There are even
> "TODO:" marking. But at certain point you may want to look at the rules.
> That is very important.
>
I'm not sure if I emphasized my last point well enough. Yes, the design of
the rules impacts performance (that's a given in a data-driven model), but
the engine itself suffers from some severe limitations at this point.
I created a very dumb, sizeable set of rules: https://gist.github.com
/p0pr0ck5/e0c73606f0be8ab93edb729e6cb56c5d
Using the simple harness I mentioned earlier (https://gist.github.com/p0pr0
ck5/9b2c414641c9b03d527679d0c8cb7d86), we see a sizeable decrease in
performance as the number of dumb rules increases. With 200 rules, a single
process can evaluate around 800 full transactions per second. When we
double the number of rules processed (remaining equally distributed
throughout phases 1-5), around 450 transactions are processed, and another
doubling in rule size (to 800 rules) results in a throughput of about 200
transactions/sec.
Of course, this simple example does a lot of unnecessary memory
transactions with ModSecurity objects; the Nginx connector is a bit more
performant. With 800 rules, a single worker process reports still only
triple-digit RPS:
$ wrk -t 5 -c 50 -d 5s http://localhost:8080/index.html
Running 5s test @ http://localhost:8080/index.html
5 threads and 50 connections
Thread Stats Avg Stdev Max +/- Stdev
Latency 133.28ms 230.40ms 1.92s 92.92%
Req/Sec 133.53 53.89 410.00 77.39%
3085 requests in 5.03s, 2.50MB read
Requests/sec: 612.94
Transfer/sec: 508.79KB
Yes, this is better than the CRS performance, but it's still orders of
magnitude from what Nginx is capable of processing on its own. So we cannot
say that such poor performance with the CRS is related to the design of the
ruleset itself. There is a clear relationship between the *number* of rules
that lobmodsecurity has to process, and performance. This further confirms
my original assertions, that there are fundamental limitations in core
ModSecurity code that hinders performance. Furthermore there are no TODO
comments regarding the performance of Rule object evaluation:
poprocks@mini-vm:~/src/ModSecurity/src$ git grep TODO
parser/seclang-scanner.cc:/* TODO: this is always defined, so inline it */
parser/seclang-scanner.cc: /** TODO: Implement the server
logging mechanism. */
parser/seclang-scanner.cc: /* TODO. We should be able to replace this
entire function body
parser/seclang-scanner.ll: /** TODO: Implement the server
logging mechanism. */
request_body_processor/json.cc: * TODO: make UTF8 validation optional,
as it depends on Content-Encoding
transaction.cc: /** TODO: Check variable */
transaction.cc: /** TODO: Check variable */
transaction.cc: /** TODO: Check variable */
transaction.cc: /** TODO: Check variable */
transaction.cc: /** TODO: write audit_log D part. */
transaction.cc: /** TODO: write audit_log G part. */
transaction.cc: /** TODO: write audit_log H part. */
transaction.cc: /** TODO: write audit_log I part. */
transaction.cc: /** TODO: write audit_log J part. */
transaction.cc: /** TODO: write audit_log K part. */
utils/acmp.cc: * TODO: This code comes from ModSecurity 2.9.0 there are two
memory leaks here
utils/msc_tree.h: * TODO: This is an improved copy of the ModSecurity 2.9
file, this may need
There is a lot of needlessly repeated work done in Rule::evaluate.
Collection values, exemptions, transformations, etc., these can all be
cached based on some key relevant to the rule. I implemented a lot of these
design patterns for lua-resty-waf, and it's really helped with performance;
I'd love to share some detailed thoughts on this in a development
discussion setting. And, as flamegraphs show, there's a lot of std
container allocation/management that's done that I suspect could be
optimized away. But these are fundamental design changes that would have a
major impact, and would require careful study and planning. This is
something I doubt could be handled by a community contribution. Ultimately
we're talking about a refactor of some significant hot path code. And
Felipe, please understand that I and the community deeply respect the work
that's been done on this project, but frankly this doesn't seem to be a
priority for you, based on the responses in this thread. And if in your
view I'm way off base in my assumptions, I'd love to hear it, and review
some data and example execution that contradicts what I, Christian, Jai,
and Andrei have shown here, and if not, at least an acknowledgement from
either Trustwave or Nginx that the inclusion of Modsecurity into Nginx
results in a substantial, meaningful variance in throughput capacity.
|