Re: [Mod-security-developers] Fwd: Performance improvement plan

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Tue, Aug 25, 2009 at 1:38 PM, Ivan Ristic<iva...@gm...> wrote:
> Forwarding message, as the original did not show in the archives.
>
> Is it getting through?

I do not see any since Jan 2009.

https://sourceforge.net/mailarchive/forum.php?forum_name=mod-security-developers

>
>
> ---------- Forwarded message ----------
> From: Ivan Ristic <iva...@gm...>
> Date: Fri, Aug 21, 2009 at 5:54 PM
> Subject: Performance improvement plan
> To: mod...@li...
>
>
> I am planning to improve the performance of the ModSecurity rule
> engine. I've reviewed the source code and come up with the following
> ideas:
>
> 1. Avoid table lookup currently used to determine if a rule is
> multiMatch. I don't suppose this is going to result with a significant
> improvement, but it's been getting on my nerves for years now and I
> want to get rid of it. I plan to add a variable to the rule structure
> and set it at configuration time. Before I do anything, however, I
> will investigate how SecRuleUpdateActionById interacts with action
> lists.

Agreed.

>
> 2. Change the individual rule processing code to avoid discovering
> tfns at runtime. At the moment we are re-discovering tfns for every
> variable, even though the action list never changes. Resolving the
> tfns at configuration time should result a significant performance
> improvement.

You will need to re-build the list from SecRuleUpdateActionById at
config time, which should be fine.  Just a stored list of tfn structs
would be fine instead of building the list at runtime.  It should have
been like that to begin with ;)

The whole runtime "rule" structure needs to be direct-access instead
of lookups, so you might as well remove all the lookups.

>
> 3. Cache entire target lists. Currently a lot of time is spent
> resolving complex variables, adding and removing variables from the
> target list, and transforming them in the end. I want to identify the
> rules that share target lists, cache a reusable target list when it is
> created for the first time, only to reuse it later. The cost of each
> rule in the engine should go down a lot. In the worst case scenario we
> waste a hash lookup (per rule). This should be countered by the
> removal of the multiMatch lookup ;)

Only issue I see here is with non-cacheable targets and targets that
are listed in different orders.  We need a good way to "hash" the
target list so that any rule that uses the same targets (at least the
same cacheable targets and in any order) resolves to the same hashed
value.

I had thought about using the ptr value of the tfn structure as this
is unique per tfn and it does not have to survive a restart.

So, you have to pull the cached values, then merge in and transform
any of the non-cacheable targets.  i would prefer if target order is
maintained, which gives the rule writer some flexibility for
optimizing the most likely targets first., but I am not sure that is a
big deal.

>
> I have a couple of other smaller things I may look into, but I suspect
> the improvements from the above 3 changes will make me happy enough.

One other *big* improvement that is needed is how SecAction/SecMarker
work as well as skipping.  This was meant to just skip a few rules,
but it is now being used to skip entire sections of rules and this is
*very* slow.

We need a way to do a real skipAfter without nop'ing each rule in
between.  This has to also handle rule removal/updating at config time
*and* runtime, which is where it gets tricky.

Originally I was thinking a linked list with a pointer to the "next
rule" to run on a match success/failure.  But the accounting is tricky
with ctl:ruleRemovebyId.

>
> The plan is to benchmark with the available rule sets.
>

Sounds great.  It may also show some need for optimization in the
rulesets as well.  Any plan for how you will be doing the benchmark?

> The only question is what to branch and when? I am hoping for the
> improvements to be significant enough to warrant a middle digit bump
> and I'd like to see them go into 2.6.0.

The trunk is 2.6 (will be).  Work there as nothing but 2.5.x merges
are getting commited there currently.

thanks,
-B