Thread: [Mod-security-developers] Performance improvement plan

Brought to you by: victorhora, zimmerletw

mod-security-developers

[Mod-security-developers] Performance improvement plan

From: Ivan R. <iva...@gm...> - 2009-08-21 16:54:30

I am planning to improve the performance of the ModSecurity rule
engine. I've reviewed the source code and come up with the following
ideas:

1. Avoid table lookup currently used to determine if a rule is
multiMatch. I don't suppose this is going to result with a significant
improvement, but it's been getting on my nerves for years now and I
want to get rid of it. I plan to add a variable to the rule structure
and set it at configuration time. Before I do anything, however, I
will investigate how SecRuleUpdateActionById interacts with action
lists.

2. Change the individual rule processing code to avoid discovering
tfns at runtime. At the moment we are re-discovering tfns for every
variable, even though the action list never changes. Resolving the
tfns at configuration time should result a significant performance
improvement.

3. Cache entire target lists. Currently a lot of time is spent
resolving complex variables, adding and removing variables from the
target list, and transforming them in the end. I want to identify the
rules that share target lists, cache a reusable target list when it is
created for the first time, only to reuse it later. The cost of each
rule in the engine should go down a lot. In the worst case scenario we
waste a hash lookup (per rule). This should be countered by the
removal of the multiMatch lookup ;)

I have a couple of other smaller things I may look into, but I suspect
the improvements from the above 3 changes will make me happy enough.

The plan is to benchmark with the available rule sets.

The only question is what to branch and when? I am hoping for the
improvements to be significant enough to warrant a middle digit bump
and I'd like to see them go into 2.6.0.

-- 
Ivan Ristic
Security assessment of your SSL servers
https://www.ssllabs.com/ssldb/

[Mod-security-developers] Fwd: Performance improvement plan

From: Ivan R. <iva...@gm...> - 2009-08-25 20:38:27

Forwarding message, as the original did not show in the archives.

Is it getting through?

---------- Forwarded message ----------
From: Ivan Ristic <iva...@gm...>
Date: Fri, Aug 21, 2009 at 5:54 PM
Subject: Performance improvement plan
To: mod...@li...

I am planning to improve the performance of the ModSecurity rule
engine. I've reviewed the source code and come up with the following
ideas:

1. Avoid table lookup currently used to determine if a rule is
multiMatch. I don't suppose this is going to result with a significant
improvement, but it's been getting on my nerves for years now and I
want to get rid of it. I plan to add a variable to the rule structure
and set it at configuration time. Before I do anything, however, I
will investigate how SecRuleUpdateActionById interacts with action
lists.

2. Change the individual rule processing code to avoid discovering
tfns at runtime. At the moment we are re-discovering tfns for every
variable, even though the action list never changes. Resolving the
tfns at configuration time should result a significant performance
improvement.

3. Cache entire target lists. Currently a lot of time is spent
resolving complex variables, adding and removing variables from the
target list, and transforming them in the end. I want to identify the
rules that share target lists, cache a reusable target list when it is
created for the first time, only to reuse it later. The cost of each
rule in the engine should go down a lot. In the worst case scenario we
waste a hash lookup (per rule). This should be countered by the
removal of the multiMatch lookup ;)

I have a couple of other smaller things I may look into, but I suspect
the improvements from the above 3 changes will make me happy enough.

The plan is to benchmark with the available rule sets.

The only question is what to branch and when? I am hoping for the
improvements to be significant enough to warrant a middle digit bump
and I'd like to see them go into 2.6.0.

-- 
Ivan Ristic
Security assessment of your SSL servers
https://www.ssllabs.com/ssldb/

Re: [Mod-security-developers] Fwd: Performance improvement plan

From: Brian R. <bre...@gm...> - 2009-08-25 21:28:35

On Tue, Aug 25, 2009 at 1:38 PM, Ivan Ristic<iva...@gm...> wrote:
> Forwarding message, as the original did not show in the archives.
>
> Is it getting through?

I do not see any since Jan 2009.

https://sourceforge.net/mailarchive/forum.php?forum_name=mod-security-developers

>
>
> ---------- Forwarded message ----------
> From: Ivan Ristic <iva...@gm...>
> Date: Fri, Aug 21, 2009 at 5:54 PM
> Subject: Performance improvement plan
> To: mod...@li...
>
>
> I am planning to improve the performance of the ModSecurity rule
> engine. I've reviewed the source code and come up with the following
> ideas:
>
> 1. Avoid table lookup currently used to determine if a rule is
> multiMatch. I don't suppose this is going to result with a significant
> improvement, but it's been getting on my nerves for years now and I
> want to get rid of it. I plan to add a variable to the rule structure
> and set it at configuration time. Before I do anything, however, I
> will investigate how SecRuleUpdateActionById interacts with action
> lists.

Agreed.

>
> 2. Change the individual rule processing code to avoid discovering
> tfns at runtime. At the moment we are re-discovering tfns for every
> variable, even though the action list never changes. Resolving the
> tfns at configuration time should result a significant performance
> improvement.

You will need to re-build the list from SecRuleUpdateActionById at
config time, which should be fine.  Just a stored list of tfn structs
would be fine instead of building the list at runtime.  It should have
been like that to begin with ;)

The whole runtime "rule" structure needs to be direct-access instead
of lookups, so you might as well remove all the lookups.

>
> 3. Cache entire target lists. Currently a lot of time is spent
> resolving complex variables, adding and removing variables from the
> target list, and transforming them in the end. I want to identify the
> rules that share target lists, cache a reusable target list when it is
> created for the first time, only to reuse it later. The cost of each
> rule in the engine should go down a lot. In the worst case scenario we
> waste a hash lookup (per rule). This should be countered by the
> removal of the multiMatch lookup ;)

Only issue I see here is with non-cacheable targets and targets that
are listed in different orders.  We need a good way to "hash" the
target list so that any rule that uses the same targets (at least the
same cacheable targets and in any order) resolves to the same hashed
value.

I had thought about using the ptr value of the tfn structure as this
is unique per tfn and it does not have to survive a restart.

So, you have to pull the cached values, then merge in and transform
any of the non-cacheable targets.  i would prefer if target order is
maintained, which gives the rule writer some flexibility for
optimizing the most likely targets first., but I am not sure that is a
big deal.

>
> I have a couple of other smaller things I may look into, but I suspect
> the improvements from the above 3 changes will make me happy enough.

One other *big* improvement that is needed is how SecAction/SecMarker
work as well as skipping.  This was meant to just skip a few rules,
but it is now being used to skip entire sections of rules and this is
*very* slow.

We need a way to do a real skipAfter without nop'ing each rule in
between.  This has to also handle rule removal/updating at config time
*and* runtime, which is where it gets tricky.

Originally I was thinking a linked list with a pointer to the "next
rule" to run on a match success/failure.  But the accounting is tricky
with ctl:ruleRemovebyId.

>
> The plan is to benchmark with the available rule sets.
>

Sounds great.  It may also show some need for optimization in the
rulesets as well.  Any plan for how you will be doing the benchmark?

> The only question is what to branch and when? I am hoping for the
> improvements to be significant enough to warrant a middle digit bump
> and I'd like to see them go into 2.6.0.

The trunk is 2.6 (will be).  Work there as nothing but 2.5.x merges
are getting commited there currently.

thanks,
-B

Re: [Mod-security-developers] Fwd: Performance improvement plan

From: Ivan R. <iva...@gm...> - 2009-08-27 16:30:10

On Tue, Aug 25, 2009 at 10:28 PM, Brian Rectanus<bre...@gm...> wrote:
>
> ...
>
>> 2. Change the individual rule processing code to avoid discovering
>> tfns at runtime. At the moment we are re-discovering tfns for every
>> variable, even though the action list never changes. Resolving the
>> tfns at configuration time should result a significant performance
>> improvement.
>
> You will need to re-build the list from SecRuleUpdateActionById at
> config time, which should be fine.  Just a stored list of tfn structs
> would be fine instead of building the list at runtime.  It should have
> been like that to begin with ;)

Sure, but then it would be more difficult to improve the performance ;)


>> 3. Cache entire target lists. Currently a lot of time is spent
>> resolving complex variables, adding and removing variables from the
>> target list, and transforming them in the end. I want to identify the
>> rules that share target lists, cache a reusable target list when it is
>> created for the first time, only to reuse it later. The cost of each
>> rule in the engine should go down a lot. In the worst case scenario we
>> waste a hash lookup (per rule). This should be countered by the
>> removal of the multiMatch lookup ;)
>
> Only issue I see here is with non-cacheable targets

Right. If there's even a single non-cacheable variable in a list, it
will make the list uncacheable.

> and targets that
> are listed in different orders.  We need a good way to "hash" the
> target list so that any rule that uses the same targets (at least the
> same cacheable targets and in any order) resolves to the same hashed
> value.

I am not going to address that in the first version, mostly because
variables are usually listed in the same order and also because
caching would cause us to change the order in which variables are
evaluated. I am guessing it's not important, but still.


> I had thought about using the ptr value of the tfn structure as this
> is unique per tfn and it does not have to survive a restart.

I am sorry, I don't follow what you're saying here.


> So, you have to pull the cached values, then merge in and transform
> any of the non-cacheable targets.  i would prefer if target order is
> maintained, which gives the rule writer some flexibility for
> optimizing the most likely targets first., but I am not sure that is a
> big deal.

Actually I don't want to bother with individual variables. I will only
identify identical variable lists and cache entire lists at a time. It
makes the code much simpler and the performance better.


>> I have a couple of other smaller things I may look into, but I suspect
>> the improvements from the above 3 changes will make me happy enough.
>
> One other *big* improvement that is needed is how SecAction/SecMarker
> work as well as skipping.  This was meant to just skip a few rules,
> but it is now being used to skip entire sections of rules and this is
> *very* slow.

I was looking at that code the other day. I don't think that it is
very slow right now (there's so much happening within one rule!), but
it may be perceived as very slow once the caching I add kicks in. I
don't think it's a big deal, but we'll see.


>> The plan is to benchmark with the available rule sets.
>>
>
> Sounds great.  It may also show some need for optimization in the
> rulesets as well.  Any plan for how you will be doing the benchmark?

Initially just the built-in performance measurement. I may try oprofile as well.


>> The only question is what to branch and when? I am hoping for the
>> improvements to be significant enough to warrant a middle digit bump
>> and I'd like to see them go into 2.6.0.
>
> The trunk is 2.6 (will be).  Work there as nothing but 2.5.x merges
> are getting commited there currently.

OK.


> thanks,
> -B
>



-- 
Ivan Ristic
Security assessment of your SSL servers
https://www.ssllabs.com/ssldb/