Re: [Mod-security-developers] Fwd: Performance improvement plan

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

On Tue, Aug 25, 2009 at 10:28 PM, Brian Rectanus<bre...@gm...> wrote:
>
> ...
>
>> 2. Change the individual rule processing code to avoid discovering
>> tfns at runtime. At the moment we are re-discovering tfns for every
>> variable, even though the action list never changes. Resolving the
>> tfns at configuration time should result a significant performance
>> improvement.
>
> You will need to re-build the list from SecRuleUpdateActionById at
> config time, which should be fine.  Just a stored list of tfn structs
> would be fine instead of building the list at runtime.  It should have
> been like that to begin with ;)

Sure, but then it would be more difficult to improve the performance ;)

>> 3. Cache entire target lists. Currently a lot of time is spent
>> resolving complex variables, adding and removing variables from the
>> target list, and transforming them in the end. I want to identify the
>> rules that share target lists, cache a reusable target list when it is
>> created for the first time, only to reuse it later. The cost of each
>> rule in the engine should go down a lot. In the worst case scenario we
>> waste a hash lookup (per rule). This should be countered by the
>> removal of the multiMatch lookup ;)
>
> Only issue I see here is with non-cacheable targets

Right. If there's even a single non-cacheable variable in a list, it
will make the list uncacheable.

> and targets that
> are listed in different orders.  We need a good way to "hash" the
> target list so that any rule that uses the same targets (at least the
> same cacheable targets and in any order) resolves to the same hashed
> value.

I am not going to address that in the first version, mostly because
variables are usually listed in the same order and also because
caching would cause us to change the order in which variables are
evaluated. I am guessing it's not important, but still.

> I had thought about using the ptr value of the tfn structure as this
> is unique per tfn and it does not have to survive a restart.

I am sorry, I don't follow what you're saying here.

> So, you have to pull the cached values, then merge in and transform
> any of the non-cacheable targets.  i would prefer if target order is
> maintained, which gives the rule writer some flexibility for
> optimizing the most likely targets first., but I am not sure that is a
> big deal.

Actually I don't want to bother with individual variables. I will only
identify identical variable lists and cache entire lists at a time. It
makes the code much simpler and the performance better.

>> I have a couple of other smaller things I may look into, but I suspect
>> the improvements from the above 3 changes will make me happy enough.
>
> One other *big* improvement that is needed is how SecAction/SecMarker
> work as well as skipping.  This was meant to just skip a few rules,
> but it is now being used to skip entire sections of rules and this is
> *very* slow.

I was looking at that code the other day. I don't think that it is
very slow right now (there's so much happening within one rule!), but
it may be perceived as very slow once the caching I add kicks in. I
don't think it's a big deal, but we'll see.

>> The plan is to benchmark with the available rule sets.
>>
>
> Sounds great.  It may also show some need for optimization in the
> rulesets as well.  Any plan for how you will be doing the benchmark?

Initially just the built-in performance measurement. I may try oprofile as well.

>> The only question is what to branch and when? I am hoping for the
>> improvements to be significant enough to warrant a middle digit bump
>> and I'd like to see them go into 2.6.0.
>
> The trunk is 2.6 (will be).  Work there as nothing but 2.5.x merges
> are getting commited there currently.

OK.

> thanks,
> -B
>

-- 
Ivan Ristic
Security assessment of your SSL servers
https://www.ssllabs.com/ssldb/