Thread: [Mod-security-developers] Performance improvement plan
Brought to you by:
victorhora,
zimmerletw
|
From: Ivan R. <iva...@gm...> - 2009-08-21 16:54:30
|
I am planning to improve the performance of the ModSecurity rule engine. I've reviewed the source code and come up with the following ideas: 1. Avoid table lookup currently used to determine if a rule is multiMatch. I don't suppose this is going to result with a significant improvement, but it's been getting on my nerves for years now and I want to get rid of it. I plan to add a variable to the rule structure and set it at configuration time. Before I do anything, however, I will investigate how SecRuleUpdateActionById interacts with action lists. 2. Change the individual rule processing code to avoid discovering tfns at runtime. At the moment we are re-discovering tfns for every variable, even though the action list never changes. Resolving the tfns at configuration time should result a significant performance improvement. 3. Cache entire target lists. Currently a lot of time is spent resolving complex variables, adding and removing variables from the target list, and transforming them in the end. I want to identify the rules that share target lists, cache a reusable target list when it is created for the first time, only to reuse it later. The cost of each rule in the engine should go down a lot. In the worst case scenario we waste a hash lookup (per rule). This should be countered by the removal of the multiMatch lookup ;) I have a couple of other smaller things I may look into, but I suspect the improvements from the above 3 changes will make me happy enough. The plan is to benchmark with the available rule sets. The only question is what to branch and when? I am hoping for the improvements to be significant enough to warrant a middle digit bump and I'd like to see them go into 2.6.0. -- Ivan Ristic Security assessment of your SSL servers https://www.ssllabs.com/ssldb/ |
|
From: Ivan R. <iva...@gm...> - 2009-08-25 20:38:27
|
Forwarding message, as the original did not show in the archives. Is it getting through? ---------- Forwarded message ---------- From: Ivan Ristic <iva...@gm...> Date: Fri, Aug 21, 2009 at 5:54 PM Subject: Performance improvement plan To: mod...@li... I am planning to improve the performance of the ModSecurity rule engine. I've reviewed the source code and come up with the following ideas: 1. Avoid table lookup currently used to determine if a rule is multiMatch. I don't suppose this is going to result with a significant improvement, but it's been getting on my nerves for years now and I want to get rid of it. I plan to add a variable to the rule structure and set it at configuration time. Before I do anything, however, I will investigate how SecRuleUpdateActionById interacts with action lists. 2. Change the individual rule processing code to avoid discovering tfns at runtime. At the moment we are re-discovering tfns for every variable, even though the action list never changes. Resolving the tfns at configuration time should result a significant performance improvement. 3. Cache entire target lists. Currently a lot of time is spent resolving complex variables, adding and removing variables from the target list, and transforming them in the end. I want to identify the rules that share target lists, cache a reusable target list when it is created for the first time, only to reuse it later. The cost of each rule in the engine should go down a lot. In the worst case scenario we waste a hash lookup (per rule). This should be countered by the removal of the multiMatch lookup ;) I have a couple of other smaller things I may look into, but I suspect the improvements from the above 3 changes will make me happy enough. The plan is to benchmark with the available rule sets. The only question is what to branch and when? I am hoping for the improvements to be significant enough to warrant a middle digit bump and I'd like to see them go into 2.6.0. -- Ivan Ristic Security assessment of your SSL servers https://www.ssllabs.com/ssldb/ |
|
From: Brian R. <bre...@gm...> - 2009-08-25 21:28:35
|
On Tue, Aug 25, 2009 at 1:38 PM, Ivan Ristic<iva...@gm...> wrote: > Forwarding message, as the original did not show in the archives. > > Is it getting through? I do not see any since Jan 2009. https://sourceforge.net/mailarchive/forum.php?forum_name=mod-security-developers > > > ---------- Forwarded message ---------- > From: Ivan Ristic <iva...@gm...> > Date: Fri, Aug 21, 2009 at 5:54 PM > Subject: Performance improvement plan > To: mod...@li... > > > I am planning to improve the performance of the ModSecurity rule > engine. I've reviewed the source code and come up with the following > ideas: > > 1. Avoid table lookup currently used to determine if a rule is > multiMatch. I don't suppose this is going to result with a significant > improvement, but it's been getting on my nerves for years now and I > want to get rid of it. I plan to add a variable to the rule structure > and set it at configuration time. Before I do anything, however, I > will investigate how SecRuleUpdateActionById interacts with action > lists. Agreed. > > 2. Change the individual rule processing code to avoid discovering > tfns at runtime. At the moment we are re-discovering tfns for every > variable, even though the action list never changes. Resolving the > tfns at configuration time should result a significant performance > improvement. You will need to re-build the list from SecRuleUpdateActionById at config time, which should be fine. Just a stored list of tfn structs would be fine instead of building the list at runtime. It should have been like that to begin with ;) The whole runtime "rule" structure needs to be direct-access instead of lookups, so you might as well remove all the lookups. > > 3. Cache entire target lists. Currently a lot of time is spent > resolving complex variables, adding and removing variables from the > target list, and transforming them in the end. I want to identify the > rules that share target lists, cache a reusable target list when it is > created for the first time, only to reuse it later. The cost of each > rule in the engine should go down a lot. In the worst case scenario we > waste a hash lookup (per rule). This should be countered by the > removal of the multiMatch lookup ;) Only issue I see here is with non-cacheable targets and targets that are listed in different orders. We need a good way to "hash" the target list so that any rule that uses the same targets (at least the same cacheable targets and in any order) resolves to the same hashed value. I had thought about using the ptr value of the tfn structure as this is unique per tfn and it does not have to survive a restart. So, you have to pull the cached values, then merge in and transform any of the non-cacheable targets. i would prefer if target order is maintained, which gives the rule writer some flexibility for optimizing the most likely targets first., but I am not sure that is a big deal. > > I have a couple of other smaller things I may look into, but I suspect > the improvements from the above 3 changes will make me happy enough. One other *big* improvement that is needed is how SecAction/SecMarker work as well as skipping. This was meant to just skip a few rules, but it is now being used to skip entire sections of rules and this is *very* slow. We need a way to do a real skipAfter without nop'ing each rule in between. This has to also handle rule removal/updating at config time *and* runtime, which is where it gets tricky. Originally I was thinking a linked list with a pointer to the "next rule" to run on a match success/failure. But the accounting is tricky with ctl:ruleRemovebyId. > > The plan is to benchmark with the available rule sets. > Sounds great. It may also show some need for optimization in the rulesets as well. Any plan for how you will be doing the benchmark? > The only question is what to branch and when? I am hoping for the > improvements to be significant enough to warrant a middle digit bump > and I'd like to see them go into 2.6.0. The trunk is 2.6 (will be). Work there as nothing but 2.5.x merges are getting commited there currently. thanks, -B |
|
From: Ivan R. <iva...@gm...> - 2009-08-27 16:30:10
|
On Tue, Aug 25, 2009 at 10:28 PM, Brian Rectanus<bre...@gm...> wrote: > > ... > >> 2. Change the individual rule processing code to avoid discovering >> tfns at runtime. At the moment we are re-discovering tfns for every >> variable, even though the action list never changes. Resolving the >> tfns at configuration time should result a significant performance >> improvement. > > You will need to re-build the list from SecRuleUpdateActionById at > config time, which should be fine. Just a stored list of tfn structs > would be fine instead of building the list at runtime. It should have > been like that to begin with ;) Sure, but then it would be more difficult to improve the performance ;) >> 3. Cache entire target lists. Currently a lot of time is spent >> resolving complex variables, adding and removing variables from the >> target list, and transforming them in the end. I want to identify the >> rules that share target lists, cache a reusable target list when it is >> created for the first time, only to reuse it later. The cost of each >> rule in the engine should go down a lot. In the worst case scenario we >> waste a hash lookup (per rule). This should be countered by the >> removal of the multiMatch lookup ;) > > Only issue I see here is with non-cacheable targets Right. If there's even a single non-cacheable variable in a list, it will make the list uncacheable. > and targets that > are listed in different orders. We need a good way to "hash" the > target list so that any rule that uses the same targets (at least the > same cacheable targets and in any order) resolves to the same hashed > value. I am not going to address that in the first version, mostly because variables are usually listed in the same order and also because caching would cause us to change the order in which variables are evaluated. I am guessing it's not important, but still. > I had thought about using the ptr value of the tfn structure as this > is unique per tfn and it does not have to survive a restart. I am sorry, I don't follow what you're saying here. > So, you have to pull the cached values, then merge in and transform > any of the non-cacheable targets. i would prefer if target order is > maintained, which gives the rule writer some flexibility for > optimizing the most likely targets first., but I am not sure that is a > big deal. Actually I don't want to bother with individual variables. I will only identify identical variable lists and cache entire lists at a time. It makes the code much simpler and the performance better. >> I have a couple of other smaller things I may look into, but I suspect >> the improvements from the above 3 changes will make me happy enough. > > One other *big* improvement that is needed is how SecAction/SecMarker > work as well as skipping. This was meant to just skip a few rules, > but it is now being used to skip entire sections of rules and this is > *very* slow. I was looking at that code the other day. I don't think that it is very slow right now (there's so much happening within one rule!), but it may be perceived as very slow once the caching I add kicks in. I don't think it's a big deal, but we'll see. >> The plan is to benchmark with the available rule sets. >> > > Sounds great. It may also show some need for optimization in the > rulesets as well. Any plan for how you will be doing the benchmark? Initially just the built-in performance measurement. I may try oprofile as well. >> The only question is what to branch and when? I am hoping for the >> improvements to be significant enough to warrant a middle digit bump >> and I'd like to see them go into 2.6.0. > > The trunk is 2.6 (will be). Work there as nothing but 2.5.x merges > are getting commited there currently. OK. > thanks, > -B > -- Ivan Ristic Security assessment of your SSL servers https://www.ssllabs.com/ssldb/ |