Re: [Sqlgrey-users] idea for improving Discrimination feature

On 02/02/07, Dan Faerch <da...@ha...> wrote:
> Riaan Kok wrote:
> > a case that I saw in the logs intrigued me, so I did a quick lookup of
> > the keys() function and read a bit about hashes..  Perl's hash structure
> > is not ordered in any way.  So, iterating through a hash returns the
> > elements in undefined order..
> Yeah.. Basically the rules get ordered by postfix_attrib.
> The postfix_attribs themselves get randomly ordered.
> So e.g. all helo_name rules might get run first, then all client_name
> rules, and so forth.
>
> > There's not much point in
> > gathering statistics then!
> I added the rule_nr to help our support department. Apparently, many of
> our customers have mailserver-software that acts REALLY weird on 45X
> errors. Some bounce, some send a mail that looks like a bounce telling
> the sender that it got a 45X but will keep trying and more odd stuff
> like that.
> So to enable our support department to help the customers bypass our
> rules, they needed to know what rule nailed the client.

Agree, I can see it being useful for this purpose.

> And I personally like to sometimes grep for which rules nail the most. I
> don't see what you gain by knowing which other rules didn't catch the spammer.

It's just that, if you're curious about statistics, the random order
of the hash list of rules means that the only way of knowing what
percentage of connections get nailed by a rule is to have only one
rule.
For example, say your first rule checks for "unknown" clients, and
statistically, 50% of *all* connections get nailed by this one.  Your
second rule checks for dialup/dsl accounts, and let's, for argument's
sake, assume that all ISPs reliably give DNS names for all accounts.
Therefore, anything caught by rule 2 would never be caught by rule 1 -
the two rules are, for the most part, mutually exclusive.  Also, say
that rule 2 catches 20% of all traffic.  Now add a few other rules,
3-6, whose statistics you care less about.  If the code stepped
through the rules in order, grep counts would converge to 50%, 20%,
(and whatever the rest may be).  The later rules will yield less
statistical information because they are only evaluated when every
earlier rule has failed to match (but there's still *some* information
there).  And, if the logs (or support call) say "451 greylisted blah
blah 5 minutes blah (rule 5)", you will simply *know* that for that
instance, rules 1-4 did not trigger greylisting, and rule 6 was never
evaluated.

Now, currently, if the logs were to give that same message, the only
information you will have is that rule 5 was triggered.  We have no
way of knowing what number of rules were checked before this one (and
in what order).  Because of this, counting the appearance of a rule
number in the logs will give you a number with very little meaning.
It's a vague indication, at best.
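
To make the ordered case concrete, here is a minimal sketch of what
"stepping through the rules in order" could look like.  This is not
sqlgrey's actual code: the check_rules name, the rule layout
(attrib/oper/regexp) and the sample attribute values are all my
assumptions, for illustration only.

```perl
use strict;
use warnings;

# Hypothetical rule list: an array keeps the order in which rules were written.
my @rules = (
    { attrib => 'client_name', oper => '=~', regexp => '^unknown$' },
    { attrib => 'client_name', oper => '=~', regexp => '(dsl|dialup)' },
    { attrib => 'helo_name',   oper => '!~', regexp => '\.' },
);

# Return the 1-based number of the first rule that fires, or 0 if none does.
sub check_rules {
    my ($attrs, $rules) = @_;
    for my $i (0 .. $#$rules) {
        my $rule  = $rules->[$i];
        my $value = $attrs->{ $rule->{attrib} };
        next unless defined $value;          # attribute not sent by Postfix
        my $re  = $rule->{regexp};
        my $hit = ($value =~ /$re/i) ? 1 : 0;
        $hit = !$hit if $rule->{oper} eq '!~';
        return $i + 1 if $hit;               # first match wins, later rules are skipped
    }
    return 0;
}

# Made-up request attributes, as a policy request might supply them.
my %request = (
    client_name => 'adsl-12.example.net',
    helo_name   => 'mail.example.net',
);
print "rule ", check_rules(\%request, \@rules), "\n";
```

With a fixed order like this, a logged "(rule 2)" also tells you that
rule 1 was tried and did not match, and that rule 3 was never looked at.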

Another advantage of knowing the order in which rules will execute is
that, in production, you can place cheap and broad rules first, and
more expensive rules last (such as that badass rule for catching
dynamic IP client hostnames in dyn_fqdn.regexp).  If your traffic is
sufficient, your CPU might just appreciate it..

> > So, how about rather storing the list of rules in an array, which does
> > away with the need for storing the $rulenr, and each array item like
> > $rule containing:
> > $rule->{attrib}
> > $rule->{oper}
> > $rule->{regexp}
> Hmm well.. It seems like a lot of work for a very small result. I don't
> think I'll be coding this anytime soon ;).. But if you're a Perl coder,
> patches are welcome.

I agree!  This suggestion is mostly about improving the
experimentation experience of anybody tweaking a rule list, but it
doesn't bug me sufficiently YET to invest the time to get familiar
with this part of the code, and then build my suggestion.  My todo
list looks like a screenwriter's first draft!

Although.. (just got a random (heh) idea here..)  instead of an
invasive patch, can't we just sort the hash of rules upon creation by
the second-layer $rulenr value?  Is it possible?  (David?)
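
Roughly what I have in mind, as a sketch only.  I am guessing at the
data layout here (a hash keyed by postfix attribute, each holding a hash
keyed by rule number); I have not checked it against the real code, and
the sample rules are made up.

```perl
use strict;
use warnings;

# Assumed layout (a guess, not checked against the sqlgrey source):
#   $rules{<postfix attrib>}{<rulenr>} = { oper => ..., regexp => ... }
my %rules = (
    helo_name   => { 3 => { oper => '!~', regexp => '\.' } },
    client_name => {
        1 => { oper => '=~', regexp => '^unknown$' },
        2 => { oper => '=~', regexp => '(dsl|dialup)' },
    },
);

# Flatten both hash layers into one list of rule records.
my @flat;
for my $attrib (keys %rules) {
    for my $nr (keys %{ $rules{$attrib} }) {
        push @flat, { attrib => $attrib, rulenr => $nr, %{ $rules{$attrib}{$nr} } };
    }
}

# Sort numerically by rule number, so evaluation order matches the config.
my @ordered = sort { $a->{rulenr} <=> $b->{rulenr} } @flat;

print join(' ', @{$_}{qw(rulenr attrib oper regexp)}), "\n" for @ordered;
```

The sort would only need to happen once, when the rules are loaded, so
the per-connection cost should stay the same as it is today.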

> > And, yet another suggestion: it could be useful to include in one or
> > more of the documentation locations a quick list of postfix attributes
> > that can be used with discrimination..
> I was actually about to do that when i made the 1.7.4 release, but then
> it got all confusing with different versions of Postfix giving different
> attributes. So i decided not to, and instead made some examples showing
> the most useful postfix attribs.
> If you have compiled a list of attribs that work with your postfix
> version, i can include that in the docs in 1.7.5.

Ah, that's why I couldn't find such a list easily!  Well, I can vouch
for the existence of "sender_domain" and "recipient_domain", using
Postfix 2.3.6.

cheers,
Riaan