Re: [Sqlgrey-users] idea for improving Discrimination feature


On 02/02/07, Riaan Kok <ria...@gm...> wrote:
>
> On 01/02/07, Dan Faerch <da...@ha...> wrote:
> >
> > Riaan Kok wrote:
> > >
> > > sweet; nice and small patch; thanks!  I'll test it early this week,
> > > and if there are any interesting statistics that pop out, I'll post
> > > them.
> > >
> > > Riaan
> > Riaan, is it useful?
> > If it is useful to you, and no one has objections, I will include this
> > feature in 1.7.5.
> >
> >
> When more than one rule matches (which is likely), which one gets displayed
> in the logs?  Does the last one that matches get displayed, or does
> processing stop as soon as the first rule from the top of the
> "discrimination.regexp" file raises a "greylist this" flag?
>
> R
>
>
A case that I saw in the logs intrigued me, so I did a quick lookup of the
keys() function and read a bit about hashes.  Perl's hash structure is not
ordered in any way, so iterating through a hash returns the elements in an
unspecified order.  In this code, then, whichever matching regexp happens to
come first in that arbitrary order is the one that activates greylisting, and
there's not much point in gathering per-rule statistics!  (By the way, this
could be related to what you referred to in the code regarding resetting the
hash.)
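A minimal simulation of the problem, written in Python rather than SQLgrey's Perl.  Python dicts keep insertion order, so the unspecified order of Perl's keys() is modelled here by shuffling the keys; the rule numbers, the attribute values and first_match_unordered() are all hypothetical.  When two rules both match, the reported rule number depends only on the visiting order:

```python
import random
import re

# Hypothetical rules keyed by rule number, like a hash; both match the example below.
rules = {
    1: ("client_name", re.compile(r"dsl")),
    2: ("client_name", re.compile(r"\.example\.net$")),
}

def first_match_unordered(rules, attrs, rng):
    """Return the number of the first matching rule, visiting rules in arbitrary order."""
    keys = list(rules)
    rng.shuffle(keys)  # stand-in for the unspecified order of Perl's keys()
    for nr in keys:
        attr, regex = rules[nr]
        if regex.search(attrs.get(attr, "")):
            return nr
    return None  # no rule matched

attrs = {"client_name": "dsl-123.example.net"}
rng = random.Random(0)
# Collect which rule number gets reported across repeated evaluations.
seen = {first_match_unordered(rules, attrs, rng) for _ in range(50)}
print(sorted(seen))
```

Over repeated runs both rule numbers can be reported for the very same client, which is why per-rule counts gathered this way say little about which rule is actually useful.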

From reading the code, it seems there are a few variables in play:
$hash: contains a whole instance of a regexp rule:
  - $var: contains the postfix attribute
  - $data: a hash containing:
    * $rulenr: just that, the number of the rule
    * $regex: a hash containing the two keys "oper" and "regexp"

So, how about storing the list of rules in an array instead?  That does away
with the need to store $rulenr, since the rule number is simply the position
in the array, and each array item, say $rule, would contain:
$rule->{attrib}
$rule->{oper}
$rule->{regexp}
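A minimal sketch of that proposal, again in Python for illustration.  The Rule class mirrors the three proposed fields; the assumption that "oper" holds "=~" (must match) or "!~" (must not match) is mine, as are the example rules and first_matching_rule().  Rules are evaluated in file order and the first hit returns its 1-based position, so the rule number is stable and meaningful:

```python
import re
from dataclasses import dataclass

@dataclass
class Rule:
    attrib: str          # Postfix attribute to test, e.g. "client_name"
    oper: str            # assumed: "=~" (must match) or "!~" (must not match)
    regexp: re.Pattern   # compiled regular expression

# Hypothetical rules, ordered from most specific to most general.
RULES = [
    Rule("client_name", "=~", re.compile(r"dsl")),
    Rule("client_name", "=~", re.compile(r"\.example\.net$")),
]

def first_matching_rule(rules, attrs):
    """Return the 1-based number of the first rule that fires, or None."""
    for nr, rule in enumerate(rules, start=1):
        found = rule.regexp.search(attrs.get(rule.attrib, "")) is not None
        # "=~" fires on a match, "!~" fires on a non-match.
        if found == (rule.oper == "=~"):
            return nr
    return None

print(first_matching_rule(RULES, {"client_name": "dsl-123.example.net"}))  # 1
print(first_matching_rule(RULES, {"client_name": "mail.example.net"}))     # 2
print(first_matching_rule(RULES, {"client_name": "mail.example.org"}))     # None
```

Because evaluation order is fixed, a client matching both rules is always attributed to rule 1.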

This would be a bit more invasive to do, but it would make the returned rule
number meaningful for statistics.  One could then order the rules in the file
from most specific to most general, and gather information pretty much by
counting the lines a grep returns.  This could (maybe) encourage people to
experiment with discrimination a bit more!
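A small sketch of the "count the grep output" idea.  SQLgrey's actual log wording is not shown here, so the pattern below only assumes, hypothetically, that each match line contains "rule: <number>"; the sample lines are made up:

```python
import re
from collections import Counter

# Hypothetical: assumes each discrimination match line contains "rule: <number>".
RULE_RE = re.compile(r"rule: (\d+)")

def count_rule_hits(lines):
    """Tally how many log lines mention each rule number."""
    hits = Counter()
    for line in lines:
        m = RULE_RE.search(line)
        if m:
            hits[int(m.group(1))] += 1
    return hits

# Made-up log lines, for illustration only.
sample = [
    "<made-up> discrimination matched, rule: 1",
    "<made-up> discrimination matched, rule: 2",
    "<made-up> discrimination matched, rule: 1",
    "<made-up> unrelated line",
]
print(count_rule_hits(sample).most_common())  # [(1, 2), (2, 1)]
```

With rules ordered from most specific to most general, a rule that never shows up in the tally is a candidate for removal or rewording.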

For the moment I shall examine the emails that bypass my regexp list.  I can
adjust the log level on my own to make the logs provide this information, but
in general, wouldn't it make sense to log at level 2 when no regexp matches?
SQLgrey normally indicates at log level 2 whether a given SMTP combination
passes the AWL, gets greylisted, is new, etc., so indicating at level 2 that
greylisting is bypassed because no regexp matched makes sense to me as well.
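A minimal sketch of the suggested behaviour.  It assumes, hypothetically, that a message is emitted when its level is at or below the configured log level; log(), discriminate() and the message texts are invented for illustration and are not SQLgrey's code.  The point is only that the "no match" branch logs at the same level as the other per-connection decisions:

```python
CONFIGURED_LOGLEVEL = 2  # hypothetical: emit messages at this level or lower

def log(level, msg, out):
    """Append msg to out if its level is within the configured verbosity."""
    if level <= CONFIGURED_LOGLEVEL:
        out.append(msg)

def discriminate(rule_nr, out):
    """Return True if greylisting applies; rule_nr is the matching rule or None."""
    if rule_nr is None:
        # Suggested: report the bypass at level 2, like other per-connection decisions.
        log(2, "discrimination: no regexp matched, greylisting bypassed", out)
        return False
    log(2, f"discrimination: rule {rule_nr} matched, greylisting", out)
    return True

messages = []
discriminate(None, messages)
discriminate(3, messages)
print(messages)
```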

And yet another suggestion: it could be useful to include, in one or more of
the documentation locations, a quick list of the Postfix attributes that can
be used with discrimination.
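For reference, Postfix passes a policy server such as SQLgrey one name=value pair per line, ended by an empty line; the authoritative attribute list is in Postfix's SMTPD_POLICY_README and varies with the Postfix version.  The sketch below parses such a request to show which attribute names exist; the sample request uses placeholder values, and parse_policy_request() is a hypothetical helper:

```python
# Placeholder request in the Postfix policy delegation format (values are made up).
SAMPLE_REQUEST = (
    "request=smtpd_access_policy\n"
    "protocol_state=RCPT\n"
    "protocol_name=SMTP\n"
    "helo_name=some.domain.tld\n"
    "queue_id=8045F2AB23\n"
    "sender=foo@bar.tld\n"
    "recipient=bar@foo.tld\n"
    "client_address=1.2.3.4\n"
    "client_name=another.domain.tld\n"
    "instance=123.456.7\n"
    "\n"
)

def parse_policy_request(text):
    """Parse name=value lines into a dict, stopping at the terminating empty line."""
    attrs = {}
    for line in text.splitlines():
        if line == "":
            break  # an empty line ends the request
        name, _, value = line.partition("=")
        attrs[name] = value
    return attrs

attrs = parse_policy_request(SAMPLE_REQUEST)
print(sorted(attrs))  # attribute names a discrimination rule could test
```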

I'll stop for now!

thanks,

Riaan