From: Lionel B. <lio...@bo...> - 2005-09-21 13:22:29
Jeff Rice wrote the following on 21.09.2005 00:14 :
>Hi,
>Just thought I would share a small patch that deals with a number of
>single-use email addresses that weren't being recognized by the existing
>regex in sqlgrey. These are the sort of bounce-return-12310123981, etc.
> This patch just tries to mask the parts that appear to be unique, so
>the database doesn't get filled with addresses that won't be used again.
>
>I somewhat arbitrarily decided that if an email name contained a
>delimiter such as "-","_", or "." along with a string of 12 or more
>alphanumeric characters, then those characters should be masked. That
>may or may not result in some emails being masked when they should not,
>or some not being masked when they should. I don't believe the result
>will be tragic in either case, and this can be adjusted to your liking.
>
>It might not work as well for other folks, but it seems to catch the
>major ones I see. I am sure there are other patterns that I didn't
>catch simply because they don't come up frequently in my email mix.
>
>Jeff
>
>
Thanks, added to the 1.7.x branch; it will be in 1.7.2. My comments are
inline in the patch below.
>--- sqlgrey 2005-09-03 01:09:21.000296554 +0000
>+++ /usr/sbin/sqlgrey 2005-09-03 01:09:02.000989883 +0000
>@@ -986,14 +986,21 @@
> $user =~ s/^srs1=[^=]+=([^=]+)(=+)[^=]+=[^=]+=([^=]+)=([^=]+)$/srs1=#=$1$2#=#=$3=$4/;
> # strip extension, used sometimes for mailing-list VERP
> $user =~ s/\+.*//;
>+
>+ # strip frequently used bounce/return masks
>+ $user =~ s/((bo|bounce|notice-return|notice-reply)[._-])[0-9a-z-_.]+$/$1#/gi; # Added by JR
>+
>
>
Good, I believe this is useful. Note: the case-insensitive match isn't
needed, since all addresses are lowercased before being processed. I
removed the flag from all your substitutions.
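
To illustrate why the flag is redundant, here is a rough sketch in Python
rather than Perl, so it is not the sqlgrey code itself. The function name
mask_bounce and the explicit lower() call are assumptions standing in for
the lowercasing that happens before this point:

```python
import re

# Python rendering of the bounce/return substitution, written without a
# case-insensitive flag. The class is spelled [0-9a-z_.-] so that the
# hyphen is unambiguously a literal character.
BOUNCE_RE = re.compile(
    r'((bo|bounce|notice-return|notice-reply)[._-])[0-9a-z_.-]+$'
)

def mask_bounce(user: str) -> str:
    # Assumption: mirrors the lowercasing done before this code runs.
    user = user.lower()
    # Keep the keyword and its delimiter, mask everything after it.
    return BOUNCE_RE.sub(r'\1#', user)

print(mask_bounce("Bounce-Return-12310123981"))  # bounce-#
print(mask_bounce("jeff"))                       # jeff
```

Once the input is lowercase, a-z already covers every letter that can
appear, so matching case-insensitively changes nothing.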
> # strip hexadecimal sequences (doable in one regexp ?)
> # don't strip a leading hex sequence though
> my $tmp = '';
> while ($tmp ne $user) {
> $tmp = $user;
> $user =~ s/([._-])[0-9a-f]+([._-])/$1#$2/g;
>- }
>+ $user =~ s/([._-])[0-9a-z]{12,}([._-])/$1#$2/gi; # Added by JR
>
>
12 is arbitrary but seems good to me. I'm not sure how this one will
play out in the wild (this is why I prefer to put this code in the 1.7.x
branch).
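
For anyone who wants to experiment with the threshold, a minimal Python
sketch of the two substitutions inside the loop follows. It is a port for
testing only; the name mask_inner and the MIN_RUN constant are
hypothetical, and the input is assumed to be lowercased already:

```python
import re

MIN_RUN = 12  # the arbitrary threshold from the patch

HEX_INNER = re.compile(r'([._-])[0-9a-f]+([._-])')
ALNUM_INNER = re.compile(r'([._-])[0-9a-z]{%d,}([._-])' % MIN_RUN)

def mask_inner(user: str) -> str:
    # Repeat until the string stops changing. A single pass can miss a
    # run whose leading delimiter was consumed by the previous match.
    prev = None
    while prev != user:
        prev = user
        user = HEX_INNER.sub(r'\1#\2', user)
        user = ALNUM_INNER.sub(r'\1#\2', user)
    return user

print(mask_inner("a-1f-2e-3d-z"))            # a-#-#-#-z
print(mask_inner("news-abcdefghijkl-list"))  # news-#-list (12 chars)
print(mask_inner("news-abcdefghijk-list"))   # unchanged (11 chars)
print(mask_inner("deadbeef-12ab-list"))      # deadbeef-#-list
```

The first example needs two passes, which is why the loop is there; the
last one shows that a leading hex sequence is left alone.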
>+
>+ }
> $user =~ s/([._-])[0-9a-f]+$/$1#/g;
>+ $user =~ s/([._-])[0-9a-z]{12,}$/$1#/gi; # Added by JR
>
>
OK
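
The two end-of-string substitutions can be tried out the same way. Again
this is only a Python sketch under the assumption of lowercased input,
and mask_trailing is a made-up name:

```python
import re

HEX_TAIL = re.compile(r'([._-])[0-9a-f]+$')
ALNUM_TAIL = re.compile(r'([._-])[0-9a-z]{12,}$')

def mask_trailing(user: str) -> str:
    # Mask a hex run that ends the local part.
    user = HEX_TAIL.sub(r'\1#', user)
    # Mask a run of 12 or more alphanumerics that ends the local part.
    user = ALNUM_TAIL.sub(r'\1#', user)
    return user

print(mask_trailing("list-00ff12"))          # list-#
print(mask_trailing("return_abcdefghijkl"))  # return_#
print(mask_trailing("return_abcdefghijk"))   # unchanged (11 chars)
print(mask_trailing("john.smith"))           # unchanged
```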
Lionel.