Re: [Sqlgrey-users] Using memcached with SQLGrey

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

First, some memcache background. You can find the details on http://www.dan=
ga.com/memcached/ as well as the Perl APIs and a small example of code.
=20
Memcache is a caching system which allows placing data in cache, and =
getting it out quickly. You can think of memcache as a clipboard for data. =
If System-A does a SQL lookup and gets the data, it takes a few microsecond=
s to put it into memcached. Then when System-A, System-B, etc needs to do =
the SQL query again, it checks the memcache "clipboard", and if it's =
there, it reads it from memcache instead of reading from SQL.
=20
Often code you write makes the same SQL SELECTs over and over again. Such =
is the case with SQLGrey. Each email must be checked against SQL. However, =
my code checks to see if we have already done the SQL lookup. For =
instance, if SQLGrey gets an email from bo...@ao..., it can't find it in =
memcache, so it needs to look it up in SQL. When it does, it saves the =
result in memcached before it goes on as usual. Then, the next time =
SQLgrey gets an email from bo...@ao..., it looks in the memcache and finds =
the data. No need to bother SQL with the query.=20
=20
Using memcache saves time in the SQL lookup, but a small amount of time. I =
have found memcached lookups are about 10x faster than our high-end SQL =
servers, but still, we are talking about saving fractions of a second. =
Where you really save data is in the load on SQL. If mail volume on =
SQLGrey is only a few thousand emails a day, memcached will only save you =
a few hundred SQL requests - not a huge savings. But, if you're SQL server =
already runs slowly, or you get millions of emails a day, the savings in =
load on your SQL server is substantial.
=20
First off, at the top, I added two models MD5 and Memcached:
=20
package sqlgrey;
use strict;
use Pod::Usage;
use Getopt::Long 2.25 qw(:config posix_default no_ignore_case);
use Net::Server::Multiplex;
use DBI;
use POSIX ':sys_wait_h';
use Sys::Hostname;
use Digest::MD5;
use Cache::Memcached;
I used the MD5 library from CPAN to get a unique token so I could store =
the result of the request. The following is an example of a function I =
altered in SQLGrey. Comments are included. Note that all I am doing is =
checking memcached for data  before I look up in SQL. If I can't find it =
in memcached, I look it up, and store it in memcached. You can try adding =
this idea of caching into all the SQL lookups, and see which ones work =
best for you.
=20
sub is_in_from_awl {
 my ($self, $sender_name, $sender_domain, $host) =3D @_;
=20
    # Note I store the results of my find in memcached as "T" or "F". I =
did this as I don't know how to use TRUE or FALSE in Perl, and since it's =
such
    # a small amount of data, it should be wash in data storage.
=20
    # We need this as a variable as I will use it for the MD5 hash later. =
Note the SQL is simpler than the=20
    # original SQLGrey code. I did this to allow better caching=20
    my $sql =3D "SELECT 1 FROM $from_awl WHERE sender_name =3D '$sender_nam=
e' AND sender_domain =3D '$sender_domain' AND src =3D '$host'" ;
=20
    # Create a new variable with the SQL string as an MD5 hash. And yes, =
Lionel said this was not a great idea ;-)
    use Digest::MD5;
    my $md5 ;
    $md5 =3D Digest::MD5->new;
    $md5->add($sql) ;
    my $md5_id =3D $md5->hexdigest ;
    print "SQL hashed to $md5_id \n" ;
=20
    # This memcached library came from http://www.danga.com/memcached/=20
    use Cache::Memcached;
=20
    # I use three different memcached servers (note the fake IPs below - =
use yours instead) and I use the standard port of "11211"
    # I set debug off (it may help when you are coding), and let the =
library compress the data before it stores
    # it in memcached. You can use as few memcached servers as 1 to start, =
and I would recommend this
    # until you are comfortable with memcached. Just add more into this =
array as you need them.
    my $memcached =3D new Cache::Memcached {
     'servers' =3D> ["1.2.3.4:11211", "1.2.3.5:11211","1.2.3.6:11211"],
    'debug' =3D> 0 ,
    'compress_threshold' =3D> 10_000,
   };
=20
   # Set the length of time you want this entry to Cache for. I set it to =
expire in 14 days, but that's just what we did.
   # There are no "right" settings here. The TTL is measured in seconds. =
After this time, the data will automatically
   # removed from cache. Set this low at first, perhaps 24 hours, until =
you are comfortable with memcached.
   my $memcached_ttl =3D 60 * 60 * 24 * 14 ;
  =20
 # With any luck, we know the value in memcached, and we will
 # look it up and return it here. No SQL transaction needed.
 my $boolean =3D $memcached->get("$md5_id");
    if ($boolean)
    {
     # We found the boolean, and we return it.
     $self->mylog('whitelist', 2, "HIT from is_in_from_awl() - found in =
memcached as $boolean \n") ;
  if ($boolean eq 'T')
  {
      return 1 ;
     }
     else
     {
      return 0 ;
     }
    }
=20
    $self->mylog('whitelist', 2, "MISS from is_in_from_awl() - not found =
in memcached. \n") ;
=20
 # last_seen less than $self->{sqlgrey}{awl_age} days ago
 my $sth =3D $self->prepare("SELECT 1 FROM $from_awl " .
 'WHERE sender_name =3D ? ' .
 'AND sender_domain =3D ? ' .
 'AND src =3D ? ' .
 'AND last_seen > ' .
 $self->past_tstamp($self->{sqlgrey}{awl_age},
 'DAY')
 );
 if (!defined $sth or !$sth->execute($sender_name, $sender_domain, $host)) =
{
  $self->db_unavailable();
  $self->mylog('dbaccess', 0, "error: couldn't access $from_awl table: =
$DBI::errstr");
  return 1; # in doubt, accept
 } else {
  $self->db_available();
 }
 my $result =3D $sth->fetchall_arrayref();
 if ($#$result !=3D 0)=20
 {
     $memcached->set("$md5_id", "F", $memcached_ttl);
  return 0; # not a single entry
 }=20
 else=20
 {
     $memcached->set("$md5_id", "T", $memcached_ttl);
  return 1; # one single entry (no multiple entries by design)
 }
}

>>> Dan Faerch <da...@ha...> 11:07 AM Sunday, January 28, 2007 >>>
Dave Strickler wrote:

> memcached. Since using SQLGrey with memcached, we are seeing a drop of =
about 2 LA points. From about 5 to about 3. Of course, this=20

Thats not bad at all, actually.

> 'real' way. I am not a full-time programmer, just a salty-dog CTO, and =
don't even know Perl. What I have is a hack, and I need a Perl=20

Im not a full-time coder at all. Though i have been coding for 10-15=20
years, Perl i something i was "forced" to learn at work, to be able to=20
maintain some of our other core systems. And how i really really hate=20
Perl ;).. I think ive probably spend around 8 weeks of perl coding in =
my=20
entire life, 3 of them dedicated to sqlgrey.
Mostly its somewhat easy for me, since it resembles C so much, but all=20
this @var, $var, $@var, $#{$var}, ect. can be really hard for me=20
sometimes ;).
Luckily im a very thorough tester of my own code.

> Who and where should I send this code snippet ?
Well.. I usually always post my patches here first before throwing into =
CVS. Though most ppl probably dont read/try the patches, its comforting to =
know that everyone at least had a chance to point out any mistakes ;)

- Dan

-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share =
your
opinions on IT & business topics through brief surveys - and earn cash
http://www.techsay.com/default.php?page=3Djoin.php&p=3Dsourceforge&CID=3DDE=
VDEV=20
_______________________________________________
Sqlgrey-users mailing list
Sql...@li...=20
https://lists.sourceforge.net/lists/listinfo/sqlgrey-users

This message has been certified virus-free by MailWise Filter - The real-ti=
me, intelligent, e-mail firewall used to scan inbound and outbound messages =
for SPAM, Viruses and Content. =0A=0A  For more information, please visit: http:=
//www.mailwise.com=0A