|
From: Dave S. <dst...@ma...> - 2007-01-28 17:05:30
|
First, some memcache background. You can find the details on http://www.dan= ga.com/memcached/ as well as the Perl APIs and a small example of code. =20 Memcache is a caching system which allows placing data in cache, and = getting it out quickly. You can think of memcache as a clipboard for data. = If System-A does a SQL lookup and gets the data, it takes a few microsecond= s to put it into memcached. Then when System-A, System-B, etc needs to do = the SQL query again, it checks the memcache "clipboard", and if it's = there, it reads it from memcache instead of reading from SQL. =20 Often code you write makes the same SQL SELECTs over and over again. Such = is the case with SQLGrey. Each email must be checked against SQL. However, = my code checks to see if we have already done the SQL lookup. For = instance, if SQLGrey gets an email from bo...@ao..., it can't find it in = memcache, so it needs to look it up in SQL. When it does, it saves the = result in memcached before it goes on as usual. Then, the next time = SQLgrey gets an email from bo...@ao..., it looks in the memcache and finds = the data. No need to bother SQL with the query.=20 =20 Using memcache saves time in the SQL lookup, but a small amount of time. I = have found memcached lookups are about 10x faster than our high-end SQL = servers, but still, we are talking about saving fractions of a second. = Where you really save data is in the load on SQL. If mail volume on = SQLGrey is only a few thousand emails a day, memcached will only save you = a few hundred SQL requests - not a huge savings. But, if you're SQL server = already runs slowly, or you get millions of emails a day, the savings in = load on your SQL server is substantial. =20 First off, at the top, I added two models MD5 and Memcached: =20 package sqlgrey; use strict; use Pod::Usage; use Getopt::Long 2.25 qw(:config posix_default no_ignore_case); use Net::Server::Multiplex; use DBI; use POSIX ':sys_wait_h'; use Sys::Hostname; use Digest::MD5; use Cache::Memcached; I used the MD5 library from CPAN to get a unique token so I could store = the result of the request. The following is an example of a function I = altered in SQLGrey. Comments are included. Note that all I am doing is = checking memcached for data before I look up in SQL. If I can't find it = in memcached, I look it up, and store it in memcached. You can try adding = this idea of caching into all the SQL lookups, and see which ones work = best for you. =20 sub is_in_from_awl { my ($self, $sender_name, $sender_domain, $host) =3D @_; =20 # Note I store the results of my find in memcached as "T" or "F". I = did this as I don't know how to use TRUE or FALSE in Perl, and since it's = such # a small amount of data, it should be wash in data storage. =20 # We need this as a variable as I will use it for the MD5 hash later. = Note the SQL is simpler than the=20 # original SQLGrey code. I did this to allow better caching=20 my $sql =3D "SELECT 1 FROM $from_awl WHERE sender_name =3D '$sender_nam= e' AND sender_domain =3D '$sender_domain' AND src =3D '$host'" ; =20 # Create a new variable with the SQL string as an MD5 hash. And yes, = Lionel said this was not a great idea ;-) use Digest::MD5; my $md5 ; $md5 =3D Digest::MD5->new; $md5->add($sql) ; my $md5_id =3D $md5->hexdigest ; print "SQL hashed to $md5_id \n" ; =20 # This memcached library came from http://www.danga.com/memcached/=20 use Cache::Memcached; =20 # I use three different memcached servers (note the fake IPs below - = use yours instead) and I use the standard port of "11211" # I set debug off (it may help when you are coding), and let the = library compress the data before it stores # it in memcached. You can use as few memcached servers as 1 to start, = and I would recommend this # until you are comfortable with memcached. Just add more into this = array as you need them. my $memcached =3D new Cache::Memcached { 'servers' =3D> ["1.2.3.4:11211", "1.2.3.5:11211","1.2.3.6:11211"], 'debug' =3D> 0 , 'compress_threshold' =3D> 10_000, }; =20 # Set the length of time you want this entry to Cache for. I set it to = expire in 14 days, but that's just what we did. # There are no "right" settings here. The TTL is measured in seconds. = After this time, the data will automatically # removed from cache. Set this low at first, perhaps 24 hours, until = you are comfortable with memcached. my $memcached_ttl =3D 60 * 60 * 24 * 14 ; =20 # With any luck, we know the value in memcached, and we will # look it up and return it here. No SQL transaction needed. my $boolean =3D $memcached->get("$md5_id"); if ($boolean) { # We found the boolean, and we return it. $self->mylog('whitelist', 2, "HIT from is_in_from_awl() - found in = memcached as $boolean \n") ; if ($boolean eq 'T') { return 1 ; } else { return 0 ; } } =20 $self->mylog('whitelist', 2, "MISS from is_in_from_awl() - not found = in memcached. \n") ; =20 # last_seen less than $self->{sqlgrey}{awl_age} days ago my $sth =3D $self->prepare("SELECT 1 FROM $from_awl " . 'WHERE sender_name =3D ? ' . 'AND sender_domain =3D ? ' . 'AND src =3D ? ' . 'AND last_seen > ' . $self->past_tstamp($self->{sqlgrey}{awl_age}, 'DAY') ); if (!defined $sth or !$sth->execute($sender_name, $sender_domain, $host)) = { $self->db_unavailable(); $self->mylog('dbaccess', 0, "error: couldn't access $from_awl table: = $DBI::errstr"); return 1; # in doubt, accept } else { $self->db_available(); } my $result =3D $sth->fetchall_arrayref(); if ($#$result !=3D 0)=20 { $memcached->set("$md5_id", "F", $memcached_ttl); return 0; # not a single entry }=20 else=20 { $memcached->set("$md5_id", "T", $memcached_ttl); return 1; # one single entry (no multiple entries by design) } } >>> Dan Faerch <da...@ha...> 11:07 AM Sunday, January 28, 2007 >>> Dave Strickler wrote: > memcached. Since using SQLGrey with memcached, we are seeing a drop of = about 2 LA points. From about 5 to about 3. Of course, this=20 Thats not bad at all, actually. > 'real' way. I am not a full-time programmer, just a salty-dog CTO, and = don't even know Perl. What I have is a hack, and I need a Perl=20 Im not a full-time coder at all. Though i have been coding for 10-15=20 years, Perl i something i was "forced" to learn at work, to be able to=20 maintain some of our other core systems. And how i really really hate=20 Perl ;).. I think ive probably spend around 8 weeks of perl coding in = my=20 entire life, 3 of them dedicated to sqlgrey. Mostly its somewhat easy for me, since it resembles C so much, but all=20 this @var, $var, $@var, $#{$var}, ect. can be really hard for me=20 sometimes ;). Luckily im a very thorough tester of my own code. > Who and where should I send this code snippet ? Well.. I usually always post my patches here first before throwing into = CVS. Though most ppl probably dont read/try the patches, its comforting to = know that everyone at least had a chance to point out any mistakes ;) - Dan ------------------------------------------------------------------------- Take Surveys. Earn Cash. Influence the Future of IT Join SourceForge.net's Techsay panel and you'll get the chance to share = your opinions on IT & business topics through brief surveys - and earn cash http://www.techsay.com/default.php?page=3Djoin.php&p=3Dsourceforge&CID=3DDE= VDEV=20 _______________________________________________ Sqlgrey-users mailing list Sql...@li...=20 https://lists.sourceforge.net/lists/listinfo/sqlgrey-users This message has been certified virus-free by MailWise Filter - The real-ti= me, intelligent, e-mail firewall used to scan inbound and outbound messages = for SPAM, Viruses and Content. =0A=0A For more information, please visit: http:= //www.mailwise.com=0A |