From: Lionel B. <lio...@bo...> - 2005-06-29 15:51:50
|
Edward Shornock (sqlgrey list) wrote the following on 27.06.2005 05:35 : >Edward Shornock (sqlgrey list) wrote: > > > >>I'm experiencing the the problem now as well, and this just started >>Friday. I thought it might have had something to do with the upgrade to >>1.7, but after reverting back to 1.6 (which I had no problems with), the >>problem persists. I've also turned up the logging level trying to track >>this down. I'm using Debian Sid/unstable and PostgreSQL (7.4.8) as the >>DB system. >> >>Hopefully we can track this problem down fairly easily... >> >> >> >> >More information (I hope it helps), just before the process hangs: > >Jun 26 20:01:47 darkside postfix/smtpd[18921]: connect from >russian-caravan.cloud9.net[168.100.1.4] >Jun 26 20:01:48 darkside sqlgrey: 2005/06/26-20:01:48 CONNECT TCP Peer: >"127.0.0.1:56048" Local: "127.0.0.1:2501" >Jun 26 20:01:48 darkside sqlgrey: optin: greylisting active for >pos...@cr... >Jun 26 20:01:48 darkside sqlgrey: grey: unknown pattern: >russian-caravan.cloud9.net, 168.100.1.4: using C-class (168.100.1). >Jun 26 20:01:48 darkside sqlgrey: system: forked cleanup child (18925) >Jun 26 20:01:48 darkside sqlgrey: warning: ERROR: prepared statement >"dbdpg_1" does not exist >Jun 26 20:01:48 darkside sqlgrey: warning: message type 0x43 arrived >from server while idle >Jun 26 20:01:48 darkside sqlgrey: warning: message type 0x5a arrived >from server while idle >Jun 26 20:01:48 darkside sqlgrey: grey: domain awl match: updating >168.100.1(168.100.1.4), postfix.org > >...and at this point, sqlgrey has given up. >Jun 26 20:03:20 darkside sqlgrey: optin: greylisting active for >pos...@cr... >Jun 26 20:03:20 darkside sqlgrey: grey: unknown pattern: >camomile.cloud9.net, 168.100.1.3: using C-class (168.100.1). >Jun 26 20:03:20 darkside sqlgrey: perf: spent 0s cleaning: from_awl (0) >domain_awl (0) connect (0) >Jun 26 20:03:20 darkside sqlgrey: system: cleanup child exit (18925) >Jun 26 20:04:46 darkside postfix/anvil[18364]: statistics: max >connection rate 1/60s for (smtp:146.82.138.6) at Jun 26 19:54:46 >Jun 26 20:04:46 darkside postfix/anvil[18364]: statistics: max >connection count 1 for (smtp:146.82.138.6) at Jun 26 19:54:46 >Jun 26 20:04:46 darkside postfix/anvil[18364]: statistics: max cache >size 1 at Jun 26 19:54:46 >Jun 26 20:05:00 darkside postfix/smtpd[18921]: warning: timeout on >127.0.0.1:2501 while reading input attribute name >Jun 26 20:05:00 darkside postfix/smtpd[18921]: warning: problem talking >to server 127.0.0.1:2501: Connection timed out >Jun 26 20:06:41 darkside postfix/smtpd[18921]: warning: timeout on >127.0.0.1:2501 while reading input attribute name >Jun 26 20:06:41 darkside postfix/smtpd[18921]: warning: problem talking >to server 127.0.0.1:2501: Connection timed out >Jun 26 20:06:41 darkside postfix/smtpd[18921]: NOQUEUE: reject: RCPT >from camomile.cloud9.net[168.100.1.3]: 450 Server configuration problem; >from=<own...@po...> >to=<pos...@cr...> proto=ESMTP >helo=<camomile.cloud9.net> >Jun 26 20:06:41 darkside postfix/cleanup[19161]: 8BD20D6B082: >message-id=<200...@da...> >Jun 26 20:06:41 darkside postfix/qmgr[3099]: 8BD20D6B082: >from=<dou...@da...>, size=1086, >nrcpt=1 (queue active) >Jun 26 20:06:41 darkside postfix/smtpd[18921]: disconnect from >camomile.cloud9.net[168.100.1.3] > > > ># ps aux |grep sqlgrey >sqlgrey 3754 0.0 0.9 13660 8684 ? Ss 15:43 0:00 >/usr/bin/perl -w /usr/sbin/sqlgrey -d >postgres 3755 0.0 1.0 18076 9096 ? S 15:43 0:00 >postgres: sqlgrey sqlgrey 127.0.0.1 >idle >sqlgrey 18925 0.0 0.0 0 0 ? Z 20:01 0:00 >[sqlgrey] <defunct> >root 26635 0.0 0.0 1624 452 pts/2 S+ 22:42 0:00 grep >sqlgrey > > >strace on PID 3754 just sat there. I restarted PostgreSQL, and then the >strace finally showed activity. > >It seems that even though sqlgrey wasn't accepting connections The logs >at this time show: >Jun 26 22:56:51 darkside postfix/pickup[27380]: 08C7AD6B082: uid=132 >from=<sqlgrey> >Jun 26 22:56:51 darkside postfix/qmgr[26777]: 08C7AD6B082: >from=<sq...@da...>, size=485, nrcpt=1 >(queue active) >Jun 26 22:56:51 darkside amavis[15461]: (15461-06-3) LMTP::10024 >/var/lib/amavis/tmp/amavis-20050626T190955-15461: ><sq...@da...> -> ><pos...@cr...> Received: SIZE=485 from >darkside.crazeecanuck.homelinux.net ([127.0.0.1]) by localhost (darkside >[127.0.0.1]) (amavisd-new, port 10024) with LMTP id 15461-06-3 for ><pos...@cr...>; Sun, 26 Jun 2005 22:56:51 -0400 >(EDT) >Jun 26 22:56:51 darkside sqlgrey: dbaccess: error: couldn't get now() >from DB: FATAL: terminating connection due to administrator command >server closed the connection unexpectedly This probably means the >server terminated abnormally before or while processing the request. >Jun 26 22:56:51 darkside sqlgrey: system: forked cleanup child (27706) >Jun 26 22:56:51 darkside amavis[15461]: (15461-06-3) Checking: ><sq...@da...> -> ><pos...@cr...> >Jun 26 22:56:51 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: >the database system is shutting down >Jun 26 22:56:51 darkside sqlgrey: dbaccess: error: couldn't access >domain_awl table: no connection to the server >Jun 26 22:56:51 darkside sqlgrey: grey: domain awl match: updating >168.100.1(168.100.1.3), postfix.org >Jun 26 22:56:51 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: >the database system is shutting down >Jun 26 22:56:51 darkside sqlgrey: dbaccess: warning: couldn't do query: >UPDATE domain_awl SET last_seen = NOW(), first_seen = first_seen WHERE >sender_domain = NULL AND src = NULL: FATAL: the database system is >shutting down , reconnecting to DB >JJun 26 22:56:53 darkside sqlgrey: 2005/06/26-22:56:53 CONNECT TCP Peer: >"127.0.0.1:52685" Local: "127.0.0.1:2501" >Jun 26 22:56:53 darkside sqlgrey: optin: greylisting active for >deb...@cr... >Jun 26 22:56:53 darkside sqlgrey: grey: unknown pattern: >murphy.debian.org, 146.82.138.6: using C-class (146.82.138). >Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: >the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't get now() >from DB: FATAL: the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: >the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't access >domain_awl table: FATAL: the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: grey: domain awl match: updating >146.82.138(146.82.138.6), lists.debian.org >Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: >the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: dbaccess: warning: couldn't do query: >UPDATE domain_awl SET last_seen = NOW(), first_seen = first_seen WHERE >sender_domain = NULL AND src = NULL: FATAL: the database system is >shutting down , reconnecting to DB >Jun 26 22:56:53 darkside sqlgrey: 2005/06/26-22:56:53 CONNECT TCP Peer: >"127.0.0.1:52693" Local: "127.0.0.1:2501" >Jun 26 22:56:53 darkside sqlgrey: optin: greylisting active for >sp...@cr... >Jun 26 22:56:53 darkside sqlgrey: grey: unknown pattern: >cherry.ease.lsoft.com, 209.119.0.109: using C-class (209.119.0). >Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: >the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't get now() >from DB: FATAL: the database system is shutting down >Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: could >not connect to server: Connection refused Is the server running on >host "localhost" and accepting TCP/IP connections on port 5432? >Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't access >domain_awl table: could not connect to server: Connection refused Is >the server running on host "localhost" and accepting TCP/IP >connections on port 5432? >Jun 26 22:56:53 darkside sqlgrey: grey: domain awl match: updating >209.119.0(209.119.0.109), peach.ease.lsoft.com > >the PostgreSQL logs showed: >2005-06-26 20:01:48 [18926] LOG: connection authorized: user=sqlgrey >database=sqlgrey >2005-06-26 20:01:48 [3755] ERROR: prepared statement "dbdpg_1" does not >exist > >After restarting PGSQL, sqlgrey is working again. Note that I could >connect to the sqlgrey database just fine. I'm now strace'ing the >sqlgrey process, and dumping the output to a log file, hopefully that >log will give a glue as to what's happening. at the time of hanging--I >have no idea at this point. > >If there's anything else that I can do to help track down this bug, >please let me know... > > You may want to check the number of connections to the database. Do something like "netstat -anp | grep <sqlgrey-pid>" to check open TCP connections. You should only have one conection to the PostgreSQL server. I'm currently testing a 1.6.2 release (tar.bz2 at http://sqlgrey.bouton.name/sqlgrey-1.6.2.tar.bz2 until I consider stable enough to put it on sourceforge) with a safenet for the database connection handling: SQLgrey will now explicitly call disconnect on a buggy connection before reconnecting. All database connections are set up to be explicitly disconnected in this version too: SQLgrey doesn't rely on the DBD driver for reaping inactive connections anymore. Hope it can make a difference. If all fails, I'll make the fork configurable and inactive by default. Lionel. |