|
From: Edward S. (s. list) <sq...@cr...> - 2005-06-27 03:35:51
|
Edward Shornock (sqlgrey list) wrote: >I'm experiencing the the problem now as well, and this just started >Friday. I thought it might have had something to do with the upgrade to >1.7, but after reverting back to 1.6 (which I had no problems with), the >problem persists. I've also turned up the logging level trying to track >this down. I'm using Debian Sid/unstable and PostgreSQL (7.4.8) as the >DB system. > >Hopefully we can track this problem down fairly easily... > > More information (I hope it helps), just before the process hangs: Jun 26 20:01:47 darkside postfix/smtpd[18921]: connect from russian-caravan.cloud9.net[168.100.1.4] Jun 26 20:01:48 darkside sqlgrey: 2005/06/26-20:01:48 CONNECT TCP Peer: "127.0.0.1:56048" Local: "127.0.0.1:2501" Jun 26 20:01:48 darkside sqlgrey: optin: greylisting active for pos...@cr... Jun 26 20:01:48 darkside sqlgrey: grey: unknown pattern: russian-caravan.cloud9.net, 168.100.1.4: using C-class (168.100.1). Jun 26 20:01:48 darkside sqlgrey: system: forked cleanup child (18925) Jun 26 20:01:48 darkside sqlgrey: warning: ERROR: prepared statement "dbdpg_1" does not exist Jun 26 20:01:48 darkside sqlgrey: warning: message type 0x43 arrived from server while idle Jun 26 20:01:48 darkside sqlgrey: warning: message type 0x5a arrived from server while idle Jun 26 20:01:48 darkside sqlgrey: grey: domain awl match: updating 168.100.1(168.100.1.4), postfix.org ...and at this point, sqlgrey has given up. Jun 26 20:03:20 darkside sqlgrey: optin: greylisting active for pos...@cr... Jun 26 20:03:20 darkside sqlgrey: grey: unknown pattern: camomile.cloud9.net, 168.100.1.3: using C-class (168.100.1). Jun 26 20:03:20 darkside sqlgrey: perf: spent 0s cleaning: from_awl (0) domain_awl (0) connect (0) Jun 26 20:03:20 darkside sqlgrey: system: cleanup child exit (18925) Jun 26 20:04:46 darkside postfix/anvil[18364]: statistics: max connection rate 1/60s for (smtp:146.82.138.6) at Jun 26 19:54:46 Jun 26 20:04:46 darkside postfix/anvil[18364]: statistics: max connection count 1 for (smtp:146.82.138.6) at Jun 26 19:54:46 Jun 26 20:04:46 darkside postfix/anvil[18364]: statistics: max cache size 1 at Jun 26 19:54:46 Jun 26 20:05:00 darkside postfix/smtpd[18921]: warning: timeout on 127.0.0.1:2501 while reading input attribute name Jun 26 20:05:00 darkside postfix/smtpd[18921]: warning: problem talking to server 127.0.0.1:2501: Connection timed out Jun 26 20:06:41 darkside postfix/smtpd[18921]: warning: timeout on 127.0.0.1:2501 while reading input attribute name Jun 26 20:06:41 darkside postfix/smtpd[18921]: warning: problem talking to server 127.0.0.1:2501: Connection timed out Jun 26 20:06:41 darkside postfix/smtpd[18921]: NOQUEUE: reject: RCPT from camomile.cloud9.net[168.100.1.3]: 450 Server configuration problem; from=<own...@po...> to=<pos...@cr...> proto=ESMTP helo=<camomile.cloud9.net> Jun 26 20:06:41 darkside postfix/cleanup[19161]: 8BD20D6B082: message-id=<200...@da...> Jun 26 20:06:41 darkside postfix/qmgr[3099]: 8BD20D6B082: from=<dou...@da...>, size=1086, nrcpt=1 (queue active) Jun 26 20:06:41 darkside postfix/smtpd[18921]: disconnect from camomile.cloud9.net[168.100.1.3] # ps aux |grep sqlgrey sqlgrey 3754 0.0 0.9 13660 8684 ? Ss 15:43 0:00 /usr/bin/perl -w /usr/sbin/sqlgrey -d postgres 3755 0.0 1.0 18076 9096 ? S 15:43 0:00 postgres: sqlgrey sqlgrey 127.0.0.1 idle sqlgrey 18925 0.0 0.0 0 0 ? Z 20:01 0:00 [sqlgrey] <defunct> root 26635 0.0 0.0 1624 452 pts/2 S+ 22:42 0:00 grep sqlgrey strace on PID 3754 just sat there. I restarted PostgreSQL, and then the strace finally showed activity. It seems that even though sqlgrey wasn't accepting connections The logs at this time show: Jun 26 22:56:51 darkside postfix/pickup[27380]: 08C7AD6B082: uid=132 from=<sqlgrey> Jun 26 22:56:51 darkside postfix/qmgr[26777]: 08C7AD6B082: from=<sq...@da...>, size=485, nrcpt=1 (queue active) Jun 26 22:56:51 darkside amavis[15461]: (15461-06-3) LMTP::10024 /var/lib/amavis/tmp/amavis-20050626T190955-15461: <sq...@da...> -> <pos...@cr...> Received: SIZE=485 from darkside.crazeecanuck.homelinux.net ([127.0.0.1]) by localhost (darkside [127.0.0.1]) (amavisd-new, port 10024) with LMTP id 15461-06-3 for <pos...@cr...>; Sun, 26 Jun 2005 22:56:51 -0400 (EDT) Jun 26 22:56:51 darkside sqlgrey: dbaccess: error: couldn't get now() from DB: FATAL: terminating connection due to administrator command server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. Jun 26 22:56:51 darkside sqlgrey: system: forked cleanup child (27706) Jun 26 22:56:51 darkside amavis[15461]: (15461-06-3) Checking: <sq...@da...> -> <pos...@cr...> Jun 26 22:56:51 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: the database system is shutting down Jun 26 22:56:51 darkside sqlgrey: dbaccess: error: couldn't access domain_awl table: no connection to the server Jun 26 22:56:51 darkside sqlgrey: grey: domain awl match: updating 168.100.1(168.100.1.3), postfix.org Jun 26 22:56:51 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: the database system is shutting down Jun 26 22:56:51 darkside sqlgrey: dbaccess: warning: couldn't do query: UPDATE domain_awl SET last_seen = NOW(), first_seen = first_seen WHERE sender_domain = NULL AND src = NULL: FATAL: the database system is shutting down , reconnecting to DB JJun 26 22:56:53 darkside sqlgrey: 2005/06/26-22:56:53 CONNECT TCP Peer: "127.0.0.1:52685" Local: "127.0.0.1:2501" Jun 26 22:56:53 darkside sqlgrey: optin: greylisting active for deb...@cr... Jun 26 22:56:53 darkside sqlgrey: grey: unknown pattern: murphy.debian.org, 146.82.138.6: using C-class (146.82.138). Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't get now() from DB: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't access domain_awl table: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: grey: domain awl match: updating 146.82.138(146.82.138.6), lists.debian.org Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: dbaccess: warning: couldn't do query: UPDATE domain_awl SET last_seen = NOW(), first_seen = first_seen WHERE sender_domain = NULL AND src = NULL: FATAL: the database system is shutting down , reconnecting to DB Jun 26 22:56:53 darkside sqlgrey: 2005/06/26-22:56:53 CONNECT TCP Peer: "127.0.0.1:52693" Local: "127.0.0.1:2501" Jun 26 22:56:53 darkside sqlgrey: optin: greylisting active for sp...@cr... Jun 26 22:56:53 darkside sqlgrey: grey: unknown pattern: cherry.ease.lsoft.com, 209.119.0.109: using C-class (209.119.0). Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't get now() from DB: FATAL: the database system is shutting down Jun 26 22:56:53 darkside sqlgrey: dbaccess: can't connect to DB: could not connect to server: Connection refused Is the server running on host "localhost" and accepting TCP/IP connections on port 5432? Jun 26 22:56:53 darkside sqlgrey: dbaccess: error: couldn't access domain_awl table: could not connect to server: Connection refused Is the server running on host "localhost" and accepting TCP/IP connections on port 5432? Jun 26 22:56:53 darkside sqlgrey: grey: domain awl match: updating 209.119.0(209.119.0.109), peach.ease.lsoft.com the PostgreSQL logs showed: 2005-06-26 20:01:48 [18926] LOG: connection authorized: user=sqlgrey database=sqlgrey 2005-06-26 20:01:48 [3755] ERROR: prepared statement "dbdpg_1" does not exist After restarting PGSQL, sqlgrey is working again. Note that I could connect to the sqlgrey database just fine. I'm now strace'ing the sqlgrey process, and dumping the output to a log file, hopefully that log will give a glue as to what's happening. at the time of hanging--I have no idea at this point. If there's anything else that I can do to help track down this bug, please let me know... |