|
From: Lionel B. <lio...@bo...> - 2005-07-05 14:14:20
Attachments:
sqlgrey-zombies.patch
|
If I'm right, I'll need a brown-paper bag. I'm interested by the result of the attached patch on configurations where freezes occur... cd /usr/sbin; cp sqlgrey sqlgrey.sav patch -p0 < sqlgrey-zombies.patch |
|
From: Ed S. (s. list) <sq...@cr...> - 2005-07-05 17:29:29
|
Lionel Bouton wrote: > If I'm right, I'll need a brown-paper bag. > > I'm interested by the result of the attached patch on configurations > where freezes occur... > > cd /usr/sbin; > cp sqlgrey sqlgrey.sav > > patch -p0 < sqlgrey-zombies.patch > > I've reverted back to a pristine 1.6.2, applied this patch, turned the log level up to 4, and i'm strace'ing the sqlgrey process to a log file. I'll keep you posted... Cheers, |
|
From: Ed S. (s. list) <sq...@cr...> - 2005-07-06 12:52:35
|
Ed Shornock (sqlgrey list) wrote:
>Lionel Bouton wrote:
>
>
>>I'm interested by the result of the attached patch on configurations
>>where freezes occur...
>>
>>
>I've reverted back to a pristine 1.6.2, applied this patch, turned the
>log level up to 4, and i'm strace'ing the sqlgrey process to a log file.
>
>I'll keep you posted...
>
>
At around 12:40 AM, I noticed that SQLgrey locked up. I was running
strace on the process at the time, this is the output of the strace
command. SQLgrey unlocked after I restarted PGSQL:
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
waitpid(-1, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], WNOHANG) = 24973
waitpid(-1, 0xbff828f8, WNOHANG) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGCHLD, {0xb7eed7a0, [], 0}, {0xb7eed7a0, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
waitpid(24975, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 24975
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
rt_sigaction(SIGINT, {0xb7eed7a0, [], 0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {0xb7eed7a0, [], 0}, NULL, 8) = 0
read(6, "", 4) = 0
close(6) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
waitpid(-1, 0xbff82aa8, WNOHANG) = -1 ECHILD (No child processes)
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGCHLD, {0xb7eed7a0, [], 0}, {0xb7eed7a0, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<19>sqlgrey: dbaccess: error: co"..., 118) = 118
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<21>sqlgrey: grey: domain awl ma"..., 98) = 98
open("/etc/hosts", O_RDONLY) = 6
fcntl64(6, F_GETFD) = 0
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=575, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f58000
read(6, "127.0.0.1 localhost\n69.7.1"..., 4096) = 575
read(6, "", 4096) = 0
close(6) = 0
munmap(0xb7f58000, 4096) = 0
open("/etc/hosts", O_RDONLY) = 6
fcntl64(6, F_GETFD) = 0
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
fstat64(6, {st_mode=S_IFREG|0644, st_size=575, ...}) = 0
old_mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS,
-1, 0) = 0xb7f58000
read(6, "127.0.0.1 localhost\n69.7.1"..., 4096) = 575
read(6, "", 4096) = 0
close(6) = 0
munmap(0xb7f58000, 4096) = 0
socket(PF_INET, SOCK_STREAM, IPPROTO_IP) = 6
setsockopt(6, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl64(6, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
fcntl64(6, F_SETFD, FD_CLOEXEC) = 0
connect(6, {sa_family=AF_INET, sin_port=htons(5432),
sin_addr=inet_addr("127.0.0.1")}, 16) = -1 EINPROGRESS (Operation now in
progress)
poll([{fd=6, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
getsockopt(6, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
getsockname(6, {sa_family=AF_INET, sin_port=htons(53553),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
poll([{fd=6, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
rt_sigprocmask(SIG_BLOCK, [PIPE], [RTMIN], 8) = 0
send(6, "\0\0\0\10\4\322\26/", 8, 0) = 8
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(6, "N", 16384, 0) = 1
poll([{fd=6, events=POLLOUT|POLLERR, revents=POLLOUT}], 1, -1) = 1
rt_sigprocmask(SIG_BLOCK, [PIPE], [RTMIN], 8) = 0
send(6, "\0\0\0\'\0\3\0\0user\0sqlgrey\0database\0sq"..., 39, 0) = 39
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(6, "R\0\0\0\10\0\0\0\3", 16384, 0) = 9
rt_sigprocmask(SIG_BLOCK, [PIPE], [RTMIN], 8) = 0
send(6, "p\0\0\0\fsqlgrey\0", 13, 0) = 13
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(6, "R\0\0\0\10\0\0\0\0S\0\0\0\36client_encoding\0SQ"..., 16384, 0)
= 166
rt_sigprocmask(SIG_BLOCK, [PIPE], [RTMIN], 8) = 0
send(6, "Q\0\0\0\213UPDATE domain_awl SET last_"..., 140, 0) = 140
rt_sigprocmask(SIG_SETMASK, [RTMIN], NULL, 8) = 0
poll([{fd=6, events=POLLIN|POLLERR, revents=POLLIN}], 1, -1) = 1
recv(6, "C\0\0\0\rUPDATE 0\0Z\0\0\0\5I", 16384, 0) = 20
time(NULL) = 1120614094
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<23>sqlgrey: mail: mail_bucket: "..., 50) = 50
pipe([8, 9]) = 0
fork() = 24984
close(9) = 0
rt_sigaction(SIGINT, {SIG_IGN}, {0xb7eed7a0, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN}, {0xb7eed7a0, [], 0}, 8) = 0
waitpid(24984, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0) = 24984
--- SIGCHLD (Child exited) @ 0 (0) ---
sigreturn() = ? (mask now [RTMIN])
rt_sigaction(SIGINT, {0xb7eed7a0, [], 0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {0xb7eed7a0, [], 0}, NULL, 8) = 0
read(8, "", 4) = 0
close(8) = 0
rt_sigprocmask(SIG_BLOCK, [CHLD], NULL, 8) = 0
waitpid(-1, 0xbff82aa8, WNOHANG) = -1 ECHILD (No child processes)
rt_sigprocmask(SIG_BLOCK, [CHLD], [CHLD RTMIN], 8) = 0
rt_sigaction(SIGCHLD, {0xb7eed7a0, [], 0}, {0xb7eed7a0, [], 0}, 8) = 0
rt_sigprocmask(SIG_SETMASK, [CHLD RTMIN], NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [CHLD], NULL, 8) = 0
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<23>sqlgrey: request: ccert_fing"..., 474) = 474
select(16, [5 7], [7], NULL, NULL) = 1 (out [7])
write(7, "action=PREPEND X-Greylist: domai"..., 69) = 69
select(16, [5 7], [], NULL, NULL) = 1 (in [7])
read(7, "", 8192) = 0
shutdown(7, 1 /* send */) = 0
close(7) = 0
select(16, [5], [], NULL, NULL) = 1 (in [5])
accept(5, {sa_family=AF_INET, sin_port=htons(51621),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbff81b8c) = -1 EINVAL (Invalid
argument)
_llseek(7, 0, 0xbff81bd0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0xbff81b8c) = -1 EINVAL (Invalid
argument)
_llseek(7, 0, 0xbff81bd0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
fcntl64(7, F_SETFD, FD_CLOEXEC) = 0
fcntl64(7, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(7, F_SETFL, O_RDWR|O_NONBLOCK) = 0
getsockopt(7, SOL_SOCKET, SO_TYPE, [1], [4]) = 0
getsockname(7, {sa_family=AF_INET, sin_port=htons(2501),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
getpeername(7, {sa_family=AF_INET, sin_port=htons(51621),
sin_addr=inet_addr("127.0.0.1")}, [16]) = 0
time([1120615029]) = 1120615029
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<22>sqlgrey: 2005/07/05-21:57:09"..., 95) = 95
time(NULL) = 1120615029
select(16, [5 7], [], NULL, NULL) = 1 (in [7])
read(7, "request=smtpd_access_policy\nprot"..., 8192) = 417
stat64("/etc/sqlgrey/clients_ip_whitelist.local", {st_mode=S_IFREG|0644,
st_size=10, ...}) = 0
stat64("/etc/sqlgrey/clients_fqdn_whitelist.local",
{st_mode=S_IFREG|0644, st_size=3106, ...}) = 0
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<22>sqlgrey: whitelist: murphy.d"..., 64) = 64
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
write(4, "<21>sqlgrey: whitelist: bounce-d"..., 172) = 172
select(8, [4], NULL, [4], {0, 0}) = 0 (Timeout)
I know that SQLgrey *used to work flawlessly* without lock-ups
previously. Checking the changelog I see:
* Sat Mar 05 2005 Lionel Bouton <lio...@bo...>
- 1.5.3 release
- the cleanup is now done in a separate process to avoid stalling the
service
Is 1.5.3 when the new forking started? 1.5.4 worked FLAWLESSLY on my box
when I installed it in April.
v1.5.9 worked well too. This locking up JUST started within the last
few weeks and it may not even be related to SQLgrey's code itself.
I did notice that on June 12 (by checking logs) that PGSQL was updated.
I've since tried reverting back to the old version that was installed
prior to June 12th, just in case that has caused the problem. The
description for the package shows:
> This is a transitional package to automatically migrate to the new
> multicluster/multiversion structure provided by postgresql-common and
> postgresql-<version>. On installation it will integrate the existing
> database into this new structure. You can safely remove this package
> afterwards.
Let's see WTF the downgrading of the package does. I'll post another
update if this fixes the problem (and I guess we'll figure out where to
go from there).
Cheers,
--
Ed
|
|
From: Ed S. (s. list) <sq...@cr...> - 2005-07-08 18:57:50
|
Ed Shornock (sqlgrey list) wrote: [snip] > I did notice that on June 12 (by checking logs) that PGSQL was updated. > [snip] > > Let's see WTF the downgrading of the package does. I'll post another > update if this fixes the problem (and I guess we'll figure out where to > go from there). > Another hang with the patched 1.6.2 and a downloaded PostgreSQL. No idea WTF is causing it (and why it's just Mr. Rees and myself having this problem). I wonder if it'd help to completely overwrite my config file and re-edit it to compensate for the new options....hmm...let's do that with 1.6.3.... |
|
From: Lionel B. <lio...@bo...> - 2005-07-08 19:15:54
|
Ed Shornock (sqlgrey list) wrote the following on 08.07.2005 20:56 : >Ed Shornock (sqlgrey list) wrote: >[snip] > > > >>I did notice that on June 12 (by checking logs) that PGSQL was updated. >> >> >> >[snip] > > >>Let's see WTF the downgrading of the package does. I'll post another >>update if this fixes the problem (and I guess we'll figure out where to >>go from there). >> >> >> > >Another hang with the patched 1.6.2 and a downloaded PostgreSQL. No >idea WTF is causing it (and why it's just Mr. Rees and myself having >this problem). > > I'm still not sure myself. After a fork one needs to take precautions with various things (file descriptors, sockets and so on...). The current problem with SQLgrey is the amount of code in the Net::Server::Multiplex module I'm not familiar with. I already considered ditching it and replacing it with my own code because there seems to be bugs in it (SIGHUP handling for example doesn't work as documented and was pretty unusable when I tried to use it). 1.6.3 'sync' clean_method should hopefully solve the problem for the short time and I'll probably recode the socket handling in 1.7.x. Lionel. |
|
From: Edward J S. <ed...@cr...> - 2005-07-08 21:24:44
|
Ed Shornock (sqlgrey list) wrote: >Another hang with the patched 1.6.2 and a downloaded PostgreSQL. > GRRRR s/downloaded/downgraded/ |