From: Thomas J. <tho...@in...> - 2010-07-27 15:53:08
|
Hello,

I recently had two cases of hanging fetchmail processes during TLS negotiation with two different servers. The config looks like this:

-----------------
defaults proto pop3
set logfile /var/log/fetchmail
set postmaster "postmaster"
set showdots

poll "server" timeout 90 user "USER" password "PASS" to "something" fetchall
...
poll "server" timeout 90 user "USERx" password "PASS" to "otherX" fetchall
-----------------

I was able to grab a log of a stalled fetchmail process:

----------------------------------------------
fetchmail: 6.3.17 querying pop.1und1.com (protocol POP3) at Sat, 24 Jul 2010 04:57:01 +0200 (CEST): poll started
fetchmail: Trying to connect to 212.227.15.161/110...connected.
fetchmail: POP3< +OK POP server ready H mimap4
fetchmail: POP3> CAPA
fetchmail: POP3< +OK Capability list follows
fetchmail: POP3< TOP
fetchmail: POP3< USER
fetchmail: POP3< UIDL
fetchmail: POP3< STLS
fetchmail: POP3< IMPLEMENTATION trinity
fetchmail: POP3< .
fetchmail: POP3> STLS
fetchmail: POP3< +OK Begin TLS negotiation
---------- fetchmail hangs ------------------

Any idea why the timeout of 90 seconds is not triggered? Maybe it's stuck in the openssl code and the timeout is not working? The openssl version in use is 0.9.8o.

Best regards,
Thomas Jarosch |
From: Matthias A. <mat...@gm...> - 2010-07-27 17:03:34
|
On 27.07.2010 15:21, Thomas Jarosch wrote:
> I recently had two cases of hanging fetchmail processes
> during TLS negotiation with two different servers.

Might be a missing call to set_timeout() around the POP3 STLS/getauth() related methods. Try wrapped SSL ("ssl" option) as a workaround, it might help, and if so, help debugging.

--
Matthias Andree |
From: Thomas J. <tho...@in...> - 2010-07-28 11:31:58
|
On Tuesday, 27. July 2010 16:56:52 Matthias Andree wrote:
> > I recently had two cases of hanging fetchmail processes
> > during TLS negotiation with two different servers.
>
> Might be a missing call to set_timeout() around the POP3 STLS/getauth()
> related methods. Try wrapped SSL ("ssl" option) as a workaround, it
> might help, and if so, help debugging.

The issue only appears every other month or so. I also thought about adding debug output to set_timeout() so we could trace the last set timeout, though this will only give a close approximation where that problem is. (or none at all if the code mostly uses the same timeout values)

This morning I had a better idea: Enable core dumps and kill the task via "kill -11" if it hangs again. Then we'll know for sure where it's stuck.

I prefer not to add the workaround for now as I'd like to trace the real issue.

Thanks,
Thomas |
From: Matthias A. <mat...@gm...> - 2010-07-28 15:31:24
|
On 28.07.2010 11:31, Thomas Jarosch wrote:
> On Tuesday, 27. July 2010 16:56:52 Matthias Andree wrote:
>> > I recently had two cases of hanging fetchmail processes
>> > during TLS negotiation with two different servers.
>>
>> Might be a missing call to set_timeout() around the POP3 STLS/getauth()
>> related methods. Try wrapped SSL ("ssl" option) as a workaround, it
>> might help, and if so, help debugging.
>
> The issue only appears every other month or so. I also thought about adding
> debug output to set_timeout() so we could trace the last set timeout,
> though this will only give a close approximation where that problem is.
> (or none at all if the code mostly uses the same timeout values)
>
> This morning I had a better idea: Enable core dumps and kill the task via
> "kill -11" if it hangs again. Then we'll know for sure where it's stuck.

Don't. fetchmail in most modes suppresses writing core files with setrlimit(), to avoid passwords hitting the disk outside the .netrc or .fetchmailrc files.

Instead, if fetchmail hangs, just attach gdb with "gdb /usr/bin/fetchmail-unstripped 12345", where 12345 is the process ID of the hanging fetchmail and fetchmail-unstripped is an executable compiled with the -g or, better, the -ggdb3 option, and installed before the run, without running strip and without adding -s to the install command line.

Then, once gdb has attached to the hanging process, "backtrace full" will provide the necessary debug output. Inside gdb you can issue "signal 2" (SIGINT) and "continue" to have fetchmail terminate the run in an orderly manner.

> I prefer not to add the workaround for now
> as I'd like to trace the real issue.

:)

--
Matthias Andree |
From: Thomas J. <tho...@in...> - 2010-07-28 15:44:09
|
On Wednesday, 28. July 2010 15:31:19 Matthias Andree wrote:
> > This morning I had a better idea: Enable core dumps and kill the task
> > via "kill -11" if it hangs again. Then we'll know for sure where it's
> > stuck.
>
> Don't. fetchmail in most modes suppresses writing core files with
> setrlimit(), to avoid passwords hitting the disk outside the .netrc or
> .fetchmailrc files.

I just patched that out :o) Before pushing the "new" version to the productive system, I've verified the coredump gives a useful backtrace.

> Instead, if fetchmail hangs, just attach gdb with "gdb
> /usr/bin/fetchmail-unstripped 12345", where fetchmail-unstripped is an
> executable compiled with -g or better -ggdb3 option, and installed before
> the run, without running strip and without adding -s to the install
> command line.

No gdb on the productive box. I've just installed the unstripped version (took me 30min to discover the -s switch in the compile statement. Whoops).

Now we need to wait a month or so... thanks for your suggestions!

Cheers,
Thomas |
From: Matthias A. <mat...@gm...> - 2010-08-05 00:18:32
Attachments:
0001-Apply-timeout-to-getauth-methods.patch
|
On 28.07.2010, 15:44, Thomas Jarosch wrote:
> On Wednesday, 28. July 2010 15:31:19 Matthias Andree wrote:
>> > This morning I had a better idea: Enable core dumps and kill the task
>> > via "kill -11" if it hangs again. Then we'll know for sure where it's
>> > stuck.
>>
>> Don't. fetchmail in most modes suppresses writing core files with
>> setrlimit(), to avoid passwords hitting the disk outside the .netrc or
>> .fetchmailrc files.
>
> I just patched that out :o) Before pushing the "new" version to the
> productive system, I've verified the coredump gives a useful backtrace.
>
>> Instead, if fetchmail hangs, just attach gdb with "gdb
>> /usr/bin/fetchmail-unstripped 12345", where fetchmail-unstripped is an
>> executable compiled with -g or better -ggdb3 option, and installed
>> before the run, without running strip and without adding -s to the
>> install command line.
>
> No gdb on the productive box. I've just installed the unstripped version
> (took me 30min to discover the -s switch in the compile statement.
> Whoops).
> Now we need to wait a month or so... thanks for your suggestions!

Hi Thomas,

Well... perhaps not. The attached patch sets the timeout for the getauth() stage (which entails STARTTLS-like negotiation). Please try that too and see if you get timeout reports while fetchmail tries to negotiate TLS. It should.

Thanks for the report and offered help in debugging. Would be good if you could report back in a month or so :-)

Best regards

--
Matthias Andree |
From: R P H. <he...@ow...> - 2010-07-28 20:03:42
|
On Wed, 28 Jul 2010, Matthias Andree wrote:

> Don't. fetchmail in most modes suppresses writing core files
> with setrlimit(), to avoid passwords hitting the disk
> outside the .netrc or .fetchmailrc files.

goodness -- I recall reporting that matter to ESR and getting the patch, probably a decade ago

-- Russ herrold |
From: Matthias A. <mat...@gm...> - 2010-07-28 20:24:00
|
On 28.07.2010 19:04, R P Herrold wrote:
> On Wed, 28 Jul 2010, Matthias Andree wrote:
>
>> Don't. fetchmail in most modes suppresses writing core files
>> with setrlimit(), to avoid passwords hitting the disk
>> outside the .netrc or .fetchmailrc files.
>
> goodness -- I recall reporting that matter to ESR and getting
> the patch, probably a decade ago

Hi there,

just for our edification, with grep & git gui blame, it took me only a couple of seconds to figure this out:

It's been 12 years minus 5 days since ESR committed this code to fetchmail.c:

>     /*
>      * Before getting passwords, disable core dumps unless -v -d0 mode is on.
>      * Core dumps could otherwise contain passwords to be scavenged by a
>      * cracker.
>      */
>     if (outlevel < O_VERBOSE || run.poll_interval > 0)
>     {
>         struct rlimit corelimit;
>         corelimit.rlim_cur = 0;
>         corelimit.rlim_max = 0;
>         setrlimit(RLIMIT_CORE, &corelimit);
>     }

(I'm pasting from a future development branch that has lost the #ifdef HAVE_SETRLIMIT guards.)

Now, after two repository conversions (CVS->SVN and SVN->Git), we can still figure out when he did that:

> commit 1587e4153763fab493acf2deee9028e24e1da57f
> Author: Eric S. Raymond <es...@th...>
> Date: Sun Aug 2 16:30:25 1998 +0000
>
> Improved security.
>
> svn path=/trunk/; revision=2032

From OLDNEWS:

> fetchmail-4.5.5 (Mon Aug 3 16:08:14 EDT 1998), 15286 lines:
...
> * Added setrlimit call to inhibit core dumps unless debugging is on.
...

This also states how Thomas can enable core dumps: always run with -vd0 (which spams the logs or cron output quite a bit).

I had - long ago - read there was such code, but lacked the time to dig deeper earlier today. Now that I got this pointer, here we go... :)

Best regards

--
Matthias Andree |
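For readers who want to see the effect of the quoted setrlimit() call without building fetchmail, here is a minimal Python sketch of the same idea. It is not fetchmail code: `core_dumps_suppressed` and `disable_core_dumps` are hypothetical names, and the condition only paraphrases the quoted `outlevel < O_VERBOSE || run.poll_interval > 0` test under the assumption that "verbose" stands for -v and a poll interval of 0 for -d0.

```python
import resource


def core_dumps_suppressed(verbose, poll_interval):
    """Paraphrase of the quoted condition (hypothetical helper).

    Core dumps stay enabled only when running verbosely (-v) AND
    without a daemon poll interval (-d0).
    """
    return (not verbose) or poll_interval > 0


def disable_core_dumps():
    """Set the soft and hard RLIMIT_CORE limits to zero.

    Lowering the hard limit cannot be undone by an unprivileged
    process, so this is a one-way switch for the process lifetime.
    """
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    if core_dumps_suppressed(verbose=False, poll_interval=300):
        print("core limit now:", disable_core_dumps())
```

With both limits at zero the kernel writes no core file, which is why "kill -11" on an unpatched fetchmail would not have produced anything to inspect.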
From: Thomas J. <tho...@in...> - 2011-04-28 10:01:59
|
Hello Matthias,

On Thursday, 5. August 2010 00:18:30 Matthias Andree wrote:
> The attached patch sets the timeout for the getauth() stage (which
> entails STARTTLS-like negotiation). Please try that too and see if you
> get timeout reports while fetchmail tries to negotiate TLS. It should.
>
> Thanks for the report and offered help in debugging. Would be good if you
> could report back in a month or so :-)

I just had another case of this issue on a different box, this time running fetchmail 6.3.18.

Here's all the info I was able to collect:

Log from fetchmail:
fetchmail: POP3< +OK Capability list follows
fetchmail: POP3< TOP
fetchmail: POP3< USER
fetchmail: POP3< UIDL
fetchmail: POP3< STLS
fetchmail: POP3< SASL PLAIN
fetchmail: POP3< IMPLEMENTATION trinity
fetchmail: POP3< .
fetchmail: POP3> STLS
fetchmail: POP3< +OK Begin TLS negotiation

fetchmail was stuck at that line since 21.04.2011.

[root@intranator log]# strace -p 1505
Process 1505 attached - interrupt to quit
read(3,

(gdb) bt
#0 0x00c17424 in __kernel_vsyscall ()
#1 0x00867ef3 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
#2 0x0033771e in sock_read () from /usr/lib/libcrypto.so.8
#3 0x00000003 in ?? ()
#4 0x09e7cafe in ?? ()
#5 0x000006bf in ?? ()
#6 0x89f4e239 in ?? ()
#7 0x6ce90be1 in ?? ()
#8 0xa8a2f11c in ?? ()
#9 0x60917bce in ?? ()
#10 0x003c8c28 in ?? () from /usr/lib/libcrypto.so.8
#11 0x09e762f8 in ?? ()
#12 0x00000000 in ?? ()

Even though I installed the -debuginfo packages, I wasn't able to get a meaningful backtrace.

So the stall happens before the authentication state?

Cheers,
Thomas |
From: Matthias A. <mat...@gm...> - 2011-04-28 13:19:08
|
On 28.04.2011 09:38, Thomas Jarosch wrote:
> Hello Matthias,
>
> On Thursday, 5. August 2010 00:18:30 Matthias Andree wrote:
>> The attached patch sets the timeout for the getauth() stage (which
>> entails STARTTLS-like negotiation). Please try that too and see if you
>> get timeout reports while fetchmail tries to negotiate TLS. It should.
>>
>> Thanks for the report and offered help in debugging. Would be good if you
>> could report back in a month or so :-)
>
> I just had another case of this issue on a different box,
> this time running fetchmail 6.3.18.
>
> Here's all the info I was able to collect:
>
> Log from fetchmail:
> fetchmail: POP3< +OK Capability list follows
> fetchmail: POP3< TOP
> fetchmail: POP3< USER
> fetchmail: POP3< UIDL
> fetchmail: POP3< STLS
> fetchmail: POP3< SASL PLAIN
> fetchmail: POP3< IMPLEMENTATION trinity
> fetchmail: POP3< .
> fetchmail: POP3> STLS
> fetchmail: POP3< +OK Begin TLS negotiation

Looks like GMX.

> fetchmail was stuck at that line since 21.04.2011.
>
> [root@intranator log]# strace -p 1505
> Process 1505 attached - interrupt to quit
> read(3,
>
> (gdb) bt
> #0 0x00c17424 in __kernel_vsyscall ()
> #1 0x00867ef3 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
> #2 0x0033771e in sock_read () from /usr/lib/libcrypto.so.8
> #3 0x00000003 in ?? ()
> #4 0x09e7cafe in ?? ()
> #5 0x000006bf in ?? ()
> #6 0x89f4e239 in ?? ()
> #7 0x6ce90be1 in ?? ()
> #8 0xa8a2f11c in ?? ()
> #9 0x60917bce in ?? ()
> #10 0x003c8c28 in ?? () from /usr/lib/libcrypto.so.8
> #11 0x09e762f8 in ?? ()
> #12 0x00000000 in ?? ()
>
>
> Even though I installed the -debuginfo packages,
> I wasn't able to get a meaningful backtrace.
>
> So the stall happens before the authentication state?

Hi Thomas,

what OS and version (distribution) does this happen on? Do the -debuginfo packages match the actual RPMs? Could you try building 6.3.19 from source and use that to somehow reproduce the hang?

I wonder if the SSL stuff somehow masks the timeout alarm we're using, that could possibly explain things, but I don't have an idea how to figure that out yet. Sorry - a backtrace could possibly help a bit because I could try reading the OpenSSL code paths in the OpenSSL sources.

Best regards
Matthias |
From: Thomas J. <tho...@in...> - 2011-04-28 14:19:03
|
On Thursday, 28. April 2011 13:18:59 Matthias Andree wrote:
> Looks like GMX.

Yes, 1&1 ;)

> what OS and version (distribution) does this happen on? Do the
> -debuginfo packages match the actual RPMs? Could you try building
> 6.3.19 from source and use that to somehow reproduce the hang?

It's our custom distribution, which is based on Fedora.

I recovered the correct -debuginfo packages from the time I built the binary RPMs. If the debuginfo packages don't fit (e.g. because a glibc header changed), gdb will skip those debug symbols with a CRC mismatch error.

> I wonder if the SSL stuff somehow masks the timeout alarm we're using,
> that could possibly explain things, but I don't have an idea how to
> figure that out yet.

It just came to my mind: We can simulate the situation with socat. And I was able to trigger the problem. Try this:

- Start a "fake POP3 server" with socat: socat - tcp4-listen:110

- fetchmail connects to this server

- Paste the welcome greeting:
  +OK POP fake server ready H mimap4

- fetchmail will reply with: "CAPA"

- Paste this:
  +OK Capability list follows
  TOP
  USER
  UIDL
  STLS
  SASL PLAIN
  IMPLEMENTATION trinity
  .

- fetchmail will reply: "STLS"

- Paste this:
  +OK Begin TLS negotiation

Then do nothing in socat until the timeout should be triggered.

Hope that helps,
Thomas |
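The manual socat steps above can also be scripted. The following is a rough Python sketch of such a fake server, not something used in this thread: the function names, the loopback bind address and the port 1110 in the usage example are assumptions (port 110 needs root). It replays the same greeting, capability list and STLS reply, then stays silent so the client is left waiting at the start of the TLS handshake.

```python
import socket
import threading
import time

CAPA_REPLY = (
    b"+OK Capability list follows\r\n"
    b"TOP\r\nUSER\r\nUIDL\r\nSTLS\r\nSASL PLAIN\r\n"
    b"IMPLEMENTATION trinity\r\n.\r\n"
)


def read_line(conn):
    """Read one LF-terminated line from the client (b"" on EOF)."""
    data = b""
    while not data.endswith(b"\n"):
        chunk = conn.recv(1)
        if not chunk:
            break
        data += chunk
    return data


def serve_once(listener, stall_seconds):
    """Play the server side of the dialogue for one client, then go silent."""
    conn, _ = listener.accept()
    with conn:
        conn.sendall(b"+OK POP fake server ready H mimap4\r\n")
        if read_line(conn).strip().upper() == b"CAPA":
            conn.sendall(CAPA_REPLY)
        if read_line(conn).strip().upper() == b"STLS":
            conn.sendall(b"+OK Begin TLS negotiation\r\n")
        # Say nothing from here on, like the idle socat session.
        time.sleep(stall_seconds)


def start_fake_server(port=0, stall_seconds=3600.0):
    """Start the fake server in a background thread; return (port, thread)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", port))
    listener.listen(1)
    thread = threading.Thread(
        target=serve_once, args=(listener, stall_seconds), daemon=True
    )
    thread.start()
    return listener.getsockname()[1], thread


if __name__ == "__main__":
    port, thread = start_fake_server(port=1110)
    print("fake POP3 server listening on 127.0.0.1 port", port)
    thread.join()
```

Pointing a POP3 client at that port with a short timeout should then show whether the timeout still fires once the "+OK Begin TLS negotiation" line has been read.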
From: Matthias A. <mat...@gm...> - 2011-04-28 14:42:23
|
On 28.04.2011 13:50, Thomas Jarosch wrote:
> On Thursday, 28. April 2011 13:18:59 Matthias Andree wrote:
>> Looks like GMX.
>
> Yes, 1&1 ;)
>
>> what OS and version (distribution) does this happen on? Do the
>> -debuginfo packages match the actual RPMs? Could you try building
>> 6.3.19 from source and use that to somehow reproduce the hang?
>
> It's our custom distribution which is based on Fedora.
>
> I recovered the correct -debuginfo packages from the time I built the binary
> RPMs. If the debuginfo packages don't fit (f.e. glibc header changed),
> gdb will skip those debug symbols with an CRC mismatch error.
>
>> I wonder if the SSL stuff somehow masks the timeout alarm we're using,
>> that could possibly explain things, but I don't have an idea how to
>> figure that out yet.
>
> It just came to my mind: We can simulate the situation with socat.
> And I was able to trigger the problem. Try this:
>
> - Start a "fake POP3 server" with socat: socat - tcp4-listen:110
>
> - fetchmail connects to this server
>
> - Paste the welcome greeting:
> +OK POP fake server ready H mimap4
>
> - fetchmail will reply with: "CAPA"
>
> - Paste this:
> +OK Capability list follows
> TOP
> USER
> UIDL
> STLS
> SASL PLAIN
> IMPLEMENTATION trinity
> .
>
> - fetchmail will reply: "STLS"
>
> - Paste this:
> +OK Begin TLS negotiation
>
>
> Then do nothing in socat until the timeout should be triggered.

Good idea. Timeouts work for me in 6.3.20-pre1, and I haven't changed the timeout handling since 6.3.18 AFAIR. The default is 300, I've changed it to 10 only for testing. Is there anything that masks or traps SIGALRM in your setup?

I'm currently testing on Ubuntu 10.10 amd64 with OpenSSL 0.9.8o and libc6 which is embedded GNU libc 2.12.1 (both likely with Ubuntu patches).

In one console:

$ sudo socat -dddD - tcp4-listen:110,reuseaddr
1. paste +OK\n as greeting
2. paste +OK\nSTLS\n.\n as response to STLS, then wait

In another console (I hope Thunderbird doesn't trash everything here):

$ LC_ALL=C strace -T ./fetchmail -p pop3 -Nd0 localhost --nosyslog -s --timeout 10
socket(PF_INET, SOCK_DGRAM, IPPROTO_IP) = 3 <0.000023>
connect(3, {sa_family=AF_INET, sin_port=htons(110), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 <0.000028>
getsockname(3, {sa_family=AF_INET, sin_port=htons(38674), sin_addr=inet_addr("127.0.0.1")}, [16]) = 0 <0.000025>
close(3) = 0 <0.000027>
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3 <0.000026>
connect(3, {sa_family=AF_INET, sin_port=htons(110), sin_addr=inet_addr("127.0.0.1")}, 16) = 0 <0.000101>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000016>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000013>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000014>
recvfrom(3, "+OK\n", 512, MSG_PEEK, NULL, NULL) = 4 <3.263103>
read(3, "+OK\n", 4) = 4 <0.000031>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000021>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000020>
write(3, "CAPA\r\n", 6) = 6 <0.000090>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000020>
recvfrom(3, "+OK\n", 512, MSG_PEEK, NULL, NULL) = 4 <1.739998>
read(3, "+OK\n", 4) = 4 <0.000030>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000020>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000020>
recvfrom(3, "STLS\n", 63, MSG_PEEK, NULL, NULL) = 5 <0.959806>
read(3, "STLS\n", 5) = 5 <0.000026>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000019>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000020>
recvfrom(3, ".\n", 63, MSG_PEEK, NULL, NULL) = 2 <0.220514>
read(3, ".\n", 2) = 2 <0.000030>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000021>
write(3, "STLS\r\n", 6) = 6 <0.000078>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={10, 0}}, NULL) = 0 <0.000031>
recvfrom(3, 0x7fff216f2770, 512, 2, 0, 0) = ? ERESTARTSYS (To be restarted) <9.999914>
--- SIGALRM (Alarm clock) @ 0 (0) ---
rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0 <0.000023>
rt_sigprocmask(SIG_UNBLOCK, ~[RTMIN RT_1], NULL, 8) = 0 <0.000020>
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0 <0.000021>
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7ff3a6332000 <0.000024>
write(1, "fetchmail: timeout after 10 seco"..., 66fetchmail: timeout after 10 seconds waiting for server localhost.
) = 66 <0.000031>
rt_sigaction(SIGALRM, {0x40dce0, [], SA_RESTORER, 0x7ff3a4a75c20}, {0x40e0e0, [], SA_RESTORER, 0x7ff3a4a75c20}, 8) = 0 <0.000019>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={60, 0}}, NULL) = 0 <0.000020>
close(3) = 0 <0.000097>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000029>
rt_sigaction(SIGALRM, {0x40e0e0, [], SA_RESTORER, 0x7ff3a4a75c20}, {0x40dce0, [], SA_RESTORER, 0x7ff3a4a75c20}, 8) = 0 <0.000019>
write(2, "fetchmail: ", 11fetchmail: ) = 11 <0.000024>
write(2, "socket error while fetching from"..., 51socket error while fetching from mandree@localhost
) = 51 <0.000023>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000019>
rt_sigaction(SIGALRM, {0x4074f0, [], SA_RESTORER, 0x7ff3a4a75c20}, {0x40e0e0, [], SA_RESTORER, 0x7ff3a4a75c20}, 8) = 0 <0.000018>
write(1, "fetchmail: Query status=2 (SOCKE"..., 35fetchmail: Query status=2 (SOCKET)
) = 35 <0.000023> |
From: Matthias A. <mat...@gm...> - 2011-04-28 14:57:26
|
BTW, timeouts also work for me on openSUSE 11.4 i386, with openSSL 1.0.0c and glibc 2.11.3, and fetchmail 6.3.20-pre1. |
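The working case, where an interval timer delivers SIGALRM and interrupts a blocked receive, can be imitated in a few lines of Python. This is only an illustrative sketch of the mechanism: `ServerTimeout` and `recv_with_timeout` are made-up names, `set_timeout` merely borrows the name used in this thread, and none of it is fetchmail's implementation. It has to run in the main thread because Python only allows signal handlers there.

```python
import signal
import socket


class ServerTimeout(Exception):
    """Raised from the SIGALRM handler when the server stays silent."""


def _on_alarm(signum, frame):
    raise ServerTimeout("timeout waiting for server")


def set_timeout(seconds):
    """Arm the one-shot ITIMER_REAL timer; 0 cancels a pending timer."""
    signal.setitimer(signal.ITIMER_REAL, seconds)


def recv_with_timeout(sock, size, seconds):
    """Blocking recv() that is abandoned when the timer fires."""
    signal.signal(signal.SIGALRM, _on_alarm)
    set_timeout(seconds)
    try:
        return sock.recv(size)
    finally:
        # Cancel the timer whether we got data or an exception.
        set_timeout(0)


if __name__ == "__main__":
    a, b = socket.socketpair()
    try:
        recv_with_timeout(a, 512, 2.0)  # nothing is ever sent on b
    except ServerTimeout as exc:
        print("fetch aborted:", exc)
```

The try/finally cancellation mirrors the alternating arm and disarm pairs of setitimer() calls that show up around each read in the strace output.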
From: Thomas J. <tho...@in...> - 2011-04-28 17:06:12
|
On Thursday, 28. April 2011 14:42:21 Matthias Andree wrote:
> $ sudo socat -dddD - tcp4-listen:110,reuseaddr
> 1. paste +OK\n as greeting
> 2. paste +OK\nSTLS\n.\n as response to STLS, then wait

Ok, I found the difference. If I wait at the same point as you do, the timeout is triggered. Please add a step 3:

3. paste "+OK do it\n" and then wait

Some SSL garbage will appear in socat but that's fine. The strace output looks like this:

write(3, "STLS\r\n", 6) = 6 <0.000078>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={90, 0}}, NULL) = 0 <0.000014>
recv(3, "+OK do it\n", 512, MSG_PEEK) = 10 <3.055521>
read(3, "+OK do it\n", 10) = 10 <0.000015>
setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000012>
...

-> The timeout gets disabled again. Any idea why?

Cheers,
Thomas |
From: Matthias A. <mat...@gm...> - 2011-04-28 17:28:38
|
On 28.04.2011 17:06, Thomas Jarosch wrote:
> On Thursday, 28. April 2011 14:42:21 Matthias Andree wrote:
>> $ sudo socat -dddD - tcp4-listen:110,reuseaddr
>> 1. paste +OK\n as greeting
>> 2. paste +OK\nSTLS\n.\n as response to STLS, then wait
>
> Ok, I found the difference. If I wait at the same point as you do,
> the timeout is triggered. Please add a step 3:
>
> 3. paste "+OK do it\n" and then wait

Got it, I can reproduce the problem.

> Some SSL garbage will appear in socat but that's fine.
> The strace output looks like this:
>
> write(3, "STLS\r\n", 6) = 6 <0.000078>
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={90, 0}}, NULL) = 0 <0.000014>
> recv(3, "+OK do it\n", 512, MSG_PEEK) = 10 <3.055521>
> read(3, "+OK do it\n", 10) = 10 <0.000015>
> setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 0}}, NULL) = 0 <0.000012>

ouch.

> ...
>
> -> The timeout gets disabled again. Any idea why?

I've analyzed it. The trace in the Git master branch is:

(gdb) bt
#0 set_timeout (timeleft=0) at ../driver.c:100
#1 0x00000000004111a6 in gen_recv (sock=5, buf=0x7fffffff76d0 "+OK go\n", size=513) at ../transact.c:1575
#2 0x00000000004208c1 in pop3_ok (sock=0, argbuf=0x7fffffff7930 "STLS") at ../pop3.c:116
#3 0x0000000000410e56 in gen_transact (sock=5, fmt=0x42a0a2 "STLS") at ../transact.c:1632
#4 0x0000000000420ee4 in pop3_getauth (sock=5, ctl=0x4516d0, greeting=0x7fffffff9c30 "") at ../pop3.c:451

IOW, I had overlooked that gen_recv resets the timeout. We extract the STLS response, which resets the timeout, thus the subsequent SSLOpen isn't under any timeout.

Sorry for the incomplete fix in 6.3.18, and thanks for the report. Now I need to make sure I catch all similar bugs before releasing the next version, and I need to update and re-issue the corresponding CVE (or a new one, need to check with the gurus) and security announcement.

The fix needs a thorough analysis of the code. Note that I've already queued patches to ditch SSLv2 support, I need to reconsider that, or making that an option so that distributors can go ahead and update their 6.3.ancient version to 6.3.20 without major incompatibilities.

I'll get back to this.

Thanks again -- I do appreciate bug reports that are easy to reproduce :-)

Best regards
Matthias |
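The pattern described in this analysis (an outer timer that a nested receive helper cancels on return, leaving the next blocking step unguarded) can be reduced to a small Python model. All function names here are hypothetical stand-ins, `recv_reply` only imitates the behaviour attributed to gen_recv, and re-arming the timer is just one conceivable remedy, not the change that was actually made to fetchmail.

```python
import signal
import socket


def _on_alarm(signum, frame):
    raise TimeoutError("timeout waiting for server")


signal.signal(signal.SIGALRM, _on_alarm)


def set_timeout(seconds):
    """Arm the one-shot timer, or cancel it when seconds is 0."""
    signal.setitimer(signal.ITIMER_REAL, seconds)


def timer_remaining():
    """Seconds left on the timer; 0.0 means no timeout is pending."""
    return signal.getitimer(signal.ITIMER_REAL)[0]


def recv_reply(sock, seconds):
    """Read one reply under a timeout, cancelling the timer on the way out."""
    set_timeout(seconds)
    try:
        return sock.recv(512)
    finally:
        set_timeout(0)


def stls_unguarded(sock, seconds):
    """Model of the bug: the stage-wide timer does not survive recv_reply()."""
    set_timeout(seconds)               # armed for the whole stage...
    sock.sendall(b"STLS\r\n")
    reply = recv_reply(sock, seconds)  # ...but cancelled when this returns
    return reply, timer_remaining()    # a TLS handshake would start here


def stls_guarded(sock, seconds):
    """One possible remedy: re-arm the timer before the handshake."""
    sock.sendall(b"STLS\r\n")
    reply = recv_reply(sock, seconds)
    set_timeout(seconds)
    remaining = timer_remaining()
    set_timeout(0)                     # demo only: no handshake follows
    return reply, remaining


if __name__ == "__main__":
    a, b = socket.socketpair()
    b.sendall(b"+OK Begin TLS negotiation\r\n")
    print(stls_unguarded(a, 90))
```

In the unguarded variant the second element of the result is 0.0, which corresponds to the final setitimer() call with it_value={0, 0} in Thomas's strace.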
From: Matthias A. <mat...@gm...> - 2011-05-23 21:03:58
|
On 28.04.2011 17:28, Matthias Andree wrote:
> Sorry for the incomplete fix in 6.3.18, and thanks for the report. Now
> I need to make sure I catch all similar bugs before releasing the next
> version, and I need to update and re-issue the corresponding CVE (or a
> new one, need to check with the gurus) and security announcement.
>
> The fix needs a thorough analysis of the code. Note that I've already
> queued patches to ditch SSLv2 support, I need to reconsider that, or
> making that an option so that distributors can go ahead and update their
> 6.3.ancient version to 6.3.20 without major incompatibilities.
>
> I'll get back to this.
>
> Thanks again -- I do appreciate bug reports that are easy to reproduce :-)

Thomas,

6.3.20-pre1 should fix that - please test and report back (see the separate announcement).

It took a while since I chose to set SO_SNDTIMEO/SO_RCVTIMEO, a BSD socket-level timeout feature, and also SO_KEEPALIVE, to detect crashed TCP connections (although that can take 2 hours and more than 11 minutes on some operating systems to trigger - instead you'll usually get a socket error).

I particularly need test reports of --idle mode.

Best regards,
Matthias |
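As an illustration of the socket options named above, this Python sketch sets SO_RCVTIMEO, SO_SNDTIMEO and SO_KEEPALIVE on a socket. It is an assumption-laden example rather than the fetchmail change: `apply_socket_timeouts` is a hypothetical helper, and packing the timeout as two native C longs assumes the `struct timeval` layout used on Linux.

```python
import socket
import struct


def apply_socket_timeouts(sock, seconds):
    """Ask the kernel to bound blocking reads and writes on this socket.

    Unlike an alarm timer, these limits live in the socket itself, so
    they keep applying no matter which library issues the read or write.
    """
    whole = int(seconds)
    micros = int((seconds - whole) * 1_000_000)
    # struct timeval {seconds, microseconds} as two native longs
    # (assumed Linux layout; other platforms may differ).
    timeval = struct.pack("ll", whole, micros)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVTIMEO, timeval)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDTIMEO, timeval)
    # Keepalive probes let the kernel notice a peer that has vanished.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


if __name__ == "__main__":
    server = socket.socket()
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    client = socket.create_connection(server.getsockname())
    peer, _ = server.accept()
    apply_socket_timeouts(client, 2.0)
    try:
        client.recv(1)  # the peer never sends anything
    except OSError as exc:
        print("recv gave up:", exc)
```

Because the deadline is enforced by the kernel on each blocking call, a read issued from inside a TLS library is bounded as well, which is the property the alarm-based approach lacked during the handshake.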