From: Uli Z. <ul...@ri...> - 2006-06-27 05:13:06
Hi,
I'm encountering the following bug in fetchmail (versions from at
least 6.2.5 up to 6.3.4) running in daemon mode on Mac OS X (versions
from at least 10.4.2 up to 10.4.7) with a 60s poll interval.
Usually, daemon mode works just fine. If the Internet connection goes
down for whatever reason, fetchmail will produce a SOCKET error ("No
address associated with nodename"), as you would expect. If the
Internet connection comes up again after a "short" amount of time,
fetchmail will resume fetching mails as it should. However, if the
Internet connection is down for a "longer" amount of time, fetchmail
will keep reporting the same SOCKET errors once the Internet connection
is finally up again, and will never resume fetching mail until it is
restarted (then everything works just fine again). I can't give an
exact number for "longer"; it seems to be anything from 15 to 90
minutes (after 90 minutes, the issue *always* occurs, but sometimes
it already occurs after 15 minutes).
Obviously, this makes using fetchmail in daemon mode highly
unreliable as far as timely delivery of mails is concerned, and more
or less unusable.
The issue is obviously tied to the output of getaddrinfo(). Whenever
the issue occurs, it is because getaddrinfo() erroneously keeps
returning the respective error after the Internet connection is
up again. At first I thought that this may be a bug in Mac OS X's
getaddrinfo() implementation. However, no such bug was known, so I
performed the following test myself (with extremely weird results):
At the beginning of fetchmail's daemon loop (after line 590 in
fetchmail.c (of 6.3.4)) I inserted the following test code:
    struct addrinfo *ai, req;
    int gaiRes;

    memset(&req, 0, sizeof(struct addrinfo));
    req.ai_socktype = SOCK_STREAM;
    /* getaddrinfo() returns 0 on success, an EAI_* error code otherwise */
    if ((gaiRes = getaddrinfo("pop.1und1.com", "pop3s", &req, &ai)))
    {
        report(stdout, GT_("GAI: 1und1: ERROR: %s\n"), gai_strerror(gaiRes));
    }
    else
    {
        freeaddrinfo(ai);
        report(stdout, GT_("GAI: 1und1: OK\n"));
    }
This is a code snippet basically taken from socket.c where the actual
SOCKET error occurs. The queried host is one of the hosts I actually
use with fetchmail. I repeated this code a second time for another
host I actually use ("popmail.server.uni-frankfurt.de", "pop3s"), and
a third time, as a comparison, for Apple's web server
("www.apple.com", "http").
Then I took *exactly* the code I had inserted at the beginning of the
fetchmail loop (well, apart from replacing "report" with "printf") and
put it in an endless loop (with a 60s sleep after each iteration) in a
stand-alone Unix program.
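For reference, the stand-alone test program described above can be
sketched roughly as follows. This is a minimal sketch, not the exact
program I ran: the function names probe() and run_probes() and the
short labels are made up for illustration, while the getaddrinfo()
call and the three host/service pairs are the ones quoted above.

```c
#define _POSIX_C_SOURCE 200112L /* expose getaddrinfo() under strict -std=c99/c11 */

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Resolve one host/service pair the same way the snippet in
 * fetchmail.c does. "label" is a hypothetical short name used only
 * in the output. Returns 0 on success, otherwise the getaddrinfo()
 * error code. */
int probe(const char *label, const char *host, const char *service)
{
    struct addrinfo req, *ai = NULL;
    int gaiRes;

    memset(&req, 0, sizeof(req));
    req.ai_socktype = SOCK_STREAM;

    gaiRes = getaddrinfo(host, service, &req, &ai);
    if (gaiRes != 0) {
        printf("GAI: %s: ERROR: %s\n", label, gai_strerror(gaiRes));
    } else {
        freeaddrinfo(ai);
        printf("GAI: %s: OK\n", label);
    }
    fflush(stdout);
    return gaiRes;
}

/* Probe all three hosts once per round, sleeping delay_seconds
 * between rounds. rounds == 0 means "loop forever". */
void run_probes(unsigned int rounds, unsigned int delay_seconds)
{
    unsigned int done = 0;

    for (;;) {
        probe("1und1", "pop.1und1.com", "pop3s");
        probe("uni-frankfurt", "popmail.server.uni-frankfurt.de", "pop3s");
        probe("apple", "www.apple.com", "http");

        if (rounds != 0 && ++done >= rounds)
            break;
        sleep(delay_seconds);
    }
}
```

Calling run_probes(0, 60) from main() gives the endless loop with a
60s sleep; a non-zero first argument stops after that many rounds.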
Finally, I ran fetchmail in daemon mode and the test Unix program
simultaneously (both as root), disrupted the Internet connection and
watched what happened.
The really strange results are:
1. As long as the Internet connection is up, both programs (fetchmail
and my test program) report "OK" for all three servers, as you would
expect.
2. When the Internet connection goes down, a "No address associated
with nodename" error is displayed for one server after another. The
time it takes until this error shows up differs for the three servers
(some caching issue, I suppose); however, it is in sync across both
programs (as soon as it is displayed by one program, it's also
displayed by the other). Again, kind of what you would expect.
3. Now it gets weird. When the Internet connection comes up again, my
test program immediately displays "OK" again for all three servers -
that's how it should be. fetchmail, however, displays "OK" only for
the Apple server; the two pop servers - the ones fetchmail actually
works with in its code - keep producing the "No address associated
with nodename" error, which at least is consistent with the error
messages from fetchmail's own code.
So to sum up:
Exactly the same getaddrinfo() code in fetchmail and a test program,
run every 60s, produces exactly the same results for a server that
otherwise is not called in fetchmail's original code, but different
results for the two servers that are actually called in fetchmail's
original code. In the latter case, the test program works as it
should (no errors after the Internet connection is up again), while
fetchmail does not work as it should (errors although the Internet is
up again).
The only explanation I can think of is that fetchmail somehow manages
to do something to the servers it calls that corrupts future
getaddrinfo() results for these (and only these) servers. Just what
that could possibly be, I have no idea.
Also, I have no way of knowing if this is only the case in connection
with Mac OS X's getaddrinfo() implementation (I only have Macs here).
My test program shows that Mac OS X's getaddrinfo() works fine under
"normal" conditions, though.
So I guess the first thing to find out is whether this bug is
reproducible on other Unix systems, and then proceed from there.
Bye
Uli
________________________________________________________
Uli Zappe, Solmsstraße 5, D-65189 Wiesbaden, Germany
http://www.ritual.org
Fon: +49-700-ULIZAPPE
Fax: +49-700-ZAPPEFAX
________________________________________________________