From: Uli Z. <ul...@ri...> - 2006-06-27 05:13:06
Hi,

I'm encountering the following bug in fetchmail (versions from at least 6.2.5 up to 6.3.4) running in daemon mode on Mac OS X (versions from at least 10.4.2 up to 10.4.7) with a 60s poll interval.

Usually, daemon mode works just fine. If the Internet connection goes down for whatever reason, fetchmail produces a SOCKET error ("No address associated with nodename"), as you would expect. If the connection comes back up after a "short" amount of time, fetchmail resumes fetching mail as it should. However, if the connection is down for a "longer" amount of time, fetchmail keeps reporting the same SOCKET errors even after the connection is finally up again, and never resumes fetching mail until it is restarted (then everything works just fine again).

I can't give an exact number for "longer"; it seems to be anything from 15 to 90 minutes (after 90 minutes, the issue *always* occurs, but sometimes it already occurs after 15 minutes).

Obviously, this makes fetchmail in daemon mode highly unreliable as far as timely delivery of mail is concerned, and more or less unusable.

The issue is obviously tied to the output of getaddrinfo(). When the issue occurs, it is because getaddrinfo() erroneously keeps returning the respective error after the Internet connection is up again. At first I thought that this might be a bug in Mac OS X's getaddrinfo() implementation.
However, no such bug was known, so I performed the following test myself (with extremely weird results).

At the beginning of fetchmail's daemon loop (after line 590 in fetchmail.c of 6.3.4) I inserted the following test code:

    struct addrinfo *ai, req;
    int gaiRes;

    memset(&req, 0, sizeof(struct addrinfo));
    req.ai_socktype = SOCK_STREAM;
    if (gaiRes = getaddrinfo("pop.1und1.com", "pop3s", &req, &ai)) {
        report(stdout, GT_("GAI: 1und1: ERROR: %s\n"), gai_strerror(gaiRes));
    } else {
        freeaddrinfo(ai);
        report(stdout, GT_("GAI: 1und1: OK\n"));
    }

This snippet is basically taken from socket.c, where the actual SOCKET error occurs. The queried host is one of the hosts I actually use with fetchmail. I repeated this code a second time for another host I actually use ("popmail.server.uni-frankfurt.de", "pop3s"), and a third time, as a comparison, for Apple's web server ("www.apple.com", "http").

Then I took *exactly* the code I had inserted at the beginning of the fetchmail loop (well, apart from replacing "report" by "printf") and put it in an endless loop (with a 60s sleep after each iteration) in a stand-alone Unix program.

Finally, I ran fetchmail in daemon mode and the test program simultaneously (both as root), disrupted the Internet connection and watched what happened.

The really strange results are:

1. As long as the Internet connection is up, both programs (fetchmail and my test program) report "OK" for all three servers, as you would expect.

2. When the Internet connection goes down, a "No address associated with nodename" error is displayed for one server after another. The time it takes until this error shows up differs for the three servers (some caching issue, I suppose); however, it is synchronous in both programs (as soon as it is displayed in one program, it is also displayed in the other). Again, kind of what you would expect.

3. Now it gets weird.
If the Internet connection comes up again, my test program immediately displays "OK" again for all three servers, which is how it should be. fetchmail, however, displays "OK" only for the Apple server; the two POP servers, the ones fetchmail actually works with in its own code, keep displaying the "No address associated with nodename" error, which at least is consistent with the error messages from fetchmail's own code.

So to sum up: exactly the same getaddrinfo() code in fetchmail and in a test program, run every 60s, produces exactly the same results for a server that is otherwise not called in fetchmail's original code, but different results for the two servers that are actually called in fetchmail's original code. In the latter case, the test program works as it should (no errors after the Internet connection is up again), while fetchmail does not (errors although the Internet connection is up again).

The only explanation I can see is that fetchmail somehow manages to do something to the servers it calls that corrupts future getaddrinfo() results for these (and only these) servers. Just what that could possibly be, I have no idea. Also, I have no way of knowing whether this only happens in connection with Mac OS X's getaddrinfo() implementation (I only have Macs here). My test program shows that Mac OS X's getaddrinfo() works fine under "normal" conditions, though.

So I guess the first thing to find out is whether this bug is reproducible on other Unix systems, and then proceed from there.

Bye
Uli
________________________________________________________
Uli Zappe, Solmsstraße 5, D-65189 Wiesbaden, Germany
http://www.ritual.org
Fon: +49-700-ULIZAPPE
Fax: +49-700-ZAPPEFAX
________________________________________________________
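
For anyone who wants to repeat the comparison on another Unix system, here is a minimal sketch of what the stand-alone test program described above might look like. It is a reconstruction from the description, not the original source: the helper names check_host() and poll_hosts() and the "rounds" parameter are assumptions made for illustration. Only the getaddrinfo() call, the three host/service pairs and the 60s interval come from the post.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/* Hypothetical helper: resolve host/service once and print the outcome
 * in the same "GAI: <label>: ..." format used in the fetchmail patch.
 * Returns 0 on success, otherwise the getaddrinfo() error code. */
int check_host(const char *label, const char *host, const char *service)
{
    struct addrinfo *ai, req;
    int gaiRes;

    memset(&req, 0, sizeof(struct addrinfo));
    req.ai_socktype = SOCK_STREAM;

    gaiRes = getaddrinfo(host, service, &req, &ai);
    if (gaiRes != 0) {
        printf("GAI: %s: ERROR: %s\n", label, gai_strerror(gaiRes));
    } else {
        freeaddrinfo(ai);
        printf("GAI: %s: OK\n", label);
    }
    fflush(stdout);
    return gaiRes;
}

/* Hypothetical driver: query the three hosts from the post, then sleep
 * for `interval` seconds. rounds == 0 means loop forever (assumption:
 * the original simply used an endless loop).
 * Returns the number of failed lookups in the last round. */
int poll_hosts(unsigned rounds, unsigned interval)
{
    unsigned done = 0;
    int failures;

    for (;;) {
        failures = 0;
        failures += check_host("1und1", "pop.1und1.com", "pop3s") != 0;
        failures += check_host("uni-frankfurt",
                               "popmail.server.uni-frankfurt.de", "pop3s") != 0;
        failures += check_host("apple", "www.apple.com", "http") != 0;

        if (rounds != 0 && ++done >= rounds)
            break;
        sleep(interval);
    }
    return failures;
}
```

Calling poll_hosts(0, 60) from main() should give the endless 60s loop described in the post; a non-zero first argument limits the number of rounds, which is convenient for a quick check.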