From: Matthias A. <mat...@gm...> - 2006-06-28 02:46:54
Uli Zappe <ul...@ri...> writes:

> When getaddrinfo() is called while the Internet is down, it may take
> quite some time until it returns the respective error. Now, if the
> "timeout" variable in fetchmail is set to a value shorter than this
> time that getaddrinfo() needs, a call to SockOpen() (socket.c, line
> 266) from (at least) line 1053 in driver.c will be interrupted by the
> timeout signal handler. In this situation, freeaddrinfo() (line 311 in
> socket.c) will never be called to close ai0 correctly. This, in turn,
> seems to produce the corrupted behavior of future getaddrinfo() calls
> (at least in Mac OS X's implementation).

I haven't yet investigated all details, but your research looks valuable
and plausible, thank you!

While I'll certainly plug the leak (it may take until the weekend
though), there might still be a related Mac OS X bug. There is no
obligation to call freeaddrinfo() before calling getaddrinfo() again,
plus this function is required to be thread-safe (i.e. it needs to be
reentrant).

However, you say that the problems happen after a certain...

> In my test case, getaddrinfo() may need up to 180s to time out. However,
> I had set fetchmail's "timeout" parameter to only 60s. In my tests,
> during the time when the Internet connection was down, at least one
> "timeout after 60 seconds waiting to connect to server xy" did indeed
> appear in the log for each server.

...amount of time. Can you check with "lsof" or similar tools (that can
list open files and sockets) how many files and sockets fetchmail holds
open at the time when the problems start? It might be that the OS itself
is leaking sockets here, which might appear in fetchmail's address space.

> I have now set fetchmail's "timeout" variable to 6000 and repeated my
> tests. No timeout message occurred, and after the Internet connection
> was up again, fetchmail resumed fetching mails just as it should.
>
> So I'm quite sure that's the bug: Care must be taken that
> freeaddrinfo() is called even if SockOpen() is interrupted by a
> timeout.

Certainly, and avoiding memory leaks on disconnected computers would be
reason enough to justify such a fix.

> So you must either make ai0 a global variable (which won't work if you
> plan to make fetchmail open more than one socket simultaneously), or
> declare it in the calling code (driver.c or whatever) and pass it to
> SockOpen.

"you'll have to" or "you may have to" sounds more polite than "you must"
(no offense taken, don't worry).

> My question now is how to proceed. Since you definitely know fetchmail's
> code much better than I do, it would make sense that you fix the bug in
> all places where it might occur.

Bugs related to signal handling (which is used for timeout handling)
require extra care. I myself will have to review the code again before
making changes in that area.

> However, since the bug is crucial at least for me, I'd need a fix
> soon. So do you think you will come up with a fix in a short amount of
> time, or should I provide a fix temporarily (but I will probably
> overlook some of the situations where this bug might possibly occur -
> so far I'm only aware of driver.c line 1053 calling SockOpen())?

I don't think I'll be able to handle this before Saturday, perhaps
Sunday; but providing patches for you to test should be feasible. Note
that I don't have Mac OS X machines to test on either.

Thank you!

-- 
Matthias Andree
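
For illustration only, the second option quoted above (declare ai0 in the
calling code and pass it down) could look roughly like the sketch below.
This is not fetchmail's code: open_with_timeout(), resolve_into(),
on_alarm() and the freed_after_timeout counter are hypothetical names, and
the simulate_timeout flag merely raises SIGALRM by hand to stand in for a
timeout that fires after the lookup has returned, for example during
connect(). It does not help if the signal arrives inside getaddrinfo()
itself: in that case the caller never receives a pointer to free, and
jumping out of a function that is not async-signal-safe is not well defined
anyway.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <netdb.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

static sigjmp_buf timeout_env;        /* jump target for the timeout handler */
static int freed_after_timeout = 0;   /* counts cleanups done on the timeout path */

static void on_alarm(int sig)
{
    (void)sig;
    siglongjmp(timeout_env, 1);
}

/* Hypothetical stand-in for the lookup part of SockOpen(): the result is
 * stored in a slot owned by the caller, so the caller can still free it
 * if a later step is interrupted. */
static int resolve_into(const char *host, const char *service,
                        struct addrinfo *volatile *slot)
{
    struct addrinfo hints;
    struct addrinfo *res = NULL;
    int rc;

    assert(slot != NULL);
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, service, &hints, &res);
    if (rc == 0)
        *slot = res;
    return rc;
}

/* Returns 0 on success, -1 on resolver error, -2 on timeout. */
int open_with_timeout(const char *host, const char *service,
                      unsigned seconds, int simulate_timeout)
{
    /* volatile so that the value is well defined after siglongjmp() */
    struct addrinfo *volatile ai0 = NULL;
    struct sigaction sa, old;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGALRM, &sa, &old);

    if (sigsetjmp(timeout_env, 1) != 0) {
        /* Timeout path: release whatever the lookup handed back. */
        alarm(0);
        sigaction(SIGALRM, &old, NULL);
        if (ai0 != NULL) {
            freeaddrinfo(ai0);
            ai0 = NULL;
            freed_after_timeout++;
        }
        return -2;
    }

    alarm(seconds);
    rc = resolve_into(host, service, &ai0);
    if (rc == 0 && simulate_timeout)
        raise(SIGALRM);               /* pretend connect() ran into the timeout */

    /* Normal path: cancel the alarm and free the list as usual. */
    alarm(0);
    sigaction(SIGALRM, &old, NULL);
    if (ai0 != NULL)
        freeaddrinfo(ai0);
    return rc == 0 ? 0 : -1;
}
```

Calling open_with_timeout("127.0.0.1", "25", 5, 1) exercises the timeout
path without needing a network connection, since a numeric address and port
are resolved locally. Keeping the pointer in the caller's frame rather than
in a global also avoids the objection about opening more than one socket at
a time, although the single static jump buffer used here would still need
the same treatment.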