From: Sunil S. <sh...@bo...> - 2009-07-22 18:44:20
Dear Matthias,

Quoting from Matthias Andree's mail on Wed, Jul 22, 2009:
> There seems to be a general issue with fetchmail around fetching large
> messages, spam filtering, timeouts and thereabouts.
>
>
> Some three years ago I made a foray into tracking pending DELEtes that we
> could no longer issue or that were reverted by POP3 servers through socket
> errors (often timeouts or network failures, not too clear to me at the
> time), for reference, see
>
> - https://lists.berlios.de/pipermail/fetchmail-users/2006-May/000409.html
>
> and followups. At the time, we discussed several concerns around this
> issue and I eventually shelved a solution for lack of ideas and time.

...

> It seems the problem has two faces:
>
> 1. Timeout on the server we're fetching from, because the SMTP server or
> MDA takes too long.

There are actually two possibilities where the delay could be occurring.

====================================================================

Case 1: After fetchmail has sent the initial MAIL FROM: to the SMTP
server/invoked the MDA with %F substitution:

I am assuming here that the SMTP server/MDA is doing a DNS lookup/basic
spam testing on the sender address.

So, in the SMTP server case, fetchmail is waiting for a response from
the server after sending the initial MAIL FROM: while the SMTP server is
doing the DNS lookup/basic spam testing.

In the MDA case, fetchmail goes ahead with popen() and starts writing
the body to the MDA. As the MDA is still doing the DNS lookup/basic spam
testing and has not yet started reading the body, the pipe buffer will
fill up and fetchmail will block on a later fwrite().

While fetchmail is blocked, the POP3 mailserver is waiting for fetchmail
to read the entire mail body. Note that the write() timeout for the POP3
mailserver may be shorter than the read() timeout. This means that the
POP3 mailserver is likely to time out sooner if it finds that the body
is not getting drained at all in a reasonable amount of time.
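
To illustrate the MDA half of Case 1, here is a minimal sketch (Python,
not fetchmail code) of a writer filling a pipe that nobody is reading
yet. The function name bytes_until_pipe_full is made up for this
illustration. The write end is set non-blocking only so that the demo
can report the full pipe instead of hanging; a blocking fwrite(), as in
fetchmail, would simply stall at that point.

```python
import os


def bytes_until_pipe_full(chunk_size=4096):
    """Write into a pipe that is never read; return how many bytes fit."""
    read_fd, write_fd = os.pipe()
    try:
        # Non-blocking, so a full pipe raises BlockingIOError here.
        # A blocking writer would hang on this write instead.
        os.set_blocking(write_fd, False)
        chunk = b"x" * chunk_size
        total = 0
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The reader (standing in for the MDA) has not drained
                # anything, so the kernel buffer is full.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)


if __name__ == "__main__":
    print(bytes_until_pipe_full(), "bytes accepted before the pipe was full")
```

On Linux the default pipe capacity is typically 64 KiB, so any mail body
larger than that would be enough to stall the writer until the MDA
starts reading.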

====================================================================

Case 2: After fetchmail has sent the entire mail to the SMTP server/MDA:

Here, the remote mailserver is waiting for a command from fetchmail,
while fetchmail is waiting for a response from the SMTP server/exit
code from the MDA.

As mentioned above, the read() timeout may be longer for the POP3
mailserver and so it may not mind waiting for the next command.

====================================================================

Of course, a combination of the above two cases is also possible.

> Generally, I see several approaches to 1:
>
> a. queue downloaded messages before handing them off for delivery. This
> avoids timeouts that originate in the SMTP/LMTP server or MDA that
> fetchmail forwards to.

This should work. Of course, fetchmail will have to work entirely with
UIDs as it will have to reconnect later and mark delivered mails for
deletion.

> b. Alternatively, we could try making fetchmail multithreaded and keeping
> the POP3 server happy by spamming it with NOOP. I'm not sure how good this
> works, how many POP3 servers implement NOOP, how many NOOP in sequence
> they tolerate. Given fetchmail's design, it's very intrusive and amounts
> to a rewrite of several major parts. It would have other benefits, but
> it's a major effort.

This will not work in Case 1. There, the POP3 mailserver is obviously in
no mood for NOOPs and may even treat it as a protocol error if it gets a
command even though the complete body has not been sent.

> c. Alternatively, we could try to reconnect after loss of connection -
> however, we may lose prior DELE commands when we don't send QUIT, so
> again, we need to bundle DELE requests at the end or for a separate
> transaction. Given that many sites (including hotmail, where Tony had his
> problem) limit the number of logins per unit of time, often to once per 15
> minutes, we can't preventively send QUIT so as not to lock ourselves out.
> Anyways, the solution means we would do 2.

In Case 1 above, the POP3 mailserver may have timed out before sending
the entire body. If this is happening repeatedly, fetchmail will always
fail on the same mail.

> Fixing 2 is sort of a requisite for solving 1 in way a or c - we need to
> track more state. This does entail changing the .fetchids format as
> discussed in 2006, but the UID parser appeared very tolerant even at that
> time, so that an extension would be possible and backwards compatible. I
> would feel more comfortable checking that again, but I think I checked
> thoroughly in 2006 already. Even if we must change the .fetchids
> format/layout, I'm open to it.

Well, changing the .fetchids format is a must anyway. If you can
incorporate the UID parser, it will be great. If I remember correctly,
the UID parser also had an option to mark bad mails. This would be used
in cases where there is a repeated delivery failure on the same mail.
Once a certain bad count is reached, fetchmail will stop attempting to
download the mail.

> Functionally, we'd probably need to bundle DELEs into a bulk operation of
> "DELE n1 DELE n2 DELE n3 ... DELE nm QUIT" so that we have a reasonable
> chance that the server isn't going away from boredom between the first
> DELE and the QUIT, and we have more chances to avoid UID reassignment and
> "delete wrong message" issues that happen in the race Sunil described, i.
> e. if the network dies if the server executes QUIT but fetchmail doesn't
> see the +OK response.

This should be possible.

I have not gone through the cases you have mentioned yet, but it would
be better to categorize them as Case 1 or Case 2 (or both!) first before
deciding the course of action. For the SMTP server, this will be simple,
as the time between the SMTP transactions will give a clear indication
in syslog. For the MDA, this will probably require strace output.

-- 
Sunil Shetye.