From: Matthias A. <mat...@gm...> - 2009-07-22 13:51:45
Dear Sunil, dear list readers,

There seems to be a general issue with fetchmail around fetching large
messages, spam filtering, timeouts and thereabouts.

Some three years ago I made a foray into tracking pending DELEtes that
we could no longer issue, or that POP3 servers reverted after socket
errors (often timeouts or network failures, it was not too clear to me
at the time). For reference, see
https://lists.berlios.de/pipermail/fetchmail-users/2006-May/000409.html
and followups.

At the time, we discussed several concerns around this issue, and I
eventually shelved a solution for lack of ideas and time. The issue was
originally reported by Frédéric Marchal around that time (2006 or 2005,
I didn't check).

However, I have had at least two plausible reports (perhaps more that I
don't currently recall or connect) since. In these reports, the local
delivery took so long that the server disconnected fetchmail before it
could even send the DELE, or fetchmail disconnected itself when the
SMTP server took too long to actually accept a message.

One newer issue was filed to the Berlios bug tracker as #10972 by
Viktor Binzberger in 2007. The other was reported to the
fetchmail-users@ list last Friday (2009-07-17) by Tony Morehen; I have
received a full log off-list.

While Viktor's problem is worked around in fetchmail 6.3.10 by
extending the SMTP client timeouts to the minimum periods recommended
by RFC 5321 (SMTP, obsoleting RFC 2821 and RFC 821), this isn't going
to help if the POP3 server drops the connection as early as in Tony's
case - plus he's using an MDA anyway.

It seems the problem has two faces:

1. Timeout on the server we're fetching from, because the SMTP server
   or MDA takes too long.

2. Tracking which messages were successfully fetched and delivered, and
   which subset of these were flushed.

Generally, I see several approaches to 1:

a. Queue downloaded messages before handing them off for delivery.
   This avoids timeouts that originate in the SMTP/LMTP server or MDA
   that fetchmail forwards to.

b. Alternatively, we could try making fetchmail multithreaded and
   keeping the POP3 server happy by spamming it with NOOP. I'm not sure
   how well this works, how many POP3 servers implement NOOP, or how
   many NOOPs in sequence they tolerate. Given fetchmail's design, it's
   very intrusive and amounts to a rewrite of several major parts. It
   would have other benefits, but it's a major effort.

c. Alternatively, we could try to reconnect after loss of connection -
   however, we may lose prior DELE commands when we don't send QUIT, so
   again, we need to bundle DELE requests at the end or for a separate
   transaction. Given that many sites (including hotmail, where Tony
   had his problem) limit the number of logins per unit of time, often
   to once per 15 minutes, we can't preventively send QUIT (and log in
   again) without risking locking ourselves out. Either way, this
   solution means we would have to do 2.

Another workaround is making sure that your MTA or MDA accepts messages
quickly (a few seconds at most, as is common in Postfix unless set to
non-default pre-queue content inspection) and runs the spam filter
off-line after accepting the message. I understand that it's not always
the best solution, but since the POP3 or IMAP server has accepted the
message already, we can seriously only either forward or drop it
anyway - bouncing is dangerous, since the envelope sender might be
forged; I think messages can only sensibly be rejected before being
accepted by your respective ISP through SMTP.

We could add a trivial queueing agent separately and link it through
the MDA option for early experiments (sort of a nullmailer for inbound
messages), but fixing 2 is sort of a prerequisite for solving 1 in way
a or c - we need to track more state.

This does entail changing the .fetchids format as discussed in 2006,
but the UID parser appeared very tolerant even at that time, so that an
extension would be possible and backwards compatible.
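
To make "we need to track more state" a little more concrete, here is a
minimal sketch in Python of per-UID bookkeeping. It is an illustration
only, not fetchmail code: the names MsgState and UidTracker are
hypothetical, the states are just one possible reading of "fetched",
"delivered" and "flushed" from point 2, and nothing here reflects the
actual .fetchids format.

```python
from enum import Enum


class MsgState(Enum):
    """Hypothetical per-message states, one reading of point 2."""
    NEW = "new"              # listed by the server, not yet downloaded
    FETCHED = "fetched"      # downloaded, delivery outcome not yet known
    DELIVERED = "delivered"  # accepted locally; a DELE is still owed
    FLUSHED = "flushed"      # DELE sent and the closing QUIT confirmed


# Allowed forward transitions; anything else is treated as a bug.
_NEXT = {
    MsgState.NEW: {MsgState.FETCHED},
    MsgState.FETCHED: {MsgState.DELIVERED},
    MsgState.DELIVERED: {MsgState.FLUSHED},
    MsgState.FLUSHED: set(),
}


class UidTracker:
    """Remembers how far each UID got, so state survives a lost connection."""

    def __init__(self):
        self._state = {}

    def state_of(self, uid):
        # UIDs we have never seen are implicitly NEW.
        return self._state.get(uid, MsgState.NEW)

    def advance(self, uid, new_state):
        current = self.state_of(uid)
        if new_state not in _NEXT[current]:
            raise ValueError(f"{uid}: cannot go from {current.name} to {new_state.name}")
        self._state[uid] = new_state

    def needs_fetch(self, uid):
        # Only messages never downloaded need a RETR in the next session.
        return self.state_of(uid) is MsgState.NEW

    def pending_deletes(self):
        # Delivered but not yet flushed: these still need a DELE.
        return sorted(u for u, s in self._state.items() if s is MsgState.DELIVERED)
```

If the connection drops after delivery, the UID simply stays in
DELIVERED; a later session can skip the download and only issue the
DELE that is still owed.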
I would feel more comfortable checking the parser again, but I think I
checked thoroughly in 2006 already. Even if we must change the
.fetchids format/layout, I'm open to it.

Functionally, we'd probably need to bundle DELEs into a bulk operation
of "DELE n1 DELE n2 DELE n3 ... DELE nm QUIT" so that we have a
reasonable chance that the server isn't going away from boredom between
the first DELE and the QUIT, and we have a better chance of avoiding
UID reassignment and "delete wrong message" issues that happen in the
race Sunil described, i.e. when the server executes QUIT but the
network dies before fetchmail sees the +OK response.

Anybody care to share their thoughts? I'm looking forward to them.

Best regards and thanks for your time.

--
Matthias Andree
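
The bundled "DELE n1 ... DELE nm QUIT" idea above can be sketched as
follows. This is a minimal Python illustration under stated
assumptions, not fetchmail's implementation: build_flush_batch and
flush_confirmed are hypothetical helper names, and sending the whole
batch in a single write assumes the server tolerates pipelined
commands (the PIPELINING capability from RFC 2449); otherwise the same
commands would have to be sent one at a time, back to back.

```python
def build_flush_batch(message_numbers):
    """Return the bytes for 'DELE n1 ... DELE nm QUIT' as one batch.

    POP3 commands are CRLF-terminated and message numbers start at 1.
    """
    for n in message_numbers:
        if n < 1:
            raise ValueError(f"invalid POP3 message number: {n}")
    lines = [f"DELE {n}" for n in message_numbers]
    lines.append("QUIT")
    return ("\r\n".join(lines) + "\r\n").encode("ascii")


def flush_confirmed(responses, expected_deles):
    """True only if every DELE and the final QUIT were answered with +OK.

    'responses' is the list of status lines actually received. If the
    reply to QUIT is missing, the outcome is unknown (the race described
    above), so the caller should keep those messages marked as pending
    rather than assume they were deleted.
    """
    if len(responses) != expected_deles + 1:
        return False
    return all(r.startswith(b"+OK") for r in responses)
```

A caller would write build_flush_batch([1, 2, 3]) to the socket, read
status lines until the server closes the connection, and only record
the messages as flushed when flush_confirmed(...) returns True.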