From: Matthias A. <mat...@gm...> - 2009-07-23 09:42:12
On 22.07.2009 18:20, Sunil Shetye <sh...@bo...> wrote:

> There are actually two possibilities where the delay could be
> occurring.
>
> ====================================================================
> Case 1: After fetchmail has sent the initial MAIL FROM: to the SMTP
> server/invoked the MDA with %F substitution:
>
> I am assuming here that the SMTP server/MDA is doing a DNS
> lookup/basic spam testing on the sender address.
>
> So, in the SMTP server case, fetchmail is waiting for a response from
> the server after sending the initial MAIL FROM: while the SMTP server
> is doing the DNS lookup/basic spam testing.
>
> In the MDA case, fetchmail goes ahead with popen() and starts writing
> the body to the MDA. As the MDA is still doing the DNS lookup/basic
> spam testing and has not yet started reading the body, the pipe buffer
> will get full and fetchmail will get blocked on an fwrite() later.
>
> While fetchmail is blocked, the POP3 mailserver is waiting for
> fetchmail to read the entire mail body.
>
> Note that the write() timeout for the POP3 mailserver may be shorter
> than the read() timeout. This means that the POP3 mailserver is more
> likely to time out faster if it finds that the body is not getting
> drained at all in a reasonable amount of time.
> ====================================================================
> Case 2: After fetchmail has sent the entire mail to the SMTP
> server/MDA:
>
> Here, the remote mailserver is waiting for a command from fetchmail,
> while fetchmail is waiting for a response from the SMTP server/exit
> code from the MDA.
>
> As mentioned above, the read() timeout may be longer for the POP3
> mailserver and so it may not mind waiting for the next command.
> ====================================================================

I'm not sure I want to design fetchmail around guesses about whether
read() or write() timeouts on the server are set differently.

>> Generally, I see several approaches to 1:
>>
>> a. queue downloaded messages before handing them off for delivery.
>> This avoids timeouts that originate in the SMTP/LMTP server or MDA
>> that fetchmail forwards to.
>
> This should work. Of course, fetchmail will have to work entirely with
> UIDs as it will have to reconnect later and mark delivered mails for
> deletion.

Yup. That's what I want to do anyway in the releases after 6.3.N
(whether I'll call them 6.4, 6.5, or 7.0, I'll decide later). So we'd
have to polish the existing UID patches for IMAP to support UIDVALIDITY
(not a major issue: once you detect a UIDVALIDITY change, you discard
all stored UIDs) - OTOH, if fetchmail deals cleanly with the existing
\Seen and \Deleted flags, we don't even need that for IMAP. I need to
check the IMAP4rev1 transaction model though.

>> b. Alternatively, we could try making fetchmail multithreaded and
>> keeping the POP3 server happy by spamming it with NOOP. I'm not sure
>> how well this works, how many POP3 servers implement NOOP, and how
>> many NOOPs in sequence they tolerate. Given fetchmail's design, it's
>> very intrusive and amounts to a rewrite of several major parts. It
>> would have other benefits, but it's a major effort.
>
> This will not work in Case 1. There, the POP3 mailserver is obviously
> in no mood for NOOPs and may even treat it as a protocol error if it
> gets a command even though the complete body has not been sent.

True. So this approach is out for yet another reason.
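BTW, the fwrite() blocking in Sunil's Case 1 is easy to reproduce
outside of fetchmail. A minimal test program - not fetchmail code, just
an illustration; "sleep 30; cat" stands in for a slow MDA, and the pipe
capacity is around 64 KiB on a typical Linux, other systems differ:

    #include <stdio.h>
    #include <time.h>

    int main(void)
    {
        /* stand-in for a slow MDA: ignores its input for 30 s,
         * then drains it */
        FILE *mda = popen("sleep 30; cat >/dev/null", "w");
        char chunk[4096] = { 0 };
        unsigned long total = 0;
        int i;

        if (!mda) {
            perror("popen");
            return 1;
        }
        for (i = 0; i < 64; i++) {   /* try to push 256 KiB down the pipe */
            time_t t0 = time(NULL);
            fwrite(chunk, 1, sizeof chunk, mda);
            total += sizeof chunk;
            if (time(NULL) - t0 > 1) /* this fwrite() blocked, like
                                        fetchmail's does */
                printf("fwrite() stalled after %lu bytes\n", total);
        }
        return pclose(mda);
    }

The fwrite() that overflows the pipe capacity stalls until the "MDA"
starts reading - and that stall is exactly the window in which the POP3
server can time out on us.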
>> c. Alternatively, we could try to reconnect after loss of connection
>> - however, we may lose prior DELE commands when we don't send QUIT,
>> so again, we need to bundle DELE requests at the end or in a separate
>> transaction. Given that many sites (including Hotmail, where Tony had
>> his problem) limit the number of logins per unit of time, often to
>> once per 15 minutes, we can't preemptively send QUIT so as not to
>> lock ourselves out. Anyway, this solution means we would do 2.
>
> In Case 1 above, the POP3 mailserver may have timed out before sending
> the entire body. If this is happening repeatedly, fetchmail will
> always fail on the same mail.

Also solved by queueing (a.).

>> Fixing 2 is sort of a prerequisite for solving 1 in way a. or c. - we
>> need to track more state. This does entail changing the .fetchids
>> format as discussed in 2006, but the UID parser appeared very
>> tolerant even at that time, so that an extension would be possible
>> and backwards compatible. I would feel more comfortable checking that
>> again, but I think I checked thoroughly in 2006 already. Even if we
>> must change the .fetchids format/layout, I'm open to it.
>
> Well, changing the .fetchids format is anyway a must. If you can
> incorporate the UID parser, it will be great. If I remember correctly,

I'm not sure what you mean by "incorporate" here.

> the UID parser also had an option to mark bad mails. This would be
> used in such cases where there is a repeated delivery failure on the
> same mail. Once a certain bad count is reached, fetchmail will stop
> attempting to download the mail.

I don't think fetchmail has such a feature in the baseline code. The
internal UID data structure is:

    struct idlist
    {
        char *id;
        union
        {
            struct
            {
                int num;
                flag mark;      /* UID-index information */
    #define UID_UNSEEN   0      /* hasn't been seen */
    #define UID_SEEN     1      /* seen, but not deleted */
    #define UID_DELETED  2      /* this message has been marked deleted */
    #define UID_EXPUNGED 3      /* this message has been expunged */
            } status;
            char *id2;
        } val;
        struct idlist *next;
    };

You may be referring to fetchmail marking /servers/ "wedged" if it sees
too many timeouts on a particular server (this only works in daemon
mode).

>> Functionally, we'd probably need to bundle DELEs into a bulk
>> operation of "DELE n1 DELE n2 DELE n3 ... DELE nm QUIT" so that we
>> have a reasonable chance that the server isn't going away from
>> boredom between the first DELE and the QUIT, and we have more chances
>> to avoid the UID reassignment and "delete wrong message" issues that
>> happen in the race Sunil described, i.e. if the network dies after
>> the server executes QUIT but before fetchmail sees the +OK response.
>
> This should be possible.
>
> I have not gone through the cases you have mentioned yet, but it would
> be better to categorize them as Case 1 or Case 2 (or both!) first
> before deciding the course of action. For an SMTP server, it will be
> simple, as the time between the SMTP transactions will give a clear
> indication in syslog. For an MDA, this will probably require strace
> output.

I'm considering a general solution that doesn't require such an
analysis, but solves all of the issues at the same time.

WRT tracking the DELE/QUIT races in POP3, I am wondering about the
handling of the QUIT. Can we see a difference between "server hasn't
received QUIT" and "we haven't seen the answer"? In other words, will
the server's TCP stack hand the QUIT command to the server application
software even if TCP couldn't send the ACK? I think it will, because
the ACK itself needn't be ACKed, and the server often won't care if we
don't see the +OK after QUIT...
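To make that bulk operation concrete, here is a rough sketch of what I
have in mind - hypothetical code, not fetchmail's actual socket layer;
'in' and 'out' would be buffered streams on the POP3 socket (fdopen()
on the connected fd), and it assumes the server tolerates pipelined
commands (the PIPELINING capability from RFC 2449):

    #include <stdio.h>
    #include <string.h>

    /* Send all DELEs plus the final QUIT in one burst, then collect
     * the responses.  Returns the number of DELEs the server
     * positively acknowledged; anything past that count is in
     * "QUIT +OK pending" limbo and must not be treated as committed. */
    int bulk_dele(FILE *in, FILE *out, const int *msgnums, int count)
    {
        char buf[512];
        int i, acked = 0;

        for (i = 0; i < count; i++)
            fprintf(out, "DELE %d\r\n", msgnums[i]);
        fprintf(out, "QUIT\r\n");
        if (fflush(out) != 0)          /* one write burst for the batch */
            return acked;

        for (i = 0; i < count; i++) {  /* one reply per DELE, in order */
            if (!fgets(buf, sizeof buf, in))
                return acked;          /* connection died mid-batch */
            if (strncmp(buf, "+OK", 3) == 0)
                acked++;
            /* a -ERR reply means this DELE was rejected, not lost */
        }
        if (!fgets(buf, sizeof buf, in)) {
            /* QUIT reply lost: the "sent QUIT but no +OK seen" case -
             * the server may or may not have committed the deletions */
        }
        return acked;
    }

The "no reply to QUIT" branch is precisely the borderline case that the
UID states below would have to cover.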
The other option is to track UIDs with "to be deleted", "deleted, QUIT
+OK pending", and "deleted, QUIT acknowledged" states:

- "QUIT acknowledged" is easy: we don't save that state per UID, but
  just drop the corresponding UID, as the server will do the same.

- "to be deleted" means we're positively sure that the transaction was
  rolled back (because we haven't sent the QUIT command) - we need a
  workaround server option though, because some servers can be
  configured to spoil the protocol dangerously and commit DELEtes on
  loss of connection unless there's a RSET. We can assume the server
  won't reassign the UID until the next cycle (*).

- "deleted, QUIT +OK pending" is for your borderline case: we've sent
  the QUIT to the TCP/IP stack but haven't seen the +OK response. If we
  see more than half of the UIDs marked "QUIT +OK pending" again in the
  next cycle, we'll mark them "to be deleted"; if it's less than half,
  we'll forget them and re-fetch.

The other option is to hash a subset of whitespace-normalized message
headers (Received, Message-ID, perhaps others, making sure to avoid
X-Status or other mutable headers) to accompany the UID. We could hash
headers as they pass by in forwarding and only re-fetch a message in
your "we send QUIT but don't see +OK" case if we don't trust the UID.
I wonder if we should do that.

(*) There is another borderline case, that is: UID reassignment if
ANOTHER client (other than fetchmail) deletes messages. I think we need
to rule that out through documentation and tell users that only *one
particular* client must ever delete messages from a POP3 server. If
that's Thunderbird or something, the user will have to run fetchmail in
"keep" mode; otherwise, if he runs fetchmail without "keep", the other
POP3 client must be configured to leave ALL messages on the server.

WRT getting stuck on one message, we could record message UIDs and mark
them as "fetch attempted before", perhaps with a counter. We'd set this
state if we ever send a TOP or RETR for a new message and keep this
state for the next poll cycle. This would be less than "seen". On the
next poll cycle, we'll fetch new messages before those marked "fetch
attempted before". This would allow new mail to be fetched even if we
get stuck on particular messages through server, fetchmail, or MTA/MDA
bugs. If we add a counter, we can mark a message "broken" if the
counter exceeds a threshold and give up on it without deleting it, and
request manual intervention from the postmaster (in multidrop) or the
addressee (in singledrop).

Seems like I should draw the full finite state machine (FSM)...

Best regards
-- 
Matthias Andree