From: Frederic M. <fre...@wo...> - 2006-05-19 11:19:45
Matthias Andree wrote:
> (collecting replies to two postings)
>
>> - Some mailservers keep the flags of a mail in the mail itself by
>> adding a header like Status:. So, the size of a mail may actually
>> change when it turns from 'new' to 'old'. Due to size mismatch,
>> mails from such mailservers will get downloaded again.
>>
>
> That is a rather important concern.
>
> The refinement of the suggestion would be to hash all but a few
> non-constant headers with some decent hash function. MD5 would be
> simple, but we shouldn't hardcode anything here.
>
You are right. This is better. To reduce the load, the hash could be used only when both the UID and the size of a previously deleted message are found again on the next poll. That should not happen often, so it should not significantly increase the load on a relatively stable connection to a server that uses good UIDs and reports a valid, constant mail size. The user could also choose to use the hash instead of the size when the size reported by the server is unreliable.

>>> - Some mailservers keep the flags of a mail in the mail itself by
>>> adding a header like Status:. So, the size of a mail may actually
>>> change when it turns from 'new' to 'old'. Due to size mismatch,
>>> mails from such mailservers will get downloaded again.
>>>
>
>
>> You got me there :-) In that case, the mail would be downloaded a second
>> time and deleted.
>>
>
> The "message of same size" (say, automated messages in a fixed format,
> stock reports, particular logs) problem isn't solved though - and they
> might even get the same UID if the server used just the MD5 or a file
> name with a recycled inode number...
>
I don't think so. If two messages are really different, at least the date and the message ID should differ. Even for a badly formatted mail with a constant date and message ID, sent for instance by a crontab running sendmail us...@ex... < fixed_mail.txt, the Received headers should be different.
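To make the proposal concrete, here is a minimal sketch in Python (for brevity; this is not fetchmail code). It assumes that "significant" means every header except Status: and X-Status:, that the hash algorithm is a parameter rather than hardcoded, and that the client keeps a record of messages delivered on an earlier poll whose DELE was never acknowledged. All names (VOLATILE_HEADERS, significant_hash, delete_without_redelivery, trust_size) are hypothetical. The hash is only computed when the UID, and by default the size, already match, so the extra download stays rare.

```python
import hashlib
from typing import Callable, Dict, Tuple

# Hypothetical list of headers a server may rewrite to store flags
# (e.g. "Status: RO" once the mail is read). Not fetchmail's actual list.
VOLATILE_HEADERS = {b"status", b"x-status"}


def significant_hash(raw: bytes, algorithm: str = "md5") -> str:
    """Hash a raw message, skipping volatile headers and their folded lines.

    Any algorithm name accepted by hashlib.new() can be passed.
    """
    h = hashlib.new(algorithm)
    in_headers = True
    skipping = False
    for line in raw.splitlines(keepends=True):
        if not in_headers:
            h.update(line)                # body is hashed verbatim
        elif line.strip(b"\r\n") == b"":
            in_headers = False            # blank line ends the header block
            h.update(line)
        elif line[:1] in (b" ", b"\t"):
            if not skipping:              # continuation of the previous header
                h.update(line)
        else:
            name = line.split(b":", 1)[0].strip().lower()
            skipping = name in VOLATILE_HEADERS
            if not skipping:
                h.update(line)
    return h.hexdigest()


# uid -> (size reported by the server, significant hash) for messages that
# were delivered on an earlier poll but whose DELE was never acknowledged.
Pending = Dict[str, Tuple[int, str]]


def delete_without_redelivery(uid: str, size: int, pending: Pending,
                              fetch_digest: Callable[[], str],
                              trust_size: bool = True) -> bool:
    """Return True if the message can be deleted without delivering it again."""
    entry = pending.get(uid)
    if entry is None:
        return False                      # UID never delivered: a new message
    old_size, old_digest = entry
    if trust_size and size != old_size:
        return False                      # cheap check failed: treat as new
    # Only now pay for downloading the message to compute its hash.
    return fetch_digest() == old_digest
```

When a server adds a Status: header, the reported size changes, so with trust_size=True the message is treated as new and downloaded again. Passing trust_size=False lets the header-insensitive hash decide on its own, at the cost of downloading the message whenever its UID matches a pending entry.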
If the two mails are completely identical and lead to the same hash, then they are the same mail. The hash of the significant headers looks safe to me if the "significant headers" are carefully chosen.

> The question is (1) if it's worth the added effort to track sizes, or
> (2) if we should rather go Sunil's simple "play it safe and redownload
> if in doubt" route; or (3) refine my patch to assume QUIT succeeded the
> moment it is handed off to the write() call. That also has the effect of
> redownloading messages if the QUIT fails, but will retry the DELEte if
> fetchmail doesn't reach the point where it would send QUIT.
>
Fetchmail should not (1) leave the mails on the server forever, (2) download the mails again and again because it chokes on one mail, or (3) drop a mail because it could not distinguish it from a previously deleted mail.

Now, to make things clearer, let me describe a problem I ran into. It explains why I added point (2) above and is a concrete example of something that can go wrong.

We have a POP3 proxy (p3scan) through which fetchmail downloads the mails from the external POP3 server. p3scan gets the whole mail and hands it to clamd to scan it for viruses. If everything is OK, the mail is passed on to fetchmail. Fetchmail therefore receives the mail after a delay that depends on the size of the mail, but it can still hand it to postfix successfully. Then fetchmail tries to send the DELE command, and this fails because the clamd scan took so long that the POP3 server timed out. If fetchmail can discover that the mail was delivered but could not be deleted on the previous poll (remember, the DELE command was never acknowledged by the server) and delete it without downloading it again, then the problem is solved.

In my case, I received a 32 MB mail and it took 13 minutes to scan it. That is far beyond the patience of even the most patient server...
Therefore, the mail kept being downloaded over and over and delivered to the two users in the To field... Yes, it is rather painful when you open your mailbox and discover 15 copies of a 32 MB mail :-)

>> And this, only if the connection is dropped after the delete command
>> is sent AND the size of the mail is changed by the server on the next
>> poll.
>>
>> The worst case occurs if the size of the mail keeps changing at every
>> poll. I have no solution here. It's not funny at all to have to deal
>> with broken servers :-(
>>
>
> I think I might just charge users for adding more workarounds. Then they
> have the cheap route of complaining to the server's operator, or the
> expensive one of having the fix. :-P
>
> I think size isn't necessarily the best complement here to detect
> changes. I like the hash approach better, but this seems quite intrusive
> as well.
>
> A hash that precludes some dynamic headers such as [X-]Status and
> similar may be needed anyways to emulate UIDL for --keep setups on
> servers that don't support UIDL -- but if they lack all UIDL and TOP,
> the network impact of --keep will be rather painful.
> (Perhaps fetchmail should just refuse --keep on such servers.)
>
Or it might be documented. The user would then have to choose between the "cheap route" and the time-consuming one :-)

Frederic