Re: [fetchmail-users] Re: fix lost POP3 deletes


Matthias Andree wrote:
> (collecting replies to two postings)
>   
>> - Some mailservers keep the flags of a mail in the mail itself by
>>   adding a header like Status:. So, the size of a mail may actually
>>   change when it turns from 'new' to 'old'. Due to size mismatch,
>>   mails from such mailservers will get downloaded again.
>>     
>
> That is a rather important concern.
>
> The refinement of the suggestion would be to hash all but a few
> non-constant headers with some decent hash function. MD5 would be
> simple, but we shouldn't hardcode anything here.
>   
You are right. This is better.

To reduce the load, the hash could be used only when both the UID and
the size of a previously deleted message are found again on the next
poll. That should not occur too often and, therefore, should not
significantly increase the load on a relatively stable connection with a
server that uses good UIDs and reports a valid, constant mail size.

The user could also choose to use the hash instead of the size when the 
size reported by the server is unreliable.
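
A minimal sketch of that check, in Python for brevity (fetchmail itself is
written in C). All names here (Deleted, already_delivered, trust_size) are
hypothetical and only illustrate the idea: the hash is computed lazily, and
only after the cheaper UID and size comparisons have passed.

```python
from typing import Callable, Dict, NamedTuple


class Deleted(NamedTuple):
    """What was remembered about a message whose DELE was sent on an earlier poll."""
    size: int
    digest: str


def already_delivered(
    remembered: Dict[str, Deleted],
    uid: str,
    size: int,
    compute_digest: Callable[[], str],
    trust_size: bool = True,
) -> bool:
    """Return True if the listed message matches one we delivered and tried to delete."""
    old = remembered.get(uid)
    if old is None:
        return False  # unknown UID: treat as a new message
    if trust_size and old.size != size:
        return False  # cheap check failed, so the hash is never computed
    # UID (and size, if trusted) match: only now pay for the hash.
    return compute_digest() == old.digest
```

Passing trust_size=False corresponds to the option of relying on the hash
instead of the size when the server reports unreliable sizes.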

>>> - Some mailservers keep the flags of a mail in the mail itself by
>>>   adding a header like Status:. So, the size of a mail may actually
>>>   change when it turns from 'new' to 'old'. Due to size mismatch,
>>>   mails from such mailservers will get downloaded again.
>>>       
>
>   
>> You got me there :-) In that case, the mail would be downloaded a second
>> time and deleted.
>>     
>
> The "message of same size" (say, automated messages in a fixed format,
> stock reports, particular logs) problem isn't solved though - and they
> might even get the same UID if the server used just the MD5 or a file
> name with a recycled inode number...
>   
I don't think so. If two messages are really different, the date and the
message ID should at least be different. Even for a badly formatted mail
with a constant date and message ID sent, for instance, by a crontab
running sendmail us...@ex... < fixed_mail.txt, the Received headers
should be different. If the two mails are completely identical and lead
to the same hash, then they are the same mail.

The hash of the significant headers looks safe to me if the "significant 
headers" are carefully chosen.
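
As an illustration only, here is a sketch of such a hash in Python. The
set of excluded headers is an assumption based on the Status/X-Status
examples in this thread, and the function name header_digest is
hypothetical. The algorithm is a parameter rather than hardcoded; the
default of SHA-256 is just a choice made for the sketch.

```python
import hashlib
from email import message_from_bytes

# Assumption: headers a server may rewrite between polls.
VOLATILE_HEADERS = {"status", "x-status"}


def header_digest(raw_headers: bytes, algorithm: str = "sha256") -> str:
    """Hash every header except the volatile ones, in the order they appear."""
    msg = message_from_bytes(raw_headers)
    h = hashlib.new(algorithm)
    for name, value in msg.items():
        if name.lower() in VOLATILE_HEADERS:
            continue  # skip headers the server may change when flags change
        h.update(name.lower().encode("ascii", "replace"))
        h.update(b":")
        h.update(str(value).encode("utf-8", "surrogateescape"))
        h.update(b"\n")
    return h.hexdigest()
```

Because the Received, Date and Message-ID headers all feed into the digest,
two mails that differ in any of them get different digests, while a change
limited to Status or X-Status leaves the digest unchanged.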

> The question is (1) if it's worth the added effort to track sizes, or
> (2) if we should rather go Sunil's simple "play it safe and redownload
> if in doubt" route; or (3) refine my patch to assume QUIT succeeded the
> moment it is handed off to the write() call. That also has the effect of
> redownloading messages if the QUIT fails, but will retry the DELEte if
> fetchmail doesn't reach the point where it would send QUIT.
>   
Fetchmail should not (1) leave the mails on the server forever, (2) 
download the mails again and again because it chokes on one mail, (3) 
drop one mail because it could not distinguish it from a previously 
deleted mail.

Now, to make things clearer, let me describe a problem that happened to
me. It explains why I added point (2) above and is a concrete
example of something that can go wrong.

We have a POP3 proxy (p3scan) through which fetchmail downloads the
mails from the external POP3 server. p3scan gets the whole mail and hands
it to clamd to scan it for viruses. If everything is ok, the mail is
passed to fetchmail. Therefore, fetchmail receives the mail after a
delay that depends on the size of the mail, but it can still deliver it
successfully to postfix. Then fetchmail tries to send the DELE command,
and it fails because the scanning of the mail by clamd took too long and
the POP3 server timed out.

If fetchmail can discover that the mail was delivered but could not
be deleted on the previous poll (remember the DELE command was never
acknowledged by the server) and delete it without downloading it again,
then the problem is solved. In my case, I received a 32MB mail and it
took 13 minutes to scan it. That is far beyond the patience of the most
patient server... Therefore, the mail kept being downloaded over and
over and delivered to the two users in the To field... Yes, it is rather
painful when you open your mailbox and discover 15 copies of a 32MB mail :-)
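
The behaviour described above could be sketched as follows (Python, not
fetchmail's actual code). The server object is assumed to offer
poplib.POP3-style uidl/retr/dele methods; poll_once and the delivered set
are hypothetical names. For brevity the sketch matches on the UID alone,
which this thread already considers insufficient on servers with weak UIDs,
so a real version would add the size or hash comparison.

```python
from typing import Callable, List, Set


def poll_once(server, delivered: Set[str], deliver: Callable[[bytes], None]) -> List[str]:
    """Run one poll and return the UIDs downloaded during it.

    `delivered` holds UIDs already handed to the local MTA whose DELE was
    never confirmed; a real setup would persist it between polls.
    """
    _resp, listing, _octets = server.uidl()
    entries = [line.decode("ascii").split(" ", 1) for line in listing]
    # Forget UIDs that are no longer on the server: their delete succeeded.
    delivered.intersection_update({uid for _num, uid in entries})

    downloaded = []
    for num, uid in entries:
        if uid not in delivered:
            _resp, lines, _octets = server.retr(int(num))
            deliver(b"\r\n".join(lines))
            delivered.add(uid)  # remember the delivery before attempting DELE
            downloaded.append(uid)
        # Already delivered mails skip the download and go straight to DELE.
        server.dele(int(num))  # may raise if the server has timed out
    return downloaded
```

If dele() fails after a slow download, the UID stays in delivered, so the
next poll retries the delete without fetching the 32MB mail again.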

>> And this, only if the connection is dropped after the delete command
>> is sent AND the size of the mail is changed by the server on the next
>> poll.
>>
>> The worst case occurs if the size of the mail keeps changing at every
>> poll. I have no solution here. It's not funny at all to have to deal
>> with broken servers :-(
>>     
>
> I think I might just charge users for adding more workarounds. Then they
> have the cheap route of complaining to the server's operator, or the
> expensive one of having the fix. :-P
>
> I think size isn't necessarily the best complement here to detect
> changes. I like the hash approach better, but this seems quite intrusive
> as well.
>
> A hash that precludes some dynamic headers such as [X-]Status and
> similar may be needed anyways to emulate UIDL for --keep setups on
> servers that don't support UIDL -- but if they lack all UIDL and TOP,
> the network impact of --keep will be rather painful.
> (Perhaps fetchmail should just refuse --keep on such servers.)
>   
Or it might be documented. The user would then have to choose between
the "cheap route" and the time-consuming one :-)

Frederic
