From: Sunil S. <sh...@bo...> - 2009-07-23 12:31:51
Quoting from Matthias Andree's mail on Thu, Jul 23, 2009:
> > ====================================================================
> > Case 1: After fetchmail has sent the initial MAIL FROM: to the SMTP
> > server/invoked the MDA with %F substitution:
> >
> > I am assuming here that the SMTP server/MDA is doing a DNS
> > lookup/basic spam testing on the sender address.
> >
> > So, in the SMTP server case, fetchmail is waiting for a response from
> > the server after sending the initial MAIL FROM: while the SMTP server
> > is doing the DNS lookup/basic spam testing.
> >
> > In the MDA case, fetchmail goes ahead with popen() and starts writing
> > the body to the MDA. As the MDA is still doing the DNS lookup/basic
> > spam testing and has not yet started reading the body, the pipe buffer
> > will get full and fetchmail will get blocked on an fwrite() later.
> >
> > While fetchmail is blocked, the POP3 mailserver is waiting for
> > fetchmail to read the entire mail body.
> >
> > Note that the write() timeout for the POP3 mailserver may be shorter
> > than the read() timeout. This means that the POP3 mailserver is more
> > likely to time out faster if it finds that the body is not getting
> > drained at all in a reasonable amount of time.
> > ====================================================================
> > Case 2: After fetchmail has sent the entire mail to the SMTP server/MDA:
> >
> > Here, the remote mailserver is waiting for a command from fetchmail,
> > while fetchmail is waiting for a response from the SMTP server/exit
> > code from the MDA.
> >
> > As mentioned above, the read() timeout may be longer for the POP3
> > mailserver and so it may not mind waiting for the next command.
> > ====================================================================
>
> I'm not sure if I want to design fetchmail around guesses about whether
> read() or write() timeouts on the server are set differently.

What I mean to say is that Case 1 above must be occurring more frequently
than reported, so it should be tackled first. Of course, I could be wrong,
in which case Case 2 should be tackled first.
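To make the Case 1 blocking concrete, here is a small standalone sketch
(this is not fetchmail code; the "sleep 30; cat" child is just a stand-in
for an MDA that spends time on DNS/spam checks before reading its input):

    /* writer-blocks-on-full-pipe demonstration */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* stand-in MDA: "checks the sender" for 30s before draining stdin */
        FILE *mda = popen("sleep 30; cat > /dev/null", "w");
        char line[82];
        long queued;

        if (mda == NULL)
            return 1;
        memset(line, 'x', 80);
        line[80] = '\n';
        line[81] = '\0';
        for (queued = 0; queued < 1024L * 1024L; queued += 81) {
            /* once the kernel pipe buffer (often 64 KiB) is full, this
             * fwrite() blocks until the child starts reading -- and the
             * upstream POP3 server sees no progress at all meanwhile */
            if (fwrite(line, 1, 81, mda) != 81)
                break;
            fprintf(stderr, "queued %ld bytes\r", queued + 81);
        }
        pclose(mda);
        return 0;
    }

Run it and the byte counter stalls as soon as the pipe buffer fills; that
stall is exactly the window in which the POP3 server can time us out.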
> >> a. queue downloaded messages before handing them off for delivery. This
> >> avoids timeouts that originate in the SMTP/LMTP server or MDA that
> >> fetchmail forwards to.
> >
> > This should work. Of course, fetchmail will have to work entirely with
> > UIDs as it will have to reconnect later and mark delivered mails for
> > deletion.
>
> Yup. That's what I want to do anyways in the next after-6.3.N releases
> (whether I'll call them 6.4 or 6.5 or 7.0, I'll decide later).
>
> So we'd have to polish the existing UID patches for IMAP to support
> UIDVALIDITY (not a major issue, once you detect UIDVALIDITY changes, you
> discard all stored UIDs) - OTOH if fetchmail deals cleanly with the
> existing \Seen and \Deleted flags, we don't even need that for IMAP. I
> need to check the IMAP4r1 transaction model though.

Ok.

> >> Fixing 2 is sort of a requisite for solving 1 in way a or c - we need
> >> to track more state. This does entail changing the .fetchids format as
> >> discussed in 2006, but the UID parser appeared very tolerant even at
> >> that time, so that an extension would be possible and backwards
> >> compatible. I would feel more comfortable checking that again, but I
> >> think I checked thoroughly in 2006 already. Even if we must change the
> >> .fetchids format/layout, I'm open to it.
> >
> > Well, changing the .fetchids format is anyway a must. If you can
> > incorporate the UID parser, it will be great. If I remember correctly,
>
> I'm not sure what you mean by "incorporate" here.

I mean: please merge my UID parser code patch into fetchmail, if it is
still suitable.

> > the UID parser also had an option to mark bad mails. This would be
> > used in such cases where there is a repeated delivery failure on the
> > same mail. Once a certain bad count is reached, fetchmail will stop
> > attempting to download the mail.
>
> I don't think fetchmail has such a feature in the baseline code. The
> internal uid data structure is:

From my UID parser code patch:

    struct uidlist
    {
        union {
            unsigned char *id;
            long int nid;           /* IMAP ids are integers! */
        };
        int num;
        int mark;                   /* UID-index information */
    #define UID_UNSEEN      0       /* hasn't been seen */
    #define UID_SEEN        1       /* seen, but not deleted */
    #define UID_DELETED     2       /* this message has been deleted */
    #define UID_EXPUNGED    3       /* this message has been expunged */
    #define UID_OVERSIZED   4       /* this message is oversized */
    #define UID_SKIPPED     5       /* this message has been skipped */
    #define UID_OLD_DELETED 6       /* this message was deleted but not expunged! */
    #define UID_ERROR       99      /* processing error */
        time_t dltime;              /* time of delivery */
        int errcount;               /* count of errors while downloading this mail */
        int size;                   /* size of the mail */
        struct uidlist *next;
    };

> I'm considering a general solution that doesn't require such an analysis,
> but solves all of the issues at the same time.
>
> WRT tracking the DELE/QUIT races in POP3, I am wondering about the
> handling of the QUIT. Can we see a difference between "server hasn't
> received QUIT" and "we haven't seen the answer"? In other words, will the
> server's TCP stack hand the QUIT command to the server application
> software even if TCP couldn't send the ACK? I think it will, because the
> ACK itself needn't be ACKed and the server often won't care if we don't
> see the +OK after QUIT...
>
> The other option is to track UIDs with "to be deleted" and "deleted, QUIT
> +OK pending" and "deleted, QUIT acknowledged":
>
> - "QUIT acknowledged" is easy, we don't save that state per UID, but just
> drop the corresponding UID as the server will do the same.
>
> - "to be deleted" means we're positively sure that the transaction was
> rolled back (because we haven't sent the QUIT command) - we need a
> workaround server option though, because some servers can be configured to
> spoil the protocol dangerously and commit DELEtes on loss of connection
> unless there's a RSET. We can assume the server won't reassign the UID
> until the next cycle (*)
>
> - "deleted, QUIT +OK pending" is for your borderline case, we've sent the
> QUIT to the TCP/IP stack but haven't seen the +OK response. If we see more
> than half of the UIDs marked QUIT +OK pending in the next cycle, we'll
> mark them "to be deleted", if it's less than half, we'll forget them and
> re-fetch. The other option is to hash a subset of whitespace-normalized
> message headers (Received, Message-ID, perhaps others, making sure to
> avoid X-Status or other mutable headers) to accompany the UID. We could
> hash headers as they pass by in forwarding and only re-fetch them in your
> "we send QUIT but don't see +OK" case if we don't trust the UID. I wonder
> if we should do that.

Well, the 'UID_OLD_DELETED' mark mentioned above was meant to address
these cases.
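To sketch how those marks were meant to flow (made-up helper names; this
is not code from the patch, just the idea, operating on the struct above):

    /* Sketch only: after we have sent QUIT but never saw the +OK, the
     * DELEted UIDs are in limbo, so downgrade them for the next cycle. */
    static void mark_quit_unconfirmed(struct uidlist *list)
    {
        struct uidlist *u;

        for (u = list; u != NULL; u = u->next)
            if (u->mark == UID_DELETED)
                u->mark = UID_OLD_DELETED;  /* deleted but not expunged! */
    }

    /* Next poll cycle: the server's UIDL listing settles the question. */
    static void resolve_old_deleted(struct uidlist *u, int still_on_server)
    {
        if (u->mark != UID_OLD_DELETED)
            return;
        if (still_on_server)
            u->mark = UID_SEEN;      /* DELE was rolled back; DELE it again */
        else
            u->mark = UID_EXPUNGED;  /* the QUIT was committed after all */
    }

Whether to trust the UID alone here, or to back it with the header hash
you describe, could then be decided per server.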
> WRT getting stuck on one message, we could record message UIDs and mark
> them as "fetch attempted before", perhaps with a counter. We'd set this
> state if we ever send a TOP or RETR for a new message and keep this state
> for the next poll cycle. This would be less than "seen". On the next poll
> cycle, we'll fetch new messages before those marked "fetch attempted
> before". This would allow new mail to be fetched even if we get stuck on
> particular messages through server, fetchmail, or MTA/MDA bugs. If we add
> a counter, we can mark a message "broken" if the counter exceeds a
> threshold and give up on it without deleting it, and request manual
> intervention from the postmaster (in multidrop) or addressee (in
> singledrop).

The 'errcount' field mentioned above was meant for this purpose.

-- 
Sunil Shetye.