From: Matthias A. <mat...@gm...> - 2006-02-14 11:54:32
|
Frederic Marchal <fre...@wo...> writes:

> Well, I'm a C programmer and I may be able to help but I'm mainly a
> windows programmer (yes, I know, shame on me :-) ) and I'm not yet all
> too familiar with the Linux programming and especially with the patch
> mechanism and none at all with the diff tool.

The workflow is:

1. create a copy of all files you need to edit, for instance,
   copy foo.c to foo.c.orig

   alternatively, copy the whole directory tree recursively

2. edit until you're happy with the code

3. test the edits (compile, install, run)

4. generate a unified (preferred) or context patch. Not all diff
   utilities support unified output, but all are to support context
   format, as mandated by the relevant standard, IEEE Std 1003.1-2001.

   Plain ed-style patches (unfortunately still the default in the diff
   utility) are impractical so they are not usually accepted.

The diff tool is quite simple actually, if you created file copies:

  diff -u OLDFILE NEWFILE >PATCHFILE    (unified diff, preferred)

  diff -c OLDFILE NEWFILE >PATCHFILE    (context diff, alternative)

If you copied the whole directory before the edits:

  diff -u -r BACKUP_DIR EDITED_DIR >PATCHFILE    (or -c instead of -u)

If you added new files to EDITED_DIR, you'd use -N but I don't think
we'll need this here.

diff, used either way, will produce a file (PATCHFILE) that you'd attach
to a message and mail to me (or Rob, or preferably the -devel@ list) or
upload to the BerliOS patch tracker.

If I have the choice, I prefer the -devel@ list. That's public, archived
and easiest for me since I don't need to fetch anything, but get it
pushed into my inbox.

The patch utility does the reverse and will usually just be used like
this, it derives the file names from the PATCHFILE.

  patch <PATCHFILE

perhaps with -d (to change directory) or -p (to strip leading path
components from the paths shown in the patch file; used if patch cannot
find the files).
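For readers who have never looked inside a PATCHFILE, the unified format can be illustrated with Python's standard difflib module. This is only an illustration of the output format, not the diff utility itself; the two file contents are made up, and only the names foo.c.orig and foo.c come from the workflow described in this message.

```python
import difflib

# Hypothetical contents of the saved copy and of the edited file.
old_lines = ["int main(void)\n", "{\n", "    return 1;\n", "}\n"]
new_lines = ["int main(void)\n", "{\n", "    return 0;\n", "}\n"]

# unified_diff yields the patch line by line: two file header lines,
# one hunk header, then context (" "), removed ("-") and added ("+") lines.
patch = list(
    difflib.unified_diff(old_lines, new_lines,
                         fromfile="foo.c.orig", tofile="foo.c")
)

print("".join(patch), end="")
```

The "---" and "+++" header lines carry the file names, which is what lets the patch utility work out which file to modify.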

> Nevertheless, I have identified the core of the problem (its the call
> to readheaders in drive.c and the response to PS_REFUSED). I still
> have to check what happens if an invalid FROM or TO header is passed
> after readheaders but I think I can fix the program.

That would be great. Basically they should not matter that much.
fetchmail has some deprecated code to guess from To:/Cc: or similar, but
that is an unreliable concept and doesn't deserve attention, From:
shouldn't matter as long as Return-Path: is given (and even then, From:
shouldn't matter). Missing or corrupted Return-Path: headers should be
treated as though the message had contained "Return-Path: <>".

> I'll also need some direction about the proper way of doing it for a
> good integration in the current source.

The bigger concern is that fetchmail shouldn't be emitting headers it
knows are broken, so that the next hop (the MTA or MDA) does not get
confused. I wonder if fetchmail should prefix the broken lines with
X-Fetchmail-Escaped-Broken-Header: or something similar.

-- 
Matthias Andree
|
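The prefixing idea floated at the end of this message could look roughly like the following. This is a hypothetical Python sketch, not fetchmail code (fetchmail is written in C). The function name escape_broken_headers and the rule used to decide that a line is broken (not empty, not starting with whitespace, no colon) are assumptions made for illustration; only the header name X-Fetchmail-Escaped-Broken-Header comes from the message.

```python
PREFIX = "X-Fetchmail-Escaped-Broken-Header: "


def escape_broken_headers(header_lines):
    """Return a copy of header_lines with suspect lines turned into X- headers.

    Assumed rule: a line is suspect if it is not empty, does not start
    with a space or tab (so it is not a folded continuation) and has no colon.
    """
    escaped = []
    for line in header_lines:
        is_continuation = line[:1] in (" ", "\t")
        if line and not is_continuation and ":" not in line:
            escaped.append(PREFIX + line)
        else:
            escaped.append(line)
    return escaped


# Made-up header block with one badly wrapped References line.
sample = [
    "Subject: test",
    "References: <a@example.org>",
    " <b@example.org> <43ea8d07",
    "222@example.org>",
    "MIME-Version: 1.0",
]
result = escape_broken_headers(sample)
```

With this rule the stray "222@example.org>" line becomes a syntactically valid header field, and every other line is passed through unchanged.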
From: Frederic M. <fre...@wo...> - 2006-02-17 10:55:22
|
Matthias Andree wrote:
> Frederic Marchal <fre...@wo...> writes:
>
>> I expect this mail will be wrapped so here is the explanation. The
>> References header is one long line and is wrapped at some point after
>> the <43ea8d070602140508w6af9d34aq5c which leaves the
>> 222...@ma...> all by itself on a line. You have
>> received that e-mail and can compare it to your copy
>>
>> On my side, the References line is truncated at column 263. rfc821
>> specifies a maximum line length of 1000 characters but an implementation
>> should be prepared for longer lines or reject the mail. It is not what
>> is happening here.
>>
>> The culprit may be Mercury/32 or the Wingate proxy which is not listed
>> in the header.
>
> Is there any chance we might isolate the bug before we fix it?

Isolate in what? Mercury or fetchmail?

I installed a Mercury/32 server on my computer and I made some tests.
The header remains intact until it is fetched from the POP3 server by
the client. This bug is not (yet) mentioned on the mailing list of the
project. It is a closed source project and there seems to be only one
very busy developer. I won't get any help from there.

In fetchmail, the problem is clearly located in transact.c (I wouldn't
call it a bug until I get some clarification from the one who wrote
it). The part of the code that rejects the mail is preceded by a comment
stating

 * At least one brain-dead website (netmind.com) is known to
 * send out robotmail that's missing the RFC822 delimiter blank
 * line before the body! Without this check fetchmail segfaults.
 * With it, we treat such messages as spam and refuse them.

But the test after this comment doesn't test for the beginning of the
body but for a header line not starting with a blank and not containing
a colon. If the condition is true, refuse_mail is set to 1 and the
function returns PS_REFUSED, which deletes the mail on the server.

Frederic
|
From: Matthias A. <mat...@gm...> - 2006-02-17 16:02:26
|
Frederic Marchal <fre...@wo...> writes: > Matthias Andree wrote: >> Frederic Marchal <fre...@wo...> writes: >> >> >>> I expect this mail will be wrapped so here is the explanation. The >>> References header is one long line and is wrapped at some point after >>> the <43ea8d070602140508w6af9d34aq5c which leaves the >>> 222...@ma...> all by itself on a line. You have >>> received that e-mail and can compare it to your copy >>> >>> On my side, the References line is truncated at column 263. rfc821 >>> specifies a maximum line length of 1000 characters but an implementation >>> should be prepared for longer lines or reject the mail. It is not what >>> is happening here. >>> >>> The culprit may be Mercury/32 or the Wingate proxy which is not listed >>> in the header. >>> >> >> Is there any chance we might isolate the bug before we fix it? > Isolate in what ? Mercury or fetchmail ? Finding out if it's in Mercury/32 or Wingate or outside both. > In fetchmail, the problem is clearly located in transact.c (I wouldn't > call it a bug until I get some clarification from the one who wrote > it). The part of the code that rejects the mail is preceded by a comment > stating > > * At least one brain-dead website (netmind.com) is known to > * send out robotmail that's missing the RFC822 delimiter blank > * line before the body! Without this check fetchmail segfaults. > * With it, we treat such messages as spam and refuse them. > > But the test after this comment doesn't test for the beginning of the > body but for a header line not starting with a blank and not containing > a colon. Which is assuming that it's looking at a body line rather than a header, on the assumption that all header lines either start with whitespace, or with a sequence of non-space characters terminated with a colon (":"). 
I'm not sure if it's still needed, and the segfault had probably better be fixed in the place where it actually occurs, but that's stuff for 6.4.0, I'm not going to attempt removing this workaround in 6.3.X to avoid user astonishment. -- Matthias Andree |
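The assumption discussed in this message (every header line either starts with whitespace or contains a colon) can be written down as a small classifier. The sketch below is a Python illustration of that rule as described in the thread, not a transcription of the test in transact.c; the function names and the sample messages are made up.

```python
def classify_header_line(line):
    """Classify one line of a header block according to the rule above."""
    if line == "":
        return "end-of-headers"   # the blank line that separates the body
    if line[0] in " \t":
        return "continuation"     # folded part of the previous field
    if ":" in line:
        return "field"            # looks like "Name: value"
    return "suspect"              # neither: body text or a broken fold


def first_suspect(lines):
    """Return the 1-based number of the first suspect line, or None."""
    for number, line in enumerate(lines, start=1):
        kind = classify_header_line(line)
        if kind == "end-of-headers":
            return None
        if kind == "suspect":
            return number
    return None


# Body starts without the delimiting blank line (the netmind.com case).
missing_blank = ["From: robot@example.com", "Subject: hi", "Hello there"]

# A long header wrapped without leading whitespace (the case in this thread).
bad_fold = ["Subject: hi", "References: <a@example.org>", "222@example.org>", ""]

# A correctly folded header block.
well_formed = ["Subject: hi", "References: <a@example.org>", " <b@example.org>", ""]
```

Both broken inputs trip the same check, which is why the rule alone cannot tell a missing blank line from a bad fold. A body line that happens to contain a colon would also pass as a field.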
From: Rob M. <rob...@gm...> - 2006-02-14 14:08:57
|
On 2/14/06, Matthias Andree <mat...@gm...> wrote: > > The bigger concern is that fetchmail shouldn't be emitting headers it > knows are broken, so that the next hop (the MTA or MDA) does not get > confused. I wonder if fetchmail should prefix the broken lines with > X-Fetchmail-Escaped-Broken-Header: or something similar. My 0.02$CURRENCY - when fetchmail finds a mail such as this it should create a new mail, addressed to the postmaster address, with the illegal email as a text/plain attachment. If you're feeling kind, insert a blank line before the offending illegal header and include that altered version as a message/rfc822 attachment (but still have the text/plain attachment) There's no point in passing on the known broken email as is, and simply "fixing" it may result in more problems. At least if the postmaster gets it as an attachment then they can decide how to handle it. Personally, I'm curious to know what mail client is producing such borken emails, and what MTAs are passing them on. -- Please keep list traffic on the list. Rob MacGregor Whoever fights monsters should see to it that in the process he doesn't become a monster. Friedrich Nietzsche |
From: Frederic M. <fre...@wo...> - 2006-02-14 14:47:21
|
Rob MacGregor wrote: > On 2/14/06, Matthias Andree <mat...@gm...> wrote: > >> The bigger concern is that fetchmail shouldn't be emitting headers it >> knows are broken, so that the next hop (the MTA or MDA) does not get >> confused. I wonder if fetchmail should prefix the broken lines with >> X-Fetchmail-Escaped-Broken-Header: or something similar. >> > > My 0.02$CURRENCY - when fetchmail finds a mail such as this it should > create a new mail, addressed to the postmaster address, with the > illegal email as a text/plain attachment. > > If you're feeling kind, insert a blank line before the offending > illegal header and include that altered version as a message/rfc822 > attachment (but still have the text/plain attachment) > > There's no point in passing on the known broken email as is, and > simply "fixing" it may result in more problems. At least if the > postmaster gets it as an attachment then they can decide how to handle > it. > > Personally, I'm curious to know what mail client is producing such > borken emails, and what MTAs are passing them on. > > This thread started in fetchmail-user. My problem is that at least one good e-mail (I mean one I would have want to receive) had an invalid header line and was deleted by fetchmail. The mail was sent by hotmail with a very long TO header (it is not rfc822 compliant) and further wrapped at some point. As a result, the header contained one line without colon and not starting with a blank character. The whole mail was simply deleted by fetchmail although it could have been delivered. I believe it is not fetchmail's job to enforce rfc822 that way. It should be left to a program dedicated to that task or, at least, it should not be the default option. It would be a good idea to tag the mail with a specific header though. It would make it easier to filter out with a program such as procmail. I wouldn't divert the mail to the postmaster either. 
It is likely that a busy postmaster will throw it away without looking at it if he is not the obvious recipient. The drawback is that if the mail is really junk, it will be downloaded and processed until it encounters the spam filter. It is a problem for a dialup link or even a DSL line with a low download limit. That's a reason to keep it an option the user can set. Frederic |
From: Rob M. <rob...@gm...> - 2006-02-14 18:10:55
|
On 2/14/06, Frederic Marchal <fre...@wo...> wrote: > > I believe it is not fetchmail's job to enforce rfc822 that way. It > should be left to a program dedicated to that task or, at least, it > should not be the default option. It would be a good idea to tag the > mail with a specific header though. It would make it easier to filter > out with a program such as procmail. I wouldn't divert the mail to the > postmaster either. It is likely that a busy postmaster will throw it > away without looking at it if he is not the obvious recipient. However, the postmaster defined by fetchmail doesn't have to be The Postmaster. Of course, there's nothing to say that the option can't both have a global setting and a per-user override. I'd rather have the option of processing by email because at least in the case of 3 fetchmail boxes I look after, procmail et all aren't installed. Nothing to say the 2 are exclusive however. -- Please keep list traffic on the list. Rob MacGregor Whoever fights monsters should see to it that in the process he doesn't become a monster. Friedrich Nietzsche |
From: Frederic M. <fre...@wo...> - 2006-02-16 11:00:00
|
Rob MacGregor wrote: > On 2/14/06, Frederic Marchal <fre...@wo...> wrote: > >> I believe it is not fetchmail's job to enforce rfc822 that way. It >> should be left to a program dedicated to that task or, at least, it >> should not be the default option. It would be a good idea to tag the >> mail with a specific header though. It would make it easier to filter >> out with a program such as procmail. I wouldn't divert the mail to the >> postmaster either. It is likely that a busy postmaster will throw it >> away without looking at it if he is not the obvious recipient. >> > > However, the postmaster defined by fetchmail doesn't have to be The > Postmaster. Of course, there's nothing to say that the option can't > both have a global setting and a per-user override. > In my case, that account (which is not The Postmaster) received 82 mails for the last 6 hours and they ended up at the secretary that must deal with 80% of spam from various addresses every day. I double check that account from time to time and I salvage a few mails every month. Sending a mail to an account where the user spend most of his/her time deleting spams is unreliable at best :-). With the limited experience I gained from our small particular net, I think it is best to let the mail go unchanged if fetchmail can extract enough information to determine the next route. It could also add a valid X-Envelope-To if none was found but, as you said in a previous post, anything beyond that would be risky... Now, other e-mail processing configurations may have requirements I don't see. So, feel free to tell me about it. It would help me to have a clear picture of what fetchmail must deal with if I start changing the code, preferably before I do it :-). Frederic |
From: Frederic M. <fre...@wo...> - 2006-02-16 11:51:40
|
Rob MacGregor wrote: > On 2/14/06, Matthias Andree <mat...@gm...> wrote: > > Personally, I'm curious to know what mail client is producing such > borken emails, and what MTAs are passing them on. > Here it is. I just got one from fetchmail-devel: Received: from spooler by webmobile.be (Mercury/32 v4.01a); 14 Feb 2006 14:47:07 +0100 X-Envelope-To: <fre...@wo...> Return-path: <fet...@li...> Received: from bat.berlios.de (195.37.77.135) by webmobile.be (Mercury/32 v4.01a) with ESMTP ID MG001FEF; 14 Feb 2006 14:47:03 +0100 Received: from bat.berlios.de (localhost [127.0.0.1]) by bat.berlios.de (8.11.3/8.11.3/SuSE Linux 8.11.1-0.5) with ESMTP id k1EDm2E08521; Tue, 14 Feb 2006 14:48:02 +0100 Received: from outmx024.isp.belgacom.be (outmx024.isp.belgacom.be [195.238.4.128]) by bat.berlios.de (8.11.3/8.11.3/SuSE Linux 8.11.1-0.5) with ESMTP id k1EDlLE08506 for <fet...@be...>; Tue, 14 Feb 2006 14:47:21 +0100 Received: from outmx024.isp.belgacom.be (localhost [127.0.0.1]) by outmx024.isp.belgacom.be (8.12.11/8.12.11/Skynet-OUT-2.22) with ESMTP id k1EDlDjl009366 for <fet...@be...>; Tue, 14 Feb 2006 14:47:13 +0100 (envelope-from <fre...@wo...>) Received: from [192.168.100.30] (167.191-201-80.adsl.skynet.be [80.201.191.167]) by outmx024.isp.belgacom.be (8.12.11/8.12.11/Skynet-OUT-2.22) with ESMTP id k1EDl5Rq009276 for <fet...@be...>; Tue, 14 Feb 2006 14:47:05 +0100 (envelope-from <fre...@wo...>) Message-ID: <43F...@wo...> From: Frederic Marchal <fre...@wo...> User-Agent: Thunderbird 1.5 (Windows/20051201) MIME-Version: 1.0 To: fet...@be... Subject: Re: [fetchmail-devel] Re: [fetchmail-users] Incorrect header line and lost mails References: <43F...@wo...> <43e...@ma...> <m3h...@me...> <43F...@wo...> <m3o...@me...> <43ea8d070602140508w6af9d34aq5c 222...@ma...> In-Reply-To: <43e...@ma...> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Sender: fet...@be... Errors-To: fet...@be... X-BeenThere: fet...@li... 
X-Mailman-Version: 2.0.13 Precedence: bulk List-Help: <mailto:fet...@li...?subject=help> List-Post: <mailto:fet...@li...> List-Subscribe: <http://lists.berlios.de/mailman/listinfo/fetchmail-devel>, <mailto:fet...@li...?subject=subscribe> List-Id: <fetchmail-devel.lists.berlios.de> List-Unsubscribe: <http://lists.berlios.de/mailman/listinfo/fetchmail-devel>, <mailto:fet...@li...?subject=unsubscribe> List-Archive: <http://lists.berlios.de/pipermail/fetchmail-devel/> Date: Tue, 14 Feb 2006 14:47:14 +0100 X-CC-Diagnostic: Body has "colon" (20) I expect this mail will be wrapped so here is the explanation. The References header is one long line and is wrapped at some point after the <43ea8d070602140508w6af9d34aq5c which leaves the 222...@ma...> all by itself on a line. You have received that e-mail and can compare it to your copy On my side, the References line is truncated at column 263. rfc821 specifies a maximum line length of 1000 characters but an implementation should be prepared for longer lines or reject the mail. It is not what is happening here. The culprit may be Mercury/32 or the Wingate proxy which is not listed in the header. Anyway, I can't change any of them and nothing prevent the mail from being delivered. Moreover I can read it in thunderbird without any warning or inconvenience. Frederic |
From: Matthias A. <mat...@gm...> - 2006-02-16 18:02:24
|
Frederic Marchal <fre...@wo...> writes: > I expect this mail will be wrapped so here is the explanation. The > References header is one long line and is wrapped at some point after > the <43ea8d070602140508w6af9d34aq5c which leaves the > 222...@ma...> all by itself on a line. You have > received that e-mail and can compare it to your copy > > On my side, the References line is truncated at column 263. rfc821 > specifies a maximum line length of 1000 characters but an implementation > should be prepared for longer lines or reject the mail. It is not what > is happening here. > > The culprit may be Mercury/32 or the Wingate proxy which is not listed > in the header. Is there any chance we might isolate the bug before we fix it? Even if that isn't of immediate help for you, documenting this properly might help other people solve their problem, and we may be able to write a workaround in the least intrusive way. > Anyway, I can't change any of them and nothing prevent the mail from > being delivered. Moreover I can read it in thunderbird without any > warning or inconvenience. I think given the vast number of varieties how messages can be broken, and one workaround might break another, the best bet is probably to wrap the message up as a MIME message/rfc822 attachment that contains a short introductory text that tells the end user the message contained broken headers. -- Matthias Andree |
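The container idea from this message, combined with the text/plain copy suggested earlier in the thread, might be sketched as follows with Python's standard email package. This is an assumption-laden illustration and not fetchmail behaviour: the function name wrap_broken_message, the subject line, the file name original.txt and the sample message are all invented, and Python's parser decides on its own where the headers of the broken message end, which may or may not be what a mail system would want.

```python
from email import message_from_string, policy
from email.message import EmailMessage


def wrap_broken_message(raw, recipient):
    """Build a new mail that carries a broken message as attachments."""
    outer = EmailMessage()
    outer["To"] = recipient
    outer["Subject"] = "Message with broken headers"
    # Short introductory text for the end user.
    outer.set_content(
        "The attached message arrived with broken headers.\n"
        "It is included verbatim and as a message/rfc822 part.\n"
    )
    # Verbatim copy, so nothing from the original is lost.
    outer.add_attachment(raw, subtype="plain", filename="original.txt")
    # Parsed copy; attaching a message object yields message/rfc822.
    parsed = message_from_string(raw, policy=policy.default)
    outer.add_attachment(parsed)
    return outer


# Made-up message with a header line that was wrapped without whitespace.
raw_message = (
    "From: sender@example.org\n"
    "References: <a@example.org>\n"
    "222@example.org>\n"
    "Subject: wrapped header\n"
    "\n"
    "Body text.\n"
)
wrapped = wrap_broken_message(raw_message, "postmaster@example.org")
parts = list(wrapped.iter_parts())
```

The outer message has fresh, valid headers, so the next hop never has to parse the broken ones directly.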
From: Frederic M. <fre...@wo...> - 2006-02-16 10:31:48
|
Matthias Andree wrote: > Frederic Marchal <fre...@wo...> writes: > > >> Well, I'm a C programmer and I may be able to help but I'm mainly a >> windows programmer (yes, I know, shame on me :-) ) and I'm not yet all >> too familiar with the Linux programming and especially with the patch >> mechanism and none at all with the diff tool. >> > > The workflow is: > > 1. create a copy of all files you need to edit, for instance, > copy foo.c to foo.c.orig > > alternatively, copy the whole directory tree recursively > > 2. edit until you're happy with the code > > 3. test the edits (compile, install, run) > > 4. generate a unified (preferred) or context patch. Not all diff > utilities support unified output, but all are to support context > format, as mandated by the relevant standard, IEEE Std 1003.1-2001. > > Plain ed-style patches (unfortunately still the default in the diff > utility) are impractical so they are not usually accepted. > > The diff tool is quite simple actually, if you created file copies: > > diff -u OLDFILE NEWFILE >PATCHFILE (unified diff, preferred) > > diff -c OLDFILE NEWFILE >PATCHFILE (context diff, alternative) > > If you copied the whole directory before the edits: > > diff -u -r BACKUP_DIR EDITED_DIR >PATCHFILE (or -c instead of -u) > > If you added new files to EDITED_DIR, you'd use -N but I don't think > we'll need this here. > > diff, used either way, will produce a file (PATCHFILE) that you'd attach > to a message and mail to me (or Rob, or preferably the -devel@ list) or > upload to the BerliOS patch tracker. > > If I have the choice, I prefer the -devel@ list. That's public, archived > and easiest for me since I don't need to fetch anything, but get it > pushed into my inbox. > > > The patch utility does the reverse and will usually just be used like > this, it derives the file names from the PATCHFILE. 
> > patch <PATCHFILE > > perhaps with -d (to change directory) or -p (to strip leading path > components from the paths shown in the patch file; used if patch cannot > find the files). > Thank you. It looks easy indeed but you saved me from a lot of man/info/howto readings... :-) >> I'll also need some direction about the proper way of doing it for a >> good integration in the current source. >> > > The bigger concern is that fetchmail shouldn't be emitting headers it > knows are broken, so that the next hop (the MTA or MDA) does not get > confused. I wonder if fetchmail should prefix the broken lines with > X-Fetchmail-Escaped-Broken-Header: or something similar. > At first glance it seemed reasonable, but after one day of thinking about it, I'm not so sure I understand why fetchmail should even alter the original header in any way. Why is it fetchmail's concern if the next hop gets confused by an invalid header that was there in the first place ? Isn't it more useful to keep intact all the information from the original e-mail ? After all, an invalid header is also a signature for a spam or a virus and it could be used by some tool along the delivery chain to filter out the mail. In my case, a mail client could also detect the wrapping and restore the original header provided fetchmail doesn't mess it up. Frederic |
From: Frederic M. <fre...@wo...> - 2006-02-16 17:31:22
|
Matthias (and others),

I have some more questions and requests for guidance...

- Should I patch the source from 6.3.2 or the one in the SVN
  repository?

- I will simply ignore the invalid lines (unless someone has a better
  idea). I think it should be user selectable. Is it suitable to have
  that option in the rcfile and not available from the command line?

- The option would be set with a SET statement in the rcfile. Is there
  any reason to make it configurable for each server?

- I intend to add the variable of the option in struct runctl if it is
  global and to struct hostdata if it is local to a server. Is it the
  good way of doing it?

- Does anybody have a suitable name for that option?

BTW, I already have a working version of the patch for 6.3.2 but I need
to do some more testing and adapt it to the answers to the questions
above.

Frederic
|
From: Matthias A. <mat...@gm...> - 2006-02-17 10:33:34
|
Frederic Marchal <fre...@wo...> writes:

> Matthias (and others),
>
> I have some more questions and requests for guidance...
>
> - Should I patch the source from 6.3.2 or the one in the SVN
> repository?

The SVN's branches/BRANCH_6-3/ would be the most useful place at this
time for fixes, and trunk/ for feature patches.

> - I will simply ignore the invalid lines (unless someone has a better
> idea).

Do you mean, skip the header, or include it without complaining?

> I think it should be user selectable. Is it suitable to have that
> option in the rcfile and not available from the command line?

Personally, I do not appreciate such inconsistencies.

> - The option would be set with a SET statement in the rcfile. Is there
> any reason to make it configurable for each server?

That depends on whether it's switching off the complaint (in that case,
global might be enough) or actually deleting the header. The better
option in any case is server or user configurable.

> - I intend to add the variable of the option in struct runctl if it is
> global and to struct hostdata if it is local to a server. Is it the good
> way of doing it?

It's the only reasonable way of doing it. :-)

> - Does anybody have a suitable name for that option?

That depends on what it does. It should default to "off" for
compatibility.

-- 
Matthias Andree
|
From: Matthias A. <mat...@gm...> - 2006-03-03 22:11:38
Attachments:
noskip-incorrect.patch
|
Greetings, I think we'll shelve this problem and revisit in 6.4.X. I've thought about it again, and there is no good solution. It is unclear whether some mailer spoiled the blank line between header and body, and the incorrect header line delimits header from body, or whether it's just broken folding as observed by you. I am very much in favor of Rob MacGregor's suggestion of putting the message in a message/rfc822 container, and if user addresses have been figured from the header, forward there, else to the fallback postmaster (usually the calling user, literally "postmaster" if run as root, or whichever is configured in the rcfile), and I'd be more comfortable if such major changes can get testing first before being deployed. If it works well, it might get backported to 6.3.X depending on how fast those evolve. I hope that's acceptable. If you want messages passed on in spite of problematic headers, and let other parts of the mail system sort things out, try the attached patch. It's highly experimental, may cause older bugs to reappear that were masked (which please report, particularly segfaults!), and isn't suitable for application in production or distributions. Regards, -- Matthias Andree |
From: Frederic M. <fre...@wo...> - 2006-03-06 10:59:18
|
Matthias Andree wrote: > Greetings, > > I think we'll shelve this problem and revisit in 6.4.X. I've thought > about it again, and there is no good solution. It is unclear whether > some mailer spoiled the blank line between header and body, and the > incorrect header line delimits header from body, or whether it's just > broken folding as observed by you. > > I am very much in favor of Rob MacGregor's suggestion of putting the > message in a message/rfc822 container, and if user addresses have been > figured from the header, forward there, else to the fallback postmaster > (usually the calling user, literally "postmaster" if run as root, or > whichever is configured in the rcfile), and I'd be more comfortable if > such major changes can get testing first before being deployed. If it > works well, it might get backported to 6.3.X depending on how fast those > evolve. > > I hope that's acceptable. If you want messages passed on in spite of > problematic headers, and let other parts of the mail system sort things > out, try the attached patch. It's highly experimental, may cause older > bugs to reappear that were masked (which please report, particularly > segfaults!), and isn't suitable for application in production or > distributions. > Thank you Matthias, but I patched it myself as I promised. I should have told you about it, but I haven't said a word because I'm experiencing some big UIDL troubles with version 6.3.2 (I back ported my fix to that version to try it in production) and I don't know if the problem is related to my fix or if it is a known problem in 6.3.2 (as the recent e-mails on this list tend to say). 
Here is a suspicious case from my log with the "keep" option active: Feb 20 17:36:26 localhost fetchmail[14623]: mise en sommeil à lun 20 fév 2006 17:36:26 CET Feb 20 17:41:26 localhost fetchmail[14623]: réveillé à lun 20 fév 2006 17:41:26 CET Feb 20 17:41:29 localhost fetchmail[14623]: 1353 messages (1353 déjà vus) pour xxx#yyy.be dans 192.168.100.1 (70802116 octets). Feb 20 17:41:29 localhost fetchmail[14623]: mise en sommeil à lun 20 fév 2006 17:41:29 CET Feb 20 17:46:29 localhost fetchmail[14623]: réveillé à lun 20 fév 2006 17:46:29 CET Feb 20 17:46:31 localhost fetchmail[14623]: 1355 messages (1355 déjà vus) pour xxx#yyy.be dans 192.168.100.1 (70808100 octets). Feb 20 17:46:31 localhost fetchmail[14623]: mise en sommeil à lun 20 fév 2006 17:46:31 CET Feb 20 17:51:31 localhost fetchmail[14623]: réveillé à lun 20 fév 2006 17:51:31 CET Feb 20 17:51:34 localhost fetchmail[14623]: 1358 messages (1358 déjà vus) pour xxx#yyy.be dans 192.168.100.1 (70825184 octets). Feb 20 17:51:34 localhost fetchmail[14623]: mise en sommeil à lun 20 fév 2006 17:51:34 CET Why is it reporting 1353 messages seen, then 1355 and then 1358 without downloading any mail ? I have no answer and I can't figure out why it would be related to my patch. Moreover, and this is related to the same problem, Fetchmail keeps downloading some e-mails two or three times or it delays some e-mails until it eventually download them. It looks like the UIDL management is seriously broken but it may be on my version only. I just have to find out... If I remove the "keep" option everything looks good and I can't see any mail loss but, on the other hand, I can't compare the mails on the server to those actually downloaded. I also think my fix won't be acceptable to you because I let the message pass without any change (I can't encapsulate the offending e-mail in a container. 
It is beyond the time I can spend on this project) and I keep processing the e-mail until the first blank line is found even if it is in the body (I don't know what would happen if the end of the mail was reached without encountering a blank line). Anyway, if you want to have a look at my patch in its current state, I can send it to you. Frederic |
From: Matthias A. <mat...@gm...> - 2006-03-07 10:23:17
|
Frederic Marchal <fre...@wo...> writes: > Here is a suspicious case from my log with the "keep" option active: > > Feb 20 17:36:26 localhost fetchmail[14623]: mise en sommeil à lun 20 fév > 2006 17:36:26 CET > Feb 20 17:41:26 localhost fetchmail[14623]: réveillé à lun 20 fév 2006 > 17:41:26 CET > Feb 20 17:41:29 localhost fetchmail[14623]: 1353 messages (1353 déjà > vus) pour xxx#yyy.be dans 192.168.100.1 (70802116 octets). [...] > Why is it reporting 1353 messages seen, then 1355 and then 1358 without > downloading any mail ? I have no answer and I can't figure out why it > would be related to my patch. Moreover, and this is related to the same > problem, Fetchmail keeps downloading some e-mails two or three times or > it delays some e-mails until it eventually download them. It looks like > the UIDL management is seriously broken but it may be on my version > only. I just have to find out... Does fetchmail -vvv reveal any more detail? For instance, it would show the UIDL lists, and some editing with cut, sort, a text editor and finally the "comm" command might show what's really going on. -- Matthias Andree |
From: Frederic M. <fre...@wo...> - 2006-03-07 11:27:17
|
Matthias Andree wrote: > Frederic Marchal <fre...@wo...> writes: > > >> Why is it reporting 1353 messages seen, then 1355 and then 1358 without >> downloading any mail ? I have no answer and I can't figure out why it >> would be related to my patch. Moreover, and this is related to the same >> problem, Fetchmail keeps downloading some e-mails two or three times or >> it delays some e-mails until it eventually download them. It looks like >> the UIDL management is seriously broken but it may be on my version >> only. I just have to find out... >> > > Does fetchmail -vvv reveal any more detail? For instance, it would show > the UIDL lists, and some editing with cut, sort, a text editor and > finally the "comm" command might show what's really going on. > > I haven't tried that on the production server because of the daunting amount of output it generates. I'll try it now and see if I can get something out of it. Frederic |
From: Frederic M. <fre...@wo...> - 2006-03-07 15:01:50
|
Matthias Andree wrote:
> Does fetchmail -vvv reveal any more detail? For instance, it would show
> the UIDL lists, and some editing with cut, sort, a text editor and
> finally the "comm" command might show what's really going on.

It wasn't long and here is the result. I reconstructed the three UID
lists below from the UIDs fetchmail fetches.

First run, 5 messages are waiting, the response to the UIDL command is
as follows. All the UIDs are listed.

POP3< 1 3QWN3Z3.CNM34671D94
POP3< 2 5KS8NWT.CNM34671DAD
POP3< 3 GWBROT7.CNM34671DE2
POP3< 4 IZ3YMGF.CNM34671D48
POP3< 5 YDL1GPM.CNM34671DAD

Next run, 8 messages are on the server and fetchmail tries to find the
first unseen UID starting from number 4 and goes back to number 2

POP3< +OK 2 3QWN3Z3.CNM34671D94
POP3< +OK 3 5KS8NWT.CNM34671DAD
POP3< +OK 4 D76QP56.CNM34671F2A
POP3< +OK 5 GWBROT7.CNM34671DE2
POP3< +OK 6 IZ3YMGF.CNM34671D48
POP3< +OK 7 VYMMFQG.CNM34671EFC
POP3< +OK 8 YDL1GPM.CNM34671DAD

Next run, 10 messages, fetchmail seeks the first unseen message from
number 5

POP3< +OK 5 D76QP56.CNM34671F2A
POP3< +OK 6 GWBROT7.CNM34671DE2
POP3< +OK 7 IZ3YMGF.CNM34671D48
POP3< +OK 8 N09R1OZ.CNM34671FB1
POP3< +OK 9 VYMMFQG.CNM34671EFC
POP3< +OK 10 YDL1GPM.CNM34671DAD

As we can see, the message number attached to a given UID is not the
same from one run to the other. Mercury returns a UID list sorted
alphabetically on the UID string! As a consequence, trying to find the
first unseen message with a dichotomic (binary) search on such a UID
list always fails.

It also explains why I thought some messages were downloaded more than
once. They weren't, but I believed so because the same message numbers
were downloaded several times. They simply had different UIDs. On the
other hand, messages with a low UID (alphabetically) were clearly never
downloaded.

The only option left is to read the whole UID list each time. Is it
possible to configure fetchmail to do that?

Frederic
|
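Why an alphabetically sorted UID list defeats a binary search can be shown with the UIDs from this message. The sketch below is a simplified Python model, not fetchmail's implementation: it assumes the search looks for the boundary between "seen" and "unseen" and therefore relies on all seen messages coming first. The second listing is renumbered from zero because the log above does not show message 1 of that run.

```python
def first_unseen_binary(uids, seen):
    """Index of the first unseen UID, assuming seen UIDs form a prefix."""
    lo, hi = 0, len(uids)
    while lo < hi:
        mid = (lo + hi) // 2
        if uids[mid] in seen:
            lo = mid + 1   # boundary must be to the right
        else:
            hi = mid       # boundary is here or to the left
    return lo


def unseen_linear(uids, seen):
    """Check every UID; correct whatever order the server uses."""
    return [uid for uid in uids if uid not in seen]


# UIDs from the first run: everything here has been downloaded.
SEEN = {
    "3QWN3Z3.CNM34671D94", "5KS8NWT.CNM34671DAD", "GWBROT7.CNM34671DE2",
    "IZ3YMGF.CNM34671D48", "YDL1GPM.CNM34671DAD",
}

# UIDs shown for the second run, in the server's alphabetical order.
SECOND_POLL = [
    "3QWN3Z3.CNM34671D94", "5KS8NWT.CNM34671DAD", "D76QP56.CNM34671F2A",
    "GWBROT7.CNM34671DE2", "IZ3YMGF.CNM34671D48", "VYMMFQG.CNM34671EFC",
    "YDL1GPM.CNM34671DAD",
]

boundary = first_unseen_binary(SECOND_POLL, SEEN)
really_new = unseen_linear(SECOND_POLL, SEEN)
```

In this model the binary search settles on the position of VYMMFQG and never looks at D76QP56, which sorts earlier, while the linear scan finds both new UIDs. That matches the observation that messages with alphabetically low UIDs were skipped.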
From: Matthias A. <mat...@gm...> - 2006-03-07 16:17:02
|
Frederic Marchal wrote:
> The only option left is to read the whole UID list each time. Is it
> possible to configure fetchmail to do that ?

Try "--fastuidl 0" on the command line. Quote from the manual:

--fastuidl <number>
(Keyword: fastuidl) Do a binary instead of linear search for the first unseen UID. Binary search avoids downloading the UIDs of all mails. This saves time (especially in daemon mode) where downloading the same set of UIDs in each poll is a waste of bandwidth. The number 'n' indicates how rarely a linear search should be done. In daemon mode, linear search is used once followed by binary searches in 'n-1' polls if 'n' is greater than 1; binary search is always used if 'n' is 1; linear search is always used if 'n' is 0. In non-daemon mode, binary search is used if 'n' is 1; otherwise linear search is used. This option works with POP3 only.

Sunil, what should we do WRT fastuidl? We seem to need sanity checks to disable the fastuidl search for servers such as Frédéric's.

Thanks,
Matthias |
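Editorial sketch of the schedule described in the manual excerpt above. It only encodes the quoted wording; how fetchmail counts polls internally is an assumption here (poll 0 is taken to be the first poll after startup), and the function name is made up.

```python
def uses_binary_search(fastuidl, poll_index, daemon=True):
    """Return True if the given poll would use binary search.

    fastuidl   -- the 'n' from --fastuidl
    poll_index -- 0 for the first poll, 1 for the second, ...
    daemon     -- whether fetchmail runs in daemon mode
    """
    if fastuidl == 0:
        return False          # linear search is always used
    if fastuidl == 1:
        return True           # binary search is always used
    if not daemon:
        return False          # non-daemon mode: linear unless n is 1
    # Daemon mode, n > 1: one linear poll, then n-1 binary polls, repeated.
    return poll_index % fastuidl != 0


# First five polls in daemon mode with n = 4.
schedule = ["binary" if uses_binary_search(4, i) else "linear" for i in range(5)]
```

With n = 10, nine polls out of ten use binary search; with n = 0, every poll reads the full UID list.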
From: Sunil S. <sh...@bo...> - 2006-03-08 10:02:00
|
Quoting from Matthias Andree's mail on Tue, Mar 07, 2006: > what should we do WRT fastuidl? We seem to need sanity checks to disable > the fastuidl search for servers such as Frédéric's. One option is to change the default to a lower value from the current value of 10. The lowest practical value is 2, which will use linear search in alternate polls. A default of 4 should be alright. It will use linear search in the first poll, followed by binary search in the next three polls, followed by linear search in the next poll, and so on. Other solutions are also possible. For example, keeping a map of UID and message number in memory (with corrections for mails deleted after logout). Then, when there is a mismatch in the message number expected and the message number received from the remote server for any one mail, a linear search should be forced in the next(*) poll. This would be equivalent to setting fastuidl to 2 if this mismatch happens in every poll. Note that Frederic has claimed that messages with low UID never get downloaded. This is only partially correct. In his case, messages with low UID never get downloaded whenever a binary search is performed. However, a linear search is performed once in every ten polls (starting with the first poll) and these messages which were missed out earlier will get downloaded in this poll with linear search. -- Sunil Shetye. (*) There is a reason why a linear search should not be forced in this poll after detecting the mismatch. The same reason applies as to why fastuidl cannot be disabled for servers such as that of Frederic. The fastuidl and the fetchsizelimit options have been developed keeping in mind a large mailbox with thousands of mails being polled over a slow dialup line which is prone to frequent disconnection. fetchmail used to previously download all UIDs and all mail sizes right at the start of the poll. Thus, the amount of data transferred even before the start of the download of the first mail used to be huge.
For example, if there are 2000 mails in the folder, the data transferred would be around 110 KB (assuming a UID size of 40 bytes) for POP3. Assuming that an average mail size is 5 KB, this would mean that data equivalent to 22 mails has been downloaded even before downloading the first mail. The fetchsizelimit option delays the downloading of the mail sizes. The fastuidl option delays the downloading of UIDs when doing binary search. Thus, the first (new) mail is downloaded pretty fast. Forcing linear search on this poll after detecting the mismatch will effectively disable fastuidl if this mismatch is detected on every poll. |
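Editorial sketch of the mismatch check proposed in this message: remember which message number each UID had, and if a known UID later shows up under a different number, force a linear search in the next poll. The function name is hypothetical, the correction for mails deleted after logout is left out, and the sample numbers come from Frederic's first two listings earlier in the thread.

```python
def detect_mismatch(expected, observed):
    """Return True if any known UID is reported under a new message number.

    expected -- {uid: message_number} remembered from the previous poll
                (assumed to be already corrected for deletions)
    observed -- iterable of (message_number, uid) pairs from this poll
    """
    return any(
        uid in expected and expected[uid] != number
        for number, uid in observed
    )


# Previous poll: two of the UIDs from the first run.
expected = {"GWBROT7.CNM34671DE2": 3, "IZ3YMGF.CNM34671D48": 4}
# This poll: the same UIDs as reported in the second run.
observed = [(5, "GWBROT7.CNM34671DE2"), (6, "IZ3YMGF.CNM34671D48")]

# Per the footnote, the flag applies to the next poll, not the current one.
force_linear_next_poll = detect_mismatch(expected, observed)
```

If the server reorders on every poll, the flag is set every time, which matches the remark that this behaves like fastuidl 2.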
From: Frederic M. <fre...@wo...> - 2006-03-08 10:53:10
|
Sunil Shetye wrote: > Quoting from Matthias Andree's mail on Tue, Mar 07, 2006: > >> what should we do WRT fastuidl? We seem to need sanity checks to disable >> the fastuidl search for servers such as Frédéric's. >> > > One option is to change the default to a lower value from the current > value of 10. The lowest practical value is 2, which will use linear > search in alternate polls. > > A default of 4 should be alright. It will use linear search in the > first poll, followed by binary search in the next three polls, > followed by linear search in the next poll, and so on. > > Other solutions are also possible. For example, keeping a map of UID > and message number in memory (with corrections for mails deleted after > logout). Then, when there is a mismatch in the message number expected > and the message number received from the remote server for any one > mail, a linear search should be forced in the next(*) poll. This would > be equivalent to setting fastuidl to 2 if this mismatch happens in > every poll. > > Note that Frederic has claimed that messages with low UID never get > downloaded. This is only partially correct. In his case, messages with > low UID never get downloaded whenever a binary search is performed. > However, a linear search is performed once in every ten polls > (starting with the first poll) and these messages which were missed > out earlier will get downloaded in this poll with linear search. > > Thank you for the clarification! I was wondering why none of my users had complained about any lost e-mail. We receive a lot of spam but I would have been surprised if all the lost e-mails were only spam :-) A suitable solution would be to have an entry in the FAQ with the settings required for smooth operation with Mercury. If I understand it right, the only bad effect is a long delay (up to 10 times the poll interval) in the delivery of some e-mails. It isn't that bad.
BTW, I reported this sorting feature and the long lines wrapping problem to the Mercury mailing list but I don't expect any answer. It is closed source and with only one developer. David Harris seems to be busy fixing other more urgent problems in his mail client Pegasus Mail. Frederic |
From: Matthias A. <mat...@gm...> - 2006-03-14 16:26:36
|
Sunil Shetye <sh...@bo...> writes: > Quoting from Matthias Andree's mail on Tue, Mar 07, 2006: >> what should we do WRT fastuidl? We seem to need sanity checks to disable >> the fastuidl search for servers such as Frédéric's. > > One option is to change the default to a lower value from the current > value of 10. The lowest practical value is 2, which will use linear > search in alternate polls. The algorithm tries to find the first unseen UID with a binary search, on the assumption that all new messages are at the end of the list. This doesn't work for Frédéric's server and others. > Other solutions are also possible. For example, keeping a map of UID > and message number in memory (with corrections for mails deleted after > logout). Aren't we keeping those in memory anyway? If so, such a check wouldn't hurt and would serve those users whose servers list messages out of order. In the long run, we'd probably better keep them in a database on disk to keep the memory footprint down. > Note that Frederic has claimed that messages with low UID never get > downloaded. This is only partially correct. In his case, messages with > low UID never get downloaded whenever a binary search is performed. > However, a linear search is performed once in every ten polls > (starting with the first poll) and these messages which were missed > out earlier will get downloaded in this poll with linear search. Well. Should I lower the default to 4 for the nonce or wait for a patch? :) -- Matthias Andree |
From: Sunil S. <sh...@bo...> - 2006-03-14 17:27:45
|
Quoting from Matthias Andree's mail on Tue, Mar 14, 2006: > > One option is to change the default to a lower value from the current > > value of 10. The lowest practical value is 2, which will use linear > > search in alternate polls. > > The algorithm tries to find the first unused UIDL with a binary search, > on the assumption that all new messages were at the end of the > list. This doesn't work for Frédéric's server and others. Imagine the worst-case scenario: Frederic's mailbox has 2000 mails on this server and it has to be accessed on a slow dialup line which is prone to disconnection. Even if the assumption doesn't work, it will still download a few mails. Going back to linear search will probably fetch even fewer mails due to the overheads. > > Other solutions are also possible. For example, keeping a map of UID > > and message number in memory (with corrections for mails deleted after > > logout). > > Aren't we keeping those in memory anyways? If so, such a check wouldn't > hurt and serve those users whose servers create messages out of order. > > In the long run, we'd probably better keep them in a database on disk to > keep the memory footprint down. The question is not just of keeping in memory or database. The question is: What should be done if a mismatch is found? If the only solution is to go back to downloading all UIDs at one go, then I would prefer continuing with the incorrect assumption. The fight is not just between linear vs. binary search. The basic idea of fastuidl is to avoid downloading all UIDs at one go. > Well. Should I lower the default to 4 for the nonce or wait for a patch?
:)

Here it is:

===============================================================================
Index: fetchmail-6.3/fetchmailconf.py
===================================================================
--- fetchmail-6.3/fetchmailconf.py (revision 4739)
+++ fetchmail-6.3/fetchmailconf.py (working copy)
@@ -249,7 +249,7 @@
 self.warnings = 3600 # Size warning interval (see tunable.h)
 self.fetchlimit = 0 # Max messages fetched per batch
 self.fetchsizelimit = 100 # Max message sizes fetched per transaction
- self.fastuidl = 10 # Do fast uidl 9 out of 10 times
+ self.fastuidl = 4 # Do fast uidl 3 out of 4 times
 self.batchlimit = 0 # Max message forwarded per batch
 self.expunge = 0 # Interval between expunges (IMAP)
 self.ssl = 0 # Enable Seccure Socket Layer
Index: fetchmail-6.3/fetchmail.man
===================================================================
--- fetchmail-6.3/fetchmail.man (revision 4739)
+++ fetchmail-6.3/fetchmail.man (working copy)
@@ -615,7 +615,7 @@
 once followed by binary searches in 'n-1' polls if 'n' is greater than 1;
 binary search is always used if 'n' is 1; linear search is always used
 if 'n' is 0. In non-daemon mode, binary search is used if 'n' is
-1; otherwise linear search is used.
+1; otherwise linear search is used. The default value of 'n' is 4.
 This option works with POP3 only.
 .TP
 .B \-e <count> | \-\-expunge <count>
Index: fetchmail-6.3/fetchmail.c
===================================================================
--- fetchmail-6.3/fetchmail.c (revision 4739)
+++ fetchmail-6.3/fetchmail.c (working copy)
@@ -970,7 +970,7 @@
 def_opts.remotename = user;
 def_opts.listener = SMTP_MODE;
 def_opts.fetchsizelimit = 100;
- def_opts.fastuidl = 10;
+ def_opts.fastuidl = 4;
 
 /* get the location of rcfile */
 rcfiledir[0] = 0;
===============================================================================

-- Sunil Shetye. |
From: Matthias A. <mat...@gm...> - 2006-03-14 18:06:54
|
Sunil Shetye <sh...@bo...> writes: > Quoting from Matthias Andree's mail on Tue, Mar 14, 2006: >> > One option is to change the default to a lower value from the current >> > value of 10. The lowest practical value is 2, which will use linear >> > search in alternate polls. >> >> The algorithm tries to find the first unused UIDL with a binary search, >> on the assumption that all new messages were at the end of the >> list. This doesn't work for Frédéric's server and others. > > Imagine the worst-case scenario: Frederic's mailbox has 2000 mails on > this server and it has to be accessed on a slow dialup line which is > prone to disconnection. Even if the assumption doesn't work, it will > still download a few mails. Going back to linear search will probably > fetch even less mails due to the overheads. If it confuses the user, it needs to be fixed. If the connection is too flaky to regularly provide the full UIDL connection (which will usually compress well with V.42bis or MNP5), then the user should re-think if he needs "keep messages on server" setups. Usually there are better alternatives, such as *copying* new messages from INBOX to a second IMAP folder which is then downloaded in nokeep mode, so that there is no need to download UID lists. > The question is not just of keeping in memory or database. The > question is: What should be done if a mismatch is found? If the only > solution is to go back to downloading all UIDs at one go, then I would > prefer continuing with the incorrect assumption. Well, I'd rather download full UID lists to get rid of bogus assumptions. Messages missed because of false UIDL assumptions are delayed far longer than by connection drops (which are usually noticed quickly), and that certainly causes more reports than long UIDL downloads. A RANGES extension for POP3 that allowed UIDL n- or UIDL n-m would help a lot. >> Well. Should I lower the default to 4 for the nonce or wait for a patch?
:) > > Here it is: I thought about a sanity checking patch that detects when slowuidl is needed, but I'm taking this for the nonce, too. My basic idea is: say we have u UIDLs stored, and the server has m messages. Then, send "UIDL" u+1 "UIDL" u+2 ... "UIDL" m pipelined (non-blocking) (or, on future RANGE capable POP3 servers, "UIDL" u+1 "-" m), and if ANY seen message is in that range, do slow UIDL. No binary search with many round trips. Of course, this needs to track deletions properly and, if another client deletes messages, needs to back down and fetch the whole list. Perhaps sanity checking the last known UIDL helps detect third-party deletions, on the assumption that the likelihood of the server recycling a UID is low. -- Matthias Andree |
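Editorial sketch of the sanity check outlined above: with u stored UIDs and m messages on the server, ask for the UIDs of messages u+1 to m only, and fall back to the full ("slow") UIDL if any of them has been seen before. The names are illustrative, fetch_uid stands in for a single "UIDL n" command, and deletion tracking is not modelled. The first data set uses the UIDs from Frederic's listings earlier in the thread; the second is made up.

```python
def needs_full_uidl(stored_uids, fetch_uid, message_count):
    """Check the tail of the mailbox for UIDs that were already seen.

    stored_uids   -- set of UIDs recorded so far (u = len(stored_uids))
    fetch_uid     -- callable n -> UID of message n (a 'UIDL n' stand-in)
    message_count -- m, the number of messages the server reports
    Returns (fallback_needed, tail_uids).
    """
    u = len(stored_uids)
    tail = [fetch_uid(n) for n in range(u + 1, message_count + 1)]
    fallback = any(uid in stored_uids for uid in tail)
    return fallback, tail


# Case 1: Frederic's server, second run (8 messages, 5 UIDs stored).
stored = {
    "3QWN3Z3.CNM34671D94", "5KS8NWT.CNM34671DAD", "GWBROT7.CNM34671DE2",
    "IZ3YMGF.CNM34671D48", "YDL1GPM.CNM34671DAD",
}
second_run_tail = {
    6: "IZ3YMGF.CNM34671D48",
    7: "VYMMFQG.CNM34671EFC",
    8: "YDL1GPM.CNM34671DAD",
}
sorted_server = needs_full_uidl(stored, second_run_tail.__getitem__, 8)

# Case 2: hypothetical server that appends new mail at the end.
appending = {1: "A", 2: "B", 3: "C"}
appending_server = needs_full_uidl({"A", "B"}, appending.__getitem__, 3)
```

On the sorted list, seen UIDs appear among messages 6 to 8, so the check asks for the full list; on the appending server the tail holds only new mail.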
From: Sunil S. <sh...@bo...> - 2006-03-16 10:41:53
|
Quoting from Matthias Andree's mail on Tue, Mar 14, 2006: > > Imagine the worst-case scenario: Frederic's mailbox has 2000 mails on > > this server and it has to be accessed on a slow dialup line which is > > prone to disconnection. Even if the assumption doesn't work, it will > > still download a few mails. Going back to linear search will probably > > fetch even less mails due to the overheads. > > If it confuses the user, it needs to be fixed. If the connection is too > flakey to regularly provide the full UIDL connection (which will usually > compress well with V.42bis or MNP5), then the user should re-think if he > needs "keep messages on server" setups. Usually there are better > alternatives, such as *copying* new messages from INBOX to a second IMAP > folder which is then downloaded in nokeep mode, so that there is no need > to download UID lists. Apart from the issue of fixing, what is missing is documentation on this. There is no place where it is written what configuration should a user use to derive the best mileage from fetchmail. Maybe, some FAQ entry like: ========================================================================= C#. What protocol should I choose with my mailserver? My mailserver supports both IMAP and POP3. fetchmail, by default, uses first IMAP and then POP3 (protocol auto). This doesn't work when you are keeping mails on the mailserver or when there are errors (like socket errors or deliver errors) after a few mails have been downloaded. So, as a first step, you should necessarily choose between IMAP and POP3. - If your mailserver is delivering to multiple folders, choose IMAP. - If you want to download all mails and not leave a copy on the server, choose IMAP. - If you intend to keep mails on the server, check if your server supports UIDL. If it does, choose POP3 and enable the "uidl" option. With POP3 and UIDL, it is even possible to access the mailbox through multiple e-mail clients without affecting fetchmail. 
- If you intend to keep mails on the server, but your server does not support UIDL, choose IMAP. However, you should not access the mailbox through other e-mail clients as fetchmail will not be able to download all new mails. - If you are having a slow connection to the mailserver and/or expect frequent socket errors from the mailserver, choose IMAP and do not leave a copy on the mailserver. C#. Ok, I choose IMAP. What next? ... C#. Ok, I choose POP3. What next? ... ========================================================================= > > The question is not just of keeping in memory or database. The > > question is: What should be done if a mismatch is found? If the only > > solution is to go back to downloading all UIDs at one go, then I would > > prefer continuing with the incorrect assumption. > > Well, I'd rather download full UID lists to get rid of bogus > assumptions. Missing messages because of false UIDL assumptions delays > messages far more than connection drops (which are usually noticed > quickly) does, and that causes certainly more reports than long UIDL > downloads. > > A RANGES extension for POP3 that allowed UIDL n- or UIDL n-m would help > a lot. If this extension is being used, using it is the better option. > My basic idea is: say we have u UIDLs stored, and the server has m > messages. Then, send "UIDL" u+1 "UIDL" u+2 ... "UIDL" m pipelined > (non-blocking) (or, on future RANGE capable POP3 servers, "UIDL" u+1 "-" > m), and if ANY seen message is in that range, do slow UIDL. No binary > search with many round trips. Of course, this needs to track deletions > properly and if another client client deletes messages, needs to back > down and fetch the whole list. Perhaps sanity checking the last known > UIDL helps detecting third-party deletions on the assumption that the > likelyhood of the server recycling a UID is low. 
This might not work with "no keep" when fetchmail is not sure if the mails marked by fetchmail for deletion have actually been expunged or not. -- Sunil Shetye. |
From: Matthias A. <mat...@gm...> - 2006-03-17 11:41:18
|
Sunil Shetye <sh...@bo...> writes: > Apart from the issue of fixing, what is missing is documentation on > this. There is no place where it is written what configuration should > a user use to derive the best mileage from fetchmail. Maybe, some FAQ > entry like: Looks good. > ========================================================================= > C#. What protocol should I choose with my mailserver? My mailserver > supports both IMAP and POP3. > > fetchmail, by default, uses first IMAP and then POP3 (protocol auto). Should we deprecate this "protocol" in fetchmail proper so we can remove it in 6.4.0? I have a strong itch to do just that. Somebody got any objections? fetchmailconf could still probe for protocols (but needs to be told about SSL) -- and actually, someone should overhaul it (and rework the ergonomic properties. The current layout of the Tk interface is pretty unergonomic and in some places astonishing for the user.) > This doesn't work when you are keeping mails on the mailserver or when > there are errors (like socket errors or deliver errors) after a few > mails have been downloaded. > > So, as a first step, you should necessarily choose between IMAP and > POP3. > > - If your mailserver is delivering to multiple folders, choose IMAP. > > - If you want to download all mails and not leave a copy on the > server, choose IMAP. In --all --nokeep setups, POP3 is usually simpler - any particular reason why you suggest IMAP here? > - If you intend to keep mails on the server, check if your server > supports UIDL. If it does, choose POP3 and enable the "uidl" option. > With POP3 and UIDL, it is even possible to access the mailbox > through multiple e-mail clients without affecting fetchmail. > > - If you intend to keep mails on the server, but your server does not > support UIDL, choose IMAP. However, you should not access the Make that "MUST NOT" access... > mailbox through other e-mail clients as fetchmail will not be > able to download all new mails. 
> > - If you are having a slow connection to the mailserver and/or expect > frequent socket errors from the mailserver, choose IMAP and do not > leave a copy on the mailserver. This, too, is something POP3+UIDL should handle. >> My basic idea is: say we have u UIDLs stored, and the server has m >> messages. Then, send "UIDL" u+1 "UIDL" u+2 ... "UIDL" m pipelined >> (non-blocking) (or, on future RANGE capable POP3 servers, "UIDL" u+1 "-" >> m), and if ANY seen message is in that range, do slow UIDL. No binary >> search with many round trips. Of course, this needs to track deletions >> properly and if another client client deletes messages, needs to back >> down and fetch the whole list. Perhaps sanity checking the last known >> UIDL helps detecting third-party deletions on the assumption that the >> likelyhood of the server recycling a UID is low. > > This might not work with "no keep" when fetchmail is not sure if the > mails marked by fetchmail for deletion have actually been expunged or > not. But fetchmail knows if it has seen "+OK" or rather EPIPE in response to a QUIT command, and in no other case than having successfully sent "QUIT" can it assume the messages have been expunged. -- Matthias Andree |