From: Matthias A. <mat...@gm...> - 2010-04-23 20:04:16
|
Rainer Weikusat wrote on 2010-04-23: > "Matthias Andree" <mat...@gm...> writes: >> Rainer Weikusat wrote on 2010-04-21: >>> The company I work for uses fetchmail to download mails from various >>> POP- and IMAP-servers in order to offer an anti-spam/ anti-virus >>> scanning service for smartphone owners. Below is an oprofile excerpt >>> from the 'download server' regarding fetchmail: > > [...] > >> You've bumped into a design flaw of fetchmail's UID handling. > > I assume it is rather an instance of 'Learn Lisp. It will make you a > better programmer for life' :->. Uh, don't tell me. I've known that when my Debian installation fit onto the Conner CP30344 disk in my i486SX25. Tell that to Eric. And tell him LAST was an abomination and defunct-upon-invention. ;) No, don't tell him. I guess he'd do it in Python today, use a dictionary, and leave optimization up to the Python interpreter, and that might be reasonable, unless we'd have to have SSL idiosyncrasies under control. getmail (Python based) used to not complain about expired certificates, and all Charles had to say was "sorry, not my fault". > [...] > >> If we were being blunt and wanted a quick fix, we'd kill the list and >> use a vector (array) and realloc(), a flag "vector is sorted", and >> qsort and bsearch. That would bring things to O(n log n) >> >> If we were to be decent, we'd use the right structure for our purpose, >> and that's a hash table (in C++/STL i'd use a hash_set where >> available and a set where unavailable). We might want to keep things >> in a database on disk, rather than load/write the table for every >> run. >> >> Your patch effort is much appreicated, but however I have serious >> doubts about taking it. save_str and id_find are used together and >> basically O(n^2) complexity. Even if your patch shaves save_str down >> to O(n), id_find usage pattern itself remains O(n^2). > > struct idlist * is quite pervasive in the code. It is also used for > many 'short lists' (eg, ctl->localnames) where a linked list is at > least good enough. Keeping track of the last item on the oldsaved and > newsaved lists was just a reasonably non-invasive, fairly quick fix for > the 'worst offender' case (and the IMHO most braindead ...) I could > smuggle past my boss[*] as 'immediately useful and not that much > effort'. Yes. Only this is one of the makeshift solutions that is not bound to last long. GMX split my INBOX when it hit 50,000 messages. Oops. I do NOT want to poll that with fetchmail <= 6.3.16. :) Seriously, how about this approach: 1. create uid_save and uid_find wrappers around save_str and id_find and provide a typedef 2. change POP3 and possibly IMAP (see below) to use that => decouple from the implementation 3. replace uid_save/uid_find implementations by something decent? This way all the short lists can continue to use the save_str/id_find cruft for the nonce, and the bulk can use a O(n log n) implementation. > [*] I am actually supposed to add UID-support the IMAP > protocol driver and this case of 'walk until your shoes fall > off' just angered me ... Good to know. We have a patch in the berlios tracker [1], which is halfway there, but you need to add UIDVALIDITY support [2], and probably a few touch-ups because the patch ancestor and current fetchmail code have diverged. Oh, and of course we can add your company name to the NEWS file, too. Sponsored by... written by... <hint, hint> :) Note I'm planning to change the license to Affero GPL v3 in the not too distant future. [1] <http://developer.berlios.de/patch/?func=detailpatch&patch_id=740&group_id=1824> also attached [2] <http://www.rfc-editor.org/rfc/rfc3501.txt> 2.3.1.1. Unique Identifier (UID) Message Attribute Just in case BerliOS drops off the net, find the patch attached (may not be available on the list). Note again: not ready for use! HTH -- Matthias Andree |