From: Matthias A. <mat...@gm...> - 2010-04-23 17:41:20
|
Rainer Weikusat wrote on 2010-04-21: > Hello. > > The company I work for uses fetchmail to download mails from various > POP- and IMAP-servers in order to offer an anti-spam/ anti-virus > scanning service for smartphone owners. Below is an oprofile excerpt > from the 'download server' regarding fetchmail: Hi Rainer, thanks for the analysis. You've bumped into a design flaw of fetchmail's UID handling. It has been on my TODO list for long (and using fetchmail with just 500 kept messages was painful), but it was a non-functional improvement, and the functional improvements (bug fixes) always "jumped the queue" :) Basically the whole uid.c outfit is using the wrong data structures for its purpose. It uses a linear list of n elements, and iterates over it 1.5n times, i. e. we get O(n^2) complexity (without constants). The whole stuff - I guess you figured as much - is meant to figure if a certain UID is in a set or not, and read/write that set to disk. We don't need order, only uniqueness. If we were being blunt and wanted a quick fix, we'd kill the list and use a vector (array) and realloc(), a flag "vector is sorted", and qsort and bsearch. That would bring things to O(n log n) If we were to be decent, we'd use the right structure for our purpose, and that's a hash table (in C++/STL i'd use a hash_set where available and a set where unavailable). We might want to keep things in a database on disk, rather than load/write the table for every run. Your patch effort is much appreicated, but however I have serious doubts about taking it. save_str and id_find are used together and basically O(n^2) complexity. Even if your patch shaves save_str down to O(n), id_find usage pattern itself remains O(n^2). > > ,---- > | CPU: Core 2, speed 2393.92 MHz (estimated) > | Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) [...] > | samples % image name symbol name > | 25110 51.5035 fetchmail save_str > | 11399 23.3806 fetchmail id_find > | 5851 12.0011 fetchmail str_in_list > | 912 1.8706 fetchmail yylex > | 558 1.1445 fetchmail initialize_saved_lists > | 493 1.0112 fetchmail do_session > `---- > > According to this, the program spent most of its time (51.5%) with > executing the save_str-routine (uid.c). This routine appends a new > item to a linked list of structures by traversing the list in order to > find the last node and then adding the new one. It is [mostly] called > in a couple of places in pop3.c to update the 'newsaved' and > 'oldsaved' lists of message UIDs (UIDL). The query structure defined > in fetchmail.h actually even contains a member which was supposed to > track the end of the oldsaved list (oldsavedend) but that is only > initialized (in initialize_saved_lists/ uid.c) and never really used > for anything. The patch included below changes the code to actually > track the end of the oldsaved list and introduces a newsavedend struct > query member in order to do the same for the newsaved list. With this > change, the save_str-routines now only accounts for 0.4% of the CPU > time used by the program: > > ,---- > | CPU: Core 2, speed 2393.92 MHz (estimated) > | Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) [...] > | samples % image name symbol name > | 19146 48.1588 fetchmail id_find > | 9416 23.6845 fetchmail str_in_list > | 1452 3.6523 fetchmail yylex > | 860 2.1632 fetchmail initialize_saved_lists > | 810 2.0374 fetchmail do_session > | > | [...] > | > | 167 0.4201 fetchmail save_str > | 157 0.3949 fetchmail parsecmdline > | 145 0.3647 fetchmail readheaders > `---- > > I am aware that the 'split append' (save_str + explicit adjustment of > savedend-pointer) isn't exactly beautiful but it is a reasonably > simple change with a significant positive impact for POP3 mailboxes > containing lots of ('kept on server') messages. > > ---------- > --- fetchmail/fetchmail.h 16 Apr 2010 14:45:15 -0000 1.1.1.3 > +++ fetchmail/fetchmail.h 21 Apr 2010 16:31:34 -0000 1.1.1.3.2.1 > @@ -378,7 +378,7 @@ > unsigned int uid; /* UID of user to deliver to */ > struct idlist *skipped; /* messages skipped on the mail server */ > struct idlist *oldsaved, *newsaved; > - struct idlist **oldsavedend; > + struct idlist **oldsavedend, **newsavedend; > char lastdigest[DIGESTLEN]; /* last MD5 hash seen on this > connection */ > char *folder; /* folder currently being polled */ > --- fetchmail/pop3.c 16 Apr 2010 17:27:18 -0000 1.1.1.3 > +++ fetchmail/pop3.c 21 Apr 2010 16:31:34 -0000 1.1.1.3.2.1 > @@ -881,8 +881,10 @@ > last_nr = try_nr; > /* save it */ > - newl = save_str(&ctl->oldsaved, id, UID_UNSEEN); > + newl = save_str(ctl->oldsavedend, id, UID_UNSEEN); > newl->val.status.num = try_nr; > + > + ctl->oldsavedend = &newl->next; > } > } > if (outlevel >= O_DEBUG && last_nr <= count) > @@ -970,15 +972,18 @@ > /* the first try_id messages are known -> copy them to the newsaved > list */ > for( num = first_nr; num < list_len; num++ ) > { > - struct idlist *newl = save_str(&ctl->newsaved, > + struct idlist *newl = save_str(ctl->newsavedend, > str_from_nr_list(&ctl->oldsaved, num), > UID_UNSEEN); > newl->val.status.num = num - first_nr + 1; > + > + ctl->newsavedend = &newl->next; > } > if( nolinear ) { > free_str_list(&ctl->oldsaved); > ctl->oldsaved = 0; > + ctl->oldsavedend = &ctl->oldsaved; > last = try_id; > } > @@ -998,6 +1003,7 @@ > (void)folder; > /* Ensure that the new list is properly empty */ > ctl->newsaved = (struct idlist *)NULL; > + ctl->newsavedend = &ctl->newsaved; > #ifdef MBOX > /* Alain Knaff suggests this, but it's not RFC standard */ > @@ -1089,9 +1095,11 @@ > { > struct idlist *old, *newl; > - newl = save_str(&ctl->newsaved, id, UID_UNSEEN); > + newl = save_str(ctl->newsavedend, id, UID_UNSEEN); > newl->val.status.num = unum; > + ctl->newsavedend = &newl->next; > + > if ((old = str_in_list(&ctl->oldsaved, id, FALSE))) > { > flag mark = old->val.status.mark; > @@ -1123,7 +1131,8 @@ > * swap the lists (say, due to socket error), > * the same mail will not be downloaded again. > */ > - old = save_str(&ctl->oldsaved, id, UID_UNSEEN); > + old = save_str(ctl->oldsavedend, id, UID_UNSEEN); > + ctl->oldsavedend = &old->next; > } > /* save the number */ > old->val.status.num = unum; > @@ -1224,8 +1233,10 @@ > } > /* save it */ > - newl = save_str(&ctl->oldsaved, id, UID_UNSEEN); > + newl = save_str(ctl->oldsavedend, id, UID_UNSEEN); > newl->val.status.num = num; > + > + ctl->oldsavedend = &newl->next; > return(FALSE); > } > else > --- fetchmail/transact.c 16 Apr 2010 14:45:18 -0000 1.1.1.2 > +++ fetchmail/transact.c 21 Apr 2010 16:31:34 -0000 1.1.1.2.2.1 > @@ -882,8 +882,10 @@ > sscanf(line+12, "%s", id); > if (!str_find( &ctl->newsaved, num)) > { > - struct idlist *newl = save_str(&ctl->newsaved,id,UID_SEEN); > + struct idlist *newl = save_str(ctl->newsavedend,id,UID_SEEN); > newl->val.status.num = num; > + > + ctl->newsavedend = &newl->next; > } > } > } > --- fetchmail/uid.c 17 Aug 2009 10:47:46 -0000 1.1.1.1 > +++ fetchmail/uid.c 21 Apr 2010 16:31:34 -0000 1.1.1.1.2.1 > @@ -119,6 +119,7 @@ > ctl->oldsaved = (struct idlist *)NULL; > ctl->newsaved = (struct idlist *)NULL; > ctl->oldsavedend = &ctl->oldsaved; > + ctl->newsavedend = &ctl->newsaved; > } > errno = 0; > @@ -151,6 +152,7 @@ > char saveddelim1; > char *delimp2; > char saveddelim2 = '\0'; /* pacify -Wall */ > + struct idlist *old; > while (fgets(buf, POPBUFSIZE, tmpfp) != (char *)NULL) > { > @@ -215,7 +217,8 @@ > for (ctl = hostlist; ctl; ctl = ctl->next) { > if (strcasecmp(host, ctl->server.queryname) == 0 > && strcasecmp(user, ctl->remotename) == 0) { > - save_str(&ctl->oldsaved, id, UID_SEEN); > + old = save_str(ctl->oldsavedend, id, UID_SEEN); > + ctl->oldsavedend = &old->next; > break; > } > } > @@ -547,7 +550,9 @@ > if (outlevel >= O_DEBUG) > report(stdout, GT_("swapping UID lists\n")); > ctl->oldsaved = ctl->newsaved; > + ctl->oldsavedend = ctl->newsavedend; > ctl->newsaved = (struct idlist *) NULL; > + ctl->newsavedend = &ctl->newsaved; > free_str_list(&temp); > } > /* in fast uidl, there is no need to swap lists: the old state of > @@ -581,6 +586,7 @@ > report(stdout, GT_("discarding new UID list\n")); > free_str_list(&ctl->newsaved); > ctl->newsaved = (struct idlist *) NULL; > + ctl->newsavedend = &ctl->newsaved; > } > } > _______________________________________________ > fetchmail-devel mailing list > fet...@li... > https://lists.berlios.de/mailman/listinfo/fetchmail-devel -- Matthias Andree |