You can subscribe to this list here.
2004 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(73) |
Jul
(22) |
Aug
(42) |
Sep
(11) |
Oct
(23) |
Nov
(40) |
Dec
(2) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2005 |
Jan
|
Feb
|
Mar
(17) |
Apr
(26) |
May
(6) |
Jun
(21) |
Jul
(133) |
Aug
(25) |
Sep
(40) |
Oct
(12) |
Nov
(71) |
Dec
(57) |
2006 |
Jan
(23) |
Feb
(22) |
Mar
(43) |
Apr
(27) |
May
(13) |
Jun
(7) |
Jul
(3) |
Aug
(20) |
Sep
(16) |
Oct
(17) |
Nov
(31) |
Dec
(10) |
2007 |
Jan
(12) |
Feb
(17) |
Mar
(26) |
Apr
(13) |
May
(4) |
Jun
(1) |
Jul
(1) |
Aug
(21) |
Sep
(3) |
Oct
(8) |
Nov
(8) |
Dec
(5) |
2008 |
Jan
(5) |
Feb
(1) |
Mar
(3) |
Apr
(10) |
May
(3) |
Jun
(11) |
Jul
(5) |
Aug
(1) |
Sep
(6) |
Oct
|
Nov
(10) |
Dec
(2) |
2009 |
Jan
(17) |
Feb
(2) |
Mar
(1) |
Apr
(9) |
May
(23) |
Jun
(22) |
Jul
(32) |
Aug
(30) |
Sep
(11) |
Oct
(24) |
Nov
(4) |
Dec
|
2010 |
Jan
(12) |
Feb
(56) |
Mar
(32) |
Apr
(41) |
May
(36) |
Jun
(14) |
Jul
(7) |
Aug
(10) |
Sep
(13) |
Oct
(16) |
Nov
|
Dec
(14) |
2011 |
Jan
(3) |
Feb
|
Mar
(1) |
Apr
(16) |
May
(36) |
Jun
(2) |
Jul
|
Aug
(9) |
Sep
(2) |
Oct
(1) |
Nov
(8) |
Dec
(3) |
2012 |
Jan
(1) |
Feb
(5) |
Mar
(1) |
Apr
(1) |
May
(2) |
Jun
|
Jul
|
Aug
(7) |
Sep
(9) |
Oct
(2) |
Nov
(8) |
Dec
(9) |
2013 |
Jan
(11) |
Feb
(6) |
Mar
(14) |
Apr
(10) |
May
|
Jun
(12) |
Jul
(2) |
Aug
(2) |
Sep
(2) |
Oct
|
Nov
(7) |
Dec
(4) |
2014 |
Jan
(1) |
Feb
|
Mar
(2) |
Apr
(1) |
May
|
Jun
|
Jul
(1) |
Aug
(1) |
Sep
(1) |
Oct
(1) |
Nov
|
Dec
|
2015 |
Jan
|
Feb
|
Mar
|
Apr
(1) |
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2016 |
Jan
|
Feb
|
Mar
|
Apr
(4) |
May
|
Jun
(7) |
Jul
|
Aug
(8) |
Sep
(8) |
Oct
|
Nov
|
Dec
(2) |
2017 |
Jan
|
Feb
|
Mar
(2) |
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
(3) |
Nov
|
Dec
|
2018 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
(2) |
Oct
(2) |
Nov
|
Dec
|
2019 |
Jan
|
Feb
|
Mar
|
Apr
|
May
(3) |
Jun
|
Jul
|
Aug
(6) |
Sep
(3) |
Oct
|
Nov
|
Dec
|
2020 |
Jan
(2) |
Feb
(3) |
Mar
(5) |
Apr
(2) |
May
(3) |
Jun
(3) |
Jul
(3) |
Aug
(2) |
Sep
(3) |
Oct
(4) |
Nov
(3) |
Dec
|
2021 |
Jan
(5) |
Feb
(2) |
Mar
(3) |
Apr
(3) |
May
|
Jun
|
Jul
(2) |
Aug
(14) |
Sep
(3) |
Oct
(4) |
Nov
(4) |
Dec
(3) |
2022 |
Jan
|
Feb
(2) |
Mar
(2) |
Apr
(1) |
May
|
Jun
|
Jul
(3) |
Aug
(1) |
Sep
|
Oct
(2) |
Nov
|
Dec
|
2023 |
Jan
(3) |
Feb
(1) |
Mar
|
Apr
|
May
|
Jun
|
Jul
|
Aug
|
Sep
|
Oct
|
Nov
|
Dec
|
2024 |
Jan
(2) |
Feb
|
Mar
|
Apr
|
May
(1) |
Jun
|
Jul
(1) |
Aug
|
Sep
(2) |
Oct
(1) |
Nov
(1) |
Dec
(1) |
2025 |
Jan
|
Feb
|
Mar
|
Apr
|
May
|
Jun
(3) |
Jul
(1) |
Aug
(1) |
Sep
(2) |
Oct
(1) |
Nov
|
Dec
|
From: <ad...@be...> - 2010-06-17 22:29:28
|
Bug #17272, was updated on 2010-Jun-17 14:17 Here is a current snapshot of the bug. Project: fetchmail Category: None Status: Closed Resolution: Fixed Bug Group: None Priority: 3 Submitted by: bjoernv Assigned to : m-a Summary: Fetchmail does not fetch some Web.DE mails Details: Fetchmail is not able to fetch some automatically generated Web.DE mails. Especially the so called "POP3 Reports" could not be fetched. These mails where generated by the German Freemail provider Web.DE if the user has some suspect mails in his folders which where not delivered by POP3. After some debugging I found, that these mails contain a header with some small problems. First the line "This is a multi-part message in MIME format." is part of the header. Second the header and the body is not divided by a single new line, but by a Space character followed by a new line. Fetchmails output and a Sample mail is appended. The problem was tested with Fetchmail 6.3.11 from openSUSE 11.2 and with the current GIT version. The mails from Web.DE are badly formatted. Anyway I think, that Fetchmail should be more tolerant with these mails. Normal MUAs like Seamonkey and Thunderbird can deal with the Web.DE mails. Follow-Ups: Date: 2010-Jun-17 22:29 By: m-a Comment: was fixed in 6.3.15 - will consider changing the error message ------------------------------------------------------- Date: 2010-Jun-17 22:21 By: bjoernv Comment: Using "--bad-header accept" helps to solve the issue with Web.DE. Fetchmail corrects the incorrect header with this option (Fetchmail inserts a newline before "This is a multi-part message in MIME format."). So the problem is solved for me. But may be, it's a good idea to add a hint after the error messages like "reading message XX...@po...:3 of 8 (10352 octets) fetchmail: incorrect header line found while scanning headers" Something link ("(try --bad-header accept)") would be helpful. ------------------------------------------------------- Date: 2010-Jun-17 21:01 By: m-a Comment: Please - complain to web.de to have them generate their messages in proper format - concurrently retest with fetchmail 6.3.17 and use "--bad-header accept" as command line option and follow up to report if this works. ------------------------------------------------------- Date: 2010-Jun-17 14:51 By: bjoernv Comment: Couldn't I attach files here? The sample Web.DE "POP3 Report" can be found here. If your browser shows extra newlines please download the file and open it with an good editor. The extra newlines are could be some "CR" characters: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-web.de-message.log.txt The sample Fetchmail log (from Fetchmail Git-Master on 2010-06-17) is here: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-git-master-web.de.log.txt ------------------------------------------------------- For detailed info, follow this link: http://developer.berlios.de/bugs/?func=detailbug&bug_id=17272&group_id=1824 |
From: <ad...@be...> - 2010-06-17 22:21:38
|
Bug #17272, was updated on 2010-Jun-17 04:17 Here is a current snapshot of the bug. Project: fetchmail Category: None Status: Open Resolution: None Bug Group: None Priority: 3 Submitted by: bjoernv Assigned to : m-a Summary: Fetchmail does not fetch some Web.DE mails Details: Fetchmail is not able to fetch some automatically generated Web.DE mails. Especially the so called "POP3 Reports" could not be fetched. These mails where generated by the German Freemail provider Web.DE if the user has some suspect mails in his folders which where not delivered by POP3. After some debugging I found, that these mails contain a header with some small problems. First the line "This is a multi-part message in MIME format." is part of the header. Second the header and the body is not divided by a single new line, but by a Space character followed by a new line. Fetchmails output and a Sample mail is appended. The problem was tested with Fetchmail 6.3.11 from openSUSE 11.2 and with the current GIT version. The mails from Web.DE are badly formatted. Anyway I think, that Fetchmail should be more tolerant with these mails. Normal MUAs like Seamonkey and Thunderbird can deal with the Web.DE mails. Follow-Ups: Date: 2010-Jun-17 12:21 By: bjoernv Comment: Using "--bad-header accept" helps to solve the issue with Web.DE. Fetchmail corrects the incorrect header with this option (Fetchmail inserts a newline before "This is a multi-part message in MIME format."). So the problem is solved for me. But may be, it's a good idea to add a hint after the error messages like "reading message XX...@po...:3 of 8 (10352 octets) fetchmail: incorrect header line found while scanning headers" Something link ("(try --bad-header accept)") would be helpful. ------------------------------------------------------- Date: 2010-Jun-17 11:01 By: m-a Comment: Please - complain to web.de to have them generate their messages in proper format - concurrently retest with fetchmail 6.3.17 and use "--bad-header accept" as command line option and follow up to report if this works. ------------------------------------------------------- Date: 2010-Jun-17 04:51 By: bjoernv Comment: Couldn't I attach files here? The sample Web.DE "POP3 Report" can be found here. If your browser shows extra newlines please download the file and open it with an good editor. The extra newlines are could be some "CR" characters: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-web.de-message.log.txt The sample Fetchmail log (from Fetchmail Git-Master on 2010-06-17) is here: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-git-master-web.de.log.txt ------------------------------------------------------- For detailed info, follow this link: http://developer.berlios.de/bugs/?func=detailbug&bug_id=17272&group_id=1824 |
From: <ad...@be...> - 2010-06-17 21:01:31
|
Bug #17272, was updated on 2010-Jun-17 14:17 Here is a current snapshot of the bug. Project: fetchmail Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: bjoernv Assigned to : none Summary: Fetchmail does not fetch some Web.DE mails Details: Fetchmail is not able to fetch some automatically generated Web.DE mails. Especially the so called "POP3 Reports" could not be fetched. These mails where generated by the German Freemail provider Web.DE if the user has some suspect mails in his folders which where not delivered by POP3. After some debugging I found, that these mails contain a header with some small problems. First the line "This is a multi-part message in MIME format." is part of the header. Second the header and the body is not divided by a single new line, but by a Space character followed by a new line. Fetchmails output and a Sample mail is appended. The problem was tested with Fetchmail 6.3.11 from openSUSE 11.2 and with the current GIT version. The mails from Web.DE are badly formatted. Anyway I think, that Fetchmail should be more tolerant with these mails. Normal MUAs like Seamonkey and Thunderbird can deal with the Web.DE mails. Follow-Ups: Date: 2010-Jun-17 21:01 By: m-a Comment: Please - complain to web.de to have them generate their messages in proper format - concurrently retest with fetchmail 6.3.17 and use "--bad-header accept" as command line option and follow up to report if this works. ------------------------------------------------------- Date: 2010-Jun-17 14:51 By: bjoernv Comment: Couldn't I attach files here? The sample Web.DE "POP3 Report" can be found here. If your browser shows extra newlines please download the file and open it with an good editor. The extra newlines are could be some "CR" characters: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-web.de-message.log.txt The sample Fetchmail log (from Fetchmail Git-Master on 2010-06-17) is here: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-git-master-web.de.log.txt ------------------------------------------------------- For detailed info, follow this link: http://developer.berlios.de/bugs/?func=detailbug&bug_id=17272&group_id=1824 |
From: <ad...@be...> - 2010-06-17 14:51:13
|
Bug #17272, was updated on 2010-Jun-17 04:17 Here is a current snapshot of the bug. Project: fetchmail Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: bjoernv Assigned to : none Summary: Fetchmail does not fetch some Web.DE mails Details: Fetchmail is not able to fetch some automatically generated Web.DE mails. Especially the so called "POP3 Reports" could not be fetched. These mails where generated by the German Freemail provider Web.DE if the user has some suspect mails in his folders which where not delivered by POP3. After some debugging I found, that these mails contain a header with some small problems. First the line "This is a multi-part message in MIME format." is part of the header. Second the header and the body is not divided by a single new line, but by a Space character followed by a new line. Fetchmails output and a Sample mail is appended. The problem was tested with Fetchmail 6.3.11 from openSUSE 11.2 and with the current GIT version. The mails from Web.DE are badly formatted. Anyway I think, that Fetchmail should be more tolerant with these mails. Normal MUAs like Seamonkey and Thunderbird can deal with the Web.DE mails. Follow-Ups: Date: 2010-Jun-17 04:51 By: bjoernv Comment: Couldn't I attach files here? The sample Web.DE "POP3 Report" can be found here. If your browser shows extra newlines please download the file and open it with an good editor. The extra newlines are could be some "CR" characters: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-web.de-message.log.txt The sample Fetchmail log (from Fetchmail Git-Master on 2010-06-17) is here: http://user.cs.tu-berlin.de/~bjoern/fetchmail/fetchmail-git-master-web.de.log.txt ------------------------------------------------------- For detailed info, follow this link: http://developer.berlios.de/bugs/?func=detailbug&bug_id=17272&group_id=1824 |
From: <ad...@be...> - 2010-06-17 14:17:23
|
Bug #17272, was updated on 2010-Jun-17 04:17 Here is a current snapshot of the bug. Project: fetchmail Category: None Status: Open Resolution: None Bug Group: None Priority: 5 Submitted by: bjoernv Assigned to : none Summary: Fetchmail does not fetch some Web.DE mails Details: Fetchmail is not able to fetch some automatically generated Web.DE mails. Especially the so called "POP3 Reports" could not be fetched. These mails where generated by the German Freemail provider Web.DE if the user has some suspect mails in his folders which where not delivered by POP3. After some debugging I found, that these mails contain a header with some small problems. First the line "This is a multi-part message in MIME format." is part of the header. Second the header and the body is not divided by a single new line, but by a Space character followed by a new line. Fetchmails output and a Sample mail is appended. The problem was tested with Fetchmail 6.3.11 from openSUSE 11.2 and with the current GIT version. The mails from Web.DE are badly formatted. Anyway I think, that Fetchmail should be more tolerant with these mails. Normal MUAs like Seamonkey and Thunderbird can deal with the Web.DE mails. For detailed info, follow this link: http://developer.berlios.de/bugs/?func=detailbug&bug_id=17272&group_id=1824 |
From: Matthias A. <mat...@gm...> - 2010-06-12 15:50:26
|
Am 12.06.2010, 00:06 Uhr, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: > >> Am 08.06.2010, 23:57 Uhr, schrieb Rainer Weikusat: >> >>> The file included below is a diff of the 'running' uid_db code against >>> the 6.3.17 fetchmail tree. I hope that it is ok in this form, because >>> I'm still in extreme 'now, we just need to deliver'-mode ... > > [...] > >> Is that the final submission to be expected, or do you still want to >> polish some bits of it? > > This is the code which is used for the POP3-using customers of 'my' > application. Because of this, I will fix or mitigate any issues with > it which come to my attention and affect me. As always, I'm meanwhile > working on something completely different with the usual 'Yesterday!' > [emphatically!] as most wanted release date. This means the short > answer is 'Yes' and the medium-sized one would be "I can want things > all day but that doesn't mean I can actually do them :-(". The usual there's more work than time problem that most of us share (and that I'd prefer to being bored all the time!). Thank you. I will integrate the code as is for a new minor branch and see to publishing it shortly after the merge in Git (I plan that there be a series of alpha- and beta-releases). I can't say how soon this will start, but I will create a Git branch for that. That new branch will have substantially less portability hacks and will require somewhat recent tool chains (expecting some of the more sophisticated C99 features but omitting the arcane ones) and operating systems (Single Unix Specification v3 aka IEEE Std 1003.1-2001, 2004 edition -- which has been superseded by v4 aka IEEE Std 1003.1-2008, but OS support for v4 isn't widely available yet), but is supposed to have little practical impact. I currently have FreeBSD 6, 7, 8, NetBSD 5.1, Solaris 10, Ubuntu 10.04, Fedora 13 and openSUSE 11.2 for testing and can occasionally test compiles on Cygwin 1.7, Ubuntu 8.04, and Darwin (the Unix OS underneath MacOS X). I do not care about half-baked or dying/dead non-IEEE-Std-1003 systems and have in fact removed support for them from the code to make it more readable and maintainable. If someone wants to provide me with an SSH account (no root privileges required) on another operating system that is under full vendor support, such as DragonflyBSD, OpenBSD, AIX, HP/UX, please contact me off-list. I think I don't care much about SCO, OpenServer, Unixware currently, but would take patches if they integrate cleanly and don't cause regressions on supported systems. > [...] > >> You also mentioned Disconnected IMAP earlier, is that a project you >> are still pursuing and/or want to submit later? >> If so, what's the status? > > Using \Seen to track 'message newness' reportedly caused the > gmail-interface to no longer bring up a whole thread of communication > when using it to access a message for the first time. This was deemed > serious enough (by the powers which be) to warrant a fix and because > of this, I am running a fetchmail-variant which support DIMAP exactly > to the degree needed for this particular application. In particular, > it doesn't keep a database of actually downloaded messages but just > assumes that everything between (<last recorded uid>, <UIDNEXT>) is > interesting and anything else isn't. An additional unpleasant > workaround would be a hack intended to deal with so-called > 'IMAP4rev1-servers' supporting UIDVALIDITY but not UIDNEXT (eg, > mail.amuni.com): It is assumed that the new UID space can be queried > in batches of thousands such that SEARCH UID ... never returns an empty > response except if the end of the mailbox was reached. This is > non-compliant with the IPMAP4rev1 RFC. > > Independently of these technical problems, this is code I have written > with the intent that no one except myself ever looks at it and it > contains too many of my usual opinions regarding how code should or > shouldn't be written, such as using mutable function pointers as > parser state variables and things like that, that I would dare to show > it to anyone. What's wrong with that? Function pointers are a regular feature of standard C. -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-06-12 00:06:53
|
"Matthias Andree" <mat...@gm...> writes: > Am 08.06.2010, 23:57 Uhr, schrieb Rainer Weikusat: > >> The file included below is a diff of the 'running' uid_db code against >> the 6.3.17 fetchmail tree. I hope that it is ok in this form, because >> I'm still in extreme 'now, we just need to deliver'-mode ... [...] > Is that the final submission to be expected, or do you still want to > polish some bits of it? This is the code which is used for the POP3-using customers of 'my' application. Because of this, I will fix or mitigate any issues with it which come to my attention and affect me. As always, I'm meanwhile working on something completely different with the usual 'Yesterday!' [emphatically!] as most wanted release date. This means the short answer is 'Yes' and the medium-sized one would be "I can want things all day but that doesn't mean I can actually do them :-(". [...] > You also mentioned Disconnected IMAP earlier, is that a project you > are still pursuing and/or want to submit later? > If so, what's the status? Using \Seen to track 'message newness' reportedly caused the gmail-interface to no longer bring up a whole thread of communication when using it to access a message for the first time. This was deemed serious enough (by the powers which be) to warrant a fix and because of this, I am running a fetchmail-variant which support DIMAP exactly to the degree needed for this particular application. In particular, it doesn't keep a database of actually downloaded messages but just assumes that everything between (<last recorded uid>, <UIDNEXT>) is interesting and anything else isn't. An additional unpleasant workaround would be a hack intended to deal with so-called 'IMAP4rev1-servers' supporting UIDVALIDITY but not UIDNEXT (eg, mail.amuni.com): It is assumed that the new UID space can be queried in batches of thousands such that SEARCH UID ... never returns an empty response except if the end of the mailbox was reached. This is non-compliant with the IPMAP4rev1 RFC. Independently of these technical problems, this is code I have written with the intent that no one except myself ever looks at it and it contains too many of my usual opinions regarding how code should or shouldn't be written, such as using mutable function pointers as parser state variables and things like that, that I would dare to show it to anyone. |
From: Matthias A. <mat...@gm...> - 2010-06-09 09:17:52
|
Am 08.06.2010, 23:57 Uhr, schrieb Rainer Weikusat: > The file included below is a diff of the 'running' uid_db code against > the 6.3.17 fetchmail tree. I hope that it is ok in this form, because > I'm still in extreme 'now, we just need to deliver'-mode ... Hi Rainer, thanks a lot. Is that the final submission to be expected, or do you still want to polish some bits of it? I'm asking because I plan to use the code in a new minor release (6.4 or 6.5, not yet sure how I'll name it) rather than 6.3.X, which means I may need to tweak a few ends (because I already purged old compatibility cruft from that tree) and that might conflict with your future changes. If you want to modify it, I'll just push the patch to a new branch in Gitorious so it's there for the public, but hold off with merging until around June 20, so that I don't need to re-merge later. You also mentioned Disconnected IMAP earlier, is that a project you are still pursuing and/or want to submit later? If so, what's the status? I absolutely do not mean to urge here, this is only so that we can align planning a bit. Best regards Matthias |
From: Rainer W. <rwe...@ms...> - 2010-06-08 23:58:01
|
The file included below is a diff of the 'running' uid_db code against the 6.3.17 fetchmail tree. I hope that it is ok in this form, because I'm still in extreme 'now, we just need to deliver'-mode ... ----------- --- fetchmail/Makefile.am 7 May 2010 13:49:15 -0000 1.1.1.6 +++ fetchmail/Makefile.am 25 May 2010 17:28:29 -0000 1.1.1.6.2.3 @@ -38,7 +38,7 @@ smbencrypt.h smbdes.c smbencrypt.c smbmd4.c smbutil.c \ libesmtp/gethostbyname.h libesmtp/gethostbyname.c \ smbtypes.h fm_getaddrinfo.c tls.c rfc822valid.c \ - xmalloc.h sdump.h sdump.c + xmalloc.h sdump.h sdump.c uid_db.h uid_db.c libfm_a_LIBADD= $(EXTRAOBJ) libfm_a_DEPENDENCIES= $(EXTRAOBJ) LDADD = libfm.a @LIBINTL@ $(LIBOBJS) --- fetchmail/fetchmail.c 7 May 2010 13:49:14 -0000 1.1.1.4 +++ fetchmail/fetchmail.c 24 May 2010 16:02:10 -0000 1.1.1.4.2.3 @@ -1537,6 +1537,14 @@ return(st); } +static int print_id_of(struct uid_db_record *rec, void *unused) +{ + (void)unused; + + printf("\t%s\n", rec->id); + return 0; +} + static void dump_params (struct runctl *runp, struct query *querylist, flag implicit) /* display query parameters in English */ @@ -1938,20 +1946,14 @@ if (ctl->server.protocol > P_POP2 && MAILBOX_PROTOCOL(ctl)) { - if (!ctl->oldsaved) + int count; + + if (!(count = uid_db_n_records(&ctl->oldsaved))) printf(GT_(" No UIDs saved from this host.\n")); else { - struct idlist *idp; - int count = 0; - - for (idp = ctl->oldsaved; idp; idp = idp->next) - ++count; - printf(GT_(" %d UIDs saved.\n"), count); - if (outlevel >= O_VERBOSE) - for (idp = ctl->oldsaved; idp; idp = idp->next) - printf("\t%s\n", idp->id); + traverse_uid_db(&ctl->oldsaved, print_id_of, NULL); } } --- fetchmail/fetchmail.h 7 May 2010 13:49:13 -0000 1.1.1.4 +++ fetchmail/fetchmail.h 24 May 2010 16:02:10 -0000 1.1.1.4.2.3 @@ -39,6 +39,8 @@ # include "trio/trio.h" #endif +#include "uid_db.h" + /* We need this for strstr */ #if !defined(HAVE_STRSTR) && !defined(strstr) char *strstr(const char *, const char *); @@ -387,8 +389,7 @@ int smtp_socket; /* socket descriptor for SMTP connection */ unsigned int uid; /* UID of user to deliver to */ struct idlist *skipped; /* messages skipped on the mail server */ - struct idlist *oldsaved, *newsaved; - struct idlist **oldsavedend; + struct uid_db oldsaved, newsaved; char lastdigest[DIGESTLEN]; /* last MD5 hash seen on this connection */ char *folder; /* folder currently being polled */ --- fetchmail/pop3.c 7 May 2010 13:49:19 -0000 1.1.1.4 +++ fetchmail/pop3.c 24 May 2010 16:02:10 -0000 1.1.1.4.2.6 @@ -21,6 +21,7 @@ #include "fetchmail.h" #include "socket.h" #include "i18n.h" +#include "uid_db.h" #ifdef OPIE_ENABLE #ifdef __cplusplus @@ -845,20 +846,20 @@ last_nr = count + 1; while (first_nr < last_nr - 1) { - struct idlist *newl; + struct uid_db_record *rec; try_nr = (first_nr + last_nr) / 2; if ((ok = pop3_getuidl(sock, try_nr, id, sizeof(id))) != 0) return ok; - if ((newl = str_in_list(&ctl->oldsaved, id, FALSE))) + if ((rec = find_uid_by_id(&ctl->oldsaved, id))) { - flag mark = newl->val.status.mark; + flag mark = rec->status; if (mark == UID_DELETED || mark == UID_EXPUNGED) { if (outlevel >= O_VERBOSE) report(stderr, GT_("id=%s (num=%u) was deleted, but is still present!\n"), id, try_nr); /* just mark it as seen now! */ - newl->val.status.mark = mark = UID_SEEN; + rec->status = mark = UID_SEEN; } /* narrow the search region! */ @@ -872,7 +873,7 @@ first_nr = try_nr; /* save the number */ - newl->val.status.num = try_nr; + set_uid_db_num(&ctl->oldsaved, rec, try_nr); } else { @@ -881,8 +882,8 @@ last_nr = try_nr; /* save it */ - newl = save_str(&ctl->oldsaved, id, UID_UNSEEN); - newl->val.status.num = try_nr; + rec = uid_db_insert(&ctl->oldsaved, id, UID_UNSEEN); + set_uid_db_num(&ctl->oldsaved, rec, try_nr); } } if (outlevel >= O_DEBUG && last_nr <= count) @@ -910,34 +911,38 @@ * of messages to get. * + Otherwise run a binary search to determine the last known message */ + struct uid_db_record *rec; int ok, nolinear = 0; - int first_nr, list_len, try_id, try_nr, add_id; + int first_nr, n_recs, try_id, try_nr, add_id; int num; char id [IDLEN+1]; if ((ok = pop3_gettopid(sock, 1, id, sizeof(id))) != 0) return ok; - if( ( first_nr = str_nr_in_list(&ctl->oldsaved, id) ) == -1 ) { + rec = first_uid_in_db(&ctl->oldsaved, id); + if (!rec) { /* the first message is unknown -> all messages are new */ *newp = *countp; return 0; } - + first_nr = rec->pos; + /* check where we expect the latest known message */ - list_len = count_list( &ctl->oldsaved ); - try_id = list_len - first_nr; /* -1 + 1 */ + n_recs = uid_db_n_records(&ctl->oldsaved); + try_id = n_recs - first_nr; /* -1 + 1 */ if( try_id > 1 ) { if( try_id <= *countp ) { if ((ok = pop3_gettopid(sock, try_id, id, sizeof(id))) != 0) return ok; - - try_nr = str_nr_last_in_list(&ctl->oldsaved, id); + + rec = last_uid_in_db(&ctl->oldsaved, id); + try_nr = rec ? rec->pos : -1; } else { try_id = *countp+1; try_nr = -1; } - if( try_nr != list_len -1 ) { + if( try_nr != n_recs -1 ) { /* some messages inbetween have been deleted... */ if( try_nr == -1 ) { nolinear = 1; @@ -955,7 +960,9 @@ if ((ok = pop3_gettopid(sock, try_id, id, sizeof(id))) != 0) return ok; - try_nr = str_nr_in_list(&ctl->oldsaved, id); + + rec = find_uid_by_id(&ctl->oldsaved, id); + try_nr = rec ? rec->pos : -1; } if( try_nr == -1 ) { try_id--; @@ -968,17 +975,16 @@ } } /* the first try_id messages are known -> copy them to the newsaved list */ - for( num = first_nr; num < list_len; num++ ) + for( num = first_nr; num < n_recs; num++ ) { - struct idlist *newl = save_str(&ctl->newsaved, - str_from_nr_list(&ctl->oldsaved, num), - UID_UNSEEN); - newl->val.status.num = num - first_nr + 1; + rec = find_uid_by_pos(&ctl->oldsaved, num); + + rec = uid_db_insert(&ctl->newsaved, rec->id, rec->status); + set_uid_db_num(&ctl->newsaved, rec, num - first_nr + 1); } if( nolinear ) { - free_str_list(&ctl->oldsaved); - ctl->oldsaved = 0; + clear_uid_db(&ctl->oldsaved); last = try_id; } @@ -997,7 +1003,7 @@ (void)folder; /* Ensure that the new list is properly empty */ - ctl->newsaved = (struct idlist *)NULL; + clear_uid_db(&ctl->newsaved); #ifdef MBOX /* Alain Knaff suggests this, but it's not RFC standard */ @@ -1032,6 +1038,9 @@ int fastuidl; char id [IDLEN+1]; + set_uid_db_num_pos_0(&ctl->oldsaved, *countp); + set_uid_db_num_pos_0(&ctl->newsaved, *countp); + /* should we do fast uidl this time? */ fastuidl = ctl->fastuidl; if (*countp > 7 && /* linear search is better if there are few mails! */ @@ -1091,14 +1100,13 @@ if (parseuid(buf, &unum, id, sizeof(id)) == PS_SUCCESS) { - struct idlist *old, *newl; + struct uid_db_record *old_rec, *new_rec; - newl = save_str(&ctl->newsaved, id, UID_UNSEEN); - newl->val.status.num = unum; + new_rec = uid_db_insert(&ctl->newsaved, id, UID_UNSEEN); - if ((old = str_in_list(&ctl->oldsaved, id, FALSE))) + if ((old_rec = find_uid_by_id(&ctl->oldsaved, id))) { - flag mark = old->val.status.mark; + flag mark = old_rec->status; if (mark == UID_DELETED || mark == UID_EXPUNGED) { /* XXX FIXME: switch 3 occurrences from @@ -1108,9 +1116,9 @@ if (outlevel >= O_VERBOSE) report(stderr, GT_("id=%s (num=%d) was deleted, but is still present!\n"), id, (int)unum); /* just mark it as seen now! */ - old->val.status.mark = mark = UID_SEEN; + old_rec->status = mark = UID_SEEN; } - newl->val.status.mark = mark; + new_rec->status = mark; if (mark == UID_UNSEEN) { (*newp)++; @@ -1127,10 +1135,14 @@ * swap the lists (say, due to socket error), * the same mail will not be downloaded again. */ - old = save_str(&ctl->oldsaved, id, UID_UNSEEN); + old_rec = uid_db_insert(&ctl->oldsaved, id, UID_UNSEEN); + } /* save the number */ - old->val.status.num = unum; + if (new_rec->status == UID_UNSEEN || !ctl->keep) { + set_uid_db_num(&ctl->oldsaved, old_rec, unum); + set_uid_db_num(&ctl->newsaved, new_rec, unum); + } } else return PS_ERROR; } /* multi-line loop for UIDL reply */ @@ -1199,8 +1211,9 @@ static int pop3_is_old(int sock, struct query *ctl, int num) /* is the given message old? */ { - struct idlist *newl; - if (!ctl->oldsaved) + struct uid_db_record *rec; + + if (!uid_db_n_records(&ctl->oldsaved)) return (num <= last); else if (dofastuidl) { @@ -1211,30 +1224,31 @@ /* in fast uidl, we manipulate the old list only! */ - if ((newl = id_find(&ctl->oldsaved, num))) + if ((rec = find_uid_by_num(&ctl->oldsaved, num))) { /* we already have the id! */ - return(newl->val.status.mark != UID_UNSEEN); + return(rec->status != UID_UNSEEN); } /* get the uidl first! */ if (pop3_getuidl(sock, num, id, sizeof(id)) != PS_SUCCESS) return(TRUE); - if ((newl = str_in_list(&ctl->oldsaved, id, FALSE))) { + if ((rec = find_uid_by_id(&ctl->oldsaved, id))) { /* we already have the id! */ - newl->val.status.num = num; - return(newl->val.status.mark != UID_UNSEEN); + set_uid_db_num(&ctl->oldsaved, rec, num); + return(rec->status != UID_UNSEEN); } /* save it */ - newl = save_str(&ctl->oldsaved, id, UID_UNSEEN); - newl->val.status.num = num; + rec = uid_db_insert(&ctl->oldsaved, id, UID_UNSEEN); + set_uid_db_num(&ctl->oldsaved, rec, num); + return(FALSE); + } else { + rec = find_uid_by_num(&ctl->newsaved, num); + return !rec || rec->status != UID_UNSEEN; } - else - return ((newl = id_find(&ctl->newsaved, num)) != NULL && - newl->val.status.mark != UID_UNSEEN); } #ifdef UNUSED @@ -1353,28 +1367,31 @@ static void mark_uid_seen(struct query *ctl, int number) /* Tell the UID code we've seen this. */ { - struct idlist *sdp; + struct uid_db_record *rec; - if ((sdp = id_find(&ctl->newsaved, number))) - sdp->val.status.mark = UID_SEEN; + if ((rec = find_uid_by_num(&ctl->newsaved, number))) + rec->status = UID_SEEN; /* mark it as seen in oldsaved also! In case, we do not swap the lists * (say, due to socket error), the same mail will not be downloaded * again. */ - if ((sdp = id_find(&ctl->oldsaved, number))) - sdp->val.status.mark = UID_SEEN; + if ((rec = find_uid_by_num(&ctl->oldsaved, number))) + rec->status = UID_SEEN; } static int pop3_delete(int sock, struct query *ctl, int number) /* delete a given message */ { + struct uid_db_record *rec; int ok; mark_uid_seen(ctl, number); /* actually, mark for deletion -- doesn't happen until QUIT time */ ok = gen_transact(sock, "DELE %d", number); if (ok != PS_SUCCESS) return(ok); - delete_str(dofastuidl ? &ctl->oldsaved : &ctl->newsaved, number); + + rec = find_uid_by_num(dofastuidl ? &ctl->oldsaved : &ctl->newsaved, number); + rec->status = UID_DELETED; return(PS_SUCCESS); } --- fetchmail/uid.c 7 May 2010 13:49:16 -0000 1.1.1.2 +++ fetchmail/uid.c 24 May 2010 16:02:10 -0000 1.1.1.2.2.6 @@ -108,6 +108,19 @@ static struct idlist *scratchlist; /** Read saved IDs from \a idfile and attach to each host in \a hostlist. */ +static int dump_saved_uid(struct uid_db_record *rec, void *unused) +{ + char *t; + + (void)unused; + + t = sdump(rec->id, rec->id_len); + report_build(stdout, " %s", t); + free(t); + + return 0; +} + void initialize_saved_lists(struct query *hostlist, const char *idfile) { struct stat statbuf; @@ -117,9 +130,9 @@ /* make sure lists are initially empty */ for (ctl = hostlist; ctl; ctl = ctl->next) { ctl->skipped = (struct idlist *)NULL; - ctl->oldsaved = (struct idlist *)NULL; - ctl->newsaved = (struct idlist *)NULL; - ctl->oldsavedend = &ctl->oldsaved; + + init_uid_db(&ctl->oldsaved); + init_uid_db(&ctl->newsaved); } errno = 0; @@ -212,11 +225,11 @@ *atsign = '\0'; host = atsign + 1; - /* find proper list and save it */ + /* find uidl db and save it */ for (ctl = hostlist; ctl; ctl = ctl->next) { if (strcasecmp(host, ctl->server.queryname) == 0 && strcasecmp(user, ctl->remotename) == 0) { - save_str(&ctl->oldsaved, id, UID_SEEN); + uid_db_insert(&ctl->oldsaved, id, UID_SEEN); break; } } @@ -248,14 +261,12 @@ { report_build(stdout, GT_("Old UID list from %s:"), ctl->server.pollname); - idp = ctl->oldsaved; - if (!idp) + + if (!uid_db_n_records(&ctl->oldsaved)) report_build(stdout, GT_(" <empty>")); - else for (idp = ctl->oldsaved; idp; idp = idp->next) { - char *t = sdump(idp->id, strlen(idp->id)); - report_build(stdout, " %s", t); - free(t); - } + else + traverse_uid_db(&ctl->oldsaved, dump_saved_uid, NULL); + report_complete(stdout, "\n"); } @@ -273,13 +284,18 @@ /** Assert that all UIDs marked deleted in query \a ctl have actually been expunged. */ -void expunge_uids(struct query *ctl) +static int mark_as_expunged_if(struct uid_db_record *rec, void *unused) { - struct idlist *idl; + (void)unused; + + if (rec->status == UID_DELETED) rec->status = UID_EXPUNGED; + return 0; +} - for (idl = dofastuidl ? ctl->oldsaved : ctl->newsaved; idl; idl = idl->next) - if (idl->val.status.mark == UID_DELETED) - idl->val.status.mark = UID_EXPUNGED; +void expunge_uids(struct query *ctl) +{ + traverse_uid_db(dofastuidl ? &ctl->oldsaved : &ctl->newsaved, + mark_as_expunged_if, NULL); } static const char *str_uidmark(int mark) @@ -303,16 +319,32 @@ } } -static void dump_list(const struct idlist *idp) +static int dump_uid_db_record(struct uid_db_record *rec, void *arg) +{ + unsigned *n_recs; + char *t; + + n_recs = arg; + --*n_recs; + + t = sdump(rec->id, rec->id_len); + report_build(stdout, " %s = %s%s", t, str_uidmark(rec->status), *n_recs ? "," : ""); + free(t); + + return 0; +} + +static void dump_uid_db(struct uid_db *db) { - if (!idp) { + unsigned n_recs; + + n_recs = uid_db_n_records(db); + if (!n_recs) { report_build(stdout, GT_(" <empty>")); - } else while (idp) { - char *t = sdump(idp->id, strlen(idp->id)); - report_build(stdout, " %s = %s%s", t, str_uidmark(idp->val.status.mark), idp->next ? "," : ""); - free(t); - idp = idp->next; + return; } + + traverse_uid_db(db, dump_uid_db_record, &n_recs); } /* finish a query */ @@ -323,10 +355,10 @@ { if (dofastuidl) { report_build(stdout, GT_("Merged UID list from %s:"), ctl->server.pollname); - dump_list(ctl->oldsaved); + dump_uid_db(&ctl->oldsaved); } else { report_build(stdout, GT_("New UID list from %s:"), ctl->server.pollname); - dump_list(ctl->newsaved); + dump_uid_db(&ctl->newsaved); } report_complete(stdout, "\n"); } @@ -347,15 +379,10 @@ * with UIDLs from that account in .fetchids, there is no way for * them to ever get garbage-collected. */ - if (ctl->newsaved) + if (uid_db_n_records(&ctl->newsaved)) { - /* old state of mailbox may now be irrelevant */ - struct idlist *temp = ctl->oldsaved; - if (outlevel >= O_DEBUG) - report(stdout, GT_("swapping UID lists\n")); - ctl->oldsaved = ctl->newsaved; - ctl->newsaved = (struct idlist *) NULL; - free_str_list(&temp); + swap_uid_db_data(&ctl->newsaved, &ctl->oldsaved); + clear_uid_db(&ctl->newsaved); } /* in fast uidl, there is no need to swap lists: the old state of * mailbox cannot be discarded! */ @@ -372,29 +399,53 @@ /* this is now a merged list! the mails which were seen in this * poll are marked here. */ report_build(stdout, GT_("Merged UID list from %s:"), ctl->server.pollname); - dump_list(ctl->oldsaved); + dump_uid_db(&ctl->oldsaved); report_complete(stdout, "\n"); } - if (ctl->newsaved) + if (uid_db_n_records(&ctl->newsaved)) { /* new state of mailbox is not reliable */ if (outlevel >= O_DEBUG) report(stdout, GT_("discarding new UID list\n")); - free_str_list(&ctl->newsaved); - ctl->newsaved = (struct idlist *) NULL; + clear_uid_db(&ctl->newsaved); } } /** Reset the number associated with each id */ void uid_reset_num(struct query *ctl) { - struct idlist *idp; - for (idp = ctl->oldsaved; idp; idp = idp->next) - idp->val.status.num = 0; + reset_uid_db_nums(&ctl->oldsaved); } /** Write list of seen messages, at end of run. */ +static int count_seen_deleted(struct uid_db_record *rec, void *arg) +{ + if (rec->status == UID_SEEN || rec->status == UID_DELETED) + ++*(long *)arg; + return 0; +} + +struct write_saved_info { + struct query *ctl; + FILE *fp; +}; + +static int write_uid_db_record(struct uid_db_record *rec, void *arg) +{ + struct write_saved_info *info; + int rc; + + if (!(rec->status == UID_SEEN || rec->status == UID_DELETED)) + return 0; + + info = arg; + rc = fprintf(info->fp, "%s@%s %s\n", + info->ctl->remotename, info->ctl->server.queryname, + rec->id); + return rc < 0 ? -1 : 0; +} + void write_saved_lists(struct query *hostlist, const char *idfile) { long idcount; @@ -404,12 +455,8 @@ /* if all lists are empty, nuke the file */ idcount = 0; - for (ctl = hostlist; ctl; ctl = ctl->next) { - for (idp = ctl->oldsaved; idp; idp = idp->next) - if (idp->val.status.mark == UID_SEEN - || idp->val.status.mark == UID_DELETED) - idcount++; - } + for (ctl = hostlist; ctl; ctl = ctl->next) + traverse_uid_db(&ctl->oldsaved, count_seen_deleted, &idcount); /* either nuke the file or write updated last-seen IDs */ if (!idcount && !scratchlist) @@ -428,19 +475,22 @@ report(stdout, GT_("Writing fetchids file.\n")); (void)unlink(newnam); /* remove file/link first */ if ((tmpfp = fopen(newnam, "w")) != (FILE *)NULL) { + struct write_saved_info info; int errflg = 0; + + info.fp = tmpfp; + for (ctl = hostlist; ctl; ctl = ctl->next) { - for (idp = ctl->oldsaved; idp; idp = idp->next) - if (idp->val.status.mark == UID_SEEN - || idp->val.status.mark == UID_DELETED) - if (fprintf(tmpfp, "%s@%s %s\n", - ctl->remotename, ctl->server.queryname, idp->id) < 0) { - int e = errno; - report(stderr, GT_("Write error on fetchids file %s: %s\n"), newnam, strerror(e)); - errflg = 1; - goto bailout; - } + info.ctl = ctl; + + if (traverse_uid_db(&ctl->oldsaved, write_uid_db_record, &info) < 0) { + int e = errno; + report(stderr, GT_("Write error on fetchids file %s: %s\n"), newnam, strerror(e)); + errflg = 1; + goto bailout; + } } + for (idp = scratchlist; idp; idp = idp->next) if (EOF == fputs(idp->id, tmpfp)) { int e = errno; --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ fetchmail/uid_db.c 27 May 2010 21:14:18 -0000 1.23.2.39 @@ -0,0 +1,547 @@ +/* + POP3 UID db + + Copyright (c) 2010 MAD Partners, Ltd. (rwe...@ms...) + + This file is being published in accordance with the GPLv2 terms + contained in the COPYING file being part of the fetchmail + 6.3.17 release, including the OpenSSL exemption. +*/ + +/* + * includes + */ +#include <inttypes.h> +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "xmalloc.h" +#include "uid_db.h" + +/* + * constants + */ +enum { + MIN_RECORDS = 16 /* arbitrary */ +}; + +/* + * types + */ +struct pat_node { + struct pat_node *ptrs_[3]; + unsigned bit_ndx; + uint16_t bit_ofs, bit_mask; + + struct uid_db_record *rec; +}; + +/* + The idea behind this is that the 'left' pointer of + a node is accessible as ptrs(np)[-1] and the right + one a ptrs(np)[1]. This implies that no separate codepaths + for 'symmetric left- and right-cases' are needed. +*/ +#define ptrs(np) ((np)->ptrs_ + 1) + +/* + * routines + */ +/* + ** various helpers + */ +static inline unsigned bit_ofs(unsigned bit_ndx) +{ + return bit_ndx >> 3; +} + +static inline unsigned bit_mask(unsigned bit_ndx) +{ + return 1 << (bit_ndx & 7); +} + +/* + ** PATRICIA trie insertion + */ +/* + *** walkers + */ +static struct pat_node *walk_down(struct uid_db *db, struct uid_db_record *rec, + struct pat_node ***edgep, struct pat_node **parentp) +{ + /* + Find the pat node whose id is 'most similar' to the id + stored in rec->id. Return a pointer to this node. + 'parentp' and 'edgep' are output-only parameters which + will point to the parent of returned node and to the edge + pointer going from the parent to the returned node after + the call has returned. + + This routine is intended for inserts only. + */ + struct pat_node *cur, **edge; + unsigned bit_ndx, v, ofs; + + cur = db->pat_root; + ofs = -1; + do { + if (ofs != cur->bit_ofs) { + ofs = cur->bit_ofs; + v = ofs < rec->id_len ? rec->id[ofs] : 0; + } + + bit_ndx = cur->bit_ndx; + edge = ptrs(cur) + (v & cur->bit_mask ? 1 : -1); + } while ((cur = *edge) && cur->bit_ndx > bit_ndx); + + *parentp = + (struct pat_node *) + ((unsigned char *)edge - (v & bit_mask(bit_ndx) ? + offsetof(struct pat_node, ptrs_[2]) + : offsetof(struct pat_node, ptrs_[0]))); + *edgep = edge; + return cur; +} + +static inline struct pat_node *walk_up(unsigned diff_ndx, struct pat_node **parent) +{ + struct pat_node *p, *np; + + np = *parent; + + while ((p = *ptrs(np)) && p->bit_ndx > diff_ndx) + np = p; + + *parent = p; + return np; +} + +/* + *** bit fiddling + */ +static inline unsigned first_set_bit_in_char(unsigned v) +{ + return ffs(v) - 1; +} + +static int find_first_diff_bit(struct uid_db_record const *r0, + struct uid_db_record const *r1) +{ + /* + Return the bit_ndx of the first differing bit in + r0->id and r1->id or -1 if the strings are identical. + */ + struct uid_db_record const *long_id; + unsigned ofs, max; + unsigned char v; + + max = r0->id_len; + if (max > r1->id_len) { + max = r1->id_len; + long_id = r0; + } else + long_id = r1; + + ofs = 0; + do + v = r0->id[ofs] ^ r1->id[ofs]; + while (!v && ++ofs < max); + + if (!v) { + if (r0->id_len == r1->id_len) return -1; + v = long_id->id[ofs]; + } + + return first_set_bit_in_char(v) + ofs * 8; +} + +static inline unsigned bit_set(struct pat_node const *np, + struct uid_db_record const *rec) +{ + return np->bit_ofs < rec->id_len ? + rec->id[np->bit_ofs] & np->bit_mask : 0; +} + +/* + *** node allocation + */ +static struct pat_node *get_pat_node(struct uid_db_record *rec) +{ + struct pat_node *np; + + np = xmalloc(sizeof(*np)); + np->rec = rec; + rec->next = NULL; + return np; +} + +static struct pat_node *get_standalone_node(struct uid_db_record *rec) +{ + /* + Return a pat_node suitable for being inserted on the 'left edge' + of the trie, ie either linked to a node whose left pointer was zero + or being inserted as root node into an empty trie. The bit_ndx of + the pat_node is set to the index corresponding with the highest + set bit in rec->id. + + NB: This is a bad choice when UIDs share a common prefix because + this implies that the root node will cause a bit to be tested which + is non-zero in all other nodes, adding a theoretically redundant + level to the trie. This is (to the best of my knowledge) un- + fortunately unavoidable if nodes with different key lengths need + to be supported. + */ + struct pat_node *np; + + np = get_pat_node(rec); + + np->bit_ndx = first_set_bit_in_char(*rec->id); + np->bit_mask = bit_mask(np->bit_ndx); + np->bit_ofs = bit_ofs(np->bit_ndx); + + return np; +} + +/* + *** various helpers + */ +static inline int record_id_equal(struct uid_db_record const *r0, + struct uid_db_record const *r1) +{ + return + r0->id_len == r1->id_len + && memcmp(r0->id, r1->id, r0->id_len) == 0; +} + +static struct uid_db_record *append_to_list(struct uid_db_record **recp, + struct uid_db_record *rec) +{ + while (*recp) recp = &(*recp)->next; + *recp = rec; + + rec->next = NULL; + return rec; +} + +/* + *** insert routine + */ +static struct uid_db_record *pat_insert(struct uid_db *db, + struct uid_db_record *rec) +{ + /* + Insert the record pointed to by rec in the (potentially empty) + PATRICIA trie pointed to by db->pat_root and return rec. + */ + struct pat_node *np, *closest, *parent, **edge; + int me, bit_ndx; + + if (!db->pat_root) { + np = get_standalone_node(rec); + ptrs(np)[-1] = *ptrs(np) = NULL; + ptrs(np)[1] = np; + + db->pat_root = np; + return rec; + } + + closest = walk_down(db, rec, &edge, &parent); + + if (closest) { + bit_ndx = find_first_diff_bit(closest->rec, rec); + if (bit_ndx < 0) + return append_to_list(&closest->rec->next, rec); + + np = get_pat_node(rec); + + np->bit_ndx = bit_ndx; + np->bit_mask = bit_mask(bit_ndx); + np->bit_ofs = bit_ofs(bit_ndx); + } else + np = get_standalone_node(rec); + + if (parent->bit_ndx > np->bit_ndx) { + closest = walk_up(np->bit_ndx, &parent); + + if (!parent) edge = &db->pat_root; + else edge = ptrs(parent)[-1] == closest ? + ptrs(parent) - 1 : ptrs(parent) + 1; + *ptrs(closest) = np; + } + + *edge = np; + *ptrs(np) = parent; + + me = bit_set(np, rec) ? 1 : -1; + ptrs(np)[me] = np; + ptrs(np)[-me] = closest; + + return rec; +} + +/* + ** general db insertion + */ +static struct uid_db_record *get_uid_db_record(char const *id, unsigned status) +{ + struct uid_db_record *rec; + size_t id_len; + + rec = xmalloc(sizeof(*rec)); + + id_len = strlen(id); + rec->id = memcpy(xmalloc(id_len + 1), id, id_len + 1); + rec->id_len = id_len; + rec->status = status; + rec->num = 0; + + return rec; +} + +static void insert_into_records(struct uid_db *db, + struct uid_db_record *rec) +{ + unsigned next, want; + + next = db->records_next; + + if (next == db->records_max) { + want = db->records_max *= 2; + db->records = xrealloc(db->records, want * sizeof(rec)); + } + + rec->pos = next; + db->records[next] = rec; + db->records_next = next + 1; +} + +struct uid_db_record *uid_db_insert(struct uid_db *db, + char const *id, unsigned status) +{ + struct uid_db_record *rec; + + rec = get_uid_db_record(id, status); + insert_into_records(db, rec); + return pat_insert(db, rec); +} + +/* + ** message number index + */ +void set_uid_db_num(struct uid_db *db, struct uid_db_record *rec, + unsigned num) +{ + /* + Set the message number of the record pointed to by rec to num + and insert it into the num_ndx of the uid_db pointed to by db + at position corresponding with num. The num_ndx lookup array + is grown as needed. Message numbers are expected to 'generally' + be recorded in ascending order and hence, no provisions are + made to deal with the potentially quadratic complexity of + inserting a sequence of numbers into an array such that it + needs to be grown continuously. + */ + struct num_ndx *num_ndx; + unsigned have, want; + + num_ndx = &db->num_ndx; + + if (num_ndx->end_value > num) { + have = num_ndx->pos_0_value - num_ndx->end_value + 1; + want = num_ndx->pos_0_value - num + 1; + num_ndx->end_value = num; + + num_ndx->records = xrealloc(num_ndx->records, want * sizeof(rec)); + do num_ndx->records[--want] = NULL; while (want > have); + } + + num_ndx->records[uid_db_num_ofs(num_ndx, num)] = rec; +} + +void reset_uid_db_nums(struct uid_db *db) +{ + /* + Reset the message numbers of all uid_db_records stored + in the uid_db pointed to by db. The corresponding num_ndx + lookup array is afterwards freed and the num_ndx end_value + adjusted in order to indicate an 'empty' message number + index. + */ + struct uid_db_record **rec; + struct num_ndx *num_ndx; + unsigned ndx; + + num_ndx = &db->num_ndx; + + if (num_ndx->end_value < num_ndx->pos_0_value) { + ndx = num_ndx->pos_0_value - num_ndx->end_value; + while (ndx) { + rec = num_ndx->records + --ndx; + if (*rec) (*rec)->num = 0; + } + + num_ndx->end_value = num_ndx->pos_0_value + 1; + + free(num_ndx->records); + num_ndx->records = NULL; + } +} + +/* + ** search routines + */ +struct uid_db_record *find_uid_by_id(struct uid_db *db, char const *id) +{ + struct pat_node *np; + struct uid_db_record *rec; + unsigned v, bit_ndx, ofs; + size_t len; + + np = db->pat_root; + if (np) { + len = strlen(id); + ofs = -1; + do { + if (ofs != np->bit_ofs) { + ofs = np->bit_ofs; + v = ofs < len ? id[ofs] : 0; + } + + bit_ndx = np->bit_ndx; + np = ptrs(np)[v & bit_mask(bit_ndx) ? 1 : -1]; + } while (np && np->bit_ndx > bit_ndx); + + if (!np) return NULL; + + rec = np->rec; + return rec->id_len == len && memcmp(id, rec->id, len) == 0 ? + rec : NULL; + } + + return NULL; +} + +struct uid_db_record *last_uid_in_db(struct uid_db *db, char const *id) +{ + struct uid_db_record *rec; + + rec = find_uid_by_id(db, id); + if (!rec) return NULL; + + while (rec->next) rec = rec->next; + return rec; +} + +/* + ** destruction + */ +static void free_uid_list(struct uid_db_record *rec) +{ + if (rec->next) free_uid_list(rec->next); + + xfree(rec->id); + xfree(rec); +} + +static void free_pat_trie(struct pat_node *np) +{ + /* + Free the PATRCIA-trie pointed to by np and all + uid_db_records contained in it. + + The algorithm implemented below is: + + 1. Load the left pointer of the node pointed to by + np into next. + + 2. If the result is not NULL, + 2a) Set the left pointer to NULL. + 2b) Goto 1 if next points to a child of np. + + 3. Load the right pointer of the node pointed to by + np into next. + + 4. If the result is not NULL, + 4a) Set the right pointer to NULL. + 4b) Goto 1 id next points to a child of np. + + 5. Load next with the parent pointer of np. + + 6. Free np->rec and np. + + 7. Set np to next and goto 1 if it is not null. + */ + struct pat_node *next; + + do { + next = ptrs(np)[-1]; + if (next) { + ptrs(np)[-1] = NULL; + if (next->bit_ndx > np->bit_ndx) continue; + } + + next = ptrs(np)[1]; + if (next) { + ptrs(np)[1] = NULL; + if (next->bit_ndx > np->bit_ndx) continue; + } + + next = *ptrs(np); + + free_uid_list(np->rec); + free(np); + } while ((np = next)); +} + +void free_uid_db(struct uid_db *db) +{ + if (db->pat_root) free_pat_trie(db->pat_root); + + xfree(db->records); + xfree(db->num_ndx.records); +} + +/* + ** various public interfaces + */ +void init_uid_db(struct uid_db *db) +{ + struct num_ndx *num_ndx; + + db->pat_root = NULL; + + db->records = xmalloc(MIN_RECORDS * sizeof(*db->records)); + db->records_max = MIN_RECORDS; + db->records_next = 0; + + num_ndx = &db->num_ndx; + num_ndx->pos_0_value = num_ndx->end_value = -1; + num_ndx->records = NULL; +} + +void swap_uid_db_data(struct uid_db *db_0, struct uid_db *db_1) +{ + struct uid_db tmp; + + tmp = *db_0; + *db_0 = *db_1; + *db_1 = tmp; +} + +int traverse_uid_db(struct uid_db *db, + uid_db_traversal_routine *r, void *arg) +{ + struct uid_db_record **recs; + unsigned ndx, max; + int rc; + + rc = 0; + ndx = 0; + max = db->records_next; + recs = db->records; + while (ndx < max && (rc = r(recs[ndx], arg)) == 0) + ++ndx; + + return rc; +} --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ fetchmail/uid_db.h 27 May 2010 21:09:47 -0000 1.3.2.20 @@ -0,0 +1,155 @@ +/* + POP3 UID database + + Copyright (c) 2010 MAD Partners, Ltd. (rwe...@ms...) + + This file is being published in accordance with the GPLv2 terms + contained in the COPYING file being part of the fetchmail + 6.3.17 release, including the OpenSSL exemption. +*/ +#ifndef fetchmail_uid_db_h +#define fetchmail_uid_db_h + +/* + * includes + */ +#include <stddef.h> + +/* types */ +struct uid_db_record { + char *id; + size_t id_len; + + /* + num - message number assigned by server + status - message status (eg seen, deleted, ...) + pos - position in record list + */ + unsigned num, status, pos; + + struct uid_db_record *next; +}; + +struct num_ndx { + /* + Used to find uid records by message number. + + pos_0_value - highest message number + end_value - lowest known message number + + Grows downwards because the fastuidl-code may record + message numbers in non-ascending order but the + lookup array should ideally only be large enough to + store pointers to interesting ('new') messages. + */ + struct uid_db_record **records; + unsigned pos_0_value, end_value; +}; + +struct uid_db +{ + struct pat_node *pat_root; + + struct uid_db_record **records; + unsigned records_max, records_next; + + struct num_ndx num_ndx; +}; + +typedef int uid_db_traversal_routine(struct uid_db_record *, void *); + +/* + * routines + */ +/* + ** initialization/ finalization + */ +void init_uid_db(struct uid_db *db); + +void free_uid_db(struct uid_db *db); + +static inline void clear_uid_db(struct uid_db *db) +{ + free_uid_db(db); + init_uid_db(db); +} + +/* + ** message number index handling + */ +static inline unsigned uid_db_num_ofs(struct num_ndx const *num_ndx, unsigned num) +{ + return num_ndx->pos_0_value - num; +} + +void set_uid_db_num(struct uid_db *db, struct uid_db_record *rec, + unsigned num); + +static inline void set_uid_db_num_pos_0(struct uid_db *db, unsigned num) +{ + db->num_ndx.pos_0_value = num; + db->num_ndx.end_value = num + 1; +} + +void reset_uid_db_nums(struct uid_db *db); + +/* + ** various uidl db manipulatiors + */ +struct uid_db_record *uid_db_insert(struct uid_db *db, + char const *id, unsigned status); + +void swap_uid_db_data(struct uid_db *db_0, struct uid_db *db_1); + +/* + ** search routines + */ +struct uid_db_record *find_uid_by_id(struct uid_db *db, char const *id); + +static inline struct uid_db_record * +find_uid_by_num(struct uid_db *db, unsigned num) +{ + struct num_ndx *num_ndx; + + num_ndx = &db->num_ndx; + return num >= num_ndx->end_value ? + num_ndx->records[uid_db_num_ofs(num_ndx, num)] : NULL; +} + +static inline struct uid_db_record * +find_uid_by_pos(struct uid_db *db, unsigned pos) +{ + return pos < db->records_next ? db->records[pos] : NULL; +} + +static inline struct uid_db_record * +first_uid_in_db(struct uid_db *db, char const *id) +{ + return find_uid_by_id(db, id); +} + +struct uid_db_record *last_uid_in_db(struct uid_db *db, char const *id); + +/* + ** various accessors + */ +static inline unsigned uid_db_n_records(struct uid_db const *db) +{ + return db->records_next; +} + +/* + Traverses the struct uid_db records array in insert order, + invoking the subroutine pointed to by r with a pointer to + each record and the arg pointer as arguments. If the return + value of that is non-zero, traverse_uid_db immediately returns + with this value. Otherwise, zero is returned after the last + record was visited. + + The uid_db_traversal_routine must not modify the uid_db during + traversal. +*/ +int traverse_uid_db(struct uid_db *db, + uid_db_traversal_routine *r, void *arg); + +#endif |
From: Matthias A. <mat...@gm...> - 2010-06-01 10:25:39
|
Am 01.06.2010 00:13, schrieb Rainer Weikusat: > Rainer Weikusat <rwe...@ms...> writes: >> "Matthias Andree" <mat...@gm...> writes: > > [...] > >>> No urge here, I'm not sure if I'll have the time to merge that stuff >>> over the week-end. May well turn mid to end of next week before I get >>> to it. >> >> I will nevertheless do that since it is a fairly simple operation, >> which I can put into a few idle minutes in my (presently) usual 10h+ >> workdays > > Update why this hasn't happened so far: I have a deadline to meet at > 3pm 2010/06/01 and because of this, I've had no time to do anything > except working towards this goal so far (been writing code since > 2010/05/31 11am so far). Thanks for letting me know. This isn't a problem. I appreciate that real-life obligations precede fetchmail contributions. Focus on your RL project, and good speed with it. -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-06-01 00:14:06
|
Rainer Weikusat <rwe...@ms...> writes: > "Matthias Andree" <mat...@gm...> writes: [...] >> No urge here, I'm not sure if I'll have the time to merge that stuff >> over the week-end. May well turn mid to end of next week before I get >> to it. > > I will nevertheless do that since it is a fairly simple operation, > which I can put into a few idle minutes in my (presently) usual 10h+ > workdays Update why this hasn't happened so far: I have a deadline to meet at 3pm 2010/06/01 and because of this, I've had no time to do anything except working towards this goal so far (been writing code since 2010/05/31 11am so far). |
From: Matthias A. <mat...@gm...> - 2010-05-28 19:16:13
|
Rainer Weikusat wrote on 2010-05-28: > "Matthias Andree" <mat...@gm...> writes: >> D'oh. A std dev' that's thrice the average. There's a considerable >> chance for a negative number. :-} > > It is conceivable that I am miscalculating the number. But this really > just communicates that the distribution is very asymmetric. I'd rather trust your numerics than the underlying assumption it were normally distributed data :-) (Normalverteilung/Gauss'sche). Ok, there's always room to miss out on a "2" in an exponent and inadvertently confuse variance and standard deviation, but still why would the count of messages in the mailboxes be normally distributed? There isn't a single reason I could see, and this voids all chances that the arith. average were in 1-sigma, 2-sigma, 3-sigma confidence intervals, or anywhere near the expected value. >>> UID handling is still the dominant operation (processor-time wise) >>> performed by fetchmail. >> >> No it's not, see below -- you're not grouping other code properly. If >> oprofile samples call stacks so that it can cumulate per caller (like >> du(1) does) you'll see it's not. Otherwise, we'll resort to the >> preclusion reasoning below. > > This seems to be a disagreement about the meaning of the term > 'dominant'. It may well have some defined meaning I am not aware of. > I am not a repurposed physicist or person with any other kind of > degree , just a programmer and even this just because I spent > something like twenty years learning about this stuff whatever I could > gather from sources available to me. I'm not sure if there is a formal definiton, but I'd say it's "dominant" if you can neglect other terms in a sum, for instance. Or if you'd write the >> or << relation operators, often considered as possible with one order of magnitude in between. That we don't have after your fix. That doesn't make your fix less useful, however. As written for my 16,000 message inbox, fetchmail starts fetching immediately, whereas the unpatched fetchmail ponders UIDL more than a minute. > I will nevertheless do that since it is a fairly simple operation, > which I can put into a few idle minutes in my (presently) usual 10h+ > workdays and something I can absolutely not do is put much more effort > into this, except maybe two o'clock in the morning after working for > ten hours. Since the code exists, works and (as far as I could > determine) mitigates the problem it was supposed to mitigate, any > justification I could conceivable use to explain to my boss why I am > even looking at this during the time I am neither eating nor sleeping > is gone :-(. Which is good enough. I've always been meaning to do that, but there was always something else more important to do, so your final patch will be most welcome. I expect to make minor changes to fix compiler warnings and add side comments to NEWS and thereabouts (it's always interesting to see what you get when you compile on different Linux flavours, Solaris, NetBSD, FreeBSD, MacOS X), but from what I've seen so far, not much will be needed. Possibly the merge with my existing advance development branch (not yet published) for 6.4.0 or 6.5.0 may take another hour - I don't think I'll put this feature in a 6.3.X release. This 6.3 branch was originally just a drop-in compatible stopgap branch so nobody had an excuse to sidestep the >100 fixes between 6.2.5 and 6.3.0, but as we all know, makeshift arrangements often prove to be long-lived. I would never have expected it to spawn 18 nontrivial releases to date, and 6.3.18 would appear, too. The fetchmail code however really needs updating and leave compatibility with pre-2000 operating systems (with before C-89, before-POSIX2001) behind - especially the way it's sprayed all over the source with a gazillion of #ifdef makes maintaining the code a time-consuming and error-prone process, and my experience is if you don't have good improvements as selling points users will just stick to the old versions that you've left behind long ago. I know I may disappoint some users that way, but in the end I need to see to real life -- which means I'm not pampering up systems where vendor support ceased years ago. Efficiency improvements that users can see are one such selling point to motivate a migration... :) -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-28 11:30:10
|
"Matthias Andree" <mat...@gm...> writes: > Am 28.05.2010, 02:34 Uhr, schrieb Rainer Weikusat: > >> The numbers contained in this mail come from two 24h 'observations' of >> the old and new UID handling code done via oprofile. They are based on >> querying 119 POP3 accounts. Some statistics about the .fetchids files >> (everything in lines): >> >> total: 35004 >> max: 8019 >> min: 1 >> average: 294.15 >> median: 46 >> std. dev: 823.87 > > D'oh. A std dev' that's thrice the average. There's a considerable > chance for a negative number. :-} It is conceivable that I am miscalculating the number. But this really just communicates that the distribution is very asymmetric. >> UID handling is still the dominant operation (processor-time wise) >> performed by fetchmail. > > No it's not, see below -- you're not grouping other code properly. If > oprofile samples call stacks so that it can cumulate per caller (like > du(1) does) you'll see it's not. Otherwise, we'll resort to the > preclusion reasoning below. This seems to be a disagreement about the meaning of the term 'dominant'. It may well have some defined meaning I am not aware of. I am not a repurposed physicist or person with any other kind of degree , just a programmer and even this just because I spent something like twenty years learning about this stuff whatever I could gather from sources available to me. [...] >> I hope to be able to produce something like a final patch at some time >> late evening tomorrow. > > No urge here, I'm not sure if I'll have the time to merge that stuff > over the week-end. May well turn mid to end of next week before I get > to it. I will nevertheless do that since it is a fairly simple operation, which I can put into a few idle minutes in my (presently) usual 10h+ workdays and something I can absolutely not do is put much more effort into this, except maybe two o'clock in the morning after working for ten hours. Since the code exists, works and (as far as I could determine) mitigates the problem it was supposed to mitigate, any justification I could conceivable use to explain to my boss why I am even looking at this during the time I am neither eating nor sleeping is gone :-(. |
From: Matthias A. <mat...@gm...> - 2010-05-28 04:24:49
|
Am 28.05.2010, 02:34 Uhr, schrieb Rainer Weikusat: > The numbers contained in this mail come from two 24h 'observations' of > the old and new UID handling code done via oprofile. They are based on > querying 119 POP3 accounts. Some statistics about the .fetchids files > (everything in lines): > > total: 35004 > max: 8019 > min: 1 > average: 294.15 > median: 46 > std. dev: 823.87 D'oh. A std dev' that's thrice the average. There's a considerable chance for a negative number. :-} The actual problem is that the model doesn't fit, you don't have a Normal distribution here, confirmed by the table I'm not quoting. > UID handling is still the dominant operation (processor-time wise) > performed by fetchmail. No it's not, see below -- you're not grouping other code properly. If oprofile samples call stacks so that it can cumulate per caller (like du(1) does) you'll see it's not. Otherwise, we'll resort to the preclusion reasoning below. > The relative amount of time is down from > 93.46% to 34.36%. During the measurement interval, 1,540,732 event > samples were collected for the old code and 63,092 for the new > code: The trie/ array-based UID db has used about 4.09% of the CPU > time the list-based one did use. The relative times of the functions among themselves are hard to interpret, particularly if you don't group the other functions properly. The total sample count is far more interesting, and there you're down from 1.52 million samples (out of 1.65 million) to 0.07 million (out of 0.18 million), where the other stuff remains more or less the same. IOW, there is a base amount of ~0.12 million samples NOT related to UID handling. Where the old code in your load scenario was more than 12x that of the other code (i. e. "dominated"), it's now roughly half the effort, which is no longer "dominating". > I hope to be able to produce something like a final patch at some time > late evening tomorrow. No urge here, I'm not sure if I'll have the time to merge that stuff over the week-end. May well turn mid to end of next week before I get to it. Thank you. -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-28 02:35:05
|
The numbers contained in this mail come from two 24h 'observations' of the old and new UID handling code done via oprofile. They are based on querying 119 POP3 accounts. Some statistics about the .fetchids files (everything in lines): total: 35004 max: 8019 min: 1 average: 294.15 median: 46 std. dev: 823.87 Grouped by size with increments of 16: 0 36 (30.2%) 16 18 (15.1%) 32 7 (5.8%) 48 5 (4.2%) 64 3 (2.5%) 80 2 (1.6%) 96 4 (3.3%) 128 4 (3.3%) 144 1 (0.8%) 160 2 (1.6%) 176 5 (4.2%) 192 2 (1.6%) 208 1 (0.8%) 240 1 (0.8%) 256 4 (3.3%) 288 3 (2.5%) 304 1 (0.8%) 320 1 (0.8%) 384 1 (0.8%) 480 1 (0.8%) 512 2 (1.6%) 544 1 (0.8%) 560 1 (0.8%) 720 1 (0.8%) 736 1 (0.8%) 912 1 (0.8%) 976 1 (0.8%) 1104 1 (0.8%) 1200 1 (0.8%) 1264 1 (0.8%) 1344 2 (1.6%) 1408 1 (0.8%) 1744 1 (0.8%) 2000 1 (0.8%) 2064 1 (0.8%) This is data about the presently existing files. All actual numbers were 'somewhat smaller' for the 'new code' run and 'even more smaller' for the old one. UID handling is still the dominant operation (processor-time wise) performed by fetchmail. The relative amount of time is down from 93.46% to 34.36%. During the measurement interval, 1,540,732 event samples were collected for the old code and 63,092 for the new code: The trie/ array-based UID db has used about 4.09% of the CPU time the list-based one did use. By function, list based (at least 0.1%): 929507 56.3856 fetchmail.dimap id_find 585333 35.5074 fetchmail.dimap str_in_list 25892 1.5707 fetchmail.dimap save_str [w/ tail pointers] 15089 0.9153 fetchmail.dimap yylex 10827 0.6568 fetchmail.dimap do_session 10265 0.6227 fetchmail.dimap initialize_saved_lists 5908 0.3584 fetchmail.dimap yyparse 4941 0.2997 fetchmail.dimap SockRead 3557 0.2158 fetchmail.dimap strlcpy 3094 0.1877 fetchmail.dimap gen_recv 2976 0.1805 fetchmail.dimap main 2838 0.1722 fetchmail.dimap pop3_getrange 2783 0.1688 fetchmail.dimap optmerge 2642 0.1603 fetchmail.dimap stuffline 2614 0.1586 fetchmail.dimap write_saved_lists 2519 0.1528 fetchmail.dimap imap_response 2167 0.1315 fetchmail.dimap set_timeout 2162 0.1312 fetchmail.dimap imap_is_old 2072 0.1257 fetchmail.dimap xstrdup 1791 0.1086 fetchmail.dimap xmalloc By function, trie/ array: 36916 20.1071 fetchmail.uid-db uid_db_insert 26176 14.2573 fetchmail.uid-db find_uid_by_id 16937 9.2251 fetchmail.uid-db yylex 10700 5.8280 fetchmail.uid-db initialize_saved_lists 10248 5.5818 fetchmail.uid-db do_session 6041 3.2904 fetchmail.uid-db yyparse 4994 2.7201 fetchmail.uid-db SockRead 3728 2.0305 fetchmail.uid-db stuffline 3620 1.9717 fetchmail.uid-db strlcpy 3507 1.9102 fetchmail.uid-db free_uid_db 3212 1.7495 fetchmail.uid-db main 3174 1.7288 fetchmail.uid-db gen_recv 2791 1.5202 fetchmail.uid-db optmerge 2728 1.4859 fetchmail.uid-db imap_response 2641 1.4385 fetchmail.uid-db set_timeout 2428 1.3225 fetchmail.uid-db xmalloc 2356 1.2832 fetchmail.uid-db imap_is_old 2271 1.2369 fetchmail.uid-db write_uid_db_record 1901 1.0354 fetchmail.uid-db pop3_getrange 1803 0.9820 fetchmail.uid-db T.169 1786 0.9728 fetchmail.uid-db parsecmdline 1532 0.8344 fetchmail.uid-db traverse_uid_db 1413 0.7696 fetchmail.uid-db count_seen_deleted 1182 0.6438 fetchmail.uid-db free_uid_list 1112 0.6057 fetchmail.uid-db envquery 1112 0.6057 fetchmail.uid-db pop3_is_old 1098 0.5980 fetchmail.uid-db mark_as_expunged_if 936 0.5098 fetchmail.uid-db _fini 892 0.4858 fetchmail.uid-db imap_getrange 880 0.4793 fetchmail.uid-db gen_transact 812 0.4423 fetchmail.uid-db readbody 774 0.4216 fetchmail.uid-db SSL_verify_callback 747 0.4069 fetchmail.uid-db SockOpen 736 0.4009 fetchmail.uid-db readheaders 735 0.4003 fetchmail.uid-db parse_resp_code 687 0.3742 fetchmail.uid-db set_signal_handler 680 0.3704 fetchmail.uid-db pop3_ok 673 0.3666 fetchmail.uid-db prependdir 597 0.3252 fetchmail.uid-db SSLOpen 577 0.3143 fetchmail.uid-db xstrdup 575 0.3132 fetchmail.uid-db SockWrite 512 0.2789 fetchmail.uid-db write_saved_lists 499 0.2718 fetchmail.uid-db query_host 494 0.2691 fetchmail.uid-db in_drop_domain 481 0.2620 fetchmail.uid-db prc_parse_file 471 0.2565 fetchmail.uid-db do_protocol 463 0.2522 fetchmail.uid-db pop3_getauth 400 0.2179 fetchmail.uid-db pmg_stuffline 371 0.2021 fetchmail.uid-db parse_netrc 368 0.2004 fetchmail.uid-db imap_getauth 338 0.1841 fetchmail.uid-db MD5Update 314 0.1710 fetchmail.uid-db free_netrc 311 0.1694 fetchmail.uid-db init_uid_db 296 0.1612 fetchmail.uid-db yyensure_buffer_stack 292 0.1590 fetchmail.uid-db copy_str_list 291 0.1585 fetchmail.uid-db fm_getaddrinfo 291 0.1585 fetchmail.uid-db host_fqdn 283 0.1541 fetchmail.uid-db imap_ok 278 0.1514 fetchmail.uid-db capa_probe 270 0.1471 fetchmail.uid-db reply_hack 264 0.1438 fetchmail.uid-db __libc_csu_init 256 0.1394 fetchmail.uid-db _start 242 0.1318 fetchmail.uid-db report_init 240 0.1307 fetchmail.uid-db interface_approve 233 0.1269 fetchmail.uid-db T.136 232 0.1264 fetchmail.uid-db yy_flush_buffer 223 0.1215 fetchmail.uid-db list_merge 211 0.1149 fetchmail.uid-db save_str 209 0.1138 fetchmail.uid-db T.140 196 0.1068 fetchmail.uid-db terminate_poll 192 0.1046 fetchmail.uid-db interface_init I hope to be able to produce something like a final patch at some time late evening tomorrow. |
From: Matthias A. <mat...@gm...> - 2010-05-26 02:09:13
|
Am 25.05.2010, 22:28 Uhr, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: > > [...] > >> For reasons beyond me, --fastuidl 5 does not currently appear to work. >> Fetchmail pulls the whole UID list. >> Haven't checked yet if that's an artifact of your code. > > Hmm ... 'works for me' (in daemon mode). Euh yes. I forgot some of the gory details and re-read the manpage. --fastuidl apparently works as advertised also with your patch. (I was wondering why the heck it would/should have been broken.) > Done. The modified code is meanwhile pulling mail for our customers > and so far, it seems that the relative amount of (processing) time > spent on dealing with POP3 UIDs has dropped from about 93.5% to about > 35%. Provided the oprofile daemon doesn't crash overnight (appears to > be a hobby of it), I should have some (extrapolated) absolute numbers > tomorrow before noon. I will also reformat my section headings to > remove the consecutive asterisks and I would like to again dump the > documentation comments for the trivial subroutines[*] but apart from > that, I don't believe that there will be further non-trivial changes. OK. 35% still seems much, but OTOH if you're pulling few messages and keeping many that might be in order. Let's see what oprofile samples and tune later. > [*] I've encountered too many cases of code which was moved to > a different place and modified somewhat in order to accomplish > a somewhat different function, but - of course - leaving all > the original comments intact to put any trust in comments. Ugh. -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-25 22:29:00
|
"Matthias Andree" <mat...@gm...> writes: [...] > For reasons beyond me, --fastuidl 5 does not currently appear to work. > Fetchmail pulls the whole UID list. > Haven't checked yet if that's an artifact of your code. Hmm ... 'works for me' (in daemon mode). >> NB: The Makefile.am change should probably be moved to the POP3 >> section of Makefile.am > > Move it to libfm_a_SOURCES=... - that lets the linker sort out if the > uid_db code is needed. Done. The modified code is meanwhile pulling mail for our customers and so far, it seems that the relative amount of (processing) time spent on dealing with POP3 UIDs has dropped from about 93.5% to about 35%. Provided the oprofile daemon doesn't crash overnight (appears to be a hobby of it), I should have some (extrapolated) absolute numbers tomorrow before noon. I will also reformat my section headings to remove the consecutive asterisks and I would like to again dump the documentation comments for the trivial subroutines[*] but apart from that, I don't believe that there will be further non-trivial changes. [*] I've encountered too many cases of code which was moved to a different place and modified somewhat in order to accomplish a somewhat different function, but - of course - leaving all the original comments intact to put any trust in comments. |
From: Matthias A. <mat...@gm...> - 2010-05-25 09:21:36
|
Am 24.05.2010, 21:30 Uhr, schrieb Rainer Weikusat: > - I've tried to document everything not completely trivial in > form of comments in the uid_db.c file. IMO, this is a mixed > blessing because an algorithm is not necessarily easier to > understand when described in natural language and everybody > contemplating to do something with code needs to understand > that, anyway ... For me, the usual policy is saying in comments what can't be better written as code, so for descriptions of algorithms, it is just a documentation of the interface; internal commentary is just a sketch and a terse high-level summary, and not a duplication of what the code is saying. Examples: lock.?, xmalloc.? I will want to reformat this so that Doxygen can parse it, but I'll hold off on that until I get a "final" version of your patch so that we're not stepping on our toes. In particular, /** ... */ is special for Doxygen and marks the comment as a documentation comment that precedes the code (for instance, a function or struct or struct members) it is documenting. I think this change is large and important enough to warrant a new minor version bump later to 6.4.0. I'll also use that possibility to toss a lot of compatibility cruft for early-1990s systems overboard and require POSIX 2001 and full C89 conformance (and reserve the right to require C99). -- Matthias Andree |
From: Matthias A. <mat...@gm...> - 2010-05-24 22:49:18
|
Am 24.05.2010, 21:30 Uhr, schrieb Rainer Weikusat: > Below is a diff of my modified fetchmail tree against the 6.3.17 > release tree. This should still be considered somewhat experimental > because it has so far only been tested with a single e-mail account in > 'oneshot' mode. According to prelimnary measurement, the modification > seems to be advantageous if there are sixty or more e-mail kept on a > server. Additional changes I haven't mentioned so far: First impression: looks good. (Athlon XP 2500+ with slowish DDR RAM, probably 6 - 7 years old) With 550 messages, barely a difference, CPU time down from 20 to 15 ms. With 16,400 messages, a huge difference, CPU time down from 62 to 1.3 s (includes --fetchlimit 100 and TLS either time). > - I've dropped the l from the uid in all the names. 'UIDL' likely > means 'uid listing' and hence, what is listed must be an uid indeed - UID means "unique identifier" > - The message number index now grows downwards from the > message count to the lowest-numbered 'interesting' ('new') > message seen. In order to be able to also use this, I've > changed the POP3 'stock uidl code' to only record the > message numbers of 'new' messages in keep mode and to treat > a NULL return to the 'find by message number' operations as > 'message is old'. The message number index will either only > be large enough to hold the new message numbers or (at > worst) be large enough to hold the lowest number of a > 'seen' message tested by the fastuidl code For reasons beyond me, --fastuidl 5 does not currently appear to work. Fetchmail pulls the whole UID list. Haven't checked yet if that's an artifact of your code. > - I've tried to document everything not completely trivial in > form of comments in the uid_db.c file. IMO, this is a mixed > blessing because an algorithm is not necessarily easier to > understand when described in natural language and everybody > contemplating to do something with code needs to understand > that, anyway ... > > After sending this mail, I will integrate this into our 'product > fetchmail' and I may be able to publish some real comparisons between > old and new code tomorrow. > > NB: The Makefile.am change should probably be moved to the POP3 > section of Makefile.am Move it to libfm_a_SOURCES=... - that lets the linker sort out if the uid_db code is needed. -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-24 21:30:50
|
Below is a diff of my modified fetchmail tree against the 6.3.17 release tree. This should still be considered somewhat experimental because it has so far only been tested with a single e-mail account in 'oneshot' mode. According to prelimnary measurement, the modification seems to be advantageous if there are sixty or more e-mail kept on a server. Additional changes I haven't mentioned so far: - I've dropped the l from the uid in all the names. 'UIDL' likely means 'uid listing' and hence, what is listed must be an uid - The message number index now grows downwards from the message count to the lowest-numbered 'interesting' ('new') message seen. In order to be able to also use this, I've changed the POP3 'stock uidl code' to only record the message numbers of 'new' messages in keep mode and to treat a NULL return to the 'find by message number' operations as 'message is old'. The message number index will either only be large enough to hold the new message numbers or (at worst) be large enough to hold the lowest number of a 'seen' message tested by the fastuidl code - I've tried to document everything not completely trivial in form of comments in the uid_db.c file. IMO, this is a mixed blessing because an algorithm is not necessarily easier to understand when described in natural language and everybody contemplating to do something with code needs to understand that, anyway ... After sending this mail, I will integrate this into our 'product fetchmail' and I may be able to publish some real comparisons between old and new code tomorrow. NB: The Makefile.am change should probably be moved to the POP3 section of Makefile.am -------------- --- fetchmail/Makefile.am 7 May 2010 13:49:15 -0000 1.1.1.6 +++ fetchmail/Makefile.am 23 May 2010 21:23:09 -0000 1.1.1.6.2.2 @@ -68,7 +68,7 @@ driver.c transact.c sink.c smtp.c \ idlist.c uid.c mxget.c md5ify.c cram.c gssapi.c \ opie.c interface.c netrc.c \ - unmime.c conf.c checkalias.c \ + unmime.c conf.c checkalias.c uid_db.h uid_db.c\ lock.h lock.c \ rcfile_l.l rcfile_y.y ucs/norm_charmap.c if POP2_ENABLE --- fetchmail/fetchmail.c 7 May 2010 13:49:14 -0000 1.1.1.4 +++ fetchmail/fetchmail.c 24 May 2010 16:02:10 -0000 1.1.1.4.2.3 @@ -1537,6 +1537,14 @@ return(st); } +static int print_id_of(struct uid_db_record *rec, void *unused) +{ + (void)unused; + + printf("\t%s\n", rec->id); + return 0; +} + static void dump_params (struct runctl *runp, struct query *querylist, flag implicit) /* display query parameters in English */ @@ -1938,20 +1946,14 @@ if (ctl->server.protocol > P_POP2 && MAILBOX_PROTOCOL(ctl)) { - if (!ctl->oldsaved) + int count; + + if (!(count = uid_db_n_records(&ctl->oldsaved))) printf(GT_(" No UIDs saved from this host.\n")); else { - struct idlist *idp; - int count = 0; - - for (idp = ctl->oldsaved; idp; idp = idp->next) - ++count; - printf(GT_(" %d UIDs saved.\n"), count); - if (outlevel >= O_VERBOSE) - for (idp = ctl->oldsaved; idp; idp = idp->next) - printf("\t%s\n", idp->id); + traverse_uid_db(&ctl->oldsaved, print_id_of, NULL); } } --- fetchmail/fetchmail.h 7 May 2010 13:49:13 -0000 1.1.1.4 +++ fetchmail/fetchmail.h 24 May 2010 16:02:10 -0000 1.1.1.4.2.3 @@ -39,6 +39,8 @@ # include "trio/trio.h" #endif +#include "uid_db.h" + /* We need this for strstr */ #if !defined(HAVE_STRSTR) && !defined(strstr) char *strstr(const char *, const char *); @@ -387,8 +389,7 @@ int smtp_socket; /* socket descriptor for SMTP connection */ unsigned int uid; /* UID of user to deliver to */ struct idlist *skipped; /* messages skipped on the mail server */ - struct idlist *oldsaved, *newsaved; - struct idlist **oldsavedend; + struct uid_db oldsaved, newsaved; char lastdigest[DIGESTLEN]; /* last MD5 hash seen on this connection */ char *folder; /* folder currently being polled */ --- fetchmail/pop3.c 7 May 2010 13:49:19 -0000 1.1.1.4 +++ fetchmail/pop3.c 24 May 2010 16:02:10 -0000 1.1.1.4.2.6 @@ -21,6 +21,7 @@ #include "fetchmail.h" #include "socket.h" #include "i18n.h" +#include "uid_db.h" #ifdef OPIE_ENABLE #ifdef __cplusplus @@ -845,20 +846,20 @@ last_nr = count + 1; while (first_nr < last_nr - 1) { - struct idlist *newl; + struct uid_db_record *rec; try_nr = (first_nr + last_nr) / 2; if ((ok = pop3_getuidl(sock, try_nr, id, sizeof(id))) != 0) return ok; - if ((newl = str_in_list(&ctl->oldsaved, id, FALSE))) + if ((rec = find_uid_by_id(&ctl->oldsaved, id))) { - flag mark = newl->val.status.mark; + flag mark = rec->status; if (mark == UID_DELETED || mark == UID_EXPUNGED) { if (outlevel >= O_VERBOSE) report(stderr, GT_("id=%s (num=%u) was deleted, but is still present!\n"), id, try_nr); /* just mark it as seen now! */ - newl->val.status.mark = mark = UID_SEEN; + rec->status = mark = UID_SEEN; } /* narrow the search region! */ @@ -872,7 +873,7 @@ first_nr = try_nr; /* save the number */ - newl->val.status.num = try_nr; + set_uid_db_num(&ctl->oldsaved, rec, try_nr); } else { @@ -881,8 +882,8 @@ last_nr = try_nr; /* save it */ - newl = save_str(&ctl->oldsaved, id, UID_UNSEEN); - newl->val.status.num = try_nr; + rec = uid_db_insert(&ctl->oldsaved, id, UID_UNSEEN); + set_uid_db_num(&ctl->oldsaved, rec, try_nr); } } if (outlevel >= O_DEBUG && last_nr <= count) @@ -910,34 +911,38 @@ * of messages to get. * + Otherwise run a binary search to determine the last known message */ + struct uid_db_record *rec; int ok, nolinear = 0; - int first_nr, list_len, try_id, try_nr, add_id; + int first_nr, n_recs, try_id, try_nr, add_id; int num; char id [IDLEN+1]; if ((ok = pop3_gettopid(sock, 1, id, sizeof(id))) != 0) return ok; - if( ( first_nr = str_nr_in_list(&ctl->oldsaved, id) ) == -1 ) { + rec = first_uid_in_db(&ctl->oldsaved, id); + if (!rec) { /* the first message is unknown -> all messages are new */ *newp = *countp; return 0; } - + first_nr = rec->pos; + /* check where we expect the latest known message */ - list_len = count_list( &ctl->oldsaved ); - try_id = list_len - first_nr; /* -1 + 1 */ + n_recs = uid_db_n_records(&ctl->oldsaved); + try_id = n_recs - first_nr; /* -1 + 1 */ if( try_id > 1 ) { if( try_id <= *countp ) { if ((ok = pop3_gettopid(sock, try_id, id, sizeof(id))) != 0) return ok; - - try_nr = str_nr_last_in_list(&ctl->oldsaved, id); + + rec = last_uid_in_db(&ctl->oldsaved, id); + try_nr = rec ? rec->pos : -1; } else { try_id = *countp+1; try_nr = -1; } - if( try_nr != list_len -1 ) { + if( try_nr != n_recs -1 ) { /* some messages inbetween have been deleted... */ if( try_nr == -1 ) { nolinear = 1; @@ -955,7 +960,9 @@ if ((ok = pop3_gettopid(sock, try_id, id, sizeof(id))) != 0) return ok; - try_nr = str_nr_in_list(&ctl->oldsaved, id); + + rec = find_uid_by_id(&ctl->oldsaved, id); + try_nr = rec ? rec->pos : -1; } if( try_nr == -1 ) { try_id--; @@ -968,17 +975,16 @@ } } /* the first try_id messages are known -> copy them to the newsaved list */ - for( num = first_nr; num < list_len; num++ ) + for( num = first_nr; num < n_recs; num++ ) { - struct idlist *newl = save_str(&ctl->newsaved, - str_from_nr_list(&ctl->oldsaved, num), - UID_UNSEEN); - newl->val.status.num = num - first_nr + 1; + rec = find_uid_by_pos(&ctl->oldsaved, num); + + rec = uid_db_insert(&ctl->newsaved, rec->id, rec->status); + set_uid_db_num(&ctl->newsaved, rec, num - first_nr + 1); } if( nolinear ) { - free_str_list(&ctl->oldsaved); - ctl->oldsaved = 0; + clear_uid_db(&ctl->oldsaved); last = try_id; } @@ -997,7 +1003,7 @@ (void)folder; /* Ensure that the new list is properly empty */ - ctl->newsaved = (struct idlist *)NULL; + clear_uid_db(&ctl->newsaved); #ifdef MBOX /* Alain Knaff suggests this, but it's not RFC standard */ @@ -1032,6 +1038,9 @@ int fastuidl; char id [IDLEN+1]; + set_uid_db_num_pos_0(&ctl->oldsaved, *countp); + set_uid_db_num_pos_0(&ctl->newsaved, *countp); + /* should we do fast uidl this time? */ fastuidl = ctl->fastuidl; if (*countp > 7 && /* linear search is better if there are few mails! */ @@ -1091,14 +1100,13 @@ if (parseuid(buf, &unum, id, sizeof(id)) == PS_SUCCESS) { - struct idlist *old, *newl; + struct uid_db_record *old_rec, *new_rec; - newl = save_str(&ctl->newsaved, id, UID_UNSEEN); - newl->val.status.num = unum; + new_rec = uid_db_insert(&ctl->newsaved, id, UID_UNSEEN); - if ((old = str_in_list(&ctl->oldsaved, id, FALSE))) + if ((old_rec = find_uid_by_id(&ctl->oldsaved, id))) { - flag mark = old->val.status.mark; + flag mark = old_rec->status; if (mark == UID_DELETED || mark == UID_EXPUNGED) { /* XXX FIXME: switch 3 occurrences from @@ -1108,9 +1116,9 @@ if (outlevel >= O_VERBOSE) report(stderr, GT_("id=%s (num=%d) was deleted, but is still present!\n"), id, (int)unum); /* just mark it as seen now! */ - old->val.status.mark = mark = UID_SEEN; + old_rec->status = mark = UID_SEEN; } - newl->val.status.mark = mark; + new_rec->status = mark; if (mark == UID_UNSEEN) { (*newp)++; @@ -1127,10 +1135,14 @@ * swap the lists (say, due to socket error), * the same mail will not be downloaded again. */ - old = save_str(&ctl->oldsaved, id, UID_UNSEEN); + old_rec = uid_db_insert(&ctl->oldsaved, id, UID_UNSEEN); + } /* save the number */ - old->val.status.num = unum; + if (new_rec->status == UID_UNSEEN || !ctl->keep) { + set_uid_db_num(&ctl->oldsaved, old_rec, unum); + set_uid_db_num(&ctl->newsaved, new_rec, unum); + } } else return PS_ERROR; } /* multi-line loop for UIDL reply */ @@ -1199,8 +1211,9 @@ static int pop3_is_old(int sock, struct query *ctl, int num) /* is the given message old? */ { - struct idlist *newl; - if (!ctl->oldsaved) + struct uid_db_record *rec; + + if (!uid_db_n_records(&ctl->oldsaved)) return (num <= last); else if (dofastuidl) { @@ -1211,30 +1224,31 @@ /* in fast uidl, we manipulate the old list only! */ - if ((newl = id_find(&ctl->oldsaved, num))) + if ((rec = find_uid_by_num(&ctl->oldsaved, num))) { /* we already have the id! */ - return(newl->val.status.mark != UID_UNSEEN); + return(rec->status != UID_UNSEEN); } /* get the uidl first! */ if (pop3_getuidl(sock, num, id, sizeof(id)) != PS_SUCCESS) return(TRUE); - if ((newl = str_in_list(&ctl->oldsaved, id, FALSE))) { + if ((rec = find_uid_by_id(&ctl->oldsaved, id))) { /* we already have the id! */ - newl->val.status.num = num; - return(newl->val.status.mark != UID_UNSEEN); + set_uid_db_num(&ctl->oldsaved, rec, num); + return(rec->status != UID_UNSEEN); } /* save it */ - newl = save_str(&ctl->oldsaved, id, UID_UNSEEN); - newl->val.status.num = num; + rec = uid_db_insert(&ctl->oldsaved, id, UID_UNSEEN); + set_uid_db_num(&ctl->oldsaved, rec, num); + return(FALSE); + } else { + rec = find_uid_by_num(&ctl->newsaved, num); + return !rec || rec->status != UID_UNSEEN; } - else - return ((newl = id_find(&ctl->newsaved, num)) != NULL && - newl->val.status.mark != UID_UNSEEN); } #ifdef UNUSED @@ -1353,28 +1367,31 @@ static void mark_uid_seen(struct query *ctl, int number) /* Tell the UID code we've seen this. */ { - struct idlist *sdp; + struct uid_db_record *rec; - if ((sdp = id_find(&ctl->newsaved, number))) - sdp->val.status.mark = UID_SEEN; + if ((rec = find_uid_by_num(&ctl->newsaved, number))) + rec->status = UID_SEEN; /* mark it as seen in oldsaved also! In case, we do not swap the lists * (say, due to socket error), the same mail will not be downloaded * again. */ - if ((sdp = id_find(&ctl->oldsaved, number))) - sdp->val.status.mark = UID_SEEN; + if ((rec = find_uid_by_num(&ctl->oldsaved, number))) + rec->status = UID_SEEN; } static int pop3_delete(int sock, struct query *ctl, int number) /* delete a given message */ { + struct uid_db_record *rec; int ok; mark_uid_seen(ctl, number); /* actually, mark for deletion -- doesn't happen until QUIT time */ ok = gen_transact(sock, "DELE %d", number); if (ok != PS_SUCCESS) return(ok); - delete_str(dofastuidl ? &ctl->oldsaved : &ctl->newsaved, number); + + rec = find_uid_by_num(dofastuidl ? &ctl->oldsaved : &ctl->newsaved, number); + rec->status = UID_DELETED; return(PS_SUCCESS); } --- fetchmail/uid.c 7 May 2010 13:49:16 -0000 1.1.1.2 +++ fetchmail/uid.c 24 May 2010 16:02:10 -0000 1.1.1.2.2.6 @@ -108,6 +108,19 @@ static struct idlist *scratchlist; /** Read saved IDs from \a idfile and attach to each host in \a hostlist. */ +static int dump_saved_uid(struct uid_db_record *rec, void *unused) +{ + char *t; + + (void)unused; + + t = sdump(rec->id, rec->id_len); + report_build(stdout, " %s", t); + free(t); + + return 0; +} + void initialize_saved_lists(struct query *hostlist, const char *idfile) { struct stat statbuf; @@ -117,9 +130,9 @@ /* make sure lists are initially empty */ for (ctl = hostlist; ctl; ctl = ctl->next) { ctl->skipped = (struct idlist *)NULL; - ctl->oldsaved = (struct idlist *)NULL; - ctl->newsaved = (struct idlist *)NULL; - ctl->oldsavedend = &ctl->oldsaved; + + init_uid_db(&ctl->oldsaved); + init_uid_db(&ctl->newsaved); } errno = 0; @@ -212,11 +225,11 @@ *atsign = '\0'; host = atsign + 1; - /* find proper list and save it */ + /* find uidl db and save it */ for (ctl = hostlist; ctl; ctl = ctl->next) { if (strcasecmp(host, ctl->server.queryname) == 0 && strcasecmp(user, ctl->remotename) == 0) { - save_str(&ctl->oldsaved, id, UID_SEEN); + uid_db_insert(&ctl->oldsaved, id, UID_SEEN); break; } } @@ -248,14 +261,12 @@ { report_build(stdout, GT_("Old UID list from %s:"), ctl->server.pollname); - idp = ctl->oldsaved; - if (!idp) + + if (!uid_db_n_records(&ctl->oldsaved)) report_build(stdout, GT_(" <empty>")); - else for (idp = ctl->oldsaved; idp; idp = idp->next) { - char *t = sdump(idp->id, strlen(idp->id)); - report_build(stdout, " %s", t); - free(t); - } + else + traverse_uid_db(&ctl->oldsaved, dump_saved_uid, NULL); + report_complete(stdout, "\n"); } @@ -273,13 +284,18 @@ /** Assert that all UIDs marked deleted in query \a ctl have actually been expunged. */ -void expunge_uids(struct query *ctl) +static int mark_as_expunged_if(struct uid_db_record *rec, void *unused) { - struct idlist *idl; + (void)unused; + + if (rec->status == UID_DELETED) rec->status = UID_EXPUNGED; + return 0; +} - for (idl = dofastuidl ? ctl->oldsaved : ctl->newsaved; idl; idl = idl->next) - if (idl->val.status.mark == UID_DELETED) - idl->val.status.mark = UID_EXPUNGED; +void expunge_uids(struct query *ctl) +{ + traverse_uid_db(dofastuidl ? &ctl->oldsaved : &ctl->newsaved, + mark_as_expunged_if, NULL); } static const char *str_uidmark(int mark) @@ -303,16 +319,32 @@ } } -static void dump_list(const struct idlist *idp) +static int dump_uid_db_record(struct uid_db_record *rec, void *arg) +{ + unsigned *n_recs; + char *t; + + n_recs = arg; + --*n_recs; + + t = sdump(rec->id, rec->id_len); + report_build(stdout, " %s = %s%s", t, str_uidmark(rec->status), *n_recs ? "," : ""); + free(t); + + return 0; +} + +static void dump_uid_db(struct uid_db *db) { - if (!idp) { + unsigned n_recs; + + n_recs = uid_db_n_records(db); + if (!n_recs) { report_build(stdout, GT_(" <empty>")); - } else while (idp) { - char *t = sdump(idp->id, strlen(idp->id)); - report_build(stdout, " %s = %s%s", t, str_uidmark(idp->val.status.mark), idp->next ? "," : ""); - free(t); - idp = idp->next; + return; } + + traverse_uid_db(db, dump_uid_db_record, &n_recs); } /* finish a query */ @@ -323,10 +355,10 @@ { if (dofastuidl) { report_build(stdout, GT_("Merged UID list from %s:"), ctl->server.pollname); - dump_list(ctl->oldsaved); + dump_uid_db(&ctl->oldsaved); } else { report_build(stdout, GT_("New UID list from %s:"), ctl->server.pollname); - dump_list(ctl->newsaved); + dump_uid_db(&ctl->newsaved); } report_complete(stdout, "\n"); } @@ -347,15 +379,10 @@ * with UIDLs from that account in .fetchids, there is no way for * them to ever get garbage-collected. */ - if (ctl->newsaved) + if (uid_db_n_records(&ctl->newsaved)) { - /* old state of mailbox may now be irrelevant */ - struct idlist *temp = ctl->oldsaved; - if (outlevel >= O_DEBUG) - report(stdout, GT_("swapping UID lists\n")); - ctl->oldsaved = ctl->newsaved; - ctl->newsaved = (struct idlist *) NULL; - free_str_list(&temp); + swap_uid_db_data(&ctl->newsaved, &ctl->oldsaved); + clear_uid_db(&ctl->newsaved); } /* in fast uidl, there is no need to swap lists: the old state of * mailbox cannot be discarded! */ @@ -372,29 +399,53 @@ /* this is now a merged list! the mails which were seen in this * poll are marked here. */ report_build(stdout, GT_("Merged UID list from %s:"), ctl->server.pollname); - dump_list(ctl->oldsaved); + dump_uid_db(&ctl->oldsaved); report_complete(stdout, "\n"); } - if (ctl->newsaved) + if (uid_db_n_records(&ctl->newsaved)) { /* new state of mailbox is not reliable */ if (outlevel >= O_DEBUG) report(stdout, GT_("discarding new UID list\n")); - free_str_list(&ctl->newsaved); - ctl->newsaved = (struct idlist *) NULL; + clear_uid_db(&ctl->newsaved); } } /** Reset the number associated with each id */ void uid_reset_num(struct query *ctl) { - struct idlist *idp; - for (idp = ctl->oldsaved; idp; idp = idp->next) - idp->val.status.num = 0; + reset_uid_db_nums(&ctl->oldsaved); } /** Write list of seen messages, at end of run. */ +static int count_seen_deleted(struct uid_db_record *rec, void *arg) +{ + if (rec->status == UID_SEEN || rec->status == UID_DELETED) + ++*(long *)arg; + return 0; +} + +struct write_saved_info { + struct query *ctl; + FILE *fp; +}; + +static int write_uid_db_record(struct uid_db_record *rec, void *arg) +{ + struct write_saved_info *info; + int rc; + + if (!(rec->status == UID_SEEN || rec->status == UID_DELETED)) + return 0; + + info = arg; + rc = fprintf(info->fp, "%s@%s %s\n", + info->ctl->remotename, info->ctl->server.queryname, + rec->id); + return rc < 0 ? -1 : 0; +} + void write_saved_lists(struct query *hostlist, const char *idfile) { long idcount; @@ -404,12 +455,8 @@ /* if all lists are empty, nuke the file */ idcount = 0; - for (ctl = hostlist; ctl; ctl = ctl->next) { - for (idp = ctl->oldsaved; idp; idp = idp->next) - if (idp->val.status.mark == UID_SEEN - || idp->val.status.mark == UID_DELETED) - idcount++; - } + for (ctl = hostlist; ctl; ctl = ctl->next) + traverse_uid_db(&ctl->oldsaved, count_seen_deleted, &idcount); /* either nuke the file or write updated last-seen IDs */ if (!idcount && !scratchlist) @@ -428,19 +475,22 @@ report(stdout, GT_("Writing fetchids file.\n")); (void)unlink(newnam); /* remove file/link first */ if ((tmpfp = fopen(newnam, "w")) != (FILE *)NULL) { + struct write_saved_info info; int errflg = 0; + + info.fp = tmpfp; + for (ctl = hostlist; ctl; ctl = ctl->next) { - for (idp = ctl->oldsaved; idp; idp = idp->next) - if (idp->val.status.mark == UID_SEEN - || idp->val.status.mark == UID_DELETED) - if (fprintf(tmpfp, "%s@%s %s\n", - ctl->remotename, ctl->server.queryname, idp->id) < 0) { - int e = errno; - report(stderr, GT_("Write error on fetchids file %s: %s\n"), newnam, strerror(e)); - errflg = 1; - goto bailout; - } + info.ctl = ctl; + + if (traverse_uid_db(&ctl->oldsaved, write_uid_db_record, &info) < 0) { + int e = errno; + report(stderr, GT_("Write error on fetchids file %s: %s\n"), newnam, strerror(e)); + errflg = 1; + goto bailout; + } } + for (idp = scratchlist; idp; idp = idp->next) if (EOF == fputs(idp->id, tmpfp)) { int e = errno; --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ fetchmail/uid_db.c 24 May 2010 19:18:14 -0000 1.23.2.36 @@ -0,0 +1,588 @@ +/* + POP3 UID db + + Copyright (c) 2010 MAD Partners, Ltd. (rwe...@ms...) + + This file is being published in accordance with the GPLv2 terms + contained in the COPYING file being part of the fetchmail + 6.3.17 release, including the OpenSSL exemption. +*/ + +/* includes */ +#include <stdio.h> +#include <stdlib.h> +#include <string.h> + +#include "xmalloc.h" +#include "uid_db.h" + +/* constants */ +enum { + MIN_RECORDS = 16 /* arbitrary */ +}; + +/* types */ +struct pat_node { + struct pat_node *ptrs_[3]; + + /* + The bit mask is stored in the nodes (as opposed to the + offset, which is (re-)calculated on demand) because + calculating the mask is a non-trivial operation (at + least on x86). + */ + unsigned bit_ndx, bit_mask; + + struct uid_db_record *rec; +}; + +/* + The idea behind this is that the 'left' pointer of + a node is accessible as ptrs(np)[-1] and the right + one a ptrs(np)[1]. This implies that no separate codepaths + for 'symmetric left- and right-cases' are needed. +*/ +#define ptrs(np) ((np)->ptrs_ + 1) + +/* routines */ +/** various helpers */ +static inline unsigned bit_ofs(unsigned bit_ndx) +{ + return bit_ndx >> 3; +} + +static inline unsigned bit_mask(unsigned bit_ndx) +{ + return 1 << (bit_ndx & 7); +} + +/** PATRICIA trie insertion */ +/*** walkers */ +static struct pat_node *walk_down(struct uid_db *db, struct uid_db_record *rec, + struct pat_node ***edgep, struct pat_node **parentp) +{ + /* + Find the pat node whose id is 'most similar' to the id + stored in rec->id. Return a pointer to this node. + 'parentp' and 'edgep' are output-only parameters which + will point to the parent of returned node and to the edge + pointer going from the parent to the returned node after + the call has returned. + + This routine is intended for inserts only. + */ + struct pat_node *cur, **edge; + unsigned bit_ndx, v, ofs; + + cur = db->pat_root; + ofs = -1; + do { + bit_ndx = cur->bit_ndx; + + if (bit_ofs(bit_ndx) != ofs) { + ofs = bit_ofs(bit_ndx); + v = ofs < rec->id_len ? rec->id[ofs] : 0; + } + + edge = ptrs(cur) + (v & cur->bit_mask ? 1 : -1); + } while ((cur = *edge) && cur->bit_ndx > bit_ndx); + + *parentp = + (struct pat_node *) + ((unsigned char *)edge - (v & bit_mask(bit_ndx) ? + offsetof(struct pat_node, ptrs_[2]) + : offsetof(struct pat_node, ptrs_[0]))); + *edgep = edge; + return cur; +} + +static inline struct pat_node *walk_up(unsigned diff_ndx, struct pat_node **parent) +{ + /* + Walk the chain of parent pointers starting with *parent until a node + is found whose parent has a bit_ndx smaller than diff_ndx. Return + a pointer to this node and update *parent to point to its parent. + */ + struct pat_node *p, *np; + + np = *parent; + + while ((p = *ptrs(np)) && p->bit_ndx > diff_ndx) + np = p; + + *parent = p; + return np; +} + +/*** bit fiddling */ +static inline unsigned first_set_bit_in_char(unsigned v) +{ + return ffs(v) - 1; +} + +static int find_first_diff_bit(struct uid_db_record const *r0, + struct uid_db_record const *r1) +{ + /* + Return the bit_ndx of the first differing bit in + r0->id and r1->id or -1 if the strings are identical. + */ + struct uid_db_record const *long_id; + unsigned ofs, max; + unsigned char v; + + max = r0->id_len; + if (max > r1->id_len) { + max = r1->id_len; + long_id = r0; + } else + long_id = r1; + + ofs = 0; + do + v = r0->id[ofs] ^ r1->id[ofs]; + while (!v && ++ofs < max); + + if (!v) { + if (r0->id_len == r1->id_len) return -1; + v = long_id->id[ofs]; + } + + return first_set_bit_in_char(v) + ofs * 8; +} + +static inline unsigned bit_set(unsigned bit_ndx, struct uid_db_record const *rec) +{ + /* + Return non-zero if the bit corresponding with bit_ndx is set + in rec->id + */ + unsigned ofs; + + ofs = bit_ofs(bit_ndx); + if (ofs >= rec->id_len) return 0; + return rec->id[ofs] & bit_mask(bit_ndx); +} + +/*** node allocation */ +static struct pat_node *get_pat_node(struct uid_db_record *rec) +{ + /* + Allocate a pat_node, set its rec pointer to rec and the + next pointer of rec to NULL. Return pointer to the pat_node. + */ + struct pat_node *np; + + np = xmalloc(sizeof(*np)); + np->rec = rec; + rec->next = NULL; + return np; +} + +static struct pat_node *get_standalone_node(struct uid_db_record *rec) +{ + /* + Return a pat_node suitable for being inserted on the 'left edge' + of the trie, ie either linked to a node whose left pointer was zero + or being inserted as root node into an empty trie. The bit_ndx of + the pat_node is set to the index corresponding with the highest + set bit in rec->id. + + NB: This is a bad choice when UIDs share a common prefix because + this implies that the root node will cause a bit to be tested which + is non-zero in all other nodes, adding a theoretically redundant + level to the trie. This is (to the best of my knowledge) un- + fortunately unavoidable if nodes with different key lengths need + to be supported. + */ + struct pat_node *np; + + np = get_pat_node(rec); + np->bit_ndx = first_set_bit_in_char(*rec->id); + np->bit_mask = bit_mask(np->bit_ndx); + return np; +} + +/*** various helpers */ +static inline int record_id_equal(struct uid_db_record const *r0, + struct uid_db_record const *r1) +{ + return + r0->id_len == r1->id_len + && memcmp(r0->id, r1->id, r0->id_len) == 0; +} + +static struct uid_db_record *append_to_list(struct uid_db_record **recp, + struct uid_db_record *rec) +{ + /* + Append the uid_db_record pointed to by rec to the uid_db_record + list accessible as *recp and return rec. + */ + while (*recp) recp = &(*recp)->next; + *recp = rec; + + rec->next = NULL; + return rec; +} + +/*** insert routine */ +static struct uid_db_record *pat_insert(struct uid_db *db, + struct uid_db_record *rec) +{ + /* + Insert the record pointed to by rec in the (potentially empty) + PATRICIA trie pointed to by db->pat_root and return rec. + */ + struct pat_node *np, *closest, *parent, **edge; + int me, bit_ndx; + + if (!db->pat_root) { + np = get_standalone_node(rec); + ptrs(np)[-1] = *ptrs(np) = NULL; + ptrs(np)[1] = np; + + db->pat_root = np; + return rec; + } + + closest = walk_down(db, rec, &edge, &parent); + + if (closest) { + bit_ndx = find_first_diff_bit(closest->rec, rec); + if (bit_ndx < 0) + return append_to_list(&closest->rec->next, rec); + + np = get_pat_node(rec); + np->bit_ndx = bit_ndx; + np->bit_mask = bit_mask(bit_ndx); + } else + np = get_standalone_node(rec); + + if (parent->bit_ndx > np->bit_ndx) { + closest = walk_up(np->bit_ndx, &parent); + + if (!parent) edge = &db->pat_root; + else edge = ptrs(parent)[-1] == closest ? + ptrs(parent) - 1 : ptrs(parent) + 1; + *ptrs(closest) = np; + } + + *edge = np; + *ptrs(np) = parent; + + me = bit_set(np->bit_ndx, rec) ? 1 : -1; + ptrs(np)[me] = np; + ptrs(np)[-me] = closest; + + return rec; +} + +/** general db insertion */ +static struct uid_db_record *get_uid_db_record(char const *id, unsigned status) +{ + /* + Allocate a uid_db_record structure and set its id pointer to a + dynamically allocated copy of id. The status member of the + new record is set to status and its message number to zero (invalid). + A pointer to it is then returned. + */ + struct uid_db_record *rec; + size_t id_len; + + rec = xmalloc(sizeof(*rec)); + + id_len = strlen(id); + rec->id = memcpy(xmalloc(id_len + 1), id, id_len + 1); + rec->id_len = id_len; + rec->status = status; + rec->num = 0; + + return rec; +} + +static void insert_into_records(struct uid_db *db, + struct uid_db_record *rec) +{ + /* + Insert rec into the records array of the uid_db pointed + to by db. The array is grown as necessary and the + corresponding state variables of the db are updated + accordingly. The pos member of rec is set to its position + in the array. + */ + unsigned next, want; + + next = db->records_next; + + if (next == db->records_max) { + want = db->records_max *= 2; + db->records = xrealloc(db->records, want * sizeof(rec)); + } + + rec->pos = next; + db->records[next] = rec; + db->records_next = next + 1; +} + +struct uid_db_record *uid_db_insert(struct uid_db *db, + char const *id, unsigned status) +{ + /* + Create an uid_db_record whose id is id and whose status is + status and insert it into the uid_db pointed to by db. + Return a pointer to the newly created record. + */ + struct uid_db_record *rec; + + rec = get_uid_db_record(id, status); + insert_into_records(db, rec); + return pat_insert(db, rec); +} + +/** message number index */ +void set_uid_db_num(struct uid_db *db, struct uid_db_record *rec, + unsigned num) +{ + /* + Set the message number of the record pointed to by rec to num + and insert it into the num_ndx of the uid_db pointed to by db + at position corresponding with num. The num_ndx lookup array + is grown as needed. Message numbers are expected to 'generally' + be recorded in ascending order and hence, no provisions are + made to deal with the potentially quadratic complexity of + inserting a sequence of numbers into an array such that it + needs to be grown continuously. + */ + struct num_ndx *num_ndx; + unsigned have, want; + + num_ndx = &db->num_ndx; + + if (num_ndx->end_value > num) { + have = num_ndx->pos_0_value - num_ndx->end_value + 1; + want = num_ndx->pos_0_value - num + 1; + num_ndx->end_value = num; + + num_ndx->records = xrealloc(num_ndx->records, want * sizeof(rec)); + do num_ndx->records[--want] = NULL; while (want > have); + } + + num_ndx->records[uid_db_num_ofs(num_ndx, num)] = rec; +} + +void reset_uid_db_nums(struct uid_db *db) +{ + /* + Reset the message numbers of all uid_db_records stored + in the uid_db pointed to by db. The corresponding num_ndx + lookup array is afterwards freed and the num_ndx end_value + adjusted in order to indicate an 'empty' message number + index. + */ + struct uid_db_record **rec; + struct num_ndx *num_ndx; + unsigned ndx; + + num_ndx = &db->num_ndx; + + if (num_ndx->end_value < num_ndx->pos_0_value) { + ndx = num_ndx->pos_0_value - num_ndx->end_value; + while (ndx) { + rec = num_ndx->records + --ndx; + if (*rec) (*rec)->num = 0; + } + + num_ndx->end_value = num_ndx->pos_0_value + 1; + + free(num_ndx->records); + num_ndx->records = NULL; + } +} + +/** search routines */ +struct uid_db_record *find_uid_by_id(struct uid_db *db, char const *id) +{ + /* + Search for an uid_db_record whose id is id in the uid_db pointed + to by db and return a pointer to it or NULL if no such record was + found. + */ + struct pat_node *np; + struct uid_db_record *rec; + unsigned v, bit_ndx, ofs; + size_t len; + + np = db->pat_root; + if (np) { + len = strlen(id); + ofs = -1; + do { + bit_ndx = np->bit_ndx; + + if (bit_ofs(bit_ndx) != ofs) { + ofs = bit_ofs(bit_ndx); + v = ofs < len ? id[ofs] : 0; + } + + np = ptrs(np)[v & np->bit_mask ? 1 : -1]; + } while (np && np->bit_ndx > bit_ndx); + + if (!np) return NULL; + + rec = np->rec; + return rec->id_len == len && memcmp(id, rec->id, len) == 0 ? + rec : NULL; + } + + return NULL; +} + +struct uid_db_record *last_uid_in_db(struct uid_db *db, char const *id) +{ + /* + Return a pointer to the 'last' (insert order) uid_db_record + contained in the uid_db pointed to by db whose id is id or + NULL if no such record exists. + */ + struct uid_db_record *rec; + + rec = find_uid_by_id(db, id); + if (!rec) return NULL; + + while (rec->next) rec = rec->next; + return rec; +} + +/** destruction */ +static void free_uid_list(struct uid_db_record *rec) +{ + /* + Free the list of uid_db_records starting with + the record pointed to by rec. + */ + if (rec->next) free_uid_list(rec->next); + + xfree(rec->id); + xfree(rec); +} + +static void free_pat_trie(struct pat_node *np) +{ + /* + Free the PATRCIA-trie pointed to by np and all + uid_db_records contained in it. + + The algorithm implemented below is: + + 1. Load the left pointer of the node pointed to by + np into next. + + 2. If the result is not NULL, + 2a) Set the left pointer to NULL. + 2b) Goto 1 if next points to a child of np. + + 3. Load the right pointer of the node pointed to by + np into next. + + 4. If the result is not NULL, + 4a) Set the right pointer to NULL. + 4b) Goto 1 id next points to a child of np. + + 5. Load next with the parent pointer of np. + + 6. Free np->rec and np. + + 7. Set np to next and goto 1 if it is not null. + */ + struct pat_node *next; + + do { + next = ptrs(np)[-1]; + if (next) { + ptrs(np)[-1] = NULL; + if (next->bit_ndx > np->bit_ndx) continue; + } + + next = ptrs(np)[1]; + if (next) { + ptrs(np)[1] = NULL; + if (next->bit_ndx > np->bit_ndx) continue; + } + + next = *ptrs(np); + + free_uid_list(np->rec); + free(np); + } while ((np = next)); +} + +void free_uid_db(struct uid_db *db) +{ + /* + Free all dynamically allocated memory of the uid_db + pointed to by db. The structure is not reinitialized. + */ + if (db->pat_root) free_pat_trie(db->pat_root); + + xfree(db->records); + xfree(db->num_ndx.records); +} + +/** various public interfaces */ +void init_uid_db(struct uid_db *db) +{ + /* + Initialize the uid_db structure pointed to by db 'properly' + such that it represents an empty database. An array of + size MIN_RECORDS is allocated and assigned to db->records. + */ + struct num_ndx *num_ndx; + + db->pat_root = NULL; + + db->records = xmalloc(MIN_RECORDS * sizeof(*db->records)); + db->records_max = MIN_RECORDS; + db->records_next = 0; + + num_ndx = &db->num_ndx; + num_ndx->pos_0_value = num_ndx->end_value = -1; + num_ndx->records = NULL; +} + +void swap_uid_db_data(struct uid_db *db_0, struct uid_db *db_1) +{ + struct uid_db tmp; + + tmp = *db_0; + *db_0 = *db_1; + *db_1 = tmp; +} + +int traverse_uid_db(struct uid_db *db, + uid_db_traversal_routine *r, void *arg) +{ + /* + Traverses the struct uid_db records array in insert order, + invoking the subroutine pointed to by r with a pointer to + each record and the arg pointer as arguments. If the return + value of that is non-zero, traverse_uid_db immediately returns + with this value. Otherwise, zero is returned after the last + record was visited. + + The uid_db_traversal_routine must not modify the uid_db during + traversal. + */ + struct uid_db_record **recs; + unsigned ndx, max; + int rc; + + rc = 0; + ndx = 0; + max = db->records_next; + recs = db->records; + while (ndx < max && (rc = r(recs[ndx], arg)) == 0) + ++ndx; + + return rc; +} --- /dev/null 1 Jan 1970 00:00:00 -0000 +++ fetchmail/uid_db.h 24 May 2010 19:18:38 -0000 1.3.2.19 @@ -0,0 +1,141 @@ +/* + POP3 UID database + + Copyright (c) 2010 MAD Partners, Ltd. (rwe...@ms...) + + This file is being published in accordance with the GPLv2 terms + contained in the COPYING file being part of the fetchmail + 6.3.17 release, including the OpenSSL exemption. +*/ +#ifndef fetchmail_uid_db_h +#define fetchmail_uid_db_h + +/* includes */ +#include <stddef.h> + +/* types */ +struct uid_db_record { + char *id; + size_t id_len; + + /* + num - message number assigned by server + status - message status (eg seen, deleted, ...) + pos - position in record list + */ + unsigned num, status, pos; + + struct uid_db_record *next; +}; + +struct num_ndx { + /* + Used to find uid records by message number. + + pos_0_value - highest message number + end_value - lowest known message number + + Grows downwards because the fastuidl-code may record + message numbers in non-ascending order but the + lookup array should ideally only be large enough to + store pointers to interesting ('new') messages. + */ + struct uid_db_record **records; + unsigned pos_0_value, end_value; +}; + +struct uid_db +{ + struct pat_node *pat_root; + + struct uid_db_record **records; + unsigned records_max, records_next; + + struct num_ndx num_ndx; +}; + +typedef int uid_db_traversal_routine(struct uid_db_record *, void *); + +/* routines */ +/** initialization/ finalization */ +void init_uid_db(struct uid_db *db); + +void free_uid_db(struct uid_db *db); + +static inline void clear_uid_db(struct uid_db *db) +{ + free_uid_db(db); + init_uid_db(db); +} + +/** message number index handling */ +static inline unsigned uid_db_num_ofs(struct num_ndx const *num_ndx, unsigned num) +{ + return num_ndx->pos_0_value - num; +} + +void set_uid_db_num(struct uid_db *db, struct uid_db_record *rec, + unsigned num); + +static inline void set_uid_db_num_pos_0(struct uid_db *db, unsigned num) +{ + db->num_ndx.pos_0_value = num; + db->num_ndx.end_value = num + 1; +} + +void reset_uid_db_nums(struct uid_db *db); + +/** various uidl db manipulatiors */ +struct uid_db_record *uid_db_insert(struct uid_db *db, + char const *id, unsigned status); + +void swap_uid_db_data(struct uid_db *db_0, struct uid_db *db_1); + +/** search routines */ +struct uid_db_record *find_uid_by_id(struct uid_db *db, char const *id); + +static inline struct uid_db_record * +find_uid_by_num(struct uid_db *db, unsigned num) +{ + struct num_ndx *num_ndx; + + num_ndx = &db->num_ndx; + return num >= num_ndx->end_value ? + num_ndx->records[uid_db_num_ofs(num_ndx, num)] : NULL; +} + +static inline struct uid_db_record * +find_uid_by_pos(struct uid_db *db, unsigned pos) +{ + return pos < db->records_next ? db->records[pos] : NULL; +} + +static inline struct uid_db_record * +first_uid_in_db(struct uid_db *db, char const *id) +{ + return find_uid_by_id(db, id); +} + +struct uid_db_record *last_uid_in_db(struct uid_db *db, char const *id); + +/** various accessors */ +static inline unsigned uid_db_n_records(struct uid_db const *db) +{ + return db->records_next; +} + +/* + Traverses the struct uid_db records array in insert order, + invoking the subroutine pointed to by r with a pointer to + each record and the arg pointer as arguments. If the return + value of that is non-zero, traverse_uid_db immediately returns + with this value. Otherwise, zero is returned after the last + record was visited. + + The uid_db_traversal_routine must not modify the uid_db during + traversal. +*/ +int traverse_uid_db(struct uid_db *db, + uid_db_traversal_routine *r, void *arg); + +#endif |
From: Matthias A. <mat...@gm...> - 2010-05-21 03:11:39
|
Am 20.05.2010 22:34, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: > > [...] > >> Looks quite alright at first glance. As a general remark, I wonder if >> it would be worthwhile splitting the PATRICIA core code from the >> fetchmail interfacing code. > > My 'general' opinion on this is that abstractions aren't free and > creating them 'just in case' should rather be avoided. OK. > For the sake of completeness: 'implicit int' refers to not declaring a > function return value which traditionally caused compilers to assume > that the function would return a value of type 'int'. This is no > longer valid C, as of ISO/IEC 9899:1999 (E). OTOH, the set of valid > type specifiers and type specifier combinations is defined in 6.7.2 > and includes the following text: > > Each list of type specifiers shall be one of the following > sets (delimited by commas, when there is more than one set on > a line); > > [...] > > -- int, signed, or signed int > -- unsigned, or unsigned int > [6.7.2|2] > > Meaning, not only is 'unsigned' a completely legitimate way to declare > an unsigned integer but actually, 'signed' is also valid instead of > 'int'. That said, I am completely willing to comply with any 'style > rules' you consider to be essential. I stand corrected. In the light of your quote from the C99 standard, please ignore my previous comment in this regard. I was indeed confusing function return arguments with variable types. Sorry! It was less of a style than rather a technical (namely, portability) argument, which is now void. Also, I don't think I'm currently concerned with style - fetchmail source is using different styles... as long as it remains concise, it will be fine. If you're familiar with Doxygen (Javadoc), writings docs in .h files in a format that Doxygen can parse would be nice, but no need to get acquainted with it if you prefer a different style. Anything sensible is better than no documentation. :-) > General (so that it's before the 'long quoted text'): As I wrote, > integration of this code with fetchmail was completed yesterday and > the resulting binary has survived a night of pulling e-mail from my > account. The most expensive part is now the lexer. I am going to run So you're essentially saying that the rcfile scanner (lexer) dominates execution time? That's a good thing for fetchmail, and a bad thing for its scanner. The practical workaround is daemon mode, which would only read the rcfile on startup and if it changes. We could have flex inflate its tables so it gets faster, but this easily goes in the hundreds-of-kbytes for fetchmail's rather complex lexer. > this for another night tonight, this time with fastuidl > enabled. Should this also go well, I will most likely move this onto > our 'production server' tomorrow. Except a few bits of missing > functionality (such as a traversal routine), the most notable change I Just wondering, how are you currently writing out the UIDs without a traversal routine? > have done is that I have replaced the recursive trie-freeing routine > with an iterative one: While this is - at best - only marginally > faster (for my test dataset), it has the nice property of needing no > additional storage beyond what's already used for the trie itself. Good. > I've also had an idea how to restrict the size of the num_ndx array to > what is actually needed: Since the maximum message number is known in > advanced, the vector can grow 'downward' such that position zero is > assumed to hold the largest possible message number and lower indices > are then inserted sufficiently distant from that with the array being > extended as required. Lookup would then use negative offsets relative > to a pointer to the end of the array, adjusted according to the > 'logical position' of the end of the array relative to the message > number 1. I think I will get this implemented tomorrow and could then > basically produce a patch relative to 6.3.17 which could serve as > base for further changes. I'm not sure if it's worth the effort of redoing the array handling just to shave off a dozen of realloc() calls. Keep maintainability in mind; in my professional live, I've been debugging somebody else's (rather awkward [1]) code for many months now, and I'd say maintainability wins over a few % of execution speed. [1] That other code, entirely unrelated to fetchmail, often follows copy & paste & modify one line programming style, which is very error-prone... >> $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \ >> -o uidl_db uidl_db.c -Wno-unused -Dinline=__inline__ >> # -Dinline for -std=c89 :-) >> uidl_db.c: In function ‘init_uidl_db’: >> uidl_db.c:63: warning: request for implicit conversion from ‘void *’ >> to ‘struct uidl_record **’ not permitted in C++ > > This is just my usual laziness :-): Since void * is kind of a > passepartout type in C, I usually just cast to void * whenever any > pointer conversion is needed. I've changed that to use the actual > destination type. I was also too lazy to remove that from the compiler logs :) I'll be happy to fix this myself should I go for C++ later. >> uidl_db.c: In function ‘walk_down’: >> uidl_db.c:83: warning: comparison between signed and unsigned integer >> expressions >> uidl_db.c:85: warning: comparison between signed and unsigned integer >> expressions > > This was due to the ofs variable being of type int. I've changed that > to unsigned, which didn't matter for the code. This will, of course, Yup. The only thing is that loops need to be written more carefully because ">= 0" or thereabouts now always holds. I think that's what you meant by "didn't matter for the code". > break horribly if UIDs longer than 2^32 - 1 chars are supposed to be > processed (or if bit indices higher than 2^31-1 are needed) but I > assume that both propositions are not exactly realistic (the POP3 > specification already prohibits this: Maximum 'payload' of a response > line is 510 octets, according to RFC1939, sec. 3). I propose to document that somewhere in a quiet place in fetchmail.man, but to not even add code checks for that - a FIXME comment will be sufficient IMO. >> uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function >> /tmp/cc6WZFTd.o: In function `main': > > I don't usually add spurious initializations just to silence these > warnings if they are obviously wrong. GCC is also sometimes off track, hadn't checked any of the warnings except for the ffs(). >> uidl_db.c(92): warning #1684: conversion from pointer to same-sized >> integral type (potential portability problem) >> offsetof(struct pat_node, ptrs_[2]) >> ^ > > I do not have the slightest idea what this is supposed to refer to. I'd have to see Intel C++'s stddef.h to see what's really up, but this looks like a compiler issue (either way, as stddef.h is a compiler provided header). A pointer difference *is* an integral type; quoting POSIX (which is more specific as to the return type than C99 - your quote below - is): offsetof(type, member-designator) Integer constant expression of type size_t, the value of which is the offset in bytes to the structure member (member-designa‐ tor), from the beginning of its structure (type). But it's indeed a portability problem. Proper code is not portable to Intel C++. 8-) > [...] > >> uidl_db.c(119): remark #2259: non-pointer conversion from "int" to >> "unsigned char" may lose significant bits >> v = r0->id[ofs] ^ r1->id[ofs]; >> ^ > > Since both operands come from arrays of char and are only insofar of > type 'int' as the values are supposed to be promoted to int, this > seems to be an extremly poor warning. If the sources are "char" (whatever sign), it's safe. Looks like a compiler defect. I need to re-test with the latest patch release, and if it persists I need to regain access to the Intel site so I can file a bug report. >> uidl_db.c(244): warning #2259: non-pointer conversion from >> "size_t={unsigned long}" to "unsigned int" may lose significant bits >> id_len = strlen(id); >> ^ > > I would rather keep this as 'unsigned' because size_t may be a 64-bit > type on 64-bit systems and this seems exessive for representing the > length of something restricted to less than 510 octets. Seems this warning is in order and size_t would be OK, or at least checking against UINT_MAX as a fall-back. Probably not in this piece of the code though, but in parseuid - reject (skip) overlong UIDs. I have yet to see a site that comes even close to the 70 characters that POP3 allows :-) what's wrong with size_t? On 64-bit machines, the other four bytes come at no extra cost. In doubt someone will be padding anyways on those computers. :) > > [...] > >> uidl_db.c:92:6: warning: using extended field designator is an >> extension [-pedantic] >> offsetof(struct pat_node, ptrs_[2]) >> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The relevant text of the C-standard is > > offsetof(type, member-designator) > > [...] > > The type and member designator shall be such that given > > static type t; > > then the expression &(t.member-designator) evaluates to an > address constant. > [7.17|3] > > Assuming the following code: > > static struct pat_node t; > > &(t.ptrs_[2]) certainly evaluates to an address constant, so this > appears to either be a traditional restriction or a C89-one which > accidentally escaped into LLVM/clang using someone remembering it as > 'transport medium'. Open issue for me, might be a clang bug, or a configuration issue. Will have to check with an SVN version and file a report if it is a bug. Same computer as for Intel C++, it's powered down, and I'm 100 km away. Other than that, I'm really looking forward to the code. Thanks a lot. Best regards Matthias |
From: Rainer W. <rwe...@ms...> - 2010-05-20 22:34:49
|
"Matthias Andree" <mat...@gm...> writes: [...] > Looks quite alright at first glance. As a general remark, I wonder if > it would be worthwhile splitting the PATRICIA core code from the > fetchmail interfacing code. My 'general' opinion on this is that abstractions aren't free and creating them 'just in case' should rather be avoided. [...] > Could I also ask that you spell out implicit "int" types, i. e. change > unsigned to unsigned int? Implicit int is obsolecscent usage. It is > valid C89, but causes C99 and C++ compiler complaints. For the sake of completeness: 'implicit int' refers to not declaring a function return value which traditionally caused compilers to assume that the function would return a value of type 'int'. This is no longer valid C, as of ISO/IEC 9899:1999 (E). OTOH, the set of valid type specifiers and type specifier combinations is defined in 6.7.2 and includes the following text: Each list of type specifiers shall be one of the following sets (delimited by commas, when there is more than one set on a line); [...] -- int, signed, or signed int -- unsigned, or unsigned int [6.7.2|2] Meaning, not only is 'unsigned' a completely legitimate way to declare an unsigned integer but actually, 'signed' is also valid instead of 'int'. That said, I am completely willing to comply with any 'style rules' you consider to be essential. [...] > To give a feeling for compiler verbosity, I'm including compiler > diagnostics from three compilers below, omitting warnings for unused > variables, and more comments inline. The (meanwhile completely integrated) code compiled as part of fetchmail without warnings but I have done a -W -Wall gcc pass and adressed everything which came up (including one actual error). I'd like to make a few statemens regarding specific warnings below. General (so that it's before the 'long quoted text'): As I wrote, integration of this code with fetchmail was completed yesterday and the resulting binary has survived a night of pulling e-mail from my account. The most expensive part is now the lexer. I am going to run this for another night tonight, this time with fastuidl enabled. Should this also go well, I will most likely move this onto our 'production server' tomorrow. Except a few bits of missing functionality (such as a traversal routine), the most notable change I have done is that I have replaced the recursive trie-freeing routine with an iterative one: While this is - at best - only marginally faster (for my test dataset), it has the nice property of needing no additional storage beyond what's already used for the trie itself. I've also had an idea how to restrict the size of the num_ndx array to what is actually needed: Since the maximum message number is known in advanced, the vector can grow 'downward' such that position zero is assumed to hold the largest possible message number and lower indices are then inserted sufficiently distant from that with the array being extended as required. Lookup would then use negative offsets relative to a pointer to the end of the array, adjusted according to the 'logical position' of the end of the array relative to the message number 1. I think I will get this implemented tomorrow and could then basically produce a patch relative to 6.3.17 which could serve as base for further changes. ? > $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \ > -o uidl_db uidl_db.c -Wno-unused -Dinline=__inline__ > # -Dinline for -std=c89 :-) > uidl_db.c: In function ‘init_uidl_db’: > uidl_db.c:63: warning: request for implicit conversion from ‘void *’ > to ‘struct uidl_record **’ not permitted in C++ This is just my usual laziness :-): Since void * is kind of a passepartout type in C, I usually just cast to void * whenever any pointer conversion is needed. I've changed that to use the actual destination type. > uidl_db.c: In function ‘walk_down’: > uidl_db.c:83: warning: comparison between signed and unsigned integer > expressions > uidl_db.c:85: warning: comparison between signed and unsigned integer > expressions This was due to the ofs variable being of type int. I've changed that to unsigned, which didn't matter for the code. This will, of course, break horribly if UIDs longer than 2^32 - 1 chars are supposed to be processed (or if bit indices higher than 2^31-1 are needed) but I assume that both propositions are not exactly realistic (the POP3 specification already prohibits this: Maximum 'payload' of a response line is 510 octets, according to RFC1939, sec. 3). > uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function > /tmp/cc6WZFTd.o: In function `main': I don't usually add spurious initializations just to silence these warnings if they are obviously wrong. [...] > uidl_db.c(92): warning #1684: conversion from pointer to same-sized > integral type (potential portability problem) > offsetof(struct pat_node, ptrs_[2]) > ^ I do not have the slightest idea what this is supposed to refer to. [...] > uidl_db.c(119): remark #2259: non-pointer conversion from "int" to > "unsigned char" may lose significant bits > v = r0->id[ofs] ^ r1->id[ofs]; > ^ Since both operands come from arrays of char and are only insofar of type 'int' as the values are supposed to be promoted to int, this seems to be an extremly poor warning. > uidl_db.c(244): warning #2259: non-pointer conversion from > "size_t={unsigned long}" to "unsigned int" may lose significant bits > id_len = strlen(id); > ^ I would rather keep this as 'unsigned' because size_t may be a 64-bit type on 64-bit systems and this seems exessive for representing the length of something restricted to less than 510 octets. [...] > uidl_db.c:92:6: warning: using extended field designator is an > extension [-pedantic] > offsetof(struct pat_node, ptrs_[2]) > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The relevant text of the C-standard is offsetof(type, member-designator) [...] The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant. [7.17|3] Assuming the following code: static struct pat_node t; &(t.ptrs_[2]) certainly evaluates to an address constant, so this appears to either be a traditional restriction or a C89-one which accidentally escaped into LLVM/clang using someone remembering it as 'transport medium'. |
From: Matthias A. <mat...@gm...> - 2010-05-19 21:48:40
|
Am 19.05.2010, 20:30 Uhr, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: > > [...] > >> please do (show an advance/preview version) - if you don't want to >> expose it to the list yet for whatever reason, but want my comments, >> feel free to send personal mail. > > I'll happily listen to anyone who has something more constructive to > say about this than "I hate it!". This is my last 'standalone test > program'. It can be compiled and used to do testing/ measurement of > the trie-code based on reading 'unique message IDs' from stdin which > must point to 'something seekable'. The interface is known to be > incomplete and the halfway integrated version contains everything in > here and whatever was additionally needed in order to use this as > 'backend' for the code in uid.c. My present approach to doing lookups > based on 'message numbers' is very simplistic --- just an array going > from 0 .. <highest message number>. There's a potential scalability > issue in there: Realloc with a constant increment is O(n^2) when the > memory area needs to be copied. If the pop3.c code works as I believe > it does (inserts message numbers in increasing order), I will likely Hi Rainer, thanks a lot! Regarding what you're writing: We know the message count in advance, because we query it (through STAT, in pop3_getrange()), so we have a good guess of how many messages there are, and we need very few realloc(). Look for *countp (a pointer passed by the caller) to get the message count. Message numbers are also ephemeral (i. e. not saved to disk), in case that helps. > change this to additionally store the offset of the first element in > the array to keep this shorter for the case when a large mailbox > contains a few new messages at the end. I will also change the > realloc-calculation to something more sensible (usually, this means > doubling the allocated area every time it needs to grow. > > Any feedback on the implementation would be very welcome. Looks quite alright at first glance. As a general remark, I wonder if it would be worthwhile splitting the PATRICIA core code from the fetchmail interfacing code. gets(DEST) is a no-go for me - please use fgets(DEST, sizeof(DEST), stdin), even if it's just in test code. I've often wasted time debugging such test code because it wasn't half as robust as the non-test code. Could I also ask that you spell out implicit "int" types, i. e. change unsigned to unsigned int? Implicit int is obsolecscent usage. It is valid C89, but causes C99 and C++ compiler complaints. Please spell out "unsigned int", or use uintfast_t or something like that. I wonder if using a typedef for the integers in the .h file would be useful. To give a feeling for compiler verbosity, I'm including compiler diagnostics from three compilers below, omitting warnings for unused variables, and more comments inline. ======================================================================================= GCC 4.4: ======================================================================================= $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \ -o uidl_db uidl_db.c -Wno-unused -Dinline=__inline__ # -Dinline for -std=c89 :-) uidl_db.c: In function ‘init_uidl_db’: uidl_db.c:63: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record **’ not permitted in C++ uidl_db.c: At top level: uidl_db.c:69: warning: ISO C does not allow extra ‘;’ outside of a function uidl_db.c: In function ‘walk_down’: uidl_db.c:83: warning: comparison between signed and unsigned integer expressions uidl_db.c:85: warning: comparison between signed and unsigned integer expressions uidl_db.c:93: warning: request for implicit conversion from ‘void *’ to ‘struct pat_node *’ not permitted in C++ uidl_db.c: In function ‘first_set_bit_in_char’: uidl_db.c:100: warning: implicit declaration of function ‘ffs’ uidl_db.c: In function ‘get_pat_node’: uidl_db.c:152: warning: request for implicit conversion from ‘void *’ to ‘struct pat_node *’ not permitted in C++ uidl_db.c: In function ‘get_uidl_record’: uidl_db.c:242: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record *’ not permitted in C++ uidl_db.c:245: warning: request for implicit conversion from ‘void *’ to ‘char *’ not permitted in C++ uidl_db.c: In function ‘insert_into_records’: uidl_db.c:262: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record **’ not permitted in C++ uidl_db.c: In function ‘set_uidl_num’: uidl_db.c:299: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record **’ not permitted in C++ uidl_db.c: In function ‘find_uidl_by_id’: uidl_db.c:325: warning: comparison between signed and unsigned integer expressions uidl_db.c:327: warning: comparison between signed and unsigned integer expressions uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function /tmp/cc6WZFTd.o: In function `main': uidl_db.c:(.text+0x602): warning: the `gets' function is dangerous and should not be used. I'm not so concerned about the extra ; or the implicit conversions, but I am concerned about the signedness mismatches. The ffs declaration is harmless and fixed with "#include <strings.h>". ======================================================================================= Intel C++ 11.1: ======================================================================================= $ icc -O -w2 -Wdeprecated -Wall -Wcheck -Wp64 -diag-disable 177,1418 -std=c99 -o uidl_db uidl_db.c uidl_db.c(69): remark #424: extra ";" ignored }; ^ uidl_db.c(92): warning #1684: conversion from pointer to same-sized integral type (potential portability problem) offsetof(struct pat_node, ptrs_[2]) ^ uidl_db.c(93): warning #1684: conversion from pointer to same-sized integral type (potential portability problem) : offsetof(struct pat_node, ptrs_[0]))); ^ uidl_db.c(100): warning #266: function "ffs" declared implicitly return ffs(v) - 1; ^ uidl_db.c(119): remark #2259: non-pointer conversion from "int" to "unsigned char" may lose significant bits v = r0->id[ofs] ^ r1->id[ofs]; ^ uidl_db.c(244): warning #2259: non-pointer conversion from "size_t={unsigned long}" to "unsigned int" may lose significant bits id_len = strlen(id); ^ uidl_db.c(317): warning #2259: non-pointer conversion from "size_t={unsigned long}" to "unsigned int" may lose significant bits len = strlen(id); ^ /tmp/iccQw73Ea.o: In function `main': uidl_db.c:(.text+0x6d): warning: the `gets' function is dangerous and should not be used. ======================================================================================= And finally clang 2.7: ======================================================================================= $ clang -pedantic -Wall -O3 -std=c99 -o uidl_db uidl_db.c uidl_db.c:69:2: warning: extra ';' outside of a function }; ^ uidl_db.c:92:6: warning: using extended field designator is an extension [-pedantic] offsetof(struct pat_node, ptrs_[2]) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from uidl_db.c:11: In file included from /usr/include/stdlib.h:33: /usr/local/lib/clang/2.0/include/stddef.h:48:24: note: instantiated from: #define offsetof(t, d) __builtin_offsetof(t, d) ^ uidl_db.c:93:8: warning: using extended field designator is an extension [-pedantic] : offsetof(struct pat_node, ptrs_[0]))); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from uidl_db.c:11: In file included from /usr/include/stdlib.h:33: /usr/local/lib/clang/2.0/include/stddef.h:48:24: note: instantiated from: #define offsetof(t, d) __builtin_offsetof(t, d) ^ uidl_db.c:100:12: warning: implicit declaration of function 'ffs' is invalid in C99 [-Wimplicit-function-declaration] return ffs(v) - 1; ^ ... /tmp/cc-GdRjxd.o: In function `main': uidl_db.c:(.text+0x779): warning: the `gets' function is dangerous and should not be used. ======================================================================================= Further comments inlined below. ======================================================================================= (sorry for the awkward quoting, that's an Opera M2 issue -- if someone has an "automatic mailing list folder/virtual folder generation tool" add-on that behaves similar to Opera so that I don't need to do the 100 lists manually, I'd be grateful and switch soonish.) > ------------- uidl_db.h ------------------- > /* > uidl db > > Copyright (c) 2010 MAD Partners, Ltd. > > This file is being published according to the terms of the > GPLv2 license contained in the 6.3.17 release of fetchmail. > > */ > #ifndef fetchmail_uidl_db_h > #define fetchmail_uidl_db_h > > /* includes */ > #include <stddef.h> > > /* types */ > struct uidl_record { > char *id; > unsigned id_len; > unsigned num, status, pos; > > struct uidl_record *next; > }; > > struct pat_node; > > struct uidl_db > { > struct pat_node *pat_root; > struct uidl_record **records; > unsigned records_max, records_next; What's the difference between _max and _next? Should probably be documented in the .h file later, although it seems quite clear from the init_uid_db() function. > struct uidl_record **num_ndx; > unsigned num_ndx_max; > }; > > /* routines */ > void init_uidl_db(struct uidl_db *db); I wonder if it would be good to start all functions with uidl_* consistently, for namespace purity. > > struct uidl_record *insert_uidl(struct uidl_db *db, > char const *id, unsigned status); > > struct uidl_record *insert_uidl_noindex(struct uidl_db *db, > char const *id, > unsigned status); > > void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, > unsigned num); > > struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id); > > static inline struct uidl_record * > find_uidl_by_num(struct uidl_db *db, unsigned num) > { > return num < db->num_ndx_max ? db->num_ndx[num] : NULL; > } > static inline struct uidl_record * > find_uidl_by_pos(struct uidl_db *db, unsigned pos) > { > return pos < db->records_next ? db->records[pos] : NULL; > } > > static inline struct uidl_record * > first_uidl_in_db(struct uidl_db *db, char const *id) > { > return find_uidl_by_id(db, id); > } > struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id); > > #endif > --------------------------------------- > > ------------------- uidl_db.c ----------------- > /* > uidl db > > Copyright (c) 2010 MAD Partners, Ltd. > > This file is being published according to the terms of the > GPLv2 license contained in the 6.3.17 release of fetchmail. > */ > > /* includes */ > #include <stdlib.h> > #include <stdio.h> > #include <string.h> Please add, for ffs: #include <strings.h> > #include "uidl_db.h" > > /* constants */ > enum { > MIN_RECORDS = 16 > }; > > /* types */ > struct pat_node { > struct pat_node *ptrs_[3]; > unsigned bit_ndx, bit_mask; > > struct uidl_record *rec; > }; > > /* > The idea behind this is that the 'left' pointer is at -1 and > the 'right' pointer at 1. This implies that dealing with 'symmetric > cases' for left and right pointers isn't necessary. There's always > just 'this pointer' (offset n) and 'the other pointer' (offset -n). > */ > #define ptrs(p) ((p)->ptrs_ + 1) > > /* routines */ > void *xmalloc(size_t len) > { > return malloc(len); > } > > void *xrealloc(void *p, size_t new_size) > { > return realloc(p, new_size); > } > > static inline unsigned bit_ofs(unsigned bit_ndx) > { > return bit_ndx >> 3; > } > > static inline unsigned bit_mask(unsigned bit_ndx) > { > return 1 << (bit_ndx & 7); > } > > void init_uidl_db(struct uidl_db *db) > { > db->pat_root = NULL; > db->records = xmalloc(MIN_RECORDS * sizeof(*db->records)); > db->records_max = MIN_RECORDS; > db->records_next = 0; > > db->num_ndx = NULL; > db->num_ndx_max = 0; > }; > > static struct pat_node *walk_down(struct uidl_db *db, struct uidl_record > *rec, > struct pat_node ***edgep, struct pat_node **parentp) > { > struct pat_node *cur, **edge; > unsigned bit_ndx, v; > int ofs; > > cur = db->pat_root; > ofs = -1; > do { > bit_ndx = cur->bit_ndx; > > if (bit_ofs(bit_ndx) != ofs) { > ofs = bit_ofs(bit_ndx); > v = ofs < rec->id_len ? rec->id[ofs] : 0; > } > > edge = ptrs(cur) + (v & cur->bit_mask ? 1 : -1); > } while ((cur = *edge) && cur->bit_ndx > bit_ndx); > > *parentp = (void *)((char *)edge - (v & bit_mask(bit_ndx) ? > offsetof(struct pat_node, ptrs_[2]) > : offsetof(struct pat_node, ptrs_[0]))); > *edgep = edge; > return cur; > } > > static inline unsigned first_set_bit_in_char(unsigned v) > { > return ffs(v) - 1; > } > > static int find_first_diff_bit(struct uidl_record const *r0, > struct uidl_record const *r1) > { > struct uidl_record const *long_id; > unsigned ofs, max; > unsigned char v; > > max = r0->id_len; > if (max > r1->id_len) { > max = r1->id_len; > long_id = r0; > } else > long_id = r1; > > ofs = 0; > do > v = r0->id[ofs] ^ r1->id[ofs]; > while (!v && ++ofs < max); > > if (!v) { > if (r0->id_len == r1->id_len) return -1; > v = long_id->id[ofs]; > } > > return first_set_bit_in_char(v) + ofs * 8; > } > > static inline int record_id_equal(struct uidl_record const *r0, > struct uidl_record const *r1) > { > return > r0->id_len == r1->id_len > && memcmp(r0->id, r1->id, r0->id_len) == 0; > } > > static struct uidl_record *append_to_list(struct uidl_record **recp, > struct uidl_record *rec) > { > while (*recp) recp = &(*recp)->next; > *recp = rec; > rec->next = NULL; > return rec; > } > > static struct pat_node *get_pat_node(struct uidl_record *rec) > { > struct pat_node *np; > > np = xmalloc(sizeof(*np)); > np->rec = rec; > rec->next = NULL; > return np; > } > > static struct pat_node *get_standalone_node(struct uidl_record *rec) > { > struct pat_node *np; > > np = get_pat_node(rec); > np->bit_ndx = first_set_bit_in_char(*rec->id); > np->bit_mask = bit_mask(np->bit_ndx); > return np; > } > > static inline unsigned bit_set(unsigned bit_ndx, struct uidl_record > const *rec) > { > unsigned ofs; > > ofs = bit_ofs(bit_ndx); > if (ofs >= rec->id_len) return 0; > return rec->id[ofs] & bit_mask(bit_ndx); > } > > static inline struct pat_node *walk_up(unsigned diff_ndx, struct > pat_node **parent) > { > struct pat_node *p, *np; > > np = *parent; > while ((p = *ptrs(np)) && p->bit_ndx > diff_ndx) > np = p; > > *parent = p; > return np; > } > > static struct uidl_record *pat_insert(struct uidl_db *db, > struct uidl_record *rec) > { > struct pat_node *np, *closest, *parent, **edge; > int me, bit_ndx; > > if (!db->pat_root) { > np = get_standalone_node(rec); > ptrs(np)[-1] = *ptrs(np) = NULL; > ptrs(np)[1] = np; > > db->pat_root = np; > return rec; > } > > closest = walk_down(db, rec, &edge, &parent); > > if (closest) { > bit_ndx = find_first_diff_bit(closest->rec, rec); > if (bit_ndx < 0) > return append_to_list(&closest->rec->next, rec); > > np = get_pat_node(rec); > np->bit_ndx = bit_ndx; > np->bit_mask = bit_mask(bit_ndx); > } else > np = get_standalone_node(rec); > > if (parent->bit_ndx > np->bit_ndx) { > closest = walk_up(np->bit_ndx, &parent); > > if (!parent) edge = &db->pat_root; > else edge = ptrs(parent)[-1] == closest ? > ptrs(parent) - 1 : ptrs(parent) + 1; > *ptrs(closest) = np; > } > *edge = np; > *ptrs(np) = parent; > > me = bit_set(np->bit_ndx, rec) ? 1 : -1; > ptrs(np)[me] = np; > ptrs(np)[-me] = closest; > > return rec; > } > > static struct uidl_record *get_uidl_record(char const *id, unsigned > status) > { > struct uidl_record *rec; > unsigned id_len; > > rec = xmalloc(sizeof(*rec)); > > id_len = strlen(id); > rec->id = memcpy(xmalloc(id_len + 1), id, id_len + 1); > rec->id_len = id_len; > rec->status = status; > rec->num = 0; > > return rec; > } > > static struct uidl_record *insert_into_records(struct uidl_db *db, > struct uidl_record *rec) > { > unsigned next, want; > > next = db->records_next; > if (next == db->records_max) { > want = db->records_max *= 2; > db->records = xrealloc(db->records, want * sizeof(rec)); > } > > rec->pos = next; > db->records[next] = rec; > db->records_next = next + 1; > > return rec; > } > > struct uidl_record *insert_uidl(struct uidl_db *db, > char const *id, unsigned status) > { > struct uidl_record *rec; > > rec = insert_uidl_noindex(db, id, status); > return pat_insert(db, rec); > } > > struct uidl_record *insert_uidl_noindex(struct uidl_db *db, > char const *id, unsigned status) > { > struct uidl_record *rec; > > rec = get_uidl_record(id, status); > return insert_into_records(db, rec); > } > > void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, > unsigned num) > { > unsigned want, have; > > have = db->num_ndx_max; > if (num >= have) { > want = num + MIN_RECORDS; > db->num_ndx = xrealloc(db->num_ndx, want * sizeof(*db->num_ndx)); > db->num_ndx_max = want; > > do db->num_ndx[--want] = NULL; while (want > have); > } > > db->num_ndx[num] = rec; > rec->num = num; > } > > struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id) > { > static unsigned step_sum, searches; > struct pat_node *np; > struct uidl_record *rec; > unsigned v, bit_ndx, len; > int ofs; > len = strlen(id); > np = db->pat_root; > if (np) { > ofs = -1; > do { > bit_ndx = np->bit_ndx; > > if (bit_ofs(bit_ndx) != ofs) { > ofs = bit_ofs(bit_ndx); > v = ofs < len ? id[ofs] : 0; > } > > np = ptrs(np)[v & np->bit_mask ? 1 : -1]; > } while (np && np->bit_ndx > bit_ndx); > > if (!np) return NULL; > > rec = np->rec; > return rec->id_len == len && memcmp(id, rec->id, len) == 0 ? > rec : NULL; > } > > #if 0 > report(stderr, GT_("id search in unindexed db\n")); > #endif > > ofs = db->records_next; > while (ofs) { > rec = db->records[--ofs]; > if (rec->id_len == len && memcmp(rec->id, id, len) == 0) > return rec; > } > > return 0; > } > > struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id) > { > struct uidl_record *rec; > > rec = find_uidl_by_id(db, id); > if (!rec) return NULL; > > while (rec->next) rec = rec->next; > return rec; > } > > static void free_uidl_list(struct uidl_record *rec) > { > if (rec->next) free_uidl_list(rec->next); > > free(rec->id); > free(rec); > } > > static void free_pat_trie(struct pat_node *np) > { > struct pat_node *child; > > child = ptrs(np)[-1]; > if (child && child->bit_ndx > np->bit_ndx) free_pat_trie(child); > > child = ptrs(np)[1]; > if (child->bit_ndx > np->bit_ndx) free_pat_trie(child); > > free_uidl_list(np->rec); > free(np); > } > > void free_uidl_db(struct uidl_db *db) > { > struct uidl_record *rec; > unsigned ndx; > if (db->pat_root) free_pat_trie(db->pat_root); > > free(db->records); > free(db->num_ndx); > } > > /* test code */ > int main(void) > { > struct uidl_db db; > struct uidl_record *rec; > unsigned n; > char buf[80]; > init_uidl_db(&db); > while (gets(buf)) insert_uidl(&db, buf, 0); > > n = 1000; > while (n--) { > fseek(stdin, 0, 0); > while (gets(buf)) { > rec = find_uidl_by_id(&db, buf); > if (!rec || strcmp(rec->id, buf) != 0) > fprintf(stderr, "not found\n"); > } > } > return 0; > } > --------------------------------------------- -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-19 20:30:58
|
"Matthias Andree" <mat...@gm...> writes: [...] > please do (show an advance/preview version) - if you don't want to > expose it to the list yet for whatever reason, but want my comments, > feel free to send personal mail. I'll happily listen to anyone who has something more constructive to say about this than "I hate it!". This is my last 'standalone test program'. It can be compiled and used to do testing/ measurement of the trie-code based on reading 'unique message IDs' from stdin which must point to 'something seekable'. The interface is known to be incomplete and the halfway integrated version contains everything in here and whatever was additionally needed in order to use this as 'backend' for the code in uid.c. My present approach to doing lookups based on 'message numbers' is very simplistic --- just an array going from 0 .. <highest message number>. There's a potential scalability issue in there: Realloc with a constant increment is O(n^2) when the memory area needs to be copied. If the pop3.c code works as I believe it does (inserts message numbers in increasing order), I will likely change this to additionally store the offset of the first element in the array to keep this shorter for the case when a large mailbox contains a few new messages at the end. I will also change the realloc-calculation to something more sensible (usually, this means doubling the allocated area every time it needs to grow. Any feedback on the implementation would be very welcome. ------------- uidl_db.h ------------------- /* uidl db Copyright (c) 2010 MAD Partners, Ltd. This file is being published according to the terms of the GPLv2 license contained in the 6.3.17 release of fetchmail. */ #ifndef fetchmail_uidl_db_h #define fetchmail_uidl_db_h /* includes */ #include <stddef.h> /* types */ struct uidl_record { char *id; unsigned id_len; unsigned num, status, pos; struct uidl_record *next; }; struct pat_node; struct uidl_db { struct pat_node *pat_root; struct uidl_record **records; unsigned records_max, records_next; struct uidl_record **num_ndx; unsigned num_ndx_max; }; /* routines */ void init_uidl_db(struct uidl_db *db); struct uidl_record *insert_uidl(struct uidl_db *db, char const *id, unsigned status); struct uidl_record *insert_uidl_noindex(struct uidl_db *db, char const *id, unsigned status); void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, unsigned num); struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id); static inline struct uidl_record * find_uidl_by_num(struct uidl_db *db, unsigned num) { return num < db->num_ndx_max ? db->num_ndx[num] : NULL; } static inline struct uidl_record * find_uidl_by_pos(struct uidl_db *db, unsigned pos) { return pos < db->records_next ? db->records[pos] : NULL; } static inline struct uidl_record * first_uidl_in_db(struct uidl_db *db, char const *id) { return find_uidl_by_id(db, id); } struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id); #endif --------------------------------------- ------------------- uidl_db.c ----------------- /* uidl db Copyright (c) 2010 MAD Partners, Ltd. This file is being published according to the terms of the GPLv2 license contained in the 6.3.17 release of fetchmail. */ /* includes */ #include <stdlib.h> #include <stdio.h> #include <string.h> #include "uidl_db.h" /* constants */ enum { MIN_RECORDS = 16 }; /* types */ struct pat_node { struct pat_node *ptrs_[3]; unsigned bit_ndx, bit_mask; struct uidl_record *rec; }; /* The idea behind this is that the 'left' pointer is at -1 and the 'right' pointer at 1. This implies that dealing with 'symmetric cases' for left and right pointers isn't necessary. There's always just 'this pointer' (offset n) and 'the other pointer' (offset -n). */ #define ptrs(p) ((p)->ptrs_ + 1) /* routines */ void *xmalloc(size_t len) { return malloc(len); } void *xrealloc(void *p, size_t new_size) { return realloc(p, new_size); } static inline unsigned bit_ofs(unsigned bit_ndx) { return bit_ndx >> 3; } static inline unsigned bit_mask(unsigned bit_ndx) { return 1 << (bit_ndx & 7); } void init_uidl_db(struct uidl_db *db) { db->pat_root = NULL; db->records = xmalloc(MIN_RECORDS * sizeof(*db->records)); db->records_max = MIN_RECORDS; db->records_next = 0; db->num_ndx = NULL; db->num_ndx_max = 0; }; static struct pat_node *walk_down(struct uidl_db *db, struct uidl_record *rec, struct pat_node ***edgep, struct pat_node **parentp) { struct pat_node *cur, **edge; unsigned bit_ndx, v; int ofs; cur = db->pat_root; ofs = -1; do { bit_ndx = cur->bit_ndx; if (bit_ofs(bit_ndx) != ofs) { ofs = bit_ofs(bit_ndx); v = ofs < rec->id_len ? rec->id[ofs] : 0; } edge = ptrs(cur) + (v & cur->bit_mask ? 1 : -1); } while ((cur = *edge) && cur->bit_ndx > bit_ndx); *parentp = (void *)((char *)edge - (v & bit_mask(bit_ndx) ? offsetof(struct pat_node, ptrs_[2]) : offsetof(struct pat_node, ptrs_[0]))); *edgep = edge; return cur; } static inline unsigned first_set_bit_in_char(unsigned v) { return ffs(v) - 1; } static int find_first_diff_bit(struct uidl_record const *r0, struct uidl_record const *r1) { struct uidl_record const *long_id; unsigned ofs, max; unsigned char v; max = r0->id_len; if (max > r1->id_len) { max = r1->id_len; long_id = r0; } else long_id = r1; ofs = 0; do v = r0->id[ofs] ^ r1->id[ofs]; while (!v && ++ofs < max); if (!v) { if (r0->id_len == r1->id_len) return -1; v = long_id->id[ofs]; } return first_set_bit_in_char(v) + ofs * 8; } static inline int record_id_equal(struct uidl_record const *r0, struct uidl_record const *r1) { return r0->id_len == r1->id_len && memcmp(r0->id, r1->id, r0->id_len) == 0; } static struct uidl_record *append_to_list(struct uidl_record **recp, struct uidl_record *rec) { while (*recp) recp = &(*recp)->next; *recp = rec; rec->next = NULL; return rec; } static struct pat_node *get_pat_node(struct uidl_record *rec) { struct pat_node *np; np = xmalloc(sizeof(*np)); np->rec = rec; rec->next = NULL; return np; } static struct pat_node *get_standalone_node(struct uidl_record *rec) { struct pat_node *np; np = get_pat_node(rec); np->bit_ndx = first_set_bit_in_char(*rec->id); np->bit_mask = bit_mask(np->bit_ndx); return np; } static inline unsigned bit_set(unsigned bit_ndx, struct uidl_record const *rec) { unsigned ofs; ofs = bit_ofs(bit_ndx); if (ofs >= rec->id_len) return 0; return rec->id[ofs] & bit_mask(bit_ndx); } static inline struct pat_node *walk_up(unsigned diff_ndx, struct pat_node **parent) { struct pat_node *p, *np; np = *parent; while ((p = *ptrs(np)) && p->bit_ndx > diff_ndx) np = p; *parent = p; return np; } static struct uidl_record *pat_insert(struct uidl_db *db, struct uidl_record *rec) { struct pat_node *np, *closest, *parent, **edge; int me, bit_ndx; if (!db->pat_root) { np = get_standalone_node(rec); ptrs(np)[-1] = *ptrs(np) = NULL; ptrs(np)[1] = np; db->pat_root = np; return rec; } closest = walk_down(db, rec, &edge, &parent); if (closest) { bit_ndx = find_first_diff_bit(closest->rec, rec); if (bit_ndx < 0) return append_to_list(&closest->rec->next, rec); np = get_pat_node(rec); np->bit_ndx = bit_ndx; np->bit_mask = bit_mask(bit_ndx); } else np = get_standalone_node(rec); if (parent->bit_ndx > np->bit_ndx) { closest = walk_up(np->bit_ndx, &parent); if (!parent) edge = &db->pat_root; else edge = ptrs(parent)[-1] == closest ? ptrs(parent) - 1 : ptrs(parent) + 1; *ptrs(closest) = np; } *edge = np; *ptrs(np) = parent; me = bit_set(np->bit_ndx, rec) ? 1 : -1; ptrs(np)[me] = np; ptrs(np)[-me] = closest; return rec; } static struct uidl_record *get_uidl_record(char const *id, unsigned status) { struct uidl_record *rec; unsigned id_len; rec = xmalloc(sizeof(*rec)); id_len = strlen(id); rec->id = memcpy(xmalloc(id_len + 1), id, id_len + 1); rec->id_len = id_len; rec->status = status; rec->num = 0; return rec; } static struct uidl_record *insert_into_records(struct uidl_db *db, struct uidl_record *rec) { unsigned next, want; next = db->records_next; if (next == db->records_max) { want = db->records_max *= 2; db->records = xrealloc(db->records, want * sizeof(rec)); } rec->pos = next; db->records[next] = rec; db->records_next = next + 1; return rec; } struct uidl_record *insert_uidl(struct uidl_db *db, char const *id, unsigned status) { struct uidl_record *rec; rec = insert_uidl_noindex(db, id, status); return pat_insert(db, rec); } struct uidl_record *insert_uidl_noindex(struct uidl_db *db, char const *id, unsigned status) { struct uidl_record *rec; rec = get_uidl_record(id, status); return insert_into_records(db, rec); } void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, unsigned num) { unsigned want, have; have = db->num_ndx_max; if (num >= have) { want = num + MIN_RECORDS; db->num_ndx = xrealloc(db->num_ndx, want * sizeof(*db->num_ndx)); db->num_ndx_max = want; do db->num_ndx[--want] = NULL; while (want > have); } db->num_ndx[num] = rec; rec->num = num; } struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id) { static unsigned step_sum, searches; struct pat_node *np; struct uidl_record *rec; unsigned v, bit_ndx, len; int ofs; len = strlen(id); np = db->pat_root; if (np) { ofs = -1; do { bit_ndx = np->bit_ndx; if (bit_ofs(bit_ndx) != ofs) { ofs = bit_ofs(bit_ndx); v = ofs < len ? id[ofs] : 0; } np = ptrs(np)[v & np->bit_mask ? 1 : -1]; } while (np && np->bit_ndx > bit_ndx); if (!np) return NULL; rec = np->rec; return rec->id_len == len && memcmp(id, rec->id, len) == 0 ? rec : NULL; } #if 0 report(stderr, GT_("id search in unindexed db\n")); #endif ofs = db->records_next; while (ofs) { rec = db->records[--ofs]; if (rec->id_len == len && memcmp(rec->id, id, len) == 0) return rec; } return 0; } struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id) { struct uidl_record *rec; rec = find_uidl_by_id(db, id); if (!rec) return NULL; while (rec->next) rec = rec->next; return rec; } static void free_uidl_list(struct uidl_record *rec) { if (rec->next) free_uidl_list(rec->next); free(rec->id); free(rec); } static void free_pat_trie(struct pat_node *np) { struct pat_node *child; child = ptrs(np)[-1]; if (child && child->bit_ndx > np->bit_ndx) free_pat_trie(child); child = ptrs(np)[1]; if (child->bit_ndx > np->bit_ndx) free_pat_trie(child); free_uidl_list(np->rec); free(np); } void free_uidl_db(struct uidl_db *db) { struct uidl_record *rec; unsigned ndx; if (db->pat_root) free_pat_trie(db->pat_root); free(db->records); free(db->num_ndx); } /* test code */ int main(void) { struct uidl_db db; struct uidl_record *rec; unsigned n; char buf[80]; init_uidl_db(&db); while (gets(buf)) insert_uidl(&db, buf, 0); n = 1000; while (n--) { fseek(stdin, 0, 0); while (gets(buf)) { rec = find_uidl_by_id(&db, buf); if (!rec || strcmp(rec->id, buf) != 0) fprintf(stderr, "not found\n"); } } return 0; } --------------------------------------------- |
From: Matthias A. <mat...@gm...> - 2010-05-18 21:05:51
|
Am 18.05.2010, 19:48 Uhr, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: >> Am 16.05.2010, 23:33 Uhr, schrieb Rainer Weikusat: >>> The trie implementation is meanwhile complete > > [...] > >> I'm really looking forward to your code :-) > > I hope that I've now also found the 'obvious' optimizations (even with > 'full' tail pointers, save_str is the third most expensive operation > accomplished by fetchmail, so increasing insert times potentially > matters a lot). I am going to start with integrating the uidl db > implementation I have with the POP3 code now. If this was ok, I would > like to send an 'advance' version of the code (basically, the > 'stand-alone test program' I have used so far) to the list in order to > enable some third-party review. The code is, however, presently > completely comment-free (could be added to some reasonable degree, eg > no 'fuck me gently with a chainsaw'-style witticisms) and presumably > somewhat dense to people who are not used to regard 'integers' a > finite, ordered bit-sets. Hi Rainer, please do (show an advance/preview version) - if you don't want to expose it to the list yet for whatever reason, but want my comments, feel free to send personal mail. WRT commentary, I think I am with Stroustrup here, meaning that the commentary is good to fill in what *cannot* be better said in code. Instead, comments should provide a general idea of what a piece of code does, and possibly describe the interfaces and return values, at least the special cases (if any) such as NULL, -1, 0, ... descriptive and accurate variable and function names also reduce the need for commentary in epic breadth. :-) Best regards -- Matthias Andree |