From: Matthias A. <mat...@gm...> - 2010-05-12 00:34:27
|
Rainer Weikusat schrieb am 2010-05-11: > I apologize for the misdirected mail ... 'wrong button' No big deal. > > Sounds good, although I'm not quite sure if you're working on two distinct > > issues here (DIMAP and POP3 improvements). > > I do. IMAP UIDs are defined to be strictly monotonically increasing > unsigned 32 bit numbers (eventual wrap-around to be signalled via > UIDVAILDITY change) and for the application I need this for, it is ok > to not apply 'policy changes' (such as a changed size limit) > retroactively. This means that I basically just record the highest > 'seen' UID + 1 and then do a SEARCH UID <recorded>:<UIDNEXT> to get > information about yet 'unseen' mails. A changed size limit only hurts if it means that more mail would be downloaded than before, and in that case, you can as well invalidate UIDVALIDITY (at the cost of re-fetching everything). OTOH the POP3 UID stuff could then be reused to do an exhaustive search, and we will finally be able to fetch from read-only mailboxes either way in --keep configurations. > > I have plans to attack POP3/UIDL scalability myself soon -- I could > > devote my time in other ways if I know your work covers that and > > reduces complexity to O(n log n) worst-case. > > I actually started to write some code for this. So far, I was planning > to use two vectors to be able to access records either by position or > by 'associated fetch number' and to do ID searches based on a PATRICIA > trie (reasonably easy to implement and I've already done a few in the > past). This would imply that searching for a string of n characters > would, at worst, need n * 8 single bit tests[*] (and a final complete > comparison). Based on my present oprofile numbers[**] (based on 101 > .fetchids files with an average length of ~303 lines), the time spent > with loading the file data is IMHO not large enough to warrant using a > disk-based database (and this would be complicated because of the need > to be able to find 'UIDL records' based on three different critetria). Hadn't thought of Patricia tries, actually a decent idea (yet possibly what a C++ STL map type might do under the hood if the key is suitable). I wonder how we handle the case if two messages (i. e. different message number) that share the same UIDL (this happens on some sites if the recipient is listed more than once, or is being Cc:d mailing list mail, and if the POP3 server uses a hash of few header items to construct the UIDL). Such appearance in a mailbox happens in the wild and should not be treated an error that would abort the fetch. getmail used to fail in such situations, we don't want to follow it down that route... At any rate, this looks quite reasonable overall, so I'll keep my hands off that code for the next days to not step on your toes. I'm looking forward to it. Good speed! Thanks & best regards Matthias |