Thread: Re: [fetchmail-devel] imap: introduce imapsearchmax parameter

Client daemon to move mail from POP and IMAP to your local computer

Brought to you by: m-a, question42, robfunk

fetchmail-devel

Re: [fetchmail-devel] imap: introduce imapsearchmax parameter

From: Sunil S. <sun...@ro...> - 2011-04-30 18:15:20

--- On Fri, 29/4/11, Andrea Righi <rig...@gm...> wrote:
> > > > 1. Parse the response to the SELECT command:

...

> A0001 SELECT "INBOX"
> * FLAGS (\Answered \Flagged \Draft \Deleted \Seen)
> * OK [PERMANENTFLAGS (\Answered \Flagged \Draft \Deleted
> \Seen \*)]
> * OK [UIDVALIDITY 2]
> * 518299 EXISTS
> * 0 RECENT
> * OK [UIDNEXT 652168]
> A0001 OK [READ-WRITE] INBOX selected. (Success)

Parsing will not give the unseen count information for you. Also, I 
realised later that this method will not work when 'idle' is enabled.

> > > > 2. Send a STATUS command

This may not work correctly on open folders.

> > > > 3. Send an extended SEARCH command

This is not supported by your server.

So, at the IMAP protocol level, we have run out of options.

I will check if the fetchmail parsing can be improved directly in 
fetchmail itself.

The following is a recap of the problem:

==========================================================================
fetchmail gets the unseen message list using the SEARCH command. There is 
no standard mechanism to get the count and range of unseen messages.

fetchmail currently reads the response to the SEARCH command in a buffer 
of size MSGBUFSIZE=8k. In a normal case when there are only a few new 
mails, the response fits nicely in this buffer:

IMAP> A0010 SEARCH UNSEEN
IMAP< * SEARCH 499998 499999 500000
IMAP< A0010 OK SEARCH completed

However, when there are too many unseen mails, the buffer response gets 
truncated (possibly in the middle of a number).

IMAP> A0011 SEARCH UNSEEN
IMAP< * SEARCH 490001 490002 490003 ... 499998 499999 500000
IMAP< A0011 OK SEARCH completed

IMAP_SEARCH_MAX=1k (which in reality should be a function of MSGBUFSIZE 
and count) tries to avoid the truncation by searching for unseen mail in 
batches.

IMAP> A0012 SEARCH 1:1000 UNSEEN
IMAP> A0013 SEARCH 1001:2000 UNSEEN
IMAP> A0014 SEARCH 2001:3000 UNSEEN

The problems with this approach are:

- In the most common case, there are very few unseen mails. This method of 
searching is inefficient in those cases.

- If the mailbox size is huge (say having 500k mails), fetchmail gets 
caught in an unproductive cycle.
==========================================================================

-- 
Sunil Shetye.

Re: [fetchmail-devel] [PATCH] imap: introduce imapsearchmax parameter

From: Sunil S. <sun...@ro...> - 2011-05-03 06:05:27

Attachments: 0001-Call-strlen-only-once-when-removing-CRLF-from-a-line.patch 0002-Do-not-search-for-UNSEEN-messages-in-ranges.patch

Hi,


----- Original Message -----
> So, at the IMAP protocol level, we have run out of options.
> 
> I will check if the fetchmail parsing can be improved directly in 
> fetchmail itself.

Attached are the patches which should solve this issue.

The first patch has nothing to do with this issue. It only removes excess calls to strlen().

The second patch works by sending a "SEARCH UNSEEN" command and reading the response in batches (when required).

Andrea, please test the patches for your mailbox.

Matthias, please evaluate the patches for incorporation in fetchmail.

-- 
Sunil Shetye.

[fetchmail-devel] fetchmail fix coordination and future planning (was: [PATCH] imap: introduce imapsearchmax parameter)

From: Matthias A. <mat...@gm...> - 2011-05-03 15:12:23

Am 03.05.2011 06:05, schrieb Sunil Shetye:
> Hi,
> 
> 
> ----- Original Message -----
>> So, at the IMAP protocol level, we have run out of options.
>>
>> I will check if the fetchmail parsing can be improved directly in 
>> fetchmail itself.
> 
> Attached are the patches which should solve this issue.
> 
> The first patch has nothing to do with this issue. It only removes excess calls to strlen().
> 
> The second patch works by sending a "SEARCH UNSEEN" command and reading the response in batches (when required).
> 
> Andrea, please test the patches for your mailbox.
> 
> Matthias, please evaluate the patches for incorporation in fetchmail.

Hi Sunil,

the first patch is easy, accepted, thank you.


As so the second (with _split and caching), I wonder if it's the right
approach. It looks a bit as though we're curing the symptoms instead of
the cause.  The cause is a buffer too small to handle the response, and
the cure would be to make the buffer dynamic and possibly resize it if
we don't get a \r?\n in it.


A dynamic buffer could also save us the strcpy(argbuf, buf) copy which,
IMO, is quite wasteful.


What I propose is to have three branches, details below the list:

- one for 6.3.X (new name to be devised), for a final release and
possibly later critical/security fixes

- one for a new 6.4.X or 7.X.X branch (master) where such fixes as the
IMAP search ranges can be made

- one for the future 7.X.X or 8.X.X branch (next) for radical changes
such as the C++ migration and major structural changes.


Details:

1 - create a side branch from master for 6.3.X, revert the SSLv2
removal, fix STARTTLS timeouts (the code is almost completed) and clear
up the MD5 mess.  Release 6.3.20 from the side branch.


2 - master advances to 7.0.X, sporting feature removals from the "next"
branch (7.0.0-alpha2) such as IMAP2, POP2, SSLv2, support for pre-SUSv3
systems and other cleanups, but not the C++ migration.  Revamp the
socket handling, using dynamic buffers, use Rainer's patch to speed up
POP3 UID handling with PATRICIA data structures (this gives us the edge
over getmail ;-))

  - The SSL handling is not done properly, but needs to be.
Look at the _ssl_ctx and all that stuff, we're wasting humongous amounts
of static memory (FD_SETSIZE is 1024 on my system, and sizeof(SSL) is
512, so we waste 512 kibiB of RAM just for SSL support) just because the
socket code sticks to simple ints, rather than using a proper structure
of our own.



3 - next advances to 8.0.X including a C++ migration, with major overhaul:

  - I have started converting the code (on the "next" branch in Git) to
C++ with Standard Template Library and Boost.  Boost is, for the biggest
part, a headers-only library, so it doesn't create additional run-time
dependencies, and it is peer-reviewed by the C++ gurus and often the
basis for future standards <http://www.boost.org/>. IMAPCapa* is an
example of how compact the code is.

    C++ features such as the string type, or STL containers such as
vector, map, set, list/deque/stack; or Boost functions for intervals,
will greatly simplify the code.  As will exceptions.


  - I plan to support UID-based fetches in IMAP too, so we can fetch
from READ-ONLY folders and need no longer tamper with \Seen flags.

    In that case, UIDs are strictly monotonically increasing (unless
UIDVALIDITY has changed), so we know for certain where we need to start
the SEARCH or FETCH, too, so we avoid empty fetches. (Note Boost's
Interval type!)


  - I plan to support pipelined operation for common protocols to avoid
the roundtrip delays incurred through lock-step behaviour.  It's not
possible everywhere (for instance, when doing a binary search for POP3
UIDLs), but should help maxing out the wires or WiFi even in a single
thread once we get to downloading.


  - I want to clean up the code to an extent where we can ultimately
fetch from several accounts in parallel.


Opinions, comments, remarks, concerns, ideas are solicited.

Thanks.

Best regards,
Matthias

Re: [fetchmail-devel] fetchmail fix coordination and future planning (was: [PATCH] imap: introduce imapsearchmax parameter)

From: Sunil S. <sun...@ro...> - 2011-05-03 20:25:19

Hi Matthias,



----- Original Message -----
> As so the second (with _split and caching), I wonder if it's the right
> approach. It looks a bit as though we're curing the symptoms instead of
> the cause.  The cause is a buffer too small to handle the response, and
> the cure would be to make the buffer dynamic and possibly resize it if
> we don't get a \r?\n in it.
> 
> 
> A dynamic buffer could also save us the strcpy(argbuf, buf) copy which,
> IMO, is quite wasteful.

I had given a thought for implementing dynamic buffers. However, some issues came to my mind:

- What if the buffer required is very long? For example, an extreme response like:

IMAP< * SEARCH 1 2 3 4 5 6 ... 999995 999996 999997 99998 99999 100000

will require a dynamic buffer of size 575k. Is it okay to allocate such a huge buffer?

- Who will free the buffer? Obviously, it will have to be statically allocated in order to free it later.

- How long will it remain? Is it safe to free it immediately on the next call?

In short, without a proper memory manager, it may not be safe to just allocate a dynamic buffer of such a huge size.


Note that the strcpy(argbuf, buf) can still be avoided by directly passing argbuf (if available) to gen_recv().


Also, note that that patch is a bug fix. The current method range search for UNSEEN mails is too slow for huge mailboxes. So, a fix for this issue is required for 6.3.X.


> What I propose is to have three branches, details below the list:
> 
> - one for 6.3.X (new name to be devised), for a final release and
> possibly later critical/security fixes
> 
> - one for a new 6.4.X or 7.X.X branch (master) where such fixes as the
> IMAP search ranges can be made

I think, it is necessary to fix the range search issue in 6.3.X itself as fetchmail is now unusable with IMAP when the mailbox is huge and keep is on.

-- 
Sunil Shetye.

Re: [fetchmail-devel] fetchmail fix coordination and future planning

From: Matthias A. <mat...@gm...> - 2011-05-04 01:14:29

Am 03.05.2011 20:25, schrieb Sunil Shetye:
> Hi Matthias,
> 
> 
> 
> ----- Original Message -----
>> As so the second (with _split and caching), I wonder if it's the right
>> approach. It looks a bit as though we're curing the symptoms instead of
>> the cause.  The cause is a buffer too small to handle the response, and
>> the cure would be to make the buffer dynamic and possibly resize it if
>> we don't get a \r?\n in it.
>>
>>
>> A dynamic buffer could also save us the strcpy(argbuf, buf) copy which,
>> IMO, is quite wasteful.
> 
> I had given a thought for implementing dynamic buffers. However, some issues came to my mind:
> 
> - What if the buffer required is very long? For example, an extreme response like:
> 
> IMAP< * SEARCH 1 2 3 4 5 6 ... 999995 999996 999997 99998 99999 100000
> 
> will require a dynamic buffer of size 575k. Is it okay to allocate such a huge buffer?

Possibly yes, although some limit would be in order.

Do we really need to optimize 6.3.X with obtrusive changes?  This will
make merging between master and next harder.

> - Who will free the buffer? Obviously, it will have to be statically allocated in order to free it later.
> 
> - How long will it remain? Is it safe to free it immediately on the next call?

> In short, without a proper memory manager, it may not be safe to just allocate a dynamic buffer of such a huge size.

I'll sidestep these for now.

> Note that the strcpy(argbuf, buf) can still be avoided by directly passing argbuf (if available) to gen_recv().

OK. It's a separate microoptimization anyways, not worth discussing.
Someone will do it ;-)

> Also, note that that patch is a bug fix. The current method range search for UNSEEN mails is too slow for huge mailboxes. So, a fix for this issue is required for 6.3.X.

It may be painful, but if it works, it's not a bug unless the
documentation promises otherwise (which it does not).

For a 25,000 message mailbox with 1% unseen, the amount of SEARCH
transactions seems to be reasonably low.  I think POP3 + uidl + keep in
that situation would burn something between 3 and 30 seconds CPU time on
a Phenom II 2.5 GHz CPU w/ GCC 4.5 compiled code while chewing on UIDLs.
The fix was contributed by Rainer Weikusat and is in the next branch and
in 7.0.0-alpha2.

>> What I propose is to have three branches, details below the list:
>>
>> - one for 6.3.X (new name to be devised), for a final release and
>> possibly later critical/security fixes
>>
>> - one for a new 6.4.X or 7.X.X branch (master) where such fixes as the
>> IMAP search ranges can be made
> 
> I think, it is necessary to fix the range search issue in 6.3.X itself as fetchmail is now unusable with IMAP when the mailbox is huge and keep is on.

Try POP3 UIDL on a mailbox with the same message (86000) count and let
me know if you still think IMAP is "unusable".  Hint: check what Rainer
and I have been benchmarking when he contributed the PATRICIA
implementation.  I've avoided it in 6.3.X because I have portability
concerns.

I will take your patch in spite of my concerns, and test it.

I suggest that you check the manual pages of strlcpy and strlcat, these
are alternatives to snprintf(buf, siz, "%s%s", s1, s2).  glibc is one of
the major distros lacking it, but we ship replacements.  Also note that
<strings.h> has strcasecmp and, more usefully, strncasecmp, and we have
our replacements for OSes lacking them, too.

Re: [fetchmail-devel] fetchmail fix coordination and future planning

From: Sunil S. <sun...@ro...> - 2011-05-04 08:46:36

----- Original Message -----

> Observations:
> 
> - the patch makes an undocumented assumption that the buffer for a
> second call to gen_recv_split is always large enough to copy the
> remainder. If it isn't, you lose data.  Doesn't happen in fetchmail now,
> but is a maintenance concern, so I'll add an assert() for now.
> 
> - you don't guard the prefix length - I've added that.
> 
> - you don't check if the data-to-be-cached fits into the rs structure -
> I've added that.

Good. All required.

> - /** foo */ isn't a typo, but a marker for Doxygen so it actually looks
> at the comment for documentation, and extracts it. Also note the
> ordering of these comments versus the comma; alternatively you can write
> /**< foo */ to document the PREVIOUS argument in a prototype.

Okay. Note that the prototype was just copied from gen_recv() and a parameter was added later.

>     Fixup: match prefix caseblind, add some guards, streamline phase
> handling.

Required.

>     Add a few asserts to catch abuse, and use strlcpy/strlcat rather than
>     snprintf because some implementations of the latter are unsuitable for
>     detecting buffer exhaustion.

Required.

>     Fixup: remove unused variables.

How did I miss that?

Thanks for the review.

-- 
Sunil Shetye.

Re: [fetchmail-devel] fetchmail fix coordination and future planning

From: Matthias A. <mat...@gm...> - 2011-05-04 12:24:39

Am 04.05.2011 08:46, schrieb Sunil Shetye:

> Okay. Note that the prototype was just copied from gen_recv() and a parameter was added later.

Oops :) I've fixed the transact.c documentation for Doxygen, and fixed
some unsafe macro expansions (missing parentheses) along the way.

Re: [fetchmail-devel] [PATCH] fetchmail fix coordination and future planning

From: Sunil S. <sun...@ro...> - 2011-05-04 12:50:23

Attachments: 0001-correct-call-to-gen_send.patch

----- Original Message -----

> And it went unnoticed possibly because you use different compiler flags
> (I routinely compile with -pedantic -Wall and often also with -Wextra or
> -W, and I routinely also try Intel C++ and Clang, and I take care to
> have at least -O1 or usually -O2 set -- my current computer is fast
> enough and compiles fetchmail (after configure) in < 2.5 s real time.)

Is there a way of enabling those CFLAGS flags as well as the configure flags by default? Every time I switch machines and checkout fetchmail again (which is rare), I seem to miss out setting the same set of flags again.

I have enabled the warnings now. I have also added -Wformat=2 and got some warnings.

Attached is a minor patch to fix those warnings.

-- 
Sunil Shetye.

Re: [fetchmail-devel] [PATCH] fetchmail fix coordination and future planning

From: Matthias A. <mat...@gm...> - 2011-05-04 15:53:55

Am 04.05.2011 12:50, schrieb Sunil Shetye:
> ----- Original Message -----
> 
>> And it went unnoticed possibly because you use different compiler flags
>> (I routinely compile with -pedantic -Wall and often also with -Wextra or
>> -W, and I routinely also try Intel C++ and Clang, and I take care to
>> have at least -O1 or usually -O2 set -- my current computer is fast
>> enough and compiles fetchmail (after configure) in < 2.5 s real time.)
> 
> Is there a way of enabling those CFLAGS flags as well as the configure flags by default? Every time I switch machines and checkout fetchmail again (which is rare), I seem to miss out setting the same set of flags again.

Hm. These are compiler-specific, so we'd have to test explicitly.

> I have enabled the warnings now. I have also added -Wformat=2 and got some warnings.
> 
> Attached is a minor patch to fix those warnings.

Good catch, although I have the nasty feeling I've fixed similar bugs in
other .c files before. I wonder how these slipped past my attention...

Re: [fetchmail-devel] fetchmail fix coordination and future planning

From: Matthias A. <mat...@gm...> - 2011-05-04 02:08:45

Am 04.05.2011 01:14, schrieb Matthias Andree:

> I will take your patch in spite of my concerns, and test it.

Observations:

- the patch makes an undocumented assumption that the buffer for a
second call to gen_recv_split is always large enough to copy the
remainder. If it isn't, you lose data.  Doesn't happen in fetchmail now,
but is a maintenance concern, so I'll add an assert() for now.

- you don't guard the prefix length - I've added that.

- you don't check if the data-to-be-cached fits into the rs structure -
I've added that.

- /** foo */ isn't a typo, but a marker for Doxygen so it actually looks
at the comment for documentation, and extracts it. Also note the
ordering of these comments versus the comma; alternatively you can write
/**< foo */ to document the PREVIOUS argument in a prototype.

My changes have been pushed out to the Git master branch, please check
my two Fixup commits:

commit 646f36e1c1369fd65c0d641cae48fa4425613462
Author: Matthias Andree <mat...@gm...>
Date:   Wed May 4 02:02:30 2011 +0200

    Fixup: match prefix caseblind, add some guards, streamline phase
handling.

    Add a few asserts to catch abuse, and use strlcpy/strlcat rather than
    snprintf because some implementations of the latter are unsuitable for
    detecting buffer exhaustion.

M       transact.c

commit d7d43f53e1da9d5c57961ea26fefa609de1e30e7
Author: Matthias Andree <mat...@gm...>
Date:   Wed May 4 01:58:46 2011 +0200

    Fixup: remove unused variables.

M       imap.c

Re: [fetchmail-devel] fetchmail fix coordination and future planning

From: Matthias A. <mat...@gm...> - 2011-05-04 11:07:42

On 04.05.2011 08:46, Sunil Shetye replied:

>>     Fixup: remove unused variables.
> 
> How did I miss that?

Hi Sunil,

As fetchmail 6.3.X is C89-style, variable declarations are far away from
their use, so missing "first/last" was easy. Add to that the humongous
text length of some code parts...

And it went unnoticed possibly because you use different compiler flags
(I routinely compile with -pedantic -Wall and often also with -Wextra or
-W, and I routinely also try Intel C++ and Clang, and I take care to
have at least -O1 or usually -O2 set -- my current computer is fast
enough and compiles fetchmail (after configure) in < 2.5 s real time.)

> Thanks for the review.

Thanks for the counter-review.

Best regards
Matthias