Re: [fetchmail-devel] Some PATRICIA numbers

SourceForge Headquarters 1320 Columbia Street Suite 310 San Diego, CA 92101 +1 (858) 422-6466

Am 20.05.2010 22:34, schrieb Rainer Weikusat:
> "Matthias Andree" <mat...@gm...> writes:
> 
> [...]
> 
>> Looks quite alright at first glance. As a general remark, I wonder if
>> it  would be worthwhile splitting the PATRICIA core code from the
>> fetchmail  interfacing code.
> 
> My 'general' opinion on this is that abstractions aren't free and
> creating them 'just in case' should rather be avoided. 

OK.

> For the sake of completeness: 'implicit int' refers to not declaring a
> function return value which traditionally caused compilers to assume
> that the function would return a value of type 'int'. This is no
> longer valid C, as of ISO/IEC 9899:1999 (E). OTOH, the set of valid
> type specifiers and type specifier combinations is defined in 6.7.2
> and includes the following text:
> 
> 	Each list of type specifiers shall be one of the following
> 	sets (delimited by commas, when there is more than one set on
> 	a line);
> 
>         [...]
> 
>         -- int, signed, or signed int
> 	-- unsigned, or unsigned int
>         [6.7.2|2]
> 
> Meaning, not only is 'unsigned' a completely legitimate way to declare
> an unsigned integer but actually, 'signed' is also valid instead of
> 'int'. That said, I am completely willing to comply with any 'style
> rules' you consider to be essential.

I stand corrected.  In the light of your quote from the C99 standard,
please ignore my previous comment in this regard. I was indeed confusing
function return arguments with variable types.  Sorry!

It was less of a style than rather a technical (namely, portability)
argument, which is now void.

Also, I don't think I'm currently concerned with style - fetchmail
source is using different styles... as long as it remains concise, it
will be fine.

If you're familiar with Doxygen (Javadoc), writings docs in .h files in
a format that Doxygen can parse would be nice, but no need to get
acquainted with it if you prefer a different style. Anything sensible is
better than no documentation. :-)

> General (so that it's before the 'long quoted text'): As I wrote,
> integration of this code with fetchmail was completed yesterday and
> the resulting binary has survived a night of pulling e-mail from my
> account. The most expensive part is now the lexer. I am going to run

So you're essentially saying that the rcfile scanner (lexer) dominates
execution time?  That's a good thing for fetchmail, and a bad thing for
its scanner.  The practical workaround is daemon mode, which would only
read the rcfile on startup and if it changes.

We could have flex inflate its tables so it gets faster, but this easily
goes in the hundreds-of-kbytes for fetchmail's rather complex lexer.

> this for another night tonight, this time with fastuidl
> enabled. Should this also go well, I will most likely move this onto
> our 'production server' tomorrow. Except a few bits of missing
> functionality (such as a traversal routine), the most notable change I

Just wondering, how are you currently writing out the UIDs without a
traversal routine?

> have done is that I have replaced the recursive trie-freeing routine
> with an iterative one: While this is - at best - only marginally
> faster (for my test dataset), it has the nice property of needing no
> additional storage beyond what's already used for the trie itself.

Good.

> I've also had an idea how to restrict the size of the num_ndx array to
> what is actually needed: Since the maximum message number is known in
> advanced, the vector can grow 'downward' such that position zero is
> assumed to hold the largest possible message number and lower indices
> are then inserted sufficiently distant from that with the array being
> extended as required. Lookup would then use negative offsets relative
> to a pointer to the end of the array, adjusted according to the
> 'logical position' of the end of the array relative to the message
> number 1. I think I will get this implemented tomorrow and could then
> basically produce a patch relative to 6.3.17 which could serve as
> base for further changes.

I'm not sure if it's worth the effort of redoing the array handling just
to shave off a dozen of realloc() calls.  Keep maintainability in mind;
in my professional live, I've been debugging somebody else's (rather
awkward [1]) code for many months now, and I'd say maintainability wins
over a few % of execution speed.

[1] That other code, entirely unrelated to fetchmail, often follows copy
& paste & modify one line programming style, which is very error-prone...

>> $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \
>>       -o uidl_db uidl_db.c  -Wno-unused -Dinline=__inline__
>> # -Dinline for -std=c89 :-)
>> uidl_db.c: In function ‘init_uidl_db’:
>> uidl_db.c:63: warning: request for implicit conversion from ‘void *’
>> to  ‘struct uidl_record **’ not permitted in C++
> 
> This is just my usual laziness :-): Since void * is kind of a
> passepartout type in C, I usually just cast to void * whenever any
> pointer conversion is needed. I've changed that to use the actual
> destination type.

I was also too lazy to remove that from the compiler logs :)
I'll be happy to fix this myself should I go for C++ later.

>> uidl_db.c: In function ‘walk_down’:
>> uidl_db.c:83: warning: comparison between signed and unsigned integer
>> expressions
>> uidl_db.c:85: warning: comparison between signed and unsigned integer
>> expressions
> 
> This was due to the ofs variable being of type int. I've changed that
> to unsigned, which didn't matter for the code. This will, of course,

Yup. The only thing is that loops need to be written more carefully
because ">= 0" or thereabouts now always holds.  I think that's what you
meant by "didn't matter for the code".

> break horribly if UIDs longer than 2^32 - 1 chars are supposed to be
> processed (or if bit indices higher than 2^31-1 are needed) but I
> assume that both propositions are not exactly realistic (the POP3
> specification already prohibits this: Maximum 'payload' of a response
> line is 510 octets, according to RFC1939, sec. 3).

I propose to document that somewhere in a quiet place in fetchmail.man,
but to not even add code checks for that - a FIXME comment will be
sufficient IMO.

>> uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function
>> /tmp/cc6WZFTd.o: In function `main':
> 
> I don't usually add spurious initializations just to silence these
> warnings if they are obviously wrong. 

GCC is also sometimes off track, hadn't checked any of the warnings
except for the ffs().

>> uidl_db.c(92): warning #1684: conversion from pointer to same-sized
>> integral type (potential portability problem)
>>   					offsetof(struct pat_node, ptrs_[2])
>>   					^
> 
> I do not have the slightest idea what this is supposed to refer to.

I'd have to see Intel C++'s stddef.h to see what's really up, but this
looks like a compiler issue (either way, as stddef.h is a compiler
provided header).

A pointer difference *is* an integral type; quoting POSIX (which is more
specific as to the return type than C99 - your quote below - is):

       offsetof(type, member-designator)

              Integer  constant  expression of type size_t, the value of
which
              is the offset in bytes to the structure member
(member-designa‐
              tor), from the beginning of its structure (type).

But it's indeed a portability problem. Proper code is not portable to
Intel C++.  8-)

> [...]
> 
>> uidl_db.c(119): remark #2259: non-pointer conversion from "int" to
>> "unsigned char" may lose significant bits
>>   	v = r0->id[ofs] ^ r1->id[ofs];
>>   	  ^
> 
> Since both operands come from arrays of char and are only insofar of
> type 'int' as the values are supposed to be promoted to int, this
> seems to be an extremly poor warning.

If the sources are "char" (whatever sign), it's safe. Looks like a
compiler defect.  I need to re-test with the latest patch release, and
if it persists I need to regain access to the Intel site so I can file a
bug report.

>> uidl_db.c(244): warning #2259: non-pointer conversion from
>> "size_t={unsigned long}" to "unsigned int" may lose significant bits
>>       id_len = strlen(id);
>>              ^
> 
> I would rather keep this as 'unsigned' because size_t may be a 64-bit
> type on 64-bit systems and this seems exessive for representing the
> length of something restricted to less than 510 octets.

Seems this warning is in order and size_t would be OK, or at least
checking against UINT_MAX as a fall-back.  Probably not in this piece of
the code though, but in parseuid - reject (skip) overlong UIDs. I have
yet to see a site that comes even close to the 70 characters that POP3
allows :-)

what's wrong with size_t?  On 64-bit machines, the other four bytes
come at no extra cost.  In doubt someone will be padding anyways on
those computers. :)

> 
> [...]
> 
>> uidl_db.c:92:6: warning: using extended field designator is an
>> extension  [-pedantic]
>>                                         offsetof(struct pat_node, ptrs_[2])
>>                                         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
> 
> The relevant text of the C-standard is
> 
> 	offsetof(type, member-designator)
> 
>         [...]
> 
>         The type and member designator shall be such that given
>         
>         	static type t;
>                 
> 	then the expression &(t.member-designator) evaluates to an
> 	address constant.
> 	[7.17|3]
> 
> Assuming the following code:
> 
> 	static struct pat_node t;
> 
> &(t.ptrs_[2]) certainly evaluates to an address constant, so this
> appears to either be a traditional restriction or a C89-one which
> accidentally escaped into LLVM/clang using someone remembering it as
> 'transport medium'.

Open issue for me, might be a clang bug, or a configuration issue. Will
have to check with an SVN version and file a report if it is a bug.
Same computer as for Intel C++, it's powered down, and I'm 100 km away.

Other than that, I'm really looking forward to the code.

Thanks a lot.

Best regards
Matthias

Re: [fetchmail-devel] Some PATRICIA numbers

Client daemon to move mail from POP and IMAP to your local computer

Re: [fetchmail-devel] Some PATRICIA numbers