From: Rainer W. <rwe...@ms...> - 2010-05-16 23:36:15
|
The trie implementation is meanwhile complete and I have done some preliminary benchmarks. I fear that the 'insert' cost will turn out to be somewhat higher, but the results for searches are promising. The largest .fetchids file I have here contains 7275 UIDs. Assuming the trie has already been created, iterating over the input file a thousand times and searching for every UID needs 3.93s on average, meaning the average time for a single search is about 5.4E-7 seconds (3Ghz P4). Doing a plain linear search for every UID needs about 16.6 seconds on average[*] or about 2.3E-4 seconds per search. [*] For ten searches. I didn't have the patience to wait for 1000 :-> |
From: Matthias A. <mat...@gm...> - 2010-05-17 09:44:24
|
Am 16.05.2010, 23:33 Uhr, schrieb Rainer Weikusat: > The trie implementation is meanwhile complete and I have done some > preliminary benchmarks. I fear that the 'insert' cost will turn out to > be somewhat higher, but the results for searches are promising. The > largest .fetchids file I have here contains 7275 UIDs. Assuming the > trie has already been created, iterating over the input file a > thousand times and searching for every UID needs 3.93s on average, > meaning the average time for a single search is about 5.4E-7 seconds > (3Ghz P4). Doing a plain linear search for every UID needs about 16.6 > seconds on average[*] or about 2.3E-4 seconds per search. > > [*] For ten searches. I didn't have the patience to wait for > 1000 :-> Hi Rainer, this looks good. Without trying to downplay your efforts, I'd also say that this was somewhat expected. In other words, your numbers confirm that the trie works as expected and does not degenerate into a linear list in your test cases. Insertion into the data structure is fastest for a pre-allocated array, then appending to a list that has a tail pointer. Inserting into trees, tries, heaps, and thereabouts always takes a bit longer because it encloses a search for the node where you'll be forking the branches. It is some sort of "ordered" insertion. This is the price we (happily!) pay for quicker retrievals and searches. The important part here is that, given N UIDs in a list, the costly operations "insert all UIDs" and "for each UID retrieved from the server, check if we've already seen it" are reduced to O(N log N) when they used to cost O(N^2) for larger N (i. e. asymptotically). I'm really looking forward to your code :-) Best regards -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-18 19:48:51
|
"Matthias Andree" <mat...@gm...> writes: > Am 16.05.2010, 23:33 Uhr, schrieb Rainer Weikusat: >> The trie implementation is meanwhile complete [...] > I'm really looking forward to your code :-) I hope that I've now also found the 'obvious' optimizations (even with 'full' tail pointers, save_str is the third most expensive operation accomplished by fetchmail, so increasing insert times potentially matters a lot). I am going to start with integrating the uidl db implementation I have with the POP3 code now. If this was ok, I would like to send an 'advance' version of the code (basically, the 'stand-alone test program' I have used so far) to the list in order to enable some third-party review. The code is, however, presently completely comment-free (could be added to some reasonable degree, eg no 'fuck me gently with a chainsaw'-style witticisms) and presumably somewhat dense to people who are not used to regard 'integers' a finite, ordered bit-sets. |
From: Matthias A. <mat...@gm...> - 2010-05-18 21:05:51
|
Am 18.05.2010, 19:48 Uhr, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: >> Am 16.05.2010, 23:33 Uhr, schrieb Rainer Weikusat: >>> The trie implementation is meanwhile complete > > [...] > >> I'm really looking forward to your code :-) > > I hope that I've now also found the 'obvious' optimizations (even with > 'full' tail pointers, save_str is the third most expensive operation > accomplished by fetchmail, so increasing insert times potentially > matters a lot). I am going to start with integrating the uidl db > implementation I have with the POP3 code now. If this was ok, I would > like to send an 'advance' version of the code (basically, the > 'stand-alone test program' I have used so far) to the list in order to > enable some third-party review. The code is, however, presently > completely comment-free (could be added to some reasonable degree, eg > no 'fuck me gently with a chainsaw'-style witticisms) and presumably > somewhat dense to people who are not used to regard 'integers' a > finite, ordered bit-sets. Hi Rainer, please do (show an advance/preview version) - if you don't want to expose it to the list yet for whatever reason, but want my comments, feel free to send personal mail. WRT commentary, I think I am with Stroustrup here, meaning that the commentary is good to fill in what *cannot* be better said in code. Instead, comments should provide a general idea of what a piece of code does, and possibly describe the interfaces and return values, at least the special cases (if any) such as NULL, -1, 0, ... descriptive and accurate variable and function names also reduce the need for commentary in epic breadth. :-) Best regards -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-19 20:30:58
|
"Matthias Andree" <mat...@gm...> writes: [...] > please do (show an advance/preview version) - if you don't want to > expose it to the list yet for whatever reason, but want my comments, > feel free to send personal mail. I'll happily listen to anyone who has something more constructive to say about this than "I hate it!". This is my last 'standalone test program'. It can be compiled and used to do testing/ measurement of the trie-code based on reading 'unique message IDs' from stdin which must point to 'something seekable'. The interface is known to be incomplete and the halfway integrated version contains everything in here and whatever was additionally needed in order to use this as 'backend' for the code in uid.c. My present approach to doing lookups based on 'message numbers' is very simplistic --- just an array going from 0 .. <highest message number>. There's a potential scalability issue in there: Realloc with a constant increment is O(n^2) when the memory area needs to be copied. If the pop3.c code works as I believe it does (inserts message numbers in increasing order), I will likely change this to additionally store the offset of the first element in the array to keep this shorter for the case when a large mailbox contains a few new messages at the end. I will also change the realloc-calculation to something more sensible (usually, this means doubling the allocated area every time it needs to grow. Any feedback on the implementation would be very welcome. ------------- uidl_db.h ------------------- /* uidl db Copyright (c) 2010 MAD Partners, Ltd. This file is being published according to the terms of the GPLv2 license contained in the 6.3.17 release of fetchmail. */ #ifndef fetchmail_uidl_db_h #define fetchmail_uidl_db_h /* includes */ #include <stddef.h> /* types */ struct uidl_record { char *id; unsigned id_len; unsigned num, status, pos; struct uidl_record *next; }; struct pat_node; struct uidl_db { struct pat_node *pat_root; struct uidl_record **records; unsigned records_max, records_next; struct uidl_record **num_ndx; unsigned num_ndx_max; }; /* routines */ void init_uidl_db(struct uidl_db *db); struct uidl_record *insert_uidl(struct uidl_db *db, char const *id, unsigned status); struct uidl_record *insert_uidl_noindex(struct uidl_db *db, char const *id, unsigned status); void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, unsigned num); struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id); static inline struct uidl_record * find_uidl_by_num(struct uidl_db *db, unsigned num) { return num < db->num_ndx_max ? db->num_ndx[num] : NULL; } static inline struct uidl_record * find_uidl_by_pos(struct uidl_db *db, unsigned pos) { return pos < db->records_next ? db->records[pos] : NULL; } static inline struct uidl_record * first_uidl_in_db(struct uidl_db *db, char const *id) { return find_uidl_by_id(db, id); } struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id); #endif --------------------------------------- ------------------- uidl_db.c ----------------- /* uidl db Copyright (c) 2010 MAD Partners, Ltd. This file is being published according to the terms of the GPLv2 license contained in the 6.3.17 release of fetchmail. */ /* includes */ #include <stdlib.h> #include <stdio.h> #include <string.h> #include "uidl_db.h" /* constants */ enum { MIN_RECORDS = 16 }; /* types */ struct pat_node { struct pat_node *ptrs_[3]; unsigned bit_ndx, bit_mask; struct uidl_record *rec; }; /* The idea behind this is that the 'left' pointer is at -1 and the 'right' pointer at 1. This implies that dealing with 'symmetric cases' for left and right pointers isn't necessary. There's always just 'this pointer' (offset n) and 'the other pointer' (offset -n). */ #define ptrs(p) ((p)->ptrs_ + 1) /* routines */ void *xmalloc(size_t len) { return malloc(len); } void *xrealloc(void *p, size_t new_size) { return realloc(p, new_size); } static inline unsigned bit_ofs(unsigned bit_ndx) { return bit_ndx >> 3; } static inline unsigned bit_mask(unsigned bit_ndx) { return 1 << (bit_ndx & 7); } void init_uidl_db(struct uidl_db *db) { db->pat_root = NULL; db->records = xmalloc(MIN_RECORDS * sizeof(*db->records)); db->records_max = MIN_RECORDS; db->records_next = 0; db->num_ndx = NULL; db->num_ndx_max = 0; }; static struct pat_node *walk_down(struct uidl_db *db, struct uidl_record *rec, struct pat_node ***edgep, struct pat_node **parentp) { struct pat_node *cur, **edge; unsigned bit_ndx, v; int ofs; cur = db->pat_root; ofs = -1; do { bit_ndx = cur->bit_ndx; if (bit_ofs(bit_ndx) != ofs) { ofs = bit_ofs(bit_ndx); v = ofs < rec->id_len ? rec->id[ofs] : 0; } edge = ptrs(cur) + (v & cur->bit_mask ? 1 : -1); } while ((cur = *edge) && cur->bit_ndx > bit_ndx); *parentp = (void *)((char *)edge - (v & bit_mask(bit_ndx) ? offsetof(struct pat_node, ptrs_[2]) : offsetof(struct pat_node, ptrs_[0]))); *edgep = edge; return cur; } static inline unsigned first_set_bit_in_char(unsigned v) { return ffs(v) - 1; } static int find_first_diff_bit(struct uidl_record const *r0, struct uidl_record const *r1) { struct uidl_record const *long_id; unsigned ofs, max; unsigned char v; max = r0->id_len; if (max > r1->id_len) { max = r1->id_len; long_id = r0; } else long_id = r1; ofs = 0; do v = r0->id[ofs] ^ r1->id[ofs]; while (!v && ++ofs < max); if (!v) { if (r0->id_len == r1->id_len) return -1; v = long_id->id[ofs]; } return first_set_bit_in_char(v) + ofs * 8; } static inline int record_id_equal(struct uidl_record const *r0, struct uidl_record const *r1) { return r0->id_len == r1->id_len && memcmp(r0->id, r1->id, r0->id_len) == 0; } static struct uidl_record *append_to_list(struct uidl_record **recp, struct uidl_record *rec) { while (*recp) recp = &(*recp)->next; *recp = rec; rec->next = NULL; return rec; } static struct pat_node *get_pat_node(struct uidl_record *rec) { struct pat_node *np; np = xmalloc(sizeof(*np)); np->rec = rec; rec->next = NULL; return np; } static struct pat_node *get_standalone_node(struct uidl_record *rec) { struct pat_node *np; np = get_pat_node(rec); np->bit_ndx = first_set_bit_in_char(*rec->id); np->bit_mask = bit_mask(np->bit_ndx); return np; } static inline unsigned bit_set(unsigned bit_ndx, struct uidl_record const *rec) { unsigned ofs; ofs = bit_ofs(bit_ndx); if (ofs >= rec->id_len) return 0; return rec->id[ofs] & bit_mask(bit_ndx); } static inline struct pat_node *walk_up(unsigned diff_ndx, struct pat_node **parent) { struct pat_node *p, *np; np = *parent; while ((p = *ptrs(np)) && p->bit_ndx > diff_ndx) np = p; *parent = p; return np; } static struct uidl_record *pat_insert(struct uidl_db *db, struct uidl_record *rec) { struct pat_node *np, *closest, *parent, **edge; int me, bit_ndx; if (!db->pat_root) { np = get_standalone_node(rec); ptrs(np)[-1] = *ptrs(np) = NULL; ptrs(np)[1] = np; db->pat_root = np; return rec; } closest = walk_down(db, rec, &edge, &parent); if (closest) { bit_ndx = find_first_diff_bit(closest->rec, rec); if (bit_ndx < 0) return append_to_list(&closest->rec->next, rec); np = get_pat_node(rec); np->bit_ndx = bit_ndx; np->bit_mask = bit_mask(bit_ndx); } else np = get_standalone_node(rec); if (parent->bit_ndx > np->bit_ndx) { closest = walk_up(np->bit_ndx, &parent); if (!parent) edge = &db->pat_root; else edge = ptrs(parent)[-1] == closest ? ptrs(parent) - 1 : ptrs(parent) + 1; *ptrs(closest) = np; } *edge = np; *ptrs(np) = parent; me = bit_set(np->bit_ndx, rec) ? 1 : -1; ptrs(np)[me] = np; ptrs(np)[-me] = closest; return rec; } static struct uidl_record *get_uidl_record(char const *id, unsigned status) { struct uidl_record *rec; unsigned id_len; rec = xmalloc(sizeof(*rec)); id_len = strlen(id); rec->id = memcpy(xmalloc(id_len + 1), id, id_len + 1); rec->id_len = id_len; rec->status = status; rec->num = 0; return rec; } static struct uidl_record *insert_into_records(struct uidl_db *db, struct uidl_record *rec) { unsigned next, want; next = db->records_next; if (next == db->records_max) { want = db->records_max *= 2; db->records = xrealloc(db->records, want * sizeof(rec)); } rec->pos = next; db->records[next] = rec; db->records_next = next + 1; return rec; } struct uidl_record *insert_uidl(struct uidl_db *db, char const *id, unsigned status) { struct uidl_record *rec; rec = insert_uidl_noindex(db, id, status); return pat_insert(db, rec); } struct uidl_record *insert_uidl_noindex(struct uidl_db *db, char const *id, unsigned status) { struct uidl_record *rec; rec = get_uidl_record(id, status); return insert_into_records(db, rec); } void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, unsigned num) { unsigned want, have; have = db->num_ndx_max; if (num >= have) { want = num + MIN_RECORDS; db->num_ndx = xrealloc(db->num_ndx, want * sizeof(*db->num_ndx)); db->num_ndx_max = want; do db->num_ndx[--want] = NULL; while (want > have); } db->num_ndx[num] = rec; rec->num = num; } struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id) { static unsigned step_sum, searches; struct pat_node *np; struct uidl_record *rec; unsigned v, bit_ndx, len; int ofs; len = strlen(id); np = db->pat_root; if (np) { ofs = -1; do { bit_ndx = np->bit_ndx; if (bit_ofs(bit_ndx) != ofs) { ofs = bit_ofs(bit_ndx); v = ofs < len ? id[ofs] : 0; } np = ptrs(np)[v & np->bit_mask ? 1 : -1]; } while (np && np->bit_ndx > bit_ndx); if (!np) return NULL; rec = np->rec; return rec->id_len == len && memcmp(id, rec->id, len) == 0 ? rec : NULL; } #if 0 report(stderr, GT_("id search in unindexed db\n")); #endif ofs = db->records_next; while (ofs) { rec = db->records[--ofs]; if (rec->id_len == len && memcmp(rec->id, id, len) == 0) return rec; } return 0; } struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id) { struct uidl_record *rec; rec = find_uidl_by_id(db, id); if (!rec) return NULL; while (rec->next) rec = rec->next; return rec; } static void free_uidl_list(struct uidl_record *rec) { if (rec->next) free_uidl_list(rec->next); free(rec->id); free(rec); } static void free_pat_trie(struct pat_node *np) { struct pat_node *child; child = ptrs(np)[-1]; if (child && child->bit_ndx > np->bit_ndx) free_pat_trie(child); child = ptrs(np)[1]; if (child->bit_ndx > np->bit_ndx) free_pat_trie(child); free_uidl_list(np->rec); free(np); } void free_uidl_db(struct uidl_db *db) { struct uidl_record *rec; unsigned ndx; if (db->pat_root) free_pat_trie(db->pat_root); free(db->records); free(db->num_ndx); } /* test code */ int main(void) { struct uidl_db db; struct uidl_record *rec; unsigned n; char buf[80]; init_uidl_db(&db); while (gets(buf)) insert_uidl(&db, buf, 0); n = 1000; while (n--) { fseek(stdin, 0, 0); while (gets(buf)) { rec = find_uidl_by_id(&db, buf); if (!rec || strcmp(rec->id, buf) != 0) fprintf(stderr, "not found\n"); } } return 0; } --------------------------------------------- |
From: Matthias A. <mat...@gm...> - 2010-05-19 21:48:40
|
Am 19.05.2010, 20:30 Uhr, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: > > [...] > >> please do (show an advance/preview version) - if you don't want to >> expose it to the list yet for whatever reason, but want my comments, >> feel free to send personal mail. > > I'll happily listen to anyone who has something more constructive to > say about this than "I hate it!". This is my last 'standalone test > program'. It can be compiled and used to do testing/ measurement of > the trie-code based on reading 'unique message IDs' from stdin which > must point to 'something seekable'. The interface is known to be > incomplete and the halfway integrated version contains everything in > here and whatever was additionally needed in order to use this as > 'backend' for the code in uid.c. My present approach to doing lookups > based on 'message numbers' is very simplistic --- just an array going > from 0 .. <highest message number>. There's a potential scalability > issue in there: Realloc with a constant increment is O(n^2) when the > memory area needs to be copied. If the pop3.c code works as I believe > it does (inserts message numbers in increasing order), I will likely Hi Rainer, thanks a lot! Regarding what you're writing: We know the message count in advance, because we query it (through STAT, in pop3_getrange()), so we have a good guess of how many messages there are, and we need very few realloc(). Look for *countp (a pointer passed by the caller) to get the message count. Message numbers are also ephemeral (i. e. not saved to disk), in case that helps. > change this to additionally store the offset of the first element in > the array to keep this shorter for the case when a large mailbox > contains a few new messages at the end. I will also change the > realloc-calculation to something more sensible (usually, this means > doubling the allocated area every time it needs to grow. > > Any feedback on the implementation would be very welcome. Looks quite alright at first glance. As a general remark, I wonder if it would be worthwhile splitting the PATRICIA core code from the fetchmail interfacing code. gets(DEST) is a no-go for me - please use fgets(DEST, sizeof(DEST), stdin), even if it's just in test code. I've often wasted time debugging such test code because it wasn't half as robust as the non-test code. Could I also ask that you spell out implicit "int" types, i. e. change unsigned to unsigned int? Implicit int is obsolecscent usage. It is valid C89, but causes C99 and C++ compiler complaints. Please spell out "unsigned int", or use uintfast_t or something like that. I wonder if using a typedef for the integers in the .h file would be useful. To give a feeling for compiler verbosity, I'm including compiler diagnostics from three compilers below, omitting warnings for unused variables, and more comments inline. ======================================================================================= GCC 4.4: ======================================================================================= $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \ -o uidl_db uidl_db.c -Wno-unused -Dinline=__inline__ # -Dinline for -std=c89 :-) uidl_db.c: In function ‘init_uidl_db’: uidl_db.c:63: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record **’ not permitted in C++ uidl_db.c: At top level: uidl_db.c:69: warning: ISO C does not allow extra ‘;’ outside of a function uidl_db.c: In function ‘walk_down’: uidl_db.c:83: warning: comparison between signed and unsigned integer expressions uidl_db.c:85: warning: comparison between signed and unsigned integer expressions uidl_db.c:93: warning: request for implicit conversion from ‘void *’ to ‘struct pat_node *’ not permitted in C++ uidl_db.c: In function ‘first_set_bit_in_char’: uidl_db.c:100: warning: implicit declaration of function ‘ffs’ uidl_db.c: In function ‘get_pat_node’: uidl_db.c:152: warning: request for implicit conversion from ‘void *’ to ‘struct pat_node *’ not permitted in C++ uidl_db.c: In function ‘get_uidl_record’: uidl_db.c:242: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record *’ not permitted in C++ uidl_db.c:245: warning: request for implicit conversion from ‘void *’ to ‘char *’ not permitted in C++ uidl_db.c: In function ‘insert_into_records’: uidl_db.c:262: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record **’ not permitted in C++ uidl_db.c: In function ‘set_uidl_num’: uidl_db.c:299: warning: request for implicit conversion from ‘void *’ to ‘struct uidl_record **’ not permitted in C++ uidl_db.c: In function ‘find_uidl_by_id’: uidl_db.c:325: warning: comparison between signed and unsigned integer expressions uidl_db.c:327: warning: comparison between signed and unsigned integer expressions uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function /tmp/cc6WZFTd.o: In function `main': uidl_db.c:(.text+0x602): warning: the `gets' function is dangerous and should not be used. I'm not so concerned about the extra ; or the implicit conversions, but I am concerned about the signedness mismatches. The ffs declaration is harmless and fixed with "#include <strings.h>". ======================================================================================= Intel C++ 11.1: ======================================================================================= $ icc -O -w2 -Wdeprecated -Wall -Wcheck -Wp64 -diag-disable 177,1418 -std=c99 -o uidl_db uidl_db.c uidl_db.c(69): remark #424: extra ";" ignored }; ^ uidl_db.c(92): warning #1684: conversion from pointer to same-sized integral type (potential portability problem) offsetof(struct pat_node, ptrs_[2]) ^ uidl_db.c(93): warning #1684: conversion from pointer to same-sized integral type (potential portability problem) : offsetof(struct pat_node, ptrs_[0]))); ^ uidl_db.c(100): warning #266: function "ffs" declared implicitly return ffs(v) - 1; ^ uidl_db.c(119): remark #2259: non-pointer conversion from "int" to "unsigned char" may lose significant bits v = r0->id[ofs] ^ r1->id[ofs]; ^ uidl_db.c(244): warning #2259: non-pointer conversion from "size_t={unsigned long}" to "unsigned int" may lose significant bits id_len = strlen(id); ^ uidl_db.c(317): warning #2259: non-pointer conversion from "size_t={unsigned long}" to "unsigned int" may lose significant bits len = strlen(id); ^ /tmp/iccQw73Ea.o: In function `main': uidl_db.c:(.text+0x6d): warning: the `gets' function is dangerous and should not be used. ======================================================================================= And finally clang 2.7: ======================================================================================= $ clang -pedantic -Wall -O3 -std=c99 -o uidl_db uidl_db.c uidl_db.c:69:2: warning: extra ';' outside of a function }; ^ uidl_db.c:92:6: warning: using extended field designator is an extension [-pedantic] offsetof(struct pat_node, ptrs_[2]) ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from uidl_db.c:11: In file included from /usr/include/stdlib.h:33: /usr/local/lib/clang/2.0/include/stddef.h:48:24: note: instantiated from: #define offsetof(t, d) __builtin_offsetof(t, d) ^ uidl_db.c:93:8: warning: using extended field designator is an extension [-pedantic] : offsetof(struct pat_node, ptrs_[0]))); ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In file included from uidl_db.c:11: In file included from /usr/include/stdlib.h:33: /usr/local/lib/clang/2.0/include/stddef.h:48:24: note: instantiated from: #define offsetof(t, d) __builtin_offsetof(t, d) ^ uidl_db.c:100:12: warning: implicit declaration of function 'ffs' is invalid in C99 [-Wimplicit-function-declaration] return ffs(v) - 1; ^ ... /tmp/cc-GdRjxd.o: In function `main': uidl_db.c:(.text+0x779): warning: the `gets' function is dangerous and should not be used. ======================================================================================= Further comments inlined below. ======================================================================================= (sorry for the awkward quoting, that's an Opera M2 issue -- if someone has an "automatic mailing list folder/virtual folder generation tool" add-on that behaves similar to Opera so that I don't need to do the 100 lists manually, I'd be grateful and switch soonish.) > ------------- uidl_db.h ------------------- > /* > uidl db > > Copyright (c) 2010 MAD Partners, Ltd. > > This file is being published according to the terms of the > GPLv2 license contained in the 6.3.17 release of fetchmail. > > */ > #ifndef fetchmail_uidl_db_h > #define fetchmail_uidl_db_h > > /* includes */ > #include <stddef.h> > > /* types */ > struct uidl_record { > char *id; > unsigned id_len; > unsigned num, status, pos; > > struct uidl_record *next; > }; > > struct pat_node; > > struct uidl_db > { > struct pat_node *pat_root; > struct uidl_record **records; > unsigned records_max, records_next; What's the difference between _max and _next? Should probably be documented in the .h file later, although it seems quite clear from the init_uid_db() function. > struct uidl_record **num_ndx; > unsigned num_ndx_max; > }; > > /* routines */ > void init_uidl_db(struct uidl_db *db); I wonder if it would be good to start all functions with uidl_* consistently, for namespace purity. > > struct uidl_record *insert_uidl(struct uidl_db *db, > char const *id, unsigned status); > > struct uidl_record *insert_uidl_noindex(struct uidl_db *db, > char const *id, > unsigned status); > > void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, > unsigned num); > > struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id); > > static inline struct uidl_record * > find_uidl_by_num(struct uidl_db *db, unsigned num) > { > return num < db->num_ndx_max ? db->num_ndx[num] : NULL; > } > static inline struct uidl_record * > find_uidl_by_pos(struct uidl_db *db, unsigned pos) > { > return pos < db->records_next ? db->records[pos] : NULL; > } > > static inline struct uidl_record * > first_uidl_in_db(struct uidl_db *db, char const *id) > { > return find_uidl_by_id(db, id); > } > struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id); > > #endif > --------------------------------------- > > ------------------- uidl_db.c ----------------- > /* > uidl db > > Copyright (c) 2010 MAD Partners, Ltd. > > This file is being published according to the terms of the > GPLv2 license contained in the 6.3.17 release of fetchmail. > */ > > /* includes */ > #include <stdlib.h> > #include <stdio.h> > #include <string.h> Please add, for ffs: #include <strings.h> > #include "uidl_db.h" > > /* constants */ > enum { > MIN_RECORDS = 16 > }; > > /* types */ > struct pat_node { > struct pat_node *ptrs_[3]; > unsigned bit_ndx, bit_mask; > > struct uidl_record *rec; > }; > > /* > The idea behind this is that the 'left' pointer is at -1 and > the 'right' pointer at 1. This implies that dealing with 'symmetric > cases' for left and right pointers isn't necessary. There's always > just 'this pointer' (offset n) and 'the other pointer' (offset -n). > */ > #define ptrs(p) ((p)->ptrs_ + 1) > > /* routines */ > void *xmalloc(size_t len) > { > return malloc(len); > } > > void *xrealloc(void *p, size_t new_size) > { > return realloc(p, new_size); > } > > static inline unsigned bit_ofs(unsigned bit_ndx) > { > return bit_ndx >> 3; > } > > static inline unsigned bit_mask(unsigned bit_ndx) > { > return 1 << (bit_ndx & 7); > } > > void init_uidl_db(struct uidl_db *db) > { > db->pat_root = NULL; > db->records = xmalloc(MIN_RECORDS * sizeof(*db->records)); > db->records_max = MIN_RECORDS; > db->records_next = 0; > > db->num_ndx = NULL; > db->num_ndx_max = 0; > }; > > static struct pat_node *walk_down(struct uidl_db *db, struct uidl_record > *rec, > struct pat_node ***edgep, struct pat_node **parentp) > { > struct pat_node *cur, **edge; > unsigned bit_ndx, v; > int ofs; > > cur = db->pat_root; > ofs = -1; > do { > bit_ndx = cur->bit_ndx; > > if (bit_ofs(bit_ndx) != ofs) { > ofs = bit_ofs(bit_ndx); > v = ofs < rec->id_len ? rec->id[ofs] : 0; > } > > edge = ptrs(cur) + (v & cur->bit_mask ? 1 : -1); > } while ((cur = *edge) && cur->bit_ndx > bit_ndx); > > *parentp = (void *)((char *)edge - (v & bit_mask(bit_ndx) ? > offsetof(struct pat_node, ptrs_[2]) > : offsetof(struct pat_node, ptrs_[0]))); > *edgep = edge; > return cur; > } > > static inline unsigned first_set_bit_in_char(unsigned v) > { > return ffs(v) - 1; > } > > static int find_first_diff_bit(struct uidl_record const *r0, > struct uidl_record const *r1) > { > struct uidl_record const *long_id; > unsigned ofs, max; > unsigned char v; > > max = r0->id_len; > if (max > r1->id_len) { > max = r1->id_len; > long_id = r0; > } else > long_id = r1; > > ofs = 0; > do > v = r0->id[ofs] ^ r1->id[ofs]; > while (!v && ++ofs < max); > > if (!v) { > if (r0->id_len == r1->id_len) return -1; > v = long_id->id[ofs]; > } > > return first_set_bit_in_char(v) + ofs * 8; > } > > static inline int record_id_equal(struct uidl_record const *r0, > struct uidl_record const *r1) > { > return > r0->id_len == r1->id_len > && memcmp(r0->id, r1->id, r0->id_len) == 0; > } > > static struct uidl_record *append_to_list(struct uidl_record **recp, > struct uidl_record *rec) > { > while (*recp) recp = &(*recp)->next; > *recp = rec; > rec->next = NULL; > return rec; > } > > static struct pat_node *get_pat_node(struct uidl_record *rec) > { > struct pat_node *np; > > np = xmalloc(sizeof(*np)); > np->rec = rec; > rec->next = NULL; > return np; > } > > static struct pat_node *get_standalone_node(struct uidl_record *rec) > { > struct pat_node *np; > > np = get_pat_node(rec); > np->bit_ndx = first_set_bit_in_char(*rec->id); > np->bit_mask = bit_mask(np->bit_ndx); > return np; > } > > static inline unsigned bit_set(unsigned bit_ndx, struct uidl_record > const *rec) > { > unsigned ofs; > > ofs = bit_ofs(bit_ndx); > if (ofs >= rec->id_len) return 0; > return rec->id[ofs] & bit_mask(bit_ndx); > } > > static inline struct pat_node *walk_up(unsigned diff_ndx, struct > pat_node **parent) > { > struct pat_node *p, *np; > > np = *parent; > while ((p = *ptrs(np)) && p->bit_ndx > diff_ndx) > np = p; > > *parent = p; > return np; > } > > static struct uidl_record *pat_insert(struct uidl_db *db, > struct uidl_record *rec) > { > struct pat_node *np, *closest, *parent, **edge; > int me, bit_ndx; > > if (!db->pat_root) { > np = get_standalone_node(rec); > ptrs(np)[-1] = *ptrs(np) = NULL; > ptrs(np)[1] = np; > > db->pat_root = np; > return rec; > } > > closest = walk_down(db, rec, &edge, &parent); > > if (closest) { > bit_ndx = find_first_diff_bit(closest->rec, rec); > if (bit_ndx < 0) > return append_to_list(&closest->rec->next, rec); > > np = get_pat_node(rec); > np->bit_ndx = bit_ndx; > np->bit_mask = bit_mask(bit_ndx); > } else > np = get_standalone_node(rec); > > if (parent->bit_ndx > np->bit_ndx) { > closest = walk_up(np->bit_ndx, &parent); > > if (!parent) edge = &db->pat_root; > else edge = ptrs(parent)[-1] == closest ? > ptrs(parent) - 1 : ptrs(parent) + 1; > *ptrs(closest) = np; > } > *edge = np; > *ptrs(np) = parent; > > me = bit_set(np->bit_ndx, rec) ? 1 : -1; > ptrs(np)[me] = np; > ptrs(np)[-me] = closest; > > return rec; > } > > static struct uidl_record *get_uidl_record(char const *id, unsigned > status) > { > struct uidl_record *rec; > unsigned id_len; > > rec = xmalloc(sizeof(*rec)); > > id_len = strlen(id); > rec->id = memcpy(xmalloc(id_len + 1), id, id_len + 1); > rec->id_len = id_len; > rec->status = status; > rec->num = 0; > > return rec; > } > > static struct uidl_record *insert_into_records(struct uidl_db *db, > struct uidl_record *rec) > { > unsigned next, want; > > next = db->records_next; > if (next == db->records_max) { > want = db->records_max *= 2; > db->records = xrealloc(db->records, want * sizeof(rec)); > } > > rec->pos = next; > db->records[next] = rec; > db->records_next = next + 1; > > return rec; > } > > struct uidl_record *insert_uidl(struct uidl_db *db, > char const *id, unsigned status) > { > struct uidl_record *rec; > > rec = insert_uidl_noindex(db, id, status); > return pat_insert(db, rec); > } > > struct uidl_record *insert_uidl_noindex(struct uidl_db *db, > char const *id, unsigned status) > { > struct uidl_record *rec; > > rec = get_uidl_record(id, status); > return insert_into_records(db, rec); > } > > void set_uidl_num(struct uidl_db *db, struct uidl_record *rec, > unsigned num) > { > unsigned want, have; > > have = db->num_ndx_max; > if (num >= have) { > want = num + MIN_RECORDS; > db->num_ndx = xrealloc(db->num_ndx, want * sizeof(*db->num_ndx)); > db->num_ndx_max = want; > > do db->num_ndx[--want] = NULL; while (want > have); > } > > db->num_ndx[num] = rec; > rec->num = num; > } > > struct uidl_record *find_uidl_by_id(struct uidl_db *db, char const *id) > { > static unsigned step_sum, searches; > struct pat_node *np; > struct uidl_record *rec; > unsigned v, bit_ndx, len; > int ofs; > len = strlen(id); > np = db->pat_root; > if (np) { > ofs = -1; > do { > bit_ndx = np->bit_ndx; > > if (bit_ofs(bit_ndx) != ofs) { > ofs = bit_ofs(bit_ndx); > v = ofs < len ? id[ofs] : 0; > } > > np = ptrs(np)[v & np->bit_mask ? 1 : -1]; > } while (np && np->bit_ndx > bit_ndx); > > if (!np) return NULL; > > rec = np->rec; > return rec->id_len == len && memcmp(id, rec->id, len) == 0 ? > rec : NULL; > } > > #if 0 > report(stderr, GT_("id search in unindexed db\n")); > #endif > > ofs = db->records_next; > while (ofs) { > rec = db->records[--ofs]; > if (rec->id_len == len && memcmp(rec->id, id, len) == 0) > return rec; > } > > return 0; > } > > struct uidl_record *last_uidl_in_db(struct uidl_db *db, char const *id) > { > struct uidl_record *rec; > > rec = find_uidl_by_id(db, id); > if (!rec) return NULL; > > while (rec->next) rec = rec->next; > return rec; > } > > static void free_uidl_list(struct uidl_record *rec) > { > if (rec->next) free_uidl_list(rec->next); > > free(rec->id); > free(rec); > } > > static void free_pat_trie(struct pat_node *np) > { > struct pat_node *child; > > child = ptrs(np)[-1]; > if (child && child->bit_ndx > np->bit_ndx) free_pat_trie(child); > > child = ptrs(np)[1]; > if (child->bit_ndx > np->bit_ndx) free_pat_trie(child); > > free_uidl_list(np->rec); > free(np); > } > > void free_uidl_db(struct uidl_db *db) > { > struct uidl_record *rec; > unsigned ndx; > if (db->pat_root) free_pat_trie(db->pat_root); > > free(db->records); > free(db->num_ndx); > } > > /* test code */ > int main(void) > { > struct uidl_db db; > struct uidl_record *rec; > unsigned n; > char buf[80]; > init_uidl_db(&db); > while (gets(buf)) insert_uidl(&db, buf, 0); > > n = 1000; > while (n--) { > fseek(stdin, 0, 0); > while (gets(buf)) { > rec = find_uidl_by_id(&db, buf); > if (!rec || strcmp(rec->id, buf) != 0) > fprintf(stderr, "not found\n"); > } > } > return 0; > } > --------------------------------------------- -- Matthias Andree |
From: Rainer W. <rwe...@ms...> - 2010-05-20 22:34:49
|
"Matthias Andree" <mat...@gm...> writes: [...] > Looks quite alright at first glance. As a general remark, I wonder if > it would be worthwhile splitting the PATRICIA core code from the > fetchmail interfacing code. My 'general' opinion on this is that abstractions aren't free and creating them 'just in case' should rather be avoided. [...] > Could I also ask that you spell out implicit "int" types, i. e. change > unsigned to unsigned int? Implicit int is obsolecscent usage. It is > valid C89, but causes C99 and C++ compiler complaints. For the sake of completeness: 'implicit int' refers to not declaring a function return value which traditionally caused compilers to assume that the function would return a value of type 'int'. This is no longer valid C, as of ISO/IEC 9899:1999 (E). OTOH, the set of valid type specifiers and type specifier combinations is defined in 6.7.2 and includes the following text: Each list of type specifiers shall be one of the following sets (delimited by commas, when there is more than one set on a line); [...] -- int, signed, or signed int -- unsigned, or unsigned int [6.7.2|2] Meaning, not only is 'unsigned' a completely legitimate way to declare an unsigned integer but actually, 'signed' is also valid instead of 'int'. That said, I am completely willing to comply with any 'style rules' you consider to be essential. [...] > To give a feeling for compiler verbosity, I'm including compiler > diagnostics from three compilers below, omitting warnings for unused > variables, and more comments inline. The (meanwhile completely integrated) code compiled as part of fetchmail without warnings but I have done a -W -Wall gcc pass and adressed everything which came up (including one actual error). I'd like to make a few statemens regarding specific warnings below. General (so that it's before the 'long quoted text'): As I wrote, integration of this code with fetchmail was completed yesterday and the resulting binary has survived a night of pulling e-mail from my account. The most expensive part is now the lexer. I am going to run this for another night tonight, this time with fastuidl enabled. Should this also go well, I will most likely move this onto our 'production server' tomorrow. Except a few bits of missing functionality (such as a traversal routine), the most notable change I have done is that I have replaced the recursive trie-freeing routine with an iterative one: While this is - at best - only marginally faster (for my test dataset), it has the nice property of needing no additional storage beyond what's already used for the trie itself. I've also had an idea how to restrict the size of the num_ndx array to what is actually needed: Since the maximum message number is known in advanced, the vector can grow 'downward' such that position zero is assumed to hold the largest possible message number and lower indices are then inserted sufficiently distant from that with the array being extended as required. Lookup would then use negative offsets relative to a pointer to the end of the array, adjusted according to the 'logical position' of the end of the array relative to the message number 1. I think I will get this implemented tomorrow and could then basically produce a patch relative to 6.3.17 which could serve as base for further changes. ? > $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \ > -o uidl_db uidl_db.c -Wno-unused -Dinline=__inline__ > # -Dinline for -std=c89 :-) > uidl_db.c: In function ‘init_uidl_db’: > uidl_db.c:63: warning: request for implicit conversion from ‘void *’ > to ‘struct uidl_record **’ not permitted in C++ This is just my usual laziness :-): Since void * is kind of a passepartout type in C, I usually just cast to void * whenever any pointer conversion is needed. I've changed that to use the actual destination type. > uidl_db.c: In function ‘walk_down’: > uidl_db.c:83: warning: comparison between signed and unsigned integer > expressions > uidl_db.c:85: warning: comparison between signed and unsigned integer > expressions This was due to the ofs variable being of type int. I've changed that to unsigned, which didn't matter for the code. This will, of course, break horribly if UIDs longer than 2^32 - 1 chars are supposed to be processed (or if bit indices higher than 2^31-1 are needed) but I assume that both propositions are not exactly realistic (the POP3 specification already prohibits this: Maximum 'payload' of a response line is 510 octets, according to RFC1939, sec. 3). > uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function > /tmp/cc6WZFTd.o: In function `main': I don't usually add spurious initializations just to silence these warnings if they are obviously wrong. [...] > uidl_db.c(92): warning #1684: conversion from pointer to same-sized > integral type (potential portability problem) > offsetof(struct pat_node, ptrs_[2]) > ^ I do not have the slightest idea what this is supposed to refer to. [...] > uidl_db.c(119): remark #2259: non-pointer conversion from "int" to > "unsigned char" may lose significant bits > v = r0->id[ofs] ^ r1->id[ofs]; > ^ Since both operands come from arrays of char and are only insofar of type 'int' as the values are supposed to be promoted to int, this seems to be an extremly poor warning. > uidl_db.c(244): warning #2259: non-pointer conversion from > "size_t={unsigned long}" to "unsigned int" may lose significant bits > id_len = strlen(id); > ^ I would rather keep this as 'unsigned' because size_t may be a 64-bit type on 64-bit systems and this seems exessive for representing the length of something restricted to less than 510 octets. [...] > uidl_db.c:92:6: warning: using extended field designator is an > extension [-pedantic] > offsetof(struct pat_node, ptrs_[2]) > ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The relevant text of the C-standard is offsetof(type, member-designator) [...] The type and member designator shall be such that given static type t; then the expression &(t.member-designator) evaluates to an address constant. [7.17|3] Assuming the following code: static struct pat_node t; &(t.ptrs_[2]) certainly evaluates to an address constant, so this appears to either be a traditional restriction or a C89-one which accidentally escaped into LLVM/clang using someone remembering it as 'transport medium'. |
From: Matthias A. <mat...@gm...> - 2010-05-21 03:11:39
|
Am 20.05.2010 22:34, schrieb Rainer Weikusat: > "Matthias Andree" <mat...@gm...> writes: > > [...] > >> Looks quite alright at first glance. As a general remark, I wonder if >> it would be worthwhile splitting the PATRICIA core code from the >> fetchmail interfacing code. > > My 'general' opinion on this is that abstractions aren't free and > creating them 'just in case' should rather be avoided. OK. > For the sake of completeness: 'implicit int' refers to not declaring a > function return value which traditionally caused compilers to assume > that the function would return a value of type 'int'. This is no > longer valid C, as of ISO/IEC 9899:1999 (E). OTOH, the set of valid > type specifiers and type specifier combinations is defined in 6.7.2 > and includes the following text: > > Each list of type specifiers shall be one of the following > sets (delimited by commas, when there is more than one set on > a line); > > [...] > > -- int, signed, or signed int > -- unsigned, or unsigned int > [6.7.2|2] > > Meaning, not only is 'unsigned' a completely legitimate way to declare > an unsigned integer but actually, 'signed' is also valid instead of > 'int'. That said, I am completely willing to comply with any 'style > rules' you consider to be essential. I stand corrected. In the light of your quote from the C99 standard, please ignore my previous comment in this regard. I was indeed confusing function return arguments with variable types. Sorry! It was less of a style than rather a technical (namely, portability) argument, which is now void. Also, I don't think I'm currently concerned with style - fetchmail source is using different styles... as long as it remains concise, it will be fine. If you're familiar with Doxygen (Javadoc), writings docs in .h files in a format that Doxygen can parse would be nice, but no need to get acquainted with it if you prefer a different style. Anything sensible is better than no documentation. :-) > General (so that it's before the 'long quoted text'): As I wrote, > integration of this code with fetchmail was completed yesterday and > the resulting binary has survived a night of pulling e-mail from my > account. The most expensive part is now the lexer. I am going to run So you're essentially saying that the rcfile scanner (lexer) dominates execution time? That's a good thing for fetchmail, and a bad thing for its scanner. The practical workaround is daemon mode, which would only read the rcfile on startup and if it changes. We could have flex inflate its tables so it gets faster, but this easily goes in the hundreds-of-kbytes for fetchmail's rather complex lexer. > this for another night tonight, this time with fastuidl > enabled. Should this also go well, I will most likely move this onto > our 'production server' tomorrow. Except a few bits of missing > functionality (such as a traversal routine), the most notable change I Just wondering, how are you currently writing out the UIDs without a traversal routine? > have done is that I have replaced the recursive trie-freeing routine > with an iterative one: While this is - at best - only marginally > faster (for my test dataset), it has the nice property of needing no > additional storage beyond what's already used for the trie itself. Good. > I've also had an idea how to restrict the size of the num_ndx array to > what is actually needed: Since the maximum message number is known in > advanced, the vector can grow 'downward' such that position zero is > assumed to hold the largest possible message number and lower indices > are then inserted sufficiently distant from that with the array being > extended as required. Lookup would then use negative offsets relative > to a pointer to the end of the array, adjusted according to the > 'logical position' of the end of the array relative to the message > number 1. I think I will get this implemented tomorrow and could then > basically produce a patch relative to 6.3.17 which could serve as > base for further changes. I'm not sure if it's worth the effort of redoing the array handling just to shave off a dozen of realloc() calls. Keep maintainability in mind; in my professional live, I've been debugging somebody else's (rather awkward [1]) code for many months now, and I'd say maintainability wins over a few % of execution speed. [1] That other code, entirely unrelated to fetchmail, often follows copy & paste & modify one line programming style, which is very error-prone... >> $ gcc -O -Wc++-compat -std=c99 -Wall -Wextra -pedantic \ >> -o uidl_db uidl_db.c -Wno-unused -Dinline=__inline__ >> # -Dinline for -std=c89 :-) >> uidl_db.c: In function ‘init_uidl_db’: >> uidl_db.c:63: warning: request for implicit conversion from ‘void *’ >> to ‘struct uidl_record **’ not permitted in C++ > > This is just my usual laziness :-): Since void * is kind of a > passepartout type in C, I usually just cast to void * whenever any > pointer conversion is needed. I've changed that to use the actual > destination type. I was also too lazy to remove that from the compiler logs :) I'll be happy to fix this myself should I go for C++ later. >> uidl_db.c: In function ‘walk_down’: >> uidl_db.c:83: warning: comparison between signed and unsigned integer >> expressions >> uidl_db.c:85: warning: comparison between signed and unsigned integer >> expressions > > This was due to the ofs variable being of type int. I've changed that > to unsigned, which didn't matter for the code. This will, of course, Yup. The only thing is that loops need to be written more carefully because ">= 0" or thereabouts now always holds. I think that's what you meant by "didn't matter for the code". > break horribly if UIDs longer than 2^32 - 1 chars are supposed to be > processed (or if bit indices higher than 2^31-1 are needed) but I > assume that both propositions are not exactly realistic (the POP3 > specification already prohibits this: Maximum 'payload' of a response > line is 510 octets, according to RFC1939, sec. 3). I propose to document that somewhere in a quiet place in fetchmail.man, but to not even add code checks for that - a FIXME comment will be sufficient IMO. >> uidl_db.c:314: warning: ‘v’ may be used uninitialized in this function >> /tmp/cc6WZFTd.o: In function `main': > > I don't usually add spurious initializations just to silence these > warnings if they are obviously wrong. GCC is also sometimes off track, hadn't checked any of the warnings except for the ffs(). >> uidl_db.c(92): warning #1684: conversion from pointer to same-sized >> integral type (potential portability problem) >> offsetof(struct pat_node, ptrs_[2]) >> ^ > > I do not have the slightest idea what this is supposed to refer to. I'd have to see Intel C++'s stddef.h to see what's really up, but this looks like a compiler issue (either way, as stddef.h is a compiler provided header). A pointer difference *is* an integral type; quoting POSIX (which is more specific as to the return type than C99 - your quote below - is): offsetof(type, member-designator) Integer constant expression of type size_t, the value of which is the offset in bytes to the structure member (member-designa‐ tor), from the beginning of its structure (type). But it's indeed a portability problem. Proper code is not portable to Intel C++. 8-) > [...] > >> uidl_db.c(119): remark #2259: non-pointer conversion from "int" to >> "unsigned char" may lose significant bits >> v = r0->id[ofs] ^ r1->id[ofs]; >> ^ > > Since both operands come from arrays of char and are only insofar of > type 'int' as the values are supposed to be promoted to int, this > seems to be an extremly poor warning. If the sources are "char" (whatever sign), it's safe. Looks like a compiler defect. I need to re-test with the latest patch release, and if it persists I need to regain access to the Intel site so I can file a bug report. >> uidl_db.c(244): warning #2259: non-pointer conversion from >> "size_t={unsigned long}" to "unsigned int" may lose significant bits >> id_len = strlen(id); >> ^ > > I would rather keep this as 'unsigned' because size_t may be a 64-bit > type on 64-bit systems and this seems exessive for representing the > length of something restricted to less than 510 octets. Seems this warning is in order and size_t would be OK, or at least checking against UINT_MAX as a fall-back. Probably not in this piece of the code though, but in parseuid - reject (skip) overlong UIDs. I have yet to see a site that comes even close to the 70 characters that POP3 allows :-) what's wrong with size_t? On 64-bit machines, the other four bytes come at no extra cost. In doubt someone will be padding anyways on those computers. :) > > [...] > >> uidl_db.c:92:6: warning: using extended field designator is an >> extension [-pedantic] >> offsetof(struct pat_node, ptrs_[2]) >> ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ > > The relevant text of the C-standard is > > offsetof(type, member-designator) > > [...] > > The type and member designator shall be such that given > > static type t; > > then the expression &(t.member-designator) evaluates to an > address constant. > [7.17|3] > > Assuming the following code: > > static struct pat_node t; > > &(t.ptrs_[2]) certainly evaluates to an address constant, so this > appears to either be a traditional restriction or a C89-one which > accidentally escaped into LLVM/clang using someone remembering it as > 'transport medium'. Open issue for me, might be a clang bug, or a configuration issue. Will have to check with an SVN version and file a report if it is a bug. Same computer as for Intel C++, it's powered down, and I'm 100 km away. Other than that, I'm really looking forward to the code. Thanks a lot. Best regards Matthias |