From: Matthias A. <mat...@gm...> - 2010-05-17 09:44:24
On 16.05.2010, 23:33, Rainer Weikusat wrote:

> The trie implementation is now complete and I have done some
> preliminary benchmarks. I fear that the 'insert' cost will turn out to
> be somewhat higher, but the results for searches are promising. The
> largest .fetchids file I have here contains 7275 UIDs. Assuming the
> trie has already been created, iterating over the input file a
> thousand times and searching for every UID takes 3.93s on average,
> meaning the average time for a single search is about 5.4E-7 seconds
> (3 GHz P4). Doing a plain linear search for every UID takes about 16.6
> seconds on average[*], or about 2.3E-4 seconds per search.
>
> [*] For ten searches. I didn't have the patience to wait for 1000 :->

Hi Rainer,

this looks good. Without trying to downplay your efforts, I'd also say
that it was somewhat expected: your numbers confirm that the trie works
as intended and does not degenerate into a linear list in your test
cases.

Insertion is fastest into a pre-allocated array, followed by appending
to a list that keeps a tail pointer. Inserting into trees, tries, heaps,
and the like always takes a bit longer because it includes a search for
the node where the new branch forks off. It is a kind of "ordered"
insertion, and that is the price we (happily!) pay for quicker
retrievals and searches.

The important part is that, given N UIDs in a list, the costly
operations "insert all UIDs" and "for each UID retrieved from the
server, check whether we have already seen it" are reduced to
O(N log N) where they used to cost O(N^2) for larger N (i.e.
asymptotically).

I'm really looking forward to your code :-)

Best regards

-- 
Matthias Andree
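P.S.: For readers following along, here is a minimal sketch (in Python,
for brevity; Rainer's actual implementation is in C and will differ) of
the kind of character trie being discussed: each UID string from a
.fetchids file is inserted character by character, so both insertion and
lookup cost is proportional to the UID's length rather than to the
number of UIDs stored. The names below are illustrative, not taken from
the real code.

```python
class TrieNode:
    """One node of a character trie over UID strings."""
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # one outgoing edge per character
        self.terminal = False  # True if a stored UID ends here

def trie_insert(root, uid):
    """Insert a UID; walks/creates one node per character."""
    node = root
    for ch in uid:
        node = node.children.setdefault(ch, TrieNode())
    node.terminal = True

def trie_search(root, uid):
    """Return True iff this exact UID was previously inserted."""
    node = root
    for ch in uid:
        node = node.children.get(ch)
        if node is None:
            return False
    return node.terminal

root = TrieNode()
for uid in ("0001.msg", "0002.msg", "0010.msg"):
    trie_insert(root, uid)

assert trie_search(root, "0010.msg")      # stored UID is found
assert not trie_search(root, "0010")      # a mere prefix is not a hit
assert not trie_search(root, "0003.msg")  # absent UID is rejected
```

Note that lookup cost depends only on the length of the searched UID,
which is why the per-search time stays essentially flat as the
.fetchids file grows, unlike the linear scan.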