No good data as yet; please submit your benchmarks here.
I realise that these statistics are pretty rough, but in the spirit of doing bake-offs on Lucene, I thought I'd throw my search engine into the ring and see what people have to say. I'm not going to attempt to show off with pretty graphs and mean/median/max charts. Rather, I hope to get some stability by running the same test over and over again. All tests are run in one lump, so the start-up time of the VM is 'masked' by the total running time.
This is a comparison between Apache Lucene (which I will call jlucene) and clucene.
The machine is a Pentium 4 running Windows XP with 1 GB of RAM.
Indexing 663 MB (797 files) of Gutenberg texts (some files removed, and no graphics, etc.):
The JVM used the default memory settings (-Xmx, etc.).
I wanted to avoid putting any restriction on maxFieldLength, but it ran out of memory, so I restricted maxFieldLength to 100,000 terms.
I used the demo programs. Indexing was first run with maxFieldLength left at the Java default of 10,000.
The whole directory was indexed in one hit.
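maxFieldLength caps how many terms of a single field get indexed; terms past the cap are silently ignored, which is why the 10,000 and 100,000 runs index different amounts of text from the same files. The sketch below only illustrates that truncation rule: the class and method names are hypothetical, and splitting on whitespace is an assumption standing in for a real Lucene Analyzer.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of a per-field term cap like maxFieldLength.
// Assumption: whitespace splitting stands in for a real Analyzer.
class FieldLengthCap {
    // Keeps at most maxFieldLength terms; everything after the cap is dropped, not indexed.
    static List<String> truncateTerms(String text, int maxFieldLength) {
        List<String> kept = new ArrayList<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) continue;            // leading whitespace yields an empty first token
            if (kept.size() >= maxFieldLength) break; // cap reached: ignore the rest of the field
            kept.add(token);
        }
        return kept;
    }

    public static void main(String[] args) {
        String doc = "It was the best of times it was the worst of times";
        System.out.println(truncateTerms(doc, 5));          // [It, was, the, best, of]
        System.out.println(truncateTerms(doc, 100).size()); // 12
    }
}
```

A larger cap means more terms per document are kept and buffered while indexing, which fits the higher memory use seen in the 100,000 run.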
jlucene: indexing completed in 136,219 ms; peak memory usage ~15 MB, average ~12 MB.
clucene (memory optimized): indexing completed in 53,735 ms; peak memory usage ~7.5 MB, average ~1.7 MB.
The whole thing was then repeated, this time with maxFieldLength set to 100,000.
jlucene: indexing completed in 646,453 ms; peak memory usage ~72 MB, average ~14 MB (while working through the smaller files).
clucene (memory optimized): indexing completed in 232,141 ms; peak memory usage ~60 MB, average ~4 MB (while working through the smaller files).
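To put the two indexing runs side by side, the speed-up is simply the jlucene time divided by the clucene time. A minimal sketch using the figures above (the class name is illustrative only):

```java
class IndexingSpeedup {
    // How many times faster the candidate is than the baseline, given wall-clock times in ms.
    static double speedup(long baselineMs, long candidateMs) {
        return (double) baselineMs / candidateMs;
    }

    public static void main(String[] args) {
        // Times reported above: jlucene first, clucene (memory optimized) second.
        System.out.printf("maxFieldLength 10,000:  %.2fx%n", speedup(136219, 53735));  // 2.54x
        System.out.printf("maxFieldLength 100,000: %.2fx%n", speedup(646453, 232141)); // 2.78x
    }
}
```

Both ratios are above the ~1.8x figure quoted for earlier clucene builds.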
Then I made a 10,000-line text file containing text from the Gutenberg collection, with each line representing one query.
I pruned out key search phrases, punctuation, etc.
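Exactly how the lines were cleaned isn't specified here. One plausible approach, assuming the goal is to stop QueryParser from reading any line as query syntax, is to replace punctuation with spaces and drop the upper-case operator words AND, OR and NOT. The helper below is hypothetical and is not part of either demo program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class QueryLineCleaner {
    // Words the classic Lucene QueryParser treats as boolean operators.
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    // Hypothetical helper: turns a raw line of text into a plain list of words.
    static String clean(String line) {
        // Replace every character that is not a letter, digit or whitespace with a space;
        // this removes query syntax such as + - ! ( ) " ~ * ? : along with ordinary punctuation.
        String noPunct = line.replaceAll("[^\\p{L}\\p{Nd}\\s]", " ");
        List<String> words = new ArrayList<>();
        for (String w : noPunct.trim().split("\\s+")) {
            if (!w.isEmpty() && !OPERATORS.contains(w)) {
                words.add(w);
            }
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        // prints: Call me Ishmael Some years ago never mind how long
        System.out.println(clean("\"Call me Ishmael.\" Some years ago -- never mind AND how long"));
    }
}
```

Lower-case "and"/"or" are left alone, since only the upper-case forms act as operators in the classic parser.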
I then modified the demo programs to search for each line of this file, using the QueryParser.
Each Hits result was iterated through fully (only the document itself was fetched; no text processing or field retrieval was done).
The 10,000 queries were run 4 times each. All Java variables were nulled as soon as they were no longer required.
The searches were run against the index created by the previous tests, which was 176 MB.
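The search loop described above can be sketched as follows. `runQuery` is a hypothetical stand-in for "parse the line with QueryParser, search, and walk every hit"; a real harness would plug in the Lucene or CLucene calls at that point. The timer covers each full pass over the query list, matching the per-iteration averages reported for each engine.

```java
import java.util.List;
import java.util.function.ToIntFunction;

class SearchBenchmark {
    // Outcome of a benchmark: mean time per pass and total hits seen (kept so the work is observable).
    static final class Result {
        final double avgPassMillis;
        final long totalHits;

        Result(double avgPassMillis, long totalHits) {
            this.avgPassMillis = avgPassMillis;
            this.totalHits = totalHits;
        }
    }

    // Runs every query once per pass, for `passes` passes, timing each full pass.
    // runQuery is a hypothetical stand-in for parse + search + iterate all hits; it returns the hit count.
    static Result run(List<String> queries, int passes, ToIntFunction<String> runQuery) {
        long totalNanos = 0;
        long totalHits = 0;
        for (int pass = 0; pass < passes; pass++) {
            long start = System.nanoTime();
            for (String q : queries) {
                totalHits += runQuery.applyAsInt(q);
            }
            totalNanos += System.nanoTime() - start;
        }
        return new Result(totalNanos / 1_000_000.0 / passes, totalHits);
    }

    public static void main(String[] args) {
        List<String> queries = List.of("whale", "white whale", "ishmael");
        // Fake query runner for demonstration only: pretends each word is one hit.
        Result r = run(queries, 4, q -> q.split("\\s+").length);
        System.out.println(r.totalHits); // 16
        System.out.printf("%.3f ms per pass%n", r.avgPassMillis);
    }
}
```

In a real run the query list could be loaded with `java.nio.file.Files.readAllLines(java.nio.file.Paths.get("queries.txt"))`, where the file name is hypothetical.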
jlucene: searching completed in 60,078 ms on average and used ~13 MB for each iteration of 10,000 queries.
clucene (memory optimized): searching completed in 53,453 ms on average and used ~4.2 MB for each iteration of 10,000 queries.
clucene (speed optimized): searching completed in 52,350 ms on average and used ~11 MB for each iteration of 10,000 queries.
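The average pass times convert directly into throughput and relative speed, which makes the modest size of the search-side gain easier to see. A sketch using the three averages above (class and method names are illustrative only):

```java
class SearchThroughput {
    static final int QUERIES_PER_PASS = 10_000;

    // Queries per second for one pass of QUERIES_PER_PASS queries taking passMillis ms.
    static double queriesPerSecond(long passMillis) {
        return QUERIES_PER_PASS * 1000.0 / passMillis;
    }

    public static void main(String[] args) {
        String[] labels = {"jlucene", "clucene (memory optimized)", "clucene (speed optimized)"};
        long[] avgMillis = {60078, 53453, 52350};
        for (int i = 0; i < labels.length; i++) {
            // Relative speed is the jlucene time divided by this engine's time.
            System.out.printf("%-28s %6.1f queries/s, %.2fx jlucene%n",
                    labels[i], queriesPerSecond(avgMillis[i]), (double) avgMillis[0] / avgMillis[i]);
        }
    }
}
```

This works out to about 166, 187 and 191 queries per second, so clucene is roughly 1.12x (memory optimized) and 1.15x (speed optimized) as fast as jlucene on this test.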
I was a bit disappointed with the performance gains on the searching side. However, I have only recently begun fine-tuning the performance of clucene. I started with the indexing side of things, and I am pleased to see the performance gains there: until recently, clucene was on the order of 1.8 times faster at indexing.
Of course, jlucene currently has a much larger group of people working on the code, so it stands to reason that much more performance tuning has been done. Until recently, clucene had next to no time devoted to improving performance, and this really shows, especially on the search side of things.
I'd like to congratulate Lucene, and especially Doug, on an excellent search API!