From: Wayne Graham <wsgrah@wm...> - 2007-10-01 16:38:38
Ok, I ran some tests for performance and here's what I got.
Using the Java importer in the SVN trunk, I indexed 19,112 records in 43
seconds. The MARC XML files took up 56.5 MB and the index was 12.1 MB,
for a total of 68.7 MB on disk.
I switched over to an added field in Solr (with a new fieldType named
"storage" in the schema.xml file that omits norms, doesn't sort missing
values last, isn't indexed, but stores the String output). We probably
need to figure out which fieldType to use for this that'll minimize
processing. I picked solr.StrField as it is an untokenized field type.
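
For anyone following along, a schema.xml entry along the lines described above might look roughly like this. It is only a sketch of the idea, not the exact definition from my setup: the field name "marc_xml" is a made-up placeholder, and the attribute values just mirror the description (norms omitted, no sort-missing-last, not indexed, stored).

```xml
<!-- Sketch only: a storage-only type based on the untokenized StrField -->
<fieldType name="storage" class="solr.StrField"
           omitNorms="true" sortMissingLast="false"/>

<!-- "marc_xml" is a hypothetical field name used for illustration.
     indexed="false" keeps it out of the searchable index;
     stored="true" lets it be returned with each result. -->
<field name="marc_xml" type="storage" indexed="false" stored="true"/>
```

Because the field is never indexed, the choice of type should mostly affect how the value is stored and returned rather than how it is searched.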
I indexed the same records in 1 minute 15 seconds and noticed a nice
side effect. The total space on disk drops to 61.9 MB. The savings mostly
come from not storing tabs and newlines (the stored XML is just one line).
I haven't looked at the query performance of the two indexes yet, but I
have a hunch any difference will be negligible. Since the field doesn't
get queried, only returned in the result, the only difference _should_
be in the overhead to return the result.
* Wayne Graham
* Earl Gregg Swem Library
* PO Box 8794
* Williamsburg, VA 23188