From: Tod O. <to...@uc...> - 2012-02-24 16:16:44
Thanks! We'll look at parallel indexing and the Solr merge settings. And also the GC behavior.

-Tod

On Feb 24, 2012, at 6:29 AM, Demian Katz wrote:

> I'm copying this message to solrmarc-tech, since you'll probably get additional suggestions from there.
>
> Also, you might want to look at this thread -- it's a few years old but probably still relevant:
>
> http://sourceforge.net/mailarchive/message.php?msg_id=21044664
>
> ...and here's another one:
>
> http://groups.google.com/group/solrmarc-tech/browse_thread/thread/fe329385bb1dc953
>
> One thing that's particularly worth experimenting with (if you haven't already) is comparing performance between direct index writing and writing over HTTP. If you edit import.properties and change your solr.path value to "REMOTE", then SolrMarc will post updates to the solr.hosturl URL rather than writing them directly to the index. If you can split up your MARC file into chunks, you can run multiple instances of SolrMarc in parallel using the HTTP writing method, and that might help speed things up.
>
> - Demian
> ________________________________________
> From: Tod Olson [to...@uc...]
> Sent: Thursday, February 23, 2012 10:08 PM
> To: vuf...@li...
> Subject: [VuFind-Tech] solrmarc import speed
>
> Well, the inevitable question of how to speed up solrmarc imports is coming up. Some guidance about what to look for would be welcome.
>
> The test system is a VM with 2 CPUs, 49GB RAM (~4GB free) running Ubuntu 10. What we observe is that in our first full import (6 million records), one of our later files of about a million records would be added in 9 hours. Not production speed, but enough to test. Now that we have a full index and are re-importing the records, we only imported about 370K records in the first 6 hours. Looks to me like we are CPU bound; it seems there may be a single thread in solrmarc that is the bottleneck. solrconfig.xml is the default from the VuFind distro: mergeFactor=10, that sort of thing.
>
> Behavior-wise, we also notice that records will chug along for a while, and then there will be a big pause with no feedback. I assume this is when solr is merging segments.
>
> I know a few of you are indexing several million records, so I figure I'll start here. What were your first steps in speeding up indexing, and what kinds of metrics were useful to you?
>
> Thanks for any advice or pointers.
>
> -Tod
>
>
> Tod Olson <to...@uc...>
> Systems Librarian
> University of Chicago Library
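
For anyone trying the chunking approach Demian describes, here is a minimal sketch of one way to split a binary MARC file into parts that separate SolrMarc instances could then index over HTTP. It relies on the MARC 21 convention that the first five bytes of each record's leader give the record length. The function names (iter_marc_records, split_round_robin) are hypothetical, not part of SolrMarc or VuFind, and the sketch assumes a well-formed binary MARC file with no stray bytes between records.

```python
import io
from typing import BinaryIO, Iterator, List


def iter_marc_records(stream: BinaryIO) -> Iterator[bytes]:
    """Yield raw MARC records one at a time from a binary stream.

    Assumes each record starts with a 5-digit ASCII length (the start of
    the MARC 21 leader) that counts the whole record, leader included.
    """
    while True:
        head = stream.read(5)
        if len(head) < 5:
            return  # end of input (or trailing junk shorter than a length field)
        length = int(head.decode("ascii"))
        yield head + stream.read(length - 5)


def split_round_robin(stream: BinaryIO, outputs: List[BinaryIO]) -> int:
    """Deal records from `stream` across `outputs` in turn; return the record count.

    Records are streamed, so the whole source file is never held in memory.
    """
    count = 0
    for count, record in enumerate(iter_marc_records(stream), start=1):
        outputs[(count - 1) % len(outputs)].write(record)
    return count


# Hypothetical usage with real files (file names are placeholders):
#   with open("full.mrc", "rb") as src:
#       parts = [open(f"part{i}.mrc", "wb") for i in range(4)]
#       try:
#           split_round_robin(src, parts)
#       finally:
#           for p in parts:
#               p.close()
```

Each resulting part would then go to its own SolrMarc instance, configured with solr.path set to "REMOTE" as described above. Dealing records out round-robin keeps the parts roughly equal in size without a first pass to count records. The trade-off is that the original record order is not preserved within a part.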