I'm copying this to solrmarc-tech; you might want to join that list to continue discussions about SolrMarc performance, since you'll reach a larger group of SolrMarc experts there.
Before digging too deeply into any of this, I would strongly recommend upgrading to the latest master code from Git (which is extremely close to what will be released as VuFind 2.0 final in a couple of weeks). It has been upgraded to Solr 4.2.1, which has significantly different performance characteristics from Solr 3 (almost certainly for the better). You will probably still be able to do some tuning to make things run better, but I would be interested to hear how your Solr 4 experience differs from your Solr 3 experience. If you don't mind trying that and reporting back, we can provide further advice.
Regarding your question about starting Solr in the background and stopping it with a stop script: that's the normal approach under Linux, but I haven't found a way to do it reliably under Windows, so I usually rely on hitting Ctrl-C to stop the process. With Solr 3 there was a Windows service wrapper for Jetty, but it no longer works with Solr 4.
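For reference, the usual Linux pattern looks roughly like this (a sketch only: the stop port/key values and paths are illustrative, though STOP.PORT/STOP.KEY are standard properties of Jetty's start.jar):

```shell
# Start Solr (under Jetty) detached, registering a stop port and key
cd $VUFIND_HOME/solr/jetty
nohup java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar \
    > ../solr.log 2>&1 &

# Later, shut it down cleanly through the same stop port and key
java -DSTOP.PORT=8079 -DSTOP.KEY=secret -jar start.jar --stop
```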
From: Michael Lackhoff [michael@...]
Sent: Sunday, June 09, 2013 12:25 PM
To: 'VuFind List'
Subject: [VuFind-General] Sending lots of data to Vufind 2RC1
I would like to build quite a big index (tens of millions of records),
but noticed that it would take ages with the default settings.
What I did so far:
- unpack the vufind distribution
- set environment variables so that everything is found
- run run_vufind.bat start
- change import.properties to use REMOTE indexing
- run import_marc.bat test.mrc
What strikes me first, even if it might not be the reason for the poor
performance, is the amount of logging. Every record is logged at least
three times: (1) on the SolrMarc console, (2) on the Solr/Jetty console,
and (3) in jetty/logs/<datestamp>.request.log.
I would prefer to log only errors, and only to a single log file (or
perhaps two: one for Solr, one for SolrMarc). Is this possible?
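If SolrMarc uses log4j (which I believe it does, though I'm not certain), something along these lines in its log4j.properties would be what I'm after: errors only, to one file.

```properties
# Sketch: log only ERROR and above, to a single file (log4j 1.x syntax)
log4j.rootLogger=ERROR, FILE
log4j.appender.FILE=org.apache.log4j.FileAppender
log4j.appender.FILE.File=solrmarc.log
log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
log4j.appender.FILE.layout.ConversionPattern=%d %p %c - %m%n
```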
Second observation: I can already see the records while importing, even
though autoCommit is not set in solrconfig.xml. Does SolrMarc issue a
commit request after every record? That would explain why it is rather slow.
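If that is the case, would configuring batched auto-commits in solrconfig.xml help? Something like the following (a sketch based on the stock Solr example config; note that <openSearcher> only exists from Solr 4 onwards):

```xml
<!-- Sketch: commit pending documents every 10,000 docs or 60 seconds
     instead of per record; openSearcher=false avoids reopening the
     searcher on every commit during bulk loads (Solr 4+) -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```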
Another observation, not performance-related: I noticed that the beta
somehow managed to start Solr in the background and stop it with a stop
script. I saw that there is also a stop script in the RC1 distribution,
but since Solr is running on the console it seems only natural to stop
it with Ctrl-C. It would be nice to have it running in the background
again. Is this possible?
But back to performance: what would be the recommended memory settings?
I have 32 GB of RAM on Windows 7 64-bit, and everything is running on a
really fast SSD. Should I give most of the RAM to Solr or to SolrMarc,
and where might the bottleneck be?
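To be concrete, I was thinking of JVM flags along these lines (a sketch only; the jar names and invocation details are how they look in my installation and may well differ in the VuFind batch scripts):

```shell
# Sketch: give Solr (running under Jetty) a generous fixed heap ...
java -server -Xms4g -Xmx8g -jar start.jar

# ... and give the SolrMarc importer a smaller heap of its own
java -Xmx2g -jar SolrMarc.jar import.properties test.mrc
```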
Just to give some numbers: on a comparable machine I index at a rate of
about 1000 records/second with a custom Perl script over HTTP, and that
script has to do quite a few database lookups. With SolrMarc, at least
as it is set up now, I only get 58 records/second.
I had planned to do the import with a custom script, but I haven't yet
found the time to port (the parts relevant to me of) SolrMarc to Perl,
and surely a properly configured Java setup has to be faster than an
interpreted language!?
Any ideas what I could try?