From: Bill D. <bi...@du...> - 2012-02-24 16:03:44
I would second the recommendation to make sure you're not memory-starving your solr instance (give it at least 4GB of heap with the -Xms and -Xmx switches). If things start out fast and then slow down, you're probably spending all your time in GC (which you may be able to see if you look in your jetty logs).

I'd be tempted (if possible) to give the solr jvm a crapload of heap and see what happens. If it works, scale it back until it doesn't anymore :-)

A related question: Does solr take more memory for indexing than for serving up pages? To wit: would it make sense to give an indexing instance a lot more memory than a solr instance running against a static index? Or are they basically doing the same thing and I shouldn't bother?

On Fri, Feb 24, 2012 at 10:38 AM, Alan Rykhus <ala...@mn...> wrote:

> Hello Tod,
>
> We have 6.5 million records in our database. We update the database 4
> times a day. Each update takes 30 minutes to complete. In the last
> couple of weeks the largest update was 90K records, still took 30
> minutes.
>
> I'm pretty sure we are just using the defaults in solrconfig.xml. The
> records get slurped up in 5 minutes. It takes 25 minutes for the commit
> to take place at the end.
>
> I did change the options in import-marc.sh:
>
> INDEX_OPTIONS='-Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+AggressiveOpts'
>
> This is on a VM that we recently upgraded to 12GB of ram.
>
> al
>
> On Thu, 2012-02-23 at 21:08 -0600, Tod Olson wrote:
> > Well, the inevitable question of how to speed up solrmarc imports is
> > coming up. Some guidance about what to look for would be welcome.
> >
> > The test system is a VM with 2 CPUs, 49GB RAM (~4GB free) running
> > Ubuntu 10. What we observe is in our first full import (6 million
> > records) one of our later files of about a million records would be
> > added in 9 hours. Not production speed, but enough to test.
> > Now that we have a full index and are re-importing the records, we only
> > imported about 370K records in the first 6 hours. Looks to me like we
> > are CPU bound, seems maybe there's a single thread in solrmarc that is
> > the bottleneck. solrconfig.xml is the default from the VuFind distro:
> > mergeFactor=10, that sort of thing.
> >
> > Behavior-wise, we also notice that records will chug along for awhile,
> > and then there will be a big pause with no feedback. I assume this is
> > when solr is merging segments.
> >
> > I know a few of you are indexing several million records, so I figure
> > I'll start here. What were your first steps in speeding up indexing,
> > and what kinds of metrics were useful to you?
> >
> > Thanks for any advice or pointers.
> >
> > -Tod
> >
> > Tod Olson <to...@uc...>
> > Systems Librarian
> > University of Chicago Library
> >
> > ------------------------------------------------------------------------------
> > Virtualization & Cloud Management Using Capacity Planning
> > Cloud computing makes use of virtualization - but cloud computing
> > also focuses on allowing computing to be delivered as a service.
> > http://www.accelacomm.com/jaw/sfnl/114/51521223/
> > _______________________________________________
> > Vufind-tech mailing list
> > Vuf...@li...
> > https://lists.sourceforge.net/lists/listinfo/vufind-tech
>
> --
> Alan Rykhus
> PALS, A Program of the Minnesota State Colleges and Universities
> (507)389-1975
> ala...@mn...
> "It's hard to lead a cavalry charge if you think you look funny on a
> horse" ~ Adlai Stevenson

--
Bill Dueber
Library Systems Programmer
University of Michigan Library
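
On Tod's question about useful metrics: a rough first number is plain throughput, worked out from the figures quoted in this thread. The sketch below is only arithmetic on those figures, not a measurement tool. The helper name `records_per_second` is made up for illustration, "about a million records" is rounded to exactly 1,000,000, and the 25-of-30-minute commit split is taken from Alan's description.

```python
def records_per_second(records: int, hours: float) -> float:
    """Average indexing throughput over a run (hypothetical helper)."""
    return records / (hours * 3600)


# Tod's figures: ~1,000,000 records in 9 hours on the first import,
# then ~370,000 records in the first 6 hours of the re-import.
first_import = records_per_second(1_000_000, 9)
reimport = records_per_second(370_000, 6)

print(f"first import: {first_import:.1f} records/s")
print(f"re-import:    {reimport:.1f} records/s")
print(f"re-import rate relative to first import: {reimport / first_import:.1%}")

# Alan's figures: a 30-minute update, of which about 25 minutes is the commit.
commit_share = 25 / 30
print(f"commit share of update time: {commit_share:.0%}")
```

A ratio that keeps falling as the index grows would be consistent with the GC or segment-merge explanations above, though it does not prove either one. A large commit share suggests looking at the commit step rather than at record parsing.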