Actually, we may be letting the heap grow too large. There's no -Xmx being used, and the process has grown to 30GB of virtual size with only 5GB in the RSS, which is clearly insane and can't be good for the GC.
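
For anyone who wants to check the same thing, here is a rough sketch of comparing virtual size against RSS for a running process. It assumes Linux (it reads /proc/<pid>/status); mem_summary is a made-up helper name, and the current shell's PID stands in for the real Java PID.

```shell
# Print virtual size (VmSize) and resident set size (VmRSS) in MB for a PID.
# Assumes Linux: /proc/<pid>/status reports both values in kB.
# mem_summary is a hypothetical helper, not part of VuFind, SolrMarc, or Solr.
mem_summary() {
    awk '/^VmSize:/ { vsz = $2 }
         /^VmRSS:/  { rss = $2 }
         END { printf "VSZ=%dMB RSS=%dMB\n", vsz / 1024, rss / 1024 }' "/proc/$1/status"
}

# Example: inspect the current shell; substitute the PID of the Java process.
mem_summary "$$"
```

If VSZ is far above RSS, most of the address space the process has reserved is not actually resident in RAM.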

I'm certain we have other issues, as mentioned elsewhere in this thread, but this looks like a good starting point. Sending the documents to Solr over HTTP would probably make sense, too, so we can just focus on one Solr process.

Your question about Solr memory in indexing vs. searching makes a lot of sense to me, but I don't have the answer.

Thanks,

-Tod


On Feb 24, 2012, at 10:03 AM, Bill Dueber wrote:

I would second the recommendation to make sure you're not memory-starving your solr instance (give it at least 4GB of heap with the -Xms and -Xmx switches). If things start out fast and then slow down, you're probably spending all your time in GC (which you may be able to see if you look in your jetty logs).

I'd be tempted (if possible) to give the solr jvm a crapload of heap and see what happens. If it works, scale it back until it doesn't anymore :-)
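
A minimal sketch of the heap settings described above, assuming a start script that reads its JVM flags from a shell variable. heap_opts and SOLR_JVM_OPTS are hypothetical names used only for illustration; use whatever variable your own start script reads. -verbose:gc is a standard HotSpot flag that logs each collection, which is one way to see whether time is going to GC.

```shell
# Build a JVM option string with equal initial (-Xms) and maximum (-Xmx) heap,
# plus -verbose:gc so each garbage collection is logged.
# heap_opts is a hypothetical helper, not part of VuFind or Solr.
heap_opts() {
    heap_mb="$1"
    printf '%s\n' "-Xms${heap_mb}m -Xmx${heap_mb}m -verbose:gc"
}

# SOLR_JVM_OPTS is an assumed variable name, not one defined by VuFind.
SOLR_JVM_OPTS="$(heap_opts 4096)"
echo "$SOLR_JVM_OPTS"
```

Raising or lowering the single number passed to heap_opts is then the whole experiment: start large, and scale back until performance suffers.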

A related question: Does solr take more memory for indexing than for serving up pages? To wit: would it make sense to give an indexing instance a lot more memory than a solr instance running against a static index? Or are they basically doing the same thing and I shouldn't bother?

On Fri, Feb 24, 2012 at 10:38 AM, Alan Rykhus <alan.rykhus@mnsu.edu> wrote:
Hello Tod,

We have 6.5 million records in our database. We update the database 4
times a day. Each update takes 30 minutes to complete. In the last
couple of weeks the largest update was 90K records, still took 30
minutes.

I'm pretty sure we are just using the defaults in solrconfig.xml. The
records get slurped up in 5 minutes. It takes 25 minutes for the commit
to take place at the end.

I did change the options in import-marc.sh:

INDEX_OPTIONS='-Xms4096m -Xmx4096m -XX:+UseParallelGC -XX:+AggressiveOpts'

This is on a VM that we recently upgraded to 12GB of ram.

al


On Thu, 2012-02-23 at 21:08 -0600, Tod Olson wrote:
> Well, the inevitable question of how to speed up solrmarc imports is
> coming up. Some guidance about what to look for would be welcome.
>
>
> The test system is a VM with 2 CPUs, 49GB RAM (~4GB free) running
> Ubuntu 10. What we observe is in our first full import (6 million
> records) one of our later files of about a million records would be
> added in 9 hours. Not production speed, but enough to test. Now that
> we have a full index and are re-importing the records, we only
> imported about 370K records in the first 6 hours. Looks to me like we
> are CPU bound, seems maybe there's a single thread in solrmarc that is
> the bottleneck. solrconfig.xml is the default from the VuFind distro:
> mergeFactor=10, that sort of thing.
>
>
> Behavior-wise, we also notice that records will chug along for a while,
> and then there will be a big pause with no feedback. I assume this is
> when solr is merging segments.
>
>
> I know a few of you are indexing several million records, so I figure
> I'll start here. What were your first steps in speeding up indexing,
> and what kinds of metrics were useful to you?
>
>
> Thanks for any advice or pointers.
>
>
> -Tod
>
>
> Tod Olson <tod@uchicago.edu>
> Systems Librarian
> University of Chicago Library
>
>
>
>
> ------------------------------------------------------------------------------
> Virtualization & Cloud Management Using Capacity Planning
> Cloud computing makes use of virtualization - but cloud computing
> also focuses on allowing computing to be delivered as a service.
> http://www.accelacomm.com/jaw/sfnl/114/51521223/
> _______________________________________________
> Vufind-tech mailing list
> Vufind-tech@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/vufind-tech

--
Alan Rykhus
PALS, A Program of the Minnesota State Colleges and Universities
(507)389-1975
alan.rykhus@mnsu.edu
"It's hard to lead a cavalry charge if you think you look funny on a
horse" ~ Adlai Stevenson



--
Bill Dueber
Library Systems Programmer
University of Michigan Library