From: Till K. <kin...@gm...> - 2013-08-15 15:25:20
On 15.08.2013 15:18, Osullivan L. wrote:
> Hi Folks,
>
> Whilst we were using VuFind 1x we had lots of problems with garbage
> collection, with the heap basically getting full far too quickly, thus
> necessitating frequent restarts of VuFind.

That shouldn't be necessary if you manage to find a suitable heap size and garbage collection settings. We have Solr instances running for weeks...

> Having read about Solr 4's better memory management, I had hoped to see
> significant improvements in this problem but unfortunately, that does
> not seem to be the case.

We are just in the process of turning our old "Solr 3.6 cluster" into a Solr 4.4 based SolrCloud environment... The garbage collection settings that eventually worked fine with Solr 3.6 don't work well with Solr 4 for us (meaning overly long "stop the world" collections every few hours or days, depending on settings, that we didn't have with Solr 3.6). But since we are also changing the complete architecture of our Solr environment (from one large index replicated to several machines, to an index split into 5 shards, each replicated 3 times), that might not be a general rule.

> Ubuntu 12.04 on Virtual Server
> 10GB Ram
>
> JAVA_OPTIONS="-server -d64 -Xms5120m -Xmx5120m -XX:+UseParallelGC
> -XX:+UseParallelOldGC -XX:+AggressiveOpts -XX:NewRatio=5
> -Xloggc:/var/log/vufind2/gc.log"

This article gives some simple advice on how to find a reasonable heap size: https://support.lucidworks.com/entries/25063063-Estimating-your-heap-and-memory-requirements

I'd recommend trying -XX:+UseConcMarkSweepGC together with -XX:+UseParNewGC as garbage collectors. With Solr 3.6 that works nicely for us. There are lots of options you can use to fine-tune the behaviour of UseConcMarkSweepGC.
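Put together, a CMS-based variant of the JAVA_OPTIONS above might look like this (a sketch only: heap sizes and log path are copied from the original post, and the two CMS occupancy flags are just one example of the fine-tuning options mentioned, not a recommendation):

```shell
# Hypothetical JAVA_OPTIONS sketch: CMS (concurrent mark-sweep) for the old
# generation plus the parallel young-generation collector, replacing the
# ParallelGC flags from the original post. The occupancy settings tell CMS
# to start collecting at 75% old-gen usage instead of guessing; tune the
# value (and the heap size) against your own gc.log.
JAVA_OPTIONS="-server -d64 -Xms5120m -Xmx5120m \
  -XX:+UseConcMarkSweepGC -XX:+UseParNewGC \
  -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly \
  -Xloggc:/var/log/vufind2/gc.log"
```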
I played with lots of them, and different values for -XX:NewRatio, for example, showed strong effects (8 or 9 worked quite well for us), but at some point we still ran into "stop the world" collections that tore down the whole SolrCloud in a chain reaction and killed search for several minutes (because the whole recovery scenario in the cloud kicks in once Solr responds again). So I have no final conclusion on good settings...

But if you are using a recent Java 7 distribution, you can try just -XX:+UseG1GC as the single garbage collection option. That's the more or less new G1 collector, which according to Oracle is ideal for high-performance applications with large heaps and low pause-time requirements (like Solr)... There are differing reports about the true abilities of this thing (especially when used with Solr). For us it currently works at least better than UseConcMarkSweepGC with all kinds of esoteric settings (but maybe I just never found the right combination of settings?)... It's still not ideal, but it seems to work now...

> We are currently operating at nowhere near peak as most students are
> off but the garbage is filling up every 1 hour and 40 mins or so.

So what is happening then? Out-of-memory exceptions, or just painful "stop the world" collections that interrupt Solr?

General rule: If Solr goes OOM (out of memory), either garbage collection kicks in too late (here all the options of UseConcMarkSweepGC can help) or you simply don't have enough heap space (allocate more; if that's not possible, go and buy RAM, or try to lower Solr's memory requirements by reducing cache sizes, though that will also cost performance). If there are long "stop the world" pauses of several seconds, try to reduce(!) the heap size or use a different garbage collector...

Till
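For completeness, the G1 alternative described above really does boil down to a single collector flag (a sketch; heap sizes and log path again copied from the original post):

```shell
# Hypothetical JAVA_OPTIONS sketch: G1 as the sole collector, for a recent
# Java 7 JVM. No further GC flags needed to start with; G1 tunes itself
# around pause-time goals, so begin plain and only add options if gc.log
# shows problems.
JAVA_OPTIONS="-server -d64 -Xms5120m -Xmx5120m \
  -XX:+UseG1GC \
  -Xloggc:/var/log/vufind2/gc.log"
```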