From: Mosior, B. <BEM...@sh...> - 2013-10-22 16:07:20
Good science should be reproducible, so I would be inclined to use a public dataset available to all. I look forward to the dev call.

Benjamin Mosior

From: Demian Katz [mailto:dem...@vi...]
Sent: Tuesday, October 22, 2013 9:42 AM
To: Eoghan Ó Carragáin; Tod Olson
Cc: Mosior, Benjamin; Osullivan L.; vuf...@li...
Subject: RE: [VuFind-Tech] "Waaaay too high [...] an OOM waiting to happen" WAS: Garbage Collection

This makes sense to me. I believe that Benjamin's previous tests used the Keystone Library Network's index, which is quite large; I think that's a viable alternative to the BCL dataset if he wishes to continue down that route. And yes, I believe you are correct that the terms component is a good way to obtain data sets for applying filters. We'll talk about this more on the next dev call; in the meantime, let me know if there's anything specific anyone needs from me.

- Demian

From: Eoghan Ó Carragáin [mailto:eog...@gm...]
Sent: Tuesday, October 22, 2013 8:09 AM
To: Tod Olson
Cc: Mosior, Benjamin; Demian Katz; Osullivan L.; vuf...@li...
Subject: Re: [VuFind-Tech] "Waaaay too high [...] an OOM waiting to happen" WAS: Garbage Collection

Thanks! I agree it is important to test this before changing the master defaults (or at least so that we know what to change them to). Since the type and extent of data in the index play a big part, ideally we'd use a representative dataset. For example, we could use the first two *.mrc files from Boston College Library: http://archive.org/details/bcl_marc. Obviously some VF instances have much more than 1M records, but it is probably a reasonable number for benchmarking the VF defaults. Another benefit of using a common dataset for testing is that we only have to generate the lists of unique facet values etc. once.

In terms of heap size and garbage collector, it is probably best to use the "good base settings" recommendation from the VF wiki:

    JAVA_OPTIONS="-server -Xmx3800m -Xms3800m -XX:+UseParallelGC -XX:+AggressiveOpts -XX:NewRatio=5"

To test the filter cache, I *think* we need to test two aspects:

1. How much of the heap could the current defaults consume?
2. What would be the performance implications of dropping the cache size way down to something like 512, 4096, or 16384?

1) How much of the heap could the current defaults consume?

Erick gives a rule of thumb for calculating the worst-case maximum: (filterCache "size" as per solrconfig) * (maxDocs/8) bytes. So, for a VF index with 1M docs: 300,000 * (1,000,000/8) = 37,500,000,000 bytes = 34.9 GB! Or, following a hard commit with the current auto-warming settings: 50,000 * (1,000,000/8) = 6,250,000,000 bytes = 5.8 GB.

As discussed elsewhere on the Solr mailing list (e.g. [1]), in practice Solr has various optimisations which mean the amount of memory used by the filter cache should be considerably less, especially where a given filter doesn't match many documents, which would be the case for lots of facets on library data (e.g. an author who has only written 2 books). Also, depending on your catalogue usage, I think it could take quite a while before users would select 300,000 (or even 50,000) different facet values, especially if you're doing a daily restart of Solr, which seems to be reasonably common practice. However, GoogleBot or equivalent could try out all the unique values quickly and fill up the cache, so it would be good to get some idea of how much memory the filter cache could possibly consume if the default size and/or autowarm sizes were reached.
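To make the arithmetic concrete, here is a minimal Python sketch of Erick's rule of thumb (the 1M-doc index and the cache sizes are the assumptions discussed above; this is the bitmap worst case, not typical usage):

    # Worst case per Erick's rule of thumb: each filterCache entry can be a
    # bitset of maxDocs bits, i.e. maxDocs/8 bytes.
    def filter_cache_worst_case(cache_size, max_docs):
        """Worst-case filterCache footprint in bytes."""
        return cache_size * (max_docs // 8)

    MAX_DOCS = 1000000  # assumed index size for the benchmark

    for size in (300000, 50000, 16384, 4096, 512):
        gb = filter_cache_worst_case(size, MAX_DOCS) / 1024.0**3
        print("size=%7d: %6.2f GB worst case" % (size, gb))

Running this reproduces the 34.9 GB and 5.8 GB figures above, and shows that even a size of 16384 caps out below 2 GB on a 1M-doc index.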
Possible test:

- Get a list of unique values in each of VF's standard facet fields. Is the Terms component the best way to get this? I think this should be the maximum number of possible filter-cache entries (i.e. the key for a filter-cache entry is the combination of the filter field and filter value, and the filter-cache value is some representation of the documents that match that filter, with the worst case being a bitmap of maxDocs).
- Have something like JMeter populate the filter cache by calling an fq for each of these values. Response times aren't important here; we're just trying to fill the cache. (A rough sketch of these first two steps follows this list.)
- Monitor JVM heap usage with something like GCViewer.
- It would also be interesting to time how long it takes to carry out a commit for one new document when the cache is full (i.e. how long it takes to do the autowarming of 50K filter queries so that the new searcher is ready).
- Repeat with the cache size dropped way back and compare JVM heap usage.
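Something like the following (untested) Python sketch could cover the first two steps. The Solr URL, the "biblio" core name, and the facet field names are assumptions, and it assumes the Terms handler is enabled at /terms:

    # Pull every unique value for a facet field via the Terms component,
    # then fire one fq per value to populate the filterCache.
    import requests

    SOLR = "http://localhost:8080/solr/biblio"  # assumed URL and core name

    def unique_terms(field):
        """All indexed terms for a field, via the Terms component."""
        r = requests.get(SOLR + "/terms",
                         params={"terms": "true", "terms.fl": field,
                                 "terms.limit": -1, "wt": "json"})
        flat = r.json()["terms"][field]  # flat list: term, count, term, count...
        return flat[::2]

    def warm_filter_cache(field):
        for value in unique_terms(field):
            # {!term f=...} sidesteps having to escape raw facet values
            requests.get(SOLR + "/select",
                         params={"q": "*:*", "rows": 0,
                                 "fq": "{!term f=%s}%s" % (field, value)})

    for f in ("format", "language", "author_facet"):  # assumed VF facet fields
        warm_filter_cache(f)

Each distinct fq should add one filterCache entry, so heap usage in GCViewer should climb steadily while this runs.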
2) What would be the performance implications of dropping the cache way down to something like 512, 4096, or 16384?

To test:

- Same as above, but monitor response times for queries which use fq, under the different cache settings. I think the filter cache helps with faceting too, so it would be important to return facets with the test queries (a rough sketch follows).
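For the timing step, something along these lines might do (again just a sketch: the sample keywords, the fq value, and the facet fields are made up; re-run after changing the filterCache size in solrconfig.xml and restarting Solr):

    # Time representative queries with facets enabled, since the filterCache
    # is also used for faceting.
    import time
    import requests

    SOLR = "http://localhost:8080/solr/biblio"  # assumed URL and core name
    FACETS = ["format", "language"]             # assumed VF facet fields

    def timed_search(q, fq):
        params = [("q", q), ("rows", 20), ("facet", "true"), ("fq", fq)]
        params += [("facet.field", f) for f in FACETS]
        t0 = time.time()
        requests.get(SOLR + "/select", params=params)
        return (time.time() - t0) * 1000  # milliseconds

    for q in ("history", "art", "dublin"):  # made-up sample keywords
        print("%s: %.1f ms" % (q, timed_search(q, fq="format:Book")))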
For a production instance, the recommendation is to tune the cache by monitoring the hit ratio and evictions in the Solr admin. I don't think this applies to the test scenario, because it won't reflect real-world usage. If we arrive at a good default size that won't risk OOM when the cache is filled (test 1), then we can document cache-monitoring recommendations on the VF wiki.

Do these sound like useful tests (I may well be misunderstanding some aspect of jmeter testing and/or solr caching!)? If this proves useful for the filter cache, we can come up with something similar for the other cache types.

Cheers,
Eoghan

[1] http://lucene.472066.n3.nabble.com/Solr-filterCache-size-settings-td3520049.html

On 22 October 2013 03:33, Tod Olson <to...@uc...> wrote:

Something that would be really useful would be a couple of test plans for (JMeter|Grinder|FunkLoad) that demonstrate a combination of searches on different keyword indexes and browse indexes. Those could be used both for GC testing and for general scaling.

-Tod

On Oct 21, 2013, at 3:46 PM, "Mosior, Benjamin" <BEM...@sh...> wrote:

I can certainly assist, though I would like to be a bit more thorough than the one-off testing we performed previously. If we can put together a list of scenarios, I'd be glad to compile some data as I am able. We have a reasonable amount of dev-related computing resources available for the time being. Perhaps we should set up a Google Spreadsheet until we can prepare the proper wiki syntax for the scenarios? Also, I put together the results of our simple testing at the Summit here: https://vufind.org/wiki/testing_performance#keystone_library_network

Thanks,
Benjamin Mosior

From: Demian Katz [mailto:demian.katz@villanova.edu]
Sent: Monday, October 21, 2013 3:11 PM
To: Eoghan Ó Carragáin; Mosior, Benjamin
Cc: Osullivan L.; vuf...@li...
Subject: RE: [VuFind-Tech] "Waaaay too high [...] an OOM waiting to happen" WAS: Garbage Collection

At the VuFind Summit last week, Benjamin Mosior did some Solr testing to check out different JVM garbage collection settings (among other things). I wonder if he might be able to help us out with some testing related to these settings as well: create a test suite that, for example, searches with filter queries applied and pages through results, then run it against different cache settings. Benjamin, what do you think? (Also, this is a topic worth putting on the next dev call agenda; I'll do so right now.)

- Demian

From: Eoghan Ó Carragáin [mailto:eog...@gm...]
Sent: Friday, October 18, 2013 11:17 AM
Cc: Osullivan L.; vuf...@li...
Subject: [VuFind-Tech] "Waaaay too high [...] an OOM waiting to happen" WAS: Garbage Collection

Hi,

I just noticed that Erick Erickson replied to my email to the solr-user list about VuFind's default cache settings (for some reason the reply never came to my inbox). You can see his response here: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201308.mbox/%3CC...@ma...%3E

Erick's response (summarized in the revised subject line for this thread ;)) suggests that VuFind's filter cache defaults in particular are very high and are likely to be a contributing factor to the unpredictable OOMs reported by some VuFind users. Although GC has not been an issue for our installation since curtailing GoogleBot's activity, I have noticed that doing a commit can take a long time (~45 minutes even if only 1 record has been added) on our production server, which Erick points to as another possible symptom of the high cache settings.

Erick suggests dropping the cache settings "WAAAAAY back" and monitoring cache evictions on the Solr admin page to tune things to a more appropriate level. I'm not sure when I'll actually get to test any changes on our installation, but I think we should definitely look into this further in terms of the defaults shipped with VuFind. Has anyone actually changed the defaults in their production VuFind instances?

Cheers,
Eoghan

On 16 August 2013 12:55, Eoghan Ó Carragáin <eog...@gm...> wrote:

BTW, I emailed the solr-user list for advice on the VuFind default cache settings but have yet to hear back: http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201308.mbox/%3CCAHh9HmEsQgSG9mZfqX0Q-ou92vQR-uWex7BO%3DG1fNN%2BaGjTLPw%40mail.gmail.com%3E

Cheers

On 16 August 2013 12:53, helix84 <he...@ce...> wrote:

On Fri, Aug 16, 2013 at 1:34 PM, Osullivan L. <L.O...@sw...> wrote:
> One thing which I've noticed is that we are using java-6-openjdk-amd64,
> quite possibly because the original virtual server used AMD processors.
> I think it now uses intel - might it be worth changing the java version?

Hi Luke,

no, you're using the correct version.

amd64: "The port consists of a kernel for all AMD 64bit CPUs with AMD64 extension and all Intel CPUs with Intel 64 extension, and a common 64bit userspace." [1]

From [2]:

Q: Is this port only for AMD 64-bit CPUs?
A: No. "AMD64" is the name chosen by AMD for their 64-bit extension to the Intel x86 instruction set. Before release, it was called "x86-64" or "x86_64", and some distributions still use these names. Intel refers to its AMD64 implementation as "Intel64", previously named "EM64T". The architecture is AMD64-compatible and Debian AMD64 will run on AMD and Intel processors with 64-bit support. Because of the technology paternity, Debian uses the name "AMD64".

[1] http://www.debian.org/ports/amd64/index.en.html
[2] https://wiki.debian.org/DebianAMD64Faq

Regards,
~~helix84