Re: [VuFind-Tech] Facets -- where are the bottlenecks?


Hi All,

Mark and I have been out of the office recently, so apologies for
coming in late with news that probably isn't all that useful. I don't
have much more to add to the discussion apart from reiterating the
importance of cache sizes for facet performance.

At the NLA, we were lucky enough to have a developer who had written
an alternate Solr facet implementation for another project. For each
faceted field, this implementation builds up a document hash once at
initialisation time and holds onto it until the indexes are changed.
This means we endure a once-off cost at startup (we're talking
minutes here), but subsequent facet requests are very fast, under a
second. The downside of this approach is that the hash needs to be
rebuilt every time the indexes are changed, so we couldn't do
realtime updates to Solr without getting hit with our multi-minute
facet build time on every commit.

We have something like 4 million Solr documents running in a 6GB JVM.
If people are interested in learning more about how our faceting
works, we're happy to share the code. I think the NLA is in the
process of releasing it through this other project, but from memory
we've given out the faceting code to other VuFinders in the past.

Steve

On 05/08/2008, at 7:01 AM, Jessie Keck wrote:

> Hi Jeffrey,
> I believe that NLA has made some improvements with their own code;
> maybe they can say something to that effect when they come online.
>
> -Jessie
>

----
Steven McPhillips <smc...@nl...>
IT Business Systems
National Library of Australia
Try our new catalogue - http://catalogue.nla.gov.au