Re: [VuFind-General] General Questions about VUFind

On Tue, Jul 31, 2007 at 10:10:13AM -0400, Andrew Nagy wrote:
> > -----Original Message-----
> > From: vuf...@li... [mailto:vufind-
> > gen...@li...] On Behalf Of Chris Delis
> > Sent: Monday, July 30, 2007 7:31 PM
> > To: vuf...@li...
> > Subject: [VuFind-General] General Questions about VUFind
> >
> >
> > Why are XML bib records stored locally in addition to the SOLR
> > (Lucene) database?  I'm assuming it is for performance reasons, right?
> > Did you begin development without the local files?
> 
> This is both for performance and to allow XSL display of the individual records.
> Not every field needs to be stored in Apache Solr, so we only put in the fields that are needed for searching.  This will also help keep erroneous records out of the "all fields" search.  If you have differing opinions, please feel free to share.
> 
> >
> > I notice that you are in the middle of developing Holding capabilities
> > but haven't yet activated them in the code.  How much functionality do
> > you plan on performing?  Do you plan on making VUFind actually perform
> > the holding requests (and cancellation)?  The reason I ask is that we
> > are Voyager customers and the development team over there recommended
> > strongly against doing this ourselves (performing writes to the
> > database) and insisted on using WebVoyage; they said it was because of
> > data integrity issues.  As you can imagine, one of the reasons we are
> > interested in solutions such as VUFind is because of WebVoyage's
> > limitations ;-) For the time being, instead of performing the Holding
> > requests myself, I instead decided to create a web services facade for
> > WebVoyage (where I basically run WebVoyage instances in the background
> > and screen-scrape the GET and POST parameters needed to perform
> > certain actions - yuck!).
> 
> Yeah, I refuse to develop any screen-scraping code.  While I have not yet completely tackled holds, recalls, etc., we do plan to build them into VuFind.
> 
> > I heard that there is a Voyager module that
> > can handle 3M Standard Interchange Protocol (SIP) communication.  I
> > didn't know this until recently.  I wonder if there are any SIP API's
> > available for Voyager so I can do without the screen-scraping web
> > services! :-) Do you or anyone on this list know of any?
> 
> I am not aware of this, so if you or anyone else knows anything, I'd love to hear more about it.
> 
> >
> > Scalability.  I understand that the backend Lucene database can scale
> > horizontally (replication), which is good, but I'm wondering what sort
> > of "horse-power" you plan on using for Villanova's VUFind
> > implementation.  Do you mind sharing your current or planned hardware
> > architecture for running this on your campus?  Approximately how many
> > bib records do you manage?  Also, have you tweaked any java VM
> > settings?
> 
> I am about to purchase a single Dell 2950 to run our VuFind implementation.  Erik Hatcher was able to load a Solr instance with his 3 million records from UVA and run it on his MacBook with the same speeds: searches in milliseconds.  I am not worried about record size.  There are many companies that use Solr with many millions of records in a web farm environment without any scalability issues (that we hear of, anyway).
> Casey Durfee from Seattle Public said that he has noticed some faceting performance issues while testing Solr with the LOC dataset.
> 
> I would love to see a large school such as yours adopt Vufind and really put it to the test.  How many records do you have?

Actually, we are not a single school; University of Illinois is our
host institution.  We are a consortium of at least (lost count) 65
schools, with the U of I being the largest.  We received our 25
millionth record this past week.  

I don't have the exact hardware specs with me, but in development I am
running a 2x 2GHz AMD-based CPU Virtual Machine (running VMware) with
an Ubuntu 7.04 Server OS.  I have allocated 4GB of memory, but this
limit was never in danger of being reached.  CPU usage, however,
constantly peaks at or near 100% during SOLR searches.  Our
test environment has only 500,000 Bib records.

> 
> Our library just added about 150,000 records to our collection from a package we just purchased, so I think our collection is close to 750,000 now.  The dataset on vufind.org is about 550,000.
> 
> As to the Java VM settings, I haven't touched them yet.  I am also looking at switching from Tomcat to Jetty.  Solr comes prepackaged with Jetty and it would make the VuFind distribution more lightweight.  Any thoughts on this?
> 

I would be willing to try Jetty.  How difficult was it to integrate
SOLR with Tomcat?  Is switching to Jetty a trivial task, you think?

Chris

> Andrew
> 
> 
> 
>