From: Chris D. <ce...@ui...> - 2007-07-31 14:35:13
On Tue, Jul 31, 2007 at 10:10:13AM -0400, Andrew Nagy wrote:
> > -----Original Message-----
> > From: vuf...@li... [mailto:vufind-
> > gen...@li...] On Behalf Of Chris Delis
> > Sent: Monday, July 30, 2007 7:31 PM
> > To: vuf...@li...
> > Subject: [VuFind-General] General Questions about VUFind
> >
> > Why are XML bib records stored locally in addition to the SOLR
> > (Lucene) database? I'm assuming it is for performance reasons, right?
> > Did you begin development without the local files?
>
> This is both for performance and to allow for XSL display of the
> individual records. Not every field needs to be stored in Apache Solr,
> so we only put in the fields that are needed for searching. This will
> also help keep erroneous records out of the "all fields" search. If you
> have differing opinions, please feel free to share.
>
> > I notice that you are in the middle of developing Holding capabilities
> > but haven't yet activated them in the code. How much functionality do
> > you plan on performing? Do you plan on making VUFind actually perform
> > the holding requests (and cancellation)? The reason I ask is that we
> > are Voyager customers and the development team over there recommended
> > strongly against doing this ourselves (performing writes to the
> > database) and insisted on using WebVoyage; they said it was because of
> > data integrity issues. As you can imagine, one of the reasons we are
> > interested in solutions such as VUFind is because of WebVoyage's
> > limitations ;-) For the time being, instead of performing the Holding
> > requests myself, I instead decided to create a web services facade for
> > WebVoyage (where I basically run WebVoyage instances in the background
> > and screen-scrape the GET and POST parameters needed to perform
> > certain actions - yuck!).
>
> Yeah, I refuse to develop any screen scraping code. While I have not yet
> completely faced the holds/recalls, etc., we do plan to develop it into
> vufind.
>
> > I heard that there is a Voyager module that
> > can handle 3M Standard Interchange Protocol (SIP) communication. I
> > didn't know this until recently. I wonder if there are any SIP APIs
> > available for Voyager so I can do without the screen-scraping web
> > services! :-) Do you or anyone on this list know of any?
>
> I am not aware of this, so if you or anyone else knows anything I'd love
> to hear more about it.
>
> > Scalability. I understand that the backend Lucene database can scale
> > horizontally (replication), which is good, but I'm wondering what sort
> > of "horse-power" you plan on using for Villanova's VUFind
> > implementation. Do you mind sharing your current or planned hardware
> > architecture for running this on your campus? Approximately how many
> > bib records do you manage? Also, have you tweaked any Java VM
> > settings?
>
> I am about to purchase a single Dell 2950 to run our vufind
> implementation. Erik Hatcher was able to load a solr instance with his
> 3 million records from UVA and run it on his MacBook with the same
> speeds - searches in milliseconds. I am not worried about record size.
> There are many companies that use solr with many millions of records in
> a web farm environment without any scalability issues - that we hear of
> anyway.
> Casey Durfee from Seattle Public said that he has noticed some faceting
> performance issues while testing solr with the LOC dataset.
>
> I would love to see a large school such as yours adopt Vufind and really
> put it to the test. How many records do you have?

Actually, we are not a single school; University of Illinois is our
host institution. We are a consortium of at least (lost count) 65
schools, with the U of I being the largest. We received our 25
millionth record this past week. I don't have the exact hardware specs
with me, but in development I am running a 2x 2GHz AMD-based CPU
Virtual Machine (running VMware) with an Ubuntu 7.04 Server OS.
I have allocated 4GB of memory, but this limit was never in danger of
being reached. CPU usage, however, constantly peaks at or near 100%
during SOLR searches. Our test environment has only 500,000 bib
records.

> Our library just added about 150,000 records to our collection from a
> package we just purchased, so I think our collection is close to
> 750,000 now. The dataset on vufind.org is about 550,000.
>
> As to the Java VM settings, I haven't touched them yet. I am also
> looking at switching from Tomcat to Jetty. Solr comes prepackaged with
> Jetty and it would make the vufind distribution more lightweight. Any
> thoughts on this?

I would be willing to try Jetty. How difficult was it to integrate
SOLR with Tomcat? Is switching to Jetty a trivial task, do you think?

Chris

> Andrew
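
As a footnote to the first question in this thread (why full XML records sit on disk next to the Solr index), here is a minimal Python sketch of that split. It is only an illustration under assumptions: the field names, the sample record, and the `split_record` helper are all hypothetical and are not taken from the VuFind code, and a plain dict stands in for whatever would actually be sent to Solr.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical set of searchable fields; the thread does not list the real ones.
SEARCH_FIELDS = ("title", "author", "subject")


def split_record(record_xml: str, record_id: str, store_dir: Path) -> dict:
    """Save the full XML locally and return only the fields meant for searching."""
    root = ET.fromstring(record_xml)

    # Keep the complete record on disk so it can be rendered later (e.g. via XSL).
    (store_dir / f"{record_id}.xml").write_text(record_xml, encoding="utf-8")

    # Build the much smaller document that would go to the search index.
    index_doc = {"id": record_id}
    for name in SEARCH_FIELDS:
        values = [el.text.strip() for el in root.iter(name) if el.text]
        if values:
            index_doc[name] = values
    return index_doc


# Made-up record with a simplified, non-MARC layout, for illustration only.
SAMPLE = """<record>
  <title>Example Title</title>
  <author>Doe, Jane</author>
  <subject>Libraries</subject>
  <subject>Catalogs</subject>
  <physical>xii, 300 p.</physical>
</record>"""

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(split_record(SAMPLE, "rec1", Path(tmp)))
```

The `physical` element never reaches the index document, so it cannot produce hits in an "all fields" style search, yet it is still available for display from the saved file.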