From: Chris D. <ce...@ui...> - 2007-07-30 23:30:58
|
Hi Andrew,

Welcome back! Hope you had a relaxing time at the beach.

I have a few general questions about the current design of VUFind, if you don't mind:

Why are XML bib records stored locally in addition to the SOLR (Lucene) database? I'm assuming it is for performance reasons, right? Did you begin development without the local files?

I notice that you are in the middle of developing Holding capabilities but haven't yet activated them in the code. How much functionality do you plan on providing? Do you plan on making VUFind actually perform the holding requests (and cancellations)? The reason I ask is that we are Voyager customers, and the development team over there recommended strongly against doing this ourselves (performing writes to the database) and insisted on using WebVoyage; they said it was because of data integrity issues. As you can imagine, one of the reasons we are interested in solutions such as VUFind is because of WebVoyage's limitations ;-) For the time being, instead of performing the Holding requests myself, I decided to create a web services facade for WebVoyage (where I basically run WebVoyage instances in the background and screen scrape the GET and POST parameters needed to perform certain actions - yuck!). I heard that there is a Voyager module that can handle 3M Standard Interchange Protocol (SIP) communication. I didn't know this until recently. I wonder if there are any SIP APIs available for Voyager so I can do without the screen-scraping web services! :-) Do you or anyone on this list know of any?

Scalability. I understand that the backend Lucene database can scale horizontally (replication), which is good, but I'm wondering what sort of "horse-power" you plan on using for Villanova's VUFind implementation. Do you mind sharing your current or planned hardware architecture for running this on your campus? Approximately how many bib records do you manage? Also, have you tweaked any Java VM settings?

Thanks again for all your hard work!

Cheers,
Chris |
From: Andrew N. <and...@vi...> - 2007-07-31 14:10:20
|
> -----Original Message-----
> From: vuf...@li... [mailto:vufind-gen...@li...] On Behalf Of Chris Delis
> Sent: Monday, July 30, 2007 7:31 PM
> To: vuf...@li...
> Subject: [VuFind-General] General Questions about VUFind
>
> Why are XML bib records stored locally in addition to the SOLR
> (Lucene) database? I'm assuming it is for performance reasons, right?
> Did you begin development without the local files?

This is for both performance as well as to allow for XSL display of the individual records. Not every field needs to be stored in Apache Solr, so we only put in the fields that are needed for searching. This also helps keep erroneous records out of the "all fields" search. If you have varying opinions, please feel free to share.

> I notice that you are in the middle of developing Holding capabilities
> but haven't yet activated them in the code. How much functionality do
> you plan on providing? Do you plan on making VUFind actually perform
> the holding requests (and cancellations)? [...] For the time being,
> instead of performing the Holding requests myself, I decided to create
> a web services facade for WebVoyage (where I basically run WebVoyage
> instances in the background and screen scrape the GET and POST
> parameters needed to perform certain actions - yuck!).

Yeah, I refuse to develop any screen-scraping code. While I have not yet completely faced the holds/recalls, etc., we do plan to develop this into VuFind.

> I heard that there is a Voyager module that
> can handle 3M Standard Interchange Protocol (SIP) communication. I
> didn't know this until recently. I wonder if there are any SIP APIs
> available for Voyager so I can do without the screen-scraping web
> services! :-) Do you or anyone on this list know of any?

I am not aware of this, so if you or anyone else knows anything, I'd love to hear more about it.

> Scalability. I understand that the backend Lucene database can scale
> horizontally (replication), which is good, but I'm wondering what sort
> of "horse-power" you plan on using for Villanova's VUFind
> implementation. Do you mind sharing your current or planned hardware
> architecture for running this on your campus? Approximately how many
> bib records do you manage? Also, have you tweaked any Java VM
> settings?

I am about to purchase a single Dell 2950 to run our VuFind implementation. Erik Hatcher was able to load a Solr instance with his 3 million records from UVA and run it on his MacBook with the same speeds - searches in milliseconds. I am not worried about record size. There are many companies that use Solr with many millions of records in a web farm environment without any scalability issues - that we hear of, anyway.

Casey Durfee from Seattle Public said that he has noticed some faceting performance issues while testing Solr with the LOC dataset.

I would love to see a large school such as yours adopt VuFind and really put it to the test. How many records do you have? Our library just added about 150,000 records to our collection from a package we just purchased, so I think our collection is close to 750,000 now. The dataset on vufind.org is about 550,000.

As to the Java VM settings, I haven't touched them yet. I am also looking at switching from Tomcat to Jetty. Solr comes prepackaged with Jetty, and it would make the VuFind distribution more lightweight. Any thoughts on this?

Andrew |
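For illustration of the indexing-only approach Andrew describes: in Solr's schema.xml, a field can be indexed for searching without being stored, leaving the full MARC XML on disk for XSL display. A minimal sketch, where the field names and types are assumptions rather than VuFind's actual schema:

  <!-- schema.xml sketch: searchable fields, with only display keys stored -->
  <field name="id" type="string" indexed="true" stored="true"/>
  <field name="title" type="text" indexed="true" stored="true"/>
  <field name="fulltopic" type="text" indexed="true" stored="false"/>

A query can then match on fulltopic while the display layer fetches the locally stored XML record by id.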
From: Chris D. <ce...@ui...> - 2007-07-31 14:35:13
|
On Tue, Jul 31, 2007 at 10:10:13AM -0400, Andrew Nagy wrote:
> I would love to see a large school such as yours adopt VuFind and
> really put it to the test. How many records do you have?

Actually, we are not a single school; University of Illinois is our host institution. We are a consortium of at least (lost count) 65 schools, with the U of I being the largest. We received our 25 millionth record this past week.

I don't have the exact hardware specs with me, but in development I am running a 2x 2GHz AMD-based CPU virtual machine (running VMware) with a Ubuntu 7.04 Server OS. I have allocated 4GB of memory, but this limit was never in danger of being reached. CPU usage, however, is constantly peaking at or near 100% during SOLR searches. Our test environment has only 500,000 bib records.

> As to the Java VM settings, I haven't touched them yet. I am also
> looking at switching from Tomcat to Jetty. Solr comes prepackaged with
> Jetty, and it would make the VuFind distribution more lightweight. Any
> thoughts on this?

I would be willing to try Jetty. How difficult was it to integrate SOLR with Tomcat? Is switching to Jetty a trivial task, you think?

Chris |
From: Andrew N. <and...@vi...> - 2007-07-31 14:57:36
|
> > I would love to see a large school such as yours adopt VuFind and
> > really put it to the test. How many records do you have?
>
> Actually, we are not a single school; University of Illinois is our
> host institution. We are a consortium of at least (lost count) 65
> schools, with the U of I being the largest. We received our 25
> millionth record this past week.

Wow, that would be a great test case!

> I don't have the exact hardware specs with me, but in development I am
> running a 2x 2GHz AMD-based CPU virtual machine (running VMware) with
> a Ubuntu 7.04 Server OS. I have allocated 4GB of memory, but this
> limit was never in danger of being reached. CPU usage, however, is
> constantly peaking at or near 100% during SOLR searches. Our test
> environment has only 500,000 bib records.

Hmm, you are seeing CPU spikes when searching 500,000 records? I would hate to blame someone else, but maybe VMware is causing this issue. I have not heard of this as a problem yet. Lucene/Solr should not have any problem searching 500,000 records. Have you tried submitting an optimize statement to Solr?

http://wiki.apache.org/solr/UpdateXmlMessages#head-a847de14ab548e9f3d9a5ba72aae7e5ac25cc51b

> I would be willing to try Jetty. How difficult was it to integrate
> SOLR with Tomcat? Is switching to Jetty a trivial task, you think?

The switch to Jetty would be trivial. If you would like to test it, please let me know how it works and if you notice any difference in performance.

Thanks!
Andrew |
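The commit and optimize statements Andrew refers to are plain XML messages POSTed to Solr's update handler. A sketch, assuming a default Solr instance listening on localhost:8983 (the host and port are illustrative):

  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<commit/>'
  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<optimize/>'

Optimize merges the index down to a single segment, which can noticeably speed up searches after large batch imports.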
From: Chris D. <ce...@ui...> - 2007-07-31 15:22:04
|
On Tue, Jul 31, 2007 at 10:57:31AM -0400, Andrew Nagy wrote:
> Hmm, you are seeing CPU spikes when searching 500,000 records? I
> would hate to blame someone else, but maybe VMware is causing this
> issue. I have not heard of this as a problem yet. Lucene/Solr should
> not have any problem searching 500,000 records. Have you tried
> submitting an optimize statement to Solr?

Yes, whenever I add records to SOLR, I issue a <commit/> and <optimize/>.

Well, if this behavior is not common, then it wouldn't be unfair to blame the one system that exhibits it :-) I didn't set up the VMware instance, but my guess is that it is a very typical setup. Also, this system only runs VUFind.

> The switch to Jetty would be trivial. If you would like to test it,
> please let me know how it works and if you notice any difference in
> performance.

I wouldn't mind testing this out.

Chris |
From: Wayne G. <ws...@wm...> - 2007-07-31 14:51:56
|
Just had a thought that may have an impact on the development with putting the XML files in one data directory. The max number of files on an ext3 file system is VolumeSize / 2^13, or the number of blocks (whichever is less), for the entire file system. The minimum is VolumeSize / 2^23, which is generally enough, though it may need to be taken under consideration for larger library systems.

An alternate file system like JFS, XFS, or ext4 might be better for this type of application...

Wayne

Andrew Nagy wrote:
>> Why are XML bib records stored locally in addition to the SOLR
>> (Lucene) database? I'm assuming it is for performance reasons, right?
>> Did you begin development without the local files?
>
> This is for both performance as well as to allow for XSL display of
> the individual records. Not every field needs to be stored in Apache
> Solr, so we only put in the fields that are needed for searching.
> [...]

--
/**
 * Wayne Graham
 * Earl Gregg Swem Library
 * PO Box 8794
 * Williamsburg, VA 23188
 * 757.221.3112
 * http://swem.wm.edu/blogs/waynegraham/
 */ |
From: Andrew N. <and...@vi...> - 2007-07-31 14:57:53
|
> -----Original Message-----
> From: vuf...@li... [mailto:vufind-gen...@li...] On Behalf Of Wayne Graham
> Sent: Tuesday, July 31, 2007 10:52 AM
> To: vuf...@li...
> Subject: Re: [VuFind-General] General Questions about VUFind
>
> Just had a thought that may have an impact on the development with
> putting the XML files in one data directory. The max number of files
> on an ext3 file system is VolumeSize / 2^13, or the number of blocks
> (whichever is less), for the entire file system. The minimum is
> VolumeSize / 2^23, which is generally enough, though it may need to
> be taken under consideration for larger library systems.
>
> An alternate file system like JFS, XFS, or ext4 might be better for
> this type of application...

Hmm, yes, this is something that I did not take into consideration. I had at one time planned on adding the MARCXML records into a native XML database for storage and management - such as eXist. This is how our digital library is managed: eXist stores and manages the METS XML records, and Solr (not yet complete) processes the searching and faceting.

This might be worth investigating for larger institutions.

Thanks
Andrew |
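As a sketch of how the eXist approach would look in practice: eXist exposes its stored documents over a REST interface, so the display layer could fetch a record with an HTTP GET instead of reading a file from disk. The host, port, and collection path here are illustrative, not an actual VuFind layout:

  curl "http://localhost:8080/exist/rest/db/biblio/record-12345.xml"

This sidesteps filesystem file-count limits entirely, at the cost of running one more server process.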
From: Doran, M. D <do...@ut...> - 2007-07-31 15:31:44
|
Hi Chris,

> I heard that there is a Voyager module that can handle 3M
> Standard Interchange Protocol (SIP) communication. ...
> I wonder if there are any SIP APIs available for Voyager
> so I can do without the screen-scraping web services!

NCIP and SIP2 APIs exist for Voyager. However, don't get your hopes up too high.

My understanding is that Ex Libris (nee Endeavor) supplies an NCIP server binary as part of their ILL add-on product. If you have not purchased that ILL product, then you will not have that NCIP server. It is also my understanding that while the Voyager NCIP API supports their ILL product, it was not meant to serve as a general purpose NCIP application programming interface (API) [1].

Similarly, a SIP2 server is included when purchasing the Voyager Self-Check add-on product and, again, while it supports their Self-Check product, it was not meant to serve as a general purpose SIP application programming interface (API) [2].

That doesn't mean you can't try to utilize these APIs for other purposes (if you have purchased them). However, it will probably involve a lot of trial and error -- you can't assume that all the functionality outlined in the standards is going to work. I know that a couple of the Finnish universities did some very innovative work, utilizing SIP to integrate Voyager with a mobile phone application [3]. If I recollect, though, that particular development effort was done with Endeavor's cooperation.

For more information, search the voyager-l archive on "NCIP" and "SIP"; see in particular the Voyager Product Manager's informative update re (then) Endeavor's accomplishments and plans for NCIP interoperability [4]. Many of us would be interested in a more fully-functioning NCIP API into Voyager so that we can (safely) write to, and read from, the circulation module data via our own client programs; my impression was that Endeavor was not too keen on providing that type of functionality, since it would entail a fairly substantial development effort. I've been beating this drum since 2003 and have talked off the record with some of the Voyager developers. They're not going to do anything until, and unless, there is some demand. Make your voice heard!

-- Michael

[1] NISO Circulation Interchange Protocol (NCIP)
http://www.niso.org/committees/committee_at.html

[2] 3M Standard Interchange Protocol
http://cms.3m.com/cms/US/en/0-170/ckceiFQ/viewimage.jhtml

[3] See "Voyager circulation services to customers using mobile phones, cases Helsinki School of Economics and Helsinki University of Technology" by Matti Raatikainen, Ulla Huurinainen, Systems Analyst, Helsinki University of Technology Library. EndUser 2003

[4] Subject: RE: [VOYAGER-L]: Voyager SIP or NCIP interface
Date: 2004-03-11 16:34:00
To: Voyager-L
"I'd like to clarify our accomplishments and plans for interoperability and for NCIP..."

# Michael Doran, Systems Librarian
# University of Texas at Arlington
# 817-272-5326 office
# 817-688-1926 mobile
# do...@ut...
# http://rocky.uta.edu/doran/ |
From: Chris D. <ce...@ui...> - 2007-07-31 15:46:22
|
On Tue, Jul 31, 2007 at 10:31:30AM -0500, Doran, Michael D wrote:
> Hi Chris,

Hi Michael,

(BTW, we have enjoyed using your New Books list. Thanks!)

> NCIP and SIP2 APIs exist for Voyager. However, don't get your hopes
> up too high.

Believe me, my hopes are not very high ;-) I will try searching through the links you were generous enough to provide.

You're right about making our voices heard, and we plan on doing just that (not that we've been "quiet" over the years).

In the meantime, I will probably integrate with my screen-scraping API until/unless development in VUFind seems promising. There are a lot of things I hate about this approach (e.g., all of those keyword search servers getting launched for each OPAC server, which will become unnecessary if we can get VUFind to work in production), not to mention all of the ugly and rigid code. I will just have to be careful to re-use WebVoyage sessions whenever I can. Also, I plan on using WebVoyage as a backup in the event of a web service failure. All of this hardship wouldn't be necessary if there simply were a Voyager API! Even if it were proprietary and not open source, it would still be enormously helpful.

Good luck with your endeavors (pun intended)!

Chris |
From: Wayne G. <ws...@wm...> - 2007-07-31 15:46:55
|
You may be able to get around some of the CPU spikes by changing the GC variables for Java. Also, which version of Java are you running? You may find a performance bump with the java-sun6-jdk.

You can set -verbose:gc in the CATALINA_OPTS to dump to the solr/tomcat/logs/catalina.out file to see what's going on. I suspect that there is some Java tuning that needs to take place.

HTH,
Wayne

Chris Delis wrote:
> Well, if this behavior is not common, then it wouldn't be unfair to
> blame the one system that exhibits it :-) I didn't set up the VMware
> instance, but my guess is that it is a very typical setup. Also, this
> system only runs VUFind.
> [...]

--
/**
 * Wayne Graham
 * Earl Gregg Swem Library
 * PO Box 8794
 * Williamsburg, VA 23188
 * 757.221.3112
 * http://swem.wm.edu/blogs/waynegraham/
 */ |
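A sketch of what that looks like in practice; the heap sizes and extra flags below are illustrative starting points, not recommended values:

  # e.g. in $CATALINA_HOME/bin/setenv.sh (or exported before starting Tomcat)
  CATALINA_OPTS="-server -Xms512m -Xmx1024m -verbose:gc -XX:+PrintGCDetails"
  export CATALINA_OPTS

With -verbose:gc enabled, each collection is logged to catalina.out, which makes it easy to see whether the CPU spikes line up with full GC pauses.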
From: Chris D. <ce...@ui...> - 2007-07-31 16:18:58
|
On Tue, Jul 31, 2007 at 11:46:46AM -0400, Wayne Graham wrote:
> You may be able to get around some of the CPU spikes by changing the
> GC variables for Java. Also, which version of Java are you running?
> You may find a performance bump with the java-sun6-jdk.
>
> You can set -verbose:gc in the CATALINA_OPTS to dump to the
> solr/tomcat/logs/catalina.out file to see what's going on. I suspect
> that there is some Java tuning that needs to take place.

Looks like you're correct. Thanks, Wayne. It's been a while since I've dealt with Java tuning issues. Looks like I need to profile the garbage collector and tweak its settings appropriately.

Chris |
From: Andrew N. <and...@vi...> - 2007-08-02 13:45:36
|
> Just had a thought that may have an impact on the development with
> putting the XML files in one data directory. The max number of files
> on an ext3 file system is VolumeSize / 2^13, or the number of blocks
> (whichever is less), for the entire file system. The minimum is
> VolumeSize / 2^23, which is generally enough, though it may need to
> be taken under consideration for larger library systems.

I have been thinking about this a little bit. But unless my math is way off, it shouldn't be a concern.

The VolumeSize is measured in bytes, right? So let's say that you have a 250GB drive. This means that the VolumeSize is 268,435,456,000 bytes. Divided by 2^13 (8,192), that works out to 32,768,000 - roughly 32.8 million files, far more than the number of bib records we are talking about.

Now I know that bash already has a limit on the '*' expansion. If you try typing ls * in the data directory, bash will give you an error about a limit. I'm not sure what the limit is, but with our half a million records I get the error. But this hasn't swayed me from dumping all the files in one directory.

Andrew |
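For anyone wanting to check how close a volume actually is to its file limit, df can report inode usage directly (the path below is illustrative):

  # -i reports inode totals and usage instead of block usage
  df -i /usr/local/vufind/data

The ls error Andrew mentions, incidentally, comes from the kernel's limit on the total size of a command's argument list after glob expansion, not from the filesystem itself.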
From: Wayne G. <ws...@wm...> - 2007-08-02 14:00:51
|
You know, this was more my geeky sysadmin side coming out ;)

This would be more an issue for folks running a virtualized instance where they need to calculate the size of the volume. However, because of the XML file size, you'd need to allocate enough space on the volume to actually hold the data, and would most likely never run into this issue.

If folks do run into that particular issue, changing to a file system like XFS or JFS that doesn't have this fixed max-file limit is far more reasonable than re-architecting the application.

Wayne

Andrew Nagy wrote:
> I have been thinking about this a little bit. But unless my math is
> way off, it shouldn't be a concern.
> [...]
> But this hasn't swayed me from dumping all the files in one directory.

--
/**
 * Wayne Graham
 * Earl Gregg Swem Library
 * PO Box 8794
 * Williamsburg, VA 23188
 * 757.221.3112
 * http://swem.wm.edu/blogs/waynegraham/
 */ |
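Another option, for those staying on ext3, is to provision the inode count explicitly when the filesystem is created, since the file limit is fixed at mkfs time. This is a standard mke2fs flag; the device and count below are illustrative:

  # -j creates an ext3 journal; -N sets the number of inodes explicitly
  mke2fs -j -N 50000000 /dev/sdb1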
From: Andrew N. <and...@vi...> - 2007-08-27 13:23:46
|
> > I, too, am noticing performance issues with faceting. I didn't
> > notice this with data sets of 500,000 or so. However, upon importing
> > 2 million bibs, the response time is now quite noticeable. Facet
> > searches without a filter string are reasonable, but when you narrow
> > in on a facet even one time, the "Loading Narrowing Options..." can
> > take a minute or two to complete. Do you have any thoughts on ways
> > to work around this? I know next-to-nothing about SOLR and its
> > tuning capabilities. I guess now would be a good time for me to
> > delve into it, eh? :-O
>
> It seems I might be able to make a difference by tweaking the
> filterCache settings in solrconfig.xml. ???
>
> I noticed quite a few fields in schema.xml are "text." I'm guessing
> there is a reason for this, right? Would bad things happen if I
> changed some of them (e.g., author, title) to "string"? I.e., in what
> ways are you taking advantage of the "text" type?

The best thing to do, as I have heard from the Solr listserv, is to play with the cache numbers. Supposedly it is black magic, but if you figure out the best settings for the cache, Solr will be very fast.

I would not change the text to string. You will notice many of the fields also have a string complement. For example, there is a field called author and a field called authorStr. The *Str complement is used for faceting.

Andrew |
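A sketch of the two pieces under discussion; the values and field definitions are illustrative, not VuFind's shipped configuration. In solrconfig.xml, the filterCache is the main knob for faceting performance:

  <!-- solrconfig.xml: cache sizes are illustrative starting points -->
  <filterCache class="solr.LRUCache"
               size="16384"
               initialSize="4096"
               autowarmCount="4096"/>

And the text/string pairing in schema.xml, where the tokenized field serves keyword search and the untokenized copy serves faceting:

  <!-- schema.xml: tokenized field for searching, string copy for faceting -->
  <field name="author" type="text" indexed="true" stored="true"/>
  <field name="authorStr" type="string" indexed="true" stored="false"/>
  <copyField source="author" dest="authorStr"/>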