I don’t think Solr should have much trouble handling millions of records; it has definitely been done before. Requirements will depend to some extent on how detailed the records are, so they’re always hard to predict.


I personally don’t have a lot of experience with really large indexes, but you might be interested in reading Tom Burton-West’s blog, which has a lot of useful information about his experiences working with a massive full-text Solr index at HathiTrust:




If you run into trouble, Tom might also be a useful resource; he’s very approachable and has good insights into how Solr performs under extreme conditions.


- Demian


From: Filipe MS Bento (UA) [mailto:fsb@ua.pt]
Sent: Thursday, October 04, 2012 5:38 AM
To: Andrea Marchitelli; vufind-tech@lists.sourceforge.net
Subject: Re: [VuFind-Tech] Systems requirements for a quite big installation


Hi Andrea!


Congrats on your project: it looks really promising!


Regarding the specs, here are the ones I requested from our Computer Center for a future production server that will host about half the number of records you mentioned (it currently holds 6.3 million records on a very low-spec staging server). Most of them come from OAI-PMH sources (local and external), and some are very "rich" in content (large abstracts and the like). It will run in a single VM (in a VM farm) on CentOS 6:


Disk space: 35GB (our current Solr index size, as reported by du) x 8 = ~300GB

Memory: 8GB minimum, ideally 10GB+ (more memory reduces the time index optimization takes)

CPU: at least 4 logical processors (ideal: 8+).
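A rough way to redo the disk-space arithmetic above for a different record count is to scale linearly from the staging numbers. The Python sketch below is only an illustration under assumptions that are not from this thread: that index size grows in proportion to record count, and that the target records are about as "rich" as the staging ones (leaner UNIMARC records may need less). The function name and the reuse of the x8 multiplier are hypothetical choices for the example.

```python
# Rough Solr disk-sizing sketch (hypothetical helper, not part of VuFind/Solr).
# ASSUMPTIONS: index size scales linearly with record count, and the target
# records are about as detailed as those in the 6.3 million-record staging index.

CURRENT_RECORDS = 6_300_000   # records in the staging index (from the thread)
CURRENT_INDEX_GB = 35.0       # Solr index size reported by du (from the thread)
HEADROOM_FACTOR = 8           # multiplier used in the "35GB x 8" estimate


def estimate_disk_gb(target_records: int,
                     current_records: int = CURRENT_RECORDS,
                     current_index_gb: float = CURRENT_INDEX_GB,
                     headroom: float = HEADROOM_FACTOR) -> float:
    """Return a provisioned-disk estimate in GB for target_records."""
    gb_per_record = current_index_gb / current_records
    return target_records * gb_per_record * headroom


if __name__ == "__main__":
    # ~10 million UNIMARC records plus ~0.5 million harvested via OAI-PMH
    target = 10_000_000 + 500_000
    print(f"{estimate_disk_gb(target):.0f} GB")
```

For about 10.5 million records this prints roughly 467 GB, somewhat below simply doubling the ~300GB figure; real needs will depend on how detailed the records are.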


At the time, I asked Demian whether these numbers were adequate for our target record count, and he said they should be fine. In your case I would double all of them, at least the available disk space (local or remote storage). The others are just a question of performance: if you also run VuFind in a VM, it is very easy to increase the memory or the number of CPUs at any time if you see that the server isn’t coping.


Can anyone running large-scale production servers confirm this (including Demian, of course :) )?


All the best, and looking forward to seeing it “live and kicking”,





Filipe Manuel S. Bento  |  http://about.filipebento.pt/


Going Beyond the Bibliographic Catalog: 
The Basis for a New Participatory Scientific Information Discovery and Sharing Model

Filipe Bento and Lídia Silva (University of Aveiro, Portugal)

IGI Global, 2013. ISBN 9781466619128, p. 1-38


Read it for free: request your free 60-day e-book exam copy


Computer Science Specialist | Long Term LOA (Leave of Absence) from the University of Aveiro, Portugal

Electronic & Telecommunication Engineering (5 yrs degree, UA) * MSc in Electronic Information Management (U.Sheffield, UK)

European Space Agency [ESA] Industrial Placement (IRS / ESRIN – European Space Research Institute, Frascati [Rome], Italy)

ICPD Doctorate Candidate (UA | U.Porto) * PhD Researcher (UA/CETAC.Media), grant from FCT - Portuguese Foundation for Science and Technology

President/Chair of USE.pt Steering Committee (Portuguese Ex Libris Users’ National Association, hosted by the Portuguese Parliament's Library, Palácio de S. Bento, Lisbon), http://metis.fe.up.pt/use/



-----Original Message-----
From: Andrea Marchitelli [mailto:marchitelli@cilea.it]
Sent: Thursday, 4 October 2012 08:52
To: vufind-tech@lists.sourceforge.net
Subject: [VuFind-Tech] Systems requirements for a quite big installation



We are planning a VuFind installation for a new project whose numbers seem quite big to me.


This VuFind instance will index about 10 million records coming from different ILSs that export UNIMARC files, so we will have to import 9-10 different record sets every week.

It should also harvest a couple of OAI-PMH sources, adding about half a million records.


What system requirements (RAM, disk, and so on) do you think we need to provide?





Dr. Andrea Marchitelli



ph. +39 06 59292856 - mob. +39 340 4027156 - fax +39 06 5913770 CILEA - Consorzio Interuniversitario http://www.cilea.it/disclaimer

skype: andreamarchitelli





