The requirements are a little vague. But then, "entry level system"
and "mid-sized university" are a bit fuzzy too. :-) So is a term I'll
probably throw in: "server-class box".

It's really hard to pin down precise requirements. A small but very
popular collection might need more machine than a huge collection with
a very select clientele. I think that the best you are going to get
is some examples.

Here we run two production DSpace hosts. One has a single DSpace
instance and its DBMS, in 2GB of memory on dual 2.4GHz Xeon processors
(hyperthreaded). It contains 4570 ORIGINAL bitstreams (primary
documents, not thumbnails or extracted text or licenses) in 1717
items. The DBMS occupies 1.7GB and the assetstore 5.4GB. (The DBMS
is also providing two other databases in the same tablespace, so it's
hard to say precisely how much is used by ScholarWorks.IUPUI.Edu.)
This one is our institutional repository and contains mainly local
research output. The memory is comfortably full and performance is
unremarkable. We have two gigabit Ethernet links from this host to

The other host runs three DSpace instances and their DBMS. It has 3GB
of memory and dual 3GHz Xeon processors (hyperthreaded). The DBMS
occupies 6.5GB and the three assetstores about 18GB. The largest
instance contains (from memory) about 20,000 documents and has a
sizable international audience; the other two are considerably
smaller. One instance is our university archive (meeting minutes and
such) and would be of limited interest outside the organization.
We've had some performance problems, mainly due to memory pressure and
my inexperience in tuning a system for sizable Java applications. If I were
sizing this system today I would recommend at least 4GB of memory.
I'm also considering consolidating the databases on a single, separate
host. This host also runs two GbE links to the outside.

I'm also happily running more than a dozen test instances on a 4GB dual

All of these are using hardware RAID-5 storage, via whatever HP
StorageWorks or Dell PERC controller came with the system. I don't
think I've got the DBMS tuned well enough yet to say whether there are
significant performance limits from that setup, but a lot of Postgres
folk prefer software RAID and definitely like other controllers for
high-performance DBMS servers. I don't expect to see such limits hit
unless our traffic goes up considerably.
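
For a rough feel of what those numbers imply, here is a minimal Python
sketch that turns the first host's figures into per-bitstream and
per-item storage. The helper names and the decimal-gigabyte constant are
assumptions made up for illustration, and the DBMS figure is only an
upper bound because that tablespace is shared. Treat any projection from
it as a guess, not a sizing rule.

```python
# Figures quoted above for the single-instance host.
GB = 1000 ** 3  # assumption: decimal gigabytes

BITSTREAMS = 4570                # ORIGINAL bitstreams only
ITEMS = 1717
ASSETSTORE_BYTES = 5.4 * GB
DBMS_BYTES = 1.7 * GB            # upper bound: tablespace shared with two other databases


def per_unit(total_bytes, count):
    """Average bytes per unit (hypothetical helper)."""
    return total_bytes / count


def projected_assetstore_gb(planned_items):
    """Naive linear projection of assetstore size from this one host's ratio."""
    return planned_items * per_unit(ASSETSTORE_BYTES, ITEMS) / GB


bitstreams_per_item = BITSTREAMS / ITEMS
mb_per_bitstream = per_unit(ASSETSTORE_BYTES, BITSTREAMS) / 1e6
mb_per_item = per_unit(ASSETSTORE_BYTES, ITEMS) / 1e6
dbms_mb_per_item_max = per_unit(DBMS_BYTES, ITEMS) / 1e6

if __name__ == "__main__":
    print(f"bitstreams per item:       {bitstreams_per_item:.2f}")
    print(f"assetstore MB / bitstream: {mb_per_bitstream:.2f}")
    print(f"assetstore MB / item:      {mb_per_item:.2f}")
    print(f"DBMS MB / item (at most):  {dbms_mb_per_item_max:.2f}")
    print(f"projected GB, 20000 items: {projected_assetstore_gb(20000):.1f}")
```

The second host's figures (about 18GB across three assetstores, with
roughly 20,000 documents in the largest instance) suggest a smaller
ratio than this, which is why examples are about the best anyone can
offer here.
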
Mark H. Wood, Lead System Programmer mwood@...
Friends don't let friends publish revisable-form documents.