From: Dmitriy S. <sha...@gm...> - 2010-03-29 16:32:04
On Mon, 2010-03-29 at 10:59 -0400, Andrzej Jan Taramina wrote:
> Looking to get some guidance on how big you can scale an eXist database.
>
> Right now, our instances are about 15-25K documents where each document is in the 25K-2M range, probably averaging
> around 150-200K. This results in a dom.dbx = 3.5G, structure.dbx = 1.8G, collections.dbx = 4.2M and values.dbx = 155M,
> which is not all that large compared to some relational databases.
>
> What if we scale up 10x to nearly a quarter of a million documents? The file sizes still shouldn't be all that big for
> modern hardware, but will the performance scale linearly, or close to it, assuming a powerful enough server (say a
> dual-CPU, 6-core machine (12 cores, 24 native threads) with gobs of memory)?
>
> OK.....if that works, how about two orders of magnitude (100x current size)? That would give us 2.5M documents, a 250GB
> dom.dbx and a structure.dbx in the 180GB range. A bit too big to be practical to cache the whole structure.dbx in memory,
> regardless of the amount of memory in the server.
>
> At what point do I start looking at alternative storage mechanisms (RDBMS, Hadoop, memcached, etc.) or co-operating
> distributed eXist instances?
>
> Thanks for any insights from those that have pushed big databases in eXist...

eXist can store as much as you have HDD space. The main question is what eXist can and cannot do with that much data. It will be very good at "single", small-result selections from that big amount of data, but the problems start as soon as you increase the amount of data a query has to evaluate, or use "special" operations like order by.

--
Cheers,

Dmitriy Shabanov
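
[To make the distinction concrete, here is a minimal sketch; the /db/docs collection path and the doc/title element names are assumptions for illustration, not taken from the original thread. The point is that a selective, index-backed lookup touches only a few nodes regardless of database size, while an order by over the whole collection forces eXist to retrieve and sort every matching node before returning anything.]

    (: scales well: a selective lookup an index can answer with a handful of nodes :)
    collection('/db/docs')/doc[@id = 'A-12345']

    (: scales poorly: every doc node must be materialized and sorted first :)
    for $d in collection('/db/docs')/doc
    order by $d/title
    return $d/title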