From: José M. F. G. <jm...@us...> - 2010-04-07 11:22:10
Hi Adam (and everybody),

yes, that's my very old e-mail! My professional e-mail is now jmf...@cn..., since the research group I work in moved to CNIO (the Spanish National Cancer Research Centre) three years ago. Obviously, you can also use this one :-)

My old tests (1-2 years ago) focused on scalability at the single-document, collection and query levels. A nice improvement since then is the full-text (FT) index implementation used in eXist. It has improved A LOT, because the FT index implementation prior to the Lucene-based one did not scale up (a sketch of the Lucene index configuration follows below). The eXist team has also removed many of the internal bottlenecks, most of them in intermediate-result processing.

But as Wolfgang has written, there are lots of possible improvements which are only needed when you are working with huge database instances. I guess intermediate results in a complex query on a huge database can still trigger an OutOfMemoryError: for instance, a sequence of a hundred thousand in-memory nodes generated from database content. Another example is when you have to sort a huge sequence of nodes on a complex condition (which is hopefully being addressed by Wolfgang's latest developments). A sketch of that kind of query also follows below.
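For readers who have not tried the Lucene index yet: it is enabled per collection through a collection.xconf document. A minimal sketch, assuming a hypothetical collection /db/big-corpus whose <title> elements you want indexed (both names are examples, not from this thread):

    <!-- Stored, by eXist convention, at
         /db/system/config/db/big-corpus/collection.xconf -->
    <collection xmlns="http://exist-db.org/collection-config/1.0">
        <index>
            <lucene>
                <!-- Build a Lucene full-text index over <title> elements -->
                <text qname="title"/>
            </lucene>
        </index>
    </collection>

Queries then go through ft:query(), which consults the Lucene index instead of scanning node values (element name and search term again illustrative):

    for $t in collection("/db/big-corpus")//title[ft:query(., "cancer")]
    return $t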
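And to make the intermediate-results problem concrete, here is a minimal XQuery sketch; the collection path and element names are made up, but the shape is typical:

    (: Each element constructor copies database content into a new    :)
    (: in-memory node, so this FLWOR materializes one in-memory node  :)
    (: per matching document before the result is ever used.          :)
    let $summaries :=
        for $doc in collection("/db/big-corpus")//record
        return
            <summary id="{$doc/@id}">{ $doc/title }</summary>
    return
        (: Ordering by a computed key forces the whole sequence to be :)
        (: held and compared in memory at once; with a hundred        :)
        (: thousand nodes this is where the OutOfMemoryError strikes. :)
        for $s in $summaries
        order by string-length(string($s)) descending
        return $s

Nothing here is exotic; it is the combination of copied in-memory nodes and a sort over the whole sequence that fails to scale.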
Best wishes,
	José María

On 03/31/10 22:43, Adam Retter wrote:
> Andrzej,
>
> A chap on the mailing list has quite a lot of experience scaling eXist
> into the hundreds-of-gigabytes range; perhaps if you email him he could
> share some of his experiences with you as well: José María Fernández
> González, jmfernandez<at>cnb.uam.es
>
> On 29 March 2010 15:59, Andrzej Jan Taramina <an...@ch...> wrote:
>> Looking to get some guidance on how big you can scale an eXist database.
>>
>> Right now, our instances are about 15-25K documents, where each document
>> is in the 25K-2M range, probably averaging around 150-200K. This results
>> in dom.dbx = 3.5G, structure.dbx = 1.8G, collections.dbx = 4.2M and
>> values.dbx = 155M, which is not all that large compared to some
>> relational databases.
>>
>> What if we scale up 10x, to nearly a quarter of a million documents? The
>> file sizes still shouldn't be all that big for modern hardware, but will
>> the performance scale linearly, or close to it, assuming a powerful
>> enough server (say a dual-CPU, 6-core machine (12 cores, 24 native
>> threads) with gobs of memory)?
>>
>> OK... if that works, how about two orders of magnitude (100x current
>> size)? That would give us 2.5M documents, a 250GB dom.dbx and a
>> structure.dbx in the 180GB range. A bit too big to be practical to cache
>> the whole structure.dbx in memory, regardless of the size of the memory
>> in the server.
>>
>> At what point do I start looking at alternative storage mechanisms
>> (RDBMS, Hadoop, memcached, etc.) or co-operating distributed eXist
>> instances?
>>
>> Thanks for any insights from those that have pushed big databases in eXist...
>>
>> --
>> Andrzej Taramina
>> Chaeron Corporation: Enterprise System Solutions
>> http://www.chaeron.com
>>
>> _______________________________________________
>> Exist-development mailing list
>> Exi...@li...
>> https://lists.sourceforge.net/lists/listinfo/exist-development

--
"Violence is the last refuge of the incompetent." - Salvor Hardin, in Isaac Asimov's "Foundation"
"Premature optimization is the root of all evil." - Donald Knuth

José María Fernández González
e-mail: jos...@gm...