You didn't tell us which eXist version you are using.
While it is true that the beta2 release had quite a few problems with
storing huge data sets, I would expect the current snapshot releases
to behave quite a bit better (though I always see more potential for
improvement). Recent code should scale well beyond the 30,000 docs
mentioned on the nxqd website ;-) given that not too much swapping
occurs.
We also have some more pending changes in CVS to speed up indexing and
improve caching. In particular, I've tried to further reduce the
amount of temporary memory allocated during indexing. Given these
changes, I only see one remaining issue that could lead to
out-of-memory errors: the document metadata is still kept in memory
during indexing, and though a single document's entry doesn't consume
much space, the sum over the whole collection could indeed become a
problem at your scale, so this may need to be addressed.
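Just to illustrate what I mean (invented names, not the actual eXist
classes): if one small metadata record per stored document is held in an
in-memory map for the whole indexing run, the heap footprint grows
linearly with the collection, which matches the limits you hit at a few
hundred thousand documents:

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical sketch of the problem, not eXist code: one small
    // metadata record per document, pinned on the heap while indexing.
    public class MetadataGrowth {

        static class DocumentMetadata {
            final String name;
            final long created;
            DocumentMetadata(String name, long created) {
                this.name = name;
                this.created = created;
            }
        }

        public static void main(String[] args) {
            Map<String, DocumentMetadata> metadata =
                new HashMap<String, DocumentMetadata>();
            // Even at ~100 bytes per entry, a million documents keeps
            // roughly 100 MB pinned in addition to the index buffers,
            // so -Xmx has to grow with the size of the collection.
            for (int i = 0; i < 1000000; i++) {
                String name = "doc" + i + ".xml";
                metadata.put(name,
                    new DocumentMetadata(name, System.currentTimeMillis()));
            }
            System.out.println("Entries held in memory: " + metadata.size());
        }
    }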
I acknowledge there are still many areas we have to work on to improve
general usability. During the past months, I put most of my effort
into sponsored features like the new logging & recovery, so
performance work fell a bit short (yes, even for an open source
project, the selection of features to implement is mainly driven by
economic factors, as we all have to earn our living).
From my point of view, the next crucial feature for improving support
for large data sets is a pre-evaluation optimizer for queries: as I see
it, even a simple query-rewriting optimizer could solve a major part
of the performance and memory problems I currently experience with
queries over huge node sets, so this should be the next top priority
for the project.
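To make that concrete, here is a toy sketch (made-up classes, not the
eXist query engine) of what such a rewrite buys: the naive plan
materializes the full candidate node set and filters it afterwards,
while the rewritten plan asks an index about the predicate first and
only touches the matching documents:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Toy sketch of a query rewrite, not eXist's optimizer.
    public class RewriteSketch {

        // Stand-in value index: predicate value -> matching documents.
        static List<String> indexLookup(String value) {
            return Arrays.asList("doc42.xml", "doc911.xml");
        }

        // Naive plan: load every candidate node (potentially millions),
        // then filter them one by one against the predicate.
        static List<String> naivePlan(String value) {
            List<String> result = new ArrayList<String>();
            for (String doc : loadAllCandidates()) {
                if (matchesPredicate(doc, value)) {
                    result.add(doc);
                }
            }
            return result;
        }

        // Rewritten plan: evaluate the predicate through the index first,
        // so the huge candidate set is never materialized in memory.
        static List<String> rewrittenPlan(String value) {
            return indexLookup(value);
        }

        static List<String> loadAllCandidates() {
            return new ArrayList<String>(); // imagine ~1,000,000 entries here
        }

        static boolean matchesPredicate(String doc, String value) {
            return false; // placeholder comparison
        }

        public static void main(String[] args) {
            System.out.println(naivePlan("ATP binding"));
            System.out.println(rewrittenPlan("ATP binding"));
        }
    }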
On 9/6/05, José María Fernández González <jmfernandez@...> wrote:
> Hi everybody,
> I have been doing some tests with my huge test documents (now taking
> about 8 GB), splitting them into smaller pieces. The problem now is
> the number of generated "sub"documents, which is about 1 million. I have
> discovered that they can't be stored together in a single collection
> because an "Out of Memory" exception is fired (around 250000~260000
> documents in the same collection) with the Java parameter -Xmx256000k.
> If I double it (-Xmx512000k), then I get over 520000 documents.
> I'm using an Athlon-64 with 2GB of memory, Gentoo Linux, and I have
> tested with various JVMs: Sun (1.5.0.04) and Blackdown (1.4.2.03); BEA's
> JRockit (1.5.0.03) simply crashed due to an internal bug after a few
> tens of thousands of documents; and in the end I'm using the IBM one
> (1.4.2), because it seems faster than the other ones, and it went
> further than the others.
> I have been using both the XML-RPC interface and the local one to run
> the tests. So, my question: is the max number of documents in a single
> collection bounded by the Java memory? Is the list of documents in the
> used collections held (or pinned) in memory?
> Best Regards,
> José María
> PS: I'm now testing with MORE memory (-Xmx1536m). I'll report my
> results when it has finished...
> José María Fernández González          e-mail: jmfernandez@...
> Tlfn: (+34) 91 585 54 50                Fax: (+34) 91 585 45 06
> Grupo de Diseño de Proteinas            Protein Design Group
> Centro Nacional de Biotecnología        National Center of Biotechnology
> C.P.: 28049                             Zip Code: 28049
> C/. Darwin nº 3 (Campus Cantoblanco, U. Autónoma), Madrid (Spain)