From: Wolfgang Meier <meier@if...> - 2003-02-21 17:54:58
I did some more memory profiling this week to investigate why the
database sometimes runs into out-of-memory exceptions and the like
during indexing. The main thing I found is that eXist's own free-memory
checks seem to interfere with the garbage collector when indexing a
larger number of rather small files.
I have thus changed the strategy a bit: the db will now only check
memory limits for files that contain more than 10000 nodes (this limit
is required to clean up the cache from time to time) and will otherwise
let the garbage collector do its job.
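To make the idea concrete, here is a minimal sketch of such a check;
the class, method and constant names are made up for illustration and
do not reflect eXist's actual code:

    // Hypothetical sketch of the new strategy: only enforce explicit
    // memory checks for large documents, leave small ones to the GC.
    public class MemoryCheck {

        // Below this node count we skip explicit memory checks entirely.
        private static final int MIN_NODES_FOR_CHECK = 10000;

        // Fraction of the max heap that must remain available before
        // we flush the caches to disk.
        private static final double MIN_FREE_FRACTION = 0.1;

        // Called periodically while indexing a document; returns true
        // if the caches should be flushed now.
        public static boolean shouldFlush(int nodesInDocument) {
            if (nodesInDocument <= MIN_NODES_FOR_CHECK) {
                // Small file: no explicit check, let the GC do its job.
                return false;
            }
            Runtime rt = Runtime.getRuntime();
            long available = rt.freeMemory()
                    + (rt.maxMemory() - rt.totalMemory());
            return available < rt.maxMemory() * MIN_FREE_FRACTION;
        }
    }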
I have also reworked the indexing and btree code to avoid object
allocations (mainly copy operations between byte arrays). The results
on memory consumption are quite considerable, though I suspect there is
still potential for further optimizations (for example, by switching to
java NIO). I observed a 20% performance increase when indexing some of
my test files.
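To illustrate the kind of change involved (again just a sketch, not
eXist's actual btree code), compare allocating a fresh copy per value
with filling a reusable buffer, or, going one step further with
java.nio, handing out a view without copying at all:

    // Hypothetical sketch of the allocation-avoidance pattern; the
    // names are invented and do not match eXist's real classes.
    public class ValueReader {

        // Allocation-heavy variant: a new byte[] (and a copy) per call.
        public static byte[] readCopy(byte[] page, int offset, int len) {
            byte[] copy = new byte[len];
            System.arraycopy(page, offset, copy, 0, len);
            return copy;
        }

        // Reuse variant: the caller supplies a buffer that is filled in
        // place, so nothing is allocated on the hot path.
        public static int readInto(byte[] page, int offset, int len,
                                   byte[] buf) {
            System.arraycopy(page, offset, buf, 0, len);
            return len; // number of bytes written into buf
        }

        // With java.nio the copy can be avoided entirely by handing out
        // a read-only view of the page:
        public static java.nio.ByteBuffer readView(byte[] page,
                                                   int offset, int len) {
            return java.nio.ByteBuffer.wrap(page, offset, len)
                                      .asReadOnlyBuffer();
        }
    }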
I am now able to index a 400MB volume (in 40000 files) without database
hangs or out-of-memory exceptions, even when communicating with a
remote db (using 64KB min, 128MB max memory for the JVM). Indexing
large data volumes over the network is still not recommended (the
XMLRPC calls consume additional memory), but at least it seems to work
now.
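If you want to check which heap limits the VM actually applies for such
settings (some VMs silently round a very small -Xms up to an internal
minimum), a tiny snippet like the following helps; run it e.g. as
"java -Xms64k -Xmx128m HeapLimits":

    public class HeapLimits {
        public static void main(String[] args) {
            Runtime rt = Runtime.getRuntime();
            // Heap currently reserved by the VM (grows up to -Xmx).
            System.out.println("current heap: " + rt.totalMemory());
            // Upper bound the VM will attempt to use (-Xmx).
            System.out.println("max heap:     " + rt.maxMemory());
        }
    }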
All changes are available in CVS.