From: Wolfgang M. <me...@if...> - 2002-07-25 15:46:17
Thanks to Mathias' bug report, I have found a severe bug which leads to uncontrolled memory consumption when parsing larger documents. eXist uses several page buffers to cache data- and btree-pages. However, in some cases, pages which were removed from the cache have not been properly garbage collected (there were still valid references to the object). This also applies to version 0.7.1.

The current CVS version should fix the problem. I have made several tests and it basically seems to work now. Using the patched code, storing a 12MB file to the server via XMLRPC took between 86 and 130 seconds (JVM memory settings: 32MB min./128MB max.). I have now also been able to index the same file with memory restricted to only 32MB max. Indexing a 32MB file took about 300 seconds. However, sending this amount of data via XMLRPC made the client crash, so I had to index it locally.

> Does eXist build a complete in-memory parse-tree of the document during
> insertion or is it processed sequentially?

eXist processes documents sequentially using SAX. However, two SAX runs are needed: during the first run, eXist determines the structure of the resulting node tree. In the second run, it stores the actual nodes. Element and fulltext indexes are cached. They will be flushed to disk whenever the JVM runs low on memory (free memory < 5MB).

Best regards,
Wolfgang
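P.S.: For anyone curious, the reference-leak pattern described above can be sketched in a few lines of Java. The class and method names below are hypothetical illustrations, not eXist's actual code: the point is only that removing a page from the cache map is not enough if another field still points at it, so eviction must clear every reference. The 5MB figure in `lowMemory()` is the flush threshold quoted above.

```java
import java.util.HashMap;
import java.util.Map;

class PageBuffer {
    static class Page {
        final long id;
        final byte[] data = new byte[4096]; // simulated page payload
        Page(long id) { this.id = id; }
    }

    private final Map<Long, Page> cache = new HashMap<>();
    private Page lastAccessed; // a second reference that must also be cleared

    void put(Page p) {
        cache.put(p.id, p);
        lastAccessed = p;
    }

    // Leaky variant: removes the page from the map, but lastAccessed
    // still points at it, so the garbage collector cannot reclaim it.
    Page evictLeaky(long id) {
        return cache.remove(id);
    }

    // Fixed variant: drop every reference so the page becomes garbage.
    Page evict(long id) {
        Page p = cache.remove(id);
        if (p != null && lastAccessed == p) {
            lastAccessed = null;
        }
        return p;
    }

    boolean stillReferenced(long id) {
        return cache.containsKey(id)
            || (lastAccessed != null && lastAccessed.id == id);
    }

    // Flush heuristic in the spirit of the post: write cached indexes
    // to disk once the JVM reports less than 5MB of free memory.
    static boolean lowMemory() {
        return Runtime.getRuntime().freeMemory() < 5 * 1024 * 1024;
    }
}
```

After the leaky eviction the page is still reachable and survives every GC cycle; after the fixed eviction it can be collected, which is essentially what the CVS patch restores.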