Thread: [Exist-open] New snapshot available

I'm happy to announce a new snapshot release with many important changes: 

1) Memory consumption during query processing 

While experimenting with a large collection of Asian-language TEI documents, I
found that memory consumption is much too high for some types of queries,
especially queries on the full text of the document. I measured up to 60 MB for
a query on some frequent single-character tokens.

I have therefore modified major parts of the query engine to reduce memory
consumption during query processing. Until now, eXist loaded the entire list of
text nodes matching a given text token into memory, then checked the nodes
against the query context and applied an intersection or union to the
resulting sets.

All these operations have now been merged into a single step: the list of
matching text nodes is no longer kept in memory. Instead, nodes are matched
directly against the context node set while the index is scanned, and only the
relevant nodes are returned. As a result, killer queries like
match-all(., 'a.*', 'b.*', 'c.*') will still take some time, but they can no
longer kill the database.
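To illustrate the single-pass idea, here is a minimal Java sketch under
simplifying assumptions: node ids are plain longs, the index entries for one
token arrive lazily through an Iterator (as if read from disk), the context set
is a sorted array, and "matching the context" is reduced to id equality (the
real test in eXist involves node relationships such as ancestor/descendant).
StreamingMatch and scanAndFilter are hypothetical names, not eXist's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch, not eXist's code. Node ids are simplified to longs and
// "matches the context" is simplified to id equality.
final class StreamingMatch {

    private StreamingMatch() {}

    /**
     * Scans the posting list for one token exactly once and keeps only the
     * entries found in the context set. Non-matching entries are dropped as
     * soon as they are read, so the full posting list is never materialized.
     *
     * @param postings      lazily produced node ids for one token
     * @param sortedContext context node ids, sorted ascending
     */
    static List<Long> scanAndFilter(Iterator<Long> postings, long[] sortedContext) {
        List<Long> result = new ArrayList<>();
        while (postings.hasNext()) {
            long nodeId = postings.next();
            // O(log n) membership test against the sorted context set.
            if (Arrays.binarySearch(sortedContext, nodeId) >= 0) {
                result.add(nodeId);
            }
        }
        return result;
    }
}
```

Only hits are ever collected, so memory use follows the size of the answer
rather than the total number of occurrences of a frequent token.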

In addition, I replaced the node set implementations with variants that
consume less memory. Using simple arrays for node sets is faster than
alternatives such as trees. However, an array's length is fixed, so arrays are
frequently reallocated as the set grows. The new implementation tries to
estimate the expected size of a node set more accurately. The estimate is
correct in many cases, so only a few reallocations are needed during query
processing. The new class org.exist.dom.ExtArrayNodeSet uses a combination of
an AVL tree (for document ids) and arrays (for the nodes).
I have also spent some time optimizing the various algorithms based on
profiling information.
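A rough Java sketch of the tree-plus-arrays layout described above, assuming
int document ids and long node ids. DocNodeSet and its methods are
hypothetical names, not the real ExtArrayNodeSet API; java.util.TreeMap (a
red-black tree) stands in for the AVL tree, and the per-document size estimate
is simply passed in by the caller rather than computed as eXist does.

```java
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical sketch: a tree keyed by document id, holding one growable array
// of node ids per document. TreeMap (red-black) substitutes for an AVL tree.
final class DocNodeSet {

    /** Growable array of node ids belonging to one document. */
    private static final class Part {
        long[] nodes;
        int length;

        Part(int initialCapacity) {
            nodes = new long[Math.max(1, initialCapacity)];
        }

        /** Appends a node id; returns true if the array had to be reallocated. */
        boolean add(long nodeId) {
            boolean grew = false;
            if (length == nodes.length) {
                nodes = Arrays.copyOf(nodes, nodes.length * 2);
                grew = true;
            }
            nodes[length++] = nodeId;
            return grew;
        }
    }

    private final TreeMap<Integer, Part> docs = new TreeMap<>();
    private final int estimatedNodesPerDoc;
    private int reallocations;

    /** @param estimatedNodesPerDoc expected nodes per document (initial array size) */
    DocNodeSet(int estimatedNodesPerDoc) {
        this.estimatedNodesPerDoc = estimatedNodesPerDoc;
    }

    void add(int docId, long nodeId) {
        // The tree lookup finds (or creates) the array for this document.
        Part part = docs.computeIfAbsent(docId, id -> new Part(estimatedNodesPerDoc));
        if (part.add(nodeId)) {
            reallocations++;
        }
    }

    int size() {
        int total = 0;
        for (Part p : docs.values()) {
            total += p.length;
        }
        return total;
    }

    long[] nodesOf(int docId) {
        Part p = docs.get(docId);
        return p == null ? new long[0] : Arrays.copyOf(p.nodes, p.length);
    }

    /** How often a per-document array had to grow; a good estimate keeps this low. */
    int reallocations() {
        return reallocations;
    }
}
```

With an accurate estimate, new DocNodeSet(4) followed by four additions to one
document needs no reallocation; with an estimate of 1, the same four additions
trigger two (capacity 1 to 2 to 4).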

Please note that the match highlighting feature still consumes a large amount
of memory if the number of text hits is very large. You should disable this
feature for Asian-language texts.

2) XPath query processing 

XPath query processing has been changed to reflect the query processing model
of XPath 2.0 (and XQuery), i.e. every expression returns a sequence of items,
where an item is either a node or an atomic value, and a single item is itself
a sequence (of length one). I got most of the ideas from Michael Kay's Saxon.
The migration to the XPath 2.0 model has just begun, so the code should be
regarded as unstable, though most of your old queries should still work.
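The sequence model can be sketched in a few Java types. These are hypothetical
illustrations, not the actual classes of eXist or Saxon: Sequence is anything
with a length and positional access, and Item extends Sequence so that every
node or atomic value is also a sequence of length one. NodeItem, AtomicItem and
ValueSequence are made-up stand-ins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the XPath 2.0 data model; not eXist's or Saxon's classes.

/** A sequence of zero or more items. */
interface Sequence {
    int getLength();
    Item itemAt(int pos);
}

/** An item (node or atomic value) is also a sequence of length one. */
interface Item extends Sequence {
    @Override
    default int getLength() {
        return 1;
    }

    @Override
    default Item itemAt(int pos) {
        if (pos != 0) {
            throw new IndexOutOfBoundsException("position " + pos);
        }
        return this;
    }
}

/** Stand-in for a node item; only the element name is modelled. */
final class NodeItem implements Item {
    final String name;
    NodeItem(String name) { this.name = name; }
}

/** Stand-in for an atomic value such as a string or number. */
final class AtomicItem implements Item {
    final Object value;
    AtomicItem(Object value) { this.value = value; }
}

/** A general sequence built by concatenating other sequences. */
final class ValueSequence implements Sequence {
    private final List<Item> items = new ArrayList<>();

    /** Appends every item of another sequence; a single item appends itself. */
    void addAll(Sequence other) {
        for (int i = 0; i < other.getLength(); i++) {
            items.add(other.itemAt(i));
        }
    }

    @Override
    public int getLength() {
        return items.size();
    }

    @Override
    public Item itemAt(int pos) {
        return items.get(pos);
    }
}
```

Because an Item is a Sequence, code such as addAll works the same way whether
an expression produced one item or many, which is the point of the model.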

Best regards, 

Wolfgang 
-- 