From: Wolfgang M. <me...@if...> - 2003-10-04 09:23:45
I'm happy to announce a new snapshot release with many important changes:

1) Memory consumption during query processing

While experimenting with a large collection of Asian-language TEI docs, I found that memory consumption is much too high for some types of queries, especially queries on the fulltext of the document. I measured up to 60M for a query on some frequent single-char tokens. As a result, major parts of the query engine have been modified to reduce memory consumption during query processing.

Until now, eXist loaded the entire list of text nodes matching a given text token into memory, then checked the nodes against the query context and applied an intersection or union to the resulting sets. All these operations have now been merged into a single step: the list of matching text nodes is no longer kept in memory. Instead, nodes are matched directly against the context node set while scanning the index, and only the relevant nodes are returned. As a result, killer queries like match-all(., 'a.*', 'b.*', 'c.*') will still take some time, but they can't kill the database.

In addition, I replaced the node set implementations with better variants that consume less memory. Using simple arrays for node sets is faster than the alternatives, e.g. trees. However, array length is fixed, so arrays have to be reallocated frequently while the set grows. The new implementation tries to better estimate the expected size of the node set. This estimate is correct in many cases, so only minimal reallocations are required during query processing. The new class org.exist.dom.ExtArrayNodeSet uses a combination of an AVL tree (for document ids) and arrays (for the nodes). I have also spent some time optimizing the various algorithms using profiling information.

Please note that the match highlighting feature still consumes a large amount of memory if the number of text hits is very large. You should disable this feature for Asian-language texts.
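To illustrate the single-pass matching and the pre-sized array node sets described in 1), here is a minimal sketch. It is not eXist's actual code: the class names StreamingMatch and LongArraySet are hypothetical, node ids are plain longs, and the context check is reduced to membership in a sorted array, whereas the real check involves documents and node relationships that are not covered here.

```java
import java.util.Arrays;
import java.util.PrimitiveIterator;
import java.util.stream.LongStream;

/** Hypothetical illustration only; these are not eXist's classes. */
public class StreamingMatch {

    /** Growable array of node ids, pre-sized from an estimate. */
    static final class LongArraySet {
        private long[] ids;
        private int size;
        int reallocations; // counts how often the backing array had to grow

        LongArraySet(int estimatedSize) {
            ids = new long[Math.max(1, estimatedSize)];
        }

        void add(long id) {
            if (size == ids.length) {
                // Estimate was too small: double the array and copy.
                ids = Arrays.copyOf(ids, ids.length * 2);
                reallocations++;
            }
            ids[size++] = id;
        }

        long[] toArray() {
            return Arrays.copyOf(ids, size);
        }
    }

    /**
     * Reads index entries one at a time and keeps only those found in the
     * sorted context array. The full list of index hits is never stored.
     */
    static LongArraySet scan(PrimitiveIterator.OfLong indexEntries, long[] sortedContext) {
        // Assumption: with distinct index entries, the result cannot be
        // larger than the context set, so that is used as the size estimate.
        LongArraySet result = new LongArraySet(sortedContext.length);
        while (indexEntries.hasNext()) {
            long id = indexEntries.nextLong();
            if (Arrays.binarySearch(sortedContext, id) >= 0) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        long[] context = {3, 8, 15, 42};
        // Simulated index scan over node ids 1..50.
        LongArraySet hits = scan(LongStream.rangeClosed(1, 50).iterator(), context);
        System.out.println(Arrays.toString(hits.toArray())); // prints [3, 8, 15, 42]
    }
}
```

In this sketch, memory use is bounded by the size of the context set rather than by the number of index hits, and a good size estimate means the backing array never has to be reallocated.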
2) XPath query processing

XPath query processing has been changed to reflect the query processing model of XPath 2.0 (and XQuery), i.e. every expression returns a sequence of items, where an item is either a node or an atomic value, and a single item is also a sequence. I got most of the ideas from Michael Kay's Saxon. The migration to the XPath 2 model has just begun, so the code should be regarded as unstable, though most of your old queries should work.

Best regards,
Wolfgang