From: Wolfgang M. <wol...@gm...> - 2006-04-26 16:05:26
I'm happy to announce that the redesign of eXist's internal indexing scheme is basically complete now and is awaiting testers. The DLN ("Dynamic Level Numbering") branch can run the entire test suite and passes all of the relevant tests. I'm looking forward to hearing about your experiences and error reports ...

For those not familiar with SVN, a snapshot of the branch is available for download:

http://prdownloads.sourceforge.net/exist/eXistDLN-b1-20060426.jar

The release can be installed like any other snapshot and includes the entire distribution. So far, the code has not been extensively tested with real applications (though the test suite is quite comprehensive in some, but certainly not all, areas). Please note that the branch is not completely in sync with the main development trunk, so some features added to the trunk since mid-February may be missing (merging the code will take quite some time).

A description of the redesign can be found in the wiki (work in progress, needs to be updated and extended):

http://wiki.exist-db.org/space/NewIndexArchitecture

To summarize, the main benefits of the new indexing scheme are:

* the size of a single document is no longer limited by the numbering scheme (in fact it is only limited by internal storage restrictions, which can be changed)
* the new dynamic level ids can adapt much better to the irregular structure of a document, i.e. we don't need to "balance" the DOM in any way internally
* the new scheme is update-friendly: the old code required a reindex of parts of the document after each single update. This is no longer necessary. Nodes can be inserted in any position of the document tree without a reindex.
* there's further optimization potential to improve the performance of join operations. However, node set processing is still based on the old node set implementation classes, so the benefits of the new scheme are not yet visible.

Disadvantages / problems to expect:

* Increased storage footprint of nodes (we now have to store the node id along with each node in dom.dbx). Needs to be improved.
* I currently notice a slight performance decrease for some types of queries (but see the last benefit listed above).

Happy testing!

Wolfgang
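
To illustrate the "update-friendly" point above, here is a minimal sketch of a hierarchical, level-based numbering in which a node can be inserted between two siblings without touching any existing id. It is only a toy model under stated assumptions: the tuple representation and the helper names (`child_id`, `value_between`, `is_ancestor`) are invented for this example and are not eXist's actual DLN encoding or API.

```python
# Toy model only: a node id is a tuple of "level values", one per tree level.
# Each level value is itself a tuple of ints, so a new value can always be
# squeezed between two existing siblings. This is NOT eXist's real encoding.

def child_id(parent, level_value):
    """Id of a child: the parent's id extended by one more level value."""
    return parent + (level_value,)

def value_between(left, right=None):
    """Return a level value sorting after `left` and before `right`.

    The new value extends `left`; no existing value is ever changed.
    """
    zeros = 0
    while True:
        candidate = left + (0,) * zeros + (1,)
        if right is None or candidate < right:
            return candidate
        zeros += 1  # pad with zeros until the candidate fits below `right`

def is_ancestor(a, b):
    """True if the node with id `a` is a proper ancestor of node `b`."""
    return len(a) < len(b) and b[:len(a)] == a

# A small tree: root, two children, and one grandchild under the first child.
root = ((1,),)
first = child_id(root, (1,))
second = child_id(root, (2,))
grandchild = child_id(first, (1,))
before = sorted([root, first, grandchild, second])

# Insert a new child of root between `first` and `second`.
inserted = child_id(root, value_between((1,), (2,)))
after = sorted(before + [inserted])

for node_id in after:
    print(node_id)
```

Sorting the ids with plain tuple comparison yields document order, and the ids in `before` are unchanged after the insert, which is the property that makes a reindex unnecessary in this toy model. The price is that ids grow longer as more nodes are squeezed into the same gap.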
From: Wolfgang M. <wol...@gm...> - 2006-04-26 16:13:18
Reading my message a second time, I think it might be slightly too enthusiastic: inexperienced users could be tempted to download the code immediately and be frustrated afterwards. So please be warned: the snapshot represents ongoing development work and is only meant for testing and bug fixing!

Wolfgang
From: Michael B. <mbe...@mb...> - 2006-04-26 20:46:05
Wolfgang wrote:

> I'm happy to announce that the redesign of eXist's internal indexing
> scheme is basically complete now and is awaiting testers.

Great news!

So far things are looking very good. A 60M TEI document with a very straggly tree structure that previously defeated eXist went in with no problems, and the results of queries against it (using for comparison a specially "flattened" version of the same document, which has the same text content but a structure massaged to allow the old node indexing to cope with it) all seem correct so far.

Subjectively I have so far noticed no significant differences in either indexing or retrieval times, but I will need to schedule some more methodical testing in the near future. Since I make little production use of XUpdates, I'm not really in a position to test that aspect at all rigorously with code of my own.

For anyone who, like me, wants to build against the SVN outside of an IDE: the code for the Java client in the DLN branch still has a dependency on gnu.readline (no longer present in the trunk /lib tree) and so requires access to that jar at build time, though it seems to have problems using it at run time, since the command history doesn't work, at least not in my present setup. And although the DLN branch will build with the log4j jar version (1.2.13) used in the trunk, the database won't start up unless that jar is replaced by 1.2.9. But those are minor matters and are sure to get fixed when the branches are fully reconciled.

It certainly looks as though the "document too complex" dragon has been slain, which is a major landmark in eXist's history. Congratulations to all concerned, and to Wolfgang in particular for masterminding what is in effect a heart transplant on his brainchild.

Michael Beddow
From: Wolfgang M. <wol...@gm...> - 2006-04-27 16:39:43
We found a few issues in yesterday's snapshot (and the DLN branch from which it was built):

1) large files could be stored and queried, but when you tried to retrieve or delete them, you received an exception complaining about: "Negative size for DLN: -125".

2) most of the web examples didn't work. This is due to the differences between the mainline in trunk and the code in the branch. I have ported the corresponding HTTP-related XQuery modules back into the branch, so the examples should work again.

3) the branch required an older version of log4j.jar in lib/core. I fixed this.

There's still a dependency on libreadline-java.jar. Removing that was not possible without causing further trouble: too many changes in trunk would need to be imported. We will have a code freeze soon and merge the entire source trees then.

An updated snapshot is available in the usual place:

http://prdownloads.sourceforge.net/exist/eXistDLN-b1-20060427.jar

Wolfgang