From: Jean-Marc V. <jm...@fr...> - 2004-02-28 11:01:42
Hello,

I want to know where we stand on the problem of irregularly structured documents.

In this post, Wolfgang described a strategy for splitting the unmanageable document behind the scenes:
EXistException: document is too complex/irregularily structured <http://sourceforge.net/mailarchive/forum.php?thread_id=3820172&forum_id=3154>

Later he considered another solution, based on a larger index:
Re: Re: EXistException: document is too complex/irregularily structured <http://sourceforge.net/mailarchive/message.php?msg_id=7108760>

I'm still trying to understand Wolfgang's PDF article (eXist: An Open Source Native XML Database <http://exist-db.org/webdb.pdf>) and to find where in the code the algorithms are implemented, and I am documenting what I find in webapp/design.xml. I think the article is still up to date with the code, apart from obvious things such as XQuery and XUpdate, which were not implemented at the time, but I'd like to know about any algorithmic changes since the article appeared.

The only new idea I have for now to overcome the exhaustion of the range of sparse identifiers is to somehow use doubles instead of, or in addition to, longs. Mathematical real numbers have the property that you can always insert a number between two distinct numbers. Java doubles are limited to about 15 significant decimal digits, but they still seem better than longs in this respect.

I think that whatever new indexing scheme is developed, it will pay to do some *refactoring*. The indexing should offer an API layer sitting between the DOM above and the storage below. Trying out new algorithms would then be easier, and why not have different indexing schemes at the document or collection level?

What I will do first today is write a JUnit test to reproduce the problem. I intend to use DOM to create a simple document with 10000 elements attached to the root, the first of which has a sub-tree of depth, say, 100 or 10000.
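A minimal sketch of how such a test document could be built with the standard JAXP/DOM API. The class name IrregularDocumentSketch, the element names (root, item, nested) and the helper chainDepth are hypothetical, chosen only for illustration; the sub-tree is assumed to be a simple linear chain, which the post does not specify. Storing the result in eXist and wrapping it in a JUnit test are left out.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper: builds a wide root with one deep branch.
class IrregularDocumentSketch {

    /** Builds a document with `width` children under the root; the first child carries a chain `depth` levels deep. */
    static Document build(int width, int depth) {
        if (width < 1 || depth < 0) {
            throw new IllegalArgumentException("width must be >= 1 and depth >= 0");
        }
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("no DOM implementation available", e);
        }
        Element root = doc.createElement("root");
        doc.appendChild(root);

        // Many sibling elements directly below the root.
        for (int i = 0; i < width; i++) {
            root.appendChild(doc.createElement("item"));
        }

        // A linear chain of nested elements below the first sibling only.
        Node current = root.getFirstChild();
        for (int level = 0; level < depth; level++) {
            Element child = doc.createElement("nested");
            current.appendChild(child);
            current = child;
        }
        return doc;
    }

    /** Counts the levels below `node` by following first children (iterative, so deep chains are safe). */
    static int chainDepth(Node node) {
        int depth = 0;
        while (node.getFirstChild() != null) {
            node = node.getFirstChild();
            depth++;
        }
        return depth;
    }

    public static void main(String[] args) {
        Document doc = build(10000, 100);
        Element root = doc.getDocumentElement();
        System.out.println("children of root: " + root.getChildNodes().getLength());
        System.out.println("depth below first child: " + chainDepth(root.getFirstChild()));
    }
}
```

Whether a document of this shape actually triggers the EXistException is not established here; the width and depth parameters are there so that other shapes can be tried.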
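On the idea of double identifiers raised above, the following sketch counts how often a new identifier can be inserted at the same spot by taking the midpoint of two existing ones, once with doubles and once with longs. The class MidpointIdSketch and its methods are hypothetical, and repeated midpoint insertion is an assumed worst case, not eXist's numbering scheme. It is meant only to make the precision limit concrete.

```java
// Hypothetical experiment: how many midpoint insertions fit between two identifiers?
class MidpointIdSketch {

    /** Repeatedly inserts a midpoint just above `lo` until no double lies strictly between. */
    static int insertionsWithDoubles(double lo, double hi) {
        int count = 0;
        while (true) {
            double mid = lo + (hi - lo) / 2;
            if (mid <= lo || mid >= hi) {
                return count; // rounding collapsed the midpoint onto a bound
            }
            hi = mid;
            count++;
        }
    }

    /** Same experiment with longs; assumes 0 <= lo < hi so that hi - lo cannot overflow. */
    static int insertionsWithLongs(long lo, long hi) {
        int count = 0;
        while (hi - lo > 1) {
            hi = lo + (hi - lo) / 2;
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("doubles in [1.0, 2.0]: " + insertionsWithDoubles(1.0, 2.0));
        System.out.println("longs in [0, Long.MAX_VALUE]: " + insertionsWithLongs(0L, Long.MAX_VALUE));
    }
}
```

Between 1.0 and 2.0 a double allows 52 such insertions, while a long spanning 0 to Long.MAX_VALUE allows 62. Under this particular insertion pattern doubles therefore do not extend the range. Other patterns, or identifiers close to zero, behave differently.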
--
Jean-Marc Vanel 01 39 43 31 46
Consulting and services / software development & integration
Free software, Web, Java, XML ...
At the cutting edge of technology, at the service of projects
http://jmvanel.free.fr/ ===) CV, software resources
My journals:
- general topics, in French: http://jmvanel.free.fr/Block-note.html
- computing topics, in French: http://jmvanel.free.fr/notes-informatiques.html
- computer science diary: http://jmvanel.free.fr/computer-notes.html
Worldwide Botanical Knowledge Base http://wwbota.free.fr/