From: Greg W. <GW...@bn...> - 2004-12-10 20:33:45
Hello,

I'm building an eXist database to experiment with and test it for use in an application, and I've come across a few things I would like to ask about. I intend to load about 75,000 files, and these files contain several hundred thousand documents of interest. There are fairly natural ways to segment the files, but for this first test I'm loading all of them into a single collection. At the moment I have a bit over 10,000 files loaded into it.

What kind of limits are there on the size of a query result set? I ask because I am getting error messages saying the size has exceeded 10,000, and the query fails. In some cases the final result would indeed be more than 10,000 items, but in many cases only an internal intermediate result would have been that large.

One kind of question I want to ask the db is "give me all document fragments (50,000?) that have a particular substructure within them." The substructure I'm interested in is tied to attribute values on certain elements, a very XML and XPath kind of thing. The current Perl code and Oracle database take days to parse the documents and execute the query. The eXist db fails with a "too large" message.

Query performance seems slow in many cases. Is there documentation about how certain kinds of queries perform, which operators are faster, and in which use cases they are faster? I'm convinced that much of the slowdown is because I don't know the system and am not posing queries that are well formed for eXist. However, some of the slow performance makes me wonder whether the engine itself is designed to handle the kinds of questions I intend to ask.

Is it possible to index a single file into multiple collections at once? I need to refer to files via multiple logical hierarchies, and I'm wondering how to do that in eXist. Can I treat collections as logical hierarchies, or are they physical entities?

What is the effect on query performance of having many collections and then querying all of them at once? That is, if I place 70,000 files into 200 collections and then want to answer a question across all of them, is there a significant performance difference between one collection and those 200 collections (keeping the number of files constant)? Are there heuristics for good collection and database design in eXist?

The eXist db looks like a very useful tool; I'm just trying to learn how to put it to use.

Thanks for any help!

/pgw
Greg Wolff
gw...@bn...