From: Michael B. <mbe...@mb...> - 2005-04-20 13:57:58
[I see in the meantime that a reply from Wolfgang has come in, but I will post this anyway, even though it covers almost the same ground, just in case viewing the same thing from a slightly different angle helps]

hakhan wrote:

[...]
> What do I do wrong? Did I reached the limitations of a native
> XML database and should I step over to an XML-enabled
> database where querying data is perhaps much faster?

I don't think it's possible to give you much useful advice at the level of generality at which you state your problems. Certainly, over 100MB per document is pretty big compared to the size of documents used by other eXist users known to me (although I believe there are some users with documents of such sizes). And in general, with all the XML databases I've looked at, if a given body of data can be conveniently split into many smaller documents rather than placed in a few very large ones, storage and retrieval performance improves.

Do you really need the granularity of indexing and retrieval that eXist offers? If you are dealing with what are in effect fairly flat documents, you might be better off using a fulltext retrieval system (maybe keeping the filesystem as the repository and retrieving sub-document level sections via a SAX parse). Lucene is probably the best-known Open Source example, but there are others, and many proprietary ones.

On the other hand, is your data highly structured but not strongly hierarchical? In that case you might be better off storing it in a more traditional RDBMS or an OODB (you could still generate XML for interchange purposes if that is a requirement).

As for XML => RDBMS "adapters", most of which are commercial and expensive, I'm personally a bit sceptical. I've not been able to do any proper testing, but my subjective quick impression of two such systems was that they performed best on data which could really have been better kept purely in an RDBMS in the first place.
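To make the filesystem-plus-SAX suggestion above a little more concrete, here is a rough sketch in Python of pulling sub-document sections out of a large file in a single streaming pass, without loading the whole document into memory. The names `SectionExtractor` and `extract_sections`, and the element name `entry` used below, are purely illustrative assumptions and say nothing about hakhan's actual data. Note that the sketch ignores namespaces, comments and processing instructions.

```python
import xml.sax
from xml.sax.saxutils import escape, quoteattr


class SectionExtractor(xml.sax.ContentHandler):
    """Collect the serialized markup of every element named `target`.

    Hypothetical helper for illustration only.
    """

    def __init__(self, target):
        super().__init__()
        self.target = target
        self.depth = 0       # nesting depth inside the current target element
        self.parts = []      # fragments of the section being assembled
        self.sections = []   # completed sections, in document order

    def startElement(self, name, attrs):
        # Start recording at a target element; keep recording its descendants.
        if name == self.target or self.depth:
            self.depth += 1
            attr_text = "".join(
                f" {key}={quoteattr(value)}" for key, value in attrs.items()
            )
            self.parts.append(f"<{name}{attr_text}>")

    def characters(self, content):
        # Text may arrive in several chunks; escape and store each one.
        if self.depth:
            self.parts.append(escape(content))

    def endElement(self, name):
        if self.depth:
            self.parts.append(f"</{name}>")
            self.depth -= 1
            if self.depth == 0:
                # The outermost target element has closed: section complete.
                self.sections.append("".join(self.parts))
                self.parts = []


def extract_sections(source, target):
    """Return all `target` elements found in `source` (a path or a stream)."""
    handler = SectionExtractor(target)
    xml.sax.parse(source, handler)
    return handler.sections
```

Calling something like `extract_sections("big-file.xml", "entry")` (file name assumed) would return each `entry` element as a string. A fulltext engine would then typically be used to decide which file to parse in the first place.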
If a native XML solution is generically the right one for your needs, document size is not the only thing to consider when comparing systems. Performance depends on the structure of your documents and on the nature of your queries. As elsewhere, there is an indexing-related trade-off between storage and retrieval times. By default, eXist indexes all the text in the document using its full-text index. Depending on your documents and your needs, you might get significant improvements in storage time with no penalty in retrieval time if you could identify portions of your documents for selective fulltext indexing, disable indexing of attribute values if you don't need them for retrieval, etc. Or you might want to turn off fulltext indexing altogether and use range indexes on suitable components.

A further point is that some XQueries are intrinsically more expensive than others, and some are more expensive under one implementation than under others.

It could well be that there are list members who can give concrete advice on the basis of using eXist for documents like yours, and who could comment on the strategies embodied in the queries you are finding frustratingly slow. But unless they know a bit more about what your documents are actually like and what sort of retrieval needs you have, they won't be able to offer their experience.

Michael Beddow