Maybe it would be feasible to break that big document up into several smaller ones?
Well, here at the Fraunhofer Institute for Experimental Software Engineering in KAISERSLAUTERN :) we deal with a special kind of XML document, and we are not sure whether we can chunk it into pieces automatically, because of its structure: the documents might change their structure in the future, and the XML standard might introduce "variations". That could cause problems for automatic chunking of big documents.
OK, I suppose I should introduce myself. I am a teaching assistant at the Faculty of Electrical Engineering at the University of Banja Luka, Bosnia and Herzegovina. I am here in Kaiserslautern working on a project (funded by a European JEP project). My supervisor here (you might know him) is Thomas Forster. I would like to visit you and talk with you, and maybe with some other people at the university. You might also be involved in this project. Some people from Kaiserslautern and Paderborn were in Banja Luka a week ago!
Wouldn't it be feasible to switch to a stream-based approach, since the memory consumption would then be nearly constant?
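To illustrate the point, here is a minimal sketch of a constant-memory pass using the SAX parser that ships with the JDK. The element name `record` and the input document are made up for the example; the point is only that the handler sees one event at a time and never builds an in-memory tree:

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class StreamCount {
    // Counts "record" elements in a single streaming pass. Because SAX
    // delivers events instead of materializing a tree, memory use stays
    // nearly constant no matter how large the document is.
    static int countRecords(String xml) throws Exception {
        final int[] count = {0};
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if ("record".equals(qName)) count[0]++;
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><record/><record/><record/></doc>";
        System.out.println(countRecords(xml)); // prints 3
    }
}
```

In a real setting you would parse from a `FileInputStream` instead of a `String`, but the memory behaviour is the same.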
Change what project? I looked into Xindice's source code, and ... it might not be easy to modify it.
You could use Jaxen with your own iterator
implementation, XSQ together with SAX, or a persistent DOM implementation.
1. I don't see a reason to use Jaxen, because Saxon already works with large documents. I'm not sure about Jaxen; I see that it uses JDOM or dom4j. I tried dom4j and it broke!
2. XSQ uses Xerces for evaluating XPath expressions. I know almost nothing about Xerces.
3. A persistent DOM implementation? I googled and found Ozone. I will look into it!
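On point 2: the idea behind streaming XPath evaluation can be sketched with plain JDK SAX. This is only a rough approximation of the technique (a real engine such as XSQ handles predicates, buffering, and more); the path `/doc/item/name` and the element names are invented for the example:

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class PathMatcher {
    // Evaluates a simple child-axis path like "/doc/item/name" in one
    // streaming pass by tracking the current element stack. This is the
    // basic idea of evaluating XPath on top of SAX events.
    static List<String> select(String xml, String path) throws Exception {
        final List<String> hits = new ArrayList<>();
        final Deque<String> stack = new ArrayDeque<>();
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override public void startElement(String u, String l, String q, Attributes a) {
                stack.addLast(q);     // descend: extend the current path
                text.setLength(0);    // start collecting this element's text
            }
            @Override public void characters(char[] ch, int start, int len) {
                text.append(ch, start, len);
            }
            @Override public void endElement(String u, String l, String q) {
                // If the path from the root to this element matches, keep its text.
                if (("/" + String.join("/", stack)).equals(path)) {
                    hits.add(text.toString());
                }
                stack.removeLast();   // ascend
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xml)), handler);
        return hits;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><item><name>a</name></item><item><name>b</name></item></doc>";
        System.out.println(select(xml, "/doc/item/name")); // [a, b]
    }
}
```

The whole evaluation needs memory proportional to the document depth, not its size, which is why this style suits the large documents we are discussing.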
And have you considered using XML middleware to store the data in a
relational schema? Another hint: have a look at Galax and its projection algorithm.
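For the relational route, one common generic mapping from the literature is the "edge table": every element becomes a row (id, parent_id, tag, text) in a single fixed schema, so even irregular or evolving document structures still load. A minimal sketch, with the table name `edge` and the sample document invented for illustration:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.w3c.dom.Text;
import org.xml.sax.InputSource;

public class EdgeShredder {
    // Flattens an arbitrary XML tree into generic edge-table rows
    // (id, parent_id, tag, text). The schema never changes even if the
    // documents do, which is the appeal of this mapping.
    static List<String> shred(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> rows = new ArrayList<>();
        walk(doc.getDocumentElement(), -1, new int[]{0}, rows);
        return rows;
    }

    static void walk(Element e, int parent, int[] nextId, List<String> rows) {
        int id = nextId[0]++;  // assign ids in document order
        rows.add(String.format("(%d, %d, '%s', '%s')",
                               id, parent, e.getTagName(), directText(e).trim()));
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++) {
            if (kids.item(i) instanceof Element) {
                walk((Element) kids.item(i), id, nextId, rows);
            }
        }
    }

    // Text directly under this element (ignoring nested elements).
    static String directText(Element e) {
        StringBuilder sb = new StringBuilder();
        NodeList kids = e.getChildNodes();
        for (int i = 0; i < kids.getLength(); i++)
            if (kids.item(i) instanceof Text) sb.append(kids.item(i).getNodeValue());
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        for (String row : shred("<doc><a>x</a><b><c>y</c></b></doc>"))
            System.out.println("INSERT INTO edge VALUES " + row + ";");
    }
}
```

Note the sketch builds a DOM first for brevity; a production shredder for very large documents would emit rows from SAX events instead.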
See my earlier comment on chunking! If the documents have an irregular syntax, it is not appropriate to store them in a relational schema. And what if a new standard arrives, what then?