From: Wolfgang M. <me...@if...> - 2004-01-22 17:59:49
Hi,

I assume you were using the 0.9.2 release? I would strongly recommend getting the development snapshot. The problem with illegal B-tree pages should be fixed by now.

Unfortunately, I have not been able to merge my latest changes into CVS. Yesterday I fixed several bugs concerning B-tree storage (one of them also caused ArrayIndexOutOfBounds exceptions). I will post a new snapshot once CVS lets me in again.

I also ran some tests today: at least, storing about 100,000 files with approximately 1.5 GB of XML source worked well. I would be interested to learn more about the structure of your data; then we can better see how far we can get. I assume that your XML files do include some amount of non-text data?

Wolfgang

> eXist is an attractive product!
>
> We need to know whether it can handle 1,000,000 or more documents, or when
> it might be able to. Our current document store uses a combination of OS
> files and SQL-based indexes, and we are considering future options for the
> backend store. Our documents are each about 25 KB, so we need to handle at
> least 25 GB of raw XML files.
>
> [Our project site is http://www.fedora.info. We have no connection to Red
> Hat! We've used the name "Fedora" since 1997. They should have done a
> search first!]
>
> 1. What are the largest eXist collections known to have worked?
>
> I am testing on Solaris 5.8, using the eXist client in local mode, and have
> encountered two severe errors before loading even 100,000 documents.
>
> On my first try, I scaled up by adding batches of 1, 9, 90, 900, and 9000
> documents, successfully. I then set up to add in 10,000-document batches.
> The first batch added without error, for a total of 20,000 documents, or
> about 500 MB in raw XML files.
>
> The next (second) batch of 10,000 failed on every document, with an
> internal exception:
>
> org.dbxml.core.filer.BTreeException: Invalid Page Type In addValue
>
> This means that a B-tree node that was read was neither a leaf nor a branch
> node, the only two types expected. (There is one other node state with a
> constant in the code, but it is not allowed at the point in the code that
> throws the exception.)
>
> 2. Is this "Invalid Page Type" a known bug? Does anyone know the cause or
> fix?
>
> [I copied .dbx files to record snapshots of intermediate database sizes,
> and -might- have restored from .dbx files made with config.xml A, and then
> switched to config.xml B. --I don't think I did this.-- But in case I did
> make that mistake, I reinstalled eXist before further testing, using only
> one config.xml (out-of-the-box), and not copying .dbx files.]
>
> On this second try, I set up to add again in 10,000-document batches, each
> batch going to one of 10 different collections. The first 7 batches added
> without error, for a total of 70,000 documents, or about 1.75 GB in raw XML
> files. Better this time!
>
> Then I got a different internal error on import.
>
> The next batch (and subsequent batches) failed alike: for each of these
> batches, 19 documents were correctly imported, followed by the remaining
> documents failing with a different internal exception:
>
> java.lang.ArrayIndexOutOfBoundsException: -1
> [sometimes with a bad index value of -1316088163]
>
> The B-tree buckets are implemented as Java arrays, so this error is also in
> the B-tree code.
>
> I looked at the XML input files and they look OK. I retried the first
> (no-error) batch: same error now, 19 successes followed by failure of all
> documents in the batch.
>
> 3. I saw another posting on this error, but in a query instead of in an
> import. Is this "ArrayIndexOutOfBoundsException" a known bug? Does anyone
> know the cause or fix?
>
> Thanks for help on this.
> I only have my large disk space for a limited time, and need to get beyond
> these problems in order to test.
>
> Bill Niebel
> University of Virginia
>
> _______________________________________________
> Exist-open mailing list
> Exi...@li...
> https://lists.sourceforge.net/lists/listinfo/exist-open
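The "Invalid Page Type In addValue" failure reported above comes from a type check on a B-tree page read from disk: only leaf and branch pages may take part in an insert. The following minimal Java sketch illustrates that kind of check. It is not eXist's or dbXML's code; the class `BTreePageSketch`, the constants `UNUSED`, `LEAF` and `BRANCH`, their byte values, and the `Page` class are all hypothetical, assuming each page header carries a one-byte type marker.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical illustration only: names and values are invented for this
 * sketch and are not taken from eXist or dbXML.
 */
public class BTreePageSketch {

    // Assumed one-byte page-type markers stored in each page header.
    static final byte UNUSED = 0; // a third state, e.g. a freed page
    static final byte LEAF = 1;
    static final byte BRANCH = 2;

    static class BTreeException extends Exception {
        BTreeException(String message) {
            super(message);
        }
    }

    static class Page {
        final byte type;
        final List<String> keys = new ArrayList<>();

        Page(byte type) {
            this.type = type;
        }
    }

    // Only leaf and branch pages may take part in an insert. Any other
    // type byte (a freed page, or bytes from a corrupted read) is rejected.
    static String addValue(Page page, String key) throws BTreeException {
        switch (page.type) {
            case LEAF:
                page.keys.add(key);
                return "stored in leaf";
            case BRANCH:
                // A real B-tree would pick a child page and recurse here.
                return "descend to child";
            default:
                throw new BTreeException("Invalid Page Type In addValue");
        }
    }

    public static void main(String[] args) {
        try {
            System.out.println(addValue(new Page(LEAF), "doc1"));
            System.out.println(addValue(new Page(UNUSED), "doc2"));
        } catch (BTreeException e) {
            System.out.println("BTreeException: " + e.getMessage());
        }
    }
}
```

Running `main` prints `stored in leaf` followed by `BTreeException: Invalid Page Type In addValue`. In this sketch a page that was freed (or whose header bytes were damaged) surfaces as this exception on every subsequent insert that reaches it, which is consistent with the whole batch failing once the bad page is hit.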
From: Kevin S. C. <ksc...@st...> - 2004-01-26 23:30:57
> From: "Wolfgang Meier" <me...@if...>
> I also made some tests today: at least, storing about 100,000 files with
> approximately 1.5 GB of XML source worked well.

Out of curiosity, how are these arranged with regard to collections? Are all 100,000 in one collection, or are they split into... how many?

Thanks,
Kevin

--
Kevin S. Clarke <ksc...@st...>
Lane Medical Library, Stanford University