I am populating the DATA table by batching all the inserts for one strip and then executing them in a single transaction.
The number of transactions is:
N = number of strips (e.g., 50)
1 for the spectrumList table
N for the DATA table
N for the BBsToSpectrumList table
N for the BBs table
So: one strip, one transaction (for this file, the DB grows by roughly 80 MB per transaction when N=50).
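The batching described above can be sketched as follows, using the Xerial SQLite JDBC driver. The table and column names (DATA, mz, intensity, scanNumber) are assumptions based on this thread, not the actual mzRTree schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Minimal sketch of "one strip, one transaction" batching.
public class StripInserter {

    public static void insertStrip(Connection conn, double[] mz,
                                   double[] intensity, int[] scan) throws SQLException {
        conn.setAutoCommit(false);  // start one transaction for the whole strip
        String sql = "INSERT INTO DATA (mz, intensity, scanNumber) VALUES (?, ?, ?)";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            for (int i = 0; i < mz.length; i++) {
                ps.setDouble(1, mz[i]);
                ps.setDouble(2, intensity[i]);
                ps.setInt(3, scan[i]);
                ps.addBatch();      // queue the row instead of executing it immediately
            }
            ps.executeBatch();      // send all queued rows at once
        }
        conn.commit();              // one commit = one transaction per strip
        conn.setAutoCommit(true);
    }

    public static void main(String[] args) throws SQLException {
        try (Connection conn = DriverManager.getConnection("jdbc:sqlite::memory:")) {
            conn.createStatement().executeUpdate(
                "CREATE TABLE DATA (mz REAL, intensity REAL, scanNumber INTEGER)");
            insertStrip(conn,
                        new double[]{100.5, 101.25},
                        new double[]{10.0, 20.0},
                        new int[]{1, 1});
        }
    }
}
```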
Now I am trying your PRAGMA settings, but I don't see a significant difference: it takes about 15 minutes less, but still more than one hour. I then added

stat.executeUpdate("PRAGMA count_changes = OFF;");

and we gained another 15 minutes. But that is not sufficient: it still takes almost 2 hours. I am using the Xerial JDBC driver for Windows 64-bit, and I checked that it is actually running on the 64-bit JVM. Am I doing something wrong?
We are talking about 4 GB under Windows (MS1 only: 72900624 entries in the DATA table), starting from a 780 MB (MS1+MS2) mzXML file... it's huge! I am storing m/z and intensities as double values and scan numbers as integer values.
expecDataNum = maxDataNum*dataDensity;  % ~74*10^6 (expected data number)
actualDataNum = 72900624;               % counted by: SELECT count(1) FROM DATA
% expecDataNum and actualDataNum are consistent.
occupDouble = 8;           % REAL: an 8-byte IEEE floating point number
occupInteger = [2 4 6 8];  % INTEGER: stored in 1, 2, 3, 4, 6, or 8 bytes,
                           % depending on the magnitude of the value
gb = 1024^3;               % (Windows convention)
% Estimated raw data size in GB, for each assumed integer width:
%   1.39  1.67  1.94  2.22
The DB occupies 4 GB (Windows convention).
What's your opinion?
From: Matt Chambers <email@example.com>
Sent: Sun, 5 December 2010, 23:13:36
Subject: Re: [proteowizard-mzrtree] access times
How are you populating the table? Make sure it is inside a
transaction. Also, for bulk inserts to SQLite, try:
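(The PRAGMA list quoted at this point did not survive in this copy of the message. A typical bulk-insert set, which is an assumption and not necessarily Matt's exact suggestion, would be:)

```sql
PRAGMA synchronous = OFF;      -- skip fsync on each commit (unsafe on power loss)
PRAGMA journal_mode = MEMORY;  -- keep the rollback journal in RAM
PRAGMA temp_store = MEMORY;    -- temporary tables and indices in RAM
PRAGMA cache_size = 100000;    -- larger page cache (unit is pages, not bytes)
```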
This won't help file size though. What sizes are we talking about?
If we have to move to BLOBs, it will be much harder to propose it as
a standard, because that will significantly complicate the reading
logic. At that point, the only reason I can think of to stick with
SQLite is the built-in R*Tree support.
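To make the trade-off concrete, here is a sketch of what the BLOB alternative would look like: pack one spectrum's m/z (or intensity) array into a single byte[], so the whole spectrum becomes one row instead of one row per point. The encoding chosen here (little-endian IEEE doubles) is an assumption; this is exactly the extra convention every reader would have to agree on, which is the complication being discussed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Pack/unpack a double[] to/from a byte[] suitable for a SQLite BLOB column.
public class SpectrumBlob {

    public static byte[] pack(double[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * 8)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        for (double v : values) buf.putDouble(v);  // 8 bytes per value
        return buf.array();
    }

    public static double[] unpack(byte[] blob) {
        ByteBuffer buf = ByteBuffer.wrap(blob).order(ByteOrder.LITTLE_ENDIAN);
        double[] out = new double[blob.length / 8];
        for (int i = 0; i < out.length; i++) out[i] = buf.getDouble();
        return out;
    }

    public static void main(String[] args) {
        double[] mz = {100.5, 250.25, 1999.875};
        byte[] blob = pack(mz);
        System.out.println(blob.length);      // 24
        System.out.println(unpack(blob)[2]);  // 1999.875
    }
}
```

The upside is one row per spectrum instead of one row per data point, which removes almost all per-row storage overhead; the downside is that range queries on m/z can no longer be expressed in SQL over the packed column.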
On 12/5/2010 2:09 PM, Sara Nasso wrote:
I tested range-query times on the DB using the DATA table.
The problems related to DB size and the time required to populate it
still need to be solved; I haven't thought about them yet. I guess
that populating would be quicker using the C-language interface, but
I have no clue how to deal with the size! Probably the
only solution is to move to BLOBs. Any ideas?
What to do next?