From: semelak <sem...@ya...> - 2010-04-30 11:45:25
Inserting about 11 million records into MySQL with the following command takes less than 2 minutes on my machine, after disabling the keys and increasing the buffer size as described here: http://venublog.com/2007/11/07/load-data-infile-performance/

LOAD DATA INFILE 'fileToLoad.csv' IGNORE INTO TABLE load1 FIELDS TERMINATED BY ',';

It seems the LOAD DATA INFILE statement is not available in HSQLDB. What are the alternatives? The prepared INSERT statements I wrote took a ridiculously long time to execute. Is there a fast and efficient way to load millions of records directly from files into an HSQL database?

--- On Thu, 4/22/10, Fred Toussi <fr...@us...> wrote:

> From: Fred Toussi <fr...@us...>
> Subject: Re: [Hsqldb-user] Very Slow Inserts and Bad performance -- Help
> To: "User discussions about HyperSQL Database Engine." <hsq...@li...>
> Date: Thursday, April 22, 2010, 5:02 AM
>
> During bulk INSERT, if the data is ordered according to your UNIQUE
> index (the "term" field), the speed will improve to some extent. My test
> was with sorted data.
>
> As I said before, speed will always drop as the table gets larger (disk
> seek time). A solid-state disk will help speed.
>
> Also note that the maximum number of rows in a table cannot exceed 2^31
> (just over 2 billion). The maximum number of blocks in the .data file
> cannot exceed this amount either.
>
> Therefore, you will need to use SET FILES SCALE 128 (this is the
> maximum), and if more than a small percentage of your rows have a long
> varchar value (taking over 90 bytes), modify the HSQLDB code (the
> org.hsqldb.persist.Logger class) to allow SET FILES SCALE 256. You
> need to do this when the database is empty.
>
> With SET FILES SCALE 128, your .data file size can be up to 256GB.
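[Editor's note: since HyperSQL has no LOAD DATA INFILE, the closest built-in equivalent is a TEXT TABLE backed by the CSV file, used as a staging area. A sketch follows; the table and file names are taken from this thread, and the source options (e.g. the field separator) may need tuning for your file:]

```sql
-- Staging table whose rows are read directly from the CSV file.
CREATE TEXT TABLE terms_stage (
  term   VARCHAR(120) NOT NULL,
  doc_id INTEGER NOT NULL,
  TF     INTEGER NOT NULL
);

-- Attach the CSV file; fs=, sets the field separator to a comma.
SET TABLE terms_stage SOURCE 'fileToLoad.csv;fs=,';

-- Copy everything into the real CACHED table in one statement.
INSERT INTO TermsFreq (term, doc_id, TF)
  SELECT term, doc_id, TF FROM terms_stage;

-- Detach the file when done.
SET TABLE terms_stage SOURCE OFF;
```

This avoids the per-row JDBC round trips entirely; the engine streams rows from the file inside a single INSERT ... SELECT.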
>
> Fred
>
> On Thu, 22 Apr 2010 04:18 -0700, "semelak" <sem...@ya...> wrote:
> > I removed the CHECKPOINT statement along with "SET FILES CACHE SIZE
> > 1000000" and "SET FILES WRITE DELAY 10 MILLIS". The speed has improved
> > now, but it's nowhere near the 20,000 rows per second you quoted; I am
> > averaging 5,200 rows per second, and the table now has 18 million rows.
> > Is it possible to improve the speed even further? The table will have
> > at least 1,800 million rows and will be used for read-only operations.
> > I do have the data saved in a CSV file. Is there a faster way to import
> > all of the data into the database?
> >
> > --- On Thu, 4/22/10, Fred Toussi <fr...@us...> wrote:
> >
> > > From: Fred Toussi <fr...@us...>
> > > Subject: Re: [Hsqldb-user] Very Slow Inserts and Bad performance -- Help
> > > To: "User discussions about HyperSQL Database Engine." <hsq...@li...>
> > > Date: Thursday, April 22, 2010, 3:10 AM
> > >
> > > Don't use CHECKPOINT; then a backup is not created while you are
> > > inserting. Perform CHECKPOINT once at the end of the process.
> > > Reduce the batch size to 1000; large sizes just take up memory.
> > > "SET FILES CACHE SIZE 1000000" will not improve insert speed.
> > > "SET FILES WRITE DELAY 10 MILLIS" will not have any effect because
> > > you have disabled logging with "SET FILES LOG FALSE".
> > > You should get better performance. I have tried inserting 4 million
> > > rows into a similar table and the average speed is above 20,000 rows
> > > per second.
> > > Speed will always drop as the table gets larger. This is due to disk
> > > seek time.
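[Editor's note: the 2^31 block limit and the SCALE setting quoted above together determine the maximum .data file size. The arithmetic can be checked with a few lines; this is purely illustrative math, not an HSQLDB API:]

```java
public class ScaleLimit {
    // Maximum .data file size: 2^31 blocks, each SCALE bytes wide.
    static long maxDataFileBytes(long scale) {
        return (1L << 31) * scale;
    }

    public static void main(String[] args) {
        // With SET FILES SCALE 128 the .data file can grow to 256 GB.
        System.out.println(maxDataFileBytes(128) / (1L << 30) + " GB"); // 256 GB
        // The modified SCALE 256 that Fred mentions would double that.
        System.out.println(maxDataFileBytes(256) / (1L << 30) + " GB"); // 512 GB
    }
}
```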
> > >
> > > Fred
> > >
> > > On Thu, 22 Apr 2010 01:42 -0700, "semelak" <sem...@ya...> wrote:
> > > > One more thing I forgot to add:
> > > >
> > > > stmt.execute("SET FILES CACHE SIZE 1000000;");
> > > >
> > > > The above was run once, right after creating the table. The table
> > > > is created with the following:
> > > >
> > > > stmt.execute(
> > > >     "CREATE CACHED TABLE TermsFreq (" +
> > > >     "  term   varchar(120) NOT NULL," +
> > > >     "  doc_id INTEGER NOT NULL," +
> > > >     "  TF     INTEGER NOT NULL," +
> > > >     "  UNIQUE (term, doc_id)" +
> > > >     ");");
> > > >
> > > > Still, it's very slow.
> > > >
> > > > --- On Thu, 4/22/10, semelak <sem...@ya...> wrote:
> > > >
> > > > > From: semelak <sem...@ya...>
> > > > > Subject: [Hsqldb-user] Very Slow Inserts and Bad performance -- Help
> > > > > To: hsq...@li...
> > > > > Date: Thursday, April 22, 2010, 1:04 AM
> > > > >
> > > > > Hi
> > > > >
> > > > > I am using prepared statements to insert a large number of
> > > > > entries (over 10 million) into a cached table.
> > > > >
> > > > > pstmt = con.prepareStatement(
> > > > >     "INSERT INTO TermsFreq(term, doc_id, TF) VALUES(?,?,?)");
> > > > >
> > > > > // loop at least 20,000 times:
> > > > > pstmt.setString(1, key);
> > > > > pstmt.setInt(2, i);
> > > > > pstmt.setInt(3, val);
> > > > > pstmt.addBatch();
> > > > >
> > > > > After every 20,000 addBatch calls, I run the following to save
> > > > > the changes/additions to disk:
> > > > >
> > > > > pstmt.executeBatch();
> > > > > stmt.execute("CHECKPOINT;");
> > > > > con.commit();
> > > > >
> > > > > The problem is that the performance of executeBatch degrades
> > > > > badly after the 3rd or 4th execution. The first run takes only
> > > > > about 25 seconds; the second, about 1 minute. The 7th run takes
> > > > > nearly 5 minutes!
> > > > >
> > > > > I am just wondering what could be wrong, and what I can do to
> > > > > improve the performance of the INSERTs. Is there a way to
> > > > > disable the backup? It is not really needed for my application.
> > > > >
> > > > > By the way, I have the following set before performing any
> > > > > INSERTs, as I've read they might help performance:
> > > > >
> > > > > stmt.execute("SET FILES LOG FALSE;");
> > > > > stmt.execute("SET FILES WRITE DELAY 10 MILLIS;");
> > > > >
> > > > > Any help is very much appreciated. I am using the latest
> > > > > version, by the way.

------------------------------------------------------------------------------
_______________________________________________
Hsqldb-user mailing list
Hsq...@li...
https://lists.sourceforge.net/lists/listinfo/hsqldb-user
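[Editor's note: the batching pattern Fred recommends in this thread — chunks of about 1,000 rows, executeBatch() and commit() per chunk, a single CHECKPOINT only at the very end — reduces to plain row-chunking logic. A sketch, independent of any database so it stands alone; the class and method names are made up for illustration:]

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
    // Split a row list into chunks of at most batchSize rows.
    // Each chunk would be sent with addBatch()/executeBatch() + commit(),
    // and CHECKPOINT would be issued once, after the final chunk.
    static <T> List<List<T>> chunks(List<T> rows, int batchSize) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < rows.size(); i += batchSize) {
            out.add(rows.subList(i, Math.min(i + batchSize, rows.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = new ArrayList<>();
        for (int i = 0; i < 4500; i++) rows.add(i);
        List<List<Integer>> batches = chunks(rows, 1000);
        System.out.println(batches.size());        // 5 batches
        System.out.println(batches.get(4).size()); // last batch holds 500 rows
    }
}
```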