Re: [Cppcms-users] Fast bulk-insert into SQLITE3 using cppDB
Brought to you by:
artyom-beilis
From: Artyom B. <art...@ya...> - 2011-08-21 07:14:50
|
Several things: cppdb::session sql("sqlite3:db=db.db; Page Size=65536; Cache Size=65536; Synchronous=Off; Journal Mode=Off;"); These options " Page Size=65536; Cache Size=65536; Synchronous=Off; Journal Mode=Off;" Have no effect with CppDB, see: http://art-blog.no-ip.info/sql/cppdb/sqlite3.html If you want to set these options you need to use Sqlite PRAGMA statements: sql << "PRAGMA cache_size =65536" << cppdb::exec; sql << "PRAGMA synchronous = OFF" << cppdb::exec; See: http://www.sqlite.org/pragma.html Now if you only want to populate the new DB then "PRAGMA synchronous = OFF" if fine but don't turn off journal as actually transaction can't be done without journal so it is better to use journal and transaction. Also bulk inserts with sqlite3 should be very fast when you do it in a single transaction even with synchronization. Few additional points: 1. The MSVC's std::fstream implementation is quite bad, I've seen that fopen API performs sometimes much faster. So if performance is critical I'd suggest to use FILE/fopen, in any case you don't even use any specific istream features like splitting the file by lines and so on. 2. This is bad code: string* f = new string[10]; use std::vector<std::string> f(10); Or even std::string f[10]; Especially when you forget to call delete [] f > I'm breaking my head trying to figure out the reason of this difference in sizes; may be you know possible reasons? It is possible due to fact that some integer columns are filled up like strings. Sqlite can happily store any value in any column so this may explain the difference. Artyom Beilis -------------- CppCMS - C++ Web Framework: http://cppcms.sf.net/ CppDB - C++ SQL Connectivity: http://cppcms.sf.net/sql/cppdb/ >________________________________ >From: Vizcayno Tamparantan <viz...@gm...> >To: cppcms-users <cpp...@li...> >Sent: Sunday, August 21, 2011 1:19 AM >Subject: [Cppcms-users] Fast bulk-insert into SQLITE3 using cppDB > > >**************** This replace to previous post because the display format is ugly *************************** > >Hello: >I show you the main part of a C++ which loads a 540MB delimited ("|") text file into an SQLite3 db using last cppDB (trunk 1875): >int main () >{ > Timer t; > cout << t.ShowStart() << endl; > > ifstream input("in1.txt"); > const size_t size = 512 * 1024; > char buffer[size]; > string* f = new string[10]; > > sql << "DROP TABLE IF EXISTS ttb" << cppdb::exec; > sql << "CREATE TABLE ttb (alm varchar(4), alm2 varchar(4), \ > mat varchar(18), ser varchar(20), fac varchar(18), almttb varchar(10), \ > cant integer, sts varchar(1), ser_2_20 varchar(20), rowid_sap integer, \ > stsmat varchar);" << cppdb::exec; > > cppdb::transaction guard(sql); > cppdb::statement stat; > stat = sql << > "INSERT INTO ttb(alm,alm2,mat,ser,fac,almttb,cant,sts,ser_2_20,rowid_sap,stsmat) " > "VALUES(?,?,?,?,?,?,?,?,?,?,?)"; > > int cont = 0; > int regLei = 0; > while (input) // Read a text file with fields separated by "|" > { > input.read(buffer, size); > size_t readBytes = input.gcount(); > for (size_t i = 0; i < readBytes; i++) > { > if (*(buffer+i) == '\n') // || *(buffer+i) == '\r') > { > if (regLei++ > 0) // Don't save first line, it's header > { > stat.reset(); > stat.bind(f[0]); > stat.bind(f[1]); > stat.bind(f[2]); > stat.bind(f[3]); > stat.bind(f[4]); > stat.bind(f[5]); > stat.bind(f[7]); > stat.bind_null(); > stat.bind_null(); > stat.bind_null(); > stat.bind(f[9]); > stat.exec(); > } > > for (int j=0;j<10;f[j++]=""); // Reset string arrays of fields > cont = 0; > } > else > if (*(buffer+i) != '|') > f[cont] += *(buffer+i); // Build each field char by char > else > cont++; // Build data on next string of the array f > } > } > guard.commit(); > > t.Stop(); > cout << t.ShowEnd() << endl; > cout << "Executed in: " << t.ElapsedSeconds() << " seconds." << endl; > > input.close(); > return 0; >} > > >Please, be patient with my C++ style. There is a part of the code that is trying to read the file as fast as possible (using a buffer), but I can't get to save the data into Sqlite at the same speed that I read. >In the code you can see and attemp to accelerate the writing but, I am not sure if it's working because the journal file continues generating data, I am sure I am doing something wrong: > cppdb::session sql("sqite3:db=db.db; Page Size=65536; Cache Size=65536; Synchronous=Off; Journal Mode=Off;"); > >My questions are: >1) What is bad in the cppdb::session syntax I wrote? >2) Checking the code, can you find a way to speed-up the writing of the data into the SQLite DB (I am sure that the reading is very fast), the command compilation in VS2010 is: cl /EHsc /Ox /Ob2 /Oi for both, the cppDB and my C++ program; i'm under windows 7 and program generation is for 32 bits. >3) I also wrote a program in C# and it is always 10-15 seconds faster than my C++ and, as a curiosity, the file size of the database running C# is 386MB and after running the C++ the result size is 408MB (one possible reason because C++ is slower than my C# program); the records and fields generated are the same in both databases and > >Many thanks for your attention. > >------------------------------------------------------------------------------ >Get a FREE DOWNLOAD! and learn more about uberSVN rich system, >user administration capabilities and model configuration. Take >the hassle out of deploying and managing Subversion and the >tools developers use with it. http://p.sf.net/sfu/wandisco-d2d-2 >_______________________________________________ >Cppcms-users mailing list >Cpp...@li... >https://lists.sourceforge.net/lists/listinfo/cppcms-users > > > |