From: John H. <Jo...@mi...> - 2008-10-07 02:51:52
Hi,

My first tests are complete. To keep all the questions, answers, and results
in one place, I've made a wiki entry:

http://wiki.bacula.org/doku.php?id=wiki:playground

My scripts are attached.

--John
From: Kern S. <ke...@si...> - 2008-10-07 07:56:05
On Tuesday 07 October 2008 04:50:50 John Huttley wrote:
> My first test are complete.
> To aid in getting all the questions/answers and results in one place,
> I've made a wiki entry.
>
> http://wiki.bacula.org/doku.php?id=wiki:playground
>
> My scripts are attached.

Can you present a concise summary of the times to insert with and without
the extra fields, and the sizes of the resulting databases?

Then people can start to form some idea of the possible costs of this
proposal. Also, at some point, if we really want to proceed, we *may* need
similar tests with MySQL, but that is not necessary for the moment.

Regards,

Kern
From: John H. <Jo...@mi...> - 2008-10-07 18:28:49
Kern,

All the detail I have is on the wiki but, to repeat: adding two datetime
fields and a long integer takes the insert time from 520 to 532 minutes,
an increase of 2.3%. The size increase of the DB is too small to detect on
my system, so it is under 1%.

--john
From: Eric B. <er...@eb...> - 2008-10-07 08:19:09
Hello John,

Another solution, which requires no additional code and no modification of
the database structure, is to use a PL function to extract mtime, ctime,
size, block_size, etc. from the LStat field. This function is available on
PostgreSQL and now on MySQL (thanks to Kjetil).

The only drawback I see with this PL function is that you *should* avoid
using it in WHERE or JOIN clauses. But it works very well for reporting
and analysis.

Bye
From: Kjetil T. H. <kje...@li...> - 2008-10-07 12:55:30
John Huttley <Jo...@mi...> writes:
> http://wiki.bacula.org/doku.php?id=wiki:playground

"ctime Timestamp [...] The datetime it was created."

ctime is the inode change time on Unix clients, and the creation time on
Windows clients. This difference alone makes the value of the field
suspect.

-- 
regards,        | Redpill  _
Kjetil T. Homme | Linpro  (_)
From: John H. <Jo...@mi...> - 2008-10-07 18:31:28
Yes,

That was next on my list. It's not something I've done before, so I'm glad
to hear that someone else has done it.

Where will I find this code?

Regards,

John

Eric Bollengier wrote:
> An other solution that doesn't require any additional code and no database
> structure modification, is to use a PL function to extract mtime, ctime,
> size, block_size etc... from LStat field.
From: Eric B. <er...@eb...> - 2008-10-07 18:43:25
Take a look at trunk/gui/bweb/script/bweb-postgresql.sql and
bweb-mysql.sql, function base64_decode_lstat().

Bye
From: Yuri T. <ti...@gm...> - 2008-10-08 15:12:26
Good work! But... I think the test is not quite correct. ;)

I am talking about the "COPY" command in PostgreSQL. In other words, I
think that COPY and INSERT perform differently: "COPY" should be faster
than individual "INSERT" statements in a real Bacula job. I may be wrong.

To make the most correct test, we should take the Bacula source code and,
based on it, make a gen_filetable.c, for example, that uses INSERTs
instead of COPY.

At the moment I have a server (still completely empty and sufficiently
powerful) where I can create a database of 100-200 GB or more to run
tests. I need the source code of gen_filetable.c. I am a novice C
programmer, so I do not promise to write the code quickly. ;)

2008/10/7 John Huttley <Jo...@mi...>:
> My first test are complete.
> To aid in getting all the questions/answers and results in one place, I've
> made a wiki entry.
>
> http://wiki.bacula.org/doku.php?id=wiki:playground
>
> My scripts are attached.
>
> --John

// John's generator script (attached to his message):
#include <stdio.h>
#include <stdlib.h>     // random()

#define ROWS 1000000

int main(void)
{
    long int I;
    int row_offset = 2000000000;

    // printf("COPY file (fileid, fileindex, jobid, pathid, filenameid, "
    //        "markid, lstat, md5, size, mtime, ctime) FROM stdin;\n");
    printf("COPY file (fileid, fileindex, jobid, pathid, filenameid, "
           "markid, lstat, md5) FROM stdin;\n");
    for (I = 0; I < ROWS; I++) {
        printf("%ld\t%d\t%ld\t%ld\t%d\t%d\t%s\t%s\n",
     // printf("%ld\t%d\t%ld\t%ld\t%d\t%d\t%s\t%s\t%d\t%s\t%s\n",
               I + row_offset,        // fileid
               3,                     // fileindex
               random(),              // jobid
               random(),              // pathid
               1,                     // filenameid
               1,                     // markid
               "LSTATDATA",           // lstat
               "MD512345678901234"    // md5
            // ,0,                    // size
            // "2008-09-01T01:02:04", // ctime
            // "2008-09-01T01:02:05"  // mtime
               );
    }
    return 0;
}

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-devel mailing list
Bac...@li...
https://lists.sourceforge.net/lists/listinfo/bacula-devel

-- 
with best regards
From: Dan L. <da...@la...> - 2008-10-08 15:29:09
On Oct 8, 2008, at 11:12 AM, Yuri Timofeev wrote:
> I think that COPY and INSERT performed differently.
> "COPY" must be implemented faster than individual "INSERT" into real
> Bacula Job.

Yes, COPY, if doing multiple rows, should be faster than INSERT.

> I may be wrong.

I may be too. I doubt it.

-- 
Dan Langille
http://langille.org/
From: John H. <Jo...@mi...> - 2008-10-08 18:43:17
Does anyone think we need ctime? Otherwise I can run the tests without it.

--John
From: Jesper K. <je...@kr...> - 2008-10-08 19:26:12
Yuri Timofeev wrote:
> I think that COPY and INSERT performed differently.
> "COPY" must be implemented faster than individual "INSERT" into real
> Bacula Job.
>
> To make the most correct test, we must take the bacula source code and
> based on it to make gen_filetable.c, for example.
> We must use INSERTs instead COPY.

INSERT in a transaction should perform "similar" to COPY; you could even
do an INSERT with multiple tuples. But single inserts will absolutely be
slower than COPY, due to parsing/copying and transaction overhead.

-- 
Jesper
From: Yuri T. <ti...@gm...> - 2008-10-10 06:08:55
Sorry, I was wrong. I am now looking into the Bacula source code: when
working with PostgreSQL it uses 'COPY batch FROM STDIN' (not INSERT ;).

I have now prepared and launched a test for MySQL. I look forward to when
it will end. ;)

-- 
with best regards
From: Yuri T. <ti...@gm...> - 2008-10-14 07:22:50
Attachments:
test.tar.gz
Hi,

My first tests are complete. ;)

Summary: MySQL, ProLiant DL180G5, CentOS 5.2.

1. Batch inserts of 10M unique(!) records into the 'File' table (DB schema
   not changed): 39 hours (23+16), DB file size 15.4 GB.
2. Batch inserts of 10M unique(!) records into the 'File' table (NEW DB
   schema, see also create_mysql_test2.sh): 35 hours (19+16), DB file size
   15.2 GB.

Source code attached.

Please note that the length of time (16 hours) to insert records from the
'batch' table into the 'File' table is the same in both runs. This
indicates the stability of the results.

Full report:

# ./test
Date: Oct 10 2008 11:27:37
Host: Localhost via UNIX socket
Use database: 'bacula_test'.
Temporary table 'batch' created.
Start filling tables 'batch'(temporary), 'Path', 'Filename' ...
End.
10000000 records inserts.
*** CPUs time used : 375.880000
*** Elapsed time (wall clock) : 85334.000000

Start inserts from temporary table 'batch' to 'File' ...
End inserts to 'File' ...
*** CPUs time used : 0.000000
*** Elapsed time (wall clock) : 58838.000000

# ./test2
Date: Oct 10 2008 11:27:37
Host: Localhost via UNIX socket
Use database: 'bacula_test2'.
Temporary table 'batch' created.
Start filling tables 'batch'(temporary), 'Path', 'Filename' ...
End.
10000000 records inserts.
*** CPUs time used : 384.650000
*** Elapsed time (wall clock) : 71670.000000

Start inserts from temporary table 'batch' to 'File' ...
End inserts to 'File' ...
*** CPUs time used : 0.000000
*** Elapsed time (wall clock) : 58564.000000

-- 
with best regards
From: Kern S. <ke...@si...> - 2008-10-14 09:37:41
Hello,

Could you summarize the differences in your schemas? From what I am
seeing, it looks like your first test, which ran 39 hours, is a more or
less standard Bacula File table, and the second test (NEW DB), which ran
35 hours, is a standard Bacula File table with additional fields, yet it
produces a smaller size. If that is true, then I don't understand the
timing differences.

Regards,

Kern

On Tuesday 14 October 2008 09:22:44 Yuri Timofeev wrote:
> 1. Batch inserts 10M unique! records in table 'File' (DB schema not
> changed) : 39 hours (23+16), DB file size 15,4 Gb
> 2. Batch inserts 10M unique! records in table 'File' (NEW DB schema
> see also create_mysql_test2.sh) : 35 hours (19+16), DB file size 15,2 Gb
From: John H. <Jo...@mi...> - 2008-10-14 07:39:44
So the modified version is actually a bit faster? That's odd.

Are you running 64-bit? What's your RAM? Which version of MySQL?

I'll run it on my system also.

Regards,

john
From: Yuri T. <ti...@gm...> - 2008-10-14 07:59:13
2008/10/14 John Huttley <Jo...@mi...>:
> So the modified version is actually a bit faster?

Well, yes.

> Thats odd.

My guess is that MySQL is slow with fields of type BLOB. In the
alternative schema the new fields size, ctime and mtime appear, so I
reduced the length of the value inserted into the LStat field:

char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I";

instead of

char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E";

> are you running 64Bit?

Yes.

> Whats your Ram?

8 GB, but MySQL does not use the entire memory. The server is used only
for this test.

> Version of Mysql?

mysql Ver 14.12 Distrib 5.0.45, for redhat-linux-gnu (x86_64) using
readline 5.0. /etc/my.cnf was taken from
/usr/share/doc/mysql-server-5.0.45/my-huge.cnf.
From: Kern S. <ke...@si...> - 2008-10-14 07:56:26
On Tuesday 14 October 2008 09:39:27 John Huttley wrote:
> So the modified version is actually a bit faster?

That is what I understood too, but I wanted to get confirmation.

If it is indeed the case that the new schema runs faster, it is indeed
odd, and I would say the tester has fallen into a trap that is very common
in performance analysis. To get reliable test results on a memory-caching
machine such as Linux, you must run each test 10 times, throw out the two
runs whose times differ the most from the average, then re-compute the
average based on the remaining 8 samples, which should not vary by more
than a few percent.

Another aid is to run one set of 10 tests, then the other set of 10 tests,
then the first set of 10 tests again, and make sure the two runs of the
first set generate the same results (using the methodology mentioned
above). If you want a better measure of disk times, you can sprinkle
"sync" shell commands between the 10 test runs.

Regards,

Kern
From: Yuri T. <ti...@gm...> - 2008-10-14 08:06:33
2008/10/14 Kern Sibbald <ke...@si...>:
> If it is indeed the case that the new case runs faster, it is indeed odd,
> and I would say the tester has fallen into a very trap that is very
> common in performance analysis.

Of course, this is only the first test! I think I will soon be able to run
a series of tests. I am just reducing the number of records from 10M to 5M
(the wait is very long).

-- 
with best regards
From: Kern S. <ke...@si...> - 2008-10-14 12:33:27
|
On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: > 2008/10/14 Kern Sibbald <ke...@si...>: > > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: > >> So the modified version is actually a bit faster? > > > > That is what I understood too, but I wanted to get a confirmation. > > > > If it is indeed the case that the new case runs faster, it is indeed odd, > > and I would say the tester has fallen into a very trap that is very > > common in performance analysis. > > Of course, this is only the first test! > I think that will soon be able to hold a series of tests. > I just limiting the number of entries from 10M to 5M (very long wait) Yes, clearly running something 10 times is not very practical if it takes 35 hours each time, so the test size must be reduced, and you can reduce the number of runs from 10 to say 5. However, what was not at all evident from your first post is that there are apparently subtle differences in schemas that I did not see and differences in the size of the data you were inserting -- and those could possibly explain a large (or even all) the difference in timings. > > > That is to get reliable test results on a memory caching machine such as > > Linux, you must run each test 10 times, and throw out the two with the > > times that differ the most from the average, then re-compute the average > > based on the remaining 8 samples which should not vary more than a few > > percent. > > > > Another aid is to run one set of 10 tests, then the other set of 10 > > tests, then the first set of 10 tests again, and make sure the two runs > > of the first set generate the same results (using the methodology > > mentioned above). If you want to have a better measure of disk times, you > > can sprinkle "sync" shell commands between each of the 10 test runs. > > > > Regards, > > > > Kern > > > >> Thats odd. > >> > >> are you running 64Bit? > >> Whats your Ram? > >> Version of Mysql? > >> > >> > >> I'll run it on my system also. 
> >> > >> Regards, > >> > >> john > >> > >> Yuri Timofeev wrote: > >> > Hi > >> > > >> > 2008/10/7 John Huttley <Jo...@mi...>: > >> >> Hi > >> >> My first test are complete. > >> >> To aid in getting all the questions/answers and results in one place, > >> >> I've made a wiki entry. > >> >> > >> >> http://wiki.bacula.org/doku.php?id=wiki:playground > >> > > >> > My first test are complete. ;) > >> > > >> > Summary: > >> > > >> > MySQL, ProLiant DL180G5, CentOS 5.2 > >> > > >> > 1. Batch inserts 10M unique! records in table 'File' (DB schema not > >> > changed) : 39 hours (23+16), DB file size 15,4 Gb > >> > 2. Batch inserts 10M unique! records in table 'File' (NEW DB schema > >> > see also create_mysql_test2.sh) : 35 hours (19+16), DB file size 15,2 > >> > Gb > >> > > >> > Source code attached. > >> > > >> > Please note that the length of time (16 hours) insert records from the > >> > table 'batch' to the table 'File' the same. > >> > This indicates the stability of results. > >> > > >> > > >> > > >> > > >> > > >> > Full report: > >> > > >> > # ./test > >> > Date: Oct 10 2008 11:27:37 > >> > Host: Localhost via UNIX socket > >> > Use database: 'bacula_test'. > >> > Temporary table 'batch' created. > >> > Start filling tables 'batch'(temporary), 'Path', 'Filename' ... > >> > End. > >> > 10000000 records inserts. > >> > *** CPUs time used : 375.880000 > >> > *** Elapsed time (wall clock) : 85334.000000 > >> > > >> > Start inserts from temporary table 'batch' to 'File' ... > >> > End inserts to 'File' ... > >> > *** CPUs time used : 0.000000 > >> > *** Elapsed time (wall clock) : 58838.000000 > >> > > >> > > >> > > >> > > >> > # ./test2 > >> > Date: Oct 10 2008 11:27:37 > >> > Host: Localhost via UNIX socket > >> > Use database: 'bacula_test2'. > >> > Temporary table 'batch' created. > >> > Start filling tables 'batch'(temporary), 'Path', 'Filename' ... > >> > End. > >> > 10000000 records inserts. 
> >> > *** CPUs time used : 384.650000 > >> > *** Elapsed time (wall clock) : 71670.000000 > >> > > >> > Start inserts from temporary table 'batch' to 'File' ... > >> > End inserts to 'File' ... > >> > *** CPUs time used : 0.000000 > >> > *** Elapsed time (wall clock) : 58564.000000 > >> > > >> > _______________________________________________ > >> > Bacula-devel mailing list > >> > Bac...@li... > >> > https://lists.sourceforge.net/lists/listinfo/bacula-devel |
From: Yuri T. <ti...@gm...> - 2008-10-14 08:45:03
|
2008/10/14 Kern Sibbald <ke...@si...>: > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: >> 2008/10/14 Kern Sibbald <ke...@si...>: >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: >> >> So the modified version is actually a bit faster? >> > >> > That is what I understood too, but I wanted to get a confirmation. >> > >> > If it is indeed the case that the new case runs faster, it is indeed odd, >> > and I would say the tester has fallen into a trap that is very >> > common in performance analysis. >> >> Of course, this is only the first test! >> I think that soon I will be able to run a series of tests. >> I am just reducing the number of entries from 10M to 5M (very long wait) > > Yes, clearly running something 10 times is not very practical if it takes 35 > hours each time, so the test size must be reduced, and you can reduce the > number of runs from 10 to say 5. > > However, what was not at all evident from your first post is that there are > apparently subtle differences in schemas that I did not see and differences > in the size of the data you were inserting -- and those could possibly > explain a large part (or even all) of the difference in timings. > In the alternative schema, new fields appear: size, ctime, mtime. I therefore reduced the length of the value inserted into the LStat field. For the old schema I used: char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; and for the new schema: char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; But that was not entirely correct. In alternative schema 2 the new fields are: size, ctime, mtime, _atime_. In the new version of the tests I did it correctly: char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; /* for the traditional schema */ char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; /* for the new schema */ Therefore, in the alternative schema the length of LStat is reduced. Is that right? -- with best regards |
From: Kern S. <ke...@si...> - 2008-10-14 08:56:38
|
On Tuesday 14 October 2008 10:42:22 Yuri Timofeev wrote: > 2008/10/14 Kern Sibbald <ke...@si...>: > > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: > >> 2008/10/14 Kern Sibbald <ke...@si...>: > >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: > >> >> So the modified version is actually a bit faster? > >> > > >> > That is what I understood too, but I wanted to get a confirmation. > >> > > >> > If it is indeed the case that the new case runs faster, it is indeed > >> > odd, and I would say the tester has fallen into a very trap that is > >> > very common in performance analysis. > >> > >> Of course, this is only the first test! > >> I think that will soon be able to hold a series of tests. > >> I just limiting the number of entries from 10M to 5M (very long wait) > > > > Yes, clearly running something 10 times is not very practical if it takes > > 35 hours each time, so the test size must be reduced, and you can reduce > > the number of runs from 10 to say 5. > > > > However, what was not at all evident from your first post is that there > > are apparently subtle differences in schemas that I did not see and > > differences in the size of the data you were inserting -- and those could > > possibly explain a large (or even all) the difference in timings. > > In an alternative scheme appear new fields : size, ctime, mtime. > I therefore reduced length the value that is inserted into the field LStat. > > For the old scheme, I used : > char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; > > and for the new scheme: > char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; > > But it is not entirely correct. > > > > In an alternative scheme2 appear new fields : size, ctime, mtime, _atime_. 
> > The new version of the tests, I did as correctly: > char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A > E"; /* for traditional scheme */ > char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; > /* for new scheme */ > > Therefore, in alternative scheme the length of lstat reduced. > That is right? I would not say it is a question of being right or not. It is a possibility, but it would be a big effort to eliminate those fields -- first there is the programming problem of finding every place they are accessed and ensuring the new fields are used, but even more important, there is the problem of converting existing databases from the old scheme to the new one. |
From: Yuri T. <ti...@gm...> - 2008-10-14 09:14:57
|
2008/10/14 Kern Sibbald <ke...@si...>: > On Tuesday 14 October 2008 10:42:22 Yuri Timofeev wrote: >> 2008/10/14 Kern Sibbald <ke...@si...>: >> > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: >> >> 2008/10/14 Kern Sibbald <ke...@si...>: >> >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: >> >> >> So the modified version is actually a bit faster? >> >> > >> >> > That is what I understood too, but I wanted to get a confirmation. >> >> > >> >> > If it is indeed the case that the new case runs faster, it is indeed >> >> > odd, and I would say the tester has fallen into a very trap that is >> >> > very common in performance analysis. >> >> >> >> Of course, this is only the first test! >> >> I think that will soon be able to hold a series of tests. >> >> I just limiting the number of entries from 10M to 5M (very long wait) >> > >> > Yes, clearly running something 10 times is not very practical if it takes >> > 35 hours each time, so the test size must be reduced, and you can reduce >> > the number of runs from 10 to say 5. >> > >> > However, what was not at all evident from your first post is that there >> > are apparently subtle differences in schemas that I did not see and >> > differences in the size of the data you were inserting -- and those could >> > possibly explain a large (or even all) the difference in timings. >> >> In an alternative scheme appear new fields : size, ctime, mtime. >> I therefore reduced length the value that is inserted into the field LStat. >> >> For the old scheme, I used : >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; >> >> and for the new scheme: >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; >> >> But it is not entirely correct. >> >> >> >> In an alternative scheme2 appear new fields : size, ctime, mtime, _atime_. 
>> >> The new version of the tests, I did as correctly: >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A >> E"; /* for traditional scheme */ >> char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; >> /* for new scheme */ >> >> Therefore, in alternative scheme the length of lstat reduced. >> That is right? > > I would not say it is a question of being right or not. It is a possibility, > but it would be a big effort to eliminate those fields -- first there is the > programming problem of finding every place they are accessed and ensuring the > new fields are used, but even more important, there is the problem of > converting existing databases from the old scheme to the new one. > > > Yes, I agree. Perhaps these studies will never be translated into bacula source code. However, it is interesting. -- with best regards |
From: Kern S. <ke...@si...> - 2008-10-14 09:38:22
|
On Tuesday 14 October 2008 11:12:17 Yuri Timofeev wrote: > 2008/10/14 Kern Sibbald <ke...@si...>: > > On Tuesday 14 October 2008 10:42:22 Yuri Timofeev wrote: > >> 2008/10/14 Kern Sibbald <ke...@si...>: > >> > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: > >> >> 2008/10/14 Kern Sibbald <ke...@si...>: > >> >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: > >> >> >> So the modified version is actually a bit faster? > >> >> > > >> >> > That is what I understood too, but I wanted to get a confirmation. > >> >> > > >> >> > If it is indeed the case that the new case runs faster, it is > >> >> > indeed odd, and I would say the tester has fallen into a very trap > >> >> > that is very common in performance analysis. > >> >> > >> >> Of course, this is only the first test! > >> >> I think that will soon be able to hold a series of tests. > >> >> I just limiting the number of entries from 10M to 5M (very long wait) > >> > > >> > Yes, clearly running something 10 times is not very practical if it > >> > takes 35 hours each time, so the test size must be reduced, and you > >> > can reduce the number of runs from 10 to say 5. > >> > > >> > However, what was not at all evident from your first post is that > >> > there are apparently subtle differences in schemas that I did not see > >> > and differences in the size of the data you were inserting -- and > >> > those could possibly explain a large (or even all) the difference in > >> > timings. > >> > >> In an alternative scheme appear new fields : size, ctime, mtime. > >> I therefore reduced length the value that is inserted into the field > >> LStat. > >> > >> For the old scheme, I used : > >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A > >> E"; > >> > >> and for the new scheme: > >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; > >> > >> But it is not entirely correct. > >> > >> > >> > >> In an alternative scheme2 appear new fields : size, ctime, mtime, > >> _atime_. 
> >> > >> The new version of the tests, I did as correctly: > >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A > >> E"; /* for traditional scheme */ > >> char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; > >> /* for new scheme */ > >> > >> Therefore, in alternative scheme the length of lstat reduced. > >> That is right? > > > > I would not say it is a question of being right or not. It is a > > possibility, but it would be a big effort to eliminate those fields -- > > first there is the programming problem of finding every place they are > > accessed and ensuring the new fields are used, but even more important, > > there is the problem of converting existing databases from the old scheme > > to the new one. > > Yes, I agree. > Perhaps these studies will never be translated into bacula source code. > However, it is interesting. Yes, it is very interesting. We are very likely going to set certain of the LStat fields to zero before the next release. This will have the effect of compressing the LStat record without removing any of the fields. This could give us up to 8-9 bytes gain (smaller) per File record, which will more than compensate the fact that we will be switching from 32 bit FileIds to 64 bit FileIds. Switching to 64 bit FileIds will be an important database change ... Regards, Kern |
From: Yuri T. <ti...@gm...> - 2008-10-19 10:31:54
|
Hi, baculamaniacs ;) If you remember, I ran a series of tests on Bacula's insert speed. I compared two DB schemas: canonical vs. alternative (splitting LStat into several columns). The good news: the canonical DB schema won; it was faster. I have not written up the details yet. However (and this is the "bad" news ;), I concluded that my tests were not correct, for the following reasons. When I changed the test to generate 5M _unique_ Filename and Path values and ran (as real Bacula does): INSERT INTO Path (Path) SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path FROM Path AS p WHERE p.Path = a.Path) my MySQL crashed. So I think the time I spent on those "tests" was wasted ;( To continue: my Catalog database has the following statistics: Job 2,823 records; File 11,338,602 records; Filename 2,188,445; Path 25,929. Each Job has an average of 4016 files (entries in the File table). For each file (one entry in the File table) there are, on average, 0.1930 entries in the Filename table and 0.0022868 entries in the Path table. The tests need to use a similar proportion; that is, for example, 10M entries in the File table should come with 1,930,000 entries in the Filename table and 22,868 entries in the Path table. In that case the test will be very similar to Bacula's real workload. I am going to write a message to the bacula-users mailing list and ask people to send me samples of: select count(*) from Job; select count(*) from File; select count(*) from Filename; select count(*) from Path; in order to calculate the average proportions of Jobs, Files, etc., and then run a new series of "right" tests. My aim was not to prove that the canonical DB schema is slower than the alternative one. I want to bring the tests closer to real workloads and look at some interesting dependencies (see attach). PS. RAID5 also played a negative role. For the new tests I will use RAID1+0. -- with best regards |