From: John H. <Jo...@mi...> - 2008-10-07 02:51:52
Hi,

My first tests are complete. To keep all the questions, answers, and results
in one place, I've made a wiki entry:

http://wiki.bacula.org/doku.php?id=wiki:playground

My scripts are attached.

--John
From: Kern S. <ke...@si...> - 2008-10-07 07:56:05
On Tuesday 07 October 2008 04:50:50 John Huttley wrote:
> My first test are complete.
> To aid in getting all the questions/answers and results in one place,
> I've made a wiki entry.
>
> http://wiki.bacula.org/doku.php?id=wiki:playground
>
> My scripts are attached.

Can you present a concise summary of the times to insert with and without
the extra fields, and the sizes of the resulting databases?

Then people can start to form some idea of the possible costs of this
proposal. Also, at some point, if we really want to proceed, we *may* need
similar tests with MySQL, but that is not necessary for the moment.

Regards,

Kern
From: John H. <Jo...@mi...> - 2008-10-07 18:28:49
Kern,

All the detail I have is on the wiki but, to repeat: adding two datetime
fields and a long integer takes the insert time from 520 to 532 minutes,
an increase of 2.3%. The size increase of the DB is too small to detect on
my system, so it is under 1%.

--john
From: Eric B. <er...@eb...> - 2008-10-07 08:19:09
Hello John,

Another solution, which requires no additional code and no modification of
the database structure, is to use a PL function to extract mtime, ctime,
size, block_size, etc. from the LStat field. This function is available on
PostgreSQL and now on MySQL (thanks to Kjetil).

The only drawback I see with this PL function is that you *should* avoid
using it in WHERE or JOIN clauses. But it works very well for reporting
and analysis.

Bye
From: Kjetil T. H. <kje...@li...> - 2008-10-07 12:55:30
John Huttley <Jo...@mi...> writes:
> http://wiki.bacula.org/doku.php?id=wiki:playground

"ctime Timestamp [...] The datetime it was created."

ctime is the inode change time on Unix clients, and the creation time on
Windows clients. This difference alone makes the value of the field
suspect.

-- 
regards,        | Redpill  _
Kjetil T. Homme | Linpro  (_)
From: John H. <Jo...@mi...> - 2008-10-07 18:31:28
Yes,

That was next on my list. It's not something I've done before, so I'm glad
to hear that someone else has done it.

Where will I find this code?

Regards,

John

Eric Bollengier wrote:
> An other solution that doesn't require any additional code and no database
> structure modification, is to use a PL function to extract mtime, ctime,
> size, block_size etc... from LStat field.
From: Eric B. <er...@eb...> - 2008-10-07 18:43:25
Take a look at trunk/gui/bweb/script/bweb-postgresql.sql and
bweb-mysql.sql, function base64_decode_lstat().

Bye
From: Yuri T. <ti...@gm...> - 2008-10-08 15:12:26
Good work! But... I think the test is not quite correct. ;)

I am talking about the "COPY" command in PostgreSQL. In other words, I
think that COPY and INSERT perform differently: "COPY" should be faster
than individual "INSERT" statements in a real Bacula job. I may be wrong.

To make the most correct test, we should take the Bacula source code and,
based on it, make a gen_filetable.c, for example, that uses INSERTs
instead of COPY.

At the moment I have a server (still completely empty and sufficiently
powerful) where I can create a database of 100-200 GB or more to run
tests. I need the source code of gen_filetable.c. I am a novice C
programmer, so I do not promise to write the code quickly. ;)

2008/10/7 John Huttley <Jo...@mi...>:
> My first test are complete.
> To aid in getting all the questions/answers and results in one place, I've
> made a wiki entry.
>
> http://wiki.bacula.org/doku.php?id=wiki:playground
>
> My scripts are attached.
>
> --John

// John's generator script (attached to his message):
#include <stdio.h>
#include <stdlib.h>     // random()

#define ROWS 1000000

int main(void)
{
    long int I;
    int row_offset = 2000000000;

    // printf("COPY file (fileid, fileindex, jobid, pathid, filenameid, "
    //        "markid, lstat, md5, size, mtime, ctime) FROM stdin;\n");
    printf("COPY file (fileid, fileindex, jobid, pathid, filenameid, "
           "markid, lstat, md5) FROM stdin;\n");
    for (I = 0; I < ROWS; I++) {
        printf("%ld\t%d\t%ld\t%ld\t%d\t%d\t%s\t%s\n",
     // printf("%ld\t%d\t%ld\t%ld\t%d\t%d\t%s\t%s\t%d\t%s\t%s\n",
               I + row_offset,        // fileid
               3,                     // fileindex
               random(),              // jobid
               random(),              // pathid
               1,                     // filenameid
               1,                     // markid
               "LSTATDATA",           // lstat
               "MD512345678901234"    // md5
            // ,0,                    // size
            // "2008-09-01T01:02:04", // ctime
            // "2008-09-01T01:02:05"  // mtime
               );
    }
    return 0;
}

-------------------------------------------------------------------------
This SF.Net email is sponsored by the Moblin Your Move Developer's challenge
Build the coolest Linux based applications with Moblin SDK & win great prizes
Grand prize is a trip for two to an Open Source event anywhere in the world
http://moblin-contest.org/redirect.php?banner_id=100&url=/
_______________________________________________
Bacula-devel mailing list
Bac...@li...
https://lists.sourceforge.net/lists/listinfo/bacula-devel

-- 
with best regards
From: Dan L. <da...@la...> - 2008-10-08 15:29:09
On Oct 8, 2008, at 11:12 AM, Yuri Timofeev wrote:
> I think that COPY and INSERT performed differently.
> "COPY" must be implemented faster than individual "INSERT" into real
> Bacula Job.

Yes, COPY, if doing multiple rows, should be faster than INSERT.

> I may be wrong.

I may be too. I doubt it.

-- 
Dan Langille
http://langille.org/
From: John H. <Jo...@mi...> - 2008-10-08 18:43:17
Does anyone think we need ctime? Otherwise I can run the tests without it.

--John
From: Jesper K. <je...@kr...> - 2008-10-08 19:26:12
Yuri Timofeev wrote:
> I think that COPY and INSERT performed differently.
> "COPY" must be implemented faster than individual "INSERT" into real
> Bacula Job.
>
> To make the most correct test, we must take the bacula source code and
> based on it to make gen_filetable.c, for example.
> We must use INSERTs instead COPY.

INSERT in a transaction should perform "similar" to COPY; you could even
do an INSERT with multiple tuples. But single inserts will absolutely be
slower than COPY, due to parsing/copying and transaction overhead.

-- 
Jesper
From: Yuri T. <ti...@gm...> - 2008-10-10 06:08:55
Sorry, I was wrong. I am now looking into the Bacula source code: when
working with PostgreSQL it uses 'COPY batch FROM STDIN' (not INSERT ;).

I have now prepared and launched a test for MySQL. I look forward to when
it will end. ;)

-- 
with best regards
From: Yuri T. <ti...@gm...> - 2008-10-14 07:22:50
Attachments:
test.tar.gz
Hi,

My first tests are complete. ;)

Summary: MySQL, ProLiant DL180G5, CentOS 5.2.

1. Batch inserts of 10M unique(!) records into the 'File' table (DB schema
   not changed): 39 hours (23+16), DB file size 15.4 GB.
2. Batch inserts of 10M unique(!) records into the 'File' table (NEW DB
   schema, see also create_mysql_test2.sh): 35 hours (19+16), DB file size
   15.2 GB.

Source code attached.

Please note that the length of time (16 hours) to insert records from the
'batch' table into the 'File' table is the same in both runs. This
indicates the stability of the results.

Full report:

# ./test
Date: Oct 10 2008 11:27:37
Host: Localhost via UNIX socket
Use database: 'bacula_test'.
Temporary table 'batch' created.
Start filling tables 'batch'(temporary), 'Path', 'Filename' ...
End.
10000000 records inserts.
*** CPUs time used : 375.880000
*** Elapsed time (wall clock) : 85334.000000

Start inserts from temporary table 'batch' to 'File' ...
End inserts to 'File' ...
*** CPUs time used : 0.000000
*** Elapsed time (wall clock) : 58838.000000

# ./test2
Date: Oct 10 2008 11:27:37
Host: Localhost via UNIX socket
Use database: 'bacula_test2'.
Temporary table 'batch' created.
Start filling tables 'batch'(temporary), 'Path', 'Filename' ...
End.
10000000 records inserts.
*** CPUs time used : 384.650000
*** Elapsed time (wall clock) : 71670.000000

Start inserts from temporary table 'batch' to 'File' ...
End inserts to 'File' ...
*** CPUs time used : 0.000000
*** Elapsed time (wall clock) : 58564.000000

-- 
with best regards
From: Kern S. <ke...@si...> - 2008-10-14 09:37:41
Hello,

Could you summarize the differences in your schemas? From what I am
seeing, it looks like your first test, which ran 39 hours, is a more or
less standard Bacula File table, and the second test (NEW DB), which ran
35 hours, is a standard Bacula File table with additional fields, yet it
produces a smaller size. If that is true, then I don't understand the
timing differences.

Regards,

Kern

On Tuesday 14 October 2008 09:22:44 Yuri Timofeev wrote:
> 1. Batch inserts 10M unique! records in table 'File' (DB schema not
> changed) : 39 hours (23+16), DB file size 15,4 Gb
> 2. Batch inserts 10M unique! records in table 'File' (NEW DB schema
> see also create_mysql_test2.sh) : 35 hours (19+16), DB file size 15,2 Gb
From: John H. <Jo...@mi...> - 2008-10-14 07:39:44
So the modified version is actually a bit faster? That's odd.

Are you running 64-bit? What's your RAM? Which version of MySQL?

I'll run it on my system also.

Regards,

john
From: Yuri T. <ti...@gm...> - 2008-10-14 07:59:13
2008/10/14 John Huttley <Jo...@mi...>:
> So the modified version is actually a bit faster?

Well, yes.

> Thats odd.

My guess is that MySQL is slow with fields of type BLOB. In the
alternative schema the new fields size, ctime and mtime appear, so I
reduced the length of the value inserted into the LStat field:

char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I";

instead of

char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E";

> are you running 64Bit?

Yes.

> Whats your Ram?

8 GB, but MySQL does not use the entire memory. The server is used only
for this test.

> Version of Mysql?

mysql Ver 14.12 Distrib 5.0.45, for redhat-linux-gnu (x86_64) using
readline 5.0. /etc/my.cnf was taken from
/usr/share/doc/mysql-server-5.0.45/my-huge.cnf.
From: Kern S. <ke...@si...> - 2008-10-14 07:56:26
On Tuesday 14 October 2008 09:39:27 John Huttley wrote:
> So the modified version is actually a bit faster?

That is what I understood too, but I wanted to get confirmation.

If it is indeed the case that the new schema runs faster, it is indeed
odd, and I would say the tester has fallen into a trap that is very common
in performance analysis. To get reliable test results on a memory-caching
machine such as Linux, you must run each test 10 times, throw out the two
runs whose times differ the most from the average, then re-compute the
average based on the remaining 8 samples, which should not vary by more
than a few percent.

Another aid is to run one set of 10 tests, then the other set of 10 tests,
then the first set of 10 tests again, and make sure the two runs of the
first set generate the same results (using the methodology mentioned
above). If you want a better measure of disk times, you can sprinkle
"sync" shell commands between the 10 test runs.

Regards,

Kern
From: Yuri T. <ti...@gm...> - 2008-10-14 08:06:33
2008/10/14 Kern Sibbald <ke...@si...>:
> If it is indeed the case that the new case runs faster, it is indeed odd,
> and I would say the tester has fallen into a very trap that is very
> common in performance analysis.

Of course, this is only the first test! I think I will soon be able to run
a series of tests. I am just reducing the number of records from 10M to 5M
(the wait is very long).

-- 
with best regards
From: Kern S. <ke...@si...> - 2008-10-14 12:33:27
|
On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: > 2008/10/14 Kern Sibbald <ke...@si...>: > > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: > >> So the modified version is actually a bit faster? > > > > That is what I understood too, but I wanted to get a confirmation. > > > > If it is indeed the case that the new case runs faster, it is indeed odd, > > and I would say the tester has fallen into a very trap that is very > > common in performance analysis. > > Of course, this is only the first test! > I think that will soon be able to hold a series of tests. > I just limiting the number of entries from 10M to 5M (very long wait) Yes, clearly running something 10 times is not very practical if it takes 35 hours each time, so the test size must be reduced, and you can reduce the number of runs from 10 to say 5. However, what was not at all evident from your first post is that there are apparently subtle differences in schemas that I did not see and differences in the size of the data you were inserting -- and those could possibly explain a large (or even all) the difference in timings. > > > That is to get reliable test results on a memory caching machine such as > > Linux, you must run each test 10 times, and throw out the two with the > > times that differ the most from the average, then re-compute the average > > based on the remaining 8 samples which should not vary more than a few > > percent. > > > > Another aid is to run one set of 10 tests, then the other set of 10 > > tests, then the first set of 10 tests again, and make sure the two runs > > of the first set generate the same results (using the methodology > > mentioned above). If you want to have a better measure of disk times, you > > can sprinkle "sync" shell commands between each of the 10 test runs. > > > > Regards, > > > > Kern > > > >> Thats odd. > >> > >> are you running 64Bit? > >> Whats your Ram? > >> Version of Mysql? > >> > >> > >> I'll run it on my system also. 
> >> > >> Regards, > >> > >> john > >> > >> Yuri Timofeev wrote: > >> > Hi > >> > > >> > 2008/10/7 John Huttley <Jo...@mi...>: > >> >> Hi > >> >> My first test are complete. > >> >> To aid in getting all the questions/answers and results in one place, > >> >> I've made a wiki entry. > >> >> > >> >> http://wiki.bacula.org/doku.php?id=wiki:playground > >> > > >> > My first test are complete. ;) > >> > > >> > Summary: > >> > > >> > MySQL, ProLiant DL180G5, CentOS 5.2 > >> > > >> > 1. Batch inserts 10M unique! records in table 'File' (DB schema not > >> > changed) : 39 hours (23+16), DB file size 15,4 Gb > >> > 2. Batch inserts 10M unique! records in table 'File' (NEW DB schema > >> > see also create_mysql_test2.sh) : 35 hours (19+16), DB file size 15,2 > >> > Gb > >> > > >> > Source code attached. > >> > > >> > Please note that the length of time (16 hours) insert records from the > >> > table 'batch' to the table 'File' the same. > >> > This indicates the stability of results. > >> > > >> > > >> > > >> > > >> > > >> > Full report: > >> > > >> > # ./test > >> > Date: Oct 10 2008 11:27:37 > >> > Host: Localhost via UNIX socket > >> > Use database: 'bacula_test'. > >> > Temporary table 'batch' created. > >> > Start filling tables 'batch'(temporary), 'Path', 'Filename' ... > >> > End. > >> > 10000000 records inserts. > >> > *** CPUs time used : 375.880000 > >> > *** Elapsed time (wall clock) : 85334.000000 > >> > > >> > Start inserts from temporary table 'batch' to 'File' ... > >> > End inserts to 'File' ... > >> > *** CPUs time used : 0.000000 > >> > *** Elapsed time (wall clock) : 58838.000000 > >> > > >> > > >> > > >> > > >> > # ./test2 > >> > Date: Oct 10 2008 11:27:37 > >> > Host: Localhost via UNIX socket > >> > Use database: 'bacula_test2'. > >> > Temporary table 'batch' created. > >> > Start filling tables 'batch'(temporary), 'Path', 'Filename' ... > >> > End. > >> > 10000000 records inserts. 
> >> > *** CPUs time used : 384.650000 > >> > *** Elapsed time (wall clock) : 71670.000000 > >> > > >> > Start inserts from temporary table 'batch' to 'File' ... > >> > End inserts to 'File' ... > >> > *** CPUs time used : 0.000000 > >> > *** Elapsed time (wall clock) : 58564.000000 > >> > > >> > _______________________________________________ > >> > Bacula-devel mailing list > >> > Bac...@li... > >> > https://lists.sourceforge.net/lists/listinfo/bacula-devel |
From: Yuri T. <ti...@gm...> - 2008-10-14 08:45:03
|
2008/10/14 Kern Sibbald <ke...@si...>: > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: >> 2008/10/14 Kern Sibbald <ke...@si...>: >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: >> >> So the modified version is actually a bit faster? >> > >> > That is what I understood too, but I wanted to get a confirmation. >> > >> > If it is indeed the case that the new case runs faster, it is indeed odd, >> > and I would say the tester has fallen into a trap that is very >> > common in performance analysis. >> >> Of course, this is only the first test! >> I think that soon I will be able to run a series of tests. >> I am just reducing the number of entries from 10M to 5M (very long wait) > > Yes, clearly running something 10 times is not very practical if it takes 35 > hours each time, so the test size must be reduced, and you can reduce the > number of runs from 10 to say 5. > > However, what was not at all evident from your first post is that there are > apparently subtle differences in schemas that I did not see and differences > in the size of the data you were inserting -- and those could possibly > explain a large part (or even all) of the difference in timings. > In the alternative schema, new fields appear: size, ctime, mtime. I therefore reduced the length of the value inserted into the LStat field. For the old schema I used: char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; and for the new schema: char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; But that was not entirely correct. In alternative schema 2 the new fields are: size, ctime, mtime, _atime_. In the new version of the tests I did it correctly: char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; /* for the traditional schema */ char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; /* for the new schema */ Therefore, in the alternative schema the length of LStat is reduced. Is that right? -- with best regards |
From: Kern S. <ke...@si...> - 2008-10-14 08:56:38
|
On Tuesday 14 October 2008 10:42:22 Yuri Timofeev wrote: > 2008/10/14 Kern Sibbald <ke...@si...>: > > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: > >> 2008/10/14 Kern Sibbald <ke...@si...>: > >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: > >> >> So the modified version is actually a bit faster? > >> > > >> > That is what I understood too, but I wanted to get a confirmation. > >> > > >> > If it is indeed the case that the new case runs faster, it is indeed > >> > odd, and I would say the tester has fallen into a very trap that is > >> > very common in performance analysis. > >> > >> Of course, this is only the first test! > >> I think that will soon be able to hold a series of tests. > >> I just limiting the number of entries from 10M to 5M (very long wait) > > > > Yes, clearly running something 10 times is not very practical if it takes > > 35 hours each time, so the test size must be reduced, and you can reduce > > the number of runs from 10 to say 5. > > > > However, what was not at all evident from your first post is that there > > are apparently subtle differences in schemas that I did not see and > > differences in the size of the data you were inserting -- and those could > > possibly explain a large (or even all) the difference in timings. > > In an alternative scheme appear new fields : size, ctime, mtime. > I therefore reduced length the value that is inserted into the field LStat. > > For the old scheme, I used : > char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; > > and for the new scheme: > char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; > > But it is not entirely correct. > > > > In an alternative scheme2 appear new fields : size, ctime, mtime, _atime_. 
> > The new version of the tests, I did as correctly: > char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A > E"; /* for traditional scheme */ > char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; > /* for new scheme */ > > Therefore, in alternative scheme the length of lstat reduced. > That is right? I would not say it is a question of being right or not. It is a possibility, but it would be a big effort to eliminate those fields -- first there is the programming problem of finding every place they are accessed and ensuring the new fields are used, but even more important, there is the problem of converting existing databases from the old scheme to the new one. |
From: Yuri T. <ti...@gm...> - 2008-10-14 09:14:57
|
2008/10/14 Kern Sibbald <ke...@si...>: > On Tuesday 14 October 2008 10:42:22 Yuri Timofeev wrote: >> 2008/10/14 Kern Sibbald <ke...@si...>: >> > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: >> >> 2008/10/14 Kern Sibbald <ke...@si...>: >> >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: >> >> >> So the modified version is actually a bit faster? >> >> > >> >> > That is what I understood too, but I wanted to get a confirmation. >> >> > >> >> > If it is indeed the case that the new case runs faster, it is indeed >> >> > odd, and I would say the tester has fallen into a very trap that is >> >> > very common in performance analysis. >> >> >> >> Of course, this is only the first test! >> >> I think that will soon be able to hold a series of tests. >> >> I just limiting the number of entries from 10M to 5M (very long wait) >> > >> > Yes, clearly running something 10 times is not very practical if it takes >> > 35 hours each time, so the test size must be reduced, and you can reduce >> > the number of runs from 10 to say 5. >> > >> > However, what was not at all evident from your first post is that there >> > are apparently subtle differences in schemas that I did not see and >> > differences in the size of the data you were inserting -- and those could >> > possibly explain a large (or even all) the difference in timings. >> >> In an alternative scheme appear new fields : size, ctime, mtime. >> I therefore reduced length the value that is inserted into the field LStat. >> >> For the old scheme, I used : >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A E"; >> >> and for the new scheme: >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; >> >> But it is not entirely correct. >> >> >> >> In an alternative scheme2 appear new fields : size, ctime, mtime, _atime_. 
>> >> The new version of the tests, I did as correctly: >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A >> E"; /* for traditional scheme */ >> char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; >> /* for new scheme */ >> >> Therefore, in alternative scheme the length of lstat reduced. >> That is right? > > I would not say it is a question of being right or not. It is a possibility, > but it would be a big effort to eliminate those fields -- first there is the > programming problem of finding every place they are accessed and ensuring the > new fields are used, but even more important, there is the problem of > converting existing databases from the old scheme to the new one. > > > Yes, I agree. Perhaps these studies will never be translated into bacula source code. However, it is interesting. -- with best regards |
From: Kern S. <ke...@si...> - 2008-10-14 09:38:22
|
On Tuesday 14 October 2008 11:12:17 Yuri Timofeev wrote: > 2008/10/14 Kern Sibbald <ke...@si...>: > > On Tuesday 14 October 2008 10:42:22 Yuri Timofeev wrote: > >> 2008/10/14 Kern Sibbald <ke...@si...>: > >> > On Tuesday 14 October 2008 10:06:22 Yuri Timofeev wrote: > >> >> 2008/10/14 Kern Sibbald <ke...@si...>: > >> >> > On Tuesday 14 October 2008 09:39:27 John Huttley wrote: > >> >> >> So the modified version is actually a bit faster? > >> >> > > >> >> > That is what I understood too, but I wanted to get a confirmation. > >> >> > > >> >> > If it is indeed the case that the new case runs faster, it is > >> >> > indeed odd, and I would say the tester has fallen into a very trap > >> >> > that is very common in performance analysis. > >> >> > >> >> Of course, this is only the first test! > >> >> I think that will soon be able to hold a series of tests. > >> >> I just limiting the number of entries from 10M to 5M (very long wait) > >> > > >> > Yes, clearly running something 10 times is not very practical if it > >> > takes 35 hours each time, so the test size must be reduced, and you > >> > can reduce the number of runs from 10 to say 5. > >> > > >> > However, what was not at all evident from your first post is that > >> > there are apparently subtle differences in schemas that I did not see > >> > and differences in the size of the data you were inserting -- and > >> > those could possibly explain a large (or even all) the difference in > >> > timings. > >> > >> In an alternative scheme appear new fields : size, ctime, mtime. > >> I therefore reduced length the value that is inserted into the field > >> LStat. > >> > >> For the old scheme, I used : > >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A > >> E"; > >> > >> and for the new scheme: > >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I"; > >> > >> But it is not entirely correct. > >> > >> > >> > >> In an alternative scheme2 appear new fields : size, ctime, mtime, > >> _atime_. 
> >> > >> The new version of the tests, I did as correctly: > >> char *lstat = "MI s9MB IG0 B H2 H0 A 9t BAA I BIVsDs BIR93m BIVqaC A A > >> E"; /* for traditional scheme */ > >> char *lstat = "MI s9MB IG0 B H2 H0 A BAA I BIVsDs BIR93m BIVqaC"; > >> /* for new scheme */ > >> > >> Therefore, in alternative scheme the length of lstat reduced. > >> That is right? > > > > I would not say it is a question of being right or not. It is a > > possibility, but it would be a big effort to eliminate those fields -- > > first there is the programming problem of finding every place they are > > accessed and ensuring the new fields are used, but even more important, > > there is the problem of converting existing databases from the old scheme > > to the new one. > > Yes, I agree. > Perhaps these studies will never be translated into bacula source code. > However, it is interesting. Yes, it is very interesting. We are very likely going to set certain of the LStat fields to zero before the next release. This will have the effect of compressing the LStat record without removing any of the fields. This could give us up to 8-9 bytes gain (smaller) per File record, which will more than compensate the fact that we will be switching from 32 bit FileIds to 64 bit FileIds. Switching to 64 bit FileIds will be an important database change ... Regards, Kern |
From: Yuri T. <ti...@gm...> - 2008-10-19 10:31:54
|
Hi, baculamaniacs ;) If you remember, I ran a series of tests on Bacula's insert speed. I compared two DB schemas: canonical vs. alternative (splitting LStat into several columns). The good news: the canonical DB schema won; it was faster. I have not written up the details yet. However (and this is the "bad" news ;), I concluded that my tests were not correct, for the following reasons. When I changed the test to generate 5M _unique_ Filename and Path values and ran (as real Bacula does): INSERT INTO Path (Path) SELECT a.Path FROM (SELECT DISTINCT Path FROM batch) AS a WHERE NOT EXISTS (SELECT Path FROM Path AS p WHERE p.Path = a.Path) my MySQL crashed. So I think the time I spent on those "tests" was wasted ;( To continue: my Catalog database has the following statistics: Job 2,823 records; File 11,338,602 records; Filename 2,188,445; Path 25,929. Each Job has an average of 4016 files (entries in the File table). For each file (one entry in the File table) there are, on average, 0.1930 entries in the Filename table and 0.0022868 entries in the Path table. The tests need to use a similar proportion; that is, for example, 10M entries in the File table should come with 1,930,000 entries in the Filename table and 22,868 entries in the Path table. In that case the test will be very similar to Bacula's real workload. I am going to write a message to the bacula-users mailing list and ask people to send me samples of: select count(*) from Job; select count(*) from File; select count(*) from Filename; select count(*) from Path; in order to calculate the average proportions of Jobs, Files, etc., and then run a new series of "right" tests. My aim was not to prove that the canonical DB schema is slower than the alternative one. I want to bring the tests closer to real workloads and look at some interesting dependencies (see attach). PS. RAID5 also played a negative role. For the new tests I will use RAID1+0. -- with best regards |