From: Martin S. <ma...@li...> - 2009-12-31 13:06:50
>>>>> On Wed, 30 Dec 2009 09:44:49 +0100, Jesper Krogh said:
>
> Martin Simmons wrote:
> >>>>>> On Tue, 29 Dec 2009 11:05:09 +0100, Jesper Krogh said:
> >> Kern Sibbald wrote:
> >>> The Kaboom chapter of the manual tells you how to run the Director under the
> >>> debugger. You can also attach to the Director while it is running, using:
> >>>
> >>> cd <bacula-binary-directory>
> >>> gdb bacula-dir <pid-of-director>
> >> A small month, the problem is still present.. takes hours to get "from
> >> done" and to actual restore starts. I've managed to get a backtrace:
> >>
> >> Thread 2 (Thread 0x42767950 (LWP 10832)):
> >> #0 0x000000000040a4a8 in add_findex (bsr=0x6dd468, JobId=32927, findex=132808) at bsr.c:554
> >> #1 0x0000000000432965 in restore_cmd (ua=0x6dcb08, cmd=<value optimized out>) at ua_restore.c:1094
> >> #2 0x0000000000425e56 in do_a_command (ua=0x6dcb08, cmd=0x6d5b10 "1") at ua_cmds.c:180
> >> #3 0x0000000000438781 in handle_UA_client_request (arg=<value optimized out>) at ua_server.c:147
> >> #4 0x000000000046cb8b in workq_server (arg=<value optimized out>) at workq.c:357
> >> #5 0x00007f233e9553f7 in start_thread () from /lib/libpthread.so.0
> >> #6 0x00007f233db1db4d in clone () from /lib/libc.so.6
> >> #7 0x0000000000000000 in ?? ()
> >>
> >> Repeating this with "continue/interrupt" gives the same trace but with
> >> different findex= values.
> >>
> >> The restore block looks like this:
> >>
> >> +--------+-------+-----------+-----------------+---------------------+------------+
> >> | JobId  | Level | JobFiles  | JobBytes        | StartTime           | VolumeName |
> >> +--------+-------+-----------+-----------------+---------------------+------------+
> >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 000779L3   |
> >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 000789L3   |
> >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 000804L3   |
> >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 001805L3   |
> >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 001806L3   |
> >> | 32,927 | F     | 3,183,314 | 684,965,311,013 | 2009-12-12 17:31:41 | 001807L3   |
> >> | 33,446 | D     | 136,256   | 50,695,957,124  | 2009-12-28 08:01:50 | 004048L4   |
> >> | 33,473 | I     | 1,224     | 16,023,974,683  | 2009-12-28 14:41:19 | 004059L4   |
> >> | 33,501 | I     | 11,188    | 24,448,676,227  | 2009-12-29 01:40:23 | 004059L4   |
> >> +--------+-------+-----------+-----------------+---------------------+------------+
> >>
> >> I'm on 2.4.3 and the bsr.c:554 is
> >>
> >> /* Walk down fi chain and find where to insert insert new FileIndex */
> >> for ( ; fi; fi=fi->next) {
> >>    if (findex == (fi->findex2 + 1)) { /* extend up */
> >>       RBSR_FINDEX *nfi;
> >>       fi->findex2 = findex;
> >>
> >> If I get some more time I'll try to add debug information to find out
> >> where it's actually looping. Suggestions are certainly welcome.
> >
> > It might be a variant of this problem:
> >
> > http://article.gmane.org/gmane.comp.bacula.user/54164/match=add%5ffindex
>
> It looks quite a lot like the same problem. But I did a diff of the
> bsr.c of the freshest one with the 2.4.3 one and there are no changes.
> Since an upgrade is non-reversible I would prefer not to be "forced" to
> do it but take it at a time where I had sufficient amount of testing time.
>
> Can you point to the changes that are supposed to deal with the problem?

AFAIK, bsr.c didn't change.  The fix was in the code that builds the tree,
which now sorts the items by FileIndex.  The function db_get_file_list was
added and is called by build_directory_tree in ua_restore.c.

__Martin
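
To illustrate why the order of the FileIndex values matters, here is a
minimal sketch in C.  It is not Bacula's code: range, range_list,
add_index, insert_sorted and insert_interleaved are hypothetical names,
and only the "extend up" case from the loop quoted above is modelled.
The assumption is simply that each new index is added by walking a
linked list of ranges from the head.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for one run of consecutive file indexes. */
typedef struct range {
    long first, last;
    struct range *next;
} range;

/* The list plus two counters used only to measure the cost. */
typedef struct {
    range *head;
    long nodes_visited;   /* list nodes examined across all inserts */
    long ranges;          /* number of nodes created in the list */
} range_list;

/* Add findex: extend an existing range upwards, or append a new one. */
static void add_index(range_list *l, long findex)
{
    range *tail = NULL;
    for (range *r = l->head; r != NULL; r = r->next) {
        l->nodes_visited++;
        if (findex >= r->first && findex <= r->last)
            return;                       /* already covered */
        if (findex == r->last + 1) {      /* extend up */
            r->last = findex;
            return;
        }
        tail = r;
    }
    /* No range could absorb findex, so append a one-element range. */
    range *n = malloc(sizeof *n);
    assert(n != NULL);
    n->first = n->last = findex;
    n->next = NULL;
    if (tail != NULL)
        tail->next = n;
    else
        l->head = n;
    l->ranges++;
}

/* Release the nodes but keep the counters so the caller can read them. */
static void free_ranges(range_list *l)
{
    while (l->head != NULL) {
        range *next = l->head->next;
        free(l->head);
        l->head = next;
    }
}

/* Insert 1..n in ascending order. */
static range_list insert_sorted(long n)
{
    range_list l = { NULL, 0, 0 };
    for (long i = 1; i <= n; i++)
        add_index(&l, i);
    free_ranges(&l);
    return l;
}

/* Insert the same values, all even ones first and then all odd ones. */
static range_list insert_interleaved(long n)
{
    range_list l = { NULL, 0, 0 };
    for (long i = 2; i <= n; i += 2)
        add_index(&l, i);
    for (long i = 1; i <= n; i += 2)
        add_index(&l, i);
    free_ranges(&l);
    return l;
}
```

Calling insert_sorted(1000) and insert_interleaved(1000) from a main()
and comparing nodes_visited shows the difference: the sorted run keeps a
single range that every insert extends at once, while the interleaved
run leaves hundreds of short ranges that later inserts have to walk
past.  With millions of files, as in the Full job above, that walk would
be expected to dominate.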