From: Rudolph B. <r...@bo...> - 2013-11-29 09:57:37
|
Hi List, our bacula director (Debian Squeeze, squeeze-backports, Version 5.2.6) keeps crashing during the nightly backup runs (>400 jobs). I looked at the traceback from last nights crash and it looks rather strange: Here is a small excerpt from the traceback: Attempt to dump current JCRs. njcrs=226 threadid=(nil) JobId=0 JobStatus=R jcr=0xc41f38 name=*JobMonitor*.2013-11-28_17.13.43_01 threadid=(nil) killable=0 JobId=0 JobStatus=R jcr=0xc41f38 name=*JobMonitor*.2013-11-28_17.13.43_01 use_count=1 JobType=I JobLevel= sched_time=28-Nov-2013 17:13 start_time=28-Nov-2013 17:13 end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00 db=(nil) db_batch=(nil) batch_started=0 threadid=(nil) JobId=472723 JobStatus=R jcr=0xd0aca8 name=Some-Job-XY.2013-11-29_00.10.00_03 threadid=(nil) killable=0 JobId=472723 JobStatus=R jcr=0xd0aca8 name=Some-Job-XY.2013-11-29_00.10.00_03 use_count=2 JobType=B JobLevel=F sched_time=29-Nov-2013 00:10 start_time=29-Nov-2013 00:10 end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00 db=0xc46918 db_batch=(nil) batch_started=0 B_DB=0xc46918 db_name=bacula db_user=bacula connected=true cmd="UPDATE Job SET JobStatus='R',Level='I',StartTime='2013-11-29 01:32:42',ClientId=52,JobTDate=1385685162,PoolId=7,FileSetId=54 WHERE JobId=472811" changes=0 RWLOCK=0xc46920 w_active=1 w_wait=4 threadid=(nil) JobId=472724 JobStatus=R jcr=0xd0d188 name=Some-Job-ZZ.2013-11-29_00.10.00_04 threadid=(nil) killable=0 JobId=472724 JobStatus=R jcr=0xd0d188 name=Some-Job-ZZ.2013-11-29_00.10.00_04 use_count=2 JobType=B JobLevel=F sched_time=29-Nov-2013 00:10 start_time=29-Nov-2013 00:10 end_time=01-Jan-1970 01:00 wait_time=01-Jan-1970 01:00 db=0xc46918 db_batch=(nil) batch_started=0 B_DB=0xc46918 db_name=bacula db_user=bacula connected=true cmd="UPDATE Job SET JobStatus='R',Level='I',StartTime='2013-11-29 01:32:42',ClientId=52,JobTDate=1385685162,PoolId=7,FileSetId=54 WHERE JobId=472811" changes=0 RWLOCK=0xc46920 w_active=1 w_wait=4 I am not exactly sure what happens here, but this pattern repeats for roughly 1000 lines. Strange thing is - it always mentions a different Job, but always shows the exact same "UPDATE Job..."-Statement. The job that is actually referred to in the WHERE clause is also among the jobs mentioned. Do you have any idea what causes this beheaviour? Or is it expected? We have been running bacula director in foreground mode with daemontools to capture the debug output (level 50, which is already quite verbose but does not use too much disk i/o while logging). However, the crashes also always seem to happen right after a client connect: 2013-11-29 01:32:42.810051500 vlad01.sipgate.net: next_vol.c:253-472933 Vol=Volume-Name-0002 expired=0 2013-11-29 01:33:03.642079500 vlad01.sipgate.net: job.c:1365-472807 wstore=FileStorageForPool where= 2013-11-29 01:33:03.659857500 vlad01.sipgate.net: cram-md5.c:79-472807 send: auth cram-md5 <307015206.1385685183@bacula-host-name> ssl=2 2013-11-29 01:33:03.660066500 vlad01.sipgate.net: cram-md5.c:98-472807 Authenticate OK SomeSecretHashHere 2013-11-29 01:33:03.702799500 vlad01.sipgate.net: bnet.c:347-472807 TLS client negotiation established. 2013-11-29 01:33:03.783616500 Bacula interrupted by signal 11: Segmentation violation Another Example: 2013-11-28 01:48:10.456400500 vlad01.sipgate.net: next_vol.c:253-472391 Vol=Different-Volume-Name-0001 expired=0 2013-11-28 01:48:15.471433500 vlad01.sipgate.net: job.c:1365-472445 wstore=FileStorageForPool where= 2013-11-28 01:48:15.485142500 vlad01.sipgate.net: cram-md5.c:79-472445 send: auth cram-md5 <1436181388.1385599695@bacula-host-name> ssl=2 2013-11-28 01:48:15.485182500 vlad01.sipgate.net: cram-md5.c:98-472445 Authenticate OK SomeSecretHashHere 2013-11-28 01:48:15.527138500 vlad01.sipgate.net: bnet.c:347-472445 TLS client negotiation established. 2013-11-28 01:48:15.607948500 Bacula interrupted by signal 11: Segmentation violation Any hints on what could cause this behaviour? We are really clueless at this point... :-/ -- Mit freundlichen Grüßen / with kind regards Rudolph Bott |