The following bug has been REOPENED.
======================================================================
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000645
======================================================================
Reported By: quelrod
Assigned To:
======================================================================
Project: bacula
Bug ID: 645
Category: Director
Reproducibility: random
Severity: crash
Priority: normal
Status: feedback
======================================================================
Date Submitted: 07-19-2006 07:40 PDT
Last Modified: 07-27-2006 09:39 PDT
======================================================================
Summary: bacula crashes some nights when backups kick off
Description:
Some nights backups are fine, other nights they aren't. I ran bacula-dir
in gdb and waited for a crash. The first night it was running in gdb it
didn't crash. However, last night it did. bconsole is hung and it is
sort of still running but it can't do anything and you can't connect to
it.
======================================================================
----------------------------------------------------------------------
quelrod - 07-19-2006 07:43 PDT
----------------------------------------------------------------------
I forgot to mention some other strange behavior related to this. After it
crashes doing a status dir in bconsole doesn't show the last successful
backups. Before the crash they showed up in the status dir output.
Also, I failed to mention the log shows absolutely nothing.
----------------------------------------------------------------------
quelrod - 07-21-2006 07:09 PDT
----------------------------------------------------------------------
I verified the crash is still present in 1.38.11.
bacula-dir ran in the debugger fine the first night, but the second night
crashed. Latest gdb bt below:
Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 1077783456 (LWP 28820)]
0xffffe410 in __kernel_vsyscall ()
(gdb)
Thread 4 (Thread 1084386224 (LWP 28840)):
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000000 0xffffe410 in __kernel_vsyscall ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000001 0x4015c3fb in __read_nocancel () from /lib/tls/libpthread.so.0
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000002 0x080963d3 in read_nbytes ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000003 0x080977cd in bnet_recv ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000004 0x0807f846 in handle_UA_client_request ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000005 0x080b4b79 in workq_server ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000006 0x40157297 in start_thread () from /lib/tls/libpthread.so.0
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000007 0x4033037e in clone () from /lib/tls/libc.so.6
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000008 0x40a26bb0 in ?? ()
Thread 3 (Thread 1082284976 (LWP 28826)):
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000000 0xffffe410 in __kernel_vsyscall ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000001 0x40159cec in pthread_cond_timedwait@... ()
from /lib/tls/libpthread.so.0
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000002 0x080b3e3a in watchdog_thread ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000003 0x40157297 in start_thread () from /lib/tls/libpthread.so.0
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000004 0x4033037e in clone () from /lib/tls/libc.so.6
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000005 0x40825bb0 in ?? ()
Thread 2 (Thread 1080183728 (LWP 28825)):
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000000 0xffffe410 in __kernel_vsyscall ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000001 0x40328f61 in ___newselect_nocancel () from /lib/tls/libc.so.6
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000002 0x08099694 in bnet_thread_server ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000003 0x0807f556 in connect_thread ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000004 0x40157297 in start_thread () from /lib/tls/libpthread.so.0
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000005 0x4033037e in clone () from /lib/tls/libc.so.6
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000006 0x40624bb0 in ?? ()
Thread 1 (Thread 1077783456 (LWP 28820)):
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000000 0xffffe410 in __kernel_vsyscall ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000001 0x4015c37b in __write_nocancel () from /lib/tls/libpthread.so.0
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000002 0x4005d96d in vio_write () from /usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000003 0x4005e0a8 in net_real_write () from /usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000004 0x4005e2f8 in net_flush () from /usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000005 0x4005e855 in net_write_command () from
/usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000006 0x4005c16b in cli_advanced_command () from
/usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000007 0x400593bd in mysql_send_query () from
/usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000008 0x40059460 in mysql_real_query () from
/usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000009 0x40030cf5 in mysql_query () from /usr/lib/libmysqlclient_r.so.14
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000010 0x08088958 in InsertDB ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000011 0x08089da7 in db_create_job_record ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000012 0x0805d506 in run_job ()
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000013 0x0804d3c2 in main ()
Here is the stranger part and again I have disappearing backups in status
dir.
Before the crash I had backups from 19-Jul and 20-Jul that ran. After the
crash all of those backups fail to appear in status dir.
----------------------------------------------------------------------
kern - 07-23-2006 01:55 PDT
----------------------------------------------------------------------
I don't believe that this is a bug. You have not shown any job report
output, and you have shown no evidence that Bacula has "crashed". What
appears to have happened is a SIGPIPE, which a communications error that
is out of Bacula's control, and does not cause Bacula to crash.
I suggest you submit this as a support request. See the support page of
the web site: http://www.bacula.org
Re-open this bug only after getting support and only if you have solid
evidence of a Bacula crash.
----------------------------------------------------------------------
quelrod - 07-27-2006 09:39 PDT
----------------------------------------------------------------------
I have tried other support channels: irc, mailing lists, etc. In general
no one seems interested in the problem.
In any case shouldn't bacula handle the SIGPIPE instead of having the
console hang?
After the problem:
treebeard:/usr/bacula/bin # ./bconsole
Connecting to Director treebeard:9101
and it sits there. I cannot ctrl-c the thing and have to kill the pid.
Of course if it couldn't connect it would immediately indicate that.
Also, here is part of my logfile:
26-Jul 08:22 treebeard-dir: Start Backup JobId 249,
Job=NS2_Backup.2006-07-25_20
.00.08
26-Jul 08:22 treebeard-dir: Bacula 1.38.11 (28Jun06): 26-Jul-2006
08:22:30
JobId: 249
Job: NS2_Backup.2006-07-25_20.00.08
Backup Level: Incremental, since=2006-07-24 10:02:04
Client: "ns2-fd" i386-pc-linux-gnu,debian,3.1
FileSet: "NS2" 2006-07-12 10:21:43
Pool: "Inc-Pool"
Storage: "Dell132t"
Scheduled time: 25-Jul-2006 20:00:07
Start time: 26-Jul-2006 08:22:12
End time: 26-Jul-2006 08:22:30
Elapsed time: 18 secs
Priority: 16
FD Files Written: 79
SD Files Written: 79
FD Bytes Written: 20,488,692 (20.48 MB)
SD Bytes Written: 20,496,742 (20.49 MB)
Rate: 1138.3 KB/s
Software Compression: None
Volume name(s): 000016
Volume Session Id: 28
Volume Session Time: 1153320579
Last Volume Bytes: 86,255,010,163 (86.25 GB)
Non-fatal FD errors: 0
SD Errors: 0
FD termination status: OK
SD termination status: OK
Termination: Backup OK
Notice the date Jul 26?
status dir shows the last backup as:
199 Full 30,492 327,135,920 OK 12-Jul-06 10:30
NS2_Backup
Whether the SIGPIPE can be fully handled the issue of jobs running but
then not showing up in all status output is certainly problematic.
Here is what else is curious the system was running happily for a good
month without this problem. I wonder if some underlying library is
causing problems now? All backups are done locally so communications
errors shouldn't be network driven but would be some sort of software
problem.
Tell me what to do, try, information to provide, etc. as I want to get
this bug solved and get you everything you need to aide in that.
Also, the problem only seems to mainfest during nightly runs and I haven't
found a way to reproduce it. Also, sometimes it happens every night, other
times every other night. It's really tempting to think the systems are
cursed given how erratic the behavior is :P
Bug History
Date Modified Username Field Change
======================================================================
07-19-06 07:40 quelrod New Bug
07-19-06 07:40 quelrod Bug Monitored: quelrod
07-19-06 07:43 quelrod Bugnote Added: 0001791
07-21-06 07:09 quelrod Bugnote Added: 0001792
07-23-06 01:55 kern Bugnote Added: 0001793
07-23-06 01:55 kern Resolution open => not a bug
07-23-06 01:55 kern Status new => closed
07-27-06 09:39 quelrod Bugnote Added: 0001804
07-27-06 09:39 quelrod Resolution not a bug => reopened
07-27-06 09:39 quelrod Status closed => feedback
======================================================================
|