From: Brian K. <bri...@va...> - 2006-01-05 22:45:45

Hey, all.

We're seeing one of our Bacula machines acting up in a very strange manner. The error below occurs when one of our machines tries to back itself up. It's a Red Hat Enterprise Linux machine. Initially, it was running Bacula 1.36.1. I upgraded to 1.38.3 in the hopes that this was a bug that had already been fixed, but we're seeing the same results under 1.38.3. I've also tried both full and incremental backups - both behave the same.

The director, storage daemon, and file daemon are all on the same machine. There are no iptables rules whatsoever on the machine, aside from the default-to-accept rules.

When I do a 'status client' from the console while the job is running, it's always stuck very early on:

Running Jobs:
JobId 440 Job server.2006-01-05_16.36.41 is running.
    Backup Job started: 05-Jan-06 16:36
    Files=497 Bytes=16,452 Bytes/sec=22
    Files Examined=505
    Processing file: /dev/cciss/c0d6
    SDReadSeqNo=5 fd=7
Director connected at: 05-Jan-06 16:48

Eventually, the job times out. The end of job summary looks like this:

05-Jan 16:54 server-fd: server.2006-01-05_16.36.41 Fatal error: backup.c:654 Network send error to SD. ERR=Connection timed out
05-Jan 16:55 server-dir: server.2006-01-05_16.36.41 Error: Bacula 1.38.3 (04Jan06): 05-Jan-2006 16:55:01
  JobId:                  440
  Job:                    server.2006-01-05_16.36.41
  Backup Level:           Full
  Client:                 "server-fd" i686-pc-linux-gnu,redhat,Enterprise release
  FileSet:                "Full Set" 2005-08-18 13:44:50
  Pool:                   "Default"
  Storage:                "File"
  Scheduled time:         05-Jan-2006 16:36:35
  Start time:             05-Jan-2006 16:36:43
  End time:               05-Jan-2006 16:55:01
  Priority:               10
  FD Files Written:       497
  SD Files Written:       0
  FD Bytes Written:       16,452
  SD Bytes Written:       0
  Rate:                   0.0 KB/s
  Software Compression:   None
  Volume name(s):
  Volume Session Id:      2
  Volume Session Time:    1136496459
  Last Volume Bytes:      85,443
  Non-fatal FD errors:    0
  SD Errors:              0
  FD termination status:  Error
  SD termination status:  Running
  Termination:            *** Backup Error ***

Another interesting point is that even after we get this error message, a 'status storage' shows that the SD thinks the job is still running:

Running Jobs:
Writing: Full Backup job server JobId=440 Volume="server_ide_0038"
    pool="Default" device=""FileStorage" (/backup)"
    Files=9 Bytes=631 Bytes/sec=0
    FDReadSeqNo=35 in_msg=25 out_msg=5 fd=7

Any insight on what could cause a problem like this (or suggestions on how to fix it =) ) would be greatly appreciated.

Thanks!

-Brian