From: Alex B. <a.b...@gm...> - 2009-11-02 11:36:32
Hi again,

2009/10/29 Alex Bramley <a.b...@gm...>:
> http://bugs.bacula.org/view.php?id=1399

Can anyone tell me if this will affect jobs run from a schedule in the same manner?

Also (and this might be better dealt with in a separate thread, I'm not sure) I had bacula-sd get into a hung state on Friday's test backup run. This has caused no end of fun for me this morning -- volume size mismatches for all clients, meaning I have had to delete volumes from disk and catalog and then re-run the backups manually.

The culprit appears to me to be the second of the lines below. They all come from bacula-sd.log, just before it stopped responding:

30-Oct 21:01 bksrv0-sd JobId 157: Job jim-desktop.2009-10-30_21.00.00_38 marked to be canceled.
30-Oct 21:01 bksrv0-sd JobId 157: Fatal error: fd_cmds.c:177 FD command not found:
30-Oct 21:01 bksrv0-sd JobId 157: Job write elapsed time = 00:01:01, Transfer rate = 9.387 M bytes/second
30-Oct 21:01 bksrv0-sd JobId 157: Fatal error: append.c:292 Fatal append error on device "jim-desktop" (/backup/volumes/jim-desktop/): ERR=dev.c:532 Could not open: /backup/volumes/jim-desktop/jim-desktop-0045, ERR=No such file or directory

I believe the backup was cancelled due to a mount timeout (I have Max Wait Time = 1 minute set), but as the volume was created with no problems I don't know why this would happen:

30-Oct 21:00 bksrv0-sd JobId 157: Warning: dev.c:534 dev.c:532 Could not open: /backup/volumes/jim-desktop/jim-desktop-0045, ERR=No such file or directory
30-Oct 21:00 bksrv0-sd JobId 157: Warning: dev.c:534 dev.c:532 Could not open: /backup/volumes/jim-desktop/jim-desktop-0045, ERR=No such file or directory
30-Oct 21:00 bksrv0-sd JobId 157: Labeled new Volume "jim-desktop-0045" on device "jim-desktop" (/backup/volumes/jim-desktop/).
30-Oct 21:00 bksrv0-sd JobId 157: Wrote label to prelabeled Volume "jim-desktop-0045" on device "jim-desktop" (/backup/volumes/jim-desktop/)

Should I be getting that "Could not open" error twice? It seems a little suspect to me also -- maybe multiple things are trying to open the volume and only one is succeeding, leading the other to cancel the job.

The "FD command not found" error shouldn't be occurring at all as far as I can tell (having dug through the source), because if fd->cmd is zero-length, shouldn't the error-checking in lines 150-155 of fd_cmds.c catch this?

I looked through bugzilla and I'm wondering if this is related to http://bugs.bacula.org/view.php?id=1371 maybe? If so I'll leave it for now, as I'm planning to move to 3.0.3 this week.

Thanks for any advice :-)

--Alex