From: Mantis B. T. <no...@bu...> - 2009-08-26 09:47:14
The following issue has been CLOSED
======================================================================
http://bugs.bacula.org/view.php?id=1353
======================================================================
Reported By:                user100
Assigned To:
======================================================================
Project:                    bacula
Issue ID:                   1353
Category:                   Director
Reproducibility:            always
Severity:                   minor
Priority:                   normal
Status:                     closed
Resolution:                 not a bug
Fixed in Version:
======================================================================
Date Submitted:             2009-08-24 09:35 BST
Last Modified:              2009-08-26 10:46 BST
======================================================================
Summary:                    Proceed or cancel waiting jobs
Description:
Job 23783 did not start because the bacula-fd was not running. I think I stopped the fd manually and did not start it again on that PC after updating the bacula-fd on some clients before the weekend. I connected to the client and started the bacula-fd without any trouble, but it seems the bacula-dir does not care (any more) and does not proceed. The job was still waiting (and blocking other jobs) even though the bacula-fd was now running. I had no "Max Wait Time" set at the director, so I tried to cancel it.

Some output from bconsole:

Running Jobs:
Console connected at 24-Aug-09 09:32
 JobId Level   Name                       Status
======================================================================
 23783 Increme  swtest04.2009-08-21_22.10.01_10 is waiting for Client swtest04-fd to connect to Storage Autochanger
 23787 VolumeT  Verify_dmzxen01_-_Wurzel.2009-08-21_22.16.00_14 has been canceled
 23788 VolumeT  Verify_swconstructor.2009-08-21_22.16.00_15 is waiting execution
 23789 VolumeT  Verify_swdevsrv_-_Wurzel.2009-08-21_22.16.00_16 is waiting execution
 23790 VolumeT  Verify_swldap01.2009-08-21_22.16.00_17 is waiting execution
 23791 VolumeT  Verify_swlinbck2.2009-08-21_22.16.00_18 is waiting execution
 23792 VolumeT  Verify_swmesse2_-_Wurzel.2009-08-21_22.16.00_19 is waiting execution
 23793 VolumeT  Verify_swserv01_-_E.2009-08-21_22.16.00_20 is waiting execution
 23794 VolumeT  Verify_swserv02_-_java_dokumente.2009-08-21_22.16.00_21 is waiting execution
 23795 VolumeT  Verify_swserv04_-_Wurzel.2009-08-21_22.16.00_22 is waiting execution
 23796 VolumeT  Verify_swserv05_-_Wurzel.2009-08-21_22.16.00_23 is waiting execution
 23797 VolumeT  Verify_swtemp01.2009-08-21_22.16.00_24 is waiting execution
 23798 VolumeT  Verify_swtest04.2009-08-21_22.16.00_25 is waiting execution
 23799 VolumeT  Verify_swvista02.2009-08-21_22.16.00_26 is waiting execution
 23800 VolumeT  Verify_swwin2008.2009-08-21_22.16.00_27 is waiting execution
 23802 VolumeT  Verify_swh50.2009-08-22_03.05.00_30 is waiting execution
 23804 VolumeT  Verify_swserv03_-_Offbck.2009-08-23_23.00.00_32 is waiting execution
 23805 Full     BackupCatalog.2009-08-23_23.05.00_33 is waiting for higher priority jobs to finish
 23806 VolumeT  Verify_BackupCatalog.2009-08-23_23.05.00_34 is waiting execution
====
*cancel
Select Job:
     1: JobId=23783 Job=swtest04.2009-08-21_22.10.01_10
     2: JobId=23787 Job=Verify_dmzxen01_-_Wurzel.2009-08-21_22.16.00_14
     3: JobId=23788 Job=Verify_swconstructor.2009-08-21_22.16.00_15
     4: JobId=23789 Job=Verify_swdevsrv_-_Wurzel.2009-08-21_22.16.00_16
     5: JobId=23790 Job=Verify_swldap01.2009-08-21_22.16.00_17
     6: JobId=23791 Job=Verify_swlinbck2.2009-08-21_22.16.00_18
     7: JobId=23792 Job=Verify_swmesse2_-_Wurzel.2009-08-21_22.16.00_19
     8: JobId=23793 Job=Verify_swserv01_-_E.2009-08-21_22.16.00_20
     9: JobId=23794 Job=Verify_swserv02_-_java_dokumente.2009-08-21_22.16.00_21
    10: JobId=23795 Job=Verify_swserv04_-_Wurzel.2009-08-21_22.16.00_22
    11: JobId=23796 Job=Verify_swserv05_-_Wurzel.2009-08-21_22.16.00_23
    12: JobId=23797 Job=Verify_swtemp01.2009-08-21_22.16.00_24
    13: JobId=23798 Job=Verify_swtest04.2009-08-21_22.16.00_25
    14: JobId=23799 Job=Verify_swvista02.2009-08-21_22.16.00_26
    15: JobId=23800 Job=Verify_swwin2008.2009-08-21_22.16.00_27
    16: JobId=23802 Job=Verify_swh50.2009-08-22_03.05.00_30
    17: JobId=23804 Job=Verify_swserv03_-_Offbck.2009-08-23_23.00.00_32
    18: JobId=23805 Job=BackupCatalog.2009-08-23_23.05.00_33
    19: JobId=23806 Job=Verify_BackupCatalog.2009-08-23_23.05.00_34
Choose Job to cancel (1-19): 1
3904 Job swtest04.2009-08-21_22.10.01_10 not found.
*

I don't know where it got the number 3904 from. It's not the right job number. After that I restarted the running bacula-dir and bacula-sd and got a failure mail regarding the job (mail.txt attached).

Greetings,
user100
======================================================================

----------------------------------------------------------------------
 (0004533) user100 (reporter) - 2009-08-25 09:28
 http://bugs.bacula.org/view.php?id=1353#c4533
----------------------------------------------------------------------
Additional info:
I found out that the waiting job proceeds as expected if the "FD Connect Timeout" limit has not been reached. So I think the missing continuation of the job is not the real bug. Normally the waiting job should not continue after the timeout because it should not run any more (so that's ok), but it should get cancelled by the director after the timeout (and that does not work).
If the bacula-dir does not have the same "cancel" problems as a user in bconsole, I guess these are two different bugs.

Greetings,
user100

----------------------------------------------------------------------
 (0004539) kern (administrator) - 2009-08-26 10:46
 http://bugs.bacula.org/view.php?id=1353#c4539
----------------------------------------------------------------------
This is a support issue and not a bug -- at least you have given no clear indication that anything is wrong.

Depending on your OS and the Client OS, it can take some time to cancel a job -- even up to 2 hours if it is stuck in an OS network system call. This is mentioned in the documentation.

The 3904 is a message number, not a JobId, and is also explained in the manual.

If all jobs are blocking on a single job then you need to modify your Bacula configuration.

Issue History
Date Modified    Username       Field                    Change
======================================================================
2009-08-24 09:35 user100        New Issue
2009-08-24 09:35 user100        File Added: mail.txt
2009-08-25 09:28 user100        Note Added: 0004533
2009-08-26 10:46 kern           Note Added: 0004539
2009-08-26 10:46 kern           Status                   new => closed
2009-08-26 10:46 kern           Resolution               open => not a bug
======================================================================
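
For readers wondering what "modify your Bacula configuration" might look like: the fragment below is only a sketch, not something taken from the report or from kern's reply. It shows where the directives named in the thread ("FD Connect Timeout", "Max Wait Time") and the concurrency limit sit in bacula-dir.conf. The Director name and every value are assumptions chosen for illustration, and the Job name is inferred from the job listing in the report. Other required directives are omitted, so this is not a complete configuration.

```
# bacula-dir.conf (fragment only; required directives such as
# Password, Client, FileSet, Storage and Pool are omitted)

Director {
  # Hypothetical name, not from the report.
  Name = example-dir
  # Assumed value. Allows more than one job to run at once, so a
  # single waiting job need not hold up the whole queue.
  Maximum Concurrent Jobs = 4
  # Assumed value. How long the Director keeps trying to reach a
  # File daemon before giving up on the job.
  FD Connect Timeout = 30 minutes
}

Job {
  # Name inferred from Job=swtest04.2009-08-21_22.10.01_10 above.
  Name = "swtest04"
  # Assumed value. Upper bound on how long the job may stay blocked
  # waiting for a resource.
  Max Wait Time = 1 hour
}
```

Concurrency may also have to be raised in the Storage and Client resources and in the storage daemon's own configuration before jobs actually run in parallel. Whether "Max Wait Time" covers the exact "waiting for Client to connect to Storage" state seen here is not confirmed by this thread, so check the manual for the installed version.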