|
From: Adam C. <ada...@li...> - 2008-05-15 13:05:03
|
Hello,

I use Max Wait Time to cancel jobs that are left in the queue because no tapes are available. This is useful when our customers forget to load a new set of tapes into the changer.

The problem is that the SD crashes in this case. Here is a sample of the logs:

03-May 12:01 pdc1.it-lyon-sd JobId 1580: Please mount Volume "Daily-005" or label a new one for:
    Job: pdc1.it-lyon.2008-05-02_21.00.26
    Storage: "Dell-LTO2" (/dev/nst0)
    Pool: Friday
    Media type: LTO2

Then:

02-May 21:00 pdc1.it-lyon-dir JobId 1582: Start Backup JobId 1582, Job=intox1.it-lyon.2008-05-02_21.00.28
02-May 21:01 pdc1.it-lyon-dir JobId 1582: Using Device "Dell-LTO2"
06-May 11:10 intox1.it-lyon-fd: intox1.it-lyon.2008-05-02_21.00.28 Fatal error: job.c:1808 Comm error with SD. bad response to Append Data. ERR=Aucune donnée disponible
06-May 11:11 pdc1.it-lyon-dir JobId 1582: Error: Bacula pdc1.it-lyon-dir 2.2.5 (09Oct07): 06-May-2008 11:11:01

The bacula-sd process sometimes dies; sometimes it keeps running but no longer works until we restart it.

Another log example:

06-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or label a new one for:
    Job: atp-data.2008-05-02_22.00.44
    Storage: "Drive-1" (/dev/nst0)
    Pool: Weekly
    Media type: LTO3
07-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or label a new one for:
    Job: atp-data.2008-05-02_22.00.44
    Storage: "Drive-1" (/dev/nst0)
    Pool: Weekly
    Media type: LTO3
08-mai 12:23 localhost-sd JobId 235: Fatal error: Max time exceeded waiting to mount Storage Device "Drive-1" (/dev/nst0) for Job atp-data.2008-05-02_22.00.44
08-mai 12:23 localhost-sd JobId 235: Job write elapsed time = 134:15:41, Transfer rate = 3.350 M bytes/second
08-mai 12:23 localhost-fd JobId 235: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
08-mai 12:23 localhost-dir JobId 235: Error: Bacula localhost-dir 2.2.8 (26Jan08): 08-mai-2008 12:23:41

This is a serious issue because Max Wait Time can't be used (it always crashes).

Could you please tell me whether this is a known issue? If not, a customer has agreed to "forget to change the tape", so I can provide some debugging backtraces if needed.

Thanks in advance,
Best regards, Adam. |
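For anyone who wants to check their own logs for the same pattern, here is a minimal Python sketch. It is not part of Bacula; the function name `jobs_hit_by_max_wait` and the regular expression are assumptions based only on the log lines quoted above, so the pattern may need adjusting for other message formats. It lists the jobs that hit the Max Wait Time timeout and the daemons that logged a fatal error for the same JobId afterwards.

```python
import re

# Start of a Bacula message as it appears in the logs quoted above:
# "<day>-<month> <hh:mm> <daemon> JobId <n>: <text>"
LINE_RE = re.compile(
    r"^(?P<date>\d{2}-\S+) (?P<time>\d{2}:\d{2}) (?P<daemon>\S+) "
    r"JobId (?P<jobid>\d+): (?P<text>.*)$"
)


def jobs_hit_by_max_wait(log_text):
    """Return {jobid: [daemons that logged a fatal error after the timeout]}."""
    timed_out = {}
    for line in log_text.splitlines():
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # continuation lines (Job:, Storage:, ...) are skipped
        jobid = int(m.group("jobid"))
        text = m.group("text")
        if "Max time exceeded waiting to mount" in text:
            timed_out.setdefault(jobid, [])
        elif jobid in timed_out and text.startswith("Fatal error:"):
            timed_out[jobid].append(m.group("daemon"))
    return timed_out


# Sample taken from the second log excerpt in this message.
SAMPLE = """\
06-mai 12:23 localhost-sd JobId 235: Please mount Volume "000027L3" or label a new one for:
08-mai 12:23 localhost-sd JobId 235: Fatal error: Max time exceeded waiting to mount Storage Device "Drive-1" (/dev/nst0) for Job atp-data.2008-05-02_22.00.44
08-mai 12:23 localhost-sd JobId 235: Job write elapsed time = 134:15:41, Transfer rate = 3.350 M bytes/second
08-mai 12:23 localhost-fd JobId 235: Fatal error: backup.c:892 Network send error to SD. ERR=Broken pipe
"""

print(jobs_hit_by_max_wait(SAMPLE))
```

On the sample, JobId 235 is reported together with `localhost-fd`, the daemon that logged the follow-up "Broken pipe" error.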
|
From: Kern S. <ke...@si...> - 2008-05-15 16:53:12
|
Hello Adam,

If the SD is crashing, then there is definitely a bug and you should open a bug report. It would be preferable if you move up to version 2.2.8 as it simplifies things for me in debugging and finding the problems.

Best regards,
Kern |
|
From: Adam C. <ada...@li...> - 2008-05-16 07:09:26
|
Reported as #1087: http://bugs.bacula.org/view.php?id=1087

Best regards, Adam. |
|
From: Kern S. <ke...@si...> - 2008-05-16 12:31:50
|
On Friday 16 May 2008 03:09:26 Adam Cécile wrote:
> Reported as #1087:
> http://bugs.bacula.org/view.php?id=1087

OK, thanks. If you haven't already done so, please attach a traceback when it crashes, as well as your bacula-dir.conf and bacula-sd.conf files.

Thanks,
Kern |
|
From: Adam C. <ada...@li...> - 2008-05-19 06:51:57
|
Hi,

Could you please tell me more about how to get a useful traceback?

Thanks in advance,
Regards, Adam. |
|
From: Kern S. <ke...@si...> - 2008-05-20 12:36:24
|
On Monday 19 May 2008 08:52:07 Adam Cécile wrote:
> Could you please tell me more about how to get a useful traceback ?

Well, if the installation is done correctly, an automatic dump should be emailed to you. If not, the Kaboom chapter of the manual describes the techniques for manually debugging Bacula and the commands to issue to get the desired traceback.

Best regards,
Kern |
|
From: Arno L. <al...@it...> - 2008-05-19 07:42:33
|
Hi,

19.05.2008 08:52, Adam Cécile wrote:
> Could you please tell me more about how to get a useful traceback ?

You need gdb installed, and the binaries you run should not be stripped. The former is usually ensured by your package manager; the latter is typically done by compiling from source. I believe you need the -g switch to gcc, and during install you skip the strip process.

Bacula itself has a script that is automatically called when a program crashes, which will create a backtrace and mail it to the configured operator.

You should verify that all this works, for example by sending a signal to the debug-version SD.

Also, have a look at the mail I just sent regarding Maria's problem... the information might be helpful for you, too. And I suspect the problems might be related.

Arno

--
Arno Lehmann
IT-Service Lehmann
www.its-lehmann.de |
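As a quick way to check the "not stripped" requirement before waiting for the next crash, the following Python sketch interprets the output of the standard `file` utility, which reports "stripped" or "not stripped" for ELF binaries. The helper names are made up for this example, and the path `/usr/sbin/bacula-sd` in the usage comment is only an assumption; substitute wherever your package installs the daemon.

```python
import subprocess


def looks_stripped(file_output):
    """Interpret one line of `file` output for an ELF binary.

    Returns True only when the output says "stripped" without "not";
    output that mentions neither (e.g. a non-ELF file) yields False.
    """
    if "not stripped" in file_output:
        return False
    return "stripped" in file_output


def binary_is_stripped(path):
    """Run `file` on path and report whether the binary looks stripped."""
    result = subprocess.run(
        ["file", path], capture_output=True, text=True, check=True
    )
    return looks_stripped(result.stdout)


# Usage (hypothetical install path, adjust for your system):
#   if binary_is_stripped("/usr/sbin/bacula-sd"):
#       print("bacula-sd is stripped; a gdb backtrace will lack symbols")
```

A binary that is reported as stripped will still run, but the backtrace gdb produces from it will generally lack the function and line information that makes it useful in a bug report.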
|
From: Adam C. <ada...@li...> - 2008-05-19 16:19:47
|
Okay,

Bacula-sd and the director are currently running with -d 100, and we plan to NOT change the tape tonight. I hope we'll get a useful traceback. If not, I'll rebuild the Debian package with NOSTRIP and see whether the SD sends us a backtrace.

Regards, Adam. |