From: Dan L. <da...@la...> - 2013-11-04 20:01:38
On Oct 30, 2013, at 4:48 PM, dweimer wrote:

> On 10/16/2013 5:43 pm, David Newman wrote:
>> On 10/16/13 12:44 PM, dweimer wrote:
>>> On 10/16/2013 2:13 pm, David Newman wrote:
>>>> On 10/14/13 2:44 AM, Martin Simmons wrote:
>>>>>>>>>> On Sun, 13 Oct 2013 18:25:07 -0700, David Newman said:
>>>>>>
>>>>>> On 10/9/13 4:41 PM, David Newman wrote:
>>>>>>> FreeBSD 9.2-RELEASE, bacula-client-5.2.12_3 installed from ports
>>>>>>>
>>>>>>> Ever since upgrading this host to FreeBSD 9.2, bacula-fd crashes
>>>>>>> as soon as bacula-dir starts a backup job. The entry in
>>>>>>> /var/log/messages is:
>>>>>>>
>>>>>>> Oct 9 16:25:50 o bacula-fd: Bacula interrupted by signal 0: UNKNOWN SIGNAL
>>>>>>>
>>>>>>> Backups worked fine on this host running FreeBSD 9.1 and other
>>>>>>> hosts upgraded to FreeBSD 9.2 run backups OK.
>>>>>>>
>>>>>>> I've done the uninstall/reinstall thing with the bacula-client
>>>>>>> port, but that made no difference.
>>>>>>>
>>>>>>> Thanks in advance for troubleshooting clues.
>>>>>>>
>>>>>>> dn
>>>>>>
>>>>>> Is there a Wireshark decode for Bacula?
>>>>>>
>>>>>> I'm still stuck on this problem, and need more info on what's
>>>>>> causing that UNKNOWN SIGNAL error. Wireshark 1.8.6 just shows
>>>>>> strings of bytes for the Bacula stuff.
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> dn
>>>>>
>>>>> A wireshark decode won't help much here because problems like this
>>>>> must be in the fd itself.
>>>>>
>>>>> Try attaching gdb to the bacula-fd process and see if it catches the
>>>>> mysterious signal (see
>>>>> http://www.bacula.org/5.2.x-manuals/en/problems/problems/What_Do_When_Bacula.html#SECTION00640000000000000000).
>>>>
>>>> No luck with this. Per that URL, I've put the btraceback.gdb file in
>>>> the same directory as the bacula-fd executable on the client (in
>>>> this case, /usr/local/sbin) and made the .gdb file executable.
>>>>
>>>> At run time it produces this error:
>>>>
>>>> /usr/local/sbin/btraceback.gdb:1: Error in sourced command file:
>>>> No symbol table is loaded. Use the "file" command.
>>>>
>>>> That's problem 1. Problem 2 is that the syntax given for capturing
>>>> STDERR and STDOUT -- 2>\&1 -- doesn't work on either csh (root's
>>>> default on FreeBSD) or bash.
>>>>
>>>> Any ideas on remedying either issue?
>>>>
>>>> Thanks.
>>>>
>>>> dn
>>>>
>>>
>>> I have 2>&1, no backslash before the ampersand used with /bin/sh in
>>> several cron scripts, on FreeBSD seems to do the job
>>
>> Thanks, that works for capturing STDERR and STDOUT.
>>
>> But that .gdb file still produces the same error:
>>
>> /usr/local/sbin/btraceback.gdb:1: Error in sourced command file:
>> No symbol table is loaded. Use the "file" command.
>>
>> So, I'm still blocked on debugging this issue.
>>
>> dn
>>
>>
>
> Well one of my FreeBSD 9.2 systems decided to take a new route to this
> problem. My backups starting failing this morning, without the
> bacula-fd process stopping, it starts the client run before job script,
> then after two hours fails with no response from the client.
>
> 2013-10-30 07:52:34 bacula-dir JobId 291: Start Backup JobId 291, Job=Webmail-Backup.2013-10-30_07.52.32_46
> 2013-10-30 07:52:34 bacula-dir JobId 291: Using Device "FileStorage"
> 2013-10-30 07:52:35 webmail-fd JobId 291: shell command: run ClientRunBeforeJob "/root/bacula/before.sh"
> 2013-10-30 07:52:35 webmail-fd JobId 291: ClientRunBeforeJob:
> 2013-10-30 07:52:35 webmail-fd JobId 291: ClientRunBeforeJob: Create PostgreSQL Backup...
> 2013-10-30 07:52:35 webmail-fd JobId 291: ClientRunBeforeJob:
> 2013-10-30 07:52:35 webmail-fd JobId 291: ClientRunBeforeJob: Getting Database List
> 2013-10-30 07:52:35 webmail-fd JobId 291: ClientRunBeforeJob:
> 2013-10-30 09:58:46 bacula-dir JobId 291: Fatal error: Socket error on ClientRunBeforeJob command: ERR=Connection reset by peer

I have no idea.
But I have one suggestion, just for kicks. I've long been skeptical of
multiple run before/after scripts. I've always preferred to have just
one script. Is it worth combining them into one?

>
> 2013-10-30 09:58:46 bacula-dir JobId 291: Fatal error: Client "webmail-fd" RunScript failed.
> 2013-10-30 09:58:46 bacula-dir JobId 291: Fatal error: Network error with FD during Backup: ERR=Connection reset by peer

That definitely sounds like a networking issue. Some kind of
communication issue.

>
> 2013-10-30 09:58:47 bacula-dir JobId 291: Fatal error: No Job status returned from FD.
> 2013-10-30 09:58:47 bacula-dir JobId 291: Error: Bacula bacula-dir 5.2.12 (12Sep12):
>   Build OS:               amd64-portbld-freebsd9.2 freebsd 9.2-RELEASE
>   JobId:                  291
>   Job:                    Webmail-Backup.2013-10-30_07.52.32_46
>   Backup Level:           Incremental, since=2013-10-29 00:07:02
>   Client:                 "webmail-fd" 5.2.12 (12Sep12) amd64-portbld-freebsd9.2,freebsd,9.2-RELEASE
>   FileSet:                "WebmailZFS-FileSet" 2013-09-27 13:12:07
>   Pool:                   "File" (From Job resource)
>   Catalog:                "MyCatalog" (From Client resource)
>   Storage:                "File" (From Pool resource)
>   Scheduled time:         30-Oct-2013 07:52:30
>   Start time:             30-Oct-2013 07:52:34
>   End time:               30-Oct-2013 09:58:47
>   Elapsed time:           2 hours 6 mins 13 secs
>   Priority:               10
>   FD Files Written:       0
>   SD Files Written:       0
>   FD Bytes Written:       0 (0 B)
>   SD Bytes Written:       0 (0 B)
>   Rate:                   0.0 KB/s
>   Software Compression:   None
>   VSS:                    no
>   Encryption:             no
>   Accurate:               no
>   Volume name(s):
>   Volume Session Id:      6
>   Volume Session Time:    1383098903
>   Last Volume Bytes:      27,632,643,492 (27.63 GB)
>   Non-fatal FD errors:    1
>   SD Errors:              0
>   FD termination status:  Error
>   SD termination status:  OK
>   Termination:            *** Backup Error ***
>
>
> When I check this server, the client run before job script completed,
> all the database dumps, were successful, and the ZFS snapshots that
> follow the Database dumps complete as well. However Bacula stops
> returning the script's status.
>
> This server was running fine on up through the full backup done Monday
> morning, but now comes right back to this problem on every attempt to
> backup today. A reboot didn't help, trying a full backup instead of
> incremental made no difference.
>
> Canceled one of the attempts, and restarted after removing the client
> run before script, its now backing up files just fine. so I have
> temporarily setup a cron job to run 30 minutes before backup to execute
> my database backups and zfs snapshots. and removed the client run
> before job.

Do smaller jobs help? That is, if you do not have the RunBefore scripts,
does the job work?

> I can find no errors logged on the server running the bacula-fd or the
> bacula server with the exception of the timeout error message. Tried
> adding heartbeat interval of 1 minute on the client, that didn't help
> either.
>
> --
> Thanks,
> Dean E. Weimer
> http://www.dweimer.net/

-- 
Dan Langille - http://langille.org
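
To illustrate the "just one script" suggestion together with the 2>&1 redirection discussed earlier in the thread, here is a minimal /bin/sh sketch. It is an illustration under assumptions, not the poster's actual before.sh: the log path, the function names run_step and run_all, and the two placeholder steps (shown as "true") are all hypothetical and would need to be replaced with the real dump and snapshot commands.

```shell
#!/bin/sh
# Sketch of a single combined run-before script.
# All names here (LOG, run_step, run_all) are hypothetical.

LOG="${LOG:-/tmp/before-job.log}"

# Run one step, appending its stdout and stderr to the log.
# "2>&1" (no backslash) works in /bin/sh and bash; csh uses ">&" instead.
run_step() {
    name="$1"
    shift
    echo "== $name" >> "$LOG"
    "$@" >> "$LOG" 2>&1
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "step '$name' failed with status $rc" >> "$LOG"
    fi
    return "$rc"
}

# Run every step in order and stop at the first failure.
run_all() {
    run_step "database dumps" true || return 1   # placeholder command
    run_step "zfs snapshots"  true || return 1   # placeholder command
    return 0
}
```

Ending the script with a call to run_all and exiting with its status would give the caller one exit code for the whole sequence, and the log file shows which step failed and why.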