From: <bac...@li...> - 2005-02-25 22:27:33
A BUGNOTE has been added to this bug.
======================================================================
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000228
======================================================================
Reported By:                mac_ville
Assigned To:
======================================================================
Project:                    bacula
Bug ID:                     228
Category:                   Director
Reproducibility:            random
Severity:                   crash
Priority:                   normal
Status:                     feedback
======================================================================
Date Submitted:             02-09-2005 23:26 PST
Last Modified:              02-25-2005 14:27 PST
======================================================================
Summary:                    Bacula-dir crashes
Description:
At random bacula-dir crashes and leaves the same traceback message. Very few changes to setup of clients.
======================================================================

----------------------------------------------------------------------
 kern - 02-10-2005 02:39 PST
----------------------------------------------------------------------
Thanks for the traceback. It is very interesting because it points the finger rather directly at a compiler bug or a hardware error (bad memory ...). A valid argument to a subroutine becomes zero on its first use in that subroutine (a call to another subroutine).

Could you specify what version of what compiler you are using? Also, I would be interested to know exactly what Bacula was doing at the time of the crash -- is it a simple backup job that it runs regularly? Could you explain what "Very few changes to setup of clients" means? Are you having any problems on this machine with other programs, or with the OS crashing?

Suggestions, assuming it is a compiler problem:
- Upgrade to the latest gcc compiler, but nothing later than version 3.4.2.
- Turn off all optimization in the compiler by ensuring that there is no -O2 or other -O option specified on the compiles.
  This can be done with the CFLAGS environment variable -- see the Installation part of the manual.

----------------------------------------------------------------------
 mac_ville - 02-10-2005 07:52 PST
----------------------------------------------------------------------
Kern: thanks for your quick response...

1. >Could you explain what "Very few changes to setup of clients" means?
   Meaning: the environment has been static since installation in Nov 2004.

2. Compilation report:

Configuration on Sat Nov 27 16:15:50 CET 2004:

  Host:                       powerpc-apple-darwin7.6.0 -- darwin 7.6.0
  Bacula version:             1.36.1 (26 November 2004)
  Source code location:       .
  Install binaries:           /sbin
  Install config files:       /etc/bacula
  Scripts directory:          /etc/bacula
  Working directory:          /var/bacula/working
  PID directory:              /var/run
  Subsys directory:           /var/run/subsys
  C Compiler:                 gcc
  C++ Compiler:               g++
  Compiler flags:             -g -O2 -Wall
  Linker flags:               -O
  Libraries:                  -lpthread
  Statically Linked Tools:    no
  Statically Linked FD:       yes
  Statically Linked SD:       yes
  Statically Linked DIR:      yes
  Statically Linked CONS:     no
  Database type:              MySQL
  Database lib:               -L/usr/lib/mysql -lmysqlclient_r -lz

  Job Output Email:           root@localhost
  Traceback Email:            root@localhost

  SMTP Host Address:          localhost

  Director Port:              9101
  File daemon Port:           9102
  Storage daemon Port:        9103

  Director User:
  Director Group:
  Storage Daemon User:
  Storage DaemonGroup:
  File Daemon User:
  File Daemon Group:

  SQL binaries Directory      /usr/bin

  Large file support:         no
  Bacula conio support:       yes -ltermcap
  readline support:           no
  TCP Wrappers support:       no
  ZLIB support:               yes
  enable-smartalloc:          yes
  enable-gnome:               no
  enable-wx-console:          no
  enable-tray-monitor:
  client-only:                no
  ACL support:                no

marge:~/Desktop/bacula-1.36.1 mvilhelm$

3. Bacula was doing a backup of a w2k server, a simple regular job occurring every weekday.

4. There are no other problems with the machine, and the crashes are random.

5. marge:~ mvilhelm$ gcc --version
gcc (GCC) 3.3 20030304 (Apple Computer, Inc. build 1640)
Copyright (C) 2002 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

----------------------------------------------------------------------
 kern - 02-10-2005 08:42 PST
----------------------------------------------------------------------
OK, thanks for the ./configure listing and the responses to the questions. My best guess at the moment is a compiler bug, but if that is the case, it shouldn't be quite so random, so I'm not entirely happy with that idea.

I'm not quite sure why there is a -O listed for your Linker options, but my recommendations for immediate things to do are:
- Rebuild at least the Director, eliminating the -O option for the linker and the -O2 option for the compiles. The output of ./configure should be correct, but what really counts is what ends up on the compile/link command lines during the "make".
- If there is a later compiler available for your machine, try upgrading to a 3.4 version or possibly downgrading to a 3.2 version. Normally a 3.3 where the second digit is odd means an unstable development version, and should *not* be used in production.

----------------------------------------------------------------------
 mac_ville - 02-10-2005 09:53 PST
----------------------------------------------------------------------
I just ran:

./configure --with-mysql

and I get:

  C Compiler:                 gcc
  C++ Compiler:               g++
  Compiler flags:             -g -O2 -Wall
  Linker flags:               -O

I don't know how to disable the -O/-O2 flags :)

Apple does not provide a 3.4 gcc compiler; that is 'in the hands of Apple'. I have read about problems compiling 3.4 on OS X.

----------------------------------------------------------------------
 mac_ville - 02-10-2005 10:28 PST
----------------------------------------------------------------------
ok - fixed the flags by editing the 'configure' script - might not be the best way but it worked...
Up and running with the 'new' compile...

----------------------------------------------------------------------
 mac_ville - 02-10-2005 22:13 PST
----------------------------------------------------------------------
ok - I had a dump last night too... This server mainly runs Bacula but also web/mail/DNS services, and I have never had any problems with anything else...

**
From: ro...@ma...
Subject: Bacula GDB traceback of bacula-dir
Date: February 10, 2005 11:18:59 PM CET
To: ro...@lo...

Reading symbols for shared libraries ... done
/private/var/bacula/working/14241: No such file or directory.
Attaching to program: `/sbin/bacula-dir', process 14241.
Reading symbols for shared libraries . done
0x90012588 in clock_sleep_trap ()
$1 = <unknown type>
$2 = 0x300188 "bacula-dir"
$3 = 0x3001b8 "/sbin/bacula-dir"
$4 = <unknown type>
$5 = 0xa1cd0 "1.36.1 (26 November 2004)"
$6 = 0x9bd20 "powerpc-apple-darwin7.7.0"
$7 = 0x9bd3c "darwin"
$8 = 0x9bd44 "7.7.0"
#0  0x90012588 in clock_sleep_trap ()
#1  0x9000d758 in nanosleep ()
#2  0x00054f28 in _Z11bmicrosleepll (sec=60, usec=0) at bsys.c:515
#3  0x0001e298 in _Z17wait_for_next_jobPc (one_shot_job_to_run=0x0) at scheduler.c:101
#4  0x000035c0 in main (argc=0, argv=0xbffffef4) at dird.c:240

Thread 4 (process 14241 thread 0x2023):
#0  0x900314ac in wait4 ()
#1  0x00052bbc in signal_handler (sig=10) at signal.c:159
#2  <signal handler called>
#3  0x00071f0c in _Z13rwl_writelockP12s_rwlock_tag (rwl=Cannot access memory at address 0xac
) at rwlock.c:216
#4  0x0003d690 in _Z8_db_lockPKciP4s_db (file=0x9d850 "sql_create.c", line=114, mdb=0x0) at sql.c:238
#5  0x0003f47c in _Z25db_create_jobmedia_recordP3JCRP4s_dbP12JOBMEDIA_DBR (jcr=0x805c18, mdb=0x0, jm=0xf0284b40) at sql_create.c:114
#6  0x0000909c in _Z15catalog_requestP3JCRP5BSOCKPc (jcr=0x805c18, bs=0x3071f8, msg=0x80d850 "CreateJobMedia FirstIndex=1 LastIndex=177 StartFile=0 EndFile=0 StartBlock=194 EndBlock=28514489\n") at catreq.c:262
#7  0x00010508 in _Z11bget_dirmsgP5BSOCK (bs=0x3071f8) at getmsg.c:177
#8  0x000190f8 in msg_thread (arg=0x805c18) at msgchan.c:226
#9  0x900246e8 in _pthread_body ()

Thread 3 (process 14241 thread 0xb03):
#0  0x90018be8 in semaphore_timedwait_signal_trap ()
#1  0x9000e788 in _pthread_cond_wait ()
#2  0x00053dc8 in watchdog_thread (arg=0x0) at watchdog.c:289
#3  0x900246e8 in _pthread_body ()

Thread 2 (process 14241 thread 0xa03):
#0  0x9000b20c in select ()
#1  0x000737c4 in _Z18bnet_thread_serverP5dlistiP9workq_tagPFPvS3_E (addrs=0x3003a8, max_clients=10, client_wq=0xc7b6c, handle_client_request=0x36820 <_Z24handle_UA_client_requestPv>) at bnet_server.c:155
#2  0x00036684 in connect_thread (arg=0x3003a8) at ua_server.c:90
#3  0x900246e8 in _pthread_body ()

Thread 1 (process 14241 thread 0x203):
#0  0x90012588 in clock_sleep_trap ()
#1  0x9000d758 in nanosleep ()
#2  0x00054f28 in _Z11bmicrosleepll (sec=60, usec=0) at bsys.c:515
#3  0x0001e298 in _Z17wait_for_next_jobPc (one_shot_job_to_run=0x0) at scheduler.c:101
#4  0x000035c0 in main (argc=0, argv=0xbffffef4) at dird.c:240
#0  0x90012588 in clock_sleep_trap ()
No symbol table info available.
#1  0x9000d758 in nanosleep ()
No symbol table info available.
#2  0x00054f28 in _Z11bmicrosleepll (sec=60, usec=0) at bsys.c:515
515        stat = nanosleep(&timeout, NULL);
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 18, tv_usec = 23}
tz = {tz_minuteswest = 1, tz_dsttime = 1}
stat = 10
#3  0x0001e298 in _Z17wait_for_next_jobPc (one_shot_job_to_run=0x0) at scheduler.c:101
101        bmicrosleep(NEXT_CHECK_SECS, 0);  /* recheck once per minute */
jcr = (JCR *) 0x0
job = (JOB *) 0x0
run = (RUN *) 0xa1980
now = 366
first = false
next_job = (job_item *) 0x0
#4  0x000035c0 in main (argc=0, argv=0xbffffef4) at dird.c:240
240        while ((jcr = wait_for_next_job(runjob))) {
ch = -1
jcr = (JCR *) 0x805c18
no_signals = 0
test_config = 0
uid = 0x0
gid = 0x0
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.

----------------------------------------------------------------------
 kern - 02-11-2005 09:49 PST
----------------------------------------------------------------------
This dump shows that Bacula is trying to reference a catalog DB that is not open, which is very odd. I've attached a modified version of catreq.c, which you should download and place in <bacula>/src/dird/catreq.c, replacing the old version. Rebuild Bacula with the standard "make" and "make install", then report back what happens when you run Bacula.

Are you running any strange types of jobs, a large number of console programs, or perhaps calling the console program from within a script in the job? Something is apparently causing the database to be closed out from under the job.

----------------------------------------------------------------------
 kern - 02-16-2005 05:00 PST
----------------------------------------------------------------------
Do you by any chance have Bacula configured to restart a job that fails? That is one way I could see the DB getting closed -- though it shouldn't.

----------------------------------------------------------------------
 mac_ville - 02-16-2005 05:45 PST
----------------------------------------------------------------------
Kern, I do not have Bacula restart failed jobs, and I do not run any strange jobs or console requests.

I have not recompiled with the catreq.c file; I wanted to run with the non-optimized version of bacula-dir for a while. Everything has been running fine since the last dump.

The only 'strange' thing I do is run Bacula under my favourite OS, OS X :)

BTW thanks for a great backup program...
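The traceback above has db_create_jobmedia_record() and _db_lock() receiving mdb=0x0, after which rwl_writelock() faults at address 0xac. That fits kern's reading that the catalog handle is missing. Taking the address of a lock member through a NULL structure pointer yields only that member's offset, which is presumably 0xac here.

A minimal C sketch of the kind of guard that turns such a catalog request into an error message instead of a crash follows. It is not Bacula's code and not the contents of kern's modified catreq.c. The names catalog_db, job_ctl, create_jobmedia_record(), catalog_lock() and catalog_unlock() are hypothetical stand-ins, and a plain counter replaces the real read/write lock.

```c
#include <stdio.h>

/* Hypothetical stand-in for the catalog handle. In the real crash the lock
 * is a member of this structure, so &db->lock with db == NULL is just the
 * member's offset (presumably the 0xac seen in gdb). */
typedef struct catalog_db {
    int lock_depth;          /* stands in for the rwlock taken by db_lock() */
} catalog_db;

/* Hypothetical stand-in for the job control record. */
typedef struct job_ctl {
    const char *job_name;
    catalog_db *db;          /* NULL if the catalog is not (or no longer) open */
} job_ctl;

static void catalog_lock(catalog_db *db)   { db->lock_depth++; }
static void catalog_unlock(catalog_db *db) { db->lock_depth--; }

/* Returns 0 on success, -1 if there is no open catalog.
 * The NULL check runs before any lock is touched. */
static int create_jobmedia_record(job_ctl *jcr, int first_index, int last_index)
{
    if (jcr == NULL || jcr->db == NULL) {
        fprintf(stderr, "CreateJobMedia %d-%d rejected for job %s: catalog not open\n",
                first_index, last_index,
                (jcr != NULL && jcr->job_name != NULL) ? jcr->job_name : "?");
        return -1;
    }
    catalog_lock(jcr->db);
    /* A real implementation would insert the JobMedia row here. */
    catalog_unlock(jcr->db);
    return 0;
}
```

A guard like this only turns the crash into a reported error. It does not explain why the handle is NULL in the first place, which is the open question in this bug.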
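Because the comparison rests on running a non-optimized bacula-dir, it helps to confirm that -O2 really disappeared from the compile lines, which is what kern says counts. GCC predefines the __OPTIMIZE__ macro for any -O level other than -O0, so a tiny function compiled with the same CFLAGS as the Director can report this. The sketch below is not part of Bacula, and the function name built_with_optimization() is hypothetical. With an autoconf-generated configure script, setting CFLAGS in the environment before running ./configure is usually a cleaner way to drop the default -g -O2 than editing the script.

```c
#include <stdbool.h>

/* GCC predefines __OPTIMIZE__ when compiling with -O, -O1, -O2, -O3 or -Os,
 * and leaves it undefined at -O0 (the default when no -O flag is given). */
static bool built_with_optimization(void)
{
#ifdef __OPTIMIZE__
    return true;
#else
    return false;
#endif
}
```

Compiled with gcc -g -Wall the function returns false; adding -O2 to the same command line makes it return true.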

----------------------------------------------------------------------
 mac_ville - 02-25-2005 14:27 PST
----------------------------------------------------------------------
Kern, this is the latest dump; it happened today (CET). I will recompile with the new catreq.c and see what happens.

From: ro...@ma...
Subject: Bacula GDB traceback of bacula-dir
Date: 25 February 2005 23:16:04 MET
To: ro...@lo...

Reading symbols for shared libraries ... done
/private/var/bacula/working/593: No such file or directory.
Attaching to program: `/sbin/bacula-dir', process 593.
Reading symbols for shared libraries . done
0x90012588 in clock_sleep_trap ()
$1 = <unknown type>
$2 = 0x300188 "bacula-dir"
$3 = 0x3001b8 "/sbin/bacula-dir"
$4 = <unknown type>
$5 = 0xa1cd0 "1.36.1 (26 November 2004)"
$6 = 0x9bd20 "powerpc-apple-darwin7.7.0"
$7 = 0x9bd3c "darwin"
$8 = 0x9bd44 "7.7.0"
#0  0x90012588 in clock_sleep_trap ()
#1  0x9000d758 in nanosleep ()
#2  0x00054f28 in _Z11bmicrosleepll (sec=60, usec=0) at bsys.c:515
#3  0x0001e298 in _Z17wait_for_next_jobPc (one_shot_job_to_run=0x0) at scheduler.c:101
#4  0x000035c0 in main (argc=0, argv=0xbffffef4) at dird.c:240

Thread 4 (process 593 thread 0x2023):
#0  0x900314ac in wait4 ()
#1  0x00052bbc in signal_handler (sig=10) at signal.c:159
#2  <signal handler called>
#3  0x00071f0c in _Z13rwl_writelockP12s_rwlock_tag (rwl=Cannot access memory at address 0xac
) at rwlock.c:216
#4  0x0003d690 in _Z8_db_lockPKciP4s_db (file=0x9d850 "sql_create.c", line=114, mdb=0x0) at sql.c:238
#5  0x0003f47c in _Z25db_create_jobmedia_recordP3JCRP4s_dbP12JOBMEDIA_DBR (jcr=0x805c18, mdb=0x0, jm=0xf0284b40) at sql_create.c:114
#6  0x0000909c in _Z15catalog_requestP3JCRP5BSOCKPc (jcr=0x805c18, bs=0x3075c8, msg=0x80d850 "CreateJobMedia FirstIndex=1 LastIndex=140 StartFile=0 EndFile=0 StartBlock=194 EndBlock=19611826\n") at catreq.c:262
#7  0x00010508 in _Z11bget_dirmsgP5BSOCK (bs=0x3075c8) at getmsg.c:177
#8  0x000190f8 in msg_thread (arg=0x805c18) at msgchan.c:226
#9  0x900246e8 in _pthread_body ()

Thread 3 (process 593 thread 0xb03):
#0  0x90018be8 in semaphore_timedwait_signal_trap ()
#1  0x9000e788 in _pthread_cond_wait ()
#2  0x00053dc8 in watchdog_thread (arg=0x0) at watchdog.c:289
#3  0x900246e8 in _pthread_body ()

Thread 2 (process 593 thread 0xa03):
#0  0x9000b20c in select ()
#1  0x000737c4 in _Z18bnet_thread_serverP5dlistiP9workq_tagPFPvS3_E (addrs=0x3003a8, max_clients=10, client_wq=0xc7b6c, handle_client_request=0x36820 <_Z24handle_UA_client_requestPv>) at bnet_server.c:155
#2  0x00036684 in connect_thread (arg=0x3003a8) at ua_server.c:90
#3  0x900246e8 in _pthread_body ()

Thread 1 (process 593 thread 0x203):
#0  0x90012588 in clock_sleep_trap ()
#1  0x9000d758 in nanosleep ()
#2  0x00054f28 in _Z11bmicrosleepll (sec=60, usec=0) at bsys.c:515
#3  0x0001e298 in _Z17wait_for_next_jobPc (one_shot_job_to_run=0x0) at scheduler.c:101
#4  0x000035c0 in main (argc=0, argv=0xbffffef4) at dird.c:240
#0  0x90012588 in clock_sleep_trap ()
No symbol table info available.
#1  0x9000d758 in nanosleep ()
No symbol table info available.
#2  0x00054f28 in _Z11bmicrosleepll (sec=60, usec=0) at bsys.c:515
515        stat = nanosleep(&timeout, NULL);
timeout = {tv_sec = 60, tv_nsec = 0}
tv = {tv_sec = 15, tv_usec = 23}
tz = {tz_minuteswest = 1, tz_dsttime = 3}
stat = 25
#3  0x0001e298 in _Z17wait_for_next_jobPc (one_shot_job_to_run=0x0) at scheduler.c:101
101        bmicrosleep(NEXT_CHECK_SECS, 0);  /* recheck once per minute */
jcr = (JCR *) 0x0
job = (JOB *) 0x0
run = (RUN *) 0xa1980
now = 366
first = false
next_job = (job_item *) 0x0
#4  0x000035c0 in main (argc=0, argv=0xbffffef4) at dird.c:240
240        while ((jcr = wait_for_next_job(runjob))) {
ch = -1
jcr = (JCR *) 0x805c18
no_signals = 0
test_config = 0
uid = 0x0
gid = 0x0
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.
#0  0x00000000 in ?? ()
No symbol table info available.

Bug History
Date Modified    Username       Field                    Change
======================================================================
02-09-05 23:26   mac_ville      New Bug
02-09-05 23:26   mac_ville      File Added: marge_traceback
02-10-05 02:39   kern           Bugnote Added: 0000643
02-10-05 02:39   kern           Status                   new => feedback
02-10-05 07:52   mac_ville      Bugnote Added: 0000646
02-10-05 08:42   kern           Bugnote Added: 0000648
02-10-05 09:53   mac_ville      Bugnote Added: 0000650
02-10-05 10:28   mac_ville      Bugnote Added: 0000651
02-10-05 22:13   mac_ville      Bugnote Added: 0000652
02-11-05 09:42   kern           File Added: catreq.c
02-11-05 09:49   kern           Bugnote Added: 0000656
02-16-05 05:00   kern           Bugnote Added: 0000674
02-16-05 05:45   mac_ville      Bugnote Added: 0000675
02-25-05 14:27   mac_ville      Bugnote Added: 0000699
======================================================================