From: <bac...@li...> - 2005-03-18 15:33:49
A BUGNOTE has been added to this bug.
======================================================================
http://bugs.bacula.org/bug_view_advanced_page.php?bug_id=0000257
======================================================================
Reported By:                strappaz
Assigned To:
======================================================================
Project:                    bacula
Bug ID:                     257
Category:                   Director
Reproducibility:            unable to duplicate
Severity:                   major
Priority:                   normal
Status:                     feedback
======================================================================
Date Submitted:             03-11-2005 03:28 PST
Last Modified:              03-18-2005 07:33 PST
======================================================================
Summary:                    Bacula-dir segmentation fault
Description:
The bacula-dir dies on a segmentation fault. It has happened three times
since the upgrade from 1.36.0 to 1.36.2 (three days ago). The job continues
to run between the fd and the sd.
======================================================================

----------------------------------------------------------------------
 kern - 03-11-2005 05:18 PST
----------------------------------------------------------------------
You will need to run the director under the debugger as described in the
Kaboom section of the manual. Follow the instructions in the manual for
obtaining a traceback, as the one attached does not show the fault.

----------------------------------------------------------------------
 strappaz - 03-15-2005 00:15 PST
----------------------------------------------------------------------
Bacula ran Friday's full backup correctly and hung yesterday. This is the
output of gdb:

hunt bin # gdb ./bacula-dir
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -s -f -c /usr/local/bacula/etc/bacula-dir.conf
Starting program: /usr/local/bacula/bin/bacula-dir -s -f -c /usr/local/bacula/etc/bacula-dir.conf
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 27703)]
[New Thread 32769 (LWP 5079)]
[New Thread 16386 (LWP 8798)]
[New Thread 32771 (LWP 15770)]
[New Thread 49156 (LWP 15322)]
[New Thread 65541 (LWP 15888)]
[New Thread 81926 (LWP 18097)]
[New Thread 98311 (LWP 28426)]
[New Thread 114696 (LWP 27832)]
[New Thread 131081 (LWP 2220)]
[New Thread 147466 (LWP 30680)]
[New Thread 163851 (LWP 15177)]
[New Thread 180236 (LWP 15444)]
[New Thread 196621 (LWP 28472)]
[New Thread 213006 (LWP 4522)]
[New Thread 229391 (LWP 24181)]
[New Thread 245776 (LWP 1424)]
[New Thread 262161 (LWP 19851)]
[New Thread 278546 (LWP 3907)]
[New Thread 294931 (LWP 26123)]
[New Thread 311316 (LWP 14096)]
[New Thread 327701 (LWP 18792)]
[New Thread 344086 (LWP 29775)]
[New Thread 360471 (LWP 28801)]
[New Thread 376856 (LWP 21798)]
[New Thread 393241 (LWP 7919)]
[New Thread 409626 (LWP 21195)]
[New Thread 426011 (LWP 23348)]
[New Thread 442396 (LWP 2115)]
[New Thread 458781 (LWP 22374)]
[New Thread 475166 (LWP 25897)]
[New Thread 491551 (LWP 12993)]
[New Thread 507936 (LWP 14965)]
[New Thread 524321 (LWP 26603)]
[New Thread 540706 (LWP 5336)]
[New Thread 557091 (LWP 2999)]
[New Thread 573476 (LWP 4373)]
[New Thread 589861 (LWP 7858)]
[New Thread 606246 (LWP 29574)]
[New Thread 622631 (LWP 568)]
[New Thread 639016 (LWP 13712)]
[New Thread 655401 (LWP 13045)]
[New Thread 671786 (LWP 22749)]
[New Thread 688171 (LWP 32485)]
[New Thread 704556 (LWP 20489)]
[New Thread 720941 (LWP 27436)]
[New Thread 737326 (LWP 28603)]
[New Thread 753711 (LWP 14332)]
[New Thread 770096 (LWP 18052)]
[New Thread 786481 (LWP 16702)]
[New Thread 802866 (LWP 19253)]
[New Thread 819251 (LWP 702)]
[New Thread 835636 (LWP 368)]
[New Thread 852021 (LWP 10507)]

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 16384 (LWP 27703)]
0x4be2f59b in write () from /lib/libpthread.so.0
(gdb)
(gdb) thread apply all bt

Thread 22 (Thread 327701 (LWP 18792)):
#0  0x4be2f61b in read () from /lib/libpthread.so.0
#1  0x4be32b84 in __JCR_LIST__ () from /lib/libpthread.so.0

Thread 4 (Thread 32771 (LWP 15770)):
#0  0x4be2ff86 in nanosleep () from /lib/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x4be2c4c6 in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x4be290a4 in pthread_cond_timedwait_relative () from /lib/libpthread.so.0
#4  0x080a9bbc in watchdog_thread (arg=0x0) at watchdog.c:289
#5  0x4be29dfe in pthread_start_thread () from /lib/libpthread.so.0
#6  0x4be29e88 in pthread_start_thread_event () from /lib/libpthread.so.0
#7  0x4c0069aa in clone () from /lib/libc.so.6

Thread 3 (Thread 16386 (LWP 8798)):
#0  0x4c000691 in select () from /lib/libc.so.6
#1  0x4be32b84 in __JCR_LIST__ () from /lib/libpthread.so.0
#2  0xb7bffbe0 in ?? ()
#3  0xb7bfef54 in ?? ()

Thread 2 (Thread 32769 (LWP 5079)):
#0  0x4bffdd15 in fts_children () from /lib/libc.so.6
#1  0x4bffddc9 in poll () from /lib/libc.so.6
#2  0x4be29abe in __pthread_manager () from /lib/libpthread.so.0
#3  0x4be29ced in __pthread_manager_event () from /lib/libpthread.so.0
#4  0x4c0069aa in clone () from /lib/libc.so.6

Thread 1 (Thread 16384 (LWP 27703)):
#0  0x4be2f59b in write () from /lib/libpthread.so.0
#1  0x4bdfb2c4 in ?? () from /usr/lib/libmysqlclient_r.so.12
(gdb)

----------------------------------------------------------------------
 kern - 03-16-2005 03:48 PST
----------------------------------------------------------------------
OK, for some reason the program is getting a SIGPIPE. You need to run the
program again, and when it stops with a SIGPIPE, enter the "cont" command.
If it gets another SIGPIPE, do the same. When it finally stops or dies for
some other reason, run the same traceback commands you used before and post
the output on this bug report.

----------------------------------------------------------------------
 strappaz - 03-17-2005 07:12 PST
----------------------------------------------------------------------
I think it was trying to back up a workstation which wasn't online today.
The job failed this morning and was rescheduled for this afternoon. It was
the only one this afternoon.
The job was upgraded from incremental to differential.

hunt bin # gdb ./bacula-dir
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -s -f -c /usr/local/bacula/etc/bacula-dir.conf
Starting program: /usr/local/bacula/bin/bacula-dir -s -f -c /usr/local/bacula/etc/bacula-dir.conf
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 9944)]
[New Thread 32769 (LWP 8428)]
[New Thread 16386 (LWP 26785)]
[New Thread 32771 (LWP 20905)]
[New Thread 49156 (LWP 8414)]
[New Thread 65541 (LWP 16227)]
[New Thread 81926 (LWP 15976)]
[New Thread 98311 (LWP 17033)]
[New Thread 114696 (LWP 22555)]
[New Thread 131081 (LWP 10109)]
[New Thread 147466 (LWP 28528)]
[New Thread 163851 (LWP 2612)]
[New Thread 180236 (LWP 30773)]
[New Thread 196621 (LWP 4299)]
[New Thread 213006 (LWP 25302)]
[New Thread 229391 (LWP 2333)]
[New Thread 245776 (LWP 18647)]
[New Thread 262161 (LWP 10918)]
[New Thread 278546 (LWP 6426)]
[New Thread 294931 (LWP 20948)]
[New Thread 311316 (LWP 11703)]
[New Thread 327701 (LWP 10350)]
[New Thread 344086 (LWP 7945)]
[New Thread 360471 (LWP 21182)]
[New Thread 376856 (LWP 21804)]
[New Thread 393241 (LWP 31398)]
[New Thread 409626 (LWP 30535)]
[New Thread 426011 (LWP 1682)]
[New Thread 442396 (LWP 31730)]
[New Thread 458781 (LWP 28744)]
[New Thread 475166 (LWP 16398)]
[New Thread 491551 (LWP 30493)]
[New Thread 507936 (LWP 27242)]
[New Thread 524321 (LWP 32231)]
[New Thread 540706 (LWP 22436)]
[New Thread 557091 (LWP 6293)]
[New Thread 573476 (LWP 25759)]
[New Thread 589861 (LWP 7245)]
[New Thread 606246 (LWP 226)]
[New Thread 622631 (LWP 6319)]
[New Thread 639016 (LWP 12144)]
[New Thread 655401 (LWP 20548)]
[New Thread 671786 (LWP 3046)]
[New Thread 688171 (LWP 24594)]
[New Thread 704556 (LWP 29080)]
[New Thread 720941 (LWP 5439)]
[New Thread 737326 (LWP 25923)]
[New Thread 753711 (LWP 21659)]
[New Thread 770096 (LWP 16414)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 770096 (LWP 16414)]
0x08097431 in alist::first() (this=0x0) at alist.c:55
55          cur_item = 1;
Current language: auto; currently c++
(gdb) cont
Continuing.
[New Thread 786481 (LWP 7738)]

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 786481 (LWP 7738)]
0x412af59b in write () from /lib/libpthread.so.0
(gdb) cont
Continuing.
[New Thread 802866 (LWP 16264)]

Program received signal SIGPIPE, Broken pipe.
[Switching to Thread 802866 (LWP 16264)]
0x412af59b in write () from /lib/libpthread.so.0
(gdb) cont
Continuing.
[New Thread 819251 (LWP 21413)]

----------------------------------------------------------------------
 kern - 03-17-2005 10:21 PST
----------------------------------------------------------------------
OK, this time you trapped it, but I need you to do it one more time because
I don't know how it got to the point where it died. Stop it, then re-run it
under the debugger as you did, and each time it gets a SIGPIPE, enter "cont"
as you did (twice, I think). Then, when it stops with the SIGSEGV, instead of
entering cont, enter "thread apply all bt" without the quotes. That should
tell me how it got to the point where it failed.

----------------------------------------------------------------------
 strappaz - 03-18-2005 06:17 PST
----------------------------------------------------------------------
hunt bin # gdb ./bacula-dir
GNU gdb 6.0
Copyright 2003 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i686-pc-linux-gnu"...Using host libthread_db library "/lib/libthread_db.so.1".
(gdb) run -s -f -c /usr/local/bacula/etc/bacula-dir.conf
Starting program: /usr/local/bacula/bin/bacula-dir -s -f -c /usr/local/bacula/etc/bacula-dir.conf
warning: Unable to find dynamic linker breakpoint function.
GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
[Thread debugging using libthread_db enabled]
[New Thread 16384 (LWP 13301)]
[New Thread 32769 (LWP 30883)]
[New Thread 16386 (LWP 29748)]
[New Thread 32771 (LWP 32532)]
[New Thread 49156 (LWP 20127)]
[New Thread 65541 (LWP 10758)]
[New Thread 81926 (LWP 19015)]
[New Thread 98311 (LWP 22788)]
[New Thread 114696 (LWP 20082)]
[New Thread 131081 (LWP 15486)]
[New Thread 147466 (LWP 29195)]
[New Thread 163851 (LWP 29615)]
[New Thread 180236 (LWP 18095)]
[New Thread 196621 (LWP 12759)]
[New Thread 213006 (LWP 22785)]
[New Thread 229391 (LWP 24368)]
[New Thread 245776 (LWP 7970)]
[New Thread 262161 (LWP 28717)]
[New Thread 278546 (LWP 13444)]
[New Thread 294931 (LWP 27682)]
[New Thread 311316 (LWP 17263)]
[New Thread 327701 (LWP 26392)]
[New Thread 344086 (LWP 28629)]
[New Thread 360471 (LWP 6901)]
[New Thread 376856 (LWP 21815)]
[New Thread 393241 (LWP 23450)]
[New Thread 409626 (LWP 3432)]
[New Thread 426011 (LWP 22550)]
[New Thread 442396 (LWP 20633)]
[New Thread 458781 (LWP 14381)]
[New Thread 475166 (LWP 7841)]
[New Thread 491551 (LWP 19850)]
[New Thread 507936 (LWP 25539)]
[New Thread 524321 (LWP 18784)]
[New Thread 540706 (LWP 798)]
[New Thread 557091 (LWP 6418)]
[New Thread 573476 (LWP 92)]
[New Thread 589861 (LWP 5059)]
[New Thread 606246 (LWP 8450)]
[New Thread 622631 (LWP 25675)]
[New Thread 639016 (LWP 12229)]
[New Thread 655401 (LWP 30872)]
[New Thread 671786 (LWP 10537)]
[New Thread 688171 (LWP 845)]
[New Thread 704556 (LWP 2103)]
[New Thread 720941 (LWP 28661)]
[New Thread 737326 (LWP 20078)]
[New Thread 753711 (LWP 8220)]
[New Thread 770096 (LWP 7938)]
[New Thread 786481 (LWP 22343)]
[New Thread 802866 (LWP 12990)]
[New Thread 819251 (LWP 27855)]
[New Thread 835636 (LWP 15829)]
[New Thread 852021 (LWP 3172)]
[New Thread 868406 (LWP 30327)]
[New Thread 884791 (LWP 2417)]
[New Thread 901176 (LWP 27204)]
[New Thread 917561 (LWP 12678)]
[New Thread 933946 (LWP 1782)]
[New Thread 950331 (LWP 4699)]
[New Thread 966716 (LWP 5704)]
[New Thread 983101 (LWP 17609)]
[New Thread 999486 (LWP 2760)]
[New Thread 1015871 (LWP 796)]
[New Thread 1032256 (LWP 27145)]
[New Thread 1048641 (LWP 26552)]
[New Thread 1065026 (LWP 25830)]
[New Thread 1081411 (LWP 6451)]
[New Thread 1097796 (LWP 21576)]
[New Thread 1114181 (LWP 30773)]
[New Thread 1130566 (LWP 16005)]
[New Thread 1146951 (LWP 21829)]
[New Thread 1163336 (LWP 16892)]
[New Thread 1179721 (LWP 5670)]
[New Thread 1196106 (LWP 31445)]
[New Thread 1212491 (LWP 64)]

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1212491 (LWP 64)]
0x08097431 in alist::first() (this=0x0) at alist.c:55
55          cur_item = 1;
Current language: auto; currently c++
(gdb) thread apply all bt

Thread 76 (Thread 1212491 (LWP 64)):
#0  0x08097431 in alist::first() (this=0x0) at alist.c:55
#1  0x080601ab in connect_to_storage_daemon(JCR*, int, int, int) (jcr=0x810ecc0, retry_interval=10, max_retry_time=1800, verbose=1) at msgchan.c:69
#2  0x0804e039 in do_backup(JCR*) (jcr=0x810ecc0) at backup.c:145
#3  0x0805a799 in job_thread (arg=0x810ecc0) at job.c:215
#4  0x0805dcfe in jobq_server (arg=0x80d24e0) at jobq.c:444
#5  0x441dedfe in pthread_start_thread () from /lib/libpthread.so.0
#6  0x441dee88 in pthread_start_thread_event () from /lib/libpthread.so.0
#7  0x443bb9aa in clone () from /lib/libc.so.6

Thread 75 (Thread 1196106 (LWP 31445)):
#0  0x441e461b in read () from /lib/libpthread.so.0
#1  0x441e7b84 in __JCR_LIST__ () from /lib/libpthread.so.0

Thread 4 (Thread 32771 (LWP 32532)):
#0  0x441e4f86 in nanosleep () from /lib/libpthread.so.0
#1  0x00000001 in ?? ()
#2  0x441e14c6 in __pthread_timedsuspend_new () from /lib/libpthread.so.0
#3  0x441de0a4 in pthread_cond_timedwait_relative () from /lib/libpthread.so.0
#4  0x080a9bbc in watchdog_thread (arg=0x0) at watchdog.c:289
#5  0x441dedfe in pthread_start_thread () from /lib/libpthread.so.0
#6  0x441dee88 in pthread_start_thread_event () from /lib/libpthread.so.0
#7  0x443bb9aa in clone () from /lib/libc.so.6

Thread 3 (Thread 16386 (LWP 29748)):
#0  0x443b5691 in select () from /lib/libc.so.6
#1  0x441e7b84 in __JCR_LIST__ () from /lib/libpthread.so.0
#2  0xb09ffbe0 in ?? ()
#3  0xb09fef54 in ?? ()

Thread 2 (Thread 32769 (LWP 30883)):
#0  0x443b2d15 in fts_children () from /lib/libc.so.6
#1  0x443b2dc9 in poll () from /lib/libc.so.6
#2  0x441deabe in __pthread_manager () from /lib/libpthread.so.0
#3  0x441deced in __pthread_manager_event () from /lib/libpthread.so.0
#4  0x443bb9aa in clone () from /lib/libc.so.6

Thread 1 (Thread 16384 (LWP 13301)):
#0  0x441e4f86 in nanosleep () from /lib/libpthread.so.0
0x08097431 55 cur_item = 1;
(gdb)

----------------------------------------------------------------------
 kern - 03-18-2005 07:33 PST
----------------------------------------------------------------------
OK, this time I see exactly where the problem is coming from. The only thing
that makes sense is that that particular job did not have a Storage resource
defined for it. If it did, then something is clobbering memory.

I can add some code that will test for this case and fail the job so that
Bacula doesn't crash. What I don't understand is why Bacula tried to start
the job without a Storage resource. I'll work up a patch that will fail the
job and will attach it to this bug report, but if you want to go a bit
further, run the same thing, then when it dies with the SIGSEGV enter the
following two commands:

  up 1
  print *jcr

It should print out a big listing of the jcr. If it says "jcr not in current
environment" or something like that, just repeat "up 1" and "print *jcr"
until you get something.
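The guard Kern describes (test for a missing Storage resource and fail the job instead of crashing) might look roughly like the sketch below. This is a hypothetical, simplified illustration, not the actual Bacula patch: the trimmed alist and JCR types and the function connect_to_storage_daemon_checked() are invented for this example, the retry/verbose parameters of the real connect_to_storage_daemon(JCR*, int, int, int) are left out, and it assumes, as the backtrace suggests, that the job's storage list pointer was NULL when msgchan.c:69 called first() on it.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical stand-in for Bacula's alist (alist.c). Only the cursor
// reset in first() is mirrored. Calling first() through a NULL alist
// pointer is what gdb reported as "alist::first() (this=0x0)".
class alist {
public:
    void append(void *item) { items.push_back(item); }
    void *first() {
        cur_item = 1;  // the member write that faulted at alist.c:55
        return items.empty() ? nullptr : items[0];
    }
private:
    std::vector<void *> items;
    int cur_item = 0;
};

// Hypothetical, heavily trimmed job control record.
struct JCR {
    std::string job_name;
    alist *storage = nullptr;  // nullptr models "no Storage resource"
};

// Hypothetical guard: report and fail the job rather than dereference
// a NULL (or empty) storage list.
bool connect_to_storage_daemon_checked(JCR *jcr) {
    if (jcr->storage == nullptr || jcr->storage->first() == nullptr) {
        std::fprintf(stderr, "Job \"%s\": no Storage resource defined. Job failed.\n",
                     jcr->job_name.c_str());
        return false;  // the caller would mark the job as failed
    }
    // ... the real code would go on to contact the Storage daemon here ...
    return true;
}
```

With this check in place, a job whose storage pointer is still nullptr prints the error and returns false, so the director can fail that one job instead of dying with SIGSEGV. Treating an empty list the same as a NULL pointer is an extra, cautious choice in the sketch.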
Bug History

Date Modified  Username   Field                     Change
======================================================================
03-11-05 03:28 strappaz   New Bug
03-11-05 03:28 strappaz   File Added: config.out
03-11-05 03:43 strappaz   Bug Monitored: strappaz
03-11-05 05:18 kern       Bugnote Added: 0000748
03-11-05 05:18 kern       Status                    new => feedback
03-15-05 00:15 strappaz   Bugnote Added: 0000750
03-16-05 03:48 kern       Bugnote Added: 0000764
03-17-05 07:12 strappaz   Bugnote Added: 0000777
03-17-05 10:21 kern       Bugnote Added: 0000779
03-18-05 06:17 strappaz   Bugnote Added: 0000785
03-18-05 07:33 kern       Bugnote Added: 0000786
======================================================================