From: <bac...@li...> - 2008-04-03 05:27:44
A NOTE has been added to this issue.
======================================================================
http://bugs.bacula.org/view.php?id=1063
======================================================================
Reported By:                tilman
Assigned To:
======================================================================
Project:                    bacula
Issue ID:                   1063
Category:                   Director
Reproducibility:            sometimes
Severity:                   crash
Priority:                   normal
Status:                     feedback
======================================================================
Date Submitted:             03-12-2008 15:08 GMT
Last Modified:              04-03-2008 06:27 BST
======================================================================
Summary:                    list nextvol crashes director
Description:
Entering the command "list nextvol" in bconsole frequently (but not
always) crashes the director.

Symptoms:
- bconsole exits (shell prompt appears)
- bacula-dir process exits (vanishes)
- after restarting bacula-dir and re-entering bconsole, I have messages:

16-Feb 14:09 xenon-dir: Fatal Error at sql_get.c:580 because: rwl_writelock failure. stat=22: ERR=Invalid argument
16-Feb 14:09 xenon-dir: ERROR in mem_pool.c:163 Failed ASSERT: obuf
16-Feb 14:09 xenon-dir: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation

The "Failed ASSERT" line is not always present, though. Sometimes I see

02-Feb 15:28 xenon-dir: ABORTING due to ERROR in smartall.c:194 double free from smartall.c:328

in its place, sometimes it's just the "rwl_writelock" and "signal 11"
lines.
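As a reading aid for the messages quoted in the description: the value in "stat=22" looks like a standard errno number, because the text that follows it ("Invalid argument") is the usual description of EINVAL. The following is a minimal Python sketch under that assumption; the report itself does not state that stat is an errno.

```python
import errno
import os

# Assumption: "stat=22" in the director's message is a POSIX errno value.
stat = 22

name = errno.errorcode.get(stat, "unknown")  # symbolic name for the number
text = os.strerror(stat)                     # platform's description of it

print(f"stat={stat}: {name} ({text})")
```

For POSIX read-write lock calls, EINVAL is the code documented for a lock object that is not (or is no longer) initialized. Nothing in this thread confirms that this is what happens here.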
======================================================================

----------------------------------------------------------------------
 tilman - 03-12-08 15:09
----------------------------------------------------------------------
See also: http://marc.info/?l=bacula-users&m=120515906327651&w=2

----------------------------------------------------------------------
 kern - 03-18-08 11:19
----------------------------------------------------------------------
I am unable to reproduce this. I would suggest that you upgrade to the
latest version 2.2.8, and try again. If it continues to fail for you,
please try to simplify it to a test case that I can reproduce.

----------------------------------------------------------------------
 tilman - 03-24-08 19:08
----------------------------------------------------------------------
I upgraded to 2.2.8 on one of the affected machines, and the crash
happened again. The message after the restart was:

*mes
24-Mar 19:03 xenon-dir: Fatal Error at sql_get.c:580 because: rwl_writelock failure. stat=22: ERR=Invalid argument
24-Mar 19:03 xenon-dir: Fatal Error because: Bacula interrupted by signal 11: Segmentation violation
*

So not even the line number changed.

I'm not quite sure how to simplify this beyond what I already wrote.
Omitting any of the "steps to reproduce" will obviously fail to
reproduce the problem. :-)

Anyway, I cannot tell you how to reproduce the crash reliably. It
sometimes takes many tries for me, too, but so far I could not discern
any pattern.

I did hope the gdb backtrace I managed to capture would help you, but
apparently not so. Is there anything else I can do in order to help
identifying the problem?

----------------------------------------------------------------------
 kern - 03-28-08 13:54
----------------------------------------------------------------------
I cannot duplicate this problem.
As a last attempt, please do the following:
- Include *all* the output from
    list volumes
    list nextvol    (repeated until it crashes)
- Run the director under the debugger (see the Kaboom chapter)
  when it crashes, after doing the where, do:
    thread apply all bt
    up              (after this you should be in sql_get.c)
    print *mdb
- What version of g++ are you using?

----------------------------------------------------------------------
 tilman - 03-28-08 19:43
----------------------------------------------------------------------
Please find in bacula-dir-gdb.log the complete gdb session of
bacula-dir crashing on "list nextvol", and in bconsole.log the
corresponding bconsole session with a couple of hopefully useful
informational commands. The bconsole session starts with the message
from the previous crash. (Un?)fortunately this time the crash happened
right on the first try, so the log doesn't contain the output of a
successful "list nextvol". The two SIGUSR2 / cont sequences in the gdb
session are from the two stat commands for the fd and sd and seem to be
normal.

This is with the contributed bacula 2.2.8 RPM for openSUSE 10.3 except
for a self-compiled director because the one in the RPM was compiled
without debugging information. My compiler is gcc-c++-4.2-24 shipped
with openSUSE 10.3.

If you need any more information, feel free to ask. I'll continue
running the director in gdb for now.

----------------------------------------------------------------------
 kern - 03-28-08 20:20
----------------------------------------------------------------------
I suspect that it is an OpenSuSE 10.3/gcc 4.2 problem. I am installing
OpenSuSE now and will see if I can reproduce the problem. I'm out most
of the weekend though ... when I get back I will look at your logs and
run some tests.

In the mean time, I took a *quick* look at your gdb output, and there
are a *lot* of pointers totally messed up. Did you have errors
compiling? If so, something is surely messed up.
Also check that your compile options are those recommended in the
manual (i.e. those that the Bacula make generates). You might try
compiling with the -O0 (oh zero) option, or try to fall back to a 4.1
compiler.

----------------------------------------------------------------------
 tilman - 03-31-08 11:54
----------------------------------------------------------------------
I compiled from the bacula-2.2.8-2.src.rpm package on SourceForge,
after editing the spec file to add debugging information, with the
command

rpmbuild -ba --define "build_su103 1" --define "build_sqlite 1" --define "nobuild_gconsole 1" bacula.spec

There was an "unused variable" warning in mtx (source file nsmhack.c),
a patch to file src/cats/make_catalog_backup.in applied with fuzz 2,
and the "configure" output contained this slightly irregular stretch:

checking for MySQL support...
checking for SQLite3 support... no
checking for SQLite support... no
nm: 'a.out': No such file

Also, a couple of linker warnings: "Using '...' in statically linked
applications requires at runtime the shared libraries from the glibc
version used for linking strip static-bacula-fd" from the file daemon.
Apart from that, no errors or warnings during compilation.

The packaging phase had two complaints:

objdump: /var/tmp/bacula-root/etc/bacula/rescue/linux/cdrom/roottree/root/ntfsresize: File format not recognized
sed: cannot read /usr/src/packages/SOURCES/bacula.spec: no such file or directory

but that shouldn't matter as I did not use the resulting rpm file but
copied the bacula-dir executable directly from the build directory.

Full log of the rebuild (458 kB) available at
http://gollum.phnxsoft.com/~ts/linux/build-bacula.log
in case you want to see all the gory details.

HTH
T.

----------------------------------------------------------------------
 kern - 04-01-08 10:58
----------------------------------------------------------------------
I cannot duplicate this problem.
I've loaded a SuSE 10.3 distribution on a Xenon machine using the 4.2
compiler, tested it, but cannot reproduce the error.

I think there is something wrong in your build. I would recommend that
you build from source since you are having a lot of problems or errors
rebuilding from the SRPM. You might be able to get help for this on the
bacula-users list or by contacting Scott Barninger directly (he is not
subscribed to that list).

----------------------------------------------------------------------
 tilman - 04-01-08 23:49
----------------------------------------------------------------------
I recompiled from 2.2.8 source, and reproduced the crash under gdb with
that, so it seems it's not the fault of the RPM packages.

Please find attached:
  build-bacula-from-source.log - log of my compiling, this time without
    any suspicious messages
  bacula-crash-new.txt - log from gdb session with that freshly compiled
    director executable crashing, and the diagnostic commands you
    requested

HTH
T.

----------------------------------------------------------------------
 kern - 04-03-08 06:27
----------------------------------------------------------------------
OK, thanks. I think I see what is going wrong, but need to figure out
the best way to resolve it.
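The observation in this thread that "not even the line number changed" between the original report and the 2.2.8 retest can be checked mechanically. The sketch below is only illustrative: the two log lines are copied from the notes, while the regular expression and the helper names (parse, crash_site) are hypothetical and written for these two lines alone, not as a general parser for Bacula messages.

```python
import re

# Log lines copied from the notes (16-Feb: original report,
# 24-Mar: after the upgrade to 2.2.8).
LOG_LINES = [
    "16-Feb 14:09 xenon-dir: Fatal Error at sql_get.c:580 because: "
    "rwl_writelock failure. stat=22: ERR=Invalid argument",
    "24-Mar 19:03 xenon-dir: Fatal Error at sql_get.c:580 because: "
    "rwl_writelock failure. stat=22: ERR=Invalid argument",
]

# Hypothetical pattern, fitted to the two lines above only.
PATTERN = re.compile(
    r"^(?P<when>\d{2}-\w{3} \d{2}:\d{2}) (?P<daemon>\S+): "
    r"Fatal Error at (?P<file>[\w.]+):(?P<line>\d+) because: "
    r"(?P<what>.+?)\. stat=(?P<stat>\d+): ERR=(?P<err>.+)$"
)


def parse(line):
    """Return the fields of one 'Fatal Error at ...' line, or None."""
    m = PATTERN.match(line)
    return m.groupdict() if m else None


def crash_site(rec):
    """The part of a record that identifies where the error was raised."""
    return (rec["file"], int(rec["line"]), rec["what"], int(rec["stat"]))


records = [parse(line) for line in LOG_LINES]
same_site = crash_site(records[0]) == crash_site(records[1])
print(crash_site(records[0]), "same site in both reports:", same_site)
```

Comparing only the source file, line number, failing call and status (and ignoring the timestamp) shows whether the two runs failed at the same place, which is the point made in the 03-24-08 note.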
Issue History
Date Modified   Username  Field                                     Change
======================================================================
03-12-08 15:08  tilman    New Issue
03-12-08 15:09  tilman    Note Added: 0003208
03-18-08 11:19  kern      Note Added: 0003216
03-18-08 11:19  kern      Status                                    new => feedback
03-24-08 19:08  tilman    Note Added: 0003231
03-28-08 13:54  kern      Note Added: 0003236
03-28-08 19:34  tilman    File Added: bacula-dir-gdb.log
03-28-08 19:34  tilman    File Added: bconsole.log
03-28-08 19:43  tilman    Note Added: 0003240
03-28-08 20:20  kern      Note Added: 0003241
03-31-08 11:54  tilman    Note Added: 0003245
04-01-08 10:58  kern      Note Added: 0003248
04-01-08 23:49  tilman    Note Added: 0003255
04-01-08 23:49  tilman    File Added: build-bacula-from-source.log
04-01-08 23:50  tilman    File Added: bacula-crash-new.txt
04-03-08 06:27  kern      Note Added: 0003264
======================================================================