From: Dirk H B. <dba...@sc...> - 2007-08-29 12:06:44
Woo Hoo

It appears to be working. No segfault, all data returned to bat from the
director.

Dirk

On Wed, 2007-08-29 at 11:34 +0200, Kern Sibbald wrote:
> Hello Dirk,
>
> Could you update to the latest SVN and test again? I think there is a good
> chance that I have fixed your problem. It was one of the possibilities on my
> list, but thanks to Martin's comment, I re-read the stdarg documentation and
> now believe that it was due to re-calling a subroutine, which trashed a
> critical argument pointer.
>
> Anyway, let's now hope it is fixed ...
>
> Regards,
>
> Kern
>
> On Tuesday 28 August 2007 22:48, Dirk H Bartley wrote:
> > On Tue, 2007-08-28 at 07:53 +0200, Kern Sibbald wrote:
> > > Hello Dirk,
> > >
> > > I've looked over the code, and if there is something wrong with it, I am
> > > certainly missing it. Perhaps someone on the devel list will see
> > > something that I cannot.
> > >
> > > At this point, I'm privileging a compiler bug. Could you give me the
> > > following information?
> > >
> > > 1. The version of the compiler and the architecture for each machine
> > > where you have the failure.
> >
> > These are gentoo machines. I'm getting the gcc version with:
> > gcc-config -l
> >
> > myth2 ~ # uname -a
> > Linux myth2 2.6.19-gentoo-r2 #5 SMP Thu Dec 28 22:24:13 EST 2006 x86_64
> > AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
> > gcc x86_64-pc-linux-gnu-4.1.1
> >
> > srvalum3 ~ # uname -a
> > Linux srvalum3 2.6.20-gentoo #3 SMP Sun Feb 18 12:32:04 EST 2007 x86_64
> > Dual-Core AMD Opteron(tm) Processor 2210 AuthenticAMD GNU/Linux
> > gcc x86_64-pc-linux-gnu-4.1.1
> >
> > > 2. The version of the compiler and the architecture for each machine
> > > where you do not have the failure.
> >
> > workplay ~ # uname -a
> > Linux workplay 2.6.17-gentoo-r5 #4 SMP Thu Apr 19 19:58:11 EDT 2007 i686
> > AMD Sempron(tm) 2600+ AuthenticAMD GNU/Linux
> > gcc i686-pc-linux-gnu-3.4.4
> >
> > > Could you give me the complete compile line with all the options for both
> > > dird/ua_output.c and lib/bsnprintf.c? Either edit the Makefile and
> > > remove the $(NO_ECHO) in front of the compile rules (the .c.o: and .cc.o:
> > > lines), or set the environment variable NO_ECHO to the empty string.
> >
> > Compiling ua_output.c
> > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > -I.. -O2 -march=athlon64 -pipe -g ua_output.c
> >
> > Compiling bsnprintf.c
> > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > -I.. -O2 -march=athlon64 -pipe -g bsnprintf.c
> >
> > then set makefile optimization
> >
> > Compiling bsnprintf.c
> > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > -I.. -O0 -march=athlon64 -pipe -g bsnprintf.c
> >
> > Compiling ua_output.c
> > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > -I.. -O0 -march=athlon64 -pipe -g ua_output.c
> >
> > > Could you set the compile optimization for those two files to -O0 (minus
> > > oh zero)? Either edit the Makefile or set it on the command line via a
> > > preceding environment variable setting of CFLAGS. Then test again and
> > > see if it fails.
> >
> > Yes it fails.
> >
> > > As a separate test, if the above test still fails, could you comment out
> > > the #define USE_BSNPRINTF 1
> > > line in src/version.h and then rebuild everything?
> >
> > Yes it fails as well
> >
> > > Another interesting test would be to put:
> > >
> > > Dmsg1(000, "fmt=%s\n", fmt);
> > >
> > > just after the line "again:" at line 737 in src/dird/ua_output.c as well
> > > as:
> > >
> > > Dmsg0(000, "goto again\n");
> > >
> > > after "msg = realloc_pool_memory(msg, maxlen + maxlen/2);" at line
> > > 741. Then report what it prints when the seg fault occurs.
> >
> > srvalum3-dir: ua_output.c:738 fmt=%s
> > srvalum3-dir: ua_output.c:738 fmt=%s
> > srvalum3-dir: ua_output.c:738 fmt=%s
> > srvalum3-dir: ua_output.c:738 fmt=%s
> > srvalum3-dir: ua_output.c:738 fmt=%s
> > srvalum3-dir: ua_output.c:743 goto again
> > srvalum3-dir: ua_output.c:738 fmt=%s
> > Kaboom! bacula-dir, srvalum3-dir got signal 11 - Segmentation violation.
> > Attempting traceback.
> > Kaboom! exepath=/usr/sbin/
> > Calling: /usr/sbin/btraceback /usr/sbin/bacula-dir 16016
> > Traceback complete, attempting cleanup ...
> >
> > And here is thread 2
> >
> > Thread 2 (Thread 1098918208 (LWP 16043)):
> > #0 0x00002af703e5faef in waitpid () from /lib/libpthread.so.0
> > #1 0x000000000046b9e3 in signal_handler (sig=11) at signal.c:167
> > #2 <signal handler called>
> > #3 0x00002af704a06c10 in strlen () from /lib/libc.so.6
> > #4 0x00002af7049d8b70 in vfprintf () from /lib/libc.so.6
> > #5 0x00002af7049f9a4a in vsnprintf () from /lib/libc.so.6
> > #6 0x0000000000452bac in bvsnprintf (str=0x9 <Address 0x9 out of
> > bounds>, size=<value optimized out>,
> > format=0x41801e01 "tn", ap=0x7) at bsys.c:292
> > #7 0x0000000000431f6d in bmsg (ua=0x6e74a8, fmt=0x513af4 "%s",
> > arg_ptr=0x41801ec0) at ua_output.c:740
> > #8 0x0000000000432326 in UAContext::send_msg (this=0x9,
> > fmt=0x513af4 "%s") at ua_output.c:778
> > #9 0x000000000042e43c in sql_handler (ctx=0x6e74a8, num_field=3,
> > row=0x6e7738) at ua_dotcmds.c:480
> > #10 0x00000000004508f2 in db_sql_query (mdb=0x6e7888, query=<value
> > optimized out>, result_handler=0x42e350 <sql_handler>,
> > ctx=0x6e74a8) at postgresql.c:320
> > #11 0x000000000042dd57 in sql_cmd (ua=0x6e74a8, cmd=<value optimized
> > out>) at ua_dotcmds.c:496
> > #12 0x000000000042da9b in do_a_dot_command (ua=0x6e74a8,
> > cmd=0x6f0f70 ".sql query=\"SELECT LogId, Time, LogText FROM Log
> > WHERE JobId='2674'\"") at ua_dotcmds.c:131
> > #13 0x000000000043e75f in handle_UA_client_request (arg=<value optimized
> > out>) at ua_server.c:145
> > #14 0x0000000000473c1d in workq_server (arg=<value optimized out>) at
> > workq.c:357
> > #15 0x00002af703e58135 in start_thread () from /lib/libpthread.so.0
> > #16 0x00002af704a5335e in clone () from /lib/libc.so.6
> > #17 0x0000000000000000 in ?? ()
> >
> > On this machine, gcc had three versions
> >
> > srvalum3 bacula # gcc-config -l
> > [1] x86_64-pc-linux-gnu-3.3.6
> > [2] x86_64-pc-linux-gnu-4.1.1 *
> > [3] x86_64-pc-linux-gnu-4.1.2
> >
> > So as an experiment I set gcc to 4.1.2 and recompiled bacula with the
> > same result.
> >
> >
> > Dirk
> >
> > > Best regards,
> > >
> > > Kern
> > >
> > > PS: for the list, the problem is clearly (according to the traceback) in
> > > Thread 2 between stack frame 4 and 5 where the argument "fmt" in stack
> > > frame 5 should be identical to argument "format" in stack frame 4, but
> > > has been shifted by 2 bytes!
> >
> > -------------------------------------------------------------------------
> > This SF.net email is sponsored by: Splunk Inc.
> > Still grepping through log files to find problems? Stop.
> > Now Search log events and configuration files using AJAX and a browser.
> > Download your FREE copy of Splunk now >> http://get.splunk.com/
> > _______________________________________________
> > Bacula-devel mailing list
> > Bac...@li...
> > https://lists.sourceforge.net/lists/listinfo/bacula-devel
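
For readers following the thread: the diagnosis quoted above (a variable
argument pointer trashed when a subroutine is called a second time) matches
a well-known stdarg pitfall. Once a va_list has been handed to vsnprintf(),
its value is indeterminate, so a "grow the buffer and goto again" loop must
not pass the same va_list a second time. On x86_64 the va_list is typically
an array type that the callee advances in place, while on i686 it is usually
a plain pointer passed by value, which would be consistent with the failure
showing up only on the 64-bit machines. The actual change committed to SVN
is not shown in this thread; what follows is only a minimal sketch of the
usual remedy, va_copy() before each attempt. The function name
format_retry() and its signature are hypothetical and are not Bacula's
bmsg() or bvsnprintf().

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper (not Bacula code): format into a heap buffer,
 * growing the buffer and retrying when the output was truncated.
 * Returns a malloc'd string the caller must free, or NULL on failure.
 */
char *format_retry(size_t initial, const char *fmt, ...)
{
    size_t size = initial ? initial : 1;
    char *buf = malloc(size);
    va_list ap;

    if (buf == NULL) {
        return NULL;
    }
    va_start(ap, fmt);
    for (;;) {
        va_list ap_copy;
        char *bigger;
        int len;

        /* vsnprintf() leaves the va_list it receives indeterminate, so
         * every attempt works on a fresh copy and ap itself is never
         * consumed. */
        va_copy(ap_copy, ap);
        len = vsnprintf(buf, size, fmt, ap_copy);
        va_end(ap_copy);

        if (len < 0) {              /* output error */
            free(buf);
            buf = NULL;
            break;
        }
        if ((size_t)len < size) {   /* fits, including the trailing NUL */
            break;
        }
        /* Truncated: C99 vsnprintf() returns the length it needed. */
        size = (size_t)len + 1;
        bigger = realloc(buf, size);
        if (bigger == NULL) {
            free(buf);
            buf = NULL;
            break;
        }
        buf = bigger;
    }
    va_end(ap);
    return buf;
}
```

Calling format_retry(4, "%s-%d", some_long_string, 42) forces at least one
retry, which is the path that crashed in the trace above. An alternative
with the same effect is to redo va_start()/va_end() around each attempt in
the variadic function itself instead of copying.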