From: Kern S. <ke...@si...> - 2007-08-29 12:48:19
On Wednesday 29 August 2007 14:37, Kern Sibbald wrote:
> On Wednesday 29 August 2007 13:41, Dirk H Bartley wrote:
> > Woo Hoo
> >
> > It appears to be working. No segfault, all data returned to bat from
> > the director.
>
> OK great. Thanks for testing.
>
> Now we just need to deal with the antiquated systems that don't have
> va_copy() :-(

By the way, just so that it is clear to everyone: I have code that works
with va_copy() and code that works without it. The va_copy() code is
enabled by default, except on a few systems where I was pretty sure it
does not exist. On other systems, when we run into compile problems, we
just need to #undef HAVE_VA_COPY in src/baconfig.h.

For older systems like FreeBSD 4.x that may not have va_copy(), I will
let the users figure out the appropriate way to test for the system
versions. If we have too many problems, I may make the default to not
use va_copy(), which will work everywhere, and then just enable it on
systems where we know it exists -- or at some point, either someone will
get annoyed enough to send in a patch or I will finish my mini-vacation
(very tiring) and implement a ./configure solution.

For the moment, at least the Director when it builds does not crash :-)

Regards,

Kern

>
> > Dirk
> >
> > On Wed, 2007-08-29 at 11:34 +0200, Kern Sibbald wrote:
> > > Hello Dirk,
> > >
> > > Could you update to the latest SVN and test again? I think there is a
> > > good chance that I have fixed your problem. It was one of the
> > > possibilities on my list, but thanks to Martin's comment, I re-read the
> > > stdarg documentation and now believe that it was due to re-calling a
> > > subroutine, which trashed a critical argument pointer.
> > >
> > > Anyway, let's now hope it is fixed ...
> > >
> > > Regards,
> > >
> > > Kern
> > >
> > > On Tuesday 28 August 2007 22:48, Dirk H Bartley wrote:
> > > > On Tue, 2007-08-28 at 07:53 +0200, Kern Sibbald wrote:
> > > > > Hello Dirk,
> > > > >
> > > > > I've looked over the code, and if there is something wrong with it,
> > > > > I am certainly missing it. Perhaps someone on the devel list will
> > > > > see something that I cannot.
> > > > >
> > > > > At this point, I'm privileging a compiler bug. Could you give me
> > > > > the following information?
> > > > >
> > > > > 1. The version of the compiler and the architecture for each
> > > > > machine where you have the failure.
> > > >
> > > > These are gentoo machines. I'm getting the gcc version with:
> > > > gcc-config -l
> > > >
> > > > myth2 ~ # uname -a
> > > > Linux myth2 2.6.19-gentoo-r2 #5 SMP Thu Dec 28 22:24:13 EST 2006
> > > > x86_64 AMD Athlon(tm) 64 Processor 3200+ AuthenticAMD GNU/Linux
> > > > gcc x86_64-pc-linux-gnu-4.1.1
> > > >
> > > > srvalum3 ~ # uname -a
> > > > Linux srvalum3 2.6.20-gentoo #3 SMP Sun Feb 18 12:32:04 EST 2007
> > > > x86_64 Dual-Core AMD Opteron(tm) Processor 2210 AuthenticAMD
> > > > GNU/Linux
> > > > gcc x86_64-pc-linux-gnu-4.1.1
> > > >
> > > > > 2. The version of the compiler and the architecture for each
> > > > > machine where you do not have the failure.
> > > >
> > > > workplay ~ # uname -a
> > > > Linux workplay 2.6.17-gentoo-r5 #4 SMP Thu Apr 19 19:58:11 EDT 2007
> > > > i686 AMD Sempron(tm) 2600+ AuthenticAMD GNU/Linux
> > > > gcc i686-pc-linux-gnu-3.4.4
> > > >
> > > > > Could you give me the complete compile line with all the options
> > > > > for both dird/ua_output.c and lib/bsnprintf.c? Either edit the
> > > > > Makefile and remove the $(NO_ECHO) in front of the compile rules
> > > > > (the .c.o: and .cc.o: lines), or set the environment variable
> > > > > NO_ECHO to the empty string.
> > > >
> > > > Compiling ua_output.c
> > > > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > > > -I.. -O2 -march=athlon64 -pipe -g ua_output.c
> > > >
> > > > Compiling bsnprintf.c
> > > > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > > > -I.. -O2 -march=athlon64 -pipe -g bsnprintf.c
> > > >
> > > > then set makefile optimization
> > > >
> > > > Compiling bsnprintf.c
> > > > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > > > -I.. -O0 -march=athlon64 -pipe -g bsnprintf.c
> > > >
> > > > Compiling ua_output.c
> > > > /usr/bin/x86_64-pc-linux-gnu-g++ -c -I/usr/include/python2.4 -I.
> > > > -I.. -O0 -march=athlon64 -pipe -g ua_output.c
> > > >
> > > > > Could you set the compile optimization for those two files to -O0
> > > > > (minus oh zero)? Either edit the Makefile or set it on the command
> > > > > line via a preceding environment variable setting of CFLAGS. Then
> > > > > test again and see if it fails.
> > > >
> > > > Yes it fails.
> > > >
> > > > > As a separate test, if the above test still fails, could you
> > > > > comment out the #define USE_BSNPRINTF 1
> > > > > line in src/version.h and then rebuild everything?
> > > >
> > > > Yes it fails as well
> > > >
> > > > > Another interesting test would be to put:
> > > > >
> > > > > Dmsg1(000, "fmt=%s\n", fmt);
> > > > >
> > > > > just after the line "again:" at line 737 in src/dird/ua_output.c
> > > > > as well as:
> > > > >
> > > > > Dmsg0(000, "goto again\n");
> > > > >
> > > > > after "msg = realloc_pool_memory(msg, maxlen + maxlen/2);" at line
> > > > > 741. Then report what it prints when the seg fault occurs.
> > > >
> > > > srvalum3-dir: ua_output.c:738 fmt=%s
> > > > srvalum3-dir: ua_output.c:738 fmt=%s
> > > > srvalum3-dir: ua_output.c:738 fmt=%s
> > > > srvalum3-dir: ua_output.c:738 fmt=%s
> > > > srvalum3-dir: ua_output.c:738 fmt=%s
> > > > srvalum3-dir: ua_output.c:743 goto again
> > > > srvalum3-dir: ua_output.c:738 fmt=%s
> > > > Kaboom! bacula-dir, srvalum3-dir got signal 11 - Segmentation
> > > > violation. Attempting traceback.
> > > > Kaboom! exepath=/usr/sbin/
> > > > Calling: /usr/sbin/btraceback /usr/sbin/bacula-dir 16016
> > > > Traceback complete, attempting cleanup ...
> > > >
> > > > And here is thread 2
> > > >
> > > > Thread 2 (Thread 1098918208 (LWP 16043)):
> > > > #0 0x00002af703e5faef in waitpid () from /lib/libpthread.so.0
> > > > #1 0x000000000046b9e3 in signal_handler (sig=11) at signal.c:167
> > > > #2 <signal handler called>
> > > > #3 0x00002af704a06c10 in strlen () from /lib/libc.so.6
> > > > #4 0x00002af7049d8b70 in vfprintf () from /lib/libc.so.6
> > > > #5 0x00002af7049f9a4a in vsnprintf () from /lib/libc.so.6
> > > > #6 0x0000000000452bac in bvsnprintf (str=0x9 <Address 0x9 out of
> > > > bounds>, size=<value optimized out>,
> > > > format=0x41801e01 "tn", ap=0x7) at bsys.c:292
> > > > #7 0x0000000000431f6d in bmsg (ua=0x6e74a8, fmt=0x513af4 "%s",
> > > > arg_ptr=0x41801ec0) at ua_output.c:740
> > > > #8 0x0000000000432326 in UAContext::send_msg (this=0x9, fmt=0x513af4
> > > > "%s") at ua_output.c:778
> > > > #9 0x000000000042e43c in sql_handler (ctx=0x6e74a8, num_field=3,
> > > > row=0x6e7738) at ua_dotcmds.c:480
> > > > #10 0x00000000004508f2 in db_sql_query (mdb=0x6e7888, query=<value
> > > > optimized out>, result_handler=0x42e350 <sql_handler>,
> > > > ctx=0x6e74a8) at postgresql.c:320
> > > > #11 0x000000000042dd57 in sql_cmd (ua=0x6e74a8, cmd=<value optimized
> > > > out>) at ua_dotcmds.c:496
> > > > #12 0x000000000042da9b in do_a_dot_command (ua=0x6e74a8,
> > > > cmd=0x6f0f70 ".sql query=\"SELECT LogId, Time, LogText FROM Log
> > > > WHERE JobId='2674'\"") at ua_dotcmds.c:131
> > > > #13 0x000000000043e75f in handle_UA_client_request (arg=<value
> > > > optimized out>) at ua_server.c:145
> > > > #14 0x0000000000473c1d in workq_server (arg=<value optimized out>) at
> > > > workq.c:357
> > > > #15 0x00002af703e58135 in start_thread () from /lib/libpthread.so.0
> > > > #16 0x00002af704a5335e in clone () from /lib/libc.so.6
> > > > #17 0x0000000000000000 in ?? ()
> > > >
> > > > On this machine, gcc had three versions
> > > >
> > > > srvalum3 bacula # gcc-config -l
> > > > [1] x86_64-pc-linux-gnu-3.3.6
> > > > [2] x86_64-pc-linux-gnu-4.1.1 *
> > > > [3] x86_64-pc-linux-gnu-4.1.2
> > > >
> > > > So as an experiment I set gcc to 4.1.2 and recompiled bacula with the
> > > > same result.
> > > >
> > > >
> > > > Dirk
> > > >
> > > > > Best regards,
> > > > >
> > > > > Kern
> > > > >
> > > > > PS: for the list, the problem is clearly (according to the
> > > > > traceback) in Thread 2 between stack frame 4 and 5 where the
> > > > > argument "fmt" in stack frame 5 should be identical to argument
> > > > > "format" in stack frame 4, but has been shifted by 2 bytes!
> > > >
> > > > -------------------------------------------------------------------------
> > > > This SF.net email is sponsored by: Splunk Inc.
> > > > Still grepping through log files to find problems? Stop.
> > > > Now Search log events and configuration files using AJAX and a
> > > > browser. Download your FREE copy of Splunk now >>
> > > > http://get.splunk.com/
> > > > _______________________________________________
> > > > Bacula-devel mailing list
> > > > Bac...@li...
> > > > https://lists.sourceforge.net/lists/listinfo/bacula-devel
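
For anyone following the thread who wants to see the stdarg issue in isolation, here is a minimal sketch of the pattern discussed above: a formatting routine that retries vsnprintf() after growing its buffer. It is not the actual Bacula code. The names send_msg_sketch(), vformat_sketch() and grow() are made up for illustration, and the HAVE_VA_COPY define below only stands in for the switch mentioned for src/baconfig.h. The point it tries to show is that a va_list that has been handed to vsnprintf() once must not be reused for a second pass; either each attempt works on a va_copy() of it, or the variadic function itself restarts the list with va_start() on every attempt.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the HAVE_VA_COPY switch mentioned for
 * src/baconfig.h; comment it out to build the fallback path. */
#define HAVE_VA_COPY 1

/* Grow buf by roughly half. On failure the old buffer is freed and
 * NULL is returned, so callers can simply stop looping. */
static char *grow(char *buf, size_t *maxlen)
{
    size_t newlen = *maxlen + *maxlen / 2 + 1;
    char *p = realloc(buf, newlen);
    if (p == NULL) {
        free(buf);
        return NULL;
    }
    *maxlen = newlen;
    return p;
}

#ifdef HAVE_VA_COPY
/* Helper that may call vsnprintf() several times. Each attempt formats
 * from a fresh copy, so the caller's arg_ptr is never consumed here. */
static char *vformat_sketch(const char *fmt, va_list arg_ptr)
{
    size_t maxlen = 8;               /* deliberately tiny to force retries */
    char *msg = malloc(maxlen);
    while (msg != NULL) {
        va_list ap;
        va_copy(ap, arg_ptr);
        int len = vsnprintf(msg, maxlen, fmt, ap);
        va_end(ap);
        if (len < 0) {               /* formatting error */
            free(msg);
            return NULL;
        }
        if ((size_t)len < maxlen)    /* output fit, including the NUL */
            break;
        msg = grow(msg, &maxlen);
    }
    return msg;
}
#endif

/* Returns a malloc()ed string the caller must free(), or NULL on error. */
char *send_msg_sketch(const char *fmt, ...)
{
    assert(fmt != NULL);
#ifdef HAVE_VA_COPY
    va_list arg_ptr;
    va_start(arg_ptr, fmt);
    char *msg = vformat_sketch(fmt, arg_ptr);
    va_end(arg_ptr);
    return msg;
#else
    /* Without va_copy(): keep the retry loop inside the variadic
     * function and restart the argument list on every attempt. */
    size_t maxlen = 8;
    char *msg = malloc(maxlen);
    while (msg != NULL) {
        va_list arg_ptr;
        va_start(arg_ptr, fmt);
        int len = vsnprintf(msg, maxlen, fmt, arg_ptr);
        va_end(arg_ptr);
        if (len < 0) {
            free(msg);
            return NULL;
        }
        if ((size_t)len < maxlen)
            break;
        msg = grow(msg, &maxlen);
    }
    return msg;
#endif
}
```

With the tiny starting buffer, a call such as send_msg_sketch("%s-%d", "abcdefghijklmnop", 42) goes through the retry path twice before the output fits. Reusing the same va_list across those passes is undefined behaviour and, as in the report above, can happen to work on one platform (i686) and crash on another (x86_64).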