From: Mantis B. T. <no...@bu...> - 2012-05-26 15:34:35
The issue 0001875 has been set as DUPLICATE OF the following issue.
======================================================================
http://bugs.bacula.org/view.php?id=1798
======================================================================
Reported By:                fschmidt
Assigned To:                marcovw
======================================================================
Project:                    bacula
Issue ID:                   1798
Category:                   Storage Daemon
Reproducibility:            always
Severity:                   crash
Priority:                   normal
Status:                     closed
Resolution:                 no change required
Fixed in Version:
======================================================================
Date Submitted:             2011-12-09 10:39 GMT
Last Modified:              2012-05-26 16:34 BST
======================================================================
Summary:                    SD crashs with bus error in scan.c
Description:
I run the Director on an x86 Solaris, FD and SD on sparc Solaris, both bacula
version 5.2.2. When I try to back up files, the SD crashes with a bus error.
The other SDs on x86 Linux and x86 Solaris run fine.

The SD was created using
$ ./configure --disable-build-dird --with-mysql --enable-acl

Additional Information:
# gdb /sbin/bacula-sd
(gdb) run -f -v -c /etc/bacula/bacula-sd.conf
Starting program: /sbin/bacula-sd -f -v -c /etc/bacula/bacula-sd.conf
warning: Lowest section in /usr/lib/libpthread.so.1 is .dynamic at 00000074

Program received signal SIGSEGV, Segmentation fault.
[Switching to LWP 5]
0xff1cdbe8 in bsscanf(char const*, char const*, ...) (
    buf=0x8352b " VolSessionId=0 VolSessionTime=0\n",
    fmt=0x690b3 " VolSessionId=%d VolSessionTime=%d\n") at scan.c:426
426                       *((int32_t *)vp) = (int32_t)value;
Current language:  auto; currently c++
(gdb) info threads
#0  0xff1cdbe8 in bsscanf(char const*, char const*, ...) (
    buf=0x8352b " VolSessionId=0 VolSessionTime=0\n",
    fmt=0x690b3 " VolSessionId=%d VolSessionTime=%d\n") at scan.c:426
#1  0x00030838 in job_cmd(JCR*) (jcr=0x6ea78) at mem_pool.h:96
#2  0x0002b07c in handle_connection_request(void*) (arg=0x6dbe8) at dircmd.c:233
#3  0xff1d8624 in workq_server (arg=0x6a118) at workq.c:346
======================================================================
Relationships       ID      Summary
----------------------------------------------------------------------
has duplicate       0001853 bacula-sd dead but pid file exists
has duplicate       0001875 Storage Deamon crashes (coredump/traceb...
======================================================================

----------------------------------------------------------------------
 (0006089) kern (administrator) - 2011-12-09 13:41
 http://bugs.bacula.org/view.php?id=1798#c6089
----------------------------------------------------------------------
There is something completely wrong. There is nothing that corresponds to the
traceback you have. It looks like your debugger is totally broken because it
is mixing subroutines and file names in a way that is wrong. There is nothing
in our code (maybe you modified the code) that has a sscanf corresponding to
the arguments you show above, VolSessionId and VolSessionTime.

I suspect that either you have a bad Solaris build (wrong compile options,
...) or you are mixing a Director and SD of different versions.

A bus error would tend to indicate that your build is screwed up or your build
options are wrong. However, the error shown above is a Seg Fault, not a bus
error. So, nothing seems to correspond to anything.

We don't help on this list with build problems. Please get help from the
bacula-users email list.

----------------------------------------------------------------------
 (0006090) marcovw (developer) - 2011-12-09 16:21
 http://bugs.bacula.org/view.php?id=1798#c6090
----------------------------------------------------------------------
I think we do have the code in 5.2.2; it's just gdb that is cutting off the
line. It's the sscanf in job.c, which uses the following definition:

static char jobcmd[] = "JobId=%d job=%127s job_name=%127s client_name=%127s "
      "type=%d level=%d FileSet=%127s NoAttr=%d SpoolAttr=%d FileSetMD5=%127s "
      "SpoolData=%d WritePartAfterJob=%d PreferMountedVols=%d SpoolSize=%s "
      "rerunning=%d VolSessionId=%d VolSessionTime=%d\n";

And the exact sscanf is this:

   stat = sscanf(dir->msg, jobcmd, &JobId, job.c_str(), job_name.c_str(),
              client_name.c_str(),
              &JobType, &level, fileset_name.c_str(), &no_attributes,
              &spool_attributes, fileset_md5.c_str(), &spool_data,
              &write_part_after_job, &PreferMountedVols, spool_size,
              &jcr->rerunning, &jcr->VolSessionId, &jcr->VolSessionTime);

I do agree with Kern that the gdb output looks strange. I would advise using
a real debugger on Solaris (e.g. dbx). What compiler are you using? gcc or
SUN CC?

You could set a breakpoint on the bsscanf function and single-step through it
to see why vp is becoming NULL (that is what I expect is happening). That
would explain the error, as we then dereference a NULL pointer, and that leads
to SEGV errors. But looking at the number of arguments, it looks strange.

----------------------------------------------------------------------
 (0006091) marcovw (developer) - 2011-12-09 16:21
 http://bugs.bacula.org/view.php?id=1798#c6091
----------------------------------------------------------------------
B.T.W. a first step to make sure that you are not running different versions
is to set a breakpoint on the bsscanf function and do the following in gdb
(or even better in dbx).

In dbx it would be something like

- stop in bsscanf
- print buf
- print fmt

----------------------------------------------------------------------
 (0006092) fschmidt (reporter) - 2011-12-09 16:28
 http://bugs.bacula.org/view.php?id=1798#c6092
----------------------------------------------------------------------
Bacula was compiled with gcc 3.4.6. I'll check cc and dbx on Monday.

----------------------------------------------------------------------
 (0006093) fschmidt (reporter) - 2011-12-09 17:20
 http://bugs.bacula.org/view.php?id=1798#c6093
----------------------------------------------------------------------
I compiled SD and FD with Sun CC and they run fine.
Is it possible/recommended to debug gcc binaries with dbx?

----------------------------------------------------------------------
 (0006094) marcovw (developer) - 2011-12-09 17:30
 http://bugs.bacula.org/view.php?id=1798#c6094
----------------------------------------------------------------------
dbx is the better debugger on Solaris without doubt. I had severe problems
with GDB on 64-bit binaries in particular. When I debug the binaries on
Solaris (which is my primary platform) I tend to always use the SUN cc and
dbx (or even dbxtool, the GUI debugger, which with the source makes life a
lot easier). Most other debugging I do on Linux in a VM with gdb.

Debugging a gcc binary with dbx should be no problem, as on Solaris things are
tied together with CTF data, which both SUN CC and gcc deliver.

It could very well be a misfiring optimization. Did you run the SUN CC with
any special optimization options (e.g. -fast)?

I was mostly triggered by the bus error, which this doesn't seem to be; on
SPARC you get bus errors when things are not aligned right. I have had my
fair share of those in the past. I moved on to x86 a couple of years ago, so
I don't have any SPARC running in production other than a T2000 for testing
every now and then.

----------------------------------------------------------------------
 (0006095) kern (administrator) - 2011-12-09 23:05
 http://bugs.bacula.org/view.php?id=1798#c6095
----------------------------------------------------------------------
If the code is at the point Marco says, which I would say is 99.9999% sure,
then my best guess is that there is a 5.0.x Director that is connecting to the
SD, which is causing it to crash. Or perhaps it is also possible that you have
compiled it as a 32 bit program, in which case it may also break things.
Compiling as a 32 bit program is the default with gcc if I remember right.

----------------------------------------------------------------------
 (0006096) fschmidt (reporter) - 2011-12-13 17:07
 http://bugs.bacula.org/view.php?id=1798#c6096
----------------------------------------------------------------------
bacula-dir is version 5.2.2, ELF 32-bit LSB executable 80386

The CC binary of bacula-sd runs without problems; it's also 32bit:
ELF 32-bit MSB executable SPARC32PLUS

The gcc binary is identified as
ELF 32-bit MSB executable SPARC

The debugger shows:
buf = "Hello Director vhost4-dir calling\n"
fmt = "Hello Start Job %127s"

----------------------------------------------------------------------
 (0006097) marcovw (developer) - 2011-12-13 18:03
 http://bugs.bacula.org/view.php?id=1798#c6097
----------------------------------------------------------------------
Given the info above, you seem to have a seriously misfiring compiler. It
just seems to fail in any bsscanf at random, as the buf and fmt above are from
a completely different set of code, e.g. the SD handshake I presume.

You have a couple of options, given in best-to-worst order:

- compile using the SUN compiler, as it gives way better code (at least on
  SPARC it is always faster than gcc, sometimes by a factor of 3-6 or more,
  so why bother with an ancient gcc compiler). The Oracle Studio stuff is
  free anyhow, and tomorrow 12.3 is released, so it is at least updated from
  time to time, which gcc is not, at least not on Solaris 10 (Solaris 11 has
  a gcc 4.x version)
- make sure you have the latest patches; maybe gcc is patched (looking at
  the way SUN and Oracle do patches, it may even be hidden in a kernel
  patch)
- lower the optimization level of the gcc compiler; try gcc -O0 (capital
  letter O followed by zero)
- comment out the define in src/baconfig which defines sscanf as bsscanf.
  That should default you to the native sscanf, which may or may not work,
  but would work around a misfire by gcc when compiling the bsscanf vararg
  parsing.

...

----------------------------------------------------------------------
 (0006098) fschmidt (reporter) - 2011-12-13 18:15
 http://bugs.bacula.org/view.php?id=1798#c6098
----------------------------------------------------------------------
Option 1 works for me. Many thanks for your help!

----------------------------------------------------------------------
 (0006099) marcovw (developer) - 2011-12-13 18:30
 http://bugs.bacula.org/view.php?id=1798#c6099
----------------------------------------------------------------------
We will keep this in mind when more people run into the same problem.
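
Editorial illustration (not part of the original thread): the faulting line
in the traceback, scan.c:426, stores a parsed integer through an untyped
pointer taken from the variable argument list. The following is a minimal
sketch of that pattern, not Bacula's actual bsscanf. The names mini_scan and
store_int32 are hypothetical, and the scanner only understands %d. It
assumes, as discussed in the notes above, that a NULL pointer produces a
SEGV and that on SPARC a store through a misaligned int32_t pointer produces
a bus error; the sketch checks for NULL and uses memcpy so that alignment
does not matter.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Bacula code): store a parsed value through an
 * untyped pointer. Returns 0 on success, -1 if the pointer is NULL. */
static int store_int32(void *vp, long value)
{
    int32_t v = (int32_t)value;
    if (vp == NULL) {
        return -1;              /* a direct store here would be a SEGV */
    }
    memcpy(vp, &v, sizeof v);   /* byte copy: no alignment requirement */
    return 0;
}

/* Toy scanner: "%d" converts an integer, every other format character
 * must match the input literally. Returns the number of values stored. */
static int mini_scan(const char *buf, const char *fmt, ...)
{
    va_list ap;
    int count = 0;

    assert(buf != NULL && fmt != NULL);
    va_start(ap, fmt);
    while (*fmt != '\0') {
        if (fmt[0] == '%' && fmt[1] == 'd') {
            char *end;
            long value = strtol(buf, &end, 10);
            void *vp;

            if (end == buf) {
                break;                      /* no digits in the input */
            }
            vp = va_arg(ap, void *);        /* caller passes (void *)&var */
            if (store_int32(vp, value) != 0) {
                break;
            }
            count++;
            buf = end;
            fmt += 2;
        } else if (*fmt == *buf) {
            fmt++;
            buf++;
        } else {
            break;                          /* literal mismatch */
        }
    }
    va_end(ap);
    return count;
}
```

Called with the buf and fmt strings from the traceback and two int32_t
destinations cast to void *, mini_scan stores both values. Called with the
mismatched handshake strings from note 0006096, it stops at the first
differing character and stores nothing.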

Issue History
Date Modified    Username       Field                    Change
======================================================================
2011-12-09 10:39 fschmidt       New Issue
2011-12-09 13:41 kern           Note Added: 0006089
2011-12-09 13:41 kern           Status                   new => feedback
2011-12-09 16:11 marcovw        Note Added: 0006090
2011-12-09 16:21 marcovw        Note Added: 0006091
2011-12-09 16:21 marcovw        Note Edited: 0006090
2011-12-09 16:28 fschmidt       Note Added: 0006092
2011-12-09 16:28 fschmidt       Status                   feedback => new
2011-12-09 17:20 fschmidt       Note Added: 0006093
2011-12-09 17:30 marcovw        Note Added: 0006094
2011-12-09 23:05 kern           Note Added: 0006095
2011-12-09 23:05 kern           Status                   new => feedback
2011-12-13 17:07 fschmidt       Note Added: 0006096
2011-12-13 17:07 fschmidt       Status                   feedback => new
2011-12-13 18:02 marcovw        Note Added: 0006097
2011-12-13 18:03 marcovw        Note Edited: 0006097
2011-12-13 18:03 marcovw        Assigned To               => marcovw
2011-12-13 18:03 marcovw        Status                   new => feedback
2011-12-13 18:15 fschmidt       Note Added: 0006098
2011-12-13 18:15 fschmidt       Status                   feedback => assigned
2011-12-13 18:30 marcovw        Note Added: 0006099
2011-12-13 18:30 marcovw        Status                   assigned => closed
2011-12-13 18:30 marcovw        Resolution               open => no change required
2012-03-27 10:25 marcovw        Relationship added       has duplicate 0001853
2012-05-26 16:34 marcovw        Relationship added       has duplicate 0001875
======================================================================