From: Philippe W. <phi...@sk...> - 2012-11-06 21:30:31
On Tue, 2012-11-06 at 20:42 +0530, Uday Reddy wrote:
> On Tue, Nov 6, 2012 at 8:34 PM, John Reiser <jr...@bi...> wrote:
> >> With vgdb/gdb, the debugger hangs after a couple of continues.
> >>
> >> Program received signal SIGTRAP, Trace/breakpoint trap.
> >> 0x000000300b4ea3d7 in writev () from /lib64/libc.so.6
> >
> > Is there a real 'int3' instruction there, or was SIGTRAP sent
> > from somewhere else [using kill()]? Look with something like:
> > (gdb) x/20i 0x000000300b4ea3d7 - 0x20
> >
> > [Philippe, please comment.]

When Valgrind encounters an error and --vgdb-error=xxxx tells it to let
GDB connect, Valgrind returns control to GDB by reporting that a SIGTRAP
has been encountered.
Valgrind gdbsrv never uses "real trap instructions", nor does it
generate a SIGTRAP signal with kill().
The SIGTRAP reported by GDB is purely because Valgrind gdbsrv reports
that the process has stopped its execution due to an error.
The gdbserver protocol requires a stop reason to be indicated:
SIGTRAP looks as good a reason as any other
(the protocol does not have a reason "stopped due to an error
detected by Valgrind" :).

So, in summary, we can be reasonably sure that the SIGTRAP above
is caused by Valgrind reporting the "writev error" to GDB, to allow
the GDB user to dig deeper into this error.

> >
> >> There is no progress after this. The CPU utilization is 0 and it looks
> >> like memcheck has hung. On the valgrind side, I see
> >> ==29118== Syscall param writev(vector[...]) points to uninitialised byte(s)
> >> ==29118== at 0x300B4EA3D7: writev (in /usr/lib64/libc-2.15.so)
> >> ==29118== by 0x6256B92: ??? (in
> >> /usr/lib64/openmpi/lib/openmpi/mca_oob_tcp.so)
> >
> > What are the likely parameters to that writev()? How many different pieces,
> > of what sizes, etc.? Would the output be going to a disk file, or to a socket?
>
> The output would be going to a socket. I believe it's just sending out
> a buffer. If it's just sending out uninitialized data, it should
> perhaps not be related to the "invalid read" that I see via valgrind
> at a later stage?

If memcheck looks to be hung, then just type control-c in GDB.
This should interrupt the Valgrind process/memcheck.
Then you can use GDB commands such as backtrace and similar to see
what your application is doing (I guess it is blocked in a system call
if it really consumes no CPU at all).

Philippe

NB: It is always possible that there is a bug in Valgrind and/or
in Valgrind gdbsrv which causes your application to be really
"abnormally" hung under Valgrind. If that is the case, re-running with
  -d -d -d -v -v -v --trace-syscalls=yes
could give an idea about what is going wrong.
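
To make the point about the stop reason a bit more concrete, here is a rough sketch of how a stop reply is framed in the GDB remote serial protocol. It is not taken from the Valgrind sources: the function names are made up for illustration, and it builds the simplest "S" form of the reply, whereas a real stub (Valgrind gdbsrv included) may well send a richer "T" reply with extra fields. The only thing it is meant to show is that a stub reports a stop by naming a signal number, so there is no room for a Valgrind-specific reason.

```python
# Illustrative sketch only: framing of a GDB remote protocol stop reply.
# rsp_frame() and stop_reply() are hypothetical helper names, not Valgrind code.

GDB_SIGNAL_TRAP = 5  # signal number GDB displays as SIGTRAP


def rsp_frame(payload: str) -> str:
    """Wrap a payload as '$<payload>#<checksum>'.

    The checksum is the sum of the payload bytes modulo 256,
    written as two lowercase hex digits.
    """
    checksum = sum(payload.encode("ascii")) % 256
    return f"${payload}#{checksum:02x}"


def stop_reply(signum: int) -> str:
    """Build the minimal 'S' stop reply: 'S' followed by the signal in hex."""
    return rsp_frame(f"S{signum:02x}")


if __name__ == "__main__":
    # What a stub would send to say "the program stopped, reason: SIGTRAP".
    print(stop_reply(GDB_SIGNAL_TRAP))
```

Whatever the exact packet form used, GDB only sees the signal number and prints "Program received signal SIGTRAP", which is why the message above does not by itself mean that a trap instruction was executed or that a signal was sent.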