From: Luke H. <lu...@PA...> - 2003-04-10 09:57:12
I'm attempting to run valgrind on a DCE RPC server. The problem is
that, while under valgrind, the server doesn't seem to receive any
client requests. It does work normally, of course, but there are
a couple of bugs for which valgrind will be very useful in fixing.

There are some interesting aspects of the DCE RPC runtime that, while
not necessarily related to this problem, may give valgrind some grief.
The RPC library does the following "weird" things:

  o wraps the pthread API to provide POSIX Threads Draft 4 API
    semantics (however, the symbols do not conflict with LinuxThreads)

  o wraps common system calls with jackets that enable asynchronous
    thread cancellation for the duration of the system call

  o implements exception handling on top of setjmp()/longjmp() and
    pthread cancels (see below):

/*
 * Note that the rough schema for the exception handler is:
 *
 *      do {
 *          pthread_cleanup_push("routine that will longjmp back here");
 *          val = setjmp(...);
 *          if (val == 0)
 *              ...normal code...
 *          else
 *              ...exception code...
 *          ...finally code...
 *          if ("exception happened")
 *              if ("exception handled")
 *                  break;
 *              else
 *                  ...re-raise exception...
 *          pthread_cleanup_pop(...);
 *      } while (0);
 *
 * Exceptions are raised by doing a pthread_cancel against one's self
 * and then doing a pthread_testcancel. This causes the topmost cleanup
 * routine to be popped and called. This routine (_exc_cleanup_handler)
 * longjmp's back to the exception handler. This approach means we can
 * leverage off the fact that the push/pop routines are maintaining some
 * per-thread state (hopefully [but likely not] more efficiently than
 * we could ourselves). We need this state to string together the dynamic
 * stack of exception handlers.
 */

Disabling the system call jackets didn't seem to help.

A backtrace of the under-valgrind server follows:

(gdb) bt
#0  0x401924a2 in vgPlain_do_syscall () from /usr/local/lib/valgrind/valgrind.so
#1  0x4052a1e2 in recvmsg (fd=9, message=0x496ca89c, flags=0) at libc_jacket.c:221
#2  0x45fd1078 in receive_packet (assoc=0x4179608c, fragbuf_p=0x496ca9e4, ovf_fragbuf_p=0x496ca9e8, st=0x496ca9f0) at ../../../ncklib/include/comsoc_bsd.h:339
#3  0x45fcf9ea in receive_dispatch (assoc=0x4179608c) at cnrcvr.c:508
#4  0x45fcf248 in rpc__cn_network_receiver (assoc=0x4179608c) at cnrcvr.c:274
#5  0x4053254c in thread_wrapper (info=0x417961e0) at vg_libpthread.c:667
#6  0x4016c778 in do__apply_in_new_thread_bogusRA () at vg_scheduler.c:2122

Then, probably after the siglongjmp() call:

(gdb) bt
#0  vg_do_syscall2 (syscallno=1079218620, arg1=1077504048, arg2=0) at vg_mylibc.c:76
#1  0x00000004 in ?? ()
#2  0x4017eff2 in vgPlain_main () at vg_main.c:1384
(gdb)

Valgrind shows:

04/10 09:56:09 [RPC] Registered endpoint ncacn_ip_tcp:127.0.0.1[1035]
04/10 09:56:09 [RPC] Registered endpoint ncadg_ip_udp:127.0.0.1[1035]
...
==3670== valgrind's libpthread.so: KLUDGED call to: siglongjmp (cleanup handlers are ignored)

Any ideas?

cheers,
-- Luke

--
Luke Howard | PADL Software Pty Ltd | www.padl.com
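For readers unfamiliar with the pattern, here is a minimal sketch of the "dynamic stack of exception handlers" idea from the comment quoted in the message above. It is not the DCE RPC implementation: instead of pthread_cancel(), pthread_testcancel() and the pthread cleanup stack, it keeps an explicit thread-local list of jmp_buf frames, which is an assumption made so the example stays portable. All names (exc_frame, exc_push, exc_pop, exc_raise, safe_double, and the EXC_* codes) are hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical sketch: an explicit per-thread handler stack stands in for
 * the pthread cleanup stack that the DCE runtime relies on. */
struct exc_frame {
    jmp_buf env;             /* where to resume when an exception is raised */
    struct exc_frame *prev;  /* next handler further out */
};

enum { EXC_NEGATIVE = 1, EXC_TOO_BIG = 2 };

static _Thread_local struct exc_frame *exc_top = NULL; /* innermost handler */
static _Thread_local int exc_code = 0;                 /* last raised code */

static void exc_push(struct exc_frame *f) { f->prev = exc_top; exc_top = f; }
static void exc_pop(void) { assert(exc_top != NULL); exc_top = exc_top->prev; }

/* Raise: pop the innermost handler and jump back into it. */
static _Noreturn void exc_raise(int code)
{
    struct exc_frame *f = exc_top;
    assert(f != NULL && code != 0);
    exc_top = f->prev;
    exc_code = code;
    longjmp(f->env, 1);
}

static int checked_double(int x)
{
    if (x < 0)
        exc_raise(EXC_NEGATIVE);
    if (x > 1000)
        exc_raise(EXC_TOO_BIG);
    return 2 * x;
}

/* Inner handler: handles EXC_NEGATIVE, re-raises anything else. */
static int inner(int x)
{
    struct exc_frame frame;
    int result = 0;

    exc_push(&frame);
    if (setjmp(frame.env) == 0) {
        result = checked_double(x);   /* normal code */
        exc_pop();
    } else if (exc_code == EXC_NEGATIVE) {
        result = 0;                   /* exception handled here */
    } else {
        exc_raise(exc_code);          /* re-raise to the outer handler */
    }
    return result;
}

/* Outer handler: turns any remaining exception into a negative return. */
int safe_double(int x)
{
    struct exc_frame frame;
    int result = 0;

    exc_push(&frame);
    if (setjmp(frame.env) == 0) {
        result = inner(x);
        exc_pop();
    } else {
        result = -exc_code;
    }
    return result;
}
```

In this sketch safe_double(21) takes the normal path, safe_double(-5) is handled by the inner frame, and safe_double(5000) is re-raised and caught by the outer frame. The real runtime jumps out of a cancellation cleanup handler instead, which is the step valgrind's libpthread reports as KLUDGED in the log above.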
From: David E. <da...@2g...> - 2003-04-10 10:12:51
On Thu, 2003-04-10 at 11:56, Luke Howard wrote:
> I'm attempting to run valgrind on a DCE RPC server. The problem is
> that, while under valgrind, the server doesn't seem to receive any
> client requests. It does work normally, of course, but there are
> a couple of bugs for which valgrind will be very useful in fixing.

What valgrind version are you using?

--
-\- David Eriksson -/- www.2GooD.nu
"I personally refuse to use inferior tools because of ideology."
- Linus Torvalds
From: Luke H. <lu...@PA...> - 2003-04-10 10:15:25
>What valgrind version are you using?

1.9.3. I will try 1.9.5 and report any improvements.

-- Luke

--
Luke Howard | PADL Software Pty Ltd | www.padl.com
From: David E. <da...@2g...> - 2003-04-10 10:21:27
On Thu, 2003-04-10 at 12:14, Luke Howard wrote:
> >What valgrind version are you using?
>
> 1.9.3. I will try 1.9.5 and report any improvements.

May I suggest that you also try the old 1.0.4 version.

The reason for this is that I have a program with sockets and threads
that works fine in 1.0.4 but has problems in 1.9.4. I haven't tried
1.9.5 yet.

--
-\- David Eriksson -/- www.2GooD.nu
"I personally refuse to use inferior tools because of ideology."
- Linus Torvalds
From: Luke H. <lu...@PA...> - 2003-04-10 10:32:45
>The reason for this is that I have a program with sockets and threads
>that works fine in 1.0.4 but has problems in 1.9.4. I haven't tried
>1.9.5 yet.

No luck with 1.9.5 or 1.0.4 I'm afraid. It does appear that the stack
gets smashed (see below) when siglongjmp()/sigsetjmp() are called:

(gdb) bt
#0  vg_do_syscall2 (syscallno=1074473960, arg1=1073822720, arg2=0) at vg_mylibc.c:76
#1  0x00000004 in ?? ()
#2  0x40064136 in vgPlain_main () at vg_main.c:1173
(gdb)

And that this is potentially the problem.

-- Luke

--
Luke Howard | PADL Software Pty Ltd | www.padl.com
From: Bastien C. <ba...@ch...> - 2003-04-10 11:47:36
On Thursday 10 April 2003 12:31, Luke Howard wrote:
> ...
> And that this is potentially the problem.

Uh, just thinking about it: debugging valgrind with itself, is that possible?
:-)

Salut,
Bastien

--
-- The universe has its own cure for stupidity. --
-- Unfortunately, it doesn't always apply it. --
From: Nicholas N. <nj...@ca...> - 2003-04-10 12:09:17
On Thu, 10 Apr 2003, Bastien Chevreux wrote:

> Uh, just thinking about it: debugging valgrind with itself, is that possible?
> :-)

Nope, unfortunately not, due to the LD_PRELOAD (ab)use at start-up. One of
the reasons we'd like to not use it.

Reminds me of an error message I saw on some kind of emulator program (User
Mode Linux, or Wine, or something like that, I can't remember):

"Error: Foo cannot run itself. You just had to try, didn't you?"

N
From: Julian S. <js...@ac...> - 2003-04-10 21:22:09
On Thursday 10 April 2003 10:21 am, David Eriksson wrote:
> On Thu, 2003-04-10 at 12:14, Luke Howard wrote:
> > >What valgrind version are you using?
> >
> > 1.9.3. I will try 1.9.5 and report any improvements.
>
> May I suggest that you also try the old 1.0.4 version.
>
> The reason for this is that I have a program with sockets and threads
> that works fine in 1.0.4 but has problems in 1.9.4. I haven't tried
> 1.9.5 yet.

Really? Can you send a test case? That shouldn't happen. I know there
are some difficulties on glibc-2.3.X systems, but apart from that
1.9.X should run anything 1.0.X runs.

J