[Valgrind-users] Debugging a DCE RPC server

I'm trying to run valgrind on a DCE RPC server. The problem is
that, under valgrind, the server doesn't seem to receive any
client requests. Outside valgrind it works normally, of course, but
there are a couple of bugs that valgrind would be very useful in
tracking down.

There are some interesting aspects of the DCE RPC runtime that,
while not necessarily related to this problem, may give valgrind
some grief. The RPC library does the following "weird" things: 

  o wraps the pthread API to provide POSIX Threads Draft
    4 API semantics (however, the symbols do not conflict
    with LinuxThreads)

  o wraps common system calls with jackets that enable
    asynchronous thread cancellation for the duration of
    the system call

  o implements exception handling on top of setjmp()/longjmp()
    and pthread cancels (see below):

/*
 * Note that the rough schema for the exception handler is:
 *
 *      do {
 *          pthread_cleanup_push("routine that will longjmp back here");
 *          val = setjmp(...);  
 *          if (val == 0)
 *              ...normal code...
 *          else
 *              ...exception code...
 *          ...finally code...
 *          if ("exception happened")
 *              if ("exception handled")
 *                  break;
 *              else
 *                  ...re-raise exception...
 *          pthread_cleanup_pop(...);
 *      } while (0);
 * 
 * Exceptions are raised by doing a pthread_cancel against one's self  
 * and then doing a pthread_testcancel.  This causes the topmost cleanup
 * routine to be popped and called.  This routine (_exc_cleanup_handler)
 * longjmp's back to the exception handler.  This approach means we can
 * leverage off the fact that the push/pop routines are maintaining some
 * per-thread state (hopefully [but likely not] more efficiently than
 * we could ourselves).  We need this state to string together the dynamic
 * stack of exception handlers.
 */
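
For anyone who wants to experiment with the control flow in isolation,
here is a minimal sketch of the schema in the comment above. It is not
the DCE RPC code: every name in it (exc_frame, exc_top, exc_code,
exc_raise, inner, outer) is made up for illustration. It also swaps in a
different delivery technique: the handler stack is kept by hand in a
thread-local pointer and exceptions are raised with a direct longjmp(),
rather than through pthread_cancel()/pthread_testcancel() and the cleanup
stack, because POSIX leaves longjmp() out of a cancellation cleanup
handler undefined.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* One entry in the dynamic stack of exception handlers (hypothetical). */
struct exc_frame {
    jmp_buf env;              /* where to resume when an exception arrives */
    struct exc_frame *prev;   /* next outer handler, or NULL */
};

/* Per-thread state; the real runtime borrows the cleanup stack for this. */
static _Thread_local struct exc_frame *exc_top = NULL;
static _Thread_local int exc_code = 0;

/* Raise: pop the topmost handler and jump back into it. */
static void exc_raise(int code)
{
    struct exc_frame *f = exc_top;
    if (f == NULL) {
        fprintf(stderr, "unhandled exception %d\n", code);
        abort();
    }
    exc_top = f->prev;
    exc_code = code;
    longjmp(f->env, 1);
}

/* Handles code 1 itself and re-raises anything else to the caller. */
static int inner(int code_to_raise)
{
    struct exc_frame frame;
    frame.prev = exc_top;
    exc_top = &frame;
    if (setjmp(frame.env) == 0) {
        /* ...normal code... */
        if (code_to_raise != 0)
            exc_raise(code_to_raise);
        assert(exc_top == &frame);
        exc_top = frame.prev;          /* normal exit pops the handler */
        return 0;
    }
    /* ...exception code... (exc_raise already popped this frame) */
    if (exc_code != 1)
        exc_raise(exc_code);           /* not handled here: re-raise */
    return exc_code;
}

/* Catches whatever inner() lets through; returns 100 + code for those. */
static int outer(int code_to_raise)
{
    struct exc_frame frame;
    frame.prev = exc_top;
    exc_top = &frame;
    if (setjmp(frame.env) == 0) {
        int r = inner(code_to_raise);
        assert(exc_top == &frame);
        exc_top = frame.prev;
        return r;
    }
    return 100 + exc_code;
}
```

Calling outer(0) takes the normal path, outer(1) is handled in inner(),
and outer(7) is re-raised by inner() and caught in outer(). In each case
exc_top is back to NULL afterwards.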

Disabling the system call jackets didn't seem to help.
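
To make "jacket" concrete, here is roughly what such a wrapper looks
like. This is a sketch under my own assumptions, not the actual jacket
source: the name jacket_recvmsg is hypothetical, and the real jackets may
differ in detail. It switches the calling thread to asynchronous
cancellation for the duration of the call and then restores the previous
cancel type.

```c
#include <assert.h>
#include <pthread.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical jacket around recvmsg(); illustrative only. */
static ssize_t jacket_recvmsg(int fd, struct msghdr *msg, int flags)
{
    int old_type, ignored, rc;
    ssize_t n;

    /* Let the thread be cancelled at any point while it is blocked. */
    rc = pthread_setcanceltype(PTHREAD_CANCEL_ASYNCHRONOUS, &old_type);
    assert(rc == 0);

    n = recvmsg(fd, msg, flags);

    /* Put back whatever cancel type the caller had before the call. */
    rc = pthread_setcanceltype(old_type, &ignored);
    assert(rc == 0);
    (void)rc;

    return n;
}
```

The point of interest for valgrind is that a cancel can arrive while the
thread is inside the system call itself, so the tool's own pthread
implementation has to cope with cancellation at that moment.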

A backtrace of the under-valgrind server follows:

(gdb) bt
#0  0x401924a2 in vgPlain_do_syscall () from /usr/local/lib/valgrind/valgrind.so
#1  0x4052a1e2 in recvmsg (fd=9, message=0x496ca89c, flags=0) at libc_jacket.c:221
#2  0x45fd1078 in receive_packet (assoc=0x4179608c, fragbuf_p=0x496ca9e4, ovf_fragbuf_p=0x496ca9e8, 
    st=0x496ca9f0) at ../../../ncklib/include/comsoc_bsd.h:339
#3  0x45fcf9ea in receive_dispatch (assoc=0x4179608c) at cnrcvr.c:508
#4  0x45fcf248 in rpc__cn_network_receiver (assoc=0x4179608c) at cnrcvr.c:274
#5  0x4053254c in thread_wrapper (info=0x417961e0) at vg_libpthread.c:667
#6  0x4016c778 in do__apply_in_new_thread_bogusRA () at vg_scheduler.c:2122

A second backtrace, probably taken after the siglongjmp() call:

(gdb) bt
#0  vg_do_syscall2 (syscallno=1079218620, arg1=1077504048, arg2=0) at vg_mylibc.c:76
#1  0x00000004 in ?? ()
#2  0x4017eff2 in vgPlain_main () at vg_main.c:1384
(gdb) 

Valgrind shows:

04/10 09:56:09 [RPC] Registered endpoint ncacn_ip_tcp:127.0.0.1[1035]
04/10 09:56:09 [RPC] Registered endpoint ncadg_ip_udp:127.0.0.1[1035]
...
==3670== valgrind's libpthread.so: KLUDGED call to: siglongjmp (cleanup handlers are ignored)

Any ideas?

cheers,

-- Luke

--
Luke Howard | PADL Software Pty Ltd | www.padl.com