From: Gene C. <ge...@cc...> - 2013-03-04 20:32:16

Hi Joshua,

Thanks for uncovering this bug, and for documenting it so precisely. This is really helpful, and we expect to be able to fix it quickly.

This kind of hang is typically caused in exactly the way that you describe: user code holds a lock defined by one of the run-time libraries while DMTCP calls a function that uses the same lock at checkpoint time. DMTCP tries to be conservative at checkpoint time and avoid situations in which it might grab a lock already held by user code. You seem to have uncovered an analogous situation occurring during static initialization.

Thanks for reporting this.

- Gene

On Mon, Mar 04, 2013 at 07:03:31PM +0000, Louie, Joshua D wrote:
> Hi,
>
> I've run into an issue where static initializers or constructor-attribute
> functions in a shared object cause a hang when the object is loaded. All
> versions I've tried (1.2.4 up to the eventual 2.0.0 release) hit the same
> issue. Here's the scenario (I've attached sample code to reproduce it):
> the main program opens a shared object with dlopen, and one of the shared
> object's static initialization functions calls fork or system.
>
> The wrappers for fork/execvp/execvpe/execve all have to grab the lock with
> write permission, while the dlopen wrapper grabs the same lock with read
> permission before calling the actual dlopen. Normally the read lock is
> released once dlopen finishes. Here, however, dlopen has not finished when
> the initializer calls fork, so the fork wrapper cannot get the write lock:
> the dlopen wrapper still holds the read lock, and it never releases it,
> because dlopen is itself waiting for the initializer to return.
>
> In my particular situation I can make progress by not grabbing the lock in
> the dlopen wrapper, since I have well-defined times at which a checkpoint
> will occur, but I wanted to bring this to your attention so that you can
> figure out the best way to deal with it.
>
> ---------- dlopen_test.c ----------
> // Despite the .c extension this is C++ (<cstdio>), so build it with g++
> // and link with -ldl.
> #include <cstdio>
> #include <dlfcn.h>
> #include <dmtcpaware.h>  // DMTCP-aware API; nothing from it is used below
>
> typedef int (*print_fn)(void);
>
> int main() {
>   printf("Opening ./printer.so\n");
>   void *so = dlopen("./printer.so", RTLD_LOCAL | RTLD_LAZY);
>   if (so == NULL) {  // report a failed load instead of crashing in dlsym
>     fprintf(stderr, "dlopen failed: %s\n", dlerror());
>     return 1;
>   }
>   printf("Done opening ./printer.so\n");
>
>   print_fn print_func = (print_fn)dlsym(so, "print_func");
>   if (print_func == NULL) {
>     fprintf(stderr, "dlsym failed: %s\n", dlerror());
>     return 1;
>   }
>   print_func();
>   dlclose(so);
>   return 0;
> }
>
> ---------- printer.c ----------
> // Also C++: extern "C" and the non-constant initializer of `value` are not
> // valid C. Build with e.g. g++ -shared -fPIC -o printer.so printer.c
> #include <cstdio>
> #include <cstdlib>
>
> extern "C" int
> print_constructor() {
>   printf(" In print_constructor\n");
>   fflush(stdout);  // keep output ordered when stdout is not a terminal
>   system("echo ' Will I hang?'");  // In the DMTCP environment, it hangs here
>   printf(" Leaving print_constructor\n");
>   return 0;
> }
>
> extern "C" int
> print_func() {
>   printf(" In print_func\n");
>   return 0;
> }
>
> // Dynamic initialization: print_constructor() runs while dlopen() is still
> // loading printer.so (under DMTCP, while the dlopen wrapper holds the lock).
> static int value = print_constructor();
>
> ---------- Expected results ----------
> Opening ./printer.so
>  In print_constructor
>  Will I hang?
>  Leaving print_constructor
> Done opening ./printer.so
>  In print_func
>
> Joshua Louie
>
> _______________________________________________
> Dmtcp-forum mailing list
> Dmt...@li...
> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
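The lock conflict described above can be reproduced in miniature with a POSIX read-write lock. This is an illustrative sketch under stated assumptions, not DMTCP's code: `wrapper_lock`, `fork_wrapper_try_lock` and `dlopen_wrapper_with_forking_initializer` are hypothetical names standing in for DMTCP's internal lock and its fork and dlopen wrappers. It uses `pthread_rwlock_trywrlock` so that the conflict shows up as an `EBUSY` return instead of an actual hang.

```cpp
#include <cerrno>
#include <pthread.h>

// Hypothetical stand-in for the lock shared by DMTCP's wrappers
// (the real lock is not named in this thread).
static pthread_rwlock_t wrapper_lock = PTHREAD_RWLOCK_INITIALIZER;

// Models the fork/exec wrappers, which need the lock for writing.
// trywrlock reports a conflict as EBUSY rather than blocking.
static int fork_wrapper_try_lock() {
  int rc = pthread_rwlock_trywrlock(&wrapper_lock);
  if (rc == 0) {
    pthread_rwlock_unlock(&wrapper_lock);  // acquired: release immediately
  }
  return rc;
}

// Models the dlopen wrapper: take the read lock, then let the "loader" run
// a static initializer that calls the fork wrapper, all before unlocking.
static int dlopen_wrapper_with_forking_initializer() {
  pthread_rwlock_rdlock(&wrapper_lock);
  int rc = fork_wrapper_try_lock();  // the initializer calls fork()/system()
  pthread_rwlock_unlock(&wrapper_lock);
  return rc;
}
```

Called while the read lock is held, the write attempt fails with `EBUSY`; after the read lock is released, `fork_wrapper_try_lock()` succeeds and returns 0. A blocking `pthread_rwlock_wrlock()` at the same point is undefined behavior under POSIX (the calling thread already holds the lock) and in practice can block forever, which matches the reported hang.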
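The report notes that constructor-attribute functions hang the same way as the C++ static initializer in printer.c. As a sketch, here is that initializer rewritten with GCC/Clang's `__attribute__((constructor))`; the function name `print_constructor_attr` and the `constructor_ran` flag are hypothetical additions for illustration only.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical flag, set once the constructor has run. It is
// zero-initialized before any constructor or initializer executes.
static int constructor_ran = 0;

// Runs when the object containing it is loaded: inside dlopen() for a
// shared object, or before main() for an executable.
__attribute__((constructor))
static void print_constructor_attr(void) {
  std::printf(" In print_constructor\n");
  std::fflush(stdout);  // flush before system() so output stays in order
  // system() forks a shell; under DMTCP this is where the hang was reported.
  std::system("echo ' Will I hang?'");
  std::printf(" Leaving print_constructor\n");
  constructor_ran = 1;
}
```

Built into printer.so (for example with `g++ -shared -fPIC`), the loader runs `print_constructor_attr()` inside `dlopen()`, before it returns, just as it runs the initializer of `value`, so the `system()` call happens at the same point in the load.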