From: Marco M. <ma...@ma...> - 2010-02-11 17:38:43
It works!
Thank you very much for your help and support.

BTW: I plan to maintain a dmtcp package for Nixpkgs/NixOS (http://nixos.org).
Version 1.1.3 is already included in Nixpkgs and tested successfully on NixOS.
I'm going to prepare a new package with the latest svn version.

Many thanks to everybody,
Marco

On Feb 10, 2010, at 5:16 AM, Kapil Arya wrote:

> Actually, I forgot to mention earlier. I have updated DMTCP, so to use
> dmtcp_{checkpoint,restart,command,coordinator} you do not need to use
> dmtcp_nocheckpoint. However, for commands other than dmtcp_*, you
> still have to use dmtcp_nocheckpoint to prevent them from getting
> checkpointed.
>
> On Tue, Feb 9, 2010 at 11:12 PM, Jason Ansel <ja...@cs...> wrote:
>> The latest SVN also contains a command: dmtcp_nocheckpoint.
>>
>> From your script you may have to call:
>> dmtcp_nocheckpoint dmtcp_command -bc
>>
>> To prevent dmtcp_command from getting checkpointed.
>>
>> ---
>>
>> Thanks Kapil!
>>
>> --Jason
>>
>> On Tue, Feb 9, 2010 at 8:00 PM, Kapil Arya <ka...@cc...> wrote:
>>> Hi,
>>>
>>> The latest svn (rev:489) contains the fix for the bug. It had to
>>> do with bash.
>>>
>>> Bash keeps its own list of shell/env variables; when one calls
>>> unsetenv(), bash removes the variable from its internal list but
>>> does not remove it from the process' set of env variables.
>>>
>>> Later on, if you do getenv() on the same name, bash (being unable to
>>> find it in its internal list) looks up that variable in the process'
>>> env variable list and returns it from there. The fix is to call
>>> unsetenv() and glibc:unsetenv() both, in order to get rid of any
>>> environment variable.
>>>
>>> The culprit environment variable in our case was "LD_PRELOAD". As you
>>> would have guessed, gzip was exec()'d with this "LD_PRELOAD" set to
>>> dmtcphijack.so and hence was getting under checkpoint control which
>>> caused the problems.
>>>
>>> Thanks,
>>> -Kapil
>>>
>>> On Tue, Feb 9, 2010 at 2:39 PM, Kapil Arya <ka...@cc...> wrote:
>>>> Thanks Jason!
>>>>
>>>> I remember seeing this problem earlier and at some point I fixed it. I
>>>> will look into my notes and will write back soon.
>>>>
>>>> Thanks,
>>>> -Kapil
>>>>
>>>> On Tue, Feb 9, 2010 at 2:28 PM, Jason Ansel <ja...@cs...> wrote:
>>>>> I thought the issue here was dmtcp_checkpoint being checkpointed, but I
>>>>> was wrong. I just pushed revision 488 which adds the
>>>>> dmtcp_nocheckpoint command that allows you to start a non-checkpointed
>>>>> process from within a checkpointed one. This did not fix the problem.
>>>>>
>>>>> Kapil,
>>>>>
>>>>> Can you take a look at this? I think the failure is in newer code that
>>>>> I don't know. Try running:
>>>>> dmtcp_checkpoint bash ./test/selfcheckpoint.sh
>>>>>
>>>>> In the latest svn.
>>>>>
>>>>> It dies at:
>>>>> [27761] ERROR at connectionmanager.cpp:247 in fdToDevice;
>>>>> REASON='JASSERT(false) failed'
>>>>> device = /tmp/dmtcp.Nmj002 (deleted)
>>>>> Message: Unimplemented file type.
>>>>> Terminating...
>>>>>
>>>>> Thanks,
>>>>>
>>>>> --Jason
>>>>>
>>>>> On Mon, Feb 8, 2010 at 9:45 AM, Marco Maggesi <ma...@ma...> wrote:
>>>>>> Sorry, I tried to follow your suggestion but I can't find the
>>>>>> right place to add the -static flag.
>>>>>>
>>>>>> I'm lost in the various Makefiles.
>>>>>>
>>>>>> I also tried to build everything static with
>>>>>>
>>>>>> CXXFLAGS=-static ./configure
>>>>>>
>>>>>> but I get an error during the compilation.
>>>>>>
>>>>>> Can you give more insight into how I can compile dmtcp_command
>>>>>> statically?
>>>>>> Thank you.
>>>>>> Marco
>>>>>>
>>>>>> On Feb 7, 2010, at 7:14 AM, Jason Ansel wrote:
>>>>>>
>>>>>>> Try compiling dmtcp_command statically (add "-static" to the gcc
>>>>>>> command that compiles it).
>>>>>>>
>>>>>>> The issue here is dmtcp is trying to checkpoint dmtcp_checkpoint,
>>>>>>> which is not supported. Static compilation should prevent this.
>>>>>>>
>>>>>>> If this works we can make it the default option in the next release.
>>>>>>>
>>>>>>> --Jason
>>>>>>>
>>>>>>> On Thu, Feb 4, 2010 at 10:01 AM, <ma...@ma...> wrote:
>>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I would like to write a script that checkpoints itself after some
>>>>>>>> computations.
>>>>>>>> Can it be done without using the dmtcp API?
>>>>>>>>
>>>>>>>> To make a test, I wrote the following bash script called
>>>>>>>> self-freeze.sh
>>>>>>>> (Actually, I would like to use this with the OCaml interpreter, but I
>>>>>>>> will give an example with bash here for convenience.)
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------
>>>>>>>> #!/bin/sh
>>>>>>>>
>>>>>>>> # make some heavy bash computations.
>>>>>>>> TWO=$((1+1))
>>>>>>>>
>>>>>>>> # freeze itself
>>>>>>>> $(dmtcp_command -bc; sleep 1; dmtcp_command -k)
>>>>>>>>
>>>>>>>> # print the result of the heavy computation on restart.
>>>>>>>> echo $TWO
>>>>>>>> -----------------------------------------------------------------
>>>>>>>>
>>>>>>>> and I tried to do the following:
>>>>>>>>
>>>>>>>> dmtcp_checkpoint self-freeze.sh
>>>>>>>>
>>>>>>>> I would expect that the program terminates after checkpointing.
>>>>>>>> Instead the program remains blocked and no checkpoint is created.
>>>>>>>>
>>>>>>>> Then I replaced "dmtcp_command -bc" with "dmtcp_command -c" and
>>>>>>>> repeated the experiment. This time the program terminates with
>>>>>>>> a segmentation fault. Below you find the trace of the output.
>>>>>>>>
>>>>>>>> I tried several other variants of this method with no luck.
>>>>>>>> How can I fix this?
>>>>>>>>
>>>>>>>> Thank you in advance,
>>>>>>>> Marco
>>>>>>>>
>>>>>>>> -----------------------------------------------------------------
>>>>>>>> ~$ dmtcp_checkpoint ./self-freeze.sh
>>>>>>>> DMTCP/MTCP Copyright (C) 2006-2008 Jason Ansel, Michael Rieker,
>>>>>>>> Kapil Arya, and Gene Cooperman
>>>>>>>> This program comes with ABSOLUTELY NO WARRANTY.
>>>>>>>> This is free software, and you are welcome to redistribute it
>>>>>>>> under certain conditions; see COPYING file for details.
>>>>>>>> (Use flag "-q" to hide this message.)
>>>>>>>>
>>>>>>>> [15974] ERROR at dmtcpworker.cpp:758 in connectToCoordinator;
>>>>>>>> REASON='JASSERT(_coordinatorSocket.isValid()) failed'
>>>>>>>> coordinatorAddr = 127.0.0.1
>>>>>>>> coordinatorPort = 7779
>>>>>>>> Message: Failed to connect to DMTCP coordinator
>>>>>>>> Terminating...
>>>>>>>> dmtcp_coordinator starting...
>>>>>>>> Port: 7779
>>>>>>>> Checkpoint Interval: -1
>>>>>>>> Exit on last client: 1
>>>>>>>> Backgrounding...
>>>>>>>> [15983] NOTE at connectionmanager.cpp:443 in handlePreExistingFd;
>>>>>>>> REASON='found pre-existing socket... will not be restored'
>>>>>>>> fd = 3
>>>>>>>> [15984] ERROR at connectionmanager.cpp:231 in fdToDevice;
>>>>>>>> REASON='JASSERT(false) failed'
>>>>>>>> device = /tmp/dmtcp.DNdhXL (deleted)
>>>>>>>> Message: Unimplemented file type.
>>>>>>>> Terminating...
>>>>>>>> device = socket:[168300]
>>>>>>>> [15983] ERROR at connectionmanager.cpp:231 in fdToDevice;
>>>>>>>> REASON='JASSERT(false) failed'
>>>>>>>> device = /tmp/dmtcp.7I0rZL (deleted)
>>>>>>>> Message: Unimplemented file type.
>>>>>>>> Terminating...
>>>>>>>> Segmentation fault
>>>>>>>> -----------------------------------------------------------------
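
For readers following the LD_PRELOAD explanation in this thread, here is a minimal shell sketch of the underlying mechanism: a child process inherits whatever is still in the parent's environment, and it only starts "clean" if the variable is really removed before the exec. This does not use DMTCP at all. DEMO_PRELOAD is a made-up stand-in for LD_PRELOAD (chosen so the dynamic loader is not affected), and the value dmtcphijack.so is used only as a string. It is an illustration under those assumptions, not the actual fix that went into rev 489.

```shell
#!/bin/sh
# DEMO_PRELOAD is a hypothetical stand-in for LD_PRELOAD; nothing is preloaded.
DEMO_PRELOAD=dmtcphijack.so
export DEMO_PRELOAD

# A child started normally inherits the exported variable.
inherited=$(sh -c 'echo "${DEMO_PRELOAD:-unset}"')

# A child started from a subshell that unsets the variable first does not.
# The unset only affects the subshell created by $(...), not this script.
cleaned=$(unset DEMO_PRELOAD; sh -c 'echo "${DEMO_PRELOAD:-unset}"')

echo "inherited=$inherited cleaned=$cleaned"
```

In the thread, gzip played the role of the first child: it was exec()'d with LD_PRELOAD still set and so ended up under checkpoint control.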