From: Gene C. <ge...@cc...> - 2010-02-11 22:22:12
Thanks for letting us know about NixOS. Whenever we have to write the next
grant proposal, it's always helpful to mention things like this. As you have
probably seen, we also produce a .deb package, and soon we will submit it
officially to the Debian distro. If there are opportunities to share any of
the support functions between NixOS and Debian, please let us know.
- Gene

On Thu, Feb 11, 2010 at 06:38:14PM +0100, Marco Maggesi wrote:
> It works!
>
> Thank you very much for your help and support.
>
> BTW: I plan to maintain a dmtcp package for Nixpkgs/NixOS
> (http://nixos.org)
> Version 1.1.3 is already included in Nixpkgs and tested successfully on
> NixOS.
> I'm going to prepare a new package with the latest svn version.
>
> Many thanks to everybody,
> Marco
>
> On Feb 10, 2010, at 5:16 AM, Kapil Arya wrote:
>
>> Actually, I forgot to mention earlier. I have updated DMTCP, so to use
>> dmtcp_{checkpoint,restart,command,coordinator} you do not need to use
>> dmtcp_nocheckpoint. However, for commands other than dmtcp_*, you
>> still have to use dmtcp_nocheckpoint to prevent them from getting
>> checkpointed.
>>
>> On Tue, Feb 9, 2010 at 11:12 PM, Jason Ansel <ja...@cs...>
>> wrote:
>>> The latest SVN also contains a command: dmtcp_nocheckpoint.
>>>
>>> From your script you may have to call:
>>> dmtcp_nocheckpoint dmtcp_command -bc
>>>
>>> To prevent dmtcp_command from getting checkpointed.
>>>
>>> ---
>>>
>>> Thanks Kapil!
>>>
>>> --Jason
>>>
>>> On Tue, Feb 9, 2010 at 8:00 PM, Kapil Arya <ka...@cc...> wrote:
>>>> Hi,
>>>>
>>>> The latest svn (rev:489) contains the fix for the bug. It had to
>>>> do with bash.
>>>>
>>>> Bash keeps its own list of shell/env variables; when one calls
>>>> unsetenv(), bash removes the variable from its internal list but
>>>> does not remove it from the process' set of env variables.
>>>>
>>>> Later on, if you do getenv() on the same name, bash (being unable to
>>>> find it in its internal list) would look up that variable in the
>>>> process' env variable list and return it from there. The fix is to
>>>> call both unsetenv() and glibc:unsetenv(), in order to get rid of any
>>>> environment variable.
>>>>
>>>> The culprit environment variable in our case was "LD_PRELOAD". As
>>>> you would have guessed, gzip was exec()'d with this "LD_PRELOAD" set to
>>>> dmtcphijack.so and hence was coming under checkpoint control, which
>>>> caused the problems.
>>>>
>>>> Thanks,
>>>> -Kapil
>>>>
>>>> On Tue, Feb 9, 2010 at 2:39 PM, Kapil Arya <ka...@cc...>
>>>> wrote:
>>>>> Thanks Jason!
>>>>>
>>>>> I remember seeing this problem earlier and at some point I fixed it. I
>>>>> will look into my notes and will write back soon.
>>>>>
>>>>> Thanks,
>>>>> -Kapil
>>>>>
>>>>> On Tue, Feb 9, 2010 at 2:28 PM, Jason Ansel
>>>>> <ja...@cs...> wrote:
>>>>>> I thought the issue here was dmtcp_checkpoint being
>>>>>> checkpointed, but I was wrong. I just pushed revision 488 which
>>>>>> adds the dmtcp_nocheckpoint command that allows you to start a
>>>>>> non-checkpointed process from within a checkpointed one. This did
>>>>>> not fix the problem.
>>>>>>
>>>>>> Kapil,
>>>>>>
>>>>>> Can you take a look at this? I think the failure is in newer
>>>>>> code that I don't know. Try running:
>>>>>> dmtcp_checkpoint bash ./test/selfcheckpoint.sh
>>>>>>
>>>>>> In the latest svn.
>>>>>>
>>>>>> It dies at:
>>>>>> [27761] ERROR at connectionmanager.cpp:247 in fdToDevice;
>>>>>> REASON='JASSERT(false) failed'
>>>>>> device = /tmp/dmtcp.Nmj002 (deleted)
>>>>>> Message: Unimplemented file type.
>>>>>> Terminating...
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> --Jason
>>>>>>
>>>>>> On Mon, Feb 8, 2010 at 9:45 AM, Marco Maggesi
>>>>>> <ma...@ma...> wrote:
>>>>>>> Sorry, I tried to follow your suggestion but I can't find the
>>>>>>> right place to add the -static flag.
>>>>>>>
>>>>>>> I'm lost in the various Makefiles.
>>>>>>>
>>>>>>> I also tried to build everything static with
>>>>>>>
>>>>>>> CXXFLAGS=-static ./configure
>>>>>>>
>>>>>>> but I get an error during the compilation.
>>>>>>>
>>>>>>> Can you give more insight into how I can compile dmtcp_command
>>>>>>> statically?
>>>>>>> Thank you.
>>>>>>> Marco
>>>>>>>
>>>>>>> On Feb 7, 2010, at 7:14 AM, Jason Ansel wrote:
>>>>>>>
>>>>>>>> Try compiling dmtcp_command statically (add "-static" to the gcc
>>>>>>>> command that compiles it).
>>>>>>>>
>>>>>>>> The issue here is dmtcp is trying to checkpoint
>>>>>>>> dmtcp_checkpoint,
>>>>>>>> which is not supported. Static compilation should prevent this.
>>>>>>>>
>>>>>>>> If this works we can make it the default option in the next
>>>>>>>> release.
>>>>>>>>
>>>>>>>> --Jason
>>>>>>>>
>>>>>>>> On Thu, Feb 4, 2010 at 10:01 AM, <ma...@ma...> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I would like to write a script that checkpoints itself
>>>>>>>>> after some computations.
>>>>>>>>> Can it be done without using the dmtcp API?
>>>>>>>>>
>>>>>>>>> To make a test, I wrote the following bash script called
>>>>>>>>> self-freeze.sh
>>>>>>>>> (Actually, I would like to use this with the OCaml
>>>>>>>>> interpreter, but I will give an example with bash here
>>>>>>>>> for convenience.)
>>>>>>>>>
>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>> #!/bin/sh
>>>>>>>>>
>>>>>>>>> # make some heavy bash computations.
>>>>>>>>> TWO=$((1+1))
>>>>>>>>>
>>>>>>>>> # freeze itself
>>>>>>>>> $(dmtcp_command -bc; sleep 1; dmtcp_command -k)
>>>>>>>>>
>>>>>>>>> # print the result of the heavy computation on restart.
>>>>>>>>> echo $TWO
>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> and I tried to do the following:
>>>>>>>>>
>>>>>>>>> dmtcp_checkpoint self-freeze.sh
>>>>>>>>>
>>>>>>>>> I would expect that the program terminates after checkpointing.
>>>>>>>>> Instead the program remains blocked and no checkpoint is
>>>>>>>>> created.
>>>>>>>>>
>>>>>>>>> Then I replaced "dmtcp_command -bc" with "dmtcp_command -c" and
>>>>>>>>> repeated the experiment. This time the program terminates with
>>>>>>>>> a segmentation fault. Below you find the trace of the output.
>>>>>>>>>
>>>>>>>>> I tried several other variants of this method with no luck.
>>>>>>>>> How can I fix this?
>>>>>>>>>
>>>>>>>>> Thank you in advance,
>>>>>>>>> Marco
>>>>>>>>>
>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>> ~$ dmtcp_checkpoint ./self-freeze.sh
>>>>>>>>> DMTCP/MTCP Copyright (C) 2006-2008 Jason Ansel, Michael
>>>>>>>>> Rieker,
>>>>>>>>> Kapil Arya, and
>>>>>>>>> Gene Cooperman
>>>>>>>>> This program comes with ABSOLUTELY NO WARRANTY.
>>>>>>>>> This is free software, and you are welcome to redistribute it
>>>>>>>>> under certain conditions; see COPYING file for details.
>>>>>>>>> (Use flag "-q" to hide this message.)
>>>>>>>>>
>>>>>>>>> [15974] ERROR at dmtcpworker.cpp:758 in connectToCoordinator;
>>>>>>>>> REASON='JASSERT(_coordinatorSocket.isValid()) failed'
>>>>>>>>> coordinatorAddr = 127.0.0.1
>>>>>>>>> coordinatorPort = 7779
>>>>>>>>> Message: Failed to connect to DMTCP coordinator
>>>>>>>>> Terminating...
>>>>>>>>> dmtcp_coordinator starting...
>>>>>>>>> Port: 7779
>>>>>>>>> Checkpoint Interval: -1
>>>>>>>>> Exit on last client: 1
>>>>>>>>> Backgrounding...
>>>>>>>>> [15983] NOTE at connectionmanager.cpp:443 in
>>>>>>>>> handlePreExistingFd;
>>>>>>>>> REASON='found pre-existing socket... will not be restored'
>>>>>>>>> fd = 3
>>>>>>>>> [15984] ERROR at connectionmanager.cpp:231 in fdToDevice;
>>>>>>>>> REASON='JASSERT(false) failed'
>>>>>>>>> device = /tmp/dmtcp.DNdhXL (deleted)
>>>>>>>>> Message: Unimplemented file type.
>>>>>>>>> Terminating...
>>>>>>>>> device = socket:[168300]
>>>>>>>>> [15983] ERROR at connectionmanager.cpp:231 in fdToDevice;
>>>>>>>>> REASON='JASSERT(false) failed'
>>>>>>>>> device = /tmp/dmtcp.7I0rZL (deleted)
>>>>>>>>> Message: Unimplemented file type.
>>>>>>>>> Terminating...
>>>>>>>>> Segmentation fault
>>>>>>>>> -----------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> _______________________________________________
>>>>>>>>> Dmtcp-forum mailing list
>>>>>>>>> Dmt...@li...
>>>>>>>>> https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
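
The environment inheritance behind the LD_PRELOAD problem discussed in this
thread can be sketched in plain shell. This is only an illustration of how
an exported variable reaches exec()'d children and how it can be kept away
from one of them; it is not DMTCP's actual fix. DEMO_PRELOAD is a made-up
stand-in for LD_PRELOAD (so that running the sketch does not preload
anything), and "env -u" is assumed to be available (GNU coreutils and the
BSDs provide it, but it may be missing on other systems).

```shell
#!/bin/sh
# Hypothetical stand-in for LD_PRELOAD; nothing is really preloaded here.
DEMO_PRELOAD=dmtcphijack.so
export DEMO_PRELOAD

# A child process inherits every exported variable when it is exec()'d.
inherited=$(sh -c 'echo "${DEMO_PRELOAD:-unset}"')

# "env -u" (assumed available) drops the variable for this one child only;
# the parent shell keeps its own copy.
scrubbed=$(env -u DEMO_PRELOAD sh -c 'echo "${DEMO_PRELOAD:-unset}"')

# Once the parent unsets the variable, later children no longer see it.
unset DEMO_PRELOAD
after_unset=$(sh -c 'echo "${DEMO_PRELOAD:-unset}"')

echo "inherited=$inherited scrubbed=$scrubbed after_unset=$after_unset"
```

With the real LD_PRELOAD, the first case corresponds to what happened to
gzip in the thread: it was exec()'d while the variable was still present in
the process environment.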