From: Gene C. <ge...@cc...> - 2011-07-29 20:21:03
> Just wondering if manually removing the standard output and error files and
> ignoring the above note is the right thing to do?
>   [ in doing: dmtcp_checkpoint myprogram >outfile 2>errfile ]

Yes, that's exactly right. Fundamentally, when DMTCP restarts it has a
problem because it knows it should send stdout to outfile, but it sees
a pre-existing outfile. There's no one right answer in this situation.
One could append to the end, but suppose this is an old outfile from a
different run. If my memory is correct, the current heuristic of DMTCP
is to treat stdin/stdout/stderr specially and not overwrite an existing file.

There are also special heuristics when DMTCP is doing I/O to /dev/tty
(the terminal), as opposed to an ordinary file. For ordinary writing to
a file, DMTCP will remember the file offset at which it was writing,
and then continue writing from that previous file offset.

I hope this helps in understanding what behavior to expect from DMTCP.

Best,
- Gene

On Fri, Jul 29, 2011 at 02:16:00PM -0500, siavash mirarab wrote:
> Thanks Gene for the quick reply.
>
> I think I figured out why I was confused. When I redirect stderr and stdout
> of myprogram to files, the output does not get flushed right away. In fact,
> it gets flushed only after the program stops running (I am not quite sure
> why it doesn't flush more often). Previously I was terminating myprogram
> midway through. The reason the errfile and outfile files were empty was
> because they were not flushed yet. If I wait until the program finishes,
> standard error and standard output do get correctly redirected to errfile
> and outfile.
>
> Here is another question though. When I try to restart a program that has
> redirected input and output, I get the following error:
>
> [23713] ERROR at connection.cpp:1054 in restore;
> REASON='JASSERT(jalib::Filesystem::FileExists(_path) == false) failed'
>      _path = <dir>/errfile
>
> where errfile is the file to which standard error was redirected.
> I am guessing I need to manually remove errfile before restarting the program?
>
> When I do manually remove the file and restart, everything seems to work ok,
> but I get the following message:
> [31218] NOTE at connection.cpp:1172 in restoreFile;
> REASON='File not present, copying from saved checkpointed file'
>      _path = <dir>/errfile
>
> Just wondering if manually removing the standard output and error files and
> ignoring the above note is the right thing to do?
>
>
> Thanks
> Siavash
>
> On Fri, Jul 29, 2011 at 1:24 PM, Gene Cooperman <ge...@cc...> wrote:
>
> > First, a little background on DMTCP, to see if I can clarify the question.
> > The program dmtcp_checkpoint really just does an exec() into myprogram.
> > There is always just one process (one pid). Redirection of stdout and
> > stderr survives across an exec(). In essence, dmtcp_checkpoint
> > is a thin wrapper that does something close to:
> >   LD_PRELOAD=dmtcphijack.so myprogram
> > (assuming bash, where this will set LD_PRELOAD in the environment of
> > myprogram)
> >
> > Next, you write:
> > > everything (stderr of dmtcp_checkpoint in addition to stderr and stdout of
> > > myprogram) is outputted to my screen, as expected.
> >
> > So, in the case below, that single stdout and stderr is redirected
> > to outfile and errfile for both programs (since they're the same process).
> > >   dmtcp_checkpoint myprogram >outfile 2>errfile
> >
> > Essentially, the steps are:
> > 1. The shell sets up the redirection
> > 2. dmtcp_checkpoint begins
> > 3. It execs into myprogram, while preserving the original redirection
> >
> > If you're interested in having DMTCP be less verbose, you could
> > always try: dmtcp_checkpoint --quiet myprogram
> > (In general, see dmtcp_checkpoint --help for the options.)
> >
> > It's possible I misunderstood your question. Does this help clarify things?
> > If not, please write back.
> >
> > Best,
> > - the DMTCP team
> >
> > On Fri, Jul 29, 2011 at 11:57:46AM -0500, siavash mirarab wrote:
> > > Hello,
> > >
> > > I am wondering what happens to the standard error and standard output of the
> > > program being run under dmtcp_checkpoint? Here is what I have
> > > experienced. When I run
> > >
> > > dmtcp_checkpoint myprogram
> > >
> > > everything (stderr of dmtcp_checkpoint in addition to stderr and stdout of
> > > myprogram) is outputted to my screen, as expected. However, when I run
> > >
> > > dmtcp_checkpoint myprogram >outfile 2>errfile
> > >
> > > only stderr and stdout of dmtcp_checkpoint seem to be redirected to errfile
> > > and outfile (and outfile is empty).
> > >
> > > Is this the expected behavior? Is there a way to ask dmtcp_checkpoint to
> > > redirect the stderr and stdout of myprogram to a file?
> > >
> > > Thanks
> > > Siavash
> > >
> > > _______________________________________________
> > > Dmtcp-forum mailing list
> > > Dmt...@li...
> > > https://lists.sourceforge.net/lists/listinfo/dmtcp-forum
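
A note on the flushing behavior described in the thread: one likely explanation
(an assumption, since the thread never says what myprogram is written in) is
ordinary output buffering. Many runtimes line-buffer output that goes to a
terminal but block-buffer output that goes to a regular file, so a program
killed midway can leave outfile empty even though it has already "printed"
something. The Python sketch below illustrates the effect with a hypothetical
file name, outfile.demo. It is not DMTCP code and says nothing about how DMTCP
itself handles files.

```python
import os
import tempfile


def size_before_and_after_flush(message):
    """Write `message` to a regular file and return the on-disk size
    before and after an explicit flush()."""
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file name, used only for this illustration.
        path = os.path.join(workdir, "outfile.demo")
        with open(path, "w") as f:
            # A regular file is not a terminal, so Python block-buffers it:
            # a short write stays in the in-process buffer.
            f.write(message)
            before = os.path.getsize(path)
            # flush() pushes the buffered bytes to the operating system.
            f.flush()
            after = os.path.getsize(path)
    return before, after


before, after = size_before_and_after_flush("checkpoint me\n")
print("bytes on disk before flush:", before)
print("bytes on disk after flush: ", after)
```

If myprogram is under your control, flushing explicitly after important writes
should make partial output visible in outfile sooner, at some cost in I/O
efficiency.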