Hi Arjun,
> If I have an older checkpoint flushed out to the parallel file system
> and I suffer a catastrophic failure of nodes that violates the
> redundancy scheme, is it possible in SCR for me to send the checkpoint
> images from the parallel file system to one or more spare nodes to
> restart the entire application from the previous checkpoint?
Yes, SCR can do this. It will first attempt to restart the job on spare nodes using the cached checkpoints. If it cannot rebuild the cached checkpoints, it will restart from the most recent checkpoint that was flushed to the parallel file system.
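In case it helps to see that order of preference written down, here is a rough sketch of the decision. It is only an illustration, not SCR code: the function name, its arguments, and the integer checkpoint ids are all made up for the example.

```python
from typing import List, Optional, Tuple


def pick_restart_checkpoint(
    rebuildable_cached_id: Optional[int],
    flushed_ids: List[int],
) -> Optional[Tuple[str, int]]:
    """Hypothetical model of the restart preference described above.

    rebuildable_cached_id: id of a cached checkpoint that can still be
        rebuilt after the failure, or None if the redundancy scheme
        was violated and no cached checkpoint is recoverable.
    flushed_ids: ids of checkpoints already written to the parallel
        file system.
    """
    # Prefer the cached checkpoint whenever it can be rebuilt.
    if rebuildable_cached_id is not None:
        return ("cache", rebuildable_cached_id)
    # Otherwise fall back to the most recent flushed checkpoint.
    if flushed_ids:
        return ("parallel_file_system", max(flushed_ids))
    # Nothing to restart from: the job starts from the beginning.
    return None
```

For your scenario, the cache cannot be rebuilt, so the second branch applies and the job restarts from the newest checkpoint that reached the parallel file system.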
Hope that helps,
Kathryn
>
> Arjun J Rao
______________________________________________________________
Kathryn Mohror, ka...@ll..., http://people.llnl.gov/mohror1
CASC @ Lawrence Livermore National Laboratory, Livermore, CA, USA