From: Walker, Bruce J <bruce.walker@hp...> - 2003-12-11 22:07:17
> 3. Checkpoint/restart. Questions were asked in three areas:
> i. databases (large queries - continue from where they left off?),
> ii. networking (can sessions be maintained for a cvip?)
> iii. HPC (large computations - will a crash mean a restart?)
The term checkpoint/restart is used in a couple of different ways. Some
people use the technique as a way to do process migration (checkpoint
to disk and immediately "restart" on another machine). That isn't
needed in SSI, since we can migrate without checkpointing. Apart from
process migration, I don't think checkpointing makes sense for either
databases or networking. There are several reasons. If you are doing
checkpoints for reasons other than migration, you checkpoint and then
continue execution. If there is a failure, you restart at the last
checkpoint. All the work done since the last checkpoint is lost and has
to be redone. Any external inputs that came in since the last checkpoint
won't come in again, so for databases or networking applications you
have a mess: the restarted process has lost inputs that its clients or
peers believe were already delivered.
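To make the lost-input problem concrete, here is a minimal Python sketch. It is only a toy simulation, not anything from OpenSSI: the function name `run_with_crash` and its parameters are made up for illustration, and the "external inputs" are modelled as a one-shot iterator that cannot be replayed.

```python
import copy

def run_with_crash(inputs, checkpoint_every, crash_after):
    """Toy model (hypothetical): consume one-shot inputs, crash, restart."""
    stream = iter(inputs)                  # external inputs arrive once only
    state = {"count": 0, "total": 0}
    checkpoint = copy.deepcopy(state)      # initial checkpoint

    for value in stream:
        state["count"] += 1
        state["total"] += value
        if state["count"] % checkpoint_every == 0:
            checkpoint = copy.deepcopy(state)   # take a checkpoint, continue
        if state["count"] == crash_after:
            break                          # simulated failure

    # Restart: roll back to the last checkpoint. Inputs consumed between
    # the checkpoint and the crash are gone; the stream will not resend them.
    state = copy.deepcopy(checkpoint)
    for value in stream:                   # only inputs arriving after the crash
        state["count"] += 1
        state["total"] += value
    return state

result = run_with_crash(range(1, 11), checkpoint_every=4, crash_after=6)
print(result, "expected total without a crash:", sum(range(1, 11)))
```

With ten inputs, a checkpoint every four inputs, and a crash after the sixth, inputs 5 and 6 are processed after the checkpoint and then rolled back, so the final total is short by exactly those two values.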
For HPC, it makes sense to checkpoint in many cases because it is OK to
redo the calculations on a restart (there are no external inputs to
lose). There is a project underway to provide checkpoint/restart for HPC
applications on OpenSSI, MPI applications in particular.
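As a rough illustration of why restart is safe for a pure computation, here is a small application-level sketch in Python. It is not the OpenSSI or MPI checkpoint mechanism; the function name, the JSON checkpoint file, and the `fail_at` parameter are all assumptions made for the example.

```python
import json
import os

def checkpointed_sum_of_squares(n, path, checkpoint_every=1000, fail_at=None):
    """Sum i*i for i in range(n), saving progress to `path` periodically."""
    i, total = 0, 0
    if os.path.exists(path):               # restart: resume from last checkpoint
        with open(path) as f:
            saved = json.load(f)
        i, total = saved["i"], saved["total"]

    while i < n:
        if fail_at is not None and i == fail_at:
            raise RuntimeError("simulated crash")
        total += i * i
        i += 1
        if i % checkpoint_every == 0:
            tmp = path + ".tmp"
            with open(tmp, "w") as f:
                json.dump({"i": i, "total": total}, f)
            os.replace(tmp, path)          # swap in the new checkpoint atomically
    return total
```

If the first run crashes at step 2500 with a checkpoint every 1000 steps, a second call with the same `path` resumes from step 2000 and redoes only the 500 steps since the last checkpoint. The result matches an uninterrupted run because the computation depends on nothing outside the saved state.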
> En Chiang
> Ssic-linux-users mailing list