Re: [Scalablecr-discuss] Fortran bindings for SCR
From: Adam T. M. <mo...@ll...> - 2014-08-18 19:11:22
Hi Wadud,

Currently, SCR blocks all processes in a barrier at SCR_Start_checkpoint. The reason for this is to prevent processes from deleting any checkpoint data until all processes have at least reached the SCR_Start_checkpoint call. It could be that some process never reaches the call because it failed, in which case we want to keep the existing checkpoint to restart the job. If you can store more than one checkpoint in cache at a time, this restriction can be relaxed, but for now we always invoke the barrier anyway.

SCR_Complete_checkpoint is also implemented as a synchronous collective. We use this function to compute the redundancy data at the end of the checkpoint. This can be done quite efficiently by using all of the application processes.

For checkpoints that are written down to the parallel file system in addition to cache, we do have the capability to copy the data from cache to the parallel file system asynchronously. Writing to the parallel file system takes orders of magnitude more time than writing to cache, so there is a big benefit to this. Currently, one has to run an extra process on each node for this support.

Having said all of that, it may be possible to support fully asynchronous checkpointing, but I haven't thought through all of the details to be sure.
-Adam

Wadud Miah wrote:

>Hi Adam,
>
>From what I understand, the SCR library checkpoints synchronously. Do you think it can be updated to write checkpoints asynchronously? I think an asynchronous checkpoint scheme is still somewhat synchronous, since only one checkpoint can be written at a time, so it has to wait at a barrier at the next checkpoint. Perhaps this can be user configured.
>
>I like how the library has been developed by prefixing all subroutine names with "SCR_". I like this practice, and this is how the Hypre linear solver library has been written.
>
>Regards,
>Wadud.
>
>-----Original Message-----
>From: Wadud Miah [mailto:w....@qm...]
>Sent: 18 August 2014 10:56
>To: Adam T. Moody
>Cc: sca...@li...
>Subject: Re: [Scalablecr-discuss] Fortran bindings for SCR
>
>Hello Adam,
>
>Thanks so much for all your help. Is the version in Github the latest, i.e. 1.1.8?
>
>Regards,
>
>-----Original Message-----
>From: Adam T. Moody [mailto:mo...@ll...]
>Sent: 16 August 2014 00:05
>To: Wadud Miah
>Cc: sca...@li...
>Subject: Re: [Scalablecr-discuss] Fortran bindings for SCR
>
>Hi Wadud,
>Great, that helps. We're overdue for making an official 1.1-8 release,
>so I'm glad you pulled the latest from github.
>
>pdsh is required in the scavenge phase. This phase is run from the
>scr_postrun script in order to copy files from /tmp to the parallel file
>system in the event of a failure. This is some functionality that would
>need to be ported if you don't have it available on your system.
>-Adam
>
>Wadud Miah wrote:
>
>>Hi Adam,
>>
>>I obtained the latest version from git, which contains the Fortran bindings. I noticed that configure worked even though I do not have pdsh installed. Will SCR still work without pdsh?
>>
>>Thanks for your help.
>>
>>-----Original Message-----
>>From: Wadud Miah [mailto:w....@qm...]
>>Sent: 15 August 2014 22:15
>>To: Adam T. Moody
>>Cc: sca...@li...
>>Subject: Re: [Scalablecr-discuss] Fortran bindings for SCR
>>
>>Hello Adam,
>>
>>Thanks for your reply. I cannot find the libscrf.so library, nor the example Fortran program examples/test_ckpt.F, in my installation of version 1.1.7. I had a look at the configure options and there is nothing there to indicate that the Fortran bindings are built. Which version do you have? I also assigned the environment variable F77 to ifort.
>>
>>Regards,
>>Wadud.
>>
>>-----Original Message-----
>>From: Adam T. Moody [mailto:mo...@ll...]
>>Sent: 15 August 2014 21:09
>>To: Wadud Miah
>>Cc: sca...@li...
>>Subject: Re: [Scalablecr-discuss] Fortran bindings for SCR
>>
>>Hi Wadud,
>>I forgot to mention that you need to link Fortran apps with -lscrf instead of
>>-lscr.
>>
>>It's also helpful to look at examples/test_ckpt.F for an example and
>>see makefiles.example for instructions on how it was compiled and
>>linked.
>>-Adam
>>
>>Adam T. Moody wrote:
>>
>>>Hi Wadud,
>>>Fortran 77 bindings are available in src/scrf.c and src/scrf.h. They
>>>are modeled after the Fortran bindings used for MPI, so if you're
>>>familiar with MPI calls from Fortran, SCR calls will look familiar. All
>>>functions are invoked with capital letters, and each function returns an
>>>error code of type INTEGER in its last argument. You should be able to
>>>include scrf.h in your Fortran application and make calls to SCR functions.
>>>
>>>It's been a while since I've tested those, though, so let me know if you
>>>hit any problems.
>>>-Adam
>>>
>>>Wadud Miah wrote:
>>>
>>>>Hello,
>>>>
>>>>Will Fortran bindings be available for SCR?
>>>>
>>>>Regards,
>>>>
>>>>-------------------------------------------
>>>>Wadud Miah
>>>>Research Computing Services (HPC)
>>>>
>>>>_______________________________________________
>>>>Scalablecr-discuss mailing list
>>>>Sca...@li...
>>>>https://lists.sourceforge.net/lists/listinfo/scalablecr-discuss
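
The reasoning behind the barrier at SCR_Start_checkpoint, as described in Adam's reply at the top of the thread, can be illustrated with a toy model. The Python sketch below does not use SCR or MPI at all: threads stand in for processes, `threading.Barrier` stands in for the barrier, and the names `run_checkpoint`, `failed_ranks`, and `cache` are hypothetical, chosen only for illustration. It assumes a cache that holds a single checkpoint per process, which is the case the barrier is meant to protect.

```python
import threading


def run_checkpoint(num_procs, failed_ranks, timeout=0.5):
    """Toy model of a barrier before overwriting a cached checkpoint.

    Each thread plays one process that holds an "old" checkpoint in cache.
    A process may replace it with a "new" one only after every process has
    reached the barrier. Ranks listed in failed_ranks never arrive.
    """
    barrier = threading.Barrier(num_procs)
    cache = {rank: "old" for rank in range(num_procs)}

    def process(rank):
        if rank in failed_ranks:
            return  # simulated failure: this rank never reaches the barrier
        try:
            # Stand-in for the barrier described for SCR_Start_checkpoint.
            barrier.wait(timeout=timeout)
        except threading.BrokenBarrierError:
            return  # a rank is missing: keep the old checkpoint for restart
        cache[rank] = "new"  # everyone arrived, so overwriting is safe

    threads = [threading.Thread(target=process, args=(r,))
               for r in range(num_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cache


if __name__ == "__main__":
    print(run_checkpoint(4, set()))  # every rank arrives
    print(run_checkpoint(4, {2}))    # rank 2 fails before the barrier
```

When every rank arrives, each one replaces its cached checkpoint. When one rank is missing, the barrier breaks after the timeout and the surviving ranks keep their old checkpoint, so a complete set still exists for restart. A real MPI barrier has no such timeout; the timeout here only lets the toy model terminate.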