scalablecr-discuss Mailing List for Scalable Checkpoint / Restart Library
Brought to you by:
kathrynmohror,
moody20
You can subscribe to this list here.
From: Maksym P. <mpl...@os...> - 2016-07-21 17:05:53
Hello,

I have been working with SCR for some time, and I am now curious which applications use it. Could anybody share their experience of embedding SCR into an application? Are there certain types of applications that benefit from SCR more than others? Perhaps you are also aware of examples of SCR being used as part of another C/R facility (such as CRUISE).

I would be grateful for any feedback.

--
Regards,
Maksym Planeta
From: Maksym P. <mpl...@os...> - 2016-02-18 14:16:07
Hi,

I have a request for comment regarding the function "scr_bool_check_halt_and_decrement" in scr.c.

This function checks whether a halt condition is met; if so, rank 0 broadcasts the decision to halt and all processes call exit(0).

I don't really understand how rank 0 checks the conditions. Essentially, it checks them in the following way:

    if (A) { /* do A */ scr_halt(A) }
    if (B) { /* do B */ scr_halt(B) }
    if (C) { /* do C */ scr_halt(C) }

Why doesn't rank 0 use an if / else-if construct? Is this intentional? Is there some priority in the order in which the reasons should be stored? One implication of the current structure is that you may have to initiate the flush several times, too. I'm not sure whether that's significant, but I'd like to hear whether this was done intentionally.

--
Regards,
Maksym Planeta
From: Holguin, C. A <chr...@in...> - 2016-02-16 18:15:47
Kathryn,

Thanks for getting back to me. We have been digging into SCR quite a bit, but it would certainly be valuable to hear your perspective on what the RM needs are for SCR. Do you have a general timeframe for when you could attend a meeting with us?

--
Chris Holguin
From: Mohror, K. <mo...@ll...> - 2016-02-15 20:25:47
Hi Chris,

Sorry for the delay in reply. We are not aware of anyone doing work in this area, and it sounds like it could be interesting to pursue. We'd be happy to meet with you to discuss SCR's needs from a resource manager if you would find it useful.

Best,
Kathryn

_________________________________________________________________
Kathryn Mohror, ka...@ll..., http://scalability.llnl.gov/
Scalability Team @ Lawrence Livermore National Laboratory, Livermore, CA, USA

From: Holguin, Christopher A [mailto:chr...@in...]
Sent: Tuesday, February 9, 2016 10:40 AM
To: sca...@li...
Subject: [Scalablecr-discuss] PMIx integration

To whom it may concern:

I am curious if anyone using or developing for SCR is working with PMIx ( https://github.com/pmix/master ) and considering porting SCR to take advantage of it. The idea behind using PMIx is that SCR would no longer have a specific dependency on any particular RM, assuming PMIx adoption goes well and is supported on the RM in question.

I am currently investigating this as part of my role with Intel Corp. and just wanted to see if there has been any prior work done or if there is any work ongoing related to this.

Thanks.

--
Chris Holguin
Intel Corporation
From: Holguin, C. A <chr...@in...> - 2016-02-09 18:40:04
To whom it may concern:

I am curious if anyone using or developing for SCR is working with PMIx ( https://github.com/pmix/master ) and considering porting SCR to take advantage of it. The idea behind using PMIx is that SCR would no longer have a specific dependency on any particular RM, assuming PMIx adoption goes well and is supported on the RM in question.

I am currently investigating this as part of my role with Intel Corp. and just wanted to see if there has been any prior work done or if there is any work ongoing related to this.

Thanks.

--
Chris Holguin
Intel Corporation
From: Adam T. M. <mo...@ll...> - 2016-01-26 23:45:12
Hi Maksym,

We replaced CACHEDESC with STORE. The user manual is correct in this case (Section 6.2).

I've pushed a fix for this:

https://github.com/hpc/scr/commit/f8e067ac3057c4c6b9f9a464431c3be913112117

Sorry about that. I obviously forgot to check that the example scr.user.conf file was up to date with the latest source code.

-Adam

Maksym Planeta wrote:
> And a follow-up question.
>
> The example proposes using the CACHEDESC keyword (or CACHE, if you make
> the same change as with CKPTDESC). The documentation proposes using the
> STORE keyword.
>
> Both keywords seem to specify the same thing.
>
> Which one is the right one?
From: Maksym P. <mpl...@os...> - 2016-01-26 09:41:10
And a follow-up question.

The example proposes using the CACHEDESC keyword (or CACHE, if you make the same change as with CKPTDESC). The documentation proposes using the STORE keyword.

Both keywords seem to specify the same thing.

Which one is the right one?

--
Regards,
Maksym Planeta
From: Maksym P. <mpl...@os...> - 2016-01-26 07:59:36
Thank you for the response.

On 01/26/2016 12:54 AM, Mohror, Kathryn wrote:
> Hi Maksym,
>
>> I decided to try out SCR. I compiled and installed it as specified in the manual.
>> Now I am trying to specify checkpoint descriptors in the user configuration file.
>
> Glad to hear you're trying out SCR!
>
>> It turns out that the documentation describes a different format from the one that the
>> example at https://github.com/hpc/scr/blob/master/scr.user.conf shows.
>>
>> For example, the file doc/scr_users_manual.pdf does not contain the keyword
>> CKPTDESC at all.
>>
>> Could you tell me what the correct format is?
>
> It looks like you have uncovered a bug in the example scr.user.conf file. Please use the keyword CKPT, as you found in the user's guide, for those lines instead of CKPTDESC.

CKPTDESC seems not to be the only dead keyword. CACHEDESC from the example scr.user.conf is used neither in the source nor in the documentation.

>> I tried to use the format that the documentation specifies, but I get an error which
>> suggests that I probably don't have enough nodes:
>>
>> SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner
>> processes for redundancy descriptor 0, disabling checkpoint, too few nodes?
>> @ scr_reddesc.c:169
>>
>> I definitely do, because I specify SET_SIZE=1 and create a job with 4 nodes.
>
> Yes, this error is related to the SCR_SET_SIZE parameter. Try setting it to 8 and see if it works better. I believe the reason you get that message is that the set size needs to be greater than 1 for a redundancy scheme to work.
>
> Let me know if that helps! If not we can work some more on it.

Changing the set size and removing GROUP=WORLD helped. Thank you.

> Kathryn
>>
>> I attach all the configuration files for completeness.
>>
>> --
>> Regards,
>> Maksym Planeta

--
Regards,
Maksym Planeta
From: Adam T. M. <mo...@ll...> - 2016-01-26 00:54:19
Hello Maksym,

FYI, I just pushed a commit to fix the error you uncovered in the scr.user.conf example file. Thanks for pointing this out.

In addition to everything Kathryn mentioned, you'll also want to use GROUP=NODE for all of your checkpoints in your scr.user.conf file below. Here, the GROUP field refers to the group of processes that are likely to fail at the same time. SCR uses this information to pick processes for its redundancy sets. If you specify GROUP=NODE, that tells SCR that all procs on a compute node are likely to fail at the same time (all processes that share the same string returned by gethostname).

-Adam

Maksym Planeta wrote:
> Hello,
>
> I decided to try out SCR. I compiled and installed it as specified in
> the manual. Now I am trying to specify checkpoint descriptors in the user
> configuration file.
>
> It turns out that the documentation describes a different format from the
> one that the example at https://github.com/hpc/scr/blob/master/scr.user.conf shows.
>
> For example, the file doc/scr_users_manual.pdf does not contain the keyword
> CKPTDESC at all.
>
> Could you tell me what the correct format is?
>
> I tried to use the format that the documentation specifies, but I get an
> error which suggests that I probably don't have enough nodes:
>
> SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner
> processes for redundancy descriptor 0, disabling checkpoint, too few
> nodes? @ scr_reddesc.c:169
>
> I definitely do, because I specify SET_SIZE=1 and create a job with 4
> nodes.
>
> I attach all the configuration files for completeness.
> >------------------------------------------------------------------------ > >scr_srun: Started: Mon Jan 25 15:54:36 CET 2016 >scr_srun: prerun: Mon Jan 25 15:54:42 CET 2016 >scr_prerun: Started: Mon Jan 25 15:54:42 CET 2016 >scr_prerun: Ended: Mon Jan 25 15:54:42 CET 2016 >scr_prerun: secs: 0 >scr_prerun: exit code: 0 >scr_srun: RUN 1: Mon Jan 25 15:54:43 CET 2016 >SCR v1.1.8 ERROR: taurusi6325: scr_log_event: Missing username, jobname, or start time, disabling logging @ scr_log_event.c:204 >SCR v1.1.8 WARNING: rank 0 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 9 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 61 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 77 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 33 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 1 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 65 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 78 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 43 on taurusi6328: SCR v1.1.8 WARNING: rank 2 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 66 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 80 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 24 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 4 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 67 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 83 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 25 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 5 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 70 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 86 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 26 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 6 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 48 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 92 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 27 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 8 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 49 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 72 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 28 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 50 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 73 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 29 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 11 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 51 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 74 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 30 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 12 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 52 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 75 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 31 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 13 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 53 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 76 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 32 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 14 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 54 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 79 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 34 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 15 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 55 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 81 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 35 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 16 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 82 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 36 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 17 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 84 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 37 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 18 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 85 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 38 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 19 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 88 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 39 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 20 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 89 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 40 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 21 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 56 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 90 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 41 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 22 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 57 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 91 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 42 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 23 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 58 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 93 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 44 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 7 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 59 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 94 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 45 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 3 on taurusi6325: SCR v1.1.8 WARNING: rank 60 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 95 on taurusi6330: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 46 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 62 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 47 on taurusi6328: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 64 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 68 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 69 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 71 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 63 on taurusi6329: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 0 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 1 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 2 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 3 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 4 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 5 on taurusi6325: SCR v1.1.8 WARNING: rank 6 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 7 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 24 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 8 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 87 on taurusi6330: SCR v1.1.8 WARNING: rank 25 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 26 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 27 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 9 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 72 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 28 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 73 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 29 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 74 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 30 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 75 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 76 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 77 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 78 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 11 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 79 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 48 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 80 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 12 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 49 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 81 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 50 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 82 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 51 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 13 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 52 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 53 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 54 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 55 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 14 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 56 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 57 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 58 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 59 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 15 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 60 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 16 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 31 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 32 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 33 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 17 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 18 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 34 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 35 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 36 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 37 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 19 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 38 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 39 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 40 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 41 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 20 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 21 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 22 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 83 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 84 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 85 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 86 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 23 on taurusi6325: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 87 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 88 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 89 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 90 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 91 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 61 on taurusi6329: SCR v1.1.8 WARNING: rank 92 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 42 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 62 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 93 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 43 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 63 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 94 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 44 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 64 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 95 on taurusi6330: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 65 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 66 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 67 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 68 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 69 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 70 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 71 on taurusi6329: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 0 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 1 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 2 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 3 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 4 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 5 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 45 on taurusi6328: SCR v1.1.8 WARNING: rank 46 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 47 on taurusi6328: Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 6 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 24 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 25 on taurusi6328: SCR v1.1.8 WARNING: rank 7 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 26 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 48 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 27 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 8 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 49 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 28 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 50 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 51 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 52 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 9 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 53 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 54 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 55 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 11 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 56 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 1, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 57 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 58 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 72 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 73 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 12 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 74 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 75 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 29 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 13 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 76 on taurusi6330: SCR v1.1.8 WARNING: rank 30 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 77 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 31 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 78 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 79 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 14 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 32 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 33 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 15 on taurusi6325: SCR v1.1.8 WARNING: rank 34 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 35 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 36 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 37 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 16 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 38 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 39 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 17 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 18 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 19 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 20 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 59 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 80 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 60 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 81 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 21 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 61 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 82 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 62 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 83 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 63 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 84 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 22 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 64 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 85 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 40 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 65 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 86 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 41 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 23 on taurusi6325: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 66 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 87 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 42 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 67 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 68 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 88 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 69 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 89 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 70 on taurusi6329: SCR v1.1.8 WARNING: rank 90 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 71 on taurusi6329: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 91 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 92 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 93 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 94 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 95 on taurusi6330: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 > >SCR v1.1.8 WARNING: rank 0 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 1 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 2 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 3 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 43 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 44 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 4 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 45 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 46 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 47 on taurusi6328: Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:169SCR v1.1.8 WARNING: rank 5 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 > >SCR v1.1.8 WARNING: rank 24 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 25 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 6 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 7 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 8 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >Failed to find partner processes for redundancy descriptor 2, disabling checkpoint, too few nodes? @ scr_reddesc.c:169 >SCR v1.1.8 WARNING: rank 48 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 49 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 9 on taurusi6325: SCR v1.1.8 WARNING: rank 50 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 51 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 52 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 53 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 54 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 11 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 26 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 27 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 12 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 55 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 72 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 28 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 56 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 73 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 13 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 57 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 74 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 75 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 14 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 76 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 29 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 77 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 30 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 15 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 78 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 31 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 79 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 32 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 33 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 16 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 34 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 35 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 36 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 17 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 18 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 19 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 20 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 21 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 22 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 37 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 58 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 80 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 38 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 23 on taurusi6325: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 59 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 81 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 39 on taurusi6328: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 60 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 82 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 61 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? 
@ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 83 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 62 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 84 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 63 on taurusi6329: SCR v1.1.8 WARNING: rank 85 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 64 on taurusi6329: Failed to find partner processes for redundancy descriptor 3, disabling checkpoint, too few nodes? @ scr_reddesc.c:86 >SCR v1.1.8 WARNING: rank 86 on taurusi6330: Failed to find partner processes for redundancy descriptor 3, disab... [truncated message content] |
From: Mohror, K. <mo...@ll...> - 2016-01-25 23:54:39
|
Hi Maksym,

> I decided to try out SCR. I compiled it and installed as specified in the manual.
> Now I try to specify checkpoint descriptors in the user configuration file.

Glad to hear you're trying out SCR!

> It turns out that the documentation describes a different format from what the
> example at https://github.com/hpc/scr/blob/master/scr.user.conf shows.
>
> For example, file doc/scr_users_manual.pdf does not contain keyword
> CKPTDESC whatsoever.
>
> Could you tell me what is the correct format?

It looks like you have uncovered a bug in the example scr.user.conf file. Please use the keyword CKPT as you found in the user's guide for those lines instead of CKPTDESC.

> I tried to use the one which documentation specifies, but I get an error which
> tells me that I probably don't have enough nodes:
>
> SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner
> processes for redundancy descriptor 0, disabling checkpoint, too few nodes?
> @ scr_reddesc.c:169
>
> I definitely do, because I specify SET_SIZE=1 and create a job with 4 nodes.

Yes, this error is related to the SCR_SET_SIZE parameter. Try setting it to 8 and see if it works better. I believe the reason you get that message is that the set size needs to be greater than 1 for a redundancy scheme to work.

Let me know if that helps! If not we can work some more on it.

Kathryn

> I attach all the configuration files for completeness.
>
> --
> Regards,
> Maksym Planeta |
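To illustrate why a set size of 1 cannot work, here is a minimal sketch in C. It assumes that members of a redundancy set are arranged in a ring and that each member hands its redundancy data to the next one; that layout is an assumption made for illustration, and `ring_partner` is a hypothetical helper, not an SCR function.

```c
#include <assert.h>

/* Hypothetical helper, not part of SCR: returns the index of the set
 * member that would hold redundancy data for `rank_in_set` in a ring of
 * `set_size` members, or -1 when no distinct partner exists. */
int ring_partner(int rank_in_set, int set_size)
{
    assert(set_size >= 1);
    assert(rank_in_set >= 0 && rank_in_set < set_size);

    int partner = (rank_in_set + 1) % set_size;

    /* A process cannot protect its own data: if the ring wraps back to
     * the same member, there is no usable partner. */
    return (partner == rank_in_set) ? -1 : partner;
}
```

Under this assumption, `ring_partner(0, 1)` has nowhere to go except back to the same member, which matches the "Failed to find partner processes" warning, while any set size of 2 or more yields a distinct partner.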
From: Maksym P. <mpl...@os...> - 2016-01-25 16:10:20
|
Hello,

I decided to try out SCR. I compiled it and installed as specified in the manual. Now I try to specify checkpoint descriptors in the user configuration file.

It turns out that the documentation describes a different format from what the example at https://github.com/hpc/scr/blob/master/scr.user.conf shows.

For example, file doc/scr_users_manual.pdf does not contain keyword CKPTDESC whatsoever.

Could you tell me what is the correct format?

I tried to use the one which documentation specifies, but I get an error which tells me that I probably don't have enough nodes:

SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner processes for redundancy descriptor 0, disabling checkpoint, too few nodes? @ scr_reddesc.c:169

I definitely do, because I specify SET_SIZE=1 and create a job with 4 nodes.

I attach all the configuration files for completeness.

--
Regards,
Maksym Planeta |
From: Adam T. M. <mo...@ll...> - 2014-08-18 19:11:22
|
Hi Wadud,

Currently, SCR blocks all processes in a barrier at SCR_Start_checkpoint. The reason for this is to prevent processes from deleting any checkpoint data until all processes have at least reached the SCR_Start_checkpoint call. It could be that some process never reaches the call because it failed, in which case we want to keep the existing checkpoint to restart the job. If you can store more than one checkpoint in cache at a time, this restriction can be relaxed, but for now we always invoke the barrier anyway.

SCR_Complete_checkpoint is also implemented as a synchronous collective. We use this function to compute the redundancy data at the end of the checkpoint. This can be done quite efficiently by using all of the application processes.

For checkpoints that are written down to the parallel file system in addition to cache, we do have the capability to copy the data from cache to the parallel file system asynchronously. Writing to the parallel file system takes orders of magnitude more time than writing to cache, so there is a big benefit to this. Currently, one has to run an extra process on each node for this support.

Having said all of that, it may be possible to support full asynchronous checkpointing, but I haven't thought through all of the details to be sure.
-Adam

Wadud Miah wrote:

>Hi Adam,
>
>From what I understand, the SCR library checkpoints synchronously. Do you think it can be updated to write checkpoints asynchronously? I think an asynchronous checkpoint scheme is still partly synchronous, since only a single checkpoint can be written at a time, so it has to wait at a barrier at the next checkpoint. Perhaps this can be user configured.
>
>I like how the library has been developed by prefixing all subroutine names with "SCR_". I like this practise and this is how the Hypre linear solver library has been written.
>
>Regards,
>Wadud. |
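As a rough illustration of the "redundancy data" that Adam's 2014-08-18 19:11 reply says is computed in SCR_Complete_checkpoint, the sketch below shows XOR parity over a set of equally sized buffers. The thread does not say which scheme is in use, so XOR is an assumption here, and `xor_reduce` is a hypothetical helper rather than SCR code.

```c
#include <stddef.h>

/* Hypothetical helper, not SCR code: XOR `count` equally sized buffers
 * of `len` bytes together, byte by byte, and store the result in `out`. */
void xor_reduce(const unsigned char *const bufs[], size_t count,
                size_t len, unsigned char *out)
{
    for (size_t i = 0; i < len; i++) {
        unsigned char acc = 0;
        for (size_t b = 0; b < count; b++) {
            acc ^= bufs[b][i];
        }
        out[i] = acc;
    }
}
```

Calling `xor_reduce` over every member's checkpoint data yields a parity buffer. If one member's data is later lost, calling it again over the surviving buffers plus the parity reproduces the missing data, because each byte XORed with itself cancels out. Every member contributes its own buffer, which is consistent with the remark that all application processes can share the work.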
From: Adam T. M. <mo...@ll...> - 2014-08-18 18:44:03
|
Hello Wadud, You're welcome! The version on github is the latest. This version is what we will eventually tag as v1.1-8, but we still have a few things to finish. -Adam Wadud Miah wrote: >Hello Adam, > >Thanks so much for all your help. Is the version in Github the latest, i.e. 1.1.8? > >Regards, |
From: Wadud M. <w....@qm...> - 2014-08-18 10:52:06
|
Hi Adam,

From what I understand, the SCR library checkpoints synchronously. Do you think it can be updated to write checkpoints asynchronously? I think an asynchronous checkpoint scheme is still partly synchronous, since only a single checkpoint can be written at a time, so it has to wait at a barrier at the next checkpoint. Perhaps this can be user configured.

I like how the library has been developed by prefixing all subroutine names with "SCR_". I like this practise and this is how the Hypre linear solver library has been written.

Regards,
Wadud.

------------------------------------------------------------------------------
_______________________________________________
Scalablecr-discuss mailing list
Sca...@li...
https://lists.sourceforge.net/lists/listinfo/scalablecr-discuss |
From: Wadud M. <w....@qm...> - 2014-08-18 09:56:33
|
Hello Adam, Thanks so much for all your help. Is the version in Github the latest, i.e. 1.1.8? Regards, |
From: Adam T. M. <mo...@ll...> - 2014-08-15 23:05:22
|
Hi Wadud, Great, that helps. We're overdue for making an official 1.1-8 release, so I'm glad you pulled the latest from github. pdsh is required in the scavenge phase. This is run from the scr_postrun script in order to copy files from /tmp to the parallel file system in the event of a failure. This is some functionality that would need to be ported if you don't have it available on your system. -Adam Wadud Miah wrote: >Hi Adam, > >I obtained the latest version from git which contains the Fortran bindings. I noticed that configure worked even though I do not have pdsh installed. Will SCR still work without PDSH? > >Thanks for your help. |
From: Wadud M. <w....@qm...> - 2014-08-15 21:59:51
|
Hi Adam, I obtained the latest version from git which contains the Fortran bindings. I noticed that configure worked even though I do not have pdsh installed. Will SCR still work without PDSH? Thanks for your help. |
From: Wadud M. <w....@qm...> - 2014-08-15 21:18:54
|
Hi Adam, I first learnt about SCR at SC '13 last year by attending one of the presentations. I really like the idea of using local storage and providing RAID-like resilience. I am currently using SGE, but I think it should also be ported to PBS and LSF which are the other popular schedulers. I am sure you can get some collaboration going with Altair and IBM. Regards, Wadud. |
From: Wadud M. <w....@qm...> - 2014-08-15 21:16:03
|
Hello Adam, Thanks for your reply. I cannot find either the libscrf.so library or the example Fortran program examples/test_ckpt.F in my installation of version 1.1.7. I had a look at the configure options and there is nothing there to indicate that the Fortran bindings are built. Which version do you have? I also assigned the environment variable F77 to ifort. Regards, Wadud. |
From: Adam T. M. <mo...@ll...> - 2014-08-15 20:09:25
|
Hi Wadud, I forgot to mention that you need to link Fortran apps to -lscrf instead of -lscr. It's also helpful to look at the examples/test_ckpt.F for an example and see the makefiles.example for instructions on how it was compiled and linked. -Adam |
From: Adam T. M. <mo...@ll...> - 2014-08-15 18:22:33
|
Hello Wadud, Thanks for your enthusiasm! SCR is designed to be portable to different job schedulers. It currently runs with SLURM and in Cray's environment with aprun. I'd be happy to help you port it to another system if you're interested. Do you have a particular scheduler in mind? -Adam |
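One small piece of porting a tool between schedulers is finding out which scheduler a job is running under. The sketch below is an illustration in C and is not taken from SCR: it looks up the job ID from the environment variables that SLURM, PBS, LSF and SGE are commonly documented to set. The function name `demo_find_job_id` and the lookup order are assumptions made for this example.

```c
#include <stdlib.h>

/* Environment variables that common schedulers set to the job ID.
 * Illustration only; this table is not part of SCR. */
static const char* const job_id_vars[] = {
    "SLURM_JOB_ID",  /* SLURM */
    "PBS_JOBID",     /* PBS / Torque */
    "LSB_JOBID",     /* LSF */
    "JOB_ID",        /* SGE */
};

/* Returns the first non-empty job ID found, or NULL when none is set.
 * If name_out is non-NULL, it receives the name of the variable that matched. */
const char* demo_find_job_id(const char** name_out)
{
    size_t n = sizeof(job_id_vars) / sizeof(job_id_vars[0]);
    for (size_t i = 0; i < n; i++) {
        const char* value = getenv(job_id_vars[i]);
        if (value != NULL && value[0] != '\0') {
            if (name_out != NULL) {
                *name_out = job_id_vars[i];
            }
            return value;
        }
    }
    return NULL;
}
```

A real port would also need the node list and the remaining allocation time from the scheduler, which this sketch does not cover.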
From: Adam T. M. <mo...@ll...> - 2014-08-15 18:14:59
|
Hi Wadud, Fortran 77 bindings are available in src/scrf.c and src/scrf.h. They are modeled after the Fortran bindings used for MPI, so if you're familiar with MPI calls from Fortran, SCR calls will look familiar. All functions are invoked with capital letters, and each function returns an error code of type INTEGER in its last argument. You should be able to include scrf.h in your Fortran application and make calls to SCR functions. It's been a while since I've tested those, though, so let me know if you hit any problems. -Adam |
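The calling convention described above (upper-case names, status returned through a trailing INTEGER argument) mirrors the way MPI exposes C routines to Fortran. A minimal sketch in C of that wrapper pattern follows. The names `demo_need_checkpoint` and `DEMO_NEED_CHECKPOINT` are hypothetical, invented for illustration, and are not taken from src/scrf.c. Real bindings must also match the Fortran compiler's symbol-naming scheme, which is omitted here.

```c
/* Stand-in for a C library call that reports its status as the return value.
 * Hypothetical: this is not an SCR function. */
static int demo_need_checkpoint(int* flag)
{
    *flag = 1;   /* pretend a checkpoint is due */
    return 0;    /* 0 means success in this sketch */
}

/* Fortran-callable wrapper in the MPI style: upper-case name, every argument
 * passed by reference, and the status stored in the trailing INTEGER argument
 * instead of being returned. */
void DEMO_NEED_CHECKPOINT(int* flag, int* ierror)
{
    *ierror = demo_need_checkpoint(flag);
}
```

From Fortran, a routine exposed this way would be invoked as `CALL DEMO_NEED_CHECKPOINT(flag, ierror)`, with both arguments declared INTEGER.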
From: Wadud M. <w....@qm...> - 2014-08-15 13:18:02
|
Hello, Will SCR support be available for other job schedulers? E.g. SGE, LSF or PBS? I think this is a great tool and should be integrated with other job schedulers. Regards, ------------------------------------------- Wadud Miah Research Computing Services (HPC) 020 7882 8393 |
From: Wadud M. <w....@qm...> - 2014-08-15 13:00:30
|
Hello, Will Fortran bindings be available for SCR? Regards, ------------------------------------------- Wadud Miah Research Computing Services (HPC) |
From: Wadud M. <w....@qm...> - 2014-08-14 20:43:12
|
Hello Kathryn, Thanks for the quick reply. I completely missed that part of the documentation! Thanks again, Wadud. -----Original Message----- From: Kathryn Mohror [mailto:ka...@ll...] Sent: 14 August 2014 18:13 To: Wadud Miah; sca...@li... Subject: Re: [Scalablecr-discuss] path for cached checkpoint data Hi Wadud, Does the information in section 6.2 of the User's Guide have what you are looking for? The CACHEDESC variable sets the location(s) for the caches. I'll look into how we can make that information easier to find. Thanks! And let me know if that doesn't help, Kathryn _________________________________________________________________ Kathryn Mohror, ka...@ll..., http://scalability.llnl.gov/ Scalability Team @ Lawrence Livermore National Laboratory, Livermore, CA, USA On 8/14/14, 8:52 AM, "Wadud Miah" <w....@qm...> wrote: >Hello, > >Could someone please tell me the path to the cached checkpoint data? I >know the SCR_PREFIX variable is used to specify the path to checkpoint >data on the parallel file system, but is there a similar variable for >cached local checkpoint >data? I imagine it will live somewhere in /tmp and I can't find this >information anywhere in the documentation. > >Thanks in advance, > >------------------------------------------- >Wadud Miah >Research Computing Services (HPC) >020 7882 8393 > > |