Re: [Scalablecr-discuss] User configuration file format
From: Maksym P. <mpl...@os...> - 2016-01-26 09:41:10
And a follow-up question. The example proposes using the CACHEDESC keyword (or CACHE, if you make the same change as with CKPTDESC). The documentation proposes using the STORE keyword. Both keywords seem to specify the same thing. Which one is the right one?

On 01/26/2016 12:54 AM, Mohror, Kathryn wrote:
> Hi Maksym,
>
>>
>> I decided to try out SCR. I compiled it and installed as specified in the manual.
>> Now I try to specify checkpoint descriptors in the user configuration file.
>
> Glad to hear you're trying out SCR!
>
>> It turns out that the documentation describes a different format from what the
>> example at https://github.com/hpc/scr/blob/master/scr.user.conf shows.
>>
>> For example, file doc/scr_users_manual.pdf does not contain keyword
>> CKPTDESC whatsoever.
>>
>> Could you tell me what is the correct format?
>
> It looks like you have uncovered a bug in the example scr.user.conf file. Please use the keyword CKPT as you found in the user's guide for those lines instead of CKPTDESC.
>
>> I tried to use the one which the documentation specifies, but I get an error which
>> tells me that I probably don't have enough nodes:
>>
>> SCR v1.1.8 WARNING: rank 10 on taurusi6325: Failed to find partner
>> processes for redundancy descriptor 0, disabling checkpoint, too few nodes?
>> @ scr_reddesc.c:169
>>
>> I definitely do, because I specify SET_SIZE=1 and create a job with 4 nodes.
>
> Yes, this error is related to the SCR_SET_SIZE parameter. Try setting it to 8 and see if it works better. I believe the reason you get that message is because the set size needs to be greater than 1 for a redundancy scheme to work.
>
> Let me know if that helps! If not we can work some more on it.
>
> Kathryn
>>
>> I attach all the configuration files for completeness.
>>
>> --
>> Regards,
>> Maksym Planeta

--
Regards,
Maksym Planeta