Hi there, I spent 2 weeks running a large model and when it finally finished I got the following error messages. I don't think the simulations failed, because when that has happened in the past I've seen a drop in CPU usage due to one or more cores not being used. That leaves the mysterious "error in processing results", plus the bug mentioned below associated with a missing file in the root simulation directory. I'm not sure how to fix either of these. I obviously want to prevent this from happening again—can you help? Thanks! I'm using runjags 2.0.4-2.
Running a pilot chain...
Calling 3 simulations using the parallel method...
Following the progress of chain 1 (the program will wait for all chains
to finish before continuing):
Welcome to JAGS 4.2.0 on Thu Mar 22 11:56:20 2018
JAGS is free software and comes with ABSOLUTELY NO WARRANTY
Loading module: basemod: ok
Loading module: bugs: ok
. . Reading data file data.txt
. Compiling model graph
Resolving undeclared variables
Allocating nodes
Graph information:
Observed stochastic nodes: 88197
Unobserved stochastic nodes: 449
Total graph size: 2809081
. Reading parameter file inits1.txt
. Initializing model
. Adapting 1000
-------------------------------------------------| 1000
++++++++++++++++++++++++++++++++++++++++++++++++++ 100%
Adaptation successful
. Updating 20000
-------------------------------------------------| 20000 ******* 100%
. . . . . . . . . . Updating 200000
-------------------------------------------------| 200000 ******* 100%
. . . . Updating 0
. Deleting model
.
All chains have finished
Error in readChar(con, 5L, useBytes = TRUE) : cannot open the connection
In addition: Warning messages:
1: In paste(output.string, "\"", variable[[i]], "\" <- ", value.string, :
closing unused connection 5 (<-localhost:11014)
2: In paste(output.string, "\"", variable[[i]], "\" <- ", value.string, :
closing unused connection 4 (<-localhost:11014)
3: In paste(output.string, "\"", variable[[i]], "\" <- ", value.string, :
closing unused connection 3 (<-localhost:11014)
4: You attempted to start parallel chains without setting different PRNG for each chain, which is not recommended. Different .RNG.name values have been added to each set of initial values.
5: In readChar(con, 5L, useBytes = TRUE) :
cannot open compressed file 'simchainsinfo.Rsave', probable reason 'No such file or directory'
Error in runjags.readin(directory = startinfo$directory, silent.jags = silent.jags, :
The required 'simchainsinfo.Rsave' file was not found in the root simulation directory, please file a bug report to the package developer!
Error in file(con, "r") : cannot open the connection
In addition: Warning message:
In file(con, "r") : cannot open file 'model.txt': No such file or directory
Note: Either one or more simulation(s) failed, or there was an error in
processing the results. You may be able to retrieve any successful
simulations using:
results.jags("/private/var/folders/hn/fgc198pn2bg54lhqdf4fdsgw0000gn/T/Rtmpk7SA4m/runjagsfiles2d96c20378e",
recover.chains=TRUE)
See the help file for that function for possible options.
To remove failed simulation folders use cleanup.jags() - this will be
run automatically when the runjags package is unloaded
Last edit: M Sethi 2018-04-05
Hi,
I ran a model using run.jags() for 1 week and got the same error as you when 2 chains had finished. Could you tell me how to solve this problem? Help!
Hi, this just happened again, with exactly the same error messages, and I could really use support figuring it out because this is a huge model and it's insane that it gets all the way to the end and then crashes somehow. Running the following just returns the same messages again. Help!
results.jags("/private/var/folders/hn/fgc198pn2bg54lhqdf4fdsgw0000gn/T/Rtmpk7SA4m/runjagsfiles2d96c20378e", recover.chains=TRUE)
Sorry for the delay in replying - I have been away from work for a couple of weeks.
I am not entirely sure what is going on as I cannot reproduce your results, but my best guess is that the error happened while reading the CODA files (possibly because you are trying to read a lot of parameters for 200,000 iterations?). Could you give me some more information on e.g. the model and monitored variables requested etc?
In the meantime some observations that might help:
1) If your simulation takes 2 weeks to run then I would avoid using the autorun.jags function - this is really intended for use with multiple (relatively fast) simulations rather than for a single long run. You will have finer control over run length (and therefore probably save computational time) if you use run.jags (and extend.jags manually as needed) instead.
2) I would advise using the bgparallel method rather than the parallel method - this means you are not wasting resources tying up an R session just monitoring the JAGS simulation for 2 weeks. This also means you can safely quit R during the simulation, so it is easier to recover the results if R crashes for some reason.
3) If you need to monitor lots of parameters and have autocorrelation issues then I would advise using the thin option to avoid having to import as many iterations (i.e. use thin=20 and sample=10000 rather than thin=1 and sample=200000).
4) If you still have problems importing the results, look at the read.monitor option to results.jags, which lets you import only a subset of the monitored variables.
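For anyone landing here later, points 1) to 4) can be sketched roughly as follows. This is only an illustration under assumptions, not the original poster's setup: the toy model, the data, the folder name "long_run" and the helper names launch_background_run and import_selected are all made up for the example. The per-chain .RNG.name values are there to avoid warning 4 in the log above.

```r
# Toy model standing in for the real (much larger) one; purely hypothetical.
model_string <- "
model {
  for (i in 1:N) {
    y[i] ~ dnorm(mu, tau)
  }
  mu ~ dnorm(0, 0.001)
  tau ~ dgamma(0.01, 0.01)
}"

toy_data <- list(y = c(4.1, 5.3, 4.8, 5.9, 5.0), N = 5)

# Give each chain its own PRNG so runjags does not have to add them itself.
rng_names <- c("base::Wichmann-Hill", "base::Marsaglia-Multicarry", "base::Super-Duper")
chain_inits <- lapply(seq_along(rng_names), function(i) {
  list(mu = 0, tau = 1, .RNG.name = rng_names[i], .RNG.seed = i)
})

# thin = 20 with sample = 10000 still runs 200,000 iterations per chain,
# but only 10,000 values per monitored variable have to be read back into R.
thin_by <- 20
keep_samples <- 10000
total_iterations <- thin_by * keep_samples

# Start the chains in the background with run.jags (not autorun.jags).
# The call returns straight away, so the R session is not tied up.
launch_background_run <- function(folder) {
  runjags::run.jags(
    model = model_string,
    monitor = c("mu", "tau"),
    data = toy_data,
    n.chains = length(chain_inits),
    inits = chain_inits,
    adapt = 1000,
    burnin = 20000,
    sample = keep_samples,
    thin = thin_by,
    method = "bgparallel",
    keep.jags.files = folder  # named folder, so it can be found again later
  )
}

# Later (even from a fresh R session), import only the variables you need.
import_selected <- function(folder, variables) {
  runjags::results.jags(folder, read.monitor = variables)
}

# Usage (requires JAGS and runjags to be installed):
# bg  <- launch_background_run("long_run")
# fit <- import_selected("long_run", "mu")  # once JAGS has finished
```

If the chains need to run for longer after the import, extend.jags can be called on the returned object rather than starting again from scratch.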
In any case you are correct that it seems the simulations themselves completed OK - or at least the first one did based on the output you are seeing. The simchainsinfo.Rsave file is usually written after the chains are read, but if this fails then the file currently isn't written - I will fix this for the next version of runjags.
Hope that helps.
Matt
I just wanted to pop in and say I have the same issue (Mar 5 2023). It was only a run of 3K iterations (burnin 100, adapt 2K, 3 chains) and the output shouldn't have been enormous as far as monitored variables. A bit of a bummer to wait 5 days and have it produce no output.