Hi,
I have a rather large MCMC run that I am trying to handle with runjags. I am running 3 chains with a little over 100k variables on a high-memory cluster node (with Rscript, so batch mode) using the 'parallel' method and JAGS 3.4.0. I know autorun.jags is really only recommended for runs on simulated data, but it fits the bill for running in batch mode this way. In past runs, with the default 10k burn-in steps and 10k monitored samples, the Gelman-Rubin calculation took prohibitively long. I tried to increase the number of monitoring steps to reduce the duty cycle of the run/check loop, but ran into this:
All chains have finished
Waiting for the CODA files to be completed...
Error in runjags.readin(directory = startinfo$directory, copy = (keep.jags.files & :
Timed out waiting for the CODA files to be completed - the file size and modification times at 2015-04-16 15:09:54 were: sim.1/CODAchain1.txt : 195555328 : 2015-04-16 15:02:25, sim.2/CODAchain1.txt : 0 : 2015-04-16 15:03:26, sim.3/CODAchain1.txt : 0 : 2015-04-16 15:04:54. Please file a bug report (including this message) to the runjags package author.
Also, the memory footprint is massive. I've run the chains in separate instances and found a single chain maxes out at 6-8GB, but in my batch system this run is maxing out at 80GB.
So apart from asking for some general guidance: do thin.sample=100 and startsample=30000 mean that I am actually storing a trace 300 long for each variable, or 30k long for each variable? This could be the source of the memory inflation.
Thanks!
Hi Stephen,

With these settings, each CODA file is (or should be) storing 30,000 values of each of 100,000 variables, which must then be read into R - this is what is causing the problems with memory availability. As a rough check: 3 chains x 100,000 variables x 30,000 samples is 9 x 10^9 values, which at 8 bytes per double is around 72GB, consistent with the 80GB you are seeing. The chains would only later be thinned according to the thin.sample argument, so your simulation is not getting past the import stage. You can use the thin argument to control thinning within JAGS directly, but you will likely still have problems with 100,000 variables. Is it possible to monitor fewer variables?
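For example (a sketch only - 'model.txt', mydata and the monitor names below are placeholders for your own run):

library('runjags')
# thin=10 makes JAGS write only every 10th sample to the CODA files,
# so 30,000 stored samples represent 300,000 underlying iterations:
results <- autorun.jags(model='model.txt', data=mydata,
    monitor=c('alpha','beta'), n.chains=3, method='parallel',
    startsample=30000, thin=10)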
Alternatively, if you update runjags to version 2.0.1 (which is now available on CRAN), you will be able to use the read.monitor argument with the results.jags function to read one (or a few) of the monitored variables at a time from the same saved simulation folder, as long as you also use keep.jags.files=TRUE. This means you only have to run the simulation once but can still import and summarise the variables of interest in a few different batches. The new version of runjags is much better at handling imports of large CODA files such as these, so the error message you are seeing should be resolved.
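To illustrate (again just a sketch with placeholder names, assuming version 2.0.1 or later):

library('runjags')
# Run the simulation once, keeping the JAGS files on disk:
results <- run.jags(model='model.txt', data=mydata,
    monitor=c('alpha','beta','theta'), n.chains=3, method='parallel',
    sample=30000, keep.jags.files=TRUE)
# run.jags reports the folder it keeps (typically 'runjagsfiles' in the
# working directory); the variables can then be imported in batches:
alpha <- results.jags('runjagsfiles', read.monitor='alpha')
beta <- results.jags('runjagsfiles', read.monitor='beta')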
Hope that helps,
Matt
Great, this clears up my misunderstanding with the thinning options. I'll try this out first with the older version of runjags before migrating to 2.0.
I'm running into a situation where, with run.jags, I am passing the entire list of my monitored variables to the noread.monitor argument, and yet I am still waiting a long time at "Reading coda files...". Is there anything else I can do to avoid this wait?
Long read times are due to either a large number of monitored variables (not counting any specified only in noread.monitor), a large number of samples (after thinning in JAGS), or a large number of chains. Only reducing one or more of these can reduce the read times, I'm afraid.
The only other thing I can think of is that you might be giving some of the same variables to both monitor and noread.monitor. If so, then monitor takes precedence and those variables WILL be read - there should be a warning about this, but it will only appear once the function has completed (or been interrupted).
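In other words, each variable should appear in one list or the other, not both - something like this (placeholder names again):

# alpha and beta are read back into R; latent is monitored in JAGS and
# written to the CODA files, but never imported:
results <- run.jags(model='model.txt', data=mydata, n.chains=3,
    monitor=c('alpha','beta'), noread.monitor='latent',
    keep.jags.files=TRUE)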
Ah, I see - I wanted some variables to be written to file (so I put them in the monitor list), but not read back into R (so I also put them in the noread.monitor list). I see now that it was my own fault in misreading the documentation - it's clearly spelled out in the noread.monitor description. I also discovered that it is not possible to list every variable in noread.monitor: at least one variable must be in the regular monitor list to be loaded back into R, or else an error is thrown.
With results.jags and the read.monitor option, is it possible to read a subset of one variable? For instance,
results <- results.jags('/runjagsfiles', read.monitor='Alpha[4:6]', recover.chains=TRUE)
does not seem to work (the plotted results don't change if I change the index, or even the monitored variable).
It certainly should work, e.g.:
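For example, something self-contained along these lines (a minimal sketch using a latent vector true.y - the exact model is unimportant):

library('runjags')
# Simulate data and fit a toy model with a latent vector true.y:
model <- "model{
    for(i in 1:N){
        true.y[i] ~ dnorm(mu, 1)
        y[i] ~ dnorm(true.y[i], tau)
    }
    mu ~ dnorm(0, 0.01)
    tau ~ dgamma(0.01, 0.01)
}"
N <- 20
y <- rnorm(N)
results <- run.jags(model, monitor=c('mu','tau','true.y'),
    data=list(y=y, N=N), n.chains=2, keep.jags.files=TRUE)
# Re-import just part of the true.y vector from the saved folder (the
# folder name is printed by run.jags; 'runjagsfiles' is the default):
part <- results.jags('runjagsfiles', read.monitor='true.y[10:12]')
summary(part)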
Can you give me any more details of the code you are using that doesn't work as expected?
Your example script shows the same problem I have been seeing - look at the values on the true.y[10] and true.y[12] lines of the summary output.
Yes, I see what you mean now - apologies. That is definitely a bug; I will investigate a fix and upload it to SourceForge ASAP.
In the meantime it appears that all specified indices must start at 1, otherwise they will be matched incorrectly, sorry.
No problem, I'm happy this exchange has been helpful.
Absolutely - thanks for the report! I will post a reply here when a fixed version is uploaded. In the meantime it is worth noting that the bug only exists for the read.monitor argument of results.jags, i.e. the following does work as expected:
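Presumably along these lines, using the same index syntax in the monitor argument itself, which is unaffected by the bug - continuing the sketch above:

# Monitoring a subset of indices directly in JAGS works as expected:
results <- run.jags(model, monitor='true.y[10:12]',
    data=list(y=y, N=N), n.chains=2)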
Thanks and apologies again :)
Just to confirm that the read.monitor bug is fixed in version 2.0.2, which is now on CRAN.
Thanks again for the bug report.
Matt