
Timeout waiting for coda files to be completed (v1.6)

runjags
Stephen C
2015-04-17
2015-09-16
  • Stephen C

    Stephen C - 2015-04-17

    Hi,
    I have a rather large MCMC run that I am trying to handle with runjags. I am running 3 chains with a little over 100k variables on a high-memory cluster node (with Rscript, so batch mode) using the 'parallel' method and JAGS 3.4.0. I know autorun.jags is recommended for large runs of only simulated data, but it really fits the bill for running in batch mode this way. In the past, with default 10k burn-in steps and 10k monitoring, the Gelman-Rubin calculation is prohibitively long. I tried to increase the number of monitoring steps to reduce the duty cycle of run/check, but ran into this:

    All chains have finished
    Waiting for the CODA files to be completed...
    Error in runjags.readin(directory = startinfo$directory, copy = (keep.jags.files & :
    Timed out waiting for the CODA files to be completed - the file size and modification times at 2015-04-16 15:09:54 were: sim.1/CODAchain1.txt : 195555328 : 2015-04-16 15:02:25, sim.2/CODAchain1.txt : 0 : 2015-04-16 15:03:26, sim.3/CODAchain1.txt : 0 : 2015-04-16 15:04:54. Please file a bug report (including this message) to the runjags package author.

    Also, the memory footprint is massive. I've run chains in separate instances and found single chains to max out at 6-8GB, but on my batch system this is maxing out at 80GB.

    parsamples <- autorun.jags(dpeaqms.model,
                               data=modeldata,
                               n.chains=3,
                               inits=initlist,
                               jags="/opt/jags/3.4.0/bin/jags",
                               monitor=c("Beta","Gamma","Sigma","Alpha","kappa"),
                               method='parallel',
                               plots=FALSE,
                               startburnin=10000,
                               startsample=30000,
                               thin.sample=100,
                               summarise=FALSE)

    So apart from asking for some general guidance, do thin.sample=100 and startsample=30000 mean that I am actually storing a trace of length 300 for each variable, or of length 30k? This could be the source of the memory inflation.

    Thanks!

     
  • Matt Denwood

    Matt Denwood - 2015-04-18

    Hi Stephen

    With these settings, each CODA file is (or should be) storing 30,000 values of 100,000 variables which must then be read into R - this is causing problems with memory availability. The chains would later be thinned according to the thin.sample argument, but your simulation is not getting past the import stage. You can use the thin argument to control thinning in JAGS directly, but you will likely still have problems with 100,000 variables. Is it possible to monitor fewer variables?
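
    As a rough sanity check on the numbers above, the arithmetic can be sketched in base R. The variable names are illustrative, the variable count is rounded to 100,000 as an assumption, and the estimate counts only 8 bytes per stored double, ignoring any overhead from parsing the CODA text files or from copies made inside R.

    ```r
    n_chains  <- 3
    n_vars    <- 100000   # assumption: "a little over 100k" rounded down
    n_samples <- 30000    # startsample in the call above
    bytes_per_double <- 8 # size of one R numeric value

    # Without thinning in JAGS, every sample of every variable is written and read
    values_unthinned <- n_chains * n_vars * n_samples
    gb_unthinned <- values_unthinned * bytes_per_double / 1e9

    # With thinning by 100 inside JAGS, only every 100th sample is stored
    thin <- 100
    values_thinned <- n_chains * n_vars * (n_samples / thin)
    gb_thinned <- values_thinned * bytes_per_double / 1e9

    cat("Unthinned:", gb_unthinned, "GB; thinned in JAGS:", gb_thinned, "GB\n")
    ```

    Under these assumptions the unthinned samples alone come to about 72 GB across the three chains, which is the same order as the 80GB reported, whereas thinning by 100 inside JAGS would leave about 0.72 GB.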

    Alternatively, if you update runjags to version 2.0.1 (which is now available on CRAN), you will be able to use the read.monitor argument with the results.jags function to read one (or a few) of the monitored variables at a time from the same saved simulation folder, as long as you also use keep.jags.files=TRUE. This means you only have to run the simulation once but can still import and summarise the variables of interest in a few different batches. The new version of runjags is much better at handling imports of large CODA files such as these, so the error message you are seeing should be resolved.

    Hope that helps,

    Matt

     
  • Stephen C

    Stephen C - 2015-04-20

    Great, this clears up my misunderstanding with the thinning options. I'll try this out first with the older version of runjags before migrating to 2.0.

     
  • Stephen C

    Stephen C - 2015-04-29

    I'm running into a situation where, with run.jags, I am using the noread.monitor argument with the entire list of my monitor variables, and yet I am waiting a long time on "Reading coda files..." Is there anything else I can do to avoid this wait?

     
  • Matt Denwood

    Matt Denwood - 2015-04-30

    Long read times are due to lots of monitored variables (not including any specified only in noread.monitor), a large number of samples (after thinning in JAGS), or lots of chains. Only reducing one or more of these can reduce the read times, I'm afraid.

    The only thing I can think of is that you might be giving some of the same variables to monitor and noread.monitor. If so, then monitor will take precedence and they WILL be read - there should be a warning about this but obviously the function has to be completed (or interrupted) first.
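
    A quick way to check for this overlap before launching a long run is to compare the two character vectors in base R. This is only a sketch: the vectors below are hypothetical stand-ins for whatever is passed as monitor and noread.monitor, and the comparison is by exact name, so it would not catch an indexed entry such as "Alpha[1:3]" overlapping with "Alpha".

    ```r
    # Hypothetical argument values, for illustration only
    monitor        <- c("Beta", "Gamma", "Sigma", "Alpha", "kappa")
    noread.monitor <- c("Gamma", "Sigma", "Alpha")

    # Names given to both arguments: monitor takes precedence, so these are read
    overlap <- intersect(monitor, noread.monitor)
    if (length(overlap) > 0) {
      message("Listed in both (these WILL be read): ",
              paste(overlap, collapse = ", "))
    }

    # Keep in monitor only the variables that should be read back into R
    monitor_read <- setdiff(monitor, noread.monitor)
    print(monitor_read)
    ```

    Passing monitor_read as monitor and leaving noread.monitor unchanged would then avoid the duplicated names.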

     
  • Stephen C

    Stephen C - 2015-04-30

    Ah, I see. I wanted some variables to be written to file (so I put them in the monitor list) but not read back into R (so I also put them in the noread.monitor list). I see now that it was my own fault for misreading the documentation - it's clearly spelled out in the noread.monitor description. I also discovered that it is not possible to list every variable in noread.monitor: at least one must be in the regular monitor list to be loaded back into R, or else an error is thrown.

    With results.jags and the read.monitor option, is it possible to read a subset of one variable? For instance,

    results <- results.jags('/runjagsfiles', read.monitor='Alpha[4:6]', recover.chains=TRUE)

    does not seem to work (the plotted results don't change if I change the index, or even the monitored variable).

     
  • Matt Denwood

    Matt Denwood - 2015-05-01

    It certainly should work, e.g.:

    N <- 100
    X <- 1:N
    Y <- rnorm(N, 2*X + 10, 1)
    
    model <- "model {
    for(i in 1 : N){
    Y[i] ~ dnorm(true.y[i], precision)
    true.y[i] <- (m * X[i]) + c
    }
    m ~ dunif(-1000,1000)
    c ~ dunif(-1000,1000)
    precision ~ dexp(1)
    #data# X, Y, N
    #monitor# m
    }"
    
    path <- run.jags(model=model, noread.monitor=c("true.y"), keep=TRUE, method='bg')
    results.jags(path, read.monitor='true.y[10:11]')
    results.jags(path, read.monitor='true.y[12:13]')
    

    Can you give me any more details of the code you are using that doesn't work as expected?

     
  • Stephen C

    Stephen C - 2015-05-01

    Your example script shows the same problem I have been seeing. Look at the values in the true.y[10] and true.y[12] lines:

    >results.jags(path,read.monitor='true.y[10:11]')
    Simulation complete.  Reading coda files...
    Coda files loaded successfully
    Calculating summary statistics...
    Calculating the Gelman-Rubin statistic for 2 variables....
    Finished running the simulation
    
    JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
    
               Lower95 Median Upper95   Mean        SD Mode       MCerr MC%ofSD SSeff   AC.10
    true.y[10]  1.9957 2.0021  2.0085  2.002 0.0032881   -- 0.000078991     2.4  1733 0.19175
    true.y[11]  11.606 11.972  12.344 11.974   0.18824   --   0.0045023     2.4  1748 0.18826

                  psrf
    true.y[10] 0.99998
    true.y[11] 0.99999
    
    Total time taken: 3.2 seconds
    
    > results.jags(path,read.monitor='true.y[12:13]')
    Simulation complete.  Reading coda files...
    Coda files loaded successfully
    Calculating summary statistics...
    Calculating the Gelman-Rubin statistic for 2 variables....
    Finished running the simulation
    
    JAGS model summary statistics from 20000 samples (chains = 2; adapt+burnin = 5000):
    
               Lower95 Median Upper95   Mean        SD Mode       MCerr MC%ofSD SSeff   AC.10
    true.y[12]  1.9957 2.0021  2.0085  2.002 0.0032881   -- 0.000078991     2.4  1733 0.19175
    true.y[13]  11.606 11.972  12.344 11.974   0.18824   --   0.0045023     2.4  1748 0.18826

                  psrf
    true.y[12] 0.99998
    true.y[13] 0.99999
    
    Total time taken: 3 seconds
    
     
  • Matt Denwood

    Matt Denwood - 2015-05-01

    Yes, I see what you mean now - apologies. That is definitely a bug - I will investigate a fix and upload to sourceforge ASAP.

    In the meantime, it appears that all specified indices must start with 1; otherwise they will be matched incorrectly. Sorry.
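
    One possible interim workaround, sketched here under assumptions, is to read a range that starts at index 1 and then subset by column name in R. The matrix below is a made-up stand-in for samples of true.y[1:13] (it is not runjags output), and select_indices is a hypothetical helper, not part of runjags.

    ```r
    # Stand-in for 5 samples of true.y[1] ... true.y[13]; illustrative data only
    set.seed(1)
    samples <- matrix(rnorm(5 * 13), nrow = 5,
                      dimnames = list(NULL, paste0("true.y[", 1:13, "]")))

    # Hypothetical helper: pick columns by full name rather than by position
    select_indices <- function(x, name, idx) {
      wanted <- paste0(name, "[", idx, "]")
      absent <- setdiff(wanted, colnames(x))
      if (length(absent) > 0) {
        stop("Not found: ", paste(absent, collapse = ", "))
      }
      x[, wanted, drop = FALSE]
    }

    subset_12_13 <- select_indices(samples, "true.y", 12:13)
    print(colnames(subset_12_13))
    ```

    Matching on the full column name means the result does not depend on the order in which the variables were read.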

     
  • Stephen C

    Stephen C - 2015-05-01

    No problem, I'm happy this exchange has been helpful.

     
  • Matt Denwood

    Matt Denwood - 2015-05-04

    Absolutely - thanks for the report! I will post a reply here when a fixed version is uploaded. In the meantime it is worth noting that the bug only exists for the read.monitor argument of results.jags, i.e. the following does work as expected:

    run.jags(model=model, monitor=c("true.y[1:2]"))
    run.jags(model=model, monitor=c("true.y[3:4]"))
    

    Thanks and apologies again :)

     
  • Matt Denwood

    Matt Denwood - 2015-09-16

    Just to confirm that the read.monitor bug is fixed in version 2.0.2 which is now on CRAN.

    Thanks again for the bug report.

    Matt

     
