I use JAGS via rjags in R most days; it's one of my few computationally intensive activities. I'm purchasing a new computer for the lab and want to maximize computational speed with JAGS. I generally run rjags in parallel by sending one chain to each of 3-8 cores (depending on the desired number of chains). I know RAM will be very important, but I'm wondering how important processor speed is, and whether a solid-state drive would really make much difference. Also, does anyone know a way to monitor read/write (I/O) operations in JAGS? Is everything stored in RAM, or is some of it written out to the hard drive?
Thanks for any info,
I would prioritize 1) number of cores; 2) fast RAM; 3) processor speed/cache size, though I have not compared the last two independently. If you know roughly how much RAM your models tend to use, there's no sense in getting too much, but we've had the experience of a small change in the model producing a large change (doubling or more) in RAM use, so it can come back to bite you. I hate doing model selection driven by software problems.
I can't give a definitive answer on disk use, but I know it's minimal compared to RAM use. The graph that JAGS repeatedly traverses is held in RAM, and that is also where samples are stored until they are written out to files and/or returned to R. I can't see a solid-state drive doing much for you.
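One way to check this empirically on Linux is to read the per-process I/O counters from /proc while a model is running. A rough sketch in R (the /proc interface is Linux-specific, and for illustration the R session inspects itself; in practice you would substitute the PID of the session running coda.samples()):

```r
# Linux-only sketch: inspect cumulative storage I/O for an R process.
# While sampling runs in another session, replace Sys.getpid() with
# that session's PID (e.g. found via `pgrep -f rjags` or `top`).
pid <- Sys.getpid()
io  <- readLines(sprintf("/proc/%d/io", pid))

# read_bytes / write_bytes count actual traffic to the storage layer
print(grep("bytes", io, value = TRUE))
```

If read_bytes and write_bytes barely move during sampling, the run is RAM-bound and an SSD is unlikely to help.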
I am on the verge of buying a new computer for our lab, mainly for JAGS and other MCMC work.
Until now I have prioritized speed per core over the number of cores and over everything else. For that reason we have been planning to go for the i7-4000 series rather than Xeons or Opterons with a larger number of slower cores.
The rationale has been that so far our main problem with MCMC is slow mixing coupled with a long computing time per iteration. The slow mixing means we need to run very long chains with heavy thinning. My understanding is that a single chain in JAGS uses a single core, so speed per core would be the most crucial spec for getting the job done as fast as possible.
But seeing a recommendation to prioritize the number of cores made me pause and ask: can the latest version of JAGS use more than one core for a single MCMC chain to reduce the time per iteration? If so, we may need to update our plans for the new computer.
Regarding the amount of RAM and the number of iterations: we use coda.samples() in rjags to simulate a batch of iterations, save them to disk, and then call coda.samples() again. That way RAM usage can be controlled, at the cost of some overhead from the disk operations.
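That chunked approach might look like the following sketch, where `model` is an already-initialized jags.model object, and the monitored variable names, chunk count, and thinning are placeholders:

```r
library(rjags)

n.chunks <- 10                    # placeholder: total draws = n.chunks * n.iter
n.iter   <- 10000                 # iterations per chunk
vars     <- c("beta", "sigma")    # hypothetical monitored nodes

for (i in seq_len(n.chunks)) {
  # The model's internal state carries over between calls, so the
  # chunks together form one continuous chain.
  samp <- coda.samples(model, variable.names = vars,
                       n.iter = n.iter, thin = 50)

  # Write this chunk to disk, then drop it so RAM stays bounded.
  saveRDS(samp, file = sprintf("samples_chunk_%02d.rds", i))
  rm(samp); gc()
}
```

The chunks can later be re-read with readRDS() and recombined for diagnostics.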
JAGS still uses one core per chain, but why not start 8 runs in parallel through rjags (or in separate R sessions), let each adapt individually from different starting values, and then collect posterior samples from each core/JAGS session? That's what we do, and that's why I suggested prioritizing the number of cores. I doubt the speed differences among the higher-end processors are ever 8x. You do pay the adaptation cost once per core (which ideally JAGS will let us avoid some day), but as long as that's small compared to the sampling time, you're fine prioritizing cores.
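A minimal sketch of that pattern using the parallel package (mclapply forks, so this is Unix-only; the model file name, data object, monitored variables, and iteration counts are all placeholders):

```r
library(rjags)
library(parallel)

n.chains <- 8   # one single-chain JAGS run per core

run.chain <- function(seed) {
  # Give each forked run its own RNG seed via inits
  inits <- list(.RNG.name = "base::Mersenne-Twister", .RNG.seed = seed)
  m <- jags.model("model.txt", data = jags.data, inits = inits,
                  n.chains = 1, n.adapt = 1000)  # adaptation paid per core
  coda.samples(m, variable.names = c("beta"),
               n.iter = 50000, thin = 50)
}

out <- mclapply(seq_len(n.chains), run.chain, mc.cores = n.chains)

# Each element of `out` is a one-chain mcmc.list; collapse them into a
# single mcmc.list so coda's diagnostics see all chains together.
combined <- do.call(coda::mcmc.list, lapply(out, function(x) x[[1]]))
```

With the chains combined, the usual convergence checks (gelman.diag(), traceplots) apply as if the chains had come from one multi-chain run.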
Thanks, that all makes sense. It's a big help. I've also had the experience of not monitoring certain parameters (just their sum or some other derived value) simply because monitoring them all would eat up my RAM.
The R interface doesn't write anything to disk, with the exception of the model file, which is copied to a temporary file before being read in. This is done for technical reasons that I will probably circumvent in a future release.
In principle, there is no reason why one could not define a Monitor that simply streams the values out to a file instead of holding them in memory. If someone really needed this, it could be arranged.