Since there is always some tradeoff between running time and the quality of MCMC output, it seems very useful, when running large analyses, to understand the relationship between the number of samples and running time, so that one can predict and choose the right tradeoff ahead of time. I had expected running time to increase linearly with the number of samples, but I seem to be seeing something different.
For example, with one model/dataset (graph size of 683k), JAGS successfully completed 2 chains x 200 samples in under 2 hours. But when I tried the "full run" of 2 chains x 1800 samples, I finally had to stop the program after more than 30 hours: I had no idea when it would ever complete, and it was already well past the 2 hr x 9 = 18 hours I thought it should take. CPU and memory usage appear the same between the two runs, and well within my system's capacity.
Am I misunderstanding the way the sampling is supposed to work, or is something unexpected possibly happening?
I'm continuing to try runs of different sizes, and in the process I'm getting stronger confirmation of this phenomenon. A 2x600 sample run also completed in under 2 hours, but a 2x1200 run did not complete within 22 hours. I am now trying a 2x900 run to see whether it finishes anytime soon...
There is something wrong here. The run time should be proportional to the run length. It looks like JAGS is getting stuck in an infinite loop at some point.
I do try to avoid this possibility, as it can be difficult to tell if a long MCMC run has stopped completely. But I may have missed something. To find out, we need to run your example through a debugger. It is probably easiest if you send it to me (with complete data, initial values, and script).
While compiling this information, I noticed that the problem is likely due to my use of a parameters-in file rather than the run length. It just so happened that every time I wanted to do the full-length run, I was reading in the parameters from the file to save time. So I am crossing my fingers and hoping the long run will succeed this time around without the parameters file; we'll see...
In any case, no matter the reason, I still imagine you would want to be able to resolve the seemingly infinite loop, so here are the relevant files.
I have also noticed this issue, and was going to post about it.
I have found on a number of different classes of problems that the running time is worse than linear in the number of samples requested.
The problem is clearly visible if you watch the progress bar: the later % increments take significantly longer than the first ones. (This isn't psychological.)
I typically notice this on problems requiring > 100,000 samples.
I am unsure how the code works, but I wondered whether this somehow relates to dynamic memory allocation. When the sampler is called, the number of samples is known, so the storage can be allocated in full at the start. If allocation is instead deferred and the storage is grown incrementally, that could be a cause of the slowdown. Otherwise I have no idea.
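To illustrate the allocation point above, here is a toy Python sketch (not JAGS's actual C++ internals, just a counting model): if the buffer is reallocated and fully copied every time it grows by a chunk, the total copying is an arithmetic series, i.e. quadratic in the number of samples, whereas preallocating writes each sample exactly once.

```python
# Toy illustration (not JAGS internals): cost of growing a sample
# buffer by copying it on every extension vs. preallocating it.

def copies_when_growing(n_samples, chunk=100):
    """Total elements copied if the buffer is reallocated (and fully
    copied) every `chunk` samples -- an arithmetic series, O(n^2)."""
    copied = 0
    for size in range(chunk, n_samples + 1, chunk):
        copied += size  # each reallocation copies the whole buffer so far
    return copied

def copies_when_preallocated(n_samples):
    """With the final size known up front, each sample is written once."""
    return n_samples

n = 10_000
print(copies_when_growing(n))       # 505000 elements copied
print(copies_when_preallocated(n))  # 10000 elements written
```

Note that doubling `n` roughly quadruples the copying cost in the growing case, which matches the worse-than-linear scaling people are reporting.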
The other thing that occurred to me: does JAGS keep a running mean of any of the parameters, or something similar? Recomputing a mean from scratch is O(n), so doing that every iteration would imply O(n^2) total work if the running mean is implemented naively (O(1) updates are available).
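For reference, here is a hypothetical Python sketch of the two implementations I mean (I don't know which, if either, JAGS uses): the naive version re-sums all samples after each new one, while the standard incremental update adjusts the previous mean in constant time.

```python
# Hypothetical sketch: naive vs. O(1) running-mean maintenance.

def naive_running_means(xs):
    """Recompute the mean from scratch after each sample:
    O(n) per update, O(n^2) overall."""
    return [sum(xs[: i + 1]) / (i + 1) for i in range(len(xs))]

def incremental_running_means(xs):
    """Standard O(1) update: mean_n = mean_{n-1} + (x_n - mean_{n-1}) / n."""
    means, mean = [], 0.0
    for n, x in enumerate(xs, start=1):
        mean += (x - mean) / n
        means.append(mean)
    return means

xs = [2.0, 4.0, 6.0, 8.0]
print(incremental_running_means(xs))  # [2.0, 3.0, 4.0, 5.0]
```

Both produce identical results; only the per-iteration cost differs.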
I have just undertaken a 5 million iteration (thinning factor 5) on a relatively simple problem. I have attached a spreadsheet which demonstrates the issue.
In the initial part of the run, the time per iteration is nearly constant as expected (implying that total running time is proportional to number of iterations). However, from around the 20% mark, the time per iteration increases approximately linearly, implying total run time quadratic in the number of iterations.
The degradation is pretty bad: a 50% increase in per-sample time between the 30% mark and the 60% mark of the run.
Actually, if you look closely, even the initial portion degrades linearly with the number of iterations; the slope is just not as steep.
I have attached a test case which demonstrates this.
This situation is not unique to this file - I have seen it in all of the other models I have been working with.
This is obviously a real pain: it implies that you are better off doing many small runs and combining them, but then you have to worry about convergence issues. I guess you could always feed the sampler state at the end of one run in as initial values for the next. Better to avoid the whole thing if possible.
Your assistance is greatly appreciated, Martyn. I am running JAGS 3.4.0 on Windows, under the latest 64-bit version of R.
Note that my script will produce an object about 500MB in size.
OK. I have it. I can confirm the quadratic behaviour of the example posted by VelocideX.
The bug only affects the rjags interface and not the command line interface, which I generally use for debugging. This explains why I couldn't reproduce the problem.
Under rjags, the progress bar refreshes every 100 iterations, so in a really long run of 5 million iterations, it will be refreshed 50000 times. The idea was to keep the update sensitive to a user interrupt, even on very long runs.
Due to a stupid bug in the library, each refresh triggers a reallocation of the memory holding the monitored values. Since the amount of stored data is O(N), this leads to O(N^2) behaviour of the run time, with the reallocation dominating the actual MCMC sampling.
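As a rough back-of-envelope for this diagnosis (assuming each refresh copies all monitored values stored so far, with one monitored value per iteration), the 5-million-iteration run refreshes 50,000 times and ends up copying on the order of 10^11 values, against only 5 million actually sampled:

```python
# Back-of-envelope (assumption: refresh k copies the ~100*k monitored
# values stored so far): 5M iterations, refresh every 100 iterations.
n_iter, refresh = 5_000_000, 100
n_refreshes = n_iter // refresh                                   # 50,000
values_copied = sum(k * refresh for k in range(1, n_refreshes + 1))
print(n_refreshes, values_copied)  # 50000 125002500000
```

With numbers like these it is easy to see the copying, not the sampling, dominating the run time.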
Version 3-12 of the rjags package - which should be available on CRAN soon - works around the reallocation bug by dropping the refresh frequency of the progress bar. As you can see from the attachment this makes the run time linear in the number of iterations. The next release of the JAGS library will eliminate the bug entirely.
Thanks Martyn. This will make a huge difference to my analyses (and those of everyone else!). Very grateful.