Updating JAGS models with chains run in parallel

Help
Edwin
2014-02-07
2014-03-30
  • Edwin
    2014-02-07

    Hi,
    I hope this is the right forum for my question (apologies if not).
    I want to update models run using the jags.parallel function from the R package R2jags.
    I used a supercomputer to run models with 4 chains. I ran them for 180,000 iterations without reaching convergence according to the Gelman & Rubin statistic. The walltime of the supercomputer (48 h) does not allow me to run the models any longer. Thus I want to update the models, but the available functions (I am not a programmer), i.e. update.rjags, do not update in parallel. I would appreciate it if someone could guide me on how to overcome this limitation. Alternatively: is it a valid approach to just re-run the models, using the posteriors of the previous runs to initialize the update?
    Many thanks in advance.

     
  • Matt Denwood
    2014-02-07

    Hi Edwin

    I don't use the R2Jags package, but if you can capture a full representation of the parameter states of the model (including the RNG states) when it exits, then this can be used to start a new simulation effectively (almost) as if you hadn't stopped it. Using JAGS at the command line, this involves a 'parameters to' statement and then a 'parameters in' statement to re-initialise the model after restarting JAGS. Using the rjags interface, you call state(…, internal=TRUE) on an existing JAGS model and then pass the result back as the inits argument to a new jags.model() call. Alternatively, the runjags package allows you to extend models more directly via the extend.jags function, and can handle chains over parallel cores using method='parallel' or method='rjparallel', or method='snow' for a distributed computing cluster (you can also switch a model run on a single core to multiple cores just by changing the method argument to extend.jags).
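    For what it's worth, a minimal sketch of the two R-level routes described above (untested here; "model.txt" and the data list dat are hypothetical stand-ins for your own model file and data):

    library(rjags)
    ## Route 1: rjags -- save the full chain states (including the RNG
    ## states) and pass them back as inits to a fresh model, e.g. in a
    ## new cluster job.
    m <- jags.model("model.txt", data = dat, n.chains = 4)
    update(m, n.iter = 10000)
    saved.state <- m$state(internal = TRUE)  # one list per chain
    ## ...later, possibly in a new R session:
    m2 <- jags.model("model.txt", data = dat, inits = saved.state,
                     n.chains = 4)

    ## Route 2: runjags -- run and then extend directly, with the
    ## chains handled over parallel cores.
    library(runjags)
    r  <- run.jags("model.txt", monitor = "B", data = dat,
                   n.chains = 4, method = "parallel")
    r2 <- extend.jags(r, sample = 10000, method = "parallel")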

    Hope that helps,

    Matt

     
    Last edit: Matt Denwood 2014-02-07
  • Martyn Plummer
    2014-02-07

    First of all, make sure you upgrade to the latest version of rjags (version 3-12). There is a bug in older versions that may be slowing down your runs considerably.

    However, if that doesn't work, then it looks like you have a model that is too big to tackle with Gibbs sampling. A possible alternative is Stan, which is designed to scale up better. Otherwise you need to go back and think again about how to analyze your model.

    Edit: What I mean to say is that if you're not getting results after 48 hours then you need more than a technical fix.

     
    Last edit: Martyn Plummer 2014-02-07
  • Edwin
    2014-02-07

    Dear Matt and Martyn,
    Indeed, the rjags version I have been using is 3-11; I hope some speed is gained by updating it. About Stan: when I started this project I remember reading that Stan was not much better than JAGS at handling binary/logistic models. Maybe that has changed by now, but (very unfortunately) I am short on time to learn to use it.
    Matt's R package seems to be a promising alternative. I will try it and post back for others' reference.
    Many thanks for your very useful replies.
    Edwin

     
    • Martyn Plummer
      2014-02-07

      If you have a binary logistic model then I hope you are using the glm module.
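      (A sketch of loading and checking it in rjags:)

      library(rjags)
      load.module("glm")  # adds the glm-module samplers
      list.modules()      # "glm" should now appear among the loaded modules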

       
  • Matt Denwood
    2014-02-07

    Hi Edwin

    This should facilitate extending your model but, as Martyn suggested, it might not ultimately help with your problem - if you have not got anywhere near convergence after 180,000 iterations, you might well be in the same position after 360,000 iterations. Before throwing another 48 hours of computing time at it, I would definitely explore some re-parameterisations of your model to improve identifiability and/or reduce autocorrelation (apologies for stating the obvious if you have already done this).

    Good luck!

    Matt

     
  • Edwin
    2014-02-07

    Hi,
    I hardly believe in obvious things, so I appreciate all the details of your replies. In fact, I should have posted my question long before now. For instance, I am afraid I missed a basic step in optimizing my model fit, i.e. the JAGS glm module. Thanks for pointing it out, Martyn.
    In this respect, my problems with model convergence started when I added an increasing number of correlated predictors (standardized to z-scores, though) to the second level of my hierarchical model. In general, I would expect the outcome of models with well-mixed chains not to depend on the sampler used, but if I am comparing model fits (DIC), should I re-run everything using the glm module?
    Best,
    Edwin

     
  • Edwin
    2014-03-27

    As promised, an update on the model convergence problem, and some further questions.
    First, on the glm module: it seems R2jags loads it by default, so this was not part of the problem. After updating rjags, my first approach was simply to run the model for longer but, as anticipated by Martyn and Matt, this did not help. I then decided to go back and simplify my model.

    Originally, I had modeled the variance-covariance matrix of the first level coefficients (Betas) of my hierarchical model using a scaled inverse Wishart distribution as suggested by Gelman and Hill (2007):

    model<-function (){
        for (i in 1:n){
            y[i] ~ dbern (p.bound[i])
            p.bound[i] <- max(0, min(1, p[i]))
            logit(p[i]) <- inprod(B[SPECIES[i],],X[i,]) 
        }
        for (j in 1:J){
            for (k in 1:K){
                B[j,k]<-xi[k]*B.raw[j,k]
            }
            B.raw[j,1:K] ~ dmnorm (B.raw.hat[j,], Tau.B.raw[,])
        }
        for (j in 1:J){
            for (k in 1:K){
                B.raw.hat[j,k] <- inprod(G.raw[k,], U[j,])
            }
        }
        for (k in 1:K){
            for (l in 1:L){
                G[k,l]<-xi[k]*G.raw[k,l]
                G.raw[k,l] ~ dnorm (0, .0001)
            }           
            xi[k]~ dunif (0, 100)
        }
        Tau.B.raw[1:K,1:K] ~ dwish(W[,], df)
        df <- K+1
        Sigma.B.raw[1:K,1:K] <- inverse(Tau.B.raw[,])
        for (k in 1:K){
            for (k.prime in 1:K){
                rho.B[k,k.prime] <- Sigma.B.raw[k,k.prime]/
                    sqrt(Sigma.B.raw[k,k]*Sigma.B.raw[k.prime,k.prime])
            }
            sigma.B[k] <- abs(xi[k])*sqrt(Sigma.B.raw[k,k])
        }
    }
    

    This model converges when the group-level predictor matrix (i.e. G.raw[k,l]) contains only an intercept and one predictor, but does not converge for cases with an intercept and two or more predictors. After discussing with some colleagues, I decided to simplify the model by modeling the Betas independently (even though some of the Betas are correlated):

    model.nocor<-function (){
        for (i in 1:n){
            y[i] ~ dbern (p.bound[i])
            p.bound[i] <- max(0, min(1, p[i]))
            logit(p[i]) <- inprod(B[GROUP[i],],X[i,]) 
        }
        for (j in 1:J){
            for (k in 1:K){  # equivalent to the K = 5 explicit B[j,1..5] lines
                B[j,k] ~ dnorm (B.hat[j,k], Tau.B[k])
            }
        }
        for (j in 1:J){
            for (k in 1:K){
                B.hat[j,k] <- inprod(G.hat[k,], U[j,])
            }
        }
        for (k in 1:K){
            for (l in 1:L){
                G[k,l]<- G.hat[k,l]
                G.hat[k,l] ~ dnorm (0, .0001)
            }           
        }
        for (k in 1:K){
            Tau.B[k] <-pow(sigma.B[k], -2)
            sigma.B[k] ~ dunif(0, 100)
        }
    }
    

    This model ran faster and did converge for models with an intercept and more than one group-level predictor. YET: the coefficients of group-level predictors (G's) whose credible intervals did not overlap zero in the original model (for the convergent cases, i.e. those with an intercept and only one group-level predictor) now overlapped zero. This suggests to me that modeling the correlations among the Betas is important, but that by doing so as above I trade it off against model convergence.

    Thus my questions now are:
    Is there an alternative way to model the variance-covariance of the Betas without trading off convergence?
    Can anyone suggest other modifications to the model that may improve convergence?

    Thanks again in advance for helping.
    P.S. Martyn, thanks for the re-formatting.

     
    Last edit: Edwin 2014-03-27
  • Martyn Plummer
    2014-03-27

    Try replacing this:

     y[i] ~ dbern (p.bound[i])
     p.bound[i] <- max(0, min(1, p[i]))
    

    with this:

     y[i] ~ dbern (p[i])
    

    Clamping the value of p is not necessary because it should always be between 0 and 1. What you are doing is counter-productive as it prevents the "glm" module from recognizing this as a logistic regression model. So although the 'glm' module is loaded, it is not being used.
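    A quick way to check that the change takes effect (a sketch; "model.txt" and the data list dat stand in for your actual model file and data): compile the model and inspect the sampler assignments.

    library(rjags)
    load.module("glm")
    m <- jags.model("model.txt", data = dat, n.chains = 4)
    list.samplers(m)  # lists the sampler assigned to each node; with the
                      # clamp removed, glm-module samplers should be used
                      # for the logistic regression part of the model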

     
    • Edwin
      2014-03-30

      Thanks again!
      Bounding p was indeed preventing the use of the "glm" module. Unbounding p sped up the model run; yet, for the same number of iterations (200,000), convergence was actually worse for the unbounded-p model than for the model bounding p between 0 and 1. In the unbounded model, the number of parameters with Gelman and Rubin's Rhat values greater than 1.1 increased.
      So I am still wondering... Is it possible to improve the modelling of the variance-covariance matrix of the Betas, or is this the best a Gibbs sampler can do?
      Edwin