Differences in estimated coefficients using JAGS, WinBUGS and the glm function

2014-04-17 – 2014-05-05
  • Jérôme Guélat

    Hi,

    I was trying to put together a really basic example for a lecture, but I found some surprising differences between the estimates provided by JAGS, WinBUGS and the glm function in R. WinBUGS and the glm function give the same results, but JAGS gives different estimates.

    Here's the example with a simple Poisson regression:

    set.seed(24)
    library(rjags)
    load.module("glm")
    
    # Generate count data using 2 covariates
    nsites <- 200
    alpha0 <- 1
    alpha1 <- 2
    alpha2 <- 3
    
    A <- runif(n = nsites, -1, 1)   # Site covariate 1
    B <- runif(n = nsites, -1, 1)   # Site covariate 2
    
    lam <- exp(alpha0 + alpha1*A + alpha2*B)
    y <- rpois(n = nsites, lambda = lam)
    
    # Analyze with the glm function (without covariate B)
    m <- glm(y ~ A, family = poisson)
    
    # Define the same model in the BUGS language
    cat(file = "Poisson_GLM.txt","
    model {
        # Priors
        alpha0 ~ dnorm(0, 0.0001)
        alpha1 ~ dnorm(0, 0.0001)
        # Likelihood
        for (i in 1:R){
            y[i] ~ dpois(lambda[i])
            log(lambda[i]) <- alpha0 + alpha1 * A[i]
        }
    }
    ")
    
    # Bundle data
    jags.data <- list(y = y, R = nsites, A = A)
    # Initial values
    inits <- function() list(alpha0 = rnorm(1, 0, 10), alpha1 = rnorm(1, 0, 10))
    # Parameters monitored
    params <- c("alpha0", "alpha1")
    # MCMC settings
    ni <- 10000   ;   nt <- 1   ;   nb <- 2000   ;  nc <- 3
    
    # Create a JAGS model object
    fit <- jags.model(file = "Poisson_GLM.txt", data = jags.data, inits = inits, n.chains = nc)
    # Burnin
    update(fit, nb)
    # Generate posterior samples
    samples <- coda.samples(model = fit, variable.names = params, n.iter = ni - nb, thin = nt)
    # Convergence looks good
    plot(samples)
    
    # Compute posterior summaries
    summfit <- summary(samples)
    
    # Compare estimates of the glm function with estimates of JAGS (not similar)
    m$coef
    summfit$statistics[,"Mean"]
    
    # Fit the model in WinBUGS
    library(R2WinBUGS)
    out <- bugs(data = jags.data, parameters.to.save = params, model.file = "Poisson_GLM.txt", inits = inits, n.iter = ni, n.chains = nc, n.burnin = nb, n.thin = nt, bugs.directory = "C:/Program Files (x86)/WinBUGS14")
    # Check the estimates (similar to the glm function)
    out$mean[1:2]
    

    As you can see, the JAGS estimates seem to be wrong. If I use uniform priors for alpha0 and alpha1 (e.g. dunif(-1000, 1000)), I get the correct estimates. It looks like the normal priors are influencing the results, but I can't believe that priors with such low precision provide that much information. I also tried different initial values, but that didn't help.
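
    For reference, here is the variant with flat uniform priors that I mention above; only the two prior lines change, everything else stays the same (the file name is arbitrary):

    # Same model, but with uniform priors instead of vague normal priors
    cat(file = "Poisson_GLM_unif.txt","
    model {
        # Priors
        alpha0 ~ dunif(-1000, 1000)
        alpha1 ~ dunif(-1000, 1000)
        # Likelihood
        for (i in 1:R){
            y[i] ~ dpois(lambda[i])
            log(lambda[i]) <- alpha0 + alpha1 * A[i]
        }
    }
    ")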

    I'm now using R 3.0.3 with JAGS 3.4.0 on a Windows computer (but I also tried with JAGS 3.3.0 and 3.4.0 on a Mac).

    Can someone explain what's happening here? Thanks in advance!

    Jérôme

     
  • Martyn Plummer

    Martyn Plummer - 2014-04-17

    Oh dear. You have found a bug in the auxiliary mixture sampler from the glm module (it is not by chance that the glm module is not loaded by default). I'm surprised that it fails on such a simple example. I'll look into it and let you know.

     
    • Daniel Hocking

      Daniel Hocking - 2014-04-17

      Interesting that it works perfectly when the glm module is not loaded. When is the glm module needed?


      Daniel Hocking
      djhocking@gmail.com

      http://danieljhocking.wordpress.com/


       
  • Jérôme Guélat

    Thanks a lot! I almost always load the glm module, I guess I should be more careful ;)
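
    (As a stopgap, rjags also lets you unload the module again before building a model; a quick sketch, reusing the objects from my first post:)

    unload.module("glm")   # fall back to the default samplers
    fit2 <- jags.model(file = "Poisson_GLM.txt", data = jags.data, inits = inits, n.chains = nc)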

     
  • Martyn Plummer

    Martyn Plummer - 2014-04-25

    I've been scratching my head over this all week trying to find the bug. I believe it isn't a bug as such: the code is an accurate transcription of the algorithm, but there is a problem with the algorithm itself.

    The key to your example is that it is an ill-fitting model. Because the second covariate is missing, the data are highly over-dispersed compared to the Poisson model we are trying to fit. The auxiliary mixture sampling algorithm being used here relies on an implicit "good model" assumption which does not hold in this case. Briefly, the algorithm introduces some continuous latent variables and then approximates them by a finite mixture of normals. In your example, we are way out in the tail of these latent variables where the normal mixture approximation is rather poor.
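
    (A quick way to see this over-dispersion, reusing y, A and the glm() fit m from the original example; just a sketch:)

    # Pearson chi-square divided by the residual degrees of freedom;
    # values far above 1 indicate over-dispersion relative to the Poisson model
    sum(residuals(m, type = "pearson")^2) / m$df.residual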

    I believe that the problem can be overcome by adding a Metropolis-Hastings acceptance step that accounts for the difference between the true distribution and its normal mixture approximation.
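
    (For readers who haven't met this idea before, here is a minimal, self-contained illustration of such an acceptance step: an independence sampler that proposes from a rough normal approximation to a made-up target and lets the M-H step correct for the mismatch. This only sketches the general principle, not the JAGS internals.)

    set.seed(1)
    log_p <- function(x) dgamma(x, shape = 2, rate = 1, log = TRUE)  # "true" target density
    log_q <- function(x) dnorm(x, mean = 2, sd = 1.5, log = TRUE)    # normal approximation used as proposal
    x <- 2
    draws <- numeric(5000)
    for (i in seq_along(draws)) {
        x_prop <- rnorm(1, mean = 2, sd = 1.5)                        # propose from the approximation
        log_a <- (log_p(x_prop) - log_p(x)) - (log_q(x_prop) - log_q(x))
        if (log(runif(1)) < log_a) x <- x_prop                        # accept or reject
        draws[i] <- x
    }
    mean(draws)   # close to the true target mean of 2, despite the crude proposal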

     
    • Daniel Hocking

      Daniel Hocking - 2014-04-28

      So what happens differently when the glm module is not loaded, given that the coefficient estimates in JAGS are correct in that case?

      I believe that the problem can be overcome by adding a Metropolis-Hastings acceptance step that accounts for the difference between the true distribution and its normal mixture approximation.

      Is this something that needs to be added to JAGS underlying code or something that can be adjusted when calling from rjags?

      Thanks,
      Dan


      Daniel Hocking
      djhocking@gmail.com

      http://www.nerunningservices.com
      http://danieljhocking.wordpress.com/


       
      • Martyn Plummer

        Martyn Plummer - 2014-04-28

        Loading the glm module makes sampling methods available that are specific to generalized linear models. They make sampling more efficient by sampling all variables in the linear predictor together in a single block. The sampling algorithm in question was published by Sylvia Frühwirth-Schnatter and colleagues in 2009. Clearly it does not work on Jérôme's model. The M-H step must be done within JAGS. I will let you know how I get on.
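
        (If you want to see this on the example above, rjags can report which samplers it has assigned; a quick sketch, assuming the model object fit from the original post:)

        list.samplers(fit)   # with the glm module loaded, alpha0 and alpha1 are handled by one block sampler
        # Rebuilding the model after unload.module("glm") shows them sampled separately instead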

         
  • Jérôme Guélat

    Thanks a lot for the information and for the time you spent trying to find a bug!

    I'm an ecologist trying to do statistics, and the fact that some MCMC sampling algorithms have an implicit "good model" assumption is new to me. I'm glad to have learned that, and I will be (even) more careful with my models now!

     
  • Martyn Plummer

    Martyn Plummer - 2014-05-05

    To be honest, I didn't realise there was a "good model" assumption either, but your example proves it.

     
