## 'Matt trick' in JAGS

2013-06-09
2013-06-11

Hi All,

Over on the Stan list there has been some discussion of speeding up models through the use of what they refer to as the 'Matt trick.' I have been trying to implement this in JAGS, but I keep getting strange results. Just to be clear up front, I'm working with a logistic-binomial model, which is why the priors look a little different from what you'd expect for a model with a continuous outcome.

Let's start with a simple random intercept model with a random intercept term u defined as follows:

```
for (g in 1:G) {
  u[g] ~ dnorm(u0, sigma2inv)
}
u0 ~ dnorm(0, 0.01)
sigma ~ dunif(0, 10)
sigma2inv <- pow(sigma, -2)
```

This is a pretty standard way of writing out the model. Assuming that I am correctly translating from Stan to JAGS, the Matt trick refers to the idea of rewriting the model above like so:

```
for (g in 1:G) {
  u[g] <- u0 + e_u[g] * sigma
  e_u[g] ~ dnorm(0, 1)
}
u0 ~ dnorm(0, 0.01)
sigma ~ dunif(0, 10)
```

Even if my code here is wrong, you should still be able to get a sense of what I'm trying to do. As I understand it, the idea behind this parameterization is to make sampling more efficient by breaking the posterior correlation between e_u on the one hand and u0 and sigma on the other.
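The reparameterization rests on a simple distributional identity: if e_u ~ N(0, 1), then u0 + e_u*sigma has exactly the N(u0, sigma^2) distribution of the original u[g]. A minimal Python/NumPy sketch checking this by simulation (the particular values of u0, sigma, and the sample size are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(42)
u0, sigma = 2.0, 1.5   # arbitrary illustrative values
G = 100_000

# Centered parameterization: draw u[g] directly from N(u0, sigma^2)
u_centered = rng.normal(u0, sigma, size=G)

# Non-centered ('Matt trick'): draw standard normals, then shift and scale
e_u = rng.normal(0.0, 1.0, size=G)
u_noncentered = u0 + e_u * sigma

# Both samples should agree in mean and sd up to Monte Carlo error
print(u_centered.mean(), u_noncentered.mean())   # both near 2.0
print(u_centered.std(), u_noncentered.std())     # both near 1.5
```

So the two parameterizations define the same marginal prior for u[g]; what changes is the geometry the sampler sees, since e_u is a priori independent of u0 and sigma.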

When I run the model in JAGS using the above parameterization, I find that while I get good estimates of u0 and bx, I get totally off-the-wall estimates for sigma. I'm not sure whether this is a coding error on my part or whether the trick doesn't carry over to JAGS for some reason. I'm using the glm module. Is it possible that the glm-specific sampling routines don't play well with the Matt trick?

My full model is as follows:

```
model {
  for (i in 1:N) {
    xb[i] <- bx * x[i] + u[group[i]]
    logit(p[i]) <- xb[i]
    y[i] ~ dbin(p[i], trials[i])
  }
  for (g in 1:G) {
    u[g] <- u0 + e_u[g] * sigma
    e_u[g] ~ dnorm(0, 1)
  }
  u0 ~ dnorm(0, 0.01)
  sigma ~ dunif(0, 10)
  bx ~ dnorm(0, 0.01)
}
```
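For testing a fit against known truth, data matching this model can be simulated outside JAGS. A Python/NumPy sketch of the generative process (all sizes and 'true' parameter values here are assumptions chosen for illustration, not anything from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed sizes and true parameter values, purely for illustration
N, G = 500, 20
bx_true, u0_true, sigma_true = 0.8, -1.0, 0.5

group = rng.integers(0, G, size=N)      # group index for each observation
x = rng.normal(size=N)                  # covariate
trials = rng.integers(5, 20, size=N)    # binomial trials per observation

# Non-centered random intercepts, mirroring the JAGS code:
# u[g] = u0 + e_u[g] * sigma, with e_u[g] ~ N(0, 1)
e_u = rng.normal(size=G)
u = u0_true + e_u * sigma_true

xb = bx_true * x + u[group]
p = 1.0 / (1.0 + np.exp(-xb))           # inverse logit
y = rng.binomial(trials, p)             # binomial outcome
```

Feeding y, x, trials, group, N, and G to the JAGS model should recover the true values; if sigma alone comes back badly wrong under the non-centered version, that isolates the problem to the reparameterization rather than the data.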

Any thoughts would be much appreciated.

More generally, the two parameterizations you show are referred to by Yu and Meng (http://dx.doi.org/10.1198/jcgs.2011.203main) as the sufficient parameterization and the ancillary parameterization. In the first, `u[1:G]` is sufficient for `sigma`; in the second, `e_u[1:G]` is a priori independent of `sigma` (so ancillary, from the frequentist viewpoint). Neither parameterization is inherently superior, and the worst-case performance of either can be bad. Yu and Meng propose an interweaving strategy that combines the best performance of both. Andrew Gelman's folded half-t prior for sigma has a redundant parameterization that factorizes along the lines of the ancillary/sufficient dichotomy.