I'm curious if anyone has tips or advice on getting a large, 4-level
hierarchical model to converge. In my data, I have 9180 loci (data points)
from 1200 populations nested within 320 studies from 135 species in 17 taxa.
In other words, this doesn't seem like a simple problem. However, the model
has only three important coefficients: one for data points (alpha.L), one for
studies (alpha.y), and one for species (alpha.s). The full code is below.
After running 2 million iterations (a couple days of crunching), alpha.L still
isn't stationary, and two chains haven't converged. Any advice on how to get
this working better? Any help or tips would be much appreciated.
Background: I'm running this in JAGS under Mac OS X with a 2.53 GHz Intel Core
2 Duo. I've also tried the problem under OpenBugs on a Dell PC with Windows
XP, with similar results.
Model:
Model with four hierarchical levels: taxa, species, study, population
Data input:
n.taxa: number of taxa
n.species: number of species
n.pops: number of populations
n.studies: number of studies
n.data: number of data points
X: 0 if the locus isn't cross-species, 1 if it is
C: 0 if the species isn't collapsed, 1 if it is
N: 0 if the paper isn't a primer note, 1 if it is
taxon.list: the taxon that each species is from
species.list: the species that each study is from
study.list: the study that each population is from
pop.list: the population that each data point is from
L: each data point
Deterministic Nodes
tau: precision for data points within taxa
Shat: expected diversity to each species
Yhat: expected diversity for each study
Lhat: expected diversity for each data point
Output nodes:
lower.T: lower limit to taxon diversity means
upper.T: upper limit to taxon diversity means
alpha.L: the cross-species locus effect
alpha.s: the collapse effect
alpha.y: the primer note effect
sigma.S: standard deviation for species within taxa
sigma.Y: standard deviation for study within species
sigma.P: standard deviation for populations within studies
sigma.L: standard deviation for data points within populations
T: mean diversity for each taxon
S: mean diversity for each species
Y: mean diversity for each study
P: mean diversity for each population
model {
hyperpriors on the taxon means and the coefficients
lower.T ~ dunif(0,100)
upper.T ~ dunif(0,100)
alpha.L ~ dunif(-1, 1)
alpha.s ~ dunif(-1, 1)
alpha.y ~ dunif(-1, 1)
sigma.S ~ dunif(0, 100)
tau.S <- pow(sigma.S, -2)
sigma.Y ~ dunif(0, 100)
tau.Y <- pow(sigma.Y, -2)
sigma.P ~ dunif(0, 100)
tau.P <- pow(sigma.P, -2)
sigma.L ~ dunif(0, 100)
tau.L <- pow(sigma.L,-2)
go through each taxon
for (t in 1:n.taxa){
T is mean taxon diversity
T ~ dunif(lower.T, upper.T)
}
go through each species
for (s in 1:n.species){
Code the collapse effect as a proportional impact on diversity
Shat <- (1 + alpha.sC)T[taxon.list]
S is mean species diversity
S ~ dnorm(Shat, tau.S)
}
go through each study
for (y in 1:n.studies){
Primer note effect is a proportional effect
Yhat <- (1 + alpha.y * N) * S[species.list]
Y is mean study diversity
Y ~ dnorm(Yhat, tau.Y)
}
go through each population
for (p in 1:n.pops){
P
~
dnorm(Y[study.list
], tau.P)
}
go through each data point
for (i in 1:n.data){
Actual data points are distributed around the expected value for the species
plus an adjustment for whether or not the data point is from a cross-species
locus
Lhat_ <- (1 + alpha.LX_)P[pop.list_]
L_ ~ dnorm(Lhat_, tau.L)
}
}
Regards,
Malin_
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Try flattening your model a bit by centring your random effects on zero.
Instead of this:
# Code the collapse effect as a proportional impact on diversity
Shat[t]<-(1+ alpha.s*C[t])*T[taxon.list[t]]# S is mean species diversity
S[t]~ dnorm(Shat[t], tau.S)# Code the collapse effect as a proportional impact on diversity
Shat[t]<-(1+ alpha.s*C[t])*T[taxon.list[t]]
Serror[t]~ dnorm(0, tau.S)# S is mean species diversity
S[t]<- Shat[it]+ Serror[t]
and so on. NB I changed the index from s to t to stop it being interpreted as
BBCode strikethrough.
You can also try loading the glm model, which will give you block updating,
but the compiler might choke on the model.
Martyn
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Anonymous
-
2011-04-06
Hi Martyn,
Thanks for the good suggestion. Strangely, centering the random effects seemed
to make it run about 3x slower on a test piece of data. Perhaps because the
graph size increases?
I did have success speeding up convergence dramatically when I changed 3
things:
1) Used empirical means and standard deviations for the initial values of the
groups means (T, S, Y, and P) and standard deviations (sigma.L, sigma.P,
sigma.Y, sigma.S). I still use random initial values for the coefficients I
care about (alpha.s, alpha.y, and alpha.L).
2) Narrowed the prior for T and recoded without upper.T and lower.T:
# go through each taxon
for (t in 1:n.taxa){
# T is mean taxon diversity
T[t] ~ dunif(0, 10)
}
3) Loaded the glm module
I'm pretty sure #1 is the bit that sped this up so much. Can you see anything
wrong with this approach?
Thanks,
Malin
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
The automatically generated initial values can't be right for every problem,
so yes you should choose your own starting values when you think it
appropriate.
If you would like to refer to this comment somewhere else in this project, copy and paste the following link:
Hi,
I'm curious if anyone has tips or advice on getting a large, 4-level
hierarchical model to converge. In my data, I have 9180 loci (data points)
from 1200 populations nested within 320 studies from 135 species in 17 taxa.
In other words, this doesn't seem like a simple problem. However, the model
has only three important coefficients: one for data points (alpha.L), one for
studies (alpha.y), and one for species (alpha.s). The full code is below.
After running 2 million iterations (a couple days of crunching), alpha.L still
isn't stationary, and two chains haven't converged. Any advice on how to get
this working better? Any help or tips would be much appreciated.
Background: I'm running this in JAGS under Mac OS X with a 2.53 GHz Intel Core
2 Duo. I've also tried the problem under OpenBugs on a Dell PC with Windows
XP, with similar results.
Model:
Model with four hierarchical levels: taxa, species, study, population
Data input:
n.taxa: number of taxa
n.species: number of species
n.pops: number of populations
n.studies: number of studies
n.data: number of data points
X: 0 if the locus isn't cross-species, 1 if it is
C: 0 if the species isn't collapsed, 1 if it is
N: 0 if the paper isn't a primer note, 1 if it is
taxon.list: the taxon that each species is from
species.list: the species that each study is from
study.list: the study that each population is from
pop.list: the population that each data point is from
L: each data point
Deterministic Nodes
tau: precision for data points within taxa
Shat: expected diversity to each species
Yhat: expected diversity for each study
Lhat: expected diversity for each data point
Output nodes:
lower.T: lower limit to taxon diversity means
upper.T: upper limit to taxon diversity means
alpha.L: the cross-species locus effect
alpha.s: the collapse effect
alpha.y: the primer note effect
sigma.S: standard deviation for species within taxa
sigma.Y: standard deviation for study within species
sigma.P: standard deviation for populations within studies
sigma.L: standard deviation for data points within populations
T: mean diversity for each taxon
S: mean diversity for each species
Y: mean diversity for each study
P: mean diversity for each population
model {
hyperpriors on the taxon means and the coefficients
lower.T ~ dunif(0,100)
upper.T ~ dunif(0,100)
alpha.L ~ dunif(-1, 1)
alpha.s ~ dunif(-1, 1)
alpha.y ~ dunif(-1, 1)
sigma.S ~ dunif(0, 100)
tau.S <- pow(sigma.S, -2)
sigma.Y ~ dunif(0, 100)
tau.Y <- pow(sigma.Y, -2)
sigma.P ~ dunif(0, 100)
tau.P <- pow(sigma.P, -2)
sigma.L ~ dunif(0, 100)
tau.L <- pow(sigma.L,-2)
go through each taxon
for (t in 1:n.taxa){
T is mean taxon diversity
T ~ dunif(lower.T, upper.T)
}
go through each species
for (s in 1:n.species){
Code the collapse effect as a proportional impact on diversity
Shat
<- (1 + alpha.sC)T[taxon.list]S is mean species diversity
S
~ dnorm(Shat, tau.S)}
go through each study
for (y in 1:n.studies){
Primer note effect is a proportional effect
Yhat <- (1 + alpha.y * N) * S[species.list]
Y is mean study diversity
Y ~ dnorm(Yhat, tau.Y)
}
go through each population
for (p in 1:n.pops){
P~dnorm(Y[study.list
], tau.P)}
go through each data point
for (i in 1:n.data){
Actual data points are distributed around the expected value for the species
plus an adjustment for whether or not the data point is from a cross-species
locus
Lhat_ <- (1 + alpha.LX_)P[pop.list_]
L_ ~ dnorm(Lhat_, tau.L)
}
}
Regards,
Malin_Try flattening your model a bit by centring your random effects on zero.
Instead of this:
and so on. NB I changed the index from s to t to stop it being interpreted as
BBCode strikethrough.
You can also try loading the glm model, which will give you block updating,
but the compiler might choke on the model.
Martyn
Hi Martyn,
Thanks for the good suggestion. Strangely, centering the random effects seemed
to make it run about 3x slower on a test piece of data. Perhaps because the
graph size increases?
I did have success speeding up convergence dramatically when I changed 3
things:
1) Used empirical means and standard deviations for the initial values of the
groups means (T, S, Y, and P) and standard deviations (sigma.L, sigma.P,
sigma.Y, sigma.S). I still use random initial values for the coefficients I
care about (alpha.s, alpha.y, and alpha.L).
2) Narrowed the prior for T and recoded without upper.T and lower.T:
3) Loaded the glm module
I'm pretty sure #1 is the bit that sped this up so much. Can you see anything
wrong with this approach?
Thanks,
Malin
The automatically generated initial values can't be right for every problem,
so yes you should choose your own starting values when you think it
appropriate.