Data Statements

• Josh - 2012-12-13

I am working with data that are themselves described by a mean and confidence interval. The raw data were adjusted to account for detection bias, and I would like to carry this uncertainty forward into future analyses. One way of doing this, I thought, is to include a data statement in which my data give rise to a stochastic realization from a normal distribution. The manual says that the data portion of the model is forward-sampled once, which to me sounds like I am not really passing the data in as a distribution, just as a single realization. Is a data statement the best solution when the data have error? Secondly, if the data are read in as below, does anyone have suggestions on where to round the data for input into a discrete distribution? I give some simulated data and code to show my first thought on accomplishing this task.

```
sink("fixed.txt")
cat("
data{
  for(i in 1:n){
    # Each datum is a draw from a normal defined by its reported mean/precision
    size[i] ~ dnorm(size.mu[i], size.prec[i])
    succ[i] ~ dnorm(succ.mu[i], succ.prec[i])
  }
}
model{
  phi ~ dbeta(1, 1)
  for(i in 1:n){
    succ[i] ~ dbin(phi, size[i])
  }
}
", fill = TRUE)
sink()

jdata <- list(size.mu = round(runif(100, 100, 200)),
              size.prec = 1/rnorm(100, 200, 10),  # precision = 1/variance
              succ.mu = round(runif(100, 50, 90)),
              succ.prec = 1/rnorm(100, 100, 5),
              n = 100)  # n must match the length of the data vectors
inits <- function(){ list(phi = runif(1, 0, 1)) }
parameters <- c("phi")

out <- jags(jdata,
            inits,
            parameters,
            "fixed.txt",
            n.thin = 1,
            n.chains = 3,
            n.burnin = 1000,
            n.iter = 3000)
```

The above fails the discrete-value check for the binomial distribution, so I thought I would round the values. Here I chose to round inside the data statement, but does it make a difference where the rounding happens? This would arguably make more sense with a Poisson or another discrete-valued distribution, but the model that corrected the raw data assumed everything was normal, so I started with normals. My attempt at solving the problem is posted below:

```
sink("fixed.txt")
cat("
data{
  for(i in 1:n){
    # Truncate at zero so rounding cannot produce negative counts
    size[i] ~ dnorm(size.mu[i], size.prec[i])T(0,)
    succ[i] ~ dnorm(succ.mu[i], succ.prec[i])T(0,)
    # Round the continuous draws to integers for the binomial below
    size.obs[i] <- round(size[i])
    succ.obs[i] <- round(succ[i])
  }
}
model{
  phi ~ dbeta(1, 1)
  for(i in 1:n){
    succ.obs[i] ~ dbin(phi, size.obs[i])
  }
}
", fill = TRUE)
sink()

jdata <- list(size.mu = round(runif(100, 100, 200)),
              size.prec = 1/rnorm(100, 200, 10),  # precision = 1/variance
              succ.mu = round(runif(100, 50, 90)),
              succ.prec = 1/rnorm(100, 100, 5),
              n = 100)  # n must match the length of the data vectors
inits <- function(){ list(phi = runif(1, 0, 1)) }
parameters <- c("phi")

out <- jags(jdata,
            inits,
            parameters,
            "fixed.txt",
            n.thin = 1,
            n.chains = 3,
            n.burnin = 1000,
            n.iter = 3000)
```

Thanks, Josh

• Martyn Plummer - 2012-12-13

You are right that each run will not take into account the uncertainty in your data. But if you pool the samples from, say, 20 runs then you should get a good approximation to the posterior distribution you are looking for.

Right now it is not possible to generate the data under one model and analyze it under another, except by taking this multiple-imputation approach, pooling results across repeated runs on the same data set.
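A minimal sketch of this pooling approach, assuming the `fixed.txt` model and the `jdata`, `inits`, and `parameters` objects from the second listing above, and assuming the `jags()` call there comes from the R2jags package (whose return value stores posterior draws in `$BUGSoutput$sims.list`); the choice of 20 runs is just an example:

```
library(R2jags)

# Each call to jags() forward-samples the data block once, drawing a fresh
# realization of size/succ. Repeating the fit and pooling the posterior
# draws propagates the measurement uncertainty into the posterior for phi.
runs <- lapply(1:20, function(r) {
  fit <- jags(jdata, inits, parameters, "fixed.txt",
              n.thin = 1, n.chains = 3,
              n.burnin = 1000, n.iter = 3000)
  fit$BUGSoutput$sims.list$phi
})

# Combine the draws of phi from all runs into one pooled sample
phi.pooled <- unlist(runs)
summary(phi.pooled)
```

Each element of `runs` is conditional on one imputed data set; concatenating them approximates averaging the posterior over the distribution of the data, in the spirit of multiple imputation.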

• Josh - 2012-12-13

Thank you for the quick response and the idea.