Martyn Plummer - 2014-01-14

I took a somewhat longer break over the holidays...

This is an interesting question. But before giving you the answer, I need to fix your model:

```model{
beta[1] ~ dnorm(0, 1.5)
beta[2] ~ dnorm(0, 1.5)
###p.obs ~ dbeta(1, 1)
p.obs <- 0.7
prob ~ dbeta(1, 1)

for (i in 1:n){
p[i] <- ilogit(X[i, ] %*% beta)
z[i] ~ dbern(p[i])
y[i] ~ dbin(p.obs, z[i])
}
}
```

I've made two changes. Firstly, I have changed the definition of y so that it corresponds to the R code that generated the data. So if `z[i]==0` then we must have `y[i]=0`, and if `z[i]==1` then we have probability `p.obs` of observing `y[i]=1`.

The second thing I did is to fix `p.obs` to 0.7 (the true value). With these data you cannot estimate `p.obs` because there is no replication, i.e. to estimate `p.obs` you would need either a validation study with `y` and `z` both observed on the same individuals, or a reliability study with two independent measurements of `y` on the same individuals.

With these data, you have to assume `p.obs` is known or, as a sensitivity analysis, you can put a strongly informative prior around it, bearing in mind that any uncertainty will feed directly into the posterior. Trying to put a non-informative prior on `p.obs` will result in an unidentifiable model and no doubt convergence problems.

Now, your question is how to estimate the prevalence in the population. There are three ways of interpreting this. The first two are:

1) Your `n` individuals constitute the whole population
2) Your `n` individuals represent a sample from a much larger population.

If your `n` individuals constitute the whole population then the population prevalence is, by definition, the mean of `z`. So you can add this derived quantity to your bugs model

```    p.hat1 <- mean(z)
```

and monitor `p.hat1`.

If these `n` individuals are a sub-sample of a larger population and you want to estimate the prevalence in this hyper-population then you have to work a bit harder. You need to simulate a new individual from the population who is not in the sample. To do this, we need to add a model for the covariate X.

```    ## Model for covariate X[2]
pi ~ dbeta(1, 1)
for (i in 1:n){
X[i,2] ~ dbern(pi)
}
```

Here I've added a new parameter `pi`, the prevalence of `X[2]`.

Then we can simulate sampling a new individual from the population

```    ## Sample new individual from the population
x2.new ~ dbern(pi)
## Prevalence in randomly drawn individual
p.hat2 <- ilogit(beta[1] + x2.new*beta[2])
```

Monitoring `p.hat2` gives you an estimate of the prevalence in the hyper population.

Both `p.hat1` and `p.hat2` will have the same posterior mean, but the variance of `p.hat2` will be larger, as it includes some sampling error (i.e. your `n` individuals are representative of the population, but do not describe it exactly).

3) The third possibility is to monitor

```   p.hat3 <- mean(p[])
```

This is not so easy to interpret. It represents the prevalence in a hypothetical replicate of this sample of `n` individuals, with the same values of `X` but (conditionally) independent values of `y`. It represents a half-way house between the the two other possibilities and so has an intermediate variance.