Martyn Plummer - 2014-01-14

I took a somewhat longer break over the holidays...

This is an interesting question. But before giving you the answer, I need to fix your model:

    beta[1] ~ dnorm(0, 1.5)
    beta[2] ~ dnorm(0, 1.5)
    ###p.obs ~ dbeta(1, 1)
    p.obs <- 0.7
    prob ~ dbeta(1, 1)

    for (i in 1:n){
      p[i] <- ilogit(X[i, ] %*% beta)
      z[i] ~ dbern(p[i])
      y[i] ~ dbin(p.obs, z[i])

I've made two changes. Firstly, I have changed the definition of y so that it corresponds to the R code that generated the data. So if z[i]==0 then we must have y[i]=0, and if z[i]==1 then we have probability p.obs of observing y[i]=1.

The second thing I did is to fix p.obs to 0.7 (the true value). With these data you cannot estimate p.obs because there is no replication, i.e. to estimate p.obs you would need either a validation study with y and z both observed on the same individuals, or a reliability study with two independent measurements of y on the same individuals.

With these data, you have to assume p.obs is known or, as a sensitivity analysis, you can put a strongly informative prior around it, bearing in mind that any uncertainty will feed directly into the posterior. Trying to put a non-informative prior on p.obs will result in an unidentifiable model and no doubt convergence problems.

Now, your question is how to estimate the prevalence in the population. There are three ways of interpreting this. The first two are:

1) Your n individuals constitute the whole population
2) Your n individuals represent a sample from a much larger population.

If your n individuals constitute the whole population then the population prevalence is, by definition, the mean of z. So you can add this derived quantity to your bugs model

    p.hat1 <- mean(z)

and monitor p.hat1.

If these n individuals are a sub-sample of a larger population and you want to estimate the prevalence in this hyper-population then you have to work a bit harder. You need to simulate a new individual from the population who is not in the sample. To do this, we need to add a model for the covariate X.

    ## Model for covariate X[2]
    pi ~ dbeta(1, 1)
    for (i in 1:n){
       X[i,2] ~ dbern(pi)

Here I've added a new parameter pi, the prevalence of X[2].

Then we can simulate sampling a new individual from the population

    ## Sample new individual from the population ~ dbern(pi)
    ## Prevalence in randomly drawn individual
    p.hat2 <- ilogit(beta[1] +*beta[2])

Monitoring p.hat2 gives you an estimate of the prevalence in the hyper population.

Both p.hat1 and p.hat2 will have the same posterior mean, but the variance of p.hat2 will be larger, as it includes some sampling error (i.e. your n individuals are representative of the population, but do not describe it exactly).

3) The third possibility is to monitor

   p.hat3 <- mean(p[])

This is not so easy to interpret. It represents the prevalence in a hypothetical replicate of this sample of n individuals, with the same values of X but (conditionally) independent values of y. It represents a half-way house between the the two other possibilities and so has an intermediate variance.