## Indexing Problem

Help
2011-06-01
2012-09-01

I'm pretty new to JAGS - I'm sorry to clutter the board if there is an easy
answer to this question that I just haven't been able to find by perusing the
help boards.

I have a large dataset of individual-level data that include zip codes. A
non-trivial number of zip codes fall into more than one
congressional district. I would like to incorporate the uncertainty over which
districts individuals fall into in the analysis.

Basically, I have created two NxK matrices where N is the number of
respondents and K is the maximum number of districts that a respondent could
possibly be in (5 in this data). The first has the probability (proportion of
the population of the respondent's zip code that falls into Congressional
District k) that the respondent is in a particular district, the second has
the indices for the different districts.

I want to fit a multilevel model with respondents nested within congressional
districts also incorporating this uncertainty about location.

Here is the relevant bit of my code:

    for (i in 1:N) {
      d[i] <- dist[i,D[i]+1]
      P[i] ~ dunif(0,1)
      D[i] ~ dinterval(P[i], p)
      mu[i] <- b[d[i]]*lib[i]
      mfq[i] ~ dnorm(mu[i], tau.y)
    }

the 'dist' matrix has the index numbers for the congressional districts. The
'p' matrix has the cumulative probabilities of falling into each district (the
cut points for the interval distribution that I would like to reflect the
uncertainty about respondent location -- for individuals that might fall into
fewer than 5 districts, I've set the higher cut points to values greater than
1). In the beginning stages, I've been just trying to fit a really simple
model (one independent variable, 'lib', and no intercept) with a district-
varying coefficient.
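To make the cut-point construction concrete, here is a minimal sketch in Python (not JAGS) of how one respondent's row of cumulative cut points could be built and how a uniform draw then selects a district slot. The helper names `cut_points` and `interval_index` and the padding value 2.0 are assumptions for illustration; the indexing follows the documented behaviour of `dinterval`, which returns the number of cut points lying strictly below the value.

```python
from bisect import bisect_left
from itertools import accumulate


def cut_points(probs, k_max, pad=2.0):
    """Cumulative cut points for one respondent, padded to length k_max.

    Unused slots get a value above 1 so a uniform(0,1) draw never reaches them.
    """
    cuts = list(accumulate(probs))
    cuts[-1] = 1.0  # guard against floating-point drift in the final sum
    return cuts + [pad] * (k_max - len(cuts))


def interval_index(u, cuts):
    """0-based interval index: the number of cut points strictly below u."""
    return bisect_left(cuts, u)


# A respondent whose zip code is split 60/30/10 across three districts,
# stored in a row with room for K = 5 districts.
probs = [0.6, 0.3, 0.1]
cuts = cut_points(probs, k_max=5)

# interval_index(u, cuts) + 1 is the column of 'dist' to read in JAGS terms.
slots = [interval_index(u, cuts) for u in (0.5, 0.7, 0.95)]
```

With this layout a draw of at most 0.6 selects the first district, a draw between 0.6 and 0.9 the second, and the padded slots are never selected.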

I've done a lot of debugging, and it appears as if the interval distribution
is working correctly (the program will run if I replace 'd[i] <- dist[i,D[i]+1]'
with 'd[i] <- dist') and the values look right to me, but JAGS doesn't seem to
like it when I try to probabilistically change the index for the district.

Am I approaching this in the right way? Thanks!

• Tim Handley - 2011-06-01

First, note that this system uses BBCode, which interprets bracket-i-bracket
as the command to italicize text. So when posting code, you ought to try and
use a code block.

It sounds like you're trying to incorporate measurement uncertainty into your
analysis. For many of your data points (individuals) the exact value of
'district' is unknown. For these individuals, you're trying to draw sample
values of 'district' from a specified distribution of possible district
values.

I'm recasting your problem into this form (measurement uncertainty) because
I was trying to do this very thing (incorporate measurement uncertainty into a
JAGS model) several months ago. In the end, I don't think it is possible to do
this, though the reasons are somewhat complex, and I still don't fully
understand the issue. See, for example, this post:
https://sourceforge.net/projects/mcmc-jags/forums/forum/610037/topic/4505486

Does 'measurement uncertainty' seem an accurate characterization of your
problem?

Do you know the p values already? Can you find, or do you have, good p values
from some sort of GIS?

Have you considered a weighted average as a simpler solution? e.g.

```
mu<-(b[d1]*p[d1]+b[d2]*p[d2]...)*lib
```
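As a rough illustration of the weighted-average idea, the Python sketch below computes the mean for one respondent from that respondent's candidate districts and their probabilities. The function name `weighted_mu`, the district numbers, and the coefficient values are made up for the example, and the probabilities are indexed by position in the respondent's row rather than by district number.

```python
def weighted_mu(b, districts, probs, lib):
    """Probability-weighted district coefficient times the predictor."""
    return sum(p * b[d] for d, p in zip(districts, probs)) * lib


# Hypothetical coefficients for two districts
b = {101: 0.5, 102: 1.5}

# A respondent split evenly between districts 101 and 102, with lib = 2.0
mu = weighted_mu(b, districts=[101, 102], probs=[0.5, 0.5], lib=2.0)
```

A respondent whose district is known simply has one candidate with probability 1, so the same expression covers both cases.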

Measurement uncertainty is exactly what I am trying to get at. I had
considered the weighting solution, but I thought it would be more appropriate
to include it at the sampling stage. I suppose they are equivalent at the
limit though, right?

• Tim Handley - 2011-06-13

I don't think the two strategies (weighted average vs. stochastic sampling)
are the same. I imagine that the mean/median values of the posterior
distributions would be similar. So if mean/median values are your main
concern, then the two methods are probably equivalent. However, I think the
weighted average won't accurately model the uncertainty in your measurements.
This means that the model will underestimate the uncertainty in your results,
producing credible intervals that are too narrow and p-values that are too
small.
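One way to see the difference, sketched in Python with made-up numbers: if the district is drawn at random, the coefficient a respondent receives has both a mean and a spread, while the weighted average keeps only the mean. The function name and values here are assumptions for illustration, and this only shows the variation in the coefficient itself, not the full effect on a posterior.

```python
def mixture_mean_and_variance(b, probs):
    """Mean and variance of the coefficient when the district is random."""
    m = sum(p * bk for p, bk in zip(probs, b))
    v = sum(p * (bk - m) ** 2 for p, bk in zip(probs, b))
    return m, v


# Two candidate districts with coefficients 1.0 and 3.0, equally likely
b = [1.0, 3.0]
probs = [0.5, 0.5]
m, v = mixture_mean_and_variance(b, probs)
```

Here `m` is exactly what the weighted average would plug in, while `v` is the between-district variation that stochastic sampling would carry forward and the weighted average drops.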

In that post I pointed out, Martyn suggested that one could use R to
stochastically generate multiple data sets based on known measurement
uncertainty, run your model on each data set, and pool the results. So if
you're interested in credible intervals, this might be the way to go.
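A rough sketch of that workflow follows, written in Python rather than R so it is self-contained. Everything named here is hypothetical: `fit_slopes` is a stand-in for the real JAGS run (a no-intercept least-squares slope per district), and the toy data are invented. With a real model, the pooling step would presumably combine the posterior draws from each run instead of point estimates.

```python
import random
from statistics import mean


def impute_districts(candidates, probs, rng):
    """Draw one plausible district per respondent from its candidate list."""
    return [rng.choices(c, weights=w, k=1)[0] for c, w in zip(candidates, probs)]


def fit_slopes(districts, x, y):
    """Stand-in for the real model: no-intercept slope per district."""
    sxy, sxx = {}, {}
    for d, xi, yi in zip(districts, x, y):
        sxy[d] = sxy.get(d, 0.0) + xi * yi
        sxx[d] = sxx.get(d, 0.0) + xi * xi
    return {d: sxy[d] / sxx[d] for d in sxy if sxx[d] > 0}


def pooled_slopes(candidates, probs, x, y, n_imputations=20, seed=1):
    """Fit once per imputed data set and collect per-district estimates."""
    rng = random.Random(seed)
    pooled = {}
    for _ in range(n_imputations):
        districts = impute_districts(candidates, probs, rng)
        for d, slope in fit_slopes(districts, x, y).items():
            pooled.setdefault(d, []).append(slope)
    return pooled


# Toy data: respondents 0 and 1 have known districts, 2 and 3 are split.
candidates = [[101], [102], [101, 102], [101, 102]]
probs = [[1.0], [1.0], [0.7, 0.3], [0.4, 0.6]]
lib = [1.0, 2.0, 1.5, 0.5]
mfq = [2.0, 2.0, 3.0, 0.5]

pooled = pooled_slopes(candidates, probs, lib, mfq)
# Mean and range of each district's estimate across the imputed data sets
summary = {d: (mean(v), min(v), max(v)) for d, v in pooled.items()}
```

The spread of each district's estimates across imputations reflects the location uncertainty, which is the part a single weighted-average fit would not show.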

Again, I'm still trying to figure this out myself. So I'd be curious to hear
about what you choose to do, and how well it worked for you.