Indexing Problem

  • Brad Jones

    Brad Jones - 2011-06-01

    I'm pretty new to JAGS - I'm sorry to clutter the board if there is an easy
    answer to this question that I just haven't been able to find by perusing the
    help boards.

    I have a large dataset of individual-level data that include zip codes. I
    would like to assign individuals to congressional districts. Unfortunately,
    a non-trivial number of zip codes fall into more than one congressional
    district. I would like to incorporate the uncertainty over which district
    each individual falls into in the analysis.

    Basically, I have created two NxK matrices where N is the number of
    respondents and K is the maximum number of districts that a respondent could
    possibly be in (5 in this data). The first has the probability (proportion of
    the population of the respondent's zip code that falls into Congressional
    District k) that the respondent is in a particular district, the second has
    the indices for the different districts.

    I want to fit a multilevel model with respondents nested within congressional
    districts, while also incorporating this uncertainty about location.

    Here is the relevant bit of my code:

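    for (i in 1:N) {
      # district index picked out by the sampled interval D[i]
      d[i] <- dist[i,D[i]+1]
      # latent uniform draw and the cut-point interval it falls into
      P[i] ~ dunif(0,1)
      D[i] ~ dinterval(P[i], p)
      # district-varying coefficient on 'lib', no intercept
      mu[i] <- b[d[i]]*lib[i]
      mfq[i] ~ dnorm(mu[i], tau.y)
    }
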

    The 'dist' matrix has the index numbers for the congressional districts. The
    'p' matrix has the cumulative probabilities of falling into each district.
    These are the cut points for the interval distribution that I would like to
    reflect the uncertainty about respondent location; for individuals who might
    fall into fewer than 5 districts, I've set the higher cut points to values
    greater than 1. In the beginning stages, I've just been trying to fit a
    really simple model (one independent variable, 'lib', and no intercept) with
    a district-varying coefficient.
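
The cut-point scheme described above can be sketched outside JAGS. The following is a minimal Python illustration, not the poster's code: the respondent, the 60/30/10 split, the district numbers, and the helper name `draw_district` are all hypothetical. It assumes the cut points are cumulative shares, padded with values above 1 for unused slots, and that the interval index counts the cut points lying strictly below a Uniform(0,1) draw.

```python
import random
from bisect import bisect_left

def draw_district(cut_points, district_ids, rng):
    """Pick a district by comparing a Uniform(0,1) draw to cumulative cut points."""
    u = rng.random()                       # plays the role of P[i] ~ dunif(0,1)
    # bisect_left counts the cut points strictly below u; this plays the role of D[i]
    interval = bisect_left(cut_points, u)
    return district_ids[interval]          # Python is 0-based, so no "+1" is needed

# Hypothetical respondent: zip code split 60/30/10 across districts 12, 7 and 31.
# Slots 4 and 5 are unused, so their cut points are padded with values above 1.
cut_points = [0.6, 0.9, 1.0, 1.1, 1.2]
district_ids = [12, 7, 31, 0, 0]           # 0 is a placeholder that is never selected

rng = random.Random(1)
draws = [draw_district(cut_points, district_ids, rng) for _ in range(10000)]
shares = {d: draws.count(d) / len(draws) for d in (12, 7, 31)}
print(shares)
```

Because the uniform draw never reaches 1, the padded slots are never selected, and the empirical shares should land close to the population proportions used to build the cut points.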

    I've done a lot of debugging, and it appears as if the interval distribution
    is working correctly (the program will run if I replace
    'd[i] <- dist[i,D[i]+1]' with 'd[i] <- dist') and the values look right to
    me, but JAGS doesn't seem to like it when I try to probabilistically change
    the index for the district.

    Am I approaching this in the right way? Thanks!

  • Tim Handley

    Tim Handley - 2011-06-01

    First, note that this system uses BBCode, which interprets bracket-i-bracket
    as the command to italicize text. So when posting code, you ought to try and
    use a code block.

    It sounds like you're trying to incorporate measurement uncertainty into your
    analysis. For many of your data points (individuals) the exact value of
    'district' is unknown. For these individuals, you're trying to draw sample
    values of 'district' from a specified distribution of possible districts.

    I'm recasting your problem into this form (measurement uncertainty) because
    I was trying to do this very thing (incorporate measurement uncertainty into a
    JAGS model) several months ago. In the end, I don't think it is possible to do
    this, though the reasons are somewhat complex, and I still don't fully
    understand the issue. See, for example, this post:

    Does 'measurement uncertainty' seem an accurate characterization of your
    problem?

    Do you know the p values already? Can you find, or do you have, good p values
    from some sort of GIS?

    Have you considered a weighted average as a simpler solution? e.g.

  • Brad Jones

    Brad Jones - 2011-06-12

    Thanks for the reply and sorry about the rookie mistake...

    Measurement uncertainty is exactly what I am trying to get at. I had
    considered the weighting solution, but I thought it would be more appropriate
    to include the uncertainty at the sampling stage. I suppose the two are
    equivalent in the limit though, right?

    Thanks again for the reply!

  • Tim Handley

    Tim Handley - 2011-06-13

    I don't think the two strategies (weighted average vs. stochastic sampling)
    are the same. I imagine that the mean/median values of the posterior
    distributions would be similar. So if mean/median values are your main
    concern, then the two methods are probably equivalent. However, I think the
    weighted average won't accurately model the uncertainty in your measurements.
    This means that the model will underestimate the uncertainty in your results,
    producing credible intervals that are too small, and p-values that are too
    small.
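
A small numerical sketch may help make the contrast concrete. It is written in Python rather than JAGS, and every number in it is hypothetical: one respondent, three candidate districts, and district coefficients held fixed. It is one possible reading of the two strategies, not a statement of what either poster implemented. It compares the single coefficient produced by a weighted average with the spread of coefficients produced by drawing the district at random.

```python
import random
import statistics

# Hypothetical values for one respondent with three candidate districts.
weights = [0.6, 0.3, 0.1]      # share of the zip code's population in each district
coefs = [1.0, -0.5, 2.0]       # district-level coefficients b[k], held fixed here

# Weighted-average strategy: one deterministic coefficient per respondent.
weighted = sum(w * b for w, b in zip(weights, coefs))

# Stochastic strategy: draw a district, then use that district's coefficient.
rng = random.Random(42)
sampled = [rng.choices(coefs, weights=weights)[0] for _ in range(20000)]

mean_sampled = statistics.fmean(sampled)   # should sit close to the weighted value
sd_sampled = statistics.pstdev(sampled)    # spread that the weighted average discards

print(weighted, mean_sampled, sd_sampled)
```

The two strategies agree on the average coefficient, but only the stochastic one carries the extra variation that comes from not knowing the district. This toy comparison says nothing by itself about how wide the posterior intervals of a full multilevel model would be.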

    In that post I pointed out, Martyn suggested that one could use R to
    stochastically generate multiple data sets based on known measurement
    uncertainty, run your model on each data set, and pool the results. So if
    you're interested in credible intervals, this might be the way to go.
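
The generate, fit, and pool loop could look roughly like the sketch below. It is in Python rather than R, the data are made up, and `fit_model` is a hypothetical placeholder that fakes posterior draws with a normal approximation instead of calling JAGS. "Pooling" is assumed here to mean concatenating the draws from every imputed data set, which may or may not match what Martyn had in mind.

```python
import random
import statistics

def impute_districts(weights_by_person, districts_by_person, rng):
    """Draw one plausible district per respondent from the zip-code shares."""
    return [rng.choices(ds, weights=ws)[0]
            for ws, ds in zip(weights_by_person, districts_by_person)]

def fit_model(y, districts, rng, n_draws=200):
    """Placeholder for a real JAGS run (hypothetical): returns fake 'posterior
    draws' of the mean outcome in district 1 via a normal approximation."""
    ys = [yi for yi, d in zip(y, districts) if d == 1]
    center = statistics.fmean(ys)
    se = statistics.stdev(ys) / len(ys) ** 0.5
    return [rng.gauss(center, se) for _ in range(n_draws)]

# Hypothetical data: six respondents, the last three with uncertain districts.
y = [2.1, 1.9, 2.4, 3.0, 0.5, 1.2]
weights_by_person = [[1.0], [1.0], [1.0], [0.5, 0.5], [0.7, 0.3], [0.2, 0.8]]
districts_by_person = [[1], [1], [1], [1, 2], [1, 2], [2, 1]]

rng = random.Random(2011)
pooled = []
for _ in range(20):                        # 20 imputed data sets, an arbitrary choice
    districts = impute_districts(weights_by_person, districts_by_person, rng)
    pooled.extend(fit_model(y, districts, rng))

# 95% interval from the pooled draws
pooled.sort()
lower = pooled[int(0.025 * len(pooled))]
upper = pooled[int(0.975 * len(pooled))]
print(lower, upper)
```

In a real analysis the placeholder would be replaced by an actual model run on each imputed data set, and the number of data sets would need to be large enough for the pooled interval to stabilize.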

    Again, I'm still trying to figure this out myself. So I'd be curious to hear
    about what you choose to do, and how well it worked for you.

