Truncation, indexing and missing values

Help
Anonymous
2011-01-01
2012-09-01
  • Anonymous - 2011-01-01

    Hi everyone,

    I am trying to estimate a latent variable model adapted from chapter 6 of Lee
    (2007): Structural Equation Modeling, A Bayesian Approach. The original code
    is for WinBUGS, I am trying to convert it to JAGS, but I am running into some
    problems with truncation and missing data.

    The problem is two-fold: (1) some manifest variables are ordered categorical,
    and (2) some observations for these variables are missing. z are the observed
    values for the ordinal variables (in the dat.txt file), and Lee uses the
    underlying a continuous variable y with a series of thresholds for the ordinal
    variables. The WinBUGS truncation trick that is used in the original code goes
    something like this:

    for(j in 2:2){
            y[i,1]~dnorm(mu[i,j],psi[j])I(thd[z[i,j]],thd[z[i,j]+1])
     }
    

    The bounds for truncation are chosen from the vector of threshold values thd.
    We obviously want to pick different bounds depending on the observed values of
    the ordinal variable, which is why we use the indexing trick that you see
    above: thd[z+1].

    As far as I can tell from the JAGS manual, this should be one of those cases
    where it might be enough to substitute-in Ts in the truncation calls.
    Unfortunately, this does not work; JAGS stops as soon as it meets a missing
    value. Adding 1 to NA (the observed value of z for missing obserations) gives
    a non-integer index value:

     Error in node z[1,2]
     Index expression evaluates to non-integer value
    

    Although this model works in OpenBUGS with the usual truncation operator (I),
    I was unable to determine exactly what is happening behind the scene with the
    index. In any case, I need to run this thing on a cluster that has JAGS
    installed and would really like to get this to work with T rather than I. Is
    my approach all wrong, or just too simple? Is there a workaround that you
    could suggest?

    To facilitate diagnostic, here is a link to 4 files: dat.txt, init.txt,
    mod.txt, mod.jags.txt & jags.R. mod.txt works out of the box with its data and
    init files in OpenBUGS. In mod.jags.txt, I just substituted Is for Ts, and I
    am calling JAGS via rjags from the jags.R script.

    http://dl.dropbox.com/u/40028/jags-help.rar

    Thank you so much for your great work on JAGS and rjags. Working with this
    software has made my life a lot easier these past few months. Any help with my
    problem would be MUCH appreciated.

    Best,

    Vincent

     
  • Martyn Plummer

    Martyn Plummer - 2011-01-04

    There are a number of different problems here. I'll get to the main one in a
    minute, but the error with the non-integer index is caused by two things

    1) You are mixing up integer and non-integer variables in the same data array
    "z". In the model file, columns 2,3,9, and 17 of z are assumed to be integer
    valued (and this is true in the data file). In the other columns, z is defined
    as a normal random variable (and some non-integer data values are provided in
    the data file). It is a bad idea to give the same name ("z") to variables that
    are qualitatively different. It makes the code much harder to read.

    2) You are reading in the data using the dget() function. This does not work
    correctly: although the BUGS data format superficially resembles the S dput
    format, it is actually different because matrices are stored by row instead of
    by column. When you read in BUGS data using dput() then the elements of all
    the arrays are mixed up. The rjags package provides a read.data() function
    which correctly reads in BUGS data if you use the option format="bugs".

    Underlying this is another problem. This is a case of interval censoring, not
    truncation and therefore the T(,) construct will give you the wrong answer. If
    "z" is the indicator variable, "y" is the underlying continuous variable and
    "thd" is a vector of strictly increasing thresholds then BUGS has:

     y~dnorm(mu,psi) I(thd[z], thd[z+1])
    

    But in JAGS this is written as:

    y ~ dnorm(mu, psi)
    z0 ~ dinterval(y, thd)
    z <- z0 + 1
    

    This is a better formulation because it describes how the data z are
    generated
    . A model in the BUGS language should always be able to generate a
    new data set if you fix the top-level parameters. There is no way to do this
    when using the WinBUGS I(,) construct for interval censoring.

     
  • Anonymous - 2011-01-05

    Hi again,

    Your comments were very helpful and much appreciated. dinterval() will make my
    life much easier indeed.

    Here's a quick follow-up question if you can spare another second. You suggest
    using this (where y: underlying continous variable, z: observed ordered
    variable):

    y ~ dnorm(mu, psi) 
    z0 ~ dinterval(y, thd) 
    z <- z0 + 1
    

    Could you please explain what that last line odes? I understand that
    dinterval() sets to zero anything below the first cut-point, so it seems
    appropriate to bring z0 back in line with the observed values of the data,
    which are 1,2,3... But since z is observed in the data, using that third line
    of code produces this error:

    z[1,1] is a logical node and cannot be observed
    

    On points 1 & 2, you are absolutely right on all counts. In fact, I use rjags
    directly with lists of data from R, and I don't read the dat or init lists
    from file, so the problem doesn't really come up in my practice. This was just
    a clumsy attempt at producing a self-contained replication example. But thanks
    for pointing me to read.data(), it will probably prove useful in the future.

    Best,

    Vincent

     
  • Martyn Plummer

    Martyn Plummer - 2011-01-06

    That's true. One easy way out is to prepend your threshold vector with a large
    negative value, so that no value of y falls below thd. Then all your z values
    will start from 1 and you don't need to increment z to use it as an index.
    Alternatively, you can use z0 instead of z in your data.

    By the way, I am going to deprecate the read.data function. It will be
    replaced by two functions "read.bugsdata" and a "read.jagsdata" in a future
    version of the rjags package.

     
  • Anonymous - 2011-01-06

    You are going to find me terribly annoying, but I still don't understand the
    logic perfectly well. It seems to me that if I only use z0, I lose the
    connection between my model and the observed data. Assuming that z is a dummy
    0/1 such that the base category from dinterval() can be 0, what I need should
    probably look like this:

        for(j in 1:1){
            y[i] ~ dnorm(mu[i,j], psi[j])
            z[i,j] ~ dinterval(y[i], thd)
        }
        for(j in 2:34){
            z[i,j]~dnorm(mu[i,j],psi[j])
        }
        mu[i,1] <- xi[i,1]
        mu[i,2] <- lam[1] * xi[i,1]
    

    where the observed categorical variable z gets an interval distribution based
    on the normally distributed underlying continuous variable y. Then, mu is
    estimated as a parameter of the distribution for y, and mu is itself used to
    load the latent construct xi. This follows the logic set out in the second
    for-loop of the above code, where z are all continuous and normally
    distributed observed variables. But of course, this doesn't work:

    Error in node z[1,1]
    Observed node inconsistent with unobserved parents at initialization
    

    Again, sorry for the trouble, and I really appreciate the time you spend in
    answering my questions.

    Best,

    Vincent

     
  • Anonymous - 2011-01-06

    And again with the missing subscripts...

    for(j in 1:1){ 
    y[i] ~ dnorm(mu[i,j], psi[j]) 
    z[i,j] ~ dinterval(y[i], thd) 
    } 
    for(j in 2:34){ 
    z[i,j]~dnorm(mu[i,j],psi[j]) 
    } 
    mu[i,1] <- xi[i]  
    mu[i,2] <- lam[1] * xi[i] 
    ...
    
     
  • Martyn Plummer

    Martyn Plummer - 2011-01-07

    JAGS will try to generate reasonable initial values from the prior, but this
    will fail whenever you use dinterval to represent an a posteriori
    constraint. You need to provide starting values for (continuous, latent) y, to
    ensure that it is consistent with (discrete, observed) z and the threshold
    parameters.

    Missing indices seem to be caused by the fact that the "code" block is not
    parsed correctly by Sourceforge's forum software. In BBCode, the letter i in
    square brackets means "put the following text in italics". Inside a code block
    this should be ignored as mark-up and printed verbatim but it swallows them
    instead.

    I think we'll move the forums to phpBB. I've activated this as a hosted app.

     
  • Anonymous - 2011-05-08

    To the original poster: Did you ever find a solution to this issue? I am
    working on exactly the same example from Lee.

     

Log in to post a comment.