## Truncation, indexing and missing values document.SUBSCRIPTION_OPTIONS = { "thing": "thread", "subscribed": false, "url": "subscribe", "icon": { "css": "fa fa-envelope-o" } };

Help
Anonymous
2011-01-01
2012-09-01
• Anonymous - 2011-01-01

Hi everyone,

I am trying to estimate a latent variable model adapted from chapter 6 of Lee
(2007): Structural Equation Modeling, A Bayesian Approach. The original code
is for WinBUGS, I am trying to convert it to JAGS, but I am running into some
problems with truncation and missing data.

The problem is two-fold: (1) some manifest variables are ordered categorical,
and (2) some observations for these variables are missing. z are the observed
values for the ordinal variables (in the dat.txt file), and Lee uses the
underlying a continuous variable y with a series of thresholds for the ordinal
variables. The WinBUGS truncation trick that is used in the original code goes
something like this:

```for(j in 2:2){
y[i,1]~dnorm(mu[i,j],psi[j])I(thd[z[i,j]],thd[z[i,j]+1])
}
```

The bounds for truncation are chosen from the vector of threshold values thd.
We obviously want to pick different bounds depending on the observed values of
the ordinal variable, which is why we use the indexing trick that you see
above: thd[z+1].

As far as I can tell from the JAGS manual, this should be one of those cases
where it might be enough to substitute-in Ts in the truncation calls.
Unfortunately, this does not work; JAGS stops as soon as it meets a missing
value. Adding 1 to NA (the observed value of z for missing obserations) gives
a non-integer index value:

``` Error in node z[1,2]
Index expression evaluates to non-integer value
```

Although this model works in OpenBUGS with the usual truncation operator (I),
I was unable to determine exactly what is happening behind the scene with the
index. In any case, I need to run this thing on a cluster that has JAGS
installed and would really like to get this to work with T rather than I. Is
my approach all wrong, or just too simple? Is there a workaround that you
could suggest?

To facilitate diagnostic, here is a link to 4 files: dat.txt, init.txt,
mod.txt, mod.jags.txt & jags.R. mod.txt works out of the box with its data and
init files in OpenBUGS. In mod.jags.txt, I just substituted Is for Ts, and I
am calling JAGS via rjags from the jags.R script.

http://dl.dropbox.com/u/40028/jags-help.rar

Thank you so much for your great work on JAGS and rjags. Working with this
software has made my life a lot easier these past few months. Any help with my
problem would be MUCH appreciated.

Best,

Vincent

• Martyn Plummer - 2011-01-04

There are a number of different problems here. I'll get to the main one in a
minute, but the error with the non-integer index is caused by two things

1) You are mixing up integer and non-integer variables in the same data array
"z". In the model file, columns 2,3,9, and 17 of z are assumed to be integer
valued (and this is true in the data file). In the other columns, z is defined
as a normal random variable (and some non-integer data values are provided in
the data file). It is a bad idea to give the same name ("z") to variables that
are qualitatively different. It makes the code much harder to read.

2) You are reading in the data using the dget() function. This does not work
correctly: although the BUGS data format superficially resembles the S dput
format, it is actually different because matrices are stored by row instead of
by column. When you read in BUGS data using dput() then the elements of all
the arrays are mixed up. The rjags package provides a read.data() function
which correctly reads in BUGS data if you use the option format="bugs".

Underlying this is another problem. This is a case of interval censoring, not
truncation and therefore the T(,) construct will give you the wrong answer. If
"z" is the indicator variable, "y" is the underlying continuous variable and
"thd" is a vector of strictly increasing thresholds then BUGS has:

``` y~dnorm(mu,psi) I(thd[z], thd[z+1])
```

But in JAGS this is written as:

```y ~ dnorm(mu, psi)
z0 ~ dinterval(y, thd)
z <- z0 + 1
```

This is a better formulation because it describes how the data z are
generated
. A model in the BUGS language should always be able to generate a
new data set if you fix the top-level parameters. There is no way to do this
when using the WinBUGS I(,) construct for interval censoring.

• Anonymous - 2011-01-05

Hi again,

Your comments were very helpful and much appreciated. dinterval() will make my
life much easier indeed.

Here's a quick follow-up question if you can spare another second. You suggest
using this (where y: underlying continous variable, z: observed ordered
variable):

```y ~ dnorm(mu, psi)
z0 ~ dinterval(y, thd)
z <- z0 + 1
```

Could you please explain what that last line odes? I understand that
dinterval() sets to zero anything below the first cut-point, so it seems
appropriate to bring z0 back in line with the observed values of the data,
which are 1,2,3... But since z is observed in the data, using that third line
of code produces this error:

```z[1,1] is a logical node and cannot be observed
```

On points 1 & 2, you are absolutely right on all counts. In fact, I use rjags
directly with lists of data from R, and I don't read the dat or init lists
from file, so the problem doesn't really come up in my practice. This was just
a clumsy attempt at producing a self-contained replication example. But thanks
for pointing me to read.data(), it will probably prove useful in the future.

Best,

Vincent

• Martyn Plummer - 2011-01-06

That's true. One easy way out is to prepend your threshold vector with a large
negative value, so that no value of y falls below thd. Then all your z values
will start from 1 and you don't need to increment z to use it as an index.
Alternatively, you can use z0 instead of z in your data.

By the way, I am going to deprecate the read.data function. It will be
replaced by two functions "read.bugsdata" and a "read.jagsdata" in a future
version of the rjags package.

• Anonymous - 2011-01-06

You are going to find me terribly annoying, but I still don't understand the
logic perfectly well. It seems to me that if I only use z0, I lose the
connection between my model and the observed data. Assuming that z is a dummy
0/1 such that the base category from dinterval() can be 0, what I need should
probably look like this:

```    for(j in 1:1){
y[i] ~ dnorm(mu[i,j], psi[j])
z[i,j] ~ dinterval(y[i], thd)
}
for(j in 2:34){
z[i,j]~dnorm(mu[i,j],psi[j])
}
mu[i,1] <- xi[i,1]
mu[i,2] <- lam[1] * xi[i,1]
```

where the observed categorical variable z gets an interval distribution based
on the normally distributed underlying continuous variable y. Then, mu is
estimated as a parameter of the distribution for y, and mu is itself used to
load the latent construct xi. This follows the logic set out in the second
for-loop of the above code, where z are all continuous and normally
distributed observed variables. But of course, this doesn't work:

```Error in node z[1,1]
Observed node inconsistent with unobserved parents at initialization
```

Again, sorry for the trouble, and I really appreciate the time you spend in

Best,

Vincent

• Anonymous - 2011-01-06

And again with the missing subscripts...

```for(j in 1:1){
y[i] ~ dnorm(mu[i,j], psi[j])
z[i,j] ~ dinterval(y[i], thd)
}
for(j in 2:34){
z[i,j]~dnorm(mu[i,j],psi[j])
}
mu[i,1] <- xi[i]
mu[i,2] <- lam[1] * xi[i]
...
```

• Martyn Plummer - 2011-01-07

JAGS will try to generate reasonable initial values from the prior, but this
will fail whenever you use dinterval to represent an a posteriori
constraint. You need to provide starting values for (continuous, latent) y, to
ensure that it is consistent with (discrete, observed) z and the threshold
parameters.

Missing indices seem to be caused by the fact that the "code" block is not
parsed correctly by Sourceforge's forum software. In BBCode, the letter i in
square brackets means "put the following text in italics". Inside a code block
this should be ignored as mark-up and printed verbatim but it swallows them