mark pahuta
2013-02-01
Hi,
I have missing data for my outcome, which is an observed state. My states are well, ill, dead: X = {1, 2, 3}. When a value is missing, I know that X is either 1 or 2, so I don't want to just let JAGS pull values from the posterior. I have written a data block that imputes the values, but I just figured out that it imputes only once per chain. Is there any way to have new values for my outcome drawn randomly in each iteration?
My data block is below. The full model can be downloaded from
https://docs.google.com/folder/d/0B2_rKFnvrjMAVThLRkxFUW9ORlk/edit?usp=sharing
data{
  impstates[1] <- 0.5
  impstates[2] <- 0.5
  for (a in 1:Npat) {
    for (b in 1:nobs[a]) {
      So_imp[a,b] ~ dcat(impstates)
      So_new[a,b] <- ifelse(censind[a,b]==1, So_imp[a,b], So[a,b])
    }
  }
  for (cc in 1:Npat) {
    So_corr[cc,1] <- So_new[cc,1]
  }
  for (c in 1:Npat) {
    for (d in 2:nobs[c]) {
      So_corr[c,d] <- ifelse(censind[c,d]==1 && So_corr[c,d-1]==2, 2, So_new[c,d])
      Sp[c,d] <- So_corr[c,d-1]
    }
  }
}
mark pahuta
2013-02-02
I think others have come across this problem. http://sourceforge.net/p/mcmc-jags/discussion/610037/thread/71466030#56f5
I don't understand how to go about implementing the loop idea from the above link.
Has anyone attempted this?
Chris Jackson
2013-02-04
I don't understand why you don't want to generate values from the posterior. If you set the prior probability of 3 to zero, then the posterior probability will be zero as well, so your constraint will be satisfied. Though I realise you might then need a different model for the non-missing and missing observations, which may be awkward to code.
Martyn Plummer
2013-02-04
Your missing data are interval-censored. This can be represented with the dinterval distribution.
In the data block:
c[a,b,1] <- ifelse(censind[a,b], 1, So[a,b]) - 0.5
c[a,b,2] <- ifelse(censind[a,b], 2, So[a,b]) + 0.5
interval[a,b] <- 1
In the model block
interval[a,b] ~ dinterval(So[a,b], c[a,b,])
Also, you should not have multiple versions of your outcome variable So[a,b]. Just keep the one and encode missing values as NA in the input data. The missing observations will be imputed from the model.
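The effect of the cut-point construction above can be sketched numerically. The following is a rough Python illustration (not JAGS) of what the data-block cut-points and the dinterval() likelihood do; the helper names cutpoints and dinterval_lik are invented for this sketch:

```python
def cutpoints(censind, So):
    """Mimic the data block:
    c[1] = ifelse(censind, 1, So) - 0.5
    c[2] = ifelse(censind, 2, So) + 0.5
    Missing entries get the interval (0.5, 2.5); observed ones get
    a narrow interval around the observed state."""
    lo = (1 if censind else So) - 0.5
    hi = (2 if censind else So) + 0.5
    return lo, hi

def dinterval_lik(x, lo, hi):
    """With interval[a,b] fixed to 1, dinterval(So, c) contributes
    likelihood 1 iff lo < x <= hi, and 0 otherwise."""
    return 1.0 if lo < x <= hi else 0.0

# Missing observation: states 1 and 2 are allowed, 3 is ruled out.
lo, hi = cutpoints(censind=True, So=None)   # (0.5, 2.5)
assert [dinterval_lik(s, lo, hi) for s in (1, 2, 3)] == [1.0, 1.0, 0.0]

# Observed state 2: only the observed value is consistent.
lo, hi = cutpoints(censind=False, So=2)     # (1.5, 2.5)
assert [dinterval_lik(s, lo, hi) for s in (1, 2, 3)] == [0.0, 1.0, 0.0]
```

This is why no separate imputation machinery is needed: the interval alone zeroes out state 3 for censored entries.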
I looked at your code and there is another issue with the use of dmstate. In an illness-death model, the death times are normally known exactly, so the use of the dmstate distribution (for interval-censored transitions) is inappropriate. Chris's msm package for R handles this very well. I think you should try to work out how to fit the model with msm, and this will focus your thinking before you come back (if necessary) to JAGS.
mark pahuta
2013-02-04
Hi,
Thank you for your help. I apologize upfront: I'm having trouble understanding how to implement the interval idea Martyn suggests. I am aware of the issue regarding interval censoring of death times; my data are interval-censored for death.
I want to impute So[,] when it is missing. My concern about just letting the posterior fill in these values is that state 3 (the absorbing death state) could potentially be used. As Chris mentioned, I could make the prior probability of death 0, but I'm not sure how to do this aside from using a different Q for censored and non-censored observations. And wouldn't that distort the likelihood?
Here is the code for what I am trying now:
model{
  for (i in 1:Npat) {
    for (j in 2:nobs[i]) {
      So[i,j] ~ dmstate(So[i,j-1], tm[i,j], Q[,,stud_id[i],(surg_id[i]+1)])
    }
  }
  for (a in 1:Npat) {
    So[a,1] ~ dcat(impstates[1,])
    for (b in 2:nobs[a]) {
      censtyp[a,b] ~ dinterval(So[a,b-1], brkpt)
      So[a,b] ~ dcat(impstates[censtyp[a,b],])
    }
  }
  brkpt[1] <- 0.9
  brkpt[2] <- 1.1
  impstates[1,1] <- 0.5
  impstates[1,2] <- 0.5
  impstates[1,3] <- 0
  impstates[2,1] <- 0
  impstates[2,2] <- 1
  impstates[2,3] <- 0
}
1) If the first So[,] is missing, I just make a random draw of states 1 and 2
2) I then loop through the remaining So[,] values. I first create a censtyp variable: if So is 1 then censtyp = 1, else it is 2. censtyp is used to specify the probabilities for dcat.
But I run into the problem that I can't redefine nodes. I guess I can't do this for observed variables? I hope this code makes my thoughts more explicit. I'd appreciate any help.
Chris Jackson
2013-02-05
Ignore my previous comment - Martyn's right about this being a job for dinterval(). As in his code above, the cut-points should be different for each observation. In your notation, you'd write "censtyp" instead of "interval" and "brkpt" instead of "c", but otherwise you can use Martyn's code. You should remove the dcat() statements, because the dmstate() statement defines how each So[a,b] is generated stochastically. In other words, dmstate defines a likelihood for the observed So[a,b], and a prior for the missing So[a,b].
The tricky thing to understand about dinterval() is that it doesn't really define a prior for the missing data, it just represents that some of the observations are coarsened. If the dmstate() model generates a value for So[a,b] within these cutpoints, then the likelihood contribution will be 1. If the generated value is inconsistent with the cutpoints, then the likelihood is 0 and the value of So[a,b] will be implicitly rejected.
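The "implicit rejection" described above can be mimicked with a toy Python sketch. The helper names draw_state and imputed_state are invented here, and the real JAGS samplers are more sophisticated than plain rejection; this only illustrates why a censored entry can never end up as state 3:

```python
import random

def draw_state(probs):
    """Sample a 1-based state index from categorical probabilities,
    standing in for a draw from the dmstate/transition model."""
    r, cum = random.random(), 0.0
    for s, p in enumerate(probs, start=1):
        cum += p
        if r < cum:
            return s
    return len(probs)

def imputed_state(probs, lo=0.5, hi=2.5):
    """Keep proposing from the model until the value falls inside the
    dinterval cut-points (likelihood 1 inside, 0 outside)."""
    while True:
        s = draw_state(probs)
        if lo < s <= hi:   # dinterval likelihood contribution is 1 here
            return s

# Even if the transition model puts mass on state 3, it never survives.
samples = [imputed_state([0.2, 0.5, 0.3]) for _ in range(1000)]
assert set(samples) <= {1, 2}
```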
mark pahuta
2013-02-05
Hi I just want to make sure I understand the use, and sequence of events, of dinterval correctly:
1. If So[,] is missing, a value will be drawn from the posterior, which could be {1, 2, 3}.
2. Then all observed and imputed So[,] values are passed to dinterval.
3. If the imputed value is consistent with the brkpts in dinterval (i.e. not 3 in this case), then the imputed So[,] contributes to the likelihood.
4. All observed So[,] always contribute to the likelihood (because brkpts were manually defined to only allow the observed So[,]).
The main source of confusion for me has been that dmstate also depends on the previous So[,]. Please correct me if I am wrong: NA/imputed values cannot be used as the previous state?
Therefore, what I have done is place a prior on Sp[,], which defines the previous state. This prior has built-in logic: if the previous imputed value was Sp[i,j-1] = 2, then the next imputed value must be 2 (this model is progressive, with no reversible transitions).
Using this approach, are the imputed values for So[i,j] guaranteed to never be 1 when Sp[i,j] = 2, given that the transition intensity matrix does not allow reversible transitions?
model{
  for (i in 1:Npat) {
    for (j in 2:nobs[i]) {
      censtyp[i,j] ~ dinterval(So[i,j], brkpt[i,j,])
      So[i,j] ~ dmstate(Sp[i,j], tm[i,j], Q[,,stud_id[i],(surg_id[i])])
    }
  }
  for (i in 1:Npat) {
    Sp[i,1] ~ dcat(impstates[1,,stud_id[i],1])
    for (j in 2:nobs[i]) {
      Sp[i,j] ~ dcat(impstates[Sp[i,j-1],,stud_id[i],j])
    }
  }
}
Chris Jackson
2013-02-05
No - the value drawn from the posterior can only be 1 or 2. I think you're missing a bit of Bayesian graphical modelling theory. Informally, the posterior of a node is proportional to the prior multiplied by all likelihood contributions. The likelihood contribution from dinterval() is zero for a value of 3, so the value drawn from the posterior can never be 3.
So[i,j] also contributes to the likelihood of the next state. If your model does not allow reversible transitions, then the likelihood for a state of 2 followed by a state of 1 will be zero. So if the previous state is 2, then the value drawn from the posterior of the next state will never be 1.
Therefore there's no need to use these extra dcat() priors - that's wrongly putting the data in twice. Maybe it would help with understanding to draw (bits of) this as a directed acyclic graph?
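The argument that the previous and next states already constrain the imputation can be checked numerically. Below is a toy Python sketch: the transition matrix P is illustrative only (not taken from the model), and full_conditional is an invented helper that multiplies the three contributions to the missing state's posterior.

```python
import numpy as np

# Illustrative progressive transition matrix: no recovery from ill (2),
# dead (3) is absorbing.  Rows = current state, columns = next state.
P = np.array([[0.7, 0.2, 0.1],
              [0.0, 0.8, 0.2],
              [0.0, 0.0, 1.0]])

def full_conditional(prev, nxt, allowed={1, 2}):
    """Unnormalised posterior over a missing state s (1-based):
    p(s | prev) * p(nxt | s) * dinterval constraint, then normalised."""
    w = np.array([P[prev - 1, s] * P[s, nxt - 1] *
                  (1.0 if s + 1 in allowed else 0.0)
                  for s in range(3)])
    return w / w.sum()

# Previous state 2: state 1 gets zero mass from the transition model,
# state 3 gets zero mass from the interval constraint.
w = full_conditional(prev=2, nxt=2)
assert w[0] == 0.0 and w[2] == 0.0
```

So the extra dcat() priors are indeed redundant: both constraints already enter through the likelihood.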
mark pahuta
2013-02-05
OK, I think I am finally getting it. I think my problem was that some of the So[i,1] were censored, and these are needed as the previous state when drawing So[i,2], ..., So[i,nobs[i]] from dmstate. So I placed a prior on So[i,1], and the model seems to work.
for (i in 1:Npat) {
  So[i,1] ~ dcat(impstates[1,,stud_id[i]])
  for (j in 2:nobs[i]) {
    censtyp[i,j] ~ dinterval(So[i,j], brkpt[i,j,])
    So[i,j] ~ dmstate(So[i,j-1], tm[i,j], Q[,,stud_id[i],(surg_id[i])])
  }
}
for (k in 1:3) {
  impstates[1,1,k] ~ dbeta(2,5)
  impstates[1,2,k] <- 1 - impstates[1,1,k]
}
Chris Jackson
2013-02-05
That looks fine, though you might not need two levels to the prior on the first state. That's OK if you want to learn about the probability that the first state is 1, and there's no harm in it, but it's not necessary if you just want to impute values. The imputations won't just come from the prior: because there's a likelihood contribution from the second state, values which are more plausible given the data will be imputed more often.