Hierarchical model with undefined ancestor/directed cycle

K Walter
  • K Walter
    K Walter

    I am having trouble with the following model and continue to get the error: Unable to resolve node p[1,1]. This may be due to an undefined ancestor node or a directed cycle in the graph.

    I think that I have give priors to all of the ancestor nodes and am worried there is a directed cycle, but I am not sure where. Any advice would be much appreciated, I am new to JAGS.

    Thank you!

    #y[i,t]=observed presence/absence of disease for each grid i, time t; 
    #N[i,t]=population size of grid cell i, time t
    #A=adjacency matrix (nn x nn)
    #nn is the number of gridcells (973)
    #tt is the length of the timeseries (29)
    #q[i,t] is the latent human risk of infection in grid cell i, time t-1 **the q matrix index t represents time in the year t-1 
    #index = the row number for the first year of the time series for each township
    #tseries = length of timeseries for township (total number of years of reporting for gridcell)
    model { 
    for (i in 1:(nn)){
            y[i,index[i]] ~ dbin(p[i,index[i]],1) 
            p.bound[i,index[i]] <- max(0,min(1,p[i,index[i]]))
            cloglog(p[i,index[i]]) <- log(q[i,index[i]+1]) + log(N[i,index[i]])
            q[i,index[i]+1]<- inprod(q[,index[i]], A[,i]) * dif             
            for (t in 1:(tseries[i]-1)){
            y[i,index[i]+t] ~ dbin(p[i,index[i]+t],1) 
            p.bound[i,index[i]+t] <- max(0,min(1,p[i,index[i]+t]))
            cloglog(p[i,index[i]+t]) <- log(q[i,index[i]+t+1]) + log(N[i,index[i]+t])
            q[i,index[i]+t+1]<- inprod(q[,index[i]+t], A[,i]) * dif             
    # PRIORS
    dif ~ dgamma(0.001,0.001) # diffusion parameter
    for (k in 1:nn)  {for (f in 1:index[k]) {q[k,f]~dbeta(1,100)} } 
  • I'm failing to see where 'p' is defined in this model which might explain your problem.

  • K Walter
    K Walter

    p is defined here: cloglog(p[i,index[i]]) and here: cloglog(p[i,index[i]+t]) and represents the probability of detecting at least one case in a location at a given time step.

  • Martyn Plummer
    Martyn Plummer

    I have drawn a graph by hand of the nodes that p[1,1] depends on. I assume that index[1] = 1. In this case there are only three ways to generate the error:
    1) N is not supplied as data, or N[1,1] is missing
    2) A is not supplied as data or some elements of A[,1] are missing
    3) index[k] = 0 for some k. This would mean q[k,1] was never defined

    If this advice does not help then you need to give a working example including data and, if necessary, intial values. Fake data will do, of course. We just need something that will reproduce the problem.

  • K Walter
    K Walter

    Thank you very much, yes there was definitely an issue with the A matrix.
    I am now able to initialize my model and run it on a small sample of my data .
    However, when I try to run the model on the full data, Graph Size according to JAGS is 1171432 and R crashes.

    I was wondering if you this is reasonable size for model fitting on a computing cluster or if I should simplify the structure? I attached my model structure as a text file and my data and initial values as .RData files. Please let me know if you need more information or data in another format.

    Thank you!

  • We've run models with over a million nodes but on machines that have ~ 100 GB RAM. Try running subsets of the data to see what the memory requirements are... that said I don't see why such a large model is generated by your data/model.

    Last edit: Krzysztof Sakrejda 2014-05-28
  • K Walter
    K Walter

    Okay, I have run on a small subset of data, but when I expand the subset, I get the error:
    Error in setParameters(init.values[[i]], i) : Error in node q[81,8]
    Attempt to set value of non-variable node

    The corresponding data y[81,], in this case, is a vector of 0s. I was wondering if this means there is too little informative data to estimate this node? Do you have any suggestions for how to deal with this? Thank you,

    • I don't think so. The nodes in the matrix q are calculated directly from other variables so you should not be setting initial values for them (set initial values instead for stochastic parents of q). That's why you are getting the message.

  • K Walter
    K Walter

    Thank you very much, yes, I removed the initial values for q. I am still having trouble figuring out exactly how to specify my model for the presence/absence of disease cases across space and over time.

    My dataset is incomplete--it includes the presence/absence of cases, y, from different states which begin reporting in different years-- and I am trying to use these data to model spread of disease. When I run the model I get the error:

    Compilation error on line 16.
    Unable to resolve node p[1,2]
    This may be due to an undefined ancestor node or a directed cycle in the graph

    I think this may be due to the fact that it is not possible to estimate underlying risk, q, for years before which there are observed data, y. I tried defining a matrix, q, setting values for underlying risk, q, for years in which we have no data, and filling in the rest of the q matrix with NA, allowing these values to be estimated by Gibbs sampling. I was wondering if this is a correct approach to dealing with incomplete data?

    q=matrix(NA, nrow=nn, ncol=30)
    for (k in 1:nn) {for (f in 1:index[k]) {q[k,f]=rbeta(1,1,100000)} }

    I attached the model text file; any input would be greatly appreciated as I am new to Bayesian modeling.

    Thank you again for your help!

  • Martyn Plummer
    Martyn Plummer

    You should augment your model file by putting a prior distribution on q for the first year of observation in each state. A diffuse normal prior would be a reasonable starting point:

    for (i in 1:nn) {
     q[i, index[i]] ~ dnorm(0, 1.0E-3)
  • K Walter
    K Walter

    Thank you for all the above help! I am able to run the model but am having trouble because of the large graph size:1191417 and slow run time (30,000 iterations of the most simple model takes >60 hours on a cluster). This isn't a very complicated model so I am not sure why the graph size is so big--is there a way to speed sampling or to reparameterize/re-write my model so that it is more efficient? Any suggestions or advice would be much appreciated!

    I attached the model text file and the data.

    Last edit: K Walter 2014-08-06
  • K Walter
    K Walter

    I'm sorry, I did not include initial values, they are posted below. Any advice on how to deal with this large graph size would be very helpful.
    Thank you,
    jags.inits=list(list(dif=0.05, pers=0.999),
    list(dif=0.01, pers=0.9),
    list(dif=0.10, pers=0.8))

  • Martyn Plummer
    Martyn Plummer

    I cannot read the data file at the moment as I am traveling without my usual toolkit. Looking at the model file, however, I am fairly sure that the large matrix A is the problem. This is described as an adjacency matrix, so it should be sparse, and a more efficient data representation is required.

  • Martyn Plummer
    Martyn Plummer

    So I have been working on vector indexing in JAGS, which is what we need to work efficiently with sparse matrices. In your example, we can convert the adjacency matrix A to compressed column format in R

    A <- as(data$A, "dgCMatrix")
    data$Ap <- A@p + 1
    data$Ai <- A@i + 1

    There is no need to store A@x because all non-zero values of A have value 1.

    Then in your model, the expression

    inprod(q[,index[i]+t], A[,i])



    You need to understand compressed column format to see why this works. The model compiles and runs, but I can't vouch for the correctness, as this is all completely new in the development branch.

  • K Walter
    K Walter

    Fantastic, will try this vector indexing. Thank you!