Fitting a hierarchical linear model using a t-distribution instead of a normal

Shravan
2013-12-08
2013-12-09
  • Shravan
    2013-12-08

    Dear all,

I want to fit a varying intercepts model, one that in the lme4 package would be specified as:

    lmer(rt~so + (1|subj) + (1|item), data)

I generate the posterior predictive distribution to examine its properties relative to the data (e.g., the distribution of maximum values in the posterior predictive distribution compared to the maximum value in the data; this is what Gelman et al (2014) suggest one do to evaluate model fit).

One of the issues here is that the maximum values from the posterior predictive distribution are much smaller than (lie far below) the actual observed maximum. That is, the model doesn't really represent the data well. I thought that I should be able to change this line in my model:

    rt[j] ~ dnorm( mu[j], tau.e )

    to:

rt[j] ~ dt( mu[j], tau.e, 2 )

that is, a t-distribution with low degrees of freedom, so that it has fatter tails (this is what Gelman et al suggest in ch. 6 of BDA, 3rd edition). This would mean that my posterior predictive distribution would be generated as:

rt.pred[j] ~ dt( mu[j], tau.e, 2 )

My question is: does this approach make sense in principle? It makes sense for the speed-of-light data discussed in ch. 6 of Gelman et al (2014), where Newcomb's data have two very low values; by using a t-distribution I can have the model treat those extreme values as plausible ones. So I figured that the same approach should work for a hierarchical linear model. Is this correct?

    [Note that I have very high upper bounds for the priors for my various standard deviations, but this is not relevant to my question, as far as I can tell.]

    The model looks like this:

model
{
  # The model for each observational unit
  # (each row is a subject's data point)
  for( j in 1:N )
  {
    mu[j] <- beta[1] + beta[2] * ( so[j] ) + u[subj[j]] + w[item[j]]
    rt[j] ~ dnorm( mu[j], tau.e )
    ## I want to change the above line to:
    # rt[j] ~ dt( mu[j], tau.e, 2 )

    ## generate predicted values:
    rt.pred[j] ~ dnorm( mu[j], tau.e )
    ## I want to change the above line to:
    # rt.pred[j] ~ dt( mu[j], tau.e, 2 )
  }

  # Random effects for each subject
  for( i in 1:I )
  {
    u[i] ~ dnorm( 0, tau.u )
  }

  # Random effects for each item
  for( k in 1:K )
  {
    w[k] ~ dnorm( 0, tau.w )
  }

  # Uninformative priors:

  # Fixed effect intercept and slope
  beta[1] ~ dnorm( 0.0, 1.0E-5 )
  beta[2] ~ dnorm( 0.0, 1.0E-5 )

  # Residual (within-person) variance
  tau.e <- pow( sigma.e, -2 )
  sigma.e ~ dunif( 0, 2000 )

  # Between-person variation
  tau.u <- pow( sigma.u, -2 )
  sigma.u ~ dunif( 0, 500 )

  # Between-item variation
  tau.w <- pow( sigma.w, -2 )
  sigma.w ~ dunif( 0, 500 )

  ## track predicted values:
  smallest <- min( rt.pred )
  mn       <- mean( rt.pred )
  largest  <- max( rt.pred )
}
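
To reduce this check to a single number, I suppose I could also monitor an indicator inside the model block (just a sketch; p.max is a name I am introducing here, and its posterior mean would be the proportion of replicated maxima that reach the observed maximum):

## add inside the model block, after largest is defined:
## posterior mean of p.max is Pr( max(rt.pred) >= max(rt) ),
## i.e., a posterior predictive p-value for the maximum
p.max <- step( largest - max(rt) )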

    many thanks,

    Shravan

     
  • Martyn Plummer
    2013-12-09

    That seems fine. I would just add two comments:

1) I would not use a t-distribution on 2 degrees of freedom: it does not have a finite variance. Try a t-distribution on 4 degrees of freedom first.
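
Concretely, the relevant lines would then be something like this (just a sketch of the two sampling statements, with the degrees of freedom fixed at 4):

## likelihood and replicated data on 4 degrees of freedom
rt[j] ~ dt( mu[j], tau.e, 4 )
rt.pred[j] ~ dt( mu[j], tau.e, 4 )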

2) When generating replicate data sets for posterior predictive checks, you can replicate at different levels. You have replicated the error term, but you might want to think about replicating the random effects as well. It depends on what deviations from the model you are looking for, but if your replicated data look under-dispersed, then you might want to consider it.
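
For example, to replicate at the subject and item level as well as at the observation level, you could draw fresh random effects for the replicated data set (a sketch using the names from your model; u.rep, w.rep, mu.rep and rt.rep are new nodes):

# fresh random effects for the replicated data set
for( i in 1:I ) { u.rep[i] ~ dnorm( 0, tau.u ) }
for( k in 1:K ) { w.rep[k] ~ dnorm( 0, tau.w ) }

# replicated data using the fresh random effects
for( j in 1:N )
{
  mu.rep[j] <- beta[1] + beta[2] * ( so[j] ) + u.rep[subj[j]] + w.rep[item[j]]
  rt.rep[j] ~ dt( mu.rep[j], tau.e, 4 )
}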