Rotational Invariance in Factor Models

2013-05-03
2013-07-25
  • Georg Hosoya
    2013-05-03

    Hi!

    I am new to this forum and my background is mostly in psychological methods. At the moment, one of my side projects is to explore the MCMC framework for estimating latent variable models, and I have stumbled over the problem of rotational invariance described in this article by Erosheva and Curtis here.

    Here is some code showing a simple factor model with one continuous latent node eta (e.g. a trait) 'influencing' the manifest indicators y (e.g. responses to items).

    model{
      ## Likelihood
      # i: Persons
      # j: Items

      for(i in 1:n)
      {
        for(j in 1:k)
        {
          mu[i,j] <- beta[j] + lambda[j]*eta[i]
          y[i,j] ~ dnorm(mu[i,j], tau.y[j])
        }
      }

      ## Priors
      for(j in 1:k)
      {
        # Item residuals
        tau.y[j] <- pow(sigma.y[j], -2)
        sigma.y[j] ~ dunif(0, 100)
        # Intercepts
        beta[j] ~ dnorm(0, 1.0E-3)
        # Loadings
        lambda[j] ~ dnorm(0, 1.0E-3)
      }

      # Factor values
      for(i in 1:n)
      {
        eta[i] ~ dnorm(0, 1)
      }
    }
    

    In this case the likelihood surface is bimodal: flipping the signs of all lambdas and all etas leaves the likelihood unchanged, so there are two equivalent solutions for the lambdas and etas, reflected around zero.
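
    For illustration, here is a quick toy check in R (values chosen arbitrarily, not tied to the model above) that flipping the signs of all loadings and factor scores leaves the likelihood unchanged:

    # Toy check of the reflection invariance
    set.seed(1)
    eta    <- rnorm(10)          # latent scores
    lambda <- c(0.8, -0.5, 1.2)  # loadings with mixed signs
    y      <- outer(eta, lambda) + matrix(rnorm(30, sd = 0.1), 10, 3)

    loglik <- function(lam, et) sum(dnorm(y, outer(et, lam), sd = 0.1, log = TRUE))

    loglik(lambda, eta)    # one mode
    loglik(-lambda, -eta)  # the reflected mode: identical log-likelihood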

    I am just curious: does anyone know of a solution that does not involve post-processing the Markov chains as described in the article above, maybe by introducing another set of constraints?

    It would be really cool if MCMC could also be applied to more complex latent variable models.

    If anyone is interested, I can also provide some simple R code to simulate data from a known distribution to check for consistency.
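
    For example, a minimal sketch of such a simulation might look like this (all parameter values are arbitrary):

    # Simulate data from a known one-factor model
    set.seed(42)
    n <- 500; k <- 5
    beta.true   <- rnorm(k)            # item intercepts
    lambda.true <- runif(k, 0.5, 1.5)  # item loadings
    sigma.true  <- rep(0.5, k)         # residual SDs
    eta.true    <- rnorm(n)            # latent trait scores

    y <- matrix(NA, n, k)
    for (j in 1:k) {
      y[, j] <- rnorm(n, beta.true[j] + lambda.true[j]*eta.true, sigma.true[j])
    }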

    Thanks and best,

    Georg

    Last edit: Georg Hosoya 2013-05-03
  • Georg Hosoya
    2013-07-09

    Dear all,

    Thanks to this forum, I had an interesting e-mail exchange with Vincent Arel-Bundock from the University of Michigan who, together with one of his co-authors, Walter R. Mebane, greatly helped to solve the problem.

    The trick seems to be fixing the loading of one item to one and its intercept to zero, while the variance of the latent trait distribution is freely estimated. Maybe the mean should also be estimated.

    I tried it out with the Holzinger & Swineford dataset; here is the R code:

    # 1_fa.R
    # Experimental

    library(R2jags)
    library(MBESS)
    data(HS.data)

    # Select an item subset of HS.data
    y <- as.matrix(HS.data[, 26:30])
    n <- dim(y)[1]
    k <- dim(y)[2]

    data <- list("y", "n", "k")

    parameters <- c("beta", "sigma.y", "lambda", "eta", "sigma.eta", "eta.mean")

    output <- jags.parallel(data, inits = NULL, parameters,
                            model.file = "1_fab.txt", n.chains = 2,
                            n.iter = 2000, n.burnin = 1000)

    attach.jags(output)
    plot(output)

    # Posterior means of eta should correlate highly with the person mean scores
    cor(apply(eta, 2, mean), apply(y, 1, mean))
    

    This is the JAGS script:

    # Experimental
    # 1_fab.txt

    model{
      ## Likelihood
      # i: Persons
      # j: Items

      for(i in 1:n)
      {
        for(j in 1:k)
        {
          mu[i,j] <- beta[j] + lambda[j]*eta[i]
          y[i,j] ~ dnorm(mu[i,j], tau.y[j])
        }
      }

      ## Constraints
      # Loading of the first item constrained to one and
      # intercept of the first item constrained to zero

      lambda[1] <- 1
      beta[1] <- 0

      # Priors for intercepts and loadings
      for(j in 2:k)
      {
        # Intercepts
        beta[j] ~ dnorm(0, 1.0E-3)
        # Loadings
        lambda[j] ~ dnorm(0, 1.0E-3)
      }

      # Priors for item residuals
      for(j in 1:k)
      {
        tau.y[j] <- pow(sigma.y[j], -2)
        sigma.y[j] ~ dunif(0, 100)
      }

      # Priors for factor values
      for(i in 1:n)
      {
        eta[i] ~ dnorm(eta.mean, tau.eta)
      }

      # Hyperpriors for the latent score distribution
      eta.mean ~ dnorm(0, 1.0E-4)
      tau.eta <- pow(sigma.eta, -2)
      sigma.eta ~ dunif(0, 100)
    }
    

    Big thanks! If I have the time, I will try out multiple latent variables with a scaled inverse Wishart prior on the variance-covariance matrix of the factor scores.

    I am also curious whether this solution transfers to the Birnbaum model.
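
    Untested, but I imagine the analogous anchoring in a 2PL (Birnbaum) model could look roughly like this, with the discrimination of the first item fixed to one and its difficulty to zero (just a sketch, not something I have run):

    model{
      for(i in 1:n)
      {
        for(j in 1:k)
        {
          logit(p[i,j]) <- alpha[j]*eta[i] - delta[j]
          y[i,j] ~ dbern(p[i,j])
        }
      }

      # Anchor the scale on the first item
      alpha[1] <- 1
      delta[1] <- 0

      for(j in 2:k)
      {
        alpha[j] ~ dnorm(0, 1.0E-3)
        delta[j] ~ dnorm(0, 1.0E-3)
      }

      for(i in 1:n)
      {
        eta[i] ~ dnorm(eta.mean, tau.eta)
      }

      eta.mean ~ dnorm(0, 1.0E-4)
      tau.eta <- pow(sigma.eta, -2)
      sigma.eta ~ dunif(0, 100)
    }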

    Georg

    Last edit: Georg Hosoya 2013-07-09
  • Martyn Plummer
    2013-07-10

    I'm glad someone was able to help you. I stayed silent because I am not too familiar with factor analysis models.

  • Georg Hosoya
    2013-07-25

    Thanks Martyn! Meanwhile I have fitted a three-factor model, with a scaled inverse Wishart prior (after Gelman & Hill, 2007) on the variance-covariance matrix of the factor values, to the Holzinger-Swineford dataset, which is also used in the tutorial for the R package lavaan by Yves Rosseel et al.

    Using the traditional practice of setting one loading per factor to one for identification seems to work nicely in this case and gives results comparable to lavaan's.

    In the following code I first fit the model with lavaan and then, in a second step, fit the same model with JAGS for comparison. I think the scripts are pretty much self-explanatory.

    However, keep in mind that this is somewhat experimental.

    Here is the R script:

    ## holzinger.R
    # Experimental!

    library(lavaan)

    # From the lavaan tutorial
    HS.model <- ' visual  =~ x1 + x2 + x3
                  textual =~ x4 + x5 + x6
                  speed   =~ x7 + x8 + x9 '

    fit <- cfa(HS.model, data = HolzingerSwineford1939, meanstructure = TRUE)

    summary(fit, standardized = TRUE)

    library(R2jags)
    data <- HolzingerSwineford1939

    ## Transform data to long format ----
    #  the complicated way

    # Dependent variable
    y <- c(data$x1, data$x2, data$x3, data$x4, data$x5, data$x6,
           data$x7, data$x8, data$x9)

    # Item identifier
    item <- c(rep(1, 301), rep(2, 301), rep(3, 301), rep(4, 301), rep(5, 301),
              rep(6, 301), rep(7, 301), rep(8, 301), rep(9, 301))

    # Construct identifier (1: visual, 2: textual, 3: speed)
    const <- c(rep(1, 3*301), rep(2, 3*301), rep(3, 3*301))

    # Person id
    id <- rep(1:301, 9)
    ## ---------------------------------

    # Fitting the model with JAGS
    W <- diag(3)  # prior scale matrix for the Wishart

    dat <- list("y", "item", "const", "id", "W")
    parameters <- c("beta", "sigma.y", "lam", "sigma.eta1", "sigma.eta2",
                    "sigma.eta3", "rho_eta13", "rho_eta12", "rho_eta23", "eta")

    output <- jags.parallel(dat, inits = NULL, parameters, model.file = "3fa.txt",
                            n.chains = 2, n.iter = 3000, n.burnin = 2000)
    plot(output)
    
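
    As an aside, the long-format transform could be written more compactly; the following should be equivalent to the rep() chains above:

    # Compact version of the long-format transform
    y     <- unlist(data[, paste0("x", 1:9)], use.names = FALSE)
    item  <- rep(1:9, each = 301)
    const <- rep(1:3, each = 3*301)
    id    <- rep(1:301, times = 9)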

    Here is the JAGS script:

    ## 3fa.txt
    # Experimental!
    model{

      # Likelihood
      for(i in 1:2709) # Loop over observations
      {
        # Linear decomposition over the three constructs; given the
        # zero constraints on lam, only one term is non-zero per item.
        # This could probably be done more elegantly with matrix algebra.
        mu[i] <- beta[item[i]] +
                 lam[1, item[i]]*eta[1, id[i]] +
                 lam[2, item[i]]*eta[2, id[i]] +
                 lam[3, item[i]]*eta[3, id[i]]

        # Distributional assumptions for the observed variables
        y[i] ~ dnorm(mu[i], tau.y[item[i]])
      }

      # Priors for the items
      for(i in 1:9)
      {
        beta[i] ~ dnorm(0, 1.0E-3)
        tau.y[i] <- pow(sigma.y[i], -2)
        sigma.y[i] ~ dunif(0, 100)
      }
    
      # Matrix of loadings [construct, item]
      # Constraints and priors
      lam[1,1] <- 1
      lam[2,1] <- 0
      lam[3,1] <- 0
      lam[1,2] ~ dnorm(0, 1.0E-3)
      lam[2,2] <- 0
      lam[3,2] <- 0
      lam[1,3] ~ dnorm(0, 1.0E-3)
      lam[2,3] <- 0
      lam[3,3] <- 0
      lam[1,4] <- 0
      lam[2,4] <- 1
      lam[3,4] <- 0
      lam[1,5] <- 0
      lam[2,5] ~ dnorm(0, 1.0E-3)
      lam[3,5] <- 0
      lam[1,6] <- 0
      lam[2,6] ~ dnorm(0, 1.0E-3)
      lam[3,6] <- 0
      lam[1,7] <- 0
      lam[2,7] <- 0
      lam[3,7] <- 1
      lam[1,8] <- 0
      lam[2,8] <- 0
      lam[3,8] ~ dnorm(0, 1.0E-3)
      lam[1,9] <- 0
      lam[2,9] <- 0
      lam[3,9] ~ dnorm(0, 1.0E-3)
    
      # Scaled inverse Wishart prior for the
      # variance-covariance matrix of the factor values,
      # after Gelman and Hill (2007)

      for(i in 1:301)
      {
        eta[1,i] <- xi.eta1*eta.raw[i,1]
        eta[2,i] <- xi.eta2*eta.raw[i,2]
        eta[3,i] <- xi.eta3*eta.raw[i,3]
        eta.raw[i,1:3] ~ dmnorm(eta.raw.hat[i,], tau.eta.raw[,])
        eta.raw.hat[i,1] <- mu.eta1.raw
        eta.raw.hat[i,2] <- mu.eta2.raw
        eta.raw.hat[i,3] <- mu.eta3.raw
      }
      mu.eta1 <- xi.eta1*mu.eta1.raw
      mu.eta2 <- xi.eta2*mu.eta2.raw
      mu.eta3 <- xi.eta3*mu.eta3.raw
      mu.eta1.raw <- 0
      mu.eta2.raw <- 0
      mu.eta3.raw <- 0
      xi.eta1 ~ dunif(0, 500)
      xi.eta2 ~ dunif(0, 500)
      xi.eta3 ~ dunif(0, 500)
      tau.eta.raw[1:3,1:3] ~ dwish(W[,], df)
      df <- 4  # degrees of freedom = number of factors + 1
      sigma.eta.raw[1:3,1:3] <- inverse(tau.eta.raw[,])
      sigma.eta1 <- xi.eta1*sqrt(sigma.eta.raw[1,1])
      sigma.eta2 <- xi.eta2*sqrt(sigma.eta.raw[2,2])
      sigma.eta3 <- xi.eta3*sqrt(sigma.eta.raw[3,3])
      rho_eta12 <- sigma.eta.raw[1,2]/sqrt(sigma.eta.raw[1,1]*sigma.eta.raw[2,2])
      rho_eta13 <- sigma.eta.raw[1,3]/sqrt(sigma.eta.raw[1,1]*sigma.eta.raw[3,3])
      rho_eta23 <- sigma.eta.raw[2,3]/sqrt(sigma.eta.raw[2,2]*sigma.eta.raw[3,3])
    }
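
    By the way, regarding the comment in the likelihood block: JAGS's inprod() should allow collapsing the three-term sum into a single line, something like this (untested sketch):

      # Likelihood using inprod() over the loading and factor-score matrices
      for(i in 1:2709)
      {
        mu[i] <- beta[item[i]] + inprod(lam[, item[i]], eta[, id[i]])
        y[i] ~ dnorm(mu[i], tau.y[item[i]])
      }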