James - 2013-01-22

I'm interested in the hypothesis that an observation y, comes from one of two populations. I'm assuming that the populations are multivariate normal, and that their parameters are estimated by two samples x_1, and x_2 respectively. The estimated means of populations are

-1.1 | 561.6 | 1381.8
1.2 | 399.8 | 1890.6

The questioned observation is

-1.2 | 494.0 | 1252.0

I won't bother with the covariance matrices

My BUGS model to test this is:

:::R
model{
for(i in 1:n[1]){
x1[i, 1:p] ~ dmnorm(mu[1, 1:p], tau[1:p, 1:p, 1])
}

for(i in 1:n[2]){
x2[i, 1:p] ~ dmnorm(mu[2, 1:p], tau[1:p, 1:p, 2])
}

for(k in 1:2){
mu[k, 1:p] ~ dmnorm(zeros[1:p], V[1:p, 1:p])
tau[1:p, 1:p, k] ~ dwish(V[1:p, 1:p], p)
S[1:p, 1:p, k] <- inverse(tau[1:p, 1:p, k])
}

u ~ dunif(0, 1)
k <- 1 + step(u - 0.5) # Population 1 or 2?

y[1:p] ~ dmnorm(mu[k, 1:p], tau[1:p, 1:p, k])
}

And the code to run it is:

:::R
p = 3
V = solve(cov(rbind(x1,x2)))#diag(rep(0.00001,p))
I = diag(rep(1,p))
zeros = rep(0, p)

tau0 = array(dim = c(p, p, 2))
tau0[,,1] = V
tau0[,,2] = V

bugsData = list(x1 = x1, x2 = x2, y = y, n = c(n1, n2), V = V, zeros = zeros, p = p)
bugsInits = list(list(mu = rbind(zeros,zeros), tau = tau0, u = 0))
bugsFile = "mv.bugs.r"

sim = jags.model(file = bugsFile, data = bugsData, inits = bugsInits)
system.time(update(sim,1000000))
parameters = c("mu","S","k")
sim.sample = coda.samples(model = sim, variable.names = parameters,
n.iter = 500000, n.thin = 50)

stats = summary(sim.sample)

The summary output shows good concordance between the sample means of x_1 and _x_2, but no matter what I do, the estimates of the covariances are very poor. As a test, I just tried fitting a multivariate normal to the x_1 data (and also to the x_2). The code is below, and the covariances are very close to the sample values

Test model
:::R

model{
for(i in 1:n){
x1[i, 1:3] ~ dmnorm(mu, invSigma)
}
mu[1:3] ~ dmnorm(zeros[1:3], V[1:3, 1:3])
invSigma[1:3, 1:3] ~ dwish(V, 3)
Sigma <- inverse(invSigma[1:3, 1:3])
}

Test code
:::R
p = 3

V = diag(rep(0.00001,p))
I = diag(rep(1,p))
zeros = rep(0, p)

bugsData = list(x1 = x1, n = nrow(x1), V = V, zeros = zeros)
bugsInits = list(list(mu = zeros, invSigma = I))
bugsFile = "test.bugs.r"

sim = jags.model(file = bugsFile, data = bugsData, inits = bugsInits)
system.time(update(sim,50000))
parameters = c("mu","Sigma")
sim.sample = coda.samples(model = sim, variable.names = parameters,
n.iter = 500000)

stats = summary(sim.sample)
Sigma = matrix(stats$statistics[1:9, 1], nc = p)
mu = matrix(stats$statistics[10:12,1], nc = p)

The data is not mine to give away, but if I can provide a simulated data set to anyone who can help. There are 29 observations in set x1 and 16 in set x2