#10 PCA - add non-shortcut method for processing

Eleanor Howe

The way PCA is implemented now is a hack, a shortcut. We need to give the user the option of using the non-shortcut method, which takes much more memory, if they have enough memory available. Details below from emails:

John Quackenbush wrote:

I got this message and was perplexed by the fact that the number of components and
eigenvalue distributions are the same for gene or sample clusters.

Would somebody look into this and see if there is a bug?


-------- Original Message --------
Subject: Re: PCA cluster question
Date: Wed, 12 Sep 2007 00:02:40 -0400
From: nazaire@broad.mit.edu
To: johnq@jimmy.harvard.edu


Not completely. My understanding is that if the samples are the
variables, then the number of principal components should be equal
to the number of samples. If the genes are the variables, then the
number of principal components should be equal to the number of genes.
Am I wrong about this? Also it looks like the eigenvalues and percent
variation are the same whether the samples or the genes are the
variables on the same dataset? Shouldn't they be different?
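
A note on the question above, as a minimal sketch rather than a description of the actual implementation: for one fixed data matrix X (genes x samples), the small samples x samples matrix X^T X and the large genes x genes matrix X X^T share the same nonzero eigenvalues, and there are at most min(genes, samples) of them. If the shortcut works on the small matrix in both modes (an assumption, the ticket does not say), identical eigenvalues and component counts are what one would expect to see. The sizes and variable names below are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

rng = np.random.default_rng(0)
n_genes, n_samples = 200, 6            # hypothetical sizes, chosen only for illustration
X = rng.normal(size=(n_genes, n_samples))
X = X - X.mean(axis=0, keepdims=True)  # center each sample column once

small = X.T @ X   # n_samples x n_samples: cheap to store and decompose
large = X @ X.T   # n_genes x n_genes: the memory-hungry matrix

# eigvalsh returns ascending eigenvalues of a symmetric matrix; reverse to descending
eig_small = np.linalg.eigvalsh(small)[::-1]
eig_large = np.linalg.eigvalsh(large)[::-1]

# The leading n_samples eigenvalues of the large matrix match the small one;
# the remaining eigenvalues of the large matrix are numerically zero.
same_spectrum = np.allclose(eig_small, eig_large[:n_samples])
n_nonzero_large = int(np.sum(eig_large > 1e-8 * eig_large[0]))

print(same_spectrum, n_nonzero_large)
```

The two spectra differ only when the matrix itself differs between the two modes, for example when the data are centered per gene in one mode and per sample in the other.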

John Quackenbush wrote:

If you cluster "samples," you get a representation of the samples
in "gene space" so you should have as many spots as samples. And if you
cluster genes, you get genes in sample space - so you should have
thousands of spots. Does that make sense?

Marc-Danie Nazaire wrote:

> John,
>
> Thanks for replying. What I am still not clear about is what the
> difference is between selecting the samples or genes? Is it the
> same as selecting either the genes or samples as variables?
>
> Thanks,
> Marc-Danie
>
> John Quackenbush wrote:
>
> PCA is a dimensional reduction approach. What it does is create
> linear combinations of the genes that best capture the variation in
> the data. It is impossible to draw a figure in 10,000 dimensions,
> but the PCA axes are, indeed, combinations of the 10,000 genes on
> the array.
>
> I hope that makes sense.
>
> Marc-Danie Nazaire wrote:
>
> Hello,
>
> I wanted to know why the number of principal components is
> not equal to the number of genes when doing clustering by
> samples?
>
> Thanks,
> Marc-Danie
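
To make the "linear combinations of the genes" point in the thread concrete, here is a small sketch, assuming NumPy and random stand-in data. It is not the tool's code. Each principal axis is a weight vector over all genes, and projecting the samples onto those axes gives one point per sample. All names and sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_samples, n_genes = 8, 1000                    # hypothetical sizes
data = rng.normal(size=(n_samples, n_genes))    # rows: samples, columns: genes
centered = data - data.mean(axis=0)             # center each gene across samples

# Thin SVD: at most min(n_samples, n_genes) components are returned
U, s, Vt = np.linalg.svd(centered, full_matrices=False)

axes = Vt                        # each row is one PCA axis: a weight per gene
scores = centered @ axes.T       # sample coordinates along each axis
explained = s**2 / np.sum(s**2)  # fraction of variance per component

print(axes.shape, scores.shape)
```

Plotting `scores[:, :2]` would give as many spots as there are samples, even though each axis is built from all 1000 genes.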


  • Eleanor Howe

    • status: open --> closed-fixed