On Tue, Jun 12, 2012 at 1:03 AM, Justin R <justinbrowe@...> wrote:
> operating system Windows 7
> matplotlib version : 1.1.0
> obtained from sourceforge
>
> The class seems to generate the same Wt matrix for every input.
> Every element of the weight matrix is either +sqrt(1/2) or -sqrt(1/2).
>
> dat1 = 4*np.random.randn(200,1) + 2
> dat2 = dat1*.25 + 1*np.random.randn(200,1)
> pcaObj1 = PCA(np.hstack((dat1,dat2)))
> print pcaObj1.Wt
>
> dat3 = 2*np.random.randn(200,1) + 2
> dat4 = dat3*2 + 3*np.random.randn(200,1)
> pcaObj2 = PCA(np.hstack((dat3,dat4)))
> print pcaObj2.Wt
>
> The output Y seems to be correct, and the projection function works.
> Only the Wt matrix seems to be messed up. Am I using this class
> incorrectly, or could this be a bug?
Hi,
I wouldn't call myself a PCA expert, so don't weight my answer too
heavily, but here is what I think is happening:
Looking at the code, the input data array is centered and scaled to
unit variance in each dimension. The attribute .a of the class is a
copy of the array that is actually sent to the SVD; note the
centering/scaling. I don't have a proof of this, but intuitively I
expect that the PCA axes associated with a 2-dimensional
centered/scaled array will always be at 45-degree angles (e.g.,
[1,1], [-1,1], etc., which are normalized to [sqrt(1/2), sqrt(1/2)],
etc.). I think one way to describe this is that after
centering/scaling there are no degrees of freedom left if you only
started with 2 dimensions. So I don't think there is a bug, but it
may be unclear what the PCA class is doing.
If you increase to > 2 dimensions, you can see there is random
fluctuation in Wt:
In [102]: pcaObj = PCA(np.random.randn(200,2))
In [103]: pcaObj.Wt
Out[103]:
array([[0.70710678, 0.70710678],
[0.70710678, 0.70710678]])
In [104]: pcaObj = PCA(np.random.randn(200,3))
In [105]: pcaObj.Wt
Out[105]:
array([[ 0.65456366, 0.24141116, 0.7164266 ],
[ 0.39843462, 0.91551401, 0.05553329],
[ 0.64249223, 0.32179924, 0.69544877]])
In [106]: pcaObj = PCA(np.random.randn(200,3))
In [107]: pcaObj.Wt
Out[107]:
array([[0.29885902, 0.67436982, 0.67521007],
[0.95428685, 0.21449891, 0.20815098],
[0.00446109, 0.70655189, 0.70764718]])
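As a rough check of the two-column case, here is a minimal sketch that
bypasses the PCA class and uses only NumPy. It assumes the class does
nothing more than center each column, scale it to unit variance, and
take an SVD; the helper name standardized_pca_axes is made up for this
illustration. After standardizing, the 2x2 covariance is the
correlation matrix [[1, r], [r, 1]], whose eigenvectors are [1, 1] and
[1, -1] for any nonzero r, so the axes should not depend on the data:

```python
import numpy as np


def standardized_pca_axes(data):
    """Principal axes (as rows) of column-standardized data.

    Hypothetical helper: assumes the PCA class centers each column,
    scales it to unit variance, and then takes an SVD.
    """
    # Center and scale every column to zero mean and unit variance.
    a = (data - data.mean(axis=0)) / data.std(axis=0)
    # Rows of vt are the principal axes, playing the role of Wt.
    _, _, vt = np.linalg.svd(a, full_matrices=False)
    return vt


rng = np.random.RandomState(0)

# Two correlated columns, similar to dat1/dat2 in the question.
x = 4 * rng.randn(200, 1) + 2
y = 0.25 * x + rng.randn(200, 1)
wt2 = standardized_pca_axes(np.hstack((x, y)))
print(np.abs(wt2))  # every entry is sqrt(1/2) up to rounding

# Three columns: the axes now depend on the sample.
wt3 = standardized_pca_axes(rng.randn(200, 3))
print(np.abs(wt3))
```

With three or more columns the correlation matrix has more than one
free parameter, so the axes vary from sample to sample, which matches
the fluctuation in the session above.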
Hope that helps,
Aronne
