From: Michael Eickenberg <michael.eickenberg@gm...> - 2014-07-31 14:42:00

OK, that may explain it: you have fewer samples than features, and you mean-center your data. This means that you are adding one singular value of 0. The last component corresponds to this 0 singular value. But the space of the 0 singular value is actually 8-dimensional and can be arbitrarily spanned by any orthonormal system; you are only looking at one of these 8 vectors. All 8 of them together will span the same space whether you use X or X.T. But since this decomposition is arbitrary, it may find different vectors on different runs, e.g. with X and X.T.

Hope this is it :)
Michael

On Thu, Jul 31, 2014 at 4:25 PM, Deepak Pandian <peerlessdeepaks@...> wrote:
> On Thu, Jul 31, 2014 at 7:49 PM, Kyle Kastner <kastnerkyle@...> wrote:
> > It looks like the transpose may make the system underdetermined. If you
> > try with
> >
> > X = np.random.randn(*X.shape)
>
> Sorry, I didn't quite get what to do here. I tried
>
>     shape = (3, 10)
>     X = rng.randn(*shape)
>
> and it doesn't change anything.
>
> --
> With Regards,
> Deepak Pandian
> "Deconstructing world one piece at a time"
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@...
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
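Michael's explanation can be checked numerically. The sketch below (NumPy only; shapes taken from the thread: 3 samples, 10 features) shows that after centering only 2 singular values are nonzero, the first 2 components of svd(X) and svd(X.T) agree up to sign, and the trailing component of each is just one arbitrary vector from the same null space, so it need not match elementwise:

```python
import numpy as np

rng = np.random.RandomState(1)
X = rng.randn(3, 10)           # 3 samples, 10 features
Xc = X - X.mean(axis=0)        # centering drops the rank to 2

# SVD of the centered data and of its transpose
_, s, v = np.linalg.svd(Xc, full_matrices=False)
ut, st, _ = np.linalg.svd(Xc.T, full_matrices=False)

# Only 2 singular values are numerically nonzero
assert int((s > 1e-10).sum()) == 2

# The first 2 components agree up to sign...
for i in range(2):
    assert np.allclose(np.abs(v[i]), np.abs(ut.T[i]))

# ...while the 3rd lies somewhere in the 8-dimensional null space:
# each SVD picks its own vector there, but both are orthogonal to the data.
assert np.allclose(Xc @ v[2], 0, atol=1e-8)
assert np.allclose(Xc @ ut.T[2], 0, atol=1e-8)
```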
From: Kyle Kastner <kastnerkyle@gm...> - 2014-07-31 14:40:02

I did not see your earlier script... now I am interested. I have been hacking on it but don't know what is going on yet.
From: Deepak Pandian <peerlessdeepaks@gm...> - 2014-07-31 14:26:00

On Thu, Jul 31, 2014 at 7:49 PM, Kyle Kastner <kastnerkyle@...> wrote:
> It looks like the transpose may make the system underdetermined. If you try
> with
>
> X = np.random.randn(*X.shape)

Sorry, I didn't quite get what to do here. I tried

    shape = (3, 10)
    X = rng.randn(*shape)

and it doesn't change anything.

--
With Regards,
Deepak Pandian
"Deconstructing world one piece at a time"
From: Kyle Kastner <kastnerkyle@gm...> - 2014-07-31 14:19:48

It looks like the transpose may make the system underdetermined. If you try with

    X = np.random.randn(*X.shape)

what happens?

On Thu, Jul 31, 2014 at 4:17 PM, Kyle Kastner <kastnerkyle@...> wrote:
> What is the shape of X?
From: Kyle Kastner <kastnerkyle@gm...> - 2014-07-31 14:18:00

What is the shape of X?
From: Deepak Pandian <peerlessdeepaks@gm...> - 2014-07-31 14:14:35

On Thu, Jul 31, 2014 at 7:31 PM, Olivier Grisel <olivier.grisel@...> wrote:
> The sign of the components is not deterministic. The absolute values
> should be the same.

But the last component differs even in absolute values.

pca.components_:

    [[0.61098855 0.54326104 0.3089728  0.29598692 0.01767037 0.25638981
      0.24499776 0.02891609 0.10832649 0.09922362]
     [0.04285247 0.58774877 0.02102497 0.18987243 0.09547488 0.37323125
      0.62203739 0.03558049 0.0780125  0.27114941]
     [0.3481033  0.32602361 0.20594773 0.26088651 0.63592937 0.39839983
      0.20642102 0.23287936 0.0447964  0.00887286]]

SVD on X (matches PCA since it follows the same steps) -- below is V from svd(X):

    [[0.61098855 0.54326104 0.3089728  0.29598692 0.01767037 0.25638981
      0.24499776 0.02891609 0.10832649 0.09922362]
     [0.04285247 0.58774877 0.02102497 0.18987243 0.09547488 0.37323125
      0.62203739 0.03558049 0.0780125  0.27114941]
     [0.3481033  0.32602361 0.20594773 0.26088651 0.63592937 0.39839983
      0.20642102 0.23287936 0.0447964  0.00887286]]

SVD on X.T -- below is U.T from svd(X.T):

    [[0.61098855 0.54326104 0.3089728  0.29598692 0.01767037 0.25638981
      0.24499776 0.02891609 0.10832649 0.09922362]
     [0.04285247 0.58774877 0.02102497 0.18987243 0.09547488 0.37323125
      0.62203739 0.03558049 0.0780125  0.27114941]
     [0.39253032 0.27426443 0.09946156 0.17669364 0.55699776 0.60982874
      0.0461356  0.18837449 0.04298647 0.08936917]]

You can see that the final component doesn't match in terms of absolute values.

I am pasting the code I used for svd(X.T) to check if I am making some mistakes here:

    def svd_t(X):
        # X.shape = (n_features, n_samples); every sample is a column
        mean = np.mean(X, axis=1)
        mean = mean[:, np.newaxis]
        X = X - mean
        u, s, v = np.linalg.svd(X, full_matrices=True)
        return u, s, v

Thanks
Deepak

--
With Regards,
Deepak Pandian
"Deconstructing world one piece at a time"
From: Olivier Grisel <olivier.grisel@en...> - 2014-07-31 14:02:29

The sign of the components is not deterministic. The absolute values should be the same. Alternatively, fix the sign with svd_flip:

https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/extmath.py#L525

--
Olivier
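To illustrate what svd_flip resolves: negating any column of U together with the matching row of V yields an equally valid SVD, which is exactly the sign ambiguity Olivier describes. svd_flip maps both decompositions to the same canonical signs. A small sketch (assuming scikit-learn is installed; the data here is arbitrary illustration):

```python
import numpy as np
from sklearn.utils.extmath import svd_flip

rng = np.random.RandomState(0)
X = rng.randn(4, 10)
Xc = X - X.mean(axis=0)

u, s, v = np.linalg.svd(Xc, full_matrices=False)

# Flip the sign of one (column of U, row of V) pair:
# still a valid SVD of the same matrix.
u_alt, v_alt = u.copy(), v.copy()
u_alt[:, 0] *= -1
v_alt[0, :] *= -1
assert np.allclose(u_alt @ np.diag(s) @ v_alt, Xc)

# svd_flip canonicalizes both variants to identical signs.
u1, v1 = svd_flip(u.copy(), v.copy())
u2, v2 = svd_flip(u_alt, v_alt)
assert np.allclose(u1, u2)
assert np.allclose(v1, v2)
```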
From: Deepak Pandian <peerlessdeepaks@gm...> - 2014-07-31 13:11:19

On Thu, Jul 31, 2014 at 6:30 PM, Michael Eickenberg <michael.eickenberg@...> wrote:
> full_matrices=False may be the reason if your matrix isn't square.

Thanks Michael. I tried full_matrices=True and also called PCA without specifying n_components, but I still see a mismatch for exactly the last component alone.

--
With Regards,
Deepak Pandian
"Deconstructing world one piece at a time"
From: Michael Eickenberg <michael.eickenberg@gm...> - 2014-07-31 13:01:02

full_matrices=False may be the reason if your matrix isn't square.
From: Deepak Pandian <peerlessdeepaks@gm...> - 2014-07-31 12:58:07

Thanks for the quick look.

On Thu, Jul 31, 2014 at 6:19 PM, Michael Eickenberg <michael.eickenberg@...> wrote:
> Your components should be in ut.T

Oh yes, I was actually looking in ut.T but typed it wrong in the mail.

    print pca_comp
    print v
    print ut.T
    assert_array_almost_equal(v, pca_comp, 3)
    assert_array_almost_equal(ut.T, pca_comp, 3)

The first assertion passes fine, but the final component of ut.T alone is not matching. I am struggling to find a reason for that. Thanks for your time.

--
With Regards,
Deepak Pandian
"Deconstructing world one piece at a time"
From: Michael Eickenberg <michael.eickenberg@gm...> - 2014-07-31 12:49:16

Your components should be in ut.T
From: Michael Eickenberg <michael.eickenberg@gm...> - 2014-07-31 12:48:09

Forget what I said. I didn't read your test properly.
From: Michael Eickenberg <michael.eickenberg@gm...> - 2014-07-31 12:47:27

vt.T should be equal to the output of sklearn_pca(X.T).

There is a big difference between centering across samples and centering across features. Sklearn PCA centers across samples, and that is the standard afaik.

Hope that helps,
Michael
From: Deepak Pandian <peerlessdeepaks@gm...> - 2014-07-31 12:42:22

Hello Everyone,

I assumed that doing a PCA on X is equivalent to performing an SVD on a mean-centered X.

For sklearn PCA, the input matrix is of shape (n_samples, n_features).

When I perform an SVD on a matrix X shaped (n_features, n_samples), some of the eigenvectors aren't matching pca.components_:

    def sklearn_pca(X):
        pca = PCA(n_components=4)
        pca.fit(X)
        return pca.components_

    def svd_pca(X):
        # every sample is a row
        X = X - np.mean(X, axis=0)
        u, s, v = np.linalg.svd(X, full_matrices=False)
        return u, s, v

    def svd_t(X):
        # X.shape = (n_features, n_samples); every sample is a column
        mean = np.mean(X, axis=1)
        mean = mean[:, np.newaxis]
        X = X - mean
        u, s, v = np.linalg.svd(X, full_matrices=False)
        return u, s, v

    def test_svd_pca():
        rng = np.random.RandomState(1)
        X = rng.randn(4, 10)
        XT = np.copy(X)
        PX = np.copy(X)
        pca_comp = sklearn_pca(PX)
        u, s, v = svd_pca(X)
        ut, st, vt = svd_t(X.T)

I expected pca.components_ to be equivalent to v and vt.T. While it's the same as v, I get mismatches with vt.T.

Is it right to expect vt.T to be the same as pca.components_? If not, please help me clear up my misunderstanding.

--
With Regards,
Deepak
From: Michael Eickenberg <michael.eickenberg@gm...> - 2014-07-31 12:15:12

That isn't bounded, but 1 / (1 + dist) would be. exp(-dist / c) would probably get too small too quickly.

On Thu, Jul 31, 2014 at 2:09 PM, Lars Buitinck <larsmans@...> wrote:
> I think we should add one, but I've never bothered to figure out what
> the right decision function would be. Inverse of distance?
From: Lars Buitinck <larsmans@gm...> - 2014-07-31 12:10:13

2014-07-31 14:04 GMT+02:00 Sheila the angel <from.d.putto@...>:
> Also the NearestCentroid classifier does not have decision_function!

I think we should add one, but I've never bothered to figure out what the right decision function would be. Inverse of distance?
From: Sheila the angel <from.putto@gm...> - 2014-07-31 12:05:07

Also the NearestCentroid classifier does not have decision_function!

On 28 July 2014 20:43, Josh Vredevoogd <cleverless@...> wrote:
> In some cases, you can get more information from
> classifier.decision_function(). The output will not be a probability but
> can still be more useful than the binary result -- I'm thinking of
> meta-classifiers or classifier evaluation. Caveat: there are likely gotchas
> in going this direction if you don't know how the classifier works.
>
> On Mon, Jul 28, 2014 at 11:14 AM, Lars Buitinck <larsmans@...> wrote:
>> 2014-07-28 18:39 GMT+02:00 Sheila the angel <from.d.putto@...>:
>> > For the classifiers which do not provide a probability estimate of the
>> > class (gives error "object has no attribute predict_proba"), is there
>> > any easy way to calculate the posterior probability?
>>
>> No. If there were, we would have implemented predict_proba.
>>
>> (Or yes, but it's always zero or one.)
From: Christian Schulz <mining.facts@gm...> - 2014-07-30 11:04:28

Hi,

Does somebody know a feasible approach to multiclass classification with ~1500 categories, instead of generating 1500 binary models? In the past I have avoided multiclass classification as much as possible, for obvious reasons. For my current issue it would have some important advantages, at least for prototyping:

(1) No need to organize this huge amount of models in a database (serialization)
(2) Comparability between the scores

Disadvantage:

(1) Difficult to adjust/weight the outcome

Many thanks,
Christian
From: Danny Sullivan <dsullivan7@ho...> - 2014-07-30 10:55:52

Ok, sounds good. I only brought it up because it made unit testing a little bit trickier, but I found a workaround. I'll see if I can do some tests to try results with different intercept_decay values.

On 7/30/14, 12:01 PM, Peter Prettenhofer wrote:
> The way I implemented it, the learning rate for the intercept should
> be 0.01 times the learning rate of the other features.
From: Sean Violante <sean.violante@gm...> - 2014-07-30 10:09:25

If it's a numerical matrix, what about using the (standard?) Matrix Market format, which is readable by both scipy and the R Matrix package (readMM, writeMM)?

http://math.nist.gov/MatrixMarket/formats.html
http://docs.scipy.org/doc/scipy/reference/io.html
From: Peter Prettenhofer <peter.prettenhofer@gm...> - 2014-07-30 10:01:26

The way I implemented it, the learning rate for the intercept should be 0.01 times the learning rate of the other features. The value of 0.01 is something that I set empirically; I adopted it from Léon Bottou's sgd project and experimented with different values. I found that lower intercept learning rates help a bit, but the concrete value is not too important, so I decided to use a fixed value.

I think the decay value might in fact be a function of the number of nonzero values per feature. If you have a dataset with both sparse and dense features, then intercept decay should be turned off; alternatively, you can also scale the dense features to decrease their magnitude.

2014-07-30 11:42 GMT+02:00 Danny Sullivan <dsullivan7@...>:
> I found that for sparse data, the scikit implementation of sgd uses an
> intercept_decay variable set to .01 (SPARSE_INTERCEPT_DECAY) to avoid
> intercept oscillation. Shouldn't this be determined by the learning_rate
> instead? I'm asking because it adds a layer of tuning that the user doesn't
> have control over.
>
> Danny

--
Peter Prettenhofer
From: Hamed Zamani <hamedzamani@ac...>  20140730 09:55:25

Hi, Here is the code for balanced accuracy for two classes: def balanced_accuracy(y_true, y_pred): total_pos = 0 total_neg = 0 correct_pos = 0 correct_neg = 0 for k in range(len(y_true)): if (y_true[k] < 0): total_neg += 1; if (y_pred[k] < 0): correct_neg += 1 else: total_pos += 1 if (y_pred[k] > 0): correct_pos += 1 if (total_neg == 0 or total_pos == 0): raise Exception("There is not any sample data for at least one of the classes") return (float(correct_pos)/total_pos+float(correct_neg)/total_neg)/2 However, it is also may be better to use confusion matrix implemented in scikit. If you want I can also write the code for multiclass. Cheers, Hamed On Tue, Jul 29, 2014 at 12:12 AM, Yogesh Karpate <yogeshkarpate@...> wrote: > Dear Hamed, > Can you share the code of "balanced accuracy" as you mentioned in last > mail. > > > On Tue, Jul 29, 2014 at 12:07 AM, Hamed Zamani <hamedzamani@...> > wrote: > >> Dear Mario, >> >> Yes of course. Sorry I forgot to mention GMeans. It is also one of the >> measures which have been used frequently. >> >>  Hamed >> >> >> >> On Tue, Jul 29, 2014 at 2:24 AM, Mario Michael Krell <krell@... >> > wrote: >> >>> Dear Hamed, >>> >>> I think it would be a good idea to also consider gmean when extending >>> scikit. It is the geometric mean of TNR and TPR instead of the arithmetic >>> mean used for the balanced accuracy. >>> >>> Greets >>> >>> Mario >>> >>> On 28.07.2014, at 19:00, >>> scikitlearngeneralrequest@... wrote: >>> >>> Dear Joel, >>> >>> Sorry for the delay. I was in a trip and I couldn't check my email. >>> >>> To the best of my knowledge and according to the kind responses in this >>> email thread, we cannot claim that an specific measure is better than the >>> others for imbalanced data. In other words, there are some evaluation >>> measure suitable for imbalanced data and each of them has its own >>> advantages. Hence, choosing the best evaluation measure totally depends >>> on >>> the application which you are working on. 
>>>
>>> Anyway, "Matthews Correlation Coefficient", "AUC of ROC", "F-measure",
>>> "Balanced Accuracy", and, more generally, "Weighted Accuracy" have been
>>> used frequently in the literature. Among these measures, only "balanced
>>> accuracy" is not implemented in scikit-learn, and I think it is
>>> worthwhile to add it to this library. I have implemented it before, and
>>> if you want I can add it to the project or send it to you.
>>>
>>> Kind Regards,
>>> Hamed
>>>
>>> Infragistics Professional
>>> Build stunning WinForms apps today!
>>> Reboot your WinForms applications with our WinForms controls.
>>> Build a bridge from your legacy apps to the future.
>>> http://pubads.g.doubleclick.net/gampad/clk?id=153845071&iu=/4140/ostg.clktrk
>>> _______________________________________________
>>> Scikitlearngeneral mailing list
>>> Scikitlearngeneral@...
>>> https://lists.sourceforge.net/lists/listinfo/scikitlearngeneral
>
> Warm Regards,
> Yogesh Karpate
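For the multiclass case Hamed offers to write, balanced accuracy generalizes to the mean of per-class recalls (with two classes this reduces to the two-class formula above). A minimal sketch, assuming hashable labels; the function name and pure-Python counting are my own, and in practice the counts could equally come from scikit-learn's confusion_matrix:

```python
from collections import Counter

def balanced_accuracy_multiclass(y_true, y_pred):
    """Mean of per-class recalls over the classes present in y_true."""
    totals = Counter(y_true)  # true occurrences of each class
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    if not totals:
        raise ValueError("y_true is empty")
    # recall for class c = correctly predicted c / true occurrences of c
    recalls = [correct[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)

# Two classes: recall(+1) = 0.5, recall(-1) = 1.0, so the result is 0.75,
# matching the two-class function above on the same input.
print(balanced_accuracy_multiclass([1, 1, -1, -1], [1, -1, -1, -1]))
```

Classes absent from y_true are ignored, which mirrors the two-class version raising when one class has no samples.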
From: Danny Sullivan <dsullivan7@ho...>  20140730 09:42:32

I found that, for sparse data, the scikit-learn implementation of SGD uses an intercept_decay variable set to 0.01 (SPARSE_INTERCEPT_DECAY) to avoid intercept oscillation. Shouldn't this be determined by the learning_rate instead? I'm asking because it adds a layer of tuning that the user doesn't have control over.

Danny
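Purely as a toy illustration of the mechanic Danny describes (this is not scikit-learn's actual update code), a small decay factor on the intercept step shrinks the oscillation that sign-flipping gradients would otherwise produce:

```python
def intercept_trace(grads, lr=0.1, intercept_decay=1.0):
    """Intercept-only SGD steps: b <- b - lr * intercept_decay * g."""
    b, trace = 0.0, []
    for g in grads:
        b -= lr * intercept_decay * g
        trace.append(b)
    return trace

# Gradients that alternate in sign make the intercept swing around 0.
grads = [1.0, -1.0] * 4
full = intercept_trace(grads)                          # swings by +/- lr
damped = intercept_trace(grads, intercept_decay=0.01)  # swings 100x smaller
```

With intercept_decay=1.0 the amplitude is the full lr; with 0.01 it is lr/100, at the cost of the intercept converging more slowly, which is presumably the trade-off the constant encodes.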
From: Nicolas <nicolas.fauchereau@gm...>  20140730 00:02:25

Hi Raphael,

I would have a look at this talk from SciPy 2014:
https://www.youtube.com/watch?v=MxK7Fe4xfXM&list=PLYx7XA2nY5GfuhCvStxgbynFNrxr3VFog&index=111
They have developed a library to identify and track particles in images and series of images.

cheers,
Nico

On 30 July 2014 09:50, Raphael Okoye <raphael@...> wrote:

> hi Ronnie,
>
> Thanks for the response. I want to extract the blobs/crystals from the
> background, and I want to be able to accurately account for all the blobs
> in each image. As an aside question, do you know if my existing Python
> version will still work if I install Anaconda Python on it?
>
> Regards,
> Raphael.
>
> On 29 July 2014 13:21, Ronnie Ghose <ronnie.ghose@...> wrote:
>
>> ... hasn't been in the last few days, afaik, Raphael. And looking at the
>> LAB channel alone, without any changes, *will* segment a few of those in
>> the 3rd image. Also, what features do you need, exactly?
>>
>> On Tue, Jul 29, 2014 at 4:16 PM, Raphael Okoye <raphael@...> wrote:
>>
>>> hi Andy,
>>>
>>> Thanks a lot. I have tried on my own, but not with scikit-image. I used
>>> CellProfiler http://www.cellprofiler.org/... I tried using scikit-image,
>>> but it's not working properly because not all of its libraries work in
>>> the version of Python I'm using. I was advised to install Anaconda and
>>> then proceed.
>>>
>>> Regards,
>>> Raphael
>>>
>>> On 29 July 2014 19:36, Andy <t3kcit@...> wrote:
>>>
>>>> Hi Raphael.
>>>> This list is about the scikit-learn machine learning package, not image
>>>> analysis. You might want to talk to the scikit-image list.
>>>> However, just asking for help with a general task, without connecting
>>>> it to the package or having tried anything yourself, is not likely to
>>>> get a great answer anywhere.
>>>>
>>>> Best,
>>>> Andy
>>>>
>>>> On 07/29/2014 07:03 PM, Raphael Okoye wrote:
>>>>
>>>> hi all,
>>>>
>>>> I want to analyze these images. See links below.
The background and
>>>> the crystals have the same color, thus making it difficult to segment,
>>>> and also difficult to simply subtract the background. Please, I need
>>>> ideas on how to proceed with this analysis. Thanks.
>>>>
>>>> Raphael.
>>>>
>>>> https://drive.google.com/file/d/0B6nj1d7RmQ7TRmNwZzI1ajdhbFk/edit?usp=sharing
>>>> https://drive.google.com/file/d/0B6nj1d7RmQ7TUUNpNEl4VzlWOEE/edit?usp=sharing
>>>> https://drive.google.com/file/d/0B6nj1d7RmQ7TcHpoNjM0ZFhSZ1U/edit?usp=sharing

--
Dr. Nicolas Fauchereau
Climate Scientist - National Climate Centre
National Institute of Water and Atmospheric Research (NIWA) Ltd.
41 Market Place, Viaduct Precinct, Auckland
NEW ZEALAND
Tel: +64 (0)9 375 2053

"It is a mistake to think you can solve any major problems just with potatoes." - Douglas Adams
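Since color alone will not separate the crystals from the background, one common alternative is to segment on texture: crystal regions are locally "busier" than a flat background. A NumPy-only sketch of a per-pixel local-variance map (the function name and window size are my own choices, and this is untested on the actual images; scikit-image's filters would be the practical route):

```python
import numpy as np

def local_variance(img, k=5):
    """Per-pixel variance over a k x k window: Var = E[x^2] - E[x]^2."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    h, w = img.shape

    def box_mean(a):
        # Summed-area table with a zero border row/column for O(1) box sums.
        c = np.pad(np.cumsum(np.cumsum(a, axis=0), axis=1), ((1, 0), (1, 0)))
        return (c[k:k + h, k:k + w] - c[:h, k:k + w]
                - c[k:k + h, :w] + c[:h, :w]) / (k * k)

    m = box_mean(padded)
    return box_mean(padded ** 2) - m ** 2

# Flat regions give ~0 variance while textured regions do not, so a
# threshold on local_variance(img) is a candidate crystal mask.
```

The threshold would have to be tuned per image set; morphological cleanup of the resulting mask would likely also be needed.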
From: Joel Nothman <joel.nothman@gm...>  20140729 22:22:19

See the refit parameter. If refit=True, the model is fit on the entire training data using the best-found (hyper)parameters, making full use of the available training data in the final model.

On 30 July 2014 00:35, Pagliari, Roberto <rpagliari@...> wrote:

> Hi Joel,
>
> That's what I thought, but I got confused by a previous comment:
>
> "This is not entirely correct. *The "best_estimator_" is retrained on the
> whole training set*, while best_score_ is the average over folds.
> I like your string for best_estimator_, but for best_score_ I would
> probably also say "Highest average score of the best parameter setting
> over cross-validation folds"."
>
> So C is the value that provides the best average score over the k folds
> (and best_score_ is the corresponding value returned by GridSearchCV).
>
> So now the question is, how are the weights computed? Are they computed
> using the whole training set (with the C found earlier), or are they
> averaged over the k folds? This is not explicitly mentioned in the
> documentation.
>
> I'm trying to understand what the text highlighted above means.
>
> Thank you,
> Roberto
>
> *From:* Joel Nothman [mailto:joel.nothman@...]
> *Sent:* Monday, July 28, 2014 8:28 PM
> *To:* scikitlearngeneral
> *Subject:* Re: [Scikitlearngeneral] gridSearchCV best_estimator_ best_score_
>
> Make sure you read and understand
> http://scikitlearn.org/stable/modules/cross_validation.html. Basically,
> getting the score of the final model on the full training data will be a
> poor indication of how well the model will perform on other data. The
> average over k folds, where we have held out test data, will be a much
> better indication of how well the model will generalise to new instances.
>
> On 29 July 2014 07:54, Pagliari, Roberto <rpagliari@...> wrote:
>
> Hi Joel,
>
> Just to make sure I understood.
> > > >  C is computed with cross validation, by finding the highest > average score over the k folds > >  Once C is found, weights are computed over the whole training > set. > > > > If that’s the case, why is the best_score_ averaged over the k folds? > Shouldn’t it be computed over the whole training set, since that’s the way > the weights were determined? > > > > Thank you again for the clarification, > > > > > > > > > > *From:* Joel Nothman [mailto:joel.nothman@...] > *Sent:* Monday, July 28, 2014 10:32 AM > *To:* scikitlearngeneral > > > *Subject:* Re: [Scikitlearngeneral] gridSearchCV best_estimator_ > best_score_ > > > > I do think you're right to attempt to improve it! Please submit a PR! > > > > On 29 July 2014 00:05, Pagliari, Roberto <rpagliari@...> wrote: > > You are right. > > > > I guess only C (in the case of linear SVM) is the best averaged over the > fold. And once C is found, the weights over the whole training set are > computed. > > > > If that’s the case, my proposal may be misleading. > > > > Thank you, > > Roberto > > > > > > *From:* Andy [mailto:t3kcit@...] > *Sent:* Saturday, July 26, 2014 4:42 AM > > > *To:* scikitlearngeneral@... > *Subject:* Re: [Scikitlearngeneral] gridSearchCV best_estimator_ > best_score_ > > > > On 07/25/2014 10:30 PM, Pagliari, Roberto wrote: > > Hi Andy, > > Maybe it’s just me, but the ”left out data” threw me off. Perhaps, I would > integrate with your previous comments: > > > > best_estimator_ > > estimator > > Estimator that was chosen by the search, i.e. estimator which gave highest > *average* score (or smallest loss if specified) *over the > crossvalidation folds*. on the left out data. > > best_score_ > > float > > *Highest average score* of *the* best_estimator *computed above* on the > left out data. > > > > This is not entirely correct. The "best_estimator_" is retrained on the > whole training set, while best_score_ is the average over folds. 
> I like your string for best_estimator_, but for best_score_ I would
> probably also say "Highest average score of the best parameter setting
> over cross-validation folds".
>
> Pull request welcome. The current docstring warrants improvement, I think ;)
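The semantics the thread converges on can be mocked in a few lines without scikit-learn (a simplified sketch; the function and callback names are mine, not GridSearchCV internals): best_score_ is the mean held-out score of the winning parameter, and best_estimator_ comes from one final refit on all of the training data with that parameter.

```python
def mock_grid_search(candidates, folds, fit, score):
    """folds: (train, held_out) pairs; assumes a K-fold partition,
    i.e. each sample appears in exactly one held_out set."""
    best_param, best_score = None, float("-inf")
    for param in candidates:
        # Average the held-out scores over the folds for this candidate.
        avg = sum(score(fit(train, param), held_out)
                  for train, held_out in folds) / len(folds)
        if avg > best_score:
            best_param, best_score = param, avg
    # The refit=True step: a single fit on the whole training set.
    full_data = [x for _, held_out in folds for x in held_out]
    return best_param, best_score, fit(full_data, best_param)
```

Note that the winner's score on the full training set is never computed: best_score_ remains the cross-validation average, while the returned estimator is the full-data refit.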