From: Michael Bewley <michael.bewley@gm...>  2007-09-30 07:07:56

Hi,

I'm using MDP for data reduction on a very large dataset (196 instances, each with around 10000 variables). I'm trying to perform PCA on the raw voxels in a set of 3D images. My basic problem is that I run out of memory. Following the tutorial, I call PCANode.train on one instance at a time, but when I call "stop_training", the memory usage goes up quite a lot (say 2 GB). Is it possible to drop the memory requirements, given that I'm only requesting a small number of output dimensions? My code outline is as follows:

    x = ...  # NumPy array with shape=(190, 10000)
    pNode = mdp.nodes.PCANode(svd=True, output_dim=50, input_dim=x.shape[1])
    for i in xrange(x.shape[0]):
        pNode.train(N.array(x[i, :], ndmin=2))
    pNode.stop_training()

I'm using the latest svn trunk of symeig and mdp (I like the improvements, by the way; very helpful, particularly the ability for the ICANode to pass arguments to the internal whitening node).

Thanks,
Mike
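[Editor's note: for context on where the memory goes, covariance-based PCA training accumulates a full D x D covariance matrix whose size is independent of the number of output dimensions requested. The sketch below illustrates that general scheme in plain NumPy; it is an assumption-laden illustration, not MDP's actual internals.]

```python
import numpy as np

# Illustration only (NOT MDP's code): incremental PCA via an accumulated
# covariance matrix. The D x D accumulator dominates memory at
# "stop_training" time: for D = 10000 float64 entries it alone is
# ~800 MB, no matter how few output dimensions are kept.

D, N_OUT = 100, 5              # small D here; the email uses D = 10000
cov = np.zeros((D, D))         # D x D accumulator -> O(D^2) memory
mean = np.zeros(D)
n = 0

rng = np.random.default_rng(0)
for _ in range(50):            # train on one sample at a time, as above
    xi = rng.normal(size=D)
    cov += np.outer(xi, xi)
    mean += xi
    n += 1

# the "stop_training" step: finalize the covariance, solve the eigenproblem
mean /= n
cov = cov / (n - 1) - np.outer(mean, mean) * n / (n - 1)
evals, evecs = np.linalg.eigh(cov)
proj = evecs[:, ::-1][:, :N_OUT]   # top N_OUT principal directions
```

Note also that with 50 samples of 100 variables the covariance above is rank-deficient, which is the same situation behind the "may be singular" errors discussed later in this thread.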
From: Tiziano Zito <t.zito@bi...>  2007-09-14 16:22:21

Dear MDP users,

we are pleased to announce that a new implementation of the FastICA node is available in the svn repository. The node is now up to date with the original Matlab version (2.5) published on 19.10.2005. Highlights:

- fine-tuning is implemented
- the stabilized version of the algorithm is implemented
- the new 'skew' non-linearity for skewed input data is implemented
- bug fix in the 'tanh' non-linearity
- bug fix in the 'gaus' non-linearity in the 'symm' approach
- all combinations of input parameters are being tested

If you are already using the FastICANode, you should check it out! Get it with:

    svn co https://mdp-toolkit.svn.sourceforge.net/svnroot/mdp-toolkit/mdp/trunk/mdp mdp

We will be very grateful for any feedback or bug reports!

enjoy,
tiziano
From: Tiziano Zito <t.zito@bi...>  2007-09-12 14:29:01

Dear MDP users,

we have just added a new functionality to PCANode and WhiteningNode. The nodes can now be instantiated with svd=True, to use Singular Value Decomposition instead of the standard eigenvalue problem solver. This should definitely solve the problems with the "Covariance matrix may be singular" exception. By setting reduce=True together with var_abs and var_rel, it is possible to automatically discard irrelevant principal components. You can test the new nodes in the svn trunk:

    svn co https://mdp-toolkit.svn.sourceforge.net/svnroot/mdp-toolkit/mdp/trunk/mdp mdp

It is a good idea to update the symeig package too:

    svn co https://mdp-toolkit.svn.sourceforge.net/svnroot/mdp-toolkit/mdp/trunk/symeig symeig

Read the nodes' internal string docs for more info. Let us know if the nodes work as expected!

ciao,
tiziano
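[Editor's note: the reason SVD sidesteps the exception is that PCA can be computed from an SVD of the centered data matrix without ever forming the covariance matrix, so rank deficiency shows up as exact zero singular values rather than spuriously negative eigenvalues. A NumPy sketch of the idea (an illustration, not MDP's implementation):]

```python
import numpy as np

# PCA via SVD of the centered data matrix (illustration only).
# With fewer observations than variables the covariance matrix is
# singular, but this route never builds it.

rng = np.random.default_rng(0)
X = rng.normal(size=(53, 55))        # n < D: covariance would be singular
Xc = X - X.mean(axis=0)              # remove the mean

U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / (len(X) - 1)            # per-component variances, all >= 0

k = 5
components = Vt[:k]                  # top-k principal directions
Y = Xc @ components.T                # projected data, shape (53, 5)
```

The singular values s are nonnegative by construction, so the "got negative eigenvalues" failure mode cannot occur.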
From: Charlie Strauss <cems@la...>  2007-09-07 17:04:31

On Sep 7, 2007, at 10:41 AM, Tiziano Zito wrote:

>> Thanks! I have some follow-up questions:
>>
>>> icav = ica.get_projmatrix()
>>
>> Let me see if I understand this correctly, as it is ambiguous.
>> First off I note that in the current release version (fink) that I
>> have, the CuBICANode() class does not currently have a
>> get_projmatrix() method, so I assume you are referring to the current
>> svn.
>
> Yes, I am referring to the current SVN trunk.
>
>> Next, when one asks for the projection matrix of an ICA problem there
>> are three possible entities that one might consider a kind of
>> projection matrix. If one is doing PCA followed by ICA, then one has
>> the PCA rotation matrix, then the ICA rotation matrix, and then
>> finally the product of these matrices and the eigenvalues of the
>> PCA. (Actually I'm slightly fuzzy on when or whether the eigenvalues
>> get divided out in this process of whitening.)
>
> If you perform the dimensionality reduction and/or the whitening
> inside the ICANode, i.e. if you instantiate the node with:
>
>     ica = mdp.nodes.CuBICANode(whitened=False, white_comp=5)
>
> then, as clearly stated in the documentation of the node, which you
> can see with help(mdp.nodes.CuBICANode), the node takes care
> internally of the whitening and the dimensionality reduction. A
> call to the "get_projmatrix" method in this case returns the matrix
> that you need to multiply the mean-free input data with in order to
> get the output. The eigenvalue division, the mean removal, the
> multiplication by the PCA matrix: everything is automatically
> performed by the node. If you perform whitening outside of the node,
> for example using a WhiteningNode before, you would instantiate the
> ICANode with
>
>     ica = mdp.nodes.CuBICANode(whitened=True)
>
> In this case MDP assumes that the input to the ICANode is "white",
> i.e. mean-free, unit variance and decorrelated. The projmatrix is
> again the matrix that you need to multiply the input (to the ICANode)
> data with in order to get the output. In this case there is no
> eigenvalue division, no mean removal, etc. If you then want to get
> the "global" projmatrix, i.e. the one that takes into account your
> WhiteningNode, or whatever else you have ahead of the ICANode, you'll
> have to calculate it yourself.
> Does this explanation make sense?

Thanks! Yes, that is clearer than the documentation I currently have.

>> Another obviously missing feature here is the recovery of the
>> eigenvalues used in the PCA or whitening step. One can't invert
>> anything without those, I would think, so it seems like they should
>> be available. I would presume that they are available, since they
>> are both used for whitening and also used to compute the variance.
>
> well, "obviously missing feature" seems a bit rude to me,

No rudeness meant. When I sense something is missing, and think it's peculiar, it's a clue to me that I'm not understanding something critical. Sorry to be cavalier in my adverbs.

> especially since this information is available, as clearly stated in
> the documentation of the WhiteningNode and PCANode. The eigenvalues
> of the covariance matrix are stored in the node.d attribute. For the
> ICANode, if you let the node perform the whitening internally, this
> information is stored in node.white.d.
>
>> In my case, when I set the system to compute with output_dim set to
>> a fraction like 0.8, I get an error saying the matrix is singular. I
>> have not explored in detail what's happening here. I can tell you
>> that the data is real and noisy data with no repeated sequences, so
>> it's unlikely there are any actual singularities in its ev spectrum.
>> Each run has 55 points in it and there are 53 runs, so the matrix is
>> nearly square (one would prefer, I think, a larger aspect ratio
>> matrix). However, if one were doing SVD on such a matrix, zero
>> eigenvalues should not be any actual problem for the method, since
>> they won't contribute to the variance. Thus I'm puzzled by what the
>> cause of this instability is.
>
> the fact that you get the error when using output_dim=0.8 and not
> when you set output_dim=N seems odd. Can you elaborate on that? In
> any case, at the moment all covariance matrix diagonalizations in
> MDP are done using the LAPACK eigensolver for symmetric positive
> definite matrices. Using SVD instead is in fact an option, and, as
> discussed in this thread:
>
>     http://sourceforge.net/mailarchive/forum.php?thread_name=c0a568e40709030529q2183e3f3r19a95d0d487d8c23%40mail.gmail.com&forum_name=mdp-toolkit-users
>
> we already have an implementation that will be available in the next
> weeks. We did not want to use SVD by default, because in the
> non-singular case the eigensolver is way faster (>100x). Please stay
> tuned with the MDP website and you'll have your SVD-using PCA and
> Whitening nodes soon!

Cool! Well, if it's really 10x or 100x faster, then it seems like the rational thing to do here is to always try the eigensolver, but trap any errors and use SVD when it's singular.

As for the error: this works:

    pca1 = mdp.nodes.PCANode(output_dim = 5)

this fails:

    pca1 = mdp.nodes.PCANode(output_dim = 0.8)

When I run pca1.train(yy), here is the traceback:

         38
         39 pca1.train(yy)
    ---> 40 out_pca1 = pca1.execute(yy)
         41 figure()
         42 plot(transpose(out_pca1))  # plot some samples.

    /sw/lib/python2.5/site-packages/mdp/nodes/pca_nodes.py in execute(self, x, n)
        169         """Project the input on the first 'n' principal components.
        170         If 'n' is not set, use all available components."""
    --> 171         return super(PCANode, self).execute(x, n)
        172
        173     def _inverse(self, y, n = None):

    /sw/lib/python2.5/site-packages/mdp/signal_node.py in execute(self, x, *args, **kargs)
        377             super(MyNode, self).execute(x, arg1, karg2=karg2)
        378         """
    --> 379         self._pre_execution_checks(x)
        380         return self._execute(self._refcast(x), *args, **kargs)
        381

    /sw/lib/python2.5/site-packages/mdp/signal_node.py in _pre_execution_checks(self, x)
        273         It can be used when a subclass defines multiple execution methods."""
        274
    --> 275         self._if_training_stop_training()
        276
        277         # control the dimension x

    /sw/lib/python2.5/site-packages/mdp/signal_node.py in _if_training_stop_training(self)
        262     def _if_training_stop_training(self):
        263         if self.is_training():
    --> 264             self.stop_training()
        265             # if there is some training phases left
        266             # we shouldn't be here!

    /sw/lib/python2.5/site-packages/mdp/signal_node.py in stop_training(self, *args, **kwargs)
        354
        355         # close the current phase.
    --> 356         self._train_seq[self._train_phase][1](*args, **kwargs)
        357         self._train_phase += 1
        358         self._train_phase_started = False

    /sw/lib/python2.5/site-packages/mdp/nodes/pca_nodes.py in _stop_training(self)
        105         if d.min() <= 0:
        106             errs = "Got negative eigenvalues: Covariance matrix may be singular."
    --> 107             raise NodeException, errs
        108
        109

    <class 'mdp.signal_node.NodeException'>: Got negative eigenvalues: Covariance matrix may be singular.

    WARNING: Failure executing file: <normcolumns.py>

As I said, I have not tried to explore this in detail, so I can't say for sure that my data set is simply unusual. I could send you a yaml file with it if you want to reproduce it. Please note that I am using the current fink version, not the current svn version. Note that my input matrix has shape (53,55); I'm suspecting this is the problem. My guess here is that internally you compute the covariance matrix and diagonalize this. And I bet that the covariance matrix always uses the second index as its shape, rather than, say, using the smaller of the two (which SVD would do). In this case there might not be enough data to support 55 eigenvalues. Or so I guess.

> hth,
> tiziano
>
> ps: please post any reply to the mailing list.

Charlie Strauss
Bioscience Division
cems@...
505 665 4838
Quidquid latine dictum sit, altum sonatur.
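[Editor's note: the guess above is easy to check numerically. With 53 observations of 55 variables, the 55 x 55 covariance matrix has rank at most 52 after mean removal, so the eigensolver necessarily returns eigenvalues that are numerically zero, and possibly slightly negative in floating point, which is exactly what the exception guards against. A quick sketch with made-up random data of the same shape:]

```python
import numpy as np

# Illustration only: random data with the same (53, 55) shape as yy.
# Mean removal costs one degree of freedom, so the covariance has
# rank at most 53 - 1 = 52 < 55.

rng = np.random.default_rng(1)
yy = rng.normal(size=(53, 55))       # same shape as the email's yy
cov = np.cov(yy, rowvar=False)       # 55 x 55 sample covariance
rank = np.linalg.matrix_rank(cov)    # 52 for generic data
evals = np.linalg.eigvalsh(cov)      # smallest ones ~0, may dip negative
```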
From: Tiziano Zito <t.zito@bi...>  2007-09-07 16:42:26

> Thanks! I have some follow-up questions:
>
>> icav = ica.get_projmatrix()
>
> Let me see if I understand this correctly, as it is ambiguous.
> First off I note that in the current release version (fink) that I
> have, the CuBICANode() class does not currently have a
> get_projmatrix() method, so I assume you are referring to the current
> svn.

Yes, I am referring to the current SVN trunk.

> Next, when one asks for the projection matrix of an ICA problem there
> are three possible entities that one might consider a kind of
> projection matrix. If one is doing PCA followed by ICA, then one has
> the PCA rotation matrix, then the ICA rotation matrix, and then
> finally the product of these matrices and the eigenvalues of the
> PCA. (Actually I'm slightly fuzzy on when or whether the eigenvalues
> get divided out in this process of whitening.)

If you perform the dimensionality reduction and/or the whitening inside the ICANode, i.e. if you instantiate the node with:

    ica = mdp.nodes.CuBICANode(whitened=False, white_comp=5)

then, as clearly stated in the documentation of the node, which you can see with help(mdp.nodes.CuBICANode), the node takes care internally of the whitening and the dimensionality reduction. A call to the "get_projmatrix" method in this case returns the matrix that you need to multiply the mean-free input data with in order to get the output. The eigenvalue division, the mean removal, the multiplication by the PCA matrix: everything is automatically performed by the node. If you perform whitening outside of the node, for example using a WhiteningNode before, you would instantiate the ICANode with

    ica = mdp.nodes.CuBICANode(whitened=True)

In this case MDP assumes that the input to the ICANode is "white", i.e. mean-free, unit variance and decorrelated. The projmatrix is again the matrix that you need to multiply the input (to the ICANode) data with in order to get the output. In this case there is no eigenvalue division, no mean removal, etc. If you then want to get the "global" projmatrix, i.e. the one that takes into account your WhiteningNode, or whatever else you have ahead of the ICANode, you'll have to calculate it yourself. Does this explanation make sense?

> Another obviously missing feature here is the recovery of the
> eigenvalues used in the PCA or whitening step. One can't invert
> anything without those, I would think, so it seems like they should
> be available. I would presume that they are available, since they
> are both used for whitening and also used to compute the variance.

Well, "obviously missing feature" seems a bit rude to me, especially since this information is available, as clearly stated in the documentation of the WhiteningNode and PCANode. The eigenvalues of the covariance matrix are stored in the node.d attribute. For the ICANode, if you let the node perform the whitening internally, this information is stored in node.white.d.

> In my case, when I set the system to compute with output_dim set to
> a fraction like 0.8, I get an error saying the matrix is singular. I
> have not explored in detail what's happening here. I can tell you
> that the data is real and noisy data with no repeated sequences, so
> it's unlikely there are any actual singularities in its ev spectrum.
> Each run has 55 points in it and there are 53 runs, so the matrix is
> nearly square (one would prefer, I think, a larger aspect ratio
> matrix). However, if one were doing SVD on such a matrix, zero
> eigenvalues should not be any actual problem for the method, since
> they won't contribute to the variance. Thus I'm puzzled by what the
> cause of this instability is.

The fact that you get the error when using output_dim=0.8 and not when you set output_dim=N seems odd. Can you elaborate on that? In any case, at the moment all covariance matrix diagonalizations in MDP are done using the LAPACK eigensolver for symmetric positive definite matrices. Using SVD instead is in fact an option, and, as discussed in this thread:

    http://sourceforge.net/mailarchive/forum.php?thread_name=c0a568e40709030529q2183e3f3r19a95d0d487d8c23%40mail.gmail.com&forum_name=mdp-toolkit-users

we already have an implementation that will be available in the next weeks. We did not want to use SVD by default, because in the non-singular case the eigensolver is way faster (>100x). Please stay tuned with the MDP website and you'll have your SVD-using PCA and Whitening nodes soon!

hth,
tiziano

ps: please post any reply to the mailing list.
From: Tiziano Zito <t.zito@bi...>  2007-09-07 07:37:45

Hi Charlie,

you are right that the functionality is missing in the released MDP version. If you have a look at the following thread in the mailing list archives:

    http://sourceforge.net/mailarchive/forum.php?thread_name=1182428342.16125.3.camel%40beckerspc&forum_name=mdp-toolkit-users

you'll see that the question came up already, and the functionality has already been implemented in the SVN trunk :))) Please check it out and let us know if it works as expected.

By the way, if you just want to reduce the dimensionality of the input data before feeding it to ICA, you may try the following:

    ica = mdp.nodes.CuBICANode(whitened=False, white_comp=5)

and then getting the basis vectors is a matter of doing

    icav = ica.get_projmatrix()

Regarding the problem with pca.get_explained_variance: if you explicitly request a fixed number N of output components, the PCA node will *not* solve the whole eigenvalue problem. PCA only gets the N largest eigenvalues (corresponding to the N directions with largest variance), and has no way to know the *total* variance. The get_explained_variance method only makes sense if you request a fraction of the total variance with PCANode(output_dim=0.90): you will then have pca.desired_variance=0.90, and pca.get_explained_variance will return something around 0.9, depending on the input data.

hth,
tiziano

On Thu 06 Sep, 18:38, Charlie Strauss wrote:
> I'm new to MDP, but a seemingly missing functionality in ICA is the
> recovery of the ICA basis vectors ("eigenvectors"). Since this is a
> missing function, it makes me wonder if I'm using it wrong.
>
> Currently my workflow is as follows, and at the end I recover the ICA
> basis vectors. Surely there is a simpler way to do this? That is,
> by analogy to PCA there ought to be a get_projmatrix() method.
> Instead I find myself calling inverse() on (1,0,0,...)
>
>     from numpy import *
>     from pylab import *
>     import mdp
>
>     pca1 = mdp.nodes.PCANode(output_dim = 5)
>     ica1 = mdp.nodes.CuBICANode()
>
>     # yy is a 53 x 55 matrix: 55 runs with a time series of 53 values each.
>
>     pca1.train(yy)
>     out_pca1 = pca1.execute(yy)
>
>     figure()
>     plot(transpose(out_pca1))  # plot some samples.
>
>     # plot the basis vectors for PCA
>     pm_pca1 = pca1.get_projmatrix()
>     figure()
>     plot(pm_pca1)
>
>     ica1.train(out_pca1)
>     out_ica1 = ica1.execute(out_pca1)
>
>     figure()
>     plot(transpose(out_ica1))
>
>     # now we will recover the ICA basis vectors. This seems pretty twisted.
>     figure()
>     for i in eye(5):
>         plot(sum(pm_pca1 * ica1.inverse(i[newaxis,:]), axis=1))
>
> An unrelated problem, I think, is that pca1.get_explained_variance()
> is coming out as None. I'm not sure why this is.
>
> Charlie Strauss
> Bioscience Division
> cems@...
> 505 665 4838
> Quidquid latine dictum sit, altum sonatur.
>
> -------------------------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc.
> Still grepping through log files to find problems? Stop.
> Now Search log events and configuration files using AJAX and a browser.
> Download your FREE copy of Splunk now >> http://get.splunk.com/
> _______________________________________________
> mdp-toolkit-users mailing list
> mdp-toolkit-users@...
> https://lists.sourceforge.net/lists/listinfo/mdp-toolkit-users
From: Charlie Strauss <cems@la...>  2007-09-07 00:38:16

I'm new to MDP, but a seemingly missing functionality in ICA is the recovery of the ICA basis vectors ("eigenvectors"). Since this is a missing function, it makes me wonder if I'm using it wrong.

Currently my workflow is as follows, and at the end I recover the ICA basis vectors. Surely there is a simpler way to do this? That is, by analogy to PCA there ought to be a get_projmatrix() method. Instead I find myself calling inverse() on (1,0,0,...)

    from numpy import *
    from pylab import *
    import mdp

    pca1 = mdp.nodes.PCANode(output_dim = 5)
    ica1 = mdp.nodes.CuBICANode()

    # yy is a 53 x 55 matrix: 55 runs with a time series of 53 values each.

    pca1.train(yy)
    out_pca1 = pca1.execute(yy)

    figure()
    plot(transpose(out_pca1))  # plot some samples.

    # plot the basis vectors for PCA
    pm_pca1 = pca1.get_projmatrix()
    figure()
    plot(pm_pca1)

    ica1.train(out_pca1)
    out_ica1 = ica1.execute(out_pca1)

    figure()
    plot(transpose(out_ica1))

    # now we will recover the ICA basis vectors. This seems pretty twisted.
    figure()
    for i in eye(5):
        plot(sum(pm_pca1 * ica1.inverse(i[newaxis,:]), axis=1))

An unrelated problem, I think, is that pca1.get_explained_variance() is coming out as None. I'm not sure why this is.

Charlie Strauss
Bioscience Division
cems@...
505 665 4838
Quidquid latine dictum sit, altum sonatur.
From: Michael Bewley <michael.bewley@gm...>  2007-09-04 15:53:05

Good idea. I had tried to install symeig, but couldn't get it to work. I discovered that the default scipy and numpy packages in Ubuntu (Feisty) weren't built with ATLAS support (although it had packages to install ATLAS...). So I rebuilt them from source, installed symeig (which worked this time), and the error is gone. Just using scipy 0.5.2 without ATLAS support, the error happened with seed 5 and 100; I assume most others as well. Finally I can use MDP properly!

I think it's definitely worth gearing MDP towards end users. There's nothing else in Python that can do PCA and ICA, and they are a staple part of any data mining system. Are there plans to have it included as a module in the scipy package? As someone who's just getting interested in Python for scientific computing (the thought of paying however many thousand a year after I leave uni for a Matlab license isn't very appealing), it makes life much easier if all the functionality can be found in one place. It's a shame to have lots of little projects strewn across the internet that may be fantastic, but are difficult to find!

Thanks again for your help, I'll be watching MDP's development with interest.

Mike

On 9/4/07, Tiziano Zito <t.zito@...> wrote:
>>     from scipy import *
>>     import mdp
>>     x = random.normal(size=(5,700))
>>     icanode = mdp.nodes.FastICANode(whitened=False, white_comp=x.shape[0])
>>     icanode.train(x)
>>     icanode.stop_training()
>>
>> it works fine.
>> If I use an initial size=(5,800), I get "some eigenvalues have significant
>> imaginary part, Covariance matrix may be singular". This happens even if I
>> reduce white_comp to 1.
>> My case uses a matrix with size=(196,3480), which also fails (works for a
>> random.normal distribution, not on my data).
>
> I've tried your examples with size=(5,800) and size=(196,3480), and
> they work. Could you install symeig
> (http://mdp-toolkit.sourceforge.net/symeig.html) and try again?
> I think symeig is more robust than the scipy internal eig solver. If
> you can't install symeig, or the examples are failing even with it,
> would you care to send the random seed of a failing example? The
> random seed can be set by:
>
>     scipy.random.seed(N)
>
> There is no easy way to *get* the random seed, so I suggest that you
> try a couple of seeds until it fails. Or, if you prefer, pickle
> (with prot=-1) your data and post them. I can try using them
> locally with the new WhiteningNode, which imho is the best solution
> to your problem.
>
>> The documentation is clear, I just think it could be helpful to raise some
>> kind of input-setting exception if there are more variables than
>> observations (or even just automatically set output_dim/white_comp to the
>> smaller of variables or observations).
>
> we'll think about that. I'm not sure the automatic setting is a
> good idea, but some kind of warning may do.
>
>> I'm looking very much from a user's
>> perspective: trying to use PCA and ICA as a small part of my undergraduate
>> engineering thesis, I don't have time to look in depth at the specifics of
>> different implementations of the algorithms. I'd like to be able to throw
>> any table of data at it, and be told if I'm doing something stupid that will
>> definitely fail! MDP is a great package though; I was surprised that scipy
>> doesn't even include an implementation of PCA.
>
> I'm glad you use MDP and like it. When we initially released it, it
> was aimed at users who already know the algorithms and their pitfalls,
> who can look at the source code and modify it. We are now getting
> more and more feedback from people like you, who can't or have no
> time to look at the implementation and just want the results. We are
> adapting MDP to that audience; even writing your own nodes has become
> much easier now than it was at the beginning. Thank you for your
> feedback, it really helps us!
>
> tiziano
From: Tiziano Zito <t.zito@bi...>  2007-09-04 08:28:42

>     from scipy import *
>     import mdp
>     x = random.normal(size=(5,700))
>     icanode = mdp.nodes.FastICANode(whitened=False, white_comp=x.shape[0])
>     icanode.train(x)
>     icanode.stop_training()
>
> it works fine.
> If I use an initial size=(5,800), I get "some eigenvalues have significant
> imaginary part, Covariance matrix may be singular". This happens even if I
> reduce white_comp to 1.
> My case uses a matrix with size=(196,3480), which also fails (works for a
> random.normal distribution, not on my data).

I've tried your examples with size=(5,800) and size=(196,3480), and they work. Could you install symeig (http://mdp-toolkit.sourceforge.net/symeig.html) and try again? I think symeig is more robust than the scipy internal eig solver. If you can't install symeig, or the examples are failing even with it, would you care to send the random seed of a failing example? The random seed can be set by:

    scipy.random.seed(N)

There is no easy way to *get* the random seed, so I suggest that you try a couple of seeds until it fails. Or, if you prefer, pickle (with prot=-1) your data and post them. I can try using them locally with the new WhiteningNode, which imho is the best solution to your problem.

> The documentation is clear, I just think it could be helpful to raise some
> kind of input-setting exception if there are more variables than
> observations (or even just automatically set output_dim/white_comp to the
> smaller of variables or observations).

We'll think about that. I'm not sure the automatic setting is a good idea, but some kind of warning may do.

> I'm looking very much from a user's
> perspective: trying to use PCA and ICA as a small part of my undergraduate
> engineering thesis, I don't have time to look in depth at the specifics of
> different implementations of the algorithms. I'd like to be able to throw
> any table of data at it, and be told if I'm doing something stupid that will
> definitely fail! MDP is a great package though; I was surprised that scipy
> doesn't even include an implementation of PCA.

I'm glad you use MDP and like it. When we initially released it, it was aimed at users who already know the algorithms and their pitfalls, who can look at the source code and modify it. We are now getting more and more feedback from people like you, who can't or have no time to look at the implementation and just want the results. We are adapting MDP to that audience; even writing your own nodes has become much easier now than it was at the beginning. Thank you for your feedback, it really helps us!

tiziano
From: Michael Bewley <michael.bewley@gm...>  20070904 02:09:52

HI, I tried your suggestion, which works under some test cases, just not all (unfortunately not the case I want!) If I do: from scipy import * import mdp x = random.normal(size=(5,700)) icanode = mdp.nodes.FastICANode (whitened=False,white_comp=x.shape[0]) icanode.train(x) icanode.stop_training() it works fine. If I use an initial size=(5,800), I get "some eigenvalues have significant imaginary part, Covariance matrix may be singular". This happens even if I reduce white_comp to 1. My case uses a matrix with size=(196,3480), which also fails (works for a random.normal distribution, not on my data). I'm guessing you have some kind of threshold checking for the imaginary part of eigenvalues, and it's crossing over as the number of variables increases compared to the number of observations? The documentation is clear, I just think it could be helpful to raise some kind of input setting exception if there are more variables than observations (or even just automatically set output_dim/white_comp to the smaller of variables or observations). I'm looking very much from a users perspective  trying to use PCA and ICA as a small part of my undergraduate engineering thesis, and don't have time to look in depth at the specifics of different implementations of the algorithms. I'd like to be able to throw any table of data at it, and be told if I'm doing something stupid that will definitely fail! MDP is a great package though  i was surprised that scipy doesn't even include an implementation of PCA. Mike On 9/4/07, Tiziano Zito <t.zito@... > wrote: > > just to be sure that it actually works when you try, the right > syntax is: > > icanode = mdp.nodes.FastICANode(whitened=False, white_comp=N) > > note 'whitened' instead of 'whitening'. The meaning of the arguments > is documented in the node __init__ method, please let us know if you > find that documentation unclear or non sufficient. as pietro said, a node > doing PCA and one doing Whitening using svd will be released soon. 
> the whitening node will be able to throw directions away, which have > a relative and/or absolute variance smaller than a user given > threshold, think somethink like 1E8. I think a flow using this new > WhiteningNode and an ICANode will be the more natural solution for > your problem. Check out the MDP site within a month for more news!!! > > have a nice day, > tiziano > > > FastICANode begins by whitening its input data (i.e., PCA + rescaling > > to variance one). You were probably thinking about doing something > > like > > > > flw = mdp.Flow([mdp.nodes.WhiteningNode(output_dim=N), > > mdp.nodes.FastICANode(whitening=False)]) > > flw.train(a) > > > > which should indeed work. There is a simpler solution: > > > > icanode = mdp.nodes.FastICANode (white_comp=N) > > icanode.train(a) > > icanode.stop_training() > > > > This reduces the number of components as needed. > > Your second suggestion > > > > flw = mdp.Flow([mdp.nodes.WhiteningNode (output_dim=0.95), > > mdp.nodes.FastICANode(whitening=False)]) > > > > would work for data with a fullrank covariance matrix, but not here: > > in order to decide how many components correspond to 95% of the input > > variance, PCA has to solve the full DxD problem (it must compute much > > 100% of the variance is), which would raise an exception since the > > matrix is singular. > > > > I believe Tiziano wrote a node that uses Singular Value Decomposition > > to perform PCA, which would make the algorithm stable even in the > > singular case. We're planning to release it soon together with a few > > other nodes. Stay tuned! > > Pietro > > > > On 9/3/07, Michael Bewley < michael.bewley@...> wrote: > > > Wow  that's what I call fast response  thanks! Yep  makes sense, > and > > > works for a single PCANode. If I use it in a flow with a FastICANode, > the > > > FastICANode doesn't seem to work that out. As far as I'm aware (not > being an > > > ICA expert), ICA needs at least as many observations as variables. 
> > > My plan was to use a flow to cut down the number of variables to fewer than the
> > > number of observations (via a PCANode), then use ICA. Doing it as separate
> > > nodes works, just not as a flow. I've tried playing with input_dim and
> > > output_dim of the FastICANode before putting it in the flow, but no luck.
> > > Maybe I just don't understand how flows work?
> > > Wouldn't it make sense to make the PCANode automatically cap the number of
> > > principal components at the number of input observations? That way if I
> > > apply it to an unknown dataset with fewer observations than variables, I can
> > > use e.g. output_dim=0.95 without worrying that it might try to make too
> > > many components and die with an error. It would be particularly handy for large
> > > data matrices that take a while to compute!
> > > Thanks again,
> > > Mike
> > >
> > > On 9/3/07, Pietro Berkes <berkes@...> wrote:
> > > > Hi Mike!
> > > > If you've got fewer observations than dimensions, the covariance matrix
> > > > is singular. If you don't specify any additional arguments, MDP will
> > > > try to give you back as many PCA components as there are input
> > > > dimensions, which is impossible in this case. The error message should
> > > > disappear if you tell MDP to return as many PCA components as there
> > > > are observations.
> > > > E.g., if you have N=5 observations for D=10 dimensions:
> > > >
> > > > import mdp
> > > > D, N = 10, 5
> > > > a = mdp.numx_rand.rand(N,D) # get some random data
> > > > mdp.pca(a) # this raises an error
> > > > mdp.pca(a, output_dim=N) # this does not
> > > >
> > > > The additional arguments are documented in mdp.nodes.PCANode.__init__ .
> > > >
> > > > Note, however, that if the number of dimensions is very large, it
> > > > would be more efficient to diagonalize the covariance matrix in the
> > > > observation space rather than in the input space. I can send you a
> > > > reference if you need more details.
> > > >
> > > > All the best,
> > > > Pietro
> > > >
> > > > On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> > > > > Hi,
> > > > > I'm trying to use MDP for an application of PCA, and having a problem with
> > > > > "covariance matrix may be singular". Whenever I have fewer observations than
> > > > > variables, I get this error. As far as I'm aware, PCA can be used to reduce
> > > > > the dimensionality of a dataset: e.g. in image processing one might have 30
> > > > > images and several thousand variables (pixels), and use PCA to reduce the
> > > > > number of variables. I've used other PCA packages in this situation and
> > > > > they seem to work...
> > > > > Clearly either my theory, MDP, or my usage of it is wrong! Any ideas which?
> > > > > Mike
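[Editor's note: the singularity discussed in this thread can be reproduced with plain NumPy, independently of MDP. This is a minimal sketch, not MDP code: with N observations of D variables and N < D, the D x D sample covariance matrix has rank at most N - 1 after mean removal, so it is singular and a full eigendecomposition is ill-posed.]

```python
import numpy as np

# With N observations of D variables and N < D, the sample covariance
# matrix is D x D but has rank at most N - 1 after mean removal.
N, D = 5, 800
rng = np.random.default_rng(0)
x = rng.standard_normal((N, D))

cov = np.cov(x, rowvar=False)        # D x D covariance matrix
print(cov.shape)                     # (800, 800)
print(np.linalg.matrix_rank(cov))    # 4, i.e. N - 1: singular
```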
From: Tiziano Zito <t.zito@bi...>  20070903 17:03:52

Just to be sure that it actually works when you try, the right syntax is:

icanode = mdp.nodes.FastICANode(whitened=False, white_comp=N)

Note 'whitened' instead of 'whitening'. The meaning of the arguments is documented in the node __init__ method; please let us know if you find that documentation unclear or insufficient. As Pietro said, a node doing PCA and one doing whitening using SVD will be released soon. The whitening node will be able to throw away directions which have a relative and/or absolute variance smaller than a user-given threshold, something like 1E-8. I think a flow using this new WhiteningNode and an ICANode will be the most natural solution for your problem. Check out the MDP site within a month for more news!

Have a nice day,
tiziano

> FastICANode begins by whitening its input data (i.e., PCA + rescaling
> to variance one). You were probably thinking about doing something
> like
>
> flw = mdp.Flow([mdp.nodes.WhiteningNode(output_dim=N),
>                 mdp.nodes.FastICANode(whitening=False)])
> flw.train(a)
>
> which should indeed work. There is a simpler solution:
>
> icanode = mdp.nodes.FastICANode(white_comp=N)
> icanode.train(a)
> icanode.stop_training()
>
> This reduces the number of components as needed.
> Your second suggestion,
>
> flw = mdp.Flow([mdp.nodes.WhiteningNode(output_dim=0.95),
>                 mdp.nodes.FastICANode(whitening=False)])
>
> would work for data with a full-rank covariance matrix, but not here:
> in order to decide how many components correspond to 95% of the input
> variance, PCA has to solve the full DxD problem (it must compute what
> 100% of the variance is), which would raise an exception since the
> matrix is singular.
>
> I believe Tiziano wrote a node that uses Singular Value Decomposition
> to perform PCA, which would make the algorithm stable even in the
> singular case. We're planning to release it soon together with a few
> other nodes. Stay tuned!
> Pietro
>
> On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> > Wow, that's what I call fast response, thanks! Yep, makes sense, and
> > works for a single PCANode. If I use it in a flow with a FastICANode, the
> > FastICANode doesn't seem to work that out. As far as I'm aware (not being an
> > ICA expert), ICA needs at least as many observations as variables. My plan
> > was to use a flow to cut down the number of variables to fewer than the
> > number of observations (via a PCANode), then use ICA. Doing it as separate
> > nodes works, just not as a flow. I've tried playing with input_dim and
> > output_dim of the FastICANode before putting it in the flow, but no luck.
> > Maybe I just don't understand how flows work?
> > Wouldn't it make sense to make the PCANode automatically cap the number of
> > principal components at the number of input observations? That way if I
> > apply it to an unknown dataset with fewer observations than variables, I can
> > use e.g. output_dim=0.95 without worrying that it might try to make too
> > many components and die with an error. It would be particularly handy for large
> > data matrices that take a while to compute!
> > Thanks again,
> > Mike
> >
> > On 9/3/07, Pietro Berkes <berkes@...> wrote:
> > > Hi Mike!
> > > If you've got fewer observations than dimensions, the covariance matrix
> > > is singular. If you don't specify any additional arguments, MDP will
> > > try to give you back as many PCA components as there are input
> > > dimensions, which is impossible in this case. The error message should
> > > disappear if you tell MDP to return as many PCA components as there
> > > are observations.
> > > E.g., if you have N=5 observations for D=10 dimensions:
> > >
> > > import mdp
> > > D, N = 10, 5
> > > a = mdp.numx_rand.rand(N,D) # get some random data
> > > mdp.pca(a) # this raises an error
> > > mdp.pca(a, output_dim=N) # this does not
> > >
> > > The additional arguments are documented in mdp.nodes.PCANode.__init__ .
> > >
> > > Note, however, that if the number of dimensions is very large, it
> > > would be more efficient to diagonalize the covariance matrix in the
> > > observation space rather than in the input space. I can send you a
> > > reference if you need more details.
> > >
> > > All the best,
> > > Pietro
> > >
> > > On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> > > > Hi,
> > > > I'm trying to use MDP for an application of PCA, and having a problem with
> > > > "covariance matrix may be singular". Whenever I have fewer observations than
> > > > variables, I get this error. As far as I'm aware, PCA can be used to reduce
> > > > the dimensionality of a dataset: e.g. in image processing one might have 30
> > > > images and several thousand variables (pixels), and use PCA to reduce the
> > > > number of variables. I've used other PCA packages in this situation and
> > > > they seem to work...
> > > > Clearly either my theory, MDP, or my usage of it is wrong! Any ideas which?
> > > > Mike
From: Pietro Berkes <berkes@ga...>  20070903 14:35:28

FastICANode begins by whitening its input data (i.e., PCA + rescaling to variance one). You were probably thinking about doing something like

flw = mdp.Flow([mdp.nodes.WhiteningNode(output_dim=N),
                mdp.nodes.FastICANode(whitening=False)])
flw.train(a)

which should indeed work. There is a simpler solution:

icanode = mdp.nodes.FastICANode(white_comp=N)
icanode.train(a)
icanode.stop_training()

This reduces the number of components as needed. Your second suggestion,

flw = mdp.Flow([mdp.nodes.WhiteningNode(output_dim=0.95),
                mdp.nodes.FastICANode(whitening=False)])

would work for data with a full-rank covariance matrix, but not here: in order to decide how many components correspond to 95% of the input variance, PCA has to solve the full DxD problem (it must compute what 100% of the variance is), which would raise an exception since the matrix is singular.

I believe Tiziano wrote a node that uses Singular Value Decomposition to perform PCA, which would make the algorithm stable even in the singular case. We're planning to release it soon together with a few other nodes. Stay tuned!
Pietro

On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> Wow, that's what I call fast response, thanks! Yep, makes sense, and
> works for a single PCANode. If I use it in a flow with a FastICANode, the
> FastICANode doesn't seem to work that out. As far as I'm aware (not being an
> ICA expert), ICA needs at least as many observations as variables. My plan
> was to use a flow to cut down the number of variables to fewer than the
> number of observations (via a PCANode), then use ICA. Doing it as separate
> nodes works, just not as a flow. I've tried playing with input_dim and
> output_dim of the FastICANode before putting it in the flow, but no luck.
> Maybe I just don't understand how flows work?
> Wouldn't it make sense to make the PCANode automatically cap the number of
> principal components at the number of input observations? That way if I
> apply it to an unknown dataset with fewer observations than variables, I can
> use e.g. output_dim=0.95 without worrying that it might try to make too
> many components and die with an error. It would be particularly handy for large
> data matrices that take a while to compute!
> Thanks again,
> Mike
>
> On 9/3/07, Pietro Berkes <berkes@...> wrote:
> > Hi Mike!
> > If you've got fewer observations than dimensions, the covariance matrix
> > is singular. If you don't specify any additional arguments, MDP will
> > try to give you back as many PCA components as there are input
> > dimensions, which is impossible in this case. The error message should
> > disappear if you tell MDP to return as many PCA components as there
> > are observations.
> > E.g., if you have N=5 observations for D=10 dimensions:
> >
> > import mdp
> > D, N = 10, 5
> > a = mdp.numx_rand.rand(N,D) # get some random data
> > mdp.pca(a) # this raises an error
> > mdp.pca(a, output_dim=N) # this does not
> >
> > The additional arguments are documented in mdp.nodes.PCANode.__init__ .
> >
> > Note, however, that if the number of dimensions is very large, it
> > would be more efficient to diagonalize the covariance matrix in the
> > observation space rather than in the input space. I can send you a
> > reference if you need more details.
> >
> > All the best,
> > Pietro
> >
> > On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> > > Hi,
> > > I'm trying to use MDP for an application of PCA, and having a problem with
> > > "covariance matrix may be singular". Whenever I have fewer observations than
> > > variables, I get this error. As far as I'm aware, PCA can be used to reduce
> > > the dimensionality of a dataset: e.g. in image processing one might have 30
> > > images and several thousand variables (pixels), and use PCA to reduce the
> > > number of variables. I've used other PCA packages in this situation and
> > > they seem to work...
> > > Clearly either my theory, MDP, or my usage of it is wrong! Any ideas which?
> > > Mike
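[Editor's note: the SVD-based PCA node Pietro mentions was not yet released at the time of this thread, but the idea can be sketched with plain NumPy. The names below (`pca_svd`, `output_dim`) are illustrative, not the eventual MDP API: an economy-size SVD of the centered data yields the principal directions without ever forming the singular D x D covariance matrix, so it stays stable when N < D.]

```python
import numpy as np

def pca_svd(x, output_dim):
    # Center the data, then take an economy-size SVD; the rows of vt are
    # the principal directions, valid even when there are fewer
    # observations than variables (singular covariance).
    xc = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(xc, full_matrices=False)
    components = vt[:output_dim]       # (output_dim, D) principal directions
    return xc @ components.T, components

N, D = 5, 10                           # fewer observations than variables
rng = np.random.default_rng(1)
x = rng.standard_normal((N, D))
projected, components = pca_svd(x, output_dim=3)
print(projected.shape)                 # (5, 3)
```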
From: Pietro Berkes <berkes@ga...>  20070903 14:16:00

Hi Mike!
If you've got fewer observations than dimensions, the covariance matrix is singular. If you don't specify any additional arguments, MDP will try to give you back as many PCA components as there are input dimensions, which is impossible in this case. The error message should disappear if you tell MDP to return as many PCA components as there are observations.
E.g., if you have N=5 observations for D=10 dimensions:

import mdp
D, N = 10, 5
a = mdp.numx_rand.rand(N,D) # get some random data
mdp.pca(a) # this raises an error
mdp.pca(a, output_dim=N) # this does not

The additional arguments are documented in mdp.nodes.PCANode.__init__ .

Note, however, that if the number of dimensions is very large, it would be more efficient to diagonalize the covariance matrix in the observation space rather than in the input space. I can send you a reference if you need more details.

All the best,
Pietro

On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> Hi,
> I'm trying to use MDP for an application of PCA, and having a problem with
> "covariance matrix may be singular". Whenever I have fewer observations than
> variables, I get this error. As far as I'm aware, PCA can be used to reduce
> the dimensionality of a dataset: e.g. in image processing one might have 30
> images and several thousand variables (pixels), and use PCA to reduce the
> number of variables. I've used other PCA packages in this situation and
> they seem to work...
> Clearly either my theory, MDP, or my usage of it is wrong! Any ideas which?
> Mike
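[Editor's note: Pietro's last point, diagonalizing in observation space, is often called the "snapshot" trick. A sketch in plain NumPy (illustrative, not MDP code): solve the small N x N eigenproblem of xc @ xc.T instead of the D x D covariance, then map the eigenvectors back to D-dimensional principal directions.]

```python
import numpy as np

N, D = 5, 10                           # N observations, D variables, N < D
rng = np.random.default_rng(2)
x = rng.standard_normal((N, D))
xc = x - x.mean(axis=0)                # centered data

# Solve the small N x N eigenproblem instead of the D x D covariance.
evals, evecs = np.linalg.eigh(xc @ xc.T)
order = np.argsort(evals)[::-1]        # sort eigenvalues descending
evals, evecs = evals[order], evecs[:, order]

# Keep the nonzero eigenvalues (at most N - 1 after centering) and map
# back to D-dimensional, unit-norm principal directions.
keep = evals > 1e-10 * evals[0]
directions = (xc.T @ evecs[:, keep]) / np.sqrt(evals[keep])
print(directions.shape)                # (10, 4)
```

This costs an N x N eigendecomposition rather than a D x D one, which is why it pays off when D is in the thousands and N is small, as in Mike's imaging case.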
From: Michael Bewley <michael.bewley@gm...>  20070903 14:05:55

Wow, that's what I call fast response, thanks! Yep, makes sense, and works for a single PCANode. If I use it in a flow with a FastICANode, the FastICANode doesn't seem to work that out. As far as I'm aware (not being an ICA expert), ICA needs at least as many observations as variables. My plan was to use a flow to cut down the number of variables to fewer than the number of observations (via a PCANode), then use ICA. Doing it as separate nodes works, just not as a flow. I've tried playing with input_dim and output_dim of the FastICANode before putting it in the flow, but no luck. Maybe I just don't understand how flows work?

Wouldn't it make sense to make the PCANode automatically cap the number of principal components at the number of input observations? That way if I apply it to an unknown dataset with fewer observations than variables, I can use e.g. output_dim=0.95 without worrying that it might try to make too many components and die with an error. It would be particularly handy for large data matrices that take a while to compute!

Thanks again,
Mike

On 9/3/07, Pietro Berkes <berkes@...> wrote:
>
> Hi Mike!
> If you've got fewer observations than dimensions, the covariance matrix
> is singular. If you don't specify any additional arguments, MDP will
> try to give you back as many PCA components as there are input
> dimensions, which is impossible in this case. The error message should
> disappear if you tell MDP to return as many PCA components as there
> are observations.
> E.g., if you have N=5 observations for D=10 dimensions:
>
> import mdp
> D, N = 10, 5
> a = mdp.numx_rand.rand(N,D) # get some random data
> mdp.pca(a) # this raises an error
> mdp.pca(a, output_dim=N) # this does not
>
> The additional arguments are documented in mdp.nodes.PCANode.__init__ .
>
> Note, however, that if the number of dimensions is very large, it
> would be more efficient to diagonalize the covariance matrix in the
> observation space rather than in the input space. I can send you a
> reference if you need more details.
>
> All the best,
> Pietro
>
> On 9/3/07, Michael Bewley <michael.bewley@...> wrote:
> > Hi,
> > I'm trying to use MDP for an application of PCA, and having a problem with
> > "covariance matrix may be singular". Whenever I have fewer observations than
> > variables, I get this error. As far as I'm aware, PCA can be used to reduce
> > the dimensionality of a dataset: e.g. in image processing one might have 30
> > images and several thousand variables (pixels), and use PCA to reduce the
> > number of variables. I've used other PCA packages in this situation and
> > they seem to work...
> > Clearly either my theory, MDP, or my usage of it is wrong! Any ideas which?
> > Mike
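[Editor's note: the guard Mike asks for, capping the requested components at what the data can support, can be sketched in a few lines of plain NumPy terms. `safe_output_dim` is a hypothetical helper for illustration, not part of MDP.]

```python
import numpy as np

def safe_output_dim(x, requested):
    # The data matrix has rank at most min(n_obs, n_vars), so never ask a
    # PCA for more components than that (hypothetical guard, not MDP code).
    n_obs, n_vars = x.shape
    return min(requested, n_obs, n_vars)

x = np.random.normal(size=(30, 5000))  # e.g. 30 images, 5000 pixels each
print(safe_output_dim(x, 50))          # -> 30: capped at the observation count
```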
From: Michael Bewley <michael.bewley@gm...>  20070903 12:30:01

Hi,
I'm trying to use MDP for an application of PCA, and having a problem with "covariance matrix may be singular". Whenever I have fewer observations than variables, I get this error. As far as I'm aware, PCA can be used to reduce the dimensionality of a dataset: e.g. in image processing one might have 30 images and several thousand variables (pixels), and use PCA to reduce the number of variables. I've used other PCA packages in this situation and they seem to work...
Clearly either my theory, MDP, or my usage of it is wrong! Any ideas which?
Mike