From: Neal Becker <ndbecker2@gm...> - 2011-11-30 19:14:09

At least, I think it's a PCA question. Please excuse me, I'm new to PCA (and not that great with linear algebra).

I have an input vector x. I compute a set of outputs y = Ax, where A is a matrix of 512 rows and 8 columns; that is, given a vector x of 8 elements, I want to compute 512 outputs. These outputs are actually the results of 512 FIR filter operations (each 1-d). That's a lot of filters, so I'm thinking of using PCA to find a new matrix A' and a matrix B such that

    y' = B (A' x) \approx y

That is, I want to find a new matrix A' with a smaller number of rows than A, and then a matrix B which transforms the result of A' x back to the original basis. The idea is to hopefully find a set of M << 512 vectors, so that performing these M < 512 filters, followed by the transformation B, costs a lot less than the original computation (although the final result y' is only approximately y).

Does this make any sense? Is PCA appropriate? How do I use MDP for this?
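Whether this pays off depends on how correlated the 512 filters are, but the factorization Neal describes can be sketched with a truncated SVD. This is plain NumPy rather than MDP, the data is random stand-in data, and the reduced count M = 4 is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((512, 8))    # 512 FIR filters, 8 taps each (stand-in data)
x = rng.standard_normal(8)           # one input block

y = A @ x                            # exact: 512 filter outputs

# Factor A ~= B @ A_prime with a truncated SVD.
M = 4                                # reduced number of filters (illustrative)
U, s, Vt = np.linalg.svd(A, full_matrices=False)
A_prime = Vt[:M]                     # M x 8: the M surrogate filters
B = U[:, :M] * s[:M]                 # 512 x M: maps reduced outputs back

y_approx = B @ (A_prime @ x)         # run M filters, then transform with B
```

Since A has only 8 columns its rank is at most 8, so M = 8 already reproduces y exactly; the savings come when the filters are correlated enough that a much smaller M keeps y_approx close to y.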
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-29 13:42:54

Excellent, thanks for teaching me how to fish :)

Cheers,
Fabian

On 11/29/11, Tiziano Zito wrote:
> [...]
From: Tiziano Zito <tiziano.zito@bc...> - 2011-11-29 13:29:01

hi fabian,

next time you can just do something like this:

    >>> import mdp
    >>> print mdp.nodes.SFANode()._symeig.__code__
    <code object wrap_eigh at 0xa4a6c80, file "...mdp-toolkit/mdp/utils/_symeig.py", line 41>

which gives you exactly the information you were looking for.

ciao,
tiziano
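Tiziano's trick works for any Python function object: the __code__ attribute records the defining file and line, so aliased or dynamically assigned callables can be traced without grepping the source. A minimal stand-alone illustration (the function names here are made up, not MDP's):

```python
import inspect

def scale(x, factor=2):
    # Toy stand-in for an aliased function like mdp.utils.symeig.
    return x * factor

alias = scale  # the alias hides the original name, just as symeig does

# __code__ points back at the defining file and line.
code = alias.__code__
print(code.co_name, code.co_filename, code.co_firstlineno)

# inspect offers the same information at a higher level:
print(inspect.getsourcefile(alias))
```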
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-29 13:15:40

Thanks Philipp, that's just what I needed.

Out of curiosity, however, a follow-up question: does someone by any chance know what the most time-consuming step of the SciPy/LAPACK eigenvalue routine is? I'm currently looking for the bottleneck in my computations, and even though it is not the finding of the eigenvalues, I'm curious whether it would (within reason) be easily possible to speed it up by parallelism (i.e., if there's a big matrix multiplication somewhere in there, just go for CUDA). (However, I guess there's a multitude of eigenvalue algorithms out there, so it might be a moot question anyways.)

Cheers,
Fabian

On 11/29/11, Philipp Meier wrote:
> mdp/__init__.py, line 133:
>
>     132      # set symeig
>     133      utils.symeig = configuration.get_symeig(numx_linalg)
>
> mdp/configuration.py, lines 219ff:
>
>     219      def get_symeig(numx_linalg):
>     220          # if we have scipy, check if the version of
>     221          # scipy.linalg.eigh supports the rich interface
>     222          args = inspect.getargspec(numx_linalg.eigh)[0]
>     223          if len(args) > 4:
>     224              # if yes, just wrap it
>     225              from utils._symeig import wrap_eigh as symeig
>     226              config.ExternalDepFound('symeig', 'scipy.linalg.eigh')
>     227          else:
>     228              # either we have numpy, or we have an old scipy
>     229              # we need to use our own rich wrapper
>     230              from utils._symeig import _symeig_fake as symeig
>     231              config.ExternalDepFound('symeig', 'symeig_fake')
>     232          return symeig
>
> So it should be a mapping to this Python function:
> http://docs.scipy.org/doc/scipy-0.7.x/reference/generated/scipy.linalg.eigh.html
> and respectively to this LAPACK function (depending on dtype, obviously):
> http://www.netlib.org/lapack/double/dspevd.f
>
> [...]
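The branch your own installation takes can be checked by reproducing MDP's test directly. This is a sketch: `inspect.getfullargspec` replaces the older `getargspec` call shown in the excerpt, and `numpy.linalg` stands in for whatever `numx_linalg` is bound to on your system:

```python
import inspect
import numpy.linalg as numx_linalg

# MDP's heuristic: scipy.linalg.eigh's "rich" interface takes more than
# four arguments; numpy's plain eigh(a, UPLO='L') does not.
try:
    args = inspect.getfullargspec(numx_linalg.eigh).args
except TypeError:            # a builtin without an introspectable signature
    args = []

if len(args) > 4:
    backend = "wrap_eigh (scipy.linalg.eigh)"
else:
    backend = "_symeig_fake (internal fallback)"
print(backend)
```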
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-29 10:16:20

Hi!

A quick question: I want to trace the computation over the lifetime of an SFA node, and can't seem to find the symeig function it is using. In sfa_nodes.py it says

    from mdp.utils import ( .., symeig, .. )

But when looking for the source, the closest thing I can find is the file _symeig.py in the 'utils' folder, which merely contains a function called '_symeig_fake', which is definitely not being called by the SFA node. I also can't find the function in the MDP documentation of the utils module. (I'm working on MDP 3.0, so maybe it has been changed in the docs, but I should still be able to trace the call in the source files I use?)

In short: when calling 'symeig', what function does the SFANode call, and where can I have a look at its source, if available?

Cheers,
Fabian
From: Esben Jannik Bjerrum <esbenjannik@ro...> - 2011-11-24 11:27:07

From: Pietro Berkes <berkes@ga...> - 2011-11-23 15:41:45

Or like this ;) However, I don't think that we considered the case of missing data when writing NIPALS, but I am not the author of that node.

P.

On Wed, Nov 23, 2011 at 3:35 PM, Kevin Dunn <kgdunn@...> wrote:
> [...]
From: Pietro Berkes <berkes@ga...> - 2011-11-23 15:39:19

Hi Esben!

If I understand you correctly, you are missing not entire rows of your data matrix, but single elements of it? In that case, the correct thing to do is really application-specific. IMHO, replacing them with 0.0 seems to be a bad idea, since you would be systematically projecting them onto the axis, which might introduce artifacts (try plotting your data after the replacement).

A possibility might be: a) measure the variance and mean of all columns, b) replace the missing elements in each column with a random number drawn from a Gaussian with the same variance and mean as the rest of the column. This should be much less biased (again, have a look at your data after you do this).

Hope this helps,
Pietro

On Wed, Nov 23, 2011 at 3:08 PM, Esben Jannik Bjerrum <esbenjannik@...> wrote:
> [...]
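Pietro's option (b) is easy to script. A hypothetical helper (not part of MDP) that fills each column's NaNs with draws matched to that column's observed mean and standard deviation:

```python
import numpy as np

def impute_gaussian(X, rng=None):
    # Replace NaNs column-wise with draws from a Gaussian matched to the
    # observed mean and standard deviation of that column.
    rng = np.random.default_rng() if rng is None else rng
    X = np.array(X, dtype=float, copy=True)
    for j in range(X.shape[1]):
        missing = np.isnan(X[:, j])
        if missing.any():
            mu = np.nanmean(X[:, j])
            sd = np.nanstd(X[:, j])
            X[missing, j] = rng.normal(mu, sd, missing.sum())
    return X
```

After imputation the array is dense, so any flow (NormalizeNode, PCANode, ...) can train on it; as Pietro says, plot the data afterwards to check for artifacts.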
From: Kevin Dunn <kgdunn@gm...> - 2011-11-23 15:36:43

On Wed, Nov 23, 2011 at 10:08, Esben Jannik Bjerrum <esbenjannik@...> wrote:
> [...]

Hi Esben,

You are looking to calculate the PCA model with missing data present. That's most easily done using the NIPALS algorithm (rather than SVD), and using single-component projection (SCP) to handle missing data.

To understand SCP one has to realize that the NIPALS algorithm is nothing more than a series of alternating least-squares steps that converge to the scores and loadings. The scores are the regression slope coefficients when regressing rows of X onto the loadings. The loadings are the slope coefficients when regressing columns of X onto the scores.

SCP works by just skipping over the missing values when calculating these regression slopes: the algorithm will still converge, and it will be identical to the SVD if there happen to be no missing values present.

More details are in the slides for my class 3:
http://latent.connectmv.com/wiki/Principal_Component_Analysis#Class_3
I also recommend you read this paper:
http://literature.connectmv.com/item/68/missingdatamethodsinpcaandpls

In short: I would leave your missing values as NaN, not replace them with zero (which is OK, but suboptimal), then use the NIPALS algorithm. However, I'm not sure if MDP's NIPALSNode handles missing values as described above.

Kevin
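Kevin's description (alternating least-squares regressions that simply skip missing entries) fits in a short function. A plain-NumPy sketch of single-component projection for the first component only; this is not MDP's NIPALSNode:

```python
import numpy as np

def nipals_pc1(X, n_iter=500, tol=1e-10):
    """First principal component by NIPALS with single-component
    projection: missing (NaN) entries are skipped in each regression."""
    X = np.asarray(X, dtype=float)
    X = X - np.nanmean(X, axis=0)        # mean-center, ignoring NaNs
    mask = ~np.isnan(X)
    Xf = np.where(mask, X, 0.0)          # zeroed entries drop out of the sums
    t = Xf[:, 0].copy()                  # initial scores: first column
    for _ in range(n_iter):
        # Loadings: regress each column onto t over the observed rows only.
        p = (Xf.T @ t) / (mask.T @ (t * t))
        p /= np.linalg.norm(p)
        # Scores: regress each row onto p over the observed columns only.
        t_new = (Xf @ p) / (mask @ (p * p))
        if np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new):
            t = t_new
            break
        t = t_new
    return t, p
```

With no missing values this converges to the same leading component as the SVD, up to sign, which is the equivalence Kevin mentions.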
From: Esben Jannik Bjerrum <esbenjannik@ro...> - 2011-11-23 15:08:38

Hi MDP users,

I'm a newbie to MDP, but does anyone have experience with using the PCANode with missing data? I've successfully used the PCANode with some test datasets which are complete and don't contain missing data, but my real-world dataset has some missing values. I've tried to mask them with NaN using NumPy's fix_invalid, but this only gives partial success when svd=True. Is there a trick for how to work with datasets containing missing values? Maybe set the missing values to zero after mean-centering and normalisation?

Here's the excerpt from my code:

    # data is an array containing some invalid data 0.0
    data[data < 0.1] = N.nan
    data = N.ma.fix_invalid(N.array(data.values, dtype='f8'))
    # Now data has residue b-factors as columns (= variables) and PDBs as rows
    # (index = observations)

    # Define the flow and make the PCA
    flow = mdp.Flow([mdp.nodes.NormalizeNode(), mdp.nodes.PCANode(svd=True)])
    # flow = mdp.Flow([mdp.nodes.NormalizeNode(), mdp.nodes.NIPALSNode()])

    flow.train(data)

    # Plot the loadings
    loading = flow[1].get_projmatrix()
    print "Loadings"
    print loading
    P.plot(loading[:,0], loading[:,1], 'o')

    scores = flow(data)
    print scores
    P.figure(2)
    P.plot(scores[:,0], scores[:,1], 'o')

    P.show()

Best Regards,
Esben
From: Timmy Wilson <timmyt@sm...> - 2011-11-22 17:50:13

Thanks Niko/Pietro,

I hadn't seen the examples page, or the Oger toolbox. A lot of cool stuff here; I'll be busy!

The other methods highlighted on the MDP examples page (Slow Feature Analysis, Growing Neural Gas, Locally Linear Embedding) have me curious. I'm slowly making my way through "Natural Image Statistics - A probabilistic approach to early computational vision" (http://www.naturalimagestatistics.net/), which is fantastic! Can you guys recommend another book/resource? I'm curious if/how I can use the other methods on the examples page.

> I also have some standalone Python code using DBNs for digit recognition
> http://people.brandeis.edu/~berkes/docs/dbn_tutorial.html (see "My solution").

Pietro, this will be helpful, thank you.

Thanks again guys. I'm excited to learn more, and contribute where I can.

Timmy Wilson

On Tue, Nov 22, 2011 at 3:13 AM, Pietro Berkes <berkes@...> wrote:
> [...]
From: Pietro Berkes <berkes@ga...> - 2011-11-22 08:14:22

To add to Niko's message: I also have some standalone Python code using DBNs for digit recognition: http://people.brandeis.edu/~berkes/docs/dbn_tutorial.html (see "My solution").

On Tue, Nov 22, 2011 at 7:27 AM, Niko Wilbert <mail@...> wrote:
> [...]
From: Niko Wilbert <mail@ni...> - 2011-11-22 07:27:32

Hi Timmy,

on the MDP examples page there are proof-of-concept implementations for backpropagation and deep belief networks (see the last two entries on http://mdp-toolkit.sourceforge.net/examples/examples.html). Note that they haven't really been tested and may still contain errors and other problems (any feedback on them is of course welcome).

You might also want to look at alternative implementation approaches, like those used in the Oger toolbox (which is partly based on MDP) or in other libraries (scikits.learn, PyBrain...).

Cheers, Niko

On Tue, Nov 22, 2011 at 3:15 AM, Timmy Wilson <timmyt@...> wrote:
> [...]
From: Timmy Wilson <timmyt@sm...> - 2011-11-22 04:13:35

Hi MDP community,

I'm interested in using neural nets to find niche communities (latent topics) in a social network (adjacency matrix).

I considered LDA, but think i can do better w/ an 'emergent' (more nature-inspired) method. Eventually i want to add an evolutionary component to the algo, but that's for another day!

Hinton + Salakhutdinov show great document clustering results using a 2000-500-250-125-2 autoencoder: http://www.cs.toronto.edu/%7Ehinton/science.pdf

I'm imagining a flow which consists of PCA preprocessing, recursive RBMs, and then maybe backpropagation?? It seems like someone may have gone down this path (a neural network flow), and/or may have some words of wisdom? And maybe, if i'm really, really lucky someone has a good example :]

In any case, i'm thrilled to find MDP - it looks like it'll fit my needs perfectly - thanks guys!!

Timmy Wilson
Cleveland OH
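The 2000-500-250-125-2 architecture from the Hinton & Salakhutdinov paper is a funnel of ever-smaller layers. A quick numpy sketch of the encoder half (untrained random weights, hypothetical names, tanh hidden units assumed for illustration) shows how a 2000-dimensional document vector ends up as a 2-D code:

```python
import numpy as np

rng = np.random.default_rng(0)
# encoder half of the 2000-500-250-125-2 autoencoder
layer_sizes = [2000, 500, 250, 125, 2]

# one random (untrained) weight matrix per consecutive layer pair
weights = [0.01 * rng.standard_normal((m, n))
           for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]

def encode(x, weights):
    """Forward pass through the encoder: tanh units, linear code layer."""
    for W in weights[:-1]:
        x = np.tanh(x @ W)
    return x @ weights[-1]  # 2-D code

docs = rng.random((10, 2000))   # 10 fake document vectors
codes = encode(docs, weights)
print(codes.shape)              # (10, 2)
```

In the paper the weights come from layer-wise RBM pretraining followed by backpropagation fine-tuning; the sketch only shows the dimensionality flow.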
From: Michael Sarahan <mcsarahan@uc...> - 2011-11-18 21:10:08

PCA is a 2D technique - eigenvalue decomposition, essentially. You need to reshape your data down to 2 dimensions, with the elements of the vectors as rows in the array, and each pixel in the array as a column. You need to unfold your data. For your example:

elem_wave = elem_wave.reshape(k, -1)

For your application, you'll then want to reshape the scores/weights back to the 3D shape of your vector array, except that now you'll have a vector of scores/weights for each vector component at each coordinate of your array.

# note - here k is the number of scores/weights kept.
# By default, this is k, but you can reduce your dimensionality.
score_results = pca_node.v.T.reshape(k, i, j)

(Note: you may not need the transpose here. If things come out fishy, omit the .T part.)

And then, to access any given scores for a pixel, index appropriately:

# matplotlib example - 1D plot of scores/weights for pixel 0,0
plt.plot(score_results[:,0,0])

or, plot the scores for a single component (here, the first component) across the several pixels:

plt.imshow(score_results[0])

Hope this helps.

Mike

On 18 November 2011 15:40, Neal Becker <ndbecker2@...> wrote:
> newb here (to mdp, and to pca)
>
> I have a 3d array, which consists of a 2d array of vectors - the vectors
> represent signals.
>
> I'd like to use pca to find a smaller 2d array of vectors (the size of these
> resulting vectors would be the same as the original size).
>
> So from an initial set of i x j vectors (of length k), I'd like a set of
> i' x j' vectors of length k.
>
> I tried playing with mdp:
>
> pcanode = mdp.nodes.PCANode()
>
> elem_wave.shape
> Out[12]: (64, 8, 8)
>
> pcanode.train(elem_wave)
>
> NodeException: x has rank 3, should be 2
>
> Any hints?
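Michael's unfold/refold recipe can be tried without MDP. The numpy sketch below mirrors Neal's (64, 8, 8) example, using an SVD of the centered data as a stand-in for PCANode; names like n_comp are illustrative:

```python
import numpy as np

k, i, j = 64, 8, 8
rng = np.random.default_rng(0)
elem_wave = rng.random((k, i, j))        # k-long signals on an i x j grid

# unfold: one row per vector element, one column per pixel
flat = elem_wave.reshape(k, -1)          # shape (64, 64)

# PCA via SVD on the centered data (stand-in for mdp.nodes.PCANode)
centered = flat - flat.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)
n_comp = 5                               # reduced dimensionality
scores = centered @ vt[:n_comp].T        # (k, n_comp) projection scores

# refold the per-pixel loadings back onto the grid, as Michael describes
loadings = vt[:n_comp].reshape(n_comp, i, j)
print(flat.shape, scores.shape, loadings.shape)
```

Indexing then works as in Michael's matplotlib examples: loadings[0] is the first component's map over the 8x8 grid.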
From: Neal Becker <ndbecker2@gm...> - 2011-11-18 15:40:43

newb here (to mdp, and to pca)

I have a 3d array, which consists of a 2d array of vectors - the vectors represent signals.

I'd like to use pca to find a smaller 2d array of vectors (the size of these resulting vectors would be the same as the original size).

So from an initial set of i x j vectors (of length k), I'd like a set of i' x j' vectors of length k.

I tried playing with mdp:

pcanode = mdp.nodes.PCANode()

elem_wave.shape
Out[12]: (64, 8, 8)

pcanode.train(elem_wave)

NodeException: x has rank 3, should be 2

Any hints?
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-09 11:01:16

I'll try to get a minimal case which reproduces the error, but I suspect that in doing so I'll already find what's wrong ;) Also, I'm only using the single steps for execution, as I want to create sample plots and that was just the easiest solution. Training is done with a lot more time steps, of course.

Good to know that execution is merely a multiplication though - fresh food for CUDA ;)

(And sorry for the double posting, my client got somewhat excited..)

Cheers,
Fabian

On 11/09/11, Tiziano Zito wrote:
> > As above: When only feeding a single timestep, then we both said the same thing.
> > I guess I should have made it clearer that I'm only feeding single steps into
> > the network, work on the result, and then feed the next step.
> >
> > Thus, so far I don't see that I'm doing anything wrong (only that I could do it
> > somewhat more efficiently, but I'll worry about that once a first version is
> > running correctly). However, for a case like this:
> >
> > k = sfa_node.execute( data, 10 )[0,a]
> > j = sfa_node.execute( data, 50 )[0,a]
> >
> > I still get different results. But at least I now know that's not supposed to be
> > the case, so I'll recheck my code and try to find out where that's coming from.
> as I showed with the random inputs this should not be the case. Note
> that sfanode.execute is just a matrix multiplication (look at the
> source code, it's just 4 lines), I really do not see where the
> different results should be coming from. Can you come up with a
> minimal example to reproduce your problem?
>
> > > You can decide to "discard" some dimensions, and in extreme case you can just
> > > select one output component. In this case if input is TxN output will be 1xN.
> >
> > But in this case the result should be Nx1? It's one component (thus one column),
> > and the result contains the T timesteps (thus T rows)?
> Yes, that's right: Nx1 should be.
>
> The thing that bothers me now is that you say you are feeding
> "single" steps into the network. SFA needs to be able to calculate
> the time derivative, so it needs at least 2 samples. If you train
> with only one sample you get an error like:
>
> mdp.TrainingException: Need at least 2 time samples to compute time derivative (1 given)
>
> so you probably use this "one time step" paradigm only for the
> execution, where of course there's no limitation of sorts. right?
>
> tiziano
>
> RSA(R) Conference 2012
> Save $700 by Nov 18
> Register now
> http://p.sf.net/sfu/rsasfdev2dev1
> _______________________________________________
> mdptoolkitusers mailing list
> mdptoolkitusers@...
> https://lists.sourceforge.net/lists/listinfo/mdptoolkitusers
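Tiziano's "at least 2 samples" constraint comes from the finite-difference time derivative that SFA training relies on. A small numpy sketch of that constraint (an illustration of the idea, not MDP's actual implementation):

```python
import numpy as np

def time_derivative(x):
    """Finite-difference time derivative of a T x N signal, as SFA-style
    training uses internally; needs at least 2 time samples."""
    x = np.asarray(x)
    if x.shape[0] < 2:
        raise ValueError(
            "Need at least 2 time samples to compute time derivative "
            "(%d given)" % x.shape[0])
    return np.diff(x, axis=0)

x = np.arange(12.0).reshape(4, 3)   # 4 time steps, 3 components
print(time_derivative(x).shape)     # (4 - 1, 3) differences

try:
    time_derivative(x[:1])          # a single "time step" fails
except ValueError as e:
    print(e)
```

Execution, by contrast, is a per-sample matrix multiplication, so feeding single rows at execution time is fine, which is exactly the distinction made above.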
From: Tiziano Zito <tiziano.zito@bc...> - 2011-11-09 09:09:30

> As above: When only feeding a single timestep, then we both said the same thing.
> I guess I should have made it clearer that I'm only feeding single steps into
> the network, work on the result, and then feed the next step.
>
> Thus, so far I don't see that I'm doing anything wrong (only that I could do it
> somewhat more efficiently, but I'll worry about that once a first version is
> running correctly). However, for a case like this:
>
> k = sfa_node.execute( data, 10 )[0,a]
> j = sfa_node.execute( data, 50 )[0,a]
>
> I still get different results. But at least I now know that's not supposed to be
> the case, so I'll recheck my code and try to find out where that's coming from.

as I showed with the random inputs this should not be the case. Note that sfanode.execute is just a matrix multiplication (look at the source code, it's just 4 lines), I really do not see where the different results should be coming from. Can you come up with a minimal example to reproduce your problem?

> > You can decide to "discard" some dimensions, and in extreme case you can just
> > select one output component. In this case if input is TxN output will be 1xN.
>
> But in this case the result should be Nx1? It's one component (thus one column),
> and the result contains the T timesteps (thus T rows)?

Yes, that's right: Nx1 should be.

The thing that bothers me now is that you say you are feeding "single" steps into the network. SFA needs to be able to calculate the time derivative, so it needs at least 2 samples. If you train with only one sample you get an error like:

mdp.TrainingException: Need at least 2 time samples to compute time derivative (1 given)

so you probably use this "one time step" paradigm only for the execution, where of course there's no limitation of sorts. right?

tiziano
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-09 08:55:44

> This is wrong. I do not know what you checked and when, but in MDP components
> are stored on columns

True, but I didn't say anything else. Maybe I wrote it somewhat confused, but what I'm doing is training the network with lots of data, but getting the output slice by slice. I.e.:

> so, for SFANode input comes as a matrix and the return value is also
> a matrix. If you do not constrain the output dimensions, if the
> input is a TxN matrix (T samples for each of the N different
> components), the output will be a TxN matrix.

My input 'matrix' is a 1xn matrix, just a single input timestep. And since SFA works instantaneously, I should get back a 1xn matrix, which is what I tried to explain in the first post. Further, the first component of the returned (in my case) row vector should then store the activity of the slowest component, and so on.

> Wrong. sfanode.execute(x)[0,1] will give you the first sample of the
> second slowest function. The complete "answer", as you call it, is
> as long in time as the input

As above: When only feeding a single timestep, then we both said the same thing. I guess I should have made it clearer that I'm only feeding single steps into the network, working on the result, and then feeding the next step.

Thus, so far I don't see that I'm doing anything wrong (only that I could do it somewhat more efficiently, but I'll worry about that once a first version is running correctly). However, for a case like this:

k = sfa_node.execute( data, 10 )[0,a]
j = sfa_node.execute( data, 50 )[0,a]

I still get different results. But at least I now know that's not supposed to be the case, so I'll recheck my code and try to find out where that's coming from.

> You can decide to "discard" some dimensions, and in extreme case you can just
> select one output component. In this case if input is TxN output will be 1xN.

But in this case the result should be Nx1? It's one component (thus one column), and the result contains the T timesteps (thus T rows)?

Cheers,
Fabian

On 11/09/11, Tiziano Zito wrote:
> > However, it seems I don't really understand what an SFA node actually returns.
> > As far as I know: once a node is trained, it can be called with a data slice
> > which returns a row-vector matrix (or did so, last time I checked).
> This is wrong. I do not know what you checked and when, but in MDP
> components are stored on columns:
> http://mdptoolkit.sourceforge.net/tutorial/quick_start.html
> """An important remark
>
> Input array data is typically assumed to be two-dimensional and
> ordered such that observations of the same variable are stored on
> rows and different variables are stored on columns.
> """
>
> so, for SFANode input comes as a matrix and the return value is also
> a matrix. If you do not constrain the output dimensions, if the
> input is a TxN matrix (T samples for each of the N different
> components), the output will be a TxN matrix. SFANode is just "rotating"
> the input data into a frame of reference where components are sorted
> by slowness.
>
> > I thought this vector would contain the 'answers' of the slowest found functions, i.e.,
> > something like sfa_node.execute( x )[0,1] would get me the answer of the
> > second slowest function to the provided data x.
> Wrong. sfanode.execute(x)[0,1] will give you the first sample of the
> second slowest function. The complete "answer", as you call it, is
> as long in time as the input: SFA is an "instantaneous" algorithm,
> for every input sample it will return an output sample. There's no
> downsampling involved.
>
> > Now, however, it seems rather different. From what you're saying: When calling
> >
> > k = sfa_node.execute( data, 10 )
> >
> > then I get a matrix of 10 columns, where the first column contains the answer
> > of the first/slowest signal/component? (Also, the correct terminology would be
> > 'filtering through the slowest component' instead of 'getting the answer from
> > the slowest function', I guess)
> Yes, this is correct.
>
> > Thus:
> >
> > a = <arbitrary index>
> >
> > k = sfa_node.execute( data, 10 )[0,a]
> > j = sfa_node.execute( data, 50 )[0,a]
> >
> > Should yield the same for k and j? (It doesn't in my test cases)
> it does for me:
> >>> import mdp
> >>> x = mdp.numx_rand.random((100,10))
> >>> sfanode = mdp.nodes.SFANode()
> >>> sfanode.train(x)
> >>> out = sfanode.execute(x)
> >>> first = sfanode.execute(x, 1)
> >>> mdp.numx.alltrue(out[:,0] == first[:,0])
> True
> >>> first_three = sfanode.execute(x, 3)
> >>> mdp.numx.alltrue(out[:,:3] == first_three[:,:3])
> True
>
> > I also assumed somewhat blindly that data filtered through a component yields
> > just a scalar.. thus an SFA node returning a row vector and not a matrix. If a
> > matrix is returned however, the filtering yields a multidimensional result
> > apparently..?
> I think you need to read a bit about SFA. The scholarpedia article
> is a good starting point: http://www.scholarpedia.org/article/Slow_feature_analysis
>
> As I tried to explain above, SFA just "rotates" the input data into
> a convenient space.
>
> Note that we are talking about linear SFA here. If you create a flow
> with a nonlinear expansion and a PCA and an SFA, matters change in
> terms of dimensionality, but it is still true that PCA and SFA are
> just smart rotations. You can decide to "discard" some dimensions,
> and in extreme case you can just select one output component. In
> this case if input is TxN output will be 1xN.
>
> Hope that helps,
> tiziano
From: Tiziano Zito <tiziano.zito@bc...> - 2011-11-09 08:25:41

> However, it seems I don't really understand what an SFA node actually returns.
> As far as I know: once a node is trained, it can be called with a data slice
> which returns a row-vector matrix (or did so, last time I checked).

This is wrong. I do not know what you checked and when, but in MDP components are stored on columns:
http://mdptoolkit.sourceforge.net/tutorial/quick_start.html

"""An important remark

Input array data is typically assumed to be two-dimensional and ordered such that observations of the same variable are stored on rows and different variables are stored on columns.
"""

so, for SFANode input comes as a matrix and the return value is also a matrix. If you do not constrain the output dimensions, if the input is a TxN matrix (T samples for each of the N different components), the output will be a TxN matrix. SFANode is just "rotating" the input data into a frame of reference where components are sorted by slowness.

> I thought this vector would contain the 'answers' of the slowest found functions, i.e.,
> something like sfa_node.execute( x )[0,1] would get me the answer of the
> second slowest function to the provided data x.

Wrong. sfanode.execute(x)[0,1] will give you the first sample of the second slowest function. The complete "answer", as you call it, is as long in time as the input: SFA is an "instantaneous" algorithm, for every input sample it will return an output sample. There's no downsampling involved.

> Now, however, it seems rather different. From what you're saying: When calling
>
> k = sfa_node.execute( data, 10 )
>
> then I get a matrix of 10 columns, where the first column contains the answer
> of the first/slowest signal/component? (Also, the correct terminology would be
> 'filtering through the slowest component' instead of 'getting the answer from
> the slowest function', I guess)

Yes, this is correct.

> Thus:
>
> a = <arbitrary index>
>
> k = sfa_node.execute( data, 10 )[0,a]
> j = sfa_node.execute( data, 50 )[0,a]
>
> Should yield the same for k and j? (It doesn't in my test cases)

it does for me:

>>> import mdp
>>> x = mdp.numx_rand.random((100,10))
>>> sfanode = mdp.nodes.SFANode()
>>> sfanode.train(x)
>>> out = sfanode.execute(x)
>>> first = sfanode.execute(x, 1)
>>> mdp.numx.alltrue(out[:,0] == first[:,0])
True
>>> first_three = sfanode.execute(x, 3)
>>> mdp.numx.alltrue(out[:,:3] == first_three[:,:3])
True

> I also assumed somewhat blindly that data filtered through a component yields
> just a scalar.. thus an SFA node returning a row vector and not a matrix. If a
> matrix is returned however, the filtering yields a multidimensional result
> apparently..?

I think you need to read a bit about SFA. The scholarpedia article is a good starting point: http://www.scholarpedia.org/article/Slow_feature_analysis

As I tried to explain above, SFA just "rotates" the input data into a convenient space.

Note that we are talking about linear SFA here. If you create a flow with a nonlinear expansion and a PCA and an SFA, matters change in terms of dimensionality, but it is still true that PCA and SFA are just smart rotations. You can decide to "discard" some dimensions, and in extreme case you can just select one output component. In this case if input is TxN output will be 1xN.

Hope that helps,
tiziano
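Tiziano's equality (the first n columns of the full output equal the output computed with n components) follows directly from execute being a single matrix multiplication. A numpy sketch, with a random projection matrix standing in for the trained sfanode.sf and sfanode.bias attributes:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((100, 10))        # T x N input, as in the thread
sf = rng.random((10, 10))        # stand-in for the trained sfanode.sf matrix
bias = x.mean(axis=0) @ sf       # stand-in for sfanode.bias (zero-mean output)

def execute(x, n=None):
    """Mimics SFANode.execute: project onto the first n columns of sf."""
    n = sf.shape[1] if n is None else n
    return x @ sf[:, :n] - bias[:n]

out = execute(x)                             # all components
first = execute(x, 1)                        # only the slowest
assert np.allclose(out[:, :1], first)        # slicing == fewer components

# filtering through *only* the 3rd component "by hand", as in the thread
third_only = x @ sf[:, 2] - bias[2]
assert np.allclose(out[:, 2], third_only)
```

Because each output column depends only on its own column of sf, asking for 10 or 50 components can never change the value in column a; differing values indicate a bug elsewhere, as Tiziano suggests.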
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-08 09:19:09

"I fail to understand what you are asking."  That's quite alright, as I fail to understand what you're saying :) However, it seems I don't really understand what an SFA node actually returns. As far as I know: once a node is trained, it can be called with a dataslice which returns a rowvector matrix (or did so, last time I checked). I thought this vector would contain the 'answers' of the slowest found functions, i.e., something like sfa_node.execute( x )[0,1] would get me the answer of the second slowest function to the provided data x. Now, however, it seems rather different. From what you're saying: When calling k = sfa_node.execute( data, 10 ) then I get a matrix of 10 columns, where the first column contains the answer of the first/slowest signal/component? (Also, the correct terminology would be 'filtering through the slowest component' instead of 'getting the answer from the slowest function', I guess) Thus: a = <arbitray index> k = sfa_node.execute( data, 10 )[0,a] j = sfa_node.execute( data, 50 )[0,a] Should yield the same for k and j? (I doesn't in my test cases) I also assumed somewhat blindly that data filtered through a component yields just a scalar.. thus a sfa node returning a rowvector and not a matrix. If a matrix is returned however, the filtering yields a multidimensional result apperantly..? Cheers, Fabian ps: pickling worked, thanks! much more easier to not always train from sratch ;) On 11/07/11, Tiziano Zito wrote: > > (a) What exactly does the second parameter of the execute() function denote? I > > find the documentation somewhat confusing  what exactly would I have to use > > to get the result from the slowest function? (Or the second slowest for that > > matter.) > > > > slowest = sfa_node( data, 1 )[0,0] > > or > > slowest = sfa_node( data, 32 )[0,0] > > > > I don't quite get what the formal difference is here. In both cases (I think) > > I should get the answer from the slowest found function.. 
but based on a) only > > the slowest or b) the 32 slowest functions..? o_O > > What if I want the second slowest? Then the answer of course should only be > > computed using the second slowest function.. not the first two slowest..? > I fail to understand what you are asking. If n=1, you get the input > data filtered through the 1st (==slowest) component. If n=10, you > get the input filtered through the first 10 (10 slowest) components, > and so on. If you want the input filtered *only* through the 3rd > slowest component, you either set n=3 and just retain the last > column of the resulting matrix, or you do the multiplication by > hand: > > mdp.utils.mult(x, sfanode.sf[:,2])  sfanode.bias[2] > > you use 2 instead of 3 because array indexing starts from 0 ;) > > > (b) When having stored a SFA node to a file via save(), how can I load it? > just > import pickle > fl = open('dumped_node','r') > sfanode = pickle.load(fl) > > > hth, > tiziano > > >  > RSA(R) Conference 2012 > Save $700 by Nov 18 > Register now > http://p.sf.net/sfu/rsasfdev2dev1 > _______________________________________________ > mdptoolkitusers mailing list > mdptoolkitusers@... > https://lists.sourceforge.net/lists/listinfo/mdptoolkitusers 
From: Tiziano Zito <tiziano.zito@bc...> - 2011-11-07 16:18:57

> (a) What exactly does the second parameter of the execute() function denote? I
> find the documentation somewhat confusing - what exactly would I have to use
> to get the result from the slowest function? (Or the second slowest for that
> matter.)
>
> slowest = sfa_node( data, 1 )[0,0]
> or
> slowest = sfa_node( data, 32 )[0,0]
>
> I don't quite get what the formal difference is here. In both cases (I think)
> I should get the answer from the slowest found function.. but based on a) only
> the slowest or b) the 32 slowest functions..? o_O
> What if I want the second slowest? Then the answer of course should only be
> computed using the second slowest function.. not the first two slowest..?

I fail to understand what you are asking. If n=1, you get the input data filtered through the 1st (==slowest) component. If n=10, you get the input filtered through the first 10 (10 slowest) components, and so on. If you want the input filtered *only* through the 3rd slowest component, you either set n=3 and just retain the last column of the resulting matrix, or you do the multiplication by hand:

mdp.utils.mult(x, sfanode.sf[:,2]) - sfanode.bias[2]

you use 2 instead of 3 because array indexing starts from 0 ;)

> (b) When having stored an SFA node to a file via save(), how can I load it?

just

import pickle
fl = open('dumped_node','r')
sfanode = pickle.load(fl)

hth,
tiziano
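Tiziano's pickle recipe, sketched end-to-end with a stand-in object (ToyNode is hypothetical, standing in for a trained MDP node). One caveat worth adding: pickle files should be opened in binary mode ('wb'/'rb') on modern Python, even though text mode often worked in the Python 2 era this thread dates from:

```python
import os
import pickle
import tempfile

class ToyNode:                      # stand-in for a trained MDP node
    def __init__(self, sf):
        self.sf = sf                # e.g. the learned projection vectors

node = ToyNode(sf=[1.0, 2.0, 3.0])

path = os.path.join(tempfile.mkdtemp(), 'dumped_node')
with open(path, 'wb') as fl:        # binary mode matters for pickle
    pickle.dump(node, fl)

with open(path, 'rb') as fl:        # same round trip as in the thread
    restored = pickle.load(fl)
print(restored.sf)                  # [1.0, 2.0, 3.0]
```

The load side is exactly Tiziano's three lines, with the file mode adjusted; this way a trained node never has to be retrained from scratch.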
From: Fabian Schoenfeld <fabian.schoenfeld@in...> - 2011-11-07 09:28:41

Hi!

Just two smallish questions regarding the SFA nodes:

(a) What exactly does the second parameter of the execute() function denote? I find the documentation somewhat confusing - what exactly would I have to use to get the result from the slowest function? (Or the second slowest for that matter.)

slowest = sfa_node( data, 1 )[0,0]
or
slowest = sfa_node( data, 32 )[0,0]

I don't quite get what the formal difference is here. In both cases (I think) I should get the answer from the slowest found function.. but based on a) only the slowest or b) the 32 slowest functions..? o_O
What if I want the second slowest? Then the answer of course should only be computed using the second slowest function.. not the first two slowest..?

(b) When having stored an SFA node to a file via save(), how can I load it?

Thanks for the help, as usual :)

Cheers,
Fabian