From: Jason Merrill <jason.merrill@ya...> - 2010-09-13 03:38:03

Great, that's helpful. Thanks a lot.

On Sun, Sep 12, 2010 at 7:11 PM, Pietro Berkes <berkes@...> wrote:
> [Pietro's reply of 2010-09-12, quoted in full; the original message appears below in the archive]

_______________________________________________
mdp-toolkit-users mailing list
mdp-toolkit-users@...
https://lists.sourceforge.net/lists/listinfo/mdp-toolkit-users
From: Pietro Berkes <berkes@ga...> - 2010-09-12 23:11:58

Hi Jason!

FANode starts the learning cycles from a random matrix, and that's why you're seeing variations in the results. There is an argument in the constructor, 'tol', which controls the tolerance on the convergence of the log-likelihood. The default is 10^-4; you can make it smaller if you want your results to be more precise. Also, there is a 'max_cycles' argument that can be increased if needed. You can have a look here for more information: http://mdp-toolkit.sourceforge.net/docs/api/index.html

If you need to reliably get the same results, you'll need to fix the random seed in numpy.

Best,
Pietro

On Sun, Sep 12, 2010 at 6:24 PM, Jason Merrill <jason.merrill@...> wrote:
> [Jason's original question, quoted in full; see the message below in the archive]
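Pietro's suggestion to fix numpy's random seed is the key to reproducibility. As a minimal sketch (mdp itself may not be available here, so a toy power-iteration fit with a random starting matrix stands in for FANode; the function name and data are invented for illustration):

```python
import numpy as np

def random_init_fit(data, seed=None):
    """Toy stand-in for an EM-style fit that starts from a random matrix.

    This is NOT FANode itself; it only illustrates why fixing numpy's
    global seed makes runs with random initialization repeatable.
    """
    if seed is not None:
        np.random.seed(seed)  # fix the global numpy RNG, as suggested above
    w = np.random.randn(data.shape[1], 1)  # random starting matrix
    for _ in range(100):  # a few power-iteration refinement steps
        w = data.T.dot(data).dot(w)
        w /= np.linalg.norm(w)
    return data.dot(w).ravel()

a = np.array([[3, 4, 5], [6, 1, 5], [5, 6, 5], [6, 6, 3]], dtype='float32')
run1 = random_init_fit(a, seed=0)
run2 = random_init_fit(a, seed=0)
print(np.allclose(run1, run2))  # True: identical once the seed is fixed
```

The same idea applies to the FANode script in the thread: call numpy.random.seed(...) once before training, and successive runs will produce identical results.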
From: Jason Merrill <jason.merrill@ya...> - 2010-09-12 22:24:15

I'm using FANode to do factor analysis, and I've noticed that the output varies from run to run with the same input data, usually by about one percent. Is this expected behavior?

Here's the script I'm using:

    import mdp
    import numpy

    a = numpy.array([[3,4,5],[6,1,5],[5,6,5],[6,6,3]], 'float32')
    fanode = mdp.nodes.FANode(output_dim=1)
    zscores = fanode(a).transpose()[0]
    print zscores

On three successive runs, it output:

    $ python mdp_test.py
    [-0.59307748 -0.5755226  -0.53850538  1.70710552]
    $ python mdp_test.py
    [-0.59694427 -0.57506168 -0.5287832   1.70078909]
    $ python mdp_test.py
    [-0.59462035 -0.57533944 -0.53463584  1.70459569]

Regards,

Jason Merrill
From: Pietro Berkes <berkes@ga...> - 2010-09-08 13:49:22

thanks philipp, that's exactly right

1) you can do it either way, depending on whether you want to set a fixed number of important dimensions, or you want to let MDP do it automatically

2) I forgot one line:

    # compute total transformation from x to y
    mn = flow[0].avg
    w = np.dot(flow[0].v, flow[1].E_y_mtx)
    y2 = np.dot(x - mn, w)
    np.absolute(y2 - y).max()

P.

On Wed, Sep 8, 2010 at 6:46 AM, Philipp Meier <pmeier82@...> wrote:
> [Philipp's answer, quoted in full; see his message below in the archive]
>
> On 8 September 2010 04:38, Thorton Timms <mightythorton@...> wrote:
>> [Thorton's two questions, quoted in full; see his message below in the archive]
>>
>> On Fri, Sep 3, 2010 at 7:25 AM, Pietro Berkes <berkes@...> wrote:
>>> You need to compute the total transformation from x to y:
>>>
>>> PCANode transforms your input, x, to an output x':
>>>
>>>     x' = (x - avg) * v
>>>
>>> FANode transforms its input, x', to an output y:
>>>
>>>     y = (x' - mu) * E_y_mtx
>>>
>>> mu is zero since the output of PCANode has mean zero
>>>
>>>     => y = x' * E_y_mtx = (x - avg) * v * E_y_mtx = (x - avg) * w
>>>
>>> In conclusion: the factors that you are looking for are the product
>>> of the PCA factors and the FA factors. Here's some code that might
>>> make this point simpler:
>>>
>>>     import numpy as np
>>>     import mdp
>>>     from mdp import nodes as nd
>>>
>>>     x = np.loadtxt('in2.txt')
>>>     flow = mdp.Flow([nd.PCANode(output_dim=100),
>>>                      nd.FANode(output_dim=10)])
>>>
>>>     flow.train(x)
>>>     y = flow.execute(x)
>>>
>>>     # compute total transformation from x to y
>>>     mn = flow[0].avg
>>>     w = np.dot(flow[0].v, flow[1].E_y_mtx)
>>>     print np.absolute(y2 - y).max()
>>>
>>> The output should be *very* close to zero, showing that indeed the
>>> matrix w computes the total factors.
>>>
>>> P.
>>>
>>> On Fri, Aug 27, 2010 at 10:54 PM, Thorton Timms <mightythorton@...> wrote:
>>> > Pietro,
>>> >
>>> > Thanks! The new error is much more helpful. I am attempting to both
>>> > reduce the dimensions and also find hidden structures. It is important
>>> > that I be able to know what original factors are contributing to the
>>> > new reduced factors. I need human understanding of the analysis as
>>> > well as a prediction model.
>>> >
>>> > I can do a PCA first, but it will make determining the structure
>>> > (correlation from the new factors to the original factors) a bit more
>>> > difficult. Any suggestions?
>>> >
>>> > Thanks again for all the help,
>>> > Thorton
>>> >
>>> > On Wed, Aug 25, 2010 at 12:17 PM, Pietro Berkes <berkes@...> wrote:
>>> >> Dear Thorton,
>>> >>
>>> >> it looks like the covariance matrix of your input has determinant
>>> >> zero, i.e. it is singular. That means that some columns are linearly
>>> >> dependent.
>>> >>
>>> >>     import numpy as np
>>> >>     # x contains the data
>>> >>     np.linalg.det(np.cov(x, rowvar=0))
>>> >>     0.0
>>> >>
>>> >> Using PCANode, it seems like there are only 193 non-degenerate
>>> >> dimensions:
>>> >>
>>> >>     x2 = mdp.nodes.PCANode(reduce=True)(x)
>>> >>     x2.shape
>>> >>     (500, 193)
>>> >>
>>> >> The output dimension is 193, meaning that PCANode found that the rest
>>> >> of the eigenvalues of the covariance matrix of the data is 0. This is
>>> >> what confuses FANode.
>>> >>
>>> >> It is possible that with the whole data set, the problem is reduced
>>> >> (please give the code above a try). However, I suspect that the
>>> >> covariance matrix will still be very ill-conditioned. Some of the
>>> >> numeric values seem to have a scale much smaller than one (several
>>> >> orders of magnitude); rescaling them to be of order 1 might help.
>>> >> Another thing you can do is to run PCANode first as shown above, then
>>> >> FANode on the reduced data, then find the complete transformations by
>>> >> multiplying the projection matrices together.
>>> >>
>>> >> I know algorithms for binary variables alone, and others for
>>> >> continuous variables, but I can't think of an out-of-the-box one for a
>>> >> mix of both. I guess it will depend a lot on your goal. Are you trying
>>> >> to reduce the dimensionality of the data set, or trying to find hidden
>>> >> structure, or something else entirely?
>>> >>
>>> >> Best,
>>> >> Pietro
>>> >>
>>> >> On Tue, Aug 24, 2010 at 6:29 PM, Thorton Timms <mightythorton@...> wrote:
>>> >> > FANode parameters: tol=0.0001, output_dim=dim
>>> >> >
>>> >> > Where dim is in range (10-56). All output dimensions in that range
>>> >> > seem to produce the same problem. My data has 241 factors and over
>>> >> > 41,000 data points. I created a sample of 500 data points and 10
>>> >> > factors (see attached). If you look at the second-to-last row in the
>>> >> > E_y matrix, you can see that it dominates and some others dominate,
>>> >> > but not as much.
>>> >> >
>>> >> > My data consists mostly of binary attributes (hence all the 1 and 0
>>> >> > values). However, there are some numeric values. The numeric values
>>> >> > have been normalized to values between 0 and 1. The numeric values
>>> >> > are the ones that seem to be dominating (I didn't notice this until
>>> >> > I created this sample output).
>>> >> >
>>> >> > I assume that the binary values are skewing the analysis? If that is
>>> >> > the case, then what is the best practice for data sets with binary
>>> >> > and numerical factors?
>>> >> >
>>> >> > Thanks for all the help,
>>> >> > Thorton
>>> >> >
>>> >> > On Tue, Aug 24, 2010 at 10:49 AM, Pietro Berkes <berkes@...> wrote:
>>> >> >> It doesn't sound right, but I'll need more details:
>>> >> >>
>>> >> >> - what are the parameters to the FANode constructor?
>>> >> >> - how many data points?
>>> >> >> - how many factors did you extract?
>>> >> >>
>>> >> >> Of course, having a look at the data or the mixing matrix E_y_mtx
>>> >> >> would be even better.
>>> >> >> P.
>>> >> >>
>>> >> >> On Tue, Aug 24, 2010 at 10:50 AM, Thorton Timms <mightythorton@...> wrote:
>>> >> >> > I have a large data set with almost 200 factors. When I run the
>>> >> >> > MDP Factor Analysis and analyze the E_y matrix, it appears that
>>> >> >> > each of the reduced factors is influenced the most by the same
>>> >> >> > handful of original factors (just with different weights). Is
>>> >> >> > this normal? I would expect that at least some of the reduced
>>> >> >> > factors would be based on a mixture of original factors.
>>> >> >> >
>>> >> >> > Thanks,
>>> >> >> > Thorton
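The identity used in the quoted derivation is just associativity of matrix products: centering once and multiplying by w = v * E_y_mtx is the same map as applying the PCA step and then the FA step. A self-contained numpy sketch, where avg, v, and E are random placeholders standing in for the trained nodes' attributes (not real MDP output):

```python
import numpy as np

rng = np.random.RandomState(0)

x = rng.randn(50, 8)
avg = x.mean(axis=0)   # plays the role of flow[0].avg
v = rng.randn(8, 4)    # plays the role of flow[0].v (PCA projection)
E = rng.randn(4, 2)    # plays the role of flow[1].E_y_mtx

# step by step: PCA-like projection, then FA-like projection
x_prime = np.dot(x - avg, v)
y = np.dot(x_prime, E)

# collapsed into one total transformation, as in the thread
w = np.dot(v, E)
y2 = np.dot(x - avg, w)

print(np.absolute(y2 - y).max())  # ~1e-15: the same map either way
```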
From: Philipp Meier <pmeier82@go...> - 2010-09-08 10:46:33

to 1) there are different approaches to determine the number of principal components to keep for the projection. 1) the supervised approach lets you simply choose the number of principal components, useful if you know a lot about your problem or you need to make assumptions about dimensionalities in the data flow. 2) as the principal components are the eigenvectors of the empirical data covariance matrix, ranked from largest to smallest magnitude, you can always compute how many of them you have to keep in order to explain a certain percentage of the variance in your data. this is done by summing up the eigenvalues until they represent the required share of the total variance:

    argmin_k { sum_{i=0}^{k} eig_val[i] / sum_{j=0}^{n} eig_val[j] >= ratio }

(given the eigenvalues and vectors are ordered from largest to smallest magnitude).

to 2)

    y2 = np.dot(x - mn, w)
       = np.dot(x - flow[0].avg, np.dot(flow[0].v, flow[1].E_y_mtx))

On 8 September 2010 04:38, Thorton Timms <mightythorton@...> wrote:
> [Thorton's two questions and the earlier thread, quoted in full; see the surrounding messages in the archive]

--
Philipp Meier
TU-Berlin, Fr2071
Neural Information Processing Group
Phone: ++49-30-314-26756
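Philipp's second recipe (keep eigenvalues until they cover a target share of the total variance) can be written directly in numpy. The data below is synthetic, with three high-variance directions planted by hand, and the 95% threshold is just an example:

```python
import numpy as np

rng = np.random.RandomState(42)
# synthetic data: three high-variance dimensions, seven near-constant ones
scales = np.array([5.0, 4.0, 3.0, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
x = rng.randn(5000, 10) * scales

# eigenvalues of the empirical covariance matrix, largest first
eig_val = np.linalg.eigvalsh(np.cov(x, rowvar=0))[::-1]

# smallest k whose leading eigenvalues explain >= 95% of the variance
ratio = 0.95
explained = np.cumsum(eig_val) / np.sum(eig_val)
k = int(np.argmax(explained >= ratio)) + 1
print(k)  # 3: the three planted directions carry almost all the variance
```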
From: Thorton Timms <mightythorton@gm...> - 2010-09-08 02:38:56

Thanks. I was pretty close to this.

Two questions:

1) Why is the number of dimensions set in the PCA step, instead of using reduce=True?

2) What is y2 on the last line of code?

Thanks,
Thorton

On Fri, Sep 3, 2010 at 7:25 AM, Pietro Berkes <berkes@...> wrote:
> [Pietro's derivation of the total transformation from x to y, quoted in full; see the thread above in the archive]
From: Pietro Berkes <berkes@ga...> - 2010-09-03 14:25:54

You need to compute the total transformation from x to y.

PCANode transforms your input, x, to an output x':

    x' = (x - avg) * v

FANode transforms its input, x', to an output y:

    y = (x' - mu) * E_y_mtx

mu is zero since the output of PCANode has mean zero, so

    y = x' * E_y_mtx = (x - avg) * v * E_y_mtx = (x - avg) * w

In conclusion: the factors that you are looking for are the product of
the PCA factors and the FA factors. Here's some code that might make
this point simpler:

import numpy as np
import mdp
from mdp import nodes as nd

x = np.loadtxt('in2.txt')
flow = mdp.Flow([nd.PCANode(output_dim=100), nd.FANode(output_dim=10)])
flow.train(x)
y = flow.execute(x)

# compute the total transformation from x to y
mn = flow[0].avg
w = np.dot(flow[0].v, flow[1].E_y_mtx)
# apply it directly to the input and compare with the flow's output
y2 = np.dot(x - mn, w)
print np.absolute(y2 - y).max()

The output should be *very* close to zero, showing that indeed the
matrix w computes the total factors.

P.

On Fri, Aug 27, 2010 at 10:54 PM, Thorton Timms <mightythorton@...> wrote:
> Pietro,
>
> Thanks! The new error is much more helpful.
> I am attempting to both reduce the dimensions and find hidden
> structures. It is important that I be able to know which original
> factors are contributing to the new reduced factors. I need human
> understanding of the analysis as well as a prediction model.
>
> I can do a PCA first, but it will make determining the structure
> (correlation from the new factors to the original factors) a bit more
> difficult. Any suggestions?
>
> Thanks again for all the help,
> Thorton
>
> On Wed, Aug 25, 2010 at 12:17 PM, Pietro Berkes <berkes@...>
> wrote:
>>
>> Dear Thorton,
>>
>> it looks like the covariance matrix of your input has determinant
>> zero, i.e. it is singular. That means that some columns are linearly
>> dependent.
>>
>> import numpy as np
>> # x contains the data
>> np.linalg.det(np.cov(x, rowvar=0))
>> 0.0
>>
>> Using PCANode, it seems like there are only 193 non-degenerate dimensions:
>>
>> x2 = mdp.nodes.PCANode(reduce=True)(x)
>> x2.shape
>> (500, 193)
>>
>> The output dimension is 193, meaning that PCANode found that the rest
>> of the eigenvalues of the covariance matrix of the data are 0. This is
>> what confuses FANode.
>>
>> It is possible that with the whole data set the problem is reduced
>> (please give the code above a try). However, I suspect that the
>> covariance matrix will still be very ill-conditioned. Some of the
>> numeric values seem to have a scale much smaller than one (several
>> orders of magnitude); rescaling them to be of order 1 might help.
>> Another thing you can do is to run PCANode first as shown above, then
>> FANode on the reduced data, then find the complete transformation by
>> multiplying the projection matrices together.
>>
>> I know algorithms for binary variables alone, and others for
>> continuous variables, but I can't think of an out-of-the-box one for a
>> mix of both. I guess it will depend a lot on your goal. Are you trying
>> to reduce the dimensionality of the data set, or trying to find hidden
>> structure, or something else entirely?
>>
>> Best,
>> Pietro
>>
>>
>> On Tue, Aug 24, 2010 at 6:29 PM, Thorton Timms <mightythorton@...>
>> wrote:
>> > FANode parameters:
>> > tol=0.0001, output_dim=dim
>> >
>> > Where dim is in range (10-56). All output dimensions in that range
>> > seem to produce the same problem.
>> > My data has 241 factors and over 41,000 data points.
>> > I created a sample of 500 data points and 10 factors (see attached).
>> > If you look at the second-to-last row in the Ey matrix, you can see
>> > that it dominates, and some others dominate, but not as much.
>> >
>> > My data consists mostly of binary attributes (hence all the 1 and 0
>> > values). However, there are some numeric values.
>> > The numeric values have been normalized to values between 0 and 1.
>> > The numeric values are the ones that seem to be dominating (I didn't
>> > notice this until I created this sample output).
>> >
>> > I assume that the binary values are skewing the analysis? If that is
>> > the case, then what is the best practice for data sets with binary
>> > and numerical factors?
>> >
>> > Thanks for all the help,
>> > Thorton
>> >
>> >
>> > On Tue, Aug 24, 2010 at 10:49 AM, Pietro Berkes
>> > <berkes@...>
>> > wrote:
>> >>
>> >> It doesn't sound right, but I'll need more details:
>> >>
>> >> - what are the parameters to the FANode constructor?
>> >> - how many data points?
>> >> - how many factors did you extract?
>> >>
>> >> Of course, having a look at the data or the mixing matrix E_y_mtx
>> >> would be even better.
>> >> P.
>> >>
>> >> On Tue, Aug 24, 2010 at 10:50 AM, Thorton Timms
>> >> <mightythorton@...>
>> >> wrote:
>> >> > I have a large data set with almost 200 factors. When I run the
>> >> > MDP Factor Analysis and analyze the E_y matrix, it appears that
>> >> > each of the reduced factors is influenced the most by the same
>> >> > handful of original factors (just with different weights). Is
>> >> > this normal? I would expect that at least some of the reduced
>> >> > factors would be based on a mixture of original factors.
>> >> >
>> >> > Thanks,
>> >> > Thorton
>> >
>> >
>> > _______________________________________________
>> > mdp-toolkit-users mailing list
>> > mdp-toolkit-users@...
>> > https://lists.sourceforge.net/lists/listinfo/mdp-toolkit-users
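The composition argument in Pietro's message above can be checked with plain
NumPy, independently of MDP. This is a sketch with stand-in matrices: `avg`
and `v` are computed from scratch to play the roles of PCANode's `avg` and
`v`, and `E_y_mtx` is just a random matrix standing in for FANode's mixing
matrix (the dimensions and data are made up for illustration; the code is
Python 3).

```python
import numpy as np

rng = np.random.RandomState(1)
x = rng.randn(200, 8)

# Stand-in for PCANode: center the data and project onto the top
# principal directions of the sample covariance matrix.
avg = x.mean(axis=0)
cov = np.cov(x - avg, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)
v = eigvecs[:, -5:]                  # keep the 5 largest components
x_prime = np.dot(x - avg, v)

# Stand-in for FANode: a linear map applied to the projection
# (mu is zero because x_prime already has mean zero).
E_y_mtx = rng.randn(5, 3)
y = np.dot(x_prime, E_y_mtx)

# Total transformation: w = v * E_y_mtx, applied directly to the raw input.
w = np.dot(v, E_y_mtx)
y2 = np.dot(x - avg, w)
print(np.absolute(y2 - y).max())     # effectively zero: the two routes agree
```

Because both stages are affine, composing them into the single matrix `w`
reproduces the two-stage output up to floating-point rounding, which is the
point Pietro's MDP snippet demonstrates.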
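Pietro's earlier suggestion, rescaling the small-magnitude numeric columns
to order 1 before looking at the covariance spectrum, can be sketched with
NumPy alone. The synthetic mixed binary/numeric data and the 1e-10
eigenvalue threshold below are illustrative assumptions, not taken from the
thread (Python 3).

```python
import numpy as np

rng = np.random.RandomState(0)

# Synthetic stand-in for the data set in the thread: mostly binary
# columns plus a few numeric columns several orders of magnitude below 1.
binary = (rng.rand(500, 6) > 0.5).astype(float)
numeric = rng.randn(500, 4) * 1e-4
x = np.hstack([binary, numeric])

# Rescale every column to zero mean and unit variance so that no single
# block of variables dominates the covariance matrix.
mean = x.mean(axis=0)
std = x.std(axis=0)
std[std == 0.0] = 1.0                # guard against constant columns
x_scaled = (x - mean) / std

# Diagnose (near-)singularity the same way as in the thread: covariance
# eigenvalues that are numerically zero mark the degenerate dimensions
# that confuse FANode.
eigvals = np.linalg.eigvalsh(np.cov(x_scaled, rowvar=False))
n_degenerate = int(np.sum(eigvals < 1e-10))
print("degenerate dimensions:", n_degenerate)
```

For genuinely linearly dependent columns the count stays positive even after
rescaling, in which case running PCA first (as Pietro suggests) and doing FA
on the reduced data is the workaround.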