From: Gael Varoquaux <gael.varoquaux@no...> - 2010-11-20 23:09:04

On Sat, Nov 20, 2010 at 04:59:41PM -0600, Robert Kern wrote:
> > Sorry, I should have said 'Gaussian process regression', which is the
> > full name, and is an equivalent to Kriging. Gaussian processes in
> > themselves are a very large class of probabilistic models.
> > AFAICT, PyMC does not have any Gaussian process regression, and it
> > does seem a bit outside its scope.
> I'm pretty sure it does. See section 1.4 "Nonparametric regression"
> and 2.4 "Geostatistical example" in the GP User's Guide:
> http://pymc.googlecode.com/files/GPUserGuide.pdf

Yes, you are right. My bad. The good news is that it means that the name
is not too badly overloaded.

I see that they do the estimation by sampling the posterior, whereas the
proposed contribution in the scikit simply does a point estimate using
scipy's optimizers. I guess that PyMC's approach gives a full posterior
estimate, and is thus richer than the point estimate, but I would expect
it to be slower. I wonder if there are any other fundamental differences
(I don't know Gaussian processes terribly well).

Gael
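[Editor's note: the "point estimate using scipy's optimizers" mentioned above amounts to maximizing the GP marginal likelihood over the kernel hyperparameters. A minimal sketch of that idea follows; it is not the scikit's actual code, and the RBF kernel, the (log length-scale, log noise) parameterization, and the jitter term are illustrative assumptions.]

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative log marginal likelihood of a zero-mean GP with an RBF
    kernel. Parameterization and jitter are illustrative choices."""
    length_scale, noise = np.exp(log_params)
    # Squared-exponential (RBF) covariance matrix over the inputs
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * d2 / length_scale ** 2)
    K += (noise + 1e-10) * np.eye(len(X))  # observation noise + jitter
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    # Standard GP expression: 0.5 y' K^-1 y + 0.5 log|K| + const
    return (0.5 * y @ alpha + np.log(np.diag(L)).sum()
            + 0.5 * len(X) * np.log(2 * np.pi))

rng = np.random.RandomState(0)
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(30)

# Point estimate of the hyperparameters with a scipy optimizer
res = minimize(neg_log_marginal_likelihood, x0=np.log([1.0, 0.1]),
               args=(X, y), method="Nelder-Mead")
length_scale, noise = np.exp(res.x)
```

PyMC's approach samples the posterior over these hyperparameters instead of returning a single optimum, which is richer but typically slower, as discussed above.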
From: Peter Prettenhofer <peter.prettenhofer@gm...> - 2010-11-20 16:38:45

Hi Vlad,

that's great news; I'm looking forward to having more matrix
factorization techniques in scikit-learn. Please consider the Python port
of the projected gradient method for NMF by Chih-Jen Lin [1]. It could be
easily integrated into scikit-learn since it has the same licensing as
LIBSVM. Uwe Schmitt [2] also provides a bunch of NMF methods in Python,
including sparse NMF by Hoyer.

best,
Peter

[1] http://www.csie.ntu.edu.tw/~cjlin/nmf/index.html
[2] http://public.procoders.net/nnma/

2010/11/20 Vlad Niculae <vlad@...>:
> Hi and thanks a lot for the interest.
>
> I am going to assess how useful these techniques are for feature
> selection in handwritten digit recognition (zipcode).
> [...]

--
Peter Prettenhofer
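[Editor's note: the projected gradient method Peter points to minimizes ||V - WH||_F^2 subject to W, H >= 0. A heavily simplified sketch of the idea follows; Lin's algorithm solves the W and H subproblems with an Armijo line search, so the fixed step size here is an illustrative assumption, not his method.]

```python
import numpy as np

def nmf_projected_gradient(V, r, n_iter=300, step=1e-3, seed=0):
    """Sketch of NMF by projected gradient: minimize ||V - W H||_F^2
    subject to W >= 0 and H >= 0 by taking plain gradient steps and
    projecting back onto the nonnegative orthant. A fixed step size
    replaces Lin's line search only to keep the sketch short."""
    rng = np.random.RandomState(seed)
    n, m = V.shape
    W = rng.rand(n, r)
    H = rng.rand(r, m)
    for _ in range(n_iter):
        R = W @ H - V                                # residual
        W = np.maximum(W - step * (R @ H.T), 0.0)    # step + projection
        R = W @ H - V                                # refresh residual
        H = np.maximum(H - step * (W.T @ R), 0.0)
    return W, H
```

With a small enough step this decreases the reconstruction error monotonically; the projection (clipping at zero) is what keeps both factors nonnegative.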
From: Vlad Niculae <vlad@ve...> - 2010-11-20 16:23:33

Hi and thanks a lot for the interest.

I am going to assess how useful these techniques are for feature
selection in handwritten digit recognition (zipcode).

I did not look too much into it yet, but for improvement on PCA something
like H. Zou, T. Hastie and R. Tibshirani (2006), "Sparse principal
component analysis" should be useful (there exists an interesting
penalized SVD method to solve it).

However, what seems the most useful to me is the sparse NMF that
consistently produces local representations of facial data, as shown in
P. O. Hoyer (2004), "Non-negative Matrix Factorization with Sparseness
Constraints".

My goal is mainly to gain insight into these techniques, and I think a
good way to do this is bringing them to my environment of choice and
running them on the zip code data. It is my first hands-on application
after much reading up, and I am very enthusiastic and motivated.

On Sat, Nov 20, 2010 at 5:49 PM, Gael Varoquaux
<gael.varoquaux@...> wrote:
> On Sat, Nov 20, 2010 at 05:25:05PM +0200, Vlad Niculae wrote:
>> On the learn homepage it says that matrix factorization is a planned
>> feature. Is there work in progress on this? If not, I could attempt to
>> gather together and port what I find, and contribute it.
>
> Hey Vlad,
>
> Welcome! It's great to have enthusiastic people joining us.
> [...]
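[Editor's note: the sparseness constraint from the Hoyer (2004) paper Vlad cites is defined through the ratio of the l1 and l2 norms of a vector, and the measure itself is easy to state. The function name below is ours; the formula is from the paper.]

```python
import numpy as np

def hoyer_sparseness(x):
    """Hoyer's (2004) sparseness measure, based on the l1/l2 norm ratio:
    equals 1 for a vector with a single nonzero entry, and 0 for a
    vector whose entries are all equal."""
    x = np.abs(np.asarray(x, dtype=float))
    n = x.size
    return (np.sqrt(n) - x.sum() / np.linalg.norm(x)) / (np.sqrt(n) - 1)
```

Hoyer's sparse NMF constrains the rows of W and/or H to a target value of this measure during the updates, which is what produces the local, parts-based representations of facial data mentioned above.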
From: Gael Varoquaux <gael.varoquaux@no...> - 2010-11-20 15:49:52

On Sat, Nov 20, 2010 at 05:25:05PM +0200, Vlad Niculae wrote:
> I am working on my undergrad thesis in a NumPy environment and I plan
> to use as much of scikits.learn as I can. I will research and compare
> implementations of PCA, sparse PCA, NMF and sparse NMF. However, apart
> from PCA, I did not find any unified libraries with the others, even
> though there are plenty of implementations available.

> On the learn homepage it says that matrix factorization is a planned
> feature. Is there work in progress on this? If not, I could attempt to
> gather together and port what I find, and contribute it.

Hey Vlad,

Welcome! It's great to have enthusiastic people joining us.

Matrix factorization is indeed a planned feature, and we are starting to
have a few methods doing this, specifically ICA and PCA
(http://scikit-learn.sourceforge.net/modules/decompositions.html). But we
are interested in adding much more (basically any 'standard' method is
more than welcome).

I know that there are a few NMF implementations in Python. Some of them
have no license attached to them, so the first thing to do is to ask the
authors if they are ready to license their code under a BSD license and
have it included in the scikit (with their name on it, of course). MILK
(by Luis Pedro) has an NMF implementation that is licensed under the MIT
license, so compatible with the scikit. You will also have some work to
do to compare the different implementations speed-wise and
stability-wise. This kind of work is great for gaining insight into the
methods and will probably be beneficial for your research. Once you know
which code you want to contribute, simply fork the scikit on github and
start building your contribution in the fork. You will need to pay
attention to respecting the coding style of the scikit and to writing
examples and documentation (another great way of gaining insight). We
will review it, and integrate it in the scikit when it is ripe.

With regard to sparse PCA, what is your definition of sparse PCA? There
are different ways of imposing a penalty on the PCA problem. We (at the
Parietal INRIA team) have some code that implements a PCA-like problem in
a sparse dictionary-learning framework, using the scikit. It's not open
source because we are still working on it, and because we need to get out
a publication using it before we open it. However, it will be open in the
near future (the big question is when), and we can share it with specific
people asking for it.

I suggest that you start small: small contributions are easier to
integrate. You could for instance start with NMF, and we could focus on
trying to get NMF in before we try to get any other method in. Then you
could focus on sparse NMF, or maybe we could open up our sparse PCA code,
and if it suits you, you could work on integrating it in the scikit
(shouldn't be a huge amount of work, as we have the same coding style for
our internal code). In the long run, if you want, you could make sure
that the different matrix factorization methods expose an interface as
uniform as possible (trust me, it requires some active work to fight
software entropy :P).

Purely out of curiosity, may I ask if you have a specific application in
mind for matrix factorization?

This is exciting!

Gaël
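[Editor's note: the "sparse dictionary-learning framework" Gael alludes to typically imposes an l1 penalty on the loadings, and the workhorse of such penalties is the soft-thresholding operator (the proximal operator of the l1 norm). A minimal illustration follows; this is not the Parietal team's unreleased code.]

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||.||_1: shrink every entry toward zero
    by t and clip the ones that cross it. This single operation is what
    makes l1-penalized (sparse) decompositions produce exact zeros."""
    x = np.asarray(x, dtype=float)
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)
```

For example, `soft_threshold(np.array([3.0, -0.5, 1.0]), 1.0)` zeroes out the two entries with magnitude at most 1 and shrinks the remaining one to 2.0.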
From: Vlad Niculae <vlad@ve...> - 2010-11-20 15:25:13

Hello,

First of all, allow me to introduce myself: I am an undergrad student in
CS planning to enter the field of machine learning. I am a big fan of
your work here.

I am working on my undergrad thesis in a NumPy environment and I plan to
use as much of scikits.learn as I can. I will research and compare
implementations of PCA, sparse PCA, NMF and sparse NMF. However, apart
from PCA, I did not find any unified libraries with the others, even
though there are plenty of implementations available.

On the learn homepage it says that matrix factorization is a planned
feature. Is there work in progress on this? If not, I could attempt to
gather together and port what I find, and contribute it.

Yours,
Vlad N