From: Gael Varoquaux <gael.varoquaux@no...>  20100423 11:18:40

Hi there,

We have a basic API for supervised learning: an estimator object with '.fit()' and '.predict()' methods. I am really happy with it, because it is very simple and is getting us quite far. Of course, we all know that we will have to make it a bit more elaborate as time goes; for instance, we have added 'predict_proba'...

However, for unsupervised learning, I don't have a clear vision of where we are going. I mainly do unsupervised learning in my research work, but I am quite new to learning and I miss the big picture.

To be able to integrate various unsupervised learning algorithms, we need a basic common interface. I am wondering, how do other libraries do it? Do you have any experience to share, any opinion on the problem, any code to point to?

Examples
========

One of the challenges is that it is hard to specify what problem unsupervised learning is trying to solve. Let me go through a few examples of methods and use cases, maybe revealing my incomplete knowledge of the problem. Please don't hesitate to add examples or use cases.

PCA
---

PCA is an unsupervised algorithm most often used to do dimension reduction. In this regard, it can be seen as learning a transformation (mainly a rotation of the feature space, but also an optional projection). Thus it can be seen as 'transforming the data'. I believe that manifold learning would fall in the same category. Also, there exist supervised versions of transforms, such as univariate feature selection, and some supervised estimators may be used as transforms for dimension reduction (LDA for instance).

So we have a 'transform' use case.

ICA
---

I have a hard time with ICA. I believe ICA would be a transform use case too, even though I've seen people do strange things with ICA.

Mixture modelling
-----------------

Mixture modelling can be seen as a density estimation problem, as can any unsupervised learning that is associated with a predictive model or a likelihood.

Clustering
----------

For probabilistic clustering (such as k-means seen as a Gaussian mixture model), clustering is like mixture modelling, and performs a density estimation.

Clustering can also be seen as classification for which you learn the classes. Similarly, mixture modelling can be used in the same sense.

Covariance estimation/Gaussian graphical models
-----------------------------------------------

I guess these are parametric density estimation problems.

Use cases
=========

From this small review of what I know in unsupervised machine learning, I can see a few use cases, which are neither disjoint, nor specific to unsupervised learning:

1) Transforming the data. It seems that this falls under what Weka calls 'filters'.

2) Density estimation, or fitting a probabilistic predictive model.

3) Learning a classifier.

The difference between 2 and 3 is that 2 requires the existence of a likelihood for new data, whereas 3 only requires a maximum a posteriori decision.

Clues for an API
================

I don't want to take any decisions right now with regard to an API. However, I would like to open the discussion and start hashing out ideas. I am not sure if I have already stated my point of view on API design on this mailing list. Basically, my philosophy is to try to make things as simple as possible: the fewer things a user has to learn and understand to use a library, the better. I like to identify a few core use cases and design interfaces around them. If they fit well as objects (such as the estimator object that we already have), I like to think about the smallest interface (set of methods and attributes) that can solve the use case. For instance, in the supervised learning case, our core interface is that of the estimator, which implements simply a 'fit' and a 'predict' method. I like to minimise the number of different objects or interfaces, because each time the user encounters a new interface, she has to learn something more.

Thus, for me, API design is about finding a small set of light interfaces that solve the use cases.

All the objects would implement a fit method:

 - fit: same as for supervised learning, but with only 1 input.

In addition, depending on the use case, we could have the following methods (possibly several of them on some objects):

1) Use case 1: transforming. Weka has objects it calls 'Filters'. I think we could have a similar class, that could implement:

 - transform: takes data, returns a modified version.

2) Use case 2: I am not sure what to call these objects. Names are important. Anyhow, the interface I see here would be:

 - test (I don't really like this name, any suggestions?): takes new data, returns a likelihood for it.

3) Use case 3 (also a naming problem here):

 - predict: return the label according to the clustering learned in fit. That way, these objects could pretty much be used as unsupervised classifiers.

OK, I'll stop here, as I am afraid that I am overfitting to what I already know. Maybe I am talking nonsense. This mail was just to get the ball rolling, start a discussion, and prepare the integration of unsupervised learning such as GMMs, HMMs, and so on in the scikit.

Gaël
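To make the three candidate interfaces concrete, here is a rough sketch of what such objects might look like. The class names are hypothetical and the NumPy implementations are toys, not actual scikit code; 'eval' is used as a stand-in name for the use case 2 method, since 'test' is explicitly up for debate.

```python
import numpy as np


class PCATransformer:
    """Use case 1 sketch: a 'transform' object (PCA-like rotation/projection)."""

    def __init__(self, n_components=2):
        self.n_components = n_components

    def fit(self, X):
        # Learn a rotation of the feature space from the data alone.
        self.mean_ = X.mean(axis=0)
        _, _, Vt = np.linalg.svd(X - self.mean_, full_matrices=False)
        self.components_ = Vt[:self.n_components]
        return self

    def transform(self, X):
        # Project new data onto the learned components.
        return (X - self.mean_) @ self.components_.T


class GaussianDensity:
    """Use case 2 sketch: a density model exposing a likelihood for new data."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        self.var_ = X.var(axis=0) + 1e-9
        return self

    def eval(self, X):
        # Log-likelihood of each new sample under the fitted Gaussian.
        return -0.5 * np.sum(
            np.log(2 * np.pi * self.var_) + (X - self.mean_) ** 2 / self.var_,
            axis=1)


class KMeansLike:
    """Use case 3 sketch: clustering used as an 'unsupervised classifier'."""

    def __init__(self, k=2, n_iter=20, seed=0):
        self.k, self.n_iter, self.seed = k, n_iter, seed

    def fit(self, X):
        rng = np.random.RandomState(self.seed)
        self.centers_ = X[rng.choice(len(X), self.k, replace=False)]
        for _ in range(self.n_iter):
            labels = self.predict(X)
            for j in range(self.k):
                if np.any(labels == j):
                    self.centers_[j] = X[labels == j].mean(axis=0)
        return self

    def predict(self, X):
        # Label of the nearest learned cluster center.
        d = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(-1)
        return d.argmin(axis=1)
```

The point of the sketch is that all three objects share 'fit', and each differs only in the one extra method that serves its use case.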
From: Vagabond_Aero <vagabondaero@gm...>  20100423 13:40:06

Use case 2 could also be called 'validate', rather than 'test'. It seems that in use case 3, you are thinking of something along the lines of 'evaluate'. Just throwing out some ideas...

On Fri, Apr 23, 2010 at 04:18, Gael Varoquaux <gael.varoquaux@... > wrote:

> 2) Usecase 2: I am not sure how to call these objects. Names are
> important. Anyhow, the interface I see here would be:
>
>  - test (I don't really like this name, any suggestions?): takes new
> data, returns a likelihood for it.
>
> 3) Usecase 3: (also a naming problem here):
>
>  - predict: return the label according to the clustering learned in
> fit. That way, these objects could pretty much be used as
> unsupervised classifiers.
From: Ron Weiss <ronweiss@gm...>  20100423 14:15:32

On Fri, Apr 23, 2010 at 01:18:32PM +0200, Gael Varoquaux wrote:

> 2) Usecase 2: I am not sure how to call these objects. Names are
> important. Anyhow, the interface I see here would be:
>
>  - test (I don't really like this name, any suggestions?): takes new
> data, returns a likelihood for it.

I like 'eval' or 'evaluate' for getting the likelihood of some data under e.g. a GMM.

Ron
From: Mike Dewar <mikedewar@gm...>  20100423 14:48:04

Aren't all of these use cases predicting latent variables?

Use cases:

1) 'transform' is the prediction of a reduced-dimension variable.

2/3) 'density estimation' is the prediction of a latent label, either a hard 1-of-K label like [0,0,1] in k-means or a probabilistic label like [0.1,0.1,0.8] in a GMM. (I think it might be a bit confusing to lose the classification:supervised / clustering:unsupervised distinction.)

So you only need one new method, .predict_latent(data), to unify all of these. See Roweis and Ghahramani's "A Unifying Review of Linear Gaussian Models" (1999): http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.5555&rep=rep1&type=pdf

This also opens up things like Kalman filters / EM in linear dynamic systems, forward/backward / Baum-Welch in HMMs, more general variational methods and so on, all with the same basic API.

Mike Dewar (largely channelling James Hensman)

On 23 Apr 2010, at 07:18, Gael Varoquaux wrote:

> Usecases:
>
> 1) Transforming the data. It seems that this falls under what Weka calls
> 'filters'
>
> 2) Density estimation, or fitting a probablitic predictive model
>
> 3) Learning a classifier
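Mike's unification can be sketched as follows: the same 'predict_latent' contract returns a hard 1-of-K code for k-means and a soft multinomial for a mixture model. The helper functions are hypothetical, and the soft case assumes equal-weight spherical Gaussians for brevity.

```python
import numpy as np


def predict_latent_kmeans(X, centers):
    """Hard 1-of-K latent code: a one-hot row per sample."""
    d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
    labels = d.argmin(axis=1)
    one_hot = np.zeros((len(X), len(centers)))
    one_hot[np.arange(len(X)), labels] = 1.0
    return one_hot


def predict_latent_gmm(X, means, var=1.0):
    """Soft latent code: posterior responsibilities under spherical Gaussians."""
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(-1)
    log_p = -0.5 * d / var
    log_p -= log_p.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(log_p)
    return p / p.sum(axis=1, keepdims=True)
```

Both return an array of shape (n_samples, n_latent_states); only the hardness of the code differs.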
From: Olivier Grisel <olivier.grisel@en...>  20100423 15:18:00

2010/4/23 Mike Dewar <mikedewar@...>:
> Aren't all of these use cases predicting latent variables?
>
> so you only need one new method: .predict_latent(data) to unify all of
> these. See Roweis and Gharamani's Unifying Review of Linear Gaussian
> Models (1999).

+1

It could also be named "encode", if you consider the latent variable activations as an encoded representation of the input signal.

Note: this is not always a reduced-dimensional representation of the input. E.g. sometimes the latent code is high dimensional but very sparse (sparse coding with linear dictionary learning for instance, or autoencoders / autoassociators with an L1 penalty on the code, which is another form of sparse coding, but non-linear).

I like the view of Yann LeCun and his energy-based models way of thinking on these matters: http://www.youtube.com/watch?v=3boKlkPBckA

-- 
Olivier
http://twitter.com/ogrisel - http://code.oliviergrisel.name
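Olivier's point that the latent code need not be lower-dimensional can be illustrated with a toy 'encode' step: a single soft-thresholding pass over a dictionary projection yields a code of the same (or higher) dimension that is mostly zeros. This is only an illustrative one-step sketch, not a real sparse-coding solver, and the function name is hypothetical.

```python
import numpy as np


def encode_sparse(X, dictionary, alpha=0.5):
    """One soft-thresholding step on the dictionary projection (L1 shrinkage)."""
    code = X @ dictionary.T                    # project onto the atoms
    # Shrink toward zero: small activations are zeroed out, giving sparsity.
    return np.sign(code) * np.maximum(np.abs(code) - alpha, 0.0)
```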
From: Gael Varoquaux <gael.varoquaux@no...>  20100423 21:57:08

On Fri, Apr 23, 2010 at 10:47:52AM -0400, Mike Dewar wrote:

> Aren't all of these use cases predicting latent variables?
>
> so you only need one new method: .predict_latent(data) to unify all of
> these. See Roweis and Gharamani's Unifying Review of Linear Gaussian
> Models (1999).
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.30.5555&rep=rep1&type=pdf
>
> This also opens up things like Kalman Filters / EM in linear dynamic
> systems, forward/backward / BaumWelch in HMMs, more general
> variational methods and so on, all with the same basic API.

Hey,

Thanks for your feedback.

I am not sure I understand how different problems can be collapsed under one formalism and one API. It probably reflects my incomplete understanding of machine learning. Please let me object, and you can correct me. I guess my problem mainly lies with what the estimator applied to new data should return depending on the use case. Let us call that method 'apply'.

In the case of a 'transform', or filter, 'apply' returns data, reduced or not (as in ICA).

In the case of a predictive model, 'apply' returns a likelihood (by the way, I like 'evaluate').

In the case of a model with different classes, 'apply' returns a class label.

While, as you point out, there is probably a nice view of the various problems that unifies them under a single framework, I wonder how a single method can satisfy all the use cases. Also, a given estimator should be usable for different use cases, if possible.

Am I missing something? Do we really want to satisfy all the use cases with one method? Of course, the danger of having a forest of different methods is that the API can become complex and confusing.

Cheers,

Gaël
From: Mike Dewar <mikedewar@gm...>  20100423 23:16:48

So I should start by saying that, while I really like the idea of all of these use cases being predictions of latent variables, I agree with you that it might not be the best API. Having said that, I'll spend most of the rest of the email on the assumption that it is a good API, and that latent variable modelling is a good framework for this stuff, philosophically and practically.

To start with, the 'transform' use case doesn't return data (where by data I mean collected observations). It returns a set of latent variables related to the data via some transform. So 'predict_latent' should return the result of applying the learned transformation to a data point (or the original data set). In the same way, clustering returns a distribution over a set of labels - a set of latent variables related to the data via some transform. Hence 'predict_latent' returns a 1-of-K encoding for 'hard' k-means and a multinomial for GMMs, given a new data point.

The reason I like the latent-variable point of view is that it naturally applies to dynamic models like HMMs and linear dynamic systems, too. 'predict_latent' in these contexts implements a forwards/backwards algorithm or a Rauch-Tung-Striebel smoother, respectively.

Now, the use case that sticks out for me is the 'apply' method for density estimation that you've said would return a likelihood. Couldn't all generative models have a 'likelihood' method that would return the likelihood of the model parameters given a data set? This is a different sort of thing, which means, I guess, that I don't quite see the difference between the 2nd and 3rd use cases.

So, to answer your email, these aren't particularly different problems formally, and hence there's an opportunity to make this clear by only providing a single method. One problem is that this has the potential of scaring users who just wanted to run PCA on their data (having said this, there are plenty of implementations of this sort of thing already...).

Another problem, of course, is that we often can't separate the problems of 'fitting' (learning) and 'prediction' (inference) in unsupervised problems. Hence what a user would actually need to call in a lot of cases is something like 'EM', which would then iteratively call 'fit' and 'predict' in order to build the model in the first place.

Mike Dewar

On 23 Apr 2010, at 17:56, Gael Varoquaux <gael.varoquaux@...> wrote:

> Am I missing something? Do we really want to satisfy all the usecases
> with one method? Of course, the danger of having a forest of different
> methods, is that the API can becomes complex and confusing.
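Mike's last point, that unsupervised 'fitting' and 'prediction' are often inseparable, can be sketched as a hard-EM loop that alternates a 'predict'-like E-step with a 'fit'-like M-step. The helper is hypothetical, in a k-means flavour:

```python
import numpy as np


def em_kmeans(X, k=2, n_iter=10, seed=0):
    """Hard EM: 'fit' is itself a loop of predict (E-step) and update (M-step)."""
    rng = np.random.RandomState(seed)
    centers = X[rng.choice(len(X), k, replace=False)]
    assign = lambda: ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
    for _ in range(n_iter):
        labels = assign()                      # E-step: 'predict' the latent labels
        for j in range(k):                     # M-step: 're-fit' the parameters
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, assign()
```

A user-facing 'fit' for such a model would hide exactly this loop.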
From: Yaroslav Halchenko <sf@on...>  2010-04-26 20:34:02

Just a drop of oil, almost without any reasoning:

On Fri, 23 Apr 2010, Mike Dewar wrote:
> So, to answer your email, these aren't particularly different problems
> formally, and hence there's an opportunity to make this clear in only
> providing a single method.

+1

And do not forget about 'def __call__', which might make some people happy, since the code might then read to them like a simple mathematical equation (after a preliminary '.fit(...)'-ing, if necessary ;) )

Yaroslav Halchenko
(yoh@www.)onerussian.com
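Yaroslav's '__call__' suggestion amounts to aliasing the instance call to an existing method. A minimal sketch of the idea, with a made-up toy estimator (the class name, the thresholding rule, and everything else here are illustrative assumptions, not proposed library code):

```python
class Centroid1D:
    """Toy 1-D 'estimator': fit stores the mean of the training data,
    predict labels new values by which side of that mean they fall on."""

    def fit(self, x):
        self.mean_ = sum(x) / len(x)
        return self

    def predict(self, x):
        # 0 if below the fitted mean, 1 if above
        return [int(v > self.mean_) for v in x]

    # the proposed sugar: model(x) becomes equivalent to model.predict(x)
    __call__ = predict
```

After `est = Centroid1D().fit([0, 1, 9, 10])`, calling `est([2, 8])` behaves exactly like `est.predict([2, 8])`; the whole debate below is about whether that brevity is worth the loss of an explicit method name.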
From: Gael Varoquaux <gael.varoquaux@no...>  2010-04-26 21:03:31

On Mon, Apr 26, 2010 at 04:33:55PM -0400, Yaroslav Halchenko wrote:
> just a drop of oil almost without any reasoning:
> On Fri, 23 Apr 2010, Mike Dewar wrote:
> > So, to answer your email, these aren't particularly different problems
> > formally, and hence there's an opportunity to make this clear in only
> > providing a single method.
> +1

I think that this is a dangerous path, because it may force a user to understand the theoretical big picture in order to use the package, even in simple use cases such as unsupervised dimension reduction.

Let us use k-means clustering as an example: once I have fit the model to data, given new data I may want to answer two different questions with this new data:

i) How likely is this new data under the model? I can use this to cross-validate the number of classes.

ii) What is the class of the new samples according to the model that I have learned? In a more general framework, this would be: what is the value of the latent variables for these new observations?

Although I can see that these two questions can be formulated as simply interrogating a probabilistic model with latent variables, I wonder if folding the two questions into a single method will make the code easier to follow.

One question is whether every model can really be formulated to answer question ii. In other words, should all 'transforms' be rewritten as 'predict'? For instance, when using PCA to do dimension reduction, we can indeed think of the problem as estimating a latent-variable model, and predicting the value of the latent variables from the data at hand, or from new data. However, this means that we have to digress about latent-variable models when explaining to a user how to do dimension reduction. And it becomes even stranger when doing supervised dimension reduction, such as univariate feature selection.
I tend to prefer 'transform' rather than 'predict' for these operations, as I feel that it will make the code easier to read, although I am not completely sold, I must admit.

> and do not forget about 'def __call__' which might make some
> people happy since code might look for them as a simple mathematical
> equation (after preliminary '.fit(...)'ing if necessary ;) )

I must admit that I don't like that. I find that it makes the code less explicit. If I have an object, chances are that it has side effects. I would like to be aware of them, and seeing the code written as object-oriented code makes them obvious. In addition, it then becomes even less clear what the method does:

    k_means = KMeans(n_classes=10)
    k_means.fit(X)
    Y = k_means(X)

What is 'Y'? Whereas:

    k_means = KMeans(n_classes=10)
    k_means.fit(X)
    Y = k_means.predict(X)

seems a bit more readable to me, and:

    k_means = KMeans(n_classes=10)
    k_means.fit(X)
    Y = k_means.likelihood(X)

is quite clear (due to the explicit name 'likelihood').

I guess that we will have to work from examples and explicit use cases.

My 2 cents,

Gaël
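The explicit-names API Gael sketches can be made concrete with a toy k-means class. The names 'fit', 'predict', 'likelihood' and 'n_classes' follow the email; everything else, in particular using the negative squared distance to the nearest centroid as a stand-in 'likelihood' score rather than a true probabilistic likelihood, is an assumption for illustration:

```python
import numpy as np

class ToyKMeans:
    """Sketch of an estimator answering Gael's two questions with
    two explicitly named methods instead of one overloaded 'apply'."""

    def __init__(self, n_classes=2, n_iter=20):
        self.n_classes = n_classes
        self.n_iter = n_iter

    def fit(self, X):
        # naive deterministic init: first n_classes samples
        self.centers_ = X[:self.n_classes].astype(float)
        for _ in range(self.n_iter):
            labels = self.predict(X)
            for k in range(self.n_classes):
                members = X[labels == k]
                if len(members):
                    self.centers_[k] = members.mean(axis=0)
        return self

    def predict(self, X):
        # question (ii): hard class assignment for new samples
        d = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    def likelihood(self, X):
        # question (i): per-sample goodness-of-fit under the model
        # (here a distance-based proxy; higher means better explained)
        d = ((X[:, None, :] - self.centers_[None, :, :]) ** 2).sum(axis=2)
        return -d.min(axis=1)
```

With this split, `km.predict(X_new)` and `km.likelihood(X_new)` each say exactly what they return, which is the readability argument being made above.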
From: Mike Dewar <mikedewar@gm...>  2010-04-29 17:50:47

(Sorry about the delay in replying to this thread: I have been enjoying the conversation but forgot that I filter these emails into a separate folder, where they have been sitting most of the week.)

On 26 Apr 2010, at 17:03, Gael Varoquaux wrote:

> On Mon, Apr 26, 2010 at 04:33:55PM -0400, Yaroslav Halchenko wrote:
>> On Fri, 23 Apr 2010, Mike Dewar wrote:
>>> So, to answer your email, these aren't particularly different problems
>>> formally, and hence there's an opportunity to make this clear in only
>>> providing a single method.
>> +1
>
> I think that this is a dangerous path, because it may force upon a user
> to understand the theoretical big picture to use the package even in
> simple use cases, such as unsupervised dimension reduction.

I think you're right here, though maybe there's an opportunity to educate a bit: the 'simpler' use cases could be provided as wrappers onto a theoretically coherent framework. Then a user who just wants to do some dimension reduction will get what they're expecting, and a more advanced practitioner can use the single method as they see fit. Feels like it would be less work to build this way, too.

I should reiterate something, though: why would someone use this ML scikit to do PCA (or other basic algorithms that are all probably really well covered in Python-land)? This isn't an argument not to include PCA; rather, it's an argument to include it, along with everything else, in the same consistent machine-learning world view, not as just another tool in the bag.

> Let us use k-means clustering as an example: once I have fit the model to
> data, given new data, I may want to answer two different questions with
> this new data:
>
> i) How likely is this new data in this model? I can use this to
> cross-validate the number of classes.
>
> ii) What is the class of the new samples according to the model that I
> have learned? In a more general framework, this would be: what is the
> value of the latent variables for these new observations?
> Although I can see that these two questions can be formulated as simply
> interrogating a probabilistic model with latent variables, I wonder if
> folding the two questions into a single method will make the code easier
> to follow.

I think a model.likelihood(data) method is a totally valid method, and not one that should be folded into a sort of infer() (or predict_latent) method.

As a side note: with unsupervised learning, it seems like there should be at least three methods: model.learn(data), which finds parameters given the data and the current estimate of the hiddens; model.infer(data), which finds hidden variables given the current parameters; and model.fit(data), which iterates between learn() and infer() to actually build the model in the first place.

> One question is whether every model can really be formulated to answer
> question ii. In other words, should all 'transforms' be rewritten as
> 'predict'? For instance when using PCA to do dimension reduction, we
> can indeed think of the problem as estimating a latent-variable model,
> and predicting the value of the latent variables from the data at hand,
> or from new data. However, this means that we have to digress about
> latent-variable models when explaining to a user how to do dimension
> reduction. And it becomes even stranger when doing supervised dimension
> reduction, such as univariate feature selection.
>
> I tend to prefer 'transform' rather than 'predict', for these operations,
> as I feel that it will make the code easier to read, although I am not
> completely sold, I must admit.

Given that, in some cases (like HMMs and mixture models), you're going to /have/ to explain latent variables anyway, why not explain latent variables straight away? Then you're done! I guess if you were writing a linear-algebra package focused on transformations then it would be a digression.
But given that the latent-variable model is such a prevalent one in ML, this seems like a great place to start explaining algorithms like PCA or k-means that aren't traditionally described like this.

>> and do not forget about 'def __call__' which might make some
>> people happy since code might look for them as a simple mathematical
>> equation (after preliminary '.fit(...)'ing if necessary ;) )
>
> I must admit that I don't like that. I find that it makes the code less
> explicit. If I have an object, chances are that it has side effects. I
> would like to be aware of them, and seeing the code written as
> object-oriented code makes them obvious. In addition, it then becomes
> even less clear what the method does:
>
>     k_means = KMeans(n_classes=10)
>     k_means.fit(X)
>     Y = k_means(X)
>
> What is 'Y'? Whereas:
>
>     k_means = KMeans(n_classes=10)
>     k_means.fit(X)
>     Y = k_means.predict(X)
>
> seems a bit more readable to me, and:
>
>     k_means = KMeans(n_classes=10)
>     k_means.fit(X)
>     Y = k_means.likelihood(X)
>
> is quite clear (due to the explicit name 'likelihood').

Not that I feel it's my place to discuss this, having only stopped lurking to give my opinion about some theory, but I'm with Gael on this one. Only use __call__ when it's totally, totally obvious.

Mike

> I guess that we will have to work from examples and explicit use cases.
>
> My 2 cents,
>
> Gaël
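Mike's learn/infer/fit split can be sketched as a small skeleton. The method names come from his email; the body (a two-mean 1-D model, with the current hiddens passed to learn() explicitly) is an illustrative assumption, not a proposed implementation:

```python
import numpy as np

class AlternatingModel:
    """Skeleton of the three-method split: infer() finds hiddens given
    parameters, learn() finds parameters given hiddens, and fit()
    alternates the two to build the model in the first place."""

    def __init__(self, means=(0.0, 1.0)):
        self.means_ = np.asarray(means, dtype=float)

    def infer(self, x):
        # hidden variables given current parameters:
        # index of the mean each sample is closest to
        return np.abs(x[:, None] - self.means_[None, :]).argmin(axis=1)

    def learn(self, x, z):
        # parameters given the data and the current hiddens
        for k in range(len(self.means_)):
            if (z == k).any():
                self.means_[k] = x[z == k].mean()
        return self

    def fit(self, x, n_iter=10):
        # coordinate ascent: alternate infer() and learn()
        for _ in range(n_iter):
            z = self.infer(x)
            self.learn(x, z)
        return self
```

A user who only wants a clustering calls fit() and is done; someone doing something more exotic (online updates, custom schedules) can drive infer() and learn() directly, which is the wrapper-over-a-coherent-framework idea from earlier in the thread.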
From: Gael Varoquaux <gael.varoquaux@no...>  2010-05-13 20:10:35

Hey,

The conversation has settled down, and I have learned a lot. Thanks a lot for the feedback I got; it helped me shape my ideas. We'll turn this into APIs/implementation when we focus on Gaussian mixture modelling (hopefully soon).

Gaël

--
Gael Varoquaux
Research Fellow, INRIA
Laboratoire de Neuroimagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette, France
Phone: ++ 33 1 69 08 78 35  Mobile: ++ 33 6 28 25 64 62
http://gaelvaroquaux.info