From: Bilal Dadanlar <bilal.dadanlar@ci...>  2013-05-07 08:18:33

Hi,

For a classification problem, I need a short list of possible classes together with the confidence of each prediction (to find a threshold at which the classifier is 99% sure). I used a multiclass SVM. The dataset has 1000 classes, 78 instances per class, and 2000 attributes. *.predict()* results are 72% accurate. However, the results from *.predict_proba()* didn't work well in this case: its most probable class is only 30% accurate. .predict_proba() works differently from .predict():
http://stackoverflow.com/questions/15111408/how-does-sklearn-svm-svcs-function-predict-proba-work-internally

So, is there a way to compute better predictions for ranking with probabilities?

Thank you

--
Bilal Dadanlar
cimri.com  Software Engineer
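A minimal sketch of what the question is after — ranking classes by `predict_proba` and thresholding on confidence. This uses a small synthetic dataset and a current scikit-learn, not the poster's 1000-class corpus; `probability=True` is what turns on the Platt-scaling step being discussed.

```python
# Rank classes by predict_proba and keep only high-confidence predictions.
# Synthetic stand-in for the poster's data (1000 classes, 2000 attributes).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=0)

clf = SVC(probability=True, random_state=0)  # probability=True enables Platt scaling
clf.fit(X, y)

proba = clf.predict_proba(X[:5])             # shape: (5, n_classes)
top3 = np.argsort(-proba, axis=1)[:, :3]     # three most probable classes per sample

# Keep a prediction only when the classifier is at least 99% sure.
confident = proba.max(axis=1) >= 0.99
```

As the thread goes on to discuss, these Platt-scaled probabilities can rank classes differently from `.predict()` itself.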
From: Peter Prettenhofer <peter.prettenhofer@gm...>  2013-05-07 09:03:56

Do you need probabilities? You could just use the signed distance to each OVA hyperplane (via ``clf.decision_function()``) to rank the classes. Maybe the Platt scaling screws up here...

You could also look at Mathieu's "lightning" project https://github.com/mblondel/lightning -- it features multinomial logistic regression, which might give better-calibrated probabilities than Platt scaling...

HTH

--
Peter Prettenhofer
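Peter's suggestion can be sketched like this: rank classes by the signed distance to each one-vs-all hyperplane instead of by probabilities. `LinearSVC` is used here because it is one-vs-rest by construction; iris stands in for the real data.

```python
# Rank classes by signed distance to each one-vs-all hyperplane,
# skipping Platt scaling entirely.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
clf = LinearSVC(random_state=0, max_iter=10000).fit(X, y)  # one-vs-rest

scores = clf.decision_function(X[:5])        # shape: (5, n_classes)
ranking = np.argsort(-scores, axis=1)        # classes ordered best-first per sample
```

The top-ranked class in this ordering is exactly what `.predict()` returns, so the ranking can never contradict the hard prediction.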
From: abdalrahman eweiwi <abdalrahman.eweiwi@gm...>  2013-05-07 10:09:54

Hi,

I have recently used partial least squares (PLS) and kernel partial least squares (KPLS) for a similar task, where I was interested in a prediction confidence for each class on test data. It worked better for me than *SVM* with *predict_proba()*. PLS is already implemented in sklearn, but KPLS is not. To use PLS for this task you have to treat your problem as a regression problem where the target matrix encodes the class membership of each training sample (read chapter 4 of The Elements of Statistical Learning).

Hope this helps.

A.Eweiwi
From: Lars Buitinck <L.J.Buitinck@uv...>  2013-05-07 10:07:10

2013/5/7 Peter Prettenhofer <peter.prettenhofer@...>:
> Do you need probabilities? You could just use the signed distance to each
> OVA hyperplane (via ``clf.decision_function()``) to rank the classes.
> Maybe the Platt scaling screws up here...

The more I find out about Platt scaling in LibSVM, the more I'm inclined to stay away from it.

> You could also look at Mathieu's "lightning" project
> https://github.com/mblondel/lightning -- it features multinomial logistic
> regression which might give better calibrated probabilities than Platt
> scaling...

Or our own LogisticRegression. It cuts some corners, but sometimes it's good enough.

--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
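Lars's alternative in sketch form: scikit-learn's own `LogisticRegression`, whose `predict_proba` comes from the model itself rather than from a post-hoc Platt fit, so its probabilities and hard predictions cannot disagree.

```python
# LogisticRegression: probabilities are part of the model, not bolted on.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

proba = clf.predict_proba(X[:5])             # one probability per class, rows sum to 1
top = clf.classes_[proba.argmax(axis=1)]     # always agrees with clf.predict(X[:5])
```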
From: Peter Prettenhofer <peter.prettenhofer@gm...>  2013-05-07 10:10:10

2013/5/7 Lars Buitinck <L.J.Buitinck@...>:
> Or our own LogisticRegression. It cuts some corners, but sometimes
> it's good enough.

Right, it should give you the same ordering as ``decision_function`` (just normalized).

--
Peter Prettenhofer
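Peter's point can be checked directly: for `LogisticRegression`, `predict_proba` is a monotone per-row normalization of `decision_function`, so both induce the same class ordering for every sample.

```python
# Verify that decision_function and predict_proba rank classes identically
# for LogisticRegression (the normalization is monotone per row).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

scores = clf.decision_function(X)
proba = clf.predict_proba(X)
same_order = (np.argsort(scores, axis=1) == np.argsort(proba, axis=1)).all()
```

So if only the ranking matters, as in the original question, either output will do.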
From: Bilal Dadanlar <bilal.dadanlar@ci...>  2013-05-07 20:56:13

Thank you all. I haven't thought of regression before, but I'll give it a try.

AFAIK, Platt scaling also uses the .decision_function() values, which give N*(N-1)/2 signed distances, one for each class pair (Ni, Nj), but I'm not sure I can build a better ranking from them.

--
Bilal Dadanlar
cimri.com  Software Engineer
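The N*(N-1)/2 pairwise distances mentioned above come from SVC's internal one-vs-one scheme. With a current scikit-learn (the `decision_function_shape` parameter did not exist in 2013) the difference is easy to see on a 10-class problem:

```python
# SVC is one-vs-one internally: raw decision values come per class *pair*.
from sklearn.datasets import load_digits
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)          # 10 classes

ovo = SVC(decision_function_shape='ovo').fit(X, y)
pairwise = ovo.decision_function(X[:5])      # 10 * 9 / 2 = 45 pairwise scores

ovr = SVC(decision_function_shape='ovr').fit(X, y)
per_class = ovr.decision_function(X[:5])     # aggregated to one score per class
```

The 'ovr' aggregation gives the one-score-per-class shape that is convenient for ranking, which is exactly what LinearSVC provides natively.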
From: Bilal Dadanlar <bilal.dadanlar@ci...>  2013-05-14 06:38:21

In case it may help any of you: I simply ended up using LinearSVC. Its .decision_function() gives one value per class and is consistent with the .predict() function! Plus, it turned out to be faster than svm.SVC (as always) and more accurate on my dataset.

Thanks again for your help.

--
Bilal Dadanlar
cimri.com  Software Engineer
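The final approach in sketch form: `LinearSVC.decision_function` returns one score per class, and its argmax is by construction the label `.predict()` returns, so the ranked list can never contradict the hard prediction. Digits stands in for the real dataset.

```python
# LinearSVC: one decision score per class, consistent with .predict().
import numpy as np
from sklearn.datasets import load_digits
from sklearn.svm import LinearSVC

X, y = load_digits(return_X_y=True)
clf = LinearSVC(random_state=0, max_iter=10000).fit(X, y)

scores = clf.decision_function(X)            # shape: (n_samples, n_classes)
ranking = np.argsort(-scores, axis=1)        # ranked candidate classes per sample
```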