From: <florian.wilhelm@gm...>  2014-01-10 17:16:29

Hi,

I'd like to add a Theil-Sen estimator for multiple linear regression to scikit-learn, as described in the paper:
http://home.olemiss.edu/~xdang/papers/MTSE.pdf

Is anyone already working on this, or are there any objections to including a Theil-Sen estimator in scikit-learn?

Best regards,
Florian Wilhelm
From: Skipper Seabold <jsseabold@gm...>  2014-01-10 18:18:35

Hi,

There have been some implementations of Theil-Sen floating around for inclusion in statsmodels, but no PRs yet. IMO it might fit a little better in statsmodels.robust than in sklearn, unless there are some aspects of Theil-Sen I'm not familiar with.

Skipper
From: <florian.wilhelm@gm...>  2014-01-11 18:29:15

Hi,

at Blue Yonder we often use scikit-learn but sometimes miss more robust regression methods that are not based on the L2 norm. So far I only knew Theil-Sen as a linear regression method with a single explanatory variable. The work of Xin Dang, Hanxiang Peng, Xueqin Wang and Heping Zhang extends the method to n explanatory variables, so I think it should fit well into the sklearn.linear_model subpackage.

Where is the line drawn between functionality that should go into statsmodels and functionality that should go into scikit-learn with respect to regression methods?

Florian
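For reference, the classical single-variable Theil-Sen estimator mentioned in this message takes the median of the slopes between all pairs of points. The following is a minimal pure-Python sketch under that textbook definition. The function name `theil_sen_1d` and the sample data are made up for illustration, and this is not the multivariate method from the linked paper.

```python
from itertools import combinations
from statistics import median


def theil_sen_1d(xs, ys):
    """Classical Theil-Sen fit for one explanatory variable.

    Slope: median of all pairwise slopes.
    Intercept: median of y - slope * x.
    """
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
        if x2 != x1  # skip pairs with identical x (slope undefined)
    ]
    if not slopes:
        raise ValueError("need at least two points with distinct x values")
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


# Illustrative data: y = 2x + 1, with one gross outlier at x = 4.
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [1, 3, 5, 7, 100, 11, 13]
slope, intercept = theil_sen_1d(xs, ys)
print(slope, intercept)
```

Because the estimate is a median over pairs, the single outlier only affects a minority of the pairwise slopes and the fit stays on the inlier line. The cost is quadratic in the number of points, which is one reason multivariate variants rely on subsampling.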
From: Alexandre Gramfort <alexandre.gramfort@te...>  2014-01-11 18:33:27

Hi,

Did you try SVR, possibly with epsilon set to 0?

If it's too slow, have a look at lightning's new LinearSVR estimator.

Alex
From: Mathieu Blondel <mathieu@mb...>  2014-01-13 07:15:44

Here's an example that illustrates the use of LinearSVR for robust regression with lightning:
https://github.com/mblondel/lightning/blob/master/examples/plot_robust_regression.py

Regarding epsilon=0: it is a good choice for LinearSVR but less so for (kernel) SVR. epsilon=0 leads to completely dense solutions in the dual, so the kernel expansion (used in the predict method) might be slow to evaluate for large datasets.

Mathieu
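To see why epsilon=0 tends to make the dual solution dense, recall that in standard SVR only points whose residual lies on or outside the epsilon-tube can have nonzero dual coefficients. The sketch below is pure Python with made-up residuals and hypothetical helpers (`epsilon_insensitive`, `count_inside_tube`). It only evaluates the loss and does not train a model.

```python
def epsilon_insensitive(residual, epsilon):
    """Epsilon-insensitive loss: zero inside the tube, linear outside it."""
    return max(0.0, abs(residual) - epsilon)


def count_inside_tube(residuals, epsilon):
    """Number of points strictly inside the tube (zero dual coefficient)."""
    return sum(1 for r in residuals if abs(r) < epsilon)


# Made-up residuals y_i - f(x_i) of some fitted regressor.
residuals = [0.05, -0.3, 0.0, 1.2, -0.08, 0.5]

for eps in (0.1, 0.0):
    total_loss = sum(epsilon_insensitive(r, eps) for r in residuals)
    inside = count_inside_tube(residuals, eps)
    print(f"epsilon={eps}: total loss={total_loss:.2f}, strictly inside tube={inside}")
```

With epsilon=0 the tube has no interior, so no training point is guaranteed a zero coefficient, and the loss reduces to the absolute error. For a linear model this does not matter at prediction time, since only the weight vector is needed.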
From: <florian.wilhelm@gm...>  2014-01-13 08:10:19

@Alexandre, @Mathieu: Thanks for these hints. I'll give it a try.

So setting epsilon=0 and C to a large value should result in a regression in the L1 norm with almost no regularization of w, right?

One thing that just crossed my mind: would it be possible in a linear SVR setting to put the norm(w) term [in the primal objective function] in the L1 norm in order to get some sparsity, as in Lasso?

Florian
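The first question can be made concrete by writing out the objective. The following assumes the usual linear SVR primal with an epsilon-insensitive loss; the exact scaling conventions of any particular library are an assumption here.

```latex
\begin{aligned}
\text{General form:}\quad
  & \min_{w,\,b}\; \frac{1}{2}\lVert w\rVert_2^2
    + C \sum_{i=1}^{n} \max\bigl(0,\; \lvert y_i - w^\top x_i - b\rvert - \varepsilon\bigr) \\[4pt]
\text{With } \varepsilon = 0 \text{, divided by } C:\quad
  & \min_{w,\,b}\; \frac{1}{2C}\lVert w\rVert_2^2
    + \sum_{i=1}^{n} \lvert y_i - w^\top x_i - b\rvert
\end{aligned}
```

As C grows, the factor 1/(2C) shrinks the penalty on w toward zero, so the problem approaches least absolute deviations regression, that is, a fit of the residuals in the L1 norm.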
From: Mathieu Blondel <mathieu@mb...>  2014-01-13 09:45:19

On Mon, Jan 13, 2014 at 5:09 PM, florian.wilhelm@... <florian.wilhelm@...> wrote:

> So setting epsilon=0 and C to a large value should result in a
> regression in the L1 norm with almost no regularization of w, right?
> One thing that just crossed my mind. Would it be possible in a linear
> SVR setting to let the norm(w) term [in the primal objective function]
> be in the L1 norm in order to get some sparsity like in Lasso?

That's definitely possible. If your dataset is not too large, you can formulate the objective as a linear program (LP) and use an LP solver to find the solution.

Mathieu
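One way to write such a linear program, as a sketch under standard assumptions: epsilon=0, the weight vector split into non-negative parts w = w^+ - w^-, and slack variables xi_i that bound the absolute residuals. The symbol names are chosen here for illustration and are not taken from the thread.

```latex
\begin{aligned}
\min_{w^+,\,w^-,\,b,\,\xi}\quad
  & \sum_{j=1}^{d} \bigl(w_j^+ + w_j^-\bigr) + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.}\quad
  & y_i - (w^+ - w^-)^\top x_i - b \le \xi_i, \\
  & (w^+ - w^-)^\top x_i + b - y_i \le \xi_i, \qquad i = 1, \dots, n, \\
  & w^+ \ge 0,\quad w^- \ge 0,\quad \xi \ge 0.
\end{aligned}
```

At an optimum at most one of w_j^+ and w_j^- is nonzero for each j, so the first sum equals the L1 norm of w, and each xi_i equals the absolute residual of sample i. The program has 2d + n + 1 variables and 2n inequality constraints, which is why it is practical mainly for datasets that are not too large.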
From: Mathieu Blondel <mathieu@mb...>  2014-01-22 08:31:11

I just remembered that you can also try RANSAC, which was recently added to scikit-learn master:
http://scikit-learn.org/dev/auto_examples/linear_model/plot_ransac.html

Mathieu
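For readers unfamiliar with RANSAC, the idea is to fit a model to many small random subsets and keep the fit that agrees with the most points. The following is a minimal pure-Python sketch for a line in one variable. It is not scikit-learn's implementation, and the names `fit_line` and `ransac_line`, the threshold, and the data are all assumptions made for illustration.

```python
import random


def fit_line(p, q):
    """Exact line through two points with distinct x values."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


def ransac_line(points, n_trials=200, threshold=0.5, seed=0):
    """Return the two-point line with the most inliers, and that count."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(n_trials):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # vertical line, cannot be written as y = a*x + b
        slope, intercept = fit_line(p, q)
        # Count points whose vertical distance to the line is small.
        inliers = sum(
            1 for x, y in points if abs(y - (slope * x + intercept)) <= threshold
        )
        if inliers > best_inliers:
            best_model, best_inliers = (slope, intercept), inliers
    return best_model, best_inliers


# Illustrative data: eight points on y = 2x + 1 plus two gross outliers.
points = [(x, 2 * x + 1) for x in range(8)] + [(8, -20), (9, 60)]
model, n_inliers = ransac_line(points)
print(model, n_inliers)
```

A practical implementation would usually refit on all inliers of the best candidate and choose the threshold from the noise scale. This sketch omits both steps to keep the core loop visible.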