From: Olivier Grisel <olivier.grisel@en...>  2010-10-31 21:20:26

2010/10/31 Zak Stone <zstone@...>:
> Hi folks,
>
> Has anyone successfully worked with the values in dual_coef_,
> support_, and intercept_ when using the new precomputed kernel
> support? I'm seeing wildly different values returned for identical
> training data depending on whether or not I precompute the kernel. The
> problem could certainly be on my end; I just thought I'd see whether
> any of you have encountered similar issues.

I think there has been little use of this outside of what can be found
in the tests and examples folders. Can you please send a patch or a
github fork and branch with a failing test that emphasizes your
specific issue?

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Zak Stone <zstone@gm...>  2010-10-31 19:46:01

Hi folks,

Has anyone successfully worked with the values in dual_coef_,
support_, and intercept_ when using the new precomputed kernel
support? I'm seeing wildly different values returned for identical
training data depending on whether or not I precompute the kernel. The
problem could certainly be on my end; I just thought I'd see whether
any of you have encountered similar issues.

Thanks,
Zak

On Fri, Oct 22, 2010 at 3:05 AM, Fabian Pedregosa
<fabian.pedregosa@...> wrote:
> On Fri, Oct 22, 2010 at 4:32 AM, Zak Stone <zstone@...> wrote:
>> Excellent news! Did you have a chance to make precomputed kernels work as well?
>
> Yes, they should work just as before. That was the trickiest part and
> the reason why I decided to return the indices of support vectors. The
> original libsvm does this through a hack in which it stores the
> indices as the support vectors, but this obviously poses a consistency
> problem for the API.
>
> Cheers,
>
>> Zak
>>
>> On Thu, Oct 21, 2010 at 9:50 AM, Fabian Pedregosa
>> <fabian.pedregosa@...> wrote:
>>> Hi all.
>>>
>>> After a bit of struggling with the build system, I finally merged into
>>> master an update that uses libsvm-dense instead of libsvm for Support
>>> Vector classification and regression on dense arrays. In practice,
>>> this means that classifiers that make use of libsvm have become much
>>> more memory-efficient, needing about half the memory compared to
>>> previous versions.
>>>
>>> As an added bonus, the fit method now also returns the indices of
>>> support vectors. Unfortunately, the API had to change a bit to allow
>>> this: clf.support_ now stores the indices of support vectors and
>>> clf.support_vectors_ holds the support vectors.
>>>
>>> In the short term, I'd like to add sample weights as implemented here
>>> [1] and support for dtype=float16 arrays.
>>>
>>> As usual, feedback is welcomed :)
>>>
>>> Fabian
>>>
>>> [1] http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances
>>>
>>> ------------------------------------------------------------------------------
>>> Nokia and AT&T present the 2010 Calling All Innovators-North America contest
>>> Create new apps & games for the Nokia N8 for consumers in U.S. and Canada
>>> $10 million total in prizes - $4M cash, 500 devices, nearly $6M in marketing
>>> Develop with Nokia Qt SDK, Web Runtime, or Java and Publish to Ovi Store
>>> http://p.sf.net/sfu/nokiadev2dev
>>> _______________________________________________
>>> Scikit-learn-general mailing list
>>> Scikit-learn-general@...
>>> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
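The precomputed-kernel behavior being debugged in this thread can be sanity-checked end to end. The sketch below assumes the modern sklearn package (the scikits.learn namespace used in this thread was later renamed); with current versions, fitting on an explicit Gram matrix with kernel='precomputed' should match fitting with the built-in linear kernel, including the dual_coef_, support_, and intercept_ attributes Zak mentions:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
X = rng.randn(20, 5)
y = np.array([0] * 10 + [1] * 10)

# Fit once with the built-in linear kernel...
clf_linear = SVC(kernel="linear").fit(X, y)

# ...and once on an explicitly precomputed Gram matrix.
K = X @ X.T
clf_precomp = SVC(kernel="precomputed").fit(K, y)

# support_ holds indices into the training set, so both fits should
# select the same support vectors and agree on the dual coefficients.
assert np.array_equal(clf_linear.support_, clf_precomp.support_)
assert np.allclose(clf_linear.dual_coef_, clf_precomp.dual_coef_)
assert np.allclose(clf_linear.intercept_, clf_precomp.intercept_)
```

At predict time, a precomputed-kernel model expects a matrix of kernel values between test and training samples, so clf_precomp.predict(K) mirrors clf_linear.predict(X) here.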
From: Mathieu Blondel <mathieu@mb...>  2010-10-28 10:08:13

On Thu, Oct 28, 2010 at 6:26 PM, Olivier Grisel <olivier.grisel@...> wrote:
> I think CSR is the most natural representation for efficient slicing
> on shape[0]. If it does not work then it's probably a bug of the CSR
> representation, no? Or we might just have to accept that there is no
> fancy indexing available and that we have to explicitly perform an
> sp.issparse() check and compute the indexes explicitly in that case.

Even in the event that they add the feature / fix the bug in the near
future, it will take a year or more to make it into mainstream
distributions, so it will need to be addressed one way or another in
the scikit. And I agree that CSR is the most natural format for
slicing rows.

Mathieu
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-28 09:49:28

On Thu, Oct 28, 2010 at 11:36:22AM +0200, Olivier Grisel wrote:
> 2010/10/28 Gael Varoquaux <gael.varoquaux@...>:
>> On Thu, Oct 28, 2010 at 06:09:22PM +0900, Mathieu Blondel wrote:
>>> On Thu, Oct 28, 2010 at 2:08 AM, Mathieu Blondel <mathieu@...> wrote:
>>>> OK, I will try that. In my case, I think I will just need to convert
>>>> to COO before fit() on the grid search object. Since other sparse
>>>> matrices don't support fancy indexing, we probably want to raise an
>>>> exception if sp.issparse(X) and not isspmatrix_coo(X) or automatically
>>>> convert to COO if sp.issparse(X).
>>> I tried with a COO matrix but I get
>>> TypeError: only integer arrays with one element can be converted to an index
>> Sorry, I was talking nonsense. You can do mask-based fancy indexing on
>> sparse matrices only with CSC:
>> In [16]: m.tocsc()[mask]
>> Out[16]:
>> <10x5 sparse matrix of type '<type 'numpy.float64'>'
>>     with 10 stored elements in Compressed Sparse Column format>
> Are you sure this is working? Looks like there is a bug in scipy (or I
> am not understanding what's going on).

I was pretty sure that at least one of the sparse matrix formats
supported mask-based fancy indexing. Maybe I was wrong. Will need to
investigate that. Right now I am busy elsewhere. Will get back when I
find a bit of time.

Gael
From: Olivier Grisel <olivier.grisel@en...>  2010-10-28 09:36:49

2010/10/28 Gael Varoquaux <gael.varoquaux@...>:
> On Thu, Oct 28, 2010 at 06:09:22PM +0900, Mathieu Blondel wrote:
>> On Thu, Oct 28, 2010 at 2:08 AM, Mathieu Blondel <mathieu@...> wrote:
>>> OK, I will try that. In my case, I think I will just need to convert
>>> to COO before fit() on the grid search object. Since other sparse
>>> matrices don't support fancy indexing, we probably want to raise an
>>> exception if sp.issparse(X) and not isspmatrix_coo(X) or automatically
>>> convert to COO if sp.issparse(X).
>
>> I tried with a COO matrix but I get
>
>> TypeError: only integer arrays with one element can be converted to an index
>
> Sorry, I was talking nonsense. You can do mask-based fancy indexing on
> sparse matrices only with CSC:
>
> In [16]: m.tocsc()[mask]
> Out[16]:
> <10x5 sparse matrix of type '<type 'numpy.float64'>'
>     with 10 stored elements in Compressed Sparse Column format>

Are you sure this is working? What was the original shape of m?

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-28 09:29:09

On Thu, Oct 28, 2010 at 06:09:22PM +0900, Mathieu Blondel wrote:
> On Thu, Oct 28, 2010 at 2:08 AM, Mathieu Blondel <mathieu@...> wrote:
>> OK, I will try that. In my case, I think I will just need to convert
>> to COO before fit() on the grid search object. Since other sparse
>> matrices don't support fancy indexing, we probably want to raise an
>> exception if sp.issparse(X) and not isspmatrix_coo(X) or automatically
>> convert to COO if sp.issparse(X).
>
> I tried with a COO matrix but I get
>
> TypeError: only integer arrays with one element can be converted to an index

Sorry, I was talking nonsense. You can do mask-based fancy indexing on
sparse matrices only with CSC:

In [16]: m.tocsc()[mask]
Out[16]:
<10x5 sparse matrix of type '<type 'numpy.float64'>'
    with 10 stored elements in Compressed Sparse Column format>

The reason why fancy indexing cannot be used arbitrarily on sparse
matrices is that, because of the internal storage structure of the
sparse matrix, it would force a conversion between sparse matrix
formats. I think everybody agrees that the current state of the scipy
sparse matrix API is not ideal, but nobody has the time to revamp it.

I suggest therefore that:

1) Feature extraction routines return CSC matrices
2) Cross-validation methods convert to CSC

Sorry for the CSC/COO mix-up.

Gaël
From: Olivier Grisel <olivier.grisel@en...>  2010-10-28 09:26:30

2010/10/28 Mathieu Blondel <mathieu@...>:
> On Thu, Oct 28, 2010 at 2:08 AM, Mathieu Blondel <mathieu@...> wrote:
>> OK, I will try that. In my case, I think I will just need to convert
>> to COO before fit() on the grid search object. Since other sparse
>> matrices don't support fancy indexing, we probably want to raise an
>> exception if sp.issparse(X) and not isspmatrix_coo(X) or automatically
>> convert to COO if sp.issparse(X).
>
> I tried with a COO matrix but I get
>
> TypeError: only integer arrays with one element can be converted to an index
>
> I have scipy 0.7 installed. You can reproduce the problem by running
> sparse_cv.py, that I attached in the first email.

I think CSR is the most natural representation for efficient slicing
on shape[0]. If it does not work then it's probably a bug of the CSR
representation, no? Or we might just have to accept that there is no
fancy indexing available and that we have to explicitly perform an
sp.issparse() check and compute the indexes explicitly in that case.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Mathieu Blondel <mathieu@mb...>  2010-10-28 09:09:30

On Thu, Oct 28, 2010 at 2:08 AM, Mathieu Blondel <mathieu@...> wrote:
> OK, I will try that. In my case, I think I will just need to convert
> to COO before fit() on the grid search object. Since other sparse
> matrices don't support fancy indexing, we probably want to raise an
> exception if sp.issparse(X) and not isspmatrix_coo(X) or automatically
> convert to COO if sp.issparse(X).

I tried with a COO matrix but I get

TypeError: only integer arrays with one element can be converted to an index

I have scipy 0.7 installed. You can reproduce the problem by running
sparse_cv.py, that I attached in the first email.

Mathieu
From: Mathieu Blondel <mathieu@mb...>  2010-10-27 17:08:07

On Thu, Oct 28, 2010 at 1:46 AM, Gael Varoquaux <gael.varoquaux@...> wrote:
> Yes, but it means that the expectation that masking is possible on X or
> y is broken. I'd like this expectation to hold, because I suspect that
> it is in people's minds.

If people expect this to hold, then I would guess it's a problem with
scipy sparse matrices, not the scikit. Anyway, I wonder why CSR
matrices don't support masks on rows.

> Here is my suggestion: use COO to carry the data around, and if CSR, CSC,
> or DIAG are quicker for the linear algebra operations (as is often the
> case) do the conversion only when needed (i.e. in the relevant fit
> routine). This is generally accepted as a best practice by the people
> that I know who use scipy's sparse matrices a lot.
>
> How does that sound?

OK, I will try that. In my case, I think I will just need to convert
to COO before fit() on the grid search object. Since other sparse
matrices don't support fancy indexing, we probably want to raise an
exception if sp.issparse(X) and not isspmatrix_coo(X) or automatically
convert to COO if sp.issparse(X).

Mathieu
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-27 16:46:52

On Thu, Oct 28, 2010 at 01:40:15AM +0900, Mathieu Blondel wrote:
>> we might have to relax this requirement, but I'd much rather stick to it,
>> as it is standard numpy good practice.

> I use CSR matrices which are supposed to be fast to slice row-wise...

> A quick fix is to do

> ind = np.arange(X.shape[0])
> X_test, X_train = X[ind[test]], X[ind[train]]

> in grid search for the matrices that don't support fancy indexing.

Yes, but it means that the expectation that masking is possible on X or
y is broken. I'd like this expectation to hold, because I suspect that
it is in people's minds.

> Another fix is to change cross-validation to return indices but that's
> an API change...

I don't like that because it means that nested cross-validation cannot
be written as easily (when you are combining index arrays, it is really
easy to make errors).

Here is my suggestion: use COO to carry the data around, and if CSR,
CSC, or DIAG are quicker for the linear algebra operations (as is often
the case) do the conversion only when needed (i.e. in the relevant fit
routine). This is generally accepted as a best practice by the people
that I know who use scipy's sparse matrices a lot.

How does that sound?

Gaël
From: Mathieu Blondel <mathieu@mb...>  2010-10-27 16:40:25

On Thu, Oct 28, 2010 at 1:31 AM, Gael Varoquaux <gael.varoquaux@...> wrote:
> On Thu, Oct 28, 2010 at 01:27:33AM +0900, Mathieu Blondel wrote:
>> I suspect the problem might be incorrect slicing. As my second example
>> script shows, slicing with an array of booleans returns the matrix
>> unchanged.
>
> OK. Fancy indexing does not work with all kinds of sparse matrices.
> What kind of sparse matrices are you using? I suspect it might be good
> to use COO.

It doesn't work with CSR, CSC, LIL and DOK. I forgot to test COO.

> I had in mind that the requirement for X and y was to be indexable with
> masks on the first axis, so that the following is valid:
>
> for test, train in KFold(100):
>     X_test, X_train = X[test], X[train]
>
> we might have to relax this requirement, but I'd much rather stick to it,
> as it is standard numpy good practice.

I use CSR matrices which are supposed to be fast to slice row-wise...

A quick fix is to do

ind = np.arange(X.shape[0])
X_test, X_train = X[ind[test]], X[ind[train]]

in grid search for the matrices that don't support fancy indexing.

Another fix is to change cross-validation to return indices but that's
an API change...

Mathieu
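Mathieu's quick fix can be written out end to end. The sketch below uses scipy.sparse.random purely to build a small test matrix (that helper postdates the scipy 0.7 of this thread), but the indexing idiom itself is exactly the one proposed above:

```python
import numpy as np
import scipy.sparse as sp

# A small random CSR matrix standing in for a feature matrix X.
X = sp.random(6, 4, density=0.5, format="csr", random_state=0)

# A boolean train/test mask like the ones the 2010-era
# cross-validation objects returned.
mask = np.array([True, False, True, True, False, False])

# The quick fix: turn the boolean mask into integer indices, which
# CSR matrices do support for row slicing.
ind = np.arange(X.shape[0])
X_sub = X[ind[mask]]
assert X_sub.shape == (3, 4)
```

The resulting rows match what boolean masking would select on the dense equivalent, which is why the workaround is a drop-in replacement inside grid search.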
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-27 16:31:49

On Thu, Oct 28, 2010 at 01:27:33AM +0900, Mathieu Blondel wrote:
> I suspect the problem might be incorrect slicing. As my second example
> script shows, slicing with an array of booleans returns the matrix
> unchanged.

OK. Fancy indexing does not work with all kinds of sparse matrices.
What kind of sparse matrices are you using? I suspect it might be good
to use COO.

I had in mind that the requirement for X and y was to be indexable with
masks on the first axis, so that the following is valid:

for test, train in KFold(100):
    X_test, X_train = X[test], X[train]

we might have to relax this requirement, but I'd much rather stick to
it, as it is standard numpy good practice.

Gaël
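For the record, later scikit-learn versions settled this debate the other way: cross-validators yield integer index arrays rather than boolean masks, and integer indices slice dense arrays and sparse matrices alike. A sketch of the modern equivalent of the loop above, assuming the current sklearn.model_selection API:

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)

# Modern cross-validators yield (train, test) integer index arrays,
# which work for both dense and sparse inputs.
for train, test in KFold(n_splits=5).split(X):
    X_train, X_test = X[train], X[test]
    assert X_train.shape == (8, 2) and X_test.shape == (2, 2)
```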
From: Mathieu Blondel <mathieu@mb...>  2010-10-27 16:27:44

On Thu, Oct 28, 2010 at 12:58 AM, Gael Varoquaux <gael.varoquaux@...> wrote:
> --> 197     true_pos = np.sum(y_true[y_pred == 1]==1)
>         y_true = array([ 1., 1., 0., 0., 1., 1., 0., 0., 1., 1.,
>         1., 1., 1., 1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0.,
>         0., 1., 1., 0., 0., 0., 0.])
>     198     false_pos = np.sum(y_true[y_pred == 1]==0)
>     199     false_neg = np.sum(y_true[y_pred == 0]==1)
>     200     precision = true_pos / float(true_pos + false_pos)
>     201     recall = true_pos / float(true_pos + false_neg)
>
> ValueError: too many boolean indices

This is where I had arrived after fixing the len(X) bug locally. I
suspect the problem might be incorrect slicing. As my second example
script shows, slicing with an array of booleans returns the matrix
unchanged. (Remember, slicing is needed for splitting into train and
test sets.)

I will check that tomorrow if nobody has fixed it by then (it's late
here).

Mathieu
From: Olivier Grisel <olivier.grisel@en...>  2010-10-27 16:10:10

2010/10/27 Olivier Grisel <olivier.grisel@...>:
> Indeed. This code should also be updated to work with multiclass
> classifiers where labels can be arbitrary integers (not just 0 and 1).
> Furthermore I think we have binary classifiers that output y \in {-1,
> 1} instead of y \in {0, 1}.

BTW: I was planning to work on this issue tonight, but Mathieu, if you
need it sooner, please go on. Just send me a notice if you start
working on this so that I do not do the same in parallel.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Olivier Grisel <olivier.grisel@en...>  2010-10-27 16:08:20

2010/10/27 Gael Varoquaux <gael.varoquaux@...>:
> On Wed, Oct 27, 2010 at 05:45:26PM +0200, Gael Varoquaux wrote:
>> Which tells us that we need to **always** use X.shape[0] instead of
>> len(X) to be compatible with sparse matrices. I wasn't aware of that; I
>> have learned something.
>
>> Give me a minute and I think I should be able to fix your example.
>
> I have pushed a modification that improves the situation. But now I
> fall into another problem in the precision-recall:
>
> /volatile/varoquau/dev/scikit-learn/scikits/learn/metrics.pyc in
> precision_recall(y_true=array([ 1., 1., 0., 0., 1., 1., 0., 0.,
> ..., 0., 0., 1., 1., 0., 0., 0., 0.]), y_pred=array([0, 0, 0, 0, 0,
> 0, 0, 0, 0, 0, 0, 0, 0, 0,..., 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32))
>     192
>     193     References
>     194     ==========
>     195     http://en.wikipedia.org/wiki/Precision_and_recall
>     196     """
> --> 197     true_pos = np.sum(y_true[y_pred == 1]==1)
>         y_true = array([ 1., 1., 0., 0., 1., 1., 0., 0., 1., 1.,
>         1., 1., 1., 1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0.,
>         0., 1., 1., 0., 0., 0., 0.])
>     198     false_pos = np.sum(y_true[y_pred == 1]==0)
>     199     false_neg = np.sum(y_true[y_pred == 0]==1)
>     200     precision = true_pos / float(true_pos + false_pos)
>     201     recall = true_pos / float(true_pos + false_neg)
>
> ValueError: too many boolean indices

Indeed. This code should also be updated to work with multiclass
classifiers where labels can be arbitrary integers (not just 0 and 1).
Furthermore I think we have binary classifiers that output y \in {-1,
1} instead of y \in {0, 1}.

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Olivier Grisel <olivier.grisel@en...>  2010-10-27 16:05:18

2010/10/27 Gael Varoquaux <gael.varoquaux@...>:
> On Wed, Oct 27, 2010 at 05:58:11PM +0200, Gael Varoquaux wrote:
>> I have pushed a modification that improves the situation. But now I
>> fall into another problem in the precision-recall:
>
> @mblondel: Discussion in the commit comments (I am not sure that you are
> reading them)
> http://github.com/scikit-learn/scikit-learn/commit/323a5d3254d3d2f5929538ae9106b62e7371efae#commitcomment-178801

I noticed that github does not detect the @mention pattern and was
about to resend on the mailing list...

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-27 16:03:29

On Wed, Oct 27, 2010 at 05:58:11PM +0200, Gael Varoquaux wrote:
> I have pushed a modification that improves the situation. But now I
> fall into another problem in the precision-recall:

@mblondel: Discussion in the commit comments (I am not sure that you
are reading them)
http://github.com/scikit-learn/scikit-learn/commit/323a5d3254d3d2f5929538ae9106b62e7371efae#commitcomment-178801
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-27 15:58:19

On Wed, Oct 27, 2010 at 05:45:26PM +0200, Gael Varoquaux wrote:
> Which tells us that we need to **always** use X.shape[0] instead of
> len(X) to be compatible with sparse matrices. I wasn't aware of that; I
> have learned something.

> Give me a minute and I think I should be able to fix your example.

I have pushed a modification that improves the situation. But now I
fall into another problem in the precision-recall:

/volatile/varoquau/dev/scikit-learn/scikits/learn/metrics.pyc in
precision_recall(y_true=array([ 1., 1., 0., 0., 1., 1., 0., 0.,
..., 0., 0., 1., 1., 0., 0., 0., 0.]), y_pred=array([0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0,..., 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int32))
    192
    193     References
    194     ==========
    195     http://en.wikipedia.org/wiki/Precision_and_recall
    196     """
--> 197     true_pos = np.sum(y_true[y_pred == 1]==1)
        y_true = array([ 1., 1., 0., 0., 1., 1., 0., 0., 1., 1., 1.,
        1., 1., 1., 0., 0., 0., 1., 1., 1., 0., 1., 0., 0., 0., 0.,
        1., 1., 0., 0., 0., 0.])
    198     false_pos = np.sum(y_true[y_pred == 1]==0)
    199     false_neg = np.sum(y_true[y_pred == 0]==1)
    200     precision = true_pos / float(true_pos + false_pos)
    201     recall = true_pos / float(true_pos + false_neg)

ValueError: too many boolean indices
___________________________________________________________________________
WARNING: Failure executing file: <sparse_grid.py>

It seems that y_pred and y_true don't have the same size. Can I leave
you to look at that? I don't see an immediate solution. You might want
to look at the joblib bug first: it prevents you from using interactive
debugging (ipython's "%debug" magic), and I find it terribly painful to
debug without it.

Cheers,

Gaël
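The failing snippet from the traceback can be reproduced as a standalone function to make the length requirement explicit. This is an illustrative rewrite, not the metrics.py of the time; the explicit shape check is an addition that surfaces the error the thread ran into:

```python
import numpy as np

def precision_recall(y_true, y_pred):
    # Standalone version of the traceback's lines 197-201. Boolean
    # indexing requires y_true and y_pred to have the same length,
    # which is exactly what the ValueError above complains about.
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    if y_true.shape != y_pred.shape:
        raise ValueError("y_true and y_pred must have the same shape")
    true_pos = np.sum(y_true[y_pred == 1] == 1)
    false_pos = np.sum(y_true[y_pred == 1] == 0)
    false_neg = np.sum(y_true[y_pred == 0] == 1)
    precision = true_pos / float(true_pos + false_pos)
    recall = true_pos / float(true_pos + false_neg)
    return precision, recall

# One true positive, one false positive, one false negative.
p, r = precision_recall([1, 1, 0, 0], [1, 0, 1, 0])
assert p == 0.5 and r == 0.5
```

As Olivier notes downthread, this 0/1-only formulation also breaks for multiclass labels and for classifiers that emit {-1, 1}, which is a separate issue from the shape mismatch.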
From: Keith Goodman <kwgoodman@gm...>  2010-10-27 15:56:35

On Wed, Oct 27, 2010 at 8:54 AM, Keith Goodman <kwgoodman@...> wrote:
> On Wed, Oct 27, 2010 at 8:43 AM, Alexandre Gramfort
> <alexandre.gramfort@...> wrote:
>> Hi Keith,
>>
>> I'm glad you enjoy the scikit :)
>>
>> I've tried your example but I cannot reproduce the problem on my mac:
>>
>> 6> run svm_hang.py
>> 480 4
>> fitting with C=10.000000...
>>  0.0295 seconds
>> 480 4
>> fitting with C=20.000000...
>>  0.0368 seconds
>> 480 4
>> fitting with C=30.000000...
>>  0.0580 seconds
>> 480 4
>> fitting with C=40.000000...
>>  0.0633 seconds
>> 480 4
>> fitting with C=80.000000...
>>  0.1233 seconds
>> 480 4
>> fitting with C=100.000000...
>>  0.1426 seconds
>>
>> where svm_hang.py contains:
>>
>> import time
>> import numpy as np
>> from scikits.learn import svm
>>
>> def hang(C, npzfile='svc_hang.npz'):
>>     data = np.load(npzfile)
>>     x = data['x']
>>     y = data['y']
>>     clf = svm.SVC(C=C, kernel='linear', eps=1e-3)
>>     n_samples, n_features = x.shape
>>     print n_samples, n_features
>>     print 'fitting with C=%f...' % C
>>     t1 = time.time()
>>     clf.fit(x, y)
>>     t2 = time.time()
>>     print '%7.4f seconds' % (t2 - t1)
>>
>> if __name__ == '__main__':
>>     for C in [10, 20, 30, 40, 80, 100]:
>>         hang(C=C)
>>
>> What version of the scikit are you using? On what system?
>
> On 64-bit Ubuntu 10.04 I'm using:
>
> commit 0a914921c8d9ba256bf1ff3f4bdeafc5b4bba691
> Author: Mathieu Blondel <mathieu@...>
> Date:   Fri Oct 15 19:11:58 2010 +0900
>
> The problem is gone after updating to:
>
> commit 694de78a894d7e89f21cff1543da6a88858b269a
> Author: pprett <peter.prettenhofer@...>
> Date:   Wed Oct 27 14:07:37 2010 +0200

Ack. Sorry for including email addresses from the git log.
From: Keith Goodman <kwgoodman@gm...>  2010-10-27 15:54:42

On Wed, Oct 27, 2010 at 8:43 AM, Alexandre Gramfort
<alexandre.gramfort@...> wrote:
> Hi Keith,
>
> I'm glad you enjoy the scikit :)
>
> I've tried your example but I cannot reproduce the problem on my mac:
>
> 6> run svm_hang.py
> 480 4
> fitting with C=10.000000...
>  0.0295 seconds
> 480 4
> fitting with C=20.000000...
>  0.0368 seconds
> 480 4
> fitting with C=30.000000...
>  0.0580 seconds
> 480 4
> fitting with C=40.000000...
>  0.0633 seconds
> 480 4
> fitting with C=80.000000...
>  0.1233 seconds
> 480 4
> fitting with C=100.000000...
>  0.1426 seconds
>
> where svm_hang.py contains:
>
> import time
> import numpy as np
> from scikits.learn import svm
>
> def hang(C, npzfile='svc_hang.npz'):
>     data = np.load(npzfile)
>     x = data['x']
>     y = data['y']
>     clf = svm.SVC(C=C, kernel='linear', eps=1e-3)
>     n_samples, n_features = x.shape
>     print n_samples, n_features
>     print 'fitting with C=%f...' % C
>     t1 = time.time()
>     clf.fit(x, y)
>     t2 = time.time()
>     print '%7.4f seconds' % (t2 - t1)
>
> if __name__ == '__main__':
>     for C in [10, 20, 30, 40, 80, 100]:
>         hang(C=C)
>
> What version of the scikit are you using? On what system?

On 64-bit Ubuntu 10.04 I'm using:

commit 0a914921c8d9ba256bf1ff3f4bdeafc5b4bba691
Author: Mathieu Blondel <mathieu@...>
Date:   Fri Oct 15 19:11:58 2010 +0900

The problem is gone after updating to:

commit 694de78a894d7e89f21cff1543da6a88858b269a
Author: pprett <peter.prettenhofer@...>
Date:   Wed Oct 27 14:07:37 2010 +0200
From: Gael Varoquaux <gael.varoquaux@no...>  2010-10-27 15:45:34

On Thu, Oct 28, 2010 at 12:30:42AM +0900, Mathieu Blondel wrote:
> Lately I'm working with highly sparse matrices and I'm hitting the
> problem that grid search doesn't seem to work with scipy sparse
> matrices.

> I attach a simple program to show that. I also attach another one to
> show that slicing with an array of booleans (as returned by the
> cross-validation objects) is not supported in scipy sparse matrices.
> Slicing with indices works as expected for CSR matrices.

> I suspect there is some work involved in joblib too. Since joblib has
> efficient support for numpy arrays, it seems natural to convert sparse
> matrices to CSR or COO format (conversion is implemented in C and is
> fast) and store the internal data structures (e.g. X.data, X.indices,
> X.indptr), which are numpy arrays. In my experience, pickle with the
> highest protocol fails with some sparse matrices...

No, joblib is fine here (at least from what I see when running your
sparse_grid example): it relies mostly on the normal pickle routines,
and the sparse matrices pickle. Joblib will simply capture the numpy
arrays that represent the indices and values, and pickle these
cleverly. The problem is simply:

/volatile/varoquau/dev/scikit-learn/scikits/learn/grid_search.pyc in
fit(self, X, y, refit, cv, **kw)
    188         estimator = self.estimator
    189         if cv is None:
--> 190             n_samples = len(X)
    191         if y is not None and is_classifier(estimator):
    192             cv = StratifiedKFold(y, k=3)

/volatile/varoquau/dev/scipy/scipy/sparse/base.pyc in __len__(self)
    188     def __len__(self):
    189         # return self.getnnz()
--> 190         raise TypeError, "sparse matrix length is ambiguous; use getnnz()" \
    191                          " or shape[0]"

Which tells us that we need to **always** use X.shape[0] instead of
len(X) to be compatible with sparse matrices. I wasn't aware of that; I
have learned something.

Give me a minute and I think I should be able to fix your example.

> On a related note, I find the multiprocessing trace reports quite
> difficult to read and sometimes the programs are difficult to kill. It
> would be nice if, when n_jobs=1, we got normal trace reports.

Yes. It used to be the case. Fabian implemented a nice idea in joblib
to mark exceptions, but as a consequence we lose the original
tracebacks. IMHO this is a bug and needs to be fixed. I thought that I
had fixed it in:
http://github.com/joblib/joblib/commit/e4c6adf8e224c92669c79a7ec0f89418af044b46
but I have noticed, as you have, that it isn't the case. I just haven't
found time to look at it any more :( (right now I am terribly busy, and
it will be the case for another full month). If you want, just fork
joblib and fix it, adding a test. I'll be very grateful, as it bothers
me too.

Gaël
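The len(X) pitfall Gael diagnoses above is easy to demonstrate: scipy's sparse matrix classes deliberately refuse __len__ as ambiguous, while shape[0] works for dense and sparse inputs alike. A minimal illustration:

```python
import numpy as np
import scipy.sparse as sp

X_dense = np.zeros((5, 3))
X_sparse = sp.csr_matrix(X_dense)

# len() is ambiguous for a sparse matrix (number of rows? of stored
# elements?) and raises TypeError...
raised = False
try:
    len(X_sparse)
except TypeError:
    raised = True
assert raised

# ...while shape[0] gives the number of samples for either input type.
assert X_dense.shape[0] == X_sparse.shape[0] == 5
```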
From: Alexandre Gramfort <alexandre.gramfort@in...>  2010-10-27 15:43:58

Hi Keith,

I'm glad you enjoy the scikit :)

I've tried your example but I cannot reproduce the problem on my mac:

6> run svm_hang.py
480 4
fitting with C=10.000000...
 0.0295 seconds
480 4
fitting with C=20.000000...
 0.0368 seconds
480 4
fitting with C=30.000000...
 0.0580 seconds
480 4
fitting with C=40.000000...
 0.0633 seconds
480 4
fitting with C=80.000000...
 0.1233 seconds
480 4
fitting with C=100.000000...
 0.1426 seconds

where svm_hang.py contains:

import time
import numpy as np
from scikits.learn import svm

def hang(C, npzfile='svc_hang.npz'):
    data = np.load(npzfile)
    x = data['x']
    y = data['y']
    clf = svm.SVC(C=C, kernel='linear', eps=1e-3)
    n_samples, n_features = x.shape
    print n_samples, n_features
    print 'fitting with C=%f...' % C
    t1 = time.time()
    clf.fit(x, y)
    t2 = time.time()
    print '%7.4f seconds' % (t2 - t1)

if __name__ == '__main__':
    for C in [10, 20, 30, 40, 80, 100]:
        hang(C=C)

What version of the scikit are you using? On what system?

Alex

On Wed, Oct 27, 2010 at 5:33 PM, Keith Goodman <kwgoodman@...> wrote:
> I'm enjoying scikits.learn. The examples and (pretty) plots in the
> sphinx doc are great.
>
> While playing around with SVMs I ran into a set of x, y, and C that
> hangs svm.SVC.fit. I've attached the data. Here's a script that
> demonstrates:
>
> Output:
>
>>> hang(C=100)
> fitting with C=100.000000...
>  0.0790 seconds
>>> hang(C=90)
> fitting with C=90.000000...
>  0.0798 seconds
>>> hang(C=80)
> fitting with C=80.000000...
> ^Z
> [4]+  Stopped                 ipython
>
> Script:
>
> import time
> import numpy as np
> from scikits.learn import svm
>
> def hang(C, npzfile='svc_hang.npz'):
>     data = np.load(npzfile)
>     x = data['x']
>     y = data['y']
>     clf = svm.SVC(C=C, kernel='linear', eps=1e-3)
>     print 'fitting with C=%f...' % C
>     t1 = time.time()
>     clf.fit(x, y)
>     t2 = time.time()
>     print '%7.4f seconds' % (t2 - t1)
>
> Changing eps from 1e-3 to 1e-2 makes it hang at around C=10. Would a
> timeout and exit code help?
From: Keith Goodman <kwgoodman@gm...>  2010-10-27 15:32:54

I'm enjoying scikits.learn. The examples and (pretty) plots in the
sphinx doc are great.

While playing around with SVMs I ran into a set of x, y, and C that
hangs svm.SVC.fit. I've attached the data. Here's a script that
demonstrates:

Output:

>> hang(C=100)
fitting with C=100.000000...
 0.0790 seconds
>> hang(C=90)
fitting with C=90.000000...
 0.0798 seconds
>> hang(C=80)
fitting with C=80.000000...
^Z
[4]+  Stopped                 ipython

Script:

import time
import numpy as np
from scikits.learn import svm

def hang(C, npzfile='svc_hang.npz'):
    data = np.load(npzfile)
    x = data['x']
    y = data['y']
    clf = svm.SVC(C=C, kernel='linear', eps=1e-3)
    print 'fitting with C=%f...' % C
    t1 = time.time()
    clf.fit(x, y)
    t2 = time.time()
    print '%7.4f seconds' % (t2 - t1)

Changing eps from 1e-3 to 1e-2 makes it hang at around C=10. Would a
timeout and exit code help?
From: Mathieu Blondel <mathieu@mb...>  2010-10-27 15:30:49

Hello,

Lately I'm working with highly sparse matrices and I'm hitting the
problem that grid search doesn't seem to work with scipy sparse
matrices.

I attach a simple program to show that. I also attach another one to
show that slicing with an array of booleans (as returned by the
cross-validation objects) is not supported in scipy sparse matrices.
Slicing with indices works as expected for CSR matrices.

I suspect there is some work involved in joblib too. Since joblib has
efficient support for numpy arrays, it seems natural to convert sparse
matrices to CSR or COO format (conversion is implemented in C and is
fast) and store the internal data structures (e.g. X.data, X.indices,
X.indptr), which are numpy arrays. In my experience, pickle with the
highest protocol fails with some sparse matrices...

On a related note, I find the multiprocessing trace reports quite
difficult to read, and sometimes the programs are difficult to kill. It
would be nice if, when n_jobs=1, we got normal trace reports.

Thanks,
Mathieu
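On the pickling remark above: with recent scipy versions, a CSR matrix round-trips through pickle at the highest protocol precisely because its internals (data, indices, indptr) are plain numpy arrays. A small check, using scipy.sparse.random (a helper that postdates this thread) to build the matrix:

```python
import pickle
import numpy as np
import scipy.sparse as sp

X = sp.random(4, 3, density=0.5, format="csr", random_state=0)

# The CSR internals (X.data, X.indices, X.indptr) are plain numpy
# arrays, so the whole matrix pickles with the highest protocol.
blob = pickle.dumps(X, protocol=pickle.HIGHEST_PROTOCOL)
X2 = pickle.loads(blob)
assert np.allclose(X.toarray(), X2.toarray())
```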
From: Olivier Grisel <olivier.grisel@en...>  2010-10-27 11:54:39

Hi Peter,

I don't know if you are already registered on the scikit-learn-commits
mailing list or not. If not, here is a failure report from the buildbot
related to the tests in the SGD documentation:

http://buildbot.afpy.org/scikit-learn/builders/ubuntu64bit/builds/247/steps/test/logs/stdio

---------- Forwarded message ----------
From: <olivier.grisel@...>
Date: 2010/10/27
Subject: [Scikit-learn-commits] buildbot failure in Scikit Learn on ubuntu64bit
To: scikit-learn-commits@...

The Buildbot has detected a failed build on builder ubuntu64bit while
building Scikit Learn.

Full details are available at:
http://buildbot.afpy.org/scikit-learn/builders/ubuntu64bit/builds/247

Buildbot URL: http://buildbot.afpy.org/scikit-learn/

Buildslave for this Build: scikit-learn-slave

Build Reason: The Periodic scheduler named 'periodic' triggered this build
Build Source Stamp: HEAD
Blamelist:

BUILD FAILED: failed test

sincerely,
The Buildbot

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel