From: Gael Varoquaux <gael.varoquaux@no...>  20110228 22:28:29

On Mon, Feb 28, 2011 at 05:18:18PM -0500, Satrajit Ghosh wrote:
> came across this recently and i thought the framework would be a nice
> thing to consider implementing (especially given gael's new updates and
> thoughts on gridsearchcv).
> http://www.stat.berkeley.edu/users/ecpolley/presentations/assets/NCIBRB_final.pdf :)
> (see page 13 for the framework schematic)

I burst into laughter when I saw that. I have such a dream, one day, but for the thing to work reasonably well, I feel that we need to be able to give it a lot of simple rules to guide which algorithms to try. It will take a while to get there.

In the meantime, another nice feature would be objects that, given a model (linear? which penalization?) and data, find the best algorithm. We are slowly moving toward that. I suspect that it's an intermediate step to get the super learner.

G
From: Satrajit Ghosh <satra@mi...>  20110228 22:18:26

hi all,

came across this recently and i thought the framework would be a nice thing to consider implementing (especially given gael's new updates and thoughts on gridsearchcv).

http://www.stat.berkeley.edu/users/ecpolley/presentations/assets/NCIBRB_final.pdf (see page 13 for the framework schematic)
http://www.stat.berkeley.edu/users/ecpolley/SL/

cheers,
satra
From: Gael Varoquaux <gael.varoquaux@no...>  20110228 12:53:10

On Mon, Feb 28, 2011 at 12:22:46PM +0100, Olivier Grisel wrote:
> 2011/2/28 Gael Varoquaux <gael.varoquaux@...>:
> > Any feedback is more than welcome.
> Good work :)

Thanks. Your consideration warms my heart (sincerely).

Now, the big deal, as far as the scikit is concerned: I have refactored GridSearchCV so that it can distribute not only the different parameter sets to the various CPUs, but also the different folds.

The reason I did this is that I am currently fitting an SVM to largish data with a GridSearch and 3-fold cross-validation. I have a 12-CPU box, and most of the time, most of the CPUs were not doing anything. Indeed, a small number of parameter sets on the grid dominate the computation time. This is often the case in my experience.

In the branch:
https://github.com/GaelVaroquaux/scikitlearn/tree/grid_search
each fold is fitted in parallel. Thus the different folds of the costly grid points are dispatched across CPUs. For a 3-fold CV, this gives almost a factor of 3 speedup on my box on my specific problem (as the number of CPUs is large compared to the number of folds, and the computation time is really dominated by 1 point).

The danger is to blow the memory by dispatching a huge number of jobs with different datasets. Hence the work I did in joblib with pre_dispatch :).

Now, I still need to be convinced that I haven't introduced a bug in the way the scores are computed. I need to run this a bit more on my data. If you want to give it a look / a try, feedback is welcome (and yes, I know, the code for unrolling the parallel loop is hard to read :$ ).

G
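The idea of dispatching folds as well as parameter sets can be sketched with the standard library alone. This is only an illustration of the scheduling described in the message above, not the code in the grid_search branch: `fit_and_score` and `grid_search` are hypothetical names, the score is a toy formula, and threads stand in for the subprocesses a real implementation would use.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def fit_and_score(params, fold):
    """Hypothetical stand-in for fitting one model on one fold."""
    # Toy score so the sketch runs without any machine learning library.
    return -abs(params["C"] - 10) - 0.1 * fold


def grid_search(param_grid, n_folds, n_workers=4):
    # One task per (parameter set, fold) pair instead of one per parameter set,
    # so the folds of a costly grid point can run on different workers.
    tasks = list(product(param_grid, range(n_folds)))
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        scores = list(pool.map(lambda task: fit_and_score(*task), tasks))
    # Regroup the flat results: average the fold scores of each parameter set.
    mean_scores = []
    for i, params in enumerate(param_grid):
        fold_scores = scores[i * n_folds:(i + 1) * n_folds]
        mean_scores.append((sum(fold_scores) / n_folds, params))
    return max(mean_scores, key=lambda pair: pair[0])


best_score, best_params = grid_search([{"C": c} for c in (1, 10, 100)], n_folds=3)
```

With 3 parameter sets and 3 folds this creates 9 independent tasks, so a single slow parameter set no longer ties up just one worker while the others sit idle.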
From: Olivier Grisel <olivier.grisel@en...>  20110228 11:23:14

2011/2/28 Gael Varoquaux <gael.varoquaux@...>: > Any feedback is more than welcome. Good work :)  Olivier http://twitter.com/ogrisel  http://github.com/ogrisel 
From: Gael Varoquaux <gael.varoquaux@no...>  20110227 23:24:56

Hi,

I was looking at huge parallel for loops run with joblib.Parallel (to be precise, in scikits.learn's GridSearchCV) and I realized that, as joblib was dispatching immediately to subprocesses, it could create huge temporaries.

Thus I refactored the Parallel engine to enable late dispatches:

    >>> from math import sqrt
    >>> from joblib import Parallel, delayed

    >>> def producer():
    ...     for i in range(6):
    ...         print('Produced %s' % i)
    ...         yield i

    >>> out = Parallel(n_jobs=2, verbose=1, pre_dispatch='1.5*n_jobs')(
    ...     delayed(sqrt)(i) for i in producer())
    Produced 0
    Produced 1
    Produced 2
    [Parallel(n_jobs=2)]: Done 1 out of 3+ elapsed: 0.0s remaining: 0.0s
    Produced 3
    [Parallel(n_jobs=2)]: Done 2 out of 4+ elapsed: 0.0s remaining: 0.0s
    Produced 4
    [Parallel(n_jobs=2)]: Done 3 out of 5+ elapsed: 0.0s remaining: 0.0s
    ...

I am planning to release joblib 0.5.0 with this feature in a few days. The release will also contain small improvements that make joblib's caching engine more robust when used with many processes. The soon-to-be-released code can be found in the 0.5.X branch.

I am planning to use this in the near future to improve parallelism in scikits.learn's GridSearchCV.

Any feedback is more than welcome.

Gael
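For readers who want to see the late-dispatch idea in isolation, here is a minimal sketch built on the standard library rather than on joblib. It is not joblib's implementation: `lazy_parallel_map` is a hypothetical helper, it uses threads, and its `pre_dispatch` argument is a plain integer. It pulls a new item from the producer only when a running task finishes, so at most `pre_dispatch` inputs are materialized at once.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from itertools import islice
from math import sqrt


def lazy_parallel_map(func, iterable, n_jobs=2, pre_dispatch=3):
    """Apply func to each item, keeping at most pre_dispatch tasks in flight."""
    items = enumerate(iterable)  # consumed lazily, a few items at a time
    results = {}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # Submit only the first batch up front.
        pending = {pool.submit(func, x): i for i, x in islice(items, pre_dispatch)}
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                results[pending.pop(future)] = future.result()
                # Pull one more item from the producer for each finished task.
                for i, x in islice(items, 1):
                    pending[pool.submit(func, x)] = i
    # Return the results in input order.
    return [results[i] for i in range(len(results))]


produced = []


def producer():
    for i in range(6):
        produced.append(i)  # record when each input is actually generated
        yield i


out = lazy_parallel_map(sqrt, producer())
```

The bound on in-flight tasks is what keeps memory under control when each input is a large temporary, which is the concern raised in the message above.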
From: Gael Varoquaux <gael.varoquaux@no...>  20110227 17:46:48

On Sun, Feb 27, 2011 at 06:40:49PM +0100, Olivier Grisel wrote:
> Also has anyone tried to use the Randomized truncated SVD (a.k.a
> fast_svd in the scikit) of X.Xt instead to extract the embedding of
> spectral clustering? I guess it's not as fast as methods that
> explicitly leverage the symmetry but could be worth a try anyway no?

No, and I must admit that I would be interested in what comes out. However, I am very much worried about the stability of the results. The reason is that the orthogonality of the eigenvectors often ends up being very important. This is why the problem is kept symmetric during the normalization step. On the other hand, it would be interesting to see if a randomized projection symmetric solver could be found.

One endeavour that would be really useful for the scikit would be to extract the relevant part of pyamg and integrate a preconditioner for large sparse symmetric problems. It's a need that comes up quite often in machine learning, and it would give us an edge. I had a quick look, and it seemed feasible. Unlike randomized projection methods, it would actually improve the numerical stability, on top of the speed.

G
From: Matthieu Brucher <matthieu.brucher@gm...>  20110227 17:46:31

> Ok.
>
> Also has anyone tried to use the Randomized truncated SVD (a.k.a
> fast_svd in the scikit) of X.Xt instead to extract the embedding of
> spectral clustering? I guess it's not as fast as methods that
> explicitly leverage the symmetry but could be worth a try anyway no?
>

In spectral clustering, the matrix is generally symmetric (as far as I've seen during my thesis, the main algorithms always tackle such matrices). fast_svd was to be tested in the manifold module (for diffusion maps, because the input is dense), but I've given up on making it acceptable.

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
From: Olivier Grisel <olivier.grisel@en...>  20110227 17:41:19

2011/2/27 Matthieu Brucher <matthieu.brucher@...>:
>
> 2011/2/27 Gael Varoquaux <gael.varoquaux@...>
>>
>> On Sun, Feb 27, 2011 at 06:27:14PM +0100, Olivier Grisel wrote:
>> > I have been quickly reading about arpack and AMG-based
>> > preconditioning. Very interesting stuff. One question though: in the
>> > arpack based impl, we take the Normalized Laplacian eigenvectors with
>> > largest eigenvalues while the pyamg variant uses the lowest
>> > eigenvalues. Is this a bug or am I missing something?
>>
>> :)
>>
>> If I am not wrong in one case we take the smallest eigenvalue of the
>> Laplacien, whereas in the other case, we take the largest eigenvalue of
>> (I - Laplacien), where I is the identity. In the case of a normalized
>> Laplacian, all it's eigenvalues are between 0 and 1, thus we are simply
>> transforming a smallest eigenvalue problem to a biggest eigenvalue
>> problem.
>
> +1 on this
>
>>
>> > Also it we should add some references to the paper that is implemented
>> > (e.g. in the docstring of the class and the sphinx doc). AFAIK there
>> > are two distinct variant of normalized-laplacian spectral clustering
>> > but I don't know them well enough to tell which one is implemented
>> > here.
>
> There are indeed two approaches, but they are strictly the same (it's just
> exactly what we have in the scikit, one takes the largest eigenvalues, the
> other the lowest, but they are theoretically similar (cf the original
> Laplacian Eigenmap algorithm compared to Diffusion maps).

Ok.

Also, has anyone tried to use the Randomized truncated SVD (a.k.a. fast_svd in the scikit) of X.Xt instead to extract the embedding of spectral clustering? I guess it's not as fast as methods that explicitly leverage the symmetry, but it could be worth a try anyway, no?

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Olivier Grisel <olivier.grisel@en...>  20110227 17:37:42

2011/2/27 Gael Varoquaux <gael.varoquaux@...>:
> On Sun, Feb 27, 2011 at 06:27:14PM +0100, Olivier Grisel wrote:
>> I have been quickly reading about arpack and AMG-based
>> preconditioning. Very interesting stuff. One question though: in the
>> arpack based impl, we take the Normalized Laplacian eigenvectors with
>> largest eigenvalues while the pyamg variant uses the lowest
>> eigenvalues. Is this a bug or am I missing something?
>
> :)
>
> If I am not wrong in one case we take the smallest eigenvalue of the
> Laplacien, whereas in the other case, we take the largest eigenvalue of
> (I - Laplacien), where I is the identity. In the case of a normalized
> Laplacian, all it's eigenvalues are between 0 and 1, thus we are simply
> transforming a smallest eigenvalue problem to a biggest eigenvalue
> problem.
>
>> Also it we should add some references to the paper that is implemented
>> (e.g. in the docstring of the class and the sphinx doc). AFAIK there
>> are two distinct variant of normalized-laplacian spectral clustering
>> but I don't know them well enough to tell which one is implemented
>> here.
>
> It's a mess. The best reference I know is the tutorial on spectral
> clustering, by Von Luxburg, but it still takes some reading.
>
>> Also can AMG preconditioning be used to solved truncated SVD on sparse
>> data efficiently or does it only work for symmetric eigen problems?
>
> I think that it works much better on symmetric problems.
>
>> Do you think we should extended the PCA class to use the
>> scipy.sparse.linalg.arpack module when the data is sparse and the
>> n_components is given?
>
> Maybe. I think that this should be tried. But I'd like checks for
> numerical stability and speed: arpack can be 'surprising'.

Ok, thanks for your answers. BTW, I will merge your fixes on the lfw branch soon :).

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Matthieu Brucher <matthieu.brucher@gm...>  20110227 17:36:56

2011/2/27 Gael Varoquaux <gael.varoquaux@...>

> On Sun, Feb 27, 2011 at 06:27:14PM +0100, Olivier Grisel wrote:
> > I have been quickly reading about arpack and AMG-based
> > preconditioning. Very interesting stuff. One question though: in the
> > arpack based impl, we take the Normalized Laplacian eigenvectors with
> > largest eigenvalues while the pyamg variant uses the lowest
> > eigenvalues. Is this a bug or am I missing something?
>
> :)
>
> If I am not wrong in one case we take the smallest eigenvalue of the
> Laplacien, whereas in the other case, we take the largest eigenvalue of
> (I - Laplacien), where I is the identity. In the case of a normalized
> Laplacian, all it's eigenvalues are between 0 and 1, thus we are simply
> transforming a smallest eigenvalue problem to a biggest eigenvalue
> problem.
>

+1 on this

> > Also it we should add some references to the paper that is implemented
> > (e.g. in the docstring of the class and the sphinx doc). AFAIK there
> > are two distinct variant of normalized-laplacian spectral clustering
> > but I don't know them well enough to tell which one is implemented
> > here.
>

There are indeed two approaches, but they are strictly the same. It's exactly what we have in the scikit: one takes the largest eigenvalues, the other the lowest, but they are theoretically similar (cf. the original Laplacian Eigenmap algorithm compared to Diffusion maps).

Matthieu
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
From: Gael Varoquaux <gael.varoquaux@no...>  20110227 17:34:34

On Sun, Feb 27, 2011 at 06:27:14PM +0100, Olivier Grisel wrote:
> I have been quickly reading about arpack and AMG-based
> preconditioning. Very interesting stuff. One question though: in the
> arpack based impl, we take the Normalized Laplacian eigenvectors with
> largest eigenvalues while the pyamg variant uses the lowest
> eigenvalues. Is this a bug or am I missing something?

:)

If I am not wrong, in one case we take the smallest eigenvalue of the Laplacian, whereas in the other case we take the largest eigenvalue of (I - Laplacian), where I is the identity. In the case of a normalized Laplacian, all its eigenvalues are between 0 and 1, thus we are simply transforming a smallest-eigenvalue problem into a largest-eigenvalue problem.

> Also it we should add some references to the paper that is implemented
> (e.g. in the docstring of the class and the sphinx doc). AFAIK there
> are two distinct variant of normalized-laplacian spectral clustering
> but I don't know them well enough to tell which one is implemented
> here.

It's a mess. The best reference I know is the tutorial on spectral clustering by Von Luxburg, but it still takes some reading.

> Also can AMG preconditioning be used to solved truncated SVD on sparse
> data efficiently or does it only work for symmetric eigen problems?

I think that it works much better on symmetric problems.

> Do you think we should extended the PCA class to use the
> scipy.sparse.linalg.arpack module when the data is sparse and the
> n_components is given?

Maybe. I think that this should be tried. But I'd like checks for numerical stability and speed: arpack can be 'surprising'.

Gaël
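The equivalence between the two eigenproblems in the message above can be written out in one line. The notation is an assumption made here for illustration: $L$ is the normalized Laplacian, $I$ the identity, and $(\lambda, v)$ any eigenpair of $L$.

```latex
L v = \lambda v
\quad\Longleftrightarrow\quad
(I - L)\, v = v - \lambda v = (1 - \lambda)\, v
```

Both matrices therefore share the same eigenvectors, and the map $\lambda \mapsto 1 - \lambda$ reverses the ordering of the eigenvalues: the eigenvectors for the smallest eigenvalues of $L$ are exactly those for the algebraically largest eigenvalues of $I - L$.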
From: Olivier Grisel <olivier.grisel@en...>  20110227 17:27:41

Hi,

I have been quickly reading about arpack and AMG-based preconditioning. Very interesting stuff. One question though: in the arpack-based impl, we take the Normalized Laplacian eigenvectors with the largest eigenvalues, while the pyamg variant uses the lowest eigenvalues. Is this a bug or am I missing something?

Also, we should add some references to the paper that is implemented (e.g. in the docstring of the class and the sphinx doc). AFAIK there are two distinct variants of normalized-Laplacian spectral clustering, but I don't know them well enough to tell which one is implemented here.

Also, can AMG preconditioning be used to solve truncated SVD on sparse data efficiently, or does it only work for symmetric eigenproblems?

Do you think we should extend the PCA class to use the scipy.sparse.linalg.arpack module when the data is sparse and n_components is given?

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Olivier Grisel <olivier.grisel@en...>  20110225 01:13:13

Back on this, I think I have found a better solution: I wrote a doctest fixture that tests whether the data cache folder '~/scikit_learn_data' has been previously initialized, and raises SkipTest if it is missing:

https://github.com/scikitlearn/scikitlearn/pull/85

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
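A fixture of the kind described above might look roughly like the following. This is a guess at the general shape, not the code from the pull request: the function name, its `data_home` parameter, and the use of `unittest.SkipTest` (rather than the test runner's own skip exception) are all assumptions made for illustration.

```python
import os
import unittest


def setup_module(module=None, data_home="~/scikit_learn_data"):
    """Skip the doctests when the data cache folder has not been initialized."""
    path = os.path.expanduser(data_home)
    if not os.path.isdir(path):
        # Signal "skipped" instead of failing on a machine without the data.
        raise unittest.SkipTest("data cache folder %s is missing" % path)
```

Raising a skip exception from the fixture means that documentation examples depending on downloaded datasets are reported as skipped, not as errors, on machines that never fetched the data.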
From: Alexandre Gramfort <alexandre.gramfort@in...>  20110224 21:22:07

Hi,

following this old thread I've put up a list of ideas for a GSOC:

https://github.com/scikitlearn/scikitlearn/wiki/AlistoftopicsforaGooglesummerofcode(GSOC)2011

do not hesitate to add ideas and propose yourself as a mentor :)

Alex

On Mon, Jan 31, 2011 at 1:29 PM, Robert Kern <robert.kern@...> wrote:
> On Mon, Jan 31, 2011 at 12:26, Mathieu Blondel <mathieu@...> wrote:
>> On Tue, Feb 1, 2011 at 2:48 AM, Gael Varoquaux
>> <gael.varoquaux@...> wrote:
>>
>>> Scipy cannot sponsort. It would be PSF via Scipy, or we could try others
>>
>> Why not? Does an organization needs to legally exist (e.g., as a
>> foundation) to be eligible?
>
> There is no fundamental reason, but every time we have applied as a
> separate mentoring organization, Google has told us to direct mentors
> and students to apply under the PSF.
>
> --
> Robert Kern
>
> "I have come to believe that the whole world is an enigma, a harmless
> enigma that is made terrible by our own mad attempt to interpret it as
> though it had an underlying truth."
>  -- Umberto Eco
From: Brent Pedersen <bpederse@gm...>  20110224 18:01:04

On Thu, Feb 24, 2011 at 10:54 AM, Olivier Grisel <olivier.grisel@...> wrote:
> 2011/2/24 Aman Thakral <aman.thakral@...>:
>> Hi Brent,
>>
>> In PCA, the components can be thought of as linear combinations of the
>> features. So no features are "chosen" rather, they are have associated
>> loadings. To see which features make significant contributions, it's best
>> to look at the loadings plot.
>
> Aman is right.
>
> Brent: if you want to do feature selection directly in the feature
> space, not the singular space extracted by PCA, you can try one of the
> following strategies:
>
> http://www.quora.com/Whataresomefeatureselectionmethods
>

thanks guys, i will read up and try to understand this better.
From: Olivier Grisel <olivier.grisel@en...>  20110224 17:55:01

2011/2/24 Aman Thakral <aman.thakral@...>:
> Hi Brent,
>
> In PCA, the components can be thought of as linear combinations of the
> features. So no features are "chosen" rather, they are have associated
> loadings. To see which features make significant contributions, it's best
> to look at the loadings plot.

Aman is right.

Brent: if you want to do feature selection directly in the feature space, not the singular space extracted by PCA, you can try one of the following strategies:

http://www.quora.com/Whataresomefeatureselectionmethods

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
From: Aman Thakral <aman.thakral@gm...>  20110224 17:21:21

Hi Brent,

In PCA, the components can be thought of as linear combinations of the features. So no features are "chosen"; rather, they have associated loadings. To see which features make significant contributions, it's best to look at the loadings plot.

Does this help?

Aman

On Thu, Feb 24, 2011 at 12:12 PM, Brent Pedersen <bpederse@...> wrote:
> hi, i'm following the iris example for PCA here:
> http://scikitlearn.sourceforge.net/auto_examples/plot_pca.html
>
> 2 components are chosen of the 4 features, how can one tell which of
> those features are chosen?
> my guess is something like:
>
> >>> features = np.array(["sepal length", "sepal width", "petal length",
> "petal width"])
> >>> order = pca.components_.sum(axis=1).argsort()
> # take n_components with highest sum from pca.components_
> >>> keep = order[order >= len(order) - n_components]
>
> >>> components = features[keep]
> >>> components
> ['petal width' 'petal length']
>
>
> does that look correct? is there a simpler way to do this?
> thanks,
> brent
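To make the point about loadings concrete, here is a small sketch that ranks features by the absolute value of their loading on each component. The loadings are illustrative numbers typed in by hand, not output computed from the iris data, and the (n_components, n_features) layout and the `rank_features` helper are assumptions made for this example.

```python
# Hypothetical loadings, shaped (n_components, n_features); the numbers are
# illustrative only and were not computed from the iris dataset here.
features = ["sepal length", "sepal width", "petal length", "petal width"]
components = [
    [0.36, -0.08, 0.86, 0.36],
    [0.66, 0.73, -0.18, -0.07],
]


def rank_features(loadings, names):
    """Return (name, loading) pairs sorted by decreasing absolute loading."""
    order = sorted(range(len(names)), key=lambda j: abs(loadings[j]), reverse=True)
    return [(names[j], loadings[j]) for j in order]


for k, loadings in enumerate(components):
    print("component %d:" % k, rank_features(loadings, features))
```

Every feature contributes to every component; the ranking only shows which contributions are largest, which is different from selecting a subset of the original features.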
From: Brent Pedersen <bpederse@gm...>  20110224 17:13:00

hi, i'm following the iris example for PCA here:
http://scikitlearn.sourceforge.net/auto_examples/plot_pca.html

2 components are chosen of the 4 features, how can one tell which of those features are chosen?
my guess is something like:

    >>> features = np.array(["sepal length", "sepal width", "petal length", "petal width"])
    >>> order = pca.components_.sum(axis=1).argsort()
    # take n_components with highest sum from pca.components_
    >>> keep = order[order >= len(order) - n_components]

    >>> components = features[keep]
    >>> components
    ['petal width' 'petal length']

does that look correct? is there a simpler way to do this?
thanks,
brent
From: Chandrika Bhardwaj <chandrika1004@ii...>  20110224 16:59:24

Exception NotImplementedError: NotImplementedError() in 'scikits.learn.linear_mo del.sgd_fast.Classification.dloss' ignored Exception NotImplementedError: NotImplementedError() in 'scikits.learn.linear_mo del.sgd_fast.Classification.loss' ignored Exception NotImplementedError: NotImplementedError() in 'scikits.learn.linear_mo del.sgd_fast.Classification.dloss' ignored ...............Warning: invalid value encountered in divide Warning: invalid value encountered in divide Warning: invalid value encountered in divide ...Warning: invalid value encountered in divide Warning: invalid value encountered in divide ................................................................................ ................................................................................ ............. ====================================================================== FAIL: Check that SGD gives any results :)  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. py", line 51, in test_sgd assert_array_equal(clf.predict(T), true_result) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 2]) ====================================================================== FAIL: Test L1 regularization  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. 
py", line 192, in test_sgd_l1 assert_array_equal(pred, Y) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 50.0%) x: array([ 1., 1., 1., 1., 1., 1., 1., 1.]) y: array([1, 2, 1, 2, 2, 2, 1, 1]) ====================================================================== FAIL: Multiclass test case  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. py", line 123, in test_sgd_multiclass assert_array_equal(pred, true_result2) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 3]) ====================================================================== FAIL: Multiclass test case with multicore support  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. 
py", line 142, in test_sgd_multiclass_njobs assert_array_equal(pred, true_result2) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 3]) ====================================================================== FAIL: Multiclass test case  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. py", line 133, in test_sgd_multiclass_with_init_coef assert_array_equal(pred, true_result2) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 3]) ====================================================================== FAIL: Check that SGD gives any results :)  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. 
py", line 51, in test_sgd assert_array_equal(clf.predict(T), true_result) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 2]) ====================================================================== FAIL: Test L1 regularization  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. py", line 192, in test_sgd_l1 assert_array_equal(pred, Y) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 50.0%) x: array([ 1., 1., 1., 1., 1., 1., 1., 1.]) y: array([1, 2, 1, 2, 2, 2, 1, 1]) ====================================================================== FAIL: Multiclass test case  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. 
py", line 123, in test_sgd_multiclass assert_array_equal(pred, true_result2) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 3]) ====================================================================== FAIL: Multiclass test case with multicore support  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. py", line 142, in test_sgd_multiclass_njobs assert_array_equal(pred, true_result2) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 3]) ====================================================================== FAIL: Multiclass test case  Traceback (most recent call last): File "C:\Python26\lib\sitepackages\scikits\learn\linear_model\tests\test_sgd. py", line 133, in test_sgd_multiclass_with_init_coef assert_array_equal(pred, true_result2) File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 686, in asse rt_array_equal verbose=verbose, header='Arrays are not equal') File "C:\Python26\lib\sitepackages\numpy\testing\utils.py", line 618, in asse rt_array_compare raise AssertionError(msg) AssertionError: Arrays are not equal (mismatch 66.6666666667%) x: array([ 1., 1., 1.]) y: array([1, 2, 3])  Ran 601 tests in 385.074s FAILED (SKIP=2, failures=10) C:\Python26> 
From: Fabian Pedregosa <fabian.pedregosa@in...>  20110223 12:34:08

On Wed, Feb 23, 2011 at 11:31 AM, Olivier Grisel <olivier.grisel@...> wrote:
> It seems that the numpy version (1.3.0) installed on the buildbot
> server does not feature the assert_allclose method:

Thanks for the heads-up, should be fixed now.

Fabian.
From: Olivier Grisel <olivier.grisel@en...>  20110223 10:31:41

It seems that the numpy version (1.3.0) installed on the buildbot server does not feature the assert_allclose function:

======================================================================
ERROR: Failure: ImportError (cannot import name assert_allclose)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/local/lib/python2.6/distpackages/nose/loader.py", line 390, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/local/lib/python2.6/distpackages/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/local/lib/python2.6/distpackages/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/ccomb/scikitlearnslave/full/build/scikits/learn/tests/test_neighbors.py", line 2, in <module>
    from numpy.testing import assert_array_almost_equal, assert_array_equal, \
ImportError: cannot import name assert_allclose

--
Olivier

---------- Forwarded message ----------
From: <olivier.grisel@...>
Date: 2011/2/23
Subject: [Scikitlearncommits] buildbot failure in Scikit Learn on ubuntu64bit
To: scikitlearncommits@...

The Buildbot has detected a failed build on builder ubuntu64bit while building Scikit Learn.
Full details are available at:
http://buildbot.afpy.org/scikitlearn/builders/ubuntu64bit/builds/3102

Buildbot URL: http://buildbot.afpy.org/scikitlearn/

Buildslave for this Build: scikitlearnslave

Build Reason: The webpage 'rebuild' button was pressed by '<unknown>': installed the latest stable version of nose (1.0.0) on the server
Build Source Stamp: 340761d8dac4c9f389d1443d7c4f57129ac61567
Blamelist:

BUILD FAILED: failed test

sincerely,
The Buildbot

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
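One common way to cope with a testing helper that is missing from an older installation is a guarded import with a local fallback. The thread does not say how the buildbot failure was actually fixed, so the following is only a sketch of that general pattern; the fallback is a deliberately simplified stand-in that handles flat sequences of numbers and is not equivalent to the real numpy function.

```python
try:
    from numpy.testing import assert_allclose
except ImportError:
    # Fallback for installations where numpy.testing lacks assert_allclose
    # (or where numpy is not importable at all).
    def assert_allclose(actual, desired, rtol=1e-7, atol=0.0):
        """Simplified stand-in: element-wise check on flat sequences only."""
        if len(actual) != len(desired):
            raise AssertionError("length mismatch")
        for a, d in zip(actual, desired):
            if abs(a - d) > atol + rtol * abs(d):
                raise AssertionError("%r and %r differ" % (a, d))


# Values that agree within the relative tolerance pass silently.
assert_allclose([1.0, 2.0], [1.0, 2.0 + 1e-9])
```

The alternatives are to upgrade numpy on the build machine or to use an older assertion such as assert_array_almost_equal, which already appears in the import line of the failing test module.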
From: Aman Thakral <aman.thakral@gm...>  20110222 15:56:22

Hi Fabian,

This fixed the problem.

Thank you very much,
Aman

On Mon, Feb 21, 2011 at 3:24 AM, Fabian Pedregosa <fabian.pedregosa@...> wrote:
> On Mon, Feb 21, 2011 at 3:17 AM, Aman Thakral <aman.thakral@...> wrote:
> > Hi,
> >
> > I've recently tried to build from the source using the instructions
> > for git. I'm receiving the following error (see below).
> >
> > I'm doing a windows install using the enthought distribution version 6.3.
> >
> > I'm not sure if this is an issue with the OS, or the version of numpy
> > that is installed.
> >
> > Any help on this matter would be greatly appreciated.
>
> Hi Aman. Seems to me similar to this error [0]. I regenerated the C
> code for _liblinear.c, so it is probably fixed in the github repo.
> Otherwise, you can also try to regenerate the c code yourself:
>
> $ cython scikits/learn/svm/src/liblinear/_liblinear.pyx
>
> Best,
>
> Fabian.
>
> [0] http://www.mailarchive.com/cythondev@.../msg10368.html
>
> > Thanks,
> > Aman
> >
> > compile options: '-Iscikits\learn\svm\src -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c'
> > gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes -Iscikits\learn\svm\src -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c scikits\learn\svm\src\liblinear\_liblinear.c -o build\temp.win32-2.6\Release\scikits\learn\svm\src\liblinear\_liblinear.o
> > Found executable C:\Python26\Scripts\gcc.exe
> > In file included from scikits\learn\svm\src\liblinear\_liblinear.c:224:0:
> > scikits\learn\svm\src\liblinear\/liblinear_helper.c: In function 'csr_copy_predict_values':
> > scikits\learn\svm\src\liblinear\/liblinear_helper.c:284:10: warning: unused variable 't'
> > scikits\learn\svm\src\liblinear\/liblinear_helper.c: In function 'csr_copy_predict_proba':
> > scikits\learn\svm\src\liblinear\/liblinear_helper.c:332:12: warning: unused variable 'temp'
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__pyx_pf_10_liblinear_0train_wrap':
> > scikits\learn\svm\src\liblinear\_liblinear.c:1074:21: warning: assignment discards qualifiers from pointer target type
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__pyx_pf_10_liblinear_1csr_train_wrap':
> > scikits\learn\svm\src\liblinear\_liblinear.c:1752:21: warning: assignment discards qualifiers from pointer target type
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__Pyx_RaiseArgtupleInvalid':
> > scikits\learn\svm\src\liblinear\_liblinear.c:7091:9: warning: unknown conversion type character 'z' in format
> > scikits\learn\svm\src\liblinear\_liblinear.c:7091:9: warning: format '%s' expects type 'char *', but argument 5 has type 'Py_ssize_t'
> > scikits\learn\svm\src\liblinear\_liblinear.c:7091:9: warning: unknown conversion type character 'z' in format
> > scikits\learn\svm\src\liblinear\_liblinear.c:7091:9: warning: too many arguments for format
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__Pyx_RaiseNeedMoreValuesError':
> > scikits\learn\svm\src\liblinear\_liblinear.c:7693:18: warning: unknown conversion type character 'z' in format
> > scikits\learn\svm\src\liblinear\_liblinear.c:7693:18: warning: format '%s' expects type 'char *', but argument 3 has type 'Py_ssize_t'
> > scikits\learn\svm\src\liblinear\_liblinear.c:7693:18: warning: too many arguments for format
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__Pyx_RaiseTooManyValuesError':
> > scikits\learn\svm\src\liblinear\_liblinear.c:7701:13: warning: unknown conversion type character 'z' in format
> > scikits\learn\svm\src\liblinear\_liblinear.c:7701:13: warning: too many arguments for format
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__Pyx_c_absf':
> > scikits\learn\svm\src\liblinear\_liblinear.c:7973:25: error: #if with no expression
> > scikits\learn\svm\src\liblinear\_liblinear.c: In function '__Pyx_c_abs':
> > scikits\learn\svm\src\liblinear\_liblinear.c:8093:25: error: #if with no expression
> > scikits\learn\svm\src\liblinear\_liblinear.c: At top level:
> > C:\Python26\lib\site-packages\numpy\core\include/numpy/__multiarray_api.h:1187:1: warning: '_import_array' defined but not used
> > C:\Python26\lib\site-packages\numpy\core\include/numpy/__ufunc_api.h:196:1: warning: '_import_umath' defined but not used
> > error: Command "gcc -mno-cygwin -O2 -Wall -Wstrict-prototypes -Iscikits\learn\svm\src -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c scikits\learn\svm\src\liblinear\_liblinear.c -o build\temp.win32-2.6\Release\scikits\learn\svm\src\liblinear\_liblinear.o" failed with exit status 1
From: Gael Varoquaux <gael.varoquaux@no...>  20110222 10:11:05

On Tue, Feb 22, 2011 at 10:23:17AM +0100, Tiziano Zito wrote:
> LLE and Hessian LLE are implemented in MDP, you may want to have a look there ;)
> I do not know if the implementation is algorithmically optimal, but
> we would of course welcome any patches :)

Well, to make the MDP code scale better, it would need to work with sparse matrices (as is written in the comments of lle_nodes :>).

I guess that the key to achieving efficiency with LLE is to have a good solver for the eigenvalue problem. We'll most probably rely on arpack + a scipy sparse matrix (possibly in diag format if the number of diagonals is small, as it speeds up matvec operations, and thus arpack). We'll use pyamg if available, as it gives a huge boost on large problems. In the long run, we'll probably extract some of the core functionality of pyamg to efficiently compute preconditioners on such systems. I suspect that only a small amount of code needs to be extracted to get a huge performance boost.

All this requires compiled code and dependencies that you cannot afford, unfortunately.

Gael
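
For readers who want to see what "arpack + a scipy sparse matrix" can look like in practice, here is a minimal sketch. It is not the scikit-learn implementation: the helper name smallest_eigenpairs and the tridiagonal test matrix are made up for illustration, and the matrix is only a stand-in for the symmetric matrix that LLE would produce. It stores the matrix in DIA format and asks ARPACK (through scipy.sparse.linalg.eigsh) for the few smallest eigenpairs, using matvec products only.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh


def smallest_eigenpairs(matrix, k):
    """Return the k algebraically smallest eigenpairs of a symmetric matrix.

    Hypothetical helper: eigsh wraps ARPACK's symmetric solver, and with
    which="SA" it only needs matrix-vector products with `matrix`.
    """
    vals, vecs = eigsh(matrix, k=k, which="SA")
    order = np.argsort(vals)  # make the ordering explicit
    return vals[order], vecs[:, order]


# Illustrative stand-in for an LLE-style matrix: symmetric, tridiagonal,
# with well-separated small eigenvalues so that ARPACK converges quickly.
n = 60
main = np.arange(1, n + 1, dtype=float)
off = 0.1 * np.ones(n - 1)
M = sparse.diags([off, main, off], offsets=[-1, 0, 1], format="dia")

vals, vecs = smallest_eigenpairs(M, k=3)

# Residual of the eigen-equation M v = lambda v, as a sanity check.
residual = np.linalg.norm(M.dot(vecs) - vecs * vals)
print(vals, residual)
```

On harder problems, where the small eigenvalues are tightly clustered, plain Lanczos iterations can converge slowly; shift-invert mode (the sigma argument of eigsh) or a preconditioned solver is a common alternative, at the cost of a factorization or an extra dependency.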
From: Tiziano Zito <opossumnano@gm...>  20110222 09:26:30

hi fabian,

> My main focus right now is to put together a manifold module that is
> simple and efficient. It's just going to be a couple of algorithms for
> now (Locally Linear Embedding and Multi Dimensional Scaling) but I
> want it to be algorithmically optimal. I do not have code to share
> yet, but it should be ready for the upcoming 0.7, which is planned for
> April.

LLE and Hessian LLE are implemented in MDP, you may want to have a look there ;) I do not know if the implementation is algorithmically optimal, but we would of course welcome any patches :)

ciao,
tiziano
From: Gael Varoquaux <gael.varoquaux@no...>  20110222 08:01:26

On Mon, Feb 21, 2011 at 10:33:14PM +0800, xinfan meng wrote:
> > Yes I am +1 to use this as a rule of thumb: make the standard fit /
> > predict / transform object oriented interface perform the checks: it
> > should probably not be part of the critical path performance wise. The
> > high level API should be user friendly with meaningful error messages
> > when the data is inconsistent.

> I think that is reasonable.

> Also, do you think there will be other deficiencies in data that need
> to be checked as a precondition of the API?

I am worried about additional costs. I sometimes reimplement some of the scipy.linalg functions to avoid their NaN checking (at the cost that if I get NaNs, I get segfaults). Thus I would try to limit checks. A check on the shape would be an obvious, easy and useful one, but if we all use np.atleast_1d, it should not be necessary.

Gaël
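
The rule of thumb discussed in this message can be sketched as follows. This is only an illustration under stated assumptions: CentroidModel and _fit_centroid are hypothetical names, not part of scikits.learn. The cheap shape check lives in the object-oriented fit method, while the low-level routine stays free of checks.

```python
import numpy as np


def _fit_centroid(X):
    """Low-level routine (hypothetical): no validation, so it stays cheap."""
    return X.mean(axis=0)


class CentroidModel:
    """Hypothetical estimator: validation lives in the high-level API only."""

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        y = np.atleast_1d(y)
        # Shape checks are cheap and give a readable error message.
        if X.ndim != 2:
            raise ValueError("X must be 2-dimensional, got ndim=%d" % X.ndim)
        if X.shape[0] != y.shape[0]:
            raise ValueError(
                "X has %d samples, but y has %d" % (X.shape[0], y.shape[0])
            )
        # No NaN scan here: that would touch every element of X.
        self.centroid_ = _fit_centroid(X)
        return self


model = CentroidModel().fit([[0.0, 0.0], [2.0, 4.0]], [0, 1])
print(model.centroid_)
```

Callers who have already validated their data can call the low-level function directly and skip the checks entirely.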