From: Dan Halbert <halbert@ha...> - 2010-05-11 21:15:14

On Tuesday, May 11, 2010 12:41pm, "Fabian Pedregosa" <fabian.pedregosa@...> said:

> for the result from scikits.learn, be aware that numpy shows less
> decimal places than it actually stores, so it might be showing the exact
> same result ...

Aha, you are right. If I extract the value from the numpy array and ask for its repr(), it shows the full 64-bit value, which matches the other value.

> [0-based vs 1-based indexing; your suggested change to the libsvm python interface]
>
> svmc.svm_node_array_set(data,j,k+1,x[k])
>
> it will correctly make 1-start index, which is how it should be (from
> the README). Can you confirm me this?

I have figured this out. The 0-based vs. 1-based question is a red herring. The differences I saw in decision values are actually due to the model being saved slightly imprecisely to a file and then read back in. The support vector values are printed out in full precision, but "rho" and other values are printed with %g format, which gives only six digits of precision. So when the model is read back in, those values will be slightly different than the original in-memory values. Command-line "svm-train" followed by "svm-predict" exercises this problem. Ideally, svm_save_model() should print out all the model values in full precision.

0-based vs 1-based doesn't matter, as long as the model and the test data match. It is true that libsvm uses 1-based indexing for all its examples, but internally 0-based works fine as well. (I see how you consistently make things 1-based in dense_to_sparse(...).) The libsvm python interface will use 0-based if you use dense data. That should probably be documented better in the libsvm README, as there's an example which misleadingly shows a sparse 1-based dataset and the equivalent 0-based dense dataset.

Side note: Have you noticed libsvm-dense, available from the libsvm folks as well? It is the libsvm code with a few added #ifdef's to store the problem and model in a simpler vector format. The README says it can be 1.5-2 times faster for dense data.

Dan
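The precision loss Dan describes can be seen directly in Python (a minimal sketch; the rho value below is illustrative, not taken from a real model file):

```python
# svm_save_model() writes "rho" and similar scalars with the C "%g"
# format, which keeps only six significant digits, while repr() keeps
# enough digits to round-trip a 64-bit float exactly.
rho = -2.3333333333333335          # illustrative value

saved = "%g" % rho                 # what ends up in the model file
restored = float(saved)

print(saved)                       # -2.33333
print(restored == rho)             # False: six digits are not enough

print(float(repr(rho)) == rho)     # True: repr() round-trips exactly
```

This is why a train/save/load/predict cycle gives slightly different decision values than keeping the model in memory.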
From: Fabian Pedregosa <fabian.pedregosa@in...> - 2010-05-11 16:56:53

Yaroslav Halchenko wrote:
> On Tue, 11 May 2010, Fabian Pedregosa wrote:
>
>>> $> python -c "import numpy as N; print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='<f8'))"
>>> [[ 2.40827914  0.        ]
>>>  [-0.90626401  3.00629199]]
>
>>> $> python -c "import numpy as N; print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='>f8'))"
>>> Traceback (most recent call last):
>>>   File "<string>", line 1, in <module>
>>>   File "/usr/lib/python2.5/site-packages/numpy/linalg/linalg.py", line 423, in cholesky
>>>     Cholesky decomposition cannot be computed'
>>> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed
>
>> I'm sorry that I can't reproduce the problem.
>
> really? both of those calls above work for you just fine? what is
> OS/numpy version? I might consider switching -- for me it is consistent
> both on Debian (sid/lenny amd64 and sid hppa) and redhat (whatever)

I meant the scikits failure.

> NB no comment yet on numpy trac: http://projects.scipy.org/numpy/ticket/1482
>
>> You can safely comment
>> those tests, as I'll rewrite them anyway to use the new dataset format.
>
> no problem -- I just excluded them from running in debian/rules -- new
> package was uploaded and built this time fine across all platforms! ;)

great!
From: Fabian Pedregosa <fabian.pedregosa@in...> - 2010-05-11 16:42:40

Dan Halbert wrote:
>
> On Monday, May 10, 2010 3:05pm, "Dan Halbert" <halbert@...> said:
>> Using libsvm's python interface, I have narrowed down the #2 vs #3 difference to
>> the model changing when it is saved to a file and read back in. In other words,
>> passing the model through libsvm's routines svm_save_model() and svm_load_model()
>> produces result #3. If the model stays in memory, #2 is the result.
>
> What I wrote above is not quite correct. The problem is that the model
> produced by the command-line executable "svm-train" is slightly
> different than the model produced by doing svm_train() and then doing
> svm_save_model(). The difference is in the indices of the model values.
>
> For example, I generated a linear-kernel model with this training data:
> 1 1:1 2:2 3:3
> -1 1:4 2:5 3:6
>
> The file produced by "svm-train" is:
> ----
> svm_type c_svc
> kernel_type linear
> nr_class 2
> total_sv 2
> rho -2.33333
> label 1 -1
> nr_sv 1 1
> SV
> 0.07407407407407407 1:1 2:2 3:3
> -0.07407407407407407 1:4 2:5 3:6
> ----
>
> The file produced by doing svm_train() on the data above and then doing
> svm_save_model() is:
> ----
> svm_type c_svc
> kernel_type linear
> nr_class 2
> total_sv 2
> rho -2.33333
> label 1 -1
> nr_sv 1 1
> SV
> 0.07407407407407407 0:1 1:2 2:3
> -0.07407407407407407 0:4 1:5 2:6
> ----
>
> Notice the 1-based indexing in the first model file, and the 0-based
> indexing in the second one. This is enough to cause the decision values
> to be different when doing svm_predict.

Thanks for your investigations, for the result from scikits.learn, be aware that numpy shows less decimal places than it actually stores, so it might be showing the exact same result ...
as for this, I've been looking at the source code of libsvm's wrappings, and I think that maybe if you change line 127 of python/svm.py from

    svmc.svm_node_array_set(data,j,k,x[k])

to

    svmc.svm_node_array_set(data,j,k+1,x[k])

it will correctly make a 1-start index, which is how it should be (from the README). Can you confirm me this?

Cheers,

> I will stop cluttering up the list with this stuff now and take it to a
> more general libsvm forum.
>
> Dan
>
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-general@...
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
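The 1-based convention being discussed can be sketched as follows (a standalone illustration; dense_to_sparse_line is a hypothetical helper written for this example, not the actual scikits.learn dense_to_sparse or the svmc wrapper):

```python
def dense_to_sparse_line(label, row):
    """Format one dense sample as a libsvm text line with 1-based
    feature indices, skipping zero entries as the sparse format allows."""
    feats = " ".join("%d:%g" % (k + 1, v)   # k+1: libsvm indices start at 1
                     for k, v in enumerate(row) if v != 0)
    return "%g %s" % (label, feats)

# The two training samples from Dan's example:
print(dense_to_sparse_line(1, [1, 2, 3]))    # 1 1:1 2:2 3:3
print(dense_to_sparse_line(-1, [4, 5, 6]))   # -1 1:4 2:5 3:6
```

As Dan notes, the offset itself is harmless as long as the model and the test data use the same convention; mixing a 1-based model file with 0-based in-memory data is what shifts the decision values.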
From: Yaroslav Halchenko <sf@on...> - 2010-05-11 16:28:57

On Tue, 11 May 2010, Fabian Pedregosa wrote:

> > $> python -c "import numpy as N; print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='<f8'))"
> > [[ 2.40827914  0.        ]
> >  [-0.90626401  3.00629199]]
> > $> python -c "import numpy as N; print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='>f8'))"
> > Traceback (most recent call last):
> >   File "<string>", line 1, in <module>
> >   File "/usr/lib/python2.5/site-packages/numpy/linalg/linalg.py", line 423, in cholesky
> >     Cholesky decomposition cannot be computed'
> > numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed

> I'm sorry that I can't reproduce the problem.

really? both of those calls above work for you just fine? what is
OS/numpy version? I might consider switching -- for me it is consistent
both on Debian (sid/lenny amd64 and sid hppa) and redhat (whatever)

NB no comment yet on numpy trac: http://projects.scipy.org/numpy/ticket/1482

> You can safely comment
> those tests, as I'll rewrite them anyway to use the new dataset format.

no problem -- I just excluded them from running in debian/rules -- new
package was uploaded and built this time fine across all platforms! ;)

--
Keep in touch                       (yoh@www.)onerussian.com
Yaroslav Halchenko                         ICQ#: 60653192
Linux User                                    [175555]
From: Fabian Pedregosa <fabian.pedregosa@in...> - 2010-05-11 16:13:32

Yaroslav Halchenko wrote:
> unfortunately although it does fix the issue with loading of the data,
> it seems to run into numpy's bug while working with non-native (i.e.
> little-endian on big-endian) byte order:
>
> (Pdb) print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='<f8'))
> *** LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed
> (Pdb) print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]]))
> [[ 2.40827914  0.        ]
>  [-0.90626401  3.00629199]]
>
> (Pdb) print N.linalg.svd(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='<f8'))
> (array([[ 0., -1.],
>        [-1.,  0.]]), array([  1.29365302e+157,   1.29365302e+157]), array([[ 1.,  0.],
>        [ 0.,  1.]]))
> (Pdb) print N.linalg.svd(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]]))
> (array([[-0.39937903,  0.9167859 ],
>        [ 0.9167859 ,  0.39937903]]), array([ 10.80988342,   4.84903093]), array([[-0.39937903,  0.9167859 ],
>        [ 0.9167859 ,  0.39937903]]))
>
> even on my laptop (regular little endian thingie) I can recreate similar scenario:
>
> $> python -c "import numpy as N; print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='<f8'))"
> [[ 2.40827914  0.        ]
>  [-0.90626401  3.00629199]]
>
> $> python -c "import numpy as N; print N.linalg.cholesky(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='>f8'))"
> Traceback (most recent call last):
>   File "<string>", line 1, in <module>
>   File "/usr/lib/python2.5/site-packages/numpy/linalg/linalg.py", line 423, in cholesky
>     Cholesky decomposition cannot be computed'
> numpy.linalg.linalg.LinAlgError: Matrix is not positive definite - Cholesky decomposition cannot be computed
>
> $> python -c "import numpy as N; print N.linalg.svd(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='<f8'))"
> (array([[-0.39937903,  0.9167859 ],
>        [ 0.9167859 ,  0.39937903]]), array([ 10.80988342,   4.84903093]), array([[-0.39937903,  0.9167859 ],
>        [ 0.9167859 ,  0.39937903]]))
>
> $> python -c "import numpy as N; print N.linalg.svd(N.array([[ 5.7998084, -2.1825367 ], [-2.1825367, 9.85910595]], dtype='>f8'))"
> (array([[ 0., -1.],
>        [-1.,  0.]]), array([  1.29365302e+157,   1.29365302e+157]), array([[ 1.,  0.],
>        [ 0.,  1.]]))
>
> I guess I will:
>
> 1. report a bug against numpy on Debian (#581043)
> 2. skip this unittest while building the scikit-learn package on Debian for now
>
> alternatively I could look into adjusting load_dataset so it ensures
> native byte order on the loaded data...? not sure yet if it is worth
> it... will check

I'm sorry that I can't reproduce the problem. You can safely comment those tests, as I'll rewrite them anyway to use the new dataset format.

Cheers,

fabian

> On Mon, 10 May 2010, Yaroslav Halchenko wrote:
>
>> On Mon, 10 May 2010, Fabian Pedregosa wrote:
>
>>> I'm currently working on this module, and these imports will probably be
>>> dropped (i.e. don't worry about this, it will be fixed in the next release)
>
>> meanwhile I've pushed the "fix" to the master... since it was so minimal
>> and had no impact on little-endian platforms -- I just went ahead with
>> the push without asking for review. hope that is ok
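Yaroslav's last suggestion, having load_dataset ensure native byte order on the loaded data, can be sketched like this (as_native is a hypothetical helper written for this example, not code from the thread):

```python
import numpy as np

def as_native(a):
    """Return `a` in native byte order, copying only when the stored
    order is non-native, so linalg routines see the layout they expect."""
    if a.dtype.isnative:
        return a
    return a.astype(a.dtype.newbyteorder('='))

# A big-endian view of the matrix from the thread; converting it before
# calling cholesky sidesteps the byte-order problem entirely:
m = np.array([[5.7998084, -2.1825367],
              [-2.1825367, 9.85910595]], dtype='>f8')
n = as_native(m)
L = np.linalg.cholesky(n)
print(np.allclose(L @ L.T, n))   # True: the factorization round-trips
```

Doing the conversion once at load time keeps every downstream consumer of the array on the fast, well-tested native code paths.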