On Mon, Nov 22, 2010 at 08:18:27AM -0200, Alexandre Passos wrote:
> On Mon, Nov 22, 2010 at 07:54, Olivier Grisel <olivier.grisel@...> wrote:
> > 2010/11/22 Alexandre Passos <alexandre.tp@...>:
> >> Sure. I'll send a patch later today, after I figure out where in
> >> scikits.learn should I put it. This code is actually a bit faster than
> >> pysparsesvd, although it uses a bit more memory (I couldn't fix this
> >> as the memory is mostly used inside the QR decomposition, I think).
> > Great: I think you can add it to the scikits.learn.pca package.
> > Maybe also wrap it in a new class called RandomPCA also in the pca module?
> I've attached a patch (made with git format-patch) that adds fast_svd
> to utils.extmath and gives the PCA class an option to use it instead
> of the regular SVD when mle is not selected, as Gael suggested.
It's in! Thanks a lot Alexandre, that was really fast.
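For anyone following along, here is a minimal sketch of the randomized
SVD idea behind fast_svd (random projection, QR orthonormalization, then
a small exact SVD), in the style of Halko/Martinsson/Tropp. The function
name, signature, and defaults below are illustrative only, not the
actual scikits.learn API:

```python
import numpy as np

def randomized_svd(M, k, n_oversamples=10, random_state=0):
    """Approximate rank-k SVD of M via random projection + QR.

    Illustrative sketch; not the scikits.learn implementation.
    """
    rng = np.random.RandomState(random_state)
    n_random = k + n_oversamples
    # Sample the range of M with a Gaussian random matrix.
    Y = M @ rng.normal(size=(M.shape[1], n_random))
    # Orthonormalize the samples; this QR step is where most of the
    # extra memory goes, as mentioned above.
    Q, _ = np.linalg.qr(Y)
    # Project M into the small subspace and do an exact SVD there.
    B = Q.T @ M
    Uhat, s, Vt = np.linalg.svd(B, full_matrices=False)
    # Map the left singular vectors back to the original space.
    U = Q @ Uhat
    return U[:, :k], s[:k], Vt[:k, :]

# Toy example: an exactly rank-5 matrix is recovered almost exactly.
rng = np.random.RandomState(1)
A = rng.rand(200, 5) @ rng.rand(5, 50)
U, s, Vt = randomized_svd(A, k=5)
approx = U @ np.diag(s) @ Vt
```

The payoff is that the expensive SVD is done on a (k + oversampling)-sized
matrix instead of the full one, which is why it should win when
n_components is much smaller than min(n_samples, n_features).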
Some timings on my laptop, on the faces examples:
In : %timeit PCA(n_comp=100, do_fast_svd=True).fit(X_train)
1 loops, best of 3: 18.3 s per loop
In : %timeit PCA(n_comp=100, do_fast_svd=False).fit(X_train)
1 loops, best of 3: 15 s per loop
So it doesn't improve things for 100 components with a dataset of this
size (n_samples: 1127, n_features: 4096). Timings don't go down as I
decrease the number of components I ask for. If I decrease the number
of samples, the ratio between the two implementations stays the same.
Am I doing something wrong? Is it simply that I am not testing on