2010/10/24 Gael Varoquaux <gael.varoquaux@...>:
> On Sun, Oct 24, 2010 at 10:48:36PM +0200, Matthieu Brucher wrote:
>> I hope that in March, the manifold module will be available, so you
>> may try something different from SVD (SVD doesn't find nice embedded
>> dimensions, contrary to Laplacian Eigenmaps, LLE, ...).
> But what is going to be the compared speed? SVD on big data already
> takes a lot of time, although it gives good results. Do you have any
> intuition on what the cost/gain will be?
> I certainly do not want to sound pessimistic, I am just so used to
> hearing people bashing PCA, and not being able to run the alternatives on my
Let's face it: SVD sucks at describing high-dimensional data. OK, it
takes a bit longer to compute an appropriate reduced space, but once
you have it, the reduced space is more than a factor of 10 smaller in
dimension, which is a tremendous advantage when doing KNN, mean-shift,
SVM, RVM, ...
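To make that downstream gain concrete, here is a minimal sketch (sizes and data are purely illustrative, not from this thread) of why a 10x smaller reduced space matters: brute-force k-NN query cost scales linearly with the dimension of the space you search in.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.RandomState(0)
n_samples = 2000

# Hypothetical reduced spaces: a 100-D SVD space vs an embedding
# that is a factor of 10 smaller (e.g. from a manifold method).
X_svd = rng.randn(n_samples, 100)
X_manifold = rng.randn(n_samples, 10)

# Brute-force k-NN does O(n^2 * d) work: the same query on the
# 10-D embedding touches 10x fewer coordinates per distance.
for X in (X_svd, X_manifold):
    nn = NearestNeighbors(n_neighbors=5, algorithm="brute").fit(X)
    distances, indices = nn.kneighbors(X)
    print(X.shape[1], indices.shape)
```

The same linear-in-d saving applies to mean-shift and to kernel evaluations in SVM/RVM, which is where the "tremendous advantage" shows up in practice.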
First, you don't have to diagonalize the entire matrix. Only the 5
main eigenvectors of the associated matrix are generally needed
(compared to the 100 of SVD). Now, SVD is what, O(n^3)? The other
embedding algorithms have the same complexity. Diffusion Maps should
have the same complexity (only the construction of the matrix is
different; there is no neighbor search, unlike Laplacian Eigenmaps).
Laplacian Eigenmaps have a graph construction in O(n^2), I think, like
LLE, and Isomap has a distance matrix to compute, which is at worst
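The "only a few eigenvectors" point can be sketched with SciPy (illustrative sizes; the matrix here is just a random symmetric PSD stand-in for a kernel or graph Laplacian): an iterative solver extracts the leading eigenpairs without touching the rest of the spectrum.

```python
import numpy as np
from scipy.sparse.linalg import eigsh

rng = np.random.RandomState(0)
A = rng.randn(500, 500)
A = A @ A.T  # symmetric PSD, like a kernel or Laplacian matrix

# Full diagonalization: O(n^3), returns all 500 eigenvalues.
all_vals = np.linalg.eigvalsh(A)

# Iterative partial solver: only the 5 largest eigenpairs are
# computed, which is all most spectral embeddings actually need.
vals, vecs = eigsh(A, k=5, which="LM")

print(vecs.shape)  # (500, 5)
```

For the sparse neighbor graphs that Laplacian Eigenmaps or LLE build, this partial solve is much cheaper than a full dense decomposition.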
But as I've said, the main gain is the better description with fewer
dimensions, and thus far greater robustness. It's then a balance to
strike between speed and robustness.
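For what it's worth, the manifold module discussed here later landed in scikit-learn as `sklearn.manifold`; a present-day sketch of the speed/robustness trade-off on a classic toy manifold (parameters are illustrative):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA
from sklearn.manifold import LocallyLinearEmbedding

X, color = make_swiss_roll(n_samples=1000, random_state=0)

# Linear projection: fast, but it cannot unroll the manifold,
# so the 2-D coordinates mix distant points on the roll.
X_pca = PCA(n_components=2).fit_transform(X)

# LLE: neighbor graph + sparse eigenproblem, slower to fit, but
# the 2-D coordinates follow the intrinsic geometry of the data.
X_lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2,
                               random_state=0).fit_transform(X)

print(X_pca.shape, X_lle.shape)
```

Both end up in 2-D, which is exactly the balance in the last paragraph: the manifold method pays more up front for a far more faithful reduced space.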
Information System Engineer, Ph.D.