Re: [Dclib-devel] pegasos in dlib
From: Davis K. <dav...@us...> - 2009-07-24 14:20:36
My first guess would be that you are limiting it to too few support vectors. Underneath the svm_pegasos algorithm in dlib is an object (the kcentroid) that basically just keeps a linearly independent set of sample vectors. If you tell svm_pegasos to use only 20 "support vectors" then the kcentroid internally tries to find a set of 20 sample vectors that best span the vector space induced by the kernel you use. The resulting decision function is a linear combination of these vectors, so if, as you noted, it picks bad ones then you most likely won't get a good answer. But if you picked ones that really did span the space covered by your samples then you could apply an SVM learning algorithm using just that set and there wouldn't be any local minima to get stuck in. The catch is you have to pick a nice spanning set somehow. This is what dlib::kcentroid tries to do, and svm_pegasos is built on top of it. So the success of the dlib pegasos implementation is predicated on the ability of the kcentroid to create a good spanning set.

So what should you do? I use svm_pegasos all the time and there are some things you have to watch for related to how it approximates the spanning set of vectors. The first is that if you set the tolerance too high then it might not use enough support vectors, even if you set the limit to 100000. For example, the other day I was using it and it tried to use just 1 support vector. Obviously no good. So if the number of SVs you get out of the algorithm is less than whatever you set as the max allowable number (via set_max_num_sv()) then that is because the tolerance value is too large. The default is 0.01, but 0.001 is sometimes more appropriate. I should probably make that the default.
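To make the tolerance/spanning-set interaction concrete, here is a rough stdlib-only sketch of the general idea (keep a sample only if its projection error onto the span of the current dictionary, in kernel feature space, exceeds the tolerance). This is an illustration of the technique, not dlib's actual kcentroid code, and the gaussian kernel with gamma = 1 is just an assumption for the demo:

```cpp
#include <cmath>
#include <vector>

// Sketch of a kcentroid-style dictionary: a new sample is added only when
// phi(x) is not well approximated by the span of the current dictionary
// vectors in the kernel-induced feature space.  A larger tolerance means
// fewer vectors survive -- which is how a too-large tolerance starves
// svm_pegasos of support vectors.

using Vec = std::vector<double>;

double gaussian_kernel(const Vec& a, const Vec& b, double gamma = 1.0) {
    double d2 = 0;
    for (size_t i = 0; i < a.size(); ++i) d2 += (a[i] - b[i]) * (a[i] - b[i]);
    return std::exp(-gamma * d2);
}

// Solve K x = k by Gaussian elimination with partial pivoting
// (fine for the tiny systems in this demo).
Vec solve(std::vector<Vec> K, Vec k) {
    const size_t n = k.size();
    for (size_t c = 0; c < n; ++c) {
        size_t p = c;
        for (size_t r = c + 1; r < n; ++r)
            if (std::fabs(K[r][c]) > std::fabs(K[p][c])) p = r;
        std::swap(K[p], K[c]); std::swap(k[p], k[c]);
        for (size_t r = c + 1; r < n; ++r) {
            double f = K[r][c] / K[c][c];
            for (size_t j = c; j < n; ++j) K[r][j] -= f * K[c][j];
            k[r] -= f * k[c];
        }
    }
    Vec x(n);
    for (size_t i = n; i-- > 0; ) {
        double s = k[i];
        for (size_t j = i + 1; j < n; ++j) s -= K[i][j] * x[j];
        x[i] = s / K[i][i];
    }
    return x;
}

struct Dictionary {
    std::vector<Vec> dict;
    double tolerance;

    explicit Dictionary(double tol) : tolerance(tol) {}

    // Returns true if x was kept as a new spanning ("support") vector.
    bool add(const Vec& x) {
        if (dict.empty()) { dict.push_back(x); return true; }
        const size_t n = dict.size();
        std::vector<Vec> K(n, Vec(n));
        Vec kx(n);
        for (size_t i = 0; i < n; ++i) {
            kx[i] = gaussian_kernel(dict[i], x);
            for (size_t j = 0; j < n; ++j)
                K[i][j] = gaussian_kernel(dict[i], dict[j]);
        }
        Vec a = solve(K, kx);
        // squared projection error of phi(x) onto span(dict)
        double delta = gaussian_kernel(x, x);
        for (size_t i = 0; i < n; ++i) delta -= a[i] * kx[i];
        if (delta > tolerance) { dict.push_back(x); return true; }
        return false;
    }
};
```

With a small tolerance like 0.001 almost every genuinely new direction gets kept; crank the tolerance up and nearby samples get rejected, which is the "only 1 support vector" failure mode described above.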
As an aside, the reason I don't dispense with the tolerance (essentially setting it to 0) and let the cap on support vectors be the only thing that determines how many you end up with is that using a small tolerance and an SV cap together tends to make the algorithm a whole lot faster. The only problem is when the tolerance is too big; then you run into trouble. But this is easy to spot. Similarly, if you set the SV limit too low to accurately represent the span of all the samples (in the feature space induced by your kernel) then the result will probably be chaotic and bad. So try using more. The only times I have seen svm_pegasos get weird are when I was using too few SVs. The more you use the slower it gets, but it also gets more accurate. Usually, at least with the datasets I typically encounter, I get by just fine with under about 150 support vectors. Often under half that is fine. But it all depends on your dataset. For example, if you had a 2D dataset that was a checkerboard and you wanted to separate the white squares from the black squares you would need, at the very minimum (assuming you used a gaussian kernel), a number of support vectors equal to the number of squares on your board.

My second guess is that you aren't letting the algorithm run long enough. Are you using the batch() or verbose_batch() training adapters (like in this example: http://dclib.sourceforge.net/svm_pegasos_ex.cpp.html)? I would let the algorithm run until the learning rate has dropped below something like 0.1. If you stop the algorithm before convergence then it definitely won't work well. Note that the time it takes to converge is also a function of what you set as the value of lambda (I typically use lambda values between 1e-4 and 1e-7, but YMMV). What kernel are you using?

Another thing to think about is trying to remove some features. Generally, the more features you have, the more vectors it takes to span the space inhabited by your samples.
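For intuition about the learning-rate and lambda relationship, here is a minimal sketch of the plain linear Pegasos update (dlib's svm_pegasos is the kernelized version built on the kcentroid; this is only an illustration of the update rule, with a toy cyclic sample schedule instead of the usual random draws):

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of the linear Pegasos subgradient step.  The learning rate
// is eta = 1/(lambda * t), so it decays over time; with lambda = 1e-4 it
// only drops below 0.1 after t = 100,000 steps, which is why small lambda
// values need many more iterations before stopping.

using Vec = std::vector<double>;

double dot(const Vec& a, const Vec& b) {
    double s = 0;
    for (size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Train on samples x[i] with labels y[i] in {-1, +1}.
Vec pegasos_train(const std::vector<Vec>& x, const std::vector<double>& y,
                  double lambda, size_t iterations) {
    Vec w(x[0].size(), 0.0);
    for (size_t t = 1; t <= iterations; ++t) {
        size_t i = t % x.size();            // toy schedule: cycle through samples
        double eta = 1.0 / (lambda * t);    // decaying learning rate
        double margin = y[i] * dot(w, x[i]);
        for (size_t j = 0; j < w.size(); ++j) {
            w[j] *= (1.0 - eta * lambda);   // regularization shrink (= 1 - 1/t)
            if (margin < 1.0)
                w[j] += eta * y[i] * x[i][j];  // hinge-loss subgradient step
        }
    }
    return w;
}
```

Stopping this loop while eta is still large leaves w dominated by the last few noisy steps, which is the "stopped before convergence" failure mode: run until the rate is down around 0.1 or less.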
There are a lot of feature selection algorithms out there. Currently there is only one in dlib (the rank_features() function); you might give it a go and see what it says.

Anyway, I'm carrying on... I would suggest setting the tolerance to 0.001 and trying maybe a max SV number of 100 and seeing what that gets you as a starting point. I would also make sure you have normalized your features if you haven't already. That usually helps a lot (you might use dlib::vector_normalizer).

Hope the above helps. Certainly let me know if it doesn't.

I'm just an engineer from Maryland. Not much to tell really :) I work in a group at Northrop Grumman that does a lot of image processing and computer vision kinds of things. They don't sponsor dlib or anything though. It is just a project I started back as an undergrad and work on for fun on my own time.

Cheers,
Davis

On Fri, Jul 24, 2009 at 8:35 AM, Alex Siegel <al...@st...> wrote:
> I think I understand the pegasos API pretty well, and I've read the
> relevant research papers. One aspect that has been a source of frustration
> is the lack of predictability in the quality of the output decision
> function. As I'm searching the parameter space for optimal values, I'm
> finding that very small changes produce large, random variations in the
> classification percentage. Adding or removing features has a similar effect.
> It seems like pegasos is too easily trapped in local minima during its
> optimization, and it's making poor decisions about what support vectors to
> start with. The data I'm working with is extremely noisy, but there is a lot
> of it (65,000 samples with 50 features). I'd appreciate some insight into
> how to improve performance.
>
> Frankly, I'm also a little curious about you as a person. Your developer
> page has no information on it. Where do you live? Where do you work?
>
> On Thu, Jul 23, 2009 at 11:42 AM, Davis King <dav...@us...> wrote:
>
>> Thanks, good to know you find it useful :) I'm not particular, you can
>> email me and the developer list, or post in the help or open discussion
>> forums on dlib's sourceforge page. Whatever floats your boat.
>>
>> So what's your question?
>>
>> Cheers,
>> Davis