powguma
2013-11-28
Again, thank you very much for this awesome library. I am starting to get some positive results using the Waffles machine learning algorithms.
After running model.predict(), is there a way to show how well it did (a matching score, perhaps; forgive me if I am butchering the terminology)? I'm not looking for the MSE of the whole model, just the accuracy of one particular predict() call. Also, is it possible to show the second or third best prediction results? For example, for a KNN model, I want to see the k nearest neighbor values, and for a decision tree, the next nearest node.
Waffles provides me a lot of options to tinker with already, but I have a need for these functions which I couldn't find. The reason I need these is to compare the results or combine results because I have doubts about my best predictions.
I'm a machine learning beginner, and I do wonder if I may be digging a wrong hole. If so, please advise me :(
Mike Gashler
2013-11-28
It sounds like you are looking for model.predictDistribution(). It returns a distribution of predicted probability over all possible label values. Unfortunately, this feature is not very mature. Some of my models implement it rather poorly, so some experimentation and/or hacking may be necessary to get the results you want.
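Once a distribution over label values is available, the second and third best predictions asked about above can be read off by sorting the labels by probability. The following is only a sketch of that ranking step and does not call Waffles at all: `rankLabels` and the `probs` vector are hypothetical names, and it assumes the per-label probabilities have already been copied into a std::vector.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Returns label indices ordered from most to least probable.
// `probs` is a hypothetical vector holding one probability per label value;
// it is not a Waffles type.
std::vector<std::size_t> rankLabels(const std::vector<double>& probs)
{
    assert(!probs.empty());
    std::vector<std::size_t> order(probs.size());
    std::iota(order.begin(), order.end(), 0); // fill with 0, 1, 2, ...
    // Stable sort keeps the lower index first when two probabilities tie.
    std::stable_sort(order.begin(), order.end(),
        [&probs](std::size_t a, std::size_t b) { return probs[a] > probs[b]; });
    return order;
}
```

With probabilities {0.1, 0.7, 0.2}, the returned order is 1, 2, 0, so order[1] is the second best label and probs[order[0]] can serve as a confidence for the top prediction.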
powguma
2013-12-04
Thank you for your reply Dr. Gashler.
Although the feature is unsupported, it's still good to learn it's implemented in Waffles. I'll try to make the best out of it, and if I get somewhere I'll let you know!
Anonymous
2014-03-01
Hi everyone,
I have the same need with a trained naive Bayes model. I have just one label and wrote the following in order to get the confidence (i.e. the probability of the prediction):
GPrediction out;
double posib[1];
posib[0] = 0;
nbModel->predictDistribution(posib, &out);
posib[0] = 1;
nbModel->predictDistribution(posib, &out);
But it does not work (I get an unknown exception). Could you please help me understand what is going wrong?
Thanks in advance.
Mike Gashler
2014-03-01
I have not been able to reproduce this behavior from the information given. If you would like to e-mail me a repro, I will be happy to take a look.
Anonymous
2014-03-02
Thanks a lot Dr. Gashler,
I found the problem! predictDistribution() works flawlessly. The error was due to my misunderstanding of the function's input parameter. The first parameter should be a double* pointing to the input pattern (the feature vector to classify). I wrongly thought it was one of my label values (0 or 1). Sorry for the confusion. The function now works fine with my pattern vector.
But I still don't know how to use the output of predictDistribution() to get the corresponding probability. Maybe I should write something like this:
:::c
Would you please explain how I can get the Naive Bayes probabilities from the output of predictDistribution()?
Thanks in advance,
Mike Gashler
2014-03-03
That's right.
GCategoricalDistribution* pCat = out.asCategorical();
double prob = pCat->likelihood(0);
Anonymous
2014-03-04
Thanks for your straightforward answer Dr Gashler,
Using the above code, the compiler complains about "forward declaration".
The exact error is:
forward declaration of ‘struct GClasses::GCategoricalDistribution’
invalid use of incomplete type ‘struct GClasses::GCategoricalDistribution’
But I have not used any forward declaration! The only forward declaration I could find is in Waffles' GDistribution class. I have a class called "Classifier" and wrote the above code in one of its functions. Do you have any idea what causes this build error?
Mike Gashler
2014-03-04
If you #include "GNaiveBayes.h", it indirectly includes "GLearner.h", which forward declares the GCategoricalDistribution class. So, I think the solution might be to #include "GDistribution.h", which provides the full definition of the GCategoricalDistribution class.
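The compiler error can be reproduced in a single file, which may make the cause easier to see. This is a generic C++ sketch, not Waffles code: `Dist` and `firstLikelihood` are hypothetical stand-ins for GCategoricalDistribution and the calling function.

```cpp
#include <cassert>

struct Dist;                      // forward declaration: Dist is incomplete here

double firstLikelihood(Dist* p);  // fine: pointers to incomplete types are allowed

// Writing p->likelihood(0) at this point would fail with
// "invalid use of incomplete type 'struct Dist'",
// because the compiler has not yet seen the members of Dist.

struct Dist                       // full definition (what the right header supplies)
{
    double probs[2];
    double likelihood(int i) const
    {
        assert(i >= 0 && i < 2);
        return probs[i];
    }
};

double firstLikelihood(Dist* p)
{
    assert(p != nullptr);
    return p->likelihood(0);      // compiles now that Dist is a complete type
}
```

The forward declaration is enough to pass pointers around, but any member access needs the full definition, which is why including the header that defines the class resolves the error.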
Anonymous
2014-03-05
Thanks a million for the informative answer. As you mentioned the solution was including "GDistribution.h". But I am a little confused about the output of function likelihood(). I have got the following results in my binary-label problem for a specific input pattern:
prob0=1 prob1=4.59525e-231
There are two points I would appreciate your help with:
1- It seems that Waffles does some kind of "rounding off". Is that right?
2- Is it right to say that prob0 + prob1 = 1? I mean, can I calculate the exact value of prob0 as prob0 = 1 - 4.59525e-231 (i.e. prob0 = 0.99999...)?
Thanks again
Mike Gashler
2014-03-05
I'm not quite sure what you mean by rounding off. GNaiveBayes does support Laplacian smoothing (or "equivalent sample size"), which often improves predictive accuracy. This may be considered to be a form of rounding. You can remove this factor if you want by calling setEquivalentSampleSize(0.0);.
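To illustrate the general idea of smoothing, here is a sketch of textbook additive (Laplace) smoothing. It is not taken from the Waffles source, and the exact formula GNaiveBayes uses for its equivalent sample size may differ; `smoothedProbability` and the pseudo-count `alpha` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>

// Additive (Laplace) smoothing: pretend every one of the numValues possible
// values was seen alpha extra times. With alpha = 0 this reduces to the raw
// frequency count / total. Assumed textbook form, not Waffles' exact formula.
double smoothedProbability(double count, double total,
                           std::size_t numValues, double alpha)
{
    const double k = static_cast<double>(numValues);
    assert(numValues > 0 && alpha >= 0.0);
    assert(total + alpha * k > 0.0);
    return (count + alpha) / (total + alpha * k);
}
```

A value never seen in 10 binary samples gets probability 0 without smoothing, and 1/12 with alpha = 1, so a single unseen value can no longer force the whole product of probabilities to zero.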
For reasons of numerical stability, GNaiveBayes computes probabilities in logarithmic space. When you call predictDistribution(), it converts back to probability space. (See GCategoricalDistribution::normalizeFromLogSpace in GDistribution.cpp.) I suspect that this is where that tiny amount of numerical precision is getting lost. (1e-231 is a sufficiently tiny amount that it is probably not worth worrying about. It has been estimated that there are only 1e82 atoms in the entire observable Universe.)
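The effect can be sketched with a generic log-space normalization. This is not the actual normalizeFromLogSpace implementation from GDistribution.cpp, only an assumed, common way to do the conversion; `normalizeFromLog` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Converts unnormalized log-probabilities into probabilities that sum to 1.
// Subtracting the largest log value first keeps exp() from overflowing.
std::vector<double> normalizeFromLog(const std::vector<double>& logs)
{
    assert(!logs.empty());
    const double maxLog = *std::max_element(logs.begin(), logs.end());
    std::vector<double> probs(logs.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < logs.size(); ++i)
    {
        probs[i] = std::exp(logs[i] - maxLog);
        sum += probs[i];
    }
    for (double& p : probs)
        p /= sum; // values on the order of 1e-231 vanish when added to 1.0
    return probs;
}
```

Mathematically the two probabilities do sum to 1, but a double carries only about 16 significant decimal digits, so 1 - 4.59525e-231 is stored as exactly 1.0. With log values {0, -530}, for instance, the sketch yields exactly 1.0 for the first class and a positive value on the order of 1e-231 for the second, which matches the kind of output reported above.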
Anonymous
2014-03-12
Thanks for your informative answer, Dr. Gashler. I must confess that Waffles is the most well-designed, versatile ML tool for C++ that I have ever seen. Thanks again for devoting your time to such a nice tool.
Anonymous