Waffles / Discussion / Help: What accuracy is returned for "tes...

Nobody/Anonymous - 2010-12-17

I'm running "test" on a baseline model I created for regression (continuous
output values). The output is 237.21. What does that mean? It doesn't look
like the average error or the sum of all the errors. Could you please advise?

Also, I'm not really clear on which algorithms can be used for regression and
which can be used for classification. Could you clarify this as well?

Thanks.

Mike Gashler - 2010-12-29

Sorry for the delayed reply. (SourceForge used to notify me when the forum was
updated, but I haven't been receiving any notifications lately.)

Baseline is probably the poorest of all learners. (This is by design, as it is
used as a "baseline" for comparison.) For regression, it always predicts the
centroid (mean) of the target values in the training set, and for
classification it always predicts the most common label.

The best models for regression are GKNN (if you have very few feature
dimensions) and GNeuralNet (if you have a lot of feature dimensions).
GNeuralNet is the best choice if you know what you're doing. Unfortunately,
neural nets have a lot of parameters. If you want something easy to use, you
might try a bagging ensemble of decision trees or mean-margins trees (use
GBag with GDecisionTree or GMeanMarginsTree).
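
GBag and GDecisionTree are Waffles classes; the sketch below is not their implementation. It is a minimal Python illustration of the bagging idea, with a depth-1 regression tree (a "stump") swapped in for a full decision tree to keep it short. All names, and the default of 25 models, are hypothetical.

```python
import random
from statistics import mean


def fit_stump(X, y):
    """Depth-1 regression tree: split one feature at one threshold, predict the mean on each side."""
    best = None  # (sum of squared errors, feature, threshold, left mean, right mean)
    for f in range(len(X[0])):
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint, so both sides are non-empty
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            lm, rm = mean(left), mean(right)
            sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lm, rm)
    if best is None:  # no feature varies in this sample, so fall back to the mean
        constant = mean(y)
        return lambda row: constant
    _, f, t, lm, rm = best
    return lambda row: lm if row[f] <= t else rm


def fit_bagged_stumps(X, y, n_models=25, seed=0):
    """Bagging: fit each model on a bootstrap resample, then average their predictions."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: mean(model(row) for model in models)


# Toy 1-D regression problem: y = x for x = 0..10.
X = [[float(i)] for i in range(11)]
y = [float(i) for i in range(11)]
predict = fit_bagged_stumps(X, y)
print(predict([0.0]), predict([10.0]))
```

Each stump alone can only output two values, but averaging stumps trained on different resamples gives a smoother, lower-variance prediction, which is the main reason bagging is easy to use without tuning.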

You may notice that some of the models (like GNeuralNet) expect all values to
be continuous, and other models (like GNaiveBayes) expect all values to be
nominal. You can use the GFilter class to solve this problem. For example, if
you wrap GNeuralNet in a filter with the GNominalToCat transform, then it can
operate on any type of data. Likewise, if you wrap GNaiveBayes in a filter
with the GDiscretize transform, then it can operate on both nominal and
continuous data. Thus, the GFilter class can make all of my models suitable
for doing classification or regression. (GKNN and GDecisionTree implicitly
handle both types without using a filter.)
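
The exact behavior of GNominalToCat and GDiscretize is not described in this thread. The Python sketch below assumes that nominaltocat works like one-hot encoding (one 0/1 column per category) and uses equal-width binning as one possible discretization; both functions and the bucket count are hypothetical illustrations, not the Waffles transforms.

```python
def nominal_to_cat(column, categories=None):
    """One-hot encode a nominal column so a continuous-only model can use it."""
    if categories is None:
        categories = sorted(set(column))
    encoded = [[1.0 if value == c else 0.0 for c in categories] for value in column]
    return encoded, categories


def discretize(column, buckets=4):
    """Equal-width binning: map each continuous value to a bucket index 0..buckets-1."""
    lo, hi = min(column), max(column)
    if hi == lo:  # a constant column fits in a single bucket
        return [0 for _ in column]
    width = (hi - lo) / buckets
    # The maximum value would land on index `buckets`, so clamp it into the last bucket.
    return [min(int((v - lo) / width), buckets - 1) for v in column]


encoded, categories = nominal_to_cat(["red", "green", "red"])
print(categories)  # ['green', 'red']
print(encoded)     # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
print(discretize([0.0, 1.0, 2.0, 3.0, 4.0], buckets=4))  # [0, 1, 2, 3, 3]
```

The first transform turns nominal data into continuous inputs (what a neural net expects); the second turns continuous data into nominal inputs (what naive Bayes expects).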

Here is a command-line example of how to test a neural net with some
regression problem:

waffles_learn crossvalidate mydata.arff nominaltocat neuralnet -addlayer 32

Here is a command-line example of how to test a bagging ensemble of mean
margins trees with some regression problem:

waffles_learn crossvalidate mydata.arff bag 50 nominaltocat meanmarginstree end
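
Cross-validation holds out part of the data, trains on the rest, and scores only the held-out rows, so the error is not measured on data the model has already seen. The number of folds and repetitions that waffles_learn uses is not stated in this thread; the Python sketch below assumes plain k-fold cross-validation with a hypothetical k and seed, and uses the mean-predicting baseline as the learner.

```python
import random
from statistics import mean


def kfold_mse_of_mean_predictor(targets, k=2, seed=0):
    """k-fold cross-validated MSE for a baseline that predicts the training mean."""
    if not 2 <= k <= len(targets):
        raise ValueError("k must be between 2 and the number of rows")
    rng = random.Random(seed)
    order = list(range(len(targets)))
    rng.shuffle(order)  # assign rows to folds at random
    folds = [order[f::k] for f in range(k)]
    squared_errors = []
    for fold in folds:
        held_out = set(fold)
        # "Train": the baseline only remembers the mean of the training targets.
        prediction = mean(targets[i] for i in order if i not in held_out)
        # "Test": score only the rows that were held out of training.
        squared_errors.extend((targets[i] - prediction) ** 2 for i in fold)
    return mean(squared_errors)


print(kfold_mse_of_mean_predictor([0.0, 0.0, 3.0], k=3))  # 4.5
```

With k equal to the number of rows, as in the example, this becomes leave-one-out cross-validation: each row is predicted by the mean of the other two (1.5, 1.5 and 0.0), giving squared errors of 2.25, 2.25 and 9.0.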

Nobody/Anonymous - 2010-12-29

Thanks for the reply. However, I'm really looking for how the error is
calculated specifically. For example, here are my actual values and the
corresponding predicted values from a baseline regression:

Actual value  Prediction
138.3206093   119.2596578
102.8708374   119.2596578
139.3020113   119.2596578
141.8271008   119.2596578
139.6765169   119.2596578
98.11396617   119.2596578
108.4569783   119.2596578
116.8517775   119.2596578
141.6157746   119.2596578
112.674653    119.2596578
119.3599495   119.2596578
119.9591068   119.2596578
127.3965653   119.2596578
127.0143511   119.2596578
81.17963669   119.2596578
90.81633395   119.2596578
115.0208661   119.2596578
115.6759148   119.2596578
132.5177658   119.2596578
102.8526855   119.2596578
129.059482    119.2596578
120.6570755   119.2596578
123.2283525   119.2596578
113.7573259   119.2596578
153.1416136   119.2596578
121.3513454   119.2596578
112.929818    119.2596578
114.9028338   119.2596578
126.4990625   119.2596578
125.491234    119.2596578
111.4105838   119.2596578
117.5874593   119.2596578
114.6671417   119.2596578
91.29407153   119.2596578
126.6072217   119.2596578

And I'm getting a reported error of 237.21418379122. Where is this number
coming from? I can't see how it relates to the predicted and actual values.
Here are the commands:

waffles_learn train data1.arff baseline > base.twt

waffles_learn.exe test base.twt data1.arff

237.21418379122

Thanks.

Nobody/Anonymous - 2011-01-01

The waffles_learn tool reports mean-squared-error (MSE) by default. In LaTeX,
the formula would be: \frac{1}{n}\sum_{i=0}^{n-1}(target_i - prediction_i)^2.
(The square root of this value, the root-mean-squared error, equals the
Euclidean distance between the target and prediction vectors divided by
\sqrt{n}.) So, in this case, the prediction is about 15.4 away from the ideal
in a root-mean-square sense.
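
The arithmetic can be checked against the table posted earlier in the thread. Because the baseline was trained and tested on the same data1.arff, its constant prediction, 119.2596578, is the mean of the 35 actual values, so the MSE here is simply their population variance. The Python sketch below (the function name is illustrative) recomputes it from the posted values.

```python
import math

# Actual values from the table above; the baseline predicted 119.2596578 for every row.
actual = [
    138.3206093, 102.8708374, 139.3020113, 141.8271008, 139.6765169,
    98.11396617, 108.4569783, 116.8517775, 141.6157746, 112.674653,
    119.3599495, 119.9591068, 127.3965653, 127.0143511, 81.17963669,
    90.81633395, 115.0208661, 115.6759148, 132.5177658, 102.8526855,
    129.059482, 120.6570755, 123.2283525, 113.7573259, 153.1416136,
    121.3513454, 112.929818, 114.9028338, 126.4990625, 125.491234,
    111.4105838, 117.5874593, 114.6671417, 91.29407153, 126.6072217,
]
predicted = [119.2596578] * len(actual)


def mean_squared_error(targets, predictions):
    """(1/n) * sum over i of (target_i - prediction_i)^2."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


mse = mean_squared_error(actual, predicted)
print(f"MSE  = {mse:.4f}")
print(f"RMSE = {math.sqrt(mse):.4f}")
```

The result agrees with the reported 237.21418379122 up to the rounding of the values in the table, and its square root is about 15.40.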

Another common metric is mean-absolute-error (MAE), which is the average of
the absolute differences between the target and prediction. Perhaps this is the
value you were expecting. I chose to report MSE because it has nicer
properties, and because it is more commonly used in my field. Now that you
mention it, I should probably add a switch so you can specify how you want the
error to be reported.
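
To see how the two metrics can differ, here is a small Python sketch on hypothetical toy data (three small misses and one large miss); the function names are illustrative.

```python
def mean_absolute_error(targets, predictions):
    """(1/n) * sum over i of |target_i - prediction_i|."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


def mean_squared_error(targets, predictions):
    """(1/n) * sum over i of (target_i - prediction_i)^2."""
    return sum((t - p) ** 2 for t, p in zip(targets, predictions)) / len(targets)


# Hypothetical toy data: errors of 1, 1, 1 and 4.
targets = [10.0, 10.0, 10.0, 10.0]
predictions = [9.0, 11.0, 9.0, 14.0]
print(mean_absolute_error(targets, predictions))  # 1.75
print(mean_squared_error(targets, predictions))   # 4.75
```

The single large miss contributes 4 of the 7 absolute units but 16 of the 19 squared units, so MSE weights large errors much more heavily than MAE does.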

Nobody/Anonymous - 2011-01-03

That makes sense, thanks.
