Model selection

Jörgen Brandt

In PMM-Lab, it is possible to fit several models to the same data set. The advantage of this approach is that one can choose the most appropriate model for the data set at hand. Models differ in their theoretical foundation, their complexity, their robustness and their ease of interpretation. It is the researcher's responsibility to choose a model as a compromise between these aspects. This task is generally referred to as model selection.

PMM-Lab provides several measures with which the researcher can quantify the performance of a model.

  • Root Mean Square (RMS)
  • Statistical coefficient of determination (R^2)
  • Bayes Information Criterion (BIC)
  • Akaike Information Criterion (AIC)

The former two measures quantify only the error with respect to the training data set. The latter two also take model complexity into account, i.e. they prefer simple models over complex ones.

Root Mean Square

The Root Mean Square (RMS) error is a measure of the difference between a data set and the corresponding fit. For sufficiently large data sets, the RMS error asymptotically converges to the standard deviation of the data around the model's predicted values.
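As an illustrative sketch (not PMM-Lab's actual implementation), the RMS error can be computed in plain Python; the function name rms_error is a hypothetical choice:

```python
import math

def rms_error(observed, predicted):
    """Root mean square error between observed data and model predictions:
    RMS = sqrt( (1/n) * sum_i (y_i - yhat_i)^2 )."""
    residuals = [y - yhat for y, yhat in zip(observed, predicted)]
    return math.sqrt(sum(r * r for r in residuals) / len(residuals))

# Example: a fit that misses the last of three points by exactly 1
print(rms_error([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))  # sqrt(1/3) ≈ 0.577
```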

Statistical coefficient of determination

The R^2 value, or statistical coefficient of determination, is a measure of how well a regression model describes a data set. The R^2 does not explicitly consider model complexity. Its range is (-inf, 1], where 1 indicates a perfect fit, while 0 is equivalent to the fit of an uninformed model that simply predicts the expectation value of the dependent variable and ignores the independent variables. Hence, a model with R^2 < 0 performs worse than this uninformed baseline and effectively loses information about the dependent variable instead of gaining it. This situation is, of course, undesirable.
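Under the standard definition R^2 = 1 - SS_res / SS_tot, this behavior can be sketched in plain Python (the function name r_squared is illustrative, not PMM-Lab's API):

```python
def r_squared(observed, predicted):
    """Statistical coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    mean_y = sum(observed) / len(observed)
    ss_res = sum((y - yhat) ** 2 for y, yhat in zip(observed, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in observed)
    return 1.0 - ss_res / ss_tot

# A perfect fit reaches the upper bound of the range (-inf, 1]:
print(r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 1.0
# Always predicting the mean of the observations yields exactly 0:
print(r_squared([1.0, 2.0, 3.0], [2.0, 2.0, 2.0]))  # 0.0
# A fit worse than the mean baseline drops below 0:
print(r_squared([1.0, 2.0, 3.0], [3.0, 2.0, 1.0]) < 0)  # True
```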

Bayes Information Criterion

The Bayes Information Criterion (BIC) is a measure of the goodness of fit of a model. It explicitly takes model complexity into account: models with a high number of degrees of freedom are penalized by the BIC. Hence, given two models of different complexity that approximate a data set equally well, the BIC will favor the simpler one.
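A minimal sketch of this penalization, assuming the common least-squares form of the BIC for Gaussian errors, BIC = n * ln(RSS / n) + k * ln(n), where n is the number of data points, RSS the residual sum of squares and k the number of model parameters (this form is an assumption; PMM-Lab's exact formula is given in the attached formulas image):

```python
import math

def bic(n, rss, k):
    """Bayes Information Criterion, least-squares form with Gaussian errors:
    BIC = n * ln(RSS / n) + k * ln(n). Lower values are better."""
    return n * math.log(rss / n) + k * math.log(n)

# Two models with equal fit quality (same RSS) but different complexity:
simple_model = bic(n=50, rss=10.0, k=2)
complex_model = bic(n=50, rss=10.0, k=5)
print(simple_model < complex_model)  # True: the BIC favors the simpler model
```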

Akaike Information Criterion

Similarly to the BIC, the Akaike Information Criterion (AIC) describes the goodness of fit of a model while taking model complexity into account. The complexity penalty of the AIC is less severe than that of the BIC, so the AIC emphasizes accuracy over simplicity in comparison with the BIC.
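The milder penalty can be seen by comparing the least-squares forms AIC = n * ln(RSS / n) + 2k and BIC = n * ln(RSS / n) + k * ln(n): each extra parameter costs 2 under the AIC but ln(n) under the BIC, so for n > e^2 (roughly 8 data points) the BIC penalizes complexity harder. These formulas are the common Gaussian least-squares forms, assumed here for illustration:

```python
import math

def aic(n, rss, k):
    """Akaike Information Criterion, least-squares form: n * ln(RSS/n) + 2k."""
    return n * math.log(rss / n) + 2 * k

def bic(n, rss, k):
    """Bayes Information Criterion, least-squares form: n * ln(RSS/n) + k * ln(n)."""
    return n * math.log(rss / n) + k * math.log(n)

# Cost of adding one parameter at fixed fit quality (n = 50 data points):
n, rss = 50, 10.0
penalty_aic = aic(n, rss, 4) - aic(n, rss, 3)  # 2
penalty_bic = bic(n, rss, 4) - bic(n, rss, 3)  # ln(50) ≈ 3.91
print(penalty_aic < penalty_bic)  # True whenever n > e^2
```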

Formulas

(See the attached image formulas.png for the formulas of the measures above.)