Dear all,
I wonder why L2 regularization (a Gaussian prior over the parameters) is implemented in TADM (see probs.cc) as follows, where f is the log-likelihood, i.e. LL(p,q) = - sum_i p(x_i) log q(x_i):
L2 (ridge) penalty: f = f + sum(x^2) / 2*sigma
Why is it f *plus* the regularization term (sum(x^2) / 2*sigma) instead of LL *minus* it (as described in the literature, e.g. Chen & Rosenfeld 1999)? What is the explanation for the plus in the implementation?
If LL were sum_i p(x_i) log q(x_i), the plus would make sense, but it is - sum_i p(x_i) log q(x_i), so shouldn't it actually be:
f = f - sum(x^2) / 2*sigma
(i.e. shouldn't line 312 in probs.cc read f -= pen; instead of f += pen;)?
Thanks for any help!
Barbara
It has been a long time since I looked at this, but if I remember correctly it does the right thing. You just need to check the signs of the various terms.
You can easily verify this by running the code with and without the prior: with the prior (and a small variance), the weights all get smaller, as expected.
Miles
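For anyone who wants to check the sign without building TADM, here is a minimal Python sketch of the kind of test described above. It is not TADM's code: the toy data, the function names (objective, gradient, fit), the learning rate and the step counts are all assumptions made up for illustration, and the penalty is written as sum(w^2) / (2 * sigma2) with sigma2 taken to be the prior variance. It relies on f being the negative log-likelihood, as in the definition quoted in the question, so that the optimizer minimizes f. Under that assumption, adding the penalty pulls the weights toward zero, while subtracting it rewards ever larger weights.

```python
import math

# Hypothetical toy data: (feature vector, label in {0, 1}).
# Labels conflict on purpose so the unregularized optimum is finite.
DATA = [
    ((1.0, 0.0), 1), ((1.0, 0.0), 1), ((1.0, 0.0), 0),
    ((0.0, 1.0), 0), ((0.0, 1.0), 0), ((0.0, 1.0), 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def objective(w, sigma2=None, sign=1.0):
    """Negative log-likelihood, optionally with sign * sum(w^2) / (2 * sigma2)."""
    f = 0.0
    for x, y in DATA:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        f -= math.log(p if y == 1 else 1.0 - p)
    if sigma2 is not None:
        pen = sum(wi * wi for wi in w) / (2.0 * sigma2)
        f += sign * pen  # sign=+1 mirrors "f += pen"; sign=-1 mirrors "f -= pen"
    return f


def gradient(w, sigma2=None, sign=1.0):
    """Gradient of objective() with respect to w."""
    g = [0.0] * len(w)
    for x, y in DATA:
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for j, xj in enumerate(x):
            g[j] += (p - y) * xj
    if sigma2 is not None:
        for j, wj in enumerate(w):
            g[j] += sign * wj / sigma2
    return g


def fit(sigma2=None, sign=1.0, steps=5000, lr=0.1):
    """Plain gradient descent, i.e. minimization of objective()."""
    w = [0.0, 0.0]
    for _ in range(steps):
        g = gradient(w, sigma2, sign)
        w = [wi - lr * gi for wi, gi in zip(w, g)]
    return w


def norm(w):
    return math.sqrt(sum(wi * wi for wi in w))


w_plain = fit()                                # no prior
w_plus = fit(sigma2=0.5)                       # f + pen, small variance
w_minus = fit(sigma2=0.5, sign=-1.0, steps=20) # f - pen, stopped early: it diverges

print("no prior:", w_plain, norm(w_plain))
print("f + pen :", w_plus, norm(w_plus))
print("f - pen :", w_minus, norm(w_minus))
```

The f - pen run is cut off after 20 steps because that objective is unbounded below, so the weights keep growing for as long as the loop runs. If the quantity being optimized were instead the (positive) log-likelihood and the optimizer maximized it, the same prior would appear with a minus sign, which is the form usually given in the papers.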