Hello and thanks for a great library!
I spent a day profiling my code and traced a huge performance hit to the call of norm(M, "fro") in IT++. It turns out that this Frobenius norm calculation is done by multiplying M'*M and then throwing away everything but the diagonal. Now imagine an M which is, say, 15x2000: M'*M is then a full 2000x2000 matrix, of which only the 2000 diagonal entries are ever used.
Here is a patch for a better implementation:
diff misc_stat.cpp.orig misc_stat.cpp
137c137,144
< return std::sqrt(sum(diag(transpose(m) * m)));
>
> double E = 0.0;
> for (int r=0; r < m.rows(); ++r) {
> for (int c=0; c < m.rows(); ++c) {
> E += (m(r,c)*m(r,c));
> }
> }
> return std::sqrt(E);
144,145d150
< return std::sqrt(sum(real(diag(hermitian_transpose(m) * m))));
< }
146a152,159
> double E = 0.0;
> for (int r=0; r < m.rows(); ++r) {
> for (int c=0; c < m.rows(); ++c) {
> E += std::norm(m(r,c));
> }
> }
> return std::sqrt(E);
> }
/Bo
Hi Bo,
Thanks for your report and patch. This improvement is now included in our SVN sources. It should be a part of the next IT++ release.
BR,
/Adam
Except that the inner loops should run over m.cols(), not m.rows(). The patch should have been:
diff misc_stat.cpp.orig misc_stat.cpp
137c137,144
< return std::sqrt(sum(diag(transpose(m) * m)));
>
> double E = 0.0;
> for (int r=0; r < m.rows(); ++r) {
> for (int c=0; c < m.cols(); ++c) {
> E += (m(r,c)*m(r,c));
> }
> }
> return std::sqrt(E);
144,145d150
< return std::sqrt(sum(real(diag(hermitian_transpose(m) * m))));
< }
146a152,159
> double E = 0.0;
> for (int r=0; r < m.rows(); ++r) {
> for (int c=0; c < m.cols(); ++c) {
> E += std::norm(m(r,c));
> }
> }
> return std::sqrt(E);
> }
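
For anyone who wants to try the idea outside IT++, here is a minimal standalone sketch of the same direct summation. The Matrix struct and the frobenius_norm name are hypothetical stand-ins made up for this example, not IT++ types or functions. The inner loop runs over the columns, as in the corrected patch.

```cpp
#include <cassert>
#include <cmath>
#include <complex>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a dense matrix (not an IT++ type):
// elements are stored row-major in a flat vector.
template <typename T>
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<T> data;

    const T& operator()(std::size_t r, std::size_t c) const {
        return data[r * cols + c];
    }
};

// Real case: sum the squares of all elements, then take the square root.
// Cost is one multiply-add per element; no matrix product is formed.
double frobenius_norm(const Matrix<double>& m) {
    assert(m.data.size() == m.rows * m.cols);
    double e = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {  // columns, not rows
            e += m(r, c) * m(r, c);
        }
    }
    return std::sqrt(e);
}

// Complex case: std::norm(z) returns the squared magnitude |z|^2.
double frobenius_norm(const Matrix<std::complex<double>>& m) {
    assert(m.data.size() == m.rows * m.cols);
    double e = 0.0;
    for (std::size_t r = 0; r < m.rows; ++r) {
        for (std::size_t c = 0; c < m.cols; ++c) {
            e += std::norm(m(r, c));
        }
    }
    return std::sqrt(e);
}
```

A non-square input is a useful check here: with the rows/cols mix-up from the first patch, a 15x2000 matrix would only have its first 15 columns summed.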