Hello,
For benchmarking purposes, I need to compute the number of floating-point
operations (FLOPs) performed by DGELS when it is called with a square
matrix (non-transposed mode) and only one right-hand side, i.e. the
performance of DGELS using the QR decomposition. Inspecting the code, I
see that the sequence of operations is as follows (I omit the functions
that work only on B, because I use only one right-hand side):
DGELS = DLANGE+DLASCL+DGEQRF+DORM2R+DTRTRS
The individual operation counts are:
DLANGE: N^2? This function computes max(abs(A[i,j])), so all N^2
elements of the matrix are visited, but only for comparison. Should I
count each comparison as one FLOP?
DLASCL: N^2 in the best case. I say "in the best case" because the
function contains an inner loop.
DGEQRF: (4/3)N^3 + 2N^2 + (14/3)N (for a square NxN matrix), as
stated in http://www.netlib.org/lapack/lawnspdf/lawn41.pdf
DORM2R: I don't know...
DTRTRS: N^3 (http://www.netlib.org/lapack/lawnspdf/lawn41.pdf)
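
To make the bookkeeping concrete, here is a minimal Python sketch of how
the counts listed above could be tallied. It is only an illustration: the
function names are hypothetical, the DORM2R count is left as a
caller-supplied argument because I do not have a formula for it, and the
triangular-solve term assumes about N^2 operations per right-hand side
(so N^3 would correspond to N right-hand sides). That last point is my
assumption, not a figure taken from the LAPACK sources.

```python
from fractions import Fraction


def dgeqrf_flops(n):
    """DGEQRF count for a square n x n matrix: (4/3)n^3 + 2n^2 + (14/3)n."""
    n = Fraction(n)  # exact arithmetic, avoids rounding in the thirds
    return Fraction(4, 3) * n**3 + 2 * n**2 + Fraction(14, 3) * n


def dgels_flops_estimate(n, dorm2r_flops, nrhs=1, count_comparisons=True):
    """Sum the per-routine counts for the square, non-transposed case.

    dorm2r_flops is supplied by the caller (unknown here).
    """
    # DLANGE: one comparison per matrix element, if comparisons are counted.
    dlange = n * n if count_comparisons else 0
    # DLASCL: best case, one multiplication per matrix element.
    dlascl = n * n
    # DTRTRS: assumption, about n^2 operations per right-hand side.
    dtrtrs = n * n * nrhs
    return dlange + dlascl + dgeqrf_flops(n) + dorm2r_flops + dtrtrs
```

For example, dgels_flops_estimate(3, 0) evaluates to 95 with a zero
placeholder for the DORM2R term; the real total depends on the missing
DORM2R formula and on whether comparisons are counted as FLOPs.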
Does anyone know an expression for the total count? Which formula should
be used for DORM2R?
Thanks

