I am running some tests with ViennaCL and decided to measure the GFLOPS of an SpMV routine.
I use the benchmark-utils.hpp that comes with the examples.
The code snippet below illustrates what I'm trying to do:
for (int i = 0; i < LOOPS; ++i)
    vcl_vectory = viennacl::linalg::prod(vcl_matrixM, vcl_vectorx);
time_kernel = timer.get() / static_cast<double>(LOOPS);
printOps(static_cast<double>(vcl_matrixM.nnz()) * 2.0, time_kernel);
Here, vcl_matrixM is a sparse matrix in COO format and timer is a Timer object.
The issue is that when the LOOPS constant is 1 the result is about 27 GFLOPS, but when I increase the number of iterations the reported GFLOPS changes considerably: 10 iterations - 67 GFLOPS, 100 iterations - 156 GFLOPS, 1000 iterations - 356 GFLOPS and so on.
I really don't know what could be happening here.
Thanks in advance,
All operations are asynchronous on the GPU, so your for-loop only enqueues the necessary kernels, but does not wait for their completion. The timer therefore measures mostly the cost of enqueuing, not the computation itself. You need to use
viennacl::backend::finish();
to wait for kernel execution completion before taking the timings. Have a look at the other benchmarks in examples/benchmarks, this is also used there.
Hope this helps :-)
Thank you Karli,
I used viennacl::backend::finish() as you mentioned and everything worked as expected.