I am running some tests with ViennaCL and decided to measure the GFLOPS of an SpMV (sparse matrix-vector product) routine.
I use the benchmark-utils.hpp header that ships with the examples.
The code snippet below illustrates what I'm trying to do:
Here, vcl_matrixM is a sparse matrix in COO format and timer is a Timer object.
The issue is that when the LOOP constant is 1, I get about 27 GFLOPS, but when I increase the number of iterations the GFLOPS figure changes considerably: 10 iterations give 67 GFLOPS, 100 iterations 156 GFLOPS, 1000 iterations 356 GFLOPS, and so on.
I really don't know what is happening here.
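As a point of reference, the GFLOPS figure for repeated SpMV is usually computed by counting two floating-point operations (one multiply, one add) per stored nonzero per product. The helper below is a hypothetical sketch of that arithmetic, not a function from ViennaCL or benchmark-utils.hpp. It makes visible that the FLOP count grows linearly with LOOP, so the reported rate only stays constant if the measured time grows linearly as well.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of ViennaCL or benchmark-utils.hpp.
// Assumes 2 FLOPs (one multiply, one add) per stored nonzero per SpMV.
double spmv_gflops(std::size_t nnz, std::size_t iterations, double seconds)
{
  assert(seconds > 0.0);  // a zero or negative time means the measurement is broken
  const double flops =
      2.0 * static_cast<double>(nnz) * static_cast<double>(iterations);
  return flops / seconds / 1e9;  // FLOP/s -> GFLOPS
}
```

For example, a matrix with 10^6 nonzeros multiplied 1000 times in 2 seconds gives 2e9 FLOPs / 2 s = 1 GFLOPS.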
Thanks in advance,
Daniel Estrela
All operations are asynchronous on the GPU, so your for-loop only enqueues the necessary kernels but does not wait for them to complete. You need to call
viennacl::backend::finish();
to wait for the kernels to complete before taking the timings. Have a look at the other benchmarks in examples/benchmarks; this call is used there as well.
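The effect can be reproduced without a GPU. The toy C++ program below is an analogy, not ViennaCL code: ToyQueue, submit(), finish() and time_loop() are hypothetical names for this sketch only. submit() returns immediately while a worker thread executes each "kernel" (a short sleep) later, and finish() plays the role of viennacl::backend::finish() by blocking until all submitted work is done. Timing the loop without finish() measures only the enqueue cost, which is why the apparent GFLOPS keeps rising with the number of iterations.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Toy stand-in for an asynchronous command queue (hypothetical, not ViennaCL).
class ToyQueue
{
public:
  ToyQueue() : worker_([this] { run(); }) {}
  ~ToyQueue()  // drains remaining work, then stops the worker
  {
    {
      std::lock_guard<std::mutex> lock(m_);
      stop_ = true;
    }
    cv_.notify_all();
    worker_.join();
  }

  // Enqueue work and return immediately, like launching a GPU kernel.
  void submit(std::function<void()> task)
  {
    {
      std::lock_guard<std::mutex> lock(m_);
      tasks_.push(std::move(task));
      ++pending_;
    }
    cv_.notify_all();
  }

  // Block until everything submitted so far has finished executing.
  void finish()
  {
    std::unique_lock<std::mutex> lock(m_);
    done_cv_.wait(lock, [this] { return pending_ == 0; });
  }

private:
  void run()
  {
    std::unique_lock<std::mutex> lock(m_);
    for (;;)
    {
      cv_.wait(lock, [this] { return stop_ || !tasks_.empty(); });
      if (tasks_.empty()) return;  // stop requested and nothing left to do
      std::function<void()> task = std::move(tasks_.front());
      tasks_.pop();
      lock.unlock();
      task();  // the "kernel" runs without holding the lock
      lock.lock();
      if (--pending_ == 0) done_cv_.notify_all();
    }
  }

  std::mutex m_;
  std::condition_variable cv_, done_cv_;
  std::queue<std::function<void()>> tasks_;
  int pending_ = 0;
  bool stop_ = false;
  std::thread worker_;  // declared last so all other members exist before run() starts
};

// Times `loops` simulated kernels of `kernel_ms` milliseconds each.
// wait == false reproduces the bug (only enqueueing is timed);
// wait == true calls finish() before stopping the clock.
double time_loop(int loops, int kernel_ms, bool wait)
{
  ToyQueue queue;
  queue.finish();  // nothing pending yet; shown to mirror draining earlier work
  auto start = std::chrono::steady_clock::now();
  for (int i = 0; i < loops; ++i)
    queue.submit([kernel_ms] {
      std::this_thread::sleep_for(std::chrono::milliseconds(kernel_ms));
    });
  if (wait)
    queue.finish();
  auto stop = std::chrono::steady_clock::now();
  return std::chrono::duration<double>(stop - start).count();
  // The queue destructor drains any leftover work here, outside the timed region.
}
```

With 10 kernels of 5 ms each, the waiting version reports at least roughly 50 ms, while the non-waiting version reports only the time needed to enqueue. In a real benchmark it is also common practice to call the synchronization routine once before starting the timer, so that earlier work such as data transfers is not counted.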
Hope this helps :-)
Best regards,
Karli
Thank you Karli,
I used viennacl::backend::finish() as you suggested and everything worked as expected.
Best regards,
Daniel Estrela