GFLOPs change with the number of iterations in a for loop

  • Daniel Estrela

    Daniel Estrela - 2014-02-25


    I am running some tests with ViennaCL and decided to measure the GFLOPs of an SpMV routine.
    I use the benchmark-utils.hpp that comes with the examples.
    The code snippet below illustrates what I'm trying to do:

    viennacl::copy(std_vectorx, vcl_vectorx);
    viennacl::copy(std_matrixM, vcl_matrixM);
    for (int i = 0; i < LOOPS; ++i)
        vcl_vectory = viennacl::linalg::prod(vcl_matrixM, vcl_vectorx);
    time_kernel = timer.get() / static_cast<double>(LOOPS);
    printOps(static_cast<double>(vcl_matrixM.nnz()) * 2.0, time_kernel);

    Here, vcl_matrixM is a sparse matrix in COO format and timer is a Timer object.
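
    The operation count passed to printOps assumes one multiply and one add per stored nonzero, i.e. 2 * nnz flops per product. A minimal sketch of that arithmetic, where spmv_gflops is a hypothetical helper and not something provided by benchmark-utils.hpp:

```cpp
#include <cstddef>

// Hypothetical helper (not part of ViennaCL or benchmark-utils.hpp):
// GFLOP/s of one sparse matrix-vector product, counting one multiply
// and one add per stored nonzero.
double spmv_gflops(std::size_t nnz, double seconds_per_call) {
    const double flops = 2.0 * static_cast<double>(nnz);
    return flops / seconds_per_call / 1e9;
}
```

    Because the time sits in the denominator, any measurement that underestimates seconds_per_call overstates the reported rate by the same factor.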

    The issue is that when the LOOPS constant is 1, the result is about 27 GFLOPS, but when I increase the number of iterations the GFLOPS figure changes considerably: 10 iterations give 67 GFLOPS, 100 iterations give 156 GFLOPS, 1000 iterations give 356 GFLOPS, and so on.

    I really don't know what could be happening here.

    Thanks in advance,
    Daniel Estrela

  • Karl Rupp

    Karl Rupp - 2014-02-25

    Hi Daniel,

    all operations are asynchronous on the GPU, so your for-loop only enqueues the necessary kernels but does not wait for their completion. You need to use

    viennacl::backend::finish();

    to wait for kernel execution completion before taking the timings. Have a look at the other benchmarks in examples/benchmarks, where this is also used.
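
    To illustrate why skipping the synchronization inflates the numbers, here is a minimal, self-contained sketch in standard C++. It does not use ViennaCL: FakeDeviceQueue and time_per_call are hypothetical stand-ins, with a worker thread playing the role of the GPU and finish() playing the role of viennacl::backend::finish().

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for an in-order device queue; not part of ViennaCL.
class FakeDeviceQueue {
public:
    explicit FakeDeviceQueue(std::chrono::milliseconds kernel_time)
        : kernel_time_(kernel_time), worker_([this] { run(); }) {}

    ~FakeDeviceQueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stop_ = true;
        }
        cv_.notify_all();
        worker_.join();  // remaining "kernels" are drained before exit
    }

    // Returns immediately, like an asynchronous kernel launch.
    void enqueue() {
        {
            std::lock_guard<std::mutex> lock(m_);
            ++pending_;
        }
        cv_.notify_all();
    }

    // Blocks until every enqueued "kernel" has completed.
    void finish() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return pending_ == 0; });
    }

private:
    void run() {
        std::unique_lock<std::mutex> lock(m_);
        for (;;) {
            cv_.wait(lock, [this] { return stop_ || pending_ > 0; });
            if (pending_ == 0) return;  // stop requested and nothing left
            lock.unlock();
            std::this_thread::sleep_for(kernel_time_);  // "execute" one kernel
            lock.lock();
            --pending_;
            cv_.notify_all();
        }
    }

    std::chrono::milliseconds kernel_time_;
    std::mutex m_;
    std::condition_variable cv_;
    int pending_ = 0;
    bool stop_ = false;
    std::thread worker_;  // declared last so the other members exist first
};

// Average seconds per call over `loops` enqueues. With sync == false the
// clock is read while work is still queued, as in the original loop.
double time_per_call(FakeDeviceQueue& q, int loops, bool sync) {
    const auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i) q.enqueue();
    if (sync) q.finish();
    const auto t1 = std::chrono::steady_clock::now();
    q.finish();  // drain the queue so the next measurement starts clean
    return std::chrono::duration<double>(t1 - t0).count() / loops;
}
```

    With, say, FakeDeviceQueue q(std::chrono::milliseconds(5)), time_per_call(q, 20, false) measures only the enqueue cost, while time_per_call(q, 20, true) should report at least roughly 5 ms per call. In the real benchmark the analogous change is a call to viennacl::backend::finish() after the loop and before timer.get().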

    Hope this helps :-)

    Best regards,

  • Daniel Estrela

    Daniel Estrela - 2014-02-25

    Thank you Karli,

    I used viennacl::backend::finish() as you suggested, and everything worked as expected.

    Best regards,
    Daniel Estrela

