OpenCL slower than CPU?

Help
2011-04-29
2012-12-21
  • Rous Nicolas
    2011-04-29

    Hi,

    I'm trying to use OpenCL for a project, and it seems to be slower than computing on the CPU on my laptop…

    I have a GeForce GT 230M with 48 CUDA cores (according to the Nvidia panel -> System Information), but when I run the Info example it reports "Compute units: 6".
    I also have an Intel(R) Core(TM) i7 Q720 @ 1.60GHz with Windows 7 Home.

    Moreover, when I ran the VectorAdd example, OpenCL took 25 ms for only 10 elements, while the CPU took less than 1 ms.
    Increasing the number of elements doesn't help: for 1,000 elements OpenCL takes 27 ms and the CPU 3 ms; for 100,000 elements OpenCL takes 390 ms and the CPU is still faster at 370 ms.
    (The timer covers only the execute, read-from-buffer and Finish calls. It does not include initialising the data or creating the OpenCL program and the kernel.)

     
  • Rous Nicolas
    2011-04-29

    I found some information saying that OpenCL is slower than the CPU when the number of operations performed is small (which is the case for the VectorAdd example).

    So does that mean OpenCL is not a good solution if I have a lot of data but only a small number of operations to perform on it? In that case, what kind of solution should I use? Multithreading on the CPU? CUDA? C++ calls instead of C# code? Or something else?

    Do you think I could improve my results by buying a new graphics card for my laptop? It's expensive, and I can't buy one just for testing. But is the GeForce GT 230M simply too weak for OpenCL? If so, do you know of a good GPU for an HP laptop?

    Thank you.

     
  • nythrix
    2011-04-30

    Yes, OpenCL has some overhead. Vector addition is a very simple operation, and you need millions of numbers before the GPU takes the lead. In this case a simple for loop is faster.
    OpenCL is more elaborate and advanced than plain GLSL/HLSL/Cg shaders. A compute unit is therefore composed of several CUDA cores (NVIDIA) or stream processors (ATI). For example, a GF9600GT (my card) has 64 CUDA cores but only 8 compute units, just as your GT 230M has 48 cores in 6 units. That's normal. Also, the hardware figures on the shiny box aren't provided directly by the research labs but come from the PR & marketing departments ;)
    I certainly don't recommend buying a new card (unless school/work/government is paying for it). That said, I don't know what advice to give you until I read a bit more about what you're trying to do.
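
To make the overhead argument concrete, here is a rough break-even sketch. It assumes a very simple cost model: the GPU pays a fixed overhead per call plus a per-element cost, while the CPU pays only a per-element cost. The 25 ms overhead is taken from the 10-element timing reported above; the per-element costs are hypothetical placeholders, not measurements.

```python
import math

def gpu_time_ms(n, overhead_ms, per_element_ms):
    """Modelled GPU time: fixed launch/transfer overhead plus per-element work."""
    return overhead_ms + n * per_element_ms

def cpu_time_ms(n, per_element_ms):
    """Modelled CPU time: per-element work only."""
    return n * per_element_ms

def break_even_elements(overhead_ms, cpu_per_element_ms, gpu_per_element_ms):
    """Smallest n at which the modelled GPU time is no longer above the CPU time.

    Returns None when the GPU is not faster per element, because then the
    fixed overhead can never be recovered.
    """
    gain = cpu_per_element_ms - gpu_per_element_ms
    if gain <= 0:
        return None
    return math.ceil(overhead_ms / gain)

# Assumed values: 25 ms overhead (from the thread), hypothetical per-element costs.
OVERHEAD_MS = 25.0
CPU_PER_ELEMENT_MS = 1e-5    # hypothetical: 10 ns per element
GPU_PER_ELEMENT_MS = 2.5e-6  # hypothetical: 2.5 ns per element

if __name__ == "__main__":
    n = break_even_elements(OVERHEAD_MS, CPU_PER_ELEMENT_MS, GPU_PER_ELEMENT_MS)
    print("modelled break-even element count:", n)
```

With these assumed constants the break-even point lands in the millions of elements, which matches the rule of thumb above. If the per-element gain is zero or negative, the function returns None, meaning the GPU never catches up under this model.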

     
  • Rous Nicolas
    2011-05-02

    I'm currently trying to use OpenCL to compute matrix interpolation for skeletal animation.

    I have a huge number of points (separated into different objects) to process, but the interpolation itself is not very complicated.
    So far I have created one OpenCL program per object and passed it the point positions and the matrices. I will try creating a single program for all the objects to see whether that improves performance.
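
One way the "single program for all the objects" idea might look is sketched below: flatten every object's points into one buffer and keep a per-point index saying which matrix applies, so the whole batch can be processed in one pass. The data layout (a list of (points, matrix) pairs, 3x4 affine matrices) and all function names are assumptions made for illustration. The loop in transform_all is only a CPU stand-in for a single kernel launch.

```python
def pack_objects(objects):
    """Flatten per-object point lists into one buffer.

    objects: list of (points, matrix) pairs (hypothetical layout).
    Returns (flat_points, matrix_index, matrices), where matrix_index[i]
    is the index of the matrix that applies to flat_points[i].
    """
    flat_points, matrix_index, matrices = [], [], []
    for obj_id, (points, matrix) in enumerate(objects):
        matrices.append(matrix)
        flat_points.extend(points)
        matrix_index.extend([obj_id] * len(points))
    return flat_points, matrix_index, matrices

def transform_point(matrix, point):
    """Apply a 3x4 affine matrix (three rows of four values) to a 3D point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in matrix)

def transform_all(flat_points, matrix_index, matrices):
    """Process every point of every object in one pass over the flat buffer."""
    return [transform_point(matrices[k], p)
            for p, k in zip(flat_points, matrix_index)]

if __name__ == "__main__":
    identity = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]
    shift_x = [(1, 0, 0, 5), (0, 1, 0, 0), (0, 0, 1, 0)]
    objects = [([(1, 2, 3)], identity), ([(0, 0, 0), (1, 1, 1)], shift_x)]
    pts, idx, mats = pack_objects(objects)
    print(transform_all(pts, idx, mats))
```

The point of the layout is that any fixed per-call cost would be paid once for the whole batch rather than once per object.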

    I think the solution for me will be to multithread my computation on the CPU. I don't think it will add too much overhead, and it should definitely improve the computing time.
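
A minimal sketch of the CPU multithreading idea: split the data into contiguous chunks and hand each chunk to a worker thread. The per-element work here is a plain linear interpolation between two poses, used as a stand-in for the real matrix interpolation; the function names and the default of 4 workers are assumptions. Note that in CPython, threads do not speed up pure-Python arithmetic because of the global interpreter lock, so this illustrates only the chunking structure. In C#, something like Parallel.For would play the same role.

```python
from concurrent.futures import ThreadPoolExecutor

def lerp_chunk(start_pose, end_pose, t, lo, hi):
    """Linearly interpolate elements [lo, hi) between two poses."""
    return [a + (b - a) * t
            for a, b in zip(start_pose[lo:hi], end_pose[lo:hi])]

def parallel_lerp(start_pose, end_pose, t, workers=4):
    """Interpolate two equally long poses using a pool of worker threads."""
    n = len(start_pose)
    chunk = max(1, -(-n // workers))  # ceiling division, at least 1
    bounds = [(lo, min(lo + chunk, n)) for lo in range(0, n, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the order of `bounds`,
        # so concatenating the parts restores the original order.
        parts = pool.map(
            lambda b: lerp_chunk(start_pose, end_pose, t, b[0], b[1]),
            bounds,
        )
        return [value for part in parts for value in part]

if __name__ == "__main__":
    start = [0.0, 10.0, 20.0, 30.0, 40.0]
    end = [10.0, 20.0, 30.0, 40.0, 50.0]
    print(parallel_lerp(start, end, 0.5, workers=2))
```

Each element is computed independently, so the chunks need no synchronisation beyond collecting the results.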