Just a heads-up about something I have just noticed. It is a nasty problem
that had disappeared after some code cleanup but has just reappeared.
I ran into very large transfer overheads. For a 4800x4800 matrix, I measured
about 300 ms of total overhead for 4 transfers with both the AMD and NVIDIA
SDKs (that is 1/3 of the kernel time -.-). By replacing CL_MEM_COPY_HOST_PTR
with clEnqueueWriteBuffer, the transfer time dropped to 70-100 ms, roughly a
3x to 4x improvement.
I suspect the drivers optimize buffers below a certain size when they are
created with the default flags.
I don't know if it is related, but above roughly 107 MB (I am 90% sure it's
not my implementation), transfers on the NVIDIA platform become insanely slow,
while AMD keeps up.
Anyway, beware CL_MEM_COPY_HOST_PTR! :p