I realize this is probably common knowledge to others more experienced with programming, but for a newbie in C# it absolutely confounded me for several weeks. Just figured I'd throw it out there in case it saves someone else from pulling out their hair.
It appears that when creating the OpenCL buffers using Cloo, the data is not transferred immediately to the device. This means that when using a local method variable to store, say, an array of data to send to the GPU, there is a chance that the variable can fall out of scope and be garbage collected before the data is entirely transferred.
It's probably a pretty "duh" thing for the experts out there, but the bug was absolute hell to track down because it disappeared as soon as I set a breakpoint, and it only happened roughly once every 500 executions. I guess it's possible that some of the buffer creation arguments can be changed to prevent this, but storing the array outside of the method solved the problem completely =)
I should add this to the documentation of the constructors. Thanks for the heads up!
By the way: do you recall what flags you were using? UseHostPointer could cause this too, even when storing the array.
For creating the buffer, I was using:
new ComputeBuffer<float>(Context, ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer, floatData)
The floatData was just a temporary variable because I was converting from an array of doubles. As soon as I stored it outside of the method, the problem disappeared and hasn't come back in a couple hundred thousand iterations.
Note that UseHostPointer means OpenCL may use the data in place (which is managed memory) without copying it. If the GC moves floatData while compacting the heap, you may hit an invalid memory access. Although current implementations probably cache the data, that doesn't mean you won't hit an error in the future. Either replace UseHostPointer with the much safer CopyHostPointer, or pin the array so it doesn't get moved around the heap.
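For anyone who wants to see the pinning option spelled out, here is a minimal sketch. It only uses GCHandle from the .NET runtime; the PinnedFloats class is a hypothetical helper I made up for illustration, not something Cloo provides, and the Cloo call at the bottom is left as a comment because it assumes you already have a ComputeContext named context.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper (not part of Cloo): converts doubles to floats and keeps
// the resulting array pinned until Dispose() is called.
public sealed class PinnedFloats : IDisposable
{
    private GCHandle handle;

    public float[] Data { get; }

    public PinnedFloats(double[] source)
    {
        // Same kind of conversion as the temporary floatData in the post above.
        Data = Array.ConvertAll(source, d => (float)d);
        // Pinning stops the GC from moving the array; holding the handle
        // (and the Data reference) also keeps the array alive.
        handle = GCHandle.Alloc(Data, GCHandleType.Pinned);
    }

    // Address of the first element; stable while the handle is allocated.
    public IntPtr Address => handle.AddrOfPinnedObject();

    public void Dispose()
    {
        if (handle.IsAllocated)
        {
            handle.Free();
        }
    }
}

public static class Demo
{
    public static void Main()
    {
        using (var pinned = new PinnedFloats(new[] { 1.0, 2.5, 3.25 }))
        {
            IntPtr before = pinned.Address;
            GC.Collect(); // force a collection; a pinned array must not move
            IntPtr after = pinned.Address;
            Console.WriteLine(before == after);

            // Assumed usage with Cloo, given an existing ComputeContext "context":
            // var buffer = new ComputeBuffer<float>(context,
            //     ComputeMemoryFlags.ReadWrite | ComputeMemoryFlags.UseHostPointer,
            //     pinned.Data);
            // ... enqueue kernels, read results, dispose the buffer ...
        }
        // The pin is released here, so the buffer should be disposed before this point.
    }
}
```

The point of the wrapper is that the pin has to last as long as the device might touch the memory, not just for the duration of the buffer constructor. If you don't need the in-place behaviour, swapping the flag to ComputeMemoryFlags.CopyHostPointer, as suggested above, is the simpler route and needs none of this.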
Ah, that makes much more sense. Thank you. The OpenCL documentation probably mentioned this somewhere, but I'm still a newbie at programming and the OpenCL Specification is a bit… cryptic.
I'm a new member too.
I think you should use the UseHostPointer flag for the best performance when moving data in and out between host and device. At the end, you can read the result back to the host. Copying data from host to device is very expensive. :)