Nice job! One comment on your accum/histogram stage: in my own similar implementation I found that the way you have it set up hits a fair number of race conditions, with as much as 50% point loss on certain dense xforms. I ended up doing the accumulation on the OpenGL side instead (a VBO plus GL blending).
Thanks for the info! I'm going to see if I can use CUDA's atomic functions to solve this.
I tried that. From what I remember, it threw off the warp timing and slowed things down quite a bit… You might get better results, though. -dave
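
For anyone following along, here is a rough sketch of what the atomic approach mentioned above could look like. It is not the code from the post: the kernel name `accumulate`, the host helper `build_histogram`, and the precomputed `bin_index` array are all made up for illustration, and error checking is left out. It only shows the idea of replacing a plain increment with `atomicAdd` so that threads landing on the same bin do not lose points.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: each thread deposits one point into its histogram bin.
// bin_index[i] is assumed to be precomputed and already within [0, n_bins).
__global__ void accumulate(const unsigned int* bin_index,
                           unsigned int* hist,
                           int n_points)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n_points) {
        // A plain hist[bin]++ is a read-modify-write, so two threads hitting
        // the same bin at once can overwrite each other's update.
        // atomicAdd performs the increment as one indivisible operation.
        atomicAdd(&hist[bin_index[i]], 1u);
    }
}

// Hypothetical host helper: copies the bin indices to the GPU, runs the
// kernel, and returns the resulting histogram. No error checking, for brevity.
std::vector<unsigned int> build_histogram(const std::vector<unsigned int>& bins,
                                          int n_bins)
{
    std::vector<unsigned int> hist(n_bins, 0u);
    int n = static_cast<int>(bins.size());
    if (n == 0 || n_bins == 0) return hist;

    unsigned int* d_bins = nullptr;
    unsigned int* d_hist = nullptr;
    cudaMalloc(&d_bins, n * sizeof(unsigned int));
    cudaMalloc(&d_hist, n_bins * sizeof(unsigned int));
    cudaMemcpy(d_bins, bins.data(), n * sizeof(unsigned int),
               cudaMemcpyHostToDevice);
    cudaMemset(d_hist, 0, n_bins * sizeof(unsigned int));

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    accumulate<<<blocks, threads>>>(d_bins, d_hist, n);

    // This blocking copy waits for the kernel to finish before reading back.
    cudaMemcpy(hist.data(), d_hist, n_bins * sizeof(unsigned int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_bins);
    cudaFree(d_hist);
    return hist;
}
```

This fixes the lost points but not necessarily the speed: when many threads hit the same bin, which is exactly the dense xform case, the atomic updates are serialized. That would be consistent with the slowdown dave describes, so it is worth timing against the OpenGL blending route before committing to it.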