Some of the CUDA samples show very low performance, about 2-4x slower than their OpenCL versions, so there must be a bug somewhere. Others work fine, though.