The CUDA kernel is destroying the last 4 bytes of every input. The problem is probably in the alignment of the input data.
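If alignment really is the cause, one way to check it on the host side might look like the sketch below. This is only a guess at the mechanism: it assumes the kernel reads and writes the input in 32-bit words, which the report does not confirm. The helper names (`round_up_to_word`, `is_word_aligned`, `make_padded_copy`) are hypothetical and not taken from the project.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4 (one 32-bit word). */
static size_t round_up_to_word(size_t len) {
    return (len + 3u) & ~(size_t)3u;
}

/* Return nonzero if ptr sits on a 4-byte boundary. */
static int is_word_aligned(const void *ptr) {
    return ((uintptr_t)ptr & 3u) == 0;
}

/*
 * Hypothetical helper: copy `input` into a freshly allocated staging buffer
 * whose size is a multiple of 4, with the tail zero-filled. Under the
 * assumption that the kernel works in 32-bit words, the final word then
 * belongs entirely to this buffer instead of spilling past the input.
 * Returns NULL on allocation failure; the caller frees the buffer.
 */
static unsigned char *make_padded_copy(const unsigned char *input, size_t len,
                                       size_t *padded_len) {
    size_t n = round_up_to_word(len);
    unsigned char *buf = calloc(n ? n : 4, 1); /* zero-initialised */
    if (buf == NULL) {
        return NULL;
    }
    memcpy(buf, input, len);
    *padded_len = n;
    return buf;
}
```

Passing the padded length and the padded buffer to the device copy, and checking `is_word_aligned` on each input's start offset, would show whether the corruption goes away once every input starts and ends on a word boundary.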