This is a big update, and the project is nearly ready for a full release:
Added a project for CUDA kernels. New kernels fill the gaps in NPP's Copy, Set, and Transpose support, and additional kernels implement the Conjugate and Real-Complex functions that NPP lacks.
The CUDA project includes functions that automatically calculate the grid and thread-block dimensions for kernel launches.
Added the MXCudaData class to simplify passing scalars and simple vectors to and from the device.
Many bug fixes.
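As a rough illustration of the kind of automatic grid/thread calculation described above, here is a minimal host-side sketch. It assumes a 1-D elementwise kernel (for example a conjugate) and a cap of 256 threads per block. `LaunchConfig` and `computeLaunchConfig` are hypothetical names for illustration only, not the project's actual functions.

```cpp
// Hypothetical launch configuration for a 1-D elementwise kernel.
// This is an illustrative sketch, not the project's real API.
struct LaunchConfig {
    unsigned int blocks;   // number of thread blocks (gridDim.x)
    unsigned int threads;  // threads per block (blockDim.x)
};

// Picks threads per block (capped at maxThreads, assumed 256 here) and
// enough blocks to give every one of the n elements its own thread.
// Returns 0 blocks for n == 0; the caller should skip the launch then,
// because a CUDA launch with a zero-sized grid is invalid.
LaunchConfig computeLaunchConfig(unsigned int n, unsigned int maxThreads = 256) {
    LaunchConfig cfg{0, 0};
    if (n == 0 || maxThreads == 0) {
        return cfg;
    }
    // Use fewer threads than the cap when the data is small.
    cfg.threads = n < maxThreads ? n : maxThreads;
    // Ceiling division written to avoid overflow of n + threads - 1.
    cfg.blocks = n / cfg.threads + (n % cfg.threads != 0 ? 1u : 0u);
    return cfg;
}
```

For example, `computeLaunchConfig(1000)` yields 4 blocks of 256 threads. That is 1024 threads for 1000 elements, so the kernel itself must still check its global index against `n` before touching memory. The result would be used as `kernel<<<cfg.blocks, cfg.threads>>>(...)`.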