Separate tuning caching from kernel caching
C++ library for sorting and searching in OpenCL applications
Brought to you by:
bmerry
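Currently every cached kernel corresponds to its own tuning pass, which does not scale well. In particular, if the scan input and output types are allowed to differ, a separate kernel is needed for each input type, but it makes little sense to redo tuning for every one of them. Tuning results should therefore be cached separately from compiled kernels, so that several kernels can share one tuning pass.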
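A minimal sketch of what separating the two caches could look like, assuming tuning results depend only on the device and the output type while compiled kernels depend on the device, input type and output type. `ScanCaches`, `TuningParams`, `CompiledKernel`, `getKernel` and `getTuning` are all hypothetical names, not part of the library's API, and the "tuning pass" is a placeholder rather than real benchmarking or OpenCL code. The point is only the keying: a kernel-cache miss consults the tuning cache first, so a new input type triggers a compile but not a retune.

```cpp
#include <map>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical tuned parameters; real tuning output may contain different fields.
struct TuningParams {
    int workGroupSize;
    int elementsPerThread;
};

// Hypothetical stand-in for a compiled kernel (a real one would wrap an OpenCL kernel object).
struct CompiledKernel {
    std::string name;     // identifies which program variant was "built"
    TuningParams params;  // tuning parameters the variant was built with
};

class ScanCaches {
public:
    // Assumption: tuning depends only on (device, output type).
    using TuningKey = std::pair<std::string, std::string>;
    // Each (device, input type, output type) combination needs its own kernel.
    using KernelKey = std::tuple<std::string, std::string, std::string>;

    const CompiledKernel &getKernel(const std::string &device,
                                    const std::string &inType,
                                    const std::string &outType) {
        KernelKey kkey{device, inType, outType};
        auto it = kernels.find(kkey);
        if (it != kernels.end())
            return it->second;  // kernel already built for this exact combination
        // Shared tuning: every input type with the same output type reuses it.
        const TuningParams &params = getTuning(device, outType);
        ++compileCount;
        CompiledKernel k{"scan<" + inType + "," + outType + ">", params};
        return kernels.emplace(kkey, k).first->second;
    }

    int tuningCount = 0;   // number of (placeholder) tuning passes performed
    int compileCount = 0;  // number of (placeholder) kernel builds performed

private:
    const TuningParams &getTuning(const std::string &device, const std::string &outType) {
        TuningKey tkey{device, outType};
        auto it = tuning.find(tkey);
        if (it != tuning.end())
            return it->second;  // reuse earlier tuning result
        ++tuningCount;
        // Placeholder for an expensive pass that would benchmark candidates on the device.
        TuningParams p{256, outType == "double" ? 2 : 4};
        return tuning.emplace(tkey, p).first->second;
    }

    std::map<TuningKey, TuningParams> tuning;
    std::map<KernelKey, CompiledKernel> kernels;
};
```

With this layout, requesting scans of `int` and `short` inputs into `long` outputs on the same device builds two kernels but runs only one tuning pass. Where the real tuning key should be drawn (output type only, or some coarser property such as element size) is a design question the sketch does not settle.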