I've been experimenting with custom kernels in ViennaCL and I've run into some problems related to work sizes and clFinish(). If I give my kernel a global work size that exceeds the maximum global work size, ViennaCL halves the work size repeatedly until clEnqueueNDRangeKernel() succeeds. After each failed enqueue attempt, it calls queue.finish() (enqueue.hpp:83), which in turn calls clFinish(). As far as I can tell, the clFinish() call serves no purpose except to slow things down, since it introduces an unnecessary synchronization point.
Additionally, the default behaviour of halving the global and local work sizes until they fit isn't ideal. If I set only the global work size, it may not be divisible by the default local work size. ViennaCL will then happily reduce both all the way to 0, if necessary, which is not very useful.
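To illustrate the divisibility issue, here is a minimal sketch of what I currently do on my side before launching a kernel: round the global work size up to the next multiple of the local work size. The helper name round_up_to_multiple is my own and is not part of ViennaCL; it also assumes the kernel guards against the padded work items with a bounds check.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical helper (not a ViennaCL function): round `global` up to the
// next multiple of `local`, so that global % local == 0 holds afterwards.
std::size_t round_up_to_multiple(std::size_t global, std::size_t local)
{
  if (local == 0)
    throw std::invalid_argument("local work size must be non-zero");

  const std::size_t remainder = global % local;
  // Already a multiple: leave unchanged. Otherwise pad up to the next one.
  return remainder == 0 ? global : global + (local - remainder);
}
```

For example, a global work size of 1000 with a local work size of 128 is padded to 1024, and the kernel has to skip the 24 extra work items itself.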
It's not clear that a user gains much from ViennaCL's naive work size selection algorithm. It might be better to simply fail and report an error than to pick suboptimal (or incorrect!) work sizes.
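As a rough sketch of the "fail and report an error" alternative, something along these lines would be enough for my purposes. The function check_work_sizes and its parameters are hypothetical and not taken from ViennaCL; max_local stands in for the value one would query via CL_DEVICE_MAX_WORK_GROUP_SIZE, and the divisibility rule assumes OpenCL 1.x semantics, where the global size must be a multiple of the local size.

```cpp
#include <cstddef>
#include <stdexcept>

// Hypothetical validation (not ViennaCL code): reject work sizes that cannot
// be enqueued as given, instead of silently shrinking them.
void check_work_sizes(std::size_t global, std::size_t local, std::size_t max_local)
{
  if (global == 0 || local == 0)
    throw std::invalid_argument("work sizes must be non-zero");

  // max_local is assumed to come from CL_DEVICE_MAX_WORK_GROUP_SIZE.
  if (local > max_local)
    throw std::invalid_argument("local work size exceeds device maximum");

  // OpenCL 1.x requires the global size to be a multiple of the local size.
  if (global % local != 0)
    throw std::invalid_argument("global work size is not a multiple of local work size");
}
```

With a check like this the caller sees immediately which constraint was violated, rather than ending up with a kernel that ran over fewer work items than requested.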
The reason for this behavior dates back at least three years, when one got all kinds of funny behavior. I agree that it is now obsolete, and I'll remove it for the next release.
Thanks for the input and best regards,
The behavior is corrected here:
Thanks for the input!