Thank you for this great piece of code!
If you take a look at this code inside kernel.hpp, around line 118:
    while (err != CL_SUCCESS && tmp_local > 1)
    {
      //std::cout << "Flushing queue, then enqueuing again with half the size..." << std::endl;
      tmp_global /= 2;  // the global work size is halved as well
      tmp_local /= 2;
      err = clEnqueueNDRangeKernel(device().queue().get(), h.get(), 1, NULL, &tmp_global, &tmp_local, 0, NULL, NULL);
    }
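As I understand it, you are looking for the largest work-group size at which the kernel can be launched: whenever the enqueue fails, you halve the local work size and try again. However, the loop above also halves the global work size on every retry. That launches only half as many work-items each time, so unless every work-item strides over the data by get_global_size(0), part of the input would never be processed. Is this what you want?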
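For comparison, here is a minimal sketch of the behaviour I would have expected: only the work-group size shrinks and the global size stays fixed. This is not the library's code. The names shrink_local_size, EnqueueFn and try_enqueue are hypothetical, and try_enqueue merely stands in for the clEnqueueNDRangeKernel call so that the retry logic can be read (and tested) without an OpenCL device.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

// Hypothetical stand-in for the clEnqueueNDRangeKernel call: takes
// (global size, local size) and returns true if the launch was accepted.
using EnqueueFn = std::function<bool(std::size_t, std::size_t)>;

// Retry with a halved work-group size while leaving the global size untouched.
// Returns the local size that was accepted, or 0 if no size worked.
inline std::size_t shrink_local_size(std::size_t global, std::size_t local,
                                     const EnqueueFn& try_enqueue)
{
    assert(local > 0);
    for (;;) {
        // OpenCL 1.x requires the global size to be a multiple of the local
        // size, so skip candidates that do not divide it evenly.
        if (global % local == 0 && try_enqueue(global, local))
            return local;
        if (local == 1)
            return 0;   // nothing left to try
        local /= 2;     // only the work-group size shrinks
    }
}
```

With a global size of 1024, an initial local size of 256 and a device that (hypothetically) accepts at most 64 work-items per group, this would retry with 128 and then 64, and every attempt would still cover all 1024 work-items.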