Change the CPU-based uniform grid creation to run across 'n' threads, so the load is distributed across the cores of the latest generation of multi-core CPUs. Additionally, a native library call that uses SSE4 wouldn't be a bad idea, but that's way beyond me.
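For what it's worth, here is a rough sketch of how the threading half of this could look. It is not the project's actual code: the names `build_uniform_grid`, `build_partial_grid` and `cell_of`, the 3D point tuples, and the cell -> point-index dictionary layout are all assumptions made up for illustration. The idea is to split the point list into 'n' contiguous chunks, let each thread bin its chunk into a private grid so no locking is needed, then merge the partial grids at the end. It is written in Python to keep it short; in CPython the GIL will limit the real speedup, so treat it as showing the partition-and-merge structure rather than the performance.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from math import floor


def cell_of(point, cell_size):
    """Map a 3D point to its integer grid cell coordinates (hypothetical layout)."""
    return tuple(floor(c / cell_size) for c in point)


def build_partial_grid(points, start, stop, cell_size):
    """Bin points[start:stop] into a private grid: no shared state, no locks."""
    grid = defaultdict(list)
    for i in range(start, stop):
        grid[cell_of(points[i], cell_size)].append(i)
    return grid


def build_uniform_grid(points, cell_size, n_threads=4):
    """Build a cell -> [point indices] map using up to n_threads workers."""
    if n_threads < 1:
        raise ValueError("n_threads must be >= 1")
    count = len(points)
    chunk = max(1, -(-count // n_threads))  # ceiling division
    ranges = [(s, min(s + chunk, count)) for s in range(0, count, chunk)]

    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        futures = [
            pool.submit(build_partial_grid, points, s, e, cell_size)
            for s, e in ranges
        ]
        partials = [f.result() for f in futures]

    # Merge in chunk order so the indices inside each cell stay ascending,
    # which makes the result identical to a single-threaded build.
    grid = defaultdict(list)
    for partial in partials:
        for cell, indices in partial.items():
            grid[cell].extend(indices)
    return dict(grid)
```

Calling `build_uniform_grid(points, cell_size=1.0, n_threads=2)` on a list of `(x, y, z)` tuples returns a plain dictionary keyed by cell coordinates. Per-thread private grids cost an extra merge pass, but they avoid contention on a shared structure, which is usually the simpler trade-off to start with.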