- labels: LinearSampling, Performance
```cuda
// Generates random bit vector of length "n" with "k" bits set to 1 and rest set to 0.
__device__ void kGenerateBooleanPermutation(char* perm_shared, curandState *curand_state, int n, int k)
{
    // ToDo: replace with this:
    // http://stackoverflow.com/questions/12653995/how-to-generate-random-permutations-with-cuda
    if (threadIdx.x == 0) {
        // ... (rest of the function not shown in the issue)
    }
}
```
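For reference, the contract in the comment above (length `n`, exactly `k` ones, all such vectors equally likely) can be met in one serial pass with selection sampling (Knuth's Algorithm S). The sketch below is plain host C++, not the project's device code: `std::mt19937` stands in for `curandState`, and `generateBooleanPermutation` is a hypothetical host counterpart of `kGenerateBooleanPermutation`. Nothing in it requires `n` to be a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Selection sampling (Knuth's Algorithm S): visit positions 0..n-1 once and
// set position i with probability needed / remaining. Exactly k positions end
// up set, and every k-subset of {0, ..., n-1} is equally likely.
// std::mt19937 is a stand-in for the curandState used by the CUDA code.
std::vector<char> generateBooleanPermutation(int n, int k, std::mt19937& rng) {
    assert(0 <= k && k <= n);
    std::vector<char> perm(static_cast<std::size_t>(n), 0);
    int needed = k;  // ones still to place
    for (int i = 0; i < n && needed > 0; ++i) {
        int remaining = n - i;  // positions not yet visited, including i
        // Integer draw in [0, remaining); "< needed" has probability needed / remaining.
        std::uniform_int_distribution<int> pick(0, remaining - 1);
        if (pick(rng) < needed) {
            perm[static_cast<std::size_t>(i)] = 1;
            --needed;
        }
    }
    return perm;
}
```

Like the original, which appears to do the work on thread 0 only, this is still O(n) serial work per call; moving it to the device would only swap `pick` for a curand draw, so it does not by itself address the performance concern.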
This can be fixed in two ways.
I actually doubt whether this can be fixed with the idea recommended in "how-to-generate-random-permutations-with-cuda", because that approach doesn't work when N is not a power of two. An alternative is to pre-generate a table of random permutations on the GPU (this can be done in parallel) and then use it in cross-validation.
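A minimal sketch of the pre-generated table idea, assuming each cross-validation round needs one random subset of size `k` out of `n`: build `rows` independent permutations up front, then turn row `r` into the boolean vector for round `r` by marking the positions named by its first `k` entries. `buildPermutationTable` and `maskFromPermutation` are hypothetical names, and this is host C++ for illustration only. On the GPU, each row would map to one thread (for example, seeding its own state with `curand_init(seed, row, 0, &state)`) running a Fisher-Yates shuffle, which works for any `n`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: builds `rows` independent random permutations of 0..n-1.
// Each row has its own generator seeded from (seed, row), so rows never depend
// on each other and could be produced in parallel, one GPU thread per row.
std::vector<std::vector<int>> buildPermutationTable(int rows, int n, std::uint64_t seed) {
    std::vector<std::vector<int>> table(static_cast<std::size_t>(rows),
                                        std::vector<int>(static_cast<std::size_t>(n)));
    for (int r = 0; r < rows; ++r) {  // independent iterations
        std::seed_seq seq{static_cast<std::uint32_t>(seed),
                          static_cast<std::uint32_t>(seed >> 32),
                          static_cast<std::uint32_t>(r)};
        std::mt19937 rng(seq);
        std::vector<int>& perm = table[static_cast<std::size_t>(r)];
        std::iota(perm.begin(), perm.end(), 0);
        // Fisher-Yates shuffle: uniform over all n! orderings, any n.
        for (int i = n - 1; i > 0; --i) {
            std::uniform_int_distribution<int> pick(0, i);
            std::swap(perm[static_cast<std::size_t>(i)],
                      perm[static_cast<std::size_t>(pick(rng))]);
        }
    }
    return table;
}

// Hypothetical helper: the first k entries of a permutation name the k
// positions to set, giving a boolean vector with exactly k ones.
std::vector<char> maskFromPermutation(const std::vector<int>& perm, int k) {
    assert(0 <= k && k <= static_cast<int>(perm.size()));
    std::vector<char> mask(perm.size(), 0);
    for (int i = 0; i < k; ++i) mask[static_cast<std::size_t>(perm[static_cast<std::size_t>(i)])] = 1;
    return mask;
}
```

The table costs `rows * n` integers of memory, so this trades memory for removing the serial per-call loop from the kernel; reusing the same seed reproduces the same folds.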