I have a simple 8x8 table initialized as follows:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 1
Now I have the following (very simple) kernel, which should convert
each 1 to a 2. The host code that launches it is shown first, then the kernel.
...
// Specifying the domain of execution for output stream
eucdist_saito_first_stage_fwd.domainOffset(uint4(0, 0, 0, 0));
eucdist_saito_first_stage_fwd.domainSize(uint4(width, 1, 1, 1)); // use 1 thread per row
// Perform the row scan forward
eucdist_saito_first_stage_fwd((float)width, inputStream, outputStream);
kernel void
eucdist_saito_first_stage_fwd(float width, float input[], out float output[])
{
    float row = (float)(instance().x);
    float j;
    // Scan the row going forward
    for(j = 0.0f; j < width; j = j + 1.0f){
        if(input[row*width + j] == 1.0f){
            output[row*width + j] = 2.0f;
        }
    }
}
This simple code does not work! The "if" branch appears to be taken for every
element, and the output is:
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
However, if I multiply the input by 2 instead of assigning the constant:
kernel void
eucdist_saito_first_stage_fwd(float width, float input[], out float output[])
{
    float row = (float)(instance().x);
    float j;
    // Scan the row going forward
    for(j = 0.0f; j < width; j = j + 1.0f){
        if(input[row*width + j] == 1.0f){
            output[row*width + j] = 2.0f * input[row*width + j];
        }
    }
}
I get:
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
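which is the result I expected from the first version. It makes no sense to me
that multiplying by the input works while assigning the constant 2.0f does not.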
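For comparison, here is a minimal CPU sketch in plain C (not Brook+) of what the kernel is meant to compute, which can be used to check the GPU output. The names `init_table` and `row_scan_reference` are hypothetical helpers invented for this illustration. Unlike the kernels above, the sketch writes every output element, copying values other than 1 through unchanged. That is an assumption made here to keep the reference well defined. Whether elements that a Brook+ kernel never writes hold defined values is not something this post establishes.

```c
#include <stddef.h>

/* Fill the table as described in the post: the top half of the rows
   is 0 and the bottom half is 1 (hypothetical helper). */
static void init_table(int width, int height, float *table)
{
    for (int row = 0; row < height; ++row) {
        for (int j = 0; j < width; ++j) {
            table[(size_t)row * width + j] = (row < height / 2) ? 0.0f : 1.0f;
        }
    }
}

/* CPU reference for the intended row scan: each 1 becomes a 2.
   Assumption of this sketch: any other value is copied through, so
   every output element is written exactly once. */
static void row_scan_reference(int width, int height,
                               const float *input, float *output)
{
    for (int row = 0; row < height; ++row) {
        for (int j = 0; j < width; ++j) {
            size_t idx = (size_t)row * width + j;
            output[idx] = (input[idx] == 1.0f) ? 2.0f : input[idx];
        }
    }
}
```

Calling `init_table(8, 8, in)` followed by `row_scan_reference(8, 8, in, out)` on two 64-element float arrays yields four rows of 0 followed by four rows of 2, which matches the output of the second kernel above.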
Forum URL : http://www.gpgpu.org/forums/viewtopic.php?t=5549