I found that on my platform (Nvidia 7800GS, Windows 2000, CVS version of Brook) the reduce function works incorrectly when the stream size is 47. It double-counts the item at position 46 and skips the item at position 44.
This happens when BRT_RUNTIME is set to dx9; ogl and cpu work fine.
I have not tested this on ATI hardware yet.
This problem occurs for many stream sizes, but 47 is the smallest size at which it appears.
A small standalone program that demonstrates the reduction flaw under dx9
Logged In: YES
user_id=728092
Originator: YES
I just tried it on the favored hardware: dx9 on an ATI 1950 XT.
The same problem occurs.
Logged In: YES
user_id=728092
Originator: YES
I have found the problem.
The guts of it:
The main section of executeReductionStep (in gpu/gpukernel.cpp, line 836) currently reads:
_context->getStreamReduceInterpolant( inputBuffer, resultExtents[0], resultExtents[1],
i, remainingExtent+i, 0, otherExtent, dim, _globalInterpolants[i] );
But when remainingExtent is 47, we do not want the range to run from i to 47+i; we want i to 46+i.
The correct call should be:
_context->getStreamReduceInterpolant( inputBuffer, resultExtents[0], resultExtents[1],
i, (remainingFactor-slopFactor)*outputExtent+i, 0, otherExtent, dim, _globalInterpolants[i] );
where remainingFactor-slopFactor gives the correct end of the range, and multiplying by outputExtent scales remainingFactor properly.
I also found some minor interpolation problems in the slop pickup code, but probably nothing that would cause an error.
With this line changed, reduction works for stream sizes up to 4096. Beyond that, something else goes wrong.
I would love to submit the fix.