If the chain length is not an exact multiple of the vector width, the generated results are garbage. The cause is that the vector_load and vector_store types do not handle a partial final vector properly in OpenCL. The fix is to pad the chain length up to the next multiple of the vector width so that results are correct.
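A minimal host-side sketch of the padding approach, in plain C. The helper names `padded_length` and `pad_chain`, the `float` element type, and the choice of zero as the fill value are all assumptions made for illustration; they are not taken from the project's code, and the correct fill value depends on what the kernel computes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: round n up to the next multiple of width. */
static size_t padded_length(size_t n, size_t width)
{
    assert(width > 0);
    return ((n + width - 1) / width) * width;
}

/*
 * Hypothetical helper: copy a chain of n floats into a newly allocated
 * buffer whose length is a multiple of the vector width. The extra tail
 * elements are zero-filled (an assumption; pick a value that is neutral
 * for the kernel). Returns NULL on allocation failure. The caller frees
 * the returned buffer.
 */
static float *pad_chain(const float *chain, size_t n, size_t width,
                        size_t *padded_n)
{
    size_t m = padded_length(n, width);
    /* calloc zero-fills; allocate at least one element so m == 0 is safe. */
    float *buf = calloc(m ? m : 1, sizeof *buf);
    if (buf == NULL)
        return NULL;
    if (n > 0)
        memcpy(buf, chain, n * sizeof *buf);
    *padded_n = m;
    return buf;
}
```

With a padded buffer, every vector load and store touches only allocated memory, and the host reads back just the first `n` results and ignores the tail.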