From: Philippe T. <phi...@gm...> - 2012-08-23 13:56:19
Hello everybody!

Browsing through the internals, I found that

    template <typename SCALARTYPE, typename F, unsigned int ALIGNMENT>
    void fast_copy(SCALARTYPE * cpu_matrix_begin,
                   SCALARTYPE * cpu_matrix_end,
                   matrix<SCALARTYPE, F, ALIGNMENT> & gpu_matrix)

might not have the intended behavior: it reallocates the internal gpu_buffer of gpu_matrix to a buffer of size (cpu_matrix_end - cpu_matrix_begin) * sizeof(SCALARTYPE) and then copies the data. A user might, however, want to copy just a part of the cpu_matrix to the GPU. Also, reallocating the buffer without updating the matrix's sizes sounds a bit weird.

Wouldn't it be more intuitive for fast_copy to call clEnqueueWriteBuffer on the existing buffer, and to add a constructor matrix(size1, size2, cpu_matrix_begin) that allocates with the CL_MEM_COPY_HOST_PTR flag?

Such a constructor would also cover a const-correctness case I have run into: creating a GPU matrix from CPU data without ever writing it back. In that case the gpu_matrix should be const, but then it is not possible to call fast_copy on it!

Best regards,
Philippe
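
P.S. To make the proposal concrete, here is a minimal sketch of the interface I have in mind. It is not ViennaCL code: mock_gpu_matrix and its write() member are hypothetical names, a std::vector stands in for the OpenCL buffer so the example runs without a device, and row-major storage is assumed. In a real implementation the data constructor would presumably pass CL_MEM_COPY_HOST_PTR to clCreateBuffer, and write() would wrap clEnqueueWriteBuffer.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for a GPU matrix (NOT the real ViennaCL class).
// buffer_ plays the role of the device buffer.
template <typename T>
class mock_gpu_matrix {
public:
  // Allocate only; elements are value-initialised (zero for arithmetic types).
  mock_gpu_matrix(std::size_t size1, std::size_t size2)
    : size1_(size1), size2_(size2), buffer_(size1 * size2) {}

  // Proposed constructor: allocate and fill from host data in one step.
  // Taking const T* means read-only host data can be uploaded, and the
  // resulting matrix object can itself be declared const.
  mock_gpu_matrix(std::size_t size1, std::size_t size2, const T* host_begin)
    : size1_(size1), size2_(size2),
      buffer_(host_begin, host_begin + size1 * size2) {}

  std::size_t size1() const { return size1_; }
  std::size_t size2() const { return size2_; }
  std::size_t internal_size() const { return buffer_.size(); }

  // Read-only element access (row-major layout assumed for this sketch).
  T operator()(std::size_t i, std::size_t j) const {
    return buffer_[i * size2_ + j];
  }

  // Copy [begin, end) into the existing buffer starting at `offset`.
  // The buffer is never reallocated; a range that does not fit is rejected.
  void write(const T* begin, const T* end, std::size_t offset = 0) {
    if (end < begin)
      throw std::invalid_argument("end precedes begin");
    const std::size_t count = static_cast<std::size_t>(end - begin);
    if (offset > buffer_.size() || count > buffer_.size() - offset)
      throw std::out_of_range("host range does not fit into the buffer");
    std::copy(begin, end, buffer_.begin() + offset);
  }

private:
  std::size_t size1_;
  std::size_t size2_;
  std::vector<T> buffer_;
};
```

With this, something like `const mock_gpu_matrix<float> m(2, 2, host);` works for a `const float host[4]`, while write() on a non-const matrix can update just part of the buffer and leaves its size untouched.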