From: Karl R. <ru...@iu...> - 2011-05-24 07:42:03
Hi Riaan,

the kernel is indeed not very efficient and can be the cause of the problem. I think it's better to iterate within one work group over the transposed matrix in a column-wise manner and synchronize after every column. Moreover, the sparse matrix can be split into k sets of columns (one set per work group) that are processed in parallel, each leading to a partial result vector y_k. At the end, the result vector is obtained by partial reduction as y = sum_k y_k.

Best regards,
Karli

On 05/24/2011 09:05 AM, Riaan van den Dool wrote:
> I do wonder if my kernel does not take too long to execute...
>
> __kernel void trans_vec_mul(
>     __global const unsigned int * row_indices,
>     __global const unsigned int * column_indices,
>     __global const float * elements,
>     __global const float * vector,
>     __global float * result,
>     unsigned int size1,
>     unsigned int size2)
> {
>   unsigned int max_index = row_indices[size1];
>   for (unsigned int col = get_global_id(0); col < size2; col += get_global_size(0))
>   {
>     float dot_prod = 0.0f;
>     unsigned int row = 0;
>     for (unsigned int i = 0; i <= max_index; i++)
>     {
>       while (row_indices[row] < i)
>         row++;
>       if (column_indices[i] == col)
>       {
>         dot_prod += elements[i] * vector[row];
>       }
>     }
>     result[col] = dot_prod;
>   }
> }
>
> On Mon, May 23, 2011 at 2:16 PM, Karl Rupp <ru...@iu...
> <mailto:ru...@iu...>> wrote:
>
>     Hi,
>
>     if you get the error due to not providing an estimate, then it's
>     likely that you try to access some invalid piece of memory and the
>     out_of_resources exception is only a consequence.
>
>     Conservative work sizes are 64 for local and 64*64 for global.
>
>     Best regards,
>     Karli
>
>     On 05/23/2011 02:01 PM, Riaan van den Dool wrote:
>
>         I keep getting terminate called after throwing an instance of
>         'viennacl::ocl::out_of_resources' now (seems that if I don't set
>         up the solver with a prior (estimate of solution)) then it falls
>         over.
>
>         What would the most conservative workgroup sizes look like?
>
>         local_work_size_[0] = 1; local_work_size_[1] = 0;
>         global_work_size_[0] = 512*512; global_work_size_[1] = 0;
>
>         These settings still give problems.
>
>         R
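
For reference, the splitting scheme described in the reply at the top of this message can be sketched in plain, serial C. This is only an illustration under assumptions, not ViennaCL code: the function name csr_trans_vec_mul and the parameter names row_ptr, col_idx, x, y and k are made up for this sketch, the matrix is assumed to be in standard CSR format, and each "work group" is emulated by one iteration of the outer loop. The columns of the transposed matrix are the rows of the stored matrix, so each set of rows scatters into its own partial vector y_k, and the partial vectors are summed at the end.

```c
#include <assert.h>
#include <stdlib.h>

/* Serial reference sketch (hypothetical helper, not part of ViennaCL).
 * Computes y = A^T * x for an n_rows x n_cols matrix A stored in CSR format.
 * The rows of A (i.e. the columns of A^T) are split into k contiguous sets.
 * Each set accumulates into its own partial vector y_k, and the result is
 * obtained as y = sum_k y_k. Returns 0 on success, -1 if allocation fails. */
int csr_trans_vec_mul(const unsigned int *row_ptr,
                      const unsigned int *col_idx,
                      const float *elements,
                      const float *x,
                      float *y,
                      unsigned int n_rows,
                      unsigned int n_cols,
                      unsigned int k)
{
    assert(k > 0 && n_cols > 0);

    /* One zero-initialised partial result vector per set. */
    float *partial = calloc((size_t)k * n_cols, sizeof *partial);
    if (partial == NULL)
        return -1;

    /* Rows per set, rounded up so that all rows are covered. */
    unsigned int chunk = (n_rows + k - 1) / k;

    /* Each iteration of this loop plays the role of one work group. */
    for (unsigned int g = 0; g < k; ++g) {
        unsigned int begin = g * chunk;
        unsigned int end = (begin + chunk < n_rows) ? begin + chunk : n_rows;
        float *y_k = partial + (size_t)g * n_cols;

        /* Walk the set row by row (column by column of A^T) and scatter
         * each entry of the row into the partial result. */
        for (unsigned int r = begin; r < end; ++r)
            for (unsigned int i = row_ptr[r]; i < row_ptr[r + 1]; ++i)
                y_k[col_idx[i]] += elements[i] * x[r];
    }

    /* Final reduction: y = sum_k y_k. */
    for (unsigned int c = 0; c < n_cols; ++c) {
        float sum = 0.0f;
        for (unsigned int g = 0; g < k; ++g)
            sum += partial[(size_t)g * n_cols + c];
        y[c] = sum;
    }

    free(partial);
    return 0;
}
```

As a small check, for A = [[1, 0, 2], [0, 3, 0], [4, 0, 5]] (row_ptr = {0, 2, 3, 5}, col_idx = {0, 2, 1, 0, 2}, elements = {1, 2, 3, 4, 5}) and x = (1, 2, 3), the result is y = (13, 6, 17) for any k. In an actual OpenCL kernel the inner loop over the entries of one row would presumably be spread over the work items of the group, with a barrier after each row, which is the per-column synchronization mentioned above.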