Example code for building matrix on the GPU

2017-03-30
  • Peter Schröder

    Peter Schröder - 2017-03-30

    I just started playing with ViennaCL. I found that constructing the sparse matrix (a complex valued connection Laplacian in 3D) on the CPU takes a VERY long time before I transfer to the GPU and compute. I have 7 non-zero entries per row and am using the vector of maps version for the sparse matrix. In the long run I need to load the entries of the matrix on the GPU itself since they keep changing (though not the structure) during a non-linear minimization run. The entries are complicated (though locally defined) functions of entries in a 3D grid (think: 3-vector at each vertex of a 3D grid giving rise to the entries of the sparse matrix, each grid point giving rise to a row in the matrix).

    Can someone point me to an example of code which fills such a matrix on the GPU? What I am concerned about is understanding the data layout. Once I have that, it should not be too hard to write the kernel which figures out the entries in the matrix based on the data in the 3D grid (all in parallel, since they don't depend on each other). An example of the latter would be very helpful too.

    Who can help?

    Peter

  • Karl Rupp

    Karl Rupp - 2017-03-31

    Hi Peter,

    the recommended sparse matrix type in ViennaCL is viennacl::compressed_matrix. These matrices are stored in the standard CSR format (zero-based indexing, three arrays), cf. https://software.intel.com/en-us/node/599835. The respective memory buffers for a viennacl::compressed_matrix<T> A are:
    A.handle1(): Start and end index pair for each row (length: no. of rows + 1)
    A.handle2(): Column indices (length: number of nonzeros in A)
    A.handle(): The numerical entries (length: number of nonzeros in A)
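    For concreteness, a minimal sketch of what those three buffers contain, in plain C++ without ViennaCL (the array names row_offsets / column_indices / values are my stand-ins for the contents of A.handle1() / A.handle2() / A.handle()):

    ```cpp
    #include <cassert>
    #include <cstdio>
    #include <vector>

    int main() {
        // 3x3 example matrix in CSR form (zero-based indexing):
        //   [10  0 20]
        //   [ 0 30  0]
        //   [40 50 60]
        std::vector<unsigned int> row_offsets    = {0, 2, 3, 6};          // handle1(): length rows+1
        std::vector<unsigned int> column_indices = {0, 2, 1, 0, 1, 2};    // handle2(): length nnz
        std::vector<double>       values         = {10, 20, 30, 40, 50, 60}; // handle(): length nnz

        // Row i occupies the half-open range [row_offsets[i], row_offsets[i+1]).
        for (std::size_t i = 0; i + 1 < row_offsets.size(); ++i)
            for (unsigned int k = row_offsets[i]; k < row_offsets[i + 1]; ++k)
                std::printf("A(%zu,%u) = %g\n", i, column_indices[k], values[k]);
        return 0;
    }
    ```

    The last entry of row_offsets always equals the number of nonzeros, which is also the length of the other two buffers.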
    The generic way of dealing with this is to set A to the correct size (either with the appropriate constructor call, or by calling .resize() and .reshape()). Then use your own kernels to populate the respective memory buffers of A. An example of calling your own kernels can be found here for the CUDA case: https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/custom-cuda.cu
    and here for the OpenCL case:
    https://github.com/viennacl/viennacl-dev/blob/master/examples/tutorial/custom-kernels.cpp
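    To illustrate the fill step for your use case (structure fixed, values changing each iteration): since every row of the matrix depends only on local grid data, each row can be recomputed independently, which is exactly the one-thread-per-row pattern a CUDA or OpenCL kernel would use. The plain-C++ sketch below mimics such a kernel body; fill_row and the placeholder coefficients are hypothetical stand-ins for your connection-Laplacian entries, not ViennaCL API:

    ```cpp
    #include <cassert>
    #include <cstddef>
    #include <vector>

    // Hypothetical per-row update: write the (up to 7) entries of row r from grid data.
    // In a CUDA kernel, r would come from blockIdx.x * blockDim.x + threadIdx.x.
    void fill_row(std::size_t r,
                  const std::vector<unsigned int> &row_offsets,
                  const std::vector<unsigned int> &column_indices,
                  const std::vector<double> &grid_data,   // e.g. per-vertex data, flattened
                  std::vector<double> &values)
    {
        for (unsigned int k = row_offsets[r]; k < row_offsets[r + 1]; ++k) {
            unsigned int c = column_indices[k];
            // Placeholder coefficients: a real implementation would evaluate the
            // (locally defined) connection-Laplacian entry from grid_data here.
            values[k] = (c == r) ? 6.0 * grid_data[r] : -1.0 * grid_data[c];
        }
    }

    int main() {
        // Tiny 1D chain standing in for the 3D grid; the sparsity structure is
        // built once, only the values buffer is refilled each iteration.
        std::vector<unsigned int> row_offsets    = {0, 2, 5, 7};
        std::vector<unsigned int> column_indices = {0, 1,  0, 1, 2,  1, 2};
        std::vector<double>       values(7, 0.0);
        std::vector<double>       grid_data = {1.0, 2.0, 3.0};

        // Rows are independent, so this loop parallelizes directly across threads.
        for (std::size_t r = 0; r + 1 < row_offsets.size(); ++r)
            fill_row(r, row_offsets, column_indices, grid_data, values);
        return 0;
    }
    ```

    In the real code the values buffer would live on the GPU (obtained from A.handle() as in the custom-kernel tutorials above), so no host-device transfer is needed between minimization steps.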

    Best regards,
    Karli

