I am planning on using ViennaCL as part of a research project involving solving the equations of motion of more than 10,000 particles at a time by the implicit Euler method. I am hoping to keep everything on the GPU while doing this. In doing so, I have come up with a few questions about the compressed_matrix class I was hoping you could help me answer.
I saw that it is suggested to construct instances of compressed_matrix on the CPU and then push them onto the GPU using the ViennaCL API. However, I noticed that there are also functions for providing the three arrays that compose the CSR format directly, using buffers either in host or in device memory. The host code works great, while the device code does not work out of the box because it does not set up the context the way the other constructors do. Is there a technical reason why this is allowed for host memory but not for device memory? I was able to add a few lines of code from one of the other constructors to request the OpenCL/ViennaCL context, which made it appear to work after the change. As long as I set up the input arrays correctly, are there any reasons why this approach might not be recommended?
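(For reference, the construct-on-the-CPU-and-push workflow mentioned above looks roughly like the minimal sketch below; the matrix contents and size are placeholders, and I am assuming the std::vector-of-std::map overload of viennacl::copy available in the ViennaCL version I am using.)

#include <map>
#include <vector>

#include "viennacl/compressed_matrix.hpp"

typedef double ScalarType;

int main()
{
  std::size_t const N = 10000;  // placeholder size, e.g. one unknown per particle equation

  // Assemble the sparse system on the host first (entries are placeholders):
  std::vector<std::map<unsigned int, ScalarType> > cpu_A(N);
  for (std::size_t i = 0; i < N; ++i)
    cpu_A[i][static_cast<unsigned int>(i)] = ScalarType(1);  // diagonal only, for illustration

  // Push the assembled matrix onto the GPU:
  viennacl::compressed_matrix<ScalarType> gpu_A(N, N);
  viennacl::copy(cpu_A, gpu_A);

  return 0;
}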
On the note of setting up these arrays correctly (whether on the host or on the device), I was wondering what the strict requirements of the CSR format as implemented in ViennaCL are. As I understand it, there is the array of row indices, which gives the starting index of each row in memory. Then there are two arrays for the nonzero elements and the columns they are associated with. In every example of this sparse matrix format I have seen, the elements within a row are ordered by increasing column index. However, I tried putting the columns within a row out of order in a few small examples, and it appeared to work properly with the iterative solvers. Is the ordering of the columns within each row not a requirement of ViennaCL's CSR format, or am I just getting lucky? Can I expect this behavior to keep working?
Thank you for your time and any help you may be able to provide.
Hi Joseph,
as for your question regarding the constructors of compressed_matrix: it is possible to pass your own context if you tell ViennaCL to use it. An example of how this can be done is in examples/tutorial/custom-kernels.cpp
Once the context is set up, you can directly use the existing constructor taking cl_mem memory handles for the three CSR arrays. If you have suggestions on how to improve or simplify this, please let us know; we are happy to incorporate them if there are no semantic reasons against it :-)
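To illustrate, a rough sketch of that path might look like the following. The exact argument order of the cl_mem constructor and the unsigned int index type are assumptions to be checked against the ViennaCL version in use; my_context, my_device, my_queue and the three buffers are placeholders for handles you already own.

#include <cstddef>

#include "viennacl/ocl/backend.hpp"
#include "viennacl/compressed_matrix.hpp"

// Sketch: wrap existing OpenCL buffers holding CSR data (unsigned int indices,
// double values) in a viennacl::compressed_matrix without a host round trip.
void wrap_existing_csr(cl_context       my_context,
                       cl_device_id     my_device,
                       cl_command_queue my_queue,
                       cl_mem row_buffer,    // row start indices, length rows + 1
                       cl_mem col_buffer,    // column indices, length nnz
                       cl_mem value_buffer,  // nonzero values, length nnz
                       std::size_t rows, std::size_t cols, std::size_t nnz)
{
  // Register the existing OpenCL context with ViennaCL (cf. the tutorial mentioned
  // above) and make it the active context:
  viennacl::ocl::setup_context(0, my_context, my_device, my_queue);
  viennacl::ocl::switch_context(0);

  // Hand the device buffers to the cl_mem constructor (assumed argument order):
  viennacl::compressed_matrix<double> A(row_buffer, col_buffer, value_buffer,
                                        rows, cols, nnz);

  // A can now be used with the iterative solvers like any other compressed_matrix.
}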
Regarding the CSR format, there are a few minor details to consider: if you use compressed_matrix<T>, i.e. without setting the additional alignment template argument, this is just the 'standard' CSR format, where the array of row index ranges has length N+1 for a sparse matrix with N rows. The last value of this array should be NNZ+1 for all kernels to work correctly (NNZ is the number of nonzeros and equals the length of the column index and entry arrays). For convenience, we do not currently require any specific ordering of the column indices, and this will not change during the 1.x.y release series. For efficiency reasons, however, we may require a particular ordering in 2.0.0, which I don't expect to be released within the next 12 months or so.
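To make the layout and the ordering point concrete, here is a tiny made-up example in plain arrays, written with the common zero-based CSR convention for the row array; within row 1 the two entries are deliberately listed with their column indices out of order, yet they describe the same matrix:

// 3x3 example matrix (values made up for illustration):
//   [ 4 0 1 ]
//   [ 0 2 3 ]
//   [ 0 0 5 ]
// N = 3 rows, NNZ = 5 nonzeros.

unsigned int row_jumper[4] = { 0, 2, 4, 5 };          // length N+1; entry i marks where row i starts
unsigned int col_buffer[5] = { 0, 2,   2, 1,   2 };   // row 1 lists column 2 before column 1
double       elements[5]   = { 4.0, 1.0,   3.0, 2.0,   5.0 };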
I hope this answers all your questions - if not, please let me know :-)
Best regards,
Karli