I tried many solvers on the GPU and the bottleneck is the ILU. For my work the system matrix and the right-hand side reside on the GPU and are computed with CUDA. Before starting any conversion from CUDA to OpenCL I would like to make sure that at least the ILU variant by Prof. Chow runs fully on the GPU.
A side question: can I pass the col, row, and value arrays as CUDA pointers to a ViennaCL compressed matrix?
Last edit: Mohamed Aissa 2015-12-17
Hi,
you can use our parallel Chow-Patel-ILU with OpenMP, CUDA, and OpenCL. In other words, you don't need to migrate from CUDA to OpenCL; you can just use your CUDA buffers. With both CUDA and OpenCL the Chow-Patel-ILU runs fully on the GPU.
If your col/row/value data is already in CSR format and you use OpenCL, you can use the provided constructor for OpenCL. With CUDA I recommend you copy & paste the OpenCL-enabled constructor (compressed_matrix.hpp, around line 670) and adapt it to CUDA. The added code should be something similar to the following (not tested):
If you replace 'unsigned int' with 'int', things should still work (the lower bits are the same in two's complement).
Best regards,
Karli
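For reference, a minimal usage sketch of the Chow-Patel-ILU preconditioner combined with GMRES (illustrative only; the helper function, the matrix/vector setup, and the tag parameters are placeholders, not values from this thread):

#include "viennacl/compressed_matrix.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/linalg/ilu.hpp"    // provides chow_patel_ilu_precond and chow_patel_tag
#include "viennacl/linalg/gmres.hpp"

// Hypothetical helper: solve A x = b with a Chow-Patel-ILU-preconditioned GMRES.
void solve_with_chow_patel(viennacl::compressed_matrix<double> const & A,
                           viennacl::vector<double> const & b,
                           viennacl::vector<double> & x)
{
  // Set up the parallel Chow-Patel incomplete LU factorization on the GPU:
  viennacl::linalg::chow_patel_ilu_precond< viennacl::compressed_matrix<double> >
      ilu0(A, viennacl::linalg::chow_patel_tag());

  // Preconditioned GMRES: tolerance 1e-6, at most 20 iterations (placeholder values):
  x = viennacl::linalg::solve(A, b, viennacl::linalg::gmres_tag(1e-6, 20), ilu0);
}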
Thanks for the prompt answer. I will test it.
I used the same constructor you suggested, with int instead of unsigned int. I call it this way:
Matrix = new viennacl::compressed_matrix<double,4>(drow_offset, dcol, dval, NRow, NCol, num_entries);
and the errors are of this kind:
no operator "=" matches these operands
operand types are: viennacl::backend::mem_handle::cuda_handle_type = int *
Ah, type mismatch. Please replace the respective lines of type
with
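The two code snippets are missing from the archived post; a plausible reconstruction, assuming the cuda_handle() accessor of viennacl::backend::mem_handle together with its reset() and inc() members (the names row_buffer_ and mem_row_buffer follow the pattern of the OpenCL constructor and are not taken from the original reply):

// A direct assignment of the raw int* to the CUDA handle triggers the reported
// error, because the handle internally wraps a char*:
row_buffer_.cuda_handle() = mem_row_buffer;

// Resetting the handle with an explicit reinterpret_cast resolves the type
// mismatch; inc() prevents ViennaCL from freeing the user-provided CUDA buffer
// when the matrix object is destroyed:
row_buffer_.cuda_handle().reset(reinterpret_cast<char*>(mem_row_buffer));
row_buffer_.cuda_handle().inc();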
I got a small code to work, finding the right solution of a 6x6 Ax=b system, but the CUDA profiler nvvp shows me that during the solve there is some traffic over the PCI bus in both directions. Is this intended (although bad for performance), or is something wrong in this code?
#include <iostream>
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <unistd.h>
#include <cuda.h>
#include <cuda_runtime_api.h>
#include "viennacl/compressed_matrix.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/linalg/ilu.hpp"
#include "viennacl/linalg/gmres.hpp"

int main(int argc, char* argv[]) {
  int NRow = 6, NCol = 6, num_entries = 9;
  // CSR row offsets
  int row_offset[7] = {0, 1, 3, 4, 6, 8, 9};
  // CSR column indices
  int col[9] = {0, 1, 2, 2, 2, 3, 4, 5, 5};
  // CSR values
  double val[9] = {2., 3., 6., 4., 5., 6.3, 1.5, 2.5, 4.5};
  // right-hand side
  double b[6] = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
  std::vector<double> stl_vec(6);

  // Copy to GPU
  int    *drow_offset = NULL;
  int    *dcol = NULL;
  double *dval = NULL;
  double *drhs = NULL;
  double *dsol = NULL;
  cudaMalloc((void**)&drow_offset, 7*sizeof(int));
  cudaMalloc((void**)&dcol,        9*sizeof(int));
  cudaMalloc((void**)&dval,        9*sizeof(double));
  cudaMalloc((void**)&drhs,        6*sizeof(double));
  cudaMalloc((void**)&dsol,        6*sizeof(double));
  cudaMemcpy(drow_offset, row_offset, 7*sizeof(int),    cudaMemcpyHostToDevice);
  cudaMemcpy(dcol,        col,        9*sizeof(int),    cudaMemcpyHostToDevice);
  cudaMemcpy(dval,        val,        9*sizeof(double), cudaMemcpyHostToDevice);
  cudaMemcpy(drhs,        b,          6*sizeof(double), cudaMemcpyHostToDevice);

  // Wrap the existing CUDA buffers in ViennaCL objects (using the CUDA-enabled constructor discussed above)
  viennacl::compressed_matrix<double> Matrix;
  Matrix = viennacl::compressed_matrix<double>(drow_offset, dcol, dval, NRow, NCol, num_entries);
  viennacl::vector_base<double> Vector = viennacl::vector_base<double>(drhs, viennacl::CUDA_MEMORY, NRow, 0, 1);
  viennacl::vector_base<double> Solution(dsol, viennacl::CUDA_MEMORY, NRow, 0, 1);

  // for profiling: separate the solve in the timeline
  usleep(1000 * 1000);
  //viennacl::linalg::chow_patel_ilu_precond< viennacl::compressed_matrix<double> > vcl_ilu0(Matrix, viennacl::linalg::chow_patel_tag());
  Solution = viennacl::linalg::solve(Matrix, Vector, viennacl::linalg::gmres_tag(1e-6, 20));
  usleep(1000 * 1000);

  copy(Solution.begin(), Solution.end(), stl_vec.begin());
  for (int i = 0; i < 6; i++) printf("%f \t", stl_vec[i]);
  printf("\n");
  return 0;
}
Sorry, SourceForge silently put your post into the moderation queue.
Yes, PCI-Express communication in both directions is required for iterative solvers (at the very least, one has to check convergence in each iteration).
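As an illustration of where such a transfer is unavoidable (a minimal sketch, not ViennaCL's actual GMRES internals; the function and its parameters are made up for this example): the residual is computed and reduced on the device, but the resulting scalar has to be read back to the host so the CPU can decide whether to stop iterating.

#include "viennacl/compressed_matrix.hpp"
#include "viennacl/vector.hpp"
#include "viennacl/linalg/prod.hpp"
#include "viennacl/linalg/norm_2.hpp"

// Hypothetical convergence check inside an iterative solver:
bool converged(viennacl::compressed_matrix<double> const & A,
               viennacl::vector<double> const & x,
               viennacl::vector<double> const & b,
               double tol)
{
  viennacl::vector<double> r(b.size());
  r = viennacl::linalg::prod(A, x);   // sparse matrix-vector product, runs on the GPU
  r = b - r;                          // residual update, still on the GPU

  // norm_2() reduces on the device, but the scalar result is transferred to the
  // host for the comparison -> one device-to-host copy per convergence check:
  double res_norm = viennacl::linalg::norm_2(r);
  return res_norm < tol;
}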