From: Karl R. <ru...@iu...> - 2021-09-12 08:57:36
Hello,

thank you for describing your application in great detail!

With a system size of 2000 to 50000 unknowns you are most likely better off staying on the CPU (assuming that your system is indeed rather sparse, with fewer than about 100 nonzeros per row on average). This is because each GPU kernel launch involves a couple of microseconds of latency; this doesn't sound like much, but it accumulates over many kernel launches.

Also, with multiple right hand sides I recommend computing a sparse LU factorization (PARDISO, SuperLU, etc.) and then applying this factorization to each of the right hand sides. This will be more efficient than calling iterative solvers (which is the standard approach for GPUs). Sparse factorizations on the GPU don't really work that well and, to the best of my knowledge, just match those on equally powerful CPUs (with similar energy consumption).

Regarding symmetry: You can use the symmetry to compute a sparse Cholesky factorization instead of an LU factorization. This, again, fits better onto a CPU than a GPU.

Overall, I *think* that you can use the same parallelization approaches (esp. data structures) as for the GPU to also speed up your CPU code (OpenMP, MPI, etc.). In terms of solving these systems, sparse direct solvers on the CPU will be hard to beat at the system sizes you mentioned. Productivity-wise, your best option is most likely to stay with the CPU and not worry about GPUs for this particular problem.

Best regards,
Karli

On 9/10/21 15:28, Arno Gehrer wrote:
> Good afternoon!
>
> Maybe you can support me to find out if it would make sense to apply
> ViennaCL to my problem?
>
> Background:
>
> · In the context of a reverse engineering problem I need to solve a
>   linear system of equations. The number of unknowns is in the range
>   of n = 2000 … 50000 and the system needs to be solved a lot of times
>   within an iteration loop.
>
> · The matrix is symmetric, hence only the upper triangle is stored in
>   CSR format.
>
> · I need to solve this system with multiple right hand side vectors.
>
> · At present, I'm using Intel MKL / PARDISO to solve the linear system
>   with mtype = 2 (real and symmetric positive definite) or -2 (in some
>   cases, the matrix is real and symmetric indefinite), which works
>   very well.
>
> · Recently, I managed to speed up the whole algorithm by setting up the
>   system on the GPU with CUDA and I'm looking for a suitable library to
>   solve the system on the GPU as well.
>
>   o I have already tried to solve the system with cusparse (using
>     cusolverSpDcsrlsvchol or cusolverSpDcsrlsvqr), which in principle
>     worked. I have faced the problem that I did not find a possibility
>     to simultaneously solve multiple right hand sides, and the
>     symmetric property is also not supported for cusolverSp. So I had
>     to extend the matrix to a full matrix and solve the system for
>     each rhs, which in total was much slower than solving the system
>     on the CPU by means of PARDISO.
>
> So, after this lengthy introduction, my question is:
>
> Is it possible to apply ViennaCL to such a problem and can I expect a
> significant speed up compared to MKL?
>
> · The perfect solution would be if I could directly transfer the matrix
>   in CSR format and the rhs vectors (which are all stored in GPU
>   memory) to a suitable solver that replaces PARDISO, mtype 2/-2 (I
>   currently copy these data to the host and pass them to PARDISO).
>
> My environment for development is Win10 (x64) / Visual Studio 2019 /
> MKL 2017 / CUDA 11.2, and the code also compiles on Linux where
> CUDA 7.5 is installed.
>
> Thanks for your feedback,
>
> Arno Gehrer
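
The main recommendation in the reply, factoring the matrix once and reusing that factorization for every right hand side, can be illustrated with a minimal sketch. It is not PARDISO and not ViennaCL: it is a small dense Cholesky factorization in plain Python, standing in for a sparse direct solver. The function names `cholesky_factor` and `solve_with_factor` and the tridiagonal test matrix are assumptions made up for this illustration. The sketch only covers the symmetric positive definite case (mtype = 2); the symmetric indefinite case (mtype = -2) would need an LDL^T factorization with pivoting instead.

```python
import math


def cholesky_factor(a):
    """Return the lower-triangular factor L with A = L * L^T.

    `a` is a dense symmetric positive definite matrix as a list of rows.
    This is the expensive step and is done once per matrix.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def solve_with_factor(l, b):
    """Solve A x = b using an existing factor L (cheap, done per rhs)."""
    n = len(l)
    # Forward substitution: L y = b
    y = [0.0] * n
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    # Backward substitution: L^T x = y
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(l[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (y[i] - s) / l[i][i]
    return x


def max_residual(a, x, b):
    """Largest absolute entry of A x - b."""
    n = len(a)
    return max(abs(sum(a[i][j] * x[j] for j in range(n)) - b[i])
               for i in range(n))


# Illustrative SPD test matrix (1D Laplacian stencil), not from the thread.
n = 5
A = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0)
      for j in range(n)] for i in range(n)]

# Several right hand sides for the same matrix.
rhs_list = [
    [1.0] * n,
    [float(i) for i in range(n)],
    [1.0, 0.0, 0.0, 0.0, 0.0],
]

L = cholesky_factor(A)                                   # factor once
solutions = [solve_with_factor(L, b) for b in rhs_list]  # reuse per rhs
residuals = [max_residual(A, x, b) for x, b in zip(solutions, rhs_list)]
```

With a real sparse direct solver the structure is the same: the analysis and factorization phases run once per matrix, and only the inexpensive triangular solves are repeated for each right hand side.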