From: Martin Madaras <martin.madaras@gm...>  2012-10-03 14:00:56

Hello,

I am currently solving a problem and I would like to use ViennaCL to accelerate the solving time. However, I am not sure whether ViennaCL is capable of what I need to do. I have been trying for two days to manage it, but without results. I would be very thankful if you could help me with these questions/problems.

I am trying to solve an overdetermined linear system in the least-squares sense using QR decomposition. The system is A * x = b, where A is a non-square 2N x N matrix, x is an N x 3 matrix and b is a 2N x 3 matrix.

1) My first question is whether the library is capable of solving a linear system composed of matrices, or whether it has to be decomposed into 3 linear systems where x and b are vectors only.

2) When I try to solve this in the least-squares sense, it looks as follows:

  A * x = b
  QR * x = b
  R * x = Q^T * b
  x = R^{-1} * Q^T * b

For this, I have to create the inverse matrix R^{-1}. Is it possible to compute an inverse matrix using ViennaCL?

Thank you very much for any advice, and I am sorry if I put anything wrong here; I am not really a mathematician, only a programmer who has to solve a linear system ;)
From: <rupp@iu...>  2012-10-03 14:49:14

Hi Martin,

yes, ViennaCL can help you with this task for about 1000 < N < 10000, depending on your hardware.

> 1) My first question is whether the library is capable of solving a
> linear system composed of matrices, or whether it has to be decomposed
> into 3 linear systems where x and b are vectors only.

Yes, it is. However, solver operations on submatrices are not yet supported in the 1.3.1 release.

> 2) When I try to solve this in the least-squares sense, it looks as
> follows:

The first thing to note is that most time is spent in the QR factorization (for reasonably large problem sizes). Let's go through it step by step:

> QR * x = b

Have a look at examples/tutorial/qr.cpp. In your case,

  std::vector<ScalarType> betas = viennacl::linalg::inplace_qr(vcl_A);

should be just what you need. However, at this point the matrices Q and R are both stored within A.

> R * x = Q^T * b

To accomplish this with the current version of ViennaCL, you need to set up R and Q explicitly on the CPU:

  viennacl::copy(vcl_A, ublas_A); //copy ViennaCL matrix to uBLAS
  Q.clear(); R.clear();
  viennacl::linalg::recoverQ(ublas_A, betas, Q, R);

You may then get Q^T b directly on the CPU via

  boost::numeric::ublas::vector<T> QTb = prod(trans(Q), b);

There will be further improvements on doing this part directly on the GPU in one of the next releases, probably already with the upcoming 1.4.0 release. However, I can't promise anything right now...

> x = R^{-1} * Q^T * b

Here it is important to keep in mind that R is an upper triangular matrix with unit diagonal. Thus, instead of forming the inverse R^{-1} explicitly, you should launch a triangular solver. As the matrices Q and R are already on the CPU, just use uBLAS directly:

  ublas::solve(R, QTb, ublas::unit_upper_tag());

> Thank you very much for any advice,
> and I am sorry if I put anything wrong here; I am not really a
> mathematician, only a programmer who has to solve a linear system ;)

I hope that helps. I haven't tested the suggested code lines, so I hope I haven't missed any details. Also, don't worry about a lack of math skills; there are also enough (too many?) math guys with a lack of programming skills... ;)

Best regards,
Karli
From: Michael Lehn <michael.lehn@un...>  2012-10-03 14:54:37

Hi Karl,

>> x = R^{-1} * Q^T * b
>
> Here it is important to keep in mind that R is an upper triangular
> matrix with unit diagonal. Thus, instead of forming the inverse R^{-1}
> explicitly, you should launch a triangular solver. As the matrices Q
> and R are already on the CPU, just use uBLAS directly:
>   ublas::solve(R, QTb, ublas::unit_upper_tag());

I think R is upper triangular with a *non*-unit diagonal. Or is this special to ViennaCL?

Cheers,
Michael
From: <rupp@iu...>  2012-10-03 15:02:06

Hi Michael,

oh, yes, my bad. R has a non-unit diagonal, so you should use

  ublas::solve(R, QTb, ublas::upper_tag());

Instead, the Householder reflectors are scaled such that their 'diagonal' is unity. This, however, is no longer relevant once Q is recovered explicitly. Thanks for pointing that out.

Best regards,
Karli
From: Michael Lehn <michael.lehn@un...>  2012-10-03 15:37:26

Hi Karl and others,

I am planning to do something similar to MAGMA [1] with FLENS [2]. For the GPU computations I also want to use ViennaCL. So I am extending FLENS with matrix types that "live on the GPU". When using ViennaCL as backend, these are basically wrappers for your matrix/vector types. But if you are interested, I could also write an interface that allows conversion between FLENS and ViennaCL matrix/vector types back and forth (like for Eigen).

Cheers,
Michael

[1] http://icl.cs.utk.edu/magma/index.html
[2] http://flens.sf.net
From: <rupp@iu...>  2012-10-03 15:52:35

Hi Michael,

> I am planning to do something similar to MAGMA [1] with FLENS [2].
> For the GPU computations I also want to use ViennaCL. So I am
> extending FLENS with matrix types that "live on the GPU". When using
> ViennaCL as backend, these are basically wrappers for your
> matrix/vector types.

Cool! :)

> But if you are interested, I could also write an interface that
> allows conversion between FLENS and ViennaCL matrix/vector types
> back and forth (like for Eigen).

Oh yes, that would be great. As you know the internal data structures of FLENS well, you can most likely avoid some of the unnecessary copies to a separate memory buffer that are floating around in some of the current viennacl::copy() routines. I'm looking forward to your contribution :)

Best regards,
Karli
From: Karl Rupp <rupp@iu...>  2012-10-11 02:30:16

Hi Martin,

> Thank you very much for your answers, it helped me a lot; I finally
> made it run with good results.

Great, I'm glad I could help...

> However, I have found some interesting time complexities. You mentioned
> that the most expensive part of the QR decomposition and least-squares
> solve is inplace_qr(), which is therefore parallelized on the GPU.
>
> My running times look as follows for N = 2300:
>
>   inplace_qr on CPU : 12.849 sec
>   inplace_qr on GPU :  1.491 sec
>   recover QR        : 34.892 sec

Which compiler flags did you use? Have you set the NDEBUG preprocessor constant? Without it, you will get rather poor performance with uBLAS.

> With N = 15000:
>
>   inplace_qr on GPU :  15.704 sec
>   recover QR        : 870.796 sec

The good scaling of the GPU is mostly due to the smaller relative overhead at larger problem sizes; that's why you get almost linear scaling. The QR recovery timings suggest that you haven't set NDEBUG...

> It looks like in my case the most expensive part is not inplace_qr(),
> but the recovery of the Q and R matrices, which is done on the CPU.
> The question is, isn't it possible to run this function on the GPU
> using your library as well? Or is this problem hard to parallelize?

The computation is not entirely trivial, but a GPU parallelization is nevertheless possible. As I'm aiming at providing a least-squares example with 1.4.0, there will be a GPU version available soon.

> And one more question: you said "1000 < N < 10000". Is the maximum
> matrix size limited by the maximum texture size supported by the GPU,
> or by GPU memory only? Is it possible to find these limits of my GPU
> by running some command?

You can get device information from the 'viennacl-info' example. The limitation stems from the main memory available on the GPU, which is currently capped at 3-6 GB. Also, the maximum allocatable buffer size on the GPU may be lower than the physical memory available.
Best regards,
Karli