I'm using the conjugate gradient solver to with a dense, symmetric 70x70 matrix of doubles. e.g.
k = viennacl::linalg::solve(m, k, tag);
I'm using ViennaCL 1.4.1 and an NVIDIA K20 GPU.
I've found that I get significantly different solutions when I use the GPU vs. the CPU (i.e. when I define VIENNACL_WITH_OPENCL).
For example, the first element of my result vector is 0.00931 on the CPU, but 0.00924 on the GPU. The CPU results are more consistent with the results I get with other numeric libraries.
The cg_tag object reports the estimated relative error after the solve. On the CPU it reports 0.0000107717157843 and on the GPU it reports 0.0000119388501763. I'm not quite sure how to interpret these numbers. If I interpret them as relative (or even absolute) error for any element, the actual error is much larger.
For both CPU and GPU cg_tag reports that the tolerance is 1e-8.
Is this expected? Can I improve the quality of the results by tuning the cg_tag() parameters?
I'll try to build a simple example which demonstrates this.
the numbers you get suggest that your system matrix is presumably rather poorly conditioned. Try to set the relative tolerance to 1e-12 and let me know if the difference in the solution remains.
The meaning of 'relative tolerance' is that the residual has changed in magnitude by the specified value. Since the relative error in the solution x is essentially proportional to the condition number of the matrix times the reduction in the residual, you get a large error in the solution if the condition number is in the range 10^8. In such case, the behavior would indeed be expected (note that on the GPU you get slightly different results due to different rounding).
Another note: 70x70 is way to small for the GPU to provide good performance. Also, for dense matrices in the range 70x70 you presumably want to run a direct LU factorization on the CPU for best performance.
Changing the tolerance doesn't seem to make much difference. I tried tolerances as small as 1e-32 and don't see any significant changes in the solution.
Hmm, do you use double precision? Tolerances tighter than 1e-8 (float) and 1e-16 (double) will hardly ever be reached because of round-off errors, so the solver will only terminate because of reaching the maximum number of operatoins.
I think I may have prematurely blamed the solver. It looks like there's a bug in my own code which is only showing up when I enable OpenCL.
Ah, I see… Thanks for letting us know so that we don't have sleepless nights because of worrying about our CG implementation… ;-)
Log in to post a comment.
Sign up for the SourceForge newsletter:
You seem to have CSS turned off.
Please don't fill out this field.