Trouble compiling some example projects with cuda

ekk
2016-04-15
2016-04-29
  • ekk

    ekk - 2016-04-15

    Dear all,

    I have been meaning to use ViennaCL for solving (tall) non-square linear systems in the least-squares sense. Unfortunately, I am already running into trouble when trying to compile the least-squares-cuda example. Specifically, I get the following errors during compilation:

    .../boost_1_59_0_vs2013\boost/numeric/ublas/vector.hpp(640): error C2039: 'iterator' : is not a member of 'boost::numeric::ublas'

    ../boost_1_59_0_vs2013\boost/smart_ptr/detail/array_allocator.hpp(69): error C2146: syntax error : missing ';' before identifier 'CA'

    ../boost_1_59_0_vs2013\boost/smart_ptr/detail/array_allocator.hpp(69): error C4430: missing type specifier - int assumed. Note: C++ does not support default-int

    It does not appear to be a general problem with cuda since some of the other cuda-using example projects can be compiled successfully (such as the iterative-solver-cuda project).

    My build configuration is 64-bit, VS2013 (on Win 8.1), ViennaCL 1.7.1, Boost 1.59, CUDA 7.5.

    I added a screenshot of my CMake parameters in case there is something wrong in that department:
    http://i.imgur.com/f59jkx3.png

    On a different, but related note: I also experimented with the (non-CUDA) least-squares example project, applying the solver to matrices of sizes like 20000x50. If I am not mistaken, Option 2 (using ViennaCL types) takes considerably longer than the one just using Boost uBLAS. Is there an explanation for this, or am I probably doing something wrong?

    Your help would be much appreciated!
    Best regards,
    Ercan

  • Karl Rupp

    Karl Rupp - 2016-04-15

    Hi Ercan,

    as far as I can tell, you ran into an incompatibility of CUDA 7.5 with uBLAS. The QR factorization uses a few bits of uBLAS which are not used in other examples, so that might explain why other examples work. The CMake configuration looks good. Any chance you can try other versions of CUDA? I can also offer to refactor the example such that it no longer uses uBLAS (this is on my TODO list anyway).

    As for performance: Our current implementation of QR factorization is not particularly optimized for tall and skinny matrices. This may explain why you see better performance with uBLAS types.
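    For readers who want to see what the algorithm is doing, here is a minimal plain-C++ CPU sketch of least squares via Householder QR (the same mathematical approach the example uses, but written without ViennaCL or uBLAS so it is self-contained). This is an illustration only, not the library's implementation; all names in it are made up for this sketch.

    ```cpp
    // Minimal CPU reference: solve min ||A x - b|| for a tall m x n
    // matrix A (m >= n) via Householder QR. Illustration only; the
    // ViennaCL example performs the same steps with library routines.
    #include <cassert>
    #include <cmath>
    #include <cstdio>
    #include <vector>

    typedef std::vector<double> Vec;
    typedef std::vector<Vec> Mat; // row-major: A[i][j]

    // Overwrites A with R (upper triangle), applies the same Householder
    // reflections to b, then back-substitutes for x.
    Vec least_squares_qr(Mat A, Vec b)
    {
      std::size_t m = A.size(), n = A[0].size();
      assert(m >= n && b.size() == m);
      for (std::size_t k = 0; k < n; ++k)
      {
        // Householder vector v annihilating column k below the diagonal:
        double norm = 0;
        for (std::size_t i = k; i < m; ++i) norm += A[i][k] * A[i][k];
        norm = std::sqrt(norm);
        double alpha = (A[k][k] > 0) ? -norm : norm;
        Vec v(m, 0.0);
        v[k] = A[k][k] - alpha;
        for (std::size_t i = k + 1; i < m; ++i) v[i] = A[i][k];
        double vtv = 0;
        for (std::size_t i = k; i < m; ++i) vtv += v[i] * v[i];
        if (vtv == 0) continue; // column already zero below the diagonal
        // Apply H = I - 2 v v^T / (v^T v) to the trailing columns and to b:
        for (std::size_t j = k; j < n; ++j)
        {
          double dot = 0;
          for (std::size_t i = k; i < m; ++i) dot += v[i] * A[i][j];
          double scale = 2.0 * dot / vtv;
          for (std::size_t i = k; i < m; ++i) A[i][j] -= scale * v[i];
        }
        double dot = 0;
        for (std::size_t i = k; i < m; ++i) dot += v[i] * b[i];
        double scale = 2.0 * dot / vtv;
        for (std::size_t i = k; i < m; ++i) b[i] -= scale * v[i];
      }
      // Back substitution on the n x n upper triangle of A:
      Vec x(n, 0.0);
      for (std::size_t k = n; k-- > 0; )
      {
        double s = b[k];
        for (std::size_t j = k + 1; j < n; ++j) s -= A[k][j] * x[j];
        x[k] = s / A[k][k];
      }
      return x;
    }

    int main()
    {
      // Tall 4x2 system: fit a line y = c0 + c1 * t through 4 points.
      Mat A = {{1, 0}, {1, 1}, {1, 2}, {1, 3}};
      Vec b = {1, 3, 5, 7}; // exactly y = 1 + 2 t, so the residual is zero
      Vec x = least_squares_qr(A, b);
      std::printf("c0 = %.3f, c1 = %.3f\n", x[0], x[1]); // c0 = 1.000, c1 = 2.000
      return 0;
    }
    ```

    Note that for an m x n matrix the reflections touch every entry of the trailing submatrix, so the work is dominated by the m-sized inner loops; with m >> n these loops are long and thin, which is exactly the shape that is hard to keep a GPU busy with.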

    Best regards,
    Karli

  • ekk

    ekk - 2016-04-20

    Hi Karli,

    thank you for your (super-fast) response! In the meantime I have tried different versions of CUDA (6.0, 7.0, 7.5), Boost (1.60, 1.59, and 1.53), and Visual Studio (2012, 2013) in various combinations. Unfortunately, none of them resolved the issue.

    So if it isn't too much trouble, it would be really great if the CUDA example could be decoupled from uBLAS.

    With regard to the performance question: can you by any chance give an estimate of how the CUDA implementation for least-squares compares to the uBLAS and OpenCL ones respectively?

    Again, many thanks and best regards,
    Ercan

  • Karl Rupp

    Karl Rupp - 2016-04-20

    thank you for the extensive testing. Please allow for a few days to update the example and resolve the incompatibility.

    As for performance: CUDA and OpenCL almost always provide the same performance for the same kernel implementation on the same GPU. In the context of ViennaCL, you may see slightly better performance with OpenCL if you use anything involving dense matrix-matrix multiplies. This, however, is likely to be resolved soon.

    The question of CUDA/OpenCL vs. uBLAS mainly depends on your system size and has no simple answer. Rule of thumb: The larger the matrices involved and the more equal the matrix dimensions are (-> "square"), the better CUDA/OpenCL will perform.
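    A quick back-of-the-envelope calculation makes the rule of thumb concrete. Using the standard Householder QR flop count of roughly 2mn^2 - (2/3)n^3 (a textbook estimate, not a measurement of ViennaCL), a tall-skinny system does far less arithmetic per matrix element than a square one of the same total size:

    ```cpp
    // Compare arithmetic intensity of Householder QR for a tall-skinny
    // vs. a square matrix with the same number of elements. The flop
    // count ~2*m*n^2 - (2/3)*n^3 is the standard textbook estimate.
    #include <cstdio>

    double qr_flops(double m, double n)
    {
      return 2.0 * m * n * n - (2.0 / 3.0) * n * n * n;
    }

    int main()
    {
      const double shapes[][2] = {{20000, 50}, {1000, 1000}}; // both 10^6 elements
      for (const double (&s)[2] : shapes)
      {
        double elems = s[0] * s[1];
        std::printf("%6.0f x %4.0f: %12.0f flops, %7.1f flops/element\n",
                    s[0], s[1], qr_flops(s[0], s[1]), qr_flops(s[0], s[1]) / elems);
      }
      return 0;
    }
    ```

    For the 20000x50 case this gives roughly 100 flops per element, versus roughly 1300 for the 1000x1000 case, i.e. the square system offers more than ten times as much arithmetic over which to amortize the data transfer and kernel-launch overhead of CUDA/OpenCL.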

  • Karl Rupp

    Karl Rupp - 2016-04-29

    quick update: I just got to the refactoring today and found a number of non-optimal uses of data transfer. Let me fix these as well; you will hear from me on Monday.

