Hi! I am trying to solve a simple system of linear equations with a sparse matrix:
```cpp
#include <iostream>

#define VIENNACL_WITH_OPENCL
#define ENABLE_OPENCL

// Armadillo headers (disable BLAS and LAPACK to avoid linking issues)
#define ARMA_DONT_USE_BLAS
#define ARMA_DONT_USE_LAPACK
#include <armadillo>

// IMPORTANT: Must be set prior to any ViennaCL includes if you want to use ViennaCL algorithms on Armadillo objects
#define VIENNACL_WITH_ARMADILLO 1

// ViennaCL includes
#include "viennacl/vector.hpp"
#include "viennacl/matrix.hpp"
#include "viennacl/compressed_matrix.hpp"
#include "viennacl/linalg/prod.hpp"
#include <viennacl/linalg/cg.hpp>

int main()
{
  viennacl::compressed_matrix<float> vcl_sparsemat(3, 3);
  vcl_sparsemat(0, 0) = 1;
  vcl_sparsemat(1, 1) = 1;
  vcl_sparsemat(2, 2) = 1;

  viennacl::vector<float> vcl_rhs(3);
  vcl_rhs(0) = 1;
  vcl_rhs(1) = 2;
  vcl_rhs(2) = 3;

  arma::SpMat<float> arma_sparsemat(3, 3);
  viennacl::copy(vcl_sparsemat, arma_sparsemat);
  arma_sparsemat.print("sparsemat");

  arma::Col<float> arma_rhs(3);
  viennacl::copy(vcl_rhs, arma_rhs);
  arma_rhs.print("rhs");

  viennacl::vector<float> vcl_result(3);
  vcl_result = viennacl::linalg::solve(vcl_sparsemat, vcl_rhs, viennacl::linalg::cg_tag());

  arma::Col<float> arma_result(3);
  viennacl::copy(vcl_result, arma_result);
  arma_result.print("result");

  std::cin.get();
}
```
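One way to tell whether the vector printed as "result" really solves the system is to compute the residual r = b - A x on the host and check that its norm is close to zero. The sketch below is plain C++ with no ViennaCL or Armadillo dependency; it assumes the matrix is available in CSR (compressed sparse row) form, and the names `CsrMatrix` and `residual_norm` are hypothetical, chosen only for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CSR storage: row i owns the nonzeros in
// [row_ptr[i], row_ptr[i + 1]) of col_idx/values, so row_ptr has rows + 1 entries.
struct CsrMatrix {
  std::size_t rows, cols;
  std::vector<std::size_t> row_ptr;  // start offset of each row
  std::vector<std::size_t> col_idx;  // column index of each nonzero
  std::vector<float> values;         // value of each nonzero
};

// Returns the Euclidean norm of r = b - A*x, accumulated in double precision.
double residual_norm(const CsrMatrix& A, const std::vector<float>& x,
                     const std::vector<float>& b) {
  assert(x.size() == A.cols && b.size() == A.rows);
  double sum = 0.0;
  for (std::size_t i = 0; i < A.rows; ++i) {
    double ax = 0.0;  // (A*x)_i
    for (std::size_t k = A.row_ptr[i]; k < A.row_ptr[i + 1]; ++k)
      ax += static_cast<double>(A.values[k]) * x[A.col_idx[k]];
    const double r = b[i] - ax;
    sum += r * r;
  }
  return std::sqrt(sum);
}
```

For the 3×3 identity system in the question, the exact solution (1, 2, 3) gives a residual norm of 0, while any vector noticeably different from the right-hand side gives a clearly nonzero value.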
My OS: Windows 7 Ultimate x64 (version 6.1, build 7600)
Compiler 32-bit: gcc.exe (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 5.1.0
For other info, I ran opencl-info.cpp:
# =========================================
# Platform Information
# =========================================
#
# Vendor and version: Intel(R) Corporation: OpenCL 1.1
#
# ViennaCL uses this OpenCL platform by default.
#
# Available Devices:
#
-----------------------------------------
Address Bits: 64
Available: 1
Compiler Available: 1
Endian Little: 1
Error Correction Support: 0
Execution Capabilities: CL_EXEC_KERNEL
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store
Global Mem Cache Size: 2097152 Bytes
Global Mem Cache Type: CL_READ_WRITE_CACHE
Global Mem Cacheline Size: 64 Bytes
Global Mem Size: 1702887424 Bytes
Host Unified Memory: 1
Image Support: 1
Image2D Max Height: 16384
Image2D Max Width: 16384
Image3D Max Depth: 2048
Image3D Max Height: 2048
Image3D Max Width: 2048
Local Mem Size: 65536 Bytes
Local Mem Type: CL_LOCAL
Max Clock Frequency: 350 MHz
Max Compute Units: 16
Max Constant Args: 8
Max Constant Buffer Size: 65536 Bytes
Max Mem Alloc Size: 425721856 Bytes
Max Parameter Size: 1024 Bytes
Max Read Image Args: 128
Max Samplers: 16
Max Work Group Size: 512
Max Work Item Dimensions: 3
Max Work Item Sizes: 512 512 512
Max Write Image Args: 8
Mem Base Addr Align: 1024
Min Data Type Align Size: 128 Bytes
Name: Intel(R) HD Graphics 4000
Native Vector Width char: 1
Native Vector Width short: 1
Native Vector Width int: 1
Native Vector Width long: 1
Native Vector Width float: 1
Native Vector Width double: 0
Native Vector Width half: 1
OpenCL C Version: OpenCL C 1.1
Platform: 0x637f20
Preferred Vector Width char: 1
Preferred Vector Width short: 1
Preferred Vector Width int: 1
Preferred Vector Width long: 1
Preferred Vector Width float: 1
Preferred Vector Width double: 0
Preferred Vector Width half: 1
Profile: FULL_PROFILE
Profiling Timer Resolution: 80 ns
Queue Properties: CL_QUEUE_PROFILING_ENABLE
Single FP Config: CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF
Type: GPU
Vendor: Intel(R) Corporation
Vendor ID: 32902
Version: OpenCL 1.1
Driver Version: 8.15.10.2752
ViennaCL Device Architecture: 8
ViennaCL Database Mapped Name: Intel(R) HD Graphics 4000
-----------------------------------------
###########################################
# =========================================
# Platform Information
# =========================================
#
# Vendor and version: Intel(R) Corporation: OpenCL 2.0
#
#
# Available Devices:
#
-----------------------------------------
Address Bits: 32
Available: 1
Compiler Available: 1
Endian Little: 1
Error Correction Support: 0
Execution Capabilities: CL_EXEC_KERNEL CL_EXEC_NATIVE_KERNEL
Extensions: cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing
Global Mem Cache Size: 262144 Bytes
Global Mem Cache Type: CL_READ_WRITE_CACHE
Global Mem Cacheline Size: 64 Bytes
Global Mem Size: 2147352576 Bytes
Host Unified Memory: 1
Image Support: 1
Image2D Max Height: 16384
Image2D Max Width: 16384
Image3D Max Depth: 2048
Image3D Max Height: 2048
Image3D Max Width: 2048
Local Mem Size: 32768 Bytes
Local Mem Type: CL_GLOBAL
Max Clock Frequency: 2400 MHz
Max Compute Units: 4
Max Constant Args: 480
Max Constant Buffer Size: 131072 Bytes
Max Mem Alloc Size: 536838144 Bytes
Max Parameter Size: 3840 Bytes
Max Read Image Args: 480
Max Samplers: 480
Max Work Group Size: 8192
Max Work Item Dimensions: 3
Max Work Item Sizes: 8192 8192 8192
Max Write Image Args: 480
Mem Base Addr Align: 1024
Min Data Type Align Size: 128 Bytes
Name: Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
Native Vector Width char: 16
Native Vector Width short: 8
Native Vector Width int: 4
Native Vector Width long: 2
Native Vector Width float: 4
Native Vector Width double: 0
Native Vector Width half: 0
OpenCL C Version: OpenCL C 2.0
Platform: 0x672b28
Preferred Vector Width char: 1
Preferred Vector Width short: 1
Preferred Vector Width int: 1
Preferred Vector Width long: 1
Preferred Vector Width float: 1
Preferred Vector Width double: 0
Preferred Vector Width half: 0
Profile: FULL_PROFILE
Profiling Timer Resolution: 427 ns
Queue Properties: CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_QUEUE_PROFILING_ENABLE
Single FP Config: CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST
Type: CPU
Vendor: Intel(R) Corporation
Vendor ID: 32902
Version: OpenCL 2.0 (Build 162)
Driver Version: 5.3.0.713
ViennaCL Device Architecture: 8
ViennaCL Database Mapped Name: Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
-----------------------------------------
###########################################
I also ran the opencl.cpp benchmark:
----------------------------------------------
Device Info
----------------------------------------------
Name: Intel(R) HD Graphics 4000
Vendor: Intel(R) Corporation
Type: GPU
Available: 1
Max Compute Units: 16
Max Work Group Size: 512
Global Mem Size: 1702887424
Local Mem Size: 65536
Local Mem Type: 1
Host Unified Memory: 1
----------------------------------------------
----------------------------------------------
## Benchmark :: OpenCL performance
----------------------------------------------
-------------------------------
# benchmarking single-precision
-------------------------------
Time for building scalar kernels: 3.29271e-005
Time for building vector kernels: 4.4508
Time for building matrix kernels: 2.57711
Time for building compressed_matrix kernels: 1.78181
Time for 100000 entry accesses on host: 0.000555058
Time per entry: 5.55058e-009
Result of operation on host: 104839
Time for 100000 entry accesses via OpenCL: 9.13609
Time per entry: 9.13609e-005
Result of operation via OpenCL: 104839
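Assuming the benchmark timings are in seconds, one entry access via OpenCL costs about 9.1e-5 s (roughly 91 µs) versus about 5.6e-9 s on the host, i.e. roughly 16,000 times slower. Element-wise assignments such as `vcl_rhs(0) = 1` are, as far as I understand ViennaCL, this kind of individual access, so for anything larger than a toy system it is usually cheaper to assemble the matrix on the host and transfer it in one go. The sketch below is plain C++ and only builds the host-side structure; `Triplet` and `assemble_rows` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical (row, column, value) entry of a sparse matrix.
struct Triplet {
  unsigned int row, col;
  float value;
};

// Accumulates triplets into one std::map per row (column -> value).
// Duplicate (row, col) pairs are summed, as is common in matrix assembly.
std::vector<std::map<unsigned int, float> >
assemble_rows(std::size_t rows, const std::vector<Triplet>& entries) {
  std::vector<std::map<unsigned int, float> > A(rows);
  for (const Triplet& t : entries) {
    assert(t.row < rows);
    A[t.row][t.col] += t.value;  // operator[] inserts 0.0f on first access
  }
  return A;
}
```

The ViennaCL documentation describes `viennacl::copy` from a `std::vector<std::map<unsigned int, T>>` into a `compressed_matrix`, so the result should be uploadable with a single call such as `viennacl::copy(A, vcl_sparsemat);`; treat that call as something to verify against the ViennaCL version in use.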
At first I built ViennaCL using CMake. When I tried to use premake (which my project uses), premake did not find OpenCL the way CMake did. I could not figure out where CMake had found the OpenCL library, so I installed the Intel OpenCL SDK and linked against OpenCL.lib from C:\Program Files (x86)\Intel\OpenCL SDK\5.3\lib\x86. I am not sure whether that is the right way to do it. In the meantime, I am trying to find out where CMake found this library.
Last edit: Dmitrii 2017-05-02
CG only works for symmetric, positive definite matrices. The modified matrix you provide is not symmetric, hence CG computes a vector that is not a true solution.
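To illustrate why a nonsymmetric matrix breaks CG, here is a textbook conjugate gradient in plain C++ on small dense matrices (a standalone sketch, not ViennaCL's implementation; `Vec`, `Mat`, `matvec` and `conjugate_gradient` are hypothetical names). In exact arithmetic, CG on an n×n symmetric positive definite matrix converges in at most n iterations; on a nonsymmetric matrix it still returns a vector, but nothing guarantees that vector solves the system. For nonsymmetric systems ViennaCL also provides other Krylov solvers, such as BiCGStab (`viennacl::linalg::bicgstab_tag`) and GMRES (`viennacl::linalg::gmres_tag`).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // dense, row-major; fine for tiny examples only

Vec matvec(const Mat& A, const Vec& x) {
  Vec y(A.size(), 0.0);
  for (std::size_t i = 0; i < A.size(); ++i) {
    assert(A[i].size() == x.size());
    for (std::size_t j = 0; j < x.size(); ++j) y[i] += A[i][j] * x[j];
  }
  return y;
}

double dot(const Vec& a, const Vec& b) {
  double s = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
  return s;
}

// Textbook CG starting from x = 0. Assumes A is symmetric positive definite;
// for any other A the loop still runs, but the result may not solve A x = b.
Vec conjugate_gradient(const Mat& A, const Vec& b, int max_iter, double tol) {
  Vec x(b.size(), 0.0), r = b, p = b;
  double rs_old = dot(r, r);
  for (int k = 0; k < max_iter && std::sqrt(rs_old) >= tol; ++k) {
    const Vec Ap = matvec(A, p);
    const double pAp = dot(p, Ap);
    if (pAp == 0.0) break;  // breakdown; for SPD A, p.Ap > 0 whenever p != 0
    const double alpha = rs_old / pAp;
    for (std::size_t i = 0; i < x.size(); ++i) {
      x[i] += alpha * p[i];
      r[i] -= alpha * Ap[i];
    }
    const double rs_new = dot(r, r);
    for (std::size_t i = 0; i < p.size(); ++i)
      p[i] = r[i] + (rs_new / rs_old) * p[i];
    rs_old = rs_new;
  }
  return x;
}
```

With the symmetric positive definite matrix [[4, 1], [1, 3]] and b = (1, 2), two iterations reach (1/11, 7/11) up to rounding. With the nonsymmetric matrix [[1, 1], [0, 1]] and b = (1, 1), two iterations give approximately (1/3, 4/3) instead of the true solution (0, 1), leaving a residual norm of about 0.75.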
As for OpenCL: you can find the OpenCL paths detected by CMake in the CMake GUI under OPENCL_LIBRARY and OPENCL_INCLUDE_DIR.
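The same values are also stored in CMakeCache.txt in the build directory, as lines of the form `NAME:TYPE=VALUE`, and `cmake -LA` lists all cache entries (including advanced ones) from the command line. If you would rather look them up programmatically, a minimal C++ sketch could scan the cache file; `find_cache_value` is a hypothetical helper, and it assumes the usual unquoted `NAME:TYPE=VALUE` layout.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

// Returns the value of a CMake cache entry such as "OPENCL_LIBRARY" read from
// a CMakeCache.txt stream, or an empty string if the entry is not present.
// Assumes lines of the form NAME:TYPE=VALUE; comment lines never match.
std::string find_cache_value(std::istream& cache, const std::string& name) {
  assert(!name.empty());
  const std::string prefix = name + ":";
  std::string line;
  while (std::getline(cache, line)) {
    if (!line.empty() && line.back() == '\r') line.pop_back();  // CRLF files
    if (line.compare(0, prefix.size(), prefix) != 0) continue;  // other entry
    const std::string::size_type eq = line.find('=', prefix.size());
    if (eq != std::string::npos) return line.substr(eq + 1);
  }
  return std::string();
}
```

Opening the build directory's CMakeCache.txt with `std::ifstream` and calling `find_cache_value(file, "OPENCL_LIBRARY")` should print the library path CMake used, which can then be reused in the premake project.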
Hi! I am trying to solve a simple system of linear equations with a sparse matrix:
Output:
Where did I go wrong?
Hi Dmitrii,
the following stripped-down example produces the expected result:
Output:
I'm still investigating what the problem with your version is.
Best regards,
Karli
I get correct results with your code snippet as well:
Which hardware and OS are you running on? My only explanation is that there is a problem in a device-specific compute kernel.
Best regards,
Karli
Hi Karl, thank you for the response! The stripped-down version fails for me as well:
However, with a small modification to my original code, where I omit the conversions to ViennaCL types and simply make this call:
I get the correct result. Adding another element, however, does not work as well:
and if I set
then the solution is even further off.
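Adding a single off-diagonal element without its mirror entry makes the matrix nonsymmetric, which CG does not support. A quick host-side check before calling the solver can catch this. The sketch below is plain C++ and works on a row-wise `std::vector<std::map<unsigned int, float>>`; the alias `SparseRows` and the function `is_symmetric` are hypothetical. Note that symmetry is necessary but not sufficient: CG also needs the matrix to be positive definite, which this check does not test.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical row-wise sparse storage: A[i] maps column j -> A(i, j).
using SparseRows = std::vector<std::map<unsigned int, float> >;

// Returns true if A(i, j) equals A(j, i) within tol for every stored entry.
// A missing entry counts as zero, so a lone A(0, 1) = 1 without a matching
// A(1, 0) is reported as nonsymmetric.
bool is_symmetric(const SparseRows& A, float tol = 0.0f) {
  for (std::size_t i = 0; i < A.size(); ++i) {
    for (const auto& entry : A[i]) {
      const std::size_t j = entry.first;
      assert(j < A.size());  // a square matrix is expected
      float mirror = 0.0f;
      const auto it = A[j].find(static_cast<unsigned int>(i));
      if (it != A[j].end()) mirror = it->second;
      if (std::fabs(entry.second - mirror) > tol) return false;
    }
  }
  return true;
}
```

For a diagonal matrix the check passes; after setting only A(0, 1) it fails, and it passes again once A(1, 0) is set to the same value.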