ViennaCL / Discussion / General Discussion: Simple sparse matrix example does not work [Armadillo][OpenCL]

Hi! I am trying to solve a simple system of linear equations with a sparse matrix:

#include <iostream>

#define VIENNACL_WITH_OPENCL
#define ENABLE_OPENCL

// Armadillo headers (disable BLAS and LAPACK to avoid linking issues)
#define ARMA_DONT_USE_BLAS
#define ARMA_DONT_USE_LAPACK
#include <armadillo>

// IMPORTANT: Must be set prior to any ViennaCL includes if you want to use ViennaCL algorithms on Armadillo objects
#define VIENNACL_WITH_ARMADILLO 1

// ViennaCL includes
#include "viennacl/vector.hpp"
#include "viennacl/matrix.hpp"
#include "viennacl/compressed_matrix.hpp"
#include "viennacl/linalg/prod.hpp"
#include <viennacl/linalg/cg.hpp>

int main()
{
    viennacl::compressed_matrix<float> vcl_sparsemat(3, 3);

    vcl_sparsemat(0, 0) = 1;
    vcl_sparsemat(1, 1) = 1;
    vcl_sparsemat(2, 2) = 1;

    viennacl::vector<float> vcl_rhs(3);

    vcl_rhs(0) = 1;
    vcl_rhs(1) = 2;
    vcl_rhs(2) = 3;

    arma::SpMat<float> arma_sparsemat(3, 3);
    viennacl::copy(vcl_sparsemat, arma_sparsemat);
    arma_sparsemat.print("sparsemat");

    arma::Col<float> arma_rhs(3);
    viennacl::copy(vcl_rhs, arma_rhs);
    arma_rhs.print("rhs");

    viennacl::vector<float> vcl_result(3);
    vcl_result = viennacl::linalg::solve(vcl_sparsemat, vcl_rhs, viennacl::linalg::cg_tag());

    arma::Col<float> arma_result(3);
    viennacl::copy(vcl_result, arma_result);
    arma_result.print("result");

    std::cin.get();
}

Output:

sparsemat
[matrix size: 3x3; n_nonzero: 3; density: 33.33%]

     (0, 0)         1.0000
     (1, 1)         1.0000
     (2, 2)         1.0000

rhs
   1.0000
   2.0000
   3.0000
result
      nan
      nan
      nan

Where did I go wrong?

Hi, Karl, thank you for the response! The stripped down version fails for me as well:

compressed_matrix of size (3, 3) with 3 nonzeros:
  (0, 0)        1
  (1, 1)        1
  (2, 2)        1

[3](1,2,3)
[3](nan,nan,nan)

However, with a small modification to my original code, where I skip the conversions to ViennaCL types and simply make this call:

arma_result = viennacl::linalg::solve(arma_sparsemat, arma_rhs, viennacl::linalg::cg_tag());

I get the expected result. Adding another nonzero element, however, does not work as well:

sparsemat
[matrix size: 3x3; n_nonzero: 4; density: 44.44%]

     (0, 0)         1.0000
     (0, 1)         0.5000
     (1, 1)         1.0000
     (2, 2)         1.0000

rhs
   1.0000
   2.0000
   3.0000
arma result
   0.0216
   2.0175
   3.0262

and if I set

viennacl::linalg::cg_tag(1e-10, 10000)

then the solution drifts even further from the correct values:

sparsemat
[matrix size: 3x3; n_nonzero: 4; density: 44.44%]

     (0, 0)         1.0000
     (0, 1)         0.5000
     (1, 1)         1.0000
     (2, 2)         1.0000

rhs
   1.0000
   2.0000
   3.0000
arma result
   0.7585
   2.4914
   3.7371
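
As a sanity check that does not depend on ViennaCL, Armadillo, or OpenCL, here is a small standalone sketch in plain C++. The names (Entry, entry_at, is_symmetric, solve_upper_triangular, identity_case, four_entry_case) are my own helpers for illustration and not part of any library. It computes the exact solution of the four-entry system by back substitution and checks whether the matrix is symmetric. The conjugate gradient method is designed for symmetric positive definite matrices, so the drift above may come from the method rather than from the device; I have not confirmed that this is the cause here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One stored nonzero of a sparse matrix in coordinate (triplet) form.
struct Entry { std::size_t row; std::size_t col; float value; };

// Look up A(r, c); entries that are not stored count as zero.
float entry_at(const std::vector<Entry>& A, std::size_t r, std::size_t c)
{
    for (const Entry& e : A)
        if (e.row == r && e.col == c) return e.value;
    return 0.0f;
}

// True if every stored entry A(i, j) matches A(j, i) within tol.
bool is_symmetric(const std::vector<Entry>& A, float tol = 1e-6f)
{
    for (const Entry& e : A)
        if (std::fabs(e.value - entry_at(A, e.col, e.row)) > tol) return false;
    return true;
}

// Solve A x = b by back substitution.
// Only valid for an upper triangular A with a nonzero diagonal.
std::vector<float> solve_upper_triangular(const std::vector<Entry>& A,
                                          const std::vector<float>& b)
{
    const std::size_t n = b.size();
    std::vector<float> x(n, 0.0f);
    for (std::size_t i = n; i-- > 0; )
    {
        float sum = b[i];
        for (std::size_t j = i + 1; j < n; ++j)
            sum -= entry_at(A, i, j) * x[j];
        const float diag = entry_at(A, i, i);
        assert(diag != 0.0f);
        x[i] = sum / diag;
    }
    return x;
}

// The two matrices from this post.
std::vector<Entry> identity_case()
{
    return {{0, 0, 1.0f}, {1, 1, 1.0f}, {2, 2, 1.0f}};
}

std::vector<Entry> four_entry_case()
{
    return {{0, 0, 1.0f}, {0, 1, 0.5f}, {1, 1, 1.0f}, {2, 2, 1.0f}};
}
```

Calling solve_upper_triangular(four_entry_case(), {1.0f, 2.0f, 3.0f}) gives (0, 2, 3), and is_symmetric(four_entry_case()) is false because (0, 1) is stored while (1, 0) is not. If the asymmetry is indeed the problem, a solver for general matrices (ViennaCL provides bicgstab_tag and gmres_tag) would be the thing to try next.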

My OS: Windows 7 Ultimate x64 (version 6.1, build 7600)
Compiler 32-bit: gcc.exe (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 5.1.0

For other info, I ran opencl-info.cpp:

# =========================================
#         Platform Information
# =========================================
#
# Vendor and version: Intel(R) Corporation: OpenCL 1.1
#
# ViennaCL uses this OpenCL platform by default.
#
# Available Devices:
#

  -----------------------------------------
Address Bits:                  64
Available:                     1
Compiler Available:            1
Endian Little:                 1
Error Correction Support:      0
Execution Capabilities:        CL_EXEC_KERNEL
Extensions:                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store
Global Mem Cache Size:         2097152 Bytes
Global Mem Cache Type:         CL_READ_WRITE_CACHE
Global Mem Cacheline Size:     64 Bytes
Global Mem Size:               1702887424 Bytes
Host Unified Memory:           1
Image Support:                 1
Image2D Max Height:            16384
Image2D Max Width:             16384
Image3D Max Depth:             2048
Image3D Max Height:            2048
Image3D Max Width:             2048
Local Mem Size:                65536 Bytes
Local Mem Type:                CL_LOCAL
Max Clock Frequency:           350 MHz
Max Compute Units:             16
Max Constant Args:             8
Max Constant Buffer Size:      65536 Bytes
Max Mem Alloc Size:            425721856 Bytes
Max Parameter Size:            1024 Bytes
Max Read Image Args:           128
Max Samplers:                  16
Max Work Group Size:           512
Max Work Item Dimensions:      3
Max Work Item Sizes:           512 512 512
Max Write Image Args:          8
Mem Base Addr Align:           1024
Min Data Type Align Size:      128 Bytes
Name:                          Intel(R) HD Graphics 4000
Native Vector Width char:      1
Native Vector Width short:     1
Native Vector Width int:       1
Native Vector Width long:      1
Native Vector Width float:     1
Native Vector Width double:    0
Native Vector Width half:      1
OpenCL C Version:              OpenCL C 1.1
Platform:                      0x637f20
Preferred Vector Width char:   1
Preferred Vector Width short:  1
Preferred Vector Width int:    1
Preferred Vector Width long:   1
Preferred Vector Width float:  1
Preferred Vector Width double: 0
Preferred Vector Width half:   1
Profile:                       FULL_PROFILE
Profiling Timer Resolution:    80 ns
Queue Properties:              CL_QUEUE_PROFILING_ENABLE
Single FP Config:              CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF
Type:                          GPU
Vendor:                        Intel(R) Corporation
Vendor ID:                     32902
Version:                       OpenCL 1.1
Driver Version:                8.15.10.2752
ViennaCL Device Architecture:  8
ViennaCL Database Mapped Name: Intel(R) HD Graphics 4000
  -----------------------------------------

###########################################

# =========================================
#         Platform Information
# =========================================
#
# Vendor and version: Intel(R) Corporation: OpenCL 2.0
#
#
# Available Devices:
#

  -----------------------------------------
Address Bits:                  32
Available:                     1
Compiler Available:            1
Endian Little:                 1
Error Correction Support:      0
Execution Capabilities:        CL_EXEC_KERNEL CL_EXEC_NATIVE_KERNEL
Extensions:                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing
Global Mem Cache Size:         262144 Bytes
Global Mem Cache Type:         CL_READ_WRITE_CACHE
Global Mem Cacheline Size:     64 Bytes
Global Mem Size:               2147352576 Bytes
Host Unified Memory:           1
Image Support:                 1
Image2D Max Height:            16384
Image2D Max Width:             16384
Image3D Max Depth:             2048
Image3D Max Height:            2048
Image3D Max Width:             2048
Local Mem Size:                32768 Bytes
Local Mem Type:                CL_GLOBAL
Max Clock Frequency:           2400 MHz
Max Compute Units:             4
Max Constant Args:             480
Max Constant Buffer Size:      131072 Bytes
Max Mem Alloc Size:            536838144 Bytes
Max Parameter Size:            3840 Bytes
Max Read Image Args:           480
Max Samplers:                  480
Max Work Group Size:           8192
Max Work Item Dimensions:      3
Max Work Item Sizes:           8192 8192 8192
Max Write Image Args:          480
Mem Base Addr Align:           1024
Min Data Type Align Size:      128 Bytes
Name:                          Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
Native Vector Width char:      16
Native Vector Width short:     8
Native Vector Width int:       4
Native Vector Width long:      2
Native Vector Width float:     4
Native Vector Width double:    0
Native Vector Width half:      0
OpenCL C Version:              OpenCL C 2.0
Platform:                      0x672b28
Preferred Vector Width char:   1
Preferred Vector Width short:  1
Preferred Vector Width int:    1
Preferred Vector Width long:   1
Preferred Vector Width float:  1
Preferred Vector Width double: 0
Preferred Vector Width half:   0
Profile:                       FULL_PROFILE
Profiling Timer Resolution:    427 ns
Queue Properties:              CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_QUEUE_PROFILING_ENABLE
Single FP Config:              CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST
Type:                          CPU
Vendor:                        Intel(R) Corporation
Vendor ID:                     32902
Version:                       OpenCL 2.0 (Build 162)
Driver Version:                5.3.0.713
ViennaCL Device Architecture:  8
ViennaCL Database Mapped Name: Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
  -----------------------------------------

###########################################

I also ran the opencl.cpp benchmark:

----------------------------------------------
               Device Info
----------------------------------------------
Name:                Intel(R) HD Graphics 4000
Vendor:              Intel(R) Corporation
Type:                GPU
Available:           1
Max Compute Units:   16
Max Work Group Size: 512
Global Mem Size:     1702887424
Local Mem Size:      65536
Local Mem Type:      1
Host Unified Memory: 1

----------------------------------------------
----------------------------------------------
## Benchmark :: OpenCL performance
----------------------------------------------

   -------------------------------
   # benchmarking single-precision
   -------------------------------
Time for building scalar kernels: 3.29271e-005
Time for building vector kernels: 4.4508
Time for building matrix kernels: 2.57711
Time for building compressed_matrix kernels: 1.78181
Time for 100000 entry accesses on host: 0.000555058
Time per entry: 5.55058e-009
Result of operation on host: 104839
Time for 100000 entry accesses via OpenCL: 9.13609
Time per entry: 9.13609e-005
Result of operation via OpenCL: 104839

At first I compiled the ViennaCL library using CMake. When I tried to use premake (which I use in my project), premake did not find OpenCL the way CMake did. I could not figure out where CMake found the OpenCL library, so I simply installed the Intel OpenCL SDK, took OpenCL.lib from C:\Program Files (x86)\Intel\OpenCL SDK\5.3\lib\x86, and linked against it. I am not sure that is the right way to do it. Meanwhile, I am trying to find out where CMake found this library.

Last edit: Dmitrii 2017-05-02
