
Simple sparse matrix example does not work [Armadillo][OpenCL]

Dmitrii
2017-05-01
2017-05-01
  • Dmitrii

    Dmitrii - 2017-05-01

    Hi! I am trying to solve a simple system of linear equations with a sparse matrix:

    #include <iostream>
    
    #define VIENNACL_WITH_OPENCL
    #define ENABLE_OPENCL
    
    // Armadillo headers (disable BLAS and LAPACK to avoid linking issues)
    #define ARMA_DONT_USE_BLAS
    #define ARMA_DONT_USE_LAPACK
    #include <armadillo>
    
    // IMPORTANT: Must be set prior to any ViennaCL includes if you want to use ViennaCL algorithms on Armadillo objects
    #define VIENNACL_WITH_ARMADILLO 1
    
    // ViennaCL includes
    #include "viennacl/vector.hpp"
    #include "viennacl/matrix.hpp"
    #include "viennacl/compressed_matrix.hpp"
    #include "viennacl/linalg/prod.hpp"
    #include <viennacl/linalg/cg.hpp>
    
    int main()
    {
        viennacl::compressed_matrix<float> vcl_sparsemat(3, 3);
    
        vcl_sparsemat(0, 0) = 1;
        vcl_sparsemat(1, 1) = 1;
        vcl_sparsemat(2, 2) = 1;
    
        viennacl::vector<float> vcl_rhs(3);
    
        vcl_rhs(0) = 1;
        vcl_rhs(1) = 2;
        vcl_rhs(2) = 3;
    
        arma::SpMat<float> arma_sparsemat(3, 3);
        viennacl::copy(vcl_sparsemat, arma_sparsemat);
        arma_sparsemat.print("sparsemat");
    
        arma::Col<float> arma_rhs(3);
        viennacl::copy(vcl_rhs, arma_rhs);
        arma_rhs.print("rhs");
    
        viennacl::vector<float> vcl_result(3);
        vcl_result = viennacl::linalg::solve(vcl_sparsemat, vcl_rhs, viennacl::linalg::cg_tag());
    
        arma::Col<float> arma_result(3);
        viennacl::copy(vcl_result, arma_result);
        arma_result.print("result");
    
        std::cin.get();
    }
    

    Output:

    sparsemat
    [matrix size: 3x3; n_nonzero: 3; density: 33.33%]
    
         (0, 0)         1.0000
         (1, 1)         1.0000
         (2, 2)         1.0000
    
    rhs
       1.0000
       2.0000
       3.0000
    result
          nan
          nan
          nan
    

    Where did I go wrong?

     
  • Karl Rupp

    Karl Rupp - 2017-05-02

    Hi Dmitrii,
    the following stripped-down example produces the expected result:

    #include <iostream>
    
    #define VIENNACL_WITH_OPENCL
    
    // ViennaCL includes
    #include "viennacl/vector.hpp"
    #include "viennacl/matrix.hpp"
    #include "viennacl/compressed_matrix.hpp"
    #include "viennacl/linalg/prod.hpp"
    #include <viennacl/linalg/cg.hpp>
    
    int main()
    {
        viennacl::compressed_matrix<float> vcl_sparsemat(3, 3);
    
        vcl_sparsemat(0, 0) = 1;
        vcl_sparsemat(1, 1) = 1;
        vcl_sparsemat(2, 2) = 1;
    
        viennacl::vector<float> vcl_rhs(3);
    
        vcl_rhs(0) = 1;
        vcl_rhs(1) = 2;
        vcl_rhs(2) = 3;
    
        std::cout << vcl_sparsemat << std::endl;
    
        std::cout << vcl_rhs << std::endl;
    
        viennacl::vector<float> vcl_result(3);
        vcl_result = viennacl::linalg::solve(vcl_sparsemat, vcl_rhs, viennacl::linalg::cg_tag());
    
        std::cout << vcl_result << std::endl;
    }
    

    Output:

    compressed_matrix of size (3, 3) with 3 nonzeros:
      (0, 0)    1
      (1, 1)    1
      (2, 2)    1
    
    [3](1,2,3)
    [3](1,2,3)
    

    I'm still investigating what the problem with your version is.

    Best regards,
    Karli

     
  • Karl Rupp

    Karl Rupp - 2017-05-02

    I get correct results with your code snippet as well:

    [matrix size: 3x3; n_nonzero: 3; density: 33.33%]
    
         (0, 0)         1.0000
         (1, 1)         1.0000
         (2, 2)         1.0000
    
    rhs
       1.0000
       2.0000
       3.0000
    result
       1.0000
       2.0000
       3.0000
    

    Which hardware and OS are you running on? My only explanation is that there is a problem in a device-specific compute kernel.

    Best regards,
    Karli

     
  • Dmitrii

    Dmitrii - 2017-05-02

    Hi Karl, thank you for the response! The stripped-down version fails for me as well:

    compressed_matrix of size (3, 3) with 3 nonzeros:
      (0, 0)        1
      (1, 1)        1
      (2, 2)        1
    
    [3](1,2,3)
    [3](nan,nan,nan)
    

    However, with a small modification to my original code, where I skip the conversion to ViennaCL types and simply call the solver on the Armadillo objects:

    arma_result = viennacl::linalg::solve(arma_sparsemat, arma_rhs, viennacl::linalg::cg_tag());
    

    I get the expected result. Adding another nonzero element, however, does not work so well:

    sparsemat
    [matrix size: 3x3; n_nonzero: 4; density: 44.44%]
    
         (0, 0)         1.0000
         (0, 1)         0.5000
         (1, 1)         1.0000
         (2, 2)         1.0000
    
    rhs
       1.0000
       2.0000
       3.0000
    arma result
       0.0216
       2.0175
       3.0262
    

    and if I set

    viennacl::linalg::cg_tag(1e-10, 10000)
    

    then the solution is even further off:

    sparsemat
    [matrix size: 3x3; n_nonzero: 4; density: 44.44%]
    
         (0, 0)         1.0000
         (0, 1)         0.5000
         (1, 1)         1.0000
         (2, 2)         1.0000
    
    rhs
       1.0000
       2.0000
       3.0000
    arma result
       0.7585
       2.4914
       3.7371
    

    My OS: Windows 7 Ultimate x64 (version 6.1, build 7600)
    Compiler 32-bit: gcc.exe (i686-posix-dwarf-rev0, Built by MinGW-W64 project) 5.1.0

    For other info, I ran opencl-info.cpp:

    # =========================================
    #         Platform Information
    # =========================================
    #
    # Vendor and version: Intel(R) Corporation: OpenCL 1.1
    #
    # ViennaCL uses this OpenCL platform by default.
    #
    # Available Devices:
    #
    
      -----------------------------------------
    Address Bits:                  64
    Available:                     1
    Compiler Available:            1
    Endian Little:                 1
    Error Correction Support:      0
    Execution Capabilities:        CL_EXEC_KERNEL
    Extensions:                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_gl_sharing cl_khr_d3d10_sharing cl_intel_dx9_media_sharing cl_khr_3d_image_writes cl_khr_byte_addressable_store
    Global Mem Cache Size:         2097152 Bytes
    Global Mem Cache Type:         CL_READ_WRITE_CACHE
    Global Mem Cacheline Size:     64 Bytes
    Global Mem Size:               1702887424 Bytes
    Host Unified Memory:           1
    Image Support:                 1
    Image2D Max Height:            16384
    Image2D Max Width:             16384
    Image3D Max Depth:             2048
    Image3D Max Height:            2048
    Image3D Max Width:             2048
    Local Mem Size:                65536 Bytes
    Local Mem Type:                CL_LOCAL
    Max Clock Frequency:           350 MHz
    Max Compute Units:             16
    Max Constant Args:             8
    Max Constant Buffer Size:      65536 Bytes
    Max Mem Alloc Size:            425721856 Bytes
    Max Parameter Size:            1024 Bytes
    Max Read Image Args:           128
    Max Samplers:                  16
    Max Work Group Size:           512
    Max Work Item Dimensions:      3
    Max Work Item Sizes:           512 512 512
    Max Write Image Args:          8
    Mem Base Addr Align:           1024
    Min Data Type Align Size:      128 Bytes
    Name:                          Intel(R) HD Graphics 4000
    Native Vector Width char:      1
    Native Vector Width short:     1
    Native Vector Width int:       1
    Native Vector Width long:      1
    Native Vector Width float:     1
    Native Vector Width double:    0
    Native Vector Width half:      1
    OpenCL C Version:              OpenCL C 1.1
    Platform:                      0x637f20
    Preferred Vector Width char:   1
    Preferred Vector Width short:  1
    Preferred Vector Width int:    1
    Preferred Vector Width long:   1
    Preferred Vector Width float:  1
    Preferred Vector Width double: 0
    Preferred Vector Width half:   1
    Profile:                       FULL_PROFILE
    Profiling Timer Resolution:    80 ns
    Queue Properties:              CL_QUEUE_PROFILING_ENABLE
    Single FP Config:              CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST CL_FP_ROUND_TO_ZERO CL_FP_ROUND_TO_INF
    Type:                          GPU
    Vendor:                        Intel(R) Corporation
    Vendor ID:                     32902
    Version:                       OpenCL 1.1
    Driver Version:                8.15.10.2752
    ViennaCL Device Architecture:  8
    ViennaCL Database Mapped Name: Intel(R) HD Graphics 4000
      -----------------------------------------
    
    ###########################################
    
    # =========================================
    #         Platform Information
    # =========================================
    #
    # Vendor and version: Intel(R) Corporation: OpenCL 2.0
    #
    #
    # Available Devices:
    #
    
      -----------------------------------------
    Address Bits:                  32
    Available:                     1
    Compiler Available:            1
    Endian Little:                 1
    Error Correction Support:      0
    Execution Capabilities:        CL_EXEC_KERNEL CL_EXEC_NATIVE_KERNEL
    Extensions:                    cl_khr_icd cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_byte_addressable_store cl_khr_spir cl_intel_exec_by_local_thread cl_khr_depth_images cl_khr_3d_image_writes cl_khr_image2d_from_buffer cl_intel_dx9_media_sharing cl_khr_dx9_media_sharing cl_khr_d3d11_sharing cl_khr_gl_sharing
    Global Mem Cache Size:         262144 Bytes
    Global Mem Cache Type:         CL_READ_WRITE_CACHE
    Global Mem Cacheline Size:     64 Bytes
    Global Mem Size:               2147352576 Bytes
    Host Unified Memory:           1
    Image Support:                 1
    Image2D Max Height:            16384
    Image2D Max Width:             16384
    Image3D Max Depth:             2048
    Image3D Max Height:            2048
    Image3D Max Width:             2048
    Local Mem Size:                32768 Bytes
    Local Mem Type:                CL_GLOBAL
    Max Clock Frequency:           2400 MHz
    Max Compute Units:             4
    Max Constant Args:             480
    Max Constant Buffer Size:      131072 Bytes
    Max Mem Alloc Size:            536838144 Bytes
    Max Parameter Size:            3840 Bytes
    Max Read Image Args:           480
    Max Samplers:                  480
    Max Work Group Size:           8192
    Max Work Item Dimensions:      3
    Max Work Item Sizes:           8192 8192 8192
    Max Write Image Args:          480
    Mem Base Addr Align:           1024
    Min Data Type Align Size:      128 Bytes
    Name:                          Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
    Native Vector Width char:      16
    Native Vector Width short:     8
    Native Vector Width int:       4
    Native Vector Width long:      2
    Native Vector Width float:     4
    Native Vector Width double:    0
    Native Vector Width half:      0
    OpenCL C Version:              OpenCL C 2.0
    Platform:                      0x672b28
    Preferred Vector Width char:   1
    Preferred Vector Width short:  1
    Preferred Vector Width int:    1
    Preferred Vector Width long:   1
    Preferred Vector Width float:  1
    Preferred Vector Width double: 0
    Preferred Vector Width half:   0
    Profile:                       FULL_PROFILE
    Profiling Timer Resolution:    427 ns
    Queue Properties:              CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE CL_QUEUE_PROFILING_ENABLE
    Single FP Config:              CL_FP_DENORM CL_FP_INF_NAN CL_FP_ROUND_TO_NEAREST
    Type:                          CPU
    Vendor:                        Intel(R) Corporation
    Vendor ID:                     32902
    Version:                       OpenCL 2.0 (Build 162)
    Driver Version:                5.3.0.713
    ViennaCL Device Architecture:  8
    ViennaCL Database Mapped Name: Intel(R) Core(TM) i3-3110M CPU @ 2.40GHz
      -----------------------------------------
    
    ###########################################
    

    I also ran the opencl.cpp benchmark:

    ----------------------------------------------
                   Device Info
    ----------------------------------------------
    Name:                Intel(R) HD Graphics 4000
    Vendor:              Intel(R) Corporation
    Type:                GPU
    Available:           1
    Max Compute Units:   16
    Max Work Group Size: 512
    Global Mem Size:     1702887424
    Local Mem Size:      65536
    Local Mem Type:      1
    Host Unified Memory: 1
    
    ----------------------------------------------
    ----------------------------------------------
    ## Benchmark :: OpenCL performance
    ----------------------------------------------
    
       -------------------------------
       # benchmarking single-precision
       -------------------------------
    Time for building scalar kernels: 3.29271e-005
    Time for building vector kernels: 4.4508
    Time for building matrix kernels: 2.57711
    Time for building compressed_matrix kernels: 1.78181
    Time for 100000 entry accesses on host: 0.000555058
    Time per entry: 5.55058e-009
    Result of operation on host: 104839
    Time for 100000 entry accesses via OpenCL: 9.13609
    Time per entry: 9.13609e-005
    Result of operation via OpenCL: 104839
    

    At first I compiled the ViennaCL library using CMake. When I tried to use premake (which I use in my project), premake did not find OpenCL the way CMake did. I could not figure out where CMake found the OpenCL library, so I simply installed the Intel OpenCL SDK, took OpenCL.lib from C:\Program Files (x86)\Intel\OpenCL SDK\5.3\lib\x86 and linked against it. I am not sure whether that is the right way to do it. Meanwhile, I am trying to find out where CMake found this library.

     

    Last edit: Dmitrii 2017-05-02
  • Karl Rupp

    Karl Rupp - 2017-05-03

    CG only works for symmetric positive definite matrices. The modified matrix you provide is not symmetric (it has an entry at (0, 1) but none at (1, 0)), hence CG computes a vector that is not a true solution.
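
    The symmetry requirement can be checked before handing a matrix to CG. The following is a minimal sketch in plain C++ (no ViennaCL or Armadillo involved), assuming a dense 3x3 representation of the modified matrix from this thread. The names Mat3, Vec3, is_symmetric, solve_upper_triangular and thread_matrix are hypothetical helpers made up for illustration, not part of any library. Since the matrix happens to be upper triangular, back substitution gives a reference solution to compare against the CG output.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical dense helper types for a 3x3 system (illustration only).
using Mat3 = std::array<std::array<double, 3>, 3>;
using Vec3 = std::array<double, 3>;

// Returns true if A equals its transpose within a small tolerance.
bool is_symmetric(const Mat3& A, double tol = 1e-12)
{
    for (std::size_t i = 0; i < 3; ++i)
        for (std::size_t j = i + 1; j < 3; ++j)
            if (std::fabs(A[i][j] - A[j][i]) > tol)
                return false;
    return true;
}

// Back substitution for an upper triangular matrix with nonzero diagonal.
Vec3 solve_upper_triangular(const Mat3& U, const Vec3& b)
{
    Vec3 x{};
    for (std::size_t i = 3; i-- > 0;) {
        assert(U[i][i] != 0.0);
        double sum = b[i];
        for (std::size_t j = i + 1; j < 3; ++j)
            sum -= U[i][j] * x[j];
        x[i] = sum / U[i][i];
    }
    return x;
}

// The modified matrix from the thread: identity plus 0.5 at (0, 1).
Mat3 thread_matrix()
{
    Mat3 A{};
    A[0] = {1.0, 0.5, 0.0};
    A[1] = {0.0, 1.0, 0.0};
    A[2] = {0.0, 0.0, 1.0};
    return A;
}
```

    With the right-hand side (1, 2, 3) from the thread, is_symmetric(thread_matrix()) returns false and back substitution yields (0, 2, 3), which neither of the two CG results above matches. For nonsymmetric systems, ViennaCL also provides BiCGStab and GMRES (viennacl::linalg::bicgstab_tag and viennacl::linalg::gmres_tag), which do not assume symmetry.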

    As for OpenCL: You can find the OpenCL path detected by CMake in the GUI under OPENCL_LIBRARY and OPENCL_INCLUDE_DIR.

     
