Matrix definition is a bit slow on OpenCL

Olivier
2015-12-26
2015-12-26
  • Olivier

    Olivier - 2015-12-26

    Dear everyone,

    My question concerns declaring a matrix when using OpenCL, for example:

    viennacl::matrix<float> vcl_matrix(1000,1000);

    The declaration itself (not the data transfer) takes a lot of time compared to the data transfer, and I cannot explain why.

    Thank you,

    Olivier

     
  • Karl Rupp

    Karl Rupp - 2015-12-26

    Hi Olivier,

    the matrix definition you mentioned does two things:
    a) it allocates the necessary memory
    b) it sets all entries to zero.

    In order to execute b), the respective kernels need to be just-in-time compiled when using OpenCL. Hence, if vcl_matrix is the first viennacl::matrix you use, the time you observe is the OpenCL kernel compilation time. The NVIDIA SDK uses some caching to keep those times small, while most other SDKs, such as those from Intel and AMD, don't cache automatically.

    To better demonstrate the effect, consider

    viennacl::matrix<float> vcl_A(1,1);     // compilation here, slow
    viennacl::matrix<float> vcl_B(1000,1000); // fast
    

    If vcl_A is the first viennacl::matrix<> you use, the instantiation of vcl_A should take longer than the one for vcl_B because of the initial just-in-time kernel compilation.
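
    As a rough illustration of this compile-once behaviour, here is a toy C++ model that needs neither OpenCL nor ViennaCL. The names ToyKernelCache, ToyMatrix and elapsed_ms are hypothetical, and the 100 ms sleep merely stands in for a real kernel compilation; it is a sketch of the idea, not how ViennaCL is implemented.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <thread>
#include <vector>

// Toy stand-in for an OpenCL runtime (hypothetical, not ViennaCL code):
// the first request pays a simulated compile cost, later requests do not.
class ToyKernelCache {
public:
    void ensure_compiled() {
        if (compiled_) return;  // already compiled: nothing to do
        // Assumption: 100 ms stands in for the real just-in-time compile time.
        std::this_thread::sleep_for(std::chrono::milliseconds(100));
        compiled_ = true;
        ++compile_count_;
    }
    int compile_count() const { return compile_count_; }

private:
    bool compiled_ = false;
    int compile_count_ = 0;
};

// Toy matrix whose constructor mirrors the two steps from the post.
class ToyMatrix {
public:
    ToyMatrix(std::size_t rows, std::size_t cols, ToyKernelCache& cache)
        : data_(rows * cols)                          // a) allocate memory
    {
        assert(rows > 0 && cols > 0);
        cache.ensure_compiled();                      // compile on first use only
        // b) set all entries to zero. std::vector already zero-initializes;
        //    the explicit fill only mirrors the separate zeroing step.
        std::fill(data_.begin(), data_.end(), 0.0f);
    }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<float> data_;
};

// Wall-clock time of a callable in milliseconds.
template <typename F>
double elapsed_ms(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

    With a shared ToyKernelCache cache, timing elapsed_ms([&] { ToyMatrix a(1, 1, cache); }) followed by the same call for a 1000x1000 matrix should show the first construction dominated by the simulated compile step. The same helper could be wrapped around the two viennacl::matrix<float> lines above; note that real OpenCL kernel launches may be asynchronous, so a synchronization step before stopping the timer would be needed there.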

    Best regards,
    Karli

     
  • Olivier

    Olivier - 2015-12-26

    Many thanks, Karli, for your explanation. It is very clear now. I would also like to thank you for your very quick reply; it is really helpful.

    Best regards,

    Olivier

     