Matrix definition is a bit slow on OpenCL

Forum: General Discussion

Creator: Olivier

Created: 2015-12-26

Updated: 2015-12-26

Olivier - 2015-12-26

Dear everyone,

My question is about declaring a matrix when using OpenCL, for example:

viennacl::matrix<float> vcl_matrix(1000,1000);

The declaration itself (not the data transfer) takes a lot of time compared to the data transfer, and I can't explain why.

Thank you,

Olivier

Karl Rupp - 2015-12-26

Hi Olivier,

the matrix definition you mentioned does two things:
a) it allocates the necessary memory
b) it sets all entries to zero.

To execute b), the respective kernels need to be just-in-time compiled when using OpenCL. Hence, if vcl_matrix is the first viennacl::matrix you use, the time you observe is the OpenCL kernel compilation time. The NVIDIA SDK uses some caching to keep those times small, while most other SDKs, such as those from Intel and AMD, don't cache automatically.

To better demonstrate the effect, consider

viennacl::matrix<float> vcl_A(1,1);       // compilation here, slow
viennacl::matrix<float> vcl_B(1000,1000); // fast

If vcl_A is the first viennacl::matrix<> you use, the instantiation of vcl_A should take longer than the one for vcl_B because of the initial just-in-time kernel compilation.
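To make the "first use pays, later uses don't" pattern concrete, here is a minimal timing sketch. It deliberately does not use ViennaCL or OpenCL, so it runs anywhere: FakeBackend, FakeMatrix, elapsed_ms and compare_first_and_second are hypothetical names invented for this illustration, and the 20 ms "compile" delay is an arbitrary placeholder, not a measured value.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an OpenCL backend (NOT the ViennaCL API).
// It counts how often the simulated just-in-time compilation runs.
struct FakeBackend {
    int  compile_count = 0;
    bool kernels_ready = false;

    void ensure_kernels_compiled() {
        if (kernels_ready) return;  // later calls skip the expensive step
        // Simulated compile cost: busy-wait for an arbitrary 20 ms.
        auto until = std::chrono::steady_clock::now() + std::chrono::milliseconds(20);
        while (std::chrono::steady_clock::now() < until) {}
        ++compile_count;
        kernels_ready = true;
    }
};

// Hypothetical stand-in for viennacl::matrix<float>.
class FakeMatrix {
public:
    FakeMatrix(FakeBackend& backend, std::size_t rows, std::size_t cols) {
        backend.ensure_kernels_compiled();  // zeroing needs a compiled kernel
        data_.assign(rows * cols, 0.0f);    // allocate and set all entries to zero
    }
    std::size_t size() const { return data_.size(); }

private:
    std::vector<float> data_;
};

// Runs a callable and returns the elapsed wall-clock time in milliseconds.
template <typename F>
double elapsed_ms(F&& f) {
    auto start = std::chrono::steady_clock::now();
    f();
    auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

struct Timings {
    double first_ms;
    double second_ms;
    int    compiles;
};

// Constructs a tiny matrix first and a large one second, timing both.
Timings compare_first_and_second() {
    FakeBackend backend;
    double first  = elapsed_ms([&] { FakeMatrix a(backend, 1, 1); });
    double second = elapsed_ms([&] { FakeMatrix b(backend, 1000, 1000); });
    assert(backend.compile_count == 1);  // only the first construction compiled
    return Timings{first, second, backend.compile_count};
}
```

Calling compare_first_and_second() and printing first_ms and second_ms should show the one-time cost landing on the 1x1 matrix. To measure the real effect, the same elapsed_ms helper could wrap the two viennacl::matrix<float> constructions shown above, assuming ViennaCL is built with its OpenCL backend enabled; actual numbers will depend on the SDK and device.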

Best regards,
Karli
Olivier - 2015-12-26

Many thanks, Karli, for your explanation. It is very clear now. I would also like to thank you for your very quick reply. It is really helpful.

Best regards,

Olivier
