Dear everyone,
My question concerns matrix declarations when using OpenCL, such as:
viennacl::matrix<float> vcl_matrix(1000,1000);
The declaration itself (not the data transfer) takes a long time compared to the data transfer, and I can't explain why.
Thank you,
Olivier
Hi Olivier,
the matrix definition you mentioned does two things:
a) it allocates the necessary memory
b) it sets all entries to zero.
In order to execute b), the respective kernels need to be just-in-time compiled when using OpenCL. Hence, if vcl_matrix is the first viennacl::matrix you create, then the time you observe is the OpenCL kernel compilation time. The NVIDIA SDK uses some caching to keep those times small, while most other SDKs, such as those from Intel and AMD, don't cache automatically.
To better demonstrate the effect, consider declaring two matrices, vcl_A followed by vcl_B.
If vcl_A is the first viennacl::matrix<> you use, the instantiation of vcl_A should take longer than the one for vcl_B because of the initial just-in-time kernel compilation.
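The first-use effect can be illustrated without a GPU by a small simulation. This is a sketch under stated assumptions, not ViennaCL code: KernelCache, ToyMatrix and construct_ms are hypothetical names, the "compilation" is a fixed 50 ms sleep standing in for the OpenCL build step, and the zero-fill runs on the host. Only the structure follows the explanation above: the constructor (a) allocates, then (b) needs the zero-fill kernel, which is compiled once and reused afterwards.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>

// Hypothetical simulated cost of JIT-compiling the OpenCL program (not a measured value).
constexpr std::chrono::milliseconds kSimulatedCompileTime{50};

// Hypothetical stand-in for a per-process OpenCL program cache (not ViennaCL's API).
class KernelCache {
 public:
  static KernelCache& instance() {
    static KernelCache cache;  // created once, shared by every ToyMatrix
    return cache;
  }

  // Compiles the zero-fill "kernel" on the first request only; later calls return at once.
  void ensure_zero_kernel() {
    if (!compiled_) {
      std::this_thread::sleep_for(kSimulatedCompileTime);  // stands in for clBuildProgram
      compiled_ = true;
      ++compile_count_;
    }
  }

  int compile_count() const { return compile_count_; }

 private:
  bool compiled_ = false;
  int compile_count_ = 0;
};

// Toy matrix whose constructor mirrors the two steps: (a) allocate, (b) zero all entries.
class ToyMatrix {
 public:
  ToyMatrix(std::size_t rows, std::size_t cols)
      : size_(rows * cols), data_(new float[rows * cols]) {  // (a) uninitialized allocation
    KernelCache::instance().ensure_zero_kernel();            // (b) needs the kernel first
    std::fill_n(data_.get(), size_, 0.0f);                   // "run" the zero-fill on the host
  }

  std::size_t size() const { return size_; }
  float at(std::size_t i) const { return data_[i]; }

 private:
  std::size_t size_;
  std::unique_ptr<float[]> data_;
};

// Wall-clock time, in milliseconds, to construct one rows x cols ToyMatrix.
double construct_ms(std::size_t rows, std::size_t cols) {
  auto start = std::chrono::steady_clock::now();
  ToyMatrix m(rows, cols);
  auto stop = std::chrono::steady_clock::now();
  (void)m;
  return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling construct_ms(1000, 1000) twice, in the roles of vcl_A and vcl_B, the first call pays the simulated compilation and the second only pays for allocation and zeroing. The sketch caches within a single process only; an SDK that also caches compiled binaries on disk, as the NVIDIA SDK does according to the reply above, can make the first call cheap on later runs as well.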
Best regards,
Karli
Many thanks, Karli, for your explanation. It is very clear now. I would also like to thank you for your very quick reply; it is really helpful.
Best regards,
Olivier