Re: [ViennaCL-support] Performance reduction from 1.4.2 to 1.5.1

Hi Karl,

Thanks very much for responding so quickly and for your suggestion. It worked really well: with that change, performance surpassed the 1.4.2 value of 500 Gflops and went all the way up to 600 Gflops!

> Thank you very much for the positive feedback!

Really, I can't say enough good things about ViennaCL. I've been trying to make a go of it with OpenCL because I dislike proprietary solutions like CUDA. But as you know, the ecosystem of libraries for OpenCL is still pretty sparse compared to CUDA. So the existence of a highly specialized, high-quality library like ViennaCL is a major enabler.

Regards,
Paul

Paul Dufort, Ph.D.	
Computational Imaging Scientist
The Joint Department of Medical Imaging
Mount Sinai Hospital, University Health Network, Women's College Hospital
Room MP 14-322 Back Office, 14th Floor Main Pavilion
Department of Medical Imaging, Toronto Western Hospital
399 Bathurst Street, Toronto, ON  M5T 2S8
Cell: 647-291-6180
E-Mail: pau...@uh...

-----Original Message-----
From: Karl Rupp [mailto:ru...@iu...] 
Sent: February 13, 2014 3:49 PM
To: Dufort, Paul; 'vie...@li...'
Cc: Philippe Tillet
Subject: Re: [ViennaCL-support] Performance reduction from 1.4.2 to 1.5.1

Dear Paul,

> First, I want to say thank you very much for creating
> this library. It is extremely useful and easy to use, and I have come
> to rely on it a great deal in my research.

Thank you very much for the positive feedback!

> Now to the problem: I just upgraded from 1.4.2 to
> 1.5.1 and found that the blas3bench results have taken a substantial hit.
> Specifically, the basic dense matrix-matrix multiply consistently 
> gives me 500 Gflops on my Nvidia GTX 680 using version 1.4.1 and 
> 1.4.2, but has now dropped to 340 Gflops with 1.5.1. I've tried 
> fiddling with various things, but to no avail - it is a stable result. 
> Do you have any idea why this might be happening?

I suppose this is because of the ongoing integration of the kernel generator, which also brings a device database. So far there are only a few devices (device families) in the database; it needs to be filled incrementally. As you can see here:
https://github.com/viennacl/viennacl-dev/blob/master/viennacl/generator/autotuning/profiles.hpp
we only have reference data for a GTX 470, which we use for all other NVIDIA Fermi GPUs (this is reasonable). We couldn't include a full tuning profile for Kepler GPUs in time for the release, hence 1.5.1 uses a fallback implementation on Kepler devices.

You can try an ad-hoc change in the released version as follows:
  - Edit viennacl/generator/profiles.hpp
  - Go to about line 250 and find the 22 lines for the GTX 470.
  - Copy and paste the block, replacing
     "viennacl::ocl::Fermi" with "viennacl::ocl::Kepler", and
     "GeForce GTX 470" with "GeForce GTX 680"
This should use the same Fermi kernel on a Kepler GPU, basically reproducing the 'old' behavior from 1.4.2.

If the above doesn't work, I can only recommend using 1.4.2 until the device database is better populated.
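
For readers following along, the lookup-with-fallback behavior described above, and what the ad-hoc change does to it, can be sketched roughly as follows. This is only an illustration under stated assumptions: every name here (Architecture, GemmProfile, lookup, add_kepler_workaround, the tile_size values) is hypothetical and not taken from ViennaCL's actual profiles.hpp, whose layout may differ.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical types for illustration only; NOT ViennaCL's real data structures.
enum class Architecture { Fermi, Kepler, Unknown };

struct GemmProfile {
    std::string label;   // where the parameters came from
    unsigned tile_size;  // stand-in for the real tuning parameters
};

using DeviceKey = std::pair<Architecture, std::string>;
using ProfileDatabase = std::map<DeviceKey, GemmProfile>;

// Generic, untuned parameters used when nothing in the database matches.
inline GemmProfile fallback_profile() { return GemmProfile{"fallback", 8}; }

// A sparsely populated database: a single reference device.
inline ProfileDatabase make_database() {
    ProfileDatabase db;
    db.emplace(DeviceKey{Architecture::Fermi, "GeForce GTX 470"},
               GemmProfile{"tuned for GTX 470", 16});
    return db;
}

// Exact device first, then any device of the same architecture, then fallback.
inline GemmProfile lookup(const ProfileDatabase& db, Architecture arch,
                          const std::string& name) {
    auto it = db.find(DeviceKey{arch, name});
    if (it != db.end()) return it->second;
    for (const auto& entry : db)
        if (entry.first.first == arch) return entry.second;
    return fallback_profile();
}

// The ad-hoc change: register the GTX 470 parameters under a Kepler device.
inline void add_kepler_workaround(ProfileDatabase& db) {
    auto it = db.find(DeviceKey{Architecture::Fermi, "GeForce GTX 470"});
    if (it == db.end()) return;
    GemmProfile copy = it->second;  // copy before modifying the map
    db.emplace(DeviceKey{Architecture::Kepler, "GeForce GTX 680"}, copy);
}
```

In this sketch, a Kepler device gets the untuned fallback until add_kepler_workaround() has been called, after which it picks up the same parameters as the GTX 470, which mirrors the intent of the copy-and-paste edit.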

@Philippe: Do you know a better workaround?

Either way, thanks for reporting, Paul. We definitely need to get this performance regression fixed in the next release.

Best regards,
Karli
