From: Dufort, P. <Pau...@uh...> - 2014-02-13 21:48:09
Hi Karl,

Thanks very much for responding so quickly and for your suggestion. It worked really well: with that change, the performance surpassed the 1.4.2 value of 500 Gflops and went all the way up to 600 Gflops!

> Thank you very much for the positive feedback!

Really, I can't say enough good things about ViennaCL. I've been trying to make a go of it with OpenCL because I dislike proprietary solutions like CUDA. But as you will know, the ecosystem of libraries for OpenCL is still pretty sparse compared to CUDA. So the existence of such a highly specialized, high-quality library like ViennaCL is a major enabler.

Regards,
Paul

Paul Dufort, Ph.D.
Computational Imaging Scientist
The Joint Department of Medical Imaging
Mount Sinai Hospital, University Health Network, Women's College Hospital
Room MP 14-322 Back Office, 14th Floor Main Pavilion
Department of Medical Imaging, Toronto Western Hospital
399 Bathurst Street, Toronto, ON M5T 2S8
Cell: 647-291-6180
E-Mail: pau...@uh...

-----Original Message-----
From: Karl Rupp [mailto:ru...@iu...]
Sent: February 13, 2014 3:49 PM
To: Dufort, Paul; 'vie...@li...'
Cc: Philippe Tillet
Subject: Re: [ViennaCL-support] Performance reduction from 1.4.2 to 1.5.1

Dear Paul,

> First, I want to say thank you very much for creating this library.
> It is extremely useful and easy to use, and I have come to rely on it
> a great deal in my research.

Thank you very much for the positive feedback!

> Now to the problem: I just upgraded from 1.4.2 to 1.5.1 and found that
> the blas3bench results have taken a substantial hit. Specifically, the
> basic dense matrix-matrix multiply consistently gives me 500 Gflops on
> my Nvidia GTX 680 using version 1.4.1 and 1.4.2, but has now dropped
> to 340 Gflops with 1.5.1. I've tried fiddling with various things, but
> to no avail - it is a stable result. Do you have any idea why this
> might be happening?

I suppose that this is because of the ongoing integration of the kernel generator, which also brings a device database. So far there are only a few devices (device families) in the database; it needs to be filled incrementally. As you can see here:

https://github.com/viennacl/viennacl-dev/blob/master/viennacl/generator/autotuning/profiles.hpp

we only have reference data for a GTX 470, which we use for all other NVIDIA Fermi GPUs (this is reasonable). We couldn't include a full tuning profile for Kepler GPUs in time for the release, hence it uses a fallback implementation.

You can try an ad-hoc change in the released version as follows:
- Edit viennacl/generator/profiles.hpp
- Go to about line 250 and find the 22 lines for the GTX 470.
- Copy and paste the block, replacing "viennacl::ocl::Fermi" with "viennacl::ocl::Kepler", and "GeForce GTX 470" with "GeForce GTX 680".

This should use the same Fermi kernel on a Kepler GPU, basically reproducing the 'old' behavior from 1.4.2.

If the above doesn't work, I can only recommend using 1.4.2 until the device database is better populated.

@Philippe: Do you know a better workaround?

Either way, thanks for reporting, Paul. We definitely need to get this performance regression fixed in the next release.

Best regards,
Karli

This e-mail may contain confidential and/or privileged information for the sole use of the intended recipient. Any review or distribution by anyone other than the person for whom it was originally intended is strictly prohibited. If you have received this e-mail in error, please contact the sender and delete all copies. Opinions, conclusions or other information contained in this e-mail may not be that of the organization.