[Mplapack-devel] Accelerated double-double version of Rgemm on NVIDIA C2050
From: Maho N. <ma...@ri...> - 2012-10-13 06:02:43
Hi all,

I have just uploaded an accelerated double-double version of Rgemm for the NVIDIA C2050.
You can download it from here:
http://sourceforge.net/projects/mplapack/files/mpack/
More explicitly:
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050.20121011.tar.gz/download

The file name is Rgemm_C2050.20121011.tar.gz
The MD5 sum is a4da6bfcadef19baf692502d6236f0e6

This is a preliminary version of the double-double Rgemm, intended for benchmarking purposes.

Requirements:
* NVIDIA C2050, M2070, M2075, or M2090.
* CUDA 4.2 (CUDA 4.1 is known to have a bug).
* The SDK is assumed to be installed at /usr/local/cuda/.

How to test:
$ tar xvfz Rgemm_C2050.20121011.tar.gz
$ cd Rgemm_C2050
$ make
...
This builds Rgemm for the C2050, runs the benchmarks, and saves the results as CSV files.
By default, multiplication and addition use IEEE-style rounding.

subdir bench_all: tests square matrices of various sizes.
subdir bench_ecc_onoff: tests the ECC on/off cases. Changing the ECC configuration requires a reboot.
subdir bench_jitter: tests the jitter of the NVIDIA GPU.
subdir bench_pointerredirecting: tests the effect of pointer redirecting.
subdir bench_rectangular: tests matrix-matrix multiplication for rectangular matrices.
subdir bench_sloppy: tests the lower-accuracy methods.

How to look at the results:
All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file.
This result includes the CPU-GPU transfer time (Total), none of the matrices are transposed (NN),
and all matrices are square. The first five lines can be ignored. The next line shows
"2, 0.00007523". This means that A, B, and C are 2x2 matrices and that the measured performance
is 0.00007523 GFlops. The format of this CSV file is therefore "size, GFlops".

--quote start
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
--quote end

Notes:
The full reference version can be downloaded from http://mplapack.sourceforge.net/, and this
version of Rgemm will hopefully be integrated soon.

License:
2-clause BSD style license. See each file for details.

Citation:
* "A Fast implementation of matrix-matrix product in double-double precision on NVIDIA C2050
  and application to semidefinite programming", Maho Nakata, Yasuyoshi Takao, Shigeho Noda and
  Ryutaro Himeno, International Conference on Networking and Computing, Okinawa, Japan, 2012.
  (To appear)
* "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA,
  Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16.
  2011.
* "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).

Programmed by:
Takao, Yasuyoshi, Nakata, Maho, and RIKEN.

Enjoy,
-- 
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
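
Addendum on reading the result files: the "size, GFlops" format described under "How to look at the results" can be post-processed with a few lines of script. The following is only a minimal sketch in Python and is not part of the tarball. The function name read_bench and the inline SAMPLE text are hypothetical, chosen for illustration; the sketch assumes that every file starts with five header lines, as in the quoted example, and it skips any later line that is not a "size, GFlops" pair.

```python
import io


def read_bench(stream, header_lines=5):
    """Return a list of (size, gflops) pairs from a benchmark CSV stream.

    Assumption: the first `header_lines` lines are device information
    and can be ignored, as in the quoted Rgemm_NN_Total_bench.csv sample.
    """
    results = []
    for lineno, line in enumerate(stream):
        if lineno < header_lines:
            continue  # skip the device/mode header
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        try:
            results.append((int(parts[0]), float(parts[1])))
        except ValueError:
            continue  # skip lines whose fields are not numeric
    return results


# Hypothetical sample: the first lines of the quoted output.
SAMPLE = """device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
"""

rows = read_bench(io.StringIO(SAMPLE))
best = max(rows, key=lambda r: r[1])  # entry with the highest GFlops
print("entries:", len(rows))
print("best size:", best[0], "GFlops:", best[1])
```

For a real file, replace the io.StringIO(SAMPLE) argument with an opened file, for example open("Rgemm_NN_Total_bench.csv"), assuming the other CSV files follow the same layout.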