[Mplapack-devel] Accelerated double-double version of Rgemm on NVIDIA C2050


Hi all,

I have just uploaded an accelerated double-double version of Rgemm for the NVIDIA C2050.
You can download it from http://sourceforge.net/projects/mplapack/files/mpack/ ;
more specifically, from
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050.20121011.tar.gz/download
The file name is Rgemm_C2050.20121011.tar.gz, and its
MD5 checksum is a4da6bfcadef19baf692502d6236f0e6.

This is a preliminary version of the double-double Rgemm,
intended for benchmarking purposes.
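For readers new to double-double arithmetic, a minimal CPU sketch of what Rgemm computes in the "NN" case may help. It assumes Rgemm follows the usual BLAS dgemm convention, C := alpha*A*B + beta*C with column-major storage and leading dimensions, and uses the double-double add and multiply formulations from the QD library (TwoSum, TwoProd, renormalization). The names dd, dd_add, dd_mul and rgemm_nn_ref are hypothetical; this is not the GPU kernel in the tarball, only a slow reference that can be used to check results. TwoProd here uses Dekker's splitting instead of fma, so the sketch needs nothing beyond plain double arithmetic.

```c
/* Hypothetical double-double type: the value is hi + lo, |lo| <= ulp(hi)/2. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: s + e == a + b exactly (no ordering requirement). */
static dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return (dd){s, e};
}

/* Fast TwoSum: exact when |a| >= |b| or a == 0. */
static dd quick_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    return (dd){s, e};
}

/* Veltkamp splitting: a == *hi + *lo, each half has at most 26 significant bits. */
static void split(double a, double *hi, double *lo) {
    const double splitter = 134217729.0; /* 2^27 + 1 */
    double t = splitter * a;
    *hi = t - (t - a);
    *lo = a - *hi;
}

/* Dekker's TwoProd: p + e == a * b exactly (barring overflow/underflow). */
static dd two_prod(double a, double b) {
    double ah, al, bh, bl;
    double p = a * b;
    split(a, &ah, &al);
    split(b, &bh, &bl);
    double e = ((ah * bh - p) + ah * bl + al * bh) + al * bl;
    return (dd){p, e};
}

/* IEEE-style double-double addition (QD library formulation). */
static dd dd_add(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    dd t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);
}

/* Double-double multiplication (QD library formulation). */
static dd dd_mul(dd a, dd b) {
    dd p = two_prod(a.hi, b.hi);
    p.lo += a.hi * b.lo + a.lo * b.hi;
    return quick_two_sum(p.hi, p.lo);
}

/* Hypothetical CPU reference for the "NN" case, column-major storage:
   C := alpha * A * B + beta * C, A is m x k, B is k x n, C is m x n. */
static void rgemm_nn_ref(int m, int n, int k, dd alpha,
                         const dd *A, int lda, const dd *B, int ldb,
                         dd beta, dd *C, int ldc) {
    for (int j = 0; j < n; j++) {
        for (int i = 0; i < m; i++) {
            dd acc = {0.0, 0.0};
            for (int p = 0; p < k; p++)
                acc = dd_add(acc, dd_mul(A[i + p * lda], B[p + j * ldb]));
            C[i + j * ldc] = dd_add(dd_mul(alpha, acc),
                                    dd_mul(beta, C[i + j * ldc]));
        }
    }
}
```

For example, with A = [1 1] (1x2) and B = [1; 2^-60] (2x1), plain double arithmetic gives exactly 1.0, while this sketch keeps the 2^-60 term in C[0].lo. Compile it without -ffast-math and with -ffp-contract=off, because reassociation or automatic FMA contraction breaks the error-free transformations.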

Requirements:
 * NVIDIA C2050, M2070, 2075, or 2090.
 * CUDA 4.2 (CUDA 4.1 is known to have a bug).
 * The CUDA SDK is assumed to be installed in /usr/local/cuda/.

How to test:
 $ tar xvfz Rgemm_C2050.20121011.tar.gz
 $ cd Rgemm_C2050
 $ make
 ...
 This builds Rgemm for the C2050, runs the benchmarks, and saves the
 results as CSV files.
 By default, multiplication and addition use the IEEE-style
 (accurately rounded) variants.
 subdir bench_all
     tests square matrices of various sizes.
 subdir bench_ecc_onoff
     tests with ECC on and off. Changing the ECC configuration requires a reboot.
 subdir bench_jitter
     tests the timing jitter of the NVIDIA GPU.
 subdir bench_pointerredirecting
     tests the effect of pointer redirecting.
 subdir bench_rectangular
     tests matrix-matrix multiplication for rectangular matrices.
 subdir bench_sloppy
     tests the less accurate (sloppy) methods.
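The IEEE-style default and the sloppy methods tested in bench_sloppy most likely correspond to the two double-double addition algorithms of the QD library (Hida, Li, and Bailey); that mapping is an assumption, and the function names below are hypothetical. The sketch shows where the two differ: the IEEE-style version sums the low words with a second error-free TwoSum, while the sloppy version adds them in plain double precision.

```c
/* Hypothetical double-double type: value = hi + lo. */
typedef struct { double hi, lo; } dd;

/* Knuth's TwoSum: s + e == a + b exactly. */
static dd two_sum(double a, double b) {
    double s = a + b;
    double bb = s - a;
    double e = (a - (s - bb)) + (b - bb);
    return (dd){s, e};
}

/* Fast TwoSum: exact when |a| >= |b| or a == 0. */
static dd quick_two_sum(double a, double b) {
    double s = a + b;
    double e = b - (s - a);
    return (dd){s, e};
}

/* IEEE-style addition (QD's ieee_add): hi and lo parts are each summed
   with an error-free TwoSum, then the result is renormalized twice. */
static dd dd_add_ieee(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    dd t = two_sum(a.lo, b.lo);
    s.lo += t.hi;
    s = quick_two_sum(s.hi, s.lo);
    s.lo += t.lo;
    return quick_two_sum(s.hi, s.lo);
}

/* Sloppy addition (QD's sloppy_add): the lo parts are added in plain
   double, so accuracy is lost when the hi parts cancel. */
static dd dd_add_sloppy(dd a, dd b) {
    dd s = two_sum(a.hi, b.hi);
    s.lo += a.lo + b.lo;
    return quick_two_sum(s.hi, s.lo);
}
```

With a = {1, 2^-53} and b = {-1, 2^-113}, the high words cancel exactly; the IEEE-style version returns the exact sum {2^-53, 2^-113}, while the sloppy version returns {2^-53, 0} because 2^-53 + 2^-113 rounds to 2^-53 in double precision. The sloppy variant trades this loss under cancellation for fewer floating-point operations.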

How to look at the results:
All results are stored in CSV files. For example, open "Rgemm_NN_Total_bench.csv".
The timings in this file include the CPU-GPU transfer time (Total), neither matrix is
transposed (NN), and all matrices are square. The first five lines can be ignored. The next
line reads "2, 0.00007523", which means that A, B, and C are 2x2 matrices and the measured
performance is 0.00007523 GFlops. In other words, each data line has the format "size, Gflops".

--quote start
device_count : 1
device name -> Tesla C2050 
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
--quote end
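A small reader for these result files, assuming the layout described above (five header lines, then one "size, Gflops" pair per line), may be handy for plotting. The bench_row type and the read_bench_csv function are hypothetical; lines that do not parse as a size/GFlops pair are skipped.

```c
#include <stdio.h>

/* One benchmark sample: matrix dimension n and measured GFlops. */
typedef struct { int n; double gflops; } bench_row;

/* Reads a result file such as Rgemm_NN_Total_bench.csv.
   Skips the first `header_lines` lines, then parses "size, Gflops"
   pairs into rows[]. Returns the number of rows stored. */
static size_t read_bench_csv(FILE *fp, int header_lines,
                             bench_row *rows, size_t max_rows) {
    char line[256];
    size_t count = 0;

    for (int i = 0; i < header_lines; i++)
        if (fgets(line, sizeof line, fp) == NULL)
            return 0;

    while (count < max_rows && fgets(line, sizeof line, fp) != NULL) {
        bench_row r;
        /* Whitespace in the format matches any amount, including none. */
        if (sscanf(line, "%d , %lf", &r.n, &r.gflops) == 2)
            rows[count++] = r;
    }
    return count;
}
```

For example, opening Rgemm_NN_Total_bench.csv with fopen and calling read_bench_csv(fp, 5, rows, max_rows) yields the (size, GFlops) pairs shown in the quote above.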

Notes:
The full reference version can be downloaded from
http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be
integrated soon.

License:
 2-clause BSD style license. See each file for details.

Citation:
 * "A Fast implementation of matrix-matrix product in double-double precision
    on NVIDIA C2050 and application to semidefinite programming",
    Maho Nakata, Yasuyoshi Takao, Shigeho Noda, and Ryutaro Himeno,
    International Conference on Networking and Computing, Okinawa, Japan, 2012 (to appear).

 * "Acceleration of matrix-matrix product in double-double precision using GPU",
    Maho Nakata, Yasuyoshi Takao, Shigeho Noda, and Ryutaro Himeno,
    Keisankogakukoenkai Ronbunsyuu (Proceedings of the Conference on Computational
    Engineering and Science), Vol. 16, 2011.
 * "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).

Programmed by:
 Yasuyoshi Takao, Maho Nakata, and RIKEN.

Enjoy,
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt