Re: [Mplapack-devel] Accelerated double-double version of Rgemm on NVIDIA C2050

Note: I forgot to mention that this work has been supported by the
MS CORE 6 PROJECT:
http://www.microsoft.com/ja-jp/ijarc/core/ifp_06_j.aspx

From: Maho NAKATA <ch...@ma...>
Subject: Accelerated double-double version of Rgemm on NVIDIA C2050
Date: Fri, 28 Oct 2011 09:53:40 +0900 (JST)

> Hi all,
> 
> I have just uploaded an accelerated double-double version of Rgemm for the NVIDIA C2050.
> You can download it from http://sourceforge.net/projects/mplapack/files/mpack/ ,
> or more explicitly from
> http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
> The file name is Rgemm_C2050_20111026.tar.gz.
> 
> This is a preliminary version of the double-double Rgemm,
> intended for benchmarking purposes.
> 
> Requirements:
>  * NVIDIA C2050 or C2070.
>  * CUDA 3.2
>  * SDK assumed to be installed at /usr/local/cuda/.
> 
> How to test:
>  $ tar xvfz Rgemm_C2050_20111026.tar.gz
>  $ cd Rgemm_C2050
>  $ make
>  ...
>  This builds Rgemm for the C2050 and runs the benchmarks; the results are
>  saved as CSV files.
>  By default, multiplication and addition use the IEEE-style rounding
>  variants.
>  subdir bench_all
>      tests square matrices of various sizes.
>  subdir bench_ecc_onoff
>      tests the ECC on/off cases. Changing the ECC configuration requires a reboot.
>  subdir bench_jitter
>      tests the jitter of the NVIDIA GPU.
>  subdir bench_pointerredirecting
>      tests the effect of pointer redirecting.
>  subdir bench_rectangular
>      tests matrix-matrix multiplication for rectangular matrices.
>  subdir bench_sloppy
>      tests the less accurate (sloppy) methods.
> 
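For readers unfamiliar with the terms: a double-double number is an unevaluated sum of two doubles (hi + lo). Below is a minimal Python sketch of the standard error-free building blocks, together with a more careful and a cheaper ("sloppy") addition, written after the conventions of the QD library. It is an assumption that the kernels in this package follow the same scheme, and all function names are illustrative rather than taken from the package.

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's error-free addition: s + e equals a + b exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def quick_two_sum(a, b):
    """Cheaper error-free addition, valid only when |a| >= |b|."""
    s = a + b
    e = b - (s - a)
    return s, e

def split(a):
    """Dekker split of a double into two halves with fewer significant bits."""
    t = 134217729.0 * a  # 2**27 + 1
    hi = t - (t - a)
    return hi, a - hi

def two_prod(a, b):
    """Error-free product: p + e equals a * b exactly (barring over/underflow)."""
    p = a * b
    a_hi, a_lo = split(a)
    b_hi, b_lo = split(b)
    e = ((a_hi * b_hi - p) + a_hi * b_lo + a_lo * b_hi) + a_lo * b_lo
    return p, e

def dd_add_accurate(x, y):
    """Double-double addition, more careful variant (QD-style 'IEEE' add)."""
    s, e = two_sum(x[0], y[0])
    t, f = two_sum(x[1], y[1])
    e += t
    s, e = quick_two_sum(s, e)
    e += f
    return quick_two_sum(s, e)

def dd_add_sloppy(x, y):
    """Double-double addition, cheaper variant with a weaker error bound."""
    s, e = two_sum(x[0], y[0])
    e += x[1] + y[1]
    return quick_two_sum(s, e)

def dd_mul(x, y):
    """Double-double multiplication; the lo*lo term is dropped."""
    p, e = two_prod(x[0], y[0])
    e += x[0] * y[1] + x[1] * y[0]
    return quick_two_sum(p, e)

def dd_exact(x):
    """Exact rational value of a double-double pair, for checking results."""
    return Fraction(x[0]) + Fraction(x[1])
```

A double-double value is passed as a (hi, lo) tuple, for example `dd_add_accurate((1.0, 0.0), (1e-20, 0.0))`; `dd_exact` lets you compare a result against exact rational arithmetic.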
> How to look at the results:
> All results are in CSV files. For example, please open "Rgemm_NN_Total_bench.csv".
> This result includes the CPU-GPU transfer time (Total), none of the matrices is
> transposed (NN), and all matrices are square. The first five lines can be ignored.
> The next line reads "2, 0.00007523": A, B, and C are 2x2 matrices, and the
> performance is 0.00007523 GFlops. The format of this CSV file is therefore "size, GFlops".
> 
> --quote start
> device_count : 1
> device name -> Tesla C2050 
> cudareturn -> 0
> cudaGetDevice()=0
> n - n mode
> 2, 0.00007523
> 8, 0.00401315
> 15, 0.02368548
> 32, 0.18986460
> 47, 0.51354554
> 64, 1.10947988
> 65, 1.05225521
> 81, 1.77540829
> 97, 2.11975457
> ...
> --quote end
> 
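The result format described above can be read with a few lines of Python. This is only a minimal sketch: the function name `parse_rgemm_csv` and its `header_lines` parameter are hypothetical, and the sample text is copied from the quoted output.

```python
def parse_rgemm_csv(text, header_lines=5):
    """Return a list of (size, gflops) pairs, skipping the leading header lines."""
    rows = []
    for line in text.splitlines()[header_lines:]:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        try:
            rows.append((int(parts[0]), float(parts[1])))
        except ValueError:
            continue  # skip lines that are not "integer, number"
    return rows

# Sample copied from the quoted benchmark output.
SAMPLE = """\
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
"""

rows = parse_rgemm_csv(SAMPLE)
best_size, best_gflops = max(rows, key=lambda r: r[1])
print(f"{len(rows)} rows; fastest size in sample: {best_size} ({best_gflops} GFlops)")
```

To read a real result file, pass its contents instead of the sample, for example `parse_rgemm_csv(open("Rgemm_NN_Total_bench.csv").read())`.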
> Notes:
> The full reference version can be downloaded from
> http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be
> integrated soon.
> 
> License:
>  2-clause BSD style license. See each file for details.
> 
> Citation:
>  *"Acceleration of matrix-matrix product in double-double precision using GPU",
>    Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO,
>    Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
>  * "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
> 
> Programmed by:
>  Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
> 
> Enjoy,
> -- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
> http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
>