Re: [Mplapack-devel] Accelerated double-double version of Regmm on NVIDIA C2050
From: Maho N. <ch...@ma...> - 2011-10-29 03:20:29
Note: I forgot to mention that this work has been supported by the
MS CORE 6 PROJECT, http://www.microsoft.com/ja-jp/ijarc/core/ifp_06_j.aspx .

From: Maho NAKATA <ch...@ma...>
Subject: Accelerated double-double version of Regmm on NVIDIA C2050
Date: Fri, 28 Oct 2011 09:53:40 +0900 (JST)

> Hi all,
>
> I have just uploaded an accelerated double-double version of Rgemm for the NVIDIA C2050.
> You can download it from http://sourceforge.net/projects/mplapack/files/mpack/ .
> More explicitly:
> http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
> The file name is Rgemm_C2050_20111026.tar.gz.
>
> This is a preliminary version of the double-double Rgemm,
> intended for benchmarking.
>
> Requirements:
> * NVIDIA C2050 or C2070.
> * CUDA 3.2
> * The SDK is assumed to be installed at /usr/local/cuda/.
>
> How to test:
> $ tar xvfz Rgemm_C2050_20111026.tar.gz
> $ cd Rgemm_C2050
> $ make
> ...
> This builds Rgemm for the C2050 and runs the benchmarks; the results are
> saved as CSV files.
> By default, multiplication and addition use the IEEE-style rounding.
> subdir bench_all
>   tests square matrices of various sizes.
> subdir bench_ecc_onoff
>   tests the ECC on/off cases. Changing the ECC configuration requires a reboot.
> subdir bench_jitter
>   tests the jitter of the NVIDIA GPU.
> subdir bench_pointerredirecting
>   tests the effect of pointer redirecting.
> subdir bench_rectangular
>   tests matrix-matrix multiplication for rectangular matrices.
> subdir bench_sloppy
>   tests the less accurate methods.
>
> How to look at the results:
> All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv"
> file. These results include the CPU-GPU transfer time (Total), no matrix is transposed
> (NN), and all matrices are square. The first five lines can be ignored. The next line shows
> "2, 0.00007523". This means that A, B, and C are 2x2 matrices, and that the performance is
> 0.00007523 GFlops. The format of this CSV file is therefore "size, Gflops".
>
> --quote start
> device_count : 1
> device name -> Tesla C2050
> cudareturn -> 0
> cudaGetDevice()=0
> n - n mode
> 2, 0.00007523
> 8, 0.00401315
> 15, 0.02368548
> 32, 0.18986460
> 47, 0.51354554
> 64, 1.10947988
> 65, 1.05225521
> 81, 1.77540829
> 97, 2.11975457
> ...
> --quote end
>
> Notes:
> The full reference version can be downloaded from
> http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully
> be integrated soon.
>
> License:
> 2-clause BSD style license. See each file for details.
>
> Citation:
> * "Acceleration of matrix-matrix product in double-double precision using GPU",
>   Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO,
>   Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
> * "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
>
> Programmed by:
> Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
>
> Enjoy,
> -- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
> http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
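
For readers who want to post-process the benchmark output, here is a
minimal Python sketch of reading the "size, Gflops" format described in
the message. It is not part of the tarball. The function name
parse_bench and the SAMPLE string are hypothetical; the sketch assumes
that each file has exactly five header lines followed by "size, Gflops"
rows, as in the quoted sample.

```python
from typing import List, Tuple

# Sample copied from the quoted benchmark output (truncated to four rows).
SAMPLE = """\
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
"""


def parse_bench(text: str, header_lines: int = 5) -> List[Tuple[int, float]]:
    """Return (matrix size, GFlops) pairs from benchmark CSV text.

    Assumption: the first `header_lines` lines are device information
    and every remaining non-empty line has the form "size, Gflops".
    """
    rows = []
    for line in text.splitlines()[header_lines:]:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines at the end of the file
        size_str, gflops_str = line.split(",", 1)
        rows.append((int(size_str), float(gflops_str)))
    return rows


if __name__ == "__main__":
    for n, gflops in parse_bench(SAMPLE):
        print(f"{n:5d}  {gflops:.8f} GFlops")
```

To use it on a real result, read the file contents first, for example
parse_bench(open("Rgemm_NN_Total_bench.csv").read()), and adjust
header_lines if a file turns out to have a different header.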