From: Edwin Huang <ewh@st...> - 2016-05-29 00:53:40

Hello,

The code for real() and imag() in "mpcomplex.h", starting from line 141:

  mpreal real() {
    mpreal tmp;
    tmp = mpc_realref(mpc);
    return tmp;
  }

results in lost precision if mpc has a higher precision than the default used for tmp. For example, the following code:

  #include <mpcomplex.h>
  using namespace mpfr;

  int main(void)
  {
    const mp_prec_t prec = 256;
    mpreal pi(0.0, prec, MPFR_RNDN);
    pi = const_pi(prec, MPFR_RNDN);
    mpfr_printf("pi, mpreal:\t%.60Rf\n", mpfr_ptr(pi));
    mpcomplex pi_complex(0.0, prec, prec, MPC_RNDNN);
    pi_complex = pi;
    mpfr_printf("pi, mpcomplex:\t%.60Rf\n", mpfr_ptr(pi_complex.real()));
    return 0;
  }

gives this output:

  pi, mpreal:    3.141592653589793238462643383279502884197169399375105820974945
  pi, mpcomplex: 3.141592653589793115997963468544185161590576171875000000000000

I believe this may be resolved without any negative side effects by this simplification:

  mpreal real() { return mpc_realref(mpc); }

and likewise for imag().

Best,
Edwin
From: Sven Höfer <sven@sv...> - 2015-01-30 21:47:49

Hi.

I want to try mpack for solving some stiff ODEs, which have factors spanning enough orders of magnitude to make using "double" quite problematic. Therefore I compiled mpack-0.8.0 and got stuck at a linkage error in the benchmark/mblas directory:

<< snip
/bin/bash ../../libtool --mode=link g++ -o Rgemm.dd_cuda_total -L/usr/lib/nvidia-cuda-toolkit/lib64 -L/usr/lib/nvidia-cuda-toolkit/lib64 -L../../mlapack/reference -lmlapack_dd_ref -L../../mblas/optimized/dd/cuda -lmblas_dd_cuda -L/usr/lib/nvidia-cuda-toolkit/lib64 -lcudart -L../../mblas/optimized/dd -lmblas_dd -L../../. -lqd -ldl -fopenmp Rgemm_dd_cuda_total-Rgemm_dd.o
libtool: link: g++ -o .libs/Rgemm.dd_cuda_total -fopenmp Rgemm_dd_cuda_total-Rgemm_dd.o -L/usr/lib/nvidia-cuda-toolkit/lib64 -L../../mlapack/reference /home/sven/src/mpack-0.8.0/mlapack/reference/.libs/libmlapack_dd_ref.so -L../../mblas/optimized/dd/cuda /home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so -lcudart -L../../mblas/optimized/dd /home/sven/src/mpack-0.8.0/mblas/optimized/dd/.libs/libmblas_dd.so -L../../. -lqd -ldl -fopenmp
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NU_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_TL_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NU_0(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_TU_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_TU_0(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NL_0(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NL_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
collect2: error: ld returned 1 exit status
Makefile:1454: recipe for target 'Rgemm.dd_cuda_total' failed
make: *** [Rgemm.dd_cuda_total] Error 1

config: all enabled (cuda, dd, qd, mpfr, gmp, __float128) or just cuda and dd; all the numeric libs either from the system (Debian) or compiled from your package
environment: Debian 8.0 aka testing/jessie, x86_64 (amd64), gcc / g++ 4.9, cuda toolkit 6.0.1

I could test the cuda stuff on a Quadro 4000 or maybe a C2070. I don't see a reason for the errors above. Please give me a hint, because I gave up after reading the source and trying out different configs. Maybe it's related to the cuda toolkit version, but I'm not familiar with that stuff and its behaviour.

Thanks,
Sven
From: Tony Scott <scotusts@ya...> - 2014-11-23 04:57:14

Hello Nakata Maho,

I did subscribe as tcscott@... but unfortunately, because of a filter in mainland China, I cannot readily access it. I set up a forward from my gmail account to my yahoo account. I will resubscribe under my yahoo account.

But please tell me how to build and use mpack so that I can use double-double eigenvalue solvers.

best wishes
Tony

From: Nakata Maho <maho.nakata@...>
To: scotusts@...
Cc: mplapack-devel@...
Sent: Saturday, November 22, 2014 5:59 PM
Subject: Re: Adjusting qd_real?

> [...]
From: Nakata Maho <maho.nakata@gm...> - 2014-11-23 02:00:00

Hi,

Sorry, your email was deferred because it was sent from @yahoo.com... you should subscribe to the list. Anyway,

> I see previous messages claiming that double-double works but I would like to know how.
> Does the build require CUDA?

No. Internally it uses the double-double library developed by Hida et al.

I have just come back from the US, so I'm a bit tired. I'll reply next week.

Thanks
Nakata Maho

From: Tony Scott <scotusts@...>
Subject: Adjusting qd_real?
Date: Fri, 21 Nov 2014 06:41:34 +0000 (UTC)

> [...]
From: Tony Scott <scotusts@ya...> - 2014-11-21 06:41:41

Hello again,

Still no documentation, but by attrition I figured out some of my previous problems. I needed to switch on the FORTRAN wrapper for IFORT and, through a GMP test case, work out the link to the libraries. I also managed to get the examples in dd and gmp working. I got D. Bailey's qd package and enabled qd in mpack, only to find that

> mblas.h in the 'include' subdirectory has qd_real = REAL, i.e. single precision, which is NOT quad, i.e. double-double precision.

If I change qd_real to e.g. LONG DOUBLE or __FLOAT128, I can configure, but make fails with an error on the first BLAS routine it tries to compile.

I see previous messages claiming that double-double works but I would like to know how. Does the build require CUDA? Can someone tell me how to build the libraries so that I can get double-double precision for MBLAS and MLAPACK?

email is: 2793247480@...

best wishes and thanks in advance
(some documentation would also be nice :) )
From: Tony Scott <scotusts@ya...> - 2014-11-19 04:12:31

Dear Sirs,

I installed both mpack-0.6.7 (claimed to be the latest version?) and mpack-0.8.0, only to find that only the latter would build on my CentOS release 6.2 system. It created libraries in /usr/local/lib, but I find NO documentation whatsoever. There is mention of userman.pdf, and I cannot find it on the web or in the installed mpack code. I did see a posting from someone looking for it.

I saw the previous mailing lists and tried to link compiled FORTRAN code in real(16), i.e. quad precision, with Intel's ifort using:

  ifort -r16 $(FOBJS) -L/usr/local/lib -lmlapack_gmp -lmblas_gmp -lmblas_gmp_ref -lmlapack_gmp_ref -lgmp -lgmpxx -lgomp -g

but to no avail. Routine "rscal" was not recognized. I could use some assistance. Where is the doc showing how to link these libraries to a FORTRAN program?

best wishes and thank you in advance
Tony Scott (newcomer to mpack)
From: Nakata Maho <maho@ri...> - 2013-10-29 01:19:45

Birds of a Feather Session on high precision computing at Supercomputing '13
2013/11/20 (Wed) 12:15-1:15, Room 301/302/303, SC13 (Colorado Convention Center, Denver)
====================================================================
Call for Participation for Birds of a Feather Session at Supercomputing '13

We will have a BOF session on high precision computing at the upcoming Supercomputing '13. The title of the session is "High Precision Arithmetic Operations: Libraries and Applications".

Schedule and Room
Supercomputing '13, Colorado Convention Center, Denver
Conference Dates: November 17-22, 2013
The BOF will be on November 20 (Wed), 12:15-1:15, in room 301/302/303.

Scope
The emergence of large-scale and high-speed parallel computing forces us to consider, on a new level, rounding errors in repeated arithmetic operations. Many scientific and engineering problems rely on the numerical stability of the algorithms used, even with conventional "double precision" arithmetic operations. The development of truly high precision techniques is important for solving very compute-intensive problems, while an extreme solution is to implement "high precision" arithmetic in hardware. In the BOF session, researchers working on high precision arithmetic techniques and applications will meet and present recent progress. We are convinced of the importance of this opportunity through our experience in the past three workshops on high precision approaches in the scientific and engineering computing field (http://suchix.kek.jp/mpcomp/en/).

Agenda
We have asked the following three speakers to report on the latest developments.
1. M. Nakata (RIKEN)
2. D. Takahashi (University of Tsukuba)
3. D. H. Bailey (Lawrence Berkeley National Laboratory)

*Speaker Introduction
NAKATA Maho is a technical scientist at RIKEN, and also a visiting associate professor at Rikkyo University. He is the lead developer of the MPACK library, a multiple precision version of BLAS and LAPACK. He gained an interest in high precision computation while applying it to theoretical chemistry and mathematical optimization.
Daisuke Takahashi is a professor at the Faculty of Engineering, Information and Systems, University of Tsukuba. He received his Ph.D. degree in information science from the University of Tokyo. His research interests include high-performance computing.
David H. Bailey recently retired from the Lawrence Berkeley National Laboratory, and is also affiliated with the University of California, Davis. He has been a leading figure in the high-performance computing world, and also in the arena of high-precision computation.

In the rest of the session, we invite comments, questions and discussion from the floor. We will ask BOF attendees the following questions:
(a) Is your application numerically unstable in standard precision (e.g. double precision)?
(b) How many digits do you think are needed to solve your numerically unstable problem?
(c) What compiler/library/software support do we need to use high precision arithmetic in practice?
(d) Do we need hardware support for high precision arithmetic?
Also, we would like to discuss possible future collaborations and other opportunities to gather the experts on high precision arithmetic. We believe this BOF will be a very good chance for researchers working on high precision libraries and applications to share knowledge.

Session Leaders and organizers
N. Nakasato (University of Aizu) and F. Yuasa (KEK)
M. Nakata (RIKEN), H. Matsufuru and T. Ishikawa (KEK)
Contact email address: scbof@...
See also http://suchix.kek.jp/mpcomp/
From: Maho NAKATA <maho@ri...> - 2012-12-25 00:22:47

Hi,

I have just uploaded mpack version 0.8.0.
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0.tar.gz/download
https://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0.tar.gz/download
MD5 (mpack-0.8.0.tar.gz) = c2aa0cf512a5dfcf2881c69f70953c02

* The CUDA version of Rgemm in double-double precision has been integrated.
  To enable the CUDA version, pass "--enable-cuda=yes" to configure.
  Also, if multiple versions of the CUDA toolkit are installed, or if you installed it to a directory other than /usr/local/cuda/, you may want to specify it like the following:
  --with-cudatoolkithome=/usr/local/cuda5.0/

* Preliminary Intel Xeon Phi support.
  We added a preliminary version of Intel Xeon Phi support. Note that the double-double version doesn't work as expected.
** How to build
Building for Xeon Phi is done the following way, not by the usual ./configure ; make.
$ tar xvfz <somewhere>/mpack-0.8.0.tar.gz
$ cp <somewhere>/mpack-0.8.0.tar.gz .
$ cd mpack-0.8.0/misc
$ bash prepare_mpacklibs_for_mic.sh >& log.prepare_mpacklibs_for_mic.sh
$ bash build_mpack_for_mic.sh >& log.build_mpack_for_mic.sh
$ cd mpack-0.8.0 ; make install

Build and make check have passed on
* Intel Composer 13.0.1 on Linux
* gcc on Linux (Ubuntu, RedHat)
* gcc47 on FreeBSD
* gcc46 on MacOSX Lion
* CUDA 3.1, 3.2, 4.0, 4.2, 5.0 on a Linux host

This archive is exactly the same as RC2. Enjoy!
--
Nakata Maho http://nakatamaho.riken.jp/ http://blog.goo.ne.jp/nakatamaho/ http://nakatamaho.riken.jp/maho.pgp.txt http://ja.openoffice.org/
From: Maho NAKATA <maho@ri...> - 2012-12-20 10:54:59

Hi all,

I have just uploaded MPACK 0.8.0-RC2
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC2.tar.gz/download
https://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC2.tar.gz/download
MD5 (mpack-0.8.0-RC2.tar.gz) = c2aa0cf512a5dfcf2881c69f70953c02

Build fixes.

Build and make check have passed on
* Intel Composer 13.0.1 on Linux
* gcc on Linux (Ubuntu, RedHat)
* gcc47 on FreeBSD
* gcc46 on MacOSX Lion
* CUDA 3.1, 3.2, 4.0, 4.2, 5.0 on a Linux host

Best,
Nakata Maho

From: Maho NAKATA <maho@...>
Subject: MPACK 0.8.0 RC1 : CUDA support for Rgemm in double-double precision.
Date: Thu, 29 Nov 2012 12:34:41 +0900 (JST)

> [...]
From: Maho NAKATA <maho@ri...> - 2012-11-29 03:34:52

Hi all,

I have just uploaded MPACK 0.8.0-RC1
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC1.tar.gz/download

The CUDA version of Rgemm in double-double precision has been integrated. To enable the CUDA version, pass "--enable-cuda=yes" to configure. Also, if multiple versions of the CUDA toolkit are installed, or if you installed it to a directory other than /usr/local/cuda/, you may want to specify it like the following:
--with-cudatoolkithome=/usr/local/cuda5.0/

From my experience, CUDA 4.0 usually gives the best performance. You don't want to use CUDA 3.1 and 3.2. CUDA 5.0 gives the best performance when the matrix size is a multiple of 64, but is otherwise much worse than 4.0, 3.2 and 3.1.

Thanks
--
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ , GPG: http://accc.riken.jp/maho/maho.pgp.txt
From: Maho NAKATA <maho@ri...> - 2012-10-13 06:02:43

Hi all,

I have just uploaded an accelerated double-double version of Rgemm for the NVIDIA C2050. You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/
More explicitly,
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050.20121011.tar.gz/download
The file name is Rgemm_C2050.20121011.tar.gz
MD5sum is a4da6bfcadef19baf692502d6236f0e6

This is a preliminary version of double-double Rgemm, for benchmarking purposes.

Requirements:
* NVIDIA C2050, M2070, 2075, 2090.
* CUDA 4.2 (CUDA 4.1 is known to have a bug)
* SDK assumed to be installed at /usr/local/cuda/.

How to test:
$ tar xvfz Rgemm_C2050.20121011.tar.gz
$ cd Rgemm_C2050
$ make
... this builds Rgemm for the C2050, runs the benchmarks, and saves the results as CSV files.

The default for multiplication and addition is IEEE-style rounding.
subdir bench_all: tests square matrices of various sizes.
subdir bench_ecc_onoff: tests the ECC on/off cases. A change in ECC configuration requires a reboot.
subdir bench_jitter: tests the jitter of the NVIDIA GPU.
subdir bench_pointerredirecting: tests the effect of pointer redirecting.
subdir bench_rectangular: tests matrix-matrix multiplication for rectangular matrices.
subdir bench_sloppy: tests less accurate methods.

How to look at the results:
All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file. This result includes the CPU-GPU transfer time (Total), no matrix is transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523". This means A, B, and C are 2x2 matrices, and the performance is 0.00007523 GFlops. Therefore, this CSV file's format is "size, GFlops".

quote start
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
quote end

Notes:
The full reference version can be downloaded from http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.

License: 2-clause BSD style license. See each file for details.

Citation:
* "A Fast implementation of matrix-matrix product in double-double precision on NVIDIA C2050 and application to semidefinite programming", Maho Nakata, Yasuyoshi Takao, Shigeho Noda and Ryutaro Himeno, International Conference on Networking and Computing, Okinawa, Japan, 2012. (To appear)
* "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16, 2011.
* "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).

Programmed by: Takao Yasuyoshi, Nakata Maho, and RIKEN.

Enjoy,
--
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ , GPG: http://accc.riken.jp/maho/maho.pgp.txt
From: Maho NAKATA <chat95@ma...> - 2012-08-10 22:43:27

Hello

From: Akira SaiToh <akirasaitoh@...>
Subject: Re: [Mplapack-devel] possibly a bug in Cheev
Date: Sat, 11 Aug 2012 01:25:28 +0900

> Hello, Nakata-san,
>
> Thank you for your effort. With your patch, I could get
> correct eigenvalues for the previous examples. However, I

Congratulations!

> encountered another problem.

Ah :( I'm sorry to hear that.

> For the matrix
>
> 2 0 1
> 0 2 0
> 1 0 2
>
> Cheev returns correct eigenvalues, 1, 2, and 3. Then, for
> the matrix
>
> 20 0 10
> 0 20 0
> 10 0 20
>
> expected eigenvalues are 10, 20, and 30. However, Cheev
> returns 20, 20, and 20. I guess probably Cheev should be
> modified so that an appropriate scaling is internally
> performed.

Yes, many thanks for your bug report. I'll fix it, hopefully soon.

Thanks
Nakata Maho

> [...]
From: Akira SaiToh <akirasaitoh@ni...> - 2012-08-10 16:25:37

Hello, Nakata-san,

Thank you for your effort. With your patch, I could get correct eigenvalues for the previous examples. However, I encountered another problem.

For the matrix

2 0 1
0 2 0
1 0 2

Cheev returns correct eigenvalues, 1, 2, and 3. Then, for the matrix

20 0 10
0 20 0
10 0 20

expected eigenvalues are 10, 20, and 30. However, Cheev returns 20, 20, and 20. I guess Cheev should probably be modified so that an appropriate scaling is performed internally.

--------
[saitoh@... test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 2 0 1 0 2 0 1 0 2
A =
2 0 1
0 2 0
1 0 2
Eigenvalues:
1 2 3
Unitary matrix:
0.707107+i*0 0+i*0 0.707107+i*0
0+i*0 1+i*0 0+i*0
0.707107+i*0 0+i*0 0.707107+i*0
[saitoh@... test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 20 0 10 0 20 0 10 0 20
A =
20 0 10
0 20 0
10 0 20
Eigenvalues:
20 20 20
Unitary matrix:
1+i*0 4.85181e-173+i*0 0+i*0
0+i*0 1+i*0 0+i*0
0+i*0 0+i*0 1+i*0
--------

Regards,
Akira SaiToh
From: Maho NAKATA <chat95@ma...> - 2012-08-09 11:30:43

Hi Saitoh-san,

After patching, my result now seems to be correct:

$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/maho/MPACK//lib ./test 1 0 1 0 0 0 1 0 1
A =
1 0 1
0 0 0
1 0 1
Eigenvalues:
0 0 2
Unitary matrix:
0+i*0 0.707107+i*0 0.707107+i*0
1+i*0 0+i*0 0+i*0
0+i*0 0.707107+i*0 0.707107+i*0

Thanks
Nakata Maho

From: Akira SaiToh <akirasaitoh@...>
Subject: Re: [Mplapack-devel] possibly a bug in Cheev
Date: Thu, 09 Aug 2012 14:25:35 +0900

> [...]
From: Maho NAKATA <maho@ri...> - 2012-08-09 09:43:56

Hi Saitoh-san,

Please apply the following patch.

$ diff -u mlapack/reference/Rlaev2.cpp~ mlapack/reference/Rlaev2.cpp
--- Rlaev2.cpp~	2012-08-09 14:56:17.156927491 +0900
+++ Rlaev2.cpp	2012-08-09 18:37:00.635038092 +0900
@@ -142,8 +142,6 @@
 	    *cs1 = one;
 	    *sn1 = zero;
 	} else {
-	    printf("#Rlaev2 Checkpoint 13 Not checked\n");
-	    exit(1);
 	    tn = -cs / tb;
 	    *cs1 = one / sqrt(one + tn * tn);
 	    *sn1 = tn * (*cs1);

This should work.

Thanks
Nakata Maho

From: Akira SaiToh <akirasaitoh@...>
Subject: Re: [Mplapack-devel] possibly a bug in Cheev
Date: Thu, 09 Aug 2012 14:25:35 +0900

> [...]
From: Maho NAKATA <maho@ri...> - 2012-08-09 07:05:20

Hi,

Sorry for being late.

> I tried Cheev of MPACK with GMP but could not get correct results
> for some simple matrices. For example, it does not diagonalize the
> 3 x 3 matrix
>
> 1 0 1
> 0 0 0
> 1 0 1
>
> and gives me "#Rlaev2 Checkpoint 13 Not checked". It works fine
> for diag(1,0,0) and a matrix filled with 1's among others.

I have just reproduced your result!

Thanks
Nakata Maho

From: Akira SaiToh <akirasaitoh@...>
Subject: [Mplapack-devel] possibly a bug in Cheev
Date: Thu, 26 Jul 2012 19:05:41 +0900

> [...]
From: Akira SaiToh <akirasaitoh@ni...> - 2012-08-09 05:25:43

Dear Nakata-san,

Thanks for your suggestion. Indeed, the matrix in the previous email was a singular matrix. But sometimes Cheev fails to find eigenvalues for a small nonsingular Hermitian matrix as well. A typical example is

2 0 1
0 2 0
1 0 2

Its eigenvalues are 1, 2, and 3. For this matrix, Cheev stops with the message "#Rlaev2 Checkpoint 13 Not checked", just as in the previous example.

I am not sure whether this phenomenon depends on the system environment. Could somebody test Cheev with the above matrix?

Thank you for your attention.
Regards,
Akira SaiToh

2012-08-09 10:22 Maho NAKATA wrote:
> Dear Akira SaiToh,
>
> Sorry for the late reply.
> Is this still unresolved?
>
> How about trying some matrix that does not look singular?
> thanks
From: Maho NAKATA <chat95@ma...> - 2012-08-09 01:23:13

Dear Akira SaiToh,

Sorry for the late reply.
Is this still unresolved?

How about trying some matrix that does not look singular?
thanks
From: Maho NAKATA <chat95@ma...> - 2012-07-27 06:19:07

Dear Saitoh-san,

Thanks for your email. I'll investigate...
--
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ , GPG: http://accc.riken.jp/maho/maho.pgp.txt
From: Akira SaiToh <akirasaitoh@ni...> - 2012-07-26 10:21:31

Dear Dr. Maho NAKATA,

I am a postdoc working on quantum information, and the matrices I have to handle are quite often Hermitian matrices with some eigenvalues degenerate or close to each other. I am trying several libraries as well as writing my own library for the matrix calculations required in my research.

I would like to report an issue that I think is possibly a bug in MPACK. Or, it might be a problem in my code or my environment.

I tried Cheev of MPACK with GMP but could not get correct results for some simple matrices. For example, it does not diagonalize the 3 x 3 matrix

1 0 1
0 0 0
1 0 1

and gives me "#Rlaev2 Checkpoint 13 Not checked". It works fine for diag(1,0,0) and a matrix filled with 1's, among others.

I am using mpack-0.7.0 on Fedora 15, where GMP 4.3.2 exists.

Thank you for your attention.
Regards,
Akira SAITOH, NII, Japan

===========================================
[saitoh@... test_cheev]$ g++ test.cpp -L/usr/local/lib -lmlapack_gmp -lmblas_gmp -lmblas_gmp_ref -lmlapack_gmp_ref -lgmp -lmpfr -lgmpxx -lgomp -g
/usr/bin/ld: warning: libgmp.so.3, needed by /usr/lib/gcc/x86_64-redhat-linux/4.6.3/../../../../lib64/libmpfr.so, may conflict with libgmp.so.10
[saitoh@... test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 1 0 1 0 0 0 1 0 1
A =
1 0 1
0 0 0
1 0 1
#Rlaev2 Checkpoint 13 Not checked
[saitoh@... test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 1 0 0 0 0 0 0 0 0
A =
1 0 0
0 0 0
0 0 0
Eigenvalues:
0 0 1
Unitary matrix:
0+i*0 0+i*0 1+i*0
1+i*0 0+i*0 0+i*0
0+i*0 1+i*0 0+i*0
===========================================

test.cpp:

===========================================
#include <gmpxx.h>
#include <mpack/mpack_config.h>
#include <mpack/mpc_class.h>
#include <mpack/mlapack_gmp.h>
#include <cstdlib>
#include <iostream>

int main (int argc, char* argv[])
{
  mpf_set_default_prec(512);
  mpackint n = 3;
  mpc_class *A = new mpc_class [n * n];
  mpackint lda = n;
  mpf_class *w = new mpf_class [n];
  mpc_class *work = new mpc_class [2 * n - 1];
  mpackint lwork = 2 * n - 1;
  mpf_class *rwork = new mpf_class [3 * n - 2];
  mpackint info;

  // read the n*n entries (column-major) from the command line
  for (int i = 1; i < argc && i < n * n + 1; i++)
    A[i - 1] = ::atof(argv[i]);

  std::cout << "A = " << std::endl;
  for (int i = 0; i < n; i++)
    {
      for (int j = 0; j < n; j++)
        std::cout << A[i + n * j].real().get_d() << " ";
      std::cout << std::endl;
    }

  Cheev ("V", "U", n, A, lda, w, work, lwork, rwork, &info);

  std::cout << "Eigenvalues: " << std::endl;
  for (int i = 0; i < n; i++)
    std::cout << w[i].get_d() << " ";
  std::cout << std::endl;

  std::cout << "Unitary matrix:" << std::endl;
  for (int i = 0; i < n; i++)
    {
      for (int j = 0; j < n; j++)
        std::cout << A[i + n * j].real().get_d() << "+i*"
                  << A[i + n * j].imag().get_d() << " ";
      std::cout << std::endl;
    }

  // free each array separately; "delete [] A, w, work, rwork;" is a
  // comma expression that frees only A
  delete [] A;
  delete [] w;
  delete [] work;
  delete [] rwork;
  return 0;
}
===========================================
From: Maho NAKATA <maho@ri...> - 2012-06-16 08:40:46

Hi lists,

I have just uploaded MPACK 0.7.0. You can download it from here:
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.7.0/mpack0.7.0.tar.gz/download
or
https://sourceforge.net/projects/mplapack/files/mpack/mpack%200.7.0/mpack0.7.0.tar.gz/download

The MD5 and SHA256 checksums are as follows:
MD5: 77d285c83f66f196fd7e3379dd97365
SHA256: 14d11bf51f6d40c59937117d8f13095383dc080d9aea741ea234fd3bff8bdb15

New features:
* __float128 support: IEEE 754-2008 binary128 (quadruple precision), a GCC extension. Not available with the Intel compiler.
* The GMP and DD versions are built by default, to reduce the build time significantly.
* A FORTRAN compiler is no longer needed by default.

Supported platforms (a 64-bit platform is strongly recommended):
* Mac OS X Lion (Xcode, or gcc-4.6 from MacPorts), Linux (Red Hat 6.0, Ubuntu 12.04), FreeBSD 9.0/amd64 and 8.2/amd64.
* Mingw32 (tested on FreeBSD 9.0/amd64 and wine64).

Known issues:
* The debugging feature is not supported on Mingw32.
* The QD and DD types are not supported on Mingw32.
* Static libraries are not supported on all platforms.
* Some debugging functions are known to fail. In particular, the Intel compilers can produce broken executables, although these are very rare cases.

Special Thanks To:
* FUJISAWA, Katsuki
* NAKASATO, Naoto
* GOTO, Kazushige
* IMAMURA, Toshiyuki
* HIMENO, Ryutaro

Acknowledgment:
This work was supported by the Special Postdoctoral Researchers' Program of RIKEN, was partially supported by Grant-in-Aid for Scientific Research (B) 21300017, and by the CORE 6 project of the Institute for Japanese Academic Research Collaboration of Microsoft Research.

Thanks,
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
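The __float128 item refers to GCC's quadruple-precision type. As a quick sanity check of what binary128 buys over double, the sketch below counts the fraction bits of a binary floating-point type by halving eps until 1 + eps/2 rounds back to 1. Binary128 has a 113-bit significand, so the count should be 112, versus 52 for double. This assumes GCC on x86-64, where basic __float128 arithmetic and comparison come from libgcc (no -lquadmath needed), with round-to-nearest and no excess intermediate precision. The helper name fraction_bits is hypothetical and not part of MPACK.

```cpp
#include <cassert>

// Number of stored fraction bits of a binary floating-point type T,
// found by halving eps until 1 + eps/2 rounds back to 1.
// Assumes round-to-nearest and no excess intermediate precision
// (true for double and __float128 with GCC on x86-64).
template <typename T>
int fraction_bits() {
    const T one = 1;
    T eps = 1;
    int bits = 0;
    while (one + eps / 2 != one) {
        eps /= 2;  // exact: halving a power of two
        ++bits;
    }
    return bits;  // eps is now 2^-bits, the machine epsilon of T
}
```

On such a setup fraction_bits<double>() returns 52 and fraction_bits<__float128>() returns 112.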
From: Maho NAKATA <chat95@ma...> - 2011-11-01 00:06:02

Hi Fletcher,

Many thanks. Could you please paste your error? Are you trying to build with CUDA 4.0? Currently only CUDA 3.2 is supported...

thanks
Nakata Maho

From: "Fletcher, John P" <j.p.fletcher@...>
Subject: RE: [Mplapack-devel] Accelerated double-double version of Regmm on NVIDIA C2050
Date: Mon, 31 Oct 2011 10:48:46 +0000
> [...]
From: Fletcher, John P <j.p.fletcher@as...> - 2011-10-31 11:04:06

Maho,

I am attempting to get this code working on my system with a GTX 460 card.

At the moment I have a strange problem compiling the code: the compiler fails to find __builtin_isfinite. This seems to be an incompatibility between gcc 4.4 and the nvcc compiler, which has occurred in a number of other projects as well. I have worked around it temporarily by doctoring the isfinite implementation in dd_real.h.

The other problem concerns the compatibility of NVIDIA driver versions.

I am hopeful of overcoming these, as I have MPACK working with qd on my computer, and also code which works with CUDA.

I will report further when I have a solution.

If anyone else can help, please post a reply.

Thank you for this work.

John

Dr John P. Fletcher Tel: (44) 121 204 3389 (direct line), FAX: (44) 121 204 3678
Chemical Engineering and Applied Chemistry (CEAC),
Associate Dean - External Relations,
School of Engineering and Applied Science (EAS),
Aston University, Aston Triangle, BIRMINGHAM B4 7ET U.K.

-----Original Message-----
From: Maho NAKATA [mailto:chat95@...]
Sent: 28 October 2011 01:54
To: mplapack-devel@...
Cc: ytakao@...; himeno@...
Subject: [Mplapack-devel] Accelerated double-double version of Regmm on NVIDIA C2050
[...]
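The message does not say how isfinite was doctored in dd_real.h. One common way to avoid __builtin_isfinite altogether is the x - x == 0 test: for IEEE 754 doubles, x - x is 0 for every finite x and NaN for infinities and NaN, and NaN compares unequal to everything. The sketch below assumes IEEE doubles and no -ffast-math (which implies -ffinite-math-only and lets GCC assume NaN and infinity never occur). It is host-side C++ only; the name dd_isfinite_fallback is hypothetical, and whether it suits the gcc 4.4/nvcc combination in question was not tested in this thread.

```cpp
#include <cassert>
#include <limits>

// Hypothetical stand-in for an isfinite() call that does not depend on
// __builtin_isfinite. x - x is 0 for finite x, and NaN for +/-inf and NaN;
// NaN compares unequal to 0.0, so non-finite inputs yield false.
// Not valid under -ffast-math / -ffinite-math-only.
inline bool dd_isfinite_fallback(double x) {
    return (x - x) == 0.0;
}
```

For example, dd_isfinite_fallback(1.0) and dd_isfinite_fallback(std::numeric_limits<double>::max()) are true, while infinity and NaN give false.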
From: Maho NAKATA <chat95@ma...> - 2011-10-29 03:20:29

Note: I forgot to mention that this work has been supported by the MS CORE 6 PROJECT, http://www.microsoft.com/jajp/ijarc/core/ifp_06_j.aspx .

From: Maho NAKATA <chat95@...>
Subject: Accelerated double-double version of Regmm on NVIDIA C2050
Date: Fri, 28 Oct 2011 09:53:40 +0900 (JST)
> [...]
From: Maho NAKATA <maho@ri...> - 2011-10-28 00:46:03

Hi all,

I have just uploaded an accelerated double-double version of Rgemm for the NVIDIA C2050.
You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/ .
More explicitly,
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
The file name is Rgemm_C2050_20111026.tar.gz.

This is a preliminary version of the double-double Rgemm, for benchmarking purposes.

Requirements:
* NVIDIA C2050 or C2070.
* CUDA 3.2
* SDK assumed to be installed at /usr/local/cuda/.

How to test:
$ tar xvfz Rgemm_C2050_20111026.tar.gz
$ cd Rgemm_C2050
$ make
...
This builds Rgemm for the C2050, runs the benchmarks, and saves the results as CSV files.
The default precision for multiplication and addition is the IEEE-style rounding.
subdir bench_all
  tests square matrices of various sizes.
subdir bench_ecc_onoff
  tests ECC on/off. Changing the ECC configuration requires a reboot.
subdir bench_jitter
  tests the jitter of the NVIDIA GPU.
subdir bench_pointerredirecting
  tests the effect of pointer redirecting.
subdir bench_rectangular
  tests matrix-matrix multiplication for rectangular matrices.
subdir bench_sloppy
  tests less accurate methods.

How to look at the results:
All results are in CSV files. For example, open the file "Rgemm_NN_Total_bench.csv". This result includes the CPU-GPU transfer time (Total), no matrix is transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523", which means that A, B, and C are 2x2 matrices and the performance is 0.00007523 GFlops. Therefore, the format of this CSV file is "size, GFlops".

quote start
device_count : 1
device name > Tesla C2050
cudareturn > 0
cudaGetDevice()=0
n  n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
quote end

Notes:
The full reference version can be downloaded from http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.
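The "size, GFlops" result format is easy to read programmatically. Rather than relying on a fixed five-line header, the sketch below keeps only lines that parse as an integer and a floating-point number separated by a comma, which also skips the device-information lines and any "..." placeholders. The names BenchPoint, parse_bench_csv and peak are hypothetical and not part of the Rgemm package.

```cpp
#include <cassert>
#include <cstdio>
#include <sstream>
#include <string>
#include <vector>

// One benchmark sample: matrix dimension n (A, B, C are n x n) and GFlops.
struct BenchPoint {
    int n;
    double gflops;
};

// Parse the text of a result file such as Rgemm_NN_Total_bench.csv.
// Only lines of the form "<int>, <double>" (optionally followed by
// whitespace) are kept; everything else is ignored.
std::vector<BenchPoint> parse_bench_csv(const std::string& text) {
    std::vector<BenchPoint> points;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        BenchPoint p;
        char trailing;
        // Exactly two conversions must succeed and nothing else may follow.
        if (std::sscanf(line.c_str(), " %d , %lf %c", &p.n, &p.gflops, &trailing) == 2)
            points.push_back(p);
    }
    return points;
}

// Sample with the highest GFlops; `points` must be non-empty.
BenchPoint peak(const std::vector<BenchPoint>& points) {
    assert(!points.empty());
    BenchPoint best = points.front();
    for (const BenchPoint& p : points)
        if (p.gflops > best.gflops) best = p;
    return best;
}
```

Applied to the excerpt quoted in the announcement, this yields nine samples, and peak() picks n = 97 at 2.11975457 GFlops.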
License:
2-clause BSD style license. See each file for details.

Citation:
* "Acceleration of matrix-matrix product in double-double precision using GPU",
  Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO,
  Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
* "Rgemmpreprint.ja.pdf" is included as well (preprint, in Japanese).

Programmed by:
Takao, Yasuyoshi, Nakata, Maho, and RIKEN.

Enjoy,
Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
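The announcement does not define the "IEEE" default or what bench_sloppy measures. Assuming they correspond to the two double-double addition variants in the QD library (presumably the origin of the dd_real.h mentioned in this thread), ieee_add and sloppy_add, the sketch below reimplements both on a hypothetical DD struct to show the difference. The sloppy variant adds the two low words with a single rounded addition, so when the high words cancel, low-order information is lost. It assumes IEEE 754 doubles with round-to-nearest and no -ffast-math or x87 extended precision (x86-64 SSE2 is fine).

```cpp
#include <cassert>

// Minimal double-double value hi + lo with |lo| <= ulp(hi)/2.
// Hypothetical stand-in, not QD's dd_real class.
struct DD {
    double hi;
    double lo;
};

// Knuth's TwoSum: returns s = fl(a + b) and sets err = (a + b) - s exactly.
inline double two_sum(double a, double b, double& err) {
    double s = a + b;
    double bb = s - a;
    err = (a - (s - bb)) + (b - bb);
    return s;
}

// Dekker's FastTwoSum: exact when |a| >= |b| or a == 0.
inline double quick_two_sum(double a, double b, double& err) {
    double s = a + b;
    err = b - (s - a);
    return s;
}

// "IEEE-style" addition: the low words are also combined with TwoSum,
// and their rounding error t2 is folded back in.
inline DD ieee_add(DD a, DD b) {
    double s2, t2;
    double s1 = two_sum(a.hi, b.hi, s2);
    double t1 = two_sum(a.lo, b.lo, t2);
    s2 += t1;
    s1 = quick_two_sum(s1, s2, s2);
    s2 += t2;
    s1 = quick_two_sum(s1, s2, s2);
    return {s1, s2};
}

// "Sloppy" addition: cheaper, but a.lo + b.lo is rounded once and its
// error is discarded, which matters when a.hi + b.hi cancels.
inline DD sloppy_add(DD a, DD b) {
    double e;
    double s = two_sum(a.hi, b.hi, e);
    e += a.lo + b.lo;
    s = quick_two_sum(s, e, e);
    return {s, e};
}
```

With a = (1, 1e-20) and b = (-1, 1e-40), the high words cancel exactly. ieee_add returns (1e-20, 1e-40), the exact sum, while sloppy_add returns (1e-20, 0) because 1e-40 is below half an ulp of 1e-20.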