mplapack-devel Mailing List for Multiple precision LAPACK and BLAS
Status: Pre-Alpha
                
Brought to you by: nakatamaho
| 
     
      
      
      From: Edwin H. <ew...@st...> - 2016-05-29 00:53:40
      
     
   | 
Hello,
The code for real() and imag() in "mpcomplex.h", starting from line 141:
    mpreal real()
    {
        mpreal tmp;
        tmp = mpc_realref(mpc);
        return tmp;
    }
results in lost precision if mpc has higher precision than the default used
for tmp.
For example, the following code:
#include <mpcomplex.h>
using namespace mpfr;
int main(void)
{
const mp_prec_t prec = 256;
mpreal pi(0.0, prec, MPFR_RNDN);
pi = const_pi(prec, MPFR_RNDN);
mpfr_printf("pi, mpreal:\t%.60Rf\n", mpfr_ptr(pi));
mpcomplex pi_complex(0.0, prec, prec, MPC_RNDNN);
pi_complex = pi;
mpfr_printf("pi, mpcomplex:\t%.60Rf\n", mpfr_ptr(pi_complex.real()));
return 0;
}
gives this output:
pi, mpreal: 3.141592653589793238462643383279502884197169399375105820974945
pi, mpcomplex: 3.141592653589793115997963468544185161590576171875000000000000
I believe this may be resolved without any negative side effects by this
simplification:
    mpreal real()
    {
        return mpc_realref(mpc);
    }
and likewise for imag().
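For reference, the corresponding imag() would then read (assuming the mpc_imagref accessor that mirrors mpc_realref in the MPC library):
    mpreal imag()
    {
        return mpc_imagref(mpc);
    }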
Best,
Edwin
 | 
| 
     
      
      
      From: Sven H. <sv...@sv...> - 2015-01-30 21:47:49
      
     
   | 
Hi. I want to try mpack for solving some stiff ODEs, which have factors stretching over enough magnitudes to make using "double" quite problematic. Therefore I compiled mpack-0.8.0 and got stuck at a linkage error in the benchmark/mblas directory:
<< snip
/bin/bash ../../libtool --mode=link g++ -o Rgemm.dd_cuda_total -L/usr/lib/nvidia-cuda-toolkit/lib64 -L/usr/lib/nvidia-cuda-toolkit/lib64 -L../../mlapack/reference -lmlapack_dd_ref -L../../mblas/optimized/dd/cuda -lmblas_dd_cuda -L/usr/lib/nvidia-cuda-toolkit/lib64 -lcudart -L../../mblas/optimized/dd -lmblas_dd -L../../. -lqd -ldl -fopenmp Rgemm_dd_cuda_total-Rgemm_dd.o
libtool: link: g++ -o .libs/Rgemm.dd_cuda_total -fopenmp Rgemm_dd_cuda_total-Rgemm_dd.o -L/usr/lib/nvidia-cuda-toolkit/lib64 -L../../mlapack/reference /home/sven/src/mpack-0.8.0/mlapack/reference/.libs/libmlapack_dd_ref.so -L../../mblas/optimized/dd/cuda /home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so -lcudart -L../../mblas/optimized/dd /home/sven/src/mpack-0.8.0/mblas/optimized/dd/.libs/libmblas_dd.so -L../../. -lqd -ldl -fopenmp
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NU_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_TL_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NU_0(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_TU_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_TU_0(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NL_0(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
/home/sven/src/mpack-0.8.0/mblas/optimized/dd/cuda/.libs/libmblas_dd_cuda.so: undefined reference to `Rsyrk_NL_p(dd_real*, dd_real*, long, long, long, long, dd_real, dd_real)'
collect2: error: ld returned 1 exit status
Makefile:1454: recipe for target 'Rgemm.dd_cuda_total' failed
make: *** [Rgemm.dd_cuda_total] Error 1
config: all enabled - cuda, dd, qd, mpfr, gmp, __float128 - or just cuda and dd; all numeric libs either from the system (Debian) or compiled from your package.
environment: Debian 8.0 aka testing/jessie, x86_64 or amd64, gcc / g++ 4.9, cuda toolkit 6.0.1.
I could test the cuda stuff on a Quadro 4000 or maybe a C2070. I don't see a reason for the errors above. Please give me a hint, because I gave up after reading the source and trying out different configs. Maybe it's related to the cuda toolkit version, but I'm not familiar with that stuff and its behaviour.
Thanks, Sven  | 
| 
     
      
      
      From: Tony S. <sco...@ya...> - 2014-11-23 04:57:14
      
     
   | 
Hello Nakata Maho,
I did subscribe as tc...@gm... but unfortunately, because of a filter in mainland China, I cannot readily access it. I put in place a forward of my gmail account to my yahoo account. I will re-subscribe under my yahoo account.
But please tell me how to build and use mpack so that I can use the double-double eigenvalue solvers.
best wishes
Tony
      From: Nakata Maho <mah...@gm...>
 To: sco...@ya... 
Cc: mpl...@li... 
 Sent: Saturday, November 22, 2014 5:59 PM
 Subject: Re: Adjusting qd_real?
   
Hi
sorry, your e-mail has been deferred as @yahoo.com...
you should subscribe to the list.
anyway,
> I see previous messages claiming that double-double works but I would like to know how.
> Does the build require CUDA?
no. Internally it uses the double-double library developed by Hida et al.
I have just come back from the US so I am a bit tired. I'll reply next week.
thanks
 Nakata Maho
From: Tony Scott <sco...@ya...>
Subject: Adjusting qd_real?
Date: Fri, 21 Nov 2014 06:41:34 +0000 (UTC)
> Hello again
> Still no documentation but by attrition, I figured out some of my previous problems.  
> I needed to switch on the FORTRAN wrapper for IFORT and through a GMP test case work out the link to the libraries.  
> I also managed to get the examples in dd and gmp working.  I got D. Bailey's qd package and enabled qd in mpack only to find that 
> -> mblas.h in the 'include' sub-directory has qd_real = REAL, i.e. single precision, which is NOT quad, i.e. double-double precision
> If I change e.g. qd_real to e.g. LONG DOUBLE or __float128, I can configure, but make creates an error on the first BLAS routine it tries to compile.
> I see previous messages claiming that double-double works but I would like to know how.
> Does the build require CUDA?
> Can someone tell me how to build the libraries so that I can get double-double precision for MBLAS and MLAPACK?
> email is: 279...@qq...
> best wishes and thanks in advance
> (some documentation would also be nice:-) )
> 
   | 
| 
     
      
      
      From: Nakata M. <mah...@gm...> - 2014-11-23 02:00:00
      
     
   | 
Hi
sorry, your e-mail has been deferred as @yahoo.com... you should subscribe to the list.
anyway,
> I see previous messages claiming that double-double works but I would like to know how.
> Does the build require CUDA?
no. Internally it uses the double-double library developed by Hida et al.
I have just come back from the US so I am a bit tired. I'll reply next week.
thanks
 Nakata Maho
From: Tony Scott <sco...@ya...>
Subject: Adjusting qd_real?
Date: Fri, 21 Nov 2014 06:41:34 +0000 (UTC)
> Hello again
> Still no documentation, but by attrition I figured out some of my previous problems.
> I needed to switch on the FORTRAN wrapper for IFORT and, through a GMP test case, work out the link to the libraries.
> I also managed to get the examples in dd and gmp working. I got D. Bailey's qd package and enabled qd in mpack only to find that
> -> mblas.h in the 'include' sub-directory has qd_real = REAL, i.e. single precision, which is NOT quad, i.e. double-double precision
> If I change e.g. qd_real to e.g. LONG DOUBLE or __float128, I can configure, but make creates an error on the first BLAS routine it tries to compile.
> I see previous messages claiming that double-double works but I would like to know how.
> Does the build require CUDA?
> Can someone tell me how to build the libraries so that I can get double-double precision for MBLAS and MLAPACK?
> email is: 279...@qq...
> best wishes and thanks in advance
> (some documentation would also be nice :-) )
>  | 
| 
     
      
      
      From: Tony S. <sco...@ya...> - 2014-11-21 06:41:41
      
     
   | 
Hello again
Still no documentation, but by attrition I figured out some of my previous problems.
I needed to switch on the FORTRAN wrapper for IFORT and, through a GMP test case, work out the link to the libraries.
I also managed to get the examples in dd and gmp working. I got D. Bailey's qd package and enabled qd in mpack only to find that
-> mblas.h in the 'include' sub-directory has qd_real = REAL, i.e. single precision, which is NOT quad, i.e. double-double precision
If I change e.g. qd_real to e.g. LONG DOUBLE or __float128, I can configure, but make creates an error on the first BLAS routine it tries to compile.
I see previous messages claiming that double-double works but I would like to know how.
Does the build require CUDA?
Can someone tell me how to build the libraries so that I can get double-double precision for MBLAS and MLAPACK?
email is: 279...@qq...
best wishes and thanks in advance
(some documentation would also be nice :-) )  | 
| 
     
      
      
      From: Tony S. <sco...@ya...> - 2014-11-19 04:12:31
      
     
   | 
Dear Sirs
I installed both mpack-0.6.7 (claimed to be the latest version?) and mpack-0.8.0, only to find that only the latter would build on my CentOS release 6.2 system. It created libraries in /usr/local/lib but I find NO documentation whatsoever. There is mention of userman.pdf, but I cannot find it on the web or in the installed mpack code. I saw a posting of someone looking for it.
I saw the previous mailing lists and tried to link compiled FORTRAN code in real(16), i.e. quad precision, with Intel's ifort using:
ifort -r16 $(FOBJS) -L/usr/local/lib -lmlapack_gmp -lmblas_gmp -lmblas_gmp_ref -lmlapack_gmp_ref -lgmp -lgmpxx -lgomp -g
but to no avail. Routine "rscal" was not recognized. I could use some assistance. Where is the doc showing how to link these libraries to a FORTRAN program?
best wishes and thank you in advance
Tony Scott (newcomer to mpack)  | 
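On the "rscal" link failure above: the MBLAS/MLAPACK routines are C++ functions taking GMP classes rather than Fortran-callable symbols, so ifort cannot resolve them without a C++ wrapper. A minimal C++ sketch of calling Rscal directly, assuming the GMP build, a BLAS-style Rscal(n, alpha, x, incx) signature, and a <mpack/mblas_gmp.h> header named by analogy with the <mpack/mlapack_gmp.h> used elsewhere in this archive:
    #include <iostream>
    #include <gmpxx.h>
    #include <mpack/mpack_config.h>
    #include <mpack/mblas_gmp.h>   // header name assumed, by analogy with mlapack_gmp.h
    int main()
    {
        mpf_set_default_prec(512);
        mpackint n = 4, incx = 1;
        mpf_class x[] = {1.0, 2.0, 3.0, 4.0};
        mpf_class alpha = 0.5;
        Rscal(n, alpha, x, incx);   // x <- alpha * x, in 512-bit GMP arithmetic
        for (int i = 0; i < n; i++)
            std::cout << x[i].get_d() << " ";
        std::cout << std::endl;
        return 0;
    }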
| 
     
      
      
      From: Nakata M. <ma...@ri...> - 2013-10-29 01:19:45
      
     
   | 
Birds of a Feather Session on high precision computing at Supercomputing '13
2013/11/20 Wed 12:15-1:15, Room 301/302/303, SC13 (Colorado Convention Center, Denver)
====================================================================
Call for Participation for the Birds of a Feather Session at Supercomputing '13
We will have a BOF session on high precision computing at the upcoming Supercomputing '13. The title of the session is "High Precision Arithmetic Operations: Libraries and Applications".
Schedule and Room
Supercomputing '13, Colorado Convention Center, Denver
Conference Dates: November 17-22, 2013
The BOF will be November 20 (Wed), 12:15-1:15 in room 301/302/303.
Scope
The emergence of large-scale and high-speed parallel computing forces us to consider, on a new level, rounding errors in repeated arithmetic operations. Many scientific and engineering problems rely on the numerical stability of the algorithms used, even with conventional "double precision" arithmetic operations. The development of techniques that are truly high precision is important for solving very compute-intensive problems, while an extreme solution is to implement "high precision" arithmetic as hardware. In the BOF session, researchers working on high precision arithmetic techniques and applications will meet and present recent progress. We are convinced of the importance of this opportunity through our experiences in the past three workshops on high precision approaches in the scientific and engineering computing field (http://suchix.kek.jp/mpcomp/en/).
Agenda
We have asked the following three speakers to report on the latest developments.
1. M. Nakata (RIKEN)
2. D. Takahashi (University of Tsukuba)
3. D. H. Bailey (Lawrence Berkeley National Laboratory)
*Speaker Introduction
NAKATA Maho is a technical scientist at RIKEN, and also a visiting associate professor at Rikkyo University. He is the leading developer of the MPACK library, a multiple precision version of BLAS and LAPACK. He gained an interest in high precision computation while applying it to theoretical chemistry and mathematical optimization.
Daisuke Takahashi is a professor at the Faculty of Engineering, Information and Systems, University of Tsukuba. He received his Ph.D. degree in information science from the University of Tokyo. His research interests include high-performance computing.
David H. Bailey recently retired from the Lawrence Berkeley National Laboratory, and is also affiliated with the University of California, Davis. He has been a leading figure in the high-performance computing world, and also in the arena of high-precision computation.
In the rest of the session, we call for comments, questions and discussion from the floor. We will ask BOF attendees the following questions:
(a) is your application numerically unstable in a standard precision (e.g. double precision)?
(b) how many digits do you think are needed to solve your numerically unstable problem?
(c) what compiler/library/software support do we need to practically use high precision arithmetic?
(d) do we need hardware support for high precision arithmetic?
Also, we would like to discuss possible future collaborations and other opportunities to gather the experts on high precision arithmetic. We believe this BOF will be a very good chance for researchers working on high precision libraries and applications to share knowledge.
Session Leaders and organizers
N. Nakasato (University of Aizu) and F. Yuasa (KEK)
M. Nakata (RIKEN), H. Matsufuru and T. Ishikawa (KEK)
Contact e-mail address: sc...@ml...
See also http://suchix.kek.jp/mpcomp/  | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-12-25 00:22:47
      
     
   | 
Hi, I have just uploaded mpack version 0.8.0.
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0.tar.gz/download
https://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0.tar.gz/download
MD5 (mpack-0.8.0.tar.gz) = c2aa0cf512a5dfcf2881c69f70953c02
* CUDA version of Rgemm in double-double precision has been integrated. To enable the CUDA version, pass "--enable-cuda=yes" to configure. Also, if multiple versions of the CUDA toolkit are installed, or if you installed to a directory other than /usr/local/cuda/, you may want to specify it like the following:
--with-cudatoolkithome=/usr/local/cuda-5.0/
* Preliminary Intel Xeon Phi support. We added a preliminary version of Intel Xeon Phi support. Note that the double-double version doesn't work as expected.
** How to build
The build for Xeon Phi is done in the following way, not by the usual ./configure ; make.
$ tar xvfz <somewhere>/mpack-0.8.0.tar.gz
$ cp <somewhere>/mpack-0.8.0.tar.gz .
$ cd mpack-0.8.0/misc
$ bash prepare_mpacklibs_for_mic.sh >& log.prepare_mpacklibs_for_mic.sh
$ bash build_mpack_for_mic.sh >& log.build_mpack_for_mic.sh
$ cd mpack-0.8.0 ; make install
Build and make check have passed on
* Intel Composer 13.0.1 on Linux.
* Gcc on Linux (Ubuntu, RedHat)
* gcc47 on FreeBSD
* gcc46 on MacOSX Lion
* CUDA 3.1, 3.2, 4.0, 4.2, 5.0 on Linux Host
This archive is exactly the same as RC2. Enjoy!
-- Nakata Maho http://nakatamaho.riken.jp/ http://blog.goo.ne.jp/nakatamaho/ http://nakatamaho.riken.jp/maho.pgp.txt http://ja.openoffice.org/  | 
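As a concrete example, a CUDA-enabled build with the toolkit in a non-default location would combine the two options above (the path is illustrative):
$ ./configure --enable-cuda=yes --with-cudatoolkithome=/usr/local/cuda-5.0/
$ make
$ make install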
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-12-20 10:54:59
      
     
   | 
Hi all
I have just uploaded MPACK 0.8.0RC2
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC2.tar.gz/download
https://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC2.tar.gz/download
MD5 (mpack-0.8.0-RC2.tar.gz) = c2aa0cf512a5dfcf2881c69f70953c02
Build fixes. Build and make check have passed on
* Intel Composer 13.0.1 on Linux.
* Gcc on Linux (Ubuntu, RedHat)
* gcc47 on FreeBSD
* gcc46 on MacOSX Lion
* CUDA 3.1, 3.2, 4.0, 4.2, 5.0 on Linux Host
Best, Nakata Maho
From: Maho NAKATA <ma...@ri...>
Subject: MPACK 0.8.0 RC1 : CUDA support for Rgemm in double-double precision.
Date: Thu, 29 Nov 2012 12:34:41 +0900 (JST)
> Hi all,
>
> I have just uploaded MPACK 0.8.0RC1
> http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC1.tar.gz/download
>
> CUDA version of Rgemm in double-double precision has been integrated.
> To enable the CUDA version, pass "--enable-cuda=yes" to configure.
> Also, if multiple versions of the CUDA toolkit are installed, or if you installed to a
> directory other than /usr/local/cuda/, you may want to specify it like the following:
> --with-cudatoolkithome=/usr/local/cuda-5.0/
>
> From my experience, CUDA 4.0 usually gives the best performance.
> You don't want to use CUDA 3.1 and 3.2.
> CUDA 5.0 performs best when the matrix size is a multiple of 64, but is much worse
> than 4.0, 3.2 and 3.1.
>
> Thanks
> -- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
> http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-11-29 03:34:52
      
     
   | 
Hi all,
I have just uploaded MPACK 0.8.0RC1
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.8.0/mpack-0.8.0-RC1.tar.gz/download
CUDA version of Rgemm in double-double precision has been integrated. To enable the CUDA version, pass "--enable-cuda=yes" to configure. Also, if multiple versions of the CUDA toolkit are installed, or if you installed to a directory other than /usr/local/cuda/, you may want to specify it like the following:
--with-cudatoolkithome=/usr/local/cuda-5.0/
From my experience, CUDA 4.0 usually gives the best performance. You don't want to use CUDA 3.1 and 3.2. CUDA 5.0 performs best when the matrix size is a multiple of 64, but is much worse than 4.0, 3.2 and 3.1.
Thanks
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/ http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-10-13 06:02:43
      
     
   | 
Hi all, I have just uploaded an accelerated double-double version of Rgemm on NVIDIA C2050. You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/ . More explicitly:
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050.20121011.tar.gz/download
The file name is Rgemm_C2050.20121011.tar.gz; its MD5sum is a4da6bfcadef19baf692502d6236f0e6. This is a preliminary version of the double-double Rgemm, for benchmarking purposes.
Requirements:
* NVIDIA C2050, M2070, 2075, 2090.
* CUDA 4.2 (CUDA 4.1 is known to have a bug)
* SDK assumed to be installed at /usr/local/cuda/.
How to test:
$ tar xvfz Rgemm_C2050.20121011.tar.gz
$ cd Rgemm_C2050
$ make
... building Rgemm for C2050 and taking the benchmark; results are saved as CSV files. The default precision for multiplication and addition is IEEE rounding.
subdir bench_all : test square matrices of various sizes.
subdir bench_ecc_onoff : test the ecc on/off case. A change in ECC configuration requires a reboot.
subdir bench_jitter : test the jitter of the NVIDIA GPU.
subdir bench_pointerredirecting : test the effect of pointer redirecting.
subdir bench_rectangular : test matrix-matrix multiplication for rectangular matrices.
subdir bench_sloppy : test lower-accuracy methods.
How to look at the results:
All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file. This result includes CPU-GPU transfer time (Total), all matrices are not transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523". This means A, B, and C are 2x2 matrices, and 0.00007523 means 0.00007523 GFlops. Therefore, this csv file's format is "size, Gflops".
--quote start
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
--quote end
Notes: The full reference version can be downloaded at http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.
License: 2-clause BSD style license. See each file for details.
Citation:
* "A Fast implementation of matrix-matrix product in double-double precision on NVIDIA C2050 and application to semidefinite programming", Maho Nakata, Yasuyoshi Takao, Shigeho Noda and Ryutaro Himeno, International Conference on Networking and Computing, Okinawa, Japan, 2012. (To appear)
* "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
* "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
Programmed by: Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
Enjoy,
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/ http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  | 
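The GFlops column presumably follows the conventional 2n^3 flop count for an n x n matrix-matrix product; as an illustrative helper (the function name is hypothetical):
    // Convert elapsed seconds for an n x n Rgemm into the GFlops figure
    // used in the CSV, assuming the conventional 2*n^3 flop count.
    double rgemm_gflops(long n, double seconds)
    {
        return 2.0 * (double)n * (double)n * (double)n / seconds / 1.0e9;
    }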
| 
     
      
      
      From: Maho N. <ch...@ma...> - 2012-08-10 22:43:27
      
     
   | 
Hello
From: Akira SaiToh <aki...@ni...>
Subject: Re: [Mplapack-devel] possibly a bug in Cheev
Date: Sat, 11 Aug 2012 01:25:28 +0900
> Hello, Nakata-san,
>
> Thank you for your effort. With your patch, I could get
> correct eigenvalues for the previous examples. However, I
Congratulations!
> encountered another problem.
Ah :-( I'm sorry to hear that.
> For the matrix
>
> 2 0 1
> 0 2 0
> 1 0 2
>
> Cheev returns correct eigenvalues, 1, 2, and 3. Then, for
> the matrix
>
> 20 0 10
> 0 20 0
> 10 0 20
>
> expected eigenvalues are 10, 20, and 30. However, Cheev
> returns 20, 20, and 20. I guess Cheev should probably be
> modified so that an appropriate scaling is internally
> performed.
Yes, many thanks for your bug report. I'll fix it hopefully soon.
Thanks
 Nakata Maho
> -------------------------------------------------------
> [saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 2 0 1 0 2 0 1 0 2
> A =
> 2 0 1
> 0 2 0
> 1 0 2
> Eigenvalues:
> 1 2 3
> Unitary matrix:
> 0.707107+i*0 0+i*0 0.707107+i*0
> 0+i*0 -1+i*0 0+i*0
> -0.707107+i*0 0+i*0 0.707107+i*0
> [saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 20 0 10 0 20 0 10 0 20
> A =
> 20 0 10
> 0 20 0
> 10 0 20
> Eigenvalues:
> 20 20 20
> Unitary matrix:
> -1+i*0 4.85181e-173+i*0 0+i*0
> 0+i*0 -1+i*0 0+i*0
> 0+i*0 0+i*0 1+i*0
> -------------------------------------------------------
>
> Regards,
> Akira SaiToh
>  | 
| 
     
      
      
      From: Akira S. <aki...@ni...> - 2012-08-10 16:25:37
      
     
   | 
Hello, Nakata-san,
Thank you for your effort. With your patch, I could get correct eigenvalues for the previous examples. However, I encountered another problem.
For the matrix
2 0 1
0 2 0
1 0 2
Cheev returns the correct eigenvalues, 1, 2, and 3. Then, for the matrix
20 0 10
0 20 0
10 0 20
the expected eigenvalues are 10, 20, and 30. However, Cheev returns 20, 20, and 20. I guess Cheev should probably be modified so that an appropriate scaling is internally performed.
-------------------------------------------------------
[saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 2 0 1 0 2 0 1 0 2
A =
2 0 1
0 2 0
1 0 2
Eigenvalues:
1 2 3
Unitary matrix:
0.707107+i*0 0+i*0 0.707107+i*0
0+i*0 -1+i*0 0+i*0
-0.707107+i*0 0+i*0 0.707107+i*0
[saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 20 0 10 0 20 0 10 0 20
A =
20 0 10
0 20 0
10 0 20
Eigenvalues:
20 20 20
Unitary matrix:
-1+i*0 4.85181e-173+i*0 0+i*0
0+i*0 -1+i*0 0+i*0
0+i*0 0+i*0 1+i*0
-------------------------------------------------------
Regards,
Akira SaiToh  | 
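Until such internal scaling exists, a caller-side workaround along the following lines should recover the expected eigenvalues. This is only a sketch built around the Cheev call and the A, w, work variables of the test.cpp quoted later in this thread; it assumes mpc_class is constructible from mpf_class and that gmpxx's abs() applies, and it inspects only the real parts, which is enough for the real-valued examples above:
    // Scale A so its entries are O(1), diagonalize, then undo the
    // scaling on the eigenvalues (the eigenvectors are scale-invariant).
    mpf_class s = 0.0;
    for (int i = 0; i < n * n; i++)
        if (abs(A[i].real()) > s) s = abs(A[i].real());
    if (s == 0.0) s = 1.0;
    mpf_class sinv_f = mpf_class(1.0) / s;
    mpc_class sinv(sinv_f);
    for (int i = 0; i < n * n; i++)
        A[i] = A[i] * sinv;
    Cheev("V", "U", n, A, lda, w, work, lwork, rwork, &info);
    for (int i = 0; i < n; i++)
        w[i] = w[i] * s;   // eigenvalues of the original, unscaled A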
| 
     
      
      
      From: Maho N. <ch...@ma...> - 2012-08-09 11:30:43
      
     
   | 
Hi Saitoh-san
After patching, my result now seems to be correct:
$ LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/home/maho/MPACK//lib ./test 1 0 1 0 0 0 1 0 1
A =
1 0 1
0 0 0
1 0 1
Eigenvalues:
0 0 2
Unitary matrix:
0+i*0 0.707107+i*0 0.707107+i*0
-1+i*0 0+i*0 0+i*0
0+i*0 -0.707107+i*0 0.707107+i*0
Thanks
 Nakata Maho
From: Akira SaiToh <aki...@ni...>
Subject: Re: [Mplapack-devel] possibly a bug in Cheev
Date: Thu, 09 Aug 2012 14:25:35 +0900
> Dear Nakata-san,
>
> Thanks for your suggestion.
> Indeed, the matrix in the previous email was a singular matrix.
> But sometimes Cheev fails to find eigenvalues also for a small
> non-singular Hermitian matrix. A typical example is
>
> 2 0 1
> 0 2 0
> 1 0 2
>
> Its eigenvalues are 1, 2, and 3. For this matrix, Cheev stops
> together with the message "#Rlaev2 Checkpoint 13 Not checked",
> similar to the previous example.
>
> I am not sure if this phenomenon depends on the system
> environment. Could somebody test Cheev with the above matrix?
>
> Thank you for your attention.
> Regards,
> Akira SaiToh
>
> 2012-08-09 10:22 Maho NAKATA wrote:
>> Dear Akira SaiToh,
>> Sorry for the late reply.
>> Has this not been resolved yet?
>> How about trying an arbitrary matrix that does not look singular?
>> thanks
>  | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-08-09 09:43:56
      
     
   | 
Hi Saitoh-san
Please apply the following patch.
$ diff -u mlapack/reference/Rlaev2.cpp~ mlapack/reference/Rlaev2.cpp
--- Rlaev2.cpp~ 2012-08-09 14:56:17.156927491 +0900
+++ Rlaev2.cpp  2012-08-09 18:37:00.635038092 +0900
@@ -142,8 +142,6 @@
            *cs1 = one;
            *sn1 = zero;
        } else {
-           printf("#Rlaev2 Checkpoint 13 Not checked\n");
-           exit(1);
            tn = -cs / tb;
            *cs1 = one / sqrt(one + tn * tn);
            *sn1 = tn * (*cs1);
This should work; the patch simply removes the debugging guard so that the previously unchecked branch is actually executed.
Thanks
 Nakata Maho
From: Akira SaiToh <aki...@ni...>
Subject: Re: [Mplapack-devel] possibly a bug in Cheev
Date: Thu, 09 Aug 2012 14:25:35 +0900
> Dear Nakata-san,
> 
> Thanks for your suggestion.
> Indeed, the matrix in the previous email was a singular matrix.
> But sometimes Cheev fails to find eigenvalues also for a small
> non-singular Hermitian matrix. A typical example is
> 
> 2 0 1
> 0 2 0
> 1 0 2
> 
> Its eigenvalues are 1, 2, and 3. For this matrix, Cheev stops
> together with the message "#Rlaev2 Checkpoint 13 Not checked",
> similar to the previous example.
> 
> I am not sure if this phenomenon depends on the system
> environment. Could somebody test Cheev with the above matrix?
> 
> Thank you for your attention.
> Regards,
> Akira SaiToh
> 
> 2012-08-09 10:22 Maho NAKATA wrote:
>> Dear Akira SaiToh,
>>
>> Sorry for the late reply.
>> Has this not been resolved yet?
>>
>> How about trying an arbitrary matrix that does not look singular?
>> thanks
> 
 | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-08-09 07:05:20
      
     
   | 
Hi
Sorry for being late.
> I tried Cheev of MPACK with GMP but could not get correct results
> for some simple matrices. For example, it does not diagonalize the
> 3 x 3 matrix
> 
> 1 0 1
> 0 0 0
> 1 0 1
> 
> and gives me "#Rlaev2 Checkpoint 13 Not checked". It works fine
> for diag(1,0,0) and a matrix filled with 1's among others.
I have just reproduced your result!
Thanks
 Nakata Maho
From: Akira SaiToh <aki...@ni...>
Subject: [Mplapack-devel] possibly a bug in Cheev
Date: Thu, 26 Jul 2012 19:05:41 +0900
> Dear Dr. Maho NAKATA,
> 
> I am a postdoc working on quantum information and the matrices
> I have to handle are quite often Hermitian matrices with some
> eigenvalues degenerate or close to each other. I am trying several
> libraries as well as writing my own library for matrix calculations
> required in my research.
> 
> I would like to report some issue as I think this is possibly a
> bug of MPACK. Or, it might be a problem in my code or the environment.
> 
> I tried Cheev of MPACK with GMP but could not get correct results
> for some simple matrices. For example, it does not diagonalize the
> 3 x 3 matrix
> 
> 1 0 1
> 0 0 0
> 1 0 1
> 
> and gives me "#Rlaev2 Checkpoint 13 Not checked". It works fine
> for diag(1,0,0) and a matrix filled with 1's among others.
> 
> I am using mpack-0.7.0 on Fedora 15 where GMP 4.3.2 exists.
> 
> Thanks for your attention.
> Regards,
> Akira SAITOH, NII, Japan
> 
> ===========================================
> [saitoh@localhost test_cheev]$ g++ test.cpp -L/usr/local/lib 
> -lmlapack_gmp -lmblas_gmp -lmblas_gmp_ref -lmlapack_gmp_ref -lgmp -lmpfr 
> -lgmpxx -lgomp -g
> /usr/bin/ld: warning: libgmp.so.3, needed by 
> /usr/lib/gcc/x86_64-redhat-linux/4.6.3/../../../../lib64/libmpfr.so, may 
> conflict with libgmp.so.10
> [saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 1 
> 0 1 0 0 0 1 0 1
> A =
> 1 0 1
> 0 0 0
> 1 0 1
> #Rlaev2 Checkpoint 13 Not checked
> [saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 1 
> 0 0 0 0 0 0 0 0
> A =
> 1 0 0
> 0 0 0
> 0 0 0
> Eigenvalues:
> 0 0 1
> Unitary matrix:
> 0+i*0 0+i*0 1+i*0
> 1+i*0 0+i*0 0+i*0
> 0+i*0 1+i*0 0+i*0
> ===========================================
> 
> test.cpp:
> 
> ===========================================
> #include <gmpxx.h>
> #include <mpack/mpack_config.h>
> #include <mpack/mpc_class.h>
> #include <mpack/mlapack_gmp.h>
> #include <cstdlib>
> #include <iostream>
> int main (int argc, char* argv[])
> {
>    mpf_set_default_prec(512);
>    mpackint n = 3;
>    mpc_class *A = new mpc_class [n * n];
>    mpackint lda = n;
>    mpf_class *w = new mpf_class [n];
>    mpc_class *work = new mpc_class [2 * n - 1];
>    mpackint lwork = 2 * n - 1;
>    mpf_class *rwork = new mpf_class [3 * n - 2];
>    mpackint info;
> 
>    for (int i = 1; i < argc && i < n * n + 1; i++)
>      A[i-1] = ::atof(argv[i]);
> 
>    std::cout << "A = " << std::endl;
>    for (int i = 0; i < n; i ++)
>    {
>      for (int j = 0; j < n; j++)
>        std::cout << A[i + n * j].real().get_d() << " ";
>      std::cout << std::endl;
>    }
> 
>    Cheev ("V", "U", n, A, lda, w, work, lwork, rwork, &info);
> 
>    std::cout << "Eigenvalues: " << std::endl;
>    for (int i = 0; i < n; i++)
>      std::cout << w[i].get_d() << " ";
>    std::cout << std::endl;
> 
>    std::cout << "Unitary matrix:" << std::endl;
>    for (int i = 0; i < n; i ++)
>    {
>      for (int j = 0; j < n; j++)
>        std::cout << A[i + n * j].real().get_d() << "+i*"
>                  << A[i + n * j].imag().get_d() << " ";
>      std::cout << std::endl;
>    }
> 
>    delete [] A; delete [] w; delete [] work; delete [] rwork;
>    return 0;
> }
> ===========================================
> 
> 
 | 
| 
     
      
      
      From: Akira S. <aki...@ni...> - 2012-08-09 05:25:43
      
     
   | 
Dear Nakata-san,
Thanks for your suggestion. Indeed, the matrix in the previous email was a singular matrix. But sometimes Cheev fails to find eigenvalues also for a small non-singular Hermitian matrix. A typical example is
2 0 1
0 2 0
1 0 2
Its eigenvalues are 1, 2, and 3. For this matrix, Cheev stops together with the message "#Rlaev2 Checkpoint 13 Not checked", similar to the previous example.
I am not sure if this phenomenon depends on the system environment. Could somebody test Cheev with the above matrix?
Thank you for your attention.
Regards,
Akira SaiToh
2012-08-09 10:22 Maho NAKATA wrote:
> Dear Akira SaiToh,
>
> Sorry for the late reply.
> Has this not been resolved yet?
>
> How about trying an arbitrary matrix that does not look singular?
> thanks  | 
| 
     
      
      
      From: Maho N. <ch...@ma...> - 2012-08-09 01:23:13
      
     
   | 
Dear Akira SaiToh,
Sorry for the late reply. Has this not been resolved yet?
How about trying an arbitrary matrix that does not look singular?
thanks  | 
| 
     
      
      
      From: Maho N. <ch...@ma...> - 2012-07-27 06:19:07
      
     
   | 
Dear Saitoh-san
Thanks for your e-mail. I'll investigate...
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/ http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  | 
| 
     
      
      
      From: Akira S. <aki...@ni...> - 2012-07-26 10:21:31
      
     
   | 
Dear Dr. Maho NAKATA,
I am a postdoc working on quantum information and the matrices
I have to handle are quite often Hermitian matrices with some
eigenvalues degenerate or close to each other. I am trying several
libraries as well as writing my own library for matrix calculations
required in my research.
I would like to report some issue as I think this is possibly a
bug of MPACK. Or, it might be a problem in my code or the environment.
I tried Cheev of MPACK with GMP but could not get correct results
for some simple matrices. For example, it does not diagonalize the
3 x 3 matrix
1 0 1
0 0 0
1 0 1
and gives me "#Rlaev2 Checkpoint 13 Not checked". It works fine
for diag(1,0,0) and a matrix filled with 1's among others.
I am using mpack-0.7.0 on Fedora 15 where GMP 4.3.2 exists.
Thanks for your attention.
Regards,
Akira SAITOH, NII, Japan
===========================================
[saitoh@localhost test_cheev]$ g++ test.cpp -L/usr/local/lib 
-lmlapack_gmp -lmblas_gmp -lmblas_gmp_ref -lmlapack_gmp_ref -lgmp -lmpfr 
-lgmpxx -lgomp -g
/usr/bin/ld: warning: libgmp.so.3, needed by 
/usr/lib/gcc/x86_64-redhat-linux/4.6.3/../../../../lib64/libmpfr.so, may 
conflict with libgmp.so.10
[saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 1 
0 1 0 0 0 1 0 1
A =
1 0 1
0 0 0
1 0 1
#Rlaev2 Checkpoint 13 Not checked
[saitoh@localhost test_cheev]$ LD_LIBRARY_PATH=/usr/local/lib ./a.out 1 
0 0 0 0 0 0 0 0
A =
1 0 0
0 0 0
0 0 0
Eigenvalues:
0 0 1
Unitary matrix:
0+i*0 0+i*0 1+i*0
1+i*0 0+i*0 0+i*0
0+i*0 1+i*0 0+i*0
===========================================
test.cpp:
===========================================
#include <gmpxx.h>
#include <mpack/mpack_config.h>
#include <mpack/mpc_class.h>
#include <mpack/mlapack_gmp.h>
#include <cstdlib>
#include <iostream>
int main (int argc, char* argv[])
{
   mpf_set_default_prec(512);
   mpackint n = 3;
   mpc_class *A = new mpc_class [n * n];
   mpackint lda = n;
   mpf_class *w = new mpf_class [n];
   mpc_class *work = new mpc_class [2 * n - 1];
   mpackint lwork = 2 * n - 1;
   mpf_class *rwork = new mpf_class [3 * n - 2];
   mpackint info;
   for (int i = 1; i < argc && i < n * n + 1; i++)
     A[i-1] = ::atof(argv[i]);
   std::cout << "A = " << std::endl;
   for (int i = 0; i < n; i ++)
   {
     for (int j = 0; j < n; j++)
       std::cout << A[i + n * j].real().get_d() << " ";
     std::cout << std::endl;
   }
   Cheev ("V", "U", n, A, lda, w, work, lwork, rwork, &info);
   std::cout << "Eigenvalues: " << std::endl;
   for (int i = 0; i < n; i++)
     std::cout << w[i].get_d() << " ";
   std::cout << std::endl;
   std::cout << "Unitary matrix:" << std::endl;
   for (int i = 0; i < n; i ++)
   {
     for (int j = 0; j < n; j++)
       std::cout << A[i + n * j].real().get_d() << "+i*"
                 << A[i + n * j].imag().get_d() << " ";
     std::cout << std::endl;
   }
   // note: "delete [] A, w, work, rwork;" uses the comma operator and
   // would free only A, so each array must be deleted separately.
   delete [] A; delete [] w; delete [] work; delete [] rwork;
   return 0;
}
===========================================
 | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2012-06-16 08:40:46
      
     
   | 
Hi lists, I have just uploaded MPACK 0.7.0. You can download it from here:
http://sourceforge.net/projects/mplapack/files/mpack/mpack%200.7.0/mpack-0.7.0.tar.gz/download
or
https://sourceforge.net/projects/mplapack/files/mpack/mpack%200.7.0/mpack-0.7.0.tar.gz/download
MD5sum and SHA256sum are the following:
MD5: 77d285c83f66f196fd7e3379dd97365
SHA256: 14d11bf51f6d40c59937117d8f13095383dc080d9aea741ea234fd3bff8bdb15
New features:
* __float128 support: IEEE 754 2008 binary128 or quadruple precision, a gcc extension. Not available for the Intel compiler.
* The GMP and DD versions are built by default to reduce the build time significantly.
* A FORTRAN compiler is no longer needed by default.
Supported platforms (a 64bit platform is strongly recommended):
* MacOSX Lion (Xcode or gcc-4.6 of macports), Linux (Red Hat 6.0, Ubuntu 12.04), FreeBSD 9.0/amd64, 8.2/amd64.
* Mingw32 (tested on FreeBSD 9.0/amd64 and wine64).
Known issues:
* The debugging feature is not supported on Mingw32.
* QD, DD types are not supported on Mingw32.
* Static libraries are not supported on all platforms.
* Some debugging functions are known to fail; in particular, Intel compilers output broken executables, although these are very rare cases.
Special Thanks To:
* FUJISAWA, Katsuki
* NAKASATO, Naoto
* GOTO, Kazushige
* IMAMURA, Toshiyuki
* HIMENO, Ryutaro
Acknowledgment: This work was supported by the Special Postdoctoral Researchers' Program of RIKEN, was partially supported by Grant-in-Aid for Scientific Research (B) 21300017, and by the CORE 6 project of the Institute for Japanese Academic Research Collaboration of Microsoft Research.
Thanks,
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/ http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  | 
| 
     
      
      
      From: Maho N. <ch...@ma...> - 2011-11-01 00:06:02
      
     
   | 
Hi Fletcher,
Many thanks, and could you please paste your error? Are you trying to build on CUDA 4.0? Currently only CUDA 3.2 is supported...
thanks Nakata Maho
From: "Fletcher, John P" <j.p...@as...>
Subject: RE: [Mplapack-devel] Accelerated double-double version of Rgemm on NVIDIA C2050
Date: Mon, 31 Oct 2011 10:48:46 +0000
> Maho
>
> I am attempting to get this code working on my system with a GTX 460 card.
>
> At the moment I have a strange problem compiling the code, which is a failure to find __builtin_isfinite. This seems to be a problem between gcc 4.4 and the nvcc compiler, which has occurred in a number of projects as well. I have overcome this temporarily by doctoring the isfinite implementation in dd_real.h
>
> The other problem is about compatibility of NVIDIA driver versions.
>
> I am hopeful of overcoming these as I have MPACK working with qd on my computer, and also code which works with CUDA.
>
> I will report further when I have a solution.
>
> If anyone else can help, please post a reply.
>
> Thank you for this work.
>
> John
>
> Dr John P. Fletcher Tel: (44) 121 204 3389 (direct line), FAX: (44) 121 204 3678
> Chemical Engineering and Applied Chemistry (CEAC),
> Associate Dean - External Relations,
> School of Engineering and Applied Science (EAS),
> Aston University, Aston Triangle, BIRMINGHAM B4 7ET U.K.
>
> -----Original Message-----
> From: Maho NAKATA [mailto:ch...@ma...]
> Sent: 28 October 2011 01:54
> To: mpl...@li...
> Cc: y-...@jf...; hi...@ri...
> Subject: [Mplapack-devel] Accelerated double-double version of Rgemm on NVIDIA C2050
>
> Hi all,
>
> I have just uploaded an accelerated double-double version of Rgemm on NVIDIA C2050.
> You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/ .
> More explicitly:
> http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
> The file name is Rgemm_C2050_20111026.tar.gz.
>
> This is a preliminary version of the double-double Rgemm, for benchmarking purposes.
>
> Requirements:
> * NVIDIA C2050 or C2070.
> * CUDA 3.2
> * SDK assumed to be installed at /usr/local/cuda/.
>
> How to test:
> $ tar xvfz Rgemm_C2050_20111026.tar.gz
> $ cd Rgemm_C2050
> $ make
> ...
> building Rgemm for C2050 and taking the benchmark; results are saved as CSV files.
> The default precision for multiplication and addition is IEEE rounding.
> subdir bench_all : test square matrices of various sizes.
> subdir bench_ecc_onoff : test the ecc on/off case. A change in ECC configuration requires a reboot.
> subdir bench_jitter : test the jitter of the NVIDIA GPU.
> subdir bench_pointerredirecting : test the effect of pointer redirecting.
> subdir bench_rectangular : test matrix-matrix multiplication for rectangular matrices.
> subdir bench_sloppy : test lower-accuracy methods.
>
> How to look at the results:
> All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file. This result includes CPU-GPU transfer time (Total), all matrices are not transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523". This means A, B, and C are 2x2 matrices, and 0.00007523 means 0.00007523 GFlops. Therefore, this csv file's format is "size, Gflops".
>
> --quote start
> device_count : 1
> device name -> Tesla C2050
> cudareturn -> 0
> cudaGetDevice()=0
> n - n mode
> 2, 0.00007523
> 8, 0.00401315
> 15, 0.02368548
> 32, 0.18986460
> 47, 0.51354554
> 64, 1.10947988
> 65, 1.05225521
> 81, 1.77540829
> 97, 2.11975457
> ...
> --quote end
>
> Notes: The full reference version can be downloaded at http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.
>
> License: 2-clause BSD style license. See each file for details.
>
> Citation:
> * "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
> * "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
>
> Programmed by: Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
>
> Enjoy,
> -- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
> http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
>  | 
| 
     
      
      
      From: Fletcher, J. P <j.p...@as...> - 2011-10-31 11:04:06
      
     
   | 
Maho
I am attempting to get this code working on my system with a GTX 460 card.
At the moment I have a strange problem compiling the code, which is a failure to find __builtin_isfinite. This seems to be a problem between gcc 4.4 and the nvcc compiler, which has occurred in a number of projects as well. I have overcome this temporarily by doctoring the isfinite implementation in dd_real.h
The other problem is about compatibility of NVIDIA driver versions.
I am hopeful of overcoming these as I have MPACK working with qd on my computer, and also code which works with CUDA.
I will report further when I have a solution.
If anyone else can help, please post a reply.
Thank you for this work.
John
Dr John P. Fletcher Tel: (44) 121 204 3389 (direct line), FAX: (44) 121 204 3678
Chemical Engineering and Applied Chemistry (CEAC),
Associate Dean - External Relations,
School of Engineering and Applied Science (EAS),
Aston University, Aston Triangle, BIRMINGHAM B4 7ET U.K.
-----Original Message-----
From: Maho NAKATA [mailto:ch...@ma...]
Sent: 28 October 2011 01:54
To: mpl...@li...
Cc: y-...@jf...; hi...@ri...
Subject: [Mplapack-devel] Accelerated double-double version of Rgemm on NVIDIA C2050
Hi all,
I have just uploaded an accelerated double-double version of Rgemm on NVIDIA C2050. You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/ . More explicitly:
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
The file name is Rgemm_C2050_20111026.tar.gz.
This is a preliminary version of the double-double Rgemm, for benchmarking purposes.
Requirements:
* NVIDIA C2050 or C2070.
* CUDA 3.2
* SDK assumed to be installed at /usr/local/cuda/.
How to test:
$ tar xvfz Rgemm_C2050_20111026.tar.gz
$ cd Rgemm_C2050
$ make
... building Rgemm for C2050 and taking the benchmark; results are saved as CSV files. The default precision for multiplication and addition is IEEE rounding.
subdir bench_all : test square matrices of various sizes.
subdir bench_ecc_onoff : test the ecc on/off case. A change in ECC configuration requires a reboot.
subdir bench_jitter : test the jitter of the NVIDIA GPU.
subdir bench_pointerredirecting : test the effect of pointer redirecting.
subdir bench_rectangular : test matrix-matrix multiplication for rectangular matrices.
subdir bench_sloppy : test lower-accuracy methods.
How to look at the results:
All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file. This result includes CPU-GPU transfer time (Total), all matrices are not transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523". This means A, B, and C are 2x2 matrices, and 0.00007523 means 0.00007523 GFlops. Therefore, this csv file's format is "size, Gflops".
--quote start
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
--quote end
Notes: The full reference version can be downloaded at http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.
License: 2-clause BSD style license. See each file for details.
Citation:
* "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
* "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
Programmed by: Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
Enjoy,
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/ http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  | 
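One plausible shape for the dd_real.h workaround described above, purely as a sketch (the DD_ISFINITE macro name and the __CUDACC__ guard are assumptions, not the actual qd source): route the check through std::isfinite when the gcc builtin is unavailable to nvcc's frontend.
    // Hypothetical "doctored" isfinite for dd_real.h: avoid the gcc-specific
    // __builtin_isfinite that the nvcc frontend fails to resolve.
    #include <cmath>
    #ifdef __CUDACC__
    #define DD_ISFINITE(x) (std::isfinite(x))
    #else
    #define DD_ISFINITE(x) (__builtin_isfinite(x))
    #endif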
| 
     
      
      
      From: Maho N. <ch...@ma...> - 2011-10-29 03:20:29
      
     
   | 
Note: I forgot to mention that this work has been supported by the MS CORE 6 PROJECT, http://www.microsoft.com/ja-jp/ijarc/core/ifp_06_j.aspx .
From: Maho NAKATA <ch...@ma...>
Subject: Accelerated double-double version of Rgemm on NVIDIA C2050
Date: Fri, 28 Oct 2011 09:53:40 +0900 (JST)
> Hi all,
>
> I have just uploaded an accelerated double-double version of Rgemm on NVIDIA C2050.
> You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/ .
> More explicitly:
> http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
> The file name is Rgemm_C2050_20111026.tar.gz.
>
> This is a preliminary version of the double-double Rgemm, for benchmarking purposes.
>
> Requirements:
> * NVIDIA C2050 or C2070.
> * CUDA 3.2
> * SDK assumed to be installed at /usr/local/cuda/.
>
> How to test:
> $ tar xvfz Rgemm_C2050_20111026.tar.gz
> $ cd Rgemm_C2050
> $ make
> ...
> building Rgemm for C2050 and taking the benchmark; results are saved as CSV files.
> The default precision for multiplication and addition is IEEE rounding.
> subdir bench_all : test square matrices of various sizes.
> subdir bench_ecc_onoff : test the ecc on/off case. A change in ECC configuration requires a reboot.
> subdir bench_jitter : test the jitter of the NVIDIA GPU.
> subdir bench_pointerredirecting : test the effect of pointer redirecting.
> subdir bench_rectangular : test matrix-matrix multiplication for rectangular matrices.
> subdir bench_sloppy : test lower-accuracy methods.
>
> How to look at the results:
> All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file. This result includes CPU-GPU transfer time (Total), all matrices are not transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523". This means A, B, and C are 2x2 matrices, and 0.00007523 means 0.00007523 GFlops. Therefore, this csv file's format is "size, Gflops".
>
> --quote start
> device_count : 1
> device name -> Tesla C2050
> cudareturn -> 0
> cudaGetDevice()=0
> n - n mode
> 2, 0.00007523
> 8, 0.00401315
> 15, 0.02368548
> 32, 0.18986460
> 47, 0.51354554
> 64, 1.10947988
> 65, 1.05225521
> 81, 1.77540829
> 97, 2.11975457
> ...
> --quote end
>
> Notes: The full reference version can be downloaded at http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.
>
> License: 2-clause BSD style license. See each file for details.
>
> Citation:
> * "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
> * "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
>
> Programmed by: Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
>
> Enjoy,
> -- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/
> http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt
>  | 
| 
     
      
      
      From: Maho N. <ma...@ri...> - 2011-10-28 00:46:03
      
     
   | 
Hi all,
I have just uploaded an accelerated double-double version of Rgemm on NVIDIA C2050. You can download it from here: http://sourceforge.net/projects/mplapack/files/mpack/ . More explicitly:
http://sourceforge.net/projects/mplapack/files/mpack/Rgemm_C2050_20111026.tar.gz/download
The file name is Rgemm_C2050_20111026.tar.gz.
This is a preliminary version of the double-double Rgemm, for benchmarking purposes.
Requirements:
* NVIDIA C2050 or C2070.
* CUDA 3.2
* SDK assumed to be installed at /usr/local/cuda/.
How to test:
$ tar xvfz Rgemm_C2050_20111026.tar.gz
$ cd Rgemm_C2050
$ make
... building Rgemm for C2050 and taking the benchmark; results are saved as CSV files. The default precision for multiplication and addition is IEEE rounding.
subdir bench_all : test square matrices of various sizes.
subdir bench_ecc_onoff : test the ecc on/off case. A change in ECC configuration requires a reboot.
subdir bench_jitter : test the jitter of the NVIDIA GPU.
subdir bench_pointerredirecting : test the effect of pointer redirecting.
subdir bench_rectangular : test matrix-matrix multiplication for rectangular matrices.
subdir bench_sloppy : test lower-accuracy methods.
How to look at the results:
All results are in CSV files. For example, please open the "Rgemm_NN_Total_bench.csv" file. This result includes CPU-GPU transfer time (Total), all matrices are not transposed (NN), and all matrices are square. The first five lines can be ignored. The next line shows "2, 0.00007523". This means A, B, and C are 2x2 matrices, and 0.00007523 means 0.00007523 GFlops. Therefore, this csv file's format is "size, Gflops".
--quote start
device_count : 1
device name -> Tesla C2050
cudareturn -> 0
cudaGetDevice()=0
n - n mode
2, 0.00007523
8, 0.00401315
15, 0.02368548
32, 0.18986460
47, 0.51354554
64, 1.10947988
65, 1.05225521
81, 1.77540829
97, 2.11975457
...
--quote end
Notes: The full reference version can be downloaded at http://mplapack.sourceforge.net/, and this version of Rgemm will hopefully be integrated soon.
License: 2-clause BSD style license. See each file for details.
Citation:
* "Acceleration of matrix-matrix product in double-double precision using GPU", Maho NAKATA, Yasuyoshi TAKAO, Shigeho NODA, and Ryutaro HIMENO, Keisankogakukoenkai Ronbunsyuu, Vol. 16. 2011.
* "Rgemm-preprint.ja.pdf" is included as well (preprint, in Japanese).
Programmed by: Takao, Yasuyoshi, Nakata, Maho, and RIKEN.
Enjoy,
-- Nakata Maho http://accc.riken.jp/maho/ , JA OOO http://ja.openoffice.org/ http://blog.goo.ne.jp/nakatamaho/ ,GPG: http://accc.riken.jp/maho/maho.pgp.txt  |