Thread: Re: [Apbs-users] lack of speedup for apbs on AIX 5L using MPI
From: Nathan A. B. <ba...@ch...> - 2004-03-31 21:05:34
Hi Lawrence --

Actually, I'd leave VREDFRAC alone unless you can demonstrate that the
results (usually binding or solvation energies) are insensitive to the
choice of this variable for your application. Robert's solution is the
best way to demonstrate scaling for APBS; the folks at SDSC and NCSA
have a fair amount of experience with this -- APBS often gets used as a
benchmark on new platforms. The psize.py utility is mentioned in the
user manual at
http://agave.wustl.edu/apbs/doc/html/user-guide/x371.html#AEN415 and is
located at apbs/tools/manip/psize.py

Thanks,

Nathan

Lawrence Hannon <ha...@us...> (03-31-2004 12:22:39-0600):
>Nathan,
>
>Thanks for the explanation. Can you verify the following?
>
>The version I have will not parallelize well on any problem. I have to
>rebuild the app with a smaller value for VREDFRAC (I didn't find VREDRAT
>in src/generic/apbs/vhal.h). The value I choose is application dependent.
>I'm guessing I should experiment with different values to make sure that
>the answers are OK. You mentioned that psize.py could help. Is it
>documented somewhere?
>
>Thanks,
>
>Lawrence
>
>"Nathan A. Baker" <ba...@ch...>
>Sent by: apb...@ch...
>03/31/2004 11:20 AM
>
> To: Lawrence Hannon/Houston/IBM@IBMUS
> cc: apb...@ch...
> Subject: Re: [Apbs-users] lack of speedup for apbs on AIX 5L using MPI
>
>Hi Lawrence --
>
>(Warning -- long e-mail ahead... figured I'd post all the details to
>the mailing list for posterity)
>
>APBS was designed to enable electrostatics calculations on very large
>biological systems where single-processor calculations are not
>feasible due to memory restrictions.
>
>Here's a summary of a typical situation where APBS's parallel
>capabilities are useful...
>
>Assume we have a protein (or chunk of protein) of dimensions (x, y, z)
>whose potential we want to calculate. In general, we first choose the
>fine (lxf, lyf, lzf) and coarse (lxc, lyc, lzc) grid lengths of the
>calculation. We then specify either the number of grid points (nx, ny,
>nz) used by the solver or the grid spacings (hx, hy, hz). These two
>quantities are related by:
>
>   nx = lxf/hx + 1
>   ny = lyf/hy + 1
>   nz = lzf/hz + 1
>
>The amount of memory an APBS calculation requires is directly
>proportional to (nx*ny*nz).
>
>APBS focuses the solution from the coarse to the fine mesh (using the
>same number of grid points) in a controlled manner by requiring that
>the grid spacings/lengths shrink by no more than a specified ratio
>between two focusing levels:
>
>   eps < lx2/lx1 < 1
>   eps < ly2/ly1 < 1
>   eps < lz2/lz1 < 1
>
>In doing so, it sets the number of focusing levels in a calculation:
>
>   m = max_{i = x,y,z} log(lif/lic)/log(eps)
>
>For a given number of grid points, the length of time that an APBS
>calculation runs is directly proportional to the number of focusing
>levels (m).
>
>The only thing that parallel focusing does differently from a typical
>sequential calculation is to choose smaller fine grid lengths
>
>   lxfp ~ (1 + 2*sigma) lxf/npx
>   lyfp ~ (1 + 2*sigma) lyf/npy
>   lzfp ~ (1 + 2*sigma) lzf/npz
>
>based on the size of the processor array (npx, npy, npz) and the
>desired overlap between processor grids (sigma).
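A minimal Python sketch of the sizing arithmetic above; the function
names, the grid lengths (100 A fine, 300 A coarse), the ~200
bytes-per-grid-point constant, and the sigma default are illustrative
assumptions rather than values taken from the APBS sources:

    import math

    def grid_points(length, spacing):
        # nx = lxf/hx + 1, rounded up so the grid covers the full length
        return int(math.ceil(length / spacing)) + 1

    def memory_mb(nx, ny, nz, bytes_per_point=200):
        # Memory is proportional to nx*ny*nz; ~200 B/point is an assumed
        # constant for illustration and depends on the solver build.
        return nx * ny * nz * bytes_per_point / 1024**2

    def focusing_levels(l_fine, l_coarse, eps=0.25):
        # m = ceil(log(lf/lc)/log(eps)), with eps the smallest allowed
        # per-level length ratio (VREDRAT/VREDFRAC in vhal.h)
        return max(1, math.ceil(math.log(l_fine / l_coarse) / math.log(eps)))

    def parallel_fine_length(l_fine, nproc_per_dim, sigma=0.1):
        # Per-processor fine grid length: (1 + 2*sigma) * lf / np
        return (1.0 + 2.0 * sigma) * l_fine / nproc_per_dim

    # Example: a 100 A fine grid in a 300 A coarse grid at h = 0.5 A
    n = grid_points(100.0, 0.5)                  # 201 points per dimension
    print(memory_mb(n, n, n))                    # ~1.5 GB: focusing needed
    print(focusing_levels(100.0, 300.0))         # 1 level on one processor
    print(focusing_levels(parallel_fine_length(100.0, 2), 300.0))  # 2 levels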
>For a large molecule, we'll find that these platform- and
>protein-specific settings give grid spacings (hx, hy, hz) that are too
>large for accurate PBE calculations and, therefore, we'll need to use
>parallel focusing.
>
>Based on the arguments above, you can see that the number of focusing
>levels (and therefore the computational time) will scale as
>
>   m ~ max_{i = x, y, z} log(lif/lic/npi)/log(eps)
>
>In the original implementation, eps was so small (~1/100) that the
>logarithmic dependence was negligible on the available computational
>platforms. This gives us a claim to linear scaling in a manner
>entirely analogous to the fast multipole method (the similarity
>between the methods is more than superficial, BTW).
>
>This choice seemed to work well for the cases I examined -- mainly
>ligand binding and comparison of average potentials away from protein
>surfaces. However, I later discovered that the resulting potentials
>with small eps did not give reliable results for protein-protein
>interactions. Therefore, I chose a very conservative value (0.25) as
>the default in recent versions of APBS. This value is overkill, but it
>ensures that users will likely never receive "surprising" errors with
>APBS parallel focusing -- this value is also causing the
>less-than-linear scaling you're observing. This value can be modified
>(see VREDRAT in src/generic/apbs/vhal.h) to values appropriate to your
>application.
>
>The upshot of this long e-mail is that APBS allows users to look at
>large systems that are not possible with a single machine. Situations
>where this is warranted are indicated by the psize.py utility provided
>with APBS. The algorithm is completely latency-tolerant (we have a
>version that requires no communication) and scales linearly under
>certain circumstances.
>
>Thanks,
>
>Nathan
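To see why the conservative default flattens the scaling curve while
the original eps ~ 1/100 did not, one can tabulate m against the
processor count per dimension. This is a sketch under the same
illustrative assumptions as the earlier snippet (100 A / 300 A grids,
sigma = 0.1); an 8-CPU run like the actin-dimer example would
correspond to a 2 x 2 x 2 processor array, for example:

    import math

    def levels(l_fine, l_coarse, np_dim, eps, sigma=0.1):
        # m ~ log((lf/np)/lc)/log(eps), per the scaling argument above
        lf = (1.0 + 2.0 * sigma) * l_fine / np_dim
        return max(1, math.ceil(math.log(lf / l_coarse) / math.log(eps)))

    for np_dim in (1, 2, 4, 8):
        print(np_dim,
              levels(100.0, 300.0, np_dim, eps=0.01),   # old default: stays at 1
              levels(100.0, 300.0, np_dim, eps=0.25))   # new default: 1, 2, 2, 3

Because every focusing level is a solve on the same (nx, ny, nz) grid,
run time per processor is proportional to m; with eps = 0.25 the extra
levels can offset the smaller per-processor domain, which is consistent
with the flat timings in the message quoted below.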
>Lawrence Hannon <ha...@us...> (03-31-2004 11:07:50-0600):
>>I'm an IBM'er working with Celera on code optimization. They've asked me
>>to take a look at apbs/MPI. They have not been big MPI users up to now.
>>
>>I built the system with the following:
>>
>>CC=mpcc_r
>>F77=mpxlf_r
>>CFLAGS="-O3 -qstrict -qarch=pwr3 -qtune=pwr3 -qcache=auto -qmaxmem"
>>FFLAGS="-qfixed=132 -O3 -qstrict -qarch=pwr3 -qtune=pwr3 -qcache=auto
>>-qmaxmem"
>>LDFLAGS="-bmaxdata:0x80000000 -bmaxstack:0x10000000 -L/usr/local/lib
>>-lmass -lessl"
>>
>>For maloc:
>>configure --prefix <install directory> --enable_mpi --enable_blas=no
>>gmake install
>>
>>For apbs-0.2.6:
>>configure --prefix <install directory> --with_blas="-L/usr/lib -lblas"
>>gmake install
>>
>>It seems to have hooked into IBM's MPI (poe) because it complains if
>>MP_PROCS or other MPI-related environment variables are set incorrectly.
>>I ran it using the apbs-PARALLEL.in input file in examples/actin-dimer
>>(modified to run with 1, 2, 4, & 8 CPUs). I'm not seeing any speedup
>>when I add CPUs. As a matter of fact, it seems to run in the same amount
>>of time or longer as I add CPUs. When I profile the code, I see most of
>>the time is spent in "ivdwAccExclus". The time spent in this routine is
>>the same or more for each thread even when it's run on multiple CPUs. I
>>must be doing something wrong.
>>
>>If I look at which MPI routines are being used, I see MPI_Comm_size,
>>MPI_Comm_rank, and MPI_Allreduce all being used only once. When I look
>>through the source, it's very hard to determine exactly how parallelism
>>is entering the problem. Can anyone help?
>>
>>Thanks,
>>
>>Lawrence Hannon
>
>End of message from Lawrence Hannon.

End of message from Lawrence Hannon.

--
Nathan A. Baker, Assistant Professor
Washington University in St. Louis School of Medicine
Dept. of Biochemistry and Molecular Biophysics
Center for Computational Biology
700 S. Euclid Ave., Campus Box 8036, St. Louis, MO 63110
Phone: (314) 362-2040, Fax: (314) 362-0234
URL: http://www.biochem.wustl.edu/~baker
PGP key: http://cholla.wustl.edu/~baker/pubkey.asc