I'm an IBMer working with Celera on code optimization. They've asked me to take a look at apbs/MPI. They have not been big MPI users up to now.

I built the system with the following settings:

CC=mpcc_r
F77=mpxlf_r
CFLAGS="-O3 -qstrict -qarch=pwr3 -qtune=pwr3 -qcache=auto -qmaxmem"
FFLAGS="-qfixed=132 -O3 -qstrict -qarch=pwr3 -qtune=pwr3 -qcache=auto -qmaxmem"
LDFLAGS="-bmaxdata:0x80000000 -bmaxstack:0x10000000 -L/usr/local/lib -lmass -lessl "
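
These are exported in the shell before each configure step below, roughly (ksh syntax; the values are as listed above):

export CC F77 CFLAGS FFLAGS LDFLAGS   # make the settings above visible to configure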

For maloc
configure --prefix <install directory> --enable_mpi --enable_blas=no
gmake install

For apbs-0.2.6
configure --prefix <install directory> --with_blas="-L/usr/lib -lblas"
gmake install
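
As a sanity check that the link pulled in poe's MPI libraries, something like this should show the MPI import on AIX (the path to the installed binary is a guess):

# List the shared-library imports of the installed binary; an entry such as
# libmpi_r.a from the PE runtime would confirm the poe link.
dump -H <install directory>/bin/apbs | grep -i mpi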

It seems to have hooked into IBM's MPI (poe) because it complains if MP_PROCS or other MPI-related environment variables are set incorrectly. I ran it using the apbs-PARALLEL.in input file in examples/actin-dimer (modified to run with 1, 2, 4, and 8 CPUs). I'm not seeing any speedup when I add CPUs; as a matter of fact, it seems to run in the same amount of time or longer as I add CPUs. When I profile the code, I see most of the time is spent in "ivdwAccExclus". The time spent in this routine is the same or more for each thread, even when it's run on multiple CPUs. I must be doing something wrong.
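
For reference, the parallel runs are launched roughly like this (a sketch; node and pool settings depend on the local poe configuration):

# Run under poe with the task count varied over 1, 2, 4, and 8.
export MP_PROCS=8
poe ./apbs apbs-PARALLEL.in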

If I look at which MPI routines are being used, I see MPI_Comm_size, MPI_Comm_rank, and MPI_Allreduce each being used only once. When I look through the source, it's very hard to determine exactly how parallelism is entering the problem. Can anyone help?
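
For what it's worth, this is roughly how I looked for the MPI call sites (run from the top of the apbs and maloc source trees):

# Print every line in the C sources that references an MPI_ routine.
find . -name "*.c" -exec grep -n "MPI_" {} /dev/null \;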

Thanks,

Lawrence Hannon