From: David S. <ds...@um...> - 2008-01-18 18:49:44
Hello MVAPICH and VALGRIND,

I am a research associate at UMASSD. I work on a numerical ocean model, fvcom, written in F90. We have recently run into problems:

    forrtl: error (78): process killed (SIGTERM)
    forrtl: error (78): process killed (SIGTERM)
    forrtl: error (78): process killed (SIGTERM)
    mpiexec: Warning: tasks 0-1,3 exited with status 1.
    mpiexec: Warning: task 2 died with signal 11 (Segmentation fault).

The error is problem size dependent.
The error is compiler optimization dependent.
The error only occurs when running on more than one node. (In the example error above, I used 2 procs per node, on 2 nodes.) If I run on four procs in one node, the code passes!

The only clue that I have is that the problem seems to be related to subroutines which use explicit shape arrays, but I have checked all the upper and lower bounds. Running under valgrind, or compiling with '-check all' in ifort, allows the routine to pass.

It seems my only hope for tracing this mess is using valgrind, but I am having trouble using valgrind on our cluster. It does run, but I am concerned that it is not running properly. The mpi_init call alone results in hundreds of errors in the mpi and vapi libraries, including leaks, uninitialized memory use/conditionals, and invalid reads/writes. Has anyone had success using valgrind with mvapich2?

Valgrind also found problems with the fvcom Fortran code, but most of these seemed to go away when I increased --max-stackframe. None of the remaining errors seem to be related to what causes the SIGSEGV when I run without valgrind.

Selected system info:

    Nodes: Dell 1850, Intel Xeon EM64-T
    Network: Infiniband PCI-EX 4X
    System: Rocks 4.2
    gcc: version 3.4.6 20060404 (Red Hat 3.4.6-3), thread model posix
    ifort Version 9.1
    mpif90 for mvapich2-1.0
    valgrind-3.2.3
    mpiexec-0.82

Again, all of these tools/libraries seem to work fine under normal tests, but this particular combination of code and model case is causing a real mess!

Thanks for any help you can offer!

David