From: Erik A. H. <er...@he...> - 2001-12-17 18:09:20
On Sun, Dec 16, 2001 at 05:21:39PM -0500, Nicholas Henke wrote:
> Yes, but unfortunately it is specific to Scyld's use of Bproc. They have
> hacked in dependencies for beomap and beostatus. To use mpich requires you
> to use these programs instead of another resource manager.

We saw that here and ended up making our own little modification. Attached below is the MPICH patch. The downside is that you need a special mpirun to use this. Unfortunately, since that's a separate piece of code that I wrote from scratch here, there's a procedure to go through to release it. It's a very simple program though. Somebody could rewrite it a LOT faster than I can get it released if they're feeling impatient.

The patch just creates a new "external execer" facility. For the program "app" and -np 4, mpirun would fork and bproc_execmove the following:

    rank 0: app -p4execer 0 4 n-1 45541 ;n5,0;n6,1;n7,1;n10,1
    rank 1: app -p4execer 1 4 n5 41922
    rank 2: app -p4execer 2 4 n5 41922
    rank 3: app -p4execer 3 4 n5 41922

-p4execer is the magic argument and it works like this:

    for rank 0:  -p4execer rank jobsize mpirunhost mpirunport procgroup
    for rank 1+: -p4execer rank jobsize rank0host rank0port

Rank 0 is special because it is the job that all the others must connect to in MPI_Init. To do that, the others must know what host and port rank 0 is on. mpirun won't know what port to tell the others unless rank 0 tells it. That's why rank 0 connects to mpirun and sends its port number. Then mpirun can start all the other jobs with the appropriate arguments.

The format of the process group argument is:

    ;host0,0;host1,1;host2,1;host3,1

You could just wait for me to get our simple mpirun released, but it probably won't be for a while, since I probably can't do it before xmas and the lab is closed for a week then.

- Erik
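For anyone who wants to experiment with the -p4execer argument format described above, here is a minimal sketch in Python of how the per-rank argument lists could be put together. This is not the mpirun mentioned in the message and not part of the patch. The function names (procgroup, execer_argv) are made up for illustration, and the sketch assumes that the number after each host in the process group is 0 for the first entry and 1 for every other entry, which is only what the example shows. It does not fork, call bproc_execmove, or do the rank 0 port handshake.

```python
def procgroup(hosts):
    """Build the process group string, e.g. ';n5,0;n6,1;n7,1;n10,1'.

    Assumption: the field after each host is 0 for the first host and 1
    for the rest, copied from the example in the message. The message
    does not say what that field means.
    """
    return "".join(
        ";{},{}".format(host, 0 if i == 0 else 1)
        for i, host in enumerate(hosts)
    )


def execer_argv(app, rank, hosts, mpirun_host, mpirun_port, rank0_port=None):
    """Return the argument list for one rank (hypothetical helper).

    rank 0:  app -p4execer rank jobsize mpirunhost mpirunport procgroup
    rank 1+: app -p4execer rank jobsize rank0host rank0port
    """
    jobsize = len(hosts)
    if rank == 0:
        return [app, "-p4execer", "0", str(jobsize),
                mpirun_host, str(mpirun_port), procgroup(hosts)]
    if rank0_port is None:
        # The other ranks cannot be started until rank 0 has reported
        # its port back to mpirun.
        raise ValueError("rank0_port is required for ranks other than 0")
    return [app, "-p4execer", str(rank), str(jobsize),
            hosts[0], str(rank0_port)]


if __name__ == "__main__":
    hosts = ["n5", "n6", "n7", "n10"]
    print(" ".join(execer_argv("app", 0, hosts, "n-1", 45541)))
    for r in range(1, len(hosts)):
        print(" ".join(execer_argv("app", r, hosts, "n-1", 45541,
                                   rank0_port=41922)))
```

Run as a script, this prints one command line per rank using the hosts and ports from the example. In a real launcher, 41922 would be whatever port rank 0 sends back after connecting to mpirun.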