Re: [Moose-g3-devel] parallel MOOSE

Upinder S. Bhalla writes:
 > Nevertheless, I think that some form of visible postmaster is necessary.
 > My concern here is that I don't want to have hidden plumbing in MOOSE.
 > There was a lot of that in GENESIS, and it ended up being accessed through
 > a whole lot of special purpose calls. I would much rather have visible
 > 'plumbing' objects that most people do not need to worry about, which do
 > provide a consistent interface to those who do care about such things. For
 > example, where would you go to ask the system how many nodes the model was
 > running on, or if you want to explicitly shift objects around? For
 > instance, MOOSE already has the rather complex plumbing of the scheduler
 > completely visible.

Upi,
My feeling is that the functions associated with the postmasters *should* be
"hidden plumbing".  If someone checkpoints their model that happened to
be running on 16 processors, then with model-level postmaster objects, the
postmaster information would get included in the checkpoint file.  However,
if the next day that person wants to resume the run but only 15 processors
are available (due to hardware failure or other people running jobs), then
it becomes messy to restore the model state.  This does not have to be a
messy operation if the connections are saved directly by the hidden plumbing,
and then restored based on the new localities of the source and destination
objects.
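
To make that concrete, here is a rough sketch (Python, purely illustrative
and not MOOSE code) of what I have in mind.  The function names, the
round-robin placement and the object paths are all assumptions of mine; the
only point is that the checkpoint records connections by object path, never
by node, so the local/remote split can be recomputed for whatever node count
is available at restore time.

```python
import json

def save_checkpoint(connections):
    """Serialize connections by object path only; no node ids are stored."""
    return json.dumps({"connections": sorted(connections)})

def assign_nodes(paths, num_nodes):
    """Hypothetical placement policy: round-robin over the sorted paths."""
    return {p: i % num_nodes for i, p in enumerate(sorted(paths))}

def restore_checkpoint(blob, num_nodes):
    """Rebuild placement for the current node count, then classify each
    saved connection as local (same node) or remote (crosses nodes)."""
    conns = [tuple(c) for c in json.loads(blob)["connections"]]
    paths = {p for conn in conns for p in conn}
    placement = assign_nodes(paths, num_nodes)
    local, remote = [], []
    for src, dst in conns:
        same_node = placement[src] == placement[dst]
        (local if same_node else remote).append((src, dst))
    return placement, local, remote
```

The same checkpoint can then be restored with restore_checkpoint(blob, 16)
one day and restore_checkpoint(blob, 15) the next, without the file itself
changing.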

And while I think there ought to be a way for the model to find out how many
nodes it is running on (e.g. for displaying to the user), I don't think
the model writer should be encouraged to exploit that or to become involved
in explicit model partitioning.  The kind of information that the model
writer might provide would be something like "keep A on the same node as B
because those objects communicate frequently", rather than directives to
place A on node 7 and B on node 7.  That way, the model is not tied to
specific hardware and can be portably run on many systems.
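
As a sketch of how such hints might be consumed (again hypothetical Python,
not an existing MOOSE interface; the partition function and its greedy
load-balancing rule are my own assumptions), the system could merge objects
linked by "keep together" hints into groups and then place whole groups on
the least-loaded node:

```python
def partition(objects, keep_together, num_nodes):
    """Place objects on nodes, honoring (a, b) keep-together hints."""
    parent = {o: o for o in objects}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every hinted pair.
    for a, b in keep_together:
        parent[find(a)] = find(b)

    groups = {}
    for o in objects:
        groups.setdefault(find(o), []).append(o)

    # Greedy balancing: largest group first, onto the least-loaded node.
    load = [0] * num_nodes
    placement = {}
    for members in sorted(groups.values(), key=len, reverse=True):
        node = load.index(min(load))
        for o in members:
            placement[o] = node
        load[node] += len(members)
    return placement
```

The model only ever states that A and B belong together; which node they end
up on is decided at run time from the number of nodes actually available.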

 > Having said this, I really don't know what is to happen to the postmasters
 > should we shift the actual MPI communication to a two-hop process as
 > you described. An obvious possibility is to use a variant of the postmaster
 > class, with the design requirement that the first-order user viewpoint
 > remains that of a transparent, singly rooted object tree with ordinary
 > messaging between objects. Anyway, it is still some way off.

Yes, I agree with this design requirement.  I guess the difference between
our positions is that I would prefer not to have higher-order user viewpoints
of the communication infrastructure -- first, because it limits portability,
and second because I see almost no one using it.  Models get altered
frequently enough that it usually doesn't make sense to excessively
hand-tune details such as partitioning (provided that it is done
reasonably well in some automated way).  It's sort of like the situation
with compilers -- even if we could get a factor of 2 improvement by coding
directly in assembler, it's almost never worth the extra effort.

Regards,
--Greg