From: Greg H. <gh...@ps...> - 2006-11-07 01:34:21
Upinder S. Bhalla writes:
> Nevertheless, I think that some form of visible postmaster is necessary.
> My concern here is that I don't want to have hidden plumbing in MOOSE.
> There was a lot of that in GENESIS, and it ended up being accessed through
> a whole lot of special purpose calls. I would much rather have visible
> 'plumbing' objects that most people do not need to worry about, which do
> provide a consistent interface to those who do care about such things. For
> example, where would you go to ask the system how many nodes the model was
> running on, or if you want to explicitly shift objects around? For
> instance, MOOSE already has the rather complex plumbing of the scheduler
> completely visible.

Upi,

My feeling is that the functions associated with the postmasters *should* be "hidden plumbing". If someone checkpoints their model that happened to be running on 16 processors, then with model-level postmaster objects, the postmaster information would get included in the checkpoint file. However, say the next day that person wants to resume the run, but only 15 processors are available (due to hardware failure or other people running jobs), then it becomes messy to restore the model state. This does not have to be a messy operation if the connections are saved directly by the hidden plumbing, and then restored based on the new localities of the source and destination objects.

And while I think there ought to be a way for the model to find out how many nodes it is running on (e.g. for displaying to the user), I don't think the model writer should be encouraged to exploit that or to become involved in explicit model partitioning. The kind of information that the model writer might provide would be something like "keep A on the same node as B because those objects communicate frequently", rather than directives to place A on node 7 and B on node 7. That way, the model is not tied to specific hardware and can be portably run on many systems.

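To make the checkpoint argument concrete, here is a rough sketch in Python. It is not MOOSE code: the function names, the object paths, and the greedy placement rule are all assumptions made up for illustration. The point is only that the checkpoint records connections and co-location hints by object, never by node number, so the same checkpoint can be restored on 16 nodes or on 15.

```python
from collections import defaultdict


def partition(objects, colocate, n_nodes):
    """Assign each object to a node, keeping co-located objects together.

    Hypothetical helper: 'colocate' is a list of (a, b) hints meaning
    "keep a on the same node as b". No hint ever names a node number.
    """
    # Union-find: merge objects that must share a node into one group.
    parent = {obj: obj for obj in objects}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in colocate:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for obj in objects:
        groups[find(obj)].append(obj)

    # Greedy balancing: largest group first, onto the least-loaded node.
    load = [0] * n_nodes
    placement = {}
    for members in sorted(groups.values(), key=len, reverse=True):
        node = load.index(min(load))
        for obj in members:
            placement[obj] = node
        load[node] += len(members)
    return placement


def remote_connections(connections, placement):
    """Connections whose endpoints are on different nodes.

    These are the ones the hidden plumbing would route between nodes.
    """
    return [(s, d) for s, d in connections if placement[s] != placement[d]]


# The "checkpoint" holds only node-independent information.
checkpoint = {
    "objects": [f"/cell{i}" for i in range(32)],
    "connections": [("/cell0", "/cell1"), ("/cell1", "/cell2")],
    "colocate": [("/cell0", "/cell1")],
}

# Saved while running on 16 nodes, restored the next day on 15.
placement_16 = partition(checkpoint["objects"], checkpoint["colocate"], 16)
placement_15 = partition(checkpoint["objects"], checkpoint["colocate"], 15)
remote_15 = remote_connections(checkpoint["connections"], placement_15)
```

Nothing in `checkpoint` changes between the two runs; only the derived placement and the set of cross-node connections do.
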
> Having said this, I really don't know what is to happen to the postmasters
> should we shift to the actual MPI communication to a two-hop process as
> you described. Obvious possibility is to use a variant of the postmaster
> class, with the design requirement that the first-order user viewpoint
> remains that of a transparent, singly rooted object tree with ordinary
> messaging between objects. Anyway, it is still some way off.

Yes, I agree with this design requirement. I guess the difference between our positions is that I would prefer not to have higher-order user viewpoints of the communication infrastructure -- first, because it limits portability, and second because I see almost no one using it. Models get altered frequently enough that it usually doesn't make sense to excessively hand-tune details such as partitioning (provided that it is done reasonably well in some automated way). It's sort of like the situation with compilers -- even if we could get a factor of 2 improvement by coding directly in assembler, it's almost never worth the extra effort.

Regards,
--Greg