Re: [LaunchMON-devel] Increasing the level of abstraction offered by LaunchMON
From: Mark O'C. <ma...@al...> - 2009-03-19 08:45:00
Hi all,

In a recent conference call, Dong Ahn, David and I discussed ways to enhance LaunchMON's abstraction of the "launch a job and co-locate these daemons" task. Ideally we'd like LaunchMON to provide an MPI-independent interface to the API user, at least for simple and common cases. I've summarized our thoughts here; Dong suggested bouncing this to the general distribution list because he thought it's worth thinking through carefully.

From a tool writer's perspective, LaunchMON would ideally abstract away enough of the RM to enable starting and attaching to jobs in the majority of cases. Currently, it abstracts the RM details away when attaching daemons to a running process: only the hostname and pid of the RM process are necessary. However, when starting an MPI job with co-located daemons, the path to the RM and a list of its arguments must be provided. This pushes RM dependency back onto the tool, which then needs to discover the MPIs installed on the system, select the correct one, know whether it uses -np, -n or -proc to specify the number of processes, and so on.

Of course, RMs typically provide a vast array of options, many of which are unique to a particular RM. LaunchMON isn't expected to abstract away all differences between RMs, just the ones they have in common and that are used most often. In this case, the path of the target program, its arguments and the number of processes might be enough.

To do this, LaunchMON could expose an additional, higher-level way to start a job, requiring only the path to the target executable, its arguments, the number of processes to launch and an additional array of strings to pass to the RM as arguments. This is similar to the level of abstraction DDT's GUI offers its users: the RM is detected at install time, and after that most users simply choose the number of processes and their program.
In DDT, users who wish to use specific RM features not abstracted by the GUI can pass extra arguments directly to the RM.

Note: I think it would be acceptable for LaunchMON not to attempt to abstract the handling of queuing systems, which can be very complex. Instead, the caller must ensure that, where necessary, it runs under the batch system with the appropriate configuration options set.

As mentioned, DDT already contains the above abstractions for a long list of RMs, and this experience might be something we could contribute to the LaunchMON project. From a purely technical perspective, repackaging these into a reusable bundle would not require an undue amount of effort.

Mark
Allinea Software