
#4 Abstract away differences in RM launch commands

Status: open

Owner: nobody

Labels: None

Priority: 5

Updated: 2009-05-14

Created: 2009-05-14

Creator: Mark O'Connor

Private: No

From a tool writer's perspective, LaunchMON would ideally abstract
away enough of the RM to enable starting and attaching to jobs in the
majority of cases. Currently, it abstracts the RM details away when
attaching daemons to a running process: only the hostname and pid of
the RM are necessary. However, when starting an MPI job with co-hosted
daemons, the path to the RM and a list of its arguments must be
provided. This pushes the RM dependency back onto the tool, which then
needs to discover the MPIs installed on the system, select the correct
one, understand whether it uses -np or -n or -proc to specify the
number of processes, and so on. Of course, RMs typically provide a
vast array of options, many of which are unique to that particular RM.
LaunchMON is not expected to abstract away all differences between the
RMs, just the options they have in common and that are most often
used. Here, the path of the target program, its arguments and the
number of processes might be enough.

To support this, LaunchMON would expose an extra, higher-level way to
start a job, requiring only the path to the target executable, its
arguments, the number of processes to launch and an additional array
of strings to pass to the RM as arguments. This is similar to the
level of abstraction DDT's GUI provides to the user: the RM is
detected at install time, and after that most users simply choose the
number of processes and their program. If they wish to use specific RM
features not abstracted by the GUI, they have the option of passing
extra arguments directly to the RM.
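
To make the request concrete, here is a rough sketch of the kind of
helper a tool currently has to write for itself, and which the
higher-level entry point could absorb. Everything in it is hypothetical
and not part of LaunchMON: the launcher_kind_e enum, launch_cmd_t,
nprocs_flag() and build_launch_cmd() are illustration names only. It
assumes just two launchers (mpirun taking -np, srun taking -n) and
builds the launcher argv from the four inputs described above.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_LAUNCH_ARGS 64

/* Hypothetical launcher kinds; NOT LaunchMON's rm_catalogue_e. */
typedef enum { LAUNCHER_MPIRUN, LAUNCHER_SRUN, LAUNCHER_UNKNOWN } launcher_kind_e;

/* argv points into nprocs_str, so do not copy this struct by value. */
typedef struct {
    char nprocs_str[16];
    const char *argv[MAX_LAUNCH_ARGS + 1];  /* NULL-terminated */
    int argc;
} launch_cmd_t;

/* Flag used for the process count, or NULL if the launcher is unknown. */
const char *nprocs_flag(launcher_kind_e kind)
{
    switch (kind) {
    case LAUNCHER_MPIRUN: return "-np";
    case LAUNCHER_SRUN:   return "-n";
    default:              return NULL;
    }
}

/* Count entries in a NULL-terminated array; a NULL array counts as empty. */
int count_args(const char *const args[])
{
    int n = 0;
    while (args != NULL && args[n] != NULL)
        n++;
    return n;
}

/*
 * Fill cmd with:
 *   launcher <flag> <nprocs> [rm_extra...] target [target_args...]
 * Returns 0 on success, -1 on bad input or too many arguments.
 */
int build_launch_cmd(launch_cmd_t *cmd, launcher_kind_e kind,
                     const char *launcher_path, int nprocs,
                     const char *const rm_extra[],
                     const char *target,
                     const char *const target_args[])
{
    const char *flag = nprocs_flag(kind);
    int n_extra = count_args(rm_extra);
    int n_targs = count_args(target_args);
    int total = 3 + n_extra + 1 + n_targs;
    int i = 0, j;

    if (cmd == NULL || flag == NULL || launcher_path == NULL ||
        target == NULL || nprocs < 1 || total > MAX_LAUNCH_ARGS)
        return -1;

    snprintf(cmd->nprocs_str, sizeof cmd->nprocs_str, "%d", nprocs);
    cmd->argv[i++] = launcher_path;
    cmd->argv[i++] = flag;
    cmd->argv[i++] = cmd->nprocs_str;
    for (j = 0; j < n_extra; j++)      /* RM pass-through arguments */
        cmd->argv[i++] = rm_extra[j];
    cmd->argv[i++] = target;
    for (j = 0; j < n_targs; j++)      /* target program arguments */
        cmd->argv[i++] = target_args[j];
    cmd->argv[i] = NULL;
    assert(i == total);
    cmd->argc = i;
    return 0;
}
```

A caller would pass only the program, its arguments, the process count
and any extra RM strings, then hand cmd.argv to whatever launch call it
already uses. The point of the request is that this mapping would live
inside LaunchMON rather than in every tool.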

Note: I think it would be acceptable if LaunchMON did not attempt to
abstract out handling of queuing systems - this can be very complex.
Instead, the caller must ensure that, if necessary, it is executed by
the batch system with the appropriate configuration options set.

Discussion

Dong Ahn - 2009-06-04

We exchanged emails about this item, and I'll repeat that this is an excellent idea. However, it will involve fairly extensive engineering work compared to your other enhancement requests, so it won't happen right away. We will see how this can be done.


Dong Ahn - 2010-06-18

Mark and I discussed this a bit on a conference call. As a first, incremental step, I proposed adding:

typedef struct {
    rm_catalogue_e rm_type;
    pid_t rm_launcher_pid;
    /* what other info? */
} lmon_rm_info_t;

lmon_rc_e LMON_fe_getRMInfo (int sessionHandle, lmon_rm_info_t *info);

This will return "info" on the underlying RM that LaunchMON is interacting with. If you can think of other RM-related information that would be useful, please let me know; more fields can be added to lmon_rm_info_t. I recently added the rm_catalogue_e type in
http://launchmon.svn.sourceforge.net/viewvc/launchmon/branches/launchmon-0.7-release/launchmon/src/sdbg_rm_map.hxx?revision=275&view=markup

