The modern HPC environment typically consists of many multicore nodes connected by high-speed interconnects, usually with some number of GPU nodes. The current design does not consider GPU-parallel applications; instead, we focus on the need to support MPI, OpenMP, shared-memory, and hybrid applications. The design does not preclude use of the infrastructure by GPU-parallel applications, but because the infrastructure itself is not GPU parallel, integrated systems that involve GPU-parallel user applications are subject to some design constraints. In the following discussion, the term "user" loosely refers to the entity seeking to integrate existing code or to use an integrated system. The entity could be a person, a group, a company, etc. Figures 1 and 2 illustrate the types of application this effort is targeting and how parallel applications are represented in the Figures that follow.
Figure 1. - The current design of the integration infrastructure targets fairly general parallel user applications. Each rectangle is an application process and the communication substrate (if any) is specific to the parallelization strategy of the application.
Figure 2. - Communication substrates of user applications are typically MPI, OpenMP, or shared memory.
It is useful to first describe the parts of the system for serial applications in order to identify the main components and introduce the acronyms. Except where otherwise noted, the descriptions and Figures that follow employ a color and form coding.
Figure 3. - High level architecture for single integrated application that is serial.
Figure 3 covers most of the main high-level components of the infrastructure design. Infrastructure constructs are used in shallow modifications to the user's application, producing an application system component (ASC). An ASC is an application-native software construct with an embedded component-side client (CSC). The CSC is an application-native software construct that uses an infrastructure API to provide read/write/execute access to the original user application's native data structures and computational methods. This access is communicated over an inter-component communication (ICC) method to one or more component interface objects (CIO) in the integrated system. An integrated system is realized by constructing the System Integration Manager (SIM). The SIM manages all of the CIO for all of the applications and mediates their interactions, as shown in Figure 5. Each of the infrastructure constructs is described in more detail below.
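The relationship between these constructs can be sketched in a few lines of Python. This is only an illustration of the roles described above, not the infrastructure API: the class names, the `register_data`/`register_method` calls, and the toy `HeatSolver` application are all hypothetical, and the ICC here is a plain in-process call.

```python
class HeatSolver:
    """Stand-in for an existing user application (hypothetical)."""

    def __init__(self):
        self.temperature = [300.0, 300.0, 300.0]

    def advance(self, dt):
        # Toy update: warm every cell by 10 units per unit time.
        self.temperature = [t + 10.0 * dt for t in self.temperature]


class ComponentSideClient:
    """CSC: exposes native data and methods under public names."""

    def __init__(self):
        self._data = {}
        self._methods = {}

    def register_data(self, name, getter, setter):
        self._data[name] = (getter, setter)

    def register_method(self, name, func):
        self._methods[name] = func

    def read(self, name):
        return self._data[name][0]()

    def write(self, name, value):
        self._data[name][1](value)

    def execute(self, name, *args):
        return self._methods[name](*args)


class DirectICC:
    """ICC stand-in: forwards each request as an in-process call."""

    def __init__(self, csc):
        self._csc = csc

    def request(self, op, name, *args):
        return getattr(self._csc, op)(name, *args)


class ComponentInterfaceObject:
    """CIO: the SIM-side handle for one component."""

    def __init__(self, icc):
        self._icc = icc

    def read(self, name):
        return self._icc.request("read", name)

    def write(self, name, value):
        self._icc.request("write", name, value)

    def execute(self, name, *args):
        return self._icc.request("execute", name, *args)


class SystemIntegrationManager:
    """SIM: owns the CIO of every integrated component."""

    def __init__(self):
        self.cios = {}

    def add_component(self, label, cio):
        self.cios[label] = cio


def build_demo_system():
    # The "shallow modification": wire the solver's data and methods to a CSC.
    solver = HeatSolver()
    csc = ComponentSideClient()
    csc.register_data(
        "temperature",
        lambda: list(solver.temperature),
        lambda values: setattr(solver, "temperature", list(values)),
    )
    csc.register_method("advance", solver.advance)
    sim = SystemIntegrationManager()
    sim.add_component("heat", ComponentInterfaceObject(DirectICC(csc)))
    return sim
```

In this sketch the SIM never touches `HeatSolver` directly; every read, write, and execute goes through the CIO, the ICC, and the CSC, which is the layering Figure 3 describes.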
How the AC and SIM map to processes is application- and system-specific, and has implications for the communication substrates that the ICC can use. The likely situations are summarized in Figure 4.
(4a) - Integrated system with single process.
(4b) - Integrated system with separate processes.
Figure 4. - Two typical mappings of the integrated system to processes. The integrated system runs either as a single process, as in (a), or with the AC as a separate process, as in (b). Further distinctions include whether the application process is a child of the SIM process and whether the processes are separated by a network.
If the SIM and the AC are implemented in a single process, as shown in Figure 4a, then the ICC is most efficiently implemented as direct memory access. If, on the other hand, the AC and SIM are in separate processes, then the ICC has to use some communication method. Possible choices include MPI, MPI-2, shared memory, TCP/IP, some other Unix IPC, or even files in some cases. The communication substrates available for the ICC are implementation-specific, but they clearly have implications for the types of integrated systems that can be constructed.
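The two mappings can be illustrated with two interchangeable ICC implementations behind one `request` interface. The following is a minimal sketch under stated assumptions: all class and function names are hypothetical, and a thread connected through a `multiprocessing.Pipe` stands in for a genuinely separate application process so that the example stays self-contained.

```python
import threading
from multiprocessing import Pipe


class Component:
    """Hypothetical component-side state exposed to the SIM."""

    def __init__(self):
        self.values = {"pressure": 101.3}

    def handle(self, op, name, value=None):
        if op == "read":
            return self.values[name]
        if op == "write":
            self.values[name] = value
            return None
        raise ValueError(f"unknown operation: {op}")


class DirectMemoryICC:
    """Single-process mapping (Figure 4a): a plain function call."""

    def __init__(self, component):
        self._component = component

    def request(self, op, name, value=None):
        return self._component.handle(op, name, value)


class MessageICC:
    """Separate-process mapping (Figure 4b): requests travel as messages."""

    def __init__(self, connection):
        self._conn = connection

    def request(self, op, name, value=None):
        self._conn.send((op, name, value))
        return self._conn.recv()

    def close(self):
        # A None message tells the component-side loop to stop.
        self._conn.send(None)


def serve(component, connection):
    """Component-side loop: answer requests until a None sentinel arrives."""
    while True:
        message = connection.recv()
        if message is None:
            break
        connection.send(component.handle(*message))


def start_message_icc(component):
    # The worker thread is a stand-in for the separate application process.
    sim_end, component_end = Pipe()
    worker = threading.Thread(target=serve, args=(component, component_end))
    worker.start()
    return MessageICC(sim_end), worker
```

Because both classes expose the same `request` method, SIM-side code written against that interface does not need to know which mapping is in use; only the cost and the failure modes of each request differ.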
The general system architecture with multiple integrated serial applications is shown in Figure 5 (see System Architecture for a discussion on the layers and the service applications). In general, each AC can use any of the implemented ICC methods. The SIM manages all of the CIO for all of the AC, mediates their interactions, and usually (but not always) manages the control flow between the integrated components. Depending on implementation, integrated components can be serial or parallel standalone applications, or libraries, spawned separately or as children of the SIM process.
Figure 5. - Integrated system with multiple serial applications.
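As a rough illustration of the SIM's mediating role with several serial components, the sketch below has the SIM move one field between two components and drive their control flow in a fixed order. Everything here is an assumption made for illustration: the `add_transfer` and `step` methods, the "fluid" and "structure" components, and their toy update rules are hypothetical, and the CIO wraps a plain dictionary rather than a real ICC.

```python
class ComponentInterfaceObject:
    """CIO stand-in that wraps a component's data and its advance routine."""

    def __init__(self, data, advance):
        self._data = data
        self._advance = advance

    def read(self, name):
        return self._data[name]

    def write(self, name, value):
        self._data[name] = value

    def advance(self):
        self._advance(self._data)


def fluid_advance(data):
    # Toy model: the load grows linearly with the step count.
    data["time"] += 1
    data["load"] = 2.0 * data["time"]


def structure_advance(data):
    # Toy model: displacement is proportional to the applied load.
    data["displacement"] = 0.5 * data["load"]


class SystemIntegrationManager:
    """SIM: holds every CIO, mediates transfers, and drives control flow."""

    def __init__(self):
        self._cios = {}
        self._transfers = []

    def add_component(self, label, cio):
        self._cios[label] = cio

    def add_transfer(self, source, source_field, target, target_field):
        self._transfers.append((source, source_field, target, target_field))

    def step(self):
        # Components run in the order they were added; before each one runs,
        # the SIM copies in any fields it receives from other components.
        for label, cio in self._cios.items():
            for source, source_field, target, target_field in self._transfers:
                if target == label:
                    value = self._cios[source].read(source_field)
                    cio.write(target_field, value)
            cio.advance()

    def read(self, label, name):
        return self._cios[label].read(name)


def build_coupled_system():
    sim = SystemIntegrationManager()
    sim.add_component(
        "fluid", ComponentInterfaceObject({"time": 0, "load": 0.0}, fluid_advance)
    )
    sim.add_component(
        "structure",
        ComponentInterfaceObject({"load": 0.0, "displacement": 0.0}, structure_advance),
    )
    sim.add_transfer("fluid", "load", "structure", "load")
    return sim
```

Neither component refers to the other; the coupling lives entirely in the transfers registered with the SIM, which is the sense in which the SIM mediates their interactions.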
The integrated system for a single parallel application is shown in Figure 6. The parallel application typically, but not always, uses MPI as its communication substrate. The application's native communication method provides inter-process communication between the CSCs of the individual application processes, as shown in Figure 6a. Either each CSC communicates with the SIM process via the ICC, as shown in Figure 6a, or a single CSC handles all SIM/CSC communication, as shown in Figure 6b. Parallel components with a single communicating CSC are usually OpenMP or otherwise threaded applications, but this situation can also arise with a master-slave-type parallelization model using MPI or another IPC method.
(6a) - Single parallel application integration with a CSC for each application process.
(6b) - Single parallel application integration with single CSC from application master process.
Figure 6. - Two different models for integration of parallel applications. The architecture in (a) is typical for MPI-parallel SPMD-type applications while that shown in (b) is typical for OpenMP, or master-slave-type parallelism.
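The difference between the two models can be made concrete by counting what crosses the ICC. The sketch below uses no real MPI: ranks are plain Python objects, the gather over the application's native substrate is an ordinary loop, and `RankState`, `RecordingICC`, and both `publish_*` functions are hypothetical names introduced only for this illustration.

```python
class RankState:
    """One application process and the slice of data it owns."""

    def __init__(self, rank, local_values):
        self.rank = rank
        self.local_values = local_values


class RecordingICC:
    """ICC stand-in that records every message sent toward the SIM."""

    def __init__(self):
        self.messages = []

    def send(self, sender_rank, payload):
        self.messages.append((sender_rank, payload))


def publish_per_rank(ranks, icc):
    """Model (a): the CSC on every rank sends its own slice to the SIM."""
    for state in ranks:
        icc.send(state.rank, list(state.local_values))


def publish_via_master(ranks, icc):
    """Model (b): rank 0 gathers all slices, then is the only ICC sender."""
    gathered = []
    # Stand-in for a gather over the application's native substrate.
    for state in sorted(ranks, key=lambda s: s.rank):
        gathered.extend(state.local_values)
    icc.send(0, gathered)
```

With three ranks, model (a) produces three ICC messages and leaves the data distributed, while model (b) produces a single message from rank 0 at the cost of an extra gather inside the application.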
In parallel integrated systems, the SIM can itself be a parallel construct, as shown in Figures 6 and 7. A parallel SIM requires a communication method for IPC among the SIM processes. This is usually MPI, but in general it is implementation-specific.
In general, each integrated component has its own native communication method and parallelization strategy, and the infrastructure constructs need to support the integration of these disparately developed and parallelized application components. The overall integrated parallel system as envisioned is illustrated in Figure 7.
Figure 7. - Integrated system with multiple parallel applications.
The above is meant to be a general description of integrated system architecture. Several existing multiphysics integration packages (e.g., Rocstar, PreCICE, MCT, and LIME) implement systems very similar to those described above. Consortium-based implementations are described here.
Wiki: Software Integration Implementation
Wiki: System Architecture
Wiki: Technical Approach