ENPS

Parallel simulation for Enzymatic Numerical P Systems (ENPS-GPU)


1. Team

Miguel Ángel Martínez-del-Amor
Manu Garcia

2. Integration of SNUPS into P-Lingua

A previously existing simulator for ENPS models, known as SNUPS, was integrated into P-Lingua. As a result, the functionality provided by the P-Lingua framework becomes available for ENPS models, namely translation among formats and simulation. To simulate ENPS models, P-Lingua delegates to SNUPS through the standard extension mechanism provided by the framework.


3. Simulator based on C/C++

When developing a parallel simulator, the first step is usually to develop a sequential counterpart. Then, following the development lifecycle, the next step consists of profiling, that is, analyzing which parts of the software consume most of the execution time. This analysis is directly related to Amdahl's law, which bounds the achievable speed-up according to the fraction of execution time spent in the section of the software under study.

The implementation of the simulator is rather simple: it consists of a C++ class which reads the system to simulate from an XML file, using the same XML format as SNUPS. For simplicity, variable names are kept in a dictionary, since during simulation variables are addressed by integer indexes.

ENPS simulation consists of the following steps: first of all, the repartition protocol coefficients are normalized. Then, for each computation step, the implementation reproduces the three stages of ENPS models:

  1. Selection of applicable programs
  2. Calculation of production functions
  3. Distribution of production function results according to repartition protocols
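The normalization performed before the simulation loop can be sketched as follows. This is a minimal illustration under assumed names (the function and its layout are not taken from the actual simulator): each repartition coefficient is divided by the sum of the coefficients in its protocol, so that a contribution can later be computed with a single multiplication.

```cpp
#include <vector>

// Hypothetical sketch: normalize the coefficients of one repartition
// protocol so they sum to 1. Names are illustrative.
std::vector<double> normalizeCoefficients(const std::vector<double>& coeffs) {
    double sum = 0.0;
    for (double c : coeffs) sum += c;          // total weight of the protocol
    std::vector<double> normalized;
    normalized.reserve(coeffs.size());
    for (double c : coeffs) normalized.push_back(c / sum);
    return normalized;
}
```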

The simulator initializes a vector with one entry per program in the model; entries for non-enzymatic programs are set to true, while entries for enzymatic programs start as false. Then, for each program in enzymatic form, it loops through the variables of its production function, comparing each value against the program's enzyme: if the enzyme's value is greater than that of the variable, the program's entry is set to true and the loop continues with the next program.
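The selection stage above can be sketched as follows. The `Program` record and its field names are assumptions made for illustration, not the simulator's actual data layout.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical record for one ENPS program (illustrative names only).
struct Program {
    bool enzymatic;                // is the program in enzymatic form?
    double enzyme;                 // current value of its enzyme
    std::vector<double> pfVars;    // values of its production-function variables
};

// Mark each program as applicable or not for the current step.
std::vector<bool> selectApplicable(const std::vector<Program>& programs) {
    std::vector<bool> applicable(programs.size(), true);
    for (std::size_t i = 0; i < programs.size(); ++i) {
        if (!programs[i].enzymatic) continue;   // non-enzymatic: always applicable
        applicable[i] = false;
        for (double v : programs[i].pfVars) {
            if (programs[i].enzyme > v) {       // enzyme exceeds some variable
                applicable[i] = true;
                break;                          // continue with the next program
            }
        }
    }
    return applicable;
}
```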

Then, the simulator loops through all production functions and calculates their values recursively. To simplify the implementation, production functions are implemented as linked trees, in such a way that a recursive call for tree traversal simply entails taking a child node as the main node. Each node contains the following fields:

  • Node type: constant, variable or operator. In addition, each operator type (plus, minus, times, divide, modulo, power) is a type by itself.
  • Node value: This field stores the node's value for constants and variables and the results of operations for operation nodes.
  • Left and right children.

The results for operation nodes are calculated recursively, passing the node's children as arguments to a recursive function.
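A minimal sketch of this linked-tree evaluation is given below. All names are illustrative rather than taken from the simulator's source, and the operator set is reduced to plus and times for brevity (the simulator also supports minus, divide, modulo and power).

```cpp
// Hypothetical node layout matching the fields listed above.
enum NodeType { CONSTANT, VARIABLE, PLUS, TIMES };

struct Node {
    NodeType type;
    double value;           // constants/variables; caches operator results
    Node* left  = nullptr;  // children, only used by operator nodes
    Node* right = nullptr;
};

// Recursive traversal: the recursive call simply takes a child node
// as the new main node.
double evaluate(Node* n) {
    if (n->type == CONSTANT || n->type == VARIABLE) return n->value;
    double l = evaluate(n->left);
    double r = evaluate(n->right);
    n->value = (n->type == PLUS) ? l + r : l * r;
    return n->value;
}
```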

Finally, the simulator loops through all variables in all repartition protocols and adds to each variable the value of its program's production function multiplied by the variable's coefficient in the protocol, thus updating the variable's value for the next simulation step.
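The distribution stage can be sketched as follows, under assumed names. Each entry pairs a target variable with its normalized coefficient; the production-function result is assumed to be precomputed, and consumed variables to have been reset beforehand, following ENPS semantics.

```cpp
#include <vector>

// Hypothetical flattened view of one repartition-protocol entry.
struct Contribution {
    int var;          // index of the target variable
    double coeff;     // normalized repartition coefficient
    double pfResult;  // value of the protocol's production function
};

// Accumulate every contribution into its target variable.
void distribute(std::vector<double>& variables,
                const std::vector<Contribution>& contributions) {
    for (const Contribution& c : contributions)
        variables[c.var] += c.coeff * c.pfResult;
}
```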


4. CUDA/C++ Simulator

In order to achieve a lower simulation time in comparison with the Java and C/C++ simulators, a CUDA-based simulator was implemented. This implementation requires an Nvidia GPU with compute capability at least 2.0. The reason is that system rules (or programs, to be consistent with ENPS nomenclature) are defined as general, unbounded-depth arithmetic expressions, which are commonly implemented as trees. Due to the recursive nature of trees, their traversal is usually (and, in most cases, conveniently) implemented as a recursive algorithm, and recursive calls to device functions are only available from compute capability 2.0 onwards. Although such calls can be emulated with functionality prior to compute capability 2.0, this is left as a future improvement so as to provide backwards compatibility.

The program structure is equivalent to that of its C/C++ counterpart, following the same stages. However, the implementation of each stage differs, as it is intended to take advantage of the GPU's parallel architecture.

To determine which programs are to be executed, it is necessary to distinguish between programs in enzymatic form and programs in non-enzymatic form. On every step, every program in non-enzymatic form is executed, whereas a program in enzymatic form is executed only if the value of its enzyme is strictly greater than the minimum value among the variables of its production function. In contrast to most Membrane Computing models, ENPSs are deterministic, so every program which can be executed must be executed.

In this simulator, this check is implemented by using a shared array with as many entries as there are programs in the model. Each thread block is assigned to a different program. If the program is in non-enzymatic form, its position in the array is set to true (1). Otherwise, each thread in the block checks a different variable of the block's program production function: the variable is compared against the program's enzyme, and the outcome is combined into the corresponding position of the shared array with an OR operation (|).

Secondly, production functions are calculated. The trees representing production functions are stored in four arrays: one for the node's type (constant, variable or operation), one for the node's value, and two for the left and right children of operation nodes. Each thread is responsible for computing one production function: if the node type is a constant or a variable, its value is returned as the result of the function; otherwise, the left and right children are computed recursively and the value resulting from applying the parent's operation is returned.
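The four-array layout described above can be sketched as follows, with host-side C++ standing in for the recursive device function. Names are illustrative, and only two operators are shown for brevity.

```cpp
#include <vector>

// Hypothetical structure-of-arrays layout for all production-function trees,
// matching the four arrays described in the text.
enum Kind { K_CONST, K_VAR, K_ADD, K_MUL };

struct TreePool {
    std::vector<Kind> type;     // node type
    std::vector<double> value;  // constant/variable values
    std::vector<int> left;      // left child index (operation nodes only)
    std::vector<int> right;     // right child index (operation nodes only)
};

// On the GPU this would be a recursive __device__ function (hence the
// compute capability 2.0 requirement); here it is plain host code.
double evalNode(const TreePool& t, int n) {
    if (t.type[n] == K_CONST || t.type[n] == K_VAR) return t.value[n];
    double l = evalNode(t, t.left[n]);
    double r = evalNode(t, t.right[n]);
    return (t.type[n] == K_ADD) ? l + r : l * r;
}
```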

Distribution by means of repartition protocols is possibly the most straightforward stage. Each thread is assigned a variable in a repartition protocol; it multiplies the result of its protocol's program by the variable's normalized repartition coefficient and adds the product to the thread's variable.

The experiments were run on computers with a multi-core Intel i7 processor and an Nvidia GeForce GTX 460M graphics card with 1.5 GB of dedicated RAM. The GPU simulator starts to pay off for ENPS models with 1500 membranes, achieving a speed-up factor of about 11x for models with 100000 membranes.


5. Publications

5.1. Book chapters

  • M. García-Quismondo, L.F. Macías-Ramos, M.J. Pérez-Jiménez. Implementing enzymatic numerical P systems for AI applications by means of graphic processing units. In J. Kelemen, J. Romportl and E. Zackova (eds.) Beyond Artificial Intelligence. Contemplations, Expectations, Applications, Springer, Berlin-Heidelberg, Series: Topics in Intelligent Engineering and Informatics, Volume 4, 2013, chapter XIV, pp. 137-159. A preliminary version appeared in J. Romportl, P. Ircing, E. Zacková, R. Schuster, M. Polák (eds.) Beyond AI: Interdisciplinary Aspects of Artificial Intelligence. Proceedings of the International Conference Beyond AI 2011, Pilsen, Czech Republic, December 8-9, 2011. Published by University of West Bohemia, Pilsen, 2011, pp. 27-33.

5.2. Conference contributions

  • M. García-Quismondo, A.B. Pavel, M.J. Pérez-Jiménez. Simulating large-scale ENPS models by means of GPU. In M.A. Martínez del Amor, Gh. Paun, I. Pérez Hurtado, F.J. Romero (eds.) Proceedings of the Tenth Brainstorming Week on Membrane Computing, Volume I, Seville, Spain, January 30- February 3, 2012, Report RGNC 01/2012, Fénix Editora, 2012, pp. 137-152.

6. Downloads

ENPS C/C++ Simulator

ENPS CUDA/C++ Simulator

Both simulators are available in the root folder of the PMCGPU file repository, which contains the ENPS C/C++ simulator and the ENPS CUDA/C++ simulator.


7. How to acknowledge

If you intend to create a branch of the ENPS C/C++ simulator or the ENPS CUDA/C++ simulator, or to use the results they produce, please consider citing the aforementioned publications.


8. Funding

This work has been supported by project TIN 2009-13192 of the “Ministerio de Ciencia e Innovación” of Spain, co-financed by FEDER funds, by “Proyecto de Excelencia con Investigador de Reconocida Valía” P08-TIC-04200 of the Junta de Andalucía, and by the National FPU Grant Programme of the Spanish Ministry of Education.

