CPU: AMD Opteron 2.3 GHz x 16 processors
Memory: 32 GB
OS: RHEL 7
openmpi-1.6.4-3.el7.x86_64, openmpi-devel-1.6.4-3.el7.x86_64 (RHEL 7 packages)
Sixteen LXCF containers are created, and one CPU is allocated exclusively to each container.
Because firewalld and SELinux are stopped, the machine should not be connected directly to the Internet; otherwise it would be a security risk.
OpenMPI is used.
# yum install openmpi openmpi-devel
To use OpenMPI, passwordless ssh login must be configured (and the hosts allowed to log in should be restricted).
Therefore, a dedicated user "mpiusr" is created for running the MPI jobs.
# useradd -m mpiusr
# su - mpiusr
Set the OpenMPI paths in the .bashrc file. Add the following settings to .bashrc:
export PATH=/usr/lib64/openmpi/bin:${PATH}
export LD_LIBRARY_PATH=/usr/lib64/openmpi/lib:${LD_LIBRARY_PATH}
export MANPATH=/usr/share/man/openmpi-x86_64:$MANPATH
Next, set up the ssh keys.
By registering a public key, logins no longer require a password. (Naturally, a system without the corresponding private key cannot log in.)
When running ssh-keygen, simply press Enter at every prompt without typing anything.
# su - mpiusr
$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/mpiusr/.ssh/id_rsa): [Enter]
Created directory '/home/mpiusr/.ssh'.
Enter passphrase (empty for no passphrase): [Enter]
Enter same passphrase again: [Enter]
Your identification has been saved in /home/mpiusr/.ssh/id_rsa.
Your public key has been saved in /home/mpiusr/.ssh/id_rsa.pub.
The key fingerprint is:
1c:af:63:39:a6:8a:8c:88:17:81c:25:cb:3a:79:ad:87 mpiusr@lxcf-srv
The key's randomart image is:
+--[ RSA 2048]----+
| |
| |
| . |
|. . . . . |
|. S . |
|o.o . . |
|.+ o * . |
| E .o = o |
|.o.+. +.. |
+-----------------+
The public key is registered in authorized_keys. This will later be used to deploy the same environment to the containers.
$ cat .ssh/id_rsa.pub > .ssh/authorized_keys
$ chmod 400 .ssh/authorized_keys
Now register the containers that will run in parallel. They can be registered in advance, even before the containers are actually created. The parallel program will be distributed to and executed on the systems registered here.
Sixteen containers will be created, named a0001 through a0016, with one CPU allocated to each. This makes a 16-process MPI run possible.
Add the following entries to /etc/openmpi-x86_64/openmpi-default-hostfile:
a0001 cpu=1
a0002 cpu=1
a0003 cpu=1
a0004 cpu=1
a0005 cpu=1
a0006 cpu=1
a0007 cpu=1
a0008 cpu=1
a0009 cpu=1
a0010 cpu=1
a0011 cpu=1
a0012 cpu=1
a0013 cpu=1
a0014 cpu=1
a0015 cpu=1
a0016 cpu=1
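Since the sixteen entries follow a regular pattern, the hostfile lines can also be generated with a one-liner rather than typed by hand. This is a sketch assuming GNU seq; it only prints to standard output, which you can redirect into the hostfile as root:

```shell
# Generate one "aNNNN cpu=1" line per container; %04g zero-pads to 4 digits.
# As root, redirect the output into /etc/openmpi-x86_64/openmpi-default-hostfile.
seq -f 'a%04g cpu=1' 1 16
```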
Stop and disable firewalld:
# systemctl stop firewalld
# systemctl disable firewalld
Disable SELinux. Edit /etc/selinux/config so that "SELINUX=disabled", then reboot.
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these two values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
Reboot the system.
Now create the containers, a0001 through a0016 as described above.
With LXCF, multiple containers can be created with a single command.
# lxcf sysgen-n a 16
Configure the containers to start automatically when the OS boots.
# lxcf autostart-n a 16
# lxcf list
Name Mode State Autostart Path
-------------------------------------------------------------------------
a0001 joint running y /opt/lxcf/a0001
a0002 joint running y /opt/lxcf/a0002
a0003 joint running y /opt/lxcf/a0003
a0004 joint running y /opt/lxcf/a0004
a0005 joint running y /opt/lxcf/a0005
a0006 joint running y /opt/lxcf/a0006
a0007 joint running y /opt/lxcf/a0007
a0008 joint running y /opt/lxcf/a0008
a0009 joint running y /opt/lxcf/a0009
a0010 joint running y /opt/lxcf/a0010
a0011 joint running y /opt/lxcf/a0011
a0012 joint running y /opt/lxcf/a0012
a0013 joint running y /opt/lxcf/a0013
a0014 joint running y /opt/lxcf/a0014
a0015 joint running y /opt/lxcf/a0015
a0016 joint running y /opt/lxcf/a0016
Write and run the following script. The -i argument specifies the CPU number to which the container is bound.
#!/bin/sh
lxcf set -i 0 a0001
lxcf set -i 1 a0002
lxcf set -i 2 a0003
lxcf set -i 3 a0004
lxcf set -i 4 a0005
lxcf set -i 5 a0006
lxcf set -i 6 a0007
lxcf set -i 7 a0008
lxcf set -i 8 a0009
lxcf set -i 9 a0010
lxcf set -i 10 a0011
lxcf set -i 11 a0012
lxcf set -i 12 a0013
lxcf set -i 13 a0014
lxcf set -i 14 a0015
lxcf set -i 15 a0016
Note that from the tenth container onward it is easy to mistype the names with an extra zero, such as a00010; the correct name, with two zeros, is a0010.
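Because the sixteen lxcf set lines follow a regular pattern, the script can also be generated with a loop; printf's %04d zero-padding avoids exactly the naming mistake just mentioned. This sketch only prints the commands so they can be inspected before being run (for example by piping them to sh as root):

```shell
#!/bin/sh
# Emit the sixteen "lxcf set" commands: CPU 0..15 -> container a0001..a0016.
# printf's %04d zero-padding prevents accidental names like a00010.
i=0
while [ "$i" -lt 16 ]; do
    printf 'lxcf set -i %d a%04d\n' "$i" "$((i + 1))"
    i=$((i + 1))
done
```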
Deploy the ssh and bash settings configured above from the host to all of the created containers, a0001 through a0016.
# cd /home/mpiusr
# lxcf deploy .ssh
# lxcf deploy .bash*
OpenMPI apparently requires the systems on which it runs to be registered in known_hosts.
Create this file on the host and then deploy it to the containers.
First, log in to a0001 once and immediately log out.
# su - mpiusr
$ ssh a0001
The authenticity of host 'a0001 (192.168.125.2)' can't be established.
ECDSA key fingerprint is ef:21:e4:30:01:99:f5:af:58:9b:2c:9f:77f:8a:6b:31.
Are you sure you want to continue connecting (yes/no)? yes
[mpiusr@a0001 ~]$ exit
Repeat this login and logout for every container up to a0016.
A file named known_hosts is created under .ssh, containing entries for a0001 through a0016. Deploy this file to the containers.
# cd /home/mpiusr/.ssh
# lxcf deploy known_hosts
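Instead of logging in to each container by hand, the host keys can be collected in one pass with ssh-keyscan (part of openssh-clients). This is a sketch: it assumes the containers are already running and resolvable by name, so the ssh-keyscan line itself is shown commented out:

```shell
# Build the list of container hostnames a0001..a0016 (GNU seq, zero-padded).
hosts=$(seq -f 'a%04g' 1 16)
echo $hosts
# Append every container's host key to mpiusr's known_hosts in one pass,
# then deploy the file as above (run as mpiusr while the containers are up):
# ssh-keyscan $hosts >> /home/mpiusr/.ssh/known_hosts
```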
Now run the Himeno benchmark. Himeno benchmark: http://accc.riken.jp/2444.htm
Dr. Ryutaro Himeno, Director of the Advanced Center for Computing and Communication, developed this benchmark to evaluate the performance of incompressible fluid analysis code. The program measures the speed of the major loops in solving Poisson's equation with the Jacobi iteration method.
Source code(Fortran90 + MPI) http://accc.riken.jp/secure/4562/f90_xp_mpi.lzh
The archive is in lzh format, but it can be extracted with the GUI tool file-roller.
$ cd /home/mpiusr
$ file-roller f90_xp_mpi.lzh
Extracting the archive yields a source file named himenoBMTxpr.f90. Compile it and deploy it to the containers.
$ mpif90 himenoBMTxpr.f90 -o himenoBMTxpr
$ su
# lxcf deploy himenoBMTxpr
# exit
$
Now let's run it.
First, execute it as a single process rather than in parallel.
$ ./himenoBMTxpr
For example:
Grid-size=
XS (64x32x32)
S (128x64x64)
M (256x128x128)
L (512x256x256)
XL (1024x512x512)
Grid-size =
L
For example:
DDM pattern=
1 1 2
i-direction partitioning : 1
j-direction partitioning : 1
k-direction partitioning : 2
DDM pattern =
1 1 1
Sequential version array size
mimax= 513 mjmax= 257 mkmax= 257
Parallel version array size
mimax= 513 mjmax= 257 mkmax= 257
imax= 512 jmax= 256 kmax= 256
I-decomp= 1 J-decomp= 1 K-decomp= 1
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 149.74321524898934 time(s): 22.412517070770264 4.88281250E-04
Now, start the actual measurement process.
The loop will be excuted in 8 times.
This will take about one minute.
Wait for a while.
Loop executed for 8 times
Gosa : 4.88281250E-04
MFLOPS: 149.88961008631279 time(s): 59.708338975906372
Score based on Pentium III 600MHz : 1.80938697
The result was about 150 MFLOPS.
Next, run it in parallel across the 16 LXCF containers.
$ mpirun -np 16 himenoBMTxpr
For example:
Grid-size=
XS (64x32x32)
S (128x64x64)
M (256x128x128)
L (512x256x256)
XL (1024x512x512)
Grid-size =
L
For example:
DDM pattern=
1 1 2
i-direction partitioning : 1
j-direction partitioning : 1
k-direction partitioning : 2
DDM pattern =
2 2 4
Sequential version array size
mimax= 513 mjmax= 257 mkmax= 257
Parallel version array size
mimax= 258 mjmax= 130 mkmax= 66
imax= 257 jmax= 129 kmax= 65
I-decomp= 2 J-decomp= 2 K-decomp= 4
Start rehearsal measurement process.
Measure the performance in 3 times.
MFLOPS: 2561.5325802873126 time(s): 1.3102009296417236 8.44401540E-04
Now, start the actual measurement process.
The loop will be excuted in 137 times.
This will take about one minute.
Wait for a while.
Loop executed for 137 times
Gosa : 7.51504384E-04
MFLOPS: 2537.1674758698832 time(s): 60.407096862792969
Score based on Pentium III 600MHz : 30.6273251
The result was 2.5 GFLOPS (2537 MFLOPS).
16-process result / single-process result = 16.9
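The speed-up figure can be checked directly from the two MFLOPS values reported in the benchmark output above:

```shell
# 16-process MFLOPS divided by single-process MFLOPS, from the runs above.
awk 'BEGIN { printf "%.1f\n", 2537.1674758698832 / 149.88961008631279 }'
# prints 16.9
```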
Performance improved by roughly a factor of 16, matching the number of LXCF containers run in parallel: essentially the ideal speed-up for parallel computation.
This can be taken as evidence that LXCF builds container environments without overhead, so parallel execution scales as expected.
There is no need for a PC cluster in which several PCs are tied together with Ethernet cables, the power and network cables get tangled, and the electricity bill grows. With LXCF, a parallel cluster can easily be built on a single machine with multiple CPU cores.