From: lukeg-adv <luk...@o2...> - 2019-03-27 01:05:35
Hi,

I am looking for a way to start a Java programme on a computing cluster that uses SLURM for job submission. As far as I can tell, I cannot ssh to its compute nodes, which rules out SSHLauncher. After looking through the mailing list, I thought about using APGAS's NoLauncher for this task, i.e. starting a number of processes with srun and making each of them connect to one master process, with a command like the following (I have redacted the IP of the master node that I use):

    srun -n 4 -N 4 java -cp "./lib/*:./build/libs/*" -Dapgas.places=4 -Dapgas.launcher=apgas.impl.N -Dapgas.verbose.launcher=true -Dapgas.master=10.10.10.10 test.Test

test.Test is a simple hello world programme; its code is at the end of this post. To my understanding, the srun command should start 4 APGAS processes, each on its own node, with one node hosting the master process. Indeed, the logs (messages of the form "[APGAS] New place starting at 10.10.10.10:5701") show that 4 places are started.

The problems I encounter:

(1) The 4 processes start, and this is my output:

    Hello from place(3)
    Hello from place(0)
    Hello from place(1)
    Hello from place(0)
    Hello from place(1)
    Hello from place(0)
    Hello from place(3)
    Hello from place(1)
    Hello from place(3)
    Hello from place(2)
    Hello from place(2)
    Hello from place(0)
    Hello from place(1)
    Hello from place(3)
    Hello from place(2)
    Hello from place(2)

I am only starting to learn APGAS and its terminology, but shouldn't the programme print the message only 4 times, once for each place?

(2) The programme fails to stop. After printing the 16 "Hello from ..." lines, it just freezes.

What, then, is the recommended way of running a Java programme with the APGAS library on a cluster that uses the SLURM workload manager?
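For what it is worth, the 16 lines quoted under problem (1) can be tallied per place with a small stand-alone sketch. The class name PlaceTally is made up for illustration and is not part of APGAS; the input is simply the output copied from above. Each place turns up exactly four times, which would be consistent with each of the 4 srun-started processes running main on its own, although that is only a guess on my part and not something the logs confirm.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of APGAS): counts "Hello from place(N)" lines per place.
public class PlaceTally {
    // The 16 output lines, copied in the order they were printed.
    public static final List<String> OUTPUT = List.of(
        "Hello from place(3)", "Hello from place(0)", "Hello from place(1)", "Hello from place(0)",
        "Hello from place(1)", "Hello from place(0)", "Hello from place(3)", "Hello from place(1)",
        "Hello from place(3)", "Hello from place(2)", "Hello from place(2)", "Hello from place(0)",
        "Hello from place(1)", "Hello from place(3)", "Hello from place(2)", "Hello from place(2)");

    // Returns a map from place id to the number of lines mentioning that place.
    public static Map<Integer, Integer> tally(List<String> lines) {
        Pattern pattern = Pattern.compile("place\\((\\d+)\\)");
        Map<Integer, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher matcher = pattern.matcher(line);
            if (matcher.find()) {
                counts.merge(Integer.parseInt(matcher.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(tally(OUTPUT));
    }
}
```

Running it prints {0=4, 1=4, 2=4, 3=4}, i.e. 4 places times 4 messages each, rather than the 4 lines in total that I expected.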
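Hello world code I used:

    package test;

    import static apgas.Constructs.*;  // finish, asyncAt, places, here
    import apgas.Place;

    public class Test {
        public static void main(String[] args) {
            // Spawn one task per place and wait until all of them have finished.
            finish(() -> {
                for (final Place place : places()) {
                    asyncAt(place, () -> System.out.println("Hello from " + here()));
                }
            });
        }
    }

Thanks for your answers!

Best regards,
Łukasz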