TaskManager is open source infrastructure software for distributing and managing calculation jobs in a Unix computer cluster. It was designed to control the utilization of a set of hosts even if you are not the administrator of the system. The hosts are embedded in a Unix environment, and the users' home directories are mounted on each host. The hosts may have different numbers of CPUs/cores and different kernels. Keep in mind that users can still log into each host and run calculations on it directly. However, they should submit calculation jobs through the TaskManager to avoid overloading the hosts. Jobs under the control of the TaskManager are executed on a host of the cluster with the rights of the submitting user, so that they have permission to access that user's files.
The TaskManager package consists of several servers (TaskDispatcher, TaskManagerServer, and InfoServer) and several clients that communicate with these servers. The main server, TaskDispatcher, is responsible for receiving jobs from users, storing detailed information about each job, sending jobs to vacant computers in the cluster, and controlling their execution. Each user runs a TaskManagerServer in the background under that user's own Unix permissions, so it has the rights to access user-specific data in the file system. Finally, an InfoServer is invoked on every host in the cluster to gather information about that computer. The servers and clients communicate over Secure Sockets Layer (SSL) connections authenticated with certificates, which are generated by you or the TaskManager admin.
See the Servers of TaskManager page for more details.
The current version can be downloaded from the project page: http://sourceforge.net/projects/tmpackage/files/taskmanager-0.9.2.tar.gz/download
Copy the entire TaskManager directory to a directory of your choice, e.g., /usr/local/opt:
$ tar -C /usr/local/opt -xzvf taskmanager-0.9.2.tar.gz
(optional) Create frozen versions of several Python programs with freeze. Modify and execute script
For each program, all necessary Python libraries are copied to a single directory and a binary is created. The Python libraries therefore no longer have to be loaded over the intranet, which makes execution faster. Modify the wrapper scripts in the directory bin/ so that each one invokes its compiled version.
Set the permissions of the wrapper scripts in bin/ so that every user can execute them.
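The permission step can be sketched as a small shell helper. This is only an illustration: the function name make_wrappers_executable is hypothetical, and the installation directory is passed as an argument because its exact name depends on where and how you unpacked the package.

```shell
# Hypothetical helper (not part of the TaskManager package).
# Usage: make_wrappers_executable <TaskManager installation directory>
make_wrappers_executable() {
    tm_home="$1"
    for wrapper in "$tm_home"/bin/*; do
        # Skip directories and the unexpanded pattern of an empty bin/.
        if [ -f "$wrapper" ]; then
            # a+rx: every user may read and execute the wrapper script.
            chmod a+rx "$wrapper" || return 1
        fi
    done
}
```

Read permission is granted along with execute permission because the shell has to read a script in order to run it.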
Create a certificate for the user taskdispatcher, who represents the TaskDispatcher. Modify and execute the script:
$ cd scripts; ./createCertificate.sh taskdispatcher <YOUREMAIL>; cd -
Create a certificate for each user:
$ cd scripts; ./createCertificate.sh <USER> <USEREMAIL>; cd -
For each user the following files are created in etc/certs
* <USER>.key private key of user
* <USER>.csr certificate request file (not necessary)
* <USER>.crt self-signed certificate of user
* ca_certs.<USER>.crt certificates of taskdispatcher and of the user
Each user has to create the directory .taskManager in their home directory and copy the following files into it:
* <USER>.key
* <USER>.crt
* ca_certs.<USER>.crt
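The copy step above can be sketched as a shell function that a user runs once. The function name install_user_certs and its arguments are hypothetical. The restrictive chmod calls are an added precaution for the private key, not something the TaskManager documentation requires.

```shell
# Hypothetical helper (not part of the TaskManager package).
# Usage: install_user_certs USER CERT_DIR [HOME_DIR]
#   CERT_DIR is the directory holding the generated files (etc/certs
#   inside the TaskManager installation); HOME_DIR defaults to $HOME.
install_user_certs() {
    user="$1"
    cert_dir="$2"
    target="${3:-$HOME}/.taskManager"

    mkdir -p "$target" || return 1
    # Assumption: keep the directory private to its owner.
    chmod 700 "$target" || return 1

    for f in "$user.key" "$user.crt" "ca_certs.$user.crt"; do
        cp "$cert_dir/$f" "$target/" || return 1
    done

    # Assumption: the private key should be readable by its owner only.
    chmod 600 "$target/$user.key"
}
```

The function stops at the first missing file and returns a non-zero status, so an incomplete set of certificate files is noticed immediately.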
List all users who are allowed to use the TaskManager, together with their certificate files, in etc/users
Create the file that includes the certificates of all users
$ cd scripts; ./createAuthorizedCertsFile.sh; cd -
Assign users to groups in etc/groups
Configure the computer cluster by providing information about each computer in the cluster in etc/ComputerCluster.config
$ cd Server
$ python TaskDispatcher.py -e ../var/TaskDispatcherError.log -p 101010
$ bin/hRunJob -h
Get status of TaskManagerServer and TaskDispatcher
$ bin/hRunJob -s
Connect directly to TaskDispatcher and get help
$ bin/hSend localhost 101010 "help"
Activate a computer (if you have the permissions given in etc/groups)
$ bin/hSend localhost 101010 "activatehost:localhost"
Send job to cluster
$ bin/hRunJob "sleep 10"
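To submit several jobs at once, the single call above can be wrapped in a loop. The following is a minimal sketch, not a feature of the package: the function submit_jobs and the variable HRUNJOB are hypothetical, and it assumes that bin/hRunJob takes the command as one quoted argument (as in the example above) and exits with a non-zero status when a submission fails, which the documentation here does not confirm.

```shell
# Hypothetical helper (not part of the TaskManager package).
# HRUNJOB names the client to call; override it if bin/ is not
# relative to your current directory.
HRUNJOB="${HRUNJOB:-bin/hRunJob}"

# Reads one command per line from stdin and submits each as a job.
submit_jobs() {
    failed=0
    while IFS= read -r job; do
        # Skip blank lines.
        [ -n "$job" ] || continue
        # stdin is redirected so the client cannot consume the
        # remaining lines of the job list.
        if ! "$HRUNJOB" "$job" < /dev/null; then
            echo "submission failed: $job" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Example (run from the TaskManager installation directory):
#   printf 'sleep 10\nsleep 20\n' | submit_jobs
```

The function keeps going after a failed submission and reports the failure through its return status, so one bad line does not block the rest of the list.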
For the most current version of the application, just check out the SVN trunk:
$ svn checkout svn://svn.code.sf.net/p/tmpackage/code/trunk tmpackage