README for Platform Community Scheduler Framework
28th Sep, 2007
=========================
Introduction
=========================
------------------------
1.1 What is the Platform Community Scheduler Framework?
------------------------
The Platform Community Scheduler Framework (CSF) is the industry's
first comprehensive and OGSI-compliant metascheduling framework
built upon the Globus Toolkit(R) 4 (GT4). As part of our commitment
to the future of Grid, we are contributing CSF back to the Globus
Project(TM) to be included in all future versions of the Globus Toolkit.
------------------------
1.2 Platform CSF metascheduling services
------------------------
The Platform CSF consists of the following services:
o Job Service, for submitting, controlling and monitoring jobs
o Reservation Service, for reserving resources on the Grid
o Queuing Service, which provides a basic scheduling capability in CSF.
=========================
Building and Installing
=========================
------------------------
1 Supported Platforms:
------------------------
This release supports the following platforms:
x86 systems running Linux: Kernel 2.4.x, compiled with glibc 2.2
Tested on RedHat Linux 9.0 and RedHat Enterprise 3.0
------------------------
2 Required packages
------------------------
1. Additional tools (required also by GT4)
- JDK 1.5 or higher (http://java.sun.com/j2se) (you will get compile errors with JDK 1.4.2)
- Ant 1.5 or higher (http://ant.apache.org/)
- Globus Toolkit 4.0.x (http://www.globus.org/toolkit/downloads)
See the GT4 installation documentation:
2. CSF packages
- CSF4.0.x source package: csf-src-4.0.x.tar.gz
Newest release package: (recommended)
http://sourceforge.net/projects/gcsf
Newest code from CVS: (up to date, but may not be stable)
http://sourceforge.net/cvs/?group_id=103105
(CSF4 module name: gridservices)
From the GT4 package (generally not up to date):
untar and unzip the GT4 package;
the CSF package is in the 'contrib' directory
------------------------
3 Optional packages
------------------------
1. Platform's LSF 6.0
CSF4 provides some advanced functionality for LSF (such as advance reservation and job pause/resume)
2. CSF4_portlet
o CSF4_portlet is a portlet for CSF4, hosted in GridSphere
o Provides a friendly and rapid user interface for operating CSF4
o Supports most CSF4 features, plus account management, certificate management, and transparent data transfer
------------------------
4 System requirements
------------------------
- CPU speed of 1 GHz or higher.
- At least 512 MB of RAM and 1 GB of free disk space, as required by Globus Toolkit 4
------------------------
5 Installing the CSF4.0.x packages
------------------------
To install the CSF4.0.x packages, complete the following steps.
1. Make sure JAVA_HOME, ANT_HOME are set to the correct installation location.
Add JAVA_HOME/bin and ANT_HOME/bin to PATH variable.
Tip: use "java -version" and "ant -version" to verify that your java and ant installations work.
2. Install the newest GT4 packages and do the basic configuration:
see the GT4 install guide:
http://www.globus.org/toolkit/docs/4.0/admin/docbook/
To use CSF4, the GSI and WS GRAM configuration is required.
If you do not want to see a flood of error messages from the globus container, also set up PostgreSQL and configure RFT.
3. Set up GT4.0 environment
o Make sure JAVA_HOME, ANT_HOME are set to the correct installation location.
o Add JAVA_HOME/bin and ANT_HOME/bin to PATH variable.
o cd <gt4_install_location>
o export GLOBUS_LOCATION=`pwd`
o source either etc/globus-user-env.csh or etc/globus-user-env.sh
o source either etc/globus-devel-env.csh or etc/globus-devel-env.sh
4. Install LSF (optional; if you don't have an LSF license, skip this step)
o Login as root
o Uncompress and untar lsf6.0_linux2.4-glibc2.2-x86.tar.Z and follow
the steps in README
o Set up your LSF environment by sourcing either cshrc.lsf or profile.lsf.
5. Install CSF4.0.x :
o If you downloaded the package csf-4.0.x-src.tar.gz:
o run these commands as the globus admin user (the user who installed GT4):
gpt-build csf-4.0.x-src.tar.gz
gpt-postinstall
o If you checked the code out of CVS:
o Enter packaging/ and run:
. ./make-src-package
This produces the file csf-4.0.x-src.tar.gz.
o run these commands as the globus admin user (the user who installed GT4):
gpt-build csf-4.0.x-src.tar.gz
gpt-postinstall
6. Make sure you have set up the security (GSI) configuration for GT4.0.
You can get certificates from Globus or set up your own CA. Certificates
are required for every user and for the host running GT4.0. We recommend
getting certificates from Globus, as setting up your own CA can
take some time.
For details,
http://www.globus.org/toolkit/docs/4.0/admin/docbook/ch05.html
7. Sanity checking
o Verify your certificate by executing the following commands
[zding@grid1 zding]$ grid-proxy-init -debug -verify
User Cert File: /home/zding/.globus/usercert.pem
User Key File: /home/zding/.globus/userkey.pem
Trusted CA Cert Dir: /etc/grid-security/certificates
Output File: /tmp/x509up_u502
Your identity: /O=Grid/OU=GlobusTest/OU=simpleCA-grid1.jlu.edu.cn/OU=jlu.edu.cn/CN=Zhaohui Ding
Enter GRID pass phrase for this identity:
Creating proxy ................++++++++++++
....++++++++++++
Done
Proxy Verify OK
Your proxy is valid until: Thu Sep 21 03:39:17 2006
[zding@grid1 zding]$ grid-proxy-info
subject : /O=Grid/OU=GlobusTest/OU=simpleCA-grid1.jlu.edu.cn/OU=jlu.edu.cn/CN=Zhaohui Ding/CN=1218653979
issuer : /O=Grid/OU=GlobusTest/OU=simpleCA-grid1.jlu.edu.cn/OU=jlu.edu.cn/CN=Zhaohui Ding
identity : /O=Grid/OU=GlobusTest/OU=simpleCA-grid1.jlu.edu.cn/OU=jlu.edu.cn/CN=Zhaohui Ding
type : Proxy draft (pre-RFC) compliant impersonation proxy
strength : 512 bits
path : /tmp/x509up_u502
timeleft : 11:12:52
=========================
Configuring
=========================
The following files must be configured:
- $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml
- $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml
- $GLOBUS_LOCATION/etc/metascheduler/application-config.xml
These files include some example settings for you to follow.
------------------------
1. Configuring Resource Manager
------------------------
Edit $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml.
This config file is for resource managers other than WS GRAM, such as Pre-WS GRAM (i.e. the GT2 GRAM gatekeeper) and the LSF web broker. Specify:
name = name of the cluster installed in step (1) of the installation
type = currently must have the value "GRAM" or "LSF" (without quotes)
host = host running the gatekeeper or LSF web broker
port = port number of the gatekeeper or LSF web broker
Pre-WS GRAM example, hosted on grid1.jlu.edu.cn with the sge jobmanager:
<cluster>
<name> jlu-sge </name>
<type> GRAM </type>
<host> grid1.jlu.edu.cn/jobmanager-sge </host>
<port> 2119 </port>
<version>2.4</version>
</cluster>
LSF web broker example:
<cluster>
<name> jlu-lsf01 </name>
<type> LSF </type>
<host> grid1.jlu.edu.cn</host>
<port> 1975 </port>
<version>6.0</version>
</cluster>
Note: the example resource manager configuration in the file
is commented out. You need to define one after line "-->".
------------------------
2. Configuring CSF Services
------------------------
1. Edit $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml and specify:
CommunityGISHandle = handle of the Default Index Service
NOTE: Do not use localhost or the 127.0.0.1 loopback address; use the
actual host IP address.
NOTE: If you skip this step, CSF uses the local DefaultIndexService as the CommunityGISHandle
2. Configure job, queue, and reservation settings
Tip: you can use the default configuration without any changes.
------------------------
3. Configuring Installed Applications
------------------------
Edit $GLOBUS_LOCATION/etc/metascheduler/application-config.xml,
the configuration file that contains information about specific application installations.
With this, CSF4 dispatches jobs, with the correct application path, to the matching clusters on which the applications are installed.
The following example configures the "cpi" program (a simple MPI program for computing the value of PI)
on hosts "grid1.jlu.edu.cn" and "grid2.jlu.edu.cn":
<app>
<name>CPI</name>
<clusters>
<cluster>
<masterhost>grid1.jlu.edu.cn</masterhost>
<path>/bin/cpi</path>
</cluster>
<cluster>
<masterhost>grid2.jlu.edu.cn</masterhost>
<path>/usr/local/bin/cpi</path>
</cluster>
</clusters>
</app>
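The lookup that application-based dispatch performs over this file can be sketched in a few lines (a hypothetical Python illustration, not CSF4 source; the function name app_install_map is made up):

```python
# Hypothetical sketch (not CSF4's actual implementation): resolve the
# clusters and install paths for one application from a config fragment
# shaped like the <app> example in application-config.xml above.
import xml.etree.ElementTree as ET

APP_CONFIG = """
<app>
  <name>CPI</name>
  <clusters>
    <cluster>
      <masterhost>grid1.jlu.edu.cn</masterhost>
      <path>/bin/cpi</path>
    </cluster>
    <cluster>
      <masterhost>grid2.jlu.edu.cn</masterhost>
      <path>/usr/local/bin/cpi</path>
    </cluster>
  </clusters>
</app>
"""

def app_install_map(xml_text):
    """Return {masterhost: install_path} for one <app> element."""
    root = ET.fromstring(xml_text)
    return {
        c.findtext("masterhost"): c.findtext("path")
        for c in root.find("clusters").findall("cluster")
    }

print(app_install_map(APP_CONFIG))
```

A scheduler holding such a map only needs the set of masterhosts to pick candidate clusters, and the per-host path to build the final job command.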
=========================
Testing
=========================
------------------------
1. Documentation
------------------------
NOTE: Testing was done on GT4.0.0.
CSF documentation can be found in the directory $GLOBUS_LOCATION/docs/metascheduler/:
- examples: contains examples for the job service and reservation service
- config: configuration templates for the resource manager and CSF
- api: Java API documentation in Javadoc format
------------------------
2. Starting LSF (optional; if you don't have an LSF license, skip this step)
-----------------------
o Login as root
o Set up LSF environment by sourcing either cshrc.lsf or
profile.lsf
o lsadmin limstartup
o lsadmin resstartup
o badmin hstartup
o $LSF_SERVERDIR/gabd -d $LSF_ENVDIR
------------------------
3. Starting GT4.0
------------------------
1. Set up GT4.0 environment
- Make sure JAVA_HOME, ANT_HOME are set to the corresponding install locations.
- Add JAVA_HOME/bin and ANT_HOME/bin to the PATH variable.
- cd <gt4_install_location>
- export GLOBUS_LOCATION=`pwd`
- source either etc/globus-devel-env.csh or etc/globus-devel-env.sh
2. Start service container
% su - globus
% globus-start-container
NOTE: control will not return to the terminal for "globus-start-container",
so you will need another window for testing CSF.
------------------------
4. Using the Job Service
------------------------
For jobs forwarded by the Platform Resource Manager Adapter, you can
verify that the job service is working by checking the job status in
your LSF cluster.
o Creating and submitting a job:
o Syntax: csf-job-create -rsl|-r <rsl_file> [-name|-n job_name] [-submit|-sub] [-clustername|-cn clustername] [-clustertype|-ct clustertype] [-A array_scope] [-delegate|-d <Full|Limited>]
RETURN: true or false, and the job_name
o The simplest job submission
$ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/gram_job.rsl -n myjob1 -sub
$ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/gram_job.xml -n myjob2 -sub
NOTE: Before creating the job, you need an RSL (job description) file.
See these pages to learn how to write an RSL:
GT2 RSL: http://www.globus.org/toolkit/docs/2.4/gram/rsl_spec1.html
GT4 RSL: http://www.globus.org/toolkit/docs/4.0/execution/wsgram/schemas/gram_job_description.html
Tip: there are some RSL examples located at
$GLOBUS_LOCATION/docs/metascheduler/examples
Tip: if the -submit|-sub argument is given, the job is
submitted to the default queue immediately after creation. If -submit or -sub
is not given, users need to call submit or start after creation
in order for the job to run.
o Two more complicated jobs
$ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/gram_job.rsl -n myjob3 -sub -ct SGE
$ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/gram_job.rsl -n myjob4 -sub -cn jlu_lsf01
Of these two jobs, myjob3 will only be submitted to SGE clusters,
and myjob4 will be dispatched to the cluster named jlu_lsf01.
Tip: cluster name and type information can be queried with $ csf-resource-list
Tip: cluster type can be one of "Fork", "LSF", "SGE", "Condor", "PBS" and "DRMAA"
o Submit a job with automatic data-staging
o What is automatic data-staging?
With automatic data-staging, users do not need to stage data in and out manually, even if the job is running on a remote cluster.
o Submit a job with data-staging
$ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/datastaging_job.rsl -name myjob5
The content of datastaging_job.rsl:
&(executable=/bin/echo)
(directory=$(HOME))
(arguments="this is a pre-ws gram example")
(stdout=testDataStage.out)
(stderr=testDataStage.err)
(stageout="testDataStage.out" "testDataStage.err")
If the job runs on a remote cluster, the two files "testDataStage.out" and "testDataStage.err" will be transferred to the local machine
after the job finishes.
NOTE: Automatic data-staging requires that the GridFTP service be available on the clusters.
o Submit an Array Job
o What is an array job?
An array job is designed for applications, like AutoDock, that are composed of a large number of sub-jobs;
these sub-jobs generally have exactly the same executable but different arguments, input data and output data.
The advantage of an array job is that the user only needs to submit once, saving submission time and memory.
o Array Job Submission
$ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/array_job.rsl -n myjob6 -A 1-10
The content of array_job.rsl:
&(executable=/bin/echo)
(directory=$(HOME))
(arguments="@A")
(stdout=testArrayJob.out.@A)
(stderr=testArrayJob.err.@A)
(stageout="/home/zding/testArrayJob.out.@A" "/home/zding/testArrayJob.err.@A")
The job is split into 10 copies at run time. Note there are 3 "@A" placeholders in the RSL;
they are replaced by "1" through "10" before the copies are dispatched to the clusters. With stageout,
you will see testArrayJob.out.1, testArrayJob.err.1, testArrayJob.out.2, testArrayJob.err.2,
... and testArrayJob.out.10, testArrayJob.err.10 in your home directory.
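The @A expansion described above can be sketched as follows (a hypothetical illustration; expand_array_job is a made-up name, not a CSF4 function):

```python
# Hypothetical sketch (not CSF4 source): expand the "@A" placeholder in an
# array-job RSL for indices 1..10, as the example above describes.
ARRAY_RSL = (
    "&(executable=/bin/echo)"
    '(arguments="@A")'
    "(stdout=testArrayJob.out.@A)"
    "(stderr=testArrayJob.err.@A)"
)

def expand_array_job(rsl, first, last):
    """One RSL copy per index, with every @A replaced by that index."""
    return [rsl.replace("@A", str(i)) for i in range(first, last + 1)]

copies = expand_array_job(ARRAY_RSL, 1, 10)
print(len(copies))  # 10 sub-jobs
print(copies[0])    # all three @A occurrences replaced by "1"
```

Each copy is then an ordinary single job, which is why the sub-jobs can be dispatched to different clusters independently.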
o Submit an application-specific job
o $ csf-job-create -r $GLOBUS_LOCATION/docs/metascheduler/examples/app_job.rsl -name myjob7
The content of app_job.rsl:
&(application=AUTODOCK)
(executable=autodock4)
(arguments="-alt")
(stdout=stdout)
(stderr=stderr)
(stageout="stdout" "stderr")
With the "application" tag in the RSL, CSF4 dispatches jobs to the clusters where AUTODOCK is installed,
and uses the correct application path on those clusters.
NOTE: Before submitting an application-specific job, you (or the CSF4 administrator) must configure application-config.xml.
NOTE: If the application has only one binary, you do not need to specify the executable tag; if the application
has more than one binary (for example, AUTODOCK has two: autodock4 and autogrid4), the executable tag is required.
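The executable-tag rule in the NOTE above can be sketched as follows (a hypothetical illustration; resolve_executable is a made-up name, and the binary lists come from the AUTODOCK/cpi examples):

```python
# Hypothetical sketch: pick the binary for an application-specific job,
# following the rule above: the executable tag may be omitted only when
# the application has exactly one binary.
def resolve_executable(app_binaries, requested=None):
    if requested is not None:
        if requested not in app_binaries:
            raise ValueError("unknown executable: " + requested)
        return requested
    if len(app_binaries) == 1:
        return app_binaries[0]  # single binary: tag may be omitted
    raise ValueError("executable tag required: app has several binaries")

print(resolve_executable(["cpi"]))                                   # cpi
print(resolve_executable(["autodock4", "autogrid4"], "autodock4"))   # autodock4
```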
o Submit an MPI job
o $ csf-mpi-run -rsl $GLOBUS_LOCATION/docs/metascheduler/examples/mpi_job.rsl -name myjob8
or $ csf-mpi-run -np 4 -e /opt/hpl/gnu/bin/xhpl -name myjob8
The content of mpi_job.rsl:
&(executable=/opt/hpl/gnu/bin/xhpl)
(jobtype="mpi")
(count="4")
(stdout=/home/zding/xhpl.out)
(stderr=/home/zding/xhpl.err)
(stagein="/export/home/zding/HPL.dat->HPL.dat")
(stageout="/home/zding/xhpl.out" "/home/zding/xhpl.err")
As you can see, you can use either the CSF or mpirun style command line, but the benefits of using a CSF RSL include automatic data-staging,
application-based scheduling, etc.
o Query available resource managers
o Syntax: csf-resource-list
RETURN: information on all resource managers (including local/remote ResourceManagerLsf, local/remote ResourceManagerGram, and local/remote GRAM clusters)
Tip: ResourceManagerGram means a GT2 gatekeeper; a GRAM cluster is a cluster managed by GRAM. A GRAM cluster can be Fork, LSF, PBS, SGE, Condor or DRMAA.
o Query Applications Installed
o Syntax: csf-application-list
RETURN: All Application installation information
o Queuing a job:
If a job was created without "-sub" or "-submit", the user can use this command to submit the job to a queue at any time.
o Syntax: csf-job-submit <job_name> [-queue|-q queueName] [-clustertype|-ct clustertype] [-clustername|-cn clustername] [-delegate|-d "Full"|"Limited"]
o Starting a job in a cluster controlled by resource manager:
o Syntax: csf-job-start <job_name> <-RmfHandle|-Rh ResourceManagerFactoryService_handle> <-cluster|-c cluster_name> <-rsvId|-r reservationID> <-delegate|-d "Full"|"Limited">
If a job was created without "-sub" or "-submit", the user can use this command to start the job immediately at any time.
RETURN: JobID
o Stopping a job:
o Syntax: csf-job-stop <job_name>
RETURN: true or false
o Resuming a job:
o Syntax: csf-job-resume <job_name>
RETURN: true or false
o Canceling a job:
o Syntax: csf-job-cancel <job_name>
RETURN: true or false
o Checking job information:
o Syntax: csf-job-data <job_name>
RETURN: completed job data in xml format
o Syntax: csf-job-status <job_name>
RETURN: job status in xml format
o List job history
o Syntax: csf-job-list
o Creating a job with a grid reservation id (LSF6 is needed; skip this test if you don't have an LSF license)
o Make a reservation by using Reservation service.
o Get grid reservation id
o Specify a reservation in the job's RSL file, for example:
<metascheduler:gridReservationId>
<rsl:string>
<GridReservation ID="{http://10.60.39.113:8080/wsrf/services/metascheduler/ReservationService}ID_2005030_10"/>
</rsl:string>
</metascheduler:gridReservationId>
The queuing service dispatches the job to the cluster where the reservation was
made if a grid reservation id is included. See the next section for details
on the queuing service.
If the job is started using the job service client, the clusterName and
gridReservationId defined in the job's RSL file are ignored.
Note: this feature is only available for ResourceManagerLsf
------------------------
5. Using the Queuing Service
------------------------
o Configuring queues:
$GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml
section queuingConfig
Each queue has its own configuration section, which includes:
- plugin: name of the class, which must implement the
com.platform.metascheduler.impl.schedPlugin interface.
The default plugin is always loaded, even if no plugin is defined.
If the specified plugin does not exist or does not implement the
schedPlugin interface, it will not be loaded.
The optional throttle plugin should be configured.
- scheduleInterval: interval in seconds between scheduling
sessions. Its value is an integer between 5 and 600.
This parameter is optional. If not defined, the default value (30
seconds) is used.
- throttle: maximum number of jobs that can be dispatched in each
scheduling cycle. Its value is an integer greater than 0.
To test this, configure throttle to a small value and
scheduleInterval to a larger value, then submit more jobs. You can
also check how many jobs are forwarded to the LSF cluster(s).
Turn on debug in $GLOBUS_LOCATION/log4j.properties
log4j.category.com.platform.metascheduler.impl.schedThrottle=DEBUG
You should see a message like the following:
[java] 438867 [Thread-60] DEBUG com.platform.metascheduler.impl.schedThrottle
- Keep only the first 3 decisions
By default, there is a queue named "normal" configured.
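The scheduleInterval and throttle constraints above can be sketched as a small validator (a hypothetical illustration, not CSF4's actual config loader; out-of-range values are rejected here, which is one possible policy under the stated constraints):

```python
# Hypothetical sketch of the constraints documented above:
# scheduleInterval must be an integer in 5..600 (default 30 when omitted),
# and throttle, if given, must be an integer greater than 0.
def validate_queue_config(cfg):
    interval = cfg.get("scheduleInterval", 30)  # default when omitted
    if not 5 <= interval <= 600:
        raise ValueError("scheduleInterval must be between 5 and 600")
    throttle = cfg.get("throttle")
    if throttle is not None and throttle < 1:
        raise ValueError("throttle must be greater than 0")
    return {"scheduleInterval": interval, "throttle": throttle}

print(validate_queue_config({}))                                  # defaults applied
print(validate_queue_config({"scheduleInterval": 60, "throttle": 3}))
```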
o Creating a queue - only queues that are configured in the
configuration file can be created. Any user can create queues, and
any queue can be used by all users. If you submit a job to a
queue that does not already exist, the queue is created automatically.
o Syntax: csf-queue-create -n <queue_name>
Return: handle to the created queue if operation is successful.
o Checking one specific queue:
o Get queue data through queuing service client
o Syntax: csf-queue-data <queue_name>
Return: queue name and status
o Get queue configuration through queuing service client:
o Syntax: csf-queue-conf <queue_name>
Return: queue configuration parameters.
o Queuing a job
o Jobs specifying a concrete cluster name or a cluster type. Note that
the cluster name and cluster type arguments are mutually exclusive. The queuing service honors
the ClusterName specified in the job RSL file and dispatches the job accordingly.
If it is incorrect, the job is not scheduled.
1 - create a job by using job service
2 - submit the job to the queue
See "Using the Job Service" for job creation and job
submission to queue
o Jobs without specifying any cluster. The queuing service assigns
a cluster for the job and dispatches it.
1 - create a job by using job service
2 - submit the job to the queue
See "Using the Job Service" for job creation and job
submission to queue
o Jobs with a reservation. The queuing service dispatches the job
to the cluster where the first reservation was made.
If the job does not specify clusterName or gridReservationId in
its RSL file, the queuing service dispatches the job to the
available Resource Manager Factory Services and GRAM factories
in round-robin order.
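The round-robin dispatch described above can be sketched as follows (a hypothetical illustration, not CSF4 source; the factory names are made up):

```python
# Hypothetical sketch: cycle jobs that name no cluster and hold no
# reservation across the available factories in round-robin order.
import itertools

def round_robin_dispatcher(factories):
    cycle = itertools.cycle(factories)
    def dispatch(job):
        return (job, next(cycle))
    return dispatch

dispatch = round_robin_dispatcher(["rmfsA", "rmfsB", "gramC"])
assignments = [dispatch(j) for j in ["job1", "job2", "job3", "job4"]]
print(assignments)  # job4 wraps back around to the first factory
```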
------------------------
6. Using the Reservation Service (LSF6 is required; if you don't have an LSF license, skip this section)
------------------------
The Reservation Service requires a resource manager that supports advance
reservation. By default, only LSF cluster administrator(s) can make
reservations in the cluster. You can allow every user to make
reservations by defining advance reservation policies in your
LSF cluster.
See the chapter "Advance Reservation" in "Administering Platform LSF"
for more information.
o Creating a reservation by agreement
o Use template agreement file from
${GLOBUS_LOCATION}/docs/csf/example/agreement.xml,
you must change:
- agreement term values (ResReq or hostTerm, cpuTerm, userTerm,
startTime, endTime)
o csf-rsv-create -agreement|-a
<agreement_file> [-name|-n reservation_name]
o On success, a corresponding reservation is created in the LSF
cluster. Use the "brsvs" command to query existing reservations.
NOTE: On success, this returns a handle to the reservation name.
On failure, this throws a fault.
If reservation_name is not specified, a hash code is returned;
the hash code serves the same function as the reservation name.
o Creating a reservation by RSL
o You can use the template RSL file from csf/example/rsv.xml, but
you must change:
- schema location specified in "schemaLocation" attribute.
- values for hosts, number ... etc
o csf-rsv-create createService -rsl|-r <rsl_file>
[-name|-n reservation_name]
o csf-rsv-submit <reservation_name>
o On success, a corresponding reservation is created in the LSF
cluster. Use the "brsvs" command to query existing reservations.
o Modifying a reservation
o csf-rsv-modify <reservation_name > <-rsl rsl_file>
o This operation modifies the reservation request. It can only
be performed when the reservation is in CREATED status.
o Canceling a reservation
o csf-rsv-cancel <reservation_name >
o Querying Reservation Data
o csf-rsv-data <reservation_name >
You will see an XML file. At the top of it, you should see the reservation
information:
<GridReservation
ID="{http://10.60.39.113:8080/wsrf/services/metascheduler/ReservationService}ID_20050310_3" Status="RESERVED" UserID="bingfeng">
This reservation can be used by a job. For details, see "Using the Job Service".
o Querying Reservation Status
o csf-rsv-status <reservation_name>
=========================
CSF Commands & APIs
=========================
In order to make CSF easy for end users to use, the CSF package
includes several sample commands to wrap around clients for Job
Service, Reservation Service, and Queuing Service.
All the commands are located in $GLOBUS_LOCATION/bin. For
command usage, use "help" as an argument.
------------------------
1. Job service commands
------------------------
% csf-job-cancel: cancel a job service instance
% csf-job-create: create a job service instance
% csf-job-data: get detailed job information
% csf-job-resume: resume a suspended job
% csf-job-start: start a job in a specific resource manager
% csf-job-status: get job status
% csf-job-stop: suspend a job
% csf-job-submit: submit a job to the queuing service
% csf-job-list: list all jobs, with name, RSL style and status
% csf-resource-list: list all available resources (clusters)
% csf-application-list: list all installed applications
% csf-mpi-run: submit an MPI job
------------------------
2. Reservation service commands
------------------------
% csf-rsv-cancel: cancel an existing reservation
% csf-rsv-create: create a reservation service instance
% csf-rsv-data: get reservation information
% csf-rsv-status: get reservation status
% csf-rsv-submit: submit a created reservation to start reservation booking
------------------------
3. Queuing service commands
------------------------
% csf-queue-conf: get queue configuration
% csf-queue-create: create a queuing service instance
% csf-queue-data: get queue information
Job service, reservation service and queuing service are Grid
services. Here we list published APIs of those services. For details,
see the WSDL files under directory
$GLOBUS_LOCATION/schema/metascheduler and the Javadoc information
under directory ${GLOBUS_LOCATION}/docs/csf/api.
------------------------
4. Job service APIs
------------------------
- String start(String factoryUri, String clusterName, String rmType, String rmRsvId)
Start a job in a specific cluster
- boolean cancel()
Cancel a job
- boolean resume()
Resume a suspended job
- boolean stop()
Suspend a job
- String getJobData()
Get detailed job information
- String getStatus()
Get job status
------------------------
5. Reservation service APIs
------------------------
- String getRsvData()
Get reservation information
- boolean submit()
Submit a reservation request for reservation booking
- boolean modify(String rsl)
Modify a reservation request before the request is submitted
- String getStatus()
Get reservation status
------------------------
6. Queuing Service APIs
------------------------
- QueueDataType getQueueData()
Get queue information
- QueueConfigParamsType getQueueConfigParams()
Get queue configuration
- String submit(QueuingRequestType submitRequest)
Submit a job to current queue
- boolean remove(QueuingRequestType removeRequest)
Remove a job from current queue
=========================
Support for Multiple CSF4 Hosting Environments
=========================
------------------------
1. The concept of a multiple CSF4 hosting environment
------------------------
A multiple CSF4 hosting environment means there exist multiple CSF4 installations that mutually trust each other's issuing CAs; such a multi-Globus environment is called a "Community" or a "Virtual Organization" (VO).
The multi-hosting collaboration of CSF allows host A to send its jobs to host B, where the jobs run under host B's resource manager.
A community needs one or more Center Index Servers; the Center Index Server stores the EPRs of all the resource manager information in the community. Note that the Center Index Server does not need to have CSF4 installed.
------------------------
2. Required configuration
------------------------
For instance: HostA (IP: 10.60.39.3) and HostB (IP: 10.60.39.113) are the host computers of a Community, and HostC (IP: 10.60.39.4) is the Center Index Server of the Community.
o Configure HostA:
o In the file $GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml
(in the globalConfig section), configure the CommunityGISHandle.
For instance:
<global:CommunityGISHandle value=
"https://10.60.39.3:8443/wsrf/services/DefaultIndexService"/>
o In the file $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml,
configure the VO Index Service handle.
For instance:
<upstream>https://10.60.39.4:8443/wsrf/services/DefaultIndexService</upstream>
o Configure HostB:
The configuration process is the same as for HostA.
o In the file $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/hierarchy.xml, configure the Index Service handles of the other hosts in the VO.
For instance:
<downstream>https://10.60.39.3:8443/wsrf/services/DefaultIndexService</downstream>
<downstream>https://10.60.39.113:8443/wsrf/services/DefaultIndexService</downstream>
------------------------
3. Submitting a job in a multiple GT4.0 hosting environment
------------------------
o Configure HostB's $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml
for instance:
<cluster>
<name> cluster1 </name>
<type> LSF </type>
<host> HostB </host>
<port> 1966</port>
<version> 6.0 </version>
</cluster>
o Configure HostA's $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml
for instance:
<cluster>
<name> cluster2 </name>
<type> LSF </type>
<host> HostA </host>
<port> 1976</port>
<version> 6.0 </version>
</cluster>
o Create job1 on HostB
[globus@grid5 gt395]$ csf-job-create -rsl job.xml -name job1
Service location:https://10.60.39.113:8443/wsrf/services/metascheduler/JobService
CreateJob Successfully: job1
o Submit job1 to run on HostA:
[globus@grid5 gt395]$ csf-job-submit job1 -q normal -Rh https://10.60.39.3:8443/wsrf/services/metascheduler/ResourceManagerFactoryService -c cluster2
submit(normal) => https://10.60.39.113:8443/wsrf/services/metascheduler/JobService
Note: HostB is configured with cluster1, but job1 was assigned to cluster2 (configured on HostA), so although the job was created on HostB, it runs on HostA.
------------------------
4.Starting a job in an LSF cluster through Resource Manager Factory service
------------------------
o Create a job in the GT4.0 environment on HostA:
% csf-job-create -rsl job.xml -name job1
o Start the job in an LSF cluster through Resource Manager Factory
service:
% csf-job-start job1 -Rh http://HostIP:8443/wsrf/services/metascheduler/ResourceManagerFactoryService -c clustername
=========================
New features of 4.0.2
=========================
-------------------------
1. New utilities for querying available Resource Manager information
-------------------------
o csf-job-RmInfo
o csf-job-list
-------------------------
2. Support for the GT2 gatekeeper
-------------------------
o Add the gatekeeper to $GLOBUS_LOCATION/etc/metascheduler/resourcemanager-config.xml
o For example:
<cluster>
<name> gatekeeperA </name>
<type> GRAM </type>
<host> rocks-110.sdsc.edu </host>
<port> 2119 </port>
<version>2.4</version>
</cluster>
o submit a job to gatekeeper
o csf-job-create -rsl docs/metascheduler/test.rsl -n job1
Service location:https://198.202.88.110:8443/wsrf/services/metascheduler/JobService
CreateJob Successfully: job1
o csf-job-submit job1
-------------------------
3. Support for full delegation for GT4 WS-GRAM/GT2 gatekeeper jobs
-------------------------
o The GT4 WS-GRAM client (globusrun-ws) and the gatekeeper client (globusrun) do not support full delegation, for security reasons; however, a full-delegation proxy is necessary for some applications (for example, Gfarm 1.1.1).
o Start a job with full delegation
For example:
o csf-job-start job1 -Rh https://198.202.88.110:8443/wsrf/services/metascheduler/ResourceManagerFactoryService -c gatekeeperA -d Full
=========================
New features of 4.0.3
=========================
-------------------------
1. CSF4_Portlet
-------------------------
The CSF4 Portlet is a Java-based web application for dispatching jobs to remote job schedulers through a web browser. It presents a generic interface for users to create and submit job specifications to generate jobs, view job specifications and job history, monitor job status, and retrieve job output from remote sites.
For more details, please see the README of CSF4_Portlet.
-------------------------
2. Functionality improvements
-------------------------
o Users can give a -submit|-sub argument when running csf-job-create; the job is then submitted to the default queue immediately after creation.
o Users can specify a cluster type when running csf-job-submit; the scheduling framework then only considers clusters of that type as candidates.
=========================
New features of 4.0.4
=========================
-------------------------
1. Array Job
-------------------------
See the introduction in JobService
-------------------------
2. Application based Scheduling
-------------------------
o csf-application-list
See the introduction in JobService
-------------------------
3. Automatic Data-Staging
-------------------------
See the introduction in JobService
-------------------------
4. MPI Job
-------------------------
o csf-mpi-run
See the introduction in JobService
-------------------------
5. Updated CSF4 Portlet
-------------------------
Fully compliant with CSF4.0.4
=========================
Uninstallation
=========================
To undeploy and uninstall the CSF package:
Run these commands in the $GLOBUS_LOCATION directory as the globus admin user:
ant -f share/globus_wsrf_common/build-packages.xml undeployGar -Dgar.id=metascheduler
rm -rf etc/gpt/packages/setup/csf
rm -rf etc/gpt/packages/csf
=========================
Contact Information
=========================
Please send all questions and comments to support@platform.com
or the discussion group on sourceforge.net (http://sourceforge.net/mail/?group_id=103105),
or phone toll-free 1-877-444-4LSF (+1 877 444 4573)
=========================
Copyright
=========================
Copyright 1994-2003 Platform Computing Corporation,
2004-2007 Lab of Distributed Computing and System Architecture, Jilin University
All rights reserved.
Although the information in this document has been carefully
reviewed, Lab of Distributed Computing and System Architecture,
Jilin University ("DCSA Lab.") does not warrant it to be free of
errors or omissions. DCSA Lab. reserves the right to make
corrections, updates, revisions or changes to the information
in this document.
UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED
IN THIS DOCUMENT IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO
ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA OR
SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.
LSF is a registered trademark of Platform Computing Corporation in
the United States and in other jurisdictions.
ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING,
PLATFORM COMPUTING, and the PLATFORM and LSF logos are trademarks of
Platform Computing Corporation in the United States and in other jurisdictions.
UNIX is a registered trademark of The Open Group.
Other products or services mentioned in this document are the
trademarks of their respective owners.
=========================
Troubleshooting
=========================
o If you can't create a queue, check whether you have
configured the queue in the file
$GLOBUS_LOCATION/etc/metascheduler/metascheduler-config.xml
o When using the csf-job-RmInfo/csf-rsv-create/csf-queue-create utilities,
you may get the error message "DefaultIndexService isn't ready".
The reason is that ResourceManager information is not aggregated into the DefaultIndexService immediately after container startup,
so please wait a few seconds (less than 10) and re-run the command.
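The wait-and-re-run advice above amounts to a simple retry loop, sketched here as a hypothetical helper (not part of CSF4; the function and the simulated query are made up for illustration):

```python
# Hypothetical sketch: retry a query a few times with a short delay while
# the index service finishes aggregating resource manager information.
import time

def retry(fn, attempts=3, delay=2.0):
    """Call fn(); on RuntimeError, wait and retry up to `attempts` times."""
    for i in range(attempts):
        try:
            return fn()
        except RuntimeError:
            if i == attempts - 1:
                raise  # still failing after the last attempt
            time.sleep(delay)

# Simulated query that only succeeds on the third call, mimicking the
# "DefaultIndexService isn't ready" window right after container startup.
calls = []
def flaky_query():
    calls.append(1)
    if len(calls) < 3:
        raise RuntimeError("DefaultIndexService isn't ready")
    return "resource list"

result = retry(flaky_query, attempts=3, delay=0.01)
print(result)
```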
==============================
END of README
Last update: September 28th 2007
==============================