README for Platform Community Scheduler Framework
December 2003
Platform Computing
=========================
CONTENTS
=========================
1. Introduction
2. Supported Platforms
3. Installation
4. Configuring CSF
5. Using CSF
6. CSF Commands & APIs
7. Support for Multiple GT3.0 Hosting Environments
8. Uninstallation
9. Common Problems and Known Issues
10. Contact Information
11. Copyright
=========================
1. Introduction
=========================
This file describes the installation and usage of Platform Community
Scheduler Framework (CSF).
------------------------
1.1 What is the Platform Community Scheduler Framework?
------------------------
The Platform Community Scheduler Framework (CSF) is the industry's
first comprehensive, OGSI-compliant metascheduling framework built
on the Globus Toolkit(R) 3.0 (GT3). As part of our commitment to the
future of the Grid, we are contributing CSF back to the Globus
Project(TM) to be included in all future versions of the Globus Toolkit.
------------------------
1.2 Platform CSF metascheduling services
------------------------
The Platform CSF consists of the following services:
o Job Service, for submitting, controlling, and monitoring jobs
o Reservation Service, for reserving resources on the Grid
o Queuing Service, for basic job scheduling within CSF
=========================
2. Supported Platforms
=========================
This release supports the following platforms:
o x86 systems running Linux: Kernel 2.4.x, compiled with glibc 2.2
  Tested on Red Hat Linux 7.2 and 8.0
=========================
3. Installation
=========================
------------------------
3.1 Required packages
------------------------
1. Additional tools (also required by GT3)
- JDK 1.4 or higher (http://java.sun.com/j2se/1.4.1/index.html)
- Ant 1.5 (http://ant.apache.org/)
- JUnit 3.8.1 (http://www.junit.org/index.htm)
See the GT3 installation documentation for information about
downloading and installing these tools:
http://www-unix.globus.org/toolkit/3.0/ogsa/docs/admin/installation.html
2. Platform LSF 6.0
3. CSF packages
- CSF+GT3.0 binary package: gt3.0-csf3.0-linux-installer.tar.gz
- CSF+GT3.0 source package: gt3.0-csf3.0-src-installer.tar.gz
------------------------
3.2 System requirements
------------------------
- A 1 GHz or faster processor.
- At least 512 MB of RAM and 1 GB of free disk space.
------------------------
3.3 Installing the CSF+GT3.0 all-in-one packages
------------------------
To install the CSF+GT3.0 all-in-one packages, complete the following
steps. The CSF+GT3.0 all-in-one package includes a simple installation
script and the following packages:
- GT3.0.2 package
- csf package
1. Install LSF for CSF
o Log in as root
o Uncompress and untar lsf6.0_linux2.4-glibc2.2-x86.tar.Z and follow
the steps in README
o Set up your LSF environment by sourcing either cshrc.lsf or
profile.lsf.
2. Make sure JAVA_HOME and ANT_HOME are set to the correct
   installation locations, and add JAVA_HOME/bin and ANT_HOME/bin
   to your PATH, as in the example below.
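   For example, in an sh-family shell (the paths here are
   illustrative; use your actual install locations):
   % export JAVA_HOME=/usr/java/j2sdk1.4.1
   % export ANT_HOME=/usr/local/apache-ant-1.5
   % export PATH=$JAVA_HOME/bin:$ANT_HOME/bin:$PATH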
3. Install one of the gt3.0-csf3.0 packages:
o gt3.0-csf3.0 binary package:
% gunzip gt3.0-csf3.0-linux-installer.tar.gz
% tar xvf gt3.0-csf3.0-linux-installer.tar
% cd gt3-install
% ./install-gt3.0-csf3.0 <gt3_install_location>
<gt3_install_location> must be an absolute path.
o gt3.0-csf3.0 source package:
% gunzip gt3.0-csf3.0-src-installer.tar.gz
% tar xvf gt3.0-csf3.0-src-installer.tar
% cd gt3-install
% ./install-gt3.0-csf3.0 <gt3_install_location>
<gt3_install_location> must be an absolute path.
Both the GT3.0 MMJFS package and the CSF package are installed
automatically.
4. Make sure the security configuration for GT3.0 is set up, MMJFS is
   installed, and GRIM security is configured.
4.1 Security configuration
If you already have GT2 certificates and /etc/grid-security is
configured, you can skip this step.
You can get certificates from Globus or set up your own CA.
Certificates are required for every user and for the host that runs
GT3.0. We recommend getting certificates from Globus, as setting up
your own CA can take some time.
For details, see section "Security Configuration" at
http://www-unix.globus.org/toolkit/3.0/ogsa/docs/admin/configuration.html
4.2 GRIM installation
% ls -l $JAVA_HOME/jre/endorsed/xalan.jar
% ls -l $GLOBUS_LOCATION/bin/globus-grim
If either file is not installed, follow the instructions in the
"Installing GT3" section of the GT3.0 Administrator Guide at
http://www-unix.globus.org/toolkit/3.0/ogsa/docs/admin/installation.html
4.3 MMJFS installation
% $GPT_LOCATION/sbin/gpt-query -name=mmjfs
For the MMJFS installation, make sure you have certificates BEFORE
you start the service container. Details can be found on the
Globus Web site at
http://www-unix.globus.org/toolkit/3.0/ogsa/docs/admin/configuration.html
5. Set up the GT3.0 environment
   o Make sure JAVA_HOME and ANT_HOME are set to the correct
     installation locations.
   o Add JAVA_HOME/bin and ANT_HOME/bin to your PATH.
   o cd <gt3_install_location>
   o export GLOBUS_LOCATION=`pwd`
   o Source either setenv.csh or setenv.sh
   o Source either etc/globus-user-env.csh or etc/globus-user-env.sh
   o Execute $GLOBUS_LOCATION/bin/setperm.sh as root.
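   For example, in an sh-family shell (the install path is
   illustrative):
   % cd /usr/local/gt3
   % export GLOBUS_LOCATION=`pwd`
   % . setenv.sh
   % . etc/globus-user-env.sh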
6. Sanity checking
o Verify that your name is defined in /etc/grid-security/grid-mapfile, for example:
"/O=Grid/O=Globus/OU=lsf.platform.com/CN=Bingfeng Lu" bingfeng
o Verify that your name is defined in /etc/grid-security/grim-port-type.xml, for example:
<authorized_port_types>
...
<port_type username="bingfeng">http://www.globus.org/namespaces/managed_job/managed_job/ManagedJobPortType</port_type>
...
</authorized_port_types>
o Verify your certificate by executing the following commands:
% $GLOBUS_LOCATION/bin/grid-proxy-init
Your identity: /O=Grid/O=Globus/OU=lsf.platform.com/CN=Bingfeng Lu
Enter GRID pass phrase for this identity:
Creating proxy ........................................... Done
Your proxy is valid until: Wed Jul 30 05:02:14 2003
% $GLOBUS_LOCATION/bin/grid-proxy-info
subject : /O=Grid/O=Globus/OU=lsf.platform.com/CN=Bingfeng Lu/CN=880442514
issuer : /O=Grid/O=Globus/OU=lsf.platform.com/CN=Bingfeng Lu
identity : /O=Grid/O=Globus/OU=lsf.platform.com/CN=Bingfeng Lu
type : Proxy draft compliant impersonation proxy
strength : 512 bits
path : /tmp/x509up_u30107
timeleft : 11:59:36
=========================
4. Configuring CSF
=========================
You must configure the following files:
- $GLOBUS_LOCATION/etc/resourcemanager-config.xml
- $GLOBUS_LOCATION/etc/metascheduler-config.xml
These files include example settings for you to follow.
-------------------------
4.1. Configuring Resource Manager Factory Service
-------------------------
Edit $GLOBUS_LOCATION/etc/resourcemanager-config.xml and specify:
name = name of the LSF cluster installed in step 1 of the installation
type = currently must have the value "LSF" (without quotes)
host = host running gabd; this may be the same host as the LSF master
port = port number specified in the gabd configuration (i.e., ga.conf)
The gabd configuration file, which specifies the port number, is
located at $LSF_ENVDIR/conf/ga.conf. The default gabd port is 1966.
Note: the example resource manager configuration in the file
is commented out. You need to define your own after the "-->" line.
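For illustration only, a minimal entry might look like the following
sketch. The element and attribute names here are hypothetical; the
actual syntax must follow the commented-out template shipped in
resourcemanager-config.xml:
    <!-- hypothetical sketch; copy the structure of the commented-out
         example in your own file -->
    <resourceManager name="cluster1"
                     type="LSF"
                     host="lsfmaster.example.com"
                     port="1966"/>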
-------------------------
4.2. Configuring the Metascheduler
-------------------------
Edit $GLOBUS_LOCATION/etc/metascheduler-config.xml and specify:
GISHandle = handle of the Index Service
registryHandle = handle of the container registry
NOTE: do not use localhost or the 127.0.0.1 loopback address in these
handles; use the actual host IP address.
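For illustration, the handles are the service URLs of the hosting
container. The host IP and port below are examples (reusing the
address from the RSL example later in this README), and the container
registry path is our assumption about the default GT3.0 container
layout; check your own container's service list:
    GISHandle      = http://172.25.247.141:8080/ogsa/services/base/index/IndexService
    registryHandle = http://172.25.247.141:8080/ogsa/services/core/registry/ContainerRegistryService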
=========================
5. Using CSF
=========================
-------------------------
5.1. Documentation
-------------------------
CSF documentation can be found in directory $GLOBUS_LOCATION/docs/csf:
- examples: contains examples for job service and reservation service
- api: Java API documents in Javadoc format:
$GLOBUS_LOCATION/docs/csf/api/CSF/index.html
$GLOBUS_LOCATION/docs/csf/api/resourcemanager/index.html
- config: configuration template for resource manager and CSF.
-------------------------
5.2. Starting LSF
-------------------------
o Log in as root
o Set up the LSF environment by sourcing either cshrc.lsf or
  profile.lsf
o lsadmin limstartup
o lsadmin resstartup
o badmin hstartup
o $LSF_SERVERDIR/gabd -d $LSF_ENVDIR
-------------------------
5.3. Starting GT3.0
-------------------------
1. Set up the GT3.0 environment
   - Make sure JAVA_HOME and ANT_HOME are set to the correct
     installation locations.
   - Add JAVA_HOME/bin and ANT_HOME/bin to your PATH.
   - cd <gt3_install_location>
   - export GLOBUS_LOCATION=`pwd`
   - Source either etc/globus-user-env.csh or etc/globus-user-env.sh
   - Source either setenv.csh or setenv.sh
2. Start the service container
   % ant startContainer
   NOTE: "ant startContainer" does not return control to the
   terminal, so you will need another window for testing CSF.
-------------------------
5.4. Using the Reservation Service
-------------------------
The Reservation Service requires a resource manager that supports
advance reservation. By default, only LSF cluster administrators can
make reservations in the cluster. You can also allow every user to
make reservations by defining advance reservation policies in your
LSF cluster.
See the chapter "Advance Reservation" in "Administering Platform LSF"
for more information.
o Creating a reservation by agreement
  o Use the template agreement file
    ${GLOBUS_LOCATION}/docs/csf/example/agreement.xml. You must change:
- agreement term values (ResReq or hostTerm, cpuTerm, userTerm,
startTime, endTime)
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_factory_handle> createService agreement
<agreement_file>
[ instance_name ]
o On success, a corresponding reservation is created in the LSF
cluster. Use the "brsvs" command to query existing reservations.
  NOTE: On success, this returns a handle to the reservation service
  instance. On failure, it throws a fault.
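  For example, with a hypothetical factory handle and instance name
  (the service URL below is illustrative; use your own container's
  ReservationFactoryService handle):
  % java com.platform.metascheduler.client.ReservationServiceClient http://hostA:8080/ogsa/services/metascheduler/ReservationFactoryService createService agreement ${GLOBUS_LOCATION}/docs/csf/example/agreement.xml myRsv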
o Creating a reservation by RSL
  o You can use the template RSL file csf/example/rsv.xml, but you
    must change:
- schema location specified in "schemaLocation" attribute.
- values for hosts, number ... etc
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_factory_handle> createService rsl <rsl_file>
[ instance_name ]
  o java com.platform.metascheduler.client.ReservationServiceClient
    <reservation_service_handle> submit
o On success, a corresponding reservation is created in the LSF
cluster. Use the "brsvs" command to query existing reservations.
o Creating a reservation by ID
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_factory_handle> createService id <rsv_id>
[ instance_name ]
  o For example, suppose a reservation was created previously by
    using either RSL or a GSA agreement. The user obtained the
    reservation ID from the "getRsvData" operation and deleted the
    service instance. Some time later, the user wants to perform
    additional operations on the reservation:
    NOTE: This can only be performed for a previously created reservation.
    java com.platform.metascheduler.client.ReservationServiceClient
    <reservation_factory_handle> createService id "ID_20030627_1" xyz
o Modifying a reservation
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_service_handle> modify <rsl_file>
  o Operation that modifies the reservation request. This can only
    be performed while the reservation is in CREATED status.
o Canceling a reservation
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_service_handle> cancel
o Querying Reservation Data
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_service_handle> getRsvData
-or-
o java org.globus.ogsa.client.FindServiceDataByName "agreement"
<reservation_service_handle>
You will see an XML document. At the top of it, you should see the
reservation information:
<GridReservation ID="{http://host:8080/ogsa/services/metascheduler/ReservationFactoryService}ID_20030730_3" Status="RESERVED" UserID="bingfeng">
This reservation can be used by a job. For details, see the next section.
o Querying Reservation Status
o java com.platform.metascheduler.client.ReservationServiceClient
<reservation_service_handle> getStatus
-or-
o java org.globus.ogsa.client.FindServiceDataByName "status"
<reservation_service_handle>
------------------------
5.5. Using the Job Service
------------------------
For jobs forwarded by the Platform Resource Manager Adapter, you can
verify that the job service is working by checking the job status in
your LSF cluster.
o Creating a job:
o java com.platform.metascheduler.client.JobServiceClient
<job_factory_service_handle> createService rsl <rsl_file>
RETURN: job_service_handle
    (Use this handle to control the job.)
    NOTE: Users need to call submit or start after creation in
    order for the job to run; otherwise, the job service will
    not perform any further action. Users should also destroy
    the job after it is finished.
    The createService option "id <ID>" is not supported yet.
o Starting a job in a cluster controlled by resource manager:
o java com.platform.metascheduler.client.JobServiceClient
<job_service_handle> start <resource_manager_factory_handle> <cluster_name>
RETURN: true or false
o java com.platform.metascheduler.client.JobServiceClient
<job_service_handle> start <cluster_name>
RETURN: true or false
o Starting a job through a managed job service:
  o java com.platform.metascheduler.client.JobServiceClient
    <job_service_handle> start <Master<type>ManagedJobFactoryService_handle>
    <type: "Fork" or "Pbs">
RETURN: true or false
o Stopping a job:
o java com.platform.metascheduler.client.JobServiceClient
<job_service_handle> stop
RETURN: true or false
o Resuming a job:
o java com.platform.metascheduler.client.JobServiceClient
<job_service_handle> resume
RETURN: true or false
o Canceling a job:
o java com.platform.metascheduler.client.JobServiceClient
<job_service_handle> cancel
RETURN: true or false
o Checking job information:
o java com.platform.metascheduler.client.JobServiceClient
<job_service_handle> getJobData
RETURN: job data in xml format
o Queuing a job:
  The queuing service must be configured, as described in
  "5.6. Using the Queuing Service".
  % java com.platform.metascheduler.client.JobServiceClient
    <job_service_handle> submit [queue_name]
  If queue_name is not specified, the job service picks the default
  queue defined in the job service configuration section.
o Creating a job with a grid reservation ID
  o Make a reservation by using the Reservation Service.
  o Get the grid reservation ID.
  o Specify the reservation in the job's rsl_file like:
<metascheduler:gridReservationId>
<rsl:string>
<GridReservation ID="{http://172.25.247.141:8080/ogsa/services/metascheduler/ReservationFactoryService}ID_20030730_3"/>
</rsl:string>
</metascheduler:gridReservationId>
If a grid reservation ID is included, the queuing service dispatches
the job to the cluster where the reservation was made. See the next
section for details of the queuing service.
If a job is started by using the job service client, the clusterName
and gridReservationId defined in the job's RSL file are ignored.
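Putting the pieces together, a minimal job lifecycle with the queuing
service looks like the following (the handles are placeholders: the
job service handle is the one returned by createService, and "normal"
is an example queue name):
    % java com.platform.metascheduler.client.JobServiceClient <job_factory_service_handle> createService rsl job.xml
    % java com.platform.metascheduler.client.JobServiceClient <job_service_handle> submit normal
    % java com.platform.metascheduler.client.JobServiceClient <job_service_handle> getJobData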
------------------------
5.6. Using the Queuing Service
------------------------
o Configuring queues:
  Queues are configured in $GLOBUS_LOCATION/etc/metascheduler-config.xml,
  in the queuingFactoryConfig section.
  Each queue has its own configuration section, which includes:
  - plugin: name of a class that must implement the
    com.platform.metascheduler.impl.schedPlugin interface.
    The default plugin is always loaded, even if no plugin is defined.
    If the specified plugin does not exist or does not implement the
    schedPlugin interface, it is not loaded.
    The throttle plugin is optional; to use throttling, it must be
    configured explicitly.
  - scheduleInterval: interval in seconds between scheduling
    sessions. Its value is an integer between 5 and 600.
    This parameter is optional. If it is not defined, the default
    value (30 seconds) is used.
  - throttle: maximum number of jobs that can be dispatched in each
    scheduling cycle. Its value is an integer greater than 0.
    To test this, configure throttle to a small value and
    scheduleInterval to a larger value, then submit a larger number
    of jobs. You can also check how many jobs are forwarded to the
    LSF cluster(s).
    Turn on debugging in $GLOBUS_LOCATION/log4j.properties:
    log4j.category.com.platform.metascheduler.impl.schedThrottle=DEBUG
    You should see a message like the following:
    [java] 438867 [Thread-60] DEBUG com.platform.metascheduler.impl.schedThrottle
    - Keep only the first 3 decisions
  By default, no queue is configured, and any job submission to a
  queue will fail.
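  For illustration only, a queue section might be sketched as
  follows. The element names here are hypothetical; use the actual
  template shipped under $GLOBUS_LOCATION/docs/csf/config:
      <!-- hypothetical sketch; copy the structure from the shipped
           configuration template -->
      <queue name="normal">
        <plugin>com.platform.metascheduler.impl.schedPluginDefault</plugin>
        <scheduleInterval>30</scheduleInterval>
        <throttle>10</throttle>
      </queue>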
o Creating a queue - only queues that are configured in the
  configuration file can be created. Any user can create queues, and
  any queue can be used by all users. If you submit a job to a
  configured queue that does not exist yet, the queue is created
  automatically.
Syntax:
% java com.platform.metascheduler.client.QueuingServiceClient
<QueuingFactoryService_handle> create <queue>
Return: handle to the created queue if operation is successful.
o Checking all defined queues in Queuing factory service through
service data
Syntax:
% java org.globus.ogsa.client.FindServiceDataByName ";ConfigParams"
<QueuingFactoryService_handle>
Return: all queues configured in xml format
o Checking one specific queue:
o Get queue configuration through service data:
Syntax:
% java org.globus.ogsa.client.FindServiceDataByName ";ConfigParams"
<queue_service_handle>
Return: queue configuration in xml format
o Get queue status through service data:
% java org.globus.ogsa.client.FindServiceDataByName ";status"
<queue_service_handle>
Return: queue status in xml format
o Get queue data through queuing service client:
Syntax:
% java com.platform.metascheduler.client.QueuingServiceClient
<queue_service_handle> getQueueData
Return: queue name and status
o Get queue configuration through queuing service client:
Syntax:
% java com.platform.metascheduler.client.QueuingServiceClient
<queue_service_handle> getQueueConfigParams
Return: queue configuration parameters.
o Queuing a job
  o Jobs that specify a cluster name. The queuing service honors the
    clusterName specified in the job RSL file and dispatches the job
    accordingly. If the cluster name is incorrect, the job is not
    scheduled.
1 - create a job by using job service
2 - submit the job to the queue
  o Jobs that do not specify a cluster. The queuing service assigns
    a cluster to the job and dispatches it.
    1 - create a job by using the job service
    2 - submit the job to the queue
    See "5.5. Using the Job Service" for job creation and job
    submission to a queue.
  o Jobs with a reservation. The queuing service dispatches the job
    to the cluster where the first reservation was made.
  If the job does not specify clusterName or gridReservationId in
  its RSL file, the queuing service dispatches the job to the
  available Resource Manager Factory Services and GRAM factories
  in round-robin order.
o Debugging the queuing service - turn on the following flags in
  log4j.properties if you are using the log4j.properties file as
  your logging configuration:
- log4j.category.com.platform.metascheduler.impl.QueuingFactoryCallback=DEBUG
- log4j.category.com.platform.metascheduler.impl.QueuingServiceImpl=DEBUG
- log4j.category.com.platform.metascheduler.impl.schedPluginDefault=DEBUG
- log4j.category.com.platform.metascheduler.impl.schedThrottle=DEBUG
=========================
6. CSF Commands & APIs
=========================
To make CSF easy for end users, the CSF package includes several
sample commands that wrap the clients for the Job Service,
Reservation Service, and Queuing Service.
All the commands are located in $GLOBUS_LOCATION/bin. For command
usage, pass "help" as an argument.
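For example:
    % csf-job-create help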
------------------------
6.1 Job service commands
------------------------
% csf-job-cancel: cancel a job service instance
% csf-job-create: create a job service instance
% csf-job-data: get detailed job information
% csf-job-resume: resume a suspended job
% csf-job-start: start a job in a specific resource manager
% csf-job-status: get job status
% csf-job-stop: suspend a running job
% csf-job-submit: submit the job to queuing service
------------------------
6.2 Reservation service commands
------------------------
% csf-rsv-cancel cancel an existing reservation
% csf-rsv-create create a reservation service instance
% csf-rsv-data get reservation information
% csf-rsv-status get reservation status
% csf-rsv-submit submit a created reservation to start reservation booking
------------------------
6.3 Queuing service commands
------------------------
% csf-queue-conf get queue configuration
% csf-queue-create create a queuing service instance
% csf-queue-data get queue information
The job service, reservation service, and queuing service are Grid
services. The following sections list the published APIs of these
services. For details, see the WSDL files under the directory
$GLOBUS_LOCATION/schema/metascheduler and the Javadoc information
under the directory ${GLOBUS_LOCATION}/docs/csf/api.
------------------------
6.4 Job service APIs
------------------------
- String start(String factoryUri, String clusterName, String rmType, String rmRsvId)
Start a job in a specific cluster
- boolean cancel()
Cancel a job
- boolean resume()
Resume a suspended job
- boolean stop()
Suspend a job
- String getJobData()
Get detailed job information
- String getStatus()
Get job status
------------------------
6.5 Reservation service APIs
------------------------
- String getRsvData()
Get reservation information
- boolean submit()
Submit a reservation request for reservation booking
- boolean modify(String rsl)
Modify a reservation request before the request is submitted
- String getStatus()
Get reservation status
------------------------
6.6 Queuing Service APIs
------------------------
- QueueDataType getQueueData()
Get queue information
- QueueConfigParamsType getQueueConfigParams()
Get queue configuration
- String submit(QueuingRequestType submitRequest)
Submit a job to current queue
- boolean remove(QueuingRequestType removeRequest)
Remove a job from current queue
=========================
7. Support for Multiple GT3.0 Hosting Environments
=========================
All CSF services can work across multiple GT3.0 hosting environments.
You can start a job created in one GT3.0 environment in another
GT3.0 environment by using the corresponding Resource Manager Factory
URL or GRAM URL.
For example, assume GT3.0 is installed independently on both hostA
and hostB.
------------------------
7.1 Starting a job in an LSF cluster through Resource Manager Factory
service
------------------------
o Create a job in the GT3.0 environment on hostA:
% csf-job-create rsl job.xml job1
o Start the job in an LSF cluster through Resource Manager Factory
service:
% csf-job-start job1 http://hostB:8080/ogsa/services/metascheduler/ResourceManagerFactoryService clustername
o Start the job using GramFork on hostB:
% csf-job-start job1 http://hostB:8080/ogsa/services/base/gram/MasterForkManagedJobFactoryService Fork
o Start the job using GRAM on hostB:
  In the job.xml file, modify the clusterName section for the job like:
<metascheduler:clusterName>
<rsl:string> <rsl:stringElement value="http://hostB:8080/ogsa/services/metascheduler/ResourceManagerFactoryService;clustername"/></rsl:string>
</metascheduler:clusterName>
By default, the queuing service's round-robin scheduling does not
consider Resource Manager Factory services or GRAMs hosted in other
environments. To enable this, you must make the following change:
Modify etc/index-service-config.xml in the GT3.0 environment that
hosts the queuing service:
- Add the following in section installedProviders:
<providerEntry class="com.platform.metascheduler.providers.IndexServiceMonitor" />
- Add the following in section executedProviders:
<provider-exec:ServiceDataProviderExecution>
<provider-exec:serviceDataProviderName>IndexServiceMonitor</provider-exec:serviceDataProviderName>
<provider-exec:serviceDataProviderImpl>com.platform.metascheduler.providers.IndexServiceMonitor</provider-exec:serviceDataProviderImpl>
<provider-exec:serviceDataProviderArgs> http://HOSTB:PORT/ogsa/services/base/index/IndexService </provider-exec:serviceDataProviderArgs>
<provider-exec:serviceDataName>service_monitor2</provider-exec:serviceDataName>
<provider-exec:refreshFrequency>60</provider-exec:refreshFrequency>
<provider-exec:async>false</provider-exec:async>
</provider-exec:ServiceDataProviderExecution>
  HOSTB:PORT is the host and port number of the second GT3.0
  environment.
- Restart GT3.0
You should then get the Index Service information of hostB from the
first GT3.0 environment:
% java org.globus.ogsa.client.FindServiceDataByXPath "" "*" http://host:8080/ogsa/services/base/index/IndexService "//gis:RemoteIndexServiceAvailability" "xmlns:gis=http://www.platform.com/namespaces/2003/05/metascheduler/GIS"
- Logging is working
- Date: Thu Jul 24 13:41:39 EDT 2003
- Version: Apache-XML-Security-J 1.0.4
<ns2:RemoteIndexServiceAvailability xmlns:ns2="http://www.platform.com/namespaces/2003/05/metascheduler/GIS"><ns2:RemoteIndexService Status="down" URL="http://HOSTB:PORT/ogsa/services/base/index/IndexService"/></ns2:RemoteIndexServiceAvailability>
Status "down" or "up" tells whether GT3 on HOSTB is accessible or
not.
------------------------
7.2 Submitting multiple jobs without specifying clusterName
------------------------
Run the following commands:
% csf-job-create rsl job.xml job1
% csf-job-submit job1 normal
% csf-job-create rsl job.xml job2
% csf-job-submit job2 normal
% csf-job-create rsl job.xml job3
% csf-job-submit job3 normal
The jobs should be started in different resource managers or GRAMs.
=========================
8. Uninstallation
=========================
To undeploy and uninstall the CSF package:
% cd $GLOBUS_LOCATION
% ant undeploy -Dgar.id=metascheduler
% $GPT_LOCATION/sbin/gpt-uninstall [-force] csf
=========================
9. Common Problems and Known Issues
=========================
The following are several common problems during installation and testing.
For known problems with GT3.0, see:
http://www-fp.globus.org/about/faq/errors.html
1. The GT3.0 container cannot be started because the port is in use.
   GT3.0 uses port 8080 by default. You can change the port through
   the service.port parameter in the file
   $GLOBUS_LOCATION/ogsa.properties, as shown below.
   If you shut down your container by using Ctrl-C, the container
   might still be running. You need to kill all Java processes and
   then restart the container.
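   For example, to move the container to port 9080 (any free port;
   9080 is an illustrative value), set the following in
   $GLOBUS_LOCATION/ogsa.properties:
       service.port=9080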
2. A Java client always fails due to "Defective credential" or
   "Credential expired".
   Run grid-proxy-init again, and then rerun the client.
3. The client receives an error message like "Failed to establish
   security context".
   a. Shut down your GT3.0 environment, and make sure you kill all
      Java processes.
   b. Remove the directory $HOME/.globus/uhe-*/.
      Make sure you do NOT delete $HOME/.globus itself.
   c. Remove the file $GLOBUS_LOCATION/jobMapping*
   d. Restart the GT3.0 container.
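   For example, steps b and c in an sh-family shell (double-check the
   paths before deleting anything):
       % rm -rf $HOME/.globus/uhe-*
       % rm -f $GLOBUS_LOCATION/jobMapping*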
4. How do I support running more than 200 jobs concurrently?
   With the default configuration, CSF supports 50 concurrently
   running jobs.
   If you need to test more than 200 jobs, you must modify the
   "startContainer" section of your configuration file
   $GLOBUS_LOCATION/build-services.xml: add "-Xmx512m" to the value
   of the "jvmarg" parameter. For example:
   <jvmarg value="-Xmx512m -Djava.endorsed.dirs=endorsed"/>
   Platform is currently working on this problem.
5. You receive the error message
   "faultString: java.net.SocketTimeoutException: Read timed out"
   during container startup or even at run time.
   By default, the timeout of a Grid service invocation is 60 seconds.
   Globus has some suggestions at
   http://www-unix.globus.org/toolkit/faq.html#105, but in our
   experience these do not always work. First, reduce the load on the
   host that is running the GT3.0 container. Second, increase the
   number of container threads defined in
   $GLOBUS_LOCATION/server-config.wsdd from
<parameter name="containerThreads" value="5"/>
to
<parameter name="containerThreads" value="50"/>
6. A GRAM job does not work due to permissions or invalid certificates.
   a. Make sure $GLOBUS_LOCATION/bin/setperm.sh has been executed as root.
   b. Check that your name is included in /etc/grid-security/grid-mapfile
      and grim-port-type.xml.
   c. Make sure the host certificate file hostcert.pem exists under
      the directory /etc/grid-security.
   d. Run grid-proxy-info to verify that your certificate is working.
=========================
10. Contact Information
=========================
Please send all questions and comments to support@platform.com,
or phone toll-free 1-877-444-4LSF (+1 877 444 4573).
=============================
11. Copyright
=============================
Copyright 1994-2003 Platform Computing Corporation.
All rights reserved.
Although the information in this document has been carefully
reviewed, Platform Computing Corporation ("Platform") does not
warrant it to be free of errors or omissions. Platform reserves the
right to make corrections, updates, revisions or changes to the
information in this document.
UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED
IN THIS DOCUMENT IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY
KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO
ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL
DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA OR
SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.
LSF is a registered trademark of Platform Computing Corporation in
the United States and in other jurisdictions.
ACCELERATING INTELLIGENCE, THE BOTTOM LINE IN DISTRIBUTED COMPUTING,
PLATFORM COMPUTING, and the PLATFORM and LSF logos are trademarks of
Platform Computing Corporation in the United States and in other jurisdictions.
UNIX is a registered trademark of The Open Group.
Other products or services mentioned in this document are the
trademarks of their respective owners.
==============================
END of README
Last update: Tuesday December 09 2003
==============================