From: Mark HB <m.h...@ma...> - 2007-04-17 08:42:07
Dear Vesselin,

Apologies if you have already replied to me (I have been on holiday and was ruthless in deleting my spammed mail). I was wondering if you had a chance to look into the problem I am having with running multiple gridsam instances (see below).

Cheers
Mark

###########################

Dear Vesselin,

Thanks for your reply. After looking into this further, I think we have boiled it down to the exact problem. I am now certain that it is not a certificate problem, as I successfully ran a job on one of the grid nodes using my set-up on Friday. However, the error below is still a problem.

I will elaborate on my setup. I am running four instances of grid-sam. One points to the local system, and the other three point to respective NGS nodes (man, leeds & oesc). When I restart the OMII server, I am then able to submit a job to ONE of the NGS nodes successfully; however, the other two respond with the same error as before!

"Description: cannot initialise working directory: Could not connect to FTP server on"gsiftp://grid-data.man.ac.uk/ - User globus credential is required but not specified in the context".

It CANNOT be credentials, as I have just successfully run a job on the leeds node, but the exact same job fails on the man and oesc nodes.

I set up the multiple instances exactly as instructed on the gridsam site (moving jar files etc).

Have you any ideas?

Regards
Mark

Vesselin Novov wrote:
> Mark,
>
> I should have pointed out, I have never used the AHE in OMII.
> All my tests use directly installed/managed GridSAM instance.
> It's worth checking with the AHE developers what exactly goes on
> when AHE instantiates/manages a GridSAM instance with regard to any
> security credentials.
>
> -Vesso
>
> Mark HB wrote:
>
>> Vesselin,
>>
>> Yes, sorry about that. The error is exactly the same no matter which
>> node I send the job to, and hence I just grabbed the first log file to hand.
>>
>> Cheers
>> Mark
>>
>> Novov wrote:
>>
>>> Mark,
>>>
>>> I recently had this exact error, but, in my case missing proxy
>>> credentials were definitely
>>> the cause of it.
>>>
>>> I am also a bit confused:
>>>
>>> - the pasted "GridSAM state is . . ." below indicates the job was
>>> attempted on the 27th March
>>> and it was Not submitted because no connection was established with
>>> grid-compute.leeds.ac.uk.
>>> - the gram_job_mgr_9570.log entries are from 13th March and indicate
>>> that job Was at least submitted.
>>> - the catalina.out section below also indicates a job Was submitted, the
>>> staging-in of input files was successful
>>> but failed at staging-out(after execution) phase and the machine is
>>> grid-data.man.ac.uk.
>>>
>>> regards
>>> Vesso
>>>
>>> Mark HB wrote:
>>>
>>>> Dear GRIDSam list,
>>>>
>>>> I am attempting to run an application on the NGS using AHE/GRIDSam
>>>> bundled with the OMII-stack. I get the following error when trying to
>>>> run the job:
>>>>
>>>> GridSAM state is: failed
>>>> Time: 2007-03-22T08:23:48.172Z
>>>> Description: cannot initialise working directory: Could not connect
>>>> to FTP server on"gsiftp://grid-compute.leeds.ac.uk/ - User globus
>>>> credential is required but not specified in the context".
>>>>
>>>> I can assure you that I have both a GRIDSam proxy certificate and proxy
>>>> user certificate running.
>>>>
>>>> The GRAM log file found on the NGS machine can be found here:
>>>> http://igrid-ext.cryst.bbk.ac.uk/gram_job_mgr_9570.log
>>>>
>>>> The main point in this log is "GRAM_SCRIPT_ERROR = 26"
>>>>
>>>> Below you will find the output produced in the catalina.log file.
>>>>
>>>> Has anyone come across this error before? I would appreciate any comments?
>>>>
>>>> Cheers
>>>> Mark
>>>>
>>>> - principal obtained from Axis transport -
>>>> EMA...@ma...,
>>>> CN=igrid.cryst.bbk.ac.uk, L=EISD, OU=UCL, O=eScience, C=UK
>>>> - principal obtained from Axis transport -
>>>> EMA...@ma...,
>>>> CN=igrid.cryst.bbk.ac.uk, L=EISD, OU=UCL, O=eScience, C=UK
>>>> - state {pending} reached
>>>> WseSourceProcessor: No destinations to route the event
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fec threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7feb threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> - principal obtained from Axis transport -
>>>> EMA...@ma...,
>>>> CN=igrid.cryst.bbk.ac.uk, L=EISD, OU=UCL, O=eScience, C=UK
>>>> - principal obtained from Axis transport -
>>>> EMA...@ma...,
>>>> CN=igrid.cryst.bbk.ac.uk, L=EISD, OU=UCL, O=eScience, C=UK
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fea threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> - state {staging-in} reached
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Proxy Credentials received :
>>>> /C=UK/O=eScience/OU=Imperial/L=LeSC/CN=mark halling-brown
>>>> - basic authentication scheme selected
>>>> - staging (copy) file
>>>> http://test:to...@ig...:18080/filestage/551218202107278041688/datPlain12
>>>>
>>>> -> gsiftp://grid-data.man.ac.uk/551218202107278041688/datPlain12
>>>> - basic authentication scheme selected
>>>> - total byte write 2508
>>>> - datPlain12 staged
>>>> - state {staged-in} reached
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fe9 threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> - executing groovy script
>>>> org/icenigrid/gridsam/core/plugin/connector/globus/rsl.groovy
>>>> - executed groovy script
>>>> org/icenigrid/gridsam/core/plugin/connector/globus/rsl.groovy
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fe8 threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> - RSL &( arguments = "-f" "datPlain12" )( directory =
>>>> "551218202107278041688" )( executable = "/home/ngs0386/bin/cimmsim" )(
>>>> stderr = "stderr.txt" )( stdout = "stdout.txt" )( count = "1" )( jobType
>>>> = "mpi" )
>>>> - Submitting globus job to grid-data.man.ac.uk/jobmanager-pbs
>>>> - Globus job submitted https://grid-data.man.ac.uk:64304/25112/1174414409/
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fe7 threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> WseSourceProcessor: No destinations to route the event
>>>> - globus job is active - https://grid-data.man.ac.uk:64304/25112/1174414409/
>>>> - state {active} reached
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fe6 threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> - state {staging-out} reached
>>>> WseSourceProcessor: No destinations to route the event
>>>> - staging file
>>>> gsiftp://grid-data.man.ac.uk/551218202107278041688/stdout.txt ->
>>>> webdav://test:to...@ig...:18080/filestage/551218202107278041688/stdout.txt
>>>> [Fatal Error] :-1:-1: Premature end of file.
>>>> - total byte write 4353
>>>> [Fatal Error] :-1:-1: Premature end of file.
>>>> - stdout.txt staged
>>>> - staging file
>>>> gsiftp://grid-data.man.ac.uk/551218202107278041688/stderr.txt ->
>>>> webdav://test:to...@ig...:18080/filestage/551218202107278041688/stderr.txt
>>>> [Fatal Error] :-1:-1: Premature end of file.
>>>> - total byte write 194
>>>> [Fatal Error] :-1:-1: Premature end of file.
>>>> - stderr.txt staged
>>>> - staging file
>>>> gsiftp://grid-data.man.ac.uk/551218202107278041688/_th-details.out ->
>>>> webdav://test:to...@ig...:18080/filestage/551218202107278041688/_th-details.out
>>>> [Fatal Error] :-1:-1: Premature end of file.
>>>> - total byte write 32075
>>>> [Fatal Error] :-1:-1: Premature end of file.
>>>> - _th-details.out staged
>>>> - state {staged-out} reached
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fe5 threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>> - state {done} reached
>>>> - done 1174414390889 1174414391097 1174414406236 1174414430046
>>>> 1174414449984 1174414470047 1174414482532 1174414482572
>>>> WseSourceProcessor: No destinations to route the event
>>>> - Job ff80808211705e810111708d76680005.-6a0625b2:111705f74ad:-7fe4 threw
>>>> a JobExecutionException:
>>>> org.quartz.JobExecutionException
>>>> at
>>>> org.icenigrid.gridsam.core.plugin.manager.DefaultJobManagerContext$StageTask.execute(DefaultJobManagerContext.java:525)
>>>> at org.quartz.core.JobRunShell.run(JobRunShell.java:191)
>>>> at
>>>> org.quartz.simpl.SimpleThreadPool$WorkerThread.run(SimpleThreadPool.java:516)
>>>>
>>>
>>
>

--
---------------------------------------------------------------------------
Mark Halling-Brown | Tel: +44-20-7631-6839
Research Associate | Room 359 | Fax: +44-20-7631-6803
School of Crystallography | Birkbeck College | Email:
Malet Street | gh...@ma...
London WC1E 7HX | ma...@gm...
UK | http://people.cryst.bbk.ac.uk/~ghall04
---------------------------------------------------------------------------
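
The quoted messages disagree about when the logged job actually ran (13th, 22nd and 27th March are all mentioned). One way to check is to decode the eight numbers on the "- done ..." line of the quoted catalina output. The following is only a minimal sketch in Python, assuming those values are Unix epoch times in milliseconds, which the thread itself does not state; the names DONE_LINE and decode_millis are made up for illustration and are not part of GridSAM.

```python
from datetime import datetime, timedelta, timezone

# Values copied from the "- done ..." line of the quoted catalina output.
# Assumption: each value is a Unix epoch timestamp in milliseconds.
DONE_LINE = (
    "1174414390889 1174414391097 1174414406236 1174414430046 "
    "1174414449984 1174414470047 1174414482532 1174414482572"
)

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_millis(line):
    """Convert whitespace-separated epoch-millisecond values to UTC datetimes."""
    # Integer milliseconds via timedelta avoid float rounding.
    return [EPOCH + timedelta(milliseconds=int(tok)) for tok in line.split()]


stamps = decode_millis(DONE_LINE)
for stamp in stamps:
    print(stamp.isoformat(timespec="milliseconds"))

# Elapsed time between the first and last recorded value.
print("elapsed:", stamps[-1] - stamps[0])
```

If the assumption holds, the first value decodes to 2007-03-20T18:13:10.889+00:00 and the whole run spans about 92 seconds, which would place this catalina excerpt on a different day from both the GRAM log and the pasted "GridSAM state" message.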