From: William L. <ww...@do...> - 2006-04-06 15:19:47
Thanks.

The classad.groovy copy I put up is for a later version that is not yet bundled with OMII. I have an updated copy for OMII 2.3.3 at the same location. Please try that instead.

William

On 6 Apr 2006, at 15:06, OMII Support wrote:

> When replying, type your text above this line.
>
> Notification of Query Change
>
> Priority: Normal    Status: Agent Replied
> Creation Date: 03/04/2006    Creation Time: 13:30:47
> Created By: ge...@ni...
>
> Description:
>
> Entered on 06/04/2006 at 15:06:02 by William Lee (GridSAM):
> May I ask which version of GridSAM you are using? If it's from the
> OMII bundle, which OMII version?
>
> William
>
> On 6 Apr 2006, at 14:58, G.T. Chiang wrote:
>
> > [Duplicate message snipped]
>
> Entered on 06/04/2006 at 15:00:02 by gt...@ca...:
> Dear William
>
> Thank you very much!! This version is getting better. The following
> are the gridsam-status results. At least the job is now being
> processed via Condor, but somehow it fails.
>
> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a6f50f8010a6f796cea0021
> Job Progress: pending -> staging-in -> staged-in -> active -> failed
>
> --- pending - 2006-04-06 14:52:09.0 ---
> job is being scheduled
> --- staging-in - 2006-04-06 14:52:09.0 ---
> staging files...
> --- staged-in - 2006-04-06 14:52:09.0 ---
> 2 files staged in
> --- active - 2006-04-06 14:52:09.0 ---
> job is being launched through condor
> --- failed - 2006-04-06 14:52:09.0 ---
> expecting job property urn:condor:classad from previous stage
>
> --------------
> Job Properties
> --------------
> urn:gridsam:Description=cat job description
> urn:gridsam:JobProject=gridsam project
> urn:gridsam:JobAnnotation=no annotation
> urn:gridsam:JobName=cat job
> [root@agorilla examples]#
>
> The following is the log from gridsam.log:
>
> 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] state {pending} reached
> 2006-04-06 14:52:09,574 INFO [006868a90a6f50f8010a6f796cea0021] initialised working directory: /tmp/gridsam-006868a90a6f50f8010a6f796cea0021
> 2006-04-06 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state {staging-in} reached
> 2006-04-06 14:52:09,662 INFO [006868a90a6f50f8010a6f796cea0021] staging (copy) file http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt
> 2006-04-06 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/file1.txt staged
> 2006-04-06 14:52:09,791 INFO [006868a90a6f50f8010a6f796cea0021] staging (copy) file ftp://anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt -> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt
> 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] dir2/subdir1/file2.txt staged
> 2006-04-06 14:52:09,843 INFO [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached
> 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] executing groovy script classad.groovy
> 2006-04-06 14:52:09,871 INFO [006868a90a6f50f8010a6f796cea0021] executed groovy script classad.groovy
> 2006-04-06 14:52:09,898 INFO [006868a90a6f50f8010a6f796cea0021] state {active} reached
> 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021] Failed to submit condor job: expecting job property urn:condor:classad from previous stage
> 2006-04-06 14:52:09,903 INFO [006868a90a6f50f8010a6f796cea0021] state {failed} reached
> 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] failed 1144331529450 1144331529607 1144331529843 1144331529898 1144331529903
>
> Is it possible to obtain the Condor job description file that was
> converted from the GridSAM JSDL? Can I try to submit it to Condor
> directly?
>
> Best Regard!
> gen-tao
>
> On Apr 6 2006, William Lee wrote:
>
> > [Duplicate message snipped]
>
> Entered on 06/04/2006 at 13:57:02 by William Lee (GridSAM):
> Please try the classad.groovy script at this location. It
> incorporates a solution that sets up the transfer_input_files and the
> transfer_output_files classad attributes in the JSDL-to-Classad
> translation. This is needed if the submission node (the node on which
> GridSAM is running) does not necessarily share a common file system
> with the execution nodes.
>
> http://www.doc.ic.ac.uk/~wwhl/classad.groovy
>
> William
>
> On 4 Apr 2006, at 15:15, OMII Support wrote:
>
> > [Duplicate message snipped]
>
> Entered on 04/04/2006 at 15:15:01 by gt...@ca...:
> Dear William
>
> Thank you so much!! I modified classad.groovy by adding your code.
> Now the problem is that the job state becomes undefined and the job
> cannot be submitted to Condor.
>
> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a64591e010a653a7f1a0013
> Job Progress: pending -> staging-in -> staged-in -> undefined
>
> --- pending - 2006-04-04 15:07:13.0 ---
> job is being scheduled
> --- staging-in - 2006-04-04 15:07:13.0 ---
> staging files...
> --- staged-in - 2006-04-04 15:07:13.0 ---
> 2 files staged in
> --- undefined - 2006-04-04 15:07:13.0 ---
> cannot advance from 'staged-in' to 'done'
>
> --------------
> Job Properties
> --------------
> urn:condor:purestaging=true
> [root@agorilla examples]#
>
> Thank you for any suggestions!!
>
> Best Regard!
> gen-tao
>
> On Apr 4 2006, OMII Support wrote:
>
> > [Duplicate message snipped]
>
> Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM):
> Hi Gen Tao,
>
> You are right: given your Condor setup, you would have to modify the
> classad.groovy script to enable the transfer_input_files and
> transfer_output_files classad attributes. This only applies to Condor
> setups that do not share a common networked file system.
>
> The code to add to classad.groovy is
>
> jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/jsdl:DataStaging", ns).eachWithIndex(){
>     node, index ->
>     if(index == 0){
>         script += "transfer_input_files="
>     }
>     if(!node.select("jsdl:Source", ns).isEmpty()){
>         fileName = node.select("jsdl:FileName")[0].text;
>         script += "${fileName} ,"
>     }
> }
>
> I haven't been able to test the code above. Feel free to make any
> modifications as you see fit.
>
> William
>
> On 3 Apr 2006, at 20:09, OMII Support wrote:
>
> > [Duplicate message snipped]
>
> Entered on 03/04/2006 at 20:09:02 by gt...@ca...:
> Dear Sir
>
> The results are as follows:
>
> [condor@badger1--niees--group jobs]$ less stderr.txt
> condor_exec.exe: dir1/file1.txt: No such file or directory
> condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory
> [condor@badger1--niees--group jobs]$
>
> Those files had been staged to the central manager, but not to the
> executing node. Sorry, our central manager is not configured to run
> jobs, so it submits jobs to other machines. That is why the job
> cannot find the related files when it runs on the executing node. Is
> this normal? Should the central manager copy those files to the other
> worker nodes as well? Should I change something in classad.groovy?
>
> Thank you very much!!
>
> gen-tao
>
> On Apr 3 2006, OMII Support wrote:
>
> > [Duplicate message snipped]
>
> Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM):
> It's not apparent where the problem lies. Condor has reported to
> GridSAM that the job has completed successfully with exit code 1,
> hence the description shown in the EXECUTED state.
>
> Can you try running a Condor job with the following classad directly?
>
> universe=vanilla
> when_to_transfer_output=ON_EXIT
> should_transfer_files=IF_NEEDED
> notification=Never
> log=/tmp/condor.log
> executable=/bin/cat
> arguments=dir1/file1.txt dir2/subdir1/file2.txt
> output=stdout.txt
> error=stderr.txt
>
> queue
>
> Entered on 03/04/2006 at 13:30:47 by ge...@ni...:
> Dear Sir
>
> I am trying to run some GridSAM test programs. However, it seems the
> jobs cannot be executed in our Condor pool. The Condor pool is
> working. The job can be submitted to the condor_submitter and runs on
> a Condor node, but then fails.
>
> The following is some information.
>
> This is the modified cat-staging.jsdl:
>
> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>   <JobDescription>
>     <JobIdentification>
>       <JobName>cat job</JobName>
>       <Description>cat job description</Description>
>       <JobAnnotation>no annotation</JobAnnotation>
>       <JobProject>gridsam project</JobProject>
>     </JobIdentification>
>     <Application>
>       <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
>         <Executable>/bin/cat</Executable>
>         <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument>
>         <Output>stdout.txt</Output>
>         <Error>stderr.txt</Error>
>       </POSIXApplication>
>     </Application>
>     <DataStaging>
>       <FileName>dir1/file1.txt</FileName>
>       <CreationFlag>overwrite</CreationFlag>
>       <Source>
>         <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI>
>       </Source>
>     </DataStaging>
>     <DataStaging>
>       <FileName>dir2/subdir1/file2.txt</FileName>
>       <CreationFlag>overwrite</CreationFlag>
>       <Source>
>         <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input-file.txt</URI>
>       </Source>
>     </DataStaging>
>     <DataStaging>
>       <FileName>stdout.txt</FileName>
>       <CreationFlag>overwrite</CreationFlag>
>       <DeleteOnTermination>true</DeleteOnTermination>
>       <Target>
>         <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/stdout.txt</URI>
>       </Target>
>     </DataStaging>
>   </JobDescription>
> </JobDefinition>
>
> After submitting this file:
>
> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a4d221e010a5fb493650117
> Job Progress: pending -> staging-in -> staged-in -> active -> executed -> staging-out -> staged-out -> done
>
> --- pending - 2006-04-03 13:22:50.0 ---
> job is being scheduled
> --- staging-in - 2006-04-03 13:22:50.0 ---
> staging files...
> --- staged-in - 2006-04-03 13:22:59.0 ---
> 2 files staged in
> --- active - 2006-04-03 13:22:59.0 ---
> job is being launched through condor
> --- executed - 2006-04-03 13:23:04.0 ---
> 04/03 13:23:52 Job terminated. (1) Normal termination (return value 1) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage 126 - Run Bytes Sent By Job 15992 - Run Bytes Received By Job 126 - Total Bytes Sent By Job 15992 - Total Bytes Received By Job
> --- staging-out - 2006-04-03 13:23:04.0 ---
> staging files out...
> --- staged-out - 2006-04-03 13:23:04.0 ---
> 1 files staged out
> --- done - 2006-04-03 13:23:04.0 ---
> Job completed
>
> --------------
> Job Properties
> --------------
> urn:gridsam:Description=cat job description
> urn:gridsam:JobProject=gridsam project
> urn:gridsam:JobAnnotation=no annotation
> urn:gridsam:JobName=cat job
> urn:condor:classad=universe=vanilla
> when_to_transfer_output=ON_EXIT
> should_transfer_files=IF_NEEDED
> notification=Never
> log=/tmp/condor.log
>
> executable=/bin/cat
> arguments=dir1/file1.txt dir2/subdir1/file2.txt
> output=stdout.txt
>
> error=stderr.txt
>
> queue
> urn:condor:clusterid=191
> urn:gridsam:exitcode=1
> [root@agorilla examples]#
>
> If I go to the executing node, the log shows the following:
>
> 4/3 13:23:47 DaemonCore: Command received via UDP from host <172.24.89.61:9632>
> 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
> 4/3 13:23:47 vm1: match_info called
> 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674
> 4/3 13:23:47 vm1: State change: match notification protocol successful
> 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched
> 4/3 13:23:47 DaemonCore: Command received via TCP from host <172.24.89.61:9693>
> 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
> 4/3 13:23:47 vm1: Request accepted.
> 4/3 13:23:47 vm1: Remote owner is gri...@ce...
> 4/3 13:23:47 vm1: State change: claiming protocol successful
> 4/3 13:23:47 vm1: Changing state: Matched -> Claimed
> 4/3 13:23:50 DaemonCore: Command received via TCP from host <172.24.89.61:9669>
> 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
> 4/3 13:23:50 vm1: Got activate_claim request from shadow (<172.24.89.61:9669>)
> 4/3 13:23:50 vm1: Remote job ID is 191.0
> 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad
> 4/3 13:23:50 vm1: State change: claim-activation protocol successful
> 4/3 13:23:50 vm1: Changing activity: Idle -> Busy
> 4/3 13:23:51 DaemonCore: Command received via TCP from host <172.24.89.61:9652>
> 4/3 13:23:51 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
> 4/3 13:23:51 vm1: Called deactivate_claim_forcibly()
> 4/3 13:23:51 Starter pid 31148 exited with status 0
> 4/3 13:23:51 vm1: State change: starter exited
> 4/3 13:23:51 vm1: Changing activity: Busy -> Idle
> 4/3 13:23:52 DaemonCore: Command received via UDP from host <172.24.89.61:9620>
> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
> 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command
> 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
> 4/3 13:23:52 vm1: State change: No preempting claim, returning to owner
> 4/3 13:23:52 vm1: Changing state and activity: Preempting/Vacating -> Owner/Idle
> 4/3 13:23:52 vm1: State change: IS_OWNER is false
> 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed
> 4/3 13:23:52 DaemonCore: Command received via UDP from host <172.24.89.61:9675>
> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
> 4/3 13:23:52 Error: can't find resource with capability (<172.24.89.1:9666>#7928521674)
>
> It seems the job can be submitted to the Condor central manager and
> executed on a node, but is then terminated for an unknown reason.
> Jobs run fine either from condor_submit or from Globus. I am confused
> whether this is due to our Condor settings, GridSAM, or the OMII
> level.
>
> BTW, the file staging seems OK. At /tmp/gridsam....../dir/ , the
> virtual files are there.
>
> Sorry, I am not sure whether I should ask GridSAM questions here. I
> am not even sure whether this is a GridSAM, OMII, or Condor problem
> because, when I run the PBAC test, it fails again. It was working
> just after I reinstalled the OMII server.
>
> Thank you so much for any assistance.
>
> Best Regard!
> gen-tao
>
> Current Assignees: Steve McGough (GridSAM), William Lee (GridSAM),
> Steven Newhouse
>
> CC(s):
>
> Contact Information:
>
> Customer Name: Gen-Tao Chiang    Email address: ge...@ni...
> Organisation: NIEeS    Secondary email address: gt...@ca...

--- William Lee - Software Coordinator ---
--- London e-Science Centre, Imperial College London ---
A: Room 211a, London e-Science Centre, William Penney Laboratory, Imperial College London, South Kensington, London, SW7 2AZ, UK
E: wwhl at doc.ic.ac.uk | william at imageunion.com
W: www.lesc.ic.ac.uk | www.imageunion.com
P: +44 (0) 207 594 8185
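
For readers following the thread, the derivation of the transfer_input_files and transfer_output_files attributes from the JSDL DataStaging entries can be sketched outside GridSAM. The following is a minimal standalone Python sketch, not the contents of classad.groovy and not GridSAM's API: the function name transfer_attributes and the sample document are hypothetical, and it assumes that a DataStaging entry with a Source is a stage-in file and one with a Target is a stage-out file. Joining the names with commas avoids a trailing separator, and each attribute is emitted only when at least one file qualifies.

```python
import xml.etree.ElementTree as ET

# JSDL namespace as used in the cat-staging.jsdl quoted in this thread.
JSDL_NS = {"jsdl": "http://schemas.ggf.org/jsdl/2005/06/jsdl"}


def transfer_attributes(jsdl_text):
    """Return Condor submit lines listing files to transfer in and out.

    Hypothetical helper for illustration only; it is not part of GridSAM.
    """
    root = ET.fromstring(jsdl_text)  # root element is jsdl:JobDefinition
    stage_in, stage_out = [], []
    for staging in root.findall("jsdl:JobDescription/jsdl:DataStaging", JSDL_NS):
        name = staging.findtext("jsdl:FileName", default="", namespaces=JSDL_NS).strip()
        if not name:
            continue  # skip entries without a file name
        if staging.find("jsdl:Source", JSDL_NS) is not None:
            stage_in.append(name)   # assumed: Source means stage-in
        if staging.find("jsdl:Target", JSDL_NS) is not None:
            stage_out.append(name)  # assumed: Target means stage-out
    lines = []
    if stage_in:
        lines.append("transfer_input_files=" + ",".join(stage_in))
    if stage_out:
        lines.append("transfer_output_files=" + ",".join(stage_out))
    return lines


# Reduced sample modelled on the JSDL in the thread (URIs shortened to placeholders).
SAMPLE_JSDL = """
<JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
  <JobDescription>
    <DataStaging>
      <FileName>dir1/file1.txt</FileName>
      <Source><URI>http://example.org/helloworld.txt</URI></Source>
    </DataStaging>
    <DataStaging>
      <FileName>dir2/subdir1/file2.txt</FileName>
      <Source><URI>ftp://example.org/input-file.txt</URI></Source>
    </DataStaging>
    <DataStaging>
      <FileName>stdout.txt</FileName>
      <Target><URI>ftp://example.org/output/stdout.txt</URI></Target>
    </DataStaging>
  </JobDescription>
</JobDefinition>
"""

if __name__ == "__main__":
    for line in transfer_attributes(SAMPLE_JSDL):
        print(line)
```

The resulting lines would still need to be merged into the submit description before the queue statement, and whether the relative directory names survive the transfer on the execution node depends on the Condor version and configuration, so treat the output as a starting point to check against your pool.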