From: G.T. C. <gt...@ca...> - 2006-04-07 13:17:24
Dear William
we did another test!! It seems that running a job and copying the output
back works fine. However, if a job needs stage-in, Condor cannot find where
to read the input data. In the example cat-staging.jsdl, the input files are
given as <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument>; however,
it seems the Condor submitter does not know how to find this virtual directory
(which is physically located under /tmp/gridsam-......./dir) and read the
input files. Is this due to the configuration in classad.groovy as well?
thank you so much!!
gen-tao
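[Editor's note: for reference, a minimal sketch (not GridSAM code) of the JSDL-to-Condor translation being discussed: it walks the DataStaging elements, keeps those with a <Source> (stage-in files), and emits a transfer_input_files line. Element names follow the cat-staging.jsdl example quoted later in this thread; the URIs are placeholders.]

```python
# Sketch only: illustrates how a classad.groovy-style translation could
# turn JSDL <DataStaging> entries with a <Source> into a Condor
# transfer_input_files line. Not GridSAM's actual implementation.
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/06/jsdl"

jsdl = f"""
<JobDefinition xmlns="{JSDL_NS}">
  <JobDescription>
    <DataStaging>
      <FileName>dir1/file1.txt</FileName>
      <Source><URI>http://example.org/helloworld.txt</URI></Source>
    </DataStaging>
    <DataStaging>
      <FileName>dir2/subdir1/file2.txt</FileName>
      <Source><URI>ftp://example.org/input-file.txt</URI></Source>
    </DataStaging>
  </JobDescription>
</JobDefinition>
"""

def transfer_input_files(jsdl_text):
    root = ET.fromstring(jsdl_text)
    ns = {"jsdl": JSDL_NS}
    names = []
    for staging in root.findall("jsdl:JobDescription/jsdl:DataStaging", ns):
        # Only entries with a <Source> are stage-in files; <Target>-only
        # entries describe stage-out and must not be listed here.
        if staging.find("jsdl:Source", ns) is not None:
            names.append(staging.find("jsdl:FileName", ns).text)
    return "transfer_input_files=" + ",".join(names)

print(transfer_input_files(jsdl))
# -> transfer_input_files=dir1/file1.txt,dir2/subdir1/file2.txt
```

[Note that condor_submit transfers each listed file into the top level of the job's scratch directory, so a relative path like dir1/file1.txt may arrive as just file1.txt on the execute node; this could explain why arguments referring to dir1/file1.txt fail there.]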
On Apr 7 2006, William Lee wrote:
>According to the Condor documentation, the transfer_output_files
>attribute is best left unset. That's the reason it's not in
>the classad.groovy script in the first place. If your Condor system
>handles file output staging correctly, the code in the classad.groovy
>file that generates the "transfer_output_files" attribute can be
>commented out.
>
>William
>
>On 7 Apr 2006, at 10:11, G.T. Chiang wrote:
>
>> Dear William
>>
>> thank you very much! This version works! Right now I can
>> see the .condor.script in the condor-submitter. I have tested one
>> of the test jobs; the JSDL is like the following:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>> <JobDescription>
>> <Application>
>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
>> <Executable>/bin/echo</Executable>
>> <Argument>hello hihi</Argument>
>> <Output>stdout.txt</Output>
>> </POSIXApplication>
>> </Application>
>> <DataStaging>
>> <FileName>unknown-file.txt</FileName>
>> <CreationFlag>overwrite</CreationFlag>
>> <DeleteOnTermination>true</DeleteOnTermination>
>> <Target>
>> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/subdir/output-file.txt</URI>
>> </Target>
>> </DataStaging>
>> </JobDescription>
>> </JobDefinition>
>>
>>
>> the .condor.script is like the following:
>> universe=vanilla
>> when_to_transfer_output=ON_EXIT
>> should_transfer_files=IF_NEEDED
>> should_transfer_files=YES
>> notification=Never
>> log=/tmp/condor.log
>>
>>
>> executable=/bin/echo
>> arguments=hello gen-tao
>> output=stdout.txt
>>
>>
>>
>> transfer_output_files=unknown-file.txt
>> queue
>>
>>
>> however, this job keeps sitting in idle status in Condor. However,
>> if I remove the transfer_output_files=unknown-file.txt line, the job
>> can be executed. I have seen a similar problem with GT4 and Condor,
>> so it seems to be a Condor problem. Is it related to the file system?
>> Sorry, I think GridSAM is working now! I just need to figure out
>> what's wrong in our condor pool!!
>>
>> thank you very much!!!
>>
>> Best Regard!
>> gen-tao
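[Editor's note: per William's advice above, one thing worth trying is the same submit file with transfer_output_files left out entirely and the duplicated should_transfer_files line collapsed to one value. A sketch, not verified against this pool:]

```
universe=vanilla
should_transfer_files=YES
when_to_transfer_output=ON_EXIT
# transfer_output_files deliberately omitted: with ON_EXIT, Condor
# transfers back every file the job created or modified in its sandbox
notification=Never
log=/tmp/condor.log
executable=/bin/echo
arguments=hello gen-tao
output=stdout.txt
queue
```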
>>
>>
>>
>>
>>
>>
>>
>> On Apr 6 2006, William Lee wrote:
>>
>>> Thanks. The classad.groovy copy I have put up is for later
>>> version that is not yet bundled with OMII.
>>>
>>> I have an updated copy for OMII 2.3.3 at the same location.
>>> Please try that instead.
>>>
>>> William
>>>
>>>
>>> On 6 Apr 2006, at 15:06, OMII Support wrote:
>>>
>>>> Notification of Query Change
>>>>
>>>>
>>>> Priority: Normal Status: Agent Replied
>>>> Creation Date: 03/04/2006 Creation Time: 13:30:47
>>>> Created By: ge...@ni...
>>>>
>>>>
>>>> Description:
>>>> Entered on 06/04/2006 at 15:06:02 by William Lee (GridSAM):
>>>> May I ask which version of GridSAM you are using? If it's from the
>>>> OMII bundle, which OMII version?
>>>>
>>>> William
>>>>
>>>> On 6 Apr 2006, at 14:58, G.T. Chiang wrote:
>>>>
>>>> > Dear William
>>>> >
>>>> > thank you very much!! This version is getting better; the
>>>> > following is the gridsam-status output. At least the job is being
>>>> > processed via Condor, but somehow it fails.
>>>> >
>>>> > [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a6f50f8010a6f796cea0021
>>>> > Job Progress: pending -> staging-in -> staged-in -> active -> failed
>>>> >
>>>> > --- pending - 2006-04-06 14:52:09.0 ---
>>>> > job is being scheduled
>>>> > --- staging-in - 2006-04-06 14:52:09.0 ---
>>>> > staging files...
>>>> > --- staged-in - 2006-04-06 14:52:09.0 ---
>>>> > 2 files staged in
>>>> > --- active - 2006-04-06 14:52:09.0 ---
>>>> > job is being launched through condor
>>>> > --- failed - 2006-04-06 14:52:09.0 ---
>>>> > expecting job property urn:condor:classad from previous stage
>>>> >
>>>> > --------------
>>>> > Job Properties
>>>> > --------------
>>>> > urn:gridsam:Description=cat job description
>>>> > urn:gridsam:JobProject=gridsam project
>>>> > urn:gridsam:JobAnnotation=no annotation
>>>> > urn:gridsam:JobName=cat job
>>>> > [root@agorilla examples]#
>>>> >
>>>> >
>>>> > the following is the log from gridsam.log
>>>> >
>>>> > 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] state {pending} reached
>>>> > 2006-04-06 14:52:09,574 INFO [006868a90a6f50f8010a6f796cea0021] initialised working directory: /tmp/gridsam-006868a90a6f50f8010a6f796cea0021
>>>> > 2006-04-06 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state {staging-in} reached
>>>> > 2006-04-06 14:52:09,662 INFO [006868a90a6f50f8010a6f796cea0021] staging (copy) file http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp://gri...@ce.../tmp/gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt
>>>> > 2006-04-06 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/file1.txt staged
>>>> > 2006-04-06 14:52:09,791 INFO [006868a90a6f50f8010a6f796cea0021] staging (copy) file ftp://anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt -> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt
>>>> > 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] dir2/subdir1/file2.txt staged
>>>> > 2006-04-06 14:52:09,843 INFO [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached
>>>> > 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] executing groovy script classad.groovy
>>>> > 2006-04-06 14:52:09,871 INFO [006868a90a6f50f8010a6f796cea0021] executed groovy script classad.groovy
>>>> > 2006-04-06 14:52:09,898 INFO [006868a90a6f50f8010a6f796cea0021] state {active} reached
>>>> > 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021] Failed to submit condor job: expecting job property urn:condor:classad from previous stage
>>>> > 2006-04-06 14:52:09,903 INFO [006868a90a6f50f8010a6f796cea0021] state {failed} reached
>>>> > 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] failed 1144331529450 1144331529607 1144331529843 1144331529898 1144331529903
>>>> >
>>>> >
>>>> > Is it possible to obtain the Condor job description file
>>>> > converted from the GridSAM JSDL? Can I try submitting it to
>>>> > Condor directly?
>>>> >
>>>> > Best Regard!
>>>> > gen-tao
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Apr 6 2006, William Lee wrote:
>>>> >
>>>> >>
>>>> >> Please try the classad.groovy script at this location. It
>>>> >> incorporates a solution that sets up the transfer_input_files and
>>>> >> the transfer_output_files classad attributes in the JSDL-to-
>>>> >> Classad translation. This is needed if the submission node (the
>>>> >> node which GridSAM is running) does not necessarily share a
>>>> >> common file system with the execution nodes.
>>>> >>
>>>> >> http://www.doc.ic.ac.uk/~wwhl/classad.groovy
>>>> >>
>>>> >> William
>>>> >>
>>>> >> On 4 Apr 2006, at 15:15, OMII Support wrote:
>>>> >>
>>>> >>> [Duplicate message snipped]
>>>>
>>>> Entered on 06/04/2006 at 15:00:02 by gt...@ca...:
>>>> [Duplicate message snipped]
>>>>
>>>> Entered on 06/04/2006 at 13:57:02 by William Lee (GridSAM):
>>>> [Duplicate message snipped]
>>>>
>>>>
>>>> Entered on 04/04/2006 at 15:15:01 by gt...@ca...:
>>>> Dear William
>>>>
>>>> thank you so much!! I modified classad.groovy by adding your code;
>>>> now the problem becomes 'undefined' and the job cannot be submitted
>>>> to Condor.
>>>> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a64591e010a653a7f1a0013
>>>> Job Progress: pending -> staging-in -> staged-in -> undefined
>>>>
>>>> --- pending - 2006-04-04 15:07:13.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-04 15:07:13.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-04 15:07:13.0 ---
>>>> 2 files staged in
>>>> --- undefined - 2006-04-04 15:07:13.0 ---
>>>> cannot advance from 'staged-in' to 'done'
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:condor:purestaging=true
>>>> [root@agorilla examples]#
>>>>
>>>> thank you for any suggestion!!
>>>>
>>>> Best Regard!
>>>> gen-tao
>>>>
>>>> On Apr 4 2006, OMII Support wrote:
>>>>
>>>> >[Duplicate message snipped]
>>>>
>>>> Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM):
>>>> Hi Gen Tao,
>>>>
>>>> You are right: depending on the Condor setup, you may have to
>>>> modify the classad.groovy script to enable the transfer_input_files
>>>> and transfer_output_files classad attributes. This only applies to
>>>> Condor setups that do not share a common networked file system.
>>>>
>>>> The code to add to the classad.groovy is
>>>>
>>>> jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/
>>>> jsdl:DataStaging", ns).eachWithIndex(){
>>>> node, index ->
>>>> if(index == 0){
>>>> script += "transfer_input_files="
>>>> }
>>>> if(!node.select("jsdl:Source", ns).isEmpty()){
>>>> fileName = node.select("jsdl:FileName", ns)[0].text;
>>>> script += "${fileName},"
>>>> }
>>>> }
>>>>
>>>> I haven't been able to test the code above. Feel free to make any
>>>> modification as you see fit.
>>>>
>>>> William
>>>>
>>>> On 3 Apr 2006, at 20:09, OMII Support wrote:
>>>>
>>>> > [Duplicate message snipped]
>>>>
>>>> Entered on 03/04/2006 at 20:09:02 by gt...@ca...:
>>>> Dear Sir
>>>>
>>>> the results are as follows:
>>>> [condor@badger1--niees--group jobs]$ less stderr.txt
>>>> condor_exec.exe: dir1/file1.txt: No such file or directory
>>>> condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory
>>>> [condor@badger1--niees--group jobs]$
>>>>
>>>> those files had been staged to the central manager, but not to the
>>>> executing node. Sorry, our central manager is not configured to run
>>>> jobs; thus, the central manager will submit jobs to other machines.
>>>> That's why, when I run this Condor job on an executing node, it
>>>> cannot find the related files. Is this normal? Should the central
>>>> manager copy those files to the other worker nodes as well? Should
>>>> I change something in classad.groovy?
>>>>
>>>> thank you very much!!
>>>>
>>>> gen-tao
>>>>
>>>> thank you very much!!
>>>>
>>>> On Apr 3 2006, OMII Support wrote:
>>>>
>>>> >[Duplicate message snipped]
>>>>
>>>> Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM):
>>>> It's not apparent where the problem lies. Condor has reported to
>>>> GridSAM that the job completed normally with exit code 1, hence the
>>>> description shown in the EXECUTED state.
>>>>
>>>> Can you try running a condor job with the following classad
>>>> directly?
>>>>
>>>> universe=vanilla
>>>> when_to_transfer_output=ON_EXIT
>>>> should_transfer_files=IF_NEEDED
>>>> notification=Never
>>>> log=/tmp/condor.log
>>>> executable=/bin/cat
>>>> arguments=dir1/file1.txt dir2/subdir1/file2.txt
>>>> output=stdout.txt
>>>> error=stderr.txt
>>>>
>>>> queue
>>>>
>>>> Entered on 03/04/2006 at 13:30:47 by ge...@ni...:
>>>> Dear Sir
>>>>
>>>> I am trying to run some GridSAM test programs; however, it seems
>>>> the jobs cannot be executed in our Condor pool. The Condor pool
>>>> itself is working: a job can be submitted via the condor submitter
>>>> and runs at a Condor node, but then fails.
>>>>
>>>> Here is some information.
>>>>
>>>> this is the modified cat-staging.jsdl:
>>>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>>>> <JobDescription>
>>>> <JobIdentification>
>>>> <JobName>cat job</JobName>
>>>> <Description>cat job description</Description>
>>>> <JobAnnotation>no annotation</JobAnnotation>
>>>> <JobProject>gridsam project</JobProject>
>>>> </JobIdentification>
>>>> <Application>
>>>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
>>>> <Executable>/bin/cat</Executable>
>>>> <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument>
>>>> <Output>stdout.txt</Output>
>>>> <Error>stderr.txt</Error>
>>>> </POSIXApplication>
>>>> </Application>
>>>> <DataStaging>
>>>> <FileName>dir1/file1.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <Source>
>>>> <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI>
>>>> </Source>
>>>> </DataStaging>
>>>> <DataStaging>
>>>> <FileName>dir2/subdir1/file2.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <Source>
>>>> <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input-file.txt</URI>
>>>> </Source>
>>>> </DataStaging>
>>>> <DataStaging>
>>>> <FileName>stdout.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <DeleteOnTermination>true</DeleteOnTermination>
>>>> <Target>
>>>> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/stdout.txt</URI>
>>>> </Target>
>>>> </DataStaging>
>>>> </JobDescription>
>>>> </JobDefinition>
>>>>
>>>> after submitting this file:
>>>>
>>>> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a4d221e010a5fb493650117
>>>> Job Progress: pending -> staging-in -> staged-in -> active -> executed -> staging-out -> staged-out -> done
>>>>
>>>> --- pending - 2006-04-03 13:22:50.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-03 13:22:50.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-03 13:22:59.0 ---
>>>> 2 files staged in
>>>> --- active - 2006-04-03 13:22:59.0 ---
>>>> job is being launched through condor
>>>> --- executed - 2006-04-03 13:23:04.0 ---
>>>> 04/03 13:23:52 Job terminated. (1) Normal termination (return value 1)
>>>> Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
>>>> Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
>>>> Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
>>>> Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
>>>> 126 - Run Bytes Sent By Job
>>>> 15992 - Run Bytes Received By Job
>>>> 126 - Total Bytes Sent By Job
>>>> 15992 - Total Bytes Received By Job
>>>> --- staging-out - 2006-04-03 13:23:04.0 ---
>>>> staging files out...
>>>> --- staged-out - 2006-04-03 13:23:04.0 ---
>>>> 1 files staged out
>>>> --- done - 2006-04-03 13:23:04.0 ---
>>>> Job completed
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:gridsam:Description=cat job description
>>>> urn:gridsam:JobProject=gridsam project
>>>> urn:gridsam:JobAnnotation=no annotation
>>>> urn:gridsam:JobName=cat job
>>>> urn:condor:classad=universe=vanilla
>>>> when_to_transfer_output=ON_EXIT
>>>> should_transfer_files=IF_NEEDED
>>>> notification=Never
>>>> log=/tmp/condor.log
>>>>
>>>> executable=/bin/cat
>>>> arguments=dir1/file1.txt dir2/subdir1/file2.txt
>>>> output=stdout.txt
>>>>
>>>> error=stderr.txt
>>>>
>>>> queue
>>>> urn:condor:clusterid=191
>>>> urn:gridsam:exitcode=1
>>>> [root@agorilla examples]#
>>>>
>>>> If I go to the executing node, the log indicates the following:
>>>> 4/3 13:23:47 DaemonCore: Command received via UDP from host
>>>> <172.24.89.61:9632>
>>>> 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO),
>>>> calling handler (command_match_info)
>>>> 4/3 13:23:47 vm1: match_info called
>>>> 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674
>>>> 4/3 13:23:47 vm1: State change: match notification protocol
>>>> successful
>>>> 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched
>>>> 4/3 13:23:47 DaemonCore: Command received via TCP from host
>>>> <172.24.89.61:9693>
>>>> 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM),
>>>> calling handler (command_request_claim)
>>>> 4/3 13:23:47 vm1: Request accepted.
>>>> 4/3 13:23:47 vm1: Remote owner is
>>>> gri...@ce...
>>>> 4/3 13:23:47 vm1: State change: claiming protocol successful
>>>> 4/3 13:23:47 vm1: Changing state: Matched -> Claimed
>>>> 4/3 13:23:50 DaemonCore: Command received via TCP from host
>>>> <172.24.89.61:9669>
>>>> 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM),
>>>> calling handler (command_activate_claim)
>>>> 4/3 13:23:50 vm1: Got activate_claim request from shadow
>>>> (<172.24.89.61:9669>)
>>>> 4/3 13:23:50 vm1: Remote job ID is 191.0
>>>> 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad
>>>> 4/3 13:23:50 vm1: State change: claim-activation protocol successful
>>>> 4/3 13:23:50 vm1: Changing activity: Idle -> Busy
>>>> 4/3 13:23:51 DaemonCore: Command received via TCP from host
>>>> <172.24.89.61:9652>
>>>> 4/3 13:23:51 DaemonCore: received command 404
>>>> (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
>>>> 4/3 13:23:51 vm1: Called deactivate_claim_forcibly()
>>>> 4/3 13:23:51 Starter pid 31148 exited with status 0
>>>> 4/3 13:23:51 vm1: State change: starter exited
>>>> 4/3 13:23:51 vm1: Changing activity: Busy -> Idle
>>>> 4/3 13:23:52 DaemonCore: Command received via UDP from host
>>>> <172.24.89.61:9620>
>>>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM),
>>>> calling handler (command_handler)
>>>> 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command
>>>> 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
>>>> 4/3 13:23:52 vm1: State change: No preempting claim, returning to owner
>>>> 4/3 13:23:52 vm1: Changing state and activity: Preempting/Vacating -> Owner/Idle
>>>> 4/3 13:23:52 vm1: State change: IS_OWNER is false
>>>> 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed
>>>> 4/3 13:23:52 DaemonCore: Command received via UDP from host
>>>> <172.24.89.61:9675>
>>>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM),
>>>> calling handler (command_handler)
>>>> 4/3 13:23:52 Error: can't find resource with capability
>>>> (<172.24.89.1:9666>#7928521674)
>>>>
>>>> It seems the job can be submitted to the Condor central manager
>>>> and executed at a node, but is then terminated for an unknown
>>>> reason. It is fine to run jobs either from condor_submit or from
>>>> Globus. I am confused whether this is due to our Condor settings,
>>>> GridSAM, or the OMII level.
>>>>
>>>> BTW, the file staging seems OK: at /tmp/gridsam....../dir/ the
>>>> virtual files are there.
>>>>
>>>> Sorry, I am not sure whether I should ask a GridSAM question here;
>>>> I am not even sure whether this is a GridSAM, OMII, or Condor
>>>> problem, because when I run the PBAC test it fails again. It was
>>>> working just after reinstalling the OMII server.
>>>>
>>>> thank you so much for any assistance.
>>>>
>>>> Best Regard!
>>>> gen-tao
>>>>
>>>> Current Assignees: Steve McGough (GridSAM), William Lee
>>>> (GridSAM), Steven Newhouse
>>>>
>>>> CC(s):
>>>>
>>>> Contact Information:
>>>>
>>>> Customer Name: Gen-Tao Chiang Email address: ge...@ni...
>>>> Organisation: NIEeS Secondary email address: gt...@ca...
>>>>
>>>>
>>>
>>> --- William Lee - Software Coordinator ---
>>> --- London e-Science Centre, Imperial College London ---
>>> A: Room 211a, London e-Science Centre, William Penney Laboratory,
>>> Imperial College London, South Kensington, London, SW7 2AZ, UK
>>> E: wwhl at doc.ic.ac.uk | william at imageunion.com
>>> W: www.lesc.ic.ac.uk | www.imageunion.com
>>> P: +44 (0) 207 594 8185
>>>
>>>
>>>
>
>
>
>
|