From: dejw <de...@ma...> - 2006-09-11 08:55:39

Hi,

I had no problems using the GridSAM client and server (newest version 2.0.0 rc1) over plain HTTP. But after reconfiguring the server to HTTPS I have problems with this. I can use a web browser to contact the services on Tomcat over HTTPS without problems, and the GridSAM server is also visible over HTTPS in a web browser. But when I try to use the GridSAM client I get the following errors (I turned on detailed logging).

My services.properties file:

omiiserver=https://pinus.man.poznan.pl:18443/gridsam/services/gridsam

./gridsam-submit -j ../data/examples/sleep.jsdl -sn omiiserver

2006-09-11 10:45:16,970 INFO [GridSAMClientSupport] (main:) request =
<grid:submitJob xmlns:grid="http://www.icenigrid.org/service/gridsam" startSuspended="false">
  <grid:JobDescription>
    <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
      <JobDescription xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
        <JobIdentification xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
          <JobProject xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">gridsam</JobProject>
        </JobIdentification>
        <Application xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl">
          <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">
            <Executable xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">/bin/sleep</Executable>
            <Argument xmlns="http://schemas.ggf.org/jsdl/2005/11/jsdl-posix">5</Argument>
          </POSIXApplication>
        </Application>
      </JobDescription>
    </JobDefinition>
  </grid:JobDescription>
</grid:submitJob>
2006-09-11 10:45:17,319 INFO [CryptoFactory] (main:) Using Crypto Engine [org.apache.ws.security.components.crypto.Merlin]
2006-09-11 10:45:17,590 INFO [HttpMethodDirector] (main:) I/O exception (java.net.SocketException) caught when processing request: Unconnected sockets not implemented
2006-09-11 10:45:17,591 INFO [HttpMethodDirector] (main:) Retrying request
2006-09-11 10:45:17,595 INFO [HttpMethodDirector] (main:) I/O exception (java.net.SocketException) caught when processing request: Unconnected sockets not implemented
2006-09-11 10:45:17,595 INFO [HttpMethodDirector] (main:) Retrying request
2006-09-11 10:45:17,596 INFO [HttpMethodDirector] (main:) I/O exception (java.net.SocketException) caught when processing request: Unconnected sockets not implemented
2006-09-11 10:45:17,596 INFO [HttpMethodDirector] (main:) Retrying request
2006-09-11 10:45:17,602 FATAL [GridSAMSubmit] (main:) unable to submit job: failed to submit job: ; nested exception is: java.net.SocketException: Unconnected sockets not implemented

Maybe I should configure something else? Should I change something in client-config.wsdd?

Best regards,
Dawid Szejnfeld, PSNC
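For reference, the "Unconnected sockets not implemented" message in the log above is the exception raised by the JDK's javax.net.SocketFactory when its no-argument createSocket() has not been overridden; a client-side secure socket factory that only implements the connected (host/port) variants fails exactly this way when the HTTP layer asks it for an unconnected socket, which suggests the problem is in the client's SSL socket-factory configuration rather than on the server. A minimal stdlib-only reproduction (PartialFactory is a made-up class for illustration, not GridSAM code):

```java
import javax.net.SocketFactory;
import java.io.IOException;
import java.net.InetAddress;
import java.net.Socket;

// javax.net.SocketFactory's no-argument createSocket() throws
// SocketException("Unconnected sockets not implemented") unless a
// subclass overrides it; only the four connected variants are abstract,
// so a partial implementation compiles fine and fails at runtime.
public class UnconnectedSocketDemo {
    static class PartialFactory extends SocketFactory {
        public Socket createSocket(String host, int port) { return null; }
        public Socket createSocket(String host, int port, InetAddress localHost, int localPort) { return null; }
        public Socket createSocket(InetAddress host, int port) { return null; }
        public Socket createSocket(InetAddress address, int port, InetAddress localAddress, int localPort) { return null; }
        // note: the no-argument createSocket() is deliberately NOT overridden
    }

    public static void main(String[] args) {
        try {
            new PartialFactory().createSocket();
        } catch (IOException e) {
            // prints the same exception text seen in the GridSAM client log
            System.out.println(e);
        }
    }
}
```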
From: dejw <de...@ma...> - 2006-09-07 13:00:07

Hi,

I have problems with the GridSAM Windows client. When I try to submit a job:

gridsam-submit -j ..\data\examples\sleep.jsdl

sometimes I get this error:

2006-09-07 14:56:57,554 FATAL [GridSAMSubmit] (main:) unable to submit job: failed to submit job: (501)Not Implemented

and at other times this one:

2006-09-07 14:57:04,679 FATAL [GridSAMSubmit] (main:) unable to submit job: failed to submit job: ; nested exception is:
java.net.SocketException: Software caused connection abort: recv failed

I also have the client on a Linux machine and there it works without problems.

Best regards,
Dawid
From: Garry S. <gar...@co...> - 2006-09-07 10:56:22

Hi Dawid,

> Wow! :) this is great news for us. I have to look at it closely. Is it
> possible to use this tool in other open source projects?

Sure. The source is GPL, so feel free.

> What do you think about the gridsam myproxy extension to the jsdl? I have
> doubts if it is secure to send such information along with the jsdl
> description.

HTTPS will provide encrypted connections to prevent password sniffing. You just need to trust the administrator running the GridSAM you are contacting not to steal your myproxy username and password! A shorter credential lifetime is intended to limit the damage should this happen.

regards
Garry

> Thanks for your reply.
>
> Best regards,
> Dawid
> [...]
> _______________________________________________
> GridSAM-Discuss mailing list
> Gri...@li...
> https://lists.sourceforge.net/lists/listinfo/gridsam-discuss
From: Garry S. <gar...@co...> - 2006-09-07 10:27:06

Hi Dawid,

> Is it possible to recognize how the target GridSAM service is configured? I
> mean whether it uses e.g. the globus connector or fork or some other one? This is
> important as your jsdl myproxy extension and mpi extension are
> targeted only at the globus connector.

I had the same requirement in my project. The way we addressed it was to create a simple web app that we dropped into the GridSAM's container.

The webapp (VTBProducer) parses the associated GridSAM's jobmanager.xml and authorisation.xml files and exposes the data to a distributed registry (provided by http://dsg.port.ac.uk/projects/tycho/).

Our clients then query the registry in order to determine which GridSAM to submit to. The JSDL is then constructed based on the type of underlying resource.

Particular examples/screenshots using our Web interface are here:
http://dsg.port.ac.uk/~garry/votb/locating_resources.html

regards
Garry

http://garrysmith.net

dejw wrote:
> [...]
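The parsing step Garry describes can be sketched as follows: read a GridSAM jobmanager.xml and infer the connector from the <sub-module> descriptors it pulls in. The sample XML is the DRMAA configuration quoted later in this thread; the detection rule (matching "globus"/"drmaa"/"fork" in descriptor paths) is an assumption for illustration, not part of any GridSAM API:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;

// Infer which DRM connector a GridSAM deployment uses by inspecting the
// <sub-module descriptor="..."/> entries of its jobmanager.xml.
public class ConnectorDetector {
    public static void main(String[] args) throws Exception {
        // In a real deployment this would be read from the GridSAM
        // container's jobmanager.xml file rather than an inline string.
        String jobmanagerXml =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
            "<module id=\"jobmanager.drmaa\" version=\"1.0.0\">" +
            "<sub-module descriptor=\"org/icenigrid/gridsam/resource/config/common.xml\"/>" +
            "<sub-module descriptor=\"org/icenigrid/gridsam/resource/config/drmaa.xml\"/>" +
            "</module>";

        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(jobmanagerXml.getBytes("UTF-8")));

        NodeList mods = doc.getElementsByTagName("sub-module");
        String connector = "unknown";
        for (int i = 0; i < mods.getLength(); i++) {
            String descriptor = ((Element) mods.item(i)).getAttribute("descriptor");
            if (descriptor.contains("globus"))      connector = "globus";
            else if (descriptor.contains("drmaa"))  connector = "drmaa";
            else if (descriptor.contains("fork"))   connector = "fork";
        }
        System.out.println("connector: " + connector);
    }
}
```

A client could then attach the myproxy/mpi JSDL extensions only when the detected connector is "globus", as discussed below.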
From: dejw <de...@ma...> - 2006-09-07 09:06:15

Hi,

we want to write a plugin to integrate GridSAM with our job submission mechanism, and the question is: is it possible to recognize how the target GridSAM service is configured? I mean whether it uses e.g. the globus connector, fork, or some other one? This is important because your jsdl myproxy extension and mpi extension are targeted only at the globus connector. So it would be nice to recognize whether the target GridSAM uses globus and only then attach these extensions to the jsdl.

This matters because the myproxy extension contains very sensitive data: the user and password for the myproxy server. As you saw in my previous mail, we think it is not very secure to add it in any case, so we want to limit adding it to globus-based GridSAMs only. The plugin should be as general as possible, so it should somehow recognize globus-based GridSAM services.

What can you tell me about this?

Best regards,
Dawid Szejnfeld, PSNC
From: dejw <de...@ma...> - 2006-09-07 08:33:30

Hi,

I have a question about the myproxy extension. Is it the final solution? I mean, sending such information, i.e. the user and password for myproxy, is not very secure, I think. What do you think about this? This kind of information could easily be caught by someone. We want to use GridSAM in our infrastructure with Globus, but this is a pretty big obstacle for us. Has it been used in a production environment?

Second question: does your Globus 2.4.3 DRM Connector work with newer Globus versions like 3 or, better, 4? If not, do you plan to implement one?

Best regards,
Dawid Szejnfeld, PSNC
From: dejw <de...@ma...> - 2006-08-18 07:26:20

Hi all,

I finally found a solution. In gridsam-client.jar's manifest file there was a Class-Path entry, and I had to remove it even though the paths were good. Then I had to modify all the 'bat' scripts by adding a new classpath:

set CLASSPATH=.;!CLASSPATH!
for %%i in (%GRIDSAM_HOME%\lib\*.jar) do SET CLASSPATH=!CLASSPATH!;%%i
set CLASSPATH=!CLASSPATH!;%OMII_CLIENT_HOME%\conf

What is important to say: this will work only if you have delayed expansion of variables enabled: http://www.winguides.com/registry/display.php/825/

I also removed -cp "%CLASSPATH%" from the java command line, as it is too long otherwise. And then the scripts worked well.

Cheers,
Dawid

dejw wrote:
> Hi,
>
> I installed omii client 3.0.1 and GridSAM client 2.0.0 on Windows XP.
> Then I tried to run 'gridsam-version.bat'. Unfortunately I then got an error:
> [...]
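The Class-Path entry Dawid mentions is a main attribute in a jar's MANIFEST.MF: whenever that jar is on the classpath, the JVM appends the listed entries too, which is why removing it changed behaviour even though the -cp value looked correct. A small stdlib sketch of how to inspect that attribute (it parses a sample manifest string; gridsam-client.jar itself would be opened with java.util.jar.JarFile instead):

```java
import java.io.ByteArrayInputStream;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Read the Class-Path main attribute from a jar manifest. A non-null
// value means the jar silently extends the effective classpath.
public class ManifestClassPath {
    public static void main(String[] args) throws Exception {
        // Hypothetical manifest content for illustration; manifest lines
        // are CRLF-terminated and the file ends with a blank line.
        String mf = "Manifest-Version: 1.0\r\nClass-Path: lib/a.jar lib/b.jar\r\n\r\n";
        Manifest manifest = new Manifest(new ByteArrayInputStream(mf.getBytes("UTF-8")));
        String cp = manifest.getMainAttributes().getValue(Attributes.Name.CLASS_PATH);
        System.out.println("Class-Path: " + cp);
    }
}
```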
From: dejw <de...@ma...> - 2006-08-14 07:33:16

Hi,

I installed omii client 3.0.1 and GridSAM client 2.0.0 on Windows XP. Then I tried to run 'gridsam-version.bat'. Unfortunately I then got this error:

c:\devel\omii\OMIICLIENT\gridsam\bin>gridsam-version
Exception in thread "main" java.lang.NoClassDefFoundError: org/icenigrid/gridsam/client/cli/GridSAMVersion

I checked the script and everything seems to be OK; the path is set correctly:

set OMII_CLIENT_HOME=c:\devel\omii\OMIICLIENT

I also didn't get any errors during installation. The target jar also seems to be OK - gridsam-client.jar.

Any idea?

Thanks in advance,
Dawid Szejnfeld
From: Ryan D. <rp...@im...> - 2006-08-06 18:41:47

Turns out I was using a 32-bit JVM on a 64-bit machine. Not very obvious from the error message... but problem solved.

Ryan Duffy wrote:
> Hi,
>
> I belatedly realised that I am using GridSAM in fork mode, which means
> it is not submitting my jobs to the cluster, but rather is running them
> all on a single machine.
> [...]
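Ryan's root cause (a 32-bit JVM that cannot load a 64-bit libdrmaa.so, failing with UnsatisfiedLinkError even though the file exists) can be checked up front. A small stdlib-only sketch; the sun.arch.data.model property is Sun/Oracle-JVM-specific, so treat it as a best-effort hint:

```java
// Print the JVM's data model and OS architecture so a 32-bit JVM on a
// 64-bit host can be spotted before blaming the native library itself.
public class JvmBitness {
    public static void main(String[] args) {
        // "sun.arch.data.model" reports "32" or "64" on Sun/Oracle JVMs;
        // other JVMs may not set it, hence the fallback value.
        String model = System.getProperty("sun.arch.data.model", "unknown");
        String arch = System.getProperty("os.arch");
        System.out.println("JVM data model: " + model + ", os.arch: " + arch);
    }
}
```

Comparing this output against `file libdrmaa.so` on the cluster would have exposed the mismatch immediately.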
From: Ryan D. <rp...@im...> - 2006-08-03 21:55:46
Hi,
I belatedly realised that I am using GridSAM in fork mode, which means
it is not submitting my jobs to the cluster, but rather is running them
all on a single machine.
This is a major problem for my MSc project which is supposed to be
nearing completion, but is in fact not doing the precise thing it is
supposed to - i.e. submit jobs to various clusters.
Looking through the documentation, I found that I should change
jobmanager.xml to this:
<?xml version="1.0" encoding="UTF-8"?>
<module id="jobmanager.drmaa" version="1.0.0">

  <!-- dependent modules -->
  <sub-module descriptor="org/icenigrid/gridsam/resource/config/common.xml"/>
  <sub-module descriptor="org/icenigrid/gridsam/resource/config/shell.xml"/>
  <sub-module descriptor="org/icenigrid/gridsam/resource/config/drmaa.xml"/>
  <sub-module descriptor="org/icenigrid/gridsam/resource/config/embedded.xml"/>
  <sub-module descriptor="database.xml"/>
  <sub-module descriptor="authorisation.xml"/>

</module>
If I understand correctly, this should be all that is required.
However, when I start up GridSAM, it seems to complain about a missing
file:
ERROR [DrmaaDRMConnectorManager] failed to load the DRMAA library -
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so:
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so: cannot open
shared object file: No such file or directory
However, this file is present on the filesystem:
bash-3.00$ ls -l /vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so
-rwxr-xr-x 1 root root 1635857 May 8 09:17 /vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so
I also tried on another cluster (mars) and got the same error (except
about its version of libdrmaa.so).
I've no idea where to go from here to sort it out.
Can anyone help?
-------------------------------------------------------------
log of the error from gridsam.log:
2006-08-03 22:28:49,147 DEBUG [DRMConnectorManager] Creating
SingletonProxy for service drmaa.DRMConnectorManager
2006-08-03 22:28:49,181 DEBUG [DRMConnectorManager] Constructing core
service implementation for service drmaa.DRMConnectorManager
2006-08-03 22:28:49,360 DEBUG [DrmaaDRMConnectorManager] failed to load
the DRMAA library -
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so:
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so: cannot open
shared object file: No such file or directory
java.lang.UnsatisfiedLinkError:
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so:
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so: cannot open
shared object file: No such file or directory
at java.lang.ClassLoader$NativeLibrary.load(Native Method)
at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1751)
at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1676)
at java.lang.Runtime.loadLibrary0(Runtime.java:822)
at java.lang.System.loadLibrary(System.java:992)
at com.sun.grid.drmaa.SessionImpl$1.run(SessionImpl.java:58)
at java.security.AccessController.doPrivileged(Native Method)
at com.sun.grid.drmaa.SessionImpl.<clinit>(SessionImpl.java:56)
at com.sun.grid.drmaa.SessionFactoryImpl.getSession(SessionFactoryImpl.java:59)
at org.icenigrid.gridsam.core.plugin.connector.drmaa.DrmaaDRMConnectorManager.getDrmaaSession(DrmaaDRMConnectorManager.java:61)
at org.icenigrid.gridsam.core.plugin.connector.drmaa.DrmaaDRMConnectorManager.<clinit>(DrmaaDRMConnectorManager.java:46)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:242)
at org.apache.hivemind.impl.DefaultClassResolver.lookupClass(DefaultClassResolver.java:101)
at org.apache.hivemind.impl.DefaultClassResolver.checkForClass(DefaultClassResolver.java:108)
at org.apache.hivemind.impl.ModuleImpl.resolveType(ModuleImpl.java:191)
at org.apache.hivemind.service.impl.BuilderFactoryLogic.instantiateCoreServiceInstance(BuilderFactoryLogic.java:100)
at org.apache.hivemind.service.impl.BuilderFactoryLogic.createService(BuilderFactoryLogic.java:75)
at org.apache.hivemind.service.impl.BuilderFactory.createCoreServiceImplementation(BuilderFactory.java:42)
at org.apache.hivemind.impl.InvokeFactoryServiceConstructor.constructCoreServiceImplementation(InvokeFactoryServiceConstructor.java:84)
at org.apache.hivemind.impl.servicemodel.AbstractServiceModelImpl.constructCoreServiceImplementation(AbstractServiceModelImpl.java:107)
at org.apache.hivemind.impl.servicemodel.AbstractServiceModelImpl.constructNewServiceImplementation(AbstractServiceModelImpl.java:157)
at org.apache.hivemind.impl.servicemodel.AbstractServiceModelImpl.constructServiceImplementation(AbstractServiceModelImpl.java:139)
at org.apache.hivemind.impl.servicemodel.SingletonServiceModel.getActualServiceImplementation(SingletonServiceModel.java:68)
at $DRMConnectorManager_10cd5f02336._service($DRMConnectorManager_10cd5f02336.java)
at $DRMConnectorManager_10cd5f02336.initialise($DRMConnectorManager_10cd5f02336.java)
at $DRMConnectorManager_10cd5f02335.initialise($DRMConnectorManager_10cd5f02335.java)
at org.icenigrid.gridsam.core.plugin.manager.DelegatingDRMConnectorManager.initialise(DelegatingDRMConnectorManager.java:58)
at $DRMConnectorManager_10cd5f0232c.initialise($DRMConnectorManager_10cd5f0232c.java)
at $DRMConnectorManager_10cd5f0232b.initialise($DRMConnectorManager_10cd5f0232b.java)
at org.icenigrid.gridsam.core.plugin.manager.DefaultJobManager.initialise(DefaultJobManager.java:150)
at org.icenigrid.gridsam.core.plugin.manager.DefaultJobManager.sanityCheck(DefaultJobManager.java:184)
at org.icenigrid.gridsam.core.plugin.manager.DefaultJobManager.<init>(DefaultJobManager.java:95)
at org.icenigrid.gridsam.core.plugin.manager.DefaultJobManager.<init>(DefaultJobManager.java:84)
at org.icenigrid.gridsam.core.plugin.manager.DefaultJobManager.<init>(DefaultJobManager.java:74)
at org.icenigrid.gridsam.webservice.servlet.JobManagerConfigurator.contextInitialized(JobManagerConfigurator.java:46)
at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3805)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4321)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:823)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:807)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:595)
at org.apache.catalina.core.StandardHostDeployer.install(StandardHostDeployer.java:277)
at org.apache.catalina.core.StandardHost.install(StandardHost.java:832)
at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:683)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:432)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:964)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:349)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1091)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:789)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1083)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:478)
at org.apache.catalina.core.StandardService.start(StandardService.java:476)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:2298)
at org.apache.catalina.startup.Catalina.start(Catalina.java:556)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:585)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:284)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:422)
2006-08-03 22:28:49,368 ERROR [DrmaaDRMConnectorManager] failed to load
the DRMAA library -
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so:
/vol/grail/software/sge-6.0/lib/lx24-amd64/libdrmaa.so: cannot open
shared object file: No such file or directory
2006-08-03 22:28:49,368 FATAL [DrmaaDRMConnectorManager] DRMAA
DRMConnector fails to initialise. Please consult the log for advice.
2006-08-03 22:28:49,416 INFO [JobManagerConfigurator] GridSAM machinery
initialised
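[Editor's note] The FATAL above means the SGE DRMAA shared library could not be dlopen'ed. A quick sanity check along these lines can confirm whether the file actually exists at the configured path (the SGE_ROOT value is taken from the log above; adjust it and the architecture directory for your installation):

```shell
# Check that the SGE DRMAA shared library is where GridSAM expects it.
SGE_ROOT=/vol/grail/software/sge-6.0          # path from the log above
LIB="$SGE_ROOT/lib/lx24-amd64/libdrmaa.so"    # lx24-amd64 = Linux 2.4/2.6 x86_64 build

if [ -e "$LIB" ]; then
  echo "found: $LIB"
else
  echo "missing: $LIB -- check SGE_ROOT and LD_LIBRARY_PATH"
fi
```

If the file exists but loading still fails, the loader may not be finding its dependencies; adding the directory to LD_LIBRARY_PATH before starting Tomcat is a common fix.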
|
|
From: Nathan H. <rat...@go...> - 2006-08-02 15:17:54
|
Hi, I was wondering if GridSAM provides functionality to allow SGE and Condor to be used together. Thanks, Nathan |
|
From: Ryan D. <rp...@im...> - 2006-08-02 13:17:49
|
Hi, I am considering using GridSAM to submit MPI jobs to Grid Engine. There is a comment in some old documentation stating that experimental submission of MPI jobs is supported for Globus. Is it currently possible to submit to Grid Engine using this JSDL extension, or has that not been implemented? Thanks, Ryan. |
|
From: William L. <ww...@do...> - 2006-06-02 10:03:55
|
Sorry, my fault.
It's the directory containing those files that must be on the CLASSPATH
(i.e. /home/ligang/OMIICLIENT/conf/). If that doesn't work, you might
need to set the Java system property "axis.ClientConfigFile" (using -D
on the command line or programmatically) to point to the client-
config.wsdd file location (e.g. -Daxis.ClientConfigFile=/home/ligang/
OMIICLIENT/conf/client-config.wsdd). This is a quirk in the way Axis
loads its client-side configuration.
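[Editor's note] The two options above can be sketched in shell form as follows (CONF_DIR uses the example path from this thread; substitute your own client installation directory):

```shell
# Directory holding crypto.properties and client-config.wsdd (example path
# from this thread -- change it to match your installation).
CONF_DIR=/home/ligang/OMIICLIENT/conf

# Option 1: put the *directory* (not the individual files) on the CLASSPATH.
export CLASSPATH="$CONF_DIR${CLASSPATH:+:$CLASSPATH}"

# Option 2: point Axis at the file explicitly via a system property.
AXIS_OPT="-Daxis.ClientConfigFile=$CONF_DIR/client-config.wsdd"

echo "$AXIS_OPT"
```

Either option can then be passed to the `java` invocation that runs your client code, e.g. `java $AXIS_OPT -cp "$CLASSPATH" YourClient`.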
By the way, I'm cc'ing this e-mail to Vesso Novov, who will take over
the GridSAM development from me. I've also forwarded this e-mail to
the GridSAM mailing-list as a record for others in a similar situation.
Regards,
William
On 2 Jun 2006, at 10:45, Kevin X. Yang wrote:
> Dear William,
> I have added both the crypto.properties file and the client-
> config.wsdd file to my CLASSPATH. When I "echo $CLASSPATH", it
> displays: "/home/ligang/OMIICLIENT/conf/crypto.properties:/
> home/ligang/OMIICLIENT/conf/client-config.wsdd"
>
> After compiling and running, I still got the same exception "failed to
> submit job: SecurityContextInitHandler: Request does not contain
> required Security header", the same as I described in my previous email.
>
> Is there anything wrong? Have you read the attached source code? In
> the code, should I do some security authentication?
>
> Please help me!
>
> Many thanks, and I look forward to hearing from you.
> Best regards, Kevin
>
>
>
> On Jun 1 2006, William Lee wrote:
>
>>
>> You will have to have a crypto.properties file and the client-
>> config.wsdd file in your CLASSPATH in order to enable the WS-
>> Security support in the client. You can find sample files in the
>> client distribution.
>>
>> You are correct that you don't need to have the client
>> distribution in order to use the GridSAM client API. Please note
>> the crypto.properties file also refers to the location of the
>> keystore file to use to sign/verify message signature.
>>
>> If you are using maven for building code, I would recommend you
>> to use the 2.0.0-SNAPSHOT which decouples the client/service
>> APIs completely.
>>
>> William
>>
>> On 1 Jun 2006, at 17:00, Kevin X. Yang wrote:
>>
>>> Dear William,
>>>
>>> We have decided to adopt GridSam in MaterialsGrid project, which
>>> is a "sister" project of eMinerals project. We have
>>> successfully installed Gridsam 1.1.0, and we can successfully
>>> submit a job to GridSam Web service using gridsam-submit command
>>> line.
>>>
>>> Because we need to develop a grid portal, as a starting point, I
>>> am now trying to write some Java code using the APIs to call the
>>> GridSAM web service to submit/monitor/cancel jobs. I have tried using
>>> both (i) ClientSideJobManager.submit() and (ii)
>>> GridSAMClientSupport.submitJob() to submit jobs, but eventually
>>> I failed and got the same exception:
>>>
>>> "failed to submit job: SecurityContextInitHandler: Request does
>>> not contain required Security header".
>>>
>>> Having used various means to trace/debug the source code, I
>>> found the problem eventually comes from following statement in
>>> GridSAMClientSupport.submit():
>>>
>>> Object xRet = xCall.invoke(new Object[] { xRequest });
>>> I have included System.out.print(xRequest) output for your reference
>>> (see below). I've also attached the source code.
>>>
>>> Here are my questions:
>>>
>>> 1) Is this because I missed adding a security header to the
>>> xRequest? If so, how should I do this in the Java code, and what
>>> related security configuration is needed?
>>>
>>> 2) What is the common practice for using the GridSAM APIs to submit/
>>> monitor/cancel jobs against the GridSAM Web Service? I would greatly
>>> appreciate it if you could also provide some sample code.
>>>
>>> 3) In terms of using the GridSAM APIs to write code to submit jobs to
>>> the GridSAM Web Service, we don't have to install the GridSAM client
>>> distribution on the local machine. Is that right?
>>> Thank you very much, and I look forward to hearing from you.
>>> With my best regards Kevin
>>> System.out.print(xRequest) as follows:
>>>
>>> <gridsam:submitJob xmlns:gridsam="http://www.icenigrid.org/service/gridsam" startSuspended="false">
>>>   <gridsam:JobDescription>
>>>     <ns1:JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl" xmlns:ns1="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>>>       <ns1:JobDescription>
>>>         <ns1:Application>
>>>           <ns2:POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix" xmlns:ns2="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
>>>             <ns2:Executable>/bin/echo</ns2:Executable>
>>>             <ns2:Argument>Hello World!!</ns2:Argument>
>>>           </ns2:POSIXApplication>
>>>         </ns1:Application>
>>>       </ns1:JobDescription>
>>>     </ns1:JobDefinition>
>>>   </gridsam:JobDescription>
>>> </gridsam:submitJob>
>>>
>>> <AccessGridSamService.java>
>>
>> --- William Lee - Software Coordinator ---
>> --- London e-Science Centre, Imperial College London ---
>> A: Room 211a, London e-Science Centre, William Penney Laboratory,
>> Imperial College London, South Kensington, London, SW7 2AZ, UK
>> E: wwhl at doc.ic.ac.uk | william at imageunion.com
>> W: www.lesc.ic.ac.uk | www.imageunion.com
>> P: +44 (0) 207 594 8185
>>
>>
>>
--- William Lee - Software Coordinator ---
--- London e-Science Centre, Imperial College London ---
A: Room 211a, London e-Science Centre, William Penney Laboratory,
Imperial College London, South Kensington, London, SW7 2AZ, UK
E: wwhl at doc.ic.ac.uk | william at imageunion.com
W: www.lesc.ic.ac.uk | www.imageunion.com
P: +44 (0) 207 594 8185
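[Editor's note] The crypto.properties mentioned above is a WSS4J configuration file that, among other things, points at the keystore used to sign and verify message signatures. A minimal sketch looks roughly like the following; the values are placeholders, and the exact keys may vary with the WSS4J version bundled with your GridSAM client, so start from the sample file in the client distribution rather than this:

```properties
# Hypothetical example -- placeholder values, not working credentials.
org.apache.ws.security.crypto.provider=org.apache.ws.security.components.crypto.Merlin
org.apache.ws.security.crypto.merlin.keystore.type=jks
org.apache.ws.security.crypto.merlin.keystore.password=<your-keystore-password>
org.apache.ws.security.crypto.merlin.file=<path-to-your-keystore.jks>
```

The client log reports the crypto engine in use at startup (e.g. "Using Crypto Engine [org.apache.ws.security.components.crypto.Merlin]"), which is a quick way to confirm the file is being picked up.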
|
|
From: William L. <ww...@do...> - 2006-05-09 00:27:21
|
Can you send me the stacktrace and the version number you were using
in the first instance?
William
On 28 Apr 2006, at 20:56, Liang Chen wrote:
> Hi William,
>
> I got the following error when querying the status of a job. In fact,
> it doesn't look like something complicated at all, but confusing
> enough at the moment.
>
> Here it is:
>
> 15 omii@trout1% gridsam-status -s
> http://HOST:18080/gridsam/services/gridsam?WSDL
> urn:gridsam:00908c5f0add0555010adfd5d85b5786
> 2006-04-28 11:48:25,352 FATAL [GridSAMStatus] (main:) unable to
> retrieve job status: failed to retrieve status of job:
> java.lang.IllegalArgumentException: Adding text to an XML document
> must not be null
>
> The "IllegalArgumentException" is thrown directly by dom4j when the
> text to be set is null. The code segment looks like:
>
> public Text createText(String text) {
>     if (text == null) {
>         String msg = "Adding text to an XML document must not be null";
>         throw new IllegalArgumentException(msg);
>     }
>     return new DefaultText(text);
> }
>
> And back in the GridSAM (1.0) source code, the place where text is set
> is GridSAMClientSupport.AxisBaseImpl.getJobStatus:
>
> Document xBodyDoc = DocumentFactory.getInstance().createDocument();
> xBodyDoc.addElement(
>         DocumentFactory.getInstance().createQName("getJobStatus",
>             GridSAMSupport.GRIDSAM_NAMESPACE))
>     .addElement(
>         DocumentFactory.getInstance().createQName("JobIdentifier",
>             GridSAMSupport.GRIDSAM_NAMESPACE))
>     .addElement(
>         DocumentFactory.getInstance().createQName("ID",
>             GridSAMSupport.GRIDSAM_NAMESPACE))
>     .setText(pJobID.trim());
>
> The text is set in the last line. However, isn't pJobID the ID taken
> from the command line? How could it possibly be null?
>
> Or am I looking at the wrong place?
>
> Regards,
>
> Ben
>
> University College London Office: 7.06
> Dept. of Computer Science Tel: +44 (0)20 7679 0370 (Direct Dial)
> Gower Street Internal: 30370
> London WC1E 6BT Fax: +44 (0)20 7387 1397
> United Kingdom Email: B....@cs...
>
>
>
>
> _______________________________________________
> GridSAM-Discuss mailing list
> Gri...@li...
> https://lists.sourceforge.net/lists/listinfo/gridsam-discuss
--- William Lee - Software Coordinator ---
--- London e-Science Centre, Imperial College London ---
A: Room 211a, London e-Science Centre, William Penney Laboratory,
Imperial College London, South Kensington, London, SW7 2AZ, UK
E: wwhl at doc.ic.ac.uk | william at imageunion.com
W: www.lesc.ic.ac.uk | www.imageunion.com
P: +44 (0) 207 594 8185
|
|
From: Garry S. <gar...@po...> - 2006-05-05 08:26:10
|
Hi William, William Lee wrote: > I'm on holiday at the moment. Would you kindly tell me which version > you were using? GridSAM 1.1.0 rc1, OMII 2.3.0. Java 1.5, default Java Heap size Server: uname -a Linux holly 2.4.32 #1 SMP Fri Nov 25 14:41:06 GMT 2005 i686 GNU/Linux Dual processor, 2GB physical memory. regards Garry > > William > > On 5/5/06, Garry Smith <gar...@po...> wrote: > >> Hi William, >> >> Further to my previous email (with the helloworld.jsdl tests), I have >> resorted to increasing the Java heap size again as used by Tomcat. >> This action does not solve the problem but does mean that the GridSAMs >> can last longer between reboots >> >> As you can see below, with a 400MByte heap size I executed 55 iterations >> of 30 jobs each. >> GridSAM was responsive for each iteration and did not exhibit the >> slow-down experienced with the >> default heap size. >> >> The worry for me is the amount of additional memory that is being >> consumed by each iteration and that is not being released. I estimate >> that I would be able to get approx 110 iterations of 30 jobs each before >> the GridSAM stops accepting jobs again. I assume this value will be >> lower given a JSDL that contains data staging elements. >> >> These tests were to the same GridSAM as before and using the fork >> jobmanager again. It will be interesting to see if the values are >> different using the Condor, ssh and Globus jobmanagers. >> >> >> Have you tested the GridSAM core independently of Tomcat and Axis? It >> would be nice if we could pin down the memory usage. 
>> >> regards >> Garry >> >> >> Using top to follow memory usage: >> >> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND >> >> 22288 omii 9 0 185m 185m 13m S 0.0 9.2 0:20.96 java >> #tomcat started up >> >> 22288 omii 9 0 205m 205m 18m S 0.0 10.2 0:20.96 java >> #after 1st iteration >> >> 22288 omii 9 0 199m 199m 18m S 0.0 9.9 0:20.96 java >> #after 2nd iteration >> >> 22288 omii 9 0 200m 200m 18m S 0.0 9.9 0:20.96 java # >> after 3rd iteration >> >> 22288 omii 9 0 200m 200m 14m S 0.0 9.9 0:20.96 java >> #after 4th iteration >> >> 22288 omii 9 0 205m 205m 16m S 0.0 10.2 0:20.96 java >> #after 5th iteration >> >> 22288 omii 9 0 209m 209m 16m S 0.0 10.4 0:20.96 java # >> after 6th iteration >> >> 22288 omii 9 0 213m 213m 16m S 0.0 10.6 0:20.96 java # >> after 7th iteration >> >> 22288 omii 9 0 215m 215m 14m S 0.0 10.7 0:20.96 java # >> after 8th iteration >> >> 22288 omii 9 0 221m 221m 16m S 0.0 11.0 0:20.96 java >> #after 9th iteration >> >> 22288 omii 9 0 223m 223m 16m S 0.0 11.1 0:20.96 java >> #after 10th iteration >> >> 22288 omii 9 0 223m 223m 16m S 0.0 11.0 0:20.96 java >> #after 11th iteration >> >> 22288 omii 9 0 221m 221m 14m S 0.0 11.0 0:20.96 java >> #after 12th iteration >> >> 22288 omii 9 0 222m 222m 14m S 0.0 11.0 0:20.96 java >> #after 13th iteration >> >> 22288 omii 9 0 223m 223m 14m S 0.0 11.1 0:20.96 java >> #after 14th iteration >> >> 22288 omii 9 0 225m 225m 14m S 0.0 11.1 0:20.96 java >> #after 15th iteration >> >> 22288 omii 9 0 226m 226m 14m S 0.0 11.2 0:20.96 java >> #after 16th iteration >> >> 22288 omii 9 0 227m 227m 14m S 0.0 11.3 0:20.96 java >> #after 17th iteration >> >> 22288 omii 9 0 229m 229m 14m S 0.0 11.3 0:20.96 java >> #after 18th iteration >> >> 22288 omii 9 0 230m 230m 14m S 0.0 11.4 0:20.96 java >> #after 19th iteration >> >> 22288 omii 9 0 231m 231m 14m S 0.0 11.5 0:20.96 java >> #after 20th iteration >> >> 22288 omii 9 0 233m 233m 14m S 0.0 11.5 0:20.96 java >> #after 21st iteration >> >> 22288 omii 9 0 235m 235m 14m S 
0.0 11.6 0:20.96 java >> #after 22nd iteration >> >> 22288 omii 9 0 236m 236m 14m S 0.0 11.7 0:20.96 java >> #after 23rd iteration >> >> 22288 omii 9 0 237m 237m 14m S 0.0 11.7 0:20.96 java >> #after 24th iteration >> >> 22288 omii 9 0 240m 240m 14m S 0.0 11.9 0:20.96 java >> #after 25th iteration >> >> 22288 omii 9 0 242m 242m 14m S 0.0 12.0 0:20.96 java >> #after 26th iteration >> >> 22288 omii 9 0 243m 243m 14m S 0.0 12.0 0:20.96 java >> #after 27th iteration >> >> 22288 omii 9 0 244m 244m 14m S 0.0 12.1 0:20.96 java >> #after 28th iteration >> >> 22288 omii 9 0 246m 246m 14m S 0.0 12.2 0:20.96 java >> #after 29th iteration >> >> 22288 omii 9 0 247m 247m 14m S 0.0 12.3 0:20.96 java >> #after 30th iteration >> >> 22288 omii 9 0 250m 250m 14m S 0.0 12.4 0:20.96 java >> #after 31st iteration >> >> 22288 omii 9 0 251m 251m 14m S 0.0 12.5 0:20.96 java >> #after 32nd iteration >> >> 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java >> #after 33rd iteration >> >> 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java >> #after 34th iteration >> >> 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java >> #after 35th iteration >> >> 22288 omii 9 0 256m 256m 14m S 0.0 12.7 0:20.96 java >> #after 36th iteration >> >> 22288 omii 9 0 262m 262m 14m S 0.0 13.0 0:20.96 java >> #after 37th iteration >> >> 22288 omii 9 0 263m 263m 14m S 0.0 13.1 0:20.96 java >> #after 38th iteration >> >> 22288 omii 9 0 265m 265m 14m S 0.0 13.1 0:20.96 java >> #after 39th iteration >> >> 22288 omii 9 0 266m 266m 14m S 0.0 13.2 0:20.96 java >> #after 40th iteration >> >> 22288 omii 9 0 268m 268m 14m S 0.0 13.3 0:20.96 java >> #after 41st iteration >> >> 22288 omii 9 0 271m 271m 14m S 0.0 13.4 0:20.96 java >> #after 42nd iteration >> >> 22288 omii 9 0 272m 272m 14m S 0.0 13.5 0:20.96 java >> #after 43rd iteration >> >> 22288 omii 9 0 274m 274m 14m S 0.0 13.6 0:20.96 java >> #after 44th iteration >> >> 22288 omii 9 0 275m 275m 14m S 0.0 13.7 0:20.96 java >> #after 45th iteration >> >> 22288 omii 9 0 
277m 277m 14m S 0.0 13.7 0:20.96 java # >> after 46th iteration >> >> 22288 omii 9 0 277m 276m 14m S 0.0 13.7 0:20.96 java >> #after 47th iteration >> >> 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java >> #after 48th iteration >> >> 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java >> #after 49th iteration >> >> 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java >> #after 50th iteration >> >> 22288 omii 9 0 278m 278m 14m S 0.0 13.8 0:20.96 java >> #after 51st iteration >> >> 22288 omii 9 0 280m 280m 14m S 0.0 13.9 0:20.96 java >> #after 52nd iteration >> >> 22288 omii 9 0 281m 281m 14m S 0.0 14.0 0:20.96 java >> #after 53rd iteration >> >> 22288 omii 9 0 284m 284m 14m S 0.0 14.1 0:20.96 java >> #after 54th iteration >> >> 22288 omii 9 0 285m 285m 14m S 0.0 14.1 0:20.96 java >> #after 55th iteration. >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > |
|
From: William L. <ww...@do...> - 2006-05-04 23:13:51
|
I'm on holiday at the moment. Would you kindly tell me which version you were using? William On 5/5/06, Garry Smith <gar...@po...> wrote: > Hi William, > > Further to my previous email (with the helloworld.jsdl tests), I have > resorted to increasing the Java heap size again as used by Tomcat. > This action does not solve the problem but does mean that the GridSAMs > can last longer between reboots > > As you can see below, with a 400MByte heap size I executed 55 iterations > of 30 jobs each. > GridSAM was responsive for each iteration and did not exhibit the > slow-down experienced with the > default heap size. > > The worry for me is the amount of additional memory that is being > consumed by each iteration and that is not being released. I estimate > that I would be able to get approx 110 iterations of 30 jobs each before > the GridSAM stops accepting jobs again. I assume this value will be > lower given a JSDL that contains data staging elements. > > These tests were to the same GridSAM as before and using the fork > jobmanager again. It will be interesting to see if the values are > different using the Condor, ssh and Globus jobmanagers. > > > Have you tested the GridSAM core independently of Tomcat and Axis? It > would be nice if we could pin down the memory usage. 
> > regards > Garry > > > Using top to follow memory usage: > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > > 22288 omii 9 0 185m 185m 13m S 0.0 9.2 0:20.96 java > #tomcat started up > > 22288 omii 9 0 205m 205m 18m S 0.0 10.2 0:20.96 java > #after 1st iteration > > 22288 omii 9 0 199m 199m 18m S 0.0 9.9 0:20.96 java > #after 2nd iteration > > 22288 omii 9 0 200m 200m 18m S 0.0 9.9 0:20.96 java # > after 3rd iteration > > 22288 omii 9 0 200m 200m 14m S 0.0 9.9 0:20.96 java > #after 4th iteration > > 22288 omii 9 0 205m 205m 16m S 0.0 10.2 0:20.96 java > #after 5th iteration > > 22288 omii 9 0 209m 209m 16m S 0.0 10.4 0:20.96 java # > after 6th iteration > > 22288 omii 9 0 213m 213m 16m S 0.0 10.6 0:20.96 java # > after 7th iteration > > 22288 omii 9 0 215m 215m 14m S 0.0 10.7 0:20.96 java # > after 8th iteration > > 22288 omii 9 0 221m 221m 16m S 0.0 11.0 0:20.96 java > #after 9th iteration > > 22288 omii 9 0 223m 223m 16m S 0.0 11.1 0:20.96 java > #after 10th iteration > > 22288 omii 9 0 223m 223m 16m S 0.0 11.0 0:20.96 java > #after 11th iteration > > 22288 omii 9 0 221m 221m 14m S 0.0 11.0 0:20.96 java > #after 12th iteration > > 22288 omii 9 0 222m 222m 14m S 0.0 11.0 0:20.96 java > #after 13th iteration > > 22288 omii 9 0 223m 223m 14m S 0.0 11.1 0:20.96 java > #after 14th iteration > > 22288 omii 9 0 225m 225m 14m S 0.0 11.1 0:20.96 java > #after 15th iteration > > 22288 omii 9 0 226m 226m 14m S 0.0 11.2 0:20.96 java > #after 16th iteration > > 22288 omii 9 0 227m 227m 14m S 0.0 11.3 0:20.96 java > #after 17th iteration > > 22288 omii 9 0 229m 229m 14m S 0.0 11.3 0:20.96 java > #after 18th iteration > > 22288 omii 9 0 230m 230m 14m S 0.0 11.4 0:20.96 java > #after 19th iteration > > 22288 omii 9 0 231m 231m 14m S 0.0 11.5 0:20.96 java > #after 20th iteration > > 22288 omii 9 0 233m 233m 14m S 0.0 11.5 0:20.96 java > #after 21st iteration > > 22288 omii 9 0 235m 235m 14m S 0.0 11.6 0:20.96 java > #after 22nd iteration > > 22288 omii 9 0 236m 236m 
14m S 0.0 11.7 0:20.96 java > #after 23rd iteration > > 22288 omii 9 0 237m 237m 14m S 0.0 11.7 0:20.96 java > #after 24th iteration > > 22288 omii 9 0 240m 240m 14m S 0.0 11.9 0:20.96 java > #after 25th iteration > > 22288 omii 9 0 242m 242m 14m S 0.0 12.0 0:20.96 java > #after 26th iteration > > 22288 omii 9 0 243m 243m 14m S 0.0 12.0 0:20.96 java > #after 27th iteration > > 22288 omii 9 0 244m 244m 14m S 0.0 12.1 0:20.96 java > #after 28th iteration > > 22288 omii 9 0 246m 246m 14m S 0.0 12.2 0:20.96 java > #after 29th iteration > > 22288 omii 9 0 247m 247m 14m S 0.0 12.3 0:20.96 java > #after 30th iteration > > 22288 omii 9 0 250m 250m 14m S 0.0 12.4 0:20.96 java > #after 31st iteration > > 22288 omii 9 0 251m 251m 14m S 0.0 12.5 0:20.96 java > #after 32nd iteration > > 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java > #after 33rd iteration > > 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java > #after 34th iteration > > 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java > #after 35th iteration > > 22288 omii 9 0 256m 256m 14m S 0.0 12.7 0:20.96 java > #after 36th iteration > > 22288 omii 9 0 262m 262m 14m S 0.0 13.0 0:20.96 java > #after 37th iteration > > 22288 omii 9 0 263m 263m 14m S 0.0 13.1 0:20.96 java > #after 38th iteration > > 22288 omii 9 0 265m 265m 14m S 0.0 13.1 0:20.96 java > #after 39th iteration > > 22288 omii 9 0 266m 266m 14m S 0.0 13.2 0:20.96 java > #after 40th iteration > > 22288 omii 9 0 268m 268m 14m S 0.0 13.3 0:20.96 java > #after 41st iteration > > 22288 omii 9 0 271m 271m 14m S 0.0 13.4 0:20.96 java > #after 42nd iteration > > 22288 omii 9 0 272m 272m 14m S 0.0 13.5 0:20.96 java > #after 43rd iteration > > 22288 omii 9 0 274m 274m 14m S 0.0 13.6 0:20.96 java > #after 44th iteration > > 22288 omii 9 0 275m 275m 14m S 0.0 13.7 0:20.96 java > #after 45th iteration > > 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java # > after 46th iteration > > 22288 omii 9 0 277m 276m 14m S 0.0 13.7 0:20.96 java > #after 47th iteration > 
> 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java > #after 48th iteration > > 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java > #after 49th iteration > > 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java > #after 50th iteration > > 22288 omii 9 0 278m 278m 14m S 0.0 13.8 0:20.96 java > #after 51st iteration > > 22288 omii 9 0 280m 280m 14m S 0.0 13.9 0:20.96 java > #after 52nd iteration > > 22288 omii 9 0 281m 281m 14m S 0.0 14.0 0:20.96 java > #after 53rd iteration > > 22288 omii 9 0 284m 284m 14m S 0.0 14.1 0:20.96 java > #after 54th iteration > > 22288 omii 9 0 285m 285m 14m S 0.0 14.1 0:20.96 java > #after 55th iteration. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > |
|
From: Garry S. <gar...@po...> - 2006-05-04 17:16:25
|
Hi William, Further to my previous email (with the helloworld.jsdl tests), I have resorted to increasing the Java heap size again as used by Tomcat. This action does not solve the problem but does mean that the GridSAMs can last longer between reboots As you can see below, with a 400MByte heap size I executed 55 iterations of 30 jobs each. GridSAM was responsive for each iteration and did not exhibit the slow-down experienced with the default heap size. The worry for me is the amount of additional memory that is being consumed by each iteration and that is not being released. I estimate that I would be able to get approx 110 iterations of 30 jobs each before the GridSAM stops accepting jobs again. I assume this value will be lower given a JSDL that contains data staging elements. These tests were to the same GridSAM as before and using the fork jobmanager again. It will be interesting to see if the values are different using the Condor, ssh and Globus jobmanagers. Have you tested the GridSAM core independently of Tomcat and Axis? It would be nice if we could pin down the memory usage. 
regards Garry Using top to follow memory usage: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 22288 omii 9 0 185m 185m 13m S 0.0 9.2 0:20.96 java #tomcat started up 22288 omii 9 0 205m 205m 18m S 0.0 10.2 0:20.96 java #after 1st iteration 22288 omii 9 0 199m 199m 18m S 0.0 9.9 0:20.96 java #after 2nd iteration 22288 omii 9 0 200m 200m 18m S 0.0 9.9 0:20.96 java # after 3rd iteration 22288 omii 9 0 200m 200m 14m S 0.0 9.9 0:20.96 java #after 4th iteration 22288 omii 9 0 205m 205m 16m S 0.0 10.2 0:20.96 java #after 5th iteration 22288 omii 9 0 209m 209m 16m S 0.0 10.4 0:20.96 java # after 6th iteration 22288 omii 9 0 213m 213m 16m S 0.0 10.6 0:20.96 java # after 7th iteration 22288 omii 9 0 215m 215m 14m S 0.0 10.7 0:20.96 java # after 8th iteration 22288 omii 9 0 221m 221m 16m S 0.0 11.0 0:20.96 java #after 9th iteration 22288 omii 9 0 223m 223m 16m S 0.0 11.1 0:20.96 java #after 10th iteration 22288 omii 9 0 223m 223m 16m S 0.0 11.0 0:20.96 java #after 11th iteration 22288 omii 9 0 221m 221m 14m S 0.0 11.0 0:20.96 java #after 12th iteration 22288 omii 9 0 222m 222m 14m S 0.0 11.0 0:20.96 java #after 13th iteration 22288 omii 9 0 223m 223m 14m S 0.0 11.1 0:20.96 java #after 14th iteration 22288 omii 9 0 225m 225m 14m S 0.0 11.1 0:20.96 java #after 15th iteration 22288 omii 9 0 226m 226m 14m S 0.0 11.2 0:20.96 java #after 16th iteration 22288 omii 9 0 227m 227m 14m S 0.0 11.3 0:20.96 java #after 17th iteration 22288 omii 9 0 229m 229m 14m S 0.0 11.3 0:20.96 java #after 18th iteration 22288 omii 9 0 230m 230m 14m S 0.0 11.4 0:20.96 java #after 19th iteration 22288 omii 9 0 231m 231m 14m S 0.0 11.5 0:20.96 java #after 20th iteration 22288 omii 9 0 233m 233m 14m S 0.0 11.5 0:20.96 java #after 21st iteration 22288 omii 9 0 235m 235m 14m S 0.0 11.6 0:20.96 java #after 22nd iteration 22288 omii 9 0 236m 236m 14m S 0.0 11.7 0:20.96 java #after 23rd iteration 22288 omii 9 0 237m 237m 14m S 0.0 11.7 0:20.96 java #after 24th iteration 22288 omii 9 0 240m 240m 14m S 0.0 
11.9 0:20.96 java #after 25th iteration 22288 omii 9 0 242m 242m 14m S 0.0 12.0 0:20.96 java #after 26th iteration 22288 omii 9 0 243m 243m 14m S 0.0 12.0 0:20.96 java #after 27th iteration 22288 omii 9 0 244m 244m 14m S 0.0 12.1 0:20.96 java #after 28th iteration 22288 omii 9 0 246m 246m 14m S 0.0 12.2 0:20.96 java #after 29th iteration 22288 omii 9 0 247m 247m 14m S 0.0 12.3 0:20.96 java #after 30th iteration 22288 omii 9 0 250m 250m 14m S 0.0 12.4 0:20.96 java #after 31st iteration 22288 omii 9 0 251m 251m 14m S 0.0 12.5 0:20.96 java #after 32nd iteration 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java #after 33rd iteration 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java #after 34th iteration 22288 omii 9 0 252m 252m 14m S 0.0 12.5 0:20.96 java #after 35th iteration 22288 omii 9 0 256m 256m 14m S 0.0 12.7 0:20.96 java #after 36th iteration 22288 omii 9 0 262m 262m 14m S 0.0 13.0 0:20.96 java #after 37th iteration 22288 omii 9 0 263m 263m 14m S 0.0 13.1 0:20.96 java #after 38th iteration 22288 omii 9 0 265m 265m 14m S 0.0 13.1 0:20.96 java #after 39th iteration 22288 omii 9 0 266m 266m 14m S 0.0 13.2 0:20.96 java #after 40th iteration 22288 omii 9 0 268m 268m 14m S 0.0 13.3 0:20.96 java #after 41st iteration 22288 omii 9 0 271m 271m 14m S 0.0 13.4 0:20.96 java #after 42nd iteration 22288 omii 9 0 272m 272m 14m S 0.0 13.5 0:20.96 java #after 43rd iteration 22288 omii 9 0 274m 274m 14m S 0.0 13.6 0:20.96 java #after 44th iteration 22288 omii 9 0 275m 275m 14m S 0.0 13.7 0:20.96 java #after 45th iteration 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java # after 46th iteration 22288 omii 9 0 277m 276m 14m S 0.0 13.7 0:20.96 java #after 47th iteration 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java #after 48th iteration 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java #after 49th iteration 22288 omii 9 0 277m 277m 14m S 0.0 13.7 0:20.96 java #after 50th iteration 22288 omii 9 0 278m 278m 14m S 0.0 13.8 0:20.96 java #after 51st iteration 22288 
omii 9 0 280m 280m 14m S 0.0 13.9 0:20.96 java #after 52nd iteration 22288 omii 9 0 281m 281m 14m S 0.0 14.0 0:20.96 java #after 53rd iteration 22288 omii 9 0 284m 284m 14m S 0.0 14.1 0:20.96 java #after 54th iteration 22288 omii 9 0 285m 285m 14m S 0.0 14.1 0:20.96 java #after 55th iteration. |
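[Editor's note] The heap increase described above is usually applied via the CATALINA_OPTS environment variable, which Tomcat's catalina.sh startup script reads. A sketch (the 400m value matches the figure quoted in this message; the -Xms value here is an assumption, tune both for your machine):

```shell
# Raise the Tomcat JVM heap before starting Tomcat.
export CATALINA_OPTS="-Xms128m -Xmx400m"
echo "Tomcat will start with: $CATALINA_OPTS"
```

Export this in the environment of the user who runs Tomcat (or in a setenv/startup wrapper) before invoking the startup script.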
|
From: Garry S. <gar...@po...> - 2006-05-04 14:47:47
|
Hi William,

Are you seeing this behaviour in your latest version of GridSAM?

GridSAM 1.1.0 rc1, OMII 2.3.0, Java 1.5, default Java heap size.
Server: uname -a
Linux holly 2.4.32 #1 SMP Fri Nov 25 14:41:06 GMT 2005 i686 GNU/Linux
Dual processor, 2GB physical memory.

I am running into a problem with GridSAM and multiple concurrent job
submission. The JSDL for the following tests is attached (notice there
are no data staging elements; when they are included the situation is
worse: I get OutOfMemory exceptions).

The command (jobCommand):

kryten:~/jsdl/1.1.0-rc1> ~/bin/OMIICLIENT3_2_0/OMIICLIENT/gridsam/bin/gridsam-submit -s http://holly.dsg.port.ac.uk:8088/gridsam/services/gridsam?WSDL helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl helloworld.jsdl

GridSAM configured to use fork: I can submit the jobCommand successfully
a maximum of 5 times to the same GridSAM instance before GridSAM refuses
to accept any further connections. Before submitting each jobCommand I
wait for the previous job instances to complete by monitoring
gridsam.log. The speed at which GridSAM processes subsequent iterations
gets slower with each iteration.
On the sixth attempt I get the following error:

2006-05-02 09:38:59,923 FATAL [GridSAMSubmit] (main:) unable to submit job: failed to submit job: WSDoAllReceiver: cannot convert into document; nested exception is: org.xml.sax.SAXException: Bad envelope tag: html

There are no exceptions in catalina.out, and gridsam.log shows the end of the previous successful submission, e.g.:

2006-05-02 10:44:09,805 INFO [144516830af4709f010af4785bd7025a] deleting working directory: /tmp/gridsam-144516830af4709f010af4785bd7025a
2006-05-02 10:44:09,821 INFO [144516830af4709f010af47866ed025f] deleting working directory: /tmp/gridsam-144516830af4709f010af47866ed025f

In order to get GridSAM to accept jobs again I stop/start Tomcat.

Observing memory usage using top:

PID   USER PR NI VIRT RES  SHR S %CPU %MEM TIME+   COMMAND
17600 omii  9  0 110m 109m 12m S  0.0  6.0 0:20.86 java  # tomcat started
17600 omii  9  0 120m 120m 13m S  0.0  6.0 0:20.86 java  # after first iteration
17600 omii  9  0 122m 122m 14m S  0.0  6.0 0:20.86 java  # after second iteration
17600 omii  9  0 120m 120m 13m S  0.0  6.0 0:20.86 java  # after third iteration
17600 omii  9  0 120m 120m 13m S  0.0  6.0 0:20.86 java  # after fourth iteration
17600 omii  9  0 121m 121m 13m S  0.0  6.0 0:20.86 java  # after fifth iteration

The sixth iteration fails consistently with the usual failure message:

2006-05-04 12:20:27,255 FATAL [GridSAMSubmit] (main:) unable to submit job: failed to submit job: WSDoAllReceiver: cannot convert into document; nested exception is: org.xml.sax.SAXException: Bad envelope tag: html

The GridSAM server memory usage after this error message is read at the client is:

17600 omii  9  0 122m 122m 14m S  0.0  6.1 0:20.86 java

Thanks in advance.

regards
Garry |
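[Editor's note] The per-iteration figures in this thread were read off interactively from top; a small helper along these lines makes the same measurement scriptable between submission batches (a sketch; the pgrep pattern in the comment is an assumption about how Tomcat appears in your process list):

```shell
# Print a process's resident set size (RSS) in whole megabytes.
rss_mb() {
  ps -o rss= -p "$1" | awk '{printf "%d\n", $1/1024}'
}

# Example: measure the current shell. For Tomcat you might instead use
#   PID=$(pgrep -f org.apache.catalina.startup.Bootstrap)
# and call rss_mb "$PID" after each batch of gridsam-submit calls.
rss_mb $$
```

Logging this value after each iteration gives a plain-text series that is easy to diff across GridSAM versions or job managers.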
|
From: Liang C. <B....@cs...> - 2006-04-28 11:57:14
|
Hi William,
I got the following error when querying the status of a job. In fact, it
doesn't look like something complicated at all, but confusing enough at
the moment.
Here it is:
15 omii@trout1% gridsam-status -s
http://HOST:18080/gridsam/services/gridsam?WSDL
urn:gridsam:00908c5f0add0555010adfd5d85b5786
2006-04-28 11:48:25,352 FATAL [GridSAMStatus] (main:) unable to retrieve
job status: failed to retrieve status of job:
java.lang.IllegalArgumentException: Adding text to an XML document must
not be null
The "IllegalArgumentException" is thrown directly by dom4j when the
text to be set is null. The code segment looks like:
public Text createText(String text) {
    if (text == null) {
        String msg = "Adding text to an XML document must not be null";
        throw new IllegalArgumentException(msg);
    }
    return new DefaultText(text);
}
And then back to the GridSAM (1.0) source code, the place where the text is
set is GridSAMClientSupport.AxisBaseImpl.getJobStatus:

Document xBodyDoc = DocumentFactory.getInstance().createDocument();
xBodyDoc.addElement(
        DocumentFactory.getInstance().createQName("getJobStatus",
            GridSAMSupport.GRIDSAM_NAMESPACE))
    .addElement(
        DocumentFactory.getInstance().createQName("JobIdentifier",
            GridSAMSupport.GRIDSAM_NAMESPACE))
    .addElement(
        DocumentFactory.getInstance().createQName("ID",
            GridSAMSupport.GRIDSAM_NAMESPACE))
    .setText(pJobID.trim());
The text is set in the last line. However, isn't pJobID the ID taken from
the command line? How could it possibly be null?
Or am I looking in the wrong place?
Regards,
Ben
University College London   Office: 7.06
Dept. of Computer Science   Tel: +44 (0)20 7679 0370 (Direct Dial)
Gower Street                Internal: 30370
London WC1E 6BT             Fax: +44 (0)20 7387 1397
United Kingdom              Email: B....@cs...
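The dom4j contract quoted above can be reproduced with a small guard. This is only a sketch (JobIdGuard and requireJobId are hypothetical names, not GridSAM code): rejecting a null or empty ID before the chained setText() call turns the opaque IllegalArgumentException into a descriptive failure at the point where the ID goes missing.

```java
// Hypothetical guard mirroring the dom4j behaviour described above:
// DocumentFactory.createText(null) throws IllegalArgumentException, so a
// null or empty job ID should be rejected before the request is built.
public class JobIdGuard {

    static String requireJobId(String pJobID) {
        if (pJobID == null || pJobID.trim().isEmpty()) {
            throw new IllegalArgumentException(
                "job ID must not be null or empty");
        }
        return pJobID.trim();
    }

    public static void main(String[] args) {
        // A well-formed ID passes through trimmed.
        System.out.println(requireJobId(" urn:gridsam:00908c5f0add0555010adfd5d85b5786 "));
    }
}
```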
|
|
From: G.T. C. <gt...@ca...> - 2006-04-07 15:33:03
|
Dear William
the problem has been clarified. It is because our Condor pool cannot
handle relative paths for input files. Since I changed all the input files
to be in the same directory, it works fine in this situation. Basically,
GridSAM is working fine right now; we will fix the Condor problem later!
thank you very much for your help!
Best regards!
gen-tao
On Apr 7 2006, William Lee wrote:
>According to the Condor documentation, the transfer_output_files
>attribute is best to be left off. That's the reason why it's not in
>the classad.groovy script in the first place. If your condor system
>handle file output staging correctly, the code in the classad.groovy
>file that generates the "transfer_output_files" attribute can be
>commented out.
>
>William
>
>On 7 Apr 2006, at 10:11, G.T. Chiang wrote:
>
>> Dear William
>>
>> thank you very much! This version works! Right now I can
>> see .condor.script on the condor submitter. I have tested one of
>> the test jobs; the JSDL is like the following:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>> <JobDescription>
>> <Application>
>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/
>> 2005/06/jsdl-posix">
>> <Executable>/bin/echo</Executable>
>> <Argument>hello hihi</Argument>
>> <Output>stdout.txt</Output>
>> </POSIXApplication>
>> </Application>
>> <DataStaging>
>> <FileName>unknown-file.txt</FileName>
>> <CreationFlag>overwrite</CreationFlag>
>> <DeleteOnTermination>true</DeleteOnTermination>
>> <Target>
>> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/
>> output/subdir/output-file.txt</URI>
>> </Target>
>> </DataStaging>
>> </JobDescription>
>> </JobDefinition>
>>
>>
>> the condor.script is like the following:
>> universe=vanilla
>> when_to_transfer_output=ON_EXIT
>> should_transfer_files=IF_NEEDED
>> should_transfer_files=YES
>> notification=Never
>> log=/tmp/condor.log
>>
>>
>> executable=/bin/echo
>> arguments=hello gen-tao
>> output=stdout.txt
>>
>>
>>
>> transfer_output_files=unknown-file.txt
>> queue
>>
>>
>> however, these jobs stay in idle status in Condor. However, if I
>> remove the transfer_output_files=unknown-file.txt line, the job can be
>> executed. I have seen a similar problem with GT4 and Condor.
>> It seems to be a Condor problem. Is it related to the file system?
>> Sorry; I think GridSAM is working now! I just need to figure out
>> what's wrong in our Condor pool!
>>
>> thank you very much!!!
>>
>> Best Regard!
>> gen-tao
>>
>>
>>
>>
>>
>>
>>
>> On Apr 6 2006, William Lee wrote:
>>
>>> Thanks. The classad.groovy copy I have put up is for later
>>> version that is not yet bundled with OMII.
>>>
>>> I have an updated copy for OMII 2.3.3 at the same location.
>>> Please try that instead.
>>>
>>> William
>>>
>>>
>>> On 6 Apr 2006, at 15:06, OMII Support wrote:
>>>
>>>> When replying, type your text above this line. Notification of
>>>> Query Change
>>>>
>>>>
>>>> Priority: Normal Status: Agent Replied
>>>> Creation Date: 03/04/2006 Creation Time: 13:30:47
>>>> Created By: ge...@ni...
>>>>
>>>> Click here to view Query in Browser
>>>>
>>>> Description:
>>>> Entered on 06/04/2006 at 15:06:02 by William Lee (GridSAM):
>>>> May I ask which version of GridSAM you are using? If it's from the
>>>> OMII bundle, which OMII version?
>>>>
>>>> William
>>>>
>>>> On 6 Apr 2006, at 14:58, G.T. Chiang wrote:
>>>>
>>>> > Dear William
>>>> >
>>>> > thank you very much!! This version is getting better; the
>>>> > following are the gridsam-status results. At least the job is being
>>>> > processed via Condor, but somehow it fails.
>>>> >
>>>> > [root@agorilla examples]# gridsam-status -s "http://
>>>> > agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?
>>>> wsdl"
>>>> > urn:gridsam:006868a90a6f50f8010a6f796cea0021 Job Progress:
>>>> pending -
>>>> > > staging-in -> staged-in -> active -> failed
>>>> >
>>>> > --- pending - 2006-04-06 14:52:09.0 ---
>>>> > job is being scheduled
>>>> > --- staging-in - 2006-04-06 14:52:09.0 ---
>>>> > staging files...
>>>> > --- staged-in - 2006-04-06 14:52:09.0 ---
>>>> > 2 files staged in
>>>> > --- active - 2006-04-06 14:52:09.0 ---
>>>> > job is being launched through condor
>>>> > --- failed - 2006-04-06 14:52:09.0 ---
>>>> > expecting job property urn:condor:classad from previous stage
>>>> >
>>>> > --------------
>>>> > Job Properties
>>>> > --------------
>>>> > urn:gridsam:Description=cat job description
>>>> > urn:gridsam:JobProject=gridsam project
>>>> > urn:gridsam:JobAnnotation=no annotation
>>>> > urn:gridsam:JobName=cat job
>>>> > [root@agorilla examples]#
>>>> >
>>>> >
>>>> > the following is the log from gridsam.log
>>>> >
>>>> > 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021]
>>>> > state {pending} reached 2006-04-06 14:52:09,574 INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] initialised working
>>>> directory: /
>>>> > tmp/gridsam-006868a90a6f50f8010a6f796cea0021 2006-04-06
>>>> > 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state
>>>> {staging-
>>>> > in} reached 2006-04-06 14:52:09,662 INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] staging (copy) file http://
>>>> > www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp://
>>>> > gri...@ce.../tmp/
>>>> > gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt 2006-04-06
>>>> > 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/
>>>> file1.txt
>>>> > staged 2006-04-06 14:52:09,791 INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] staging (copy) file ftp://
>>>> > anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt ->
>>>> > sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/
>>>> > gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt
>>>> > 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021]
>>>> > dir2/subdir1/file2.txt staged 2006-04-06 14:52:09,843 INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached
>>>> > 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021]
>>>> > executing groovy script classad.groovy 2006-04-06 14:52:09,871
>>>> INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] executed groovy script
>>>> > classad.groovy 2006-04-06 14:52:09,898 INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] state {active} reached
>>>> > 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021]
>>>> > Failed to submit condor job: expecting job property
>>>> > urn:condor:classad from previous stage 2006-04-06 14:52:09,903
>>>> INFO
>>>> > [006868a90a6f50f8010a6f796cea0021] state {failed} reached
>>>> > 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021]
>>>> > failed 1144331529450 1144331529607 1144331529843 1144331529898
>>>> > 1144331529903
>>>> >
>>>> >
>>>> > Is it possible to obtain the Condor job description file
>>>> > converted from the GridSAM JSDL? Can I try to submit it to Condor
>>>> > directly?
>>>> >
>>>> > Best Regard!
>>>> > gen-tao
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >
>>>> > On Apr 6 2006, William Lee wrote:
>>>> >
>>>> >>
>>>> >> Please try the classad.groovy script at this location. It
>>>> >> incorporates a solution that sets up the transfer_input_files and
>>>> >> the transfer_output_files classad attributes in the JSDL-to-
>>>> >> Classad translation. This is needed if the submission node (the
>>>> >> node which GridSAM is running) does not necessarily share a
>>>> >> common file system with the execution nodes.
>>>> >>
>>>> >> http://www.doc.ic.ac.uk/~wwhl/classad.groovy
>>>> >>
>>>> >> William
>>>> >>
>>>> >> On 4 Apr 2006, at 15:15, OMII Support wrote:
>>>> >>
>>>> >>> [Duplicate message snipped]
>>>>
>>>> Entered on 06/04/2006 at 15:00:02 by gt...@ca...:
>>>> [Duplicate message snipped]
>>>>
>>>> Entered on 06/04/2006 at 13:57:02 by William Lee (GridSAM):
>>>> [Duplicate message snipped]
>>>>
>>>> Entered on 04/04/2006 at 15:15:01 by gt...@ca...:
>>>> Dear William
>>>>
>>>> thank you so much!! I modified classad.groovy, adding your code.
>>>> Now the problem becomes 'undefined' and the job cannot be submitted
>>>> to Condor.
>>>> [root@agorilla examples]# gridsam-status -s
>>>> "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/
>>>> gridsam?wsdl"
>>>> urn:gridsam:006868a90a64591e010a653a7f1a0013 Job Progress:
>>>> pending ->
>>>> staging-in -> staged-in -> undefined
>>>>
>>>> --- pending - 2006-04-04 15:07:13.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-04 15:07:13.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-04 15:07:13.0 ---
>>>> 2 files staged in
>>>> --- undefined - 2006-04-04 15:07:13.0 ---
>>>> cannot advance from 'staged-in' to 'done'
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:condor:purestaging=true
>>>> [root@agorilla examples]#
>>>>
>>>> thank you for any suggestion!!
>>>>
>>>> Best Regard!
>>>> gen-tao
>>>>
>>>> On Apr 4 2006, OMII Support wrote:
>>>>
>>>> >[Duplicate message snipped]
>>>>
>>>> Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM):
>>>> Hi Gen Tao,
>>>>
>>>> You are right, according to the condor setup, you would have to
>>>> modify the classad.groovy script to enable the transfer_input_files
>>>> and transfer_output_files classad attributes. This only applies to
>>>> condor setup that does not share a common networked file system.
>>>>
>>>> The code to add to the classad.groovy is
>>>>
>>>> jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/jsdl:DataStaging",
>>>>         ns).eachWithIndex(){ node, index ->
>>>>     if(index == 0){
>>>>         script += "transfer_input_files="
>>>>     }
>>>>     if(!node.select("jsdl:Source", ns).isEmpty()){
>>>>         fileName = node.select("jsdl:FileName")[0].text;
>>>>         script += "${fileName} ,"
>>>>     }
>>>> }
>>>>
>>>> I haven't been able to test the code above. Feel free to make any
>>>> modification as you see fit.
>>>>
>>>> William
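The Groovy fragment above appends "${fileName} ," for every staged source, which leaves a trailing comma and emits the transfer_input_files= prefix even when no DataStaging element has a Source. One way to avoid both issues, sketched here in Java rather than Groovy (the class and method names are illustrative, not part of classad.groovy), is to collect the file names first and join them:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: build the transfer_input_files classad attribute by collecting
// the staged-in file names and joining them with commas, which avoids the
// trailing comma left by appending "name ," once per file.
public class TransferInputFiles {

    static String buildAttribute(List<String> stagedInFiles) {
        if (stagedInFiles.isEmpty()) {
            return ""; // omit the attribute entirely when nothing stages in
        }
        return "transfer_input_files=" + String.join(",", stagedInFiles);
    }

    public static void main(String[] args) {
        List<String> files = new ArrayList<>();
        files.add("dir1/file1.txt");
        files.add("dir2/subdir1/file2.txt");
        System.out.println(buildAttribute(files));
        // prints: transfer_input_files=dir1/file1.txt,dir2/subdir1/file2.txt
    }
}
```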
>>>>
>>>> On 3 Apr 2006, at 20:09, OMII Support wrote:
>>>>
>>>> > [Duplicate message snipped]
>>>>
>>>> Entered on 03/04/2006 at 20:09:02 by gt...@ca...:
>>>> Dear Sir
>>>>
>>>> the results are as follows:
>>>> [condor@badger1--niees--group jobs]$ less stderr.txt
>>>> condor_exec.exe: dir1/file1.txt: No such file or directory
>>>> condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory
>>>> [condor@badger1--niees--group jobs]$
>>>>
>>>> those files had been staged to the central manager, but not to the
>>>> executing node. Sorry, our central manager is not configured to run
>>>> jobs; the central manager therefore submits jobs to other machines.
>>>> That's why, when I run this Condor job on the executing node, it
>>>> cannot find the related files. Is this normal? Should the central
>>>> manager copy those files to the other worker nodes as well? Should I
>>>> change something in classad.groovy?
>>>>
>>>> thank you very much!!
>>>>
>>>> gen-tao
>>>>
>>>> On Apr 3 2006, OMII Support wrote:
>>>>
>>>> >[Duplicate message snipped]
>>>>
>>>> Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM):
>>>> It's not apparent where the problem lies. Condor has reported to
>>>> GridSAM that the job terminated normally with exit code 1. Hence the
>>>> description shown in the EXECUTED state.
>>>>
>>>> Can you try running a condor job with the following classad
>>>> directly?
>>>>
>>>> universe=vanilla
>>>> when_to_transfer_output=ON_EXIT
>>>> should_transfer_files=IF_NEEDED
>>>> notification=Never
>>>> log=/tmp/condor.log
>>>> executable=/bin/cat
>>>> arguments=dir1/file1.txt dir2/subdir1/file2.txt
>>>> output=stdout.txt
>>>> error=stderr.txt
>>>>
>>>> queue
>>>>
>>>> Entered on 03/04/2006 at 13:30:47 by ge...@ni...:
>>>> Dear Sir
>>>>
>>>> I am trying to run some GridSAM test programs. However, it
>>>> seems the jobs cannot be executed in our Condor pool. The
>>>> Condor pool is working: the job can be submitted to the
>>>> condor submitter and runs on a Condor node, but then fails.
>>>>
>>>> the following is some information!
>>>>
>>>> this is the modified cat-staging.jsdl:
>>>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>>>> <JobDescription>
>>>> <JobIdentification>
>>>> <JobName>cat job</JobName>
>>>> <Description>cat job description</Description>
>>>> <JobAnnotation>no annotation</JobAnnotation>
>>>> <JobProject>gridsam project</JobProject>
>>>> </JobIdentification>
>>>> <Application>
>>>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/
>>>> jsdl- posix">
>>>> <Executable>/bin/cat</Executable>
>>>> <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument>
>>>> <Output>stdout.txt</Output>
>>>> <Error>stderr.txt</Error>
>>>> </POSIXApplication>
>>>> </Application>
>>>> <DataStaging>
>>>> <FileName>dir1/file1.txt</FileName>
>>>> <CreationFlag >overwrite</CreationFlag>
>>>> <Source>
>>>> <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI>
>>>> </Source>
>>>> </DataStaging>
>>>> <DataStaging>
>>>> <FileName>dir2/subdir1/file2.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <Source>
>>>> <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input-
>>>> file.txt</URI>
>>>> </Source>
>>>> </DataStaging>
>>>> <DataStaging>
>>>> <FileName>stdout.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <DeleteOnTermination>true</DeleteOnTermination>
>>>> <Target>
>>>> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/
>>>> stdout.txt</URI>
>>>> </Target>
>>>> </DataStaging>
>>>> </JobDescription>
>>>> </JobDefinition>
>>>>
>>>> after submit this file
>>>>
>>>> [root@agorilla examples]# gridsam-status -s "http://
>>>> agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?
>>>> wsdl" urn:gridsam:006868a90a4d221e010a5fb493650117
>>>> Job Progress: pending -> staging-in -> staged-in -> active ->
>>>> executed -> staging-out -> staged-out -> done
>>>>
>>>> --- pending - 2006-04-03 13:22:50.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-03 13:22:50.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-03 13:22:59.0 ---
>>>> 2 files staged in
>>>> --- active - 2006-04-03 13:22:59.0 ---
>>>> job is being launched through condor
>>>> --- executed - 2006-04-03 13:23:04.0 ---
>>>> 04/03 13:23:52 Job terminated. (1) Normal termination (return
>>>> value 1) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr
>>>> 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys
>>>> 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00
>>>> - Total Local Usage 126 - Run Bytes Sent By Job 15992 - Run
>>>> Bytes Received By Job 126 - Total Bytes Sent By Job 15992 -
>>>> Total Bytes Received By Job
>>>> --- staging-out - 2006-04-03 13:23:04.0 ---
>>>> staging files out...
>>>> --- staged-out - 2006-04-03 13:23:04.0 ---
>>>> 1 files staged out
>>>> --- done - 2006-04-03 13:23:04.0 ---
>>>> Job completed
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:gridsam:Description=cat job description
>>>> urn:gridsam:JobProject=gridsam project
>>>> urn:gridsam:JobAnnotation=no annotation
>>>> urn:gridsam:JobName=cat job
>>>> urn:condor:classad=universe=vanilla
>>>> when_to_transfer_output=ON_EXIT
>>>> should_transfer_files=IF_NEEDED
>>>> notification=Never
>>>> log=/tmp/condor.log
>>>>
>>>> executable=/bin/cat
>>>> arguments=dir1/file1.txt dir2/subdir1/file2.txt
>>>> output=stdout.txt
>>>>
>>>> error=stderr.txt
>>>>
>>>> queue
>>>> urn:condor:clusterid=191
>>>> urn:gridsam:exitcode=1
>>>> [root@agorilla examples]#
>>>>
>>>> if i go the the executing node and the log indicates the following
>>>> 4/3 13:23:47 DaemonCore: Command received via UDP from host
>>>> <172.24.89.61:9632>
>>>> 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO),
>>>> calling handler (command_match_info)
>>>> 4/3 13:23:47 vm1: match_info called
>>>> 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674
>>>> 4/3 13:23:47 vm1: State change: match notification protocol
>>>> successful
>>>> 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched
>>>> 4/3 13:23:47 DaemonCore: Command received via TCP from host
>>>> <172.24.89.61:9693>
>>>> 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM),
>>>> calling handler (command_request_claim)
>>>> 4/3 13:23:47 vm1: Request accepted.
>>>> 4/3 13:23:47 vm1: Remote owner is
>>>> gri...@ce...
>>>> 4/3 13:23:47 vm1: State change: claiming protocol successful
>>>> 4/3 13:23:47 vm1: Changing state: Matched -> Claimed
>>>> 4/3 13:23:50 DaemonCore: Command received via TCP from host
>>>> <172.24.89.61:9669>
>>>> 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM),
>>>> calling handler (command_activate_claim)
>>>> 4/3 13:23:50 vm1: Got activate_claim request from shadow
>>>> (<172.24.89.61:9669>)
>>>> 4/3 13:23:50 vm1: Remote job ID is 191.0
>>>> 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad
>>>> 4/3 13:23:50 vm1: State change: claim-activation protocol successful
>>>> 4/3 13:23:50 vm1: Changing activity: Idle -> Busy
>>>> 4/3 13:23:51 DaemonCore: Command received via TCP from host
>>>> <172.24.89.61:9652>
>>>> 4/3 13:23:51 DaemonCore: received command 404
>>>> (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
>>>> 4/3 13:23:51 vm1: Called deactivate_claim_forcibly()
>>>> 4/3 13:23:51 Starter pid 31148 exited with status 0
>>>> 4/3 13:23:51 vm1: State change: starter exited
>>>> 4/3 13:23:51 vm1: Changing activity: Busy -> Idle
>>>> 4/3 13:23:52 DaemonCore: Command received via UDP from host
>>>> <172.24.89.61:9620>
>>>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM),
>>>> calling handler (command_handler)
>>>> 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command
>>>> 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle ->
>>>> Preempting/Vacating
>>>> 4/3 13:23:52 vm1: State change: No preempting claim, returning
>>>> to owner
>>>> 4/3 13:23:52 vm1: Changing state and activity: Preempting/
>>>> Vacating - > Owner/Idle
>>>> 4/3 13:23:52 vm1: State change: IS_OWNER is false
>>>> 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed
>>>> 4/3 13:23:52 DaemonCore: Command received via UDP from host
>>>> <172.24.89.61:9675>
>>>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM),
>>>> calling handler (command_handler)
>>>> 4/3 13:23:52 Error: can't find resource with capability
>>>> (<172.24.89.1:9666>#7928521674)
>>>>
>>>> It seems the job can be submitted to the Condor central manager and
>>>> executed on a node, but is then terminated for an unknown reason. It
>>>> is fine to run jobs either from condor_submit or from Globus. I am
>>>> not sure whether this is due to our Condor settings, GridSAM, or the
>>>> OMII level.
>>>>
>>>> BTW, the file staging seems OK: at /tmp/gridsam....../dir/ , the
>>>> virtual files are there.
>>>>
>>>> Sorry, I am not sure whether I should ask GridSAM questions here; I
>>>> am not even sure whether this is a GridSAM, OMII, or Condor problem,
>>>> because when I run the PBAC test, it fails again. It was working just
>>>> after reinstalling the OMII server.
>>>>
>>>> thank you so much for any assistance.
>>>>
>>>> Best Regards!
>>>> gen-tao
>>>>
>>>> Current Assignees: Steve McGough (GridSAM), William Lee
>>>> (GridSAM), Steven Newhouse
>>>>
>>>> CC(s):
>>>>
>>>> Contact Information:
>>>>
>>>> Customer Name: Gen-Tao Chiang Email address: ge...@ni...
>>>> Organisation: NIEeS Secondary email address: gt...@ca...
>>>>
>>>>
>>>
>>> --- William Lee - Software Coordinator ---
>>> --- London e-Science Centre, Imperial College London ---
>>> A: Room 211a, London e-Science Centre, William Penney Laboratory,
>>> Imperial College London, South Kensington, London, SW7 2AZ, UK
>>> E: wwhl at doc.ic.ac.uk | william at imageunion.com
>>> W: www.lesc.ic.ac.uk | www.imageunion.com
>>> P: +44 (0) 207 594 8185
>>>
>>>
>>>
>
>--- William Lee - Software Coordinator ---
>--- London e-Science Centre, Imperial College London ---
>A: Room 211a, London e-Science Centre, William Penney Laboratory,
>Imperial College London, South Kensington, London, SW7 2AZ, UK
>E: wwhl at doc.ic.ac.uk | william at imageunion.com
>W: www.lesc.ic.ac.uk | www.imageunion.com
>P: +44 (0) 207 594 8185
>
>
>
|
|
From: G.T. C. <gt...@ca...> - 2006-04-07 13:17:24
|
Dear William
we did another test!! It seems that running a job and copying the output
back is fine. However, if a job needs stage-in, Condor cannot find where
to read the input data. In the example cat-staging.jsdl, the input files
are given as <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument>.
However, it seems the condor submitter does not know how to find this
virtual directory (physically located under /tmp/gridsam-......./dir) to
read the input files. Is this due to the configuration in classad.groovy
as well?
thank you so much!!
gen-tao
On Apr 7 2006, William Lee wrote:
>According to the Condor documentation, the transfer_output_files
>attribute is best to be left off. That's the reason why it's not in
>the classad.groovy script in the first place. If your condor system
>handle file output staging correctly, the code in the classad.groovy
>file that generates the "transfer_output_files" attribute can be
>commented out.
>
>William
>
>On 7 Apr 2006, at 10:11, G.T. Chiang wrote:
>
>> [Duplicate message snipped]
>>
>> On Apr 6 2006, William Lee wrote:
>>
>>> Thanks. The classad.groovy copy I have put up is for later
>>> version that is not yet bundled with OMII.
>>>
>>> I have an updated copy for OMII 2.3.3 at the same location.
>>> Please try that instead.
>>>
>>> William
>>>
>>>
>>> On 6 Apr 2006, at 15:06, OMII Support wrote:
>>>
>>>> When replying, type your text above this line. Notification of
>>>> Query Change
>>>>
>>>>
>>>> Priority: Normal Status: Agent Replied
>>>> Creation Date: 03/04/2006 Creation Time: 13:30:47
>>>> Created By: ge...@ni...
>>>>
>>>> Click here to view Query in Browser
>>>>
>>>> Description:
>>>> Entered on 06/04/2006 at 15:06:02 by William Lee (GridSAM):
>>>> May I ask which version of GridSAM you are using? If it's from the
>>>> OMII bundle, which OMII version?
>>>>
>>>> William
>>>>
>>>> On 6 Apr 2006, at 14:58, G.T. Chiang wrote:
>>>>
>>>> > [Duplicate message snipped]
>>>>
>>>> Entered on 06/04/2006 at 15:00:02 by gt...@ca...:
>>>> Dear William
>>>>
>>>> thank you very much!! this version is getting better; the following is
>>>> the gridsam-status results. at least the job is being processed via
>>>> condor, but somehow it fails.
>>>>
>>>> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a6f50f8010a6f796cea0021
>>>> Job Progress: pending -> staging-in -> staged-in -> active -> failed
>>>>
>>>> --- pending - 2006-04-06 14:52:09.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-06 14:52:09.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-06 14:52:09.0 ---
>>>> 2 files staged in
>>>> --- active - 2006-04-06 14:52:09.0 ---
>>>> job is being launched through condor
>>>> --- failed - 2006-04-06 14:52:09.0 ---
>>>> expecting job property urn:condor:classad from previous stage
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:gridsam:Description=cat job description
>>>> urn:gridsam:JobProject=gridsam project
>>>> urn:gridsam:JobAnnotation=no annotation
>>>> urn:gridsam:JobName=cat job
>>>> [root@agorilla examples]#
>>>>
>>>> the following is the log from gridsam.log
>>>>
>>>> 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] state {pending} reached
>>>> 2006-04-06 14:52:09,574 INFO [006868a90a6f50f8010a6f796cea0021] initialised working directory: /tmp/gridsam-006868a90a6f50f8010a6f796cea0021
>>>> 2006-04-06 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state {staging-in} reached
>>>> 2006-04-06 14:52:09,662 INFO [006868a90a6f50f8010a6f796cea0021] staging (copy) file http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt
>>>> 2006-04-06 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/file1.txt staged
>>>> 2006-04-06 14:52:09,791 INFO [006868a90a6f50f8010a6f796cea0021] staging (copy) file ftp://anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt -> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt
>>>> 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] dir2/subdir1/file2.txt staged
>>>> 2006-04-06 14:52:09,843 INFO [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached
>>>> 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] executing groovy script classad.groovy
>>>> 2006-04-06 14:52:09,871 INFO [006868a90a6f50f8010a6f796cea0021] executed groovy script classad.groovy
>>>> 2006-04-06 14:52:09,898 INFO [006868a90a6f50f8010a6f796cea0021] state {active} reached
>>>> 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021] Failed to submit condor job: expecting job property urn:condor:classad from previous stage
>>>> 2006-04-06 14:52:09,903 INFO [006868a90a6f50f8010a6f796cea0021] state {failed} reached
>>>> 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] failed 1144331529450 1144331529607 1144331529843 1144331529898 1144331529903
>>>>
>>>> is it possible to obtain the condor job description file which was
>>>> converted from the gridsam JSDL? can I try to submit it to condor directly?
>>>>
>>>> Best Regard!
>>>> gen-tao
>>>>
>>>> On Apr 6 2006, William Lee wrote:
>>>>
>>>> > [Duplicate message snipped]
>>>>
>>>> Entered on 06/04/2006 at 13:57:02 by William Lee (GridSAM):
>>>> Please try the classad.groovy script at this location. It
>>>> incorporates a solution that sets up the transfer_input_files and
>>>> the
>>>> transfer_output_files classad attributes in the JSDL-to-Classad
>>>> translation. This is needed if the submission node (the node which
>>>> GridSAM is running) does not necessarily share a common file system
>>>> with the execution nodes.
>>>>
>>>> http://www.doc.ic.ac.uk/~wwhl/classad.groovy
>>>>
>>>> William
>>>>
>>>> On 4 Apr 2006, at 15:15, OMII Support wrote:
>>>>
>>>> > [Duplicate message snipped]
>>>>
>>>> Entered on 04/04/2006 at 15:15:01 by gt...@ca...:
>>>> Dear William
>>>>
>>>> thank you so much!! i modified the classad.groovy by adding your code.
>>>> now the problem becomes undefined and the job can not be submitted to condor.
>>>> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a64591e010a653a7f1a0013
>>>> Job Progress: pending -> staging-in -> staged-in -> undefined
>>>>
>>>> --- pending - 2006-04-04 15:07:13.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-04 15:07:13.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-04 15:07:13.0 ---
>>>> 2 files staged in
>>>> --- undefined - 2006-04-04 15:07:13.0 ---
>>>> cannot advance from 'staged-in' to 'done'
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:condor:purestaging=true
>>>> [root@agorilla examples]#
>>>>
>>>> thank you for any suggestion!!
>>>>
>>>> Best Regard!
>>>> gen-tao
>>>>
>>>> On Apr 4 2006, OMII Support wrote:
>>>>
>>>> >[Duplicate message snipped]
>>>>
>>>> Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM):
>>>> Hi Gen Tao,
>>>>
>>>> You are right: depending on the condor setup, you would have to
>>>> modify the classad.groovy script to enable the transfer_input_files
>>>> and transfer_output_files classad attributes. This only applies to
>>>> condor setups that do not share a common networked file system.
>>>>
>>>> The code to add to the classad.groovy is
>>>>
>>>> jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/jsdl:DataStaging", ns).eachWithIndex(){
>>>>     node, index ->
>>>>     if(index == 0){
>>>>         script += "transfer_input_files="
>>>>     }
>>>>     if(!node.select("jsdl:Source", ns).isEmpty()){
>>>>         // the namespace map must be passed here as well
>>>>         fileName = node.select("jsdl:FileName", ns)[0].text;
>>>>         script += "${fileName},"
>>>>     }
>>>> }
>>>> // remove the trailing comma left behind by the loop
>>>> if(script.endsWith(",")){ script = script[0..-2] + "\n" }
>>>>
>>>> I haven't been able to test the code above. Feel free to make any
>>>> modification as you see fit.
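[Editor's note] The input-file selection that the Groovy fragment above performs (collect every DataStaging FileName that has a Source, and emit a Condor transfer_input_files attribute) can be sketched in a self-contained, testable form. The Python below is a hypothetical illustration, not GridSAM code; the function name and namespace constant are mine, while the element names follow the JSDL documents quoted in this ticket.

```python
import xml.etree.ElementTree as ET

# JSDL namespace used by the documents quoted in this thread
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/06/jsdl"

def transfer_input_files_line(jsdl_text):
    """Return a Condor 'transfer_input_files=' attribute listing every
    DataStaging FileName that has a Source element (i.e. an input file),
    or an empty string when the JSDL stages no inputs."""
    root = ET.fromstring(jsdl_text)
    names = [
        ds.findtext("{%s}FileName" % JSDL_NS)
        for ds in root.iter("{%s}DataStaging" % JSDL_NS)
        if ds.find("{%s}Source" % JSDL_NS) is not None
    ]
    return ("transfer_input_files=" + ",".join(names)) if names else ""
```

Against the cat-staging.jsdl quoted later in this ticket, this sketch would list dir1/file1.txt and dir2/subdir1/file2.txt but skip stdout.txt, which only has a Target.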
>>>>
>>>> William
>>>>
>>>> On 3 Apr 2006, at 20:09, OMII Support wrote:
>>>>
>>>> > [Duplicate message snipped]
>>>>
>>>> Entered on 03/04/2006 at 20:09:02 by gt...@ca...:
>>>> Dear Sir
>>>>
>>>> the results are as follows:
>>>> [condor@badger1--niees--group jobs]$ less stderr.txt
>>>> condor_exec.exe: dir1/file1.txt: No such file or directory
>>>> condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory
>>>> [condor@badger1--niees--group jobs]$
>>>>
>>>> those files had been staged to the central manager, but not to the
>>>> executing node. sorry, our central manager is not configured to run jobs;
>>>> thus, the central manager will submit jobs to other machines. that's why,
>>>> when i run this condor job, the executing node can not find the related
>>>> files. is this normal? should the central manager copy those files to the
>>>> other work nodes as well? should i change something in classad.groovy?
>>>>
>>>> thank you very much!!
>>>>
>>>> gen-tao
>>>>
>>>>
>>>> On Apr 3 2006, OMII Support wrote:
>>>>
>>>> >[Duplicate message snipped]
>>>>
>>>> Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM):
>>>> It's not apparent where the problem lies. Condor has reported to
>>>> GridSAM that the job terminated normally with exit code 1; hence the
>>>> description shown in the EXECUTED state.
>>>>
>>>> Can you try running a condor job with the following classad
>>>> directly?
>>>>
>>>> universe=vanilla
>>>> when_to_transfer_output=ON_EXIT
>>>> should_transfer_files=IF_NEEDED
>>>> notification=Never
>>>> log=/tmp/condor.log
>>>> executable=/bin/cat
>>>> arguments=dir1/file1.txt dir2/subdir1/file2.txt
>>>> output=stdout.txt
>>>> error=stderr.txt
>>>>
>>>> queue
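[Editor's note] For readers reproducing this test, the submit description above can also be rendered mechanically from ordered key/value pairs. A minimal, hypothetical helper (not part of GridSAM or Condor; the function name is mine):

```python
def render_submit(attrs):
    """Render ordered (key, value) pairs as a Condor submit description,
    terminated by the bare 'queue' command on its own line."""
    lines = ["%s=%s" % (k, v) for k, v in attrs]
    return "\n".join(lines + ["", "queue"])
```

Writing the result to a file such as cat.sub and running `condor_submit cat.sub` against a working pool reproduces the test suggested here.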
>>>>
>>>> Entered on 03/04/2006 at 13:30:47 by ge...@ni...:
>>>> Dear Sir
>>>>
>>>> i am trying to run some GridSAM testing programs. however, it
>>>> seems the jobs can not be executed in our condor pool. the
>>>> condor pool is working: the job can be submitted to the
>>>> condor_submitter and runs at a condor node, but then fails.
>>>>
>>>> the following are some information!
>>>>
>>>> this is the modified cat-staging.jsdl
>>>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
>>>> <JobDescription>
>>>> <JobIdentification>
>>>> <JobName>cat job</JobName>
>>>> <Description>cat job description</Description>
>>>> <JobAnnotation>no annotation</JobAnnotation>
>>>> <JobProject>gridsam project</JobProject>
>>>> </JobIdentification>
>>>> <Application>
>>>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
>>>> <Executable>/bin/cat</Executable>
>>>> <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument>
>>>> <Output>stdout.txt</Output>
>>>> <Error>stderr.txt</Error>
>>>> </POSIXApplication>
>>>> </Application>
>>>> <DataStaging>
>>>> <FileName>dir1/file1.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <Source>
>>>> <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI>
>>>> </Source>
>>>> </DataStaging>
>>>> <DataStaging>
>>>> <FileName>dir2/subdir1/file2.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <Source>
>>>> <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input-file.txt</URI>
>>>> </Source>
>>>> </DataStaging>
>>>> <DataStaging>
>>>> <FileName>stdout.txt</FileName>
>>>> <CreationFlag>overwrite</CreationFlag>
>>>> <DeleteOnTermination>true</DeleteOnTermination>
>>>> <Target>
>>>> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/stdout.txt</URI>
>>>> </Target>
>>>> </DataStaging>
>>>> </JobDescription>
>>>> </JobDefinition>
>>>>
>>>> after submit this file
>>>>
>>>> [root@agorilla examples]# gridsam-status -s "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" urn:gridsam:006868a90a4d221e010a5fb493650117
>>>> Job Progress: pending -> staging-in -> staged-in -> active ->
>>>> executed -> staging-out -> staged-out -> done
>>>>
>>>> --- pending - 2006-04-03 13:22:50.0 ---
>>>> job is being scheduled
>>>> --- staging-in - 2006-04-03 13:22:50.0 ---
>>>> staging files...
>>>> --- staged-in - 2006-04-03 13:22:59.0 ---
>>>> 2 files staged in
>>>> --- active - 2006-04-03 13:22:59.0 ---
>>>> job is being launched through condor
>>>> --- executed - 2006-04-03 13:23:04.0 ---
>>>> 04/03 13:23:52 Job terminated.
>>>> (1) Normal termination (return value 1)
>>>>     Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage
>>>>     Usr 0 00:00:00, Sys 0 00:00:00 - Run Local Usage
>>>>     Usr 0 00:00:00, Sys 0 00:00:00 - Total Remote Usage
>>>>     Usr 0 00:00:00, Sys 0 00:00:00 - Total Local Usage
>>>> 126 - Run Bytes Sent By Job
>>>> 15992 - Run Bytes Received By Job
>>>> 126 - Total Bytes Sent By Job
>>>> 15992 - Total Bytes Received By Job
>>>> --- staging-out - 2006-04-03 13:23:04.0 ---
>>>> staging files out...
>>>> --- staged-out - 2006-04-03 13:23:04.0 ---
>>>> 1 files staged out
>>>> --- done - 2006-04-03 13:23:04.0 ---
>>>> Job completed
>>>>
>>>> --------------
>>>> Job Properties
>>>> --------------
>>>> urn:gridsam:Description=cat job description
>>>> urn:gridsam:JobProject=gridsam project
>>>> urn:gridsam:JobAnnotation=no annotation
>>>> urn:gridsam:JobName=cat job
>>>> urn:condor:classad=universe=vanilla
>>>> when_to_transfer_output=ON_EXIT
>>>> should_transfer_files=IF_NEEDED
>>>> notification=Never
>>>> log=/tmp/condor.log
>>>>
>>>> executable=/bin/cat
>>>> arguments=dir1/file1.txt dir2/subdir1/file2.txt
>>>> output=stdout.txt
>>>>
>>>> error=stderr.txt
>>>>
>>>> queue
>>>> urn:condor:clusterid=191
>>>> urn:gridsam:exitcode=1
>>>> [root@agorilla examples]#
>>>>
>>>> if i go to the executing node, the log indicates the following
>>>> 4/3 13:23:47 DaemonCore: Command received via UDP from host <172.24.89.61:9632>
>>>> 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
>>>> 4/3 13:23:47 vm1: match_info called
>>>> 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674
>>>> 4/3 13:23:47 vm1: State change: match notification protocol successful
>>>> 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched
>>>> 4/3 13:23:47 DaemonCore: Command received via TCP from host <172.24.89.61:9693>
>>>> 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
>>>> 4/3 13:23:47 vm1: Request accepted.
>>>> 4/3 13:23:47 vm1: Remote owner is gri...@ce...
>>>> 4/3 13:23:47 vm1: State change: claiming protocol successful
>>>> 4/3 13:23:47 vm1: Changing state: Matched -> Claimed
>>>> 4/3 13:23:50 DaemonCore: Command received via TCP from host <172.24.89.61:9669>
>>>> 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM), calling handler (command_activate_claim)
>>>> 4/3 13:23:50 vm1: Got activate_claim request from shadow (<172.24.89.61:9669>)
>>>> 4/3 13:23:50 vm1: Remote job ID is 191.0
>>>> 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad
>>>> 4/3 13:23:50 vm1: State change: claim-activation protocol successful
>>>> 4/3 13:23:50 vm1: Changing activity: Idle -> Busy
>>>> 4/3 13:23:51 DaemonCore: Command received via TCP from host <172.24.89.61:9652>
>>>> 4/3 13:23:51 DaemonCore: received command 404 (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler)
>>>> 4/3 13:23:51 vm1: Called deactivate_claim_forcibly()
>>>> 4/3 13:23:51 Starter pid 31148 exited with status 0
>>>> 4/3 13:23:51 vm1: State change: starter exited
>>>> 4/3 13:23:51 vm1: Changing activity: Busy -> Idle
>>>> 4/3 13:23:52 DaemonCore: Command received via UDP from host <172.24.89.61:9620>
>>>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
>>>> 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command
>>>> 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
>>>> 4/3 13:23:52 vm1: State change: No preempting claim, returning to owner
>>>> 4/3 13:23:52 vm1: Changing state and activity: Preempting/Vacating -> Owner/Idle
>>>> 4/3 13:23:52 vm1: State change: IS_OWNER is false
>>>> 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed
>>>> 4/3 13:23:52 DaemonCore: Command received via UDP from host <172.24.89.61:9675>
>>>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_handler)
>>>> 4/3 13:23:52 Error: can't find resource with capability (<172.24.89.1:9666>#7928521674)
>>>>
>>>> it seems the job can be submitted to the condor central manager and
>>>> executed at a node, but is then terminated for an unknown reason. it is fine
>>>> to run jobs either from condor_submit or from globus. I am
>>>> confused whether this is due to our condor settings, gridsam, or the OMII
>>>> level.
>>>>
>>>> BTW, the file staging seems ok: at /tmp/gridsam....../dir/ , the
>>>> virtual files are there.
>>>>
>>>> sorry, I am not sure whether I should ask GridSAM questions here; I am
>>>> not even sure whether this is a gridsam, OMII, or condor problem, because
>>>> when i run the PBAC test, it fails again. it was working just after
>>>> reinstalling the OMII server.
>>>>
>>>> thank you so much for any assistance.
>>>>
>>>> Best Regard!
>>>> gen-tao
>>>>
>>>> Current Assignees: Steve McGough (GridSAM), William Lee
>>>> (GridSAM), Steven Newhouse
>>>>
>>>> CC(s):
>>>>
>>>> Contact Information:
>>>>
>>>> Customer Name: Gen-Tao Chiang Email address: ge...@ni...
>>>> Organisation: NIEeS Secondary email address: gt...@ca...
>>>>
>>>>
>>>
>>> --- William Lee - Software Coordinator ---
>>> --- London e-Science Centre, Imperial College London ---
>>> A: Room 211a, London e-Science Centre, William Penney Laboratory,
>>> Imperial College London, South Kensington, London, SW7 2AZ, UK
>>> E: wwhl at doc.ic.ac.uk | william at imageunion.com
>>> W: www.lesc.ic.ac.uk | www.imageunion.com
>>> P: +44 (0) 207 594 8185
>>>
>>>
>>>
>
>--- William Lee - Software Coordinator ---
>--- London e-Science Centre, Imperial College London ---
>A: Room 211a, London e-Science Centre, William Penney Laboratory,
>Imperial College London, South Kensington, London, SW7 2AZ, UK
>E: wwhl at doc.ic.ac.uk | william at imageunion.com
>W: www.lesc.ic.ac.uk | www.imageunion.com
>P: +44 (0) 207 594 8185
>
>
>
|
|
From: William L. <ww...@do...> - 2006-04-07 09:15:00
|
According to the Condor documentation, the transfer_output_files attribute is best left off; that's the reason it's not in the classad.groovy script in the first place. If your condor system handles file output staging correctly, the code in the classad.groovy file that generates the "transfer_output_files" attribute can be commented out.

William

On 7 Apr 2006, at 10:11, G.T. Chiang wrote:

> Dear William
>
> thank you very much! this version works! right now i can see
> .condor.script on the condor submitter. i have tested one of the test
> jobs; the jsdl is like the following
>
> <?xml version="1.0" encoding="UTF-8"?>
> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl">
> <JobDescription>
> <Application>
> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix">
> <Executable>/bin/echo</Executable>
> <Argument>hello hihi</Argument>
> <Output>stdout.txt</Output>
> </POSIXApplication>
> </Application>
> <DataStaging>
> <FileName>unknown-file.txt</FileName>
> <CreationFlag>overwrite</CreationFlag>
> <DeleteOnTermination>true</DeleteOnTermination>
> <Target>
> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/subdir/output-file.txt</URI>
> </Target>
> </DataStaging>
> </JobDescription>
> </JobDefinition>
>
> the condor.script is like the following
>
> universe=vanilla
> when_to_transfer_output=ON_EXIT
> should_transfer_files=IF_NEEDED
> should_transfer_files=YES
> notification=Never
> log=/tmp/condor.log
>
> executable=/bin/echo
> arguments=hello gen-tao
> output=stdout.txt
>
> transfer_output_files=unknown-file.txt
> queue
>
> however, this job stays in idle status in condor; if i remove the
> transfer_output_files=unknown-file.txt line, the job can be executed.
> i have seen a similar problem with GT4 and condor, so it seems to be a
> condor problem. is that related to the file system? sorry, i think
> gridsam is working now! i just need to figure out what's wrong in our
> condor pool!!
> > thank you very much!!! > > Best Regard! > gen-tao > > > > > > > > On Apr 6 2006, William Lee wrote: > >> Thanks. The classad.groovy copy I have put up is for later >> version that is not yet bundled with OMII. >> >> I have an updated copy for OMII 2.3.3 at the same location. >> Please try that instead. >> >> William >> >> >> On 6 Apr 2006, at 15:06, OMII Support wrote: >> >>> When replying, type your text above this line. Notification of >>> Query Change >>> >>> >>> Priority: Normal Status: Agent Replied >>> Creation Date: 03/04/2006 Creation Time: 13:30:47 >>> Created By: ge...@ni... >>> >>> Click here to view Query in Browser >>> >>> Description: >>> Entered on 06/04/2006 at 15:06:02 by William Lee (GridSAM): >>> May I ask which version of GridSAM you are using? If it's from the >>> OMII bundle, which OMII version? >>> >>> William >>> >>> On 6 Apr 2006, at 14:58, G.T. Chiang wrote: >>> >>> > Dear William >>> > >>> > thakn you very much!! this verison is getting better, the >>> > following is the gridsam-status results. at least job is being >>> > processing via condor, but somehow it fails. >>> > >>> > [root@agorilla examples]# gridsam-status -s "http:// >>> > agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam? >>> wsdl" >>> > urn:gridsam:006868a90a6f50f8010a6f796cea0021 Job Progress: >>> pending - >>> > > staging-in -> staged-in -> active -> failed >>> > >>> > --- pending - 2006-04-06 14:52:09.0 --- >>> > job is being scheduled >>> > --- staging-in - 2006-04-06 14:52:09.0 --- >>> > staging files... 
>>> > --- staged-in - 2006-04-06 14:52:09.0 --- >>> > 2 files staged in >>> > --- active - 2006-04-06 14:52:09.0 --- >>> > job is being launched through condor >>> > --- failed - 2006-04-06 14:52:09.0 --- >>> > expecting job property urn:condor:classad from previous stage >>> > >>> > -------------- >>> > Job Properties >>> > -------------- >>> > urn:gridsam:Description=cat job description >>> > urn:gridsam:JobProject=gridsam project >>> > urn:gridsam:JobAnnotation=no annotation >>> > urn:gridsam:JobName=cat job >>> > [root@agorilla examples]# >>> > >>> > >>> > the following is the log from gridsam.log >>> > >>> > 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] >>> > state {pending} reached 2006-04-06 14:52:09,574 INFO >>> > [006868a90a6f50f8010a6f796cea0021] initialised working >>> directory: / >>> > tmp/gridsam-006868a90a6f50f8010a6f796cea0021 2006-04-06 >>> > 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state >>> {staging- >>> > in} reached 2006-04-06 14:52:09,662 INFO >>> > [006868a90a6f50f8010a6f796cea0021] staging (copy) file http:// >>> > www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp:// >>> > gri...@ce.../tmp/ >>> > gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt 2006-04-06 >>> > 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/ >>> file1.txt >>> > staged 2006-04-06 14:52:09,791 INFO >>> > [006868a90a6f50f8010a6f796cea0021] staging (copy) file ftp:// >>> > anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt -> >>> > sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/ >>> > gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt >>> > 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] >>> > dir2/subdir1/file2.txt staged 2006-04-06 14:52:09,843 INFO >>> > [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached >>> > 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] >>> > executing groovy script classad.groovy 2006-04-06 14:52:09,871 >>> INFO >>> > 
[006868a90a6f50f8010a6f796cea0021] executed groovy script >>> > classad.groovy 2006-04-06 14:52:09,898 INFO >>> > [006868a90a6f50f8010a6f796cea0021] state {active} reached >>> > 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021] >>> > Failed to submit condor job: expecting job property >>> > urn:condor:classad from previous stage 2006-04-06 14:52:09,903 >>> INFO >>> > [006868a90a6f50f8010a6f796cea0021] state {failed} reached >>> > 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] >>> > failed 1144331529450 1144331529607 1144331529843 1144331529898 >>> > 1144331529903 >>> > >>> > >>> > is it possible to obtain the condor job description file which >>> > converted from gridsam JSDL. can I try to submit it to condor >>> > directly? >>> > >>> > Best Regard! >>> > gen-tao >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > >>> > On Apr 6 2006, William Lee wrote: >>> > >>> >> >>> >> Please try the classad.groovy script at this location. It >>> >> incorporates a solution that sets up the transfer_input_files and >>> >> the transfer_output_files classad attributes in the JSDL-to- >>> >> Classad translation. This is needed if the submission node (the >>> >> node which GridSAM is running) does not necessarily share a >>> >> common file system with the execution nodes. >>> >> >>> >> http://www.doc.ic.ac.uk/~wwhl/classad.groovy >>> >> >>> >> William >>> >> >>> >> On 4 Apr 2006, at 15:15, OMII Support wrote: >>> >> >>> >>> [Duplicate message snipped] >>> >>> Entered on 06/04/2006 at 15:00:02 by gt...@ca...: >>> Dear William >>> >>> thakn you very much!! this verison is getting better, the >>> following is >>> the gridsam-status results. at least job is being processing via >>> condor, >>> but somehow it fails. 
>>> >>> [root@agorilla examples]# gridsam-status -s >>> "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/ >>> gridsam?wsdl" >>> urn:gridsam:006868a90a6f50f8010a6f796cea0021 Job Progress: >>> pending -> >>> staging-in -> staged-in -> active -> failed >>> >>> --- pending - 2006-04-06 14:52:09.0 --- >>> job is being scheduled >>> --- staging-in - 2006-04-06 14:52:09.0 --- >>> staging files... >>> --- staged-in - 2006-04-06 14:52:09.0 --- >>> 2 files staged in >>> --- active - 2006-04-06 14:52:09.0 --- >>> job is being launched through condor >>> --- failed - 2006-04-06 14:52:09.0 --- >>> expecting job property urn:condor:classad from previous stage >>> >>> -------------- >>> Job Properties >>> -------------- >>> urn:gridsam:Description=cat job description >>> urn:gridsam:JobProject=gridsam project >>> urn:gridsam:JobAnnotation=no annotation >>> urn:gridsam:JobName=cat job >>> [root@agorilla examples]# >>> >>> the following is the log from gridsam.log >>> >>> 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] >>> state >>> {pending} reached 2006-04-06 14:52:09,574 INFO >>> [006868a90a6f50f8010a6f796cea0021] initialised working directory: >>> /tmp/gridsam-006868a90a6f50f8010a6f796cea0021 2006-04-06 >>> 14:52:09,607 INFO >>> [006868a90a6f50f8010a6f796cea0021] state {staging-in} reached >>> 2006-04-06 >>> 14:52:09,662 INFO [006868a90a6f50f8010a6f796cea0021] staging >>> (copy) file >>> http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> >>> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/ >>> gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt >>> 2006-04-06 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] >>> dir1/file1.txt staged 2006-04-06 14:52:09,791 INFO >>> [006868a90a6f50f8010a6f796cea0021] staging (copy) file >>> ftp://anonymous:anonymous@128.232.232.41:19245/subdir/input- >>> file.txt -> >>> sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/ >>> gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt >>> 2006-04-06 
14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] >>> dir2/subdir1/file2.txt staged 2006-04-06 14:52:09,843 INFO >>> [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached >>> 2006-04-06 >>> 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] executing >>> groovy >>> script classad.groovy 2006-04-06 14:52:09,871 INFO >>> [006868a90a6f50f8010a6f796cea0021] executed groovy script >>> classad.groovy >>> 2006-04-06 14:52:09,898 INFO [006868a90a6f50f8010a6f796cea0021] >>> state >>> {active} reached 2006-04-06 14:52:09,903 ERROR >>> [006868a90a6f50f8010a6f796cea0021] Failed to submit condor job: >>> expecting >>> job property urn:condor:classad from previous stage 2006-04-06 >>> 14:52:09,903 >>> INFO [006868a90a6f50f8010a6f796cea0021] state {failed} reached >>> 2006-04-06 >>> 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] failed >>> 1144331529450 >>> 1144331529607 1144331529843 1144331529898 1144331529903 >>> >>> is it possible to obtain the condor job description file which >>> converted >>> from gridsam JSDL. can I try to submit it to condor directly? >>> >>> Best Regard! >>> gen-tao >>> >>> On Apr 6 2006, William Lee wrote: >>> >>> > >>> >Please try the classad.groovy script at this location. It >>> >incorporates a solution that sets up the transfer_input_files >>> and the >>> >transfer_output_files classad attributes in the JSDL-to-Classad >>> >translation. This is needed if the submission node (the node which >>> >GridSAM is running) does not necessarily share a common file system >>> >with the execution nodes. >>> > >>> >http://www.doc.ic.ac.uk/~wwhl/classad.groovy >>> > >>> >William >>> > >>> >On 4 Apr 2006, at 15:15, OMII Support wrote: >>> > >>> >> [Duplicate message snipped] >>> >>> Entered on 06/04/2006 at 13:57:02 by William Lee (GridSAM): >>> Please try the classad.groovy script at this location. 
It >>> incorporates a solution that sets up the transfer_input_files and >>> the >>> transfer_output_files classad attributes in the JSDL-to-Classad >>> translation. This is needed if the submission node (the node which >>> GridSAM is running) does not necessarily share a common file system >>> with the execution nodes. >>> >>> http://www.doc.ic.ac.uk/~wwhl/classad.groovy >>> >>> William >>> >>> On 4 Apr 2006, at 15:15, OMII Support wrote: >>> >>> > [Duplicate message snipped] >>> >>> Entered on 04/04/2006 at 15:15:01 by gt...@ca...: >>> Dear William >>> >>> thank you so much!! i modefi the classad.grrovy with adding your >>> code. >>> now the probme becomes undefined and job can not submited to condor. >>> [root@agorilla examples]# gridsam-status -s >>> "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/ >>> gridsam?wsdl" >>> urn:gridsam:006868a90a64591e010a653a7f1a0013 Job Progress: >>> pending -> >>> staging-in -> staged-in -> undefined >>> >>> --- pending - 2006-04-04 15:07:13.0 --- >>> job is being scheduled >>> --- staging-in - 2006-04-04 15:07:13.0 --- >>> staging files... >>> --- staged-in - 2006-04-04 15:07:13.0 --- >>> 2 files staged in >>> --- undefined - 2006-04-04 15:07:13.0 --- >>> cannot advance from 'staged-in' to 'done' >>> >>> -------------- >>> Job Properties >>> -------------- >>> urn:condor:purestaging=true >>> [root@agorilla examples]# >>> >>> thank you for any suggestion!! >>> >>> Best Regard! >>> gen-tao >>> >>> On Apr 4 2006, OMII Support wrote: >>> >>> >[Duplicate message snipped] >>> >>> Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM): >>> Hi Gen Tao, >>> >>> You are right, according to the condor setup, you would have to >>> modify the classad.groovy script to enable the transfer_input_files >>> and transfer_output_files classad attributes. This only applies to >>> condor setup that does not share a common networked file system. 
>>> >>> The code to add to the classad.groovy is >>> >>> jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/ >>> jsdl:DataStaging", ns).eachWithIndex(){ >>> node, index -> >>> if(index == 0){ >>> script += "transfer_input_files=" >>> } >>> if(!node.select("jsdl:Source", ns).isEmpty()){ >>> fileName = node.select("jsdl:FileName")[0].text; >>> script += "${fileName} ," >>> } >>> } >>> >>> I haven't been able to test the code above. Feel free to make any >>> modification as you see fit. >>> >>> William >>> >>> On 3 Apr 2006, at 20:09, OMII Support wrote: >>> >>> > [Duplicate message snipped] >>> >>> Entered on 03/04/2006 at 20:09:02 by gt...@ca...: >>> Dear Sir >>> >>> the resuts as following: >>> [condor@badger1--niees--group jobs]$ less stderr.txt >>> condor_exec.exe: dir1/file1.txt: No such file or directory >>> condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory >>> [condor@badger1--niees--group jobs]$ >>> >>> those files had been staged to the central manager, but not in the >>> executing node. sorry, our central manager is not configured to >>> run jobs. >>> thus, central manager will submit jobs to other machines. that's >>> why when i >>> run this condor job at executing node, and can not find related >>> files. is >>> this normal? shoudl central manager copy those files to other >>> work nodes as >>> well? souhld i changing sometihgn in classad.groovy? >>> >>> thank you very much!! >>> >>> gen-tao >>> >>> thakn you very much!! >>> >>> On Apr 3 2006, OMII Support wrote: >>> >>> >[Duplicate message snipped] >>> >>> Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM): >>> It's not apparent where the problem lies. Condor has reported to >>> GridSAM the job has >>> completed successfully with exit code 1. Hence the description >>> shown in the EXECUTED >>> state. >>> >>> Can you try running a condor job with the following classad >>> directly? 
>>> >>> universe=vanilla >>> when_to_transfer_output=ON_EXIT >>> should_transfer_files=IF_NEEDED >>> notification=Never >>> log=/tmp/condor.log >>> executable=/bin/cat >>> arguments=dir1/file1.txt dir2/subdir1/file2.txt >>> output=stdout.txt >>> error=stderr.txt >>> >>> queue >>> >>> Entered on 03/04/2006 at 13:30:47 by ge...@ni...: >>> Dear Sir >>> >>> I am trying to run some GridSAM testing programs. however, it >>> seems the jobs cannot be executed in our condor pool. the >>> condor pool is working. the job can be submitted to the >>> condor_submitter and runs at a condor node, but then fails. >>> >>> the following is some information! >>> >>> this is the modified cat-staging.jsdl >>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl"> >>> <JobDescription> >>> <JobIdentification> >>> <JobName>cat job</JobName> >>> <Description>cat job description</Description> >>> <JobAnnotation>no annotation</JobAnnotation> >>> <JobProject>gridsam project</JobProject> >>> </JobIdentification> >>> <Application> >>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix"> >>> <Executable>/bin/cat</Executable> >>> <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument> >>> <Output>stdout.txt</Output> >>> <Error>stderr.txt</Error> >>> </POSIXApplication> >>> </Application> >>> <DataStaging> >>> <FileName>dir1/file1.txt</FileName> >>> <CreationFlag>overwrite</CreationFlag> >>> <Source> >>> <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI> >>> </Source> >>> </DataStaging> >>> <DataStaging> >>> <FileName>dir2/subdir1/file2.txt</FileName> >>> <CreationFlag>overwrite</CreationFlag> >>> <Source> >>> <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input-file.txt</URI> >>> </Source> >>> </DataStaging> >>> <DataStaging> >>> <FileName>stdout.txt</FileName> >>> <CreationFlag>overwrite</CreationFlag> >>> <DeleteOnTermination>true</DeleteOnTermination> >>> <Target> >>> 
<URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/ >>> stdout.txt</URI> >>> </Target> >>> </DataStaging> >>> </JobDescription> >>> </JobDefinition> >>> >>> after submitting this file >>> >>> [root@agorilla examples]# gridsam-status -s "http:// >>> agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam? >>> wsdl" urn:gridsam:006868a90a4d221e010a5fb493650117 >>> Job Progress: pending -> staging-in -> staged-in -> active -> >>> executed -> staging-out -> staged-out -> done >>> >>> --- pending - 2006-04-03 13:22:50.0 --- >>> job is being scheduled >>> --- staging-in - 2006-04-03 13:22:50.0 --- >>> staging files... >>> --- staged-in - 2006-04-03 13:22:59.0 --- >>> 2 files staged in >>> --- active - 2006-04-03 13:22:59.0 --- >>> job is being launched through condor >>> --- executed - 2006-04-03 13:23:04.0 --- >>> 04/03 13:23:52 Job terminated. (1) Normal termination (return >>> value 1) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr >>> 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys >>> 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 >>> - Total Local Usage 126 - Run Bytes Sent By Job 15992 - Run >>> Bytes Received By Job 126 - Total Bytes Sent By Job 15992 - >>> Total Bytes Received By Job >>> --- staging-out - 2006-04-03 13:23:04.0 --- >>> staging files out... 
>>> --- staged-out - 2006-04-03 13:23:04.0 --- >>> 1 files staged out >>> --- done - 2006-04-03 13:23:04.0 --- >>> Job completed >>> >>> -------------- >>> Job Properties >>> -------------- >>> urn:gridsam:Description=cat job description >>> urn:gridsam:JobProject=gridsam project >>> urn:gridsam:JobAnnotation=no annotation >>> urn:gridsam:JobName=cat job >>> urn:condor:classad=universe=vanilla >>> when_to_transfer_output=ON_EXIT >>> should_transfer_files=IF_NEEDED >>> notification=Never >>> log=/tmp/condor.log >>> >>> executable=/bin/cat >>> arguments=dir1/file1.txt dir2/subdir1/file2.txt >>> output=stdout.txt >>> >>> error=stderr.txt >>> >>> queue >>> urn:condor:clusterid=191 >>> urn:gridsam:exitcode=1 >>> [root@agorilla examples]# >>> >>> if I go to the executing node, the log indicates the following >>> 4/3 13:23:47 DaemonCore: Command received via UDP from host >>> <172.24.89.61:9632> >>> 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO), >>> calling handler (command_match_info) >>> 4/3 13:23:47 vm1: match_info called >>> 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674 >>> 4/3 13:23:47 vm1: State change: match notification protocol >>> successful >>> 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched >>> 4/3 13:23:47 DaemonCore: Command received via TCP from host >>> <172.24.89.61:9693> >>> 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM), >>> calling handler (command_request_claim) >>> 4/3 13:23:47 vm1: Request accepted. >>> 4/3 13:23:47 vm1: Remote owner is >>> gri...@ce... 
>>> 4/3 13:23:47 vm1: State change: claiming protocol successful >>> 4/3 13:23:47 vm1: Changing state: Matched -> Claimed >>> 4/3 13:23:50 DaemonCore: Command received via TCP from host >>> <172.24.89.61:9669> >>> 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM), >>> calling handler (command_activate_claim) >>> 4/3 13:23:50 vm1: Got activate_claim request from shadow >>> (<172.24.89.61:9669>) >>> 4/3 13:23:50 vm1: Remote job ID is 191.0 >>> 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad >>> 4/3 13:23:50 vm1: State change: claim-activation protocol successful >>> 4/3 13:23:50 vm1: Changing activity: Idle -> Busy >>> 4/3 13:23:51 DaemonCore: Command received via TCP from host >>> <172.24.89.61:9652> >>> 4/3 13:23:51 DaemonCore: received command 404 >>> (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler) >>> 4/3 13:23:51 vm1: Called deactivate_claim_forcibly() >>> 4/3 13:23:51 Starter pid 31148 exited with status 0 >>> 4/3 13:23:51 vm1: State change: starter exited >>> 4/3 13:23:51 vm1: Changing activity: Busy -> Idle >>> 4/3 13:23:52 DaemonCore: Command received via UDP from host >>> <172.24.89.61:9620> >>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), >>> calling handler (command_handler) >>> 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command >>> 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle -> >>> Preempting/Vacating >>> 4/3 13:23:52 vm1: State change: No preempting claim, returning >>> to owner >>> 4/3 13:23:52 vm1: Changing state and activity: Preempting/ >>> Vacating - > Owner/Idle >>> 4/3 13:23:52 vm1: State change: IS_OWNER is false >>> 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed >>> 4/3 13:23:52 DaemonCore: Command received via UDP from host >>> <172.24.89.61:9675> >>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), >>> calling handler (command_handler) >>> 4/3 13:23:52 Error: can't find resource with capability >>> (<172.24.89.1:9666>#7928521674) 
>>> >>> it seems the job can be submitted to the condor central manager and >>> executed at a node, then terminated for an unknown reason. it is fine >>> to run jobs either from condor_submit or from globus. I am >>> confused whether this is due to our condor settings, our gridsam, or the OMII >>> level. >>> >>> BTW, the file staging seems ok. at /tmp/gridsam....../dir/ , the >>> virtual files are there. >>> >>> sorry, I am not sure whether I should ask GridSAM questions here; I am >>> not even sure whether that is a gridsam, OMII, or condor problem. because, >>> when I run the PBAC test, it fails again. it was working just after >>> reinstalling the OMII server. >>> >>> thank you so much for giving any assistance. >>> >>> Best Regards! >>> gen-tao >>> >>> Current Assignees: Steve McGough (GridSAM), William Lee >>> (GridSAM), Steven Newhouse >>> >>> CC(s): >>> >>> Contact Information: >>> >>> Customer Name: Gen-Tao Chiang Email address: ge...@ni... >>> Organisation: NIEeS Secondary email address: gt...@ca... >>> >>> >> >> --- William Lee - Software Coordinator --- >> --- London e-Science Centre, Imperial College London --- >> A: Room 211a, London e-Science Centre, William Penney Laboratory, >> Imperial College London, South Kensington, London, SW7 2AZ, UK >> E: wwhl at doc.ic.ac.uk | william at imageunion.com >> W: www.lesc.ic.ac.uk | www.imageunion.com >> P: +44 (0) 207 594 8185 >> >> >> --- William Lee - Software Coordinator --- --- London e-Science Centre, Imperial College London --- A: Room 211a, London e-Science Centre, William Penney Laboratory, Imperial College London, South Kensington, London, SW7 2AZ, UK E: wwhl at doc.ic.ac.uk | william at imageunion.com W: www.lesc.ic.ac.uk | www.imageunion.com P: +44 (0) 207 594 8185 |
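For reference, the JSDL-to-ClassAd file-transfer translation discussed in the messages above can be sketched outside Groovy. The following Python sketch is illustrative only (it is not GridSAM code) and assumes the 2005/06 JSDL namespace used in this thread: a DataStaging entry with a Source element contributes to transfer_input_files, and one with a Target element contributes to transfer_output_files.

```python
import xml.etree.ElementTree as ET

JSDL_NS = "http://schemas.ggf.org/jsdl/2005/06/jsdl"

def staging_classad_lines(jsdl_xml):
    """Derive Condor file-transfer classad lines from JSDL DataStaging
    elements: entries with a <Source> are staged in (transfer_input_files),
    entries with a <Target> are staged out (transfer_output_files)."""
    ns = {"jsdl": JSDL_NS}
    root = ET.fromstring(jsdl_xml)
    inputs, outputs = [], []
    # Walk every DataStaging element, wherever it sits in the document.
    for staging in root.iter("{%s}DataStaging" % JSDL_NS):
        name = staging.findtext("jsdl:FileName", namespaces=ns)
        if staging.find("jsdl:Source", ns) is not None:
            inputs.append(name)
        if staging.find("jsdl:Target", ns) is not None:
            outputs.append(name)
    lines = []
    if inputs:
        lines.append("transfer_input_files=" + ",".join(inputs))
    if outputs:
        lines.append("transfer_output_files=" + ",".join(outputs))
    return lines
```

Applied to the cat-staging.jsdl above, this yields transfer_input_files=dir1/file1.txt,dir2/subdir1/file2.txt and transfer_output_files=stdout.txt. Unlike the quoted Groovy snippet, it joins the names without a trailing comma and passes the namespace map when selecting FileName.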
|
From: G.T. C. <gt...@ca...> - 2006-04-07 09:12:10
|
Dear William thank you very much! this version works! right now I can see .condor.script in the condor-submitter. I have tested one of the testing jobs; the jsdl is like the following <?xml version="1.0" encoding="UTF-8"?> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl"> <JobDescription> <Application> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl-posix"> <Executable>/bin/echo</Executable> <Argument>hello hihi</Argument> <Output>stdout.txt</Output> </POSIXApplication> </Application> <DataStaging> <FileName>unknown-file.txt</FileName> <CreationFlag>overwrite</CreationFlag> <DeleteOnTermination>true</DeleteOnTermination> <Target> <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/subdir/output-file.txt</URI> </Target> </DataStaging> </JobDescription> </JobDefinition> the condor.script is like the following universe=vanilla when_to_transfer_output=ON_EXIT should_transfer_files=IF_NEEDED should_transfer_files=YES notification=Never log=/tmp/condor.log executable=/bin/echo arguments=hello gen-tao output=stdout.txt transfer_output_files=unknown-file.txt queue however, these jobs are kept in idle status in condor. however, if I remove the transfer_output_files=unknown-file.txt line, the job can be executed. I have seen a similar problem with GT4 and condor. it seems that it is a condor problem. is that related to the file system? sorry, I think the gridsam is working now! just need to figure out what's wrong in our condor pool!! thank you very much!!! Best Regards! gen-tao On Apr 6 2006, William Lee wrote: >Thanks. The classad.groovy copy I have put up is for later version >that is not yet bundled with OMII. > >I have an updated copy for OMII 2.3.3 at the same location. Please >try that instead. > >William > > >On 6 Apr 2006, at 15:06, OMII Support wrote: > >> When replying, type your text above this line. 
Notification of >> Query Change >> >> >> Priority: Normal Status: Agent Replied >> Creation Date: 03/04/2006 Creation Time: 13:30:47 >> Created By: ge...@ni... >> >> Click here to view Query in Browser >> >> Description: >> Entered on 06/04/2006 at 15:06:02 by William Lee (GridSAM): >> May I ask which version of GridSAM you are using? If it's from the >> OMII bundle, which OMII version? >> >> William >> >> On 6 Apr 2006, at 14:58, G.T. Chiang wrote: >> >> > Dear William >> > >> > thank you very much!! this version is getting better; the >> > following is the gridsam-status results. at least the job is being >> > processed via condor, but somehow it fails. >> > >> > [root@agorilla examples]# gridsam-status -s "http:// >> > agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" >> > urn:gridsam:006868a90a6f50f8010a6f796cea0021 Job Progress: pending - >> > > staging-in -> staged-in -> active -> failed >> > >> > --- pending - 2006-04-06 14:52:09.0 --- >> > job is being scheduled >> > --- staging-in - 2006-04-06 14:52:09.0 --- >> > staging files... 
>> > --- staged-in - 2006-04-06 14:52:09.0 --- >> > 2 files staged in >> > --- active - 2006-04-06 14:52:09.0 --- >> > job is being launched through condor >> > --- failed - 2006-04-06 14:52:09.0 --- >> > expecting job property urn:condor:classad from previous stage >> > >> > -------------- >> > Job Properties >> > -------------- >> > urn:gridsam:Description=cat job description >> > urn:gridsam:JobProject=gridsam project >> > urn:gridsam:JobAnnotation=no annotation >> > urn:gridsam:JobName=cat job >> > [root@agorilla examples]# >> > >> > >> > the following is the log from gridsam.log >> > >> > 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] >> > state {pending} reached 2006-04-06 14:52:09,574 INFO >> > [006868a90a6f50f8010a6f796cea0021] initialised working directory: / >> > tmp/gridsam-006868a90a6f50f8010a6f796cea0021 2006-04-06 >> > 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state {staging- >> > in} reached 2006-04-06 14:52:09,662 INFO >> > [006868a90a6f50f8010a6f796cea0021] staging (copy) file http:// >> > www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp:// >> > gri...@ce.../tmp/ >> > gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt 2006-04-06 >> > 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/file1.txt >> > staged 2006-04-06 14:52:09,791 INFO >> > [006868a90a6f50f8010a6f796cea0021] staging (copy) file ftp:// >> > anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt -> >> > sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/ >> > gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt >> > 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] >> > dir2/subdir1/file2.txt staged 2006-04-06 14:52:09,843 INFO >> > [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached >> > 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] >> > executing groovy script classad.groovy 2006-04-06 14:52:09,871 INFO >> > [006868a90a6f50f8010a6f796cea0021] executed groovy script >> > 
classad.groovy 2006-04-06 14:52:09,898 INFO >> > [006868a90a6f50f8010a6f796cea0021] state {active} reached >> > 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021] >> > Failed to submit condor job: expecting job property >> > urn:condor:classad from previous stage 2006-04-06 14:52:09,903 INFO >> > [006868a90a6f50f8010a6f796cea0021] state {failed} reached >> > 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] >> > failed 1144331529450 1144331529607 1144331529843 1144331529898 >> > 1144331529903 >> > >> > >> > is it possible to obtain the condor job description file which >> > converted from gridsam JSDL. can I try to submit it to condor >> > directly? >> > >> > Best Regard! >> > gen-tao >> > >> > On Apr 6 2006, William Lee wrote: >> > >> >> >> >> Please try the classad.groovy script at this location. It >> >> incorporates a solution that sets up the transfer_input_files and >> >> the transfer_output_files classad attributes in the JSDL-to- >> >> Classad translation. This is needed if the submission node (the >> >> node which GridSAM is running) does not necessarily share a >> >> common file system with the execution nodes. >> >> >> >> http://www.doc.ic.ac.uk/~wwhl/classad.groovy >> >> >> >> William >> >> >> >> On 4 Apr 2006, at 15:15, OMII Support wrote: >> >> >> >>> [Duplicate message snipped] >> >> [Duplicate messages snipped] >> >--- William Lee - Software Coordinator --- >--- London e-Science Centre, Imperial College London --- >A: Room 211a, London e-Science Centre, William Penney Laboratory, >Imperial College London, South Kensington, London, SW7 2AZ, UK >E: wwhl at doc.ic.ac.uk | william at imageunion.com >W: www.lesc.ic.ac.uk | www.imageunion.com >P: +44 (0) 207 594 8185 > > > |
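One detail worth noting in the generated submit description above: should_transfer_files is assigned twice (IF_NEEDED and then YES), and Condor honours the later assignment, so the earlier IF_NEEDED setting has no effect. A small, illustrative Python check for such conflicting assignments (this is not part of GridSAM or Condor, just a debugging aid for generated submit scripts):

```python
def conflicting_submit_keys(script):
    """Find submit-description keys assigned more than once with
    different values; Condor silently keeps the last assignment."""
    seen = {}       # key -> last value seen so far
    conflicts = {}  # key -> every conflicting value, in order
    for raw in script.splitlines():
        line = raw.strip()
        # Skip blank lines, the queue command, and anything without '='.
        if not line or line.lower() == "queue" or "=" not in line:
            continue
        key, value = [part.strip() for part in line.split("=", 1)]
        if key in seen and seen[key] != value:
            conflicts.setdefault(key, [seen[key]]).append(value)
        seen[key] = value
    return conflicts
```

Run against the script quoted above, it reports should_transfer_files with both IF_NEEDED and YES, which is the kind of silent override that is easy to miss when a translation script appends attributes blindly.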
|
From: William L. <ww...@do...> - 2006-04-06 15:19:47
|
Thanks. The classad.groovy copy I have put up is for later version that is not yet bundled with OMII. I have an updated copy for OMII 2.3.3 at the same location. Please try that instead. William On 6 Apr 2006, at 15:06, OMII Support wrote: > [Duplicate message snipped]
2006-04-06 14:52:09,903 ERROR > [006868a90a6f50f8010a6f796cea0021] Failed to submit condor job: > expecting > job property urn:condor:classad from previous stage 2006-04-06 > 14:52:09,903 > INFO [006868a90a6f50f8010a6f796cea0021] state {failed} reached > 2006-04-06 > 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] failed > 1144331529450 > 1144331529607 1144331529843 1144331529898 1144331529903 > > is it possible to obtain the condor job description file which > converted > from gridsam JSDL. can I try to submit it to condor directly? > > Best Regard! > gen-tao > > On Apr 6 2006, William Lee wrote: > > > > >Please try the classad.groovy script at this location. It > >incorporates a solution that sets up the transfer_input_files and the > >transfer_output_files classad attributes in the JSDL-to-Classad > >translation. This is needed if the submission node (the node which > >GridSAM is running) does not necessarily share a common file system > >with the execution nodes. > > > >http://www.doc.ic.ac.uk/~wwhl/classad.groovy > > > >William > > > >On 4 Apr 2006, at 15:15, OMII Support wrote: > > > >> [Duplicate message snipped] > > Entered on 06/04/2006 at 13:57:02 by William Lee (GridSAM): > Please try the classad.groovy script at this location. It > incorporates a solution that sets up the transfer_input_files and the > transfer_output_files classad attributes in the JSDL-to-Classad > translation. This is needed if the submission node (the node which > GridSAM is running) does not necessarily share a common file system > with the execution nodes. > > http://www.doc.ic.ac.uk/~wwhl/classad.groovy > > William > > On 4 Apr 2006, at 15:15, OMII Support wrote: > > > [Duplicate message snipped] > > Entered on 04/04/2006 at 15:15:01 by gt...@ca...: > Dear William > > thank you so much!! i modefi the classad.grrovy with adding your code. > now the probme becomes undefined and job can not submited to condor. 
> [root@agorilla examples]# gridsam-status -s > "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/ > gridsam?wsdl" > urn:gridsam:006868a90a64591e010a653a7f1a0013 Job Progress: pending -> > staging-in -> staged-in -> undefined > > --- pending - 2006-04-04 15:07:13.0 --- > job is being scheduled > --- staging-in - 2006-04-04 15:07:13.0 --- > staging files... > --- staged-in - 2006-04-04 15:07:13.0 --- > 2 files staged in > --- undefined - 2006-04-04 15:07:13.0 --- > cannot advance from 'staged-in' to 'done' > > -------------- > Job Properties > -------------- > urn:condor:purestaging=true > [root@agorilla examples]# > > thank you for any suggestion!! > > Best Regard! > gen-tao > > On Apr 4 2006, OMII Support wrote: > > >[Duplicate message snipped] > > Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM): > Hi Gen Tao, > > You are right, according to the condor setup, you would have to > modify the classad.groovy script to enable the transfer_input_files > and transfer_output_files classad attributes. This only applies to > condor setup that does not share a common networked file system. > > The code to add to the classad.groovy is > > jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/ > jsdl:DataStaging", ns).eachWithIndex(){ > node, index -> > if(index == 0){ > script += "transfer_input_files=" > } > if(!node.select("jsdl:Source", ns).isEmpty()){ > fileName = node.select("jsdl:FileName")[0].text; > script += "${fileName} ," > } > } > > I haven't been able to test the code above. Feel free to make any > modification as you see fit. 
> > William > > On 3 Apr 2006, at 20:09, OMII Support wrote: > > > [Duplicate message snipped] > > Entered on 03/04/2006 at 20:09:02 by gt...@ca...: > Dear Sir > > the resuts as following: > [condor@badger1--niees--group jobs]$ less stderr.txt > condor_exec.exe: dir1/file1.txt: No such file or directory > condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory > [condor@badger1--niees--group jobs]$ > > those files had been staged to the central manager, but not in the > executing node. sorry, our central manager is not configured to run > jobs. > thus, central manager will submit jobs to other machines. that's > why when i > run this condor job at executing node, and can not find related > files. is > this normal? shoudl central manager copy those files to other work > nodes as > well? souhld i changing sometihgn in classad.groovy? > > thank you very much!! > > gen-tao > > thakn you very much!! > > On Apr 3 2006, OMII Support wrote: > > >[Duplicate message snipped] > > Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM): > It's not apparent where the problem lies. Condor has reported to > GridSAM the job has > completed successfully with exit code 1. Hence the description > shown in the EXECUTED > state. > > Can you try running a condor job with the following classad directly? > > universe=vanilla > when_to_transfer_output=ON_EXIT > should_transfer_files=IF_NEEDED > notification=Never > log=/tmp/condor.log > executable=/bin/cat > arguments=dir1/file1.txt dir2/subdir1/file2.txt > output=stdout.txt > error=stderr.txt > > queue > > Entered on 03/04/2006 at 13:30:47 by ge...@ni...: > Dear Sir > > i am trying to run some GridSAM testing programs. however, it seems > the jobs can not be executed in our condor pool. the condor pool is > working. the job can be submited to the condor_submitter and > running at condor node, but then failed. > > the following are some information! 
> > this is the modefied cat-staging.jsdl > <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl"> > <JobDescription> > <JobIdentification> > <JobName>cat job</JobName> > <Description>cat job description</Description> > <JobAnnotation>no annotation</JobAnnotation> > <JobProject>gridsam project</JobProject> > </JobIdentification> > <Application> > <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl- > posix"> > <Executable>/bin/cat</Executable> > <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument> > <Output>stdout.txt</Output> > <Error>stderr.txt</Error> > </POSIXApplication> > </Application> > <DataStaging> > <FileName>dir1/file1.txt</FileName> > <CreationFlag >overwrite</CreationFlag> > <Source> > <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI> > </Source> > </DataStaging> > <DataStaging> > <FileName>dir2/subdir1/file2.txt</FileName> > <CreationFlag>overwrite</CreationFlag> > <Source> > <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input- > file.txt</URI> > </Source> > </DataStaging> > <DataStaging> > <FileName>stdout.txt</FileName> > <CreationFlag>overwrite</CreationFlag> > <DeleteOnTermination>true</DeleteOnTermination> > <Target> > <URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/ > stdout.txt</URI> > </Target> > </DataStaging> > </JobDescription> > </JobDefinition> > > after submit this file > > [root@agorilla examples]# gridsam-status -s "http:// > agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" > urn:gridsam:006868a90a4d221e010a5fb493650117 > Job Progress: pending -> staging-in -> staged-in -> active -> > executed -> staging-out -> staged-out -> done > > --- pending - 2006-04-03 13:22:50.0 --- > job is being scheduled > --- staging-in - 2006-04-03 13:22:50.0 --- > staging files... 
> --- staged-in - 2006-04-03 13:22:59.0 --- > 2 files staged in > --- active - 2006-04-03 13:22:59.0 --- > job is being launched through condor > --- executed - 2006-04-03 13:23:04.0 --- > 04/03 13:23:52 Job terminated. (1) Normal termination (return value > 1) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr 0 > 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys 0 > 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 - > Total Local Usage 126 - Run Bytes Sent By Job 15992 - Run Bytes > Received By Job 126 - Total Bytes Sent By Job 15992 - Total Bytes > Received By Job > --- staging-out - 2006-04-03 13:23:04.0 --- > staging files out... > --- staged-out - 2006-04-03 13:23:04.0 --- > 1 files staged out > --- done - 2006-04-03 13:23:04.0 --- > Job completed > > -------------- > Job Properties > -------------- > urn:gridsam:Description=cat job description > urn:gridsam:JobProject=gridsam project > urn:gridsam:JobAnnotation=no annotation > urn:gridsam:JobName=cat job > urn:condor:classad=universe=vanilla > when_to_transfer_output=ON_EXIT > should_transfer_files=IF_NEEDED > notification=Never > log=/tmp/condor.log > > executable=/bin/cat > arguments=dir1/file1.txt dir2/subdir1/file2.txt > output=stdout.txt > > error=stderr.txt > > queue > urn:condor:clusterid=191 > urn:gridsam:exitcode=1 > [root@agorilla examples]# > > if i go the the executing node and the log indicates the following > 4/3 13:23:47 DaemonCore: Command received via UDP from host > <172.24.89.61:9632> > 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO), calling > handler (command_match_info) > 4/3 13:23:47 vm1: match_info called > 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674 > 4/3 13:23:47 vm1: State change: match notification protocol successful > 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched > 4/3 13:23:47 DaemonCore: Command received via TCP from host > <172.24.89.61:9693> > 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM), > 
calling handler (command_request_claim) > 4/3 13:23:47 vm1: Request accepted. > 4/3 13:23:47 vm1: Remote owner is > gri...@ce... > 4/3 13:23:47 vm1: State change: claiming protocol successful > 4/3 13:23:47 vm1: Changing state: Matched -> Claimed > 4/3 13:23:50 DaemonCore: Command received via TCP from host > <172.24.89.61:9669> > 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM), > calling handler (command_activate_claim) > 4/3 13:23:50 vm1: Got activate_claim request from shadow > (<172.24.89.61:9669>) > 4/3 13:23:50 vm1: Remote job ID is 191.0 > 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad > 4/3 13:23:50 vm1: State change: claim-activation protocol successful > 4/3 13:23:50 vm1: Changing activity: Idle -> Busy > 4/3 13:23:51 DaemonCore: Command received via TCP from host > <172.24.89.61:9652> > 4/3 13:23:51 DaemonCore: received command 404 > (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler) > 4/3 13:23:51 vm1: Called deactivate_claim_forcibly() > 4/3 13:23:51 Starter pid 31148 exited with status 0 > 4/3 13:23:51 vm1: State change: starter exited > 4/3 13:23:51 vm1: Changing activity: Busy -> Idle > 4/3 13:23:52 DaemonCore: Command received via UDP from host > <172.24.89.61:9620> > 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), > calling handler (command_handler) > 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command > 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle -> > Preempting/Vacating > 4/3 13:23:52 vm1: State change: No preempting claim, returning to > owner > 4/3 13:23:52 vm1: Changing state and activity: Preempting/Vacating - > > Owner/Idle > 4/3 13:23:52 vm1: State change: IS_OWNER is false > 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed > 4/3 13:23:52 DaemonCore: Command received via UDP from host > <172.24.89.61:9675> > 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), > calling handler (command_handler) > 4/3 13:23:52 Error: can't find resource 
with capability > (<172.24.89.1:9666>#7928521674) > > it seems the job can be submit to condor centra-manager and > executing at node, then terminated by unknown reason. it is fine to > run jobs either from condor_submit or from globus. I am confusing > this is due to our condor setting our gridsam or OMII level. > > BTW, the file staging seems ok. at /tmp/gridsam....../dir/ , the > virtula files are there. > > sorry, I am not sure should I ask GridSAM question here, I am not > even sure that is gridsam, OMII, or condor problem. becasue, when i > run PBAC test, it fail again. i was working just after reinstall > OMII server. > > thank you so much for giving any assistant. > > Best Regard! > gen-tao > > Current Assignees: Steve McGough (GridSAM), William Lee (GridSAM), > Steven Newhouse > > CC(s): > > Contact Information: > > Customer Name: Gen-Tao Chiang Email address: ge...@ni... > Organisation: NIEeS Secondary email address: gt...@ca... > > --- William Lee - Software Coordinator --- --- London e-Science Centre, Imperial College London --- A: Room 211a, London e-Science Centre, William Penney Laboratory, Imperial College London, South Kensington, London, SW7 2AZ, UK E: wwhl at doc.ic.ac.uk | william at imageunion.com W: www.lesc.ic.ac.uk | www.imageunion.com P: +44 (0) 207 594 8185 |
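[Editor's note: the Groovy fragment William posted above builds Condor's transfer_input_files attribute by walking the JSDL DataStaging elements that carry a Source (i.e. files staged in). As a minimal, illustrative sketch of that same translation — GridSAM actually performs it in the classad.groovy script, and the function name and sample URIs below are invented; this version also tidies the trailing comma the Groovy snippet leaves behind:]

```python
import xml.etree.ElementTree as ET

# Namespace used by the JSDL documents quoted in this thread.
JSDL_NS = "http://schemas.ggf.org/jsdl/2005/06/jsdl"


def transfer_input_files(jsdl_xml: str) -> str:
    """Build Condor's transfer_input_files attribute from the JSDL
    DataStaging elements that declare a <Source> (files staged *in*);
    elements with only a <Target> are stage-out and are skipped."""
    root = ET.fromstring(jsdl_xml)
    ns = {"jsdl": JSDL_NS}
    names = []
    for staging in root.findall(".//jsdl:DataStaging", ns):
        if staging.find("jsdl:Source", ns) is not None:
            names.append(staging.find("jsdl:FileName", ns).text)
    return "transfer_input_files=" + ",".join(names)
```

Applied to the cat-staging.jsdl in this thread, this would emit `transfer_input_files=dir1/file1.txt,dir2/subdir1/file2.txt`, which matches the two stage-in files reported in the gridsam-status output.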
From: William L. <ww...@do...> - 2006-04-06 14:03:22
May I ask which version of GridSAM you are using? If it's from the OMII bundle, which OMII version? William On 6 Apr 2006, at 14:58, G.T. Chiang wrote: > Dear William > > thakn you very much!! this verison is getting better, the > following is the gridsam-status results. at least job is being > processing via condor, but somehow it fails. > > [root@agorilla examples]# gridsam-status -s "http:// > agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam?wsdl" > urn:gridsam:006868a90a6f50f8010a6f796cea0021 Job Progress: pending - > > staging-in -> staged-in -> active -> failed > > --- pending - 2006-04-06 14:52:09.0 --- > job is being scheduled > --- staging-in - 2006-04-06 14:52:09.0 --- > staging files... > --- staged-in - 2006-04-06 14:52:09.0 --- > 2 files staged in > --- active - 2006-04-06 14:52:09.0 --- > job is being launched through condor > --- failed - 2006-04-06 14:52:09.0 --- > expecting job property urn:condor:classad from previous stage > > -------------- > Job Properties > -------------- > urn:gridsam:Description=cat job description > urn:gridsam:JobProject=gridsam project > urn:gridsam:JobAnnotation=no annotation > urn:gridsam:JobName=cat job > [root@agorilla examples]# > > > the following is the log from gridsam.log > > 2006-04-06 14:52:09,450 INFO [006868a90a6f50f8010a6f796cea0021] > state {pending} reached 2006-04-06 14:52:09,574 INFO > [006868a90a6f50f8010a6f796cea0021] initialised working directory: / > tmp/gridsam-006868a90a6f50f8010a6f796cea0021 2006-04-06 > 14:52:09,607 INFO [006868a90a6f50f8010a6f796cea0021] state {staging- > in} reached 2006-04-06 14:52:09,662 INFO > [006868a90a6f50f8010a6f796cea0021] staging (copy) file http:// > www.doc.ic.ac.uk/~wwhl/download/helloworld.txt -> sftp:// > gri...@ce.../tmp/ > gridsam-006868a90a6f50f8010a6f796cea0021/dir1/file1.txt 2006-04-06 > 14:52:09,681 INFO [006868a90a6f50f8010a6f796cea0021] dir1/file1.txt > staged 2006-04-06 14:52:09,791 INFO > [006868a90a6f50f8010a6f796cea0021] staging (copy) 
file ftp:// > anonymous:anonymous@128.232.232.41:19245/subdir/input-file.txt -> > sftp://gridsamusr@cete.niees.group.cam.ac.uk/tmp/ > gridsam-006868a90a6f50f8010a6f796cea0021/dir2/subdir1/file2.txt > 2006-04-06 14:52:09,842 INFO [006868a90a6f50f8010a6f796cea0021] > dir2/subdir1/file2.txt staged 2006-04-06 14:52:09,843 INFO > [006868a90a6f50f8010a6f796cea0021] state {staged-in} reached > 2006-04-06 14:52:09,870 INFO [006868a90a6f50f8010a6f796cea0021] > executing groovy script classad.groovy 2006-04-06 14:52:09,871 INFO > [006868a90a6f50f8010a6f796cea0021] executed groovy script > classad.groovy 2006-04-06 14:52:09,898 INFO > [006868a90a6f50f8010a6f796cea0021] state {active} reached > 2006-04-06 14:52:09,903 ERROR [006868a90a6f50f8010a6f796cea0021] > Failed to submit condor job: expecting job property > urn:condor:classad from previous stage 2006-04-06 14:52:09,903 INFO > [006868a90a6f50f8010a6f796cea0021] state {failed} reached > 2006-04-06 14:52:09,904 INFO [006868a90a6f50f8010a6f796cea0021] > failed 1144331529450 1144331529607 1144331529843 1144331529898 > 1144331529903 > > > is it possible to obtain the condor job description file which > converted from gridsam JSDL. can I try to submit it to condor > directly? > > Best Regard! > gen-tao > > > > > > > > > On Apr 6 2006, William Lee wrote: > >> >> Please try the classad.groovy script at this location. It >> incorporates a solution that sets up the transfer_input_files and >> the transfer_output_files classad attributes in the JSDL-to- >> Classad translation. This is needed if the submission node (the >> node which GridSAM is running) does not necessarily share a >> common file system with the execution nodes. >> >> http://www.doc.ic.ac.uk/~wwhl/classad.groovy >> >> William >> >> On 4 Apr 2006, at 15:15, OMII Support wrote: >> >>> When replying, type your text above this line. Notification of >>> Query Change >>> >>> gt...@ca... has sent a reply for query query [OMII >>> 383]. 
To respond, either reply via email or use the web link >>> below, see 'Click here to view Query in Browser'. >>> >>> Priority: Normal Status: User Replied >>> Creation Date: 03/04/2006 Creation Time: 13:30:47 >>> Created By: ge...@ni... >>> >>> Click here to view Query in Browser >>> >>> Description: >>> Entered on 04/04/2006 at 15:15:01 by gt...@ca...: >>> Dear William >>> >>> thank you so much!! i modefi the classad.grrovy with adding your >>> code. >>> now the probme becomes undefined and job can not submited to condor. >>> [root@agorilla examples]# gridsam-status -s >>> "http://agorilla.niees.group.cam.ac.uk:18080/gridsam/services/ >>> gridsam?wsdl" >>> urn:gridsam:006868a90a64591e010a653a7f1a0013 Job Progress: >>> pending -> >>> staging-in -> staged-in -> undefined >>> >>> --- pending - 2006-04-04 15:07:13.0 --- >>> job is being scheduled >>> --- staging-in - 2006-04-04 15:07:13.0 --- >>> staging files... >>> --- staged-in - 2006-04-04 15:07:13.0 --- >>> 2 files staged in >>> --- undefined - 2006-04-04 15:07:13.0 --- >>> cannot advance from 'staged-in' to 'done' >>> >>> -------------- >>> Job Properties >>> -------------- >>> urn:condor:purestaging=true >>> [root@agorilla examples]# >>> >>> thank you for any suggestion!! >>> >>> Best Regard! >>> gen-tao >>> >>> On Apr 4 2006, OMII Support wrote: >>> >>> >[Duplicate message snipped] >>> >>> Entered on 04/04/2006 at 09:57:02 by William Lee (GridSAM): >>> Hi Gen Tao, >>> >>> You are right, according to the condor setup, you would have to >>> modify the classad.groovy script to enable the transfer_input_files >>> and transfer_output_files classad attributes. This only applies to >>> condor setup that does not share a common networked file system. 
>>> >>> The code to add to the classad.groovy is >>> >>> jsdl.select("jsdl:JobDefinition/jsdl:JobDescription/ >>> jsdl:DataStaging", ns).eachWithIndex(){ >>> node, index -> >>> if(index == 0){ >>> script += "transfer_input_files=" >>> } >>> if(!node.select("jsdl:Source", ns).isEmpty()){ >>> fileName = node.select("jsdl:FileName")[0].text; >>> script += "${fileName} ," >>> } >>> } >>> >>> I haven't been able to test the code above. Feel free to make any >>> modification as you see fit. >>> >>> William >>> >>> On 3 Apr 2006, at 20:09, OMII Support wrote: >>> >>> > [Duplicate message snipped] >>> >>> Entered on 03/04/2006 at 20:09:02 by gt...@ca...: >>> Dear Sir >>> >>> the resuts as following: >>> [condor@badger1--niees--group jobs]$ less stderr.txt >>> condor_exec.exe: dir1/file1.txt: No such file or directory >>> condor_exec.exe: dir2/subdir1/file2.txt: No such file or directory >>> [condor@badger1--niees--group jobs]$ >>> >>> those files had been staged to the central manager, but not in the >>> executing node. sorry, our central manager is not configured to >>> run jobs. >>> thus, central manager will submit jobs to other machines. that's >>> why when i >>> run this condor job at executing node, and can not find related >>> files. is >>> this normal? shoudl central manager copy those files to other >>> work nodes as >>> well? souhld i changing sometihgn in classad.groovy? >>> >>> thank you very much!! >>> >>> gen-tao >>> >>> thakn you very much!! >>> >>> On Apr 3 2006, OMII Support wrote: >>> >>> >[Duplicate message snipped] >>> >>> Entered on 03/04/2006 at 17:32:38 by William Lee (GridSAM): >>> It's not apparent where the problem lies. Condor has reported to >>> GridSAM the job has >>> completed successfully with exit code 1. Hence the description >>> shown in the EXECUTED >>> state. >>> >>> Can you try running a condor job with the following classad >>> directly? 
>>> >>> universe=vanilla >>> when_to_transfer_output=ON_EXIT >>> should_transfer_files=IF_NEEDED >>> notification=Never >>> log=/tmp/condor.log >>> executable=/bin/cat >>> arguments=dir1/file1.txt dir2/subdir1/file2.txt >>> output=stdout.txt >>> error=stderr.txt >>> >>> queue >>> >>> Entered on 03/04/2006 at 13:30:47 by ge...@ni...: >>> Dear Sir >>> >>> i am trying to run some GridSAM testing programs. however, it >>> seems the jobs can not be executed in our condor pool. the >>> condor pool is working. the job can be submited to the >>> condor_submitter and running at condor node, but then failed. >>> >>> the following are some information! >>> >>> this is the modefied cat-staging.jsdl >>> <JobDefinition xmlns="http://schemas.ggf.org/jsdl/2005/06/jsdl"> >>> <JobDescription> >>> <JobIdentification> >>> <JobName>cat job</JobName> >>> <Description>cat job description</Description> >>> <JobAnnotation>no annotation</JobAnnotation> >>> <JobProject>gridsam project</JobProject> >>> </JobIdentification> >>> <Application> >>> <POSIXApplication xmlns="http://schemas.ggf.org/jsdl/2005/06/ >>> jsdl- posix"> >>> <Executable>/bin/cat</Executable> >>> <Argument>dir1/file1.txt dir2/subdir1/file2.txt</Argument> >>> <Output>stdout.txt</Output> >>> <Error>stderr.txt</Error> >>> </POSIXApplication> >>> </Application> >>> <DataStaging> >>> <FileName>dir1/file1.txt</FileName> >>> <CreationFlag >overwrite</CreationFlag> >>> <Source> >>> <URI>http://www.doc.ic.ac.uk/~wwhl/download/helloworld.txt</URI> >>> </Source> >>> </DataStaging> >>> <DataStaging> >>> <FileName>dir2/subdir1/file2.txt</FileName> >>> <CreationFlag>overwrite</CreationFlag> >>> <Source> >>> <URI>ftp://anonymous:anonymous@localhost:19245/subdir/input- >>> file.txt</URI> >>> </Source> >>> </DataStaging> >>> <DataStaging> >>> <FileName>stdout.txt</FileName> >>> <CreationFlag>overwrite</CreationFlag> >>> <DeleteOnTermination>true</DeleteOnTermination> >>> <Target> >>> 
<URI>ftp://anonymous:anonymous@128.232.232.41:19245/output/ >>> stdout.txt</URI> >>> </Target> >>> </DataStaging> >>> </JobDescription> >>> </JobDefinition> >>> >>> after submit this file >>> >>> [root@agorilla examples]# gridsam-status -s "http:// >>> agorilla.niees.group.cam.ac.uk:18080/gridsam/services/gridsam? >>> wsdl" urn:gridsam:006868a90a4d221e010a5fb493650117 >>> Job Progress: pending -> staging-in -> staged-in -> active -> >>> executed -> staging-out -> staged-out -> done >>> >>> --- pending - 2006-04-03 13:22:50.0 --- >>> job is being scheduled >>> --- staging-in - 2006-04-03 13:22:50.0 --- >>> staging files... >>> --- staged-in - 2006-04-03 13:22:59.0 --- >>> 2 files staged in >>> --- active - 2006-04-03 13:22:59.0 --- >>> job is being launched through condor >>> --- executed - 2006-04-03 13:23:04.0 --- >>> 04/03 13:23:52 Job terminated. (1) Normal termination (return >>> value 1) Usr 0 00:00:00, Sys 0 00:00:00 - Run Remote Usage Usr >>> 0 00:00:00, Sys 0 00:00:00 - Run Local Usage Usr 0 00:00:00, Sys >>> 0 00:00:00 - Total Remote Usage Usr 0 00:00:00, Sys 0 00:00:00 >>> - Total Local Usage 126 - Run Bytes Sent By Job 15992 - Run >>> Bytes Received By Job 126 - Total Bytes Sent By Job 15992 - >>> Total Bytes Received By Job >>> --- staging-out - 2006-04-03 13:23:04.0 --- >>> staging files out... 
>>> --- staged-out - 2006-04-03 13:23:04.0 --- >>> 1 files staged out >>> --- done - 2006-04-03 13:23:04.0 --- >>> Job completed >>> >>> -------------- >>> Job Properties >>> -------------- >>> urn:gridsam:Description=cat job description >>> urn:gridsam:JobProject=gridsam project >>> urn:gridsam:JobAnnotation=no annotation >>> urn:gridsam:JobName=cat job >>> urn:condor:classad=universe=vanilla >>> when_to_transfer_output=ON_EXIT >>> should_transfer_files=IF_NEEDED >>> notification=Never >>> log=/tmp/condor.log >>> >>> executable=/bin/cat >>> arguments=dir1/file1.txt dir2/subdir1/file2.txt >>> output=stdout.txt >>> >>> error=stderr.txt >>> >>> queue >>> urn:condor:clusterid=191 >>> urn:gridsam:exitcode=1 >>> [root@agorilla examples]# >>> >>> if i go the the executing node and the log indicates the following >>> 4/3 13:23:47 DaemonCore: Command received via UDP from host >>> <172.24.89.61:9632> >>> 4/3 13:23:47 DaemonCore: received command 440 (MATCH_INFO), >>> calling handler (command_match_info) >>> 4/3 13:23:47 vm1: match_info called >>> 4/3 13:23:47 vm1: Received match <172.24.89.1:9666>#7928521674 >>> 4/3 13:23:47 vm1: State change: match notification protocol >>> successful >>> 4/3 13:23:47 vm1: Changing state: Unclaimed -> Matched >>> 4/3 13:23:47 DaemonCore: Command received via TCP from host >>> <172.24.89.61:9693> >>> 4/3 13:23:47 DaemonCore: received command 442 (REQUEST_CLAIM), >>> calling handler (command_request_claim) >>> 4/3 13:23:47 vm1: Request accepted. >>> 4/3 13:23:47 vm1: Remote owner is >>> gri...@ce... 
>>> 4/3 13:23:47 vm1: State change: claiming protocol successful >>> 4/3 13:23:47 vm1: Changing state: Matched -> Claimed >>> 4/3 13:23:50 DaemonCore: Command received via TCP from host >>> <172.24.89.61:9669> >>> 4/3 13:23:50 DaemonCore: received command 444 (ACTIVATE_CLAIM), >>> calling handler (command_activate_claim) >>> 4/3 13:23:50 vm1: Got activate_claim request from shadow >>> (<172.24.89.61:9669>) >>> 4/3 13:23:50 vm1: Remote job ID is 191.0 >>> 4/3 13:23:50 vm1: Got universe "VANILLA" (5) from request classad >>> 4/3 13:23:50 vm1: State change: claim-activation protocol successful >>> 4/3 13:23:50 vm1: Changing activity: Idle -> Busy >>> 4/3 13:23:51 DaemonCore: Command received via TCP from host >>> <172.24.89.61:9652> >>> 4/3 13:23:51 DaemonCore: received command 404 >>> (DEACTIVATE_CLAIM_FORCIBLY), calling handler (command_handler) >>> 4/3 13:23:51 vm1: Called deactivate_claim_forcibly() >>> 4/3 13:23:51 Starter pid 31148 exited with status 0 >>> 4/3 13:23:51 vm1: State change: starter exited >>> 4/3 13:23:51 vm1: Changing activity: Busy -> Idle >>> 4/3 13:23:52 DaemonCore: Command received via UDP from host >>> <172.24.89.61:9620> >>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), >>> calling handler (command_handler) >>> 4/3 13:23:52 vm1: State change: received RELEASE_CLAIM command >>> 4/3 13:23:52 vm1: Changing state and activity: Claimed/Idle -> >>> Preempting/Vacating >>> 4/3 13:23:52 vm1: State change: No preempting claim, returning >>> to owner >>> 4/3 13:23:52 vm1: Changing state and activity: Preempting/ >>> Vacating - > Owner/Idle >>> 4/3 13:23:52 vm1: State change: IS_OWNER is false >>> 4/3 13:23:52 vm1: Changing state: Owner -> Unclaimed >>> 4/3 13:23:52 DaemonCore: Command received via UDP from host >>> <172.24.89.61:9675> >>> 4/3 13:23:52 DaemonCore: received command 443 (RELEASE_CLAIM), >>> calling handler (command_handler) >>> 4/3 13:23:52 Error: can't find resource with capability >>> (<172.24.89.1:9666>#7928521674) 
>>> >>> it seems the job can be submit to condor centra-manager and >>> executing at node, then terminated by unknown reason. it is fine >>> to run jobs either from condor_submit or from globus. I am >>> confusing this is due to our condor setting our gridsam or OMII >>> level. >>> >>> BTW, the file staging seems ok. at /tmp/gridsam....../dir/ , the >>> virtula files are there. >>> >>> sorry, I am not sure should I ask GridSAM question here, I am >>> not even sure that is gridsam, OMII, or condor problem. becasue, >>> when i run PBAC test, it fail again. i was working just after >>> reinstall OMII server. >>> >>> thank you so much for giving any assistant. >>> >>> Best Regard! >>> gen-tao >>> >>> Current Assignees: Steve McGough (GridSAM), William Lee >>> (GridSAM), Steven Newhouse >>> >>> CC(s): >>> >>> Contact Information: >>> >>> Customer Name: Gen-Tao Chiang Email address: ge...@ni... >>> Organisation: NIEeS Secondary email address: gt...@ca... >>> >>> >> >> --- William Lee - Software Coordinator --- >> --- London e-Science Centre, Imperial College London --- >> A: Room 211a, London e-Science Centre, William Penney Laboratory, >> Imperial College London, South Kensington, London, SW7 2AZ, UK >> E: wwhl at doc.ic.ac.uk | william at imageunion.com >> W: www.lesc.ic.ac.uk | www.imageunion.com >> P: +44 (0) 207 594 8185 >> >> >> >> >> ------------------------------------------------------- This >> SF.Net email is sponsored by xPML, a groundbreaking scripting >> language that extends applications into web and mobile media. >> Attend the live webcast and join the prime developer group >> breaking into this new coding territory! http://sel.as- >> us.falkag.net/sel?cmd=lnk&kid=110944&bid=241720&dat=121642 >> _______________________________________________ GridSAM-Discuss >> mailing list Gri...@li... 
https:// >> lists.sourceforge.net/lists/listinfo/gridsam-discuss > --- William Lee - Software Coordinator --- --- London e-Science Centre, Imperial College London --- A: Room 211a, London e-Science Centre, William Penney Laboratory, Imperial College London, South Kensington, London, SW7 2AZ, UK E: wwhl at doc.ic.ac.uk | william at imageunion.com W: www.lesc.ic.ac.uk | www.imageunion.com P: +44 (0) 207 594 8185 |
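[Editor's note: Gen-Tao asks above whether the Condor job description GridSAM generates can be obtained and submitted to Condor directly. It can: the full classad appears in the urn:condor:classad job property of the gridsam-status output, so it can be saved to a submit description file and handed to condor_submit by hand, as William suggested earlier in the thread. A sketch, assuming standard Condor command-line tools and an invented file path:]

```shell
# Save the classad reported in the urn:condor:classad job property
# to a submit description file (contents copied from the thread).
cat > /tmp/gridsam-test.submit <<'EOF'
universe=vanilla
when_to_transfer_output=ON_EXIT
should_transfer_files=IF_NEEDED
notification=Never
log=/tmp/condor.log
executable=/bin/cat
arguments=dir1/file1.txt dir2/subdir1/file2.txt
output=stdout.txt
error=stderr.txt
queue
EOF

# Submit from the job's working directory so the relative input
# paths (dir1/file1.txt, ...) resolve against the staged files:
#   cd /tmp/gridsam-<job-id> && condor_submit /tmp/gridsam-test.submit
```

Running this from the GridSAM working directory reproduces the job outside GridSAM, which helps separate Condor configuration problems (such as the missing transfer_input_files on pools without a shared file system) from GridSAM ones.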