From: Charles G. <ce...@uw...> - 2016-03-30 19:44:10
Hi,

Opal seems to submit jobs to SGE without setting the restart flag. We’re running Opal with SGE in the Amazon EC2 cloud, and if Opal could send jobs to SGE with the restart flag set, we could take advantage of spot pricing.

The problem with using spot pricing is that you can lose nodes without notice if you get outbid. Currently, if the SGE cluster loses a node that is running an Opal job, Opal simply reports an invalid job state and the job is forgotten. If Opal set the restart flag, SGE would automatically restart the job on another node.

I’m guessing that this would just be a matter of passing ‘-r’ to the JobTemplate.setArgs() function. Is this on anybody’s plate, or should I try creating a patch for Opal 2.5 on my own?

Thanks!
Charles
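
A rough sketch of what such a patch might look like, in Java since Opal is a Java application. It only builds the option string and makes assumptions that are not confirmed by Opal's source: the class and method names (`RerunSpec`, `withRerun`) are hypothetical, and it assumes the scheduler options would go through the DRMAA Java binding's `JobTemplate.setNativeSpecification(String)` rather than `setArgs()`, because in DRMAA `setArgs()` sets the arguments of the job's own command. On the SGE side, the rerun flag is `-r y`.

```java
import java.util.Arrays;
import java.util.List;

public class RerunSpec {

    /**
     * Returns the given SGE native specification with "-r y" appended,
     * unless a "-r" option is already present (in which case it is left alone).
     * Hypothetical helper; not part of Opal or DRMAA.
     */
    static String withRerun(String nativeSpec) {
        String spec = (nativeSpec == null) ? "" : nativeSpec.trim();
        if (spec.isEmpty()) {
            return "-r y";
        }
        List<String> tokens = Arrays.asList(spec.split("\\s+"));
        if (tokens.contains("-r")) {
            return spec; // respect an explicit "-r y" or "-r n"
        }
        return spec + " -r y";
    }

    public static void main(String[] args) {
        // With a DRMAA JobTemplate "jt" (assumed), the call would be roughly:
        //   jt.setNativeSpecification(withRerun(existingSpec));
        System.out.println(withRerun("-q all.q"));
    }
}
```

Leaving an existing `-r` option untouched means a site that has deliberately disabled reruns for some jobs would not be overridden.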