#578 UNICORE client reports a Running job as Failed

UNICORE6.5
closed-fixed
TSI (15)
5
2012-08-22
2012-08-22
Shahbaz Memon
No

For jobs with long execution times in particular, UCC reports the job status as FAILED (the status comes from the server), although the job is still visible as a running process on the system.

I reproduced this by submitting a simple script that performs N million loop iterations.
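The submitted script is not attached to this ticket. A minimal stand-in, assuming any CPU-bound loop that keeps the process alive for several minutes is enough to trigger the problem, could look like the sketch below. The function name `busy_loop` and the `LOOP_MILLIONS` environment variable are hypothetical and only serve the illustration.

```python
import os
import time


def busy_loop(millions: int) -> int:
    """Run millions * 1,000,000 loop iterations and return the iteration count."""
    count = 0
    for _ in range(millions * 1_000_000):
        count += 1
    return count


if __name__ == "__main__":
    # LOOP_MILLIONS is a hypothetical knob; raise it until the job runs
    # for longer than a few minutes on the target system.
    millions = int(os.environ.get("LOOP_MILLIONS", "1"))
    start = time.time()
    total = busy_loop(millions)
    print(f"{total} iterations in {time.time() - start:.1f} s")
```

In the report above the job was declared failed roughly three minutes after submission, so the loop count would need to be large enough to keep the process running beyond that point.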

Environment: I only see this error with the NO-BATCH TSI. I cannot reproduce it with UNICOREX / batch-system-TSI settings.

The UNICOREX log says the following:

2012-08-22 14:47:24,962 [XNJS-1-JobRunner-1] ERROR Execution - Error getting job details.
java.lang.Exception: Getting job details on TSI failed: reply was No further information is available.

at de.fzj.unicore.xnjs.legacy.Execution.getBSSJobDetails(Execution.java:557)
at de.fzj.unicore.xnjs.legacy.Execution.updateStatus(Execution.java:328)
at de.fzj.unicore.xnjs.ems.JobProcessor.handleQueued(JobProcessor.java:522)
at de.fzj.unicore.xnjs.ems.Processor.process(Processor.java:125)
at de.fzj.unicore.xnjs.ems.JobRunner.process(JobRunner.java:157)

And UCC shows:

...
Wed Aug 22 14:44:21 CEST 2012: TSI reply: submission OK.
Wed Aug 22 14:44:21 CEST 2012: Submitted to classic TSI as [m.memon NONE] with BSSID=1661900 project=TG-STA110009S
Wed Aug 22 14:47:24 CEST 2012: Job was not completed (no exit code file found), please check standard error file <stderr>
Wed Aug 22 14:47:24 CEST 2012: Result: Failed.
Wed Aug 22 14:47:24 CEST 2012: Status set to DONE.
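The log above suggests that the job was marked as failed because no exit code file was found while the process was still running. The sketch below illustrates the distinction the reporter expects; it is not the XNJS or TSI implementation. It assumes a POSIX system, assumes that the BSSID of a no-batch job is an operating system process ID, and uses hypothetical names (`process_alive`, `job_status`) and hypothetical status strings.

```python
import os
from pathlib import Path


def process_alive(pid: int) -> bool:
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def job_status(pid: int, exit_code_file: Path) -> str:
    """Derive a job status from the exit code file and the process table."""
    if exit_code_file.exists():
        code = int(exit_code_file.read_text().strip())
        return "SUCCESSFUL" if code == 0 else "FAILED"
    if process_alive(pid):
        # No exit code yet, but the process is still there: not a failure.
        return "RUNNING"
    # Neither an exit code file nor a live process: the job was lost.
    return "FAILED"
```

Under these assumptions, a missing exit code file alone would not be treated as a failure as long as the process is still present.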

Discussion

  • Bernd Schuller
    2012-08-22

    • status: open --> open-fixed
     
  • Bernd Schuller
    2012-08-22

    • status: open-fixed --> closed-fixed