For jobs with particularly long execution times, UCC reports the job status as FAILED (the status is passed on from the server), even though the job can still be seen as a running process on the system.
I reproduced this by submitting a simple script that performs N million loop iterations.
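For anyone trying to reproduce this, a minimal sketch of such a long-running job is given below. It is not the script used in this report: the function name `busy_loop` and the iteration count passed on the command line are assumptions, and the count needs to be tuned so that the run lasts several minutes on the target machine.

```python
import sys
import time


def busy_loop(iterations: int) -> int:
    """Run a trivial loop `iterations` times and return the final counter."""
    counter = 0
    for _ in range(iterations):
        counter += 1
    return counter


if __name__ == "__main__":
    # The iteration count is a hypothetical parameter, e.g. 500000000.
    if len(sys.argv) == 2 and sys.argv[1].isdigit():
        start = time.time()
        done = busy_loop(int(sys.argv[1]))
        print(f"completed {done} iterations in {time.time() - start:.1f} s")
    else:
        print("usage: busy_loop.py <iterations>")
```

Submitting this through UCC with a large enough iteration count should keep the process alive long enough to compare the status reported by UCC with what is actually running on the system.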
Environment: I only see this error with the NO-BATCH TSI. I am not able to reproduce it with UNICOREX / batch-system-TSI settings.
The UNICOREX log says the following:
2012-08-22 14:47:24,962 [XNJS-1-JobRunner-1] ERROR Execution - Error getting job details.
java.lang.Exception: Getting job details on TSI failed: reply was No further information is available.
And UCC shows:
Wed Aug 22 14:44:21 CEST 2012: TSI reply: submission OK.
Wed Aug 22 14:44:21 CEST 2012: Submitted to classic TSI as [m.memon NONE] with BSSID=1661900 project=TG-STA110009S
Wed Aug 22 14:47:24 CEST 2012: Job was not completed (no exit code file found), please check standard error file <stderr>
Wed Aug 22 14:47:24 CEST 2012: Result: Failed.
Wed Aug 22 14:47:24 CEST 2012: Status set to DONE.