#577 UX/NOBATCH becomes unresponsive while running (local) IO jobs

UNICORE6.5
closed-invalid
UNICORE/X (76)
5
2012-09-06
2012-08-22
No

When a job generates a big file (at least 10 GB, for example via the dd command), UNICORE/X becomes unresponsive while the job is executing, and the gateway is not able to communicate with it. If a client queries the job status or otherwise interacts with the server during that time, it receives an exception. Please find attached the exception snippet from the gateway logs.

This problem occurs with NOBATCH deployments, but not with deployments on batch systems such as Torque.

Note: As a workaround I have also tried increasing the timeout settings on the gateway and UNICORE/X (wsrflite.xml, xnjs_legacy.xml), but this did not solve the issue.

Discussion

  • Shahbaz Memon

    Shahbaz Memon - 2012-08-22

    Gateway and UCC log snippet

     
  • Bernd Schuller

    Bernd Schuller - 2012-08-22

    I assume you also have UNICORE/X on the same machine as the NOBATCH TSI?
    In that case I see this too. It appears to be caused by the high IO load on the UNICORE/X machine, which leads to long response times, especially when UNICORE/X tries to access the disk.

     
  • Bernd Schuller

    Bernd Schuller - 2012-08-22
    • status: open --> open-accepted
     
  • Bernd Schuller

    Bernd Schuller - 2012-09-06
    • status: open-accepted --> closed-invalid
     
  • Bernd Schuller

    Bernd Schuller - 2012-09-06

    I do not really see this as a bug. It's possible to modify Submit.pm (and maybe ExecuteScript.pm) to add "nice" and "ionice" commands that reduce the load generated by user scripts. All in all, though, it is more a deployment problem, and administrators need to understand the implications of running the NOBATCH TSI (and IO-intensive jobs) on the same machine as UNICORE/X.
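    The "nice"/"ionice" idea above amounts to prefixing the user's command line so it runs at reduced CPU and IO priority. The sketch below illustrates that in Python; it is not the TSI's actual code (Submit.pm is Perl), and the helper name `wrap_low_priority` and its defaults are hypothetical. It assumes a Linux host with util-linux `ionice`; whether the IO priority has any effect depends on the kernel's IO scheduler.

```python
import shlex
from typing import List, Sequence


def wrap_low_priority(cmd: Sequence[str], nice_level: int = 19,
                      io_class: int = 2, io_level: int = 7) -> List[str]:
    """Prefix cmd with nice/ionice so it runs at reduced CPU and IO priority.

    Hypothetical helper for illustration only; the real NOBATCH TSI
    (Submit.pm) is written in Perl and builds its command lines differently.
    """
    # Unprivileged users may only lower priority, so restrict to 0..19.
    if not 0 <= nice_level <= 19:
        raise ValueError("nice_level must be in 0..19")
    # Class 2 = best-effort, class 3 = idle (class 1, realtime, needs root).
    if io_class not in (2, 3):
        raise ValueError("io_class must be 2 (best-effort) or 3 (idle)")

    prefix = ["nice", "-n", str(nice_level), "ionice", "-c", str(io_class)]
    # Priority levels 0 (highest) .. 7 (lowest) apply only to best-effort.
    if io_class == 2:
        if not 0 <= io_level <= 7:
            raise ValueError("io_level must be in 0..7")
        prefix += ["-n", str(io_level)]
    return prefix + list(cmd)


if __name__ == "__main__":
    # Shows the wrapped command line for a user job script.
    print(shlex.join(wrap_low_priority(["/bin/sh", "job.sh"])))
```

    Running the script prints `nice -n 19 ionice -c 2 -n 7 /bin/sh job.sh`. Best-effort class 7 is used by default rather than the idle class because idle-class jobs may get no disk time at all on a busy machine.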

     
