
#2326 xcatd: SSL Listener crashes

2.6.11
closed
General (881)
5
2013-01-25
2011-09-23
Evan Felix
No

We have set up an xCAT hierarchical cluster and have noticed an issue where the SSL listener process dies. Specifically, it gets "cannot fork" errors and then exits. We can reproduce this with a burst of concurrent requests such as:

for i in $(seq 1 40); do ./getpostscript.awk > /dev/null & done
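A slightly more observable version of the loop above waits for every background request and reports how many exited non-zero, which makes it easier to compare runs before and after a change. This is a minimal POSIX sh sketch; the function name run_burst is hypothetical and not part of xCAT, and ./getpostscript.awk is only the request script named in this report.

```sh
#!/bin/sh
# Hypothetical helper (not part of xCAT): start N copies of a command
# concurrently, wait for all of them, and report how many exited non-zero.
# Usage: run_burst N command [args...]
run_burst() {
    local n="$1"
    shift
    local pids="" i pid
    for i in $(seq 1 "$n"); do
        "$@" > /dev/null 2>&1 &
        pids="$pids $!"
    done
    local failed=0
    for pid in $pids; do
        # "wait PID" returns that background job's exit status
        if ! wait "$pid"; then
            failed=$((failed + 1))
        fi
    done
    echo "$failed of $n requests failed"
}
```

For example, `run_burst 40 ./getpostscript.awk` mirrors the reproduction loop above.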

This pretty reliably causes the error. We see a storm like this on our production system when we boot large numbers of nodes. Also, once the process has died, restarting the xcatd service fails quickly, because all of the nodes are now retrying their requests. This does not seem to happen on the management node, but there it appears to handle the requests one at a time and is very slow. We have modified the xcatd script to not immediately call 'die' when a fork fails, and the process then seems to continue handling other requests. We have checked various ulimits and socket counts and cannot find any resource that has been exhausted.
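Because the failure here is driven by many nodes issuing and re-issuing requests at the same moment, one common client-side mitigation is to retry with exponential backoff plus random jitter, so that retries spread out instead of arriving in lockstep. The sketch below only illustrates that general technique; xCAT's own client retry logic is not described in this report, and the helper name retry_with_backoff is hypothetical. It assumes a Linux system with /dev/urandom for the jitter.

```sh
#!/bin/sh
# Hypothetical client-side helper (not part of xCAT): retry a command with
# exponential backoff plus random jitter.
# Usage: retry_with_backoff MAX_ATTEMPTS BASE_DELAY_SECONDS command [args...]
retry_with_backoff() {
    local max_attempts="$1" base_delay="$2"
    shift 2
    local attempt=1
    while :; do
        # Success: stop retrying
        "$@" && return 0
        # Give up after MAX_ATTEMPTS failures
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        # Backoff doubles each time: base, 2*base, 4*base, ...
        local delay=$(( base_delay * (1 << (attempt - 1)) ))
        # Random jitter of 0..delay seconds, read from /dev/urandom (Linux),
        # so that many nodes do not all retry at the same instant
        local rnd
        rnd=$(od -An -N2 -tu2 /dev/urandom | tr -d ' \n')
        local jitter=$(( rnd % (delay + 1) ))
        sleep $(( delay + jitter ))
        attempt=$(( attempt + 1 ))
    done
}
```

For example, `retry_with_backoff 5 1 ./getpostscript.awk` waits roughly 1, 2, 4 and 8 seconds (each plus up to the same amount again in jitter) between its five attempts.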

Discussion

  • Evan Felix

    Evan Felix - 2011-11-04

    We patched the xcatd code not to exit on these errors, so that it could continue handling further requests; it seemed to work fine that way, only slower. Clients retry in this case, so this worked acceptably.

    Later on, we realized that we needed a different memory overcommit setting, and we found that setting the sysctl

    vm.overcommit_memory=0

    instead of

    vm.overcommit_memory=2

    made the fork errors disappear.

     
  • Brian  Croswell

    Brian Croswell - 2012-01-25

    Moved the open xCAT 2.6.6 bug to the current service stream.
    Can you pick up the latest xCAT 2.6.10 build and see whether this issue still exists?
    If there is still a problem, please reopen the bug.

     
  • Lissa Valletta

    Lissa Valletta - 2013-01-25
    • status: pending --> closed
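The vm.overcommit_memory change described in the discussion can be checked from a shell. The likely reason it matters is that in mode 2 (strict accounting) the kernel must be able to commit memory for the child's copy of the parent's writable memory at fork time, so a large daemon forking many children can hit the commit limit and get ENOMEM even when little memory is actually in use. The following is a minimal sketch under that assumption; the function name overcommit_mode is hypothetical, and it takes an optional file path only so it can be exercised without touching /proc.

```sh
#!/bin/sh
# Hypothetical helper: report the Linux memory overcommit mode.
# Reads /proc/sys/vm/overcommit_memory unless another path is given.
overcommit_mode() {
    local f="${1:-/proc/sys/vm/overcommit_memory}"
    local mode
    mode=$(cat "$f") || return 1
    case "$mode" in
        0) echo "0: heuristic overcommit (kernel default)" ;;
        1) echo "1: always overcommit" ;;
        2) echo "2: strict accounting; fork() of a large process may fail with ENOMEM" ;;
        *) echo "unknown mode: $mode"; return 1 ;;
    esac
}

# To switch to heuristic mode at runtime (as root):
#   sysctl -w vm.overcommit_memory=0
# To make it persistent, add "vm.overcommit_memory = 0" to /etc/sysctl.conf
# and load it with "sysctl -p".
```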