#2326 xcatd: SSL Listener crashes

Milestone: 2.6.11

Status: closed

Owner: Jarrod Johnson

Labels: General (881)

component:

Priority: 5

Updated: 2013-01-25

Created: 2011-09-23

Creator: Evan Felix

Private: No

We have set up an xCAT hierarchical cluster and have noticed an issue where the SSL listener process dies. Specifically, it gets "cannot fork" errors and then exits. We can reproduce this by issuing many requests at once, for example:

for i in $(seq 1 40); do ./getpostscript.awk > /dev/null & done

This pretty reliably causes the error. We see a storm like this on our production system when we boot large numbers of nodes. Also, once the process has died, restarting the xcatd service fails pretty quickly, because all of the nodes are now retrying the request. This does not seem to happen on the management node, but the management node seems to handle the requests one at a time and is very slow. We have modified the xcatd script to not immediately call 'die' when a fork fails, and the process seems to continue handling other requests. We have checked various ulimits and socket counts and cannot find any resource that is not available.
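For anyone repeating the resource check described above, here is a minimal sketch of one way to read the per-process limits that most often make fork() fail. The `soft_limit` helper is hypothetical (it is not part of xCAT), and it assumes a Linux system that exposes `/proc/<pid>/limits`.

```shell
#!/bin/sh
# Hypothetical helper, not part of xCAT: print the soft limit for one row of a
# /proc/<pid>/limits-style file, e.g. "Max processes" or "Max open files".
#   $1 = limit name exactly as it appears at the start of the row
#   $2 = limits file to read (defaults to the current process)
soft_limit() {
    awk -v name="$1" '
        index($0, name) == 1 {
            # Everything after the name is: soft limit, hard limit, units.
            rest = substr($0, length(name) + 1)
            split(rest, f, " ")
            print f[1]
        }' "${2:-/proc/self/limits}"
}

# Only query the live system when the file exists (Linux only).
if [ -r /proc/self/limits ]; then
    echo "max processes:  $(soft_limit "Max processes")"
    echo "max open files: $(soft_limit "Max open files")"
fi
```

To inspect the running daemon instead of the current shell, pass `/proc/<pid of xcatd>/limits` as the second argument. Note that these limits do not cover memory commit accounting, which has to be checked separately.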

Discussion

Evan Felix - 2011-11-04

We patched the xcatd code to not exit on these errors so that it could handle further requests. It seemed to work fine that way, only slower. Clients retry in this case, so it worked OK.

Later on we realized that we needed a different memory overcommit setting, and we found that setting the sysctl

vm.overcommit_memory=0

instead of

vm.overcommit_memory=2

made the fork errors disappear.
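A plausible explanation, offered as an assumption and not something confirmed in this ticket: with vm.overcommit_memory=2 the kernel refuses any allocation that would push Committed_AS past CommitLimit, so fork() of a large process can fail with an out-of-memory error even when free memory remains. The sketch below shows one way to check the current mode and the remaining commit headroom. The helper names and the short mode labels are made up for illustration; only the /proc paths and the sysctl key are real.

```shell
#!/bin/sh
# Hypothetical helpers, not part of xCAT.

# Map the numeric vm.overcommit_memory mode to a short label (labels are ours).
overcommit_mode_name() {
    case "$1" in
        0) echo "heuristic" ;;
        1) echo "always" ;;
        2) echo "strict" ;;
        *) echo "unknown" ;;
    esac
}

# Print CommitLimit minus Committed_AS, in kB, from a meminfo-format file.
#   $1 = file to read (defaults to /proc/meminfo)
commit_headroom_kb() {
    awk '/^CommitLimit:/  { limit = $2 }
         /^Committed_AS:/ { used = $2 }
         END              { print limit - used }' "${1:-/proc/meminfo}"
}

# Only query the live system when the sysctl is exposed (Linux only).
if [ -r /proc/sys/vm/overcommit_memory ]; then
    mode=$(cat /proc/sys/vm/overcommit_memory)
    echo "vm.overcommit_memory=$mode ($(overcommit_mode_name "$mode"))"
    echo "commit headroom: $(commit_headroom_kb) kB"
fi

# To change the mode at runtime (requires root):
#   sysctl -w vm.overcommit_memory=0
# To keep it across reboots, add this line to /etc/sysctl.conf:
#   vm.overcommit_memory=0
```

A small or negative headroom while the mode is 2 would be consistent with the fork failures described above. CommitLimit is only enforced in mode 2, so the headroom figure is informational in modes 0 and 1.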

Brian Croswell - 2012-01-25

Moved the open xCAT 2.6.6 bug to the current service stream.
Can you pick up the latest xCAT 2.6.10 build and see if this issue still exists?
If there is still a problem, please reopen the bug.

Lissa Valletta - 2013-01-25

status: pending --> closed
