I'm getting an error while loading STAF on boot in RHEL. STAF is added to chkconfig and loads on boot. It prints the following message and hangs:
INFO: task STAFproc: 4127 blocked for more than 120 seconds
It then shows a call trace and hangs.
What version of RHEL are you using?
Did you follow the instructions in section "11.1 Unix" for "RHEL 4 and 5" in the STAF Installation Guide at http://staf.sourceforge.net/current/STAFInstall.pdf? Did you first verify that your STAFProc init script works correctly for your environment by running it manually as follows:
# cd /etc/rc.d/init.d
# ./stafproc start
Waiting for STAFProc to finish initalizing
Did you time it to see if STAFProc completed initializing in < 120 seconds when run manually and that a "STAF local PING PING" request works when STAFProc is started?
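One way to do that timing is sketched below. The helper name wait_until_ready is hypothetical (it is not part of STAF); it just polls a command once per second and reports how long it took to succeed, so you can compare the result against the 120-second limit.

```shell
#!/bin/sh
# wait_until_ready TIMEOUT_SECS COMMAND [ARGS...]
# (hypothetical helper, not shipped with STAF)
# Polls COMMAND once per second until it succeeds or TIMEOUT_SECS elapse.
# Prints how many seconds it waited; returns 0 on success, 1 on timeout.
wait_until_ready() {
    timeout_secs=$1
    shift
    waited=0
    while ! "$@" >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout_secs" ]; then
            echo "not ready after ${waited}s"
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    echo "ready after ${waited}s"
    return 0
}

# Example, right after running ./stafproc start:
#   wait_until_ready 120 STAF local PING PING
```

If this reports a timeout when the init script is run by hand, the problem is probably in the STAF setup rather than in the boot sequence.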
When starting STAFProc automatically on boot when you get this error, what is the output from the stafproc init script in file /var/log/messages?
What is the content of your init script that you created (e.g. /etc/rc.d/init.d/stafproc)?
What is the content of your STAF.cfg file (e.g. /usr/local/staf/bin/STAF.cfg)?
The RHEL version is
# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 5.6 (Tikanga)
>>Did you follow the instructions in section "11.1 Unix" for "RHEL 4 and 5" in the STAF Installation Guide at http://staf.sourceforge.net/current/STAFInstall.pdf?
It's an intermittent issue. This issue occurs for 20% of the RHEL VM servers while mass booting.
# ./stafproc status
STAFProc (pid 4049) is running…
>>Did you time it to see if STAFProc completed initializing in < 120 seconds when run manually and that a "STAF local PING PING" request works when STAFProc is started?
>>When starting STAFProc automatically on boot when you get this error, what is the output from the stafproc init script in file /var/log/messages?
Nothing in /var/log/messages. Error only on boot screen.
When I auto-boot the VMs via VMware, STAF hangs with that error on 20% of the servers. But when I boot those VMs manually, STAF loads fine.
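If the console output of an affected VM can be saved to a file (for example a captured serial console log, or the output of dmesg once the machine is reachable), a small filter like the one below lists which tasks the kernel reported as blocked. This is only a sketch: the function name hung_task_names is made up for illustration, and it assumes the message format shown at the top of this thread.

```shell
#!/bin/sh
# hung_task_names FILE
# (hypothetical helper) Prints the distinct task fields from
# "INFO: task ... blocked for more than N seconds" lines in FILE.
hung_task_names() {
    sed -n 's/.*INFO: task \(.*\) blocked for more than .*/\1/p' "$1" | sort -u
}

# Example:
#   dmesg > /tmp/dmesg.txt
#   hung_task_names /tmp/dmesg.txt
```

If tasks other than STAFproc show up on the affected servers, that would suggest the blocking is not specific to STAF.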
# cat /usr/software/test/staf/current/bin/STAF_IPv4.cfg
# Turn on tracing of internal errors and deprecated options
trace enable tracepoints "error deprecated"
# Enable TCP/IP connections
interface tcp library STAFTCP option Secure=No option Port=6500 option PROTOCOL=IPv4
# Set default local trust
trust machine local://local level 5
# Add default service loader
serviceloader library STAFDSLS
#VVVVVVVVVVVVVV# EDITED BY XX #VVVVVVVVVVVVV#
SERVICE log LIBRARY STAFLog
SERVICE monitor LIBRARY STAFMon
SERVICE respool LIBRARY STAFPool
SERVICE reaper LIBRARY STAFReaper
SET DEFAULTSTOPUSING SIGINT
SET MAXQUEUESIZE 65536
TRUST LEVEL 5 DEFAULT
SET PROCESSAUTHMODE None
SET DATADIR /var/staf/data
SET VAR STAF/Env/STAF_DEBUG_RC_21=1
NOTIFY ONSTART MACHINE X.X.X.X NAME registration
Did you try running the following command to see if it disables this message?
echo 0 > /proc/sys/kernel/hung_task_timeout_secs
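Before and after changing it, you can check what the kernel is currently using. The sketch below wraps that check in a function; the name show_hung_task_timeout is hypothetical, and the optional file argument exists only so the function can be tried against a test file. Note that a value of 0 turns off the warning only; it does not unblock a task that is really stuck, and the setting is lost on reboot unless it is also put in /etc/sysctl.conf as kernel.hung_task_timeout_secs.

```shell
#!/bin/sh
# show_hung_task_timeout [FILE]
# (hypothetical helper) Reports the hung-task warning timeout.
# FILE defaults to the kernel's /proc entry.
show_hung_task_timeout() {
    file=${1:-/proc/sys/kernel/hung_task_timeout_secs}
    if [ ! -r "$file" ]; then
        # The entry is missing on kernels built without hung-task detection.
        echo "hung task detection not available"
        return 1
    fi
    secs=$(cat "$file")
    if [ "$secs" -eq 0 ]; then
        echo "hung task warnings disabled"
    else
        echo "hung task warnings after ${secs}s"
    fi
}

# Example:
#   show_hung_task_timeout
```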
In googling for the message "INFO: task <process>:<pid> blocked for more than 120 seconds", I found many entries. This issue related specifically to STAF. It is a Linux kernel issue that may be fixed in a later Linux kernel. Google for this message and you'll see entries like the following:
I had a typo in my previous post. I meant to say that this issue is NOT related specifically to STAF - It is a Linux kernel issue.