Customer reported that the appstatus did not get updated if the node status is "noping". The command used was nodestat -up <node>.
Customer wrote:
I have performed the following sanity check:
# rpower n01 off
# nodestat -up n01
noping(off)
# nodels n01 nodelist.appstatus nodelist.status
n01: nodelist.appstatus: xend=down,sshd=up,rdp=down,https=up,pbs=down,msrpc=down
n01: nodelist.status: noping(off)
# tabch node=n01 nodelist.appstatus=""
# nodels n01 nodelist.appstatus nodelist.status
n01: nodelist.appstatus:
n01: nodelist.status: noping(off)
# nodestat -up n01
noping(off)
# nodels n01 nodelist.appstatus nodelist.status
n01: nodelist.appstatus:
n01: nodelist.status: noping(off)
# rpower n01 on
# nodestat -up n01
sshd,https
# nodels n01 nodelist.appstatus nodelist.status
n01: nodelist.appstatus: xend=down,sshd=up,rdp=down,https=up,pbs=down,msrpc=down
n01: nodelist.status: ping
So it seems when a node is not responding (noping), the appstatus
field doesn't update. I assume some kind of code optimization (why
bother checking appstatus for a noping node) but it does present a
slightly more complex logic to my monitoring code.
How hard would it be to modify the code to update the appstatus
regardless of status=noping (or even automatically set everything to
'down')?
Fixed in 82c90e2227f3ed17a4a431a81a5cab0d42911d0a for xCAT 2.10.
For other versions, please take the attached nodestat.pm file and copy it into the /opt/xcat/lib/perl/xCAT_plugin/ directory on both the mn and the sn (if any), replacing the original file. Then restart xcatd on both the mn and the sn.
Last edit: Ling 2015-04-21
Ling, since you already checked in the code, should we close this bug? Thanks.
The new nodestat.pm completely removes the appstatus from the nodelist table. Is that the intended behavior?
I would prefer having it set all apps to "down", but if by design it should remove the appstatus, then it seems to work correctly.
For now we will just leave the appstatus blank if the node status is off.
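Since a blank appstatus is the behavior for now, monitoring code that wants every app reported as "down" for an unreachable node can derive that on its own side. The following is only a minimal sketch in Python, not part of xCAT: the function names and the KNOWN_APPS list are hypothetical, and it assumes the status and appstatus values have already been read from the nodelist table (for example from nodels output) as plain strings.

```python
# Hypothetical helper for monitoring code; not part of xCAT.

def parse_appstatus(value):
    """Parse a string like 'sshd=up,https=up' into {'sshd': 'up', 'https': 'up'}."""
    result = {}
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # skip empty pieces, e.g. when the whole field is blank
        name, _, state = item.partition("=")
        result[name] = state
    return result


def effective_appstatus(status, appstatus, known_apps):
    """Treat a blank appstatus on a noping node as every known app being down."""
    apps = parse_appstatus(appstatus)
    if status.startswith("noping") and not apps:
        return {app: "down" for app in known_apps}
    return apps


# Assumed list of monitored apps, taken from the appstatus shown in the report.
KNOWN_APPS = ["xend", "sshd", "rdp", "https", "pbs", "msrpc"]

if __name__ == "__main__":
    print(effective_appstatus("noping(off)", "", KNOWN_APPS))
    print(effective_appstatus("ping", "xend=down,sshd=up,https=up", KNOWN_APPS))
```

With this approach the monitoring side decides how to interpret a blank field, so it keeps working whether nodestat leaves appstatus empty or later starts writing explicit "down" values.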