We previously discussed that nodeset returns the error "Error: Unable to dispatch hierarchical sub-command to SN:3001. This service node may be down or its xcatd daemon may not be responding." when installing a service node (SN).
However, I think this error is not user-friendly.
I only want to install this SN, so of course xCAT cannot dispatch a command to it. But when nodeset prints this error, I always assume that nodeset itself has failed.
[root@c250mgrs04-pvt ~]# nodeset c250f07c04ap01 install
c250f07c04ap01: install rhels6-ppc64-service
Error: Unable to dispatch hierarchical sub-command to c250f07c04ap01:3001. This service node may be down or its xcatd daemon may not be responding.
Error: Unable to dispatch hierarchical sub-command to c250f07c04ap01:3001. This service node may be down or its xcatd daemon may not be responding.
[root@c250mgrs04-pvt ~]# tabdump servicenode
"service","1","1","1","1",,,,,"1",,,,
[root@c250mgrs04-pvt rh]# lsdef -t group -o service
Object name: service
arch=ppc64
grouptype=static
members=c250f07c04ap01
nodetype=osi
os=rhels6
postscripts=servicenode,xcatserver,xcatclient
profile=service
setupdhcp=1
setupftp=1
setupnameserver=1
setupnfs=1
setuptftp=1
Discussion history:
I think we need to continue the discussion on this topic.
Does anybody have comments on whether site.disjointdhcps should default to 1 or 0?
If no one has further concerns, I'll check in the code that Jarrod proposed.
I am also wondering whether we should combine this error message for multiple service nodes, that is,
make the error message:
Error: Unable to dispatch hierarchical sub-command to idplex01:3001,x3550m3n01:3001. This service node may be down or its xcatd daemon may not be responding.
instead of:
Error: Unable to dispatch hierarchical sub-command to idplex01:3001. This service node may be down or its xcatd daemon may not be responding.
Error: Unable to dispatch hierarchical sub-command to x3550m3n01:3001. This service node may be down or its xcatd daemon may not be responding.
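A minimal Perl sketch of the combined message proposed above, assuming the caller has already collected the unreachable "host:port" strings from the failed dispatches. `combine_dispatch_errors` is a hypothetical name, not an existing xCAT function, and the message text is copied from the proposal.

```perl
use strict;
use warnings;

# Hypothetical helper: build a single error line covering every service node
# that could not be reached, instead of emitting one line per service node.
sub combine_dispatch_errors {
    my @unreachable = @_;          # "host:port" strings, e.g. "idplex01:3001"
    return unless @unreachable;    # nothing failed, so nothing to report
    return 'Unable to dispatch hierarchical sub-command to '
      . join(',', @unreachable)
      . '. This service node may be down or its xcatd daemon may not be responding.';
}

my $msg = combine_dispatch_errors('idplex01:3001', 'x3550m3n01:3001');
# The "Error: " prefix is added here only to mirror the output shown above.
print "Error: $msg\n" if defined $msg;
```

Returning nothing for an empty list lets the caller skip printing entirely when every service node was reachable.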
Thanks
Best Regards
Wang Xiaopeng (王晓朋)
IBM China System Technology Laboratory
Tel: 86-10-82453455
Email: wxp@cn.ibm.com
Address: 28,ZhongGuanCun Software Park,No.8 Dong Bei Wang West Road, Haidian District Beijing P.R.China 100193
----- Forwarded by Xiao Peng Wang/China/IBM on 2011-05-17 21:25 -----
From: Xiao Peng Wang/China/IBM
To: Jarrod B Johnson/Raleigh/IBM@IBMUS@IBMAU
Cc: Ling Gao/Poughkeepsie/IBM@IBMUS, Lissa Valletta/Poughkeepsie/IBM@IBMUS, Bruce M Potter/Poughkeepsie/IBM@IBMUS, CSTL xCAT - All, POK xCAT - All
Date: 2011-05-13 15:10
Subject: Re: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
As for reducing the impact of this error message in disjointdhcps=0 mode, I think Jarrod's code is better than nothing. As Jarrod said, service nodes may legitimately serve each other, so we cannot avoid this error when none of the SNs in the noderange is up. Also, in the case where all compute nodes (CNs) are managed only by sn1 (as configured in noderes.servicenode), the admin should set disjointdhcps=1.
When disjointdhcps=0, noderes.servicenode is ignored; that means if you want to specify the service node for each node, you should set disjointdhcps=1. I think that in most cases (clusters with fewer than 1000 nodes), using noderes.servicenode to specify the SN for each CN keeps the cluster structure easy to manage. So I vote to set disjointdhcps=1 by default; customers can set it to '0' for performance tuning.
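The selection rule described above can be sketched in Perl. This is a minimal sketch of the behavior as described in this thread, not the actual dhcp.pm logic; `target_service_nodes` and its arguments are hypothetical, and it assumes noderes.servicenode may hold a comma-separated list of SNs.

```perl
use strict;
use warnings;

# Hypothetical sketch: choose the service nodes (SNs) that would receive a
# makedhcp request.
#   $all_sns   - array ref of every SN listed in the servicenode table
#   $node_sn   - hash ref: node => noderes.servicenode value ("sn1" or "sn1,sn2")
#   $noderange - array ref of the nodes passed to nodeset/makedhcp
#   $disjoint  - value of site.disjointdhcps (0 or 1)
sub target_service_nodes {
    my ($all_sns, $node_sn, $noderange, $disjoint) = @_;

    # disjointdhcps=0: noderes.servicenode is ignored and every SN gets the request.
    return sort @$all_sns unless $disjoint;

    # disjointdhcps=1: only the SNs named in noderes.servicenode for the noderange.
    my %targets;
    for my $node (@$noderange) {
        $targets{$_} = 1 for split /,/, ($node_sn->{$node} // '');
    }
    return sort keys %targets;
}

# The case above: all CNs (cn1, cn2) are managed only by sn1.
my @all = qw(sn1 sn2);
my %map = (cn1 => 'sn1', cn2 => 'sn1');
print join(',', target_service_nodes(\@all, \%map, [qw(cn1 cn2)], 0)), "\n";  # sn1,sn2
print join(',', target_service_nodes(\@all, \%map, [qw(cn1 cn2)], 1)), "\n";  # sn1
```

With disjointdhcps=0, sn2 still receives the request even though it manages none of the nodes, so a down sn2 produces the dispatch error; with disjointdhcps=1 only sn1 is contacted.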
Thanks
From: Jarrod B Johnson/Raleigh/IBM@IBMUS
To: Ling Gao/Poughkeepsie/IBM@IBMUS
Cc: Lissa Valletta/Poughkeepsie/IBM@IBMUS, Bruce M Potter/Poughkeepsie/IBM@IBMUS, Xiao Peng Wang/China/IBM@IBMCN@IBMAU, CSTL xCAT - All, POK xCAT - All
Date: 2011-05-12 23:06
Subject: Re: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
In the first case, it is actually potentially valid for service nodes to boot off each other. The only case that can really be assured without specific configuration is a DHCP server serving itself. So disjointdhcps=1 remains the catch-all for keeping xCAT from doing too much, at the cost of increasing the chances of doing too little.
From: Ling Gao/Poughkeepsie/IBM
To: Jarrod B Johnson/Raleigh/IBM@IBMUS
Cc: Lissa Valletta/Poughkeepsie/IBM@IBMUS, Bruce M Potter/Poughkeepsie/IBM@IBMUS, Xiao Peng Wang/China/IBM@IBMCN@IBMAU, CSTL xCAT - All, POK xCAT - All
Date: 05/12/2011 10:53 AM
Subject: Re: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
This only solves Xiao Peng's case. When someone specifies a list of service nodes and none of them is up and running (which is very common when setting up a cluster), we'll still get the same errors. To make it even more confusing, if I specify a list of compute nodes that are all managed by sn1, but sn2 is down, we'll get a warning saying "cannot send request to sn2".
Ling
Ling Gao
Poughkeepsie Unix Development Lab
IBM Systems and Technology Group
Internal: T/L 293-5692
External: linggao@us.ibm.com, 845-433-5692
"I never worry about the future. It comes soon enough." --- Albert Einstein
From: Jarrod B Johnson/Raleigh/IBM
To: Lissa Valletta/Poughkeepsie/IBM@IBMUS
Cc: Bruce M Potter/Poughkeepsie/IBM@IBMUS, Xiao Peng Wang/China/IBM@IBMCN@IBMAU, CSTL xCAT - All, POK xCAT - All
Date: 05/12/2011 10:42 AM
Subject: Re: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
After some thought and reviewing the code, I suppose the answer would be setting disjointdhcps, if I read it correctly. As for making disjointdhcps=1 the default behavior, I'm not sure. disjointdhcps=0 may hit corner cases like this one, which look silly (but still work), but it has a higher chance of 'just working' with less configuration (i.e., not having to explicitly set noderes.servicenode to every service node in the servicenode table). It also tolerates mistakes better (if someone messes up noderes.servicenode and accidentally omits a key service node, that service node may still offer DHCP data, even if stale).
Of course, disjointdhcps is the generic answer that covers this case, among others. However, if we want disjointdhcps=0 not to fail in this specific case without much side effect, I think:
$ svn diff
Index: xCAT-server/lib/xcat/plugins/dhcp.pm
===================================================================
--- xCAT-server/lib/xcat/plugins/dhcp.pm (revision 9579)
+++ xCAT-server/lib/xcat/plugins/dhcp.pm (working copy)
@@ -699,6 +699,7 @@
my $reqcopy = {%$req};
$reqcopy->{'_xcatdest'} = $s;
$reqcopy->{_xcatpreprocessed}->[0] = 1;
Would do it. Look good?
From: Lissa Valletta/Poughkeepsie/IBM
To: Bruce M Potter/Poughkeepsie/IBM
Cc: Xiao Peng Wang/China/IBM@IBMCN@IBMAU, CSTL xCAT - All, Jarrod B Johnson/Raleigh/IBM@IBMUS, POK xCAT - All
Date: 05/12/2011 08:19 AM
Subject: Re: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
I think makedhcp always sets up all nodes on all service nodes, so it would not send the request only to the service node for the node.
I know the -l flag is used in AAsn.pm, and I thought it was some sort of control so that, when makedhcp is run on a service node, it would not set up all the service nodes again, but only that service node.
Lissa K. Valletta
2-3/T12
Poughkeepsie, NY 12601
(tie 293) 433-3102
From: Bruce M Potter/Poughkeepsie/IBM
To: Xiao Peng Wang/China/IBM@IBMCN@IBMAU
Cc: CSTL xCAT - All, Jarrod B Johnson/Raleigh/IBM@IBMUS, POK xCAT - All
Date: 05/12/2011 07:52 AM
Subject: Re: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
Does/should this depend on the site.disjointdhcps attribute? From the site man page:
disjointdhcps: If set to '1', the .leases file on a service node only contains
the nodes it manages. The default value is '0'.
It seems that if site.disjointdhcps is 0 (the default), then makedhcp runs have to be sent to all SNs.
I wonder whether nodeset should specifically remove from its SN list any SN that is in the specified noderange. (The assumption being that if they are running nodeset on the SN, either it isn't up right now or they will be reinstalling it.)
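The suggested filter can be sketched as one step applied to the SN list before dispatch. A minimal Perl sketch, assuming the SN list and the noderange have already been expanded into plain node names; `exclude_sns_in_noderange` is a hypothetical name, not an existing xCAT function.

```perl
use strict;
use warnings;

# Hypothetical sketch: drop from the dispatch list any service node that is
# itself part of the noderange, on the assumption that it is down or is
# about to be reinstalled.
sub exclude_sns_in_noderange {
    my ($sn_list, $noderange) = @_;
    my %in_range;
    $in_range{$_} = 1 for @$noderange;    # fast membership test for the noderange
    return grep { !$in_range{$_} } @$sn_list;
}

# nodeset on the SN itself: hs22n01 is no longer a dispatch target.
my @targets = exclude_sns_in_noderange([qw(hs22n01 sn2)], ['hs22n01']);
print join(',', @targets), "\n";    # prints "sn2"
```

This only covers the case where the unreachable SN is itself in the noderange; an unrelated SN that happens to be down would still be contacted.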
Bruce Potter STSM, Linux & AIX Cluster Development, IBM, Poughkeepsie, NY
Email: bp@us.ibm.com Phone: external: 845-433-7073, internal: TL 293-7073
From: Xiao Peng Wang/China/IBM@IBMCN
To: CSTL xCAT - All, POK xCAT - All, Jarrod B Johnson/Raleigh/IBM@IBMUS
Date: 05/11/2011 10:11 PM
Subject: Some questions for '-l' flag of makedhcp, who knows the '-l' flag?
I have a defect: when running the 'nodeset hs22n01' command against a service node, an error like the following is displayed. That means the command request was sent to the node itself, but in this case the node is still powered off.
Error: Unable to dispatch hierarchical sub-command to hs22n01:3001. This service node may be down or its xcatd daemon may not be responding.
I looked into the code and found it was caused by the makedhcp call invoked within the nodeset command. The makedhcp command tries to broadcast the request to all SNs. I also found that makedhcp has a '-l' flag which dispatches the 'makedhcp' request only to the SNs serving the nodes in the noderange, instead of broadcasting to all SNs. So I think we should invoke makedhcp with the '-l' flag here. What is the purpose of the '-l' flag?
I also found that 'makedhcp sn_node' displays this error message too, since it does not take the '-l' code path and instead broadcasts to all SNs, including sn_node itself. My understanding is that if a noderange is specified, the 'makedhcp' request should only be sent to the SN of that node. Am I right?
Thanks
Conclusion 1: By default, disjointdhcps=0, so every makedhcp request is forwarded to every service node defined in the servicenode table. The exception is a noderange containing only one service node: in that case, no request is sent to that SN itself.
Conclusion 2: Where appropriate, customers can set disjointdhcps=1 to avoid sending the request to all service nodes.
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 28 days (the time period specified by
the administrator of this Tracker).