Similar to defect #4026 (hierarchical issues): Postage.pm was not returning the correct IP of the SN. It was always passing MASTER from the site table to #TABLE:noderes:$NODE:xcatmaster#
https://gitlab.arif-ali.co.uk/arif/xcat-core/commit/6e0969ca216c45a7191a82118f77cf9f9c8899fd
This is due to
if ($tabname =~ /^noderes$/ && $attrib =~ /^xcatmaster$/ && ! exists($::GLOBAL_TAB_HASH{noderes}{$node}{xcatmaster}))
But the code a few lines above defines $::GLOBAL_TAB_HASH{noderes}{$node}{xcatmaster}, and therefore it will always exist, as the exists function does the following:
Given an expression that specifies an element of a hash, returns true if the specified element in the hash has ever been initialized
The if statement never gets entered.
I hope that makes sense
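To illustrate the point about exists, here is a minimal Perl sketch. It assumes the earlier code creates the key without a usable value (modelled here as undef); %tab_hash is a hypothetical stand-in for $::GLOBAL_TAB_HASH, and guard_with_value is only one possible alternative test, not the fix that was committed.

```perl
use strict;
use warnings;

# Guard in the style of the reported condition: true only when the key is absent.
sub guard_with_exists {
    my ($tab, $node) = @_;
    return exists($tab->{noderes}{$node}{xcatmaster}) ? 0 : 1;
}

# Alternative guard (an assumption, not the committed fix): true when the
# key is absent, undefined, or an empty string.
sub guard_with_value {
    my ($tab, $node) = @_;
    my $value = $tab->{noderes}{$node}{xcatmaster};
    return (defined($value) && $value ne '') ? 0 : 1;
}

# Hypothetical stand-in for $::GLOBAL_TAB_HASH.
my %tab_hash;
my $node = 'compute-01';

# Simulate the earlier code creating the key without a usable value.
$tab_hash{noderes}{$node}{xcatmaster} = undef;

print "exists guard entered: ", guard_with_exists(\%tab_hash, $node), "\n";  # prints 0
print "value guard entered:  ", guard_with_value(\%tab_hash, $node), "\n";   # prints 1
```

Once the key has been assigned, exists is true whatever the value is, so a guard built on !exists is skipped, while a guard that checks the value itself still fires.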
So I defined an xcatmaster
lsdef compute-01 | grep xcatmaster
xcatmaster=60.0.0.3
This is the site master
tabdump site | grep master
"master","9.114.34.44",,
I run updatenode compute-01 -g to generate the mypostscript file for the node
cat /tftpboot/mypostscripts/mypostscript.compute-01 | grep MASTER
SITEMASTER='9.114.34.44'
export SITEMASTER
MASTER='60.0.0.3'
export MASTER
MONMASTER='60.0.0.3'
export MONMASTER
It seemed to pick up the correct value from noderes.xcatmaster, not the master from the site table.
OK, let me explain
I only set noderes.servicenode, with multiple servicenodes; I don't set noderes.xcatmaster. After running a sample postscript, I grepped MASTER from /xcatpost/mypostscript, of which the output is below. We should be getting
MASTER='172.27.2.42'
I hope that makes sense
172.27.2.42 is the address of what on your system? Is it in some xCAT table attribute?
It is the facing IP of the SN of the node, i.e. nxb142 has 2 IPs, one on the MN network, and one on the compute network
i.e., the facing IP of the SN to the CN is 172.27.2.42
There is no route from 172.27.2.0 NW to 172.27.6.0 NW
So to answer your question, the 172.27.2.42 is defined in the nics tables as a nic for nxb142
I hope that makes sense
I spoke with Lissa briefly about this and explained to her what is going on.
I understand your issue, and ran a quick test myself. You are correct that the code path does have an error and that the my_ip_facing routine is not being called in this case. I will take ownership of this defect and fix it.
I think just to be safe, I will code it to include both tests, so that it is something like:
One thing Lissa did want to test was to make sure that updatenode -g (which also generates the postscripts in /tftpboot) will work correctly to create unique versions of the /tftpboot/mypostscripts for each service node with the correct my_ip_facing value in it.
I did need to fix updatenode -g. That was checked in this morning, but it calls the same code as nodeset, so it will depend on any fixes for this defect.
https://sourceforge.net/p/xcat/bugs/4050/
Fixed and tested:
2.8: 9c572d1e274851b44a1527ba60271c2dfbababed
2.9 (master): f196c720ef8da1227f0512d53820b74cfb82c390
Still want to talk to Lissa about her updatenode -g fix. Not sure whether, when we have 'sharedtftp=0', we will need to send the request to all service nodes the way we do for nodeset. The question is which service node is the one to handle postscripts for a node during boot with servicenode pools -- is it the primary servicenode, the way updatenode uses, or is it the DHCP responder, the way we use for the rest of the node boot/install process?
Cool, thanks for that
I am thinking about collecting all the patches/fixes that have been implemented for my customer over the last month, applying them to 2.8.3 (local git branch), and creating custom RPMs, as at the moment I am having to sync some of the plugins. Would this be recommended, or would going to the latest 2.8.4 be more reasonable?
thanks
Whichever way you feel most comfortable for your customer. Our latest 2.8.4 builds have been very stable, and there are a large number of defect fixes in this release. We will not GA until June, and are still running through FVT for various new functions. The advantage of going to 2.8.4 is that it will be easier for xCAT dev to help with any issues you may run into. The advantage of 2.8.3 with patches is that this is what you have been working with and verified up until now.
If you would like to try 2.8.4, email Lissa and she will generate a new snap build with the very latest fixes. Also, are there any other defects for this customer that are still outstanding that we should give particular focus to?
Well, the defects that I have fixes for, but haven't been approved are #3987 and #4025. So I am not particularly worried about these; as I have working solutions
feature request #167, I've not had response wrt 2.8 branch; so not sure what the status on that is
The main one is #4035: provisioning Intel Phis over hierarchy doesn't work when there is no route from the node containing the Phi to the SN's other interface. The code does not use my_ip_facing when calling configmic, and therefore fails. This worked for me originally, when I had IP forwarding and iptables set up, but that is against the requirement of the customer. So at the moment, the only way I can provision Phis is by having routing set up on the specific SN. With the same defect, multiple SNs in noderes.servicenode are also not supported.
I very much appreciate your assistance
Last edit: Arif Ali 2014-04-09
So far I have the following
https://gitlab.arif-ali.co.uk/arif/xcat-core/compare/ccb66ff79...2.8.3-ocf
with my new 2.8.3-ocf branch, in which I am tracking the changes; I have cherry-picked everything. I will have to have a discussion with the customer on how they want to proceed. I can easily create the RPMs, which won't be a problem. But as you mention, support from you guys won't be the same, as this will be custom
Will there be any gotchas, with new features in 2.8.4, that I would need to be aware of, that could potentially cause issues?
We have started to list some of our 2.8.4 changes in a new release notes page:
https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_2.8.4_Release_Notes
This is not complete, but gives you an idea of some of the items. Besides new OS support, we spent some time cleaning things up and taking care of various outstanding defects and issues. I can't think of anything that would trip you up off-hand.
I put in a fix for updatenode.pm to broadcast the updatenode -g to all service nodes if sharedtftp=0
The only thing is that the returned messages (multiple) may be confusing. Maybe I should just write them to syslog, for example. I did add the hostname, but we will get one for every service node.
updatenode compute-03 -g
Generated new mypostscript files on manage-02
Generated new mypostscript files on service-03.ppd.pok.ibm.com
2.8.4
commit 8a592c77aa8fb30af3780490d900c328c97313aa
2.9
commit 357a53589cbdfd7e89192d95be8c03a6a44a06c7
thanks, I will get that tested
That works really well.
nodeset doesn't do anything on the MN, but does on the SN; in the case of updatenode -g, it also creates on the MN; would you expect that? I only mention this for consistency between nodeset and updatenode. I hope that makes sense.
I see you are correct. What I did in updatenode was to allow the combination of nodes and servicenodes, and nodes that are assigned to service nodes which are assigned to the MN, which is not allowed in nodeset. Maybe I should make it totally consistent with nodeset. On the other hand, maybe I should just leave well enough alone, because updatenode is doing this one function with the -g flag. I will talk to Linda on Monday.
What I did not want to do is have to evaluate every node.
[manage-02][/root]> updatenode compute-01 -g
Generated new mypostscript files on manage-02
Generated new mypostscript files on service-03.ppd.pok.ibm.com
[manage-02][/root]> updatenode compute-01,service-03 -g
Generated new mypostscript files on manage-02
Generated new mypostscript files on service-03.ppd.pok.ibm.com
[manage-02][/root]> nodeset compute-01,service-03 install
Error: Nodeset was run with a noderange containing both service nodes and compute nodes. This is not valid. You must submit with either compute nodes in the noderange or service nodes.
Should I just put the "Generated" message to syslog, does it bother you?
Last edit: Lissa Valletta 2014-04-11
It was more of an observation, than anything else.
But given that an SN will be able to become an MN, it makes sense that the MN also has a copy
Personally, not bothered about syslog, but it may be a good idea, for extra debugging purposes?
The only problem is that, with pools, what I generated on the MN would not be correct for MASTER for a compute node assigned to a service node. The use of the ip-facing routine to get the IP does not work. We will talk on Monday.
Talked to Linda. We will leave the code as is, and I will just check that if ip-facing on the service node does not return some valid address for the node, we default back to site.master.
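The fallback described above could be sketched roughly as follows. This is only an illustration: choose_master and looks_like_ipv4 are hypothetical names, and treating "valid address" as "a dotted-quad IPv4 string" is an assumption, not necessarily what the xCAT code checks.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 when $ip is a dotted-quad IPv4 string.
# What counts as a "valid address" here is an assumption for illustration.
sub looks_like_ipv4 {
    my ($ip) = @_;
    return 0 unless defined $ip;
    return 0 unless $ip =~ /\A(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})\z/;
    for my $octet ($1, $2, $3, $4) {
        return 0 if $octet > 255;
    }
    return 1;
}

# Hypothetical helper: prefer the facing IP, otherwise fall back to site.master.
sub choose_master {
    my ($facing_ip, $site_master) = @_;
    return looks_like_ipv4($facing_ip) ? $facing_ip : $site_master;
}

# Addresses taken from the earlier test output in this thread.
print choose_master('60.0.0.3', '9.114.34.44'), "\n";  # prints 60.0.0.3
print choose_master(undef,      '9.114.34.44'), "\n";  # prints 9.114.34.44
```

With this shape, MASTER always gets some address: the facing IP when the lookup yields one, and site.master otherwise.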
cool, no problems, as long as it works
thanks for your assistance on this