#4049 Postage.pm doesn't output correct MASTER

2.8.4
closed
Postage.pm (2)
general
5
2014-05-23
2014-04-07
Arif Ali
No

Similar to defect #4026 (hierarchy issues): Postage.pm was not returning the correct IP of the SN. It was always passing MASTER from the site table to #TABLE:noderes:$NODE:xcatmaster#

https://gitlab.arif-ali.co.uk/arif/xcat-core/commit/6e0969ca216c45a7191a82118f77cf9f9c8899fd

This is due to

if ($tabname =~ /^noderes$/ && $attrib =~ /^xcatmaster$/ && ! exists($::GLOBAL_TAB_HASH{noderes}{$node}{xcatmaster}))

But the code a few lines above sets $::GLOBAL_TAB_HASH{noderes}{$node}{xcatmaster}, so the key will always exist, because Perl's exists function does the following:

Given an expression that specifies an element of a hash, returns true if the specified element in the hash has ever been initialized

The if statement never gets entered.
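
A minimal sketch of the behaviour described above, assuming the earlier lines assign the key even when the table holds no value for it. The `%tab` hash is a hypothetical stand-in for `$::GLOBAL_TAB_HASH`, not the actual Postage.pm code:

```perl
use strict;
use warnings;

# Hypothetical stand-in for $::GLOBAL_TAB_HASH; not the real xCAT structure.
my %tab;
my $node = 'nxb1a02';

# Before any assignment, the xcatmaster key does not exist.
my $before = exists $tab{noderes}{$node}{xcatmaster} ? 1 : 0;

# Assigning undef (for example, an empty lookup result) still creates the key.
$tab{noderes}{$node}{xcatmaster} = undef;

my $after   = exists  $tab{noderes}{$node}{xcatmaster} ? 1 : 0;
my $defined = defined $tab{noderes}{$node}{xcatmaster} ? 1 : 0;

print "exists before assignment: $before\n";   # 0
print "exists after assignment:  $after\n";    # 1
print "defined after assignment: $defined\n";  # 0
```

Once the key has been assigned, `! exists(...)` is false regardless of the stored value, which matches the branch never being taken.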

I hope that makes sense

Discussion

  • Lissa Valletta

    Lissa Valletta - 2014-04-07

    So I defined an xcatmaster
    lsdef compute-01 | grep xcatmaster
    xcatmaster=60.0.0.3

    This is the site master
    tabdump site | grep master
    "master","9.114.34.44",,

    I run updatenode compute-01 -g to generate the mypostscript file for the node

    cat /tftpboot/mypostscripts/mypostscript.compute-01 | grep MASTER
    SITEMASTER='9.114.34.44'
    export SITEMASTER
    MASTER='60.0.0.3'
    export MASTER
    MONMASTER='60.0.0.3'
    export MONMASTER

    It seemed to pick up the correct value from noderes.xcatmaster, not the master from the site table.

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-07
    • assigned_to: Lissa Valletta
     
  • Arif Ali

    Arif Ali - 2014-04-07

    OK, let me explain

    I only set noderes.servicenode, with multiple service nodes; I don't set noderes.xcatmaster.

    # lsdef nxb1a02 | grep 'xcatmaster\|servicenode'
        servicenode=nxb142,nxb242,nxb342,nxb542,nxb642
    
    [root@crb111 xCAT_plugin]# cat /tftpboot/mypostscripts/mypostscript.nxb1a02| grep MASTER
    SITEMASTER='172.27.6.250'
    export SITEMASTER
    MASTER='172.27.6.250'
    export MASTER
    

    After running a sample postscript, I grepped MASTER from /xcatpost/mypostscript; the output is below:

    # grep MASTER mypostscript 
    SITEMASTER='172.27.6.250'
    export SITEMASTER
    MASTER='172.27.6.250'
    export MASTER
    

    We should be getting MASTER='172.27.2.42'

    I hope that makes sense

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-07

    172.27.2.42 is the address of what on your system? Is it in some xCAT table attribute?

     
  • Arif Ali

    Arif Ali - 2014-04-07

    It is the facing IP of the node's SN, i.e. nxb142 has 2 IPs: one on the MN network, and one on the compute network

    MN == 172.27.6.250 <== crb111 ==> 172.27.34.11
    SN ==                             172.27.34.1  <== nxb142  ==> 172.27.2.42 
    CN ==                                              nxb1a01 ==> 172.27.2.2
    

    i.e, the facing ip of the SN to the CN is 172.27.2.42

    There is no route from 172.27.2.0 NW to 172.27.6.0 NW

    So to answer your question, the 172.27.2.42 is defined in the nics tables as a nic for nxb142

    I hope that makes sense

     
  • Anonymous

    Anonymous - 2014-04-07

    I spoke with Lissa briefly about this and explained to her what is going on.

    I understand your issue, and ran a quick test myself. You are correct that the code path does have an error and that the my_ip_facing routine is not being called in this case. I will take ownership of this defect and fix it.

    I think just to be safe, I will code it to include both tests, so that it is something like:

    if ($tabname =~ /^noderes$/ && $attrib =~ /^xcatmaster$/ &&
            ( ! exists($::GLOBAL_TAB_HASH{noderes}{$node}{xcatmaster}) ||
              $::GLOBAL_TAB_HASH{noderes}{$node}{xcatmaster} eq "" ) )
    
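
A small sketch of the combined test, written as a standalone helper so it can be run on its own. The helper name `xcatmaster_unset`, the `%tab` hash, and the nodes `nxb1a03` and `nxb1a04` are hypothetical and only illustrate the idea. It also shows why a string comparison (`eq`) is the safer choice here: numeric `==` converts both sides to numbers, so a non-numeric value such as a hostname compares equal to the empty string.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no usable xcatmaster value is stored.
sub xcatmaster_unset {
    my ($tab, $node) = @_;
    return 1 unless exists $tab->{noderes}{$node}{xcatmaster};
    my $value = $tab->{noderes}{$node}{xcatmaster};
    return (!defined($value) || $value eq '') ? 1 : 0;
}

# Hypothetical table contents standing in for $::GLOBAL_TAB_HASH.
my %tab = (
    noderes => {
        'compute-01' => { xcatmaster => '60.0.0.3' },  # explicitly set
        'nxb1a02'    => { xcatmaster => '' },          # key exists, empty
        'nxb1a03'    => { xcatmaster => undef },       # key exists, undefined
    },
);

# nxb1a04 has no entry at all.
my @results = map { xcatmaster_unset(\%tab, $_) }
              qw(compute-01 nxb1a02 nxb1a03 nxb1a04);
print "@results\n";    # 0 1 1 1

# Pitfall with numeric ==: both strings become 0, so they compare equal.
my $host  = 'nxb142';
my $empty = '';
my $hostname_looks_empty = do {
    no warnings 'numeric';
    ($host == $empty) ? 1 : 0;
};
print "hostname == \"\": $hostname_looks_empty\n";    # 1
```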

    One thing Lissa did want to test was to make sure that updatenode -g (which also generates the postscripts in /tftpboot) will work correctly to create unique versions of the /tftpboot/mypostscripts for each service node with the correct my_ip_facing value in it.

     
  • Anonymous

    Anonymous - 2014-04-07
    • assigned_to: Lissa Valletta --> Linda Mellor
     
  • Lissa Valletta

    Lissa Valletta - 2014-04-08

    I did need to fix updatenode -g. I checked it in this morning, but it calls the same code as nodeset, so it will depend on any fixes for this defect.
    https://sourceforge.net/p/xcat/bugs/4050/

     
  • Anonymous

    Anonymous - 2014-04-08
    • status: open --> pending
     
  • Anonymous

    Anonymous - 2014-04-08

    Fixed and tested:
    2.8: 9c572d1e274851b44a1527ba60271c2dfbababed
    2.9 (master): f196c720ef8da1227f0512d53820b74cfb82c390

    Still want to talk to Lissa about her updatenode -g fix. Not sure whether, when we have 'sharedtftp=0', we will need to send the request to all service nodes the way we do for nodeset. The question is which service node handles postscripts for a node during boot with servicenode pools: is it the primary servicenode, the way updatenode uses, or is it the DHCP responder, the way we use for the rest of the node boot/install process?

     
  • Arif Ali

    Arif Ali - 2014-04-09

    Cool, thanks for that

    I am thinking about collecting all the patches/fixes that have been implemented for my customer over the last month, applying them to 2.8.3 (local git branch), and creating custom RPMs, as at the moment I am having to sync some of the plugins. Would this be recommended, or would going to the latest 2.8.4 be more reasonable?

    thanks

     
  • Anonymous

    Anonymous - 2014-04-09

    Whichever way you feel most comfortable for your customer. Our latest 2.8.4 builds have been very stable, and there are a large number of defect fixes in this release. We will not GA until June, and are still running through FVT for various new functions. The advantage of going to 2.8.4 is that it will be easier for xCAT dev to help with any issues you may run into. The advantage of 2.8.3 with patches is that this is what you have been working with and verified up until now.

    If you would like to try 2.8.4, email Lissa and she will generate a new snap build with the very latest fixes. Also, are there any other defects for this customer that are still outstanding that we should give particular focus to?

     
  • Arif Ali

    Arif Ali - 2014-04-09

    Well, the defects that I have fixes for, but which haven't been approved, are #3987 and #4025. I am not particularly worried about these, as I have working solutions.

    For feature request #167, I've not had a response wrt the 2.8 branch, so I'm not sure what the status on that is.

    The main one is #4035: provisioning Intel Phis over hierarchy doesn't work when there is no route from the node containing the Phi to the SN's other interface. The code does not use my_ip_facing when calling configmic, and therefore fails. This worked for me originally, when I had IP forwarding and iptables set up, but that is against the customer's requirements. So at the moment, the only way I can provision Phis is by having routing set up on the specific SN. With the same defect, multiple SNs in noderes.servicenode are also not supported.

    I very much appreciate your assistance

     

    Last edit: Arif Ali 2014-04-09
  • Arif Ali

    Arif Ali - 2014-04-09

    So far I have the following

    https://gitlab.arif-ali.co.uk/arif/xcat-core/compare/ccb66ff79...2.8.3-ocf

    This is my new 2.8.3-ocf branch, where I am tracking the changes; I have cherry-picked everything. I will have to have a discussion with the customer on how they want to proceed. I can easily create the RPMs, which won't be a problem. But as you mention, support from you guys won't be the same, as this will be custom.

    Are there any gotchas with new features in 2.8.4 that I would need to be aware of and that could potentially cause issues?

     
  • Anonymous

    Anonymous - 2014-04-09

    We have started to list some of our 2.8.4 changes in a new release notes page:

    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=XCAT_2.8.4_Release_Notes

    This is not complete, but gives you an idea of some of the items. Besides new OS support, we spent some time cleaning things up and taking care of various outstanding defects and issues. I can't think of anything that would trip you up off-hand.

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-10

    I put in a fix for updatenode.pm to broadcast the updatenode -g to all service nodes if sharedtftp=0

    The only thing is that the returned messages (multiple) may be confusing. Maybe I should just write them to syslog, for example. I did add the hostname, but we will get one for every service node.
    updatenode compute-03 -g
    Generated new mypostscript files on manage-02
    Generated new mypostscript files on service-03.ppd.pok.ibm.com

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-10

    2.8.4
    commit 8a592c77aa8fb30af3780490d900c328c97313aa
    2.9
    commit 357a53589cbdfd7e89192d95be8c03a6a44a06c7

     
  • Arif Ali

    Arif Ali - 2014-04-10

    thanks, I will get that tested

     
  • Arif Ali

    Arif Ali - 2014-04-10

    That works really well.

    nodeset doesn't do anything on the MN, but does on the SN; updatenode -g also creates the files on the MN. Would you expect that?

    I only mention this for consistency between nodeset and updatenode.

    I hope that makes sense.

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-11

    I see you are correct. What I did in updatenode was to allow the combination of nodes and servicenodes, and nodes that are assigned to service nodes which are assigned to the MN. This is not allowed in nodeset. Maybe I should make it totally consistent with nodeset. On the other hand, maybe I should just leave well enough alone, because updatenode is doing this one function with the -g flag. I will talk to Linda on Monday.
    What I did not want to do is have to evaluate every node.

    [manage-02][/root]> updatenode compute-01 -g
    Generated new mypostscript files on manage-02
    Generated new mypostscript files on service-03.ppd.pok.ibm.com
    [manage-02][/root]> updatenode compute-01,service-03 -g
    Generated new mypostscript files on manage-02
    Generated new mypostscript files on service-03.ppd.pok.ibm.com
    [manage-02][/root]> nodeset compute-01,service-03 install
    Error: Nodeset was run with a noderange containing both service nodes and compute nodes. This is not valid. You must submit with either compute nodes in the noderange or service nodes.

    Should I just put the "Generated" message in syslog? Does it bother you?

     

    Last edit: Lissa Valletta 2014-04-11
  • Arif Ali

    Arif Ali - 2014-04-11

    It was more of an observation than anything else.

    But given that an SN will be able to become an MN, it makes sense that the MN also has a copy.

    Personally, I'm not bothered about syslog, but it may be a good idea for extra debugging purposes?

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-11

    The only problem is that, with pools, what I generated on the MN would not have the correct MASTER for a compute node assigned to a service node. The use of the ip-facing routine to get the IP does not work there. We will talk on Monday.

     
  • Lissa Valletta

    Lissa Valletta - 2014-04-14

    Talked to Linda. We will leave the code as is, and I will just check that if ip-facing on the service node does not return a valid address for the node, we default back to site.master.
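
The fallback described above could look roughly like the following sketch. The helper name `choose_master` is hypothetical, and treating "valid address" as a dotted-quad IPv4 string is an assumption made for illustration; the actual check in xCAT may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: use the facing IP when it looks like a valid IPv4
# address, otherwise fall back to the site master value.
sub choose_master {
    my ($facing_ip, $site_master) = @_;
    if (defined $facing_ip
        && $facing_ip =~ /^(\d{1,3})\.(\d{1,3})\.(\d{1,3})\.(\d{1,3})$/) {
        my @octets = ($1, $2, $3, $4);
        # Accept only when every octet is in the range 0..255.
        return $facing_ip unless grep { $_ > 255 } @octets;
    }
    return $site_master;
}

# Addresses taken from the examples earlier in this ticket.
print choose_master('172.27.2.42', '172.27.6.250'), "\n";  # facing IP is used
print choose_master(undef,         '172.27.6.250'), "\n";  # falls back
print choose_master('',            '172.27.6.250'), "\n";  # falls back
```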

     
  • Arif Ali

    Arif Ali - 2014-04-15

    cool, no problems, as long as it works

    thanks for your assistance on this

     
  • Arif Ali

    Arif Ali - 2014-05-23
    • status: pending --> closed