While setting up HASN, we hit a problem where our resolv.conf was pointing to one of the service nodes. When we shut down the service node acting as the nameserver, the entire cluster hung.
Workaround
We have a complete host list in the shared root .client_data, so as a workaround we are not using resolv.conf.
Hi, I hit a hang when I tested "take down one SN and reboot CNs on another SN"; it seems the workaround didn't work for me either.
I changed the /install/nim/resolv_conf/GOLD_71Ddskls_SP3_resolv_conf to include all the SNs, like:
After that the nodes can boot up, but the xCAT postscripts need more time to finish: on f12 it took about 8 minutes for all postscripts. Is that acceptable? Another way to resolve this problem may be to delete the resolv.conf NIM object and change /etc/netsvc.conf to use the local /etc/hosts first. Which one fits our customers better?
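For reference, the /etc/netsvc.conf alternative raised above comes down to a one-line host lookup order. The following is only a sketch: it writes to a scratch file (the `NETSVC_FILE` variable and the sample file name are made up for illustration) instead of touching the live /etc/netsvc.conf on a node.

```shell
# Sketch only: write the lookup order to a scratch file, not the live /etc/netsvc.conf.
# NETSVC_FILE and the default file name are hypothetical, for illustration.
netsvc_file="${NETSVC_FILE:-./netsvc.conf.sample}"

# "local" means /etc/hosts and "bind" means DNS; AIX tries them in the order listed,
# so host names found in /etc/hosts never need the nameserver.
printf 'hosts = local, bind\n' > "$netsvc_file"

cat "$netsvc_file"
```

With this ordering, a name that is present in /etc/hosts resolves without contacting the nameserver, so a down service node would only affect names that are missing from the local file.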
I suggest you don't use a nameserver for the time being.
The site table on the test system has both the nameservers and domain attrs set, which means the code will attempt to create a resolv.conf. I think this feature needs to be looked at again to see what is required for an HASN environment.
If you remove the nameservers attr from the site table then mkdsklsnode will not attempt to create resolv.conf.
For the HASN environment we recommend that you don't use nameservers, as mentioned in the previous note.
However, I'll use this bug to change the contents of the resolv.conf files to include both servers - just in case a user does decide to use a nameserver.
The priority can probably be lowered.
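A resolv.conf that lists both service nodes, as proposed above, might look like the output of the sketch below. The helper name `render_resolv_conf`, the domain, and the IP addresses are all placeholders for illustration; this is not the code that mkdsklsnode or mkresolvconf actually runs.

```shell
# Hypothetical helper: print resolv.conf contents for a domain and a list of
# nameserver IPs. The first argument is the domain, the rest are nameservers.
render_resolv_conf() {
    domain="$1"
    shift
    printf 'domain %s\n' "$domain"
    # One "nameserver" line per server, in the order given.
    for ns in "$@"; do
        printf 'nameserver %s\n' "$ns"
    done
}

# Placeholder values: the domain and both service node addresses are made up.
render_resolv_conf cluster.example 10.0.0.1 10.0.0.2
```

The resolver queries the nameservers in the order listed and only moves on to the next entry after a timeout, which may explain why boots are slower when the first service node in the list is down.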
The HASN doc has been updated to indicate that you should not use a resolv_conf resource in this environment. See section "Remove resolv_conf resource".
Also, the snmove command has been modified to NOT run the mkresolvconf script in this environment.
File: snmove.pm
2.7.3 - rev 13025
2.8 - rev 13026