From: Pocina, G. <Gor...@DE...> - 2012-10-12 13:52:17
Thanks, still no luck, but some more information.
DNS looks ok. NTP gets set up by a custom rpm, and while the time looks ok at the end of the anaconda install, the messages log clearly shows the time is off
on the labcm0001 until that rpm gets installed. I added the network specific NTP servers to the networks table, but that made no difference.
I upgraded to 2.7.4, rebuilt the genesis net boot image, restarted xcatd, and methodically tried again, with the same result, except that now after
the install, the genesis kernel is no longer in bootparams for the node. Instead, the CentOS 6.2 kernel is in place, so that is consistent with the other nodes.
Note: this node has postbootscripts. I'm starting to wonder if maybe the "install done" message isn't being sent to xCAT because the postbootscripts
haven't been run yet. I'll try taking them out to see if that makes a difference.
Q. Where do I look for log information that indicates whether the install has completed successfully from xCAT's point of view? I can see that all of the
postscripts have been run. Of course none of the postbootscripts run until I manually configure the node to boot from the HD.
The chain I've been using is: Chain="runcmd=discover,bmcsetup,install,boot" ondiscover="nodediscover"
Does this look right? The messages log seems to have an excessive number of "Destiny" messages, given the length of the chain. There are 6 "Allowing nextdestiny" entries between 10:28 and 10:34:34. By 10:47 the first boot hasn't occurred, and I've tried to run "rsetboot hd" manually.
Oct 11 10:28:06 drdkvm0003 xCAT: xCAT: Allowing nodels to labcm0001 bootparam for root from localhost.localdomain
Oct 11 10:28:08 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001
Oct 11 10:28:08 drdkvm0003 xCAT: xCAT: Allowing nodels to labcm0001 bootparams for root from localhost.localdomain
Oct 11 14:28:25 labcm0001 (none) dhclient[800]: XMT: Solicit on eth0, interval 118580ms.
Oct 11 14:28:36 labcm0001 (none) dhclient[2017]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 7 (xid=0x2667a922)
Oct 11 14:28:43 labcm0001 (none) dhclient[2017]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12 (xid=0x2667a922)
Oct 11 10:28:46 drdkvm0003 xCAT: xCAT: Allowing rpower to labcm0001 discover for root from localhost.localdomain
Oct 11 14:28:55 labcm0001 (none) dhclient[2017]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 10 (xid=0x2667a922)
Oct 11 14:29:05 labcm0001 (none) dhclient[2017]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12 (xid=0x2667a922)
Oct 11 10:29:09 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001
Oct 11 14:29:17 labcm0001 (none) dhclient[2017]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 16 (xid=0x2667a922)
Oct 11 10:29:18 drdkvm0003 xCAT: xCAT: Allowing nodeset to labcm0001 install for root from localhost.localdomain
Oct 11 14:29:32 labcm0001 (none) dhclient[814]: XMT: Solicit on eth1, interval 120520ms.
Oct 11 14:29:32 labcm0001 (none) dhclient[814]: send_packet6: Network is unreachable
Oct 11 14:29:32 labcm0001 (none) dhclient[814]: dhc6: send_packet6() sent -1 of 102 bytes
Oct 11 14:29:33 labcm0001 (none) dhclient[2017]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 4 (xid=0x2667a922)
Oct 11 14:29:37 labcm0001 (none) dhclient[2017]: No DHCPOFFERS received.
Oct 11 14:29:37 labcm0001 (none) dhclient[2017]: No working leases in persistent database - sleeping.
Oct 11 10:30:10 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001
Oct 11 14:30:24 labcm0001 (none) dhclient[800]: XMT: Solicit on eth0, interval 121750ms.
Oct 11 10:31:29 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001
Oct 11 10:32:49 drdkvm0003 xCAT node discovery: labcm0001 has been discovered
Oct 11 14:32:55 labcm0001 (none) rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="797" x-info="http://www.rsyslog.com"] (re)start
Oct 11 14:32:56 labcm0001 (none) ntpd[1835]: Listening on interface #6 eth0, 149.77.53.126#123 Enabled
Oct 11 14:32:56 labcm0001 (none) ntpd[1835]: new interface(s) found: waking up resolver
Oct 11 14:32:57 labcm0001 (none) dhclient[3085]: Bound to *:546
Oct 11 14:32:57 labcm0001 (none) dhclient[3088]: Bound to *:546
Oct 11 14:32:57 labcm0001 (none) dhclient[3098]: Bound to *:546
Oct 11 14:32:58 labcm0001 (none) dhclient[3094]: XMT: Solicit on eth0, interval 1060ms.
Oct 11 14:32:59 labcm0001 (none) dhclient[3094]: XMT: Solicit on eth0, interval 2190ms.
Oct 11 14:33:00 labcm0001 (none) dhclient[3081]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 3 (xid=0x3ea71efa)
Oct 11 14:33:01 labcm0001 (none) dhclient[3094]: XMT: Solicit on eth0, interval 4250ms.
Oct 11 14:33:03 labcm0001 (none) dhclient[3081]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 5 (xid=0x3ea71efa)
Oct 11 14:33:03 labcm0001 (none) dhclient[3101]: Bound to *:546
Oct 11 14:33:03 labcm0001 (none) dhclient[3308]: XMT: Solicit on eth1, interval 1030ms.
Oct 11 14:33:03 labcm0001 (none) dhclient[3308]: send_packet6: Network is unreachable
Oct 11 14:33:03 labcm0001 (none) dhclient[3308]: dhc6: send_packet6() sent -1 of 110 bytes
Oct 11 14:33:04 labcm0001 (none) dhclient[3308]: XMT: Solicit on eth1, interval 2050ms.
Oct 11 14:33:04 labcm0001 (none) dhclient[3308]: send_packet6: Network is unreachable
Oct 11 14:33:04 labcm0001 (none) dhclient[3308]: dhc6: send_packet6() sent -1 of 110 bytes
Oct 11 14:33:06 labcm0001 (none) dhclient[3094]: XMT: Solicit on eth0, interval 8510ms.
Oct 11 14:33:06 labcm0001 (none) dhclient[3308]: XMT: Solicit on eth1, interval 4060ms.
Oct 11 14:33:06 labcm0001 (none) dhclient[3308]: send_packet6: Network is unreachable
Oct 11 14:33:06 labcm0001 (none) dhclient[3308]: dhc6: send_packet6() sent -1 of 110 bytes
Oct 11 14:33:08 labcm0001 (none) dhclient[3081]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9 (xid=0x3ea71efa)
Oct 11 14:33:10 labcm0001 (none) dhclient[3308]: XMT: Solicit on eth1, interval 8090ms.
Oct 11 14:33:10 labcm0001 (none) dhclient[3308]: send_packet6: Network is unreachable
Oct 11 14:33:10 labcm0001 (none) dhclient[3308]: dhc6: send_packet6() sent -1 of 110 bytes
Oct 11 10:33:12 drdkvm0003 xCAT: xCAT: Allowing getcredentials x509cert from labcm0001
Oct 11 10:33:14 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001
Oct 11 14:33:14 labcm0001 (none) dhclient[3094]: XMT: Solicit on eth0, interval 17530ms.
Oct 11 14:33:17 labcm0001 (none) dhclient[3081]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 12 (xid=0x3ea71efa)
Oct 11 14:33:18 labcm0001 (none) dhclient[3308]: XMT: Solicit on eth1, interval 16230ms.
Oct 11 14:33:18 labcm0001 (none) dhclient[3308]: send_packet6: Network is unreachable
Oct 11 14:33:18 labcm0001 (none) dhclient[3308]: dhc6: send_packet6() sent -1 of 110 bytes
Oct 11 10:33:28 drdkvm0003 xCAT: xCAT: Allowing rpower to labcm0001 boot for root from localhost.localdomain
Oct 11 14:33:29 labcm0001 (none) dhclient[3081]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 9 (xid=0x3ea71efa)
Oct 11 14:33:32 labcm0001 (none) dhclient[3094]: XMT: Solicit on eth0, interval 35450ms.
Oct 11 10:34:34 labcm0001 (none) rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="734" x-info="http://www.rsyslog.com"] (re)start
Oct 11 10:34:34 labcm0001 (none) dhclient[760]: bound to 149.77.53.126 -- renewal in 18397 seconds.
Oct 11 10:34:34 labcm0001 (none) ntpd[1744]: ntpd 4.2.4p8@1.1612-o Tue Nov 29 00:09:12 UTC 2011 (1)
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: precision = 0.053 usec
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: Listening on interface #0 wildcard, 0.0.0.0#123 Disabled
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: Listening on interface #1 wildcard, ::#123 Disabled
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: Listening on interface #2 lo, ::1#123 Enabled
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: Listening on interface #3 lo, 127.0.0.1#123 Enabled
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: Listening on interface #4 eth0, 149.77.53.126#123 Enabled
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: Listening on routing socket on fd #21 for interface updates
Oct 11 10:34:34 labcm0001 (none) ntpd[1748]: kernel time sync status 2040
Oct 11 10:34:33 drdkvm0003 xCAT: xCAT: Allowing getcredentials x509cert from labcm0001
Oct 11 10:34:34 labcm0001 (none) dhclient[761]: XMT: Solicit on eth0, interval 2150ms.
Oct 11 10:34:34 labcm0001 (none) dhclient[782]: Bound to *:546
Oct 11 10:34:34 labcm0001 (none) dhclient[781]: DHCPDISCOVER on eth1 to 255.255.255.255 port 67 interval 6 (xid=0x27b03e5c)
Oct 11 10:34:34 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001
Oct 11 10:34:35 labcm0001 (none) dhclient[782]: XMT: Solicit on eth1, interval 1070ms.
Oct 11 10:34:35 labcm0001 (none) dhclient[782]: send_packet6: Network is unreachable
Oct 11 10:34:35 labcm0001 (none) dhclient[782]: dhc6: send_packet6() sent -1 of 102 bytes
Oct 11 10:34:36 labcm0001 (none) ntpd[1748]: Listening on interface #5 eth0, fe80::225:90ff:fe74:e378#123 Enabled
Oct 11 10:34:36 labcm0001 (none) ntpd[1748]: new interface(s) found: waking up resolver
Oct 11 10:34:36 labcm0001 (none) dhclient[782]: XMT: Solicit on eth1, interval 2050ms.
Oct 11 10:34:36 labcm0001 (none) dhclient[782]: send_packet6: Network is unreachable
Oct 11 10:34:36 labcm0001 (none) dhclient[782]: dhc6: send_packet6() sent -1 of 102 bytes
Oct 11 10:34:36 labcm0001 (none) dhclient[761]: XMT: Solicit on eth0, interval 4130ms.
Oct 11 10:34:38 labcm0001 (none) dhclient[782]: XMT: Solicit on eth1, interval 3900ms.
Oct 11 10:34:38 labcm0001 (none) dhclient[782]: send_packet6: Network is unreachable
Oct 11 10:34:38 labcm0001 (none) dhclient[782]: dhc6: send_packet6() sent -1 of 102 bytes
Oct 11 10:41:59 drdkvm0003 xCAT: xCAT: Allowing nodels to labcm0001 chain for root from localhost.localdomain
Oct 11 10:42:41 drdkvm0003 xCAT: xCAT: Allowing getpostscript from labcm0001
Oct 11 06:42:42 labcm0001 kernel: imklog 4.6.2, log source = /proc/kmsg started.
Oct 11 06:42:42 labcm0001 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="11689" x-info="http://www.rsyslog.com"] (re)start
Oct 11 06:42:42 labcm0001 xCAT: Install: syslog setup
Oct 11 06:42:42 labcm0001 xcat: Install: setup /etc/ssh/sshd_config
Oct 11 06:42:42 labcm0001 xcat: Install: setup root .ssh
Oct 11 10:43:15 drdkvm0003 xCAT: xCAT: Allowing getcredentials ssh_dsa_hostkey from labcm0001
Oct 11 06:43:15 labcm0001 xCAT: ssh_dsa_hostkey
Oct 11 10:43:15 drdkvm0003 xCAT: xCAT: Allowing getcredentials ssh_rsa_hostkey from labcm0001
Oct 11 06:43:15 labcm0001 xCAT: ssh_rsa_hostkey
Oct 11 10:43:49 drdkvm0003 xCAT: xCAT: Allowing getcredentials ssh_root_key from labcm0001
Oct 11 06:43:49 labcm0001 xCAT: ssh_root_key
Oct 11 06:43:49 labcm0001 xCAT: start up sshd
Oct 11 06:43:49 labcm0001 sshd[21429]: error: Bind to port 22 on 0.0.0.0 failed: Address already in use.
Oct 11 06:43:49 labcm0001 sshd[21429]: error: Bind to port 22 on :: failed: Address already in use.
Oct 11 06:43:49 labcm0001 sshd[21429]: fatal: Cannot bind any address.
Oct 11 06:43:49 labcm0001 xCAT: ./syncfiles: there is no sync file template for the node
Oct 11 06:43:49 labcm0001 kernel: SELinux: initialized (dev 0:22, type nfs4), uses genfs_contexts
Oct 11 06:43:49 labcm0001 kernel: SELinux: initialized (dev 0:23, type nfs4), uses genfs_contexts
Oct 11 06:44:35 labcm0001 xcat: ready
Oct 11 06:44:35 labcm0001 xcat: done
Oct 11 06:44:36 labcm0001 kernel: Kernel logging (proc) stopped.
Oct 11 06:44:36 labcm0001 rsyslogd: [origin software="rsyslogd" swVersion="4.6.2" x-pid="11689" x-info="http://www.rsyslog.com"] exiting on signal 15.
Oct 11 10:47:05 drdkvm0003 xCAT: xCAT: Allowing rsetboot to labcm0001 hd for root from localhost.localdomain
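The "Destiny" messages can be pulled out of the interleaved log mechanically rather than by eye. What follows is a minimal Python sketch, not an xCAT tool. It assumes the syslog line layout shown in the excerpt above and the year 2012 (syslog lines carry no year), and the function names are made up for illustration. It also compares one event that both hosts logged, to estimate how far the node clock is from the management node clock.

```python
import re
from datetime import datetime

# Syslog prefix as seen in the excerpt: "Oct 11 10:28:08 host rest-of-line".
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")


def parse(line, year=2012):
    """Return (timestamp, host, message), or None if the line is not syslog-shaped."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    stamp = datetime.strptime(f"{year} {m.group(1)}", "%Y %b %d %H:%M:%S")
    return stamp, m.group(2), m.group(3)


def nextdestiny_times(lines, node):
    """Timestamps at which the management node logged a nextdestiny request from `node`."""
    needle = f"Allowing nextdestiny for {node} from {node}"
    return [p[0] for p in map(parse, lines) if p and needle in p[2]]


def clock_offset(mn_line, node_line):
    """Node clock minus management-node clock, for two lines describing the same event."""
    return parse(node_line)[0] - parse(mn_line)[0]


# Lines copied from the excerpt above.
sample = [
    "Oct 11 10:28:08 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001",
    "Oct 11 14:28:25 labcm0001 (none) dhclient[800]: XMT: Solicit on eth0, interval 118580ms.",
    "Oct 11 10:29:09 drdkvm0003 xCAT: xCAT: Allowing nextdestiny for labcm0001 from labcm0001",
]
mn_line = "Oct 11 10:43:15 drdkvm0003 xCAT: xCAT: Allowing getcredentials ssh_dsa_hostkey from labcm0001"
node_line = "Oct 11 06:43:15 labcm0001 xCAT: ssh_dsa_hostkey"

for stamp in nextdestiny_times(sample, "labcm0001"):
    print(stamp.time())
print("node clock offset (hours):", clock_offset(mn_line, node_line).total_seconds() / 3600)
```

Run against the full messages file, the same functions would list every nextdestiny request with its time, which makes it easier to line them up against the steps in the chain.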
Q. Another guess I had was that "bmcsetup" was doing something subtle to prevent future invocations of "rsetboot labcm0001 hd" from working.
To check this I installed "freeipmi" on a node which has not gone through auto-discovery (bmc.2) and a problem node that has (bmc.1), and ran a
diff of the output configuration. I don't see anything obvious, but I'm not that familiar with BMC config.
Thanks.
Key: < == "rsetboot hd works" > == "rsetboot hd doesn't work"
[root@labcm0002 tmp]# ssh labcm0001 cat /tmp/bmc.1 | diff bmc.2 -
40c40 (for NULL user - should be OK?)
< Enable_User Yes
---
> Enable_User No
66c66 (for admin user known to xCAT, should be OK?)
< Lan_Enable_Link_Auth No
---
> Lan_Enable_Link_Auth Yes
303c303
< IP_Address 149.77.53.123
---
> IP_Address 149.77.53.122
305c305
< MAC_Address 00:25:90:74:E2:0B
---
> MAC_Address 00:25:90:74:E1:F9
311c311
< Default_Gateway_MAC_Address 00:00:00:00:00:00
---
> Default_Gateway_MAC_Address 00:00:0C:07:AC:00
362c362
< Admin_Enable_Auth_Type_MD2 Yes
---
> Admin_Enable_Auth_Type_MD2 No
366c366
< Admin_Enable_Auth_Type_Straight_Password Yes
---
> Admin_Enable_Auth_Type_Straight_Password No
427c427
< Maximum_Privilege_Cipher_Suite_Id_0 Administrator
---
> Maximum_Privilege_Cipher_Suite_Id_0 Unused
471c471
< Character_Accumulate_Interval 4
---
> Character_Accumulate_Interval 0
473c473
< Character_Send_Threshold 70
---
> Character_Send_Threshold 0
481c481
< Volatile_Bit_Rate 19200
---
> Volatile_Bit_Rate 115200
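Some of the differences above (IP and MAC address) are expected between any two nodes, so it can help to filter them out before reading the rest. The following is a rough Python sketch under stated assumptions: it assumes the two dumps use a "Section NAME ... EndSection" layout with one "Key Value" pair per line and "#" comment lines, the section names in the usage example are illustrative, and the set of keys to ignore is my own guess at what is per-node.

```python
def load_config(text):
    """Map 'Section:Key' to value, assuming 'Section X' / 'EndSection' blocks."""
    settings, section = {}, ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if parts[0] == "Section" and len(parts) == 2:
            section = parts[1]
        elif parts[0] == "EndSection":
            section = ""
        else:
            value = parts[1] if len(parts) == 2 else ""
            settings[f"{section}:{parts[0]}"] = value
    return settings


# Assumption: these keys legitimately differ from node to node.
PER_NODE_KEYS = {"IP_Address", "MAC_Address"}


def meaningful_diff(good, bad):
    """Return {key: (good_value, bad_value)} for keys that differ and are not per-node."""
    out = {}
    for key in sorted(set(good) | set(bad)):
        name = key.rsplit(":", 1)[1]
        if name in PER_NODE_KEYS:
            continue
        if good.get(key) != bad.get(key):
            out[key] = (good.get(key), bad.get(key))
    return out


# Usage with values taken from the diff above; section names are illustrative.
working = load_config(
    "Section User1\n\tEnable_User Yes\nEndSection\n"
    "Section Lan_Conf\n\tIP_Address 149.77.53.123\nEndSection\n"
)
failing = load_config(
    "Section User1\n\tEnable_User No\nEndSection\n"
    "Section Lan_Conf\n\tIP_Address 149.77.53.122\nEndSection\n"
)
for key, (ok_value, bad_value) in meaningful_diff(working, failing).items():
    print(f"{key}: works={ok_value} fails={bad_value}")
```

Qualifying each key with its section keeps settings such as Enable_User, which repeat once per user, from overwriting each other.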
From: Russell Jones [mailto:ru...@jo...]
Sent: Wednesday, October 10, 2012 1:40 PM
To: xca...@li...
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
Just a quick couple of things that have bitten me on this same issue:
* DNS is set incorrectly. Check to make sure /etc/resolv.conf on the node has the correct data the entire time through the install process. Make sure you don't have a postscript overwriting it with an invalid configuration.
* Network configuration is getting monkeyed with during postscripts. Make sure that eth0 is the only interface that is up and active during the install process. Make sure eth1 isn't accidentally being brought up and obtaining an IP on the same network as eth0. This could cause the node to attempt to contact the MN using the wrong interface.
On 10/10/2012 9:13 AM, Jarrod B Johnson wrote:
Now the issue of endless install, that would be a failure of the OS to update the management server. May need log output to suggest why updateflag would be failing...