From: Pocina, G. <Gor...@DE...> - 2012-10-17 15:10:49
Update. We have install methods that work, but I have some more detail on the original problem. My guess is the root cause is one of:
- A problem with the node firmware refusing to try the next boot device after xnba processes the gPXE "exit" directive.
- A problem with how xnba processes the gPXE exit directive.
- A problem with how the boot sector is set up during xnba installs (seems unlikely, but the last point below might suggest it.)
Data: all with node boot order set to (net,hd), using BIOS (not UEFI).
- If "pxe" is used for discovery, bmcsetup, and install, the node boots up on the HD after the install. Works.
- Following a "pxe" install, if noderes.netboot is set to "xnba", the hd boot fails with "no more network devices." The gpxe tftp file looks good and contains "exit" as it should.
- If "xnba" is used for the install, hd boot fails exactly as above.
- Following an xnba install, if noderes.netboot is set to "pxe", the boot fails with:
Loading pxelinux.cfg/954D3582
Booting from local disk...
Reboot and Select proper boot device
Or Insert Boot Media in the selected Boot device and press a key
This last data point suggests the boot record is hosed, but if the boot order is changed to (hd,net) in the BIOS, the node has no trouble coming up, so I don't know.
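The "contains exit as it should" check on the gPXE file can be scripted. This is a minimal sketch, assuming the per-node file is the one under /tftpboot/xcat/xnba/nodes/ shown elsewhere in this thread; the function names are hypothetical and not part of xCAT, and the check only looks at the script text, not at what the firmware does after the exit.

```python
def gpxe_directives(script_text):
    """Return the non-blank, non-comment lines of a gPXE script.

    Lines starting with '#' (including the '#!gpxe' header) are skipped.
    """
    directives = []
    for raw in script_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        directives.append(line)
    return directives


def hands_back_to_firmware(script_text):
    """True if the script contains a bare 'exit' directive.

    Assumption: a bare 'exit' is what returns control to the
    firmware so it can try the next boot device.
    """
    return "exit" in gpxe_directives(script_text)
```

To check a real node, read the file first, for example `hands_back_to_firmware(open("/tftpboot/xcat/xnba/nodes/labcm0001").read())`.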
I've also tried UEFI boot, as suggested, and that works, with xCAT creating a first boot method called [CentOS] on the node. That's actually very nice, and we might end up using it in production for these nodes.
Meanwhile we're using "pxe" and following up with the node vendor to see if this might be a firmware problem (SuperMicro X9DRT). There is one glitch with the full "pxe" install, in that the transition from "discovery" to "runcmd=bmcsetup" hangs with the node in standby condition, and "nodeset node runcmd=bmcsetup" must be run. However, we have to manually run "rmnodecfg" to kick things off, so one more manual step isn't a show-stopper.
Thanks much for all the help.
From: Jarrod B Johnson [mailto:jbj...@us...]
Sent: Friday, October 12, 2012 5:22 PM
To: xCAT Users Mailing list
Cc: xCAT Users Mailing list
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
That's an appropriate approach. We have tried that in the past to be more boot order immune, but occasionally a BIOS implementation will hang when we try that. The failure rate when such a BIOS is encountered has pretty much been 100%, so if it works for you, it will probably always work.
-----"Pocina, Goran" <Gor...@DE...> wrote: -----
To: xCAT Users Mailing list <xca...@li...>
From: "Pocina, Goran" <Gor...@DE...>
Date: 10/12/2012 05:06PM
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
The boot order is currently 'net,hd', but hd only seems to work if I add chain.c32 as the default:
[root@drdkvm0003 xcat]# cat ./lib/perl/xCAT_plugin/xnba.pm.diff
244c244
< print $ucfg 'default="chain.c32"'."\n";
---
> print $ucfg 'default="xCAT"'."\n";
I'm not sure what we lose by doing that though...
From: Jarrod B Johnson [mailto:jbj...@us...]
Sent: Friday, October 12, 2012 4:45 PM
To: xCAT Users Mailing list
Cc: xCAT Users Mailing list
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
Usually, boot sequence is 'net,hd'.
If you UEFI boot, the installer will auto-fixup the boot sequence to be HD first (currently).
If you BIOS boot, you can rsetboot hd to get it up if the boot order is suboptimal, and then use asu to change the persistent one (if IBM equipment).
-----"Pocina, Goran" <Gor...@DE...> wrote: -----
To: xCAT Users Mailing list <xca...@li...>
From: "Pocina, Goran" <Gor...@DE...>
Date: 10/12/2012 04:31PM
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
That's interesting. I think you've hit the nail on the head. The difference between the "problem" node and the "working" node was the "problem" node was set to network boot, and the "working" node was set to boot from the hard disk.
So in order to do an install of the "working" node, I had to first run "rsetboot labcm0003 net", and then once the install was done, it booted back into the HD OS.
So that explains the entire difference in behavior between two nodes.
So I guess my problem boils down to not knowing how to boot from the HD with a node configured for network boot, that being the preferred configuration for auto-discovery, and bmcsetup.
Thanks.
From: Jarrod B Johnson [mailto:jbj...@us...]
Sent: Wednesday, October 10, 2012 10:14 AM
To: xCAT Users Mailing list
Cc: xCAT Users Mailing list
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
rsetboot sets the boot device. If the boot order of the target always attempts netboot first, then rsetboot doesn't matter. rsetboot is required if your boot sequence nominally jumps to the installed OS without trying to netboot.
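The rule above can be written down as a small decision function. This is only a sketch of the nominal case as described in this message, assuming the firmware honours the hand-off from netboot to the next device (which is exactly what fails on the node discussed in this thread); the function name and arguments are hypothetical.

```python
def rsetboot_needed(boot_order, want_install):
    """Decide whether an explicit 'rsetboot' is needed before the next boot.

    boot_order   -- persistent firmware order, e.g. ("net", "hd")
    want_install -- True if the next boot should reach the network installer

    Nominal model only: when the node always tries netboot first, the
    management server decides what happens next, so rsetboot is moot.
    """
    if boot_order[0] == "net":
        return False
    # Disk comes first: the node would skip netboot entirely, so an
    # install needs a one-time override to the network device.
    return want_install
```

With the order (hd, net), `rsetboot_needed(("hd", "net"), True)` is true, which matches the need to run "rsetboot net" before an install on a node that normally boots from disk.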
Now the issue of endless install, that would be a failure of the OS to update the management server. May need log output to suggest why updateflag would be failing...
-----"Pocina, Goran" <Gor...@DE...> wrote: -----
To: xCAT Users Mailing list <xca...@li...>
From: "Pocina, Goran" <Gor...@DE...>
Date: 10/10/2012 08:22AM
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
BTW I'm not completely clear on how "rpower", "rsetboot", "nodeset/rinstall" work together. Sometimes, before using "rinstall", I find it's necessary to run "rsetboot net", otherwise the node simply boots up into the old OS without attempting an install. However with KVM guest installs, this doesn't seem to be needed. Also, sometimes it's necessary to run "rsetboot hd" following the install, otherwise the node will continuously re-install itself, other times it will simply boot up the new OS as expected.
I suspect it has to do with the type of node, and with whether or not the install was completely successful, but I don't completely understand it.
Thanks,
Goran
From: Pocina, Goran
Sent: Tuesday, October 09, 2012 5:10 PM
To: xCAT Users Mailing list
Subject: RE: [xcat-user] Unable to boot from HD after auto-discovery.
Thanks.
The node went into standby mode. Should I remove that as well from kcmdline?
I removed "quiet console..." from bootparams.kcmdline, and then ran "rsetboot labcm0001 hd" and "rpower boot", which resulted in the following being generated:
[root@drdkvm0003 nodes]# cat /tftpboot/xcat/xnba/nodes/labcm0001
#!gpxe
#standby
imgfetch -n kernel http://${next-server}/tftpboot/xcat/genesis.kernel.x86_64
imgload kernel
imgargs kernel console=tty0 xcatd=149.77.53.252:3001 destiny=standby BOOTIF=01-${netX/machyp}
imgfetch http://${next-server}/tftpboot/xcat/genesis.fs.x86_64.lzma
imgexec kernel
It went into standby mode. The console shows detailed kernel boot messages, followed by:
Received request to retry in a bit, will call xCAT back in NNN seconds.
[root@drdkvm0003 nodes]# nodels labcm0001 chain
labcm0001: chain.chain: runcmd=bmcsetup,standby
labcm0001: chain.node: labcm0001
labcm0001: chain.currstate: standby
labcm0001: chain.currchain: standby
labcm0001: chain.ondiscover:
labcm0001: chain.comments:
labcm0001: chain.disable:
[root@drdkvm0003 nodes]# nodels labcm0001 bootparams
labcm0001: bootparams.kcmdline: console=tty0 xcatd=149.77.53.252:3001 destiny=standby
labcm0001: bootparams.kernel: xcat/genesis.kernel.x86_64
labcm0001: bootparams.initrd: xcat/genesis.fs.x86_64.lzma
labcm0001: bootparams.node: labcm0001
labcm0001: bootparams.addkcmdline:
labcm0001: bootparams.comments:
labcm0001: bootparams.adddhcpstatements:
labcm0001: bootparams.disable:
labcm0001: bootparams.dhcpstatements:
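When comparing these tables between a working and a failing node, it can help to load the output into a dictionary. This is a minimal sketch, assuming the "node: table.column: value" layout shown above; the function name is hypothetical and the parser is not part of xCAT.

```python
def parse_nodels(output):
    """Parse 'node: table.column: value' lines into {node: {column: value}}.

    Lines without that shape (such as shell prompts) are ignored.
    Values may themselves contain colons, e.g. 'xcatd=host:3001'.
    """
    tables = {}
    for raw in output.splitlines():
        node, sep, rest = raw.strip().partition(":")
        if not sep:
            continue
        # Split only on the first remaining colon so the value stays whole.
        key, sep, value = rest.strip().partition(":")
        if not sep:
            continue
        tables.setdefault(node, {})[key] = value.strip()
    return tables
```

Two parsed results can then be compared key by key, for example to spot a differing chain.currstate or bootparams.kcmdline.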
From: Sten Wolf [mailto:st...@ch...]
Sent: Tuesday, October 09, 2012 3:42 PM
To: xCAT Users Mailing list
Subject: Re: [xcat-user] Unable to boot from HD after auto-discovery.
That seems strange - why would you need to "nodeset boot" after "rsetboot hd"? nodeset boot modifies tftpboot files, but rsetboot hd should already bypass the entire pxe chain.
As an interim troubleshooting step - try removing the string "quiet console=ttyS1,115200" then issue rpower boot again to the node (do not "nodeset" or you will overwrite your modifications), this time with console attached directly to the node (or with ipmi console redirection, not serial redirection), to see what happens after initial image is loaded.
On 09/10/2012 21:00, Pocina, Goran wrote:
Posted again after fixing text formatting:
xCAT 2.7.3 CentOS 6.2 local disk installs have been working for us on a node configured with noderes.netboot=xnba.
We can toggle back and forth between "net" and "hd" boots using the "rsetboot" command.
We recently tried auto-discovery, and since then haven't been able to HD boot the node:
- Discovery works, with a correct DHCP entry created based on the switch port.
- Bmcsetup works, with correct IP address assigned to ipmi.
- OS 6.2 install on local disk appears to work.
- A boot attempt at this point displays the screen below:
_______________________________________________
xCAT-user mailing list
xCA...@li...
https://lists.sourceforge.net/lists/listinfo/xcat-user