Hello,
We have an ESS cluster:
- 1 EMS Power 7R2
- 2 P8 BE servers
We are having issues installing Red Hat 7.1 over the network using xCAT 2.9.1:
TFTP BOOT ---------------------------------------------------
Server IP.....................192.168.45.111
Client IP.....................192.168.55.41
Gateway IP....................192.168.45.111
Subnet Mask...................255.255.0.0
( 1 ) Filename................./boot/grub2/grub2.ppc
TFTP Retries..................5
Block Size....................512
FINAL PACKET COUNT = 317
FINAL FILE SIZE = 161808 BYTES
Elapsed time since release of system processors: 109459 mins 51 secs
error: timeout: could not resolve hardware address.
Entering rescue mode...
grub rescue> [root@ems1 consoles]#
It drops us to grub rescue mode each time. We have installed Red Hat 7 hundreds of times on these nodes. The only changes here are that we are using xCAT 2.9.1, have installed 7.1 on the EMS, and are using a 7.1 OS image to deploy the nodes.
[root@ems1 consoles]# lsxcatd -v
Version 2.9.1 (git commit 7f6043fffd62d482931b17b60f9488eb5754fdc1, built Thu Mar 19 03:25:35 EDT 2015)
The bug looks possibly similar to http://sourceforge.net/p/xcat/bugs/4003/
[root@ems1 consoles]# lsdef -t osimage -l -o rhels7.1-ppc64-install-gss
Object name: rhels7.1-ppc64-install-gss
groups=all
imagetype=linux
osarch=ppc64
osname=Linux
osvers=rhels7.1
otherpkgdir=/install/gss/otherpkgs/rhels7.0/ppc64
otherpkglist=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.otherpkgs.pkglist
pkgdir=/install/rhels7.1/ppc64
pkglist=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.pkglist
postbootscripts=setupntp,gss_postboot,gss_ofed,gss_sashba
postscripts=otherpkgs,gss_instnic,gss_post
profile=gss
provmethod=install
synclists=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.synclist
template=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.tmpl
[root@ems1 consoles]# rpm -qa | grep -i grub2
grub2-tools-2.02-0.16.el7.ppc64
grub2-2.02-0.16.el7.ppc64
grub2-xcat-1.0-2.noarch
We added nfsserver=192.168.45.111 and tftpserver=192.168.45.111, but it didn't make a difference.
Any ideas here? Thanks
After looking into the problem, I think it is caused by a bug in grub2-xcat, which is repackaged from the grub2 shipped in Red Hat 7. There is a similar defect in LTC: https://bugzilla.linux.ibm.com/show_bug.cgi?id=119351
I successfully provisioned the failed nodes using the grub2 shipped in Red Hat 7.1. The grub2 and grub2-tools packages were updated between 7.0 and 7.1:
on Redhat7.1:
grub2-tools-2.02-0.16.el7.ppc64
grub2-2.02-0.16.el7.ppc64
on Redhat7:
grub2-tools-2.02-0.2.10.el7.ppc64
grub2-2.02-0.2.10.el7.ppc64
We might need to repackage grub2-xcat, which requires a lot of testing. As a workaround, you can use the grub2 shipped in Red Hat 7.1 with the following steps:
1. nodeset <node> osimage=...
2. grub2-mknetdir --net-directory=/tftpboot/
3. cp /tftpboot/boot/grub2/powerpc-ieee1275/core.elf /tftpboot/boot/grub2/grub2.ppc
4. rnetboot ...
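For convenience, the four steps above could be wrapped in a small shell function. This is only a sketch, not something shipped with xCAT: the function name `grub2_workaround`, the `run` helper, and the `DRY_RUN` and `TFTP_DIR` variables are made up for illustration, and the node range and osimage name are placeholders you supply yourself (the steps above leave them elided). It assumes `rnetboot` is given the same node range as `nodeset`. By default it only prints the commands so you can review them first.

```shell
#!/bin/sh
# Sketch of the four workaround steps. Dry-run by default: commands are
# printed, not executed. Set DRY_RUN=0 to actually run them on the
# management node. All names here are illustrative assumptions.

TFTP_DIR="${TFTP_DIR:-/tftpboot}"   # matches the paths used in the steps
DRY_RUN="${DRY_RUN:-1}"

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@"
    fi
}

# usage: grub2_workaround <noderange> <osimage>   (both are placeholders)
grub2_workaround() {
    node="$1"
    image="$2"
    if [ -z "$node" ] || [ -z "$image" ]; then
        echo "usage: grub2_workaround <noderange> <osimage>" >&2
        return 1
    fi
    # 1. Regenerate the boot configuration for the node.
    run nodeset "$node" "osimage=$image" || return 1
    # 2. Build a netboot tree from the grub2 installed on this system.
    run grub2-mknetdir "--net-directory=$TFTP_DIR/" || return 1
    # 3. Overwrite the grub2.ppc file that the node fetches over TFTP.
    run cp "$TFTP_DIR/boot/grub2/powerpc-ieee1275/core.elf" \
           "$TFTP_DIR/boot/grub2/grub2.ppc" || return 1
    # 4. Network-boot the node again (assumes the same node range).
    run rnetboot "$node" || return 1
}
```

For example, after sourcing the file, `grub2_workaround gssio1 rhels7.1-ppc64-install-gss` prints the four commands for a hypothetical node named gssio1 and the osimage from the lsdef output earlier in this thread. Step 3 overwrites grub2.ppc, so the copy has to be redone if that file is replaced again later.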
grub2-xcat has been repackaged from grub2-2.02-0.16.ael7b.src.rpm, which is shipped in Red Hat 7.1. The modified build script is checked in:
commit 19d9d39ec8e41d07700c48295075feafb95f04c4
Author: immarvin yangsbj@cn.ibm.com
Date: Thu May 14 01:57:22 2015 -0400
commit 918a3e2a1ea5954b589c003bc56d912f7a215473
Author: root root@c910f02c01p13.pok.stglabs.ibm.com
Date: Mon May 11 10:00:51 2015 -0400
commit 234b44fd29eaa261f28ba9f316bd9b510b39e60f
Author: immarvin yangsbj@cn.ibm.com
Date: Sun May 10 09:30:07 2015 -0400