#4658 Redhat7.1 provisioning complains "error: timeout: could not resolve hardware address." and drop into rescue mode

2.10
closed
yangsong
None
linux provisioning
5
2015-07-03
2015-04-29
yangsong
No

Hello,

We have an ESS cluster:
- 1 EMS Power 7R2
- 2 P8 BE servers

We are having issues installing Red Hat 7.1 over the network using xCAT 2.9.1:

TFTP BOOT ---------------------------------------------------
Server IP.....................192.168.45.111
Client IP.....................192.168.55.41
Gateway IP....................192.168.45.111
Subnet Mask...................255.255.0.0
( 1 ) Filename................./boot/grub2/grub2.ppc
TFTP Retries..................5
Block Size....................512
FINAL PACKET COUNT = 317
FINAL FILE SIZE = 161808 BYTES

Elapsed time since release of system processors: 109459 mins 51 secs

error: timeout: could not resolve hardware address.
Entering rescue mode...
grub rescue> [root@ems1 consoles]#

It drops us into grub rescue mode each time. We have installed Red Hat 7 hundreds of times on these nodes. The only changes here are that we are using xCAT 2.9.1, have installed 7.1 on the EMS, and are using a 7.1 OS image to deploy the nodes.

[root@ems1 consoles]# lsxcatd -v
Version 2.9.1 (git commit 7f6043fffd62d482931b17b60f9488eb5754fdc1, built Thu Mar 19 03:25:35 EDT 2015)

This possibly looks similar to bug 4003: http://sourceforge.net/p/xcat/bugs/4003/

[root@ems1 consoles]# lsdef -t osimage -l -o rhels7.1-ppc64-install-gss
Object name: rhels7.1-ppc64-install-gss
groups=all
imagetype=linux
osarch=ppc64
osname=Linux
osvers=rhels7.1
otherpkgdir=/install/gss/otherpkgs/rhels7.0/ppc64
otherpkglist=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.otherpkgs.pkglist
pkgdir=/install/rhels7.1/ppc64
pkglist=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.pkglist
postbootscripts=setupntp,gss_postboot,gss_ofed,gss_sashba
postscripts=otherpkgs,gss_instnic,gss_post
profile=gss
provmethod=install
synclists=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.synclist
template=/opt/ibm/gss/xcat/install/rh/gss.rhels7.ppc64.tmpl

[root@ems1 consoles]# rpm -qa | grep -i grub2
grub2-tools-2.02-0.16.el7.ppc64
grub2-2.02-0.16.el7.ppc64
grub2-xcat-1.0-2.noarch

We added nfsserver=192.168.45.111 and tftpserver=192.168.45.111, but it didn't make a difference.

Any ideas here? Thanks


After looking into the problem, I think it is caused by a bug in grub2-xcat, which is repackaged from the grub2 on Red Hat 7. There is a similar defect in LTC: https://bugzilla.linux.ibm.com/show_bug.cgi?id=119351

I provisioned the failed nodes successfully with the grub2 shipped in Red Hat 7.1. The grub2 and grub2-tools packages were updated between 7.0 and 7.1:
on Redhat7.1:
grub2-tools-2.02-0.16.el7.ppc64
grub2-2.02-0.16.el7.ppc64

on Redhat7:
grub2-tools-2.02-0.2.10.el7.ppc64
grub2-2.02-0.2.10.el7.ppc64
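
To compare the two builds listed above at a glance, the release field can be pulled out of each package string. This is only a small sketch: `rpm_release` is a hypothetical helper name, not an rpm or xCAT command, and it assumes the usual name-version-release.arch layout that "rpm -qa" prints.

```shell
#!/bin/sh
# Hypothetical helper: print the release field of an rpm package string.
# Assumes the name-version-release.arch layout printed by "rpm -qa".
rpm_release() {
    nvr=${1%.*}                  # drop the trailing .arch
    printf '%s\n' "${nvr##*-}"   # keep everything after the last '-'
}

rpm_release grub2-2.02-0.16.el7.ppc64     # Red Hat 7.1 build from this ticket
rpm_release grub2-2.02-0.2.10.el7.ppc64   # Red Hat 7.0 build from this ticket
```

The two calls print 0.16.el7 and 0.2.10.el7, which makes the difference between the 7.1 and 7.0 grub2 builds easy to see.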

We might need to repackage grub2-xcat, which requires a lot of testing work. As a workaround, you can use the grub2 shipped in Red Hat 7.1 with the following steps:
1. "nodeset <node> osimage=..."
2. "grub2-mknetdir --net-directory=/tftpboot/"
3. "cp /tftpboot/boot/grub2/powerpc-ieee1275/core.elf /tftpboot/boot/grub2/grub2.ppc"
4. "rnetboot ..."
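
The workaround steps above could be wrapped in a small script along these lines. This is a sketch, not a tested procedure: `run` and `grub2_workaround` are hypothetical helper names, the node and osimage are placeholders supplied by the caller, and since the ticket elides the rnetboot arguments, passing only the node name is an assumption. It defaults to a dry run that prints the commands instead of executing them.

```shell
#!/bin/sh
# Sketch of the workaround from this ticket. Dry run by default: commands are
# printed, not executed. Set DRY_RUN=0 on the xCAT management node to run them.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

# Usage: grub2_workaround <node> <osimage> [tftpdir]
grub2_workaround() {
    node=$1
    osimage=$2
    tftpdir=${3:-/tftpboot}

    # 1. Generate the boot configuration for the node.
    run nodeset "$node" "osimage=$osimage" || return 1
    # 2. Build a netboot directory from the grub2 shipped with the OS.
    run grub2-mknetdir "--net-directory=$tftpdir/" || return 1
    # 3. Replace the grub2-xcat image with the freshly generated core.elf.
    run cp "$tftpdir/boot/grub2/powerpc-ieee1275/core.elf" \
           "$tftpdir/boot/grub2/grub2.ppc" || return 1
    # 4. Network boot the node. The ticket elides the arguments;
    #    passing only the node name is an assumption.
    run rnetboot "$node"
}
```

Calling `grub2_workaround <node> <osimage>` with DRY_RUN unset prints the four commands, so they can be reviewed before setting DRY_RUN=0. Step 3 overwrites grub2.ppc, so keeping a copy of the original file first is probably wise.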

Discussion

  • yangsong

    yangsong - 2015-05-26
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,3 @@
    -
    
     Hello,
    
    • status: open --> pending
     
  • yangsong

    yangsong - 2015-05-26

    grub2-xcat has been repackaged with grub2-2.02-0.16.ael7b.src.rpm, which is shipped in Red Hat 7.1. The modified build script is checked in:

    commit 19d9d39ec8e41d07700c48295075feafb95f04c4
    Author: immarvin yangsbj@cn.ibm.com
    Date: Thu May 14 01:57:22 2015 -0400

    refine the build process of grub2-xcat in the README file
    

    commit 918a3e2a1ea5954b589c003bc56d912f7a215473
    Author: root root@c910f02c01p13.pok.stglabs.ibm.com
    Date: Mon May 11 10:00:51 2015 -0400

    use cp instead of ln during packaging;add more package info to the grub2-xcat rpm
    

    commit 234b44fd29eaa261f28ba9f316bd9b510b39e60f
    Author: immarvin yangsbj@cn.ibm.com
    Date: Sun May 10 09:30:07 2015 -0400

    build grub2-xcat with grub2-2.02-0.16.ael7b.src.rpm; set version and release of grub2-xcat according to the
    
     
  • yangsong

    yangsong - 2015-07-03
    • Description has changed:

    Diff:

    --- old
    +++ new
    @@ -1,4 +1,3 @@
    -
     Hello,
    
     We have a ESS cluster 
    
    • status: pending --> closed