#2742 genimage error with OFED installation failure

Milestone: 2.7.1
Status: closed
Owner: jjh
Labels: Infiniband (17)
Priority: 6
Updated: 2012-09-19
Created: 2012-03-30
Creator: Li Cao
Private: No

On sles11.1 x86_64 with the new xCAT 2.7.1 build:

  1. Run genimage -o sles11.1 -a x86_64

It failed with this error:
Retrieving package mlnx-ofa_kernel-kmp-default-1.5.3_2.6.32.12_0.7-OFED.1.5.3.3.0.0.sles11sp1.x86_64 (376/382), 10.6 MiB (60.4 MiB unpacked)

Installing: mlnx-ofa_kernel-kmp-default-1.5.3_2.6.32.12_0.7-OFED.1.5.3.3.0.0.sles11sp1 [.......error]

Installation of mlnx-ofa_kernel-kmp-default-1.5.3_2.6.32.12_0.7-OFED.1.5.3.3.0.0.sles11sp1 failed:

(with --nodeps --force) Error: Subprocess failed. Error: RPM failed:

Kernel image: /boot/vmlinuz-2.6.32.12-0.7-default

Initrd image: /boot/initrd-2.6.32.12-0.7-default

node name not found

Root device (/dev/sda2) not found

error: %post(mlnx-ofa_kernel-kmp-default-1.5.3_2.6.32.12_0.7-OFED.1.5.3.3.0.0.sles11sp1.x86_64) scriptlet failed, exit status 1

Abort, retry, ignore? [a/r/i] (a): a

Problem occured during or after installation or removal of packages:

Installation aborted by user

Please see the above error message for a hint.

zypper invocation failed with rc: 4

  2. Then run genimage sles11.1-x86_64-netboot-compute. It completes, but the output contains the following error:
    TARGETS = halt dbus earlysyslog random sciv10 reboot haldaemon network boot.clock syslog splash_early rpcbind nfs splash network-remotefs gmond sshd single postfix gpfs gettyset cron xcatpostinit

+ installroot=/install/netboot/sles11.1/x86_64/compute/rootimg

+ ofeddir=/install/post/otherpkgs/sles11.1/x86_64/ofed/

+ NODESETSTATE=genimage

+ /install/postscripts/mlnxofed_ib_install

#!/bin/sh -vx

# Sample script to customize options for Mellonax OFED IB support

# For AIX:

#    TBD

# For Linux:

#    - For full-disk installs:

#        - Copy rpms to node

#        - Install IB rpms

#    - For diskless images:

#        - Copy the packages to the images.

#        - Install IB rpms

OS=`uname`

uname

++ uname

+ OS=Linux

installroot='/install/netboot/sles11.1/x86_64/compute/rootimg'

OFED_DIR='/install/post/otherpkgs/sles11.1/x86_64/ofed/'

OFED_DIR=$ofeddir

+ OFED_DIR=/install/post/otherpkgs/sles11.1/x86_64/ofed/

if [ -z "$OFED_DIR" ]; then

# try to default

OFED_DIR=$INSTALL_DIR/post/otherpkgs/$OSVER/$ARCH/ofed

fi

+ '[' -z /install/post/otherpkgs/sles11.1/x86_64/ofed/ ']'

if [ $NODESETSTATE != "genimage" ]; then

# running as a postscript in a full-disk install or AIX diskless install

installroot=""

fi

+ '[' genimage '!=' genimage ']'

if [ $OS != "AIX" ]; then

if [ $NODESETSTATE == "install" ] || [ $NODESETSTATE == "boot" ]; then

#  Being run from a stateful install postscript

#  Copy rpms directly from the xCAT management node and install

    mkdir -p /tmp/ofed

    rm -f -R /tmp/ofed/*

    cd /tmp/ofed

    download_dir=`echo $OFED_DIR | cut -d '/' -f3-`

    wget -l inf -N -r --waitretry=10 --random-wait --retry-connrefused -t 10 -T 60 -nH --cut-dirs=5 ftp://$SITEMASTER/$download_dir/ 2> /tmp/wget.log

    #rpm -Uvh --force libibverbs-devel*.rpm

    perl -x mlnxofedinstall --without-32bit --force

    rm -Rf /tmp/ofed

fi

if [ $NODESETSTATE == "genimage" ]; then

# Being called from <image>.postinstall script

# Assume we are on the same machine

    #if [[ $OS = sles* ]] || [[ $OS = suse* ]] || [[ -f /etc/SuSE-release ]]; then

    # For SLES, assume zypper is available on the system running genimage

        mkdir $installroot/tmp/ofed_install

        cp -r $OFED_DIR $installroot/tmp/ofed_install/

        chroot $installroot perl -x /tmp/ofed_install/ofed/mlnxofedinstall --without-32bit --force

        rm -rf $installroot/tmp/ofed_install

    #fi

fi

fi

+ '[' Linux '!=' AIX ']'

+ '[' genimage == install ']'

+ '[' genimage == boot ']'

+ '[' genimage == genimage ']'

+ mkdir /install/netboot/sles11.1/x86_64/compute/rootimg/tmp/ofed_install

+ cp -r /install/post/otherpkgs/sles11.1/x86_64/ofed/ /install/netboot/sles11.1/x86_64/compute/rootimg/tmp/ofed_install/

+ chroot /install/netboot/sles11.1/x86_64/compute/rootimg perl -x /tmp/ofed_install/ofed/mlnxofedinstall --without-32bit --force

df: Warning: cannot read table of mounted file systems

This program will install the MLNX_OFED_LINUX package on your machine.

Note that all other Mellanox, OEM, OFED, or Distribution IB packages will be removed.

Failed to uninstall ofa_kernel KMP RPMs

+ rm -rf /install/netboot/sles11.1/x86_64/compute/rootimg/tmp/ofed_install

  3. We then ran the diskless installation. The installation passed, but ib0 was not configured.
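
The second genimage run finishes even though the OFED installer reported a failure, so it may help to save the genimage output and scan it for the messages quoted in this ticket. The following is only a minimal sketch: `scan_genimage_log` is a hypothetical helper name, the log path in the usage comment is an assumption, and the patterns are the three failure strings that appear in the output above.

```sh
#!/bin/sh
# scan_genimage_log FILE
# Hypothetical helper: print (with line numbers) any line of FILE that
# contains one of the failure messages quoted in this ticket.
# Returns grep's exit status: 0 if a message was found, 1 if none was
# found, 2 if FILE could not be read.
scan_genimage_log() {
    grep -n -E \
        'scriptlet failed, exit status|Failed to uninstall ofa_kernel KMP RPMs|zypper invocation failed with rc' \
        "$1"
}

# Example use (the log path is an assumption):
#   genimage sles11.1-x86_64-netboot-compute 2>&1 | tee /tmp/genimage.log
#   scan_genimage_log /tmp/genimage.log && echo "OFED install problem in image"
```

A match here does not say why ib0 was left unconfigured. It only flags that the OFED step inside the image did not complete cleanly.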

Discussion

    jjh - 2012-03-31

    Hi Cao Li,

    Would you please verify it using the /opt/xcat/share/xcat/ib/scripts/Mellanox/mlnxofed_ib_install script and /opt/xcat/share/xcat/ib/netboot/sles/ib.sles11.1.x86_64.pkglist from the new xCAT 2.7.1 build, instead of the old scripts and old configuration in your environment?

    Please also refer to the doc:

    https://sourceforge.net/apps/mediawiki/xcat/index.php?title=Managing_the_Mellanox_Infiniband_Network#Mellanox_IB_Interface_Configuration

    If you hit any problems, please let me know.

    Thanks.

     
    jjh - 2012-03-31

    I used the scripts and pkglist from the new xCAT 2.7.1 build, and genimage/packimage ran successfully.

    However, when I re-ran genimage on top of the last successful rootimg, the Mellanox script could not uninstall the package mlnx-ofa_kernel-kmp-default.

    I will contact the IB team. In the meantime there is a workaround: before running genimage, clean up /install/netboot/sles11.1/x86_64/compute/rootimg.

    Thanks.
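
The workaround of cleaning up the rootimg before running genimage could be scripted roughly as follows. This is a sketch under stated assumptions, not part of xCAT: `clean_rootimg` is a hypothetical helper, and it assumes nothing (for example /proc) is still mounted inside the image tree.

```sh
#!/bin/sh
# clean_rootimg DIR
# Hypothetical helper: remove a stale diskless image tree so that the next
# genimage run starts from an empty rootimg. As a safety check it refuses
# any path whose last component is not "rootimg".
# Assumption: no filesystem is mounted below DIR.
clean_rootimg() {
    dir=${1%/}                      # tolerate one trailing slash
    case "$dir" in
        */rootimg) ;;
        *) echo "refusing to remove '$1': not a rootimg directory" >&2
           return 1 ;;
    esac
    [ -d "$dir" ] || return 0       # nothing to clean
    rm -rf -- "$dir"
}

# Example use with the path from this ticket:
#   clean_rootimg /install/netboot/sles11.1/x86_64/compute/rootimg &&
#       genimage sles11.1-x86_64-netboot-compute
```

The path check is deliberately strict, since a mistyped argument to `rm -rf` on a management node would be costly.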

     
    jjh - 2012-04-01

    When genimage is run twice on top of the last successful rootimg, the second run fails. I added a special case for sles11sp1 in mlnxofed_ib_install. The error does not occur in the rhels6.1 x86_64 environment.

    I have checked the code into 2.7 revision 12085 and trunk revision 12086.
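
The ticket does not show what the sles11sp1 special case does, so the following is only an illustration of how such a case might be detected from inside a postinstall script. It is not the code checked in under the revisions above. `is_sles11sp1` is a hypothetical helper, and it assumes the image contains the usual /etc/SuSE-release file with `VERSION` and `PATCHLEVEL` lines.

```sh
#!/bin/sh
# is_sles11sp1 ROOT
# Hypothetical helper: return 0 if the tree under ROOT looks like
# SLES 11 SP1, judging by ROOT/etc/SuSE-release (assumed to contain
# "VERSION = 11" and "PATCHLEVEL = 1" lines); return 1 otherwise.
is_sles11sp1() {
    rel="$1/etc/SuSE-release"
    [ -f "$rel" ] || return 1
    grep -q '^VERSION[[:space:]]*=[[:space:]]*11[[:space:]]*$' "$rel" &&
        grep -q '^PATCHLEVEL[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$rel"
}

# One possible use, shown as an assumption only: drop the KMP package left
# by a previous run before the installer tries to uninstall it itself.
#   if is_sles11sp1 "$installroot"; then
#       rpm --root "$installroot" -e --nodeps --noscripts \
#           mlnx-ofa_kernel-kmp-default
#   fi
```

Checking the image's own release file rather than `uname` matters here, because genimage runs on the management node and `$OS` in the script above only ever reports `Linux`.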

     
    jjh - 2012-05-16

    This bug has been fixed. Closing it for xCAT 2.7.2.