
#4466 ubuntu 14.10 genimage for netboot results in an unbootable image

2.9.1
pending
None
unknown
5
2015-02-23
2014-12-10
No

ubuntu 14.10 genimage for netboot results in an unbootable image

I'm attempting to pick up an updated kernel image, and I'm following the directions here:

http://sourceforge.net/p/xcat/wiki/Using_Provmethod%3Dosimagename/#installing-a-new-kernel-in-the-stateless-image

as well as the information provided by:
lsdef -t osimage -lh
regarding the use of the kerneldir and kernelver parameters.

kerneldir:      The directory name where the 3rd-party kernel is stored. It is used for diskless image only.
kernelver:      The version of linux kernel used in the linux image. If the kernel version is not set, the default kernel in rootimgdir will be used

We have a minimal list of otherpkgs to install (including the kernel headers).

We build this following the directions for building with a mixed architecture environment as follows:

on the management node we run:

genimage --dryrun tulgpu-0000-netboot-compute  

And then copy its output.

On a stateful boot of the target node we run:

mkdir -p /install
mkdir -p /opt/xcat
mkdir -p /etc/xcat
mount bgxcat:/install /install # the mount needs to be rw
mount bgxcat:/opt/xcat /opt/xcat
mount bgxcat:/etc/xcat /etc/xcat

cd /opt/xcat/share/xcat/netboot/ubuntu
./genimage -a ppc64el -o ubuntu14.10 -p compute -k 3.16.0-24-generic \
  --kerneldir /install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports \
  --srcdir /install/ubuntu14.10/ppc64el \
  --pkglist /opt/xcat/share/xcat/netboot/ubuntu/compute.pkglist \
  --otherpkgdir "http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic-backports main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-security  main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-updates   main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at7.1,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at8.0,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/ppc64el /" \
  --otherpkglist /install/custom/tulgpu-0000-netboot-compute/ubuntu/ppc64el/compute.otherpkg.pkglist \
  --postinstall /install/postscripts/custom/tulgpu-0000-netboot-compute/compute.postinstall \
  --rootimgdir /install/netboot/ubuntu14.10/ppc64el/compute \
  tulgpu-0000-netboot-compute
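
The --otherpkgdir value in the command above is a comma-separated list of apt source entries, which is hard to read and edit as one string. A minimal sketch of assembling it from one entry per line follows; the helper name join_otherpkgdir is hypothetical and not part of xCAT, and only two of the entries from the ticket are shown.

```shell
# Join apt source entries (one per line on stdin) into the comma-separated
# form passed to --otherpkgdir. Blank lines are skipped.
# join_otherpkgdir is a hypothetical helper, not an xCAT command.
join_otherpkgdir() {
    awk 'NF { printf "%s%s", sep, $0; sep = "," } END { print "" }'
}

# Two of the entries from the ticket, kept exactly as written there.
otherpkgdir=$(join_otherpkgdir <<'EOF'
http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic main restricted multiverse universe

http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at8.0
EOF
)
printf '%s\n' "$otherpkgdir"
```

The resulting string could then be passed as a single quoted argument, for example --otherpkgdir "$otherpkgdir".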

This resulted in the output in the attached file genimage.log, which appears to complete successfully.

We then run the following on the management node to pack the image and initiate a boot:

packimage tulgpu-0000-netboot-compute
nodeset tulgpu002 osimage=tulgpu-0000-netboot-compute
rpower tulgpu002 reset
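
The three management-node steps above depend on each other, so it can help to run them as one sequence that stops at the first failure. The following is only a sketch: deploy_image and the EXECUTE variable are hypothetical names, and by default the function prints the commands rather than running them, since rpower resets a real node.

```shell
# Run packimage, nodeset and rpower in order, stopping at the first failure.
# deploy_image and EXECUTE are hypothetical names used for illustration.
# Unless EXECUTE=1 is set, the commands are only printed.
deploy_image() {
    image=$1
    node=$2
    for cmd in \
        "packimage $image" \
        "nodeset $node osimage=$image" \
        "rpower $node reset"
    do
        if [ "${EXECUTE:-0}" = "1" ]; then
            # Word splitting of $cmd is intentional: names contain no spaces.
            $cmd || { echo "step failed: $cmd" >&2; return 1; }
        else
            printf '%s\n' "$cmd"
        fi
    done
}

# Prints the three commands from the ticket without running them.
deploy_image tulgpu-0000-netboot-compute tulgpu002
```

Setting EXECUTE=1 before the call would run the commands for real on the management node.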

When we do this, the system console ends up displaying:

...
[    1.791833] NET: Registered protocol family 17
[    1.791943] Key type dns_resolver registered
[    1.792536] Loading compiled-in X.509 certificates
[    1.794152] Loaded X.509 cert 'Magrathea: Glacier signing key: 32f4d03489c67cd7716794f60c00d7f7e8d2780e'
[    1.794337] registered taskstats version 1
[    1.803268] Key type trusted registered
[    1.805287] Key type encrypted registered
[    1.807396] AppArmor: AppArmor sha1 policy hashing enabled
[    1.807452] ima: No TPM chip found, activating TPM-bypass!
[    1.807544] evm: HMAC attrs: 0x1
[    1.808470] /build/buildd/linux-3.16.0/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    1.809357] Freeing unused kernel memory: 5632K (c000000000d60000 - c0000000012e0000)
insmod: can't insert '/lib/bnx2.ko': No such file or directory
insmod: can't insert '/lib/bnx2x.ko': No such file or directory
[    1.832985] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[    1.833113] e1000: Copyright (c) 1999-2006 Intel Corporation.
[    1.846519] e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
[    1.846631] e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
insmod: can't insert '/lib/igb.ko': No such file or directory
Creating device nodes with udev
Couldn't find the proper booting device, switch to shell...

 Entering rescue/debug init shell.
 Exit shell to continue booting.
sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
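
The insmod failures in the console output suggest checking whether the generated image carries modules for the kernel version that was requested. The sketch below makes that check from the management node. It assumes the unpacked root filesystem sits under <rootimgdir>/rootimg, which is an assumption about the layout rather than something stated in this ticket, and has_kernel_modules is a hypothetical helper name.

```shell
# Report whether an unpacked root image has a modules directory for a
# given kernel version.
# Assumption: the tree lives at <rootimgdir>/rootimg.
# has_kernel_modules is a hypothetical helper, not an xCAT command.
has_kernel_modules() {
    rootimgdir=$1
    kver=$2
    moddir="$rootimgdir/rootimg/lib/modules/$kver"
    if [ -d "$moddir" ]; then
        echo "found: $moddir"
    else
        echo "missing: $moddir" >&2
        # List whatever kernel versions are present, if any.
        ls "$rootimgdir/rootimg/lib/modules" 2>/dev/null \
            | sed 's/^/  available: /' >&2
        return 1
    fi
}

# Values taken from the genimage command in this ticket.
has_kernel_modules /install/netboot/ubuntu14.10/ppc64el/compute 3.16.0-24-generic \
    || echo "no modules directory for the requested kernel version"
```

A missing directory would not prove the cause of the failed boot, but it narrows down whether the kernel and its modules in the image agree.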

I have also attached a complete console log of this boot in the file tulgpu002.console.log

2 Attachments


Discussion

  • ralph bellofatto

    We further isolated the problem today by cloning the osimage definition and then stripping parameters back to what we recalled was the last configuration that booted successfully.

    It appears that the kernelver parameter keeps things from working correctly.

    We created a definition as follows:

    Object name: tulgpu-0001-netboot-compute
        exlist=/opt/xcat/share/xcat/netboot/ubuntu/compute.exlist
        groups=all
        imagetype=linux
        osarch=ppc64el
        osname=Linux
        osvers=ubuntu14.10
        otherpkgdir=http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic-backports main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-security  main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-updates   main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at7.1,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at8.0,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/ppc64el /
        otherpkglist=/install/custom/tulgpu-0000-netboot-compute/ubuntu/ppc64el/compute.otherpkg.pkglist
        pkgdir=/install/ubuntu14.10/ppc64el
        pkglist=/opt/xcat/share/xcat/netboot/ubuntu/compute.pkglist
        postinstall=/install/postscripts/custom/tulgpu-0000-netboot-compute/compute.postinstall
        postscripts=custom/tulgpu-0000-netboot-compute/compute.postscript
        profile=compute
        provmethod=netboot
        rootimgdir=/install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute
        synclists=/install/custom/tulgpu-0000-netboot-compute/ubuntu/ppc64el/compute.synclist
    

    And then proceeded to perform the genimage/packimage/nodeset/rpower on the corresponding CN and we got a successful boot.

    We then added the kernelver parameter as follows:

    chdef -t osimage tulgpu-0001-netboot-compute kernelver=3.16.0-24-generic
    

    resulting in the following definition:

    Object name: tulgpu-0001-netboot-compute
        exlist=/opt/xcat/share/xcat/netboot/ubuntu/compute.exlist
        groups=all
        imagetype=linux
        kernelver=3.16.0-24-generic
        osarch=ppc64el
        osname=Linux
        osvers=ubuntu14.10
        otherpkgdir=http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports  utopic-backports main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-security  main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-updates   main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at7.1,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at8.0,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/ppc64el /
        otherpkglist=/install/custom/tulgpu-0000-netboot-compute/ubuntu/ppc64el/compute.otherpkg.pkglist
        pkgdir=/install/ubuntu14.10/ppc64el
        pkglist=/opt/xcat/share/xcat/netboot/ubuntu/compute.pkglist
        postinstall=/install/postscripts/custom/tulgpu-0000-netboot-compute/compute.postinstall
        postscripts=custom/tulgpu-0000-netboot-compute/compute.postscript
        profile=compute
        provmethod=netboot
        rootimgdir=/install/netboot/ubuntu14.10/ppc64el/tulgpu-0001-netboot-compute
        synclists=/install/custom/tulgpu-0000-netboot-compute/ubuntu/ppc64el/compute.synclist
    

    And then proceeded to perform the genimage/packimage/nodeset/rpower on the corresponding CN and we got a failed boot, as indicated earlier in this ticket.

    In this specific instance, we also found that we really don't need the kernelver, since when we put the security and update repository mirrors in the otherpkgdir field, we get the kernel version we are after here.

    However, in the near future we are going to need to be able to build netboot images with custom kernels, and for that we are going to need the kernelver/kerneldir fields to work properly.
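
The isolation described in this comment comes down to comparing a working and a failing osimage definition attribute by attribute. A minimal sketch of that comparison on saved listings follows; normalize_def and diff_defs are hypothetical helper names, and the two sample files are cut-down stand-ins for the full listings shown above.

```shell
# Compare two saved osimage listings and print the attributes that differ.
# normalize_def and diff_defs are hypothetical helpers for illustration.
normalize_def() {
    # Drop indentation, the "Object name:" header and blank lines, then sort.
    sed -e 's/^[[:space:]]*//' -e '/^Object name:/d' -e '/^$/d' "$1" | sort
}

diff_defs() {
    a=$(mktemp) && b=$(mktemp) || return 2
    normalize_def "$1" > "$a"
    normalize_def "$2" > "$b"
    diff "$a" "$b"
    rc=$?
    rm -f "$a" "$b"
    return $rc
}

# Cut-down sample listings standing in for the definitions in this ticket.
work=$(mktemp -d)
printf 'Object name: tulgpu-0001-netboot-compute\n    profile=compute\n    provmethod=netboot\n' > "$work/good.txt"
printf 'Object name: tulgpu-0001-netboot-compute\n    kernelver=3.16.0-24-generic\n    profile=compute\n    provmethod=netboot\n' > "$work/bad.txt"
diff_defs "$work/good.txt" "$work/bad.txt" || true
rm -rf "$work"
```

With the sample files, the only difference reported is the added kernelver line, which matches the finding above.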

     
  • zhao er tao

    zhao er tao - 2014-12-12

    For the error "Couldn't find the proper booting device", it looks as if the installnic and primarynic attributes of the node were not set correctly. However, the node definition looks correct:

    lsdef tulgpu002 -i kcmdline
    Object name: tulgpu002
    kcmdline=imgurl=http://10.0.0.1//install/netboot/ubuntu14.10/ppc64el/compute/rootimg.gz XCAT=10.0.0.1:3001 BOOTIF=40:f2:e9:31:8e:f0 console=tty0 console=hvc0,19200n8r

    I have also noticed that the CN tulgpu002 is working properly with diskless, so what is the issue right now?
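
When checking the boot device from the rescue shell, it can be useful to pull individual values such as BOOTIF out of the kernel command line and compare them with the interfaces the kernel actually found. The sketch below extracts one key from a command line string; kcmd_value is a hypothetical helper name, and the sample string is the kcmdline shown above.

```shell
# Print the value of one key=value token from a kernel command line.
# kcmd_value is a hypothetical helper; returns 1 if the key is absent.
kcmd_value() {
    key=$1
    cmdline=$2
    for tok in $cmdline; do
        case $tok in
            "$key"=*) printf '%s\n' "${tok#*=}"; return 0 ;;
        esac
    done
    return 1
}

kcmdline='imgurl=http://10.0.0.1//install/netboot/ubuntu14.10/ppc64el/compute/rootimg.gz XCAT=10.0.0.1:3001 BOOTIF=40:f2:e9:31:8e:f0 console=tty0 console=hvc0,19200n8r'
kcmd_value BOOTIF "$kcmdline"   # prints 40:f2:e9:31:8e:f0
kcmd_value imgurl "$kcmdline"   # prints the rootimg.gz URL
```

On a booted node the same function could be given the contents of /proc/cmdline instead of the sample string.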

     
  • zhao er tao

    zhao er tao - 2014-12-12
    • assigned_to: zhao er tao
     
    • ralph bellofatto

      The issue now is that it will NOT work if I explicitly call out the kernel
      version with the kernelver option in the osimage definition. As a result,
      custom kernels probably won't work either, which is the primary feature
      that we would want to use the kernelver and kerneldir fields for.

      We managed to get things to work by putting the update and security Ubuntu
      repositories in the otherpkgdir field. This caused an updated kernel to
      get loaded, which for the time being is what we needed to make progress.
      But the whole purpose of an Ubuntu netboot is to be able to use custom and
      instrumented kernels, so the kernelver/kerneldir feature needs to work.

      If you need to experiment with this, I created a second osimage,
      tulgpu-0001-netboot-compute, to act as a work area to figure out this
      problem.

      From: "zhao er tao" zhaoertao@users.sf.net
      To: "[xcat:bugs] " 4466@bugs.xcat.p.re.sf.net
      Date: 12/12/2014 01:09 AM
      Subject: [xcat:bugs] #4466 ubuntu 14.10 genimage for netboot results in
      an unbootable image

        assigned_to: zhao er tao
      

      [bugs:#4466] ubuntu 14.10 genimage for netboot results in an unbootable
      image

      Status: open
      Milestone: 2.9.1
      Created: Wed Dec 10, 2014 07:31 PM UTC by ralph bellofatto
      Last Updated: Fri Dec 12, 2014 06:08 AM UTC
      Owner: zhao er tao


  • ralph bellofatto

    any progress on this yet?

     
  • zhao er tao

    zhao er tao - 2014-12-17

    Sorry for the delayed response, I need to take some more time to fix this issue.

     
  • zhao er tao

    zhao er tao - 2015-01-06

    Fixed in git commit f2b9eef7a02f5fdd4ea001564b0d8a3bd3b23bb1 for 2.9 and in commit af5301faab58a3df6bf84df48b4f1214c7955acc for master.

     

    Last edit: zhao er tao 2015-01-07
  • zhao er tao

    zhao er tao - 2015-01-06
    • status: open --> pending
     