ubuntu 14.10 genimage for netboot results in an unbootable image
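I'm attempting to pick up an updated kernel image, and I'm following the directions here:
http://sourceforge.net/p/xcat/wiki/Using_Provmethod%3Dosimagename/#installing-a-new-kernel-in-the-stateless-image
as well as the information provided by: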
lsdef -t osimage -lh
regarding the use of the kerneldir and kernelver parameters.
kerneldir: The directory name where the 3rd-party kernel is stored. It is used for diskless image only.
kernelver: The version of linux kernel used in the linux image. If the kernel version is not set, the default kernel in rootimgdir will be used
We have a minimal list of otherpkgs to install (including the kernel headers).
We build the image following the directions for a mixed-architecture environment, as follows.
On the management node we run:
genimage --dryrun tulgpu-0000-netboot-compute
And then copy its output.
On a stateful boot of the target node we run:
mkdir -p /install
mkdir -p /opt/xcat
mkdir -p /etc/xcat
mount bgxcat:/install /install # the mount needs to be rw
mount bgxcat:/opt/xcat /opt/xcat
mount bgxcat:/etc/xcat /etc/xcat
cd /opt/xcat/share/xcat/netboot/ubuntu
./genimage -a ppc64el -o ubuntu14.10 -p compute -k 3.16.0-24-generic \
  --kerneldir /install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports \
  --srcdir /install/ubuntu14.10/ppc64el \
  --pkglist /opt/xcat/share/xcat/netboot/ubuntu/compute.pkglist \
  --otherpkgdir "http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports utopic main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports utopic-backports main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-security main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ports.ubuntu.com/ubuntu-ports/ utopic-updates main restricted multiverse universe,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at7.1,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/ftp.unicamp.br/pub/linuxpatch/toolchain/at/ubuntu trusty at8.0,http://10.0.0.1/install/mirrors/ubuntu14.10/mirror/developer.download.nvidia.com/compute/cuda/repos/ubuntu1410/ppc64el /" \
  --otherpkglist /install/custom/tulgpu-0000-netboot-compute/ubuntu/ppc64el/compute.otherpkg.pkglist \
  --postinstall /install/postscripts/custom/tulgpu-0000-netboot-compute/compute.postinstall \
  --rootimgdir /install/netboot/ubuntu14.10/ppc64el/compute \
  tulgpu-0000-netboot-compute
This resulted in the output in the attached file genimage.log, which appears to complete successfully.
We then run the following on the management node to pack the image and initiate a boot:
packimage tulgpu-0000-netboot-compute
nodeset tulgpu002 osimage=tulgpu-0000-netboot-compute
rpower tulgpu002 reset
When we do this, the system console ends up displaying:
...
[ 1.791833] NET: Registered protocol family 17
[ 1.791943] Key type dns_resolver registered
[ 1.792536] Loading compiled-in X.509 certificates
[ 1.794152] Loaded X.509 cert 'Magrathea: Glacier signing key: 32f4d03489c67cd7716794f60c00d7f7e8d2780e'
[ 1.794337] registered taskstats version 1
[ 1.803268] Key type trusted registered
[ 1.805287] Key type encrypted registered
[ 1.807396] AppArmor: AppArmor sha1 policy hashing enabled
[ 1.807452] ima: No TPM chip found, activating TPM-bypass!
[ 1.807544] evm: HMAC attrs: 0x1
[ 1.808470] /build/buildd/linux-3.16.0/drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[ 1.809357] Freeing unused kernel memory: 5632K (c000000000d60000 - c0000000012e0000)
insmod: can't insert '/lib/bnx2.ko': No such file or directory
insmod: can't insert '/lib/bnx2x.ko': No such file or directory
[ 1.832985] e1000: Intel(R) PRO/1000 Network Driver - version 7.3.21-k8-NAPI
[ 1.833113] e1000: Copyright (c) 1999-2006 Intel Corporation.
[ 1.846519] e1000e: Intel(R) PRO/1000 Network Driver - 2.3.2-k
[ 1.846631] e1000e: Copyright(c) 1999 - 2014 Intel Corporation.
insmod: can't insert '/lib/igb.ko': No such file or directory
Creating device nodes with udev
Couldn't find the proper booting device, switch to shell...
Entering rescue/debug init shell.
Exit shell to continue booting.
sh: cannot set terminal process group (-1): Inappropriate ioctl for device
sh: no job control in this shell
I have also attached a complete console log of this boot in the file tulgpu002.console.log
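The insmod lines in the console output show that the initrd could not find bnx2.ko, bnx2x.ko or igb.ko under /lib. A rough way to narrow this down is to check, on the management node, which kernel versions actually have a module tree inside the generated root image, and whether the driver files exist for the version passed with -k. The sketch below assumes the root image is unpacked at <rootimgdir>/rootimg (for example /install/netboot/ubuntu14.10/ppc64el/compute/rootimg); that path and the helper names are assumptions for illustration, not something confirmed in this ticket.

```shell
# Hypothetical helpers; the rootimg location is an assumption (see above).

# Print every kernel version that has a module tree inside an image root.
# $1: image root directory
list_kernel_versions() {
    for d in "$1"/lib/modules/*/; do
        [ -d "$d" ] || continue
        d=${d%/}
        echo "${d##*/}"
    done
}

# Succeed only if the image root carries modules for the given version.
# $1: image root directory, $2: kernel version (e.g. 3.16.0-24-generic)
has_kernel_modules() {
    [ -d "$1/lib/modules/$2" ]
}

# Print the path of a driver module for one kernel version, if present.
# $1: image root directory, $2: kernel version, $3: module name (e.g. bnx2)
find_driver() {
    find "$1/lib/modules/$2" -name "$3.ko" 2>/dev/null
}

# Example (paths are assumptions):
#   ROOTIMG=/install/netboot/ubuntu14.10/ppc64el/compute/rootimg
#   list_kernel_versions "$ROOTIMG"
#   has_kernel_modules "$ROOTIMG" 3.16.0-24-generic || echo "no modules for that kernelver"
#   for m in bnx2 bnx2x igb; do find_driver "$ROOTIMG" 3.16.0-24-generic "$m"; done
```

If the version given with -k has no module tree, or the tree lacks the NIC drivers, that would be consistent with the insmod failures above, although it does not by itself identify the cause.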
We further isolated the problem today by cloning the osimage definition and then removing parameters, working back toward what we recalled was the last configuration that booted successfully.
It appears that setting the kernelver parameter is what keeps the image from booting.
We created a definition as follows:
We then performed the genimage/packimage/nodeset/rpower sequence on the corresponding CN and got a successful boot.
We then added the kernelver parameter as follows:
resulting in the following definition:
We then performed the genimage/packimage/nodeset/rpower sequence on the corresponding CN again and got a failed boot, as described earlier in this ticket.
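The exact commands used for this experiment are not recorded here. As a rough sketch only, an A/B test of this kind could be set up with the standard xCAT object commands: lsdef -z dumps a definition as a stanza, mkdef -z creates an object from a stanza on stdin, and chdef changes a single attribute. The helper names below are hypothetical, and the image names in the usage comment are only examples.

```shell
# Hypothetical helpers wrapping lsdef/mkdef/chdef; not the commands
# actually used in this ticket.

# Copy an osimage definition under a new name.
# lsdef -z emits the definition as a stanza whose header line is "<name>:";
# rewriting that header and feeding the result to mkdef -z creates the copy.
# $1: existing osimage name, $2: new osimage name
clone_osimage() {
    src=$1
    dst=$2
    lsdef -t osimage -o "$src" -z | sed "s/^${src}:/${dst}:/" | mkdef -z
}

# Set kernelver on an osimage.
# $1: osimage name, $2: kernel version (e.g. 3.16.0-24-generic)
set_kernelver() {
    chdef -t osimage -o "$1" kernelver="$2"
}

# Show only the kernel-related attributes of an osimage.
# $1: osimage name
show_kernel_attrs() {
    lsdef -t osimage -o "$1" -i kernelver,kerneldir
}

# Example session (names are illustrative):
#   clone_osimage tulgpu-0000-netboot-compute my-test-image
#   show_kernel_attrs my-test-image
#   set_kernelver my-test-image 3.16.0-24-generic
```

Running genimage, packimage, nodeset and rpower after each change then shows whether that one attribute makes the difference.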
In this specific instance, we also found that we don't actually need kernelver: when we put the security and updates repository mirrors in the otherpkgdir field, we get the kernel version we are after.
However, in the near future we are going to need to be able to build netboot images with custom kernels, and for that we are going to need the kernelver/kerneldir fields to work properly.
Regarding the error "Couldn't find the proper booting device": it seems the installnic and primarynic attributes of the node were not set correctly. But judging from the node definition, they seem correct.
lsdef tulgpu002 -i kcmdline
Object name: tulgpu002
kcmdline=imgurl=http://10.0.0.1//install/netboot/ubuntu14.10/ppc64el/compute/rootimg.gz XCAT=10.0.0.1:3001 BOOTIF=40:f2:e9:31:8e:f0 console=tty0 console=hvc0,19200n8r
I have also noticed that the CN tulgpu002 is currently working properly as a diskless node, so what is the issue right now?
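For anyone reproducing the failure, one check that can be run from the rescue shell is whether any network interface visible to the kernel has the MAC address given in BOOTIF. The sketch below is a plain POSIX sh function written for illustration (the name find_boot_nic is hypothetical). It assumes BOOTIF is in colon-separated form, as in the kcmdline above, and that interfaces appear under /sys/class/net.

```shell
# Hypothetical helper: print the interface whose MAC matches BOOTIF=.
# $1: kernel command line text, e.g. "$(cat /proc/cmdline)"
# $2: sysfs net directory (defaults to /sys/class/net)
find_boot_nic() {
    cmdline=$1
    sysnet=${2:-/sys/class/net}
    bootif=
    # Pick the BOOTIF=... word out of the command line.
    for word in $cmdline; do
        case $word in
            BOOTIF=*) bootif=${word#BOOTIF=} ;;
        esac
    done
    if [ -z "$bootif" ]; then
        echo "no BOOTIF= on command line" >&2
        return 2
    fi
    bootif=$(echo "$bootif" | tr 'A-F' 'a-f')
    # Compare against the address file of every interface.
    for dev in "$sysnet"/*; do
        [ -r "$dev/address" ] || continue
        mac=$(tr 'A-F' 'a-f' < "$dev/address")
        if [ "$mac" = "$bootif" ]; then
            echo "${dev##*/}"
            return 0
        fi
    done
    echo "no interface with MAC $bootif" >&2
    return 1
}

# Example, from the rescue shell:
#   find_boot_nic "$(cat /proc/cmdline)"
```

If no interface matches, the NIC driver was probably never loaded, which would point at the insmod failures in the console log rather than at the node's NIC attributes.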
The issue now is that it will NOT work if I explicitly call out the kernel version with the kernelver option in the osimage definition. As a result, it probably means that custom kernels won't work either, and custom kernels are the primary reason we would want to use the kernelver and kerneldir fields.
We managed to get things working by putting the Ubuntu updates and security repositories in the otherpkgdir field. This caused an updated kernel to be installed, which for the time being is what we needed to make progress. But the whole purpose of an Ubuntu netboot is to be able to use custom and instrumented kernels, so the kernelver/kerneldir feature needs to work.
If you need to experiment with this, I created a second osimage, tulgpu-0001-netboot-compute, to act as a work area for figuring out this problem.
Ralph Bellofatto
IBM TJ Watson Research
1-914-945-3321
ralphbel@us.ibm.com
From: "zhao er tao" zhaoertao@users.sf.net
To: "[xcat:bugs] " 4466@bugs.xcat.p.re.sf.net
Date: 12/12/2014 01:09 AM
Subject: [xcat:bugs] #4466 ubuntu 14.10 genimage for netboot results in an unbootable image
[bugs:#4466] ubuntu 14.10 genimage for netboot results in an unbootable image
Status: open
Milestone: 2.9.1
Created: Wed Dec 10, 2014 07:31 PM UTC by ralph bellofatto
Last Updated: Fri Dec 12, 2014 06:08 AM UTC
Owner: zhao er tao
Any progress on this yet?
Sorry for the delayed response, I need to take some more time to fix this issue.
Fixed in git commit f2b9eef7a02f5fdd4ea001564b0d8a3bd3b23bb1 for 2.9 and af5301faab58a3df6bf84df48b4f1214c7955acc for master.
Last edit: zhao er tao 2015-01-07