
#585 atftp: Failing pxe boot

v2.2
closed
5
2012-09-19
2009-09-02
No

If I start the installation process of a node and try to boot it via PXE, the node does get the correct IP address via DHCP, but the PXE boot process on the node exits with a "file not found" error message. Initially I thought it might be an update problem, so I additionally installed the xnba-undi and xnba-kvm packages. This added the missing file to the /tftpboot directory but did not change the error message.

Thank you very much for your help. The detailed description is added below.

Regards,
Andreas

Detailed error description:

On the management node I get the following error message:

/var/log/messages

Sep 2 10:11:15 mnode dhcpd: DHCPDISCOVER from 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:15 mnode dhcpd: DHCPOFFER on 10.1.1.2 to 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:17 mnode dhcpd: DHCPREQUEST for 10.1.1.2 (10.1.1.1) from 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:17 mnode dhcpd: DHCPACK on 10.1.1.2 to 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:17 mnode atftpd[19098]: Serving xcat/xnba.kpxe to 10.1.1.2:2070
Sep 2 10:11:17 mnode atftpd[19098]: File 5/xcat/xnba.kpxe not found
Sep 2 10:11:17 mnode atftpd[19098]: Server thread exiting
Sep 2 10:11:17 mnode atftpd[19098]: Serving xcat/xnba.kpxe to 10.1.1.2:2071
Sep 2 10:11:17 mnode atftpd[19098]: File 5/xcat/xnba.kpxe not found
Sep 2 10:11:17 mnode atftpd[19098]: Server thread exiting
================================================================================

If I look at my /tftpboot directory, I find everything I think should be there, including the /tftpboot/xcat/xnba.kpxe file. However, I find one broken symbolic link in the pxelinux.cfg directory:
================================================================================
ls -lahR /tftpboot
================================================================================
/tftpboot:
total 40K
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 .
drwxr-xr-x 28 root root 4.0K 2009-09-01 11:05 ..
drwxrwxrwx 2 root root 4.0K 2009-09-02 09:36 etc
-rwxrwxrwx 1 root root 15K 2009-09-01 14:41 pxelinux.0
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:55 pxelinux.cfg
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 xcat

/tftpboot/etc:
total 24K
drwxrwxrwx 2 root root 4.0K 2009-09-02 09:36 .
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 ..
-rwxrwxrwx 1 root root 116 2009-09-02 09:36 0a01
-rwxrwxrwx 1 root root 118 2009-09-02 09:36 0a0204
-rwxrwxrwx 1 root root 117 2009-09-02 09:36 7f
-rwxrwxrwx 1 root root 121 2009-09-02 09:36 c0a87a

/tftpboot/pxelinux.cfg:
total 24K
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:55 .
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 ..
-rwxrwxrwx 1 root root 108 2009-09-02 09:36 0A01
lrwxrwxrwx 1 root root 7 2009-09-01 14:55 0A010102 -> node002 (broken link: the target does not exist)
-rwxrwxrwx 1 root root 110 2009-09-02 09:36 0A0204
-rwxrwxrwx 1 root root 109 2009-09-02 09:36 7F
-rwxrwxrwx 1 root root 113 2009-09-02 09:36 C0A87A

/tftpboot/xcat:
total 16M
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 .
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 ..
drwxrwxrwx 3 root root 4.0K 2009-09-01 14:44 fedora9
-rwxrwxrwx 1 root root 5.5M 2009-09-02 09:36 nbfs.ppc64.gz
-rwxrwxrwx 1 root root 5.2M 2009-09-02 09:36 nbfs.x86_64.gz
-rwxrwxrwx 1 root root 5.0M 2009-09-02 09:36 nbfs.x86.gz
-rwxrwxrwx 1 root root 16K 2009-09-01 14:44 pxelinux.0
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 xnba
-rwxrwxrwx 1 root root 48K 2009-08-24 13:02 xnba.kpxe

/tftpboot/xcat/fedora9:
total 12K
drwxrwxrwx 3 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 ..
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 x86_64

/tftpboot/xcat/fedora9/x86_64:
total 12M
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 3 root root 4.0K 2009-09-01 14:44 ..
-rwxrwxrwx 1 root root 9.5M 2009-09-01 14:55 initrd.img
-rwxrwxrwx 1 root root 2.0M 2009-09-01 14:55 vmlinuz

/tftpboot/xcat/xnba:
total 16K
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 ..
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:41 nets
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 nodes

/tftpboot/xcat/xnba/nets:
total 24K
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:41 .
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 ..
-rwxrwxrwx 1 root root 192 2009-09-02 09:36 10.1.0.0_16
-rwxrwxrwx 1 root root 194 2009-09-02 09:36 10.2.4.0_24
-rwxrwxrwx 1 root root 193 2009-09-02 09:36 127.0.0.0_8
-rwxrwxrwx 1 root root 197 2009-09-02 09:36 192.168.122.0_24

/tftpboot/xcat/xnba/nodes:
total 12K
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 ..
-rwxrwxrwx 1 root root 308 2009-09-01 14:55 node002
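
For reference, PXELINUX looks for a per-client file in pxelinux.cfg named after the client's IPv4 address in upper-case hex, dropping one trailing digit per attempt, so 10.1.1.2 is tried first as 0A010102 and the 10.1.x.x file 0A01 is one of the fallbacks. The Python sketch below (function names are hypothetical, not part of xCAT) computes those candidate names and lists dangling symlinks such as the 0A010102 -> node002 link above. It only models the hex-IP step; the UUID and MAC-based names that PXELINUX also tries are left out.

```python
import ipaddress
import os
import tempfile


def pxelinux_candidates(ip: str) -> list[str]:
    """Hex-IP file names PXELINUX tries in pxelinux.cfg, most specific first.

    The address is written as 8 upper-case hex digits, and one trailing digit
    is dropped per attempt (10.1.1.2 -> 0A010102, 0A01010, ..., 0).
    """
    hex_ip = "%08X" % int(ipaddress.IPv4Address(ip))
    return [hex_ip[:n] for n in range(len(hex_ip), 0, -1)]


def dangling_links(directory: str) -> list[str]:
    """Names in `directory` that are symlinks whose target does not exist."""
    broken = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        # islink() inspects the link itself; exists() follows it to the target.
        if os.path.islink(path) and not os.path.exists(path):
            broken.append(name)
    return broken


if __name__ == "__main__":
    print(pxelinux_candidates("10.1.1.2")[:4])  # ['0A010102', '0A01010', '0A0101', '0A010']

    # Reproduce the situation from the listing in a scratch directory.
    with tempfile.TemporaryDirectory() as cfg:
        os.symlink("node002", os.path.join(cfg, "0A010102"))
        print(dangling_links(cfg))  # ['0A010102']
```

On the management node, dangling_links("/tftpboot/pxelinux.cfg") should report the same 0A010102 link.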

================================================================================
DHCPD configuration
================================================================================
more /etc/dhcpd.conf:

#xCAT generated dhcp configuration

authoritative;
option space isan;
option isan-encap-opts code 43 = encapsulate isan;
option isan.iqn code 203 = string;
option isan.root-path code 201 = string;
option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;
option user-class-identifier code 77 = string;
option gpxe.no-pxedhcp code 176 = unsigned integer 8;
option iscsi-initiator-iqn code 203 = string;
ddns-update-style none;
option client-architecture code 93 = unsigned integer 16;
option gpxe.no-pxedhcp 1;

omapi-port 7911;
key xcat_key {
algorithm hmac-md5;
secret "QWhrQ0JqanFaRXZxbzFnRXljQXlaQTRrMzFVcXVLOU8=";
};
omapi-key xcat_key;
shared-network eth0 {
subnet 10.1.0.0 netmask 255.255.0.0 {
max-lease-time 43200;
min-lease-time 43200;
default-lease-time 43200;
option routers 10.1.1.1;
next-server 10.1.1.1;
option log-servers 10.1.1.1;
option ntp-servers 10.1.1.1;
option domain-name "cluster.priv";
option domain-name-servers 10.1.1.1;
if option user-class-identifier = "xNBA" { #x86, xCAT Network Boot Agent
filename = "http://10.1.1.1/tftpboot/xcat/xnba/nets/10.1.0.0_16";
} else if option client-architecture = 00:00 { #x86
filename "xcat/xnba.kpxe";
} else if option vendor-class-identifier = "Etherboot-5.4" { #x86
filename "xcat/xnba.kpxe";
} else if option client-architecture = 00:02 { #ia64
filename "elilo.efi";
} else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
filename "/yaboot";
}
range dynamic-bootp 10.1.1.20 10.1.1.250;
} # 10.1.0.0/255.255.0.0 subnet_end
} # eth0 nic_end

#definition for host node002 aka host node002 can be found in the dhcpd.leases file

#definition for host node003 aka host node003 can be found in the dhcpd.leases file
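
The if/else chain in the subnet block decides which boot file each DHCP client is offered. As a rough model (the function and its parameter names are illustrative, not part of ISC dhcpd or xCAT), it can be read as the Python below. Under this reading, the node's PXE ROM (client architecture 00:00) is first sent xcat/xnba.kpxe to fetch over TFTP, which matches the request atftpd logged, and only the xNBA loader it downloads is given the HTTP URL.

```python
from typing import Optional


def subnet_boot_filename(user_class: Optional[str],
                         client_arch: Optional[int],
                         vendor_class: Optional[str],
                         current_filename: Optional[str] = None) -> Optional[str]:
    """Mirror the if/else chain in the 10.1.0.0/16 subnet block.

    user_class       -- option 77 (user-class-identifier), e.g. "xNBA"
    client_arch      -- option 93 (client-architecture), 0 = x86 BIOS, 2 = ia64
    vendor_class     -- option 60 (vendor-class-identifier)
    current_filename -- filename already set elsewhere (hypothetical input used
                        to model the "substring(filename,0,1) = null" test)
    A missing option is passed as None, so every comparison on it is false.
    """
    if user_class == "xNBA":  # xNBA is running: hand it the network script via HTTP
        return "http://10.1.1.1/tftpboot/xcat/xnba/nets/10.1.0.0_16"
    if client_arch == 0x0000:  # x86 PXE ROM: chain-load xNBA over TFTP
        return "xcat/xnba.kpxe"
    if vendor_class == "Etherboot-5.4":  # x86 Etherboot client
        return "xcat/xnba.kpxe"
    if client_arch == 0x0002:  # ia64
        return "elilo.efi"
    if not current_filename:  # otherwise yaboot, if nothing more specific was set
        return "/yaboot"
    return current_filename


if __name__ == "__main__":
    print(subnet_boot_filename(None, 0x0000, None))  # xcat/xnba.kpxe
    print(subnet_boot_filename("xNBA", 0x0000, None))
```

The order matters: the xNBA user-class test comes first, so a client that sends both "xNBA" and architecture 00:00 gets the HTTP URL instead of being sent xnba.kpxe again.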


more /var/lib/dhcpd/dhcpd.leases:

# The format of this file is documented in the dhcpd.leases(5) manual page.

# This lease file was written by isc-dhcp-4.0.0

host node002 {
dynamic;
hardware ethernet 00:30:48:c6:27:f4;
fixed-address 10.1.1.2;
supersede host-name = "node002";
if option user-class-identifier = "xNBA" {
supersede server.filename =
"http://10.1.1.1/tftpboot/xcat/xnba/nodes/node002";
} elsif exists client-architecture {
supersede server.filename = "xcat/xnba.kpxe";
}
supersede server.next-server = 0a:01:01:01;
}
host node003 {
dynamic;
hardware ethernet 00:30:48:c6:3a:50;
fixed-address 10.1.1.3;
supersede host-name = "node003";
if option user-class-identifier = "xNBA" {
supersede server.filename =
"http://10.1.1.1/tftpboot/xcat/xnba/nodes/node003";
} elsif exists client-architecture {
supersede server.filename = "xcat/xnba.kpxe";
}
supersede server.next-server = 0a:01:01:01;
}
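
In the lease file, next-server is stored as hex octets, and each host block overrides the boot file for that node. A small Python sketch (helper names are hypothetical) decodes 0a:01:01:01 and models the node002 override. It assumes, as the config suggests, that xNBA identifies itself with user class "xNBA" and that the PXE ROM sends the client-architecture option.

```python
from typing import Optional


def decode_next_server(value: str) -> str:
    """Turn the colon-separated hex form from dhcpd.leases into dotted-quad IPv4."""
    octets = [int(part, 16) for part in value.split(":")]
    if len(octets) != 4 or not all(0 <= o <= 255 for o in octets):
        raise ValueError(f"not an IPv4 address: {value!r}")
    return ".".join(str(o) for o in octets)


def host_boot_filename(node: str, server: str,
                       user_class: Optional[str],
                       has_client_arch: bool) -> Optional[str]:
    """Model of the per-host override for a node such as node002.

    Returns None when neither branch matches, i.e. the host block sets no
    filename and (assumed here) the subnet-level choice would apply.
    """
    if user_class == "xNBA":
        # xNBA fetches the node-specific script over HTTP.
        return f"http://{server}/tftpboot/xcat/xnba/nodes/{node}"
    if has_client_arch:
        # The PXE ROM downloads xNBA itself via TFTP from next-server.
        return "xcat/xnba.kpxe"
    return None


if __name__ == "__main__":
    server = decode_next_server("0a:01:01:01")
    print(server)  # 10.1.1.1
    print(host_boot_filename("node002", server, None, True))  # xcat/xnba.kpxe
    print(host_boot_filename("node002", server, "xNBA", True))  # http://10.1.1.1/tftpboot/xcat/xnba/nodes/node002
```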

================================================================================
lsdef node002
================================================================================
Object name: node002
arch=x86_64
chain=boot
chassis=CrayCX01
currchain=boot
currstate=install fedora9-x86_64-compute
groups=all,RackCX01,compute,vnc
initrd=xcat/fedora9/x86_64/initrd.img
installnic=eth0
interface=eth0
kcmdline=nofb utf8 ks=http://10.1.1.1/install/autoinst/node002 ksdevice=eth0 noipv6
kernel=xcat/fedora9/x86_64/vmlinuz
mac=00:30:48:C6:27:F4
mgt=ipmi
netboot=pxe
nfsdir=/install
nfsserver=10.1.1.1
nodetype=osi
ondiscover=boot
os=fedora9
postscripts=syslog,remoteshell,otherpkgs,syncfiles
power=ipmi
primarynic=eth0
profile=compute
rack=Main
room=Tekmira
slot=3
tftpserver=10.1.1.1
xcatmaster=10.1.1.1

================================================================================
tabdump site
================================================================================

#key,value,comments,disable

"xcatdport","3001",,
"xcatiport","3002",,
"tftpdir","/tftpboot",,
"master","10.1.1.1",,
"domain","cluster.priv",,
"installdir","/install",,
"timezone","America/Vancouver",,
"nameservers","10.1.1.1",,
"dhcpinterfaces","eth0",,
"forwarders","172.30.7.195,172.30.7.196",,

================================================================================
Installed packages
================================================================================
xCAT.x86_64 2.3-snap200909011538 installed
xCAT-UI.noarch 4:2.3-snap200908200921 installed
xCAT-client.noarch 4:2.3-snap200909011404 installed
xCAT-nbkernel-ppc64.noarch 1:2.6.18_92-4 installed
xCAT-nbkernel-x86.noarch 1:2.6.18_92-8 installed
xCAT-nbkernel-x86_64.noarch 1:2.6.18_92-8 installed
xCAT-nbroot-core-ppc64.noarch 4:2.3-snap200906231531 installed
xCAT-nbroot-core-ppc64.noarch 4:2.3-snap200909011405 installed
xCAT-nbroot-core-x86.noarch 4:2.3-snap200909011405 installed
xCAT-nbroot-core-x86_64.noarch 4:2.3-snap200909011405 installed
xCAT-nbroot-oss-ppc64.noarch 2.0-snap200801291320 installed
xCAT-nbroot-oss-x86.noarch 2.0-snap200804021050 installed
xCAT-nbroot-oss-x86_64.noarch 2.0-snap200801291344 installed
xCAT-server.noarch 4:2.3-snap200909011405 installed
atftp.x86_64 0.7-5

Discussion

  • Lissa Valletta - 2009-09-03

    Do you have gpxe-xcat-0.9.5-1.noarch and syslinux-xcat-3.82-1.noarch.rpm installed? If so, run makedhcp -n and makedhcp -a, then service dhcpd restart, and try again.

  • Andreas Putz - 2009-09-04

    I found the problem. I had tried to increase the log level of atftpd by modifying the /etc/init.d/tftpd file, adding -v 5 or -v 7 to the atftpd startup options. The correct syntax, however, is --verbose=7. With the wrong syntax, the 5 or the 7 ends up in the path of the boot image; that is where the strange path 5/xcat/xnba.kpxe in the error log comes from.

    With the correct option the pxe boot works perfectly.

    Now I have a new problem with the repositories, but that has to wait until after my holidays ...

    Thanks,
    Andreas

  • SourceForge Robot

    This Tracker item was closed automatically by the system. It was
    previously set to a Pending status, and the original submitter
    did not respond within 14 days (the time period specified by
    the administrator of this Tracker).
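
The root cause in Andreas's reply is an option-parsing detail. The Python model below assumes that atftpd parses -v/--verbose as a GNU getopt_long option with an optional argument (the value is taken only when attached, as in -v7 or --verbose=7) and that it uses the first non-option argument as its base directory, with /tftpboot as the assumed default. Under those assumptions it reproduces the 5/xcat/xnba.kpxe path from the log. The functions are hypothetical illustrations, not atftpd code.

```python
import os


def parse_atftpd_args(argv):
    """Return (verbosity, base_dir) under GNU optional-argument rules (assumed).

    "-v 5" sets verbosity with no value and leaves "5" as a positional
    argument, which then becomes the base directory.
    """
    verbose = None
    positional = []
    for arg in argv:
        if arg.startswith("--verbose"):
            verbose = arg.partition("=")[2] or "default"
        elif arg.startswith("-v"):
            verbose = arg[2:] or "default"  # value only if attached, e.g. -v7
        elif arg.startswith("-"):
            continue  # other options assumed to take no separate value here
        else:
            positional.append(arg)
    base_dir = positional[0] if positional else "/tftpboot"  # assumed default
    return verbose, base_dir


def resolve(argv, requested):
    """Path the server would open for a requested file name."""
    _, base_dir = parse_atftpd_args(argv)
    return os.path.join(base_dir, requested)


if __name__ == "__main__":
    print(resolve(["--daemon", "-v", "5"], "xcat/xnba.kpxe"))  # 5/xcat/xnba.kpxe
    print(resolve(["--daemon", "--verbose=7"], "xcat/xnba.kpxe"))  # /tftpboot/xcat/xnba.kpxe
```

Under this model, -v7 written without a space would behave the same as --verbose=7.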