When I start the installation of a node and boot it via PXE, the node gets the correct IP address via DHCP, but the PXE boot process on the node exits with a "file not found" error. Initially I thought it might be an update problem, so I additionally installed the xnba-undi and xnba-kvm packages. This added the missing file to the /tftpboot directory but did not change the error message.
Thank you very much for your help. A detailed description follows below.
Regards,
Andreas
Detailed error description:
Sep 2 10:11:15 mnode dhcpd: DHCPDISCOVER from 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:15 mnode dhcpd: DHCPOFFER on 10.1.1.2 to 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:17 mnode dhcpd: DHCPREQUEST for 10.1.1.2 (10.1.1.1) from 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:17 mnode dhcpd: DHCPACK on 10.1.1.2 to 00:30:48:c6:27:f4 via eth0
Sep 2 10:11:17 mnode atftpd[19098]: Serving xcat/xnba.kpxe to 10.1.1.2:2070
Sep 2 10:11:17 mnode atftpd[19098]: File 5/xcat/xnba.kpxe not found
Sep 2 10:11:17 mnode atftpd[19098]: Server thread exiting
Sep 2 10:11:17 mnode atftpd[19098]: Serving xcat/xnba.kpxe to 10.1.1.2:2071
Sep 2 10:11:17 mnode atftpd[19098]: File 5/xcat/xnba.kpxe not found
Sep 2 10:11:17 mnode atftpd[19098]: Server thread exiting
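The telling detail in this log is that atftpd is asked for xcat/xnba.kpxe but reports that 5/xcat/xnba.kpxe was not found. As a minimal sketch, assuming only the two atftpd message formats that appear above (other atftpd versions may word them differently), the following Python snippet pairs each "Serving" line with the next "not found" line and prints whatever prefix ended up in front of the requested name:

```python
import re

# Message formats copied from the log above; they are an assumption, not a spec.
SERVING = re.compile(r"atftpd\[\d+\]: Serving (\S+) to (\S+)")
NOT_FOUND = re.compile(r"atftpd\[\d+\]: File (\S+) not found")


def find_path_mismatches(lines):
    """Return (requested, looked_up, prefix) for each failed TFTP request."""
    results = []
    requested = None
    for line in lines:
        m = SERVING.search(line)
        if m:
            requested = m.group(1)
            continue
        m = NOT_FOUND.search(line)
        if m and requested is not None:
            looked_up = m.group(1)
            # Anything in front of the requested name is an unexpected prefix.
            if looked_up.endswith(requested):
                prefix = looked_up[: len(looked_up) - len(requested)]
            else:
                prefix = None
            results.append((requested, looked_up, prefix))
            requested = None
    return results


log = """\
Sep 2 10:11:17 mnode atftpd[19098]: Serving xcat/xnba.kpxe to 10.1.1.2:2070
Sep 2 10:11:17 mnode atftpd[19098]: File 5/xcat/xnba.kpxe not found
Sep 2 10:11:17 mnode atftpd[19098]: Server thread exiting
""".splitlines()

for requested, looked_up, prefix in find_path_mismatches(log):
    print(f"requested {requested!r}, looked up {looked_up!r}, prefix {prefix!r}")
```

For the lines above this reports the prefix '5/', i.e. the problem is in how atftpd builds the path, not in the file name sent by DHCP.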
================================================================================
My /tftpboot directory contains everything I think should be there, including the /tftpboot/xcat/xnba.kpxe file. However, there is one broken symbolic link in the pxelinux.cfg directory:
================================================================================
ls -lahR /tftpboot
================================================================================
/tftpboot:
total 40K
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 .
drwxr-xr-x 28 root root 4.0K 2009-09-01 11:05 ..
drwxrwxrwx 2 root root 4.0K 2009-09-02 09:36 etc
-rwxrwxrwx 1 root root 15K 2009-09-01 14:41 pxelinux.0
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:55 pxelinux.cfg
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 xcat
/tftpboot/etc:
total 24K
drwxrwxrwx 2 root root 4.0K 2009-09-02 09:36 .
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 ..
-rwxrwxrwx 1 root root 116 2009-09-02 09:36 0a01
-rwxrwxrwx 1 root root 118 2009-09-02 09:36 0a0204
-rwxrwxrwx 1 root root 117 2009-09-02 09:36 7f
-rwxrwxrwx 1 root root 121 2009-09-02 09:36 c0a87a
/tftpboot/pxelinux.cfg:
total 24K
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:55 .
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 ..
-rwxrwxrwx 1 root root 108 2009-09-02 09:36 0A01
lrwxrwxrwx 1 root root 7 2009-09-01 14:55 0A010102 -> node002 (broken link: the target does not exist)
-rwxrwxrwx 1 root root 110 2009-09-02 09:36 0A0204
-rwxrwxrwx 1 root root 109 2009-09-02 09:36 7F
-rwxrwxrwx 1 root root 113 2009-09-02 09:36 C0A87A
/tftpboot/xcat:
total 16M
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 .
drwxrwxrwx 5 root root 4.0K 2009-09-02 09:36 ..
drwxrwxrwx 3 root root 4.0K 2009-09-01 14:44 fedora9
-rwxrwxrwx 1 root root 5.5M 2009-09-02 09:36 nbfs.ppc64.gz
-rwxrwxrwx 1 root root 5.2M 2009-09-02 09:36 nbfs.x86_64.gz
-rwxrwxrwx 1 root root 5.0M 2009-09-02 09:36 nbfs.x86.gz
-rwxrwxrwx 1 root root 16K 2009-09-01 14:44 pxelinux.0
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 xnba
-rwxrwxrwx 1 root root 48K 2009-08-24 13:02 xnba.kpxe
/tftpboot/xcat/fedora9:
total 12K
drwxrwxrwx 3 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 ..
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 x86_64
/tftpboot/xcat/fedora9/x86_64:
total 12M
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 3 root root 4.0K 2009-09-01 14:44 ..
-rwxrwxrwx 1 root root 9.5M 2009-09-01 14:55 initrd.img
-rwxrwxrwx 1 root root 2.0M 2009-09-01 14:55 vmlinuz
/tftpboot/xcat/xnba:
total 16K
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 4 root root 4.0K 2009-09-02 09:38 ..
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:41 nets
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 nodes
/tftpboot/xcat/xnba/nets:
total 24K
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:41 .
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 ..
-rwxrwxrwx 1 root root 192 2009-09-02 09:36 10.1.0.0_16
-rwxrwxrwx 1 root root 194 2009-09-02 09:36 10.2.4.0_24
-rwxrwxrwx 1 root root 193 2009-09-02 09:36 127.0.0.0_8
-rwxrwxrwx 1 root root 197 2009-09-02 09:36 192.168.122.0_24
/tftpboot/xcat/xnba/nodes:
total 12K
drwxrwxrwx 2 root root 4.0K 2009-09-01 14:44 .
drwxrwxrwx 4 root root 4.0K 2009-09-01 14:44 ..
-rwxrwxrwx 1 root root 308 2009-09-01 14:55 node002
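The names in pxelinux.cfg are IPv4 addresses written as uppercase hexadecimal, so the broken link 0A010102 belongs to node002 (10.1.1.2), while shorter names such as 0A01, 7F and C0A87A cover whole networks (10.1.x.x, 127.x.x.x, 192.168.122.x). PXELINUX tries the full eight-digit name and then drops one trailing digit at a time before falling back to "default"; it also tries MAC-based names first, which this simplified Python sketch leaves out:

```python
import ipaddress
from typing import List


def hex_ip(addr: str) -> str:
    """IPv4 address as eight uppercase hex digits, e.g. 10.1.1.2 -> 0A010102."""
    return f"{int(ipaddress.IPv4Address(addr)):08X}"


def pxelinux_candidates(addr: str) -> List[str]:
    """IP-based config names in PXELINUX lookup order, ending with 'default'.

    MAC-based names, which PXELINUX tries before these, are omitted.
    """
    full = hex_ip(addr)
    # Full name first, then shorten by one hex digit at a time.
    names = [full[:n] for n in range(len(full), 0, -1)]
    return names + ["default"]


print(hex_ip("10.1.1.2"))
print(pxelinux_candidates("10.1.1.2"))
```

Applied to the listing above, the first IP-based name for node002 is the broken link 0A010102, so presumably the first candidate that can actually be served is 0A01.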
================================================================================
DHCPD configuration
================================================================================
[root@mnode /]# more /etc/dhcpd.conf
authoritative;
option space isan;
option isan-encap-opts code 43 = encapsulate isan;
option isan.iqn code 203 = string;
option isan.root-path code 201 = string;
option space gpxe;
option gpxe-encap-opts code 175 = encapsulate gpxe;
option gpxe.bus-id code 177 = string;
option user-class-identifier code 77 = string;
option gpxe.no-pxedhcp code 176 = unsigned integer 8;
option iscsi-initiator-iqn code 203 = string;
ddns-update-style none;
option client-architecture code 93 = unsigned integer 16;
option gpxe.no-pxedhcp 1;
omapi-port 7911;
key xcat_key {
algorithm hmac-md5;
secret "QWhrQ0JqanFaRXZxbzFnRXljQXlaQTRrMzFVcXVLOU8=";
};
omapi-key xcat_key;
shared-network eth0 {
subnet 10.1.0.0 netmask 255.255.0.0 {
max-lease-time 43200;
min-lease-time 43200;
default-lease-time 43200;
option routers 10.1.1.1;
next-server 10.1.1.1;
option log-servers 10.1.1.1;
option ntp-servers 10.1.1.1;
option domain-name "cluster.priv";
option domain-name-servers 10.1.1.1;
if option user-class-identifier = "xNBA" { #x86, xCAT Network Boot Agent
filename = "http://10.1.1.1/tftpboot/xcat/xnba/nets/10.1.0.0_16";
} else if option client-architecture = 00:00 { #x86
filename "xcat/xnba.kpxe";
} else if option vendor-class-identifier = "Etherboot-5.4" { #x86
filename "xcat/xnba.kpxe";
} else if option client-architecture = 00:02 { #ia64
filename "elilo.efi";
} else if substring(filename,0,1) = null { #otherwise, provide yaboot if the client isn't specific
filename "/yaboot";
}
range dynamic-bootp 10.1.1.20 10.1.1.250;
} # 10.1.0.0/255.255.0.0 subnet_end
} # eth0 nic_end
host node002 {
dynamic;
hardware ethernet 00:30:48:c6:27:f4;
fixed-address 10.1.1.2;
supersede host-name = "node002";
if option user-class-identifier = "xNBA" {
supersede server.filename =
"http://10.1.1.1/tftpboot/xcat/xnba/nodes/node002";
} elsif exists client-architecture {
supersede server.filename = "xcat/xnba.kpxe";
}
supersede server.next-server = 0a:01:01:01;
}
host node003 {
dynamic;
hardware ethernet 00:30:48:c6:3a:50;
fixed-address 10.1.1.3;
supersede host-name = "node003";
if option user-class-identifier = "xNBA" {
supersede server.filename =
"http://10.1.1.1/tftpboot/xcat/xnba/nodes/node003";
} elsif exists client-architecture {
supersede server.filename = "xcat/xnba.kpxe";
}
supersede server.next-server = 0a:01:01:01;
}
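The host blocks implement a two-stage chainload: a plain PXE ROM, which sends option 93 (client-architecture), is handed xcat/xnba.kpxe to fetch over TFTP from next-server 0a:01:01:01 (10.1.1.1 written as hex octets); once xNBA runs, it sends its own DHCP request with user-class "xNBA" and receives an HTTP URL for its per-node script. The first stage is the TFTP download that fails in the log. The Python sketch below only mirrors the decision in the node002/node003 blocks as a reading aid; the function and the options dictionary are hypothetical and say nothing about how dhcpd evaluates its configuration internally:

```python
import ipaddress
from typing import Optional


def host_boot_filename(node: str, options: dict) -> Optional[str]:
    """Mirror the filename logic of the 'host node002' block above.

    `options` is a hypothetical dict of the DHCP options a client sent,
    keyed by the option names used in dhcpd.conf.
    """
    if options.get("user-class-identifier") == "xNBA":
        # Second stage: xNBA is already running and fetches its script via HTTP.
        return f"http://10.1.1.1/tftpboot/xcat/xnba/nodes/{node}"
    if "client-architecture" in options:
        # First stage: the NIC's PXE ROM downloads xNBA itself over TFTP.
        return "xcat/xnba.kpxe"
    # Neither branch matched, so the host block sets no filename.
    return None


# next-server is written as four hex octets in the host blocks.
next_server = ipaddress.IPv4Address(bytes([0x0A, 0x01, 0x01, 0x01]))

pxe_rom = {"client-architecture": 0x0000}
# Including option 93 here shows why the xNBA test has to come first.
xnba = {"user-class-identifier": "xNBA", "client-architecture": 0x0000}

print(next_server)
print(host_boot_filename("node002", pxe_rom))
print(host_boot_filename("node002", xnba))
```

Checking the user class before the architecture matters: if the order were reversed, an xNBA client that also sent option 93 would be handed xnba.kpxe again and loop.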
================================================================================
lsdef node002
================================================================================
Object name: node002
arch=x86_64
chain=boot
chassis=CrayCX01
currchain=boot
currstate=install fedora9-x86_64-compute
groups=all,RackCX01,compute,vnc
initrd=xcat/fedora9/x86_64/initrd.img
installnic=eth0
interface=eth0
kcmdline=nofb utf8 ks=http://10.1.1.1/install/autoinst/node002 ksdevice=eth0 noipv6
kernel=xcat/fedora9/x86_64/vmlinuz
mac=00:30:48:C6:27:F4
mgt=ipmi
netboot=pxe
nfsdir=/install
nfsserver=10.1.1.1
nodetype=osi
ondiscover=boot
os=fedora9
postscripts=syslog,remoteshell,otherpkgs,syncfiles
power=ipmi
primarynic=eth0
profile=compute
rack=Main
room=Tekmira
slot=3
tftpserver=10.1.1.1
xcatmaster=10.1.1.1
================================================================================
tabdump site
================================================================================
"xcatdport","3001",,
"xcatiport","3002",,
"tftpdir","/tftpboot",,
"master","10.1.1.1",,
"domain","cluster.priv",,
"installdir","/install",,
"timezone","America/Vancouver",,
"nameservers","10.1.1.1",,
"dhcpinterfaces","eth0",,
"forwarders","172.30.7.195,172.30.7.196",,
================================================================================
Installed packages
================================================================================
xCAT.x86_64 2.3-snap200909011538 installed
xCAT-UI.noarch 4:2.3-snap200908200921 installed
xCAT-client.noarch 4:2.3-snap200909011404 installed
xCAT-nbkernel-ppc64.noarch 1:2.6.18_92-4 installed
xCAT-nbkernel-x86.noarch 1:2.6.18_92-8 installed
xCAT-nbkernel-x86_64.noarch 1:2.6.18_92-8 installed
xCAT-nbroot-core-ppc64.noarch 4:2.3-snap200906231531 installed
xCAT-nbroot-core-ppc64.noarch 4:2.3-snap200909011405 installed
xCAT-nbroot-core-x86.noarch 4:2.3-snap200909011405 installed
xCAT-nbroot-core-x86_64.noarch 4:2.3-snap200909011405 installed
xCAT-nbroot-oss-ppc64.noarch 2.0-snap200801291320 installed
xCAT-nbroot-oss-x86.noarch 2.0-snap200804021050 installed
xCAT-nbroot-oss-x86_64.noarch 2.0-snap200801291344 installed
xCAT-server.noarch 4:2.3-snap200909011405 installed
atftp.x86_64 0.7-5
Do you have gpxe-xcat-0.9.5-1.noarch and syslinux-xcat-3.82-1.noarch installed? If so, run makedhcp -n and makedhcp -a, then service dhcpd restart, and try again.
I found the problem. I had tried to increase the log level of atftpd by modifying the /etc/init.d/tftpd file: to raise the verbosity, I added -v 5 (or -v 7) to the atftpd startup options. The correct syntax, however, is --verbose=7. With -v 5, the 5 (or 7) is not taken as the log level but is prepended to the path of the boot image. That's where the strange path 5/xcat/xnba.kpxe in the error log comes from.
With the correct option, the PXE boot works perfectly.
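A plausible explanation, not confirmed in this thread: atftpd's verbose option takes an optional value (-v, --verbose[=value]), and with GNU getopt_long an optional value is only recognised when attached, as in -v7 or --verbose=7. A detached 5 is left over as a positional argument, and atftpd accepts the TFTP root directory as a positional argument, so 5 would become a relative root and produce 5/xcat/xnba.kpxe. The Python sketch below is a simplified model of that rule, not atftpd's real parser; the functions, the "first positional is the root" behaviour and the /tftpboot default are assumptions:

```python
import posixpath
from typing import List, Optional, Tuple


def parse(argv: List[str]) -> Tuple[Optional[str], List[str]]:
    """Model getopt_long for one option (-v/--verbose) with an optional value.

    Returns (verbose, positionals): verbose is None if the option is absent,
    "" if it is given without an attached value, otherwise the attached value.
    """
    verbose: Optional[str] = None
    positionals: List[str] = []
    for word in argv:
        if word in ("-v", "--verbose"):
            verbose = ""  # option given, but no value attached
        elif word.startswith("--verbose="):
            verbose = word[len("--verbose="):]
        elif word.startswith("-v"):
            verbose = word[2:]  # attached short form such as -v7
        else:
            positionals.append(word)  # a detached "5" ends up here
    return verbose, positionals


def served_path(argv: List[str], requested: str, default_root: str = "/tftpboot") -> str:
    """Hypothetical model: the first positional argument, if any, is the TFTP root."""
    _, positionals = parse(argv)
    root = positionals[0] if positionals else default_root
    return posixpath.join(root, requested)


print(parse(["-v", "5"]))
print(served_path(["-v", "5"], "xcat/xnba.kpxe"))
print(served_path(["--verbose=7"], "xcat/xnba.kpxe"))
```

Under this model, -v 5 reproduces the 5/xcat/xnba.kpxe path from the log, while --verbose=7 (or the attached short form -v7) keeps the default root.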
Now I have a new problem with the repositories, but that has to wait until after my holidays ...
Thanks,
Andreas
This Tracker item was closed automatically by the system. It was
previously set to a Pending status, and the original submitter
did not respond within 14 days (the time period specified by
the administrator of this Tracker).