The issue seems to be that our PXE phase uses legacy interface naming, which makes the interface names rather unpredictable, and like you said, disconnecting unnecessary Ethernet cables can get around it. If you keep them all connected and film the screen output with a phone or something, you can eventually figure out which interface corresponds to your onboard NIC for PXE booting. Once you figure that out, just go and modify /lib/mkpxeinitrd-net/initrd-skel/etc/netdev.conf and specify that interface....
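If you can get a shell on the client, matching names to MAC addresses may be quicker than filming the screen. This is only a rough sketch: list_nics is a hypothetical helper name, and it assumes the usual Linux sysfs layout under /sys/class/net. It prints one line per interface with its name, MAC address, and driver.

```shell
#!/bin/sh
# Hypothetical helper: list each network interface with its MAC and driver.
# The optional argument is the sysfs net directory (default: /sys/class/net),
# so the function can also be tried against a fake directory tree.
list_nics() {
    net_dir="${1:-/sys/class/net}"
    for dev in "$net_dir"/*; do
        # Skip anything that is not an interface entry (no address file).
        [ -e "$dev/address" ] || continue
        name=$(basename "$dev")
        # The loopback device is never a PXE candidate.
        [ "$name" = "lo" ] && continue
        mac=$(cat "$dev/address")
        driver="unknown"
        if [ -e "$dev/device/driver" ]; then
            driver=$(basename "$(readlink -f "$dev/device/driver")")
        fi
        printf '%s %s %s\n' "$name" "$mac" "$driver"
    done
}
```

Calling list_nics with no arguments reads the live system. The MAC of the onboard NIC can then be compared with whatever the BIOS or the NIC label reports, and the matching name is the one to put in netdev.conf.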
Okay, so going off of my theory: networking seems to be set up during the PXE phase, which uses legacy interface naming, but the OS uses predictable names. On the DRBL Server, I changed /etc/default/grub as follows:
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"
I made sure to follow up with update-grub and rebooted the DRBL Server. This made my DRBL Server stop recognizing the Ethernet interfaces as ens18 or ens19 and use the legacy names eth0 and eth1 instead. I then ran drblsrv -i...
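For anyone who wants to script the grub change, here is a minimal sketch. set_legacy_ifnames is a hypothetical helper name, not part of DRBL or GRUB. It assumes you want the whole GRUB_CMDLINE_LINUX line replaced, so any options already on that line are dropped, and it keeps a .bak copy next to the file.

```shell
#!/bin/sh
# Hypothetical helper: set GRUB_CMDLINE_LINUX to the legacy-naming options.
# Defaults to /etc/default/grub; pass another path to try it on a copy first.
# Note: this replaces the whole line, so existing options on it are lost.
set_legacy_ifnames() {
    grub_file="${1:-/etc/default/grub}"
    new_line='GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"'
    # Keep a backup so the change can be reverted by hand.
    cp "$grub_file" "$grub_file.bak" || return 1
    if grep -q '^GRUB_CMDLINE_LINUX=' "$grub_file"; then
        # Rewrite only the exact variable (not GRUB_CMDLINE_LINUX_DEFAULT).
        sed "s|^GRUB_CMDLINE_LINUX=.*|$new_line|" "$grub_file.bak" > "$grub_file"
    else
        printf '%s\n' "$new_line" >> "$grub_file"
    fi
}
```

On the real server this would need root, followed by update-grub and a reboot as described above. After the reboot, the interfaces should show up as eth0, eth1 and so on rather than ens18 or ens19.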
Hey Steven, that was quite helpful, thank you for providing that. I think part of the problem is that networking seems to be set up during the PXE phase, which uses legacy interface naming, but the OS uses predictable names. I modified pxeinitrd/etc/linuxrc.conf by adding this at the end of the file:
use_lag="yes"
bond_master=bond0
bond_slaves="eth0 eth1 eth2 eth3"
bond_mode=4
bond_miimon=100
bond_xmit_hash="layer2+3"
Then I modified pxeinitrd/init around line 209 by adding this section:
if [ "$use_lag"...
Seems like modifying /lib/mkpxeinitrd-net/initrd-skel/etc/linuxrc.conf doesn't actually do anything. I had to tweak the settings in /sbin/mknic-dbi and then run drblsrv -i, which worked: it was finally making only 1 attempt to get an IP address per interface. But then I started getting a fatal error message immediately after it attempted DHCP on eth0, and it wasn't even trying the other interfaces. Then I remembered - back in the /lib/mkpxeinitrd-net/initrd-skel/linuxrc-or-init file where I tried...
I tried to modify /lib/mkpxeinitrd-net/initrd-skel/etc/linuxrc.conf
# retry max times for udhcp in one ethernet port
# iretry_max="5"
iretry_max="1"
# The time out to wait for NIC to be linked. Unit: 0.1 sec
# link_detect_timeout="70"
link_detect_timeout="10"
I re-ran drblpush -i... then tried to boot the client. But that didn't seem to have any effect: it still tries 5 times for each interface, and each attempt is no faster than it was previously... I changed /lib/mkpxeinitrd-net/initrd-skel/etc/netdev.conf...
Oversimplified setup
DRBL Server = Ubuntu (22.04) VM
WAN IP - 10.0.0.30/16
DRBL IP - 10.10.0.30/16
DRBL Clients (x12) = HP Z220 SFF (each client has 6 NICs total)
1 NIC - onboard
1 NIC - PCI 33/32 card
4 NICs - PCIe 2.0 x1 card
1. Why does the OS request DHCP on all interfaces?
PXE boot itself works fine: the client gets a DHCP lease on eth0, downloads the kernel, and starts booting the OS.
12 Client Nodes: 10.10.0.101 - 10.10.0.112
However, during the OS boot phase, the system seems to reinitialize...
I have a homelab with a cluster of 12 clients (HP Z220 SFF) on which I have successfully set up DRBL with Ubuntu Desktop. This is a fantastic tool, and I am so glad that I found it. One thing I have been experimenting with is running Proxmox from my DRBL server. The reason for this is that, from what I have read, a good Proxmox cluster with Ceph storage should be using 4 disks on each node. I only have space in each client for 3 SSDs, and committing one of them for the Proxmox install drops...