Eric - 2024-12-27

Oversimplified setup

DRBL Server = Ubuntu (22.04) VM

WAN IP - 10.0.0.30/16

DRBL IP - 10.10.0.30/16

DRBL Clients (x12) = HP Z220 SFF (Each client has 6 NICs total)

1 NIC - onboard

1 NIC - PCI 33/32 card

4 NICs - PCIe 2.0 x1 card

1. Why does the OS request DHCP on all interfaces?

PXE boot itself works fine: the client gets a DHCP lease on eth0, downloads the kernel, and starts booting the OS.

12 Client Nodes: 10.10.0.101 - 10.10.0.112

However, during the OS boot phase, the system seems to reinitialize all network interfaces (eth0 through eth5 in my case) and sends a DHCP request on each one - 5 TIMES. This leads to unnecessary retries on unused interfaces, slowing down the boot process.

I'd like to understand:

Why does this happen after PXE has already configured the network via eth0?

How can I prevent/bypass this stage? It makes the boot time take forever...

Previously, on a bare metal installation, I could just set interfaces as "optional".

What I've tried: - (no success)

Modify /lib/mkpxeinitrd-net/initrd-skel/linuxrc-or-init

Tried to reduce the time/number of checks to see if the link is up

Tried to bypass the "Try to up" stage entirely...

2. How can I set up custom network settings for clients?

Before the OS completely boots up, I see references to eth0, eth1, eth2, eth3, eth4, and eth5 (as I addressed ^above^).

After the OS finishes booting up, ifconfig shows me I have eno1, ens4, enp3s0, enp4s0, enp5s0, and enp6s0.

Previously (before I started trying to PXE boot) I was able to set up my network interface(s) with netplan (For example, on Node7):

network:
  ethernets:
    eno1:
      optional: true
    ens4:
      optional: true
    enp3s0:
      optional: true
    enp4s0:
      optional: true
    enp5s0:
      optional: true
    enp6s0:
      optional: true
  bonds:
    bond0:
      interfaces:
        - eno1
        - ens4
        - enp3s0
        - enp4s0
        - enp5s0
        - enp6s0
      parameters:
        mode: 802.3ad
        lacp-rate: fast
        primary: enp3s0
  vlans:
    vlan100:
      id: 100
      link: bond0
    vlan110:
      id: 110
      link: bond0
    vlan120:
      id: 120
      link: bond0
    vlan130:
      id: 130
      link: bond0
    vlan140:
      id: 140
      link: bond0
    vlan150:
      id: 150
      link: bond0
  bridges:
    br100:
      interfaces: [vlan100]
      dhcp4: false
      addresses: [10.0.0.107/16]
      routes:
        - to: default
          via: 10.0.0.1
      nameservers:
        addresses: [1.1.1.1,8.8.8.8]
    br110:
      interfaces: [vlan110]
      addresses: [10.10.0.107/16]
    br120:
      interfaces: [vlan120]
      addresses: [10.20.0.107/27]
    br130:
      interfaces: [vlan130]
      addresses: [10.30.0.107/27]
    br140:
      interfaces: [vlan140]
      addresses: [10.40.0.107/27]
    br150:
      interfaces: [vlan150]
      addresses: [10.50.0.107/27]
  version: 2

For reference, all the VLANs/LAGs/Trunking/etc - is just me trying to learn... and cause headaches for myself.
> * VLAN100 - Management Traffic
> * VLAN120 - Storage Traffic
> * VLAN130 - Application Traffic
> * etc...

Ultimately, I'd like to have

A LAG group/Bond - at least on the 4-NIC card.

The ability to Trunk VLANs on this LAG

Move the PXE-IP address (VLAN110) for this client onto the LAG.

What I've tried: - (no success)

Modify /tftpboot/nodes/10.10.0.101/etc/network/interfaces

Set it up the same way I would with netplan on a bare metal installation

When the client node boots up, I can see this file - but it hasn't been applied.

I tried "netplan apply", but that account wasn't part of the sudoers file

Create an rc.local file: /tftpboot/nodes/10.10.0.101/etc/rc.local

#!/bin/bash
/usr/sbin/netplan apply
exit 0

None of it has worked so far...

DRBL_DHCP_Requests_Screenshot.jpg

Eric - 2024-12-27

I tried to modify /lib/mkpxeinitrd-net/initrd-skel/etc/linuxrc.conf

# retry max times for udhcp in one ethernet port
# iretry_max="5"
iretry_max="1"
# The time out to wait for NIC to be linked. Unit: 0.1 sec
# link_detect_timeout="70"
link_detect_timeout="10"

I re-ran drblpush -i... then tried to boot the client. But that didn't seem to have any effect. - it still tries 5 times for each interface, and each attempt is no faster than it was previously...

I changed /lib/mkpxeinitrd-net/initrd-skel/etc/netdev.conf

netdevices="eth0"

I ran drblpush -i again, but it still wants to attempt leasing an IP address on each interface.

I also modified /etc/drbl/drblpush.conf

continue_with_one_port=yes

then ran drblpush -c /etc/drbl/drblpush.conf but that didn't have much effect either.

Last edit: Eric 2024-12-27

Seems like modifying /lib/mkpxeinitrd-net/initrd-skel/etc/linuxrc.conf doesn't actually do anything.
I had to tweak the settings in /sbin/mknic-dbi and then run drblsrv -i

Which worked, finally it was only making 1 attempt to get an IP address per interface.
But now I started getting a fatal error message immediately after attempting to get DHCP on eth0, it wasn't even trying the other interfaces.

Then I remembered - back in the /lib/mkpxeinitrd-net/initrd-skel/linuxrc-or-init file where I tried to initially bypass this whole step - this comment:

IF the netdevices is not assign in /etc/netdev.conf, we use get-nic-devs to detect the network devices and do dhcpc request each one.

Which is just another way to say "We are either going to check {just the devices in /etc/netdev.conf - and ONLY those devices, in that order}, or we are going to check them all"

I set /etc/netdev.conf back to:

netdevices=""

followed by running drblsrv -i, and then spun up a client while filming the screen output.

Now I was back to checking EACH interface (at least it wasn't failing at eth0 anymore) - but it was also only making one attempt per interface. - Good.

I had to film it a few times because the screen rolled by so fast, but eventually I saw that eth5 was getting an IP (it was actually skipping eth4 altogether - but I'm just going to ignore that for now...) and that's when we finally got the root filesystem mounted and the OS fully booted.

So I set /etc/netdev.conf to:

netdevices="eth5"

followed by running drblsrv -i, and then spun up a client - which got to the login screen almost immediately.

I put /sbin/mknic-dbi back to its default of 5 attempts, and it still boots up just as fast.

I also noticed I forgot to plug a cable into the PCI 33/32 card - which I suspect is why eth4 got skipped altogether.

Which leads me to infer - that the interfaces get defined "backwards" to what I was expecting:

eth#	port_i_assumed	port_it_actually_was
`eth0`	Onboard NIC `eno1`	PCIe 2.0 x1 `enp?s0`
`eth1`	PCI 33/32 `ens4`	PCIe 2.0 x1 `enp?s0`
`eth2`	PCIe 2.0 x1 `enp3s0`	PCIe 2.0 x1 `enp?s0`
`eth3`	PCIe 2.0 x1 `enp4s0`	PCIe 2.0 x1 `enp?s0`
`eth4`	PCIe 2.0 x1 `enp5s0`	PCI 33/32 `ens4`
`eth5`	PCIe 2.0 x1 `enp6s0`	Onboard NIC `eno1`
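
A mapping like this can be checked rather than inferred by comparing MAC addresses, since the MAC stays the same when an interface is renamed from ethN to its predictable name. Below is a minimal sketch, assuming a POSIX shell with readlink and basename available; list_nics is a hypothetical helper name, not something DRBL ships. It prints each interface's name, MAC address, and the PCI address of the device behind it, as read from /sys/class/net.

```shell
# Hypothetical helper, not part of DRBL.
# Usage: list_nics [SYSFS_NET_DIR]   (defaults to /sys/class/net)
list_nics() {
  net="${1:-/sys/class/net}"
  for dev in "$net"/*; do
    # Skip entries that are not interfaces (e.g. the bonding_masters file).
    [ -e "$dev/address" ] || continue
    name="${dev##*/}"
    [ "$name" = "lo" ] && continue
    mac="$(cat "$dev/address")"
    # "device" is a symlink to the PCI device; virtual interfaces lack it.
    if [ -e "$dev/device" ]; then
      bus="$(basename "$(readlink -f "$dev/device")")"
    else
      bus="virtual"
    fi
    printf '%s\t%s\t%s\n' "$name" "$mac" "$bus"
  done
}
```

Running it once during the initrd stage (for example, called from the init script) and once after login gives two lists that can be joined on the MAC column.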

Hopefully someone can confirm this and/or explain it a bit better.

...Still trying to set up custom network settings for clients

So if anyone has a suggestion for that - I'm all ears..

The initrd for your PXE client is actually the file on the server. Here we take this one as an example:
/tftpboot/nbi_img/initrd-pxe.6.1.0-25-amd64.img
You can unpack it with unmkinitramfs first:
sudo unmkinitramfs /tftpboot/nbi_img/initrd-pxe.6.1.0-25-amd64.img pxeinitrd
Then you will have a dir "pxeinitrd" which contains all of the files.
Then you can edit the file "pxeinitrd/etc/linuxrc.conf".
Once you have done that, you can generate the initrd for your PXE client again:
1. cd pxeinitrd/
2. find . | cpio --quiet -o -H newc | pigz -9 > /tftpboot/nbi_img/initrd-pxe.6.1.0-25-amd64.img

Steven
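
Before repacking, it can help to confirm which values the served initrd really contains. The sketch below is a hedged illustration, not part of DRBL: initrd_root and conf_value are hypothetical helper names. It assumes the documented unmkinitramfs behavior of writing the tree under main/ (next to early*/) when the image has microcode prepended, and a plain tree otherwise; which case applies to a DRBL-built image is something to check on your own server.

```shell
# Hypothetical helpers, not part of DRBL.

# Print the directory that holds the real initrd tree.
# Usage: initrd_root UNPACK_DIR
initrd_root() {
  if [ -d "$1/main" ]; then
    echo "$1/main"
  else
    echo "$1"
  fi
}

# Print the last active (uncommented) value of a setting in a conf file,
# with surrounding double quotes removed.
# Usage: conf_value FILE VARIABLE
conf_value() {
  sed -n "s/^[[:space:]]*$2=//p" "$1" | tail -n 1 | tr -d '"'
}

# Example (run on the DRBL server after unpacking into ./pxeinitrd):
#   conf_value "$(initrd_root pxeinitrd)/etc/linuxrc.conf" iretry_max
```

If the value printed here differs from what was edited in the initrd-skel directory, the change never reached the image the clients download. When repacking, the find/cpio step should be run from the directory that initrd_root prints.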

Hey Steven, that was quite helpful, thank you for providing that.

I think part of the problem is that networking seems to be set up during the PXE phase which uses legacy interface naming, but the OS uses Predictable names.

I modified pxeinitrd/etc/linuxrc.conf by adding this at the end of the file:

use_lag="yes"
bond_master=bond0
bond_slaves="eth0 eth1 eth2 eth3"
bond_mode=4
bond_miimon=100
bond_xmit_hash="layer2+3"

Then I modified pxeinitrd/init around line 209 by adding this section:

if [ "$use_lag" = "yes" ]; then
  $echo "Creating $bond_master using pre-defined slaves: $bond_slaves..."
  # Load bonding module
  modprobe bonding

  # Create the bond interface
  echo +$bond_master > /sys/class/net/bonding_masters

  # Add slave interfaces
  ifenslave $bond_master $bond_slaves

  # Configure bonding mode (802.3ad for LACP)
  echo $bond_mode > /sys/class/net/$bond_master/bonding/mode
  echo $bond_miimon > /sys/class/net/$bond_master/bonding/miimon
  echo $bond_xmit_hash > /sys/class/net/$bond_master/bonding/xmit_hash_policy

  # Bring up the bond interface
  ifconfig $bond_master up
fi
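
One thing to watch with a block like this is ordering. The bonding driver only accepts a mode change while the bond is down and has no slaves, and it refuses to enslave an interface that is still up. The enslave step also fails if ifenslave is not present in the initrd, which is an assumption worth checking. Below is a minimal sketch of the same steps done through the bonding sysfs files in that order. bond_setup, sysfs_write, run, DRY_RUN, and SYSFS_NET are hypothetical names introduced for illustration; with DRY_RUN=1 the function only prints the commands so they can be reviewed first.

```shell
# Hypothetical helpers, not part of DRBL.
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Write a value to a sysfs file, or just print the command when DRY_RUN=1.
sysfs_write() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "echo $1 > $2"
  else
    echo "$1" > "$2"
  fi
}

# Run a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Usage: bond_setup MASTER MODE MIIMON XMIT_HASH SLAVE...
bond_setup() {
  master="$1"; mode="$2"; miimon="$3"; hash="$4"
  shift 4
  run modprobe bonding
  # Loading the module may already have created the bond; only add it if missing.
  [ -d "$SYSFS_NET/$master" ] || sysfs_write "+$master" "$SYSFS_NET/bonding_masters"
  # Set options first: the mode cannot change once slaves are attached.
  sysfs_write "$mode" "$SYSFS_NET/$master/bonding/mode"
  sysfs_write "$miimon" "$SYSFS_NET/$master/bonding/miimon"
  sysfs_write "$hash" "$SYSFS_NET/$master/bonding/xmit_hash_policy"
  for slave in "$@"; do
    # An interface must be down before it can be enslaved.
    run ifconfig "$slave" down
    sysfs_write "+$slave" "$SYSFS_NET/$master/bonding/slaves"
  done
  run ifconfig "$master" up
}

# Example: DRY_RUN=1 bond_setup bond0 802.3ad 100 layer2+3 eth0 eth1 eth2 eth3
```

Note that taking down the interface that carries the NFS root would hang the client, so the slave list should leave out whichever ethN the client actually boots from, and the switch ports need LACP configured for 802.3ad to come up.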

Then I ran find . | cpio --quiet -o -H newc | pigz -9 > /tftpboot/nbi_img/initrd-pxe.6.8.0-50-generic.img per your instructions.

After booting up a client and logging in, I open a terminal and check ip a, which shows that bond0 exists:

pxeadmin@Node01:~$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: enp3s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:e0:4c:68:f2:50 brd ff:ff:ff:ff:ff:ff
3: enp4s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:e0:4c:68:f2:51 brd ff:ff:ff:ff:ff:ff
4: enp5s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:e0:4c:68:f2:52 brd ff:ff:ff:ff:ff:ff
5: enp6s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 00:e0:4c:68:f2:53 brd ff:ff:ff:ff:ff:ff
6: ens4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 46:61:29:19:76:63 brd ff:ff:ff:ff:ff:ff
    altname enp7s0
7: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 74:46:a0:b9:80:3f brd ff:ff:ff:ff:ff:ff
    altname enp0s25
    inet 10.10.0.101/16 brd 10.10.255.255 scope global eno1
       valid_lft forever preferred_lft forever
    inet6 fe80::7646:a0ff:feb9:803f/64 scope link
       valid_lft forever preferred_lft forever
8: bond0: <NO-CARRIER,BROADCAST,MULTICAST,MASTER,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
    link/ether 4a:be:d6:e6:10:18 brd ff:ff:ff:ff:ff:ff

And when I check cat /proc/net/bonding/bond0, I don't see any interfaces attached to bond0:

pxeadmin@Node01:~$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v6.8.0-50-generic

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer2+3 (2)
MII Status: down
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

802.3ad info
LACP active: on
LACP rate: slow
Min links: 0
Aggregator selection policy (ad_select): stable

Eric - 2024-12-30

Okay, so going off of my theory:

networking seems to be set up during the PXE phase - which uses legacy interface naming, but the OS uses Predictable names.

On the DRBL Server, I changed /etc/default/grub as follows:

GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0"

I made sure to follow up with update-grub and rebooted the DRBL Server.
This made my DRBL Server stop recognizing the ethernet interfaces as ens18 or ens19, and instead use the legacy interface names eth0 and eth1.

I then ran drblsrv -i and drblpush -i again.

When I checked /tftpboot/nodes/10.10.0.101/etc/default/grub I could see that the change I made in /etc/default/grub had carried over to the grub for this node.

I also added net.ifnames=0 biosdevname=0 to the append line of /tftpboot/nbi_img/pxelinux.cfg/default for good measure...

It didn't work....

When the client boots up, it's still showing interfaces as eno1, ens4, enp3s0, enp4s0, enp5s0, and enp6s0 - i.e. the predictable naming scheme...
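
One thing worth checking at this point is whether the parameters reached the client's kernel at all, since the client takes its command line from whichever network boot config it actually loaded. A minimal sketch, assuming tr and grep are available on the client; has_karg is a hypothetical helper name:

```shell
# Hypothetical helper, not part of DRBL.
# Succeeds if PARAM appears as a whole word on the kernel command line.
# Usage: has_karg PARAM [CMDLINE_FILE]   (defaults to /proc/cmdline)
has_karg() {
  tr ' ' '\n' < "${2:-/proc/cmdline}" | grep -qxF -- "$1"
}

# Example (run on a booted client):
#   if has_karg net.ifnames=0; then echo "present"; else echo "missing"; fi
```

If the parameter is missing, the edit did not land in the config the client booted from. If it is present and the names are still predictable, something later in the boot is doing the renaming.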

Jazz Yao-Tsung Wang - 2024-12-29

Hi Eric,

Did you connect all NICs ( 6 NICs * 12 DRBL Client & 1 LAN NIC of DRBL Server ) to the same Network Switch/Hub?

Let me try to answer each question in different threads.

Q1. Why does the OS request DHCP on all interfaces?
I'd like to understand:
- Why does this happen after PXE has already configured the network via eth0?

Quick answer: PXE only configures 1 NIC at a time.

[PXE Phase] If you don't disable the PXE boot option on the other 5 NICs, those 5 NICs will also try DHCP and request the PXE boot kernel image.

[OS Boot Phase] (This relates to your 2nd question, 'How can I set up custom network settings for clients?') The Linux kernel first loads the NIC modules, so you first see eth*N* device names, which are then renamed to en*N* (following the configuration listed in /tftpboot/nodes/10.10.0.101/etc/network/interfaces).

I understand why this is confusing, especially with more than 2 NICs.

I tried something similar before, when I wanted to do network bonding of multiple NICs.
I had to connect the 12 DRBL Client onboard NICs and the DRBL Server LAN NIC to 1 switch.

How can I prevent/bypass this stage? It makes the boot time take forever...

Simple tricks:
1. I left the other 5 DRBL Client NICs disconnected at the very beginning to reduce PXE and OS boot time.
2. I only enabled PXE boot on the onboard NIC and disabled PXE boot on the other 5 NICs.
3. I collected the MAC addresses of those 12 DRBL Client NICs. This prevents unknown NICs from getting a DHCP address from the DRBL server. (Ref: https://drbl.org/fine-print.php?path=./faq/2_System/59_add_new_mac_address.faq#59_add_new_mac_address.faq )
- Eric - 2024-12-30
  
  The issue seems to be that our PXE phase uses legacy interface naming which makes the interfaces rather unpredictable, and like you said - disconnecting unnecessary ethernet cables can get around it.
  
  If you keep them all connected, and film the screen output with a phone or something - you can eventually figure out which interface corresponds to your onboard NIC for PXE booting.
  
  Once you figure that out, just go and modify /lib/mkpxeinitrd-net/initrd-skel/etc/netdev.conf and specify that interface.
  
  For example - in my setup I changed it to:
  
  netdevices="eth5"
  
  This seems to tell the init routine -
  
  "don't bother checking other ports, we want to use eth5"
  
  Don't forget to run drblsrv -i after you change this.
  Hope that helps - or keeps you from having to plug/unplug anytime your system boots.
  
  Also,
  
  Did you ever figure out how to get your network bonding set up?
  

Jazz Yao-Tsung Wang - 2024-12-30

The issue seems to be that our PXE phase uses legacy interface naming which makes the interfaces rather unpredictable,

You're right. Based on our experiments in 2009, it's the initrd-pxe that initializes all network devices (that's also when you see eth0 ... eth5). Your question reminds me that we tried to create eth0:1 for virtualization purposes (Xen hypervisor) by modifying the initrd-pxe boot script.

If you would like to know the booting steps of DRBL, here is a simplified animation:
https://www.slideshare.net/jazzwang/08-06-24-drbl-introduction

Did you ever figure out how to get your network bonding set up?

Unfortunately, it was back in 2009~2010 that we tried to get network bonding working.
As Ubuntu has evolved since then, I can't reproduce the steps now (2024).
(Ubuntu switched from Network Manager to netplan a few years ago.)

I reviewed Ubuntu wiki
https://help.ubuntu.com/community/UbuntuBonding

I believe that you will need to make sure these kernel modules are loaded:

loop lp rtc bonding

From what I can recall now,
in the one case where we successfully implemented network bonding, in 2013,
we used CentOS 7 (note: with a local installation, not a diskless solution).

If you're doing research, I recommend digging into the shell script that creates initrd-pxe.

Custom Client Network Settings - AND - How to stop OS from requesting DHCP on every interface during boot