Hi.
I'm booting Clonezilla on a Gigabyte Z87N-WIFI board with 2 NICs (Atheros and Intel, both supported by the kernel). I'm currently testing with the latest versions, 20180423-bionic and 2.5.5-41.
The Debian-based version sporadically fails to load filesystem.squashfs (about 10% of boots with an earlier build, 2 of 6 boots with the current build); the log messages are in the attachment.
The Ubuntu-based version apparently never boots on this system (5 attempts, no success). The error message is basically the same; only the URL differs.
Is there something I can collect or debug in early userspace to find what is causing the issue? Given the behaviour, I'm guessing that either the "other" NIC coming up is misinterpreted as the forced NIC coming up (the forced NIC may sometimes be slower), or the IPv6 link-local address assignment is misinterpreted as "address configuration done" and the download is attempted before the IPv4 address from DHCP is assigned.
Hm, found the problem.
Regular (non-alternative) Clonezilla: /lib/live/boot/9990-networking.sh, line 134 compares $IPV4ADDR to "0.0.0.0" and considers configuration done if it doesn't match. In the failing case the variable is empty, so it doesn't match "0.0.0.0" either and configuration is wrongly treated as done.
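To illustrate the failure mode, here is a minimal sketch of a stricter check. It is not the code from 9990-networking.sh and not the attached patch; `ipv4_is_configured` is a hypothetical helper name, and the only assumption is that an empty address has to be treated the same way as "0.0.0.0".

```shell
# Hypothetical helper for illustration; not the code from 9990-networking.sh
# and not the attached patch.
ipv4_is_configured() {
    addr="$1"
    # An empty value means no address was obtained, so it must count as
    # "not configured", just like the 0.0.0.0 placeholder.
    [ -n "$addr" ] && [ "$addr" != "0.0.0.0" ]
}

# A comparison against "0.0.0.0" alone lets an empty value through:
IPV4ADDR=""
if [ "$IPV4ADDR" != "0.0.0.0" ]; then
    echo "naive check: treated as configured"
fi
if ! ipv4_is_configured "$IPV4ADDR"; then
    echo "stricter check: still waiting for an address"
fi
```

With a check of this kind, a retry loop around the DHCP request would keep going until a real address shows up instead of stopping on an empty one.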
Attaching the full debug log and a patch (not tested yet, as I'm still figuring out how to rebuild the initrd...)
Full debug log.
Patch. (not tested yet, I'm figuring out how to rebuild the initrd)
EDIT: Tested just now, works as expected (DHCP configuration actually loops as intended).
I'm not sure whether the scripts are supposed to wait for the network link to come up. It appears that the first query fails because the link isn't up yet.
EDIT 2: Indeed, having looked at 9990-select-eth-device.sh, the branch that is used when live-netdev is specified doesn't wait for carrier. I'll send a patch for that later.
Last edit: Michal Zatloukal 2018-05-09
Thanks. Once we receive your other patch, we will try to apply it in a future release. Thanks for your contribution.
Steven
I looked into this and I'm a bit confused as to what is happening.
Waiting for carrier before returning from Select_eth_device reveals that the interfaces are still down (administratively, as after "ip link set ethX down") when this function returns.
I suspect the above is incorrect, as one of the branches in Select_eth_device checks for carrier to pick from multiple interfaces. I don't see anything that would bring the interface up, so it shouldn't work, but apparently it does? (I recall booting successfully from eth1 without the live-netdev parameter when eth0 was disconnected.)
Any hints?
EDIT: Here's a patch that "works for what I'm quickly able to test", but I'm not sure if it's the right approach.
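For reference, a rough sketch of the "wait for carrier" idea. This is not the patch itself. `wait_for_carrier` and its parameters are hypothetical names, and the third argument exists only so that the sketch can be tried against a fake directory instead of the real /sys/class/net.

```shell
# Hypothetical helper; a sketch, not the code from 9990-select-eth-device.sh.
# Usage: wait_for_carrier DEVICE [TRIES] [SYSFS_ROOT]
wait_for_carrier() {
    dev="$1"
    tries="${2:-10}"
    sysfs="${3:-/sys/class/net}"   # overridable only for testing the sketch
    while [ "$tries" -gt 0 ]; do
        # Reading the carrier file fails while the interface is
        # administratively down, so errors are silenced and treated
        # as "no carrier yet".
        if [ "$(cat "$sysfs/$dev/carrier" 2>/dev/null)" = "1" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 1
    done
    return 1
}
```

The interface has to be brought up first (for example with `ip link set dev eth0 up`), otherwise the carrier state cannot be read at all and the loop above will simply time out.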
Last edit: Michal Zatloukal 2018-05-16
Michal,
These patches have been applied in Clonezilla live >= 2.5.6-6 and 20180601-*.
Could you please give it a try and let us know the results?
Thanks.
Steven
I picked 20180601-cosmic, and the boot fails with 'unknown option "o"' from wget in both cases: with an explicit netdev and both NICs connected, and without the netdev parameter, with eth0 disconnected and eth1 used to access filesystem.squashfs. The logs before that point suggest that network detection and configuration are working properly: in each case the script enables the NIC and waits for carrier, and dhclient finishes instantaneously.
I also tried the implicit boot with bionic and with 2.5.6-6. Same result: all fail with the wget error message.
Sorry, I found this issue two days ago, and it's fixed now. Newer testing versions of Clonezilla live have been uploaded:
https://clonezilla.org/downloads.php
Please give it a try. Thanks.
Steven
Thanks, that version (2.5.6-7) boots OK in both cases.
Last edit: Michal Zatloukal 2018-06-07