From: Marc G. <gr...@at...> - 2009-01-21 12:29:28
Hi Gordan,

On Wednesday 21 January 2009 02:52:57 Gordan Bobic wrote:
> Hi,
>
> It would appear that
> /opt/atix/comoonics-bootimage/boot-scripts/etc/rhel5/hardware-lib.sh has
> gone through a few changes in the past few months, which, unfortunately,
> break it for me.
>
> The problem is in the ordering of the detected NICs. On one of my
> systems I have a dual e1000 built into the mobo, and an e100 as an
> add-in card. /etc/modprobe.conf lists eth0 and eth1 as the e1000s, and
> eth2 as e100. This works fine with hardware-lib.sh v1.5, but with v1.7
> the ordering seems to be both unstable (about 1/10 of the time it'll
> actually get the NIC ordering as expected and specified in cluster.conf
> and the rest of the time it'll do something different) and inconsistent
> with what is in cluster.conf and modprobe.conf.

That's strange. I see the same problem you describe on one of my
clusters: one time everything works and the next time it doesn't. All
my other clusters work, though. I changed the hardware detection for
rhel5 because it didn't work for VMs (especially kvm), and I haven't
found problems on any of the other clusters (except that one of mine
and yours). I think I have to look deeper into this.

So what you're saying is that if you just change hardware-lib.sh from
1.7 back to 1.5, everything works fine? I thought the problem was due
to the order in which udevd and kudzu/modprobe eth* are called, since
that's what I changed. Older versions first called kudzu, then probed
for the NICs, and then started udevd. Now I first start udevd, then
(if appropriate) kudzu, and then probe for the NICs.

I always thought the order was the cause. But if the new order works
with hardware-lib.sh v1.5 and not with v1.7, it can't be the order,
because the order is defined by linuxrc.generic.sh. Can you confirm
that it's only the version of hardware-lib.sh?
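To check on a given boot whether the detected NICs match what /etc/modprobe.conf asks for, something like the following could help. It is only a sketch and not code from hardware-lib.sh: the function name expected_nic_order is made up for illustration, and it assumes the usual "alias ethN driver" lines.

```shell
#!/bin/sh
# Hypothetical helper, not part of comoonics-bootimage.
# Reads modprobe.conf-style text (files given as arguments, or stdin) and
# prints "ethN driver" pairs sorted by interface number.
expected_nic_order() {
    awk '$1 == "alias" && $2 ~ /^eth[0-9]+$/ {
             # prefix the numeric index so sort -n orders eth10 after eth2
             print substr($2, 4), $2, $3
         }' "$@" | sort -n | cut -d' ' -f2-
}
```

Called as `expected_nic_order /etc/modprobe.conf`, it lists the expected interface/driver pairs in interface order. These can then be compared by hand with the driver each interface actually got, for example via `readlink /sys/class/net/eth0/device/driver`.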
>
> The last version that works for me is v1.5, and the latest released
> version (I'm talking about CVS version numbers here) appears to be v1.7
> for this file (in the comoonics-bootimage-1.3-40.noarch.rpm release).
>
> Needless to say, trying to boot off an iSCSI shared root with the NIC
> not starting because eth designation doesn't match the MAC doesn't get
> very far. :-/

Very needless. It's the same for non-iSCSI clusters ;-). So this needs
to be fixed.

>
> On a separate note, would it perhaps be a good idea to also have an
> updinitrd script? After a few versions of the clustering tools and OSR
> tools, it's impossible to tell what bugs could be introduced that break
> things. Granted, indiscriminately doing "yum update" is a bad idea, but
> it happens to the best of us that we miss something that we really ought
> to exclude. But what could be done instead is to have an updinitrd
> script that opens the current initrd and just modifies the handful of
> files that need changing (e.g. adding a service to cluster.conf) before
> re-cpio-ing it. Any thoughts on this idea? I know that in the ideal
> world it shouldn't be needed, but this is exactly what I ended up having
> to do yesterday because new initrds just wouldn't boot (there was an
> additional problem _just_ before mounting the GFS off DRBD where it
> fails and drops to the prompt - I haven't gotten to the bottom of that
> one yet). Interestingly, the latest tools work just fine for GlusterFS.

That's it. It works for most clusters, but some have problems, so I
need to look into this further. The updateinitrd question has already
been answered by Reiner.

Thanks and sorry about that ugly bug.

-- 
Gruss / Regards,
Marc Grimme
http://www.atix.de/ http://www.open-sharedroot.org/
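The updinitrd idea quoted above (open the current initrd, change a handful of files, re-cpio it) could look roughly like the sketch below. This is not an existing comoonics tool. The function name update_initrd_file and its arguments are hypothetical, and it assumes the initrd is a gzip-compressed cpio archive in newc format.

```shell
#!/bin/sh
# Hypothetical sketch of an "updinitrd"-style helper (not an existing tool).
# Usage: update_initrd_file INITRD PATH_IN_INITRD NEW_FILE OUTPUT
# Assumes a gzip-compressed cpio (newc) initrd. Run as root on a real initrd
# so that device nodes and ownership survive the round trip.
update_initrd_file() {
    initrd=$1 relpath=${2#/} newfile=$3 out=$4
    # Make paths absolute, because the work happens inside a temp directory.
    case $initrd in /*) ;; *) initrd=$PWD/$initrd ;; esac
    case $newfile in /*) ;; *) newfile=$PWD/$newfile ;; esac
    case $out in /*) ;; *) out=$PWD/$out ;; esac

    workdir=$(mktemp -d) || return 1
    (
        cd "$workdir" || exit 1
        # Unpack the current initrd.
        gzip -dc "$initrd" | cpio -idm || exit 1
        # Replace just the one file that needs changing.
        mkdir -p "$(dirname "$relpath")" && cp "$newfile" "$relpath" || exit 1
        # Re-cpio and compress into a new image; the original stays untouched.
        find . | cpio -o -H newc | gzip -9 > "$out"
    )
    status=$?
    rm -rf "$workdir"
    return $status
}
```

A call might look like `update_initrd_file OLD-INITRD etc/cluster.conf /etc/cluster/cluster.conf NEW-INITRD`, where the location of cluster.conf inside the initrd is an assumption. Writing to a new image rather than overwriting the old one keeps a known-good initrd around to boot from.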