From: Matt R. <ma...@ql...> - 2007-12-18 11:04:30
Brandon Allhands wrote:
> CentOS does enable the firewall by default, but I always make sure to
> turn it off immediately after installation (with SELinux as well). The
> image the QRM images were cloned from also had iptables disabled.
>
> What made this problem odd was the fact that it seemed almost random.
> While it would consistently flake on the same blade, the blade right
> next to it would boot the environment just fine. (The blades I was
> having problems with were 14, 11, 9 and sometimes 12; 13, 10 and 7
> booted every single time, regardless of what I did.) After enabling
> TCP, all of the "trouble" nodes booted the environment perfectly fine
> (it's all a shared image).
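For reference, on a CentOS of that era the firewall/SELinux shutdown
Brandon describes (and the "is iptables running?" check Thomas asks
about further down) looks roughly like this:

    # check whether the firewall is active:
    /sbin/service iptables status
    /sbin/iptables -L -n

    # turn it off now and keep it off across reboots:
    /sbin/service iptables stop
    /sbin/chkconfig iptables off

    # SELinux: switch to permissive mode for the running system ...
    /usr/sbin/setenforce 0
    # ... and disable it for good by setting SELINUX=disabled
    # in /etc/selinux/config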
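The UDP-to-TCP switch itself comes down to the NFS mount protocol
option; a minimal example, with the server name and export path as
placeholders:

    # the UDP mount the nodes were using:
    mount -t nfs -o proto=udp,nfsvers=3 nfs-server:/srv/qrm-image /mnt/image

    # the TCP variant that made the "trouble" blades boot reliably:
    mount -t nfs -o proto=tcp,nfsvers=3 nfs-server:/srv/qrm-image /mnt/image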
Ok, here is the info I was looking for about shared vs. non-shared. (I
should read the latest mails first ;))

What I would like to recommend for your testing is to make the fs-image
non-shared and see if "it" still happens on some of your nodes. If it
still does happen, then I would check the hardware/network differences
between the blades.

If not, make the fs-image shared again, make the VE a multi-server, but
only start it on the nodes one by one -> that means boot the VE on every
single node separately. If "it" happens again, I would again check the
hw/net differences. If not, you may have detected a race condition when
creating the private directories (not sure that there is a race
condition, just a guess).

Then I would try to start the VE on a single node and scale it up node
per node to see if it works ok when running sequentially (a rough
sketch of this follows below).

... all just ideas for testing ;)

Hope it helps, thanks again + all the best,

Matt

> I think once or twice 12 booted it fine, and then after a reboot would
> lock again.
>
> I am interested in figuring this out, and will most likely play with
> it after I get back inside and eat dinner. I will definitely post the
> results after I play with it. Nothing in a computer is truly "random",
> so something is there messing with it. I really think it's the
> balancer (it's an older LDIR, which is supposed to be completely
> transparent to normal traffic, and only changes the MAC in the header
> for balanced connections).
>
> I REALLY need to use UDP. Since the NFS machines are set up in HA, TCP
> will cause problems in a failback scenario. If it's the balancer, then
> I have a much bigger problem, so let's hope it's the mapping.
>
> Brandon
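If the LocalDirector's MAC rewriting is the suspect, one way to check
from a "trouble" blade is to capture the link-level headers of the NFS
traffic and compare the source MACs against what the node has cached
(the interface name is an assumption, adjust it to the blade's NIC):

    # show ethernet (MAC) headers for NFS traffic:
    tcpdump -i eth0 -e -n port 2049

    # which MAC does the node currently hold for the NFS server?
    arp -n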
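And picking up Matt's node-per-node suggestion above, a minimal sketch
of the sequential test, assuming a placeholder start_ve_on_node that
you would replace with the real openQRM start action (GUI or CLI):

    #!/bin/sh
    # boot the VE on one blade at a time, trouble nodes included:
    for blade in 7 9 10 11 12 13 14; do
        echo "starting VE on blade $blade"
        start_ve_on_node "blade$blade"  # hypothetical helper, not a real openQRM command
        sleep 120                       # let each boot settle before starting the next
    done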
> Thomas Halinka wrote:
>> On Monday, 2007-12-17 at 19:53 -0500, Brandon Allhands wrote:
>>
>>> Ok. For grins I will take the map option out (I am pretty sure it
>>> was added after this problem started) and try with UDP again. I
>>> really want to know exactly what was causing this, for future
>>> reference.
>>>
>> Yep, that would be interesting ;)
>>
>>> Brandon
>>>
>>> PS: There isn't a firewall anywhere on the internal network. The
>>> device I think might be causing this is a Cisco LDIR load balancer
>>> sitting between the QRM pool and the NFS servers.
>>>
>> Nope, but maybe iptables is running on the CentOS server or the
>> nodes? On my last CentOS installation it was installed by default and
>> drove me mad when I tried accessing my Zenoss web GUI... But I think
>> it was installed because I chose "server" at installation. It was
>> just a guess about the firewall, since I had such a problem in the
>> past when using CentOS - it's a little different from using Debian :D
>>
>> Is iptables running on the server or the nodes?
>>
>> Regards,
>>
>> Thomas

--
www.openQRM.org - Keeps your Data-Center Up and Running

Matt's blog - http://mattinaction.blogspot.com/

Please note my courses/workshops for 2008 at the Linuxhotel:

openQRM Data-Center Management Platform
http://www.linuxhotel.de/kurs/openqrm/index.html

Open Source SAN and Cluster-Filesystems
http://www.linuxhotel.de/kurs/san_und_cluster_dateisysteme/index.html