From: Matt R. <ma...@ql...> - 2007-12-18 11:04:30
Brandon Allhands wrote:
> CentOS does enable the firewall by default, but I always make sure to
> turn it off immediately after installation (with SELinux as well). The
> image the QRM images were cloned from also had iptables disabled.
>
> What made this problem odd was the fact that it seemed almost random.
> While it would consistently flake on the same blade, the blade right
> next to it would boot the environment just fine. (The blades I was
> having problems with were 14, 11, 9 and sometimes 12; 13, 10 and 7
> booted every single time, regardless of what I did.) After enabling
> TCP, all of the "trouble" nodes booted the environment perfectly fine
> (it's all a shared image).
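For reference, on a CentOS of that era the firewall/SELinux shutdown
Brandon describes (and the "is iptables running?" check Thomas asks
about further down) looks roughly like this:

    # check whether the firewall is active:
    /sbin/service iptables status
    /sbin/iptables -L -n

    # turn it off now and keep it off across reboots:
    /sbin/service iptables stop
    /sbin/chkconfig iptables off

    # SELinux: switch to permissive mode for the running system ...
    /usr/sbin/setenforce 0
    # ... and disable it for good by setting SELINUX=disabled
    # in /etc/selinux/config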
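The UDP-to-TCP switch itself comes down to the NFS mount protocol
option; a minimal example, with the server name and export path as
placeholders:

    # the UDP mount the nodes were using:
    mount -t nfs -o proto=udp,nfsvers=3 nfs-server:/srv/qrm-image /mnt/image

    # the TCP variant that made the "trouble" blades boot reliably:
    mount -t nfs -o proto=tcp,nfsvers=3 nfs-server:/srv/qrm-image /mnt/image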
Ok, here is the info I was looking for about shared vs. non-shared. (I
should read the latest mails first ;))

What I would like to recommend for your testing is to make the fs-image
non-shared and see if "it" still happens on some of your nodes. If it
still does happen, then I would check the hardware/network differences
between the blades.

If not, make the fs-image shared again, make the VE a multi-server, but
only start it on the nodes one by one -> that means boot the VE on every
single node separately. If "it" happens again, I would again check the
hw/net differences. If not, you may have detected a race condition when
creating the private directories (not sure that there is a race
condition, just a guess).

Then I would try to start the VE on a single node and scale it up node
per node to see if it works ok when running sequentially (a rough
sketch of this follows below).

... all just ideas for testing ;)

Hope it helps, thanks again + all the best,

Matt

> I think once or twice 12 booted it fine, and then after a reboot would
> lock again.
>
> I am interested in figuring this out, and will most likely play with
> it after I get back inside and eat dinner. I will definitely post the
> results after I play with it. Nothing in a computer is truly "random",
> so something is there messing with it. I really think it's the
> balancer (it's an older LDIR, which is supposed to be completely
> transparent to normal traffic, and only changes the MAC in the header
> for balanced connections).
>
> I REALLY need to use UDP. Since the NFS machines are set up in HA, TCP
> will cause problems in a failback scenario. If it's the balancer, then
> I have a much bigger problem, so let's hope it's the mapping.
>
> Brandon
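If the LocalDirector's MAC rewriting is the suspect, one way to check
from a "trouble" blade is to capture the link-level headers of the NFS
traffic and compare the source MACs against what the node has cached
(the interface name is an assumption, adjust it to the blade's NIC):

    # show ethernet (MAC) headers for NFS traffic:
    tcpdump -i eth0 -e -n port 2049

    # which MAC does the node currently hold for the NFS server?
    arp -n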
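And picking up Matt's node-per-node suggestion above, a minimal sketch
of the sequential test, assuming a placeholder start_ve_on_node that
you would replace with the real openQRM start action (GUI or CLI):

    #!/bin/sh
    # boot the VE on one blade at a time, trouble nodes included:
    for blade in 7 9 10 11 12 13 14; do
        echo "starting VE on blade $blade"
        start_ve_on_node "blade$blade"  # hypothetical helper, not a real openQRM command
        sleep 120                       # let each boot settle before starting the next
    done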
> Thomas Halinka wrote:
>> On Monday, 2007-12-17 at 19:53 -0500, Brandon Allhands wrote:
>>
>>> Ok. For grins I will take the map option out (I am pretty sure it
>>> was added after this problem started) and try with UDP again. I
>>> really want to know exactly what was causing this, for future
>>> reference.
>>>
>> Yep, that would be interesting ;)
>>
>>> Brandon
>>>
>>> PS: There isn't a firewall anywhere on the internal network. The
>>> device I think might be causing this is a Cisco LDIR load balancer
>>> sitting between the QRM pool and the NFS servers.
>>>
>> Nope, but maybe iptables is running on the CentOS server or the
>> nodes? On my last CentOS installation it was installed by default and
>> drove me mad when I tried accessing my Zenoss web GUI... But I think
>> it was installed because I chose "server" at installation. It was
>> just a guess about the firewall, since I had such a problem in the
>> past when using CentOS - it's a little different from using Debian :D
>>
>> Is iptables running on the server or the nodes?
>>
>> Regards,
>>
>> Thomas

--
www.openQRM.org - Keeps your Data-Center Up and Running

Matt's blog - http://mattinaction.blogspot.com/

Please note my courses/workshops for 2008 at the Linuxhotel:

openQRM Data-Center Management Platform
http://www.linuxhotel.de/kurs/openqrm/index.html

Open Source SAN and Cluster-Filesystems
http://www.linuxhotel.de/kurs/san_und_cluster_dateisysteme/index.html