From: Marc G. <gr...@at...> - 2012-11-13 07:44:05
Jorge,

you don't need to be doubtful about the fact that the volume group for the root file system is not flagged as clustered. This has no implications whatsoever on the gfs2 file system. It will only be a problem whenever the lvm settings of vg_osroot change (size, number of lvs etc.). Nevertheless, while thinking about your problem I think I had an idea on how to fix this and make it possible to have the root vg clustered as well. I will provide new packages in the next few days that should deal with the problem.

Keep in mind that there is a difference between cman_tool services and the lvm usage. clvmd only uses the locktable "clvmd" shown by cman_tool services; the other locktables are relevant to the file systems and other services (fenced, rgmanager, ..). This is a completely different use case.

Please elaborate a bit more on "I am still a bit stuck when nodes with gfs2 mounted don't restart if instructed to do so, but I will read some more." What do you mean by that? How does it happen? This sounds like something you should have a look at.

"One thing that I can confirm is
osr(notice): Detecting nodeid & nodename
This does not always display the correct info, but it doesn't seem to be a problem either?"
You should always look at the nodeid; the nodename is (more or less) only descriptive and might not be set as expected. But the nodeid should always be consistent. Does this help?

About your notes (I only take the relevant ones):

1.
osr(notice): Creating clusterfiles /var/run/cman_admin /var/run/cman_client.. [OK]
This message should not be misleading; it only tells you that these control files are being created inside the ramdisk. It has nothing to do with these files on your root file system. Nevertheless /etc/init.d/bootsr should take over this part and create the files. Please send me another bash -x /etc/init.d/bootsr start output, preferably taken while those files do not exist.

2.
vgs
  VG        #PV #LV #SN Attr   VSize    VFree
  VG_SDATA    1   2   0 wz--nc 1000.00g    0
  vg_osroot   1   1   0 wz--n-   60.00g    0
This is perfectly ok. It only means the vg is not clustered. But the file system IS. The two are not connected.

Hope this helps. Let me know about the open issues.

Regards
Marc.

----- Original Message -----
From: "Jorge Silva" <me...@je...>
To: "Marc Grimme" <gr...@at...>
Sent: Tuesday, November 13, 2012 2:15:23 AM
Subject: Re: Problem with VG activation clvmd runs at 100%

Marc

Hi - I believe I have solved my problem, with your help, thank you. Yet I'm not sure how I caused it, but the root volume group, as you pointed out, had the clustered attribute (and I must have done something silly along the way). I re-installed from scratch (see notes below) and then, just to prove that it is a problem, I changed the attribute of the rootfs with vgchange -cy and rebooted, and I ran into trouble; I changed it back and it is fine. So that does cause problems on start-up, and I'm not sure I understand why, as there is an active quorum for clvmd to join and take part in.

Despite it not being marked as a clustered volume, cman_tool services shows it as such, but clvmd status doesn't? Is it safe to write to it with multiple nodes mounted?

I am still a bit stuck when nodes with gfs2 mounted don't restart if instructed to do so, but I will read some more.

One thing that I can confirm is that
osr(notice): Detecting nodeid & nodename
does not always display the correct info, but it doesn't seem to be a problem either?

Thanks
Jorge
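(On the nodeid question above: one way to cross-check what cman has actually registered is to ask cman_tool directly. A minimal sketch, assuming a standard cman setup like the one shown further down; the exact labels in the status output may differ slightly.)

  # Show the node name and node id as cman sees them on this node
  cman_tool status | grep -E 'Node name|Node ID'
  # List all cluster members with their node ids
  cman_tool nodes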
Notes:

I decided to start from scratch: I blew away the rootfs and re-installed as per the website. My assumption is that I edited something and messed it up (I did look at a lot of the scripts to try to "figure out and fix" the problem; I can send the history if you want, or I can edit and contribute).

I rebooted the server and I had an issue: I didn't disable selinux, so I had to intervene in the boot stage. That completed, but I noticed that

osr(notice): Starting network configuration for lo0 [OK]
osr(notice): Detecting nodeid & nodename

is blank, but somehow the correct nodeid and name were deduced. I had to rebuild the ram disk to pick up the disabled selinux. I also added the following: yum install pciutils (mkinitrd warned about this, so I installed it). I also installed:

yum install cluster-snmp
yum install rgmanager
in lvm

On this reboot I noticed that, despite this message,

osr(notice): Creating clusterfiles /var/run/cman_admin /var/run/cman_client.. [OK]
Starting clvmd: dlm: Using TCP for communications
Activating VG(s):
  File descriptor 3 (/dev/console) leaked on vgchange invocation. Parent PID 15995: /bin/bash
  File descriptor 4 (/dev/console) leaked on vgchange invocation. Parent PID 15995: /bin/bash
  Skipping clustered volume group VG_SDATA
  1 logical volume(s) in volume group "vg_osroot" now active

the links weren't created, and I did this manually:

ln -sf /var/comoonics/chroot//var/run/cman_admin /var/run/cman_admin
ln -sf /var/comoonics/chroot//var/run/cman_client /var/run/cman_client

I could then get clusterstatus etc., and clvmd was running ok. I looked in /etc/lvm/lvm.conf and locking_type = 4? I then issued lvmconf --enable-cluster, and this changed /etc/lvm/lvm.conf to locking_type = 3. vgscan correctly showed the clustered volumes and was working ok. I did not rebuild the ramdisk (I can confirm that the lvm.conf in the ramdisk has locking_type=4). I have rebooted and everything is working.

Starting clvmd: dlm: Using TCP for communications
Activating VG(s):
  File descriptor 3 (/dev/console) leaked on vgchange invocation. Parent PID 15983: /bin/bash
  File descriptor 4 (/dev/console) leaked on vgchange invocation. Parent PID 15983: /bin/bash
  Skipping clustered volume group VG_SDATA
  1 logical volume(s) in volume group "vg_osroot" now active

I have rebooted a number of times and am confident that things are ok. I decided to add two other nodes to the mix, and I can confirm that every time a new node is added these files are missing:

/var/run/cman_admin
/var/run/cman_client

even though I can see from the logs:

osr(notice): Creating clusterfiles /var/run/cman_admin /var/run/cman_client.. [OK]

Also, the message below does not always detect the information, but the nodeid etc. is still correct:

osr(notice): Detecting nodeid & nodename
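(Regarding the missing /var/run/cman_admin and /var/run/cman_client files above: a minimal sketch of the manual workaround, i.e. the two ln -sf commands from the notes wrapped in a check. It assumes the chroot lives under /var/comoonics/chroot as in the log output; this only mirrors the workaround and is not the actual bootsr logic.)

  # Recreate the cluster control links if they are missing (sketch only)
  for f in cman_admin cman_client; do
      [ -L /var/run/$f ] || ln -sf /var/comoonics/chroot/var/run/$f /var/run/$f
  done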
So now I have 3 nodes in the cluster and things look ok:

[root@bwccs302 ~]# cman_tool services
fence domain
member count  3
victim count  0
victim now    0
master nodeid 2
wait state    none
members       2 3 4

dlm lockspaces
name          home
id            0xf8ee17aa
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       2 3 4

name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 3 joined 1 remove 0 failed 0 seq 15,15
members       2 3 4

name          OSRoot
id            0xab5404ad
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 7,7
members       2 3 4

gfs mountgroups
name          home
id            0x686e3fc4
flags         0x00000048 mounted
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       2 3 4

name          OSRoot
id            0x659f7afe
flags         0x00000048 mounted
change        member 3 joined 1 remove 0 failed 0 seq 7,7
members       2 3 4

service clvmd status
clvmd (pid 25771) is running...
Clustered Volume Groups: VG_SDATA
Active clustered Logical Volumes: LV_HOME LV_DEVDB

clvmd doesn't believe that the root file system is clustered, despite the cman_tool output above.

[root@bwccs302 ~]# vgs
  VG        #PV #LV #SN Attr   VSize    VFree
  VG_SDATA    1   2   0 wz--nc 1000.00g    0
  vg_osroot   1   1   0 wz--n-   60.00g    0

The above got me thinking about what you wanted me to do to disable the clustered flag on the root volume. With it left on I was having problems (not sure how it got turned on). With everything working ok, I remade the ramdisk and its lvm.conf now has locking_type = 3. The systems start up and things look ok.
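(For reference, a minimal sketch of how to check and change the clustered flag that caused the boot problems discussed in this thread. The VG names are taken from the vgs output above; run the vgchange commands while the cluster is quorate when locking_type = 3.)

  # The clustered flag is the trailing 'c' in the vg_attr field (wz--nc vs. wz--n-)
  vgs -o vg_name,vg_attr

  # Clear the flag on the root vg, or set it on a data vg
  vgchange -cn vg_osroot
  vgchange -cy VG_SDATA

  # Check which locking type LVM uses (3 = clustered locking via clvmd)
  grep locking_type /etc/lvm/lvm.conf

  # The ramdisk carries its own copy of lvm.conf, so rebuild it after
  # changing locking_type, as noted in the thread above.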