From: Marc G. <gr...@at...> - 2012-11-13 07:44:05
Jorge,

you don't need to be doubtful about the fact that the volume group for the root file system is not flagged as clustered. This has no implications whatsoever on the gfs2 file system. It will only be a problem whenever the lvm settings of vg_osroot change (size, number of lvs etc.). Nevertheless, while thinking about your problem I think I had an idea on how to fix this and make it possible to have the root vg clustered as well. I will provide new packages in the next few days that should deal with the problem.

Keep in mind that there is a difference between cman_tool services and the lvm usage. clvmd only uses the locktable "clvmd" shown by cman_tool services; the other locktables are relevant to the file systems and other services (fenced, rgmanager, ..). This is a completely different use case.

Please elaborate a bit more on "I am still a bit stuck when nodes with gfs2 mounted don't restart if instructed to do so, but I will read some more." What do you mean by that? How does it happen? This sounds like something you should have a look at.

"One thing that I can confirm is
osr(notice): Detecting nodeid & nodename
This does not always display the correct info, but it doesn't seem to be a problem either?"
You should always look at the nodeid; the nodename is (more or less) only descriptive and might not be set as expected. But the nodeid should always be consistent. Does this help?

About your notes (I only take the relevant ones):

1.
osr(notice): Creating clusterfiles /var/run/cman_admin /var/run/cman_client.. [OK]
This message should not be misleading; it only tells you that these control files are being created inside the ramdisk. It has nothing to do with these files on your root file system. Nevertheless /etc/init.d/bootsr should take over this part and create the files. Please send me another bash -x /etc/init.d/bootsr start output, preferably taken while those files do not exist.

2.
vgs
  VG        #PV #LV #SN Attr   VSize    VFree
  VG_SDATA    1   2   0 wz--nc 1000.00g    0
  vg_osroot   1   1   0 wz--n-   60.00g    0
This is perfectly ok. It only means the vg is not clustered. But the file system IS. The two are not connected.

Hope this helps. Let me know about the open issues.

Regards
Marc.

----- Original Message -----
From: "Jorge Silva" <me...@je...>
To: "Marc Grimme" <gr...@at...>
Sent: Tuesday, November 13, 2012 2:15:23 AM
Subject: Re: Problem with VG activation clvmd runs at 100%

Marc

Hi - I believe I have solved my problem, with your help, thank you. Yet I'm not sure how I caused it, but the root volume group, as you pointed out, had the clustered attribute (and I must have done something silly along the way). I re-installed from scratch (see notes below) and then, just to prove that it is a problem, I changed the attribute of the rootfs with vgchange -cy and rebooted, and I ran into trouble; I changed it back and it is fine. So that does cause problems on start-up, and I'm not sure I understand why, as there is an active quorum for clvmd to join and take part in.

Despite it not being marked as a clustered volume, cman_tool services shows it as such, but clvmd status doesn't? Is it safe to write to it with multiple nodes mounted?

I am still a bit stuck when nodes with gfs2 mounted don't restart if instructed to do so, but I will read some more.

One thing that I can confirm is that
osr(notice): Detecting nodeid & nodename
does not always display the correct info, but it doesn't seem to be a problem either?

Thanks
Jorge
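(On the nodeid question above: one way to cross-check what cman has actually registered is to ask cman_tool directly. A minimal sketch, assuming a standard cman setup like the one shown further down; the exact labels in the status output may differ slightly.)

  # Show the node name and node id as cman sees them on this node
  cman_tool status | grep -E 'Node name|Node ID'
  # List all cluster members with their node ids
  cman_tool nodes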
Notes:

I decided to start from scratch: I blew away the rootfs and re-installed as per the website. My assumption is that I edited something and messed it up (I did look at a lot of the scripts to try to "figure out and fix" the problem; I can send the history if you want, or I can edit and contribute).

I rebooted the server and I had an issue: I didn't disable selinux, so I had to intervene in the boot stage. That completed, but I noticed that

osr(notice): Starting network configuration for lo0 [OK]
osr(notice): Detecting nodeid & nodename

is blank, but somehow the correct nodeid and name were deduced. I had to rebuild the ram disk to pick up the disabled selinux. I also added the following: yum install pciutils (mkinitrd warned about this, so I installed it). I also installed:

yum install cluster-snmp
yum install rgmanager
in lvm

On this reboot I noticed that, despite this message,

osr(notice): Creating clusterfiles /var/run/cman_admin /var/run/cman_client.. [OK]
Starting clvmd: dlm: Using TCP for communications
Activating VG(s):
  File descriptor 3 (/dev/console) leaked on vgchange invocation. Parent PID 15995: /bin/bash
  File descriptor 4 (/dev/console) leaked on vgchange invocation. Parent PID 15995: /bin/bash
  Skipping clustered volume group VG_SDATA
  1 logical volume(s) in volume group "vg_osroot" now active

the links weren't created, and I did this manually:

ln -sf /var/comoonics/chroot//var/run/cman_admin /var/run/cman_admin
ln -sf /var/comoonics/chroot//var/run/cman_client /var/run/cman_client

I could then get clusterstatus etc., and clvmd was running ok. I looked in /etc/lvm/lvm.conf and locking_type = 4? I then issued lvmconf --enable-cluster, and this changed /etc/lvm/lvm.conf to locking_type = 3. vgscan correctly showed the clustered volumes and was working ok. I did not rebuild the ramdisk (I can confirm that the lvm.conf in the ramdisk has locking_type=4). I have rebooted and everything is working.

Starting clvmd: dlm: Using TCP for communications
Activating VG(s):
  File descriptor 3 (/dev/console) leaked on vgchange invocation. Parent PID 15983: /bin/bash
  File descriptor 4 (/dev/console) leaked on vgchange invocation. Parent PID 15983: /bin/bash
  Skipping clustered volume group VG_SDATA
  1 logical volume(s) in volume group "vg_osroot" now active

I have rebooted a number of times and am confident that things are ok. I decided to add two other nodes to the mix, and I can confirm that every time a new node is added these files are missing:

/var/run/cman_admin
/var/run/cman_client

even though I can see from the logs:

osr(notice): Creating clusterfiles /var/run/cman_admin /var/run/cman_client.. [OK]

Also, the message below does not always detect the information, but the nodeid etc. is still correct:

osr(notice): Detecting nodeid & nodename
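(Regarding the missing /var/run/cman_admin and /var/run/cman_client files above: a minimal sketch of the manual workaround, i.e. the two ln -sf commands from the notes wrapped in a check. It assumes the chroot lives under /var/comoonics/chroot as in the log output; this only mirrors the workaround and is not the actual bootsr logic.)

  # Recreate the cluster control links if they are missing (sketch only)
  for f in cman_admin cman_client; do
      [ -L /var/run/$f ] || ln -sf /var/comoonics/chroot/var/run/$f /var/run/$f
  done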
So now I have 3 nodes in the cluster and things look ok:

[root@bwccs302 ~]# cman_tool services
fence domain
member count  3
victim count  0
victim now    0
master nodeid 2
wait state    none
members       2 3 4

dlm lockspaces
name          home
id            0xf8ee17aa
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       2 3 4

name          clvmd
id            0x4104eefa
flags         0x00000000
change        member 3 joined 1 remove 0 failed 0 seq 15,15
members       2 3 4

name          OSRoot
id            0xab5404ad
flags         0x00000008 fs_reg
change        member 3 joined 1 remove 0 failed 0 seq 7,7
members       2 3 4

gfs mountgroups
name          home
id            0x686e3fc4
flags         0x00000048 mounted
change        member 3 joined 1 remove 0 failed 0 seq 3,3
members       2 3 4

name          OSRoot
id            0x659f7afe
flags         0x00000048 mounted
change        member 3 joined 1 remove 0 failed 0 seq 7,7
members       2 3 4

service clvmd status
clvmd (pid 25771) is running...
Clustered Volume Groups: VG_SDATA
Active clustered Logical Volumes: LV_HOME LV_DEVDB

clvmd doesn't believe that the root file system is clustered, despite the cman_tool output above.

[root@bwccs302 ~]# vgs
  VG        #PV #LV #SN Attr   VSize    VFree
  VG_SDATA    1   2   0 wz--nc 1000.00g    0
  vg_osroot   1   1   0 wz--n-   60.00g    0

The above got me thinking about what you wanted me to do to disable the clustered flag on the root volume. With it left on I was having problems (not sure how it got turned on). With everything working ok, I remade the ramdisk and its lvm.conf now has locking_type = 3. The systems start up and things look ok.
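(For reference, a minimal sketch of how to check and change the clustered flag that caused the boot problems discussed in this thread. The VG names are taken from the vgs output above; run the vgchange commands while the cluster is quorate when locking_type = 3.)

  # The clustered flag is the trailing 'c' in the vg_attr field (wz--nc vs. wz--n-)
  vgs -o vg_name,vg_attr

  # Clear the flag on the root vg, or set it on a data vg
  vgchange -cn vg_osroot
  vgchange -cy VG_SDATA

  # Check which locking type LVM uses (3 = clustered locking via clvmd)
  grep locking_type /etc/lvm/lvm.conf

  # The ramdisk carries its own copy of lvm.conf, so rebuild it after
  # changing locking_type, as noted in the thread above.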