From: Stefano E. <ste...@so...> - 2009-07-01 14:29:32
Hi, something strange is happening. I created a cluster with two nodes, clu01 and clu02, with the Shared-Root on a SAN. The node clu01 had the IP address 10.43.100.203:

    <clusternode name="clu01" votes="1" nodeid="1">
        <com_info>
            <syslog name="clu01"/>
            <rootvolume name="/dev/sda2" fstype="ocfs2"/>
            <eth name="eth0" ip="10.43.100.203" mac="00:15:60:56:75:FD"/>
            <fenceackserver user="root" passwd="test123"/>
        </com_info>
    </clusternode>

I also configured the httpd service on the cluster, and everything worked well.

Then I had to change the IP address of node_1 (clu01) to 10.43.105.10, so I chose to redo the whole procedure, reformatting the Shared-Root but not the clu01 server itself. The cluster starts with the new IP address, and when I start rgmanager:

    /etc/init.d/rgmanager start

everything seems OK, but in the log file I read:

    Jun 27 10:13:14 clu01 kernel: dlm: Using TCP for communications
    Jun 27 10:13:14 clu01 kernel: dlm: Can't create listening comms socket
    Jun 27 10:13:14 clu01 kernel: dlm: cannot start dlm lowcomms -98

and the output of clustat is:

    Cluster Status for cluOCFS2 @ Wed Jul 1 13:35:10 2009
    Member Status: Quorate

     Member Name      ID   Status
     ------ ----      ---- ------
     clu01            1    Online, Local
     clu02            2    Offline

The part about the services is missing from the output.

If I try to restart rgmanager, the log shows:

    Jun 28 04:02:08 clu01 syslogd 1.4.1: restart.
    Jul 1 13:37:31 clu01 kernel: dlm: Using TCP for communications
    Jul 1 13:37:31 clu01 kernel: dlm: Can't create listening comms socket
    Jul 1 13:37:41 clu01 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [clurgmgrd:13230]
    Jul 1 13:37:41 clu01 kernel:
    Jul 1 13:37:41 clu01 kernel: Pid: 13230, comm: clurgmgrd
    Jul 1 13:37:41 clu01 kernel: EIP: 0060:[<c0608d90>] CPU: 0
    Jul 1 13:37:41 clu01 kernel: EIP is at _spin_lock+0x7/0xf
    Jul 1 13:37:41 clu01 kernel: EFLAGS: 00000286 Tainted: G (2.6.18-92.1.22.el5PAE #1)
    Jul 1 13:37:41 clu01 kernel: EAX: f1d93a98 EBX: f1d93a94 ECX: 00000000 EDX: e1958000
    Jul 1 13:37:41 clu01 kernel: ESI: f1d93a94 EDI: f1e31000 EBP: e1958ebc DS: 007b ES: 007b
    Jul 1 13:37:41 clu01 kernel: CR0: 8005003b CR2: b7f48000 CR3: 37caef00 CR4: 000006f0
    Jul 1 13:37:41 clu01 kernel: [<c06080ef>] __mutex_lock_slowpath+0x19/0x7c
    Jul 1 13:37:41 clu01 kernel: [<c0608161>] .text.lock.mutex+0xf/0x14
    Jul 1 13:37:41 clu01 kernel: [<f8c2ff6b>] close_connection+0x11/0x5a [dlm]
    Jul 1 13:37:41 clu01 kernel: [<f8c308fd>] dlm_lowcomms_start+0x53e/0x59c [dlm]
    Jul 1 13:37:41 clu01 kernel: [<c06076a4>] schedule+0x920/0x9cd
    Jul 1 13:37:41 clu01 kernel: [<f8c2e879>] dlm_new_lockspace+0x87/0x742 [dlm]
    Jul 1 13:37:41 clu01 kernel: [<f8c33d38>] device_write+0x310/0x4b6 [dlm]
    Jul 1 13:37:41 clu01 kernel: [<f8c33a28>] device_write+0x0/0x4b6 [dlm]
    Jul 1 13:37:41 clu01 kernel: [<c0470283>] vfs_write+0xa1/0x143
    Jul 1 13:37:41 clu01 kernel: [<c0470875>] sys_write+0x3c/0x63
    Jul 1 13:37:41 clu01 kernel: [<c0404eff>] syscall_call+0x7/0xb
    Jul 1 13:37:41 clu01 kernel: =======================

If I edit cluster.conf, put back the old IP (10.43.100.203) and build a new initrd, rgmanager works fine. The same failure happens even with a different address in the same 10.43.100 subnet: in practice, it seems to work only with the single IP address with which the cluster was originally created!

Thanks.

Ing. Stefano Elmopi
Gruppo Darco - Area ICT Sistemi
Via Ostiense 131/L Corpo B, 00154 Roma
cell. 3466147165
tel. 0657060500
email: ste...@so...
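For anyone trying to reproduce this, here is a minimal Python sketch of two quick checks; it does not explain the failure. The -98 in the dlm message is a negative errno value, and on Linux errno 98 is EADDRINUSE ("Address already in use"). The second part lists the `<eth ip="...">` addresses that cluster.conf assigns to a node but that are not in a set of local addresses you supply by hand (for example copied from `ip addr` output). The function names are hypothetical, and the parsing assumes the `clusternode > com_info > eth` layout of the snippet quoted in this mail; none of this is part of the cluster tools.

```python
import errno
import xml.etree.ElementTree as ET


def decode_dlm_error(code: int) -> str:
    """Map a negative kernel return value such as -98 to its errno name."""
    # Kernel code returns -errno; errno.errorcode maps the positive value to
    # its symbolic name on the platform Python is running on.
    return errno.errorcode.get(-code, f"unknown errno {-code}")


def node_ips(cluster_conf: str) -> dict[str, list[str]]:
    """Collect the <eth ip="..."> entries under each clusternode/com_info."""
    root = ET.fromstring(cluster_conf)
    result: dict[str, list[str]] = {}
    # iter() also checks the root element, so a bare <clusternode> fragment
    # works as well as a full <cluster> document.
    for node in root.iter("clusternode"):
        result[node.get("name")] = [
            eth.get("ip") for eth in node.findall("com_info/eth")
        ]
    return result


def missing_local_ips(cluster_conf: str, node: str, local_ips: set[str]) -> list[str]:
    """Return the configured IPs of `node` that are not among `local_ips`."""
    return [ip for ip in node_ips(cluster_conf).get(node, []) if ip not in local_ips]


# Trimmed copy of the clu01 entry quoted above (fenceackserver omitted).
SNIPPET = """<clusternode name="clu01" votes="1" nodeid="1">
  <com_info>
    <syslog name="clu01"/>
    <eth name="eth0" ip="10.43.100.203" mac="00:15:60:56:75:FD"/>
  </com_info>
</clusternode>"""

if __name__ == "__main__":
    print(decode_dlm_error(-98))  # on Linux: EADDRINUSE
    print(node_ips(SNIPPET))  # {'clu01': ['10.43.100.203']}
    # Pretend eth0 currently carries only the new address.
    print(missing_local_ips(SNIPPET, "clu01", {"10.43.105.10"}))  # ['10.43.100.203']
```

An empty list from the last call means every address listed for that node in the file is actually present on the machine.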