Re: [OSR-users] Problem with rgmanager

Stefano,

Is your problem solved now?

-Mark

On Monday 31 August 2009 15:00:06 Stefano Elmopi wrote:
> Hi Mark,
>
> Sorry for the late reply, I was on vacation.
> I am writing to confirm that the problem was that I had not updated
> the file /etc/hosts after changing the IP address.
> As we say in Italy, I got lost in a glass of water!
>
>
> Thanks,
>
> Stefano
>
>
>
> Date: Wed, 1 Jul 2009 17:33:48 +0200
> From: Mark Hlawatschek <hla...@at...>
> Subject: Re: [OSR-users] Problem with rgmanager
> To: ope...@li...
> Message-ID: <200...@at...>
> Content-Type: text/plain;  charset="utf-8"
>
> Stefano,
>
> Could you please give us an overview of your network setup?
> # cat /etc/hosts
> # ip addr
>
> Could you also send me the output of the following command:
> # cman_tool status
> # cman_tool nodes
>
> Thanks,
>
> Mark
>
>
>
> Ing. Stefano Elmopi
> Gruppo Darco - Area ICT Sistemi
> Via Ostiense 131/L Corpo B, 00154 Roma
>
> cell. 3466147165
> tel.  0657060500
> email:ste...@so...
>
> On 01 Jul 2009, at 14:16, Stefano Elmopi wrote:
> > Hi,
> >
> > Something strange is happening to me. I created a cluster with two
> > nodes, clu01 and clu02, with the Shared-Root on a SAN. The node
> > clu01 has the IP address 10.43.100.203:
> >
> > <clusternode name="clu01" votes="1" nodeid="1">
> >   <com_info>
> >     <syslog name="clu01"/>
> >     <rootvolume name="/dev/sda2" fstype="ocfs2"/>
> >     <eth name="eth0" ip="10.43.100.203" mac="00:15:60:56:75:FD"/>
> >     <fenceackserver user="root" passwd="test123"/>
> >   </com_info>
> > </clusternode>
> >
> > I also configured the httpd service on the cluster and everything
> > worked well.
> > I then had to change the IP address of node 1, clu01, to
> > 10.43.105.10, so I preferred to repeat the whole procedure,
> > formatting the Shared-Root but not the server clu01.
> > The cluster starts with the new IP address, and when I start
> > rgmanager:
> >
> > /etc/init.d/rgmanager start
> >
> > everything seems OK, but in the log file I read:
> >
> > Jun 27 10:13:14 clu01 kernel: dlm: Using TCP for communications
> > Jun 27 10:13:14 clu01 kernel: dlm: Can't create listening comms socket
> > Jun 27 10:13:14 clu01 kernel: dlm: cannot start dlm lowcomms -98
> >
> > and this is the output of the clustat command:
> >
> > clustat
> > Cluster Status for cluOCFS2 @ Wed Jul  1 13:35:10 2009
> > Member Status: Quorate
> >
> >  Member Name                        ID   Status
> >  ------ ----                        ---- ------
> >  clu01                                 1 Online, Local
> >  clu02                                 2 Offline
> >
> >
> > The part of the output about the service is missing.
> > If I try to restart rgmanager, the log shows:
> >
> > Jun 28 04:02:08 clu01 syslogd 1.4.1: restart.
> > Jul  1 13:37:31 clu01 kernel: dlm: Using TCP for communications
> > Jul  1 13:37:31 clu01 kernel: dlm: Can't create listening comms socket
> > Jul  1 13:37:41 clu01 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [clurgmgrd:13230]
> > Jul  1 13:37:41 clu01 kernel:
> > Jul  1 13:37:41 clu01 kernel: Pid: 13230, comm:            clurgmgrd
> > Jul  1 13:37:41 clu01 kernel: EIP: 0060:[<c0608d90>] CPU: 0
> > Jul  1 13:37:41 clu01 kernel: EIP is at _spin_lock+0x7/0xf
> > Jul  1 13:37:41 clu01 kernel:  EFLAGS: 00000286    Tainted: G (2.6.18-92.1.22.el5PAE #1)
> > Jul  1 13:37:41 clu01 kernel: EAX: f1d93a98 EBX: f1d93a94 ECX: 00000000 EDX: e1958000
> > Jul  1 13:37:41 clu01 kernel: ESI: f1d93a94 EDI: f1e31000 EBP: e1958ebc DS: 007b ES: 007b
> > Jul  1 13:37:41 clu01 kernel: CR0: 8005003b CR2: b7f48000 CR3: 37caef00 CR4: 000006f0
> > Jul  1 13:37:41 clu01 kernel:  [<c06080ef>] __mutex_lock_slowpath+0x19/0x7c
> > Jul  1 13:37:41 clu01 kernel:  [<c0608161>] .text.lock.mutex+0xf/0x14
> > Jul  1 13:37:41 clu01 kernel:  [<f8c2ff6b>] close_connection+0x11/0x5a [dlm]
> > Jul  1 13:37:41 clu01 kernel:  [<f8c308fd>] dlm_lowcomms_start+0x53e/0x59c [dlm]
> > Jul  1 13:37:41 clu01 kernel:  [<c06076a4>] schedule+0x920/0x9cd
> > Jul  1 13:37:41 clu01 kernel:  [<f8c2e879>] dlm_new_lockspace+0x87/0x742 [dlm]
> > Jul  1 13:37:41 clu01 kernel:  [<f8c33d38>] device_write+0x310/0x4b6 [dlm]
> > Jul  1 13:37:41 clu01 kernel:  [<f8c33a28>] device_write+0x0/0x4b6 [dlm]
> > Jul  1 13:37:41 clu01 kernel:  [<c0470283>] vfs_write+0xa1/0x143
> > Jul  1 13:37:41 clu01 kernel:  [<c0470875>] sys_write+0x3c/0x63
> > Jul  1 13:37:41 clu01 kernel:  [<c0404eff>] syscall_call+0x7/0xb
> > Jul  1 13:37:41 clu01 kernel:  =======================
> >
> >
> >
> > If I change the file cluster.conf, put back the old IP address
> > (10.43.100.203) and create a new initrd, rgmanager works well.
> > This happens even with another address in the same 10.43.100
> > subnet. In practice it seems to work only with the single IP
> > address with which the cluster was originally created!
> >
> >
> > Thanks.
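
The cause confirmed in this thread was a mismatch: cluster.conf carried the new address for clu01 while /etc/hosts still had the old one. A minimal sketch of a consistency check for that situation follows. It is only an illustration, not part of OSR or the cluster tools: the function names are hypothetical, the <cluster>/<clusternodes> wrapper and the contents of the hosts file are assumed, and only the clu01 values come from the messages above.

```python
import xml.etree.ElementTree as ET

# Sample data modelled on the thread: cluster.conf already has the new
# address, /etc/hosts still has the old one. The wrapper elements and the
# hosts file contents are assumptions made for this illustration.
CLUSTER_CONF = """\
<cluster>
  <clusternodes>
    <clusternode name="clu01" votes="1" nodeid="1">
      <com_info>
        <eth name="eth0" ip="10.43.105.10" mac="00:15:60:56:75:FD"/>
      </com_info>
    </clusternode>
  </clusternodes>
</cluster>
"""

HOSTS = """\
127.0.0.1      localhost
10.43.100.203  clu01   # stale entry from the old setup
"""


def parse_hosts(text):
    """Map every hostname and alias in hosts-file text to its IP address."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)  # keep the first entry for a name
    return table


def node_addresses(conf_xml):
    """Yield (node name, ip) for each <eth> found under a <clusternode>."""
    root = ET.fromstring(conf_xml)
    for node in root.iter("clusternode"):
        for eth in node.iter("eth"):
            if eth.get("ip"):
                yield node.get("name"), eth.get("ip")


def find_mismatches(conf_xml, hosts_text):
    """Return (node, ip in cluster.conf, ip in hosts or None) per conflict."""
    hosts = parse_hosts(hosts_text)
    return [
        (name, ip, hosts.get(name))
        for name, ip in node_addresses(conf_xml)
        if hosts.get(name) != ip
    ]


if __name__ == "__main__":
    for name, conf_ip, hosts_ip in find_mismatches(CLUSTER_CONF, HOSTS):
        print(f"{name}: cluster.conf has {conf_ip}, /etc/hosts has {hosts_ip}")
```

On a real node the two strings would be read from /etc/cluster/cluster.conf and /etc/hosts instead of the samples. A node that is missing from the hosts file is reported with None as its hosts address.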