[OSR-users] Problem with rgmanager


Hi Mark,

Yes, now the problem is solved.
I had forgotten to change the file /etc/hosts.
Thanks.

Bye
Stefano
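
For anyone hitting the same symptom, the fix confirmed above (updating /etc/hosts after changing a node's IP) can be checked with a small script. The following is only a sketch: it assumes the `<clusternode>`/`<com_info>`/`<eth ip="...">` layout quoted later in this thread, and the hosts content in the example is hypothetical, since the real file was never posted. The function names are illustrative and not part of any cluster tool.

```python
import xml.etree.ElementTree as ET


def cluster_ips(cluster_conf_xml):
    """Return {node name: ip} taken from <clusternode>/<com_info>/<eth ip=...>."""
    root = ET.fromstring(cluster_conf_xml)
    ips = {}
    for node in root.iter("clusternode"):
        eth = node.find("./com_info/eth")
        if eth is not None:
            ips[node.get("name")] = eth.get("ip")
    return ips


def hosts_ips(hosts_text):
    """Return {hostname: ip} from /etc/hosts-style text; the first entry wins."""
    table = {}
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, ip)
    return table


def mismatches(cluster_conf_xml, hosts_text):
    """Return {node: (ip in cluster.conf, ip in hosts or None)} where they differ."""
    hosts = hosts_ips(hosts_text)
    return {
        name: (ip, hosts.get(name))
        for name, ip in cluster_ips(cluster_conf_xml).items()
        if hosts.get(name) != ip
    }


# cluster.conf fragment with the new address; hosts still has the old one
# (hypothetical content, for illustration only).
CONF = """<cluster><clusternodes>
<clusternode name="clu01" votes="1" nodeid="1">
 <com_info><eth name="eth0" ip="10.43.105.10"/></com_info>
</clusternode>
</clusternodes></cluster>"""
HOSTS = "127.0.0.1 localhost\n10.43.100.203 clu01\n"

print(mismatches(CONF, HOSTS))
```

An empty result means every node name in cluster.conf resolves, via the hosts text, to the address configured for it; any entry in the result is a node whose hosts entry is stale or missing.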

Stefano,

Is your problem solved now?

-Mark
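
A side note on the log quoted below: kernel messages such as `dlm: cannot start dlm lowcomms -98` report a negated errno value, and on Linux errno 98 is EADDRINUSE ("Address already in use"). A minimal sketch for decoding such lines follows; the table is a deliberately small, hand-copied subset of the Linux errno values, and the helper name is made up for illustration.

```python
import re

# Small subset of Linux errno values (asm-generic/errno.h); extend as needed.
LINUX_ERRNO = {
    98: ("EADDRINUSE", "Address already in use"),
    99: ("EADDRNOTAVAIL", "Cannot assign requested address"),
}


def decode_dlm_error(line):
    """Return (name, text) for a dlm log line ending in a negative errno, else None."""
    m = re.search(r"dlm: .*?(-\d+)\s*$", line)
    if m is None:
        return None
    code = -int(m.group(1))  # the kernel logs the value negated
    return LINUX_ERRNO.get(code, ("errno %d" % code, "not in this table"))


print(decode_dlm_error(
    "Jun 27 10:13:14 clu01 kernel: dlm: cannot start dlm lowcomms -98"))
```

This only names the error code; it does not by itself explain why the socket could not be created.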

On Monday 31 August 2009 15:00:06 Stefano Elmopi wrote:
> Hi Mark,
>
> Sorry for the late reply; I was on vacation.
> I am writing to confirm that the problem was that I had not updated
> the file /etc/hosts.
> As we say in Italy, I got lost in a glass of water!
>
>
> Thanks,
>
> Stefano
>
>
>
> Date: Wed, 1 Jul 2009 17:33:48 +0200
> From: Mark Hlawatschek <hla...@at...>
> Subject: Re: [OSR-users] Problem with rgmanager
> To: ope...@li...
> Message-ID: <200...@at...>
> Content-Type: text/plain;  charset="utf-8"
>
> Stefano,
>
> Could you please give us an overview of your network setup?
> # cat /etc/hosts
> # ip addr
>
> Could you also send me the output of the following commands:
> # cman_tool status
> # cman_tool nodes
>
> Thanks,
>
> Mark
>
>
>
> Ing. Stefano Elmopi
> Gruppo Darco - Area ICT Sistemi
> Via Ostiense 131/L Corpo B, 00154 Roma
>
> cell. 3466147165
> tel.  0657060500
> email:ste...@so...
>
> On 01 Jul 2009, at 14:16, Stefano Elmopi wrote:
>> Hi,
>>
>> Something strange is happening. I created a cluster with two nodes,
>> clu01 and clu02, with the Shared-Root on a SAN. The node clu01 had
>> the IP address 10.43.100.203:
>>
>> <clusternode name="clu01" votes="1" nodeid="1">
>>  <com_info>
>>    <syslog name="clu01"/>
>>    <rootvolume name="/dev/sda2" fstype="ocfs2"/>
>>    <eth name="eth0" ip="10.43.100.203" mac="00:15:60:56:75:FD"/>
>>    <fenceackserver user="root" passwd="test123"/>
>>  </com_info>
>> </clusternode>
>>
>> I also configured the httpd service on the cluster and everything
>> worked well.
>> I then had to change the IP address of node 1 (clu01) to 10.43.105.10,
>> so I chose to repeat the whole procedure, reformatting the
>> Shared-Root but not reinstalling the server clu01.
>> The cluster starts with the new IP address, and when I start
>> rgmanager:
>>
>> /etc/init.d/rgmanager start
>>
>> everything seems ok
>> but in the log file I read:
>>
>> Jun 27 10:13:14 clu01 kernel: dlm: Using TCP for communications
>> Jun 27 10:13:14 clu01 kernel: dlm: Can't create listening comms socket
>> Jun 27 10:13:14 clu01 kernel: dlm: cannot start dlm lowcomms -98
>>
>> and the output of the clustat command is:
>>
>> clustat
>> Cluster Status for cluOCFS2 @ Wed Jul  1 13:35:10 2009
>> Member Status: Quorate
>>
>>  Member Name                  ID   Status
>>  ------ ----                  ---- ------
>>  clu01                           1 Online, Local
>>  clu02                           2 Offline
>>
>>
>> The service section is missing from this output.
>> If I try to restart rgmanager, the log shows:
>>
>> Jun 28 04:02:08 clu01 syslogd 1.4.1: restart.
>> Jul  1 13:37:31 clu01 kernel: dlm: Using TCP for communications
>> Jul  1 13:37:31 clu01 kernel: dlm: Can't create listening comms socket
>> Jul  1 13:37:41 clu01 kernel: BUG: soft lockup - CPU#0 stuck for 10s! [clurgmgrd:13230]
>> Jul  1 13:37:41 clu01 kernel:
>> Jul  1 13:37:41 clu01 kernel: Pid: 13230, comm:            clurgmgrd
>> Jul  1 13:37:41 clu01 kernel: EIP: 0060:[<c0608d90>] CPU: 0
>> Jul  1 13:37:41 clu01 kernel: EIP is at _spin_lock+0x7/0xf
>> Jul  1 13:37:41 clu01 kernel:  EFLAGS: 00000286    Tainted: G (2.6.18-92.1.22.el5PAE #1)
>> Jul  1 13:37:41 clu01 kernel: EAX: f1d93a98 EBX: f1d93a94 ECX: 00000000 EDX: e1958000
>> Jul  1 13:37:41 clu01 kernel: ESI: f1d93a94 EDI: f1e31000 EBP: e1958ebc DS: 007b ES: 007b
>> Jul  1 13:37:41 clu01 kernel: CR0: 8005003b CR2: b7f48000 CR3: 37caef00 CR4: 000006f0
>> Jul  1 13:37:41 clu01 kernel:  [<c06080ef>] __mutex_lock_slowpath+0x19/0x7c
>> Jul  1 13:37:41 clu01 kernel:  [<c0608161>] .text.lock.mutex+0xf/0x14
>> Jul  1 13:37:41 clu01 kernel:  [<f8c2ff6b>] close_connection+0x11/0x5a [dlm]
>> Jul  1 13:37:41 clu01 kernel:  [<f8c308fd>] dlm_lowcomms_start+0x53e/0x59c [dlm]
>> Jul  1 13:37:41 clu01 kernel:  [<c06076a4>] schedule+0x920/0x9cd
>> Jul  1 13:37:41 clu01 kernel:  [<f8c2e879>] dlm_new_lockspace+0x87/0x742 [dlm]
>> Jul  1 13:37:41 clu01 kernel:  [<f8c33d38>] device_write+0x310/0x4b6 [dlm]
>> Jul  1 13:37:41 clu01 kernel:  [<f8c33a28>] device_write+0x0/0x4b6 [dlm]
>> Jul  1 13:37:41 clu01 kernel:  [<c0470283>] vfs_write+0xa1/0x143
>> Jul  1 13:37:41 clu01 kernel:  [<c0470875>] sys_write+0x3c/0x63
>> Jul  1 13:37:41 clu01 kernel:  [<c0404eff>] syscall_call+0x7/0xb
>> Jul  1 13:37:41 clu01 kernel:  =======================
>>
>>
>>
>> If I edit cluster.conf, put back the old IP address
>> (10.43.100.203) and create a new initrd, rgmanager works well.
>> The failure occurs even when the new address is in the same
>> 10.43.100 subnet. In practice, it seems to work only with the
>> exact IP address the cluster was originally created with!
>>
>>
>> Thanks.
>>
>>
>>

Ing. Stefano Elmopi
Gruppo Darco - Resp. ICT Sistemi
Via Ostiense 131/L Corpo B, 00154 Roma

cell. 3466147165
tel.  0657060500
email:ste...@so...