Thread: [SSI-devel] Regarding DRBD on 1.9

Hi Roger,
DRBD on 1.9 (built from OPENSSI-DEBIAN, May 27 12:58) together with the
kernel (built from OPENSSI-DEBIAN, May 27 12:58) results in a kernel oops
in the function sock_recvmsg. Until yesterday I could reproduce it
consistently (with the kernel and DRBD built on May 25), but it suddenly
disappeared when I built new versions of the drbdadm and drbdsetup
commands and copied them to the test system. After a few setup reboots,
it oopsed again! Today, with the new kernel and DRBD module, it oopses
consistently. Of course, I also built new drbdadm and drbdsetup commands
and copied them to the test system.

I have attached the console messages at the end of this mail.

Looking at the DRBD code and adding a few debugging messages, it is
clear that the following mdev (drbd_dev) fields are somehow being
corrupted or interchanged:
mdev->conf.my_addr_len = 1
mdev->conf.other_addr_len = 2
mdev->conf.this_nodenum = 16
mdev->conf.other_nodenum = 16
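The values above look like two pairs of fields trading places: the node numbers (1 and 2) end up in the address lengths, and the address length (16) ends up in the node numbers. One mechanism that produces exactly this pattern is a layout mismatch between the structure drbdsetup fills in and the one the kernel module was compiled with, for instance if they were built from different versions of a shared header. This is only one hypothesis. The sketch below is purely illustrative: `struct conf_user`, `struct conf_kernel` and `receive_conf` are invented names, not DRBD's real definitions, and the `memcpy` stands in for an ioctl-style copy from user space.

```c
#include <assert.h>
#include <string.h>

/* HYPOTHETICAL layout used by the user-space tool (not DRBD's real header). */
struct conf_user {
    int this_nodenum;
    int other_nodenum;
    int my_addr_len;
    int other_addr_len;
};

/* HYPOTHETICAL layout compiled into the module: same fields, different order. */
struct conf_kernel {
    int my_addr_len;
    int other_addr_len;
    int this_nodenum;
    int other_nodenum;
};

/* Simulates the kernel copying raw bytes from user space and
   reinterpreting them with its own struct definition. */
struct conf_kernel receive_conf(const struct conf_user *u)
{
    struct conf_kernel k;

    /* Sizes match, so nothing in the copy itself flags the mismatch. */
    assert(sizeof k == sizeof *u);
    memcpy(&k, u, sizeof k);
    return k;
}
```

Filling a `conf_user` with node numbers 1 and 2 and address lengths of 16, then passing it to `receive_conf`, yields my_addr_len = 1, other_addr_len = 2, this_nodenum = 16 and other_nodenum = 16, the same values as in the debug output above. If this is the cause, a first check would be that drbdsetup, drbdadm and the module were all built from the same header.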

The function drbd_wait_for_connect cannot complete the bind and fails
with error -22 (EINVAL). Because of this, the connect in
drbd_try_connect fails (this is expected). The bind fails because the
address is somehow wrong: my_addr_len and other_addr_len hold 1 and 2
respectively instead of 16 (the proper length), while the node numbers
this_nodenum and other_nodenum show 16 instead of 1 and 2
respectively. Since the address length field is 1, bind may be seeing
only the first or second character of the address, which would result
in the bind failure.
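In the Linux IPv4 implementation, bind() on an AF_INET socket rejects any address length shorter than sizeof(struct sockaddr_in), which is 16 bytes, and returns EINVAL, i.e. the -22 reported here; the call is refused outright rather than reading a truncated address. The following user-space sketch illustrates this, assuming a Linux host with a loopback interface. `try_bind_with_len` is a hypothetical helper, not part of DRBD; it binds a TCP socket to 127.0.0.1 with an ephemeral port while claiming a given address length.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: bind a TCP socket to 127.0.0.1 (ephemeral port)
   while telling the kernel the address is only `addrlen` bytes long.
   Returns 0 on success or -errno, mirroring the kernel's error convention. */
int try_bind_with_len(socklen_t addrlen)
{
    struct sockaddr_in sin;
    int fd, rc;

    /* Never claim more bytes than the buffer actually holds. */
    assert(addrlen <= sizeof sin);

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -errno;

    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_port = htons(0);                  /* let the kernel pick a port */
    sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);

    rc = bind(fd, (struct sockaddr *)&sin, addrlen);
    if (rc < 0)
        rc = -errno;                          /* -EINVAL (-22) for a too-short length */
    close(fd);
    return rc;
}
```

With this helper, try_bind_with_len(1) fails with -EINVAL, while try_bind_with_len(sizeof(struct sockaddr_in)) succeeds, which is consistent with the bad length fields being the direct cause of the bind error.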

Once it started working yesterday (after copying the new drbdadm and
drbdsetup commands), I later copied the old commands back, but it was
still working! So I am not even convinced it is a problem with the
commands. When I built the new kernel and DRBD module today, I again hit
the same problem consistently, with both the new and the old commands.

Does it have anything to do with a race? Any idea about the relationship
between drbdsetup and the initialization of DRBD? Or anything to do with
alignment? Any timing issues? My understanding is that at this stage of
bootup drbdsetup should not have any problem.

Your input would be very helpful.

Thanks and regards,
Gopal.

====================== CONSOLE MESSAGES ======================
drbd: module cleanup done.
modprobe -k drbd  minor_count=1
drbd: initialised. Version: 0.7.10 (api:77/proto:74)
drbd: SVN Revision: 1743 build by go...@ha..., 2005-05-27 12:58:01
drbd: registered as block device major 147
Starting DRBD resource:
drbd0: resync bitmap: bits=1151992 words=36000
drbd0: size = 4499 MB (4607968 KB)
drbd0: 4499 MB marked out-of-sync by on disk bit-map.
drbd0: Found 6 transactions (230 active extents) in activity log.
drbd0: Marked additional 0 KB as out-of-sync based on AL.
drbd0: drbdsetup [66084]: cstate Unconfigured --> StandAlone
drbd0: drbdsetup [66087]: cstate StandAlone --> Unconnected
drbd0: drbd0_receiver [66088]: cstate Unconnected --> WFConnection
drbd0: Unable to bind (-22)
# WARNING: Do not type 'yes' while waiting for DRBD connection
#         unless you know what you are doing!  You have been warned!
#         The only exception is when setting up DRBD first time.
#
drbd0: Registering drbd0 with CLMS subsystem
drbd0: drbd0_receiver [66088]: cstate WFConnection --> Unconnected
drbd0: Trying to connect again.
Unable to handle kernel NULL pointer dereference at virtual address 00000008
  printing eip:
c039ae5c
*pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: drbd sd_mod sym53c8xx scsi_transport_spi cciss scsi_mod tg3 eepro100 e100 mii
CPU:    1
EIP:    0060:[<c039ae5c>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-ssi-686-smp)
EIP is at sock_recvmsg+0xac/0xf0
eax: f7036f20   ebx: 00000000   ecx: f71157f0   edx: 00000008
esi: 00004100   edi: f7036e74   ebp: f7036f00   esp: f7036e10
ds: 007b   es: 007b   ss: 0068
Process drbd0_receiver (pid: 66088, threadinfo=f7036000 task=f71157f0)
Stack: 000000fc 00000000 f7036e7c ffffffff c0465060 20008790 36303838 00004100
       00000008 00000000 00000000 00000000 f7036f20 f712a040 f7036e60 c03db8da
       f712a040 f712a150 f712a040 f712a040 00000282 c03e108a f712a040 f712a040
Call Trace:
  [<c010687f>] show_stack+0x7f/0xa0
  [<c0106a34>] show_registers+0x164/0x220
  [<c0106dc4>] die+0xf4/0x1c0
  [<c011f325>] do_page_fault+0x375/0x695
  [<c01064d3>] error_code+0x2b/0x30
  [<f898e0e9>] drbd_recv+0x89/0x190 [drbd]
  [<f898e99a>] drbd_recv_header+0x2a/0xf0 [drbd]
  [<f8991f2c>] drbdd+0x2c/0x160 [drbd]
  [<f8992b18>] drbdd_init+0x78/0x410 [drbd]
  [<f8998e3e>] drbd_thread_setup+0x7e/0xf0 [drbd]
  [<c01022e5>] kernel_thread_helper+0x5/0x10
Code: 85 24 ff ff ff 89 45 e8 31 c0 89 85 3c ff ff ff 8b 45 0c 89 95 30 ff ff ff 89 9d 34 ff ff ff 89 85 40 ff ff ff 89 b5 2c ff ff ff <8b> 43 08 89 74 24 10 89 54 24 0c 8b 55 0c 89 5c 24 04 89 3c 24

Entering kdb (current=0xf71157f0, pid 66088) on processor 1 Oops: Oops due to oops @ 0xc039ae5c
eax = 0xf7036f20 ebx = 0x00000000 ecx = 0xf71157f0 edx = 0x00000008
esi = 0x00004100 edi = 0xf7036e74 esp = 0xf7036e10 eip = 0xc039ae5c
ebp = 0xf7036f00 xss = 0xc0390068 xcs = 0x00000060 eflags = 0x00010246
xds = 0x0000007b xes = 0x0000007b origeax = 0xffffffff &regs = 0xf7036ddc
[1]kdb>
