Thread: [SSI-devel] OpenSSI/drbd on Debian
From: John H. <john@Calva.COM> - 2007-02-13 12:34:23
I've been trying drbd out on my pre-1.9.3 system. I can get it to work sometimes, but it seems very flaky: it almost always crashes on a node-up, usually with a stack overflow in do_IRQ on the root node. Any ideas about how to debug this?
From: mandrake (G. Harrison) <man...@ma...> - 2007-02-13 14:38:13
Outside of OpenSSI I have not had any problems with drbd under Debian, as long as I don't try to compile the drbd module from the sources in the Debian repository (which refuse to build). I have not yet tried it under OpenSSI.
From: Roger T. <rog...@gm...> - 2007-02-13 17:20:32
Which DRBD version? Can we see the stack trace(s)? Type bt in kdb. I assume you are SMP.

I haven't run into DRBD-related problems for some time now. I'm running the latest OPENSSI-FC kernel and openssi/drbd from the same CVS branch, so maybe it's isolated to Debian.

Roger
From: John H. <john@Calva.COM> - 2007-02-13 17:35:34
Roger Tsang wrote:
> Which DRBD version? Can we see the stack trace(s)? Type bt in kdb.
> I assume you are SMP.

CVS + latest kernel from CVS.

No stack trace: it starts to print it, then hangs. I can't get into kdb.

Yup, it's SMP.

I'm trying to rule out hardware problems at the moment (I also saw an NMI).
From: Roger T. <rog...@gm...> - 2007-02-13 19:44:07
Maybe I'm just poking in the air, but the network card drivers in the kernel are old.
From: John H. <john@Calva.COM> - 2007-02-15 14:12:03
John Hughes wrote:
> I'm trying to rule out hardware problems at the moment (I also saw a NMI).

Ran memtest86+ => no problems.

Built a kernel with 8k stacks => failover works with no problems.

For my next trick I'll build it with 8k stacks but increase STACK_WARN to see where it's overflowing.
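A rough Python model of the check being tuned here may help explain why an 8k-stack kernel stays silent until STACK_WARN is raised. It is a sketch under stated assumptions, not kernel code: it assumes the i386 CONFIG_DEBUG_STACKOVERFLOW test in do_IRQ fires when the free space above thread_info drops below STACK_WARN, and that STACK_WARN defaults to THREAD_SIZE/8 (1 KB on an 8k stack). The function names and the 5000-byte threshold are hypothetical.

```python
# Simplified model of the CONFIG_DEBUG_STACKOVERFLOW test in i386 do_IRQ.
# Assumptions (not taken from the thread): STACK_WARN defaults to
# THREAD_SIZE // 8, and free_bytes is the number do_IRQ prints, i.e. the
# space left between the stack pointer and the thread_info block.


def default_stack_warn(thread_size: int) -> int:
    """Assumed default warning threshold: one eighth of the kernel stack."""
    return thread_size // 8


def would_warn(free_bytes: int, thread_size: int = 8192, stack_warn=None) -> bool:
    """Return True if do_IRQ would print 'do_IRQ: stack overflow: <free_bytes>'."""
    if stack_warn is None:
        stack_warn = default_stack_warn(thread_size)
    return free_bytes < stack_warn


if __name__ == "__main__":
    # 4872 bytes free on an 8k stack: the default 1 KB threshold stays silent...
    print(would_warn(4872))                   # False
    # ...so the threshold must be raised above 4872 (5000 is hypothetical).
    print(would_warn(4872, stack_warn=5000))  # True
```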
From: John H. <john@Calva.COM> - 2007-02-15 15:14:52
OK, here's my first stack overflow check. It had 4872 bytes free (i.e. 680 bytes on a 4k stack system):

drbd0: Resync started as SyncSource (need to sync 327680 KB [81920 bits set]).
do_IRQ: stack overflow: 4872
[<c01068be>] dump_stack+0x1e/0x30
[<c01080a8>] do_IRQ+0x68/0x70
[<c0106352>] common_interrupt+0x1a/0x20
[<c03ca024>] ip_finish_output2+0xb4/0x1d0
[<c03b29a5>] nf_hook_slow+0xc5/0x100
[<c03c7919>] ip_finish_output+0x209/0x210
[<c03c7b49>] ip_output+0x59/0x80
[<c03c7f41>] ip_queue_xmit+0x3d1/0x560
[<c03d9bc8>] tcp_transmit_skb+0x428/0x700
[<c03da938>] tcp_write_xmit+0x168/0x2f0
[<c03d7879>] __tcp_data_snd_check+0xe9/0x100
[<c03d8119>] tcp_rcv_established+0x4a9/0x8f0
[<c03e10aa>] tcp_v4_do_rcv+0x12a/0x130
[<c039f757>] __release_sock+0x57/0x80
[<c039ff58>] release_sock+0x78/0x80
[<c03cdc85>] tcp_sendmsg+0x495/0x1250
[<c03ef8fb>] inet_sendmsg+0x4b/0x60
[<c039be71>] sock_sendmsg+0xd1/0x100
[<c039bee2>] kernel_sendmsg+0x42/0x50
[<f08e3a03>] drbd_send+0xb3/0x260 [drbd]
[<f08e3438>] drbd_send_dblock+0x268/0x3e0 [drbd]
[<f08dd0ab>] drbd_make_request_common+0x4bb/0x8f0 [drbd]
[<f08dd5c9>] drbd_make_request_26+0xe9/0x2a0 [drbd]
[<c032cdfc>] generic_make_request+0x19c/0x250
[<c032cf1f>] submit_bio+0x6f/0x120
[<c01704b9>] submit_bh+0x149/0x1b0
[<c016eaa0>] __block_write_full_page+0x1b0/0x3b0
[<c017029e>] block_write_full_page+0xfe/0x120
[<f0a10734>] ext3_ordered_writepage+0xd4/0x1d0 [ext3]
[<c019473c>] mpage_writepages+0x26c/0x3b0
[<c0151262>] do_writepages+0x42/0x50
[<c014a73f>] __filemap_fdatawrite_range+0x9f/0xb0
[<c014a7c6>] filemap_fdatawrite_range+0x36/0x40
[<c014aaf1>] sync_page_range_nolock+0x91/0xd0
[<c014d6c1>] generic_file_aio_write_nolock+0x91/0xa0
[<c014d8ca>] generic_file_aio_write+0x9a/0x110
[<f0a0d6cf>] ext3_file_write+0x3f/0xd0 [ext3]
[<c016b743>] do_sync_write+0xa3/0xd0
[<c028b335>] cfsd_write+0x105/0x190
[<c0288e62>] rcfs_write+0x62/0xc0
[<c029756a>] svr_rcfs_write+0xda/0x170
[<c0207b05>] icssvr_daemon+0x2e5/0xa90
[<c01022e5>] kernel_thread_helper+0x5/0x10
From: John H. <john@Calva.COM> - 2007-02-16 09:59:32
John Hughes wrote:
> Built a kernel with 8k stacks => failover works with no problems.

Duh: this is already a known bug, 1367582 "crashes with 4KSTACKS" <https://sourceforge.net/tracker/index.php?func=detail&aid=1367582&group_id=32541&atid=405834>, opened 2005-11-27 by catucci, priority 5, still Open and assigned to nobody.
From: John H. <john@Calva.COM> - 2007-02-15 16:03:36
Here's another stack trace, this time captured from kdb:

0xdc31539c 0xc01068a0 dump_stack (0xc048446f, 0x1358, 0xdda60878, 0xdc138bb4)
0xc01080a8 do_IRQ+0x68 (0xdda60878, 0xdda60840, 0xdc138a14, 0xdc138bb4, 0xdc1389a0)
0xdc315408 0xc0106352 common_interrupt+0x1a (0xdc1389a0, 0xdda69320, 0xdc1389c0, 0xdc1389a0, 0xdc1389c0)
0xc039f757 __release_sock+0x57 (0xdc1389a0, 0x1000, 0xdc1389a0, 0x1000)
0xdc3154d0 0xc039ff58 release_sock+0x78 (0xdc1389a0, 0xdc315504, 0x1000, 0x0, 0x0)
0xdc3154f8 0xc03cd7e2 tcp_sendpage+0x82 (0xde9903e0, 0xc1479f50, 0x0, 0x1000, 0x4000)
0xdc31552c 0xe096714a [drbd]_drbd_send_page+0xfa (0xde997800, 0xc1479f50, 0x0, 0x1000, 0x8000)
0xdc315590 0xe0967484 [drbd]drbd_send_dblock+0x2b4 (0xde997800, 0xde8375fc, 0x0, 0xdf659340, 0xdc03d060)
0xdc315610 0xe09610ab [drbd]drbd_make_request_common+0x4bb (0xde997800, 0x1, 0x1000, 0x240260, 0x0)
0xdc315640 0xe09615c9 [drbd]drbd_make_request_26+0xe9 (0xdee0108c, 0xdc064500, 0xdc315680, 0xe08d12d7, 0x240260)
0xdc3156cc 0xc032cdfc generic_make_request+0x19c (0xdc064500, 0xdc3156d8, 0xdc3156d8, 0xdca27d24, 0xdca27d2c)
0xdc31571c 0xc032cf1f submit_bio+0x6f (0x1, 0xdc064500, 0x240260, 0x0, 0x8)
0xdc315750 0xc01704b9 submit_bh+0x149 (0x1, 0xdca1edb0, 0xdc315848, 0x51000, 0x4)
0xdc31578c 0xc016eaa0 __block_write_full_page+0x1b0 (0xdfdab0bc, 0xc1479f50, 0xc0174390, 0xdc3158cc, 0x0)
0xdc3157b8 0xc017029e block_write_full_page+0xfe (0xc1479f50, 0xc0174390, 0xdc3158cc)
0xdc3157cc 0xc0174521 blkdev_writepage+0x21 (0xc1479f50, 0xdc3158cc, 0xdc315808, 0x0, 0xe)
0xdc315864 0xc019473c mpage_writepages+0x26c (0xdfdab170, 0xdc3158cc, 0x0)
0xdc315878 0xc017579e generic_writepages+0x1e (0xdfdab170, 0xdc3158cc, 0xdee66a00)
0xdc31588c 0xc0151246 do_writepages+0x26 (0xdfdab170, 0xdc3158cc, 0x0, 0x1, 0x0)
0xdc315908 0xc014a73f __filemap_fdatawrite_range+0x9f (0xdfdab170, 0x0, 0x0, 0x0, 0x0)
0xdc315928 0xc014a782 filemap_fdatawrite+0x32 (0xdfdab170, 0x0, 0x0, 0x0, 0x0)
0xdc315964 0xc0290dac cfstok_objrevoke+0x9c (0xdc3159f4, 0x4, 0x2, 0x1, 0x0)
0xdc31599c 0xc0277a96 revoke_internal+0xd6 (0xdc3159f4, 0xdf8ba160, 0x1, 0x2, 0xde807000)
0xdc3159d4 0xc0277968 tok_revoke+0xb8 (0xdc3159f4, 0xdeb395bc, 0x4, 0x2, 0x2)
0xdc315a04 0xc0290490 _cfstok_revoke+0x50 (0xdeb39664, 0x1, 0x4, 0x2, 0x0)
0xdc315a48 0xc0290538 cfstok_revoke+0x98 (0xdeb39664, 0x1, 0x4, 0x2, 0x0)
0xdc315a9c 0xc028d277 cfs_tokmsg+0x397 (0xdc315b50, 0x1, 0x2002, 0x4, 0x2)
0xdc315ae4 0xc028cdb7 _cfs_tokmsg+0x77 (0xdc315b50, 0xdf659000, 0xdc315b18, 0xc03b8718, 0xdfc347e0)
0xdc315b1c 0xc027e097 tokseq_accept+0x167 (0xdf8ba680, 0x2, 0x905, 0xc028cd40, 0xdc315b50)
0xdc315b44 0xc028ce1f _cfs_tokmsg_seq+0x5f (0xdc315b50, 0x2f, 0x2, 0x1, 0x30d)
0xdc315be8 0xc028cecf cfs_tokmsg_seq+0x9f (0xdf8898e0, 0x1, 0x2002, 0x4, 0x2)
0xdc315d30 0xc028d5b1 svrcfstok_send+0x211 (0xdc315d88, 0x2002, 0x0, 0x0, 0x0)
0xdc315d10 0xc027d179 revoke+0x99 (0xdc315d88, 0xdf889900, 0x1, 0xdf3a53e0, 0xdc314000)
0xdc315d34 0xc027cef0 request_internal+0xe0 (0xdf3a53e0, 0x905, 0x2002, 0x4, 0x2)
0xdc315d6c 0xc0278f59 process_msgs+0x129 (0x0, 0xdf889900, 0x0, 0x2, 0x1)
0xdc315da4 0xc028d87c svrcfstok_request_range+0xac (0xdf8898e0, 0x1, 0x0, 0x2, 0x1)
0xdc315df8 0xc028cf8d cfs_tokmsg+0xad (0xdc15aa80, 0x1, 0x1000, 0x0, 0x2)
0xdc315e40 0xc028cdb7 _cfs_tokmsg+0x77 (0xdc15aa80, 0xdc315ed4, 0xdc315e6c, 0xc0294a23, 0xdc315ed4)
0xdc315e78 0xc027e097 tokseq_accept+0x167 (0xdf8ba680, 0x1, 0xf, 0xc028cd40, 0xdc15aa80)
0xdc315ea0 0xc028ce1f _cfs_tokmsg_seq+0x5f (0xdc15aa80)
0xdc315eac 0xc02898e1 cfsd_proc_tokmsg_0+0x11 (0xdc15aa80, 0x0, 0xdc15aa80, 0xdc00bc20, 0xdedae440)
0xdc315f00 0xc020c17a icsnsc_rpc_dispatch+0x1aa (0xa0001, 0xd, 0xdc315f38, 0xdc00bc20, 0xdc315f34)
0xdc315f6c 0xc020bb74 svr_icsnsc_rcall+0xc4

What dump_stack shows for the same one (not quite the same!):
[<c01068be>] dump_stack+0x1e/0x30
[<c01080a8>] do_IRQ+0x68/0x70
[<c0106352>] common_interrupt+0x1a/0x20
[<c03d52b8>] tcp_ack+0xc8/0x5b0
[<c03d80f8>] tcp_rcv_established+0x488/0x8f0
[<c03e10aa>] tcp_v4_do_rcv+0x12a/0x130
[<c039f757>] __release_sock+0x57/0x80
[<c039ff58>] release_sock+0x78/0x80
[<c03cd7e2>] tcp_sendpage+0x82/0x90
[<e096714a>] _drbd_send_page+0xfa/0x180 [drbd]
[<e0967484>] drbd_send_dblock+0x2b4/0x3e0 [drbd]
[<e09610ab>] drbd_make_request_common+0x4bb/0x8f0 [drbd]
[<e09615c9>] drbd_make_request_26+0xe9/0x2a0 [drbd]
[<c032cdfc>] generic_make_request+0x19c/0x250
[<c032cf1f>] submit_bio+0x6f/0x120
[<c01704b9>] submit_bh+0x149/0x1b0
[<c016eaa0>] __block_write_full_page+0x1b0/0x3b0
[<c017029e>] block_write_full_page+0xfe/0x120
[<c0174521>] blkdev_writepage+0x21/0x30
[<c019473c>] mpage_writepages+0x26c/0x3b0
[<c017579e>] generic_writepages+0x1e/0x22
[<c0151246>] do_writepages+0x26/0x50
[<c014a73f>] __filemap_fdatawrite_range+0x9f/0xb0
[<c014a782>] filemap_fdatawrite+0x32/0x40
[<c0290dac>] cfstok_objrevoke+0x9c/0x180
[<c0277a96>] revoke_internal+0xd6/0x100
[<c0277968>] tok_revoke+0xb8/0xe0
[<c0290490>] _cfstok_revoke+0x50/0x60
[<c0290538>] cfstok_revoke+0x98/0xd0
[<c028d277>] cfs_tokmsg+0x397/0x460
[<c028cdb7>] _cfs_tokmsg+0x77/0x80
[<c027e097>] tokseq_accept+0x167/0x310
[<c028ce1f>] _cfs_tokmsg_seq+0x5f/0x70
[<c028cecf>] cfs_tokmsg_seq+0x9f/0xb0
[<c028d5b1>] svrcfstok_send+0x211/0x2a0
[<c0278f59>] process_msgs+0x129/0x180
[<c028d87c>] svrcfstok_request_range+0xac/0xc0
[<c028cf8d>] cfs_tokmsg+0xad/0x460
[<c028cdb7>] _cfs_tokmsg+0x77/0x80
[<c027e097>] tokseq_accept+0x167/0x310
[<c028ce1f>] _cfs_tokmsg_seq+0x5f/0x70
[<c02898e1>] cfsd_proc_tokmsg_0+0x11/0x20
[<c020c17a>] icsnsc_rpc_dispatch+0x1aa/0x220
[<c020bb74>] svr_icsnsc_rcall+0xc4/0x150
[<c0207b05>] icssvr_daemon+0x2e5/0xa90
[<c01022e5>] kernel_thread_helper+0x5/0x10
From: John H. <john@Calva.COM> - 2007-02-15 16:27:06
From the second trace, the big users are (stack bytes per frame):

100 [drbd]_drbd_send_page+0xfa (0xde997800, 0xc1479f50, 0x0, 0x1000, 0x8000)
108 dump_stack (0xc048446f, 0x1358, 0xdda60878, 0xdc138bb4)
108 icsnsc_rpc_dispatch+0x1aa (0xa0001, 0xd, 0xdc315f38, 0xdc00bc20, 0xdc315f34)
124 do_writepages+0x26 (0xdfdab170, 0xdc3158cc, 0x0, 0x1, 0x0)
128 [drbd]drbd_send_dblock+0x2b4 (0xde997800, 0xde8375fc, 0x0, 0xdf659340, 0xdc03d060)
140 [drbd]drbd_make_request_26+0xe9 (0xdee0108c, 0xdc064500, 0xdc315680, 0xe08d12d7, 0x240260)
152 blkdev_writepage+0x21 (0xc1479f50, 0xdc3158cc, 0xdc315808, 0x0, 0xe)
164 _cfs_tokmsg_seq+0x5f (0xdc315b50, 0x2f, 0x2, 0x1, 0x30d)
200 common_interrupt+0x1a (0xdc1389a0, 0xdda69320, 0xdc1389c0, 0xdc1389a0, 0xdc1389c0)
328 cfs_tokmsg_seq+0x9f (0xdf8898e0, 0x1, 0x2002, 0x4, 0x2)

Nothing huge.
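The per-frame figures in this list match the differences between consecutive frame addresses in the kdb listing (for example 0xdc315590 - 0xdc31552c = 100 bytes for _drbd_send_page). A small Python sketch that automates that bookkeeping follows, assuming one kdb frame per line in the form "<frame addr> <pc> <symbol> (<args>)". frame_sizes is a hypothetical helper, not part of kdb or OpenSSI.

```python
import re

# Hypothetical helper (not part of kdb or OpenSSI): estimate per-frame stack
# usage from a kdb "bt" listing, one frame per line.
# Assumed line shape: "<frame addr> <pc> <symbol> (<args>)". Lines carrying only
# "<pc> <symbol> (...)" (e.g. do_IRQ above) have no frame address and are skipped.
FRAME_RE = re.compile(r"^(0x[0-9a-f]+)\s+(0x[0-9a-f]+)\s+(\S+)")


def frame_sizes(lines):
    """Return (bytes, symbol) pairs, largest first.

    Each frame is charged the distance to the next frame address. Non-positive
    gaps (such as svrcfstok_send followed by revoke in the trace above) are dropped.
    """
    frames = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append((int(m.group(1), 16), m.group(3)))
    sizes = [
        (nxt - addr, sym)
        for (addr, sym), (nxt, _) in zip(frames, frames[1:])
        if nxt > addr
    ]
    return sorted(sizes, key=lambda s: s[0], reverse=True)


if __name__ == "__main__":
    trace = [
        "0xdc31539c 0xc01068a0 dump_stack (0xc048446f, 0x1358, 0xdda60878, 0xdc138bb4)",
        "0xc01080a8 do_IRQ+0x68 (0xdda60878, 0xdda60840, 0xdc138a14, 0xdc138bb4, 0xdc1389a0)",
        "0xdc315408 0xc0106352 common_interrupt+0x1a (0xdc1389a0, 0xdda69320, 0xdc1389c0)",
        "0xc039f757 __release_sock+0x57 (0xdc1389a0, 0x1000, 0xdc1389a0, 0x1000)",
        "0xdc3154d0 0xc039ff58 release_sock+0x78 (0xdc1389a0, 0xdc315504, 0x1000, 0x0, 0x0)",
    ]
    for size, sym in frame_sizes(trace):
        print(size, sym)
    # 200 common_interrupt+0x1a
    # 108 dump_stack
```

Sorting by size makes the "nothing huge" conclusion easy to check: no single frame dominates, which suggests the overflow comes from the depth of the chain (block I/O re-entering the TCP receive path from release_sock) rather than from one greedy function.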