From: Rommer <ro...@ac...> - 2012-01-28 00:21:58
Hello,

The cluster with patched network drivers went down again within a week of
patching. Another cluster, with sendpage simply disabled (network drivers
unpatched), has been up for 16 days.

It seems that the disable_sendpage parameter of the drbd module fixes the
problem. This should probably be noted in the wiki or in the README for
setting up iscsi-scst on top of drbd.

On 01/10/2012 06:15 AM, Vladislav Bolkhovitin wrote:
> Rommer, on 01/09/2012 10:36 AM wrote:
>> Hello,
>>
>> Unfortunately it happened again.
>>
>> I found get_page()/put_page() calls in
>> drivers/net/igb/igb_main.c and
>> drivers/net/e1000e/netdev.c
>> These drivers are in use.
>>
>> Do I need to convert them too?
>
> Hmm, if they are touching skb pages on the TX side, then, apparently, yes, but
> this is the first time I'm hearing of network drivers doing it, hence needing this care.
>
> I've just checked: in 3.1 they are, as expected, doing it only on the RX side. There
> must be other places missed, like in pages going from DRBD, e.g., in the block stack.
>
>> On 01/06/12 06:45, Rommer wrote:
>>> Hello,
>>>
>>> On 01/06/2012 05:37 AM, Vladislav Bolkhovitin wrote:
>>>> Rommer, on 01/05/2012 08:40 PM wrote:
>>>>> Hello,
>>>>>
>>>>> I'm using scst-2.1.0 on rhel-6.1 on multiple servers as iSCSI targets.
>>>>> I'm getting the kernel panic (attached) every few weeks on servers that
>>>>> are running scst on top of drbd. On servers that don't use drbd the kernel
>>>>> panic has never happened.
>>>>>
>>>>> I found two calls of put_page() in the drbd-8.3.12 sources. Do I need to
>>>>> convert them to net_put_page()?
>>>>
>>>> Yes. Apparently, they were missed.
>>>>
>>>
>>> Thank you. One of the put_page() calls refers to block IO
>>> (in drbd_bitmap.c), while the second one refers to the network stack
>>> (drbd_receiver.c). I'll try to replace the second call with net_put_page()
>>> and post the results in a few weeks.
>>>
>>>>> scst-panic.txt
>>>>>
>>>>> BUG: unable to handle kernel NULL pointer dereference at 00000000000000c4
>>>>> IP: [<ffffffffa0385832>] iscsi_get_page_callback+0x12/0x30 [iscsi_scst]
>>>>> PGD 12f414067 PUD 0
>>>>> Oops: 0002 [#1] SMP
>>>>> last sysfs file: /sys/devices/pci0000:00/0000:00:1c.0/0000:05:00.0/host1/target1:1:40/1:1:40:0/scsi_generic/sg68/dev
>>>>> CPU 1
>>>>> Modules linked in: scst_vdisk(U) iscsi_scst(U) scst(U) ext2 mbcache libcrc32c crc32c_intel raid0 drbd(U) bonding ipv6 iptable_nat nf_nat nf_conntrack_ipv4 nf_conntrack nf_defrag_ipv4 iptable_filter ip_tables ses enclosure sg dm_mod sd_mod crc_t10dif e1000e aacraid usb_storage serio_raw i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support igb snd_hda_intel snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm snd_timer snd soundcore snd_page_alloc ioatdma dca i7core_edac edac_core shpchp [last unloaded: scst]
>>>>> Pid: 5922, comm: iscsiwr27_19 Not tainted 2.6.32-131.21.1.el6.scst.x86_64 #1 X8DTL
>>>>> RIP: 0010:[<ffffffffa0385832>] [<ffffffffa0385832>] iscsi_get_page_callback+0x12/0x30 [iscsi_scst]
>>>>> RSP: 0018:ffff88006f6f1c50 EFLAGS: 00010286
>>>>> RAX: 0000000000000000 RBX: ffffea0001901a40 RCX: 0000000000001000
>>>>> RDX: 0000000000000001 RSI: 00000000000117a0 RDI: ffffea0001901a40
>>>>> RBP: ffff88006f6f1c50 R08: 0000000000000001 R09: 0000000000000000
>>>>> R10: ffffea0001901a40 R11: 0000000000000001 R12: ffff880063580d40
>>>>> R13: ffff880132928d40 R14: 0000000000000000 R15: 0000000000001000
>>>>> FS: 0000000000000000(0000) GS:ffff88002c620000(0000) knlGS:0000000000000000
>>>>> CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
>>>>> CR2: 00000000000000c4 CR3: 0000000102d5b000 CR4: 00000000000006e0
>>>>> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>>>>> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
>>>>> Process iscsiwr27_19 (pid: 5922, threadinfo ffff88006f6f0000, task ffff880060aa6b00)
>>>>> Stack:
>>>>> ffff88006f6f1c70 ffffffff8145e7ce ffff880063580d40 0000000000000000
>>>>> <0> ffff88006f6f1d20 ffffffff8145f795 ffff880060aa6b00 ffff880100000000
>>>>> <0> ffff880000000001 0000000000000001 ffff880000001000 ffffea0001901a40
>>>>> Call Trace:
>>>>> [<ffffffff8145e7ce>] net_get_page+0x1e/0x30
>>>>> [<ffffffff8145f795>] tcp_sendpage+0x485/0x580
>>>>> [<ffffffffa03863ca>] iscsi_send+0x87a/0xca0 [iscsi_scst]
>>>>> [<ffffffff8145f310>] ? tcp_sendpage+0x0/0x580
>>>>> [<ffffffff8145f310>] ? tcp_sendpage+0x0/0x580
>>>>> [<ffffffffa038695b>] istwr+0x16b/0x2f0 [iscsi_scst]
>>>>> [<ffffffff8105d910>] ? default_wake_function+0x0/0x20
>>>>> [<ffffffffa03867f0>] ? istwr+0x0/0x2f0 [iscsi_scst]
>>>>> [<ffffffff8108daf6>] kthread+0x96/0xa0
>>>>> [<ffffffff8100c1ca>] child_rip+0xa/0x20
>>>>> [<ffffffff8108da60>] ? kthread+0x0/0xa0
>>>>> [<ffffffff8100c1c0>] ? child_rip+0x0/0x20
>>>>> Code: 8b 45 c8 48 89 c7 e8 3e 72 15 e1 4c 89 e7 e8 56 70 15 e1 e9 a7 fe ff ff 90 55 48 89 e5 0f 1f 44 00 00 ba 01 00 00 00 48 8b 47 38 <f0> 0f c1 90 c4 00 00 00 85 d2 75 07 f0 ff 80 c0 00 00 00 c9 c3
>>>>> RIP [<ffffffffa0385832>] iscsi_get_page_callback+0x12/0x30 [iscsi_scst]
>>>>> RSP <ffff88006f6f1c50>
>>>>> CR2: 00000000000000c4
>>>>> ---[ end trace 605bbcb9f76d2d0c ]---
>>>>> Kernel panic - not syncing: Fatal exception
>>>>> Pid: 5922, comm: iscsiwr27_19 Tainted: G D ---------------- 2.6.32-131.21.1.el6.scst.x86_64 #1
>>>>> Call Trace:
>>>>> [<ffffffff814d99d3>] ? panic+0x78/0x143
>>>>> [<ffffffff814dda24>] ? oops_end+0xe4/0x100
>>>>> [<ffffffff81040bab>] ? no_context+0xfb/0x260
>>>>> [<ffffffff81040e35>] ? __bad_area_nosemaphore+0x125/0x1e0
>>>>> [<ffffffff81040f03>] ? bad_area_nosemaphore+0x13/0x20
>>>>> [<ffffffff8104159d>] ? __do_page_fault+0x31d/0x480
>>>>> [<ffffffff814dca6b>] ? _spin_unlock_bh+0x1b/0x20
>>>>> [<ffffffff8140d200>] ? sock_aio_write+0x0/0x160
>>>>> [<ffffffff8117136b>] ? do_sync_readv_writev+0xfb/0x140
>>>>> [<ffffffff814df9de>] ? do_page_fault+0x3e/0xa0
>>>>> [<ffffffff814dcd95>] ? page_fault+0x25/0x30
>>>>> [<ffffffffa0385832>] ? iscsi_get_page_callback+0x12/0x30 [iscsi_scst]
>>>>> [<ffffffff8145e7ce>] ? net_get_page+0x1e/0x30
>>>>> [<ffffffff8145f795>] ? tcp_sendpage+0x485/0x580
>>>>> [<ffffffffa03863ca>] ? iscsi_send+0x87a/0xca0 [iscsi_scst]
>>>>> [<ffffffff8145f310>] ? tcp_sendpage+0x0/0x580
>>>>> [<ffffffff8145f310>] ? tcp_sendpage+0x0/0x580
>>>>> [<ffffffffa038695b>] ? istwr+0x16b/0x2f0 [iscsi_scst]
>>>>> [<ffffffff8105d910>] ? default_wake_function+0x0/0x20
>>>>>
>>>>> _______________________________________________
>>>>> Scst-devel mailing list
>>>>> https://lists.sourceforge.net/lists/listinfo/scst-devel
>>>>
>>>
>>
>

--
Best regards,
Roman Shishnev, CTO | ActiveCloud | http://www.active.by
T +375 17 2 911 511 ext. 308 | ro...@ac...
Cloud solutions | Servers and infrastructure | IaaS | SaaS | Hosting
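
Note on checking the workaround: the message above reports that the disable_sendpage parameter of the drbd module avoids the panic. A minimal sketch for verifying that the parameter is actually in effect on a node might look like the following. It assumes the parameter is exposed under /sys/module/drbd/parameters/ (the usual place for module parameters, not confirmed in this thread), and the helper name sendpage_disabled is hypothetical. Setting the parameter persistently is typically done with an "options drbd disable_sendpage=1" line in a modprobe configuration file; treat that line as an assumption as well and check it against your DRBD version.

```python
from pathlib import Path

# Assumed sysfs location of the DRBD module parameter (not stated in the thread).
DEFAULT_PARAM = Path("/sys/module/drbd/parameters/disable_sendpage")


def sendpage_disabled(param_path=DEFAULT_PARAM):
    """Return True if sendpage is disabled, False if it is enabled,
    or None if the parameter file does not exist (e.g. drbd not loaded)."""
    path = Path(param_path)
    if not path.exists():
        return None
    # Boolean module parameters are normally reported as Y/N; accept 1/0 too.
    value = path.read_text().strip()
    return value in ("Y", "y", "1")


if __name__ == "__main__":
    state = sendpage_disabled()
    if state is None:
        print("disable_sendpage not found: is the drbd module loaded?")
    elif state:
        print("sendpage is disabled for drbd")
    else:
        print("sendpage is still enabled for drbd")
```

Running the script on each target node after a reboot is one way to confirm the setting survived; a result of None most likely means the drbd module is not loaded.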