Bug 73 not resolved (ipt_NETFLOW freeze network (NETDEV WATCHDOG: eth3...
NetFlow iptables module for Linux kernel
Brought to you by:
aabc
Bug 73 not resolved.
Oct 31 17:51:53 servername kernel: [ 3788.208009] BUG: soft lockup - CPU#5 stuck for 23s! [swapper/5:0]
Oct 31 17:51:53 servername kernel: [ 3788.208010] Modules linked in: ip6table_filter(F) ip6_tables(F) xt_IMQ(F) iptable_mangle(F) xt_CT(F) iptable_raw(F) xt_nat(F) xt_mark(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_state(F) xt_tcpudp(F) ipt_NETFLOW(O) nf_conntrack(F) xt_hashlimit(F) xt_set(F) iptable_filter(F) ip_tables(F) x_tables(F) sch_sfq(F) cls_fw(F) cls_u32(F) sch_htb(F) imq(F) ip_set_hash_ip(F) ip_set_hash_net(F) ip_set(F) nfnetlink(F) bonding(F) radeon(F) kvm(F) ttm(F) drm_kms_helper(F) drm(F) gpio_ich(F) i2c_algo_bit(F) i5000_edac(F) edac_core(F) psmouse(F) lpc_ich(F) shpchp(F) microcode(F) i5k_amb(F) serio_raw(F) coretemp(F) joydev(F) mac_hid(F) dcdbas(F) lp(F) parport(F) usb_storage(F) hid_generic(F) usbhid(F) hid(F) igb(OF) dca(F) mptsas(F) mptscsih(F) mptbase(F) bnx2(F) scsi_transport_sas(F)
Oct 31 17:51:53 servername kernel: [ 3788.208010] CPU: 5 PID: 0 Comm: swapper/5 Tainted: GF W O 3.10.17-custom-imq-b2 #1
Oct 31 17:51:53 servername kernel: [ 3788.208010] Hardware name: Dell Inc. PowerEdge 1950/0TT740, BIOS 2.6.1 04/20/2009
Oct 31 17:51:53 servername kernel: [ 3788.208010] task: ffff880129b25dc0 ti: ffff880129b2c000 task.ti: ffff880129b2c000
Oct 31 17:51:53 servername kernel: [ 3788.208010] RIP: 0010:[<ffffffffa03cc6f5>] [<ffffffffa03cc6f5>] netflow_target+0xc95/0x1124 [ipt_NETFLOW]
Oct 31 17:51:53 servername kernel: [ 3788.208010] RSP: 0018:ffff88012fd43a40 EFLAGS: 00000212
Oct 31 17:51:53 servername kernel: [ 3788.208010] RAX: ffff8800a856d888 RBX: ffff880125cab000 RCX: 0000000000000010
Oct 31 17:51:53 servername kernel: [ 3788.208010] RDX: 0000000000000000 RSI: 0000000000000011 RDI: 000000000000003e
Oct 31 17:51:53 servername kernel: [ 3788.208010] RBP: ffff88012fd43b50 R08: 0000000000000020 R09: 0000000000000001
Oct 31 17:51:53 servername kernel: [ 3788.208010] R10: 0000000000000020 R11: 0000000000000020 R12: ffff88012fd439b8
Oct 31 17:51:53 servername kernel: [ 3788.208010] R13: ffffffff816b350a R14: ffff88012fd43b50 R15: ffff8800adeb3680
Oct 31 17:51:53 servername kernel: [ 3788.208010] FS: 0000000000000000(0000) GS:ffff88012fd40000(0000) knlGS:0000000000000000
Oct 31 17:51:53 servername kernel: [ 3788.208010] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 31 17:51:53 servername kernel: [ 3788.208010] CR2: 0000000002583098 CR3: 000000012755e000 CR4: 00000000000007e0
Oct 31 17:51:53 servername kernel: [ 3788.208010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 31 17:51:53 servername kernel: [ 3788.208010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 31 17:51:53 servername kernel: [ 3788.208010] Stack:
Oct 31 17:51:53 servername kernel: [ 3788.208010] ffff88012fd43a70 ffffffffa01370da 4661010a2fd43b50 0000000000000246
Oct 31 17:51:53 servername kernel: [ 3788.208010] ffff88012fd43a70 ffffffff00000020 ffff88012fd43ab0 ffffffffa00d66de
Oct 31 17:51:53 servername kernel: [ 3788.208010] ffff880100000000 00000020000000bd 00000000ac162054 ffff88012fd43a60
Oct 31 17:51:53 servername kernel: [ 3788.208010] Call Trace:
Oct 31 17:51:53 servername kernel: [ 3788.208010] <IRQ>
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffffa01370da>] ? hash_ip4_kadt+0x8a/0xb0 [ip_set_hash_ip]
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffffa00d66de>] ? ip_set_test+0x8e/0x120 [ip_set]
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffffa012f2aa>] ? hash_net4_kadt+0x9a/0xd0 [ip_set_hash_net]
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffffa0396156>] ipt_do_table+0x2c6/0x5e5 [ip_tables]
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffffa013f0d3>] iptable_filter_hook+0x33/0x64 [iptable_filter]
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815d3766>] nf_iterate+0x86/0xb0
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815db920>] ? ip_frag_mem+0x40/0x40
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815d3804>] nf_hook_slow+0x74/0x150
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815db920>] ? ip_frag_mem+0x40/0x40
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815dbd60>] ip_forward+0x3c0/0x3e0
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815d9b38>] ip_rcv_finish+0x78/0x320
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815da3f9>] ip_rcv+0x239/0x390
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815a63f2>] __netif_receive_skb_core+0x682/0x7f0
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff810a4007>] ? generic_exec_single+0xa7/0xb0
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815a6581>] __netif_receive_skb+0x21/0x70
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815a6674>] process_backlog+0xa4/0x180
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff815a6e39>] net_rx_action+0x139/0x230
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff8104ba67>] __do_softirq+0xe7/0x230
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff816b3b7c>] call_softirq+0x1c/0x30
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff810048a5>] do_softirq+0x55/0x90
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff8104bd15>] irq_exit+0xa5/0xb0
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff810256b5>] smp_call_function_single_interrupt+0x35/0x40
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff816b373a>] call_function_single_interrupt+0x6a/0x70
Oct 31 17:51:53 servername kernel: [ 3788.208010] <EOI>
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff8154cfb3>] ? cpuidle_enter_state+0x63/0xe0
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff8154d0e9>] cpuidle_idle_call+0xb9/0x200
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff8100b10e>] arch_cpu_idle+0xe/0x30
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff81094740>] cpu_startup_entry+0xd0/0x250
Oct 31 17:51:53 servername kernel: [ 3788.208010] [<ffffffff81694f68>] start_secondary+0x1df/0x1e4
Oct 31 17:51:53 servername kernel: [ 3788.208010] Code: 41 39 f0 76 2b 48 63 ce 83 c6 01 0f b6 3c 08 40 80 ff 1f 77 16 40 0f b6 cf 45 89 d3 41 29 cb 44 89 d9 45 89 cb 41 d3 e3 44 09 da <40> 84 ff 75 19 89 95 2c ff ff ff 48 c7 c3 20 40 01 00 48 8b a5
# modinfo ipt_NETFLOW
filename: /lib/modules/3.10.17-custom-imq-b2/extra/ipt_NETFLOW.ko
alias: ip6t_NETFLOW
version: v1.8-70-g057b110
description: iptables NETFLOW target module
author: <abc@telekom.ru>
license: GPL
srcversion: B7D5B791C709AD4446D811D
depends: x_tables,nf_conntrack
vermagic: 3.10.17-custom-imq-b2 SMP mod_unload modversions
parm: destination:export destination ipaddress:port (charp)
parm: inactive_timeout:inactive flows timeout in seconds (int)
parm: active_timeout:active flows timeout in seconds (int)
parm: debug:debug verbosity level (int)
parm: sndbuf:udp socket SNDBUF size (int)
parm: protocol:netflow protocol version (5, 9, 10) (int)
parm: refresh_rate:NetFlow v9/IPFIX refresh rate (packets) (uint)
parm: timeout_rate:NetFlow v9/IPFIX timeout rate (minutes) (uint)
parm: natevents:send NAT Events (int)
parm: hashsize:hash table size (int)
parm: maxflows:maximum number of flows (int)
parm: aggregation:aggregation ruleset (charp)
Ideally I need the exact binary that produced this log. It is also needed to analyze these lines.
Without the matching binary it is impossible to tell what netflow_target+0xc27 refers to.
Last edit: ABC 2013-11-02
Please send (attach) the ipt_NETFLOW.ko binary that corresponds to the latest kern.debug.log.
Last edit: ABC 2013-11-02
The binary and a piece of the log covering the module load have been sent by email.
Thanks, received!
I made several changes to the code and added a couple of debug messages that may help. If you are going to test, take the new version from git.
A minute and a half of running and it started again. The log and the module are in your mail.
Big thanks for all your help. Fixed in git 10d5298.
By the way, I mirrored the TCP traffic from a web hosting server to my own machine and have been testing with it, but it has not crashed once since yesterday.
The problem was TCP packets that carried garbage in place of options (invalid per the RFC). It may be a DoS attack, some clever hack, or a device with a broken TCP stack, but apparently I had no such packets in my traffic.
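To illustrate the class of bug described above, here is a minimal user-space sketch of a TCP option walk that validates every length byte before advancing. It is not the code from git 10d5298; the function name tcp_options_bitmask and the constants OPT_EOL and OPT_NOP are hypothetical names chosen for this example. The sketch assumes the caller passes only the options area of the TCP header.

```c
#include <stddef.h>
#include <stdint.h>

#define OPT_EOL 0 /* end of option list, single byte */
#define OPT_NOP 1 /* no-operation padding, single byte */

/*
 * Hypothetical helper: walk the TCP options area and set bit k in *mask
 * for every option kind k < 32 that is seen.
 * Returns 0 on success, -1 if the options are malformed.
 */
int tcp_options_bitmask(const uint8_t *opt, size_t len, uint32_t *mask)
{
    size_t i = 0;

    *mask = 0;
    while (i < len) {
        uint8_t kind = opt[i];
        uint8_t optlen;

        if (kind == OPT_EOL) {
            *mask |= 1u << OPT_EOL;
            break;                  /* anything after EOL is ignored */
        }
        if (kind == OPT_NOP) {
            *mask |= 1u << OPT_NOP;
            i++;
            continue;
        }
        if (len - i < 2)
            return -1;              /* no room for the length byte */
        optlen = opt[i + 1];
        /* A length below 2 would stall the loop; a length past the
         * end of the buffer would read out of bounds. */
        if (optlen < 2 || optlen > len - i)
            return -1;
        if (kind < 32)
            *mask |= 1u << kind;
        i += optlen;
    }
    return 0;
}
```

With this shape, an option whose length byte is 0 or 1 is rejected instead of leaving the index unchanged, which is one way a parser fed garbage options can spin long enough to trigger a soft lockup.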
A full day of testing with pktgen and mirred TCP traffic: no crashes and not a single warning.
ipt_NETFLOW version v1.8-76-g10d5298, srcversion 529E69C322A6788A62E9CE6
Flows: active 19617 (peak 45304 reached 0d3h18m ago), mem 3218K, worker delay 2/250.
Hash: size 7999 (mem 62K), metric 2.18 [2.20, 2.04, 1.77]. MemTraf: 64561 pkt, 13641 K (pdu 28, 5824), Out 1070611000 pkt, 256759765 K.
Rate: 24541090 bits/sec, 12968 packets/sec; Avg 1 min: 25098182 bps, 12855 pps; 5 min: 25303419 bps, 12985 pps
cpu# stat: <search found new [metric], trunc frag alloc maxflows>, sock: <ok fail cberr, bytes>, traffic: <pkt, bytes>, drop: <pkt, bytes>
Total stat: 1277555861 782821615 287853918 [2.19], 0 1 0 0, sock: 8266617 0 8266617, 11256274 K, traffic: 1070675533, 250755 MB, drop: 0, 0 K
Linux debian6 2.6.32 #2 SMP Sun Nov 3 03:30:02 MSK 2013 x86_64 GNU/Linux
I tested with mirrored traffic for half a day and it seems to run without problems. Now I need to gather my courage and put real subscribers on it.
Well, this bug is definitely fixed now.