
#74 Bug 73 not resolved (ipt_NETFLOW freeze network (NETDEV WATCHDOG: eth3 (igb): transmit queue 8 timed out))

Milestone: git version
Status: closed-fixed
Owner: ABC
Labels: None
Priority: 1
Updated: 2015-02-18
Created: 2013-10-31
Creator: Alexander
Private: No

Bug 73 not resolved.

Oct 31 17:51:53 servername kernel: [ 3788.208009] BUG: soft lockup - CPU#5 stuck for 23s! [swapper/5:0]
Oct 31 17:51:53 servername kernel: [ 3788.208010] Modules linked in: ip6table_filter(F) ip6_tables(F) xt_IMQ(F) iptable_mangle(F) xt_CT(F) iptable_raw(F) xt_nat(F) xt_mark(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_state(F) xt_tcpudp(F) ipt_NETFLOW(O) nf_conntrack(F) xt_hashlimit(F) xt_set(F) iptable_filter(F) ip_tables(F) x_tables(F) sch_sfq(F) cls_fw(F) cls_u32(F) sch_htb(F) imq(F) ip_set_hash_ip(F) ip_set_hash_net(F) ip_set(F) nfnetlink(F) bonding(F) radeon(F) kvm(F) ttm(F) drm_kms_helper(F) drm(F) gpio_ich(F) i2c_algo_bit(F) i5000_edac(F) edac_core(F) psmouse(F) lpc_ich(F) shpchp(F) microcode(F) i5k_amb(F) serio_raw(F) coretemp(F) joydev(F) mac_hid(F) dcdbas(F) lp(F) parport(F) usb_storage(F) hid_generic(F) usbhid(F) hid(F) igb(OF) dca(F) mptsas(F) mptscsih(F) mptbase(F) bnx2(F) scsi_transport_sas(F)
Oct 31 17:51:53 servername kernel: [ 3788.208010] CPU: 5 PID: 0 Comm: swapper/5 Tainted: GF       W  O 3.10.17-custom-imq-b2 #1
Oct 31 17:51:53 servername kernel: [ 3788.208010] Hardware name: Dell Inc. PowerEdge 1950/0TT740, BIOS 2.6.1 04/20/2009
Oct 31 17:51:53 servername kernel: [ 3788.208010] task: ffff880129b25dc0 ti: ffff880129b2c000 task.ti: ffff880129b2c000
Oct 31 17:51:53 servername kernel: [ 3788.208010] RIP: 0010:[<ffffffffa03cc6f5>]  [<ffffffffa03cc6f5>] netflow_target+0xc95/0x1124 [ipt_NETFLOW]
Oct 31 17:51:53 servername kernel: [ 3788.208010] RSP: 0018:ffff88012fd43a40  EFLAGS: 00000212
Oct 31 17:51:53 servername kernel: [ 3788.208010] RAX: ffff8800a856d888 RBX: ffff880125cab000 RCX: 0000000000000010
Oct 31 17:51:53 servername kernel: [ 3788.208010] RDX: 0000000000000000 RSI: 0000000000000011 RDI: 000000000000003e
Oct 31 17:51:53 servername kernel: [ 3788.208010] RBP: ffff88012fd43b50 R08: 0000000000000020 R09: 0000000000000001
Oct 31 17:51:53 servername kernel: [ 3788.208010] R10: 0000000000000020 R11: 0000000000000020 R12: ffff88012fd439b8
Oct 31 17:51:53 servername kernel: [ 3788.208010] R13: ffffffff816b350a R14: ffff88012fd43b50 R15: ffff8800adeb3680
Oct 31 17:51:53 servername kernel: [ 3788.208010] FS:  0000000000000000(0000) GS:ffff88012fd40000(0000) knlGS:0000000000000000
Oct 31 17:51:53 servername kernel: [ 3788.208010] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Oct 31 17:51:53 servername kernel: [ 3788.208010] CR2: 0000000002583098 CR3: 000000012755e000 CR4: 00000000000007e0
Oct 31 17:51:53 servername kernel: [ 3788.208010] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Oct 31 17:51:53 servername kernel: [ 3788.208010] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Oct 31 17:51:53 servername kernel: [ 3788.208010] Stack:
Oct 31 17:51:53 servername kernel: [ 3788.208010]  ffff88012fd43a70 ffffffffa01370da 4661010a2fd43b50 0000000000000246
Oct 31 17:51:53 servername kernel: [ 3788.208010]  ffff88012fd43a70 ffffffff00000020 ffff88012fd43ab0 ffffffffa00d66de
Oct 31 17:51:53 servername kernel: [ 3788.208010]  ffff880100000000 00000020000000bd 00000000ac162054 ffff88012fd43a60
Oct 31 17:51:53 servername kernel: [ 3788.208010] Call Trace:
Oct 31 17:51:53 servername kernel: [ 3788.208010]  <IRQ> 
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffffa01370da>] ? hash_ip4_kadt+0x8a/0xb0 [ip_set_hash_ip]
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffffa00d66de>] ? ip_set_test+0x8e/0x120 [ip_set]
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffffa012f2aa>] ? hash_net4_kadt+0x9a/0xd0 [ip_set_hash_net]
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffffa0396156>] ipt_do_table+0x2c6/0x5e5 [ip_tables]
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffffa013f0d3>] iptable_filter_hook+0x33/0x64 [iptable_filter]
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815d3766>] nf_iterate+0x86/0xb0
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815db920>] ? ip_frag_mem+0x40/0x40
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815d3804>] nf_hook_slow+0x74/0x150
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815db920>] ? ip_frag_mem+0x40/0x40
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815dbd60>] ip_forward+0x3c0/0x3e0
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815d9b38>] ip_rcv_finish+0x78/0x320
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815da3f9>] ip_rcv+0x239/0x390
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815a63f2>] __netif_receive_skb_core+0x682/0x7f0
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff810a4007>] ? generic_exec_single+0xa7/0xb0
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815a6581>] __netif_receive_skb+0x21/0x70
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815a6674>] process_backlog+0xa4/0x180
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff815a6e39>] net_rx_action+0x139/0x230
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff8104ba67>] __do_softirq+0xe7/0x230
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff816b3b7c>] call_softirq+0x1c/0x30
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff810048a5>] do_softirq+0x55/0x90
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff8104bd15>] irq_exit+0xa5/0xb0
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff810256b5>] smp_call_function_single_interrupt+0x35/0x40
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff816b373a>] call_function_single_interrupt+0x6a/0x70
Oct 31 17:51:53 servername kernel: [ 3788.208010]  <EOI> 
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff8154cfb3>] ? cpuidle_enter_state+0x63/0xe0
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff8154d0e9>] cpuidle_idle_call+0xb9/0x200
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff8100b10e>] arch_cpu_idle+0xe/0x30
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff81094740>] cpu_startup_entry+0xd0/0x250
Oct 31 17:51:53 servername kernel: [ 3788.208010]  [<ffffffff81694f68>] start_secondary+0x1df/0x1e4
Oct 31 17:51:53 servername kernel: [ 3788.208010] Code: 41 39 f0 76 2b 48 63 ce 83 c6 01 0f b6 3c 08 40 80 ff 1f 77 16 40 0f b6 cf 45 89 d3 41 29 cb 44 89 d9 45 89 cb 41 d3 e3 44 09 da <40> 84 ff 75 19 89 95 2c ff ff ff 48 c7 c3 20 40 01 00 48 8b a5 
# modinfo ipt_NETFLOW
filename:       /lib/modules/3.10.17-custom-imq-b2/extra/ipt_NETFLOW.ko
alias:          ip6t_NETFLOW
version:        v1.8-70-g057b110
description:    iptables NETFLOW target module
author:         <abc@telekom.ru>
license:        GPL
srcversion:     B7D5B791C709AD4446D811D
depends:        x_tables,nf_conntrack
vermagic:       3.10.17-custom-imq-b2 SMP mod_unload modversions 
parm:           destination:export destination ipaddress:port (charp)
parm:           inactive_timeout:inactive flows timeout in seconds (int)
parm:           active_timeout:active flows timeout in seconds (int)
parm:           debug:debug verbosity level (int)
parm:           sndbuf:udp socket SNDBUF size (int)
parm:           protocol:netflow protocol version (5, 9, 10) (int)
parm:           refresh_rate:NetFlow v9/IPFIX refresh rate (packets) (uint)
parm:           timeout_rate:NetFlow v9/IPFIX timeout rate (minutes) (uint)
parm:           natevents:send NAT Events (int)
parm:           hashsize:hash table size (int)
parm:           maxflows:maximum number of flows (int)
parm:           aggregation:aggregation ruleset (charp)

Discussion

<< < 1 2 (Page 2 of 2)
  • ABC

    ABC - 2013-11-02

    I need exactly the binary that produced this log. It is also useful for analyzing these lines:

    Nov  2 12:35:17 x kernel: [13268.264010]  [<ffffffffa028d5f7>] ? netflow_target+0xc27/0x10f8 [ipt_NETFLOW]
    Nov  2 12:35:17 x kernel: [13268.264010]  [<ffffffffa028d5d7>] ? netflow_target+0xc07/0x10f8 [ipt_NETFLOW]
    Nov  2 12:35:17 x kernel: [13268.264010]  [<ffffffffa028d09f>] ? netflow_target+0x6cf/0x10f8 [ipt_NETFLOW]
    

    Without the correct binary there is no way to tell what netflow_target+0xc27 refers to.
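
    To illustrate why the exact binary matters: a call-trace entry has the shape symbol+0xOFFSET/0xSIZE, where the offset is a byte position inside that particular build's compiled function and the size is the length of that function. The sketch below only splits such an entry into its parts; `struct trace_sym` and `parse_trace_sym` are hypothetical names introduced for illustration and are not part of ipt_NETFLOW or the kernel.

```c
#include <stdio.h>

/* One "symbol+0xOFFSET/0xSIZE" entry from a kernel call trace. */
struct trace_sym {
    char name[64];
    unsigned long offset;   /* byte offset of the address inside the function */
    unsigned long size;     /* total size of the compiled function, in bytes */
};

/* Hypothetical helper: returns 1 on success, 0 if the text has another shape. */
int parse_trace_sym(const char *text, struct trace_sym *out)
{
    /* %63[^+] reads the symbol name up to '+'; %lx accepts an optional 0x prefix. */
    return sscanf(text, "%63[^+]+%lx/%lx",
                  out->name, &out->offset, &out->size) == 3;
}
```

    The two traces in this ticket report netflow_target with sizes 0x1124 and 0x10f8, so they come from different builds of the module, and an offset from one cannot be looked up in the other.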

     

    Last edit: ABC 2013-11-02
  • ABC

    ABC - 2013-11-02

    Please send (attach) the ipt_NETFLOW.ko binary that matches the latest kern.debug.log.

     

    Last edit: ABC 2013-11-02
  • Alexander

    Alexander - 2013-11-02

    I have emailed the binary and a piece of the log covering the module load.

     
  • ABC

    ABC - 2013-11-02

    Thanks, received!

     
  • ABC

    ABC - 2013-11-03

    I made several changes to the code and added a couple of debug messages that may help. If you are going to test, take the new version from git.

     
  • Alexander

    Alexander - 2013-11-03

    A minute and a half of uptime and it started again. The log and the module are in your email.

     
  • ABC

    ABC - 2013-11-03

    Big thanks for all your help. Fixed in git 10d5298.

     
  • ABC

    ABC - 2013-11-03

    By the way, I mirrored TCP traffic from one of the web hosting servers to my machine and am testing with it, but it has not crashed once since yesterday.

    The problem was TCP packets that carried garbage in place of options (invalid per the RFC). It may be a DoS attack, some clever hack, or a device with a broken TCP stack, but apparently my traffic contained no such packets.
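
    As an illustration of the failure mode described above, here is a minimal sketch of a TCP options walker that does not trust the length byte of an option. It is not the code from ipt_NETFLOW or from commit 10d5298, which is not shown in this thread; the function name `tcp_options_bitmap` and the bit layout (bit N set for option kind N) are assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Single-byte TCP option kinds (RFC 793). */
#define TCPOPT_EOL 0
#define TCPOPT_NOP 1

/*
 * Hypothetical helper: walk a TCP options area of len bytes and return a
 * bitmask of the option kinds seen (bit N for kind N, kinds 0..31 only).
 * The walk stops at the first malformed option.
 */
uint32_t tcp_options_bitmap(const uint8_t *opt, size_t len)
{
    uint32_t mask = 0;
    size_t i = 0;

    while (i < len) {
        uint8_t kind = opt[i];
        uint8_t olen;

        if (kind < 32)
            mask |= UINT32_C(1) << kind;
        if (kind == TCPOPT_EOL)
            break;                  /* end of option list */
        if (kind == TCPOPT_NOP) {
            i++;                    /* padding, no length byte */
            continue;
        }
        if (i + 1 >= len)
            break;                  /* length byte is missing */
        olen = opt[i + 1];
        /* A length below 2 would never advance; a larger one may overrun. */
        if (olen < 2 || olen > len - i)
            break;
        i += olen;
    }
    return mask;
}
```

    The check on `olen` is the important part: a walker that advances by a length of 0 never moves past the bad option, which is one way garbage options can keep a CPU spinning in softirq context.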

     
  • ABC

    ABC - 2013-11-04

    A full day of testing with pktgen and mirred TCP traffic: no crashes and not a single warning.

    ipt_NETFLOW version v1.8-76-g10d5298, srcversion 529E69C322A6788A62E9CE6
    Flows: active 19617 (peak 45304 reached 0d3h18m ago), mem 3218K, worker delay 2/250.
    Hash: size 7999 (mem 62K), metric 2.18 [2.20, 2.04, 1.77]. MemTraf: 64561 pkt, 13641 K (pdu 28, 5824), Out 1070611000 pkt, 256759765 K.
    Rate: 24541090 bits/sec, 12968 packets/sec; Avg 1 min: 25098182 bps, 12855 pps; 5 min: 25303419 bps, 12985 pps
    cpu# stat: <search found new [metric], trunc frag alloc maxflows>, sock: <ok fail cberr, bytes>, traffic: <pkt, bytes>, drop: <pkt, bytes>
    Total stat: 1277555861 782821615 287853918 [2.19], 0 1 0 0, sock: 8266617 0 8266617, 11256274 K, traffic: 1070675533, 250755 MB, drop: 0, 0 K

    Linux debian6 2.6.32 #2 SMP Sun Nov 3 03:30:02 MSK 2013 x86_64 GNU/Linux

     
  • Alexander

    Alexander - 2013-11-05

    I tested with mirrored traffic for half a day and it seems to run without problems. Now I need to work up the courage to put subscribers on it.

     
  • ABC

    ABC - 2013-11-05

    Well, this bug is definitely fixed now.

     
  • ABC

    ABC - 2013-11-13
    • status: open-accepted --> open-fixed
    • Priority: 9 --> 1
     
  • ABC

    ABC - 2014-08-31
    • status: open-fixed --> closed-fixed
     
