Re: [Drbl-user] The amount of TX is huge!
From: Steven S. <st...@nc...> - 2013-02-18 08:38:17
On 02/11/2013 09:20 PM, Genie Jhang wrote:
> FYI, here's the messages log after the error occurred.
>
> "GEMsim" is a program I'm running and "nuclear" is the hostname.
> The problem device is eth1.
>
> Thanks to everyone for reading this mail.
>
> ===============================================================================================
>
> Feb 11 22:04:41 nuclear kernel: ------------[ cut here ]------------
> Feb 11 22:04:41 nuclear kernel: WARNING: at net/sched/sch_generic.c:261 dev_watchdog+0x26d/0x280() (Not tainted)
> Feb 11 22:04:41 nuclear kernel: Hardware name: System Product Name
> Feb 11 22:04:41 nuclear kernel: NETDEV WATCHDOG: eth1 (e1000e): transmit queue 0 timed out
> Feb 11 22:04:41 nuclear kernel: Modules linked in: fuse nfsd lockd nfs_acl auth_rpcgss autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_MASQUERADE iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_tftp nf_conntrack_ftp ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 ext3 jbd xfs exportfs uinput ses enclosure sg microcode i2c_i801 i2c_core iTCO_wdt iTCO_vendor_support e1000e ext4 mbcache jbd2 sd_mod crc_t10dif pata_acpi ata_generic ata_piix aacraid dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]
> Feb 11 22:04:41 nuclear kernel: Pid: 15215, comm: GEMsim Not tainted 2.6.32-279.9.1.el6.x86_64 #1
> Feb 11 22:04:41 nuclear kernel: Call Trace:
> Feb 11 22:04:41 nuclear kernel: <IRQ> [<ffffffff8106b797>] ? warn_slowpath_common+0x87/0xc0
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8106b886>] ? warn_slowpath_fmt+0x46/0x50
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8145993d>] ? dev_watchdog+0x26d/0x280
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8108ca8d>] ? insert_work+0x6d/0xb0
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8109cf43>] ? ktime_get+0x63/0xe0
> Feb 11 22:04:41 nuclear kernel: [<ffffffff814596d0>] ? dev_watchdog+0x0/0x280
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8107e917>] ? run_timer_softirq+0x197/0x340
> Feb 11 22:04:41 nuclear kernel: [<ffffffff81096d40>] ? hrtimer_interrupt+0x140/0x250
> Feb 11 22:04:41 nuclear kernel: [<ffffffff81073f41>] ? __do_softirq+0xc1/0x1e0
> Feb 11 22:04:41 nuclear kernel: [<ffffffff810dbb00>] ? handle_IRQ_event+0x60/0x170
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8100c24c>] ? call_softirq+0x1c/0x30
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8100de85>] ? do_softirq+0x65/0xa0
> Feb 11 22:04:41 nuclear kernel: [<ffffffff81073d25>] ? irq_exit+0x85/0x90
> Feb 11 22:04:41 nuclear kernel: [<ffffffff81506095>] ? do_IRQ+0x75/0xf0
> Feb 11 22:04:41 nuclear kernel: [<ffffffff8100ba53>] ? ret_from_intr+0x0/0x11
> Feb 11 22:04:41 nuclear kernel: <EOI>
> Feb 11 22:04:41 nuclear kernel: ---[ end trace b202b869af5e1efe ]---

The above is a kernel oops... Maybe you'd better upgrade the kernel...

Steven.

> Feb 11 22:04:41 nuclear kernel: e1000e 0000:06:00.0: eth1: Reset adapter
> Feb 11 22:04:41 nuclear kernel: e1000e 0000:06:00.0: eth1: Error reading PHY register
> Feb 11 22:04:41 nuclear kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Feb 11 22:08:22 nuclear kernel: tcpdump uses obsolete (PF_INET,SOCK_PACKET)
> Feb 11 22:08:37 nuclear kernel: device eth1 entered promiscuous mode
> Feb 11 22:09:11 nuclear kernel: e1000e 0000:06:00.0: eth1: Reset adapter
> Feb 11 22:09:11 nuclear kernel: e1000e 0000:06:00.0: eth1: Error reading PHY register
> Feb 11 22:09:11 nuclear kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Feb 11 22:09:24 nuclear kernel: device eth1 left promiscuous mode
> Feb 11 22:10:01 nuclear kernel: e1000e 0000:06:00.0: eth1: Reset adapter
> Feb 11 22:10:01 nuclear kernel: e1000e 0000:06:00.0: eth1: Error reading PHY register
> Feb 11 22:10:01 nuclear kernel: e1000e: eth1 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
> Feb 11 22:11:46 nuclear kernel: e1000e 0000:06:00.0: eth1: Reset adapter
>
>
> On Feb 10, 2013, at 1:49 PM, Steven Shiau <st...@nc...> wrote:
>
>>
>>
>> On 01/27/2013 05:53 AM, Genie Jhang wrote:
>>> You're right, Steve.
>>> After I removed Network-Manager, the amount of TX decreased greatly.
>>>
>>> But the problem of eth1 going down is not solved.
>>> I don't know much about networking.
>>> Our lab has 13 nodes connected to one server.
>>> We're not using an external Ethernet card with multiple ports bonded as one.
>>>
>>> When eth1 went down, the error and drop counts increased drastically.
>>>
>>> Any idea?
>> Not really. Sorry.
>> Does anyone on this mailing list have an idea?
>>
>> Steven.
>>>
>>> Thank you in advance.
>>>
>>> Best Regards,
>>> Genie.
>>>
>>> On Dec 24, 2012, at 12:18 PM, Steven Shiau <st...@nc...> wrote:
>>>
>>>>
>>>>
>>>> On 2012/12/18 09:40 PM, Genie Jhang wrote:
>>>>> Hi.
>>>>>
>>>>> I have two questions.
>>>>>
>>>>> Question number 1.
>>>>> This is the output of ifconfig; eth1 is the Ethernet interface for the local connection.
>>>>>
>>>>> ------------------------------------------------------------------------------------------------------------
>>>>> eth1      Link encap:Ethernet  HWaddr 48:5B:39:8D:50:48
>>>>>           inet addr:10.15.0.100  Bcast:10.15.255.255  Mask:255.255.0.0
>>>>>           inet6 addr: fe80::4a5b:39ff:fe8d:5048/64 Scope:Link
>>>>>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>>>>>           RX packets:95028520 errors:0 dropped:0 overruns:0 frame:0
>>>>>           TX packets:111821407 errors:0 dropped:0 overruns:0 carrier:0
>>>>>           collisions:0 txqueuelen:1000
>>>>>           RX bytes:18300779563 (17.0 GiB)  TX bytes:113186710243 (105.4 GiB)
>>>>>           Interrupt:19 Memory:fbde0000-fbe00000
>>>>> ------------------------------------------------------------------------------------------------------------
>>>>>
>>>>> Uptime of the server is about 27 days.
>>>>> Is the amount of TX bytes normal?
>>>>> We have 12 nodes and use Scientific Linux 6.3, which is based on Red Hat Enterprise Linux 6.
>>>> I never pay attention to that number. However, for DRBL mode, yes, the
>>>> network usage is high...
>>>>>
>>>>> Question number 2.
>>>>> From time to time, eth1 goes down, which disconnects all the other nodes.
>>>>> This problem is solved when the server computer is rebooted.
>>>>> Is this related to the TX bytes?
>>>> Not very sure. However, did you use network-manager? If so, it's
>>>> recommended to remove network-manager on the server, and write the
>>>> static network setting in /etc/sysconfig/network-scripts/ifcfg-eth1.
>>>> Or try to upgrade the kernel, or get the driver source from your
>>>> vendor, compile the kernel module, and use it.
>>>>
>>>> Steven.
>>>>
>>>>>
>>>>> If it is, then can I solve the problem with a new Ethernet card with 4 ports and use them with bonding?
>>>>>
>>>>> Thank you for your help in advance.
>>>>>
>>>>> Best Regards,
>>>>> Genie
>>>>> _______________________________________________
>>>>> Drbl-user mailing list
>>>>> Drb...@li...
>>>>> https://lists.sourceforge.net/lists/listinfo/drbl-user
>>>>>
>>>>
>>>> --
>>>> Steven Shiau <steven _at_ nchc org tw> <steven _at_ stevenshiau org>
>>>> National Center for High-performance Computing, Taiwan.
>>>> http://www.nchc.org.tw
>>>> Public Key Server PGP Key ID: 4096R/47CF935C
>>>> Fingerprint: 0240 1FEB 695D 7112 62F0 8796 11C1 12DA 47CF 935C

--
Steven Shiau <steven _at_ nchc org tw> <steven _at_ stevenshiau org>
National Center for High-performance Computing, Taiwan.
http://www.nchc.org.tw
Public Key Server PGP Key ID: 4096R/47CF935C
Fingerprint: 0240 1FEB 695D 7112 62F0 8796 11C1 12DA 47CF 935C
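
For a rough sense of scale on the "is the amount of TX bytes normal?" question quoted above, the counters from the ifconfig output can be turned into an average rate. This is only a back-of-the-envelope sketch: the byte counts and the 1000 Mbps link speed are copied from the thread, the uptime is the approximate "about 27 days" from the original post, and it assumes the counters were not reset during that time (a driver reload would reset them). The function name `average_mbps` and the constant names are made up for this illustration.

```python
# Figures copied from the ifconfig output quoted in the thread.
TX_BYTES = 113_186_710_243
RX_BYTES = 18_300_779_563
UPTIME_DAYS = 27   # "about 27 days" per the original post; approximate
LINK_MBPS = 1000   # link speed reported by e1000e in the kernel log


def average_mbps(total_bytes: int, days: float) -> float:
    """Return the mean rate in Mbit/s for a byte counter accumulated over `days`."""
    seconds = days * 24 * 60 * 60
    return total_bytes * 8 / seconds / 1_000_000


# Assumption: the counters cover the whole uptime without being reset.
tx_mbps = average_mbps(TX_BYTES, UPTIME_DAYS)
rx_mbps = average_mbps(RX_BYTES, UPTIME_DAYS)
print(f"average TX: {tx_mbps:.3f} Mbit/s ({tx_mbps / LINK_MBPS:.3%} of link)")
print(f"average RX: {rx_mbps:.3f} Mbit/s ({rx_mbps / LINK_MBPS:.3%} of link)")
```

Under those assumptions the long-run TX average comes out well below 1 Mbit/s on a gigabit link. An average says nothing about short bursts, so it cannot by itself rule traffic in or out as a factor in the eth1 resets.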