From: Alexander D. <ale...@in...> - 2010-01-28 22:31:06
On Wed, 2010-01-27 at 04:14 -0800, Покотиленко Костик wrote:
> Using serial console I've figured out:
>
> - system working fine except for the NIC
> - ifconfig shows only RX dropped increasing on eth1 (client side), other
>   counters stalled.
> - ethtool -t eth0:
>
> The test result is FAIL
> The test extra info:
> Register test (offline) 0
> Eeprom test (offline) 0
> Interrupt test (offline) 0
> Loopback test (offline) 13
> Link test (on/offline) 0
>
> - ethtool -t eth1
>
> The test result is FAIL
> The test extra info:
> Register test (offline) 0
> Eeprom test (offline) 0
> Interrupt test (offline) 0
> Loopback test (offline) 13
> Link test (on/offline) 0
>
> - After doing:
>
> ifdown -a; rmmod igb; rmmod dca; modprobe igb; ifup -a
>
> both ethtool commands (The test result is FAIL) and ifconfig show same
> result
>
> So it seems like a NIC hardware hang.

The next time this occurs could you go through and run the ethtool test
on all of the network ports? I'm wondering if it is only eth0/1 that are
blocked or if eth3/4 are stopped as well.

> I don't think this problem is related to something other than NIC / igb
> driver. If there are HW problems like memory or power I would notice
> other system problems not just NIC, isn't it?

I'm wondering if this issue might somehow be a PCIe problem. The fact
that the loopback test is failing tells me that the issue is likely
related to the NIC's ability to perform DMA transactions, since that is
essentially all the loopback test does. One reason I think it is
something in the system is that both eth0 and eth1 fail at the same
time. From the software's perspective these ports appear as two separate
devices, but certain physical items are shared, such as the PCIe
physical link, and it is possible that some sort of issue there is
causing the hangs and resets.

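The per-port check asked for above could be scripted roughly as follows.
This is only a sketch: the function names failed_subtests and
selftest_ports are made up for illustration, the port names have to be
passed in by hand, and it assumes ethtool prints one subtest per line
with the result code last, as in the output quoted above. Keep in mind
that an offline self-test may interrupt normal traffic on the port while
it runs.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of ethtool or igb.

# Read `ethtool -t` output on stdin and print every subtest line whose
# result code (the last field on the line) is a non-zero number.
failed_subtests() {
    awk '/\(.*\)/ && $NF ~ /^[0-9]+$/ && $NF != 0 { print }'
}

# Run the offline self-test on each interface named on the command line
# and list only the subtests that failed.
selftest_ports() {
    for ifname in "$@"; do
        echo "== $ifname =="
        ethtool -t "$ifname" offline 2>&1 | failed_subtests
    done
}

# Example (as root), naming every port on the NIC:
#   selftest_ports eth0 eth1
```

If the remaining ports on the card report the same non-zero loopback
code as eth0/1, that points at something shared rather than at a single
port.
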
By doing an ethtool test on eth3/4 we will at least know if the issue
extends to the bridge on the NIC or if it is only eth0/1.

>
> If I can do more testing let me know. Moving NIC to other server isn't
> option for me.
>
> The server is quite new, could it be IRQ related problem, i.e.
> motherboard not fully supported by <=2.6.30?
>

I'm not suspecting an IRQ problem because the loopback test doesn't do
anything with the interrupts. Also, one of the tests performed in the
ethtool run is an interrupt test, and the fact that it passed means that
interrupts are behaving as expected.

Thanks,

Alex