From: Auke K. <auk...@in...> - 2006-10-20 21:22:33
Harry Edmon wrote:
> I am having a strange problem with built-in Intel gigabit NICs on new
> motherboards with dual core processors. I have had this problem both on
> a SuperMicro X7DVL-E dual-core Xeon based server and an HP dc7700 Core2
> Duo workstation. The problem occurs in Debian etch distribution with
> kernel versions 2.6.18, 2.6.18.1, and 2.6.19-rc1 with the included e1000
> driver and the latest driver (version 7.2.9). It also occurs with the
> stock kernel on CentOS 4.4. It may occur with other kernels, but those
> are the ones I have tried.
>
> Here is where I see the problems. I am running NIS (aka yellow pages).
> The systems with the problems are NIS clients connected to a full-duplex
> gigabit network. The command that shows the problem is
> "id -G <username>", where <username> is any of our users in the NIS
> database. Sometimes when this command is executed it returns with no
> error messages. But sometimes it prints:
>
> do_ypcall: clnt_call: RPC: Timed out
>
> one or more times with a 5 second delay between the messages before the
> command finally successfully returns. In doing a tcpdump on the server
> and the client, what you see when this occurs is the NIS server sends a
> UDP packet that the client never sees. After 5 seconds the client
> times out, it prints out the error message, asks the server for the
> information again and this time it sees the packet. I have tried many
> different network switches and cables, none of the changes make any
> difference. I have tried it on the Supermicro Server with the ioatdma
> driver in and out, no difference. I have also tried several different
> NIS servers (Sun Ultra 10, HP D310m), no difference.
>
> Now, for some more interesting information:
>
> 1. I tried putting a PCI Intel gigabit card into the SuperMicro server.
> When I use that card to connect to the network the problem does not
> occur. Thus it is not a problem with all Intel NICs.
>
> 2. I have a Dell Optiplex 745 with the Core2 Duo chips, but with a
> Broadcom NIC. The problem does not occur on this system. Therefore it
> does not seem to be a problem outside of the Intel NIC.
>
> 3. "ifconfig -a" and "netstat -i" do not show any dropped packets.
> There are no other error messages.
>
> Although the "id" command eventually returns the information, we have
> found cases where this delay in the NIS traffic causes our applications
> to fail. Our current operating hypothesis is that there is a problem
> with the PCI-Express based Intel gigabit NICs. Do you have any reports
> of a similar problem?

First things first: which on-board NICs are we talking about? Can you
include the output of `lspci -vv` and `dmesg`? Feel free to raise a ticket
in our tracker at e1000.sf.net.

I'm not sure yet whether this is a driver or a NIC problem. It appears to
be one of the two, but this is generally a very complex issue. Some
distributions ship with default settings for e1000 that we generally don't
recommend at all. Check your /etc/modules.conf or modprobe.conf; `dmesg`
will also tell us.

This is the first report I've seen about this. It would be interesting to
see the tcpdumps if possible.

Cheers,

Auke
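
For anyone gathering the information requested above, here is a rough sketch of how it could be collected in one go. Nothing in it comes from the e1000 driver or its documentation: the function names `collect_e1000_info` and `count_timeouts`, the default output directory `e1000-report`, and the file names are all made up for illustration. It assumes a POSIX shell and that `lspci` and `dmesg` are installed. The second function simply reruns a command and counts how often the "RPC: Timed out" message from the report shows up, which may help show how frequently the packet loss happens.

```shell
#!/bin/sh
# Hypothetical helpers; the names and output layout are illustrative only.

# Save lspci -vv, dmesg, and any e1000 module options into a directory.
collect_e1000_info() {
    out=${1:-e1000-report}
    mkdir -p "$out" || return 1
    if command -v lspci >/dev/null 2>&1; then
        lspci -vv >"$out/lspci.txt" 2>&1 || true
    fi
    dmesg >"$out/dmesg.txt" 2>&1 || true
    # Options a distribution may have set for e1000 (files may not exist).
    grep -s e1000 /etc/modules.conf /etc/modprobe.conf \
        >"$out/module-options.txt" || true
    echo "$out"
}

# Run a command N times and print how many runs reported an RPC timeout.
# Usage: count_timeouts N command [args...]
count_timeouts() {
    runs=$1
    shift
    hits=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # Capture stdout and stderr together; the message position is not assumed.
        output=$("$@" 2>&1)
        case $output in
            *"RPC: Timed out"*) hits=$((hits + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "$hits"
}
```

After sourcing the file, something like `collect_e1000_info /tmp/e1000-report` followed by `count_timeouts 50 id -G someuser` (with a real NIS user name in place of `someuser`) would give a report directory to attach to a ticket and a count of how many of the 50 lookups timed out.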