From: Harry E. <ha...@at...> - 2006-10-20 20:58:59
I am having a strange problem with built-in Intel gigabit NICs on new motherboards with dual-core processors. I have had this problem both on a SuperMicro X7DVL-E dual-core Xeon based server and an HP dc7700 Core2 Duo workstation. The problem occurs in the Debian etch distribution with kernel versions 2.6.18, 2.6.18.1, and 2.6.19-rc1, with both the included e1000 driver and the latest driver (version 7.2.9). It also occurs with the stock kernel on CentOS 4.4. It may occur with other kernels, but those are the ones I have tried.

Here is where I see the problem. I am running NIS (aka yellow pages). The systems with the problem are NIS clients connected to a full-duplex gigabit network. The command that shows the problem is "id -G <username>", where <username> is any of our users in the NIS database. Sometimes this command returns with no error messages. But sometimes it prints:

    do_ypcall: clnt_call: RPC: Timed out

one or more times, with a 5 second delay between the messages, before the command finally returns successfully.

Running tcpdump on both the server and the client shows that, when this occurs, the NIS server sends a UDP packet that the client never sees. After 5 seconds the client times out, prints the error message, and asks the server for the information again; this time it sees the packet.

I have tried many different network switches and cables; none of the changes make any difference. I have tried it on the SuperMicro server with the ioatdma driver loaded and unloaded, no difference. I have also tried several different NIS servers (Sun Ultra 10, HP D310m), no difference.

Now, for some more interesting information:

1. I tried putting a PCI Intel gigabit card into the SuperMicro server. When I use that card to connect to the network the problem does not occur. Thus it is not a problem with all Intel NICs.

2. I have a Dell Optiplex 745 with the Core2 Duo chips, but with a Broadcom NIC. The problem does not occur on this system. Therefore it does not seem to be a problem outside of the Intel NIC.

3. "ifconfig -a" and "netstat -i" do not show any dropped packets. There are no other error messages.

Although the "id" command eventually returns the information, we have found cases where this delay in the NIS traffic causes our applications to fail.

Our current operating hypothesis is that there is a problem with the PCI-Express based Intel gigabit NICs. Do you have any reports of a similar problem?
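
For anyone trying to reproduce this, one way to measure how often the delay occurs is to run the command in a loop and time each call. The following is only a rough sketch, assuming Python 3 is available on the client. The function names, the run count of 100, and the 4 second threshold (chosen to sit just under the 5 second timeout described above) are illustrative assumptions, not part of the original test setup.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv once and return (elapsed_seconds, return_code)."""
    start = time.monotonic()
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    return time.monotonic() - start, proc.returncode


def count_slow_runs(argv, runs=100, threshold=4.0):
    """Run argv `runs` times and return the elapsed times >= threshold."""
    slow = []
    for _ in range(runs):
        elapsed, _rc = time_command(argv)
        if elapsed >= threshold:
            slow.append(elapsed)
    return slow


# Only run the loop when exactly one argument (a username) is given.
if __name__ == "__main__" and len(sys.argv) == 2:
    slow = count_slow_runs(["id", "-G", sys.argv[1]])
    print(f"{len(slow)} of 100 runs took 4 seconds or longer")
```

Passing an NIS username as the only argument runs "id -G" against it 100 times and reports how many calls were slow enough to suggest at least one RPC timeout. Comparing that count between the built-in NIC and the PCI card on the same machine would put a number on the difference.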