From: Brandeburg, J. <jes...@in...> - 2009-03-05 22:24:13

Hi Carsten, any update?

Carsten Aulbert wrote:
> Hi Jesse,
>
> sorry for the late reply, too much other work to rank this question
> high enough.
>
> I was able to work around the problem by adding netconsole module
> loading to a dhclient hook.
>
> Brandeburg, Jesse wrote:
>>>
>>> 0d:00.0 0200: 8086:108c (rev 03)
>>> 0e:00.0 0200: 8086:109a
>>
>> ah, our good friends on the supermicro motherboard, right? please
>> make sure that the eeproms have been updated.
>>
>
> Well spotted. We needed to upgrade the firmware due to some IPMI
> interaction problems, which versions are recommended?
>
> n1500:~# ethtool -i eth0
> driver: e1000e
> version: 0.3.3.3-k6
> firmware-version: 0.15-4
> bus-info: 0000:0d:00.0
> n1500:~# ethtool -i eth1
> driver: e1000e
> version: 0.3.3.3-k6
> firmware-version: 0.5-7
> bus-info: 0000:0e:00.0
>
>> the dhcp server tells the dhcp client to change the MTU?
>
> yes, we run with MTU 9000 everywhere on the data network.
>
>> if magic sysrq
>> works, can you take a sysrq-t (you'll need netconsole or serial
>> console to log the output) so we can see where it is frozen? I
>> haven't heard of any issues like this for e1000e before now. What
>> about the watchdog timers or nmi_watchdog=1 boot option?
>>
>
> We will try that, I'll put one of our students on it.
>
>> I suspect that something happens to cause an infinite loop, whether
>> it is our driver's fault needs to be isolated by getting a stack
>> dump when it's hung up.
>>
>>> Well, now the question, how to fix it - if I've provided you with at
>>> least the initial amount of information you might need.
>>
>> not quite sure what is wrong yet. can you get a tcpdump of the
>> traffic leading to the hang from a mirror port on a switch (or
>> another client on a hub) or anything like that? It would help us
>> reproduce the issue here.
>
> Yes, we have got a tap-device which we can put in between and
> duplicate all traffic.
>
> More (hopefully) soon
>
> Sorry again.
>
> Carsten