From: Kok, A. <auk...@in...> - 2008-03-28 21:09:54
snowcrash+e1000 wrote:
> hi auke,
>
> thanks for the details/explanation.
>
>> So, obviously this is a very hard issue to solve and we have not yet found a
>> resolution since the real cause is completely outside of the driver and the
>> adapter, and in hardware that we do not control.
>>
>> We're here trying (still) to come up with a workaround... can never really fix
>> this issue since it's not Intel hardware.
>
> reticent to monkey with my Dom0 -- but understanding your point -- i
> managed to get my hands on a new drive -- and installed a separate
> non-xen instance.
>
> for that system, here's my step-by-step for a NON-xen demonstration of
> the issue.
>
> bottom line -- problem still exists, whether xen, or non-xen kernel is used

good, that eliminates all that. Now we're down to ppp.

I still notice that the tx hang is instant - not even a single packet makes it
out of the card apparently, which is *astounding*

Can you perhaps attach a tcpdump to the interface before pppd is started?

what output do you see immediately before and after the hang from `ifconfig -a`?

> options e1000 debug=16 XsumRX=0 Speed=1000 Duplex=2
> InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096
> TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0

those are all a bit unneeded and I would suggest removing all of these options.
Note that you are explicitly forcing speed/duplex here which is not recommended,
and may be wrong depending on the link partner configuration.
Try (1) with no options or (2) with only TSO disabled: `ethtool -K eth0 tso off`

> details -n- bits:
>
> uname -a
> Linux server.local 2.6.24.3-50.fc8 #1 SMP Thu Mar 20 13:39:08 EDT
> 2008 x86_64 x86_64 x86_64 GNU/Linux

cool, that's good enough for now

> yum install ppp net-tools rp-pppoe kernel-devel gcc
>
> lspci | grep -i intel
> 04:07.0 Ethernet controller: Intel Corporation 82541PI Gigabit
> Ethernet Controller (rev 05)
>
> cd /etc/sysconfig/network-scripts
>
> cat /etc/sysconfig/network-scripts/ifcfg-eth0
> BOOTPROTO=none
> DEVICE=eth0
> ETHTOOL_OPTS="autoneg on"
> HWADDR=xx:xx:xx:xx:xx:xx
> IPV6INIT=no
> IPV6_AUTOCONF=no
> MTU=1492
> NM_CONTROLLED=no
> ONBOOT=yes
> TYPE=Ethernet
> USERCTL=no
>
>
> cat /etc/modprobe.conf
> alias eth0 e1000
> options e1000 debug=16 XsumRX=0 Speed=1000 Duplex=2
> InterruptThrottleRate=0 FlowControl=3 RxDescriptors=4096
> TxDescriptors=4096 RxIntDelay=0 TxIntDelay=0
> alias eth1 forcedeth
> alias /dev/ppp ppp_generic
> alias net-pf-24 pppoe
> alias char-major-108 ppp_generic
> alias tty-ldisc-3 ppp_async
> alias tty-ldisc-14 ppp_synctty
> alias ppp-compress-18 ppp_mppe
> alias ppp-compress-21 bsd_comp
> alias ppp-compress-24 ppp_deflate
> alias ppp-compress-26 ppp_deflate
> ...
>
> cat /etc/sysconfig/modules/pppoe.modules
> #!/bin/sh
> MP="/sbin/modprobe"
> $MP pppoe
> $MP ppp_generic
> $MP ppp_async
> $MP n_hdlc
> $MP ppp_synctty
> $MP ppp_mppe
> $MP ppp_deflate
>
> grep -i ppp /boot/config-`uname -r`
> CONFIG_PPP=m
> CONFIG_PPP_MULTILINK=y
> CONFIG_PPP_FILTER=y
> CONFIG_PPP_ASYNC=m
> CONFIG_PPP_SYNC_TTY=m
> CONFIG_PPP_DEFLATE=m
> # CONFIG_PPP_BSDCOMP is not set
> CONFIG_PPP_MPPE=m
> CONFIG_PPPOE=m
> CONFIG_PPPOATM=m
> CONFIG_ISDN_PPP=y
> CONFIG_ISDN_PPP_VJ=y
> CONFIG_IPPP_FILTER=y
> # CONFIG_ISDN_PPP_BSDCOMP is not set
>
> modinfo e1000 | grep ^version
> version: 7.3.20-k2-NAPI
> cd /tmp/e1000-7.6.15.5/src
> make install
>
> ls -al /lib/modules/2.6.24.3-50.fc8/kernel/drivers/net/e1000/e1000.ko
> -rw-r--r-- 1 root root 3217601 2008-03-28 11:17
> /lib/modules/2.6.24.3-50.fc8/kernel/drivers/net/e1000/e1000.ko
>
> rmmod e1000
> insmod /lib/modules/2.6.24.3-50.fc8/kernel/drivers/net/e1000/e1000.ko
>
> modinfo e1000 | grep ^version
> version: 7.6.15.5-NAPI

excellent, thanks for doing that.

> cat /etc/ppp/pppoe.conf
> ACNAME=
> CLAMPMSS=1412
> CONNECT_POLL=6
> CONNECT_TIMEOUT=60
> DEFAULTROUTE=yes
> DEMAND=no
> DNSTYPE=SERVER
> ETH=eth0
> FIREWALL=NONE
> LCP_FAILURE=3
> LCP_INTERVAL=20
> LINUX_PLUGIN=
> PEERDNS=no
> PIDFILE=/var/run/pppoe-adsl.pid
> PING="."
> PPPD_EXTRA=
> PPPOE_EXTRA=
> PPPOE_TIMEOUT=80
> SERVICENAME=
> SYNCHRONOUS=no
> USER=#####@sbcglobal.net
>
> reboot
> ...
> lsmod | egrep -i "ppp|hdlc"
> ppp_deflate 14720 0
> zlib_deflate 28440 1 ppp_deflate
> ppp_mppe 15496 0
> ppp_synctty 19712 0
> ppp_async 21248 0
> crc_ccitt 10752 1 ppp_async
> pppoe 31616 0
> pppox 12432 1 pppoe
> n_hdlc 18180 0
> ppp_generic 38304 6
> ppp_deflate,ppp_mppe,ppp_synctty,ppp_async,pppoe,pppox
> slhc 14976 1 ppp_generic
>
> dmesg | grep e1000
> e1000: 0000:04:07.0: e1000_validate_option: Transmit Descriptors set to 4096
> e1000: 0000:04:07.0: e1000_validate_option: Receive Descriptors set to 4096
> e1000: 0000:04:07.0: e1000_validate_option: Checksum Offload Disabled
> e1000: 0000:04:07.0: e1000_validate_option: Flow Control Enabled
> e1000: 0000:04:07.0: e1000_validate_option: Transmit Interrupt Delay set to 0
> e1000: 0000:04:07.0: e1000_validate_option: Receive Interrupt Delay set to 0
> e1000: 0000:04:07.0: e1000_check_options: Interrupt Throttling Rate
> (ints/sec) turned off
> e1000: 0000:04:07.0: e1000_check_copper_options: Using
> Autonegotiation at 1000 Mbps Full Duplex only
> e1000: 0000:04:07.0: e1000_probe: (PCI:33MHz:32-bit) 00:1b:21:09:f7:2e
> e1000: eth0: e1000_probe: Intel(R) PRO/1000 Network Connection
> e1000: eth0: e1000_change_mtu: changing MTU from 1500 to 1492
> e1000: eth0: e1000_watchdog_task: NIC Link is Up 100 Mbps Full
> Duplex, Flow Control: None

notice that the driver already complains about autoneg settings being off with
the switch...
and then falls back to 100mbit/fd

> cd /usr/share/doc/rp-pppoe-3.8
> setenv DEBUG 1
> sh pppoe-start
>
> Mar 28 12:03:52 server pppd[3115]: pppd 2.4.4 started by root, uid 0
> Mar 28 12:03:52 server pppd[3115]: Using interface ppp0
> Mar 28 12:03:52 server pppd[3115]: Connect: ppp0 <--> /dev/pts/2
> Mar 28 12:03:53 server kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 28 12:03:53 server kernel: Tx Queue <0>
> Mar 28 12:03:53 server kernel: TDH <1>
> Mar 28 12:03:53 server kernel: TDT <1>
> Mar 28 12:03:53 server kernel: next_to_use <1>
> Mar 28 12:03:53 server kernel: next_to_clean <0>
> Mar 28 12:03:53 server kernel: buffer_info[next_to_clean]
> Mar 28 12:03:53 server kernel: time_stamp <100003443>
> Mar 28 12:03:53 server kernel: next_to_watch <0>
> Mar 28 12:03:53 server kernel: jiffies <10000396d>
> Mar 28 12:03:53 server kernel: next_to_watch.status <0>
> Mar 28 12:03:55 server kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 28 12:03:55 server kernel: Tx Queue <0>
> Mar 28 12:03:55 server kernel: TDH <1>
> Mar 28 12:03:55 server kernel: TDT <1>
> Mar 28 12:03:55 server kernel: next_to_use <1>
> Mar 28 12:03:55 server kernel: next_to_clean <0>
> Mar 28 12:03:55 server kernel: buffer_info[next_to_clean]
> Mar 28 12:03:55 server kernel: time_stamp <100003443>
> Mar 28 12:03:55 server kernel: next_to_watch <0>
> Mar 28 12:03:55 server kernel: jiffies <10000413d>
> Mar 28 12:03:55 server kernel: next_to_watch.status <0>
> Mar 28 12:03:57 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Mar 28 12:03:59 server kernel: e1000: eth0: e1000_watchdog_task: NIC
> Link is Up 100 Mbps Full Duplex, Flow Control: None
> Mar 28 12:04:09 server kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 28 12:04:09 server kernel: Tx Queue <0>
> Mar 28 12:04:09 server kernel: TDH <1>
> Mar 28 12:04:09 server kernel: TDT <1>
> Mar 28 12:04:09 server kernel: next_to_use <1>
> Mar 28 12:04:09 server kernel: next_to_clean <0>
> Mar 28 12:04:09 server kernel: buffer_info[next_to_clean]
> Mar 28 12:04:09 server kernel: time_stamp <100006edb>
> Mar 28 12:04:09 server kernel: next_to_watch <0>
> Mar 28 12:04:09 server kernel: jiffies <1000077ed>
> Mar 28 12:04:09 server kernel: next_to_watch.status <0>
> Mar 28 12:04:11 server kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 28 12:04:11 server kernel: Tx Queue <0>
> Mar 28 12:04:11 server kernel: TDH <1>
> Mar 28 12:04:11 server kernel: TDT <1>
> Mar 28 12:04:11 server kernel: next_to_use <1>
> Mar 28 12:04:11 server kernel: next_to_clean <0>
> Mar 28 12:04:11 server kernel: buffer_info[next_to_clean]
> Mar 28 12:04:11 server kernel: time_stamp <100006edb>
> Mar 28 12:04:11 server kernel: next_to_watch <0>
> Mar 28 12:04:11 server kernel: jiffies <100007fbd>
> Mar 28 12:04:11 server kernel: next_to_watch.status <0>
> Mar 28 12:04:13 server kernel: e1000: eth0: e1000_clean_tx_irq:
> Detected Tx Unit Hang
> Mar 28 12:04:13 server kernel: Tx Queue <0>
> Mar 28 12:04:13 server kernel: TDH <1>
> Mar 28 12:04:13 server kernel: TDT <1>
> Mar 28 12:04:13 server kernel: next_to_use <1>
> Mar 28 12:04:13 server kernel: next_to_clean <0>
> Mar 28 12:04:13 server kernel: buffer_info[next_to_clean]
> Mar 28 12:04:13 server kernel: time_stamp <100006edb>
> Mar 28 12:04:13 server kernel: next_to_watch <0>
> Mar 28 12:04:13 server kernel: jiffies <10000878d>
> Mar 28 12:04:13 server kernel: next_to_watch.status <0>
> Mar 28 12:04:14 server kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Mar 28 12:04:18 server kernel: e1000: eth0: e1000_watchdog_task: NIC
> Link is Up 100 Mbps Full Duplex, Flow Control: None
> Mar 28 12:04:23 server pppd[3115]: LCP: timeout sending Config-Requests
> Mar 28 12:04:23 server pppd[3115]: Connection terminated.
> Mar 28 12:04:23 server pppd[3115]: Modem hangup
> Mar 28 12:04:27 server pppoe[3118]: Timeout waiting for PADO packets
> Mar 28 12:04:27 server pppd[3115]: Exit.
> Mar 28 12:04:58 server kernel: e1000: eth0: e1000_watchdog_task: NIC
> Link is Down

not sure why link goes down here... ppp does that?

pending the questions above I might have you run a special driver to dump the
ring contents, but let's see what comes up from this first.

Thanks for taking the time to dig into this.

Auke

thanks for taking the time to dig into this, this is starting to look like a new issue
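
As an aid for reading the Tx Unit Hang dumps quoted above: a minimal sketch, assuming the `time_stamp` and `jiffies` fields are printed in hex without a `0x` prefix (the digits in the log suggest this, but the thread does not state it). It computes how many jiffies elapsed between the stuck buffer being time-stamped and the hang being reported. `stall_jiffies` is a hypothetical helper name, not part of the driver or of any tool mentioned in this thread.

```shell
#!/bin/sh
# Hypothetical helper: difference between two hex counters from a hang dump.
#   $1 = time_stamp value, $2 = jiffies value (hex digits, no 0x prefix)
stall_jiffies() {
    # Shell arithmetic accepts 0x-prefixed hex constants.
    echo $(( 0x$2 - 0x$1 ))
}

# First dump in the log: time_stamp <100003443>, jiffies <10000396d>
stall_jiffies 100003443 10000396d   # prints 1322
```

Converting the result to seconds needs the kernel tick rate (`CONFIG_HZ`), which is not shown in this thread.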