From: Brandeburg, J. <jes...@in...> - 2006-10-31 18:47:57
e10...@li... wrote:
> Hello,
>
> I have been investigating this one for weeks now. Yesterday I thought
> I'd finally figured it out; it seems I was wrong, so here it comes,
> final try: emailing the developers. :)
>
> Kernel is vanilla 2.6.17.11 SMP. [I hope it's not something fixed in
> the kernel in the meantime... :-/]
>
> Driver is 7.0.33-k2-NAPI

This is one of those cases where we might have fixed it in the driver
already. In any case, the 7.3.15 driver has a new statistic for the
number of times we've paused and restarted the tx queue. Please try the
newest driver from our SourceForge site and see if it helps.

> The server is an Intel EtherExpress-based board, strong CPUs, plenty
> of memory, etc. It contains 4 cards, of which two are relevant:
> 03:02.0 Ethernet controller: Intel Corporation 82545GM Gigabit
>   Ethernet Controller (rev 04)
> 06:01.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit
>   Ethernet Controller (rev 05)
>
> The first one is an external PCI-X:133MHz:64-bit card, the other is
> integrated onboard. lspci thinks it is the same speed; the e1000
> driver thinks:
> e1000: 0000:06:01.0: e1000_probe: (PCI:66MHz:32-bit)
>
> The traffic goes through these cards, mainly in on the first and out
> on the second (200 Mbps), with slightly less traffic the other way
> (70 Mbps).
>
> When traffic reaches approx. 32000 packets/sec (I fear it might be
> 32768 pkts/sec, which is always a bad omen) the first card starts to
> queue outgoing traffic (while the second card does not). The queueing
> is visible in the Linux queueing:

Yeah, 32000 pps still shouldn't be any concern for our hardware.
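Before swapping drivers, it can help to confirm exactly which build is bound to each interface. A minimal sketch, assuming the interface name eth0 from the report; these are standard ethtool/modinfo invocations, not e1000-specific tooling:

```shell
# Show the driver name and version currently bound to eth0
# (should report e1000 7.0.33-k2-NAPI per the original message).
ethtool -i eth0

# Show the e1000 module available on disk and its tunable parameters
# (InterruptThrottleRate, RxDescriptors, the Tx/Rx delay timers, ...).
modinfo e1000
```

Comparing the two outputs catches the common case where a newer module was installed but the old one is still loaded.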
> qdisc tbf 8007: dev eth0 rate 1000Mbit burst 32750b/8 mpu 0b lat 20.0ms
>  Sent 22893578821 bytes 68342382 pkt (dropped 1, overlimits 61 requeues 299611)
>  rate 85935Kbit 32380pps backlog 0b 449p requeues 299611
> qdisc tbf 8005: dev eth1 rate 1000Mbit burst 32750b/8 mpu 0b lat 20.0ms
>  Sent 681354341621 bytes 937627172 pkt (dropped 12523, overlimits 127208 requeues 1853808)
>  rate 227450Kbit 34288pps backlog 0b 0p requeues 1853808

I worry about the rates mentioned here, but I'll take your word for it.

> (tbf was selected only to be able to measure the traffic; the rate is
> the same as the link capacity to prevent overlimits. The queueing
> happens with pfifo_fast too.)
>
> First we noticed the cards were generating 8k irq/sec and thought the
> limiting caused this, so I fiddled with the irq limits and set them to:
> e1000 TxIntDelay=25,25,25 TxAbsIntDelay=256,256,256
>       RxIntDelay=64,64,64 RxAbsIntDelay=256,256,256

Actually the ITR (InterruptThrottleRate) is what is limiting the
interrupts here. If you want to see what your xxDelay parameters really
do, load the module with InterruptThrottleRate=0,0,0.

> This reduced the interrupt rate to 2k-5k, but it still queues at the
> same rate.

So maybe we need more interrupts per second, but see below.

> I do not believe it's a hardware issue (I can't think of anything
> which would cause this), but I cannot come up with further ideas
> about what could cause the packets to queue up.
>
> The stats don't tell me anything helpful either:
> # ethtool -S eth0
> NIC statistics:
>      rx_packets: 1212685382
>      tx_packets: 938711270
>      rx_bytes: 750835435
>      tx_bytes: 3250721930
>      rx_no_buffer_count: 8083597
>      rx_missed_errors: 52373
>      tx_deferred_ok: 3991825
>      tx_tcp_seg_good: 4
>      rx_flow_control_xon: 0
>      rx_flow_control_xoff: 141102752
>      tx_flow_control_xon: 129374
>      tx_flow_control_xoff: 143188
>      rx_long_byte_count: 825384556267
>      rx_csum_offload_good: 1208095761
>      rx_csum_offload_errors: 37281

Several things are very telling here.
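The suggested experiment above can be sketched as a module reload; this is only an illustration under the assumptions that all three e1000 ports probe in the order shown and that the link can be taken down briefly (unloading the module drops all e1000 interfaces):

```shell
# Reload e1000 with interrupt throttling disabled on all three ports,
# keeping the delay timers under test. With ITR=0, the interrupt rate
# is governed only by the TxIntDelay/RxIntDelay timers.
modprobe -r e1000
modprobe e1000 InterruptThrottleRate=0,0,0 \
        TxIntDelay=25,25,25 TxAbsIntDelay=256,256,256 \
        RxIntDelay=64,64,64 RxAbsIntDelay=256,256,256

# Watch the resulting per-interface interrupt rate.
watch -d -n1 'grep eth /proc/interrupts'
```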
1) You're getting rx_no_buffer_count errors, which means software
didn't have any descriptors ready for hardware to DMA into.
==> solution: increase RxDescriptors using ethtool.

2) rx_missed_errors shows that only some portion of the packets in 1)
were actually dropped, so you probably only need to increase the rx
descriptors a little.

3) Flow control:
RX: as the rx fifo fills up due to 1), a flow control pause is sent to
the link partner (tx flow).
TX: this interface is receiving a *lot* of flow control from its link
partner, indicating the other end can't keep up with what this adapter
is transmitting. That causes the TX unit to pause, which causes the
driver to stop the queue, which causes pfifo_fast to queue.

What is this machine connected to on interface eth0?

The cost per RX packet is probably high enough with NAPI enabled that
it is taking the driver too long to return free buffers to the
hardware.

Lots of rx checksum offload errors too, ick.

> So I'm clueless here. Any help would be very much appreciated.
> Emailing me personally or CC'ing me would be appreciated too. I can
> naturally provide EEPROM dumps, lspci -vv's or anything else long,
> if it helps.

Jesse
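Points 1) and 2) above can be checked with a little arithmetic on the counters from the thread, and then acted on with ethtool. A sketch (the awk filter is illustrative; the commented-out ethtool ring commands are standard but hardware-dependent, so the maximum should be checked first):

```shell
# From the ethtool -S output above: what fraction of no-buffer events
# turned into real drops? A small fraction means a modest
# RxDescriptors increase should be enough.
drop_pct=$(awk -F': *' '
    /rx_no_buffer_count/ { nobuf = $2 }
    /rx_missed_errors/   { missed = $2 }
    END { printf "%.1f", 100 * missed / nobuf }
' <<'EOF'
     rx_no_buffer_count: 8083597
     rx_missed_errors: 52373
EOF
)
echo "${drop_pct}% of no-buffer events became real drops"

# Then grow the rx ring (check the hardware maximum first):
# ethtool -g eth0          # show current and maximum ring sizes
# ethtool -G eth0 rx 1024  # increase rx descriptors
# ethtool -a eth0          # confirm the pause/flow-control settings
```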