From: Auke K. <auk...@in...> - 2006-10-31 18:54:37
Ronciak, John wrote:
> These:
> rx_no_buffer_count: 8083597
> rx_missed_errors: 52373
>
> are a problem. It means that the HW isn't being serviced fast enough.
> Try things with turning all interrupt moderation off. This will make
> the interrupt rate go up significantly but let's see what it does to the
> queueing. The missed packets should go away with this.

I also suggest that reducing the number of interrupts might have hurt
you in this case:

>> First we noticed the cards were generating 8k irq/sec, thought the
>> limiting causes this, so I fiddled with the irq limits, and set it to:
>> e1000 TxIntDelay=25,25,25 TxAbsIntDelay=256,256,256
>> RxIntDelay=64,64,64 RxAbsIntDelay=256,256,256

I would suggest leaving these settings alone and *only* playing with
'itr'. The default 'itr' value for that driver is 8000, and increasing
it might help. We have seen good results with values over 20000, and
even setting it to unlimited (itr=0) might help in case your system can
handle it (and it's definitely good for latency, whereas setting the
*IntDelay parameters is not).

Quick question: are you running with tso on or off? That (having tso on)
might expose you to a tc/qdisc kernel bug that we recently uncovered,
where the kernel is not accounting for tso packets properly.

Cheers,

Auke

>
> Cheers,
> John
> -----------------------------------------------------------
> "Those who would give up essential Liberty, to purchase a little
> temporary Safety, deserve neither Liberty nor Safety.", Benjamin
> Franklin 1755
>
>
>> -----Original Message-----
>> From: e10...@li...
>> [mailto:e10...@li...] On Behalf Of peter gervai
>> Sent: Tuesday, October 31, 2006 10:15 AM
>> To: e10...@li...
>> Subject: [E1000-devel] Transmit queueing,resulting high latency and
>> speed drops
>>
>>
>> Hello,
>>
>> I have been investigating this one for weeks now, yesterday I thought I've
>> finally figured it out, seems I was wrong, so here it comes, final try,
>> emailing the developers.
>> :)
>>
>> Kernel is vanilla 2.6.17.11 SMP. [I hope it's not something fixed in
>> kernel in the meantime... :-/]
>>
>> Driver is 7.0.33-k2-NAPI
>>
>> The server is an intel etherepress based board, strong cpus, plenty of
>> memory, etc. Contains 4 cards, of which two are relevant:
>> 03:02.0 Ethernet controller: Intel Corporation 82545GM Gigabit Ethernet Controller (rev 04)
>> 06:01.0 Ethernet controller: Intel Corporation 82541GI/PI Gigabit Ethernet Controller (rev 05)
>>
>> First one is an external PCI-X:133Mhz:64bit, other is integrated onboard,
>> lspci thinks it is the same speed, e1000 driver thinks
>> e1000: 0000:06:01.0: e1000_probe: (PCI:66MHz:32-bit)
>>
>> The traffic goes through these cards, mainly in the first and out the
>> second (200mbps) and slightly less traffic the other way (70mbps).
>>
>> When traffic reaches approx. 32000 packets/sec (I fear it might be 32768
>> pkts/sec which is always a bad omen) the first card starts to queue
>> outgoing traffic (while the second card does not). The queueing is
>> visible in the linux queueing:
>>
>> qdisc tbf 8007: dev eth0 rate 1000Mbit burst 32750b/8 mpu 0b lat 20.0ms
>>  Sent 22893578821 bytes 68342382 pkt (dropped 1, overlimits 61 requeues 299611)
>>  rate 85935Kbit 32380pps backlog 0b 449p requeues 299611
>> qdisc tbf 8005: dev eth1 rate 1000Mbit burst 32750b/8 mpu 0b lat 20.0ms
>>  Sent 681354341621 bytes 937627172 pkt (dropped 12523, overlimits 127208 requeues 1853808)
>>  rate 227450Kbit 34288pps backlog 0b 0p requeues 1853808
>>
>> (tbf was selected only to be able to measure the traffic, rate is same as
>> link capacity to prevent overlimits; the queueing happens with pfifo_fast
>> too.)
>>
>> First we noticed the cards were generating 8k irq/sec, thought the
>> limiting causes this, so I fiddled with the irq limits, and set it to:
>> e1000 TxIntDelay=25,25,25 TxAbsIntDelay=256,256,256
>> RxIntDelay=64,64,64 RxAbsIntDelay=256,256,256
>>
>> This reduced interrupt rate to 2k-5k, but it still queues at the same
>> rate.
>>
>> I do not believe it's a hardware issue (can't think of any which would
>> cause this) but I cannot further come up with ideas what could cause the
>> packets to queue up.
>>
>> The stats don't tell me anything helpful either:
>> # ethtool -S eth0
>> NIC statistics:
>>      rx_packets: 1212685382
>>      tx_packets: 938711270
>>      rx_bytes: 750835435
>>      tx_bytes: 3250721930
>>      rx_errors: 0
>>      tx_errors: 0
>>      tx_dropped: 0
>>      multicast: 0
>>      collisions: 0
>>      rx_length_errors: 0
>>      rx_over_errors: 0
>>      rx_crc_errors: 0
>>      rx_frame_errors: 0
>>      rx_no_buffer_count: 8083597
>>      rx_missed_errors: 52373
>>      tx_aborted_errors: 0
>>      tx_carrier_errors: 0
>>      tx_fifo_errors: 0
>>      tx_heartbeat_errors: 0
>>      tx_window_errors: 0
>>      tx_abort_late_coll: 0
>>      tx_deferred_ok: 3991825
>>      tx_single_coll_ok: 0
>>      tx_multi_coll_ok: 0
>>      tx_timeout_count: 0
>>      rx_long_length_errors: 0
>>      rx_short_length_errors: 0
>>      rx_align_errors: 0
>>      tx_tcp_seg_good: 4
>>      tx_tcp_seg_failed: 0
>>      rx_flow_control_xon: 0
>>      rx_flow_control_xoff: 141102752
>>      tx_flow_control_xon: 129374
>>      tx_flow_control_xoff: 143188
>>      rx_long_byte_count: 825384556267
>>      rx_csum_offload_good: 1208095761
>>      rx_csum_offload_errors: 37281
>>      rx_header_split: 0
>>      alloc_rx_buff_failed: 0
>>
>> So I'm clueless here. Any help would be very much appreciated. Emailing
>> me personally or CC'ing would be appreciated too. I can naturally provide
>> eprom dumps, lspci -vv's or anything long, if it helps.
>>
>> Thanks,
>> Peter
>>
>>
>>
>> -------------------------------------------------------------------------
>> Using Tomcat but need to do more? Need to support web services, security?
>> Get stuff done quickly with pre-integrated technology to make your job easier
>> Download IBM WebSphere Application Server v.1.0.1 based on Apache Geronimo
>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642
> _______________________________________________
> E1000-devel mailing list
> E10...@li...
> https://lists.sourceforge.net/lists/listinfo/e1000-devel
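
Editor's sketch: the thread suggests that rx_no_buffer_count and rx_missed_errors should stop growing once interrupt moderation is relaxed. One way to check that is to take two `ethtool -S` samples and compare them. The Python below is a minimal sketch of that comparison, not part of the original message; the helper names `parse_ethtool_stats` and `counter_deltas` are hypothetical, and it assumes the plain "name: value" layout shown in the quoted statistics.

```python
import re

# The two counters singled out in the reply above. Watching only these
# two is an assumption taken from the thread, not a driver requirement.
WATCHED = ("rx_no_buffer_count", "rx_missed_errors")


def parse_ethtool_stats(text):
    """Parse `ethtool -S` style 'name: value' lines into a dict of ints."""
    stats = {}
    for line in text.splitlines():
        # Tolerate leading mail quote markers ('>') and indentation.
        match = re.match(r"^[>\s]*([A-Za-z0-9_]+):\s*(\d+)\s*$", line)
        if match:
            stats[match.group(1)] = int(match.group(2))
    return stats


def counter_deltas(before, after, names=WATCHED):
    """Return how much each watched counter grew between two samples."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in names}
```

Feeding it the output captured before and after a settings change gives the growth of each counter over that interval; a delta that stays at zero under load would be consistent with the receive ring being serviced fast enough.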