From: Support T. <su...@ca...> - 2009-03-27 21:36:21
Hi Jay,

If you could send us the patch to the bonding driver in the 2.6 kernel, we can look at it and compare it with the 2.4 kernel to see if there is a problem. I did check /proc for the bonding device, and it showed everything fine. What we don't understand is why the E1000E driver with 82573 NICs can get 2 Gb/s with two NICs bonded, while the Pro/1000 MT server adapter cannot break the 1 Gb/s limit. We also tried CentOS 5.2, and the same seems true there: the E1000 driver with 82546 NICs bonded in 802.3ad mode cannot get more than 1 Gb/s through the bond. The tests use the same clients, same servers, same switch, and same network cables.

Wayne

-----Original Message-----
From: Jay Vosburgh [mailto:fu...@us...]
Sent: Friday, March 27, 2009 2:16 PM
To: su...@ca...
Cc: e10...@li...; 'Ronciak, John'
Subject: [work] Re: [E1000-devel] Bug report E1000 driver bonding in 802.3ad mode can not go beyond 1GB/s throughput

Support Team <su...@ca...> wrote:

>Thanks for writing back regarding this matter. We use the latest 2.4 kernel
>from kernel.org. For the E1000 driver, we download it from the Intel web
>site. If you have a newer driver somewhere we can download, we will
>download it and give it a try.

	The bonding bug I referenced won't be in 2.4.37 (the latest 2.4
from kernel.org). The latest 2.4 bonding driver is in there; it looks to
be bonding version 2.6.0 from January 2004. There may be other bugs in
this version of bonding; I'm not sure, as there is no development of
bonding in the 2.4 kernel (and hasn't been for several years).

	And, as John said in a separate mail, the 802.3ad implementation
is part of the bonding driver, not part of the network device driver
(e1000, e1000e, etc).

>Basically, in our setup, both the 82546-based and the 82573-based systems
>use the same OS, the same kernel, and the same bonding configuration. The
>only difference in software is the driver: E1000E vs. E1000. The E1000E
>driver works well; E1000 does not.
>We also plan to test igb driver bonding for the newest Intel NIC chip.

	Since you didn't say one way or the other whether you checked the
bonding status that I described, I'll just reiterate that usually this
sort of thing (bonding limited to one interface's throughput) is some
kind of configuration problem. If you did check the
/proc/net/bonding/bond* files and they're fine, then perhaps the problem
is elsewhere. If you didn't check the bonding /proc files, I'd really
recommend that you do, because an 802.3ad aggregation failure would
exhibit the symptoms you're describing. The most common cause that I see
is switch misconfiguration.

	If it's not a configuration problem, then the next most likely
cause is poor balance of traffic. This is often switch related as well,
as some switches balance only by XOR of the MAC addresses in the frames.
I don't really know how your network topology is laid out or what
switches you have, so I can't say if this is the problem or not.

	-J

---
	-Jay Vosburgh, IBM Linux Technology Center, fu...@us...

>Thanks for your help!
>Wayne
>
>-----Original Message-----
>From: Jay Vosburgh [mailto:fu...@us...]
>Sent: Friday, March 27, 2009 12:44 PM
>To: su...@ca...
>Cc: 'Ronciak, John'; e10...@li...
>Subject: [work] Re: [E1000-devel] Bug report E1000 driver bonding in 802.3ad
>mode can not go beyond 1GB/s throughput
>
>Support Team <su...@ca...> wrote:
>
>>Yes, the same VLAN setup in software does not work for the 82546 chip.
>>The software setups are identical: other than the 82546 loading the E1000
>>driver and the 82573 loading the E1000E driver, nothing else changed in
>>software or in the test environment. If we can see the 82546/MT adapter
>>get 2 Gb/s in bond mode, we will be happy.
>
>	Some bonding things to check:
>
>ip route show
>
>	Make sure no interfaces (the slaves) have routes that supersede
>the route for the bond itself.
>
>cat /proc/net/bonding/bond0 [or whatever bond device you have]
>
>	Make sure the options you think you have match what bonding is
>seeing (particularly the xmit_hash_policy if your throughput test is
>many connections but not many discrete peers).
>
>	Also in the proc file, make sure the 802.3ad is aggregating
>properly; are all of the slaves in the active aggregator?
>
>	Lastly, what distro (cat /etc/issue) and kernel version are you
>running (uname -a)?
>
>	I fixed a bug in 802.3ad related to aggregator assignments a
>couple of weeks ago. If you've got a sufficiently recent kernel (the
>bug was introduced late last year), you might be seeing that (if the
>slaves don't aggregate properly). I observed the bug messing up 802.3ad
>aggregations with my e1000 devices, so it might be that.
>
>	-J
>
>---
>	-Jay Vosburgh, IBM Linux Technology Center, fu...@us...
>
>------------------------------------------------------------------------------
>_______________________________________________
>E1000-devel mailing list
>E10...@li...
>https://lists.sourceforge.net/lists/listinfo/e1000-devel
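Jay's question (are all of the slaves in the active aggregator?) can be checked with a short script. This is a minimal sketch added for illustration, not something from the thread. It assumes the 2.6-era 802.3ad layout of /proc/net/bonding/bondN, in which an "Active Aggregator Info:" block is followed by an "Aggregator ID: N" line and each "Slave Interface:" block carries its own "Aggregator ID: N" line. The function name check_aggregation is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper, not from the thread: report slaves of an 802.3ad
# bond that are not in the active aggregator.
# Assumes the 2.6-era /proc/net/bonding/bondN layout:
#   Active Aggregator Info:
#           Aggregator ID: N
#   ...
#   Slave Interface: ethX
#   ...
#   Aggregator ID: N
# Exit status: 0 if all slaves match, 1 if any differ, 2 if no active
# aggregator line was found.
check_aggregation() {
    awk '
        /^Active Aggregator Info:/ { in_active = 1; next }
        /^Slave Interface:/        { slave = $3; in_active = 0; next }
        /Aggregator ID:/ {
            id = $NF
            if (in_active && active == "") { active = id; in_active = 0 }
            else if (slave != "")          { slave_id[slave] = id }
        }
        END {
            if (active == "") { print "no active aggregator found"; exit 2 }
            bad = 0
            for (s in slave_id) {
                if (slave_id[s] != active) {
                    printf "%s is in aggregator %s, not active aggregator %s\n", s, slave_id[s], active
                    bad = 1
                }
            }
            if (!bad) print "all slaves in active aggregator " active
            exit bad
        }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Run it as, for example, check_aggregation /proc/net/bonding/bond0. A slave reported in a different aggregator is the kind of 802.3ad aggregation failure that Jay notes would limit the bond to one interface's throughput. Output from other kernel versions may need the patterns adjusted.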
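The balance problem can be illustrated with the transmit hash formulas given in the Linux bonding documentation of that period: layer2 is (source MAC XOR destination MAC) modulo slave count, and layer3+4 is ((source port XOR dest port) XOR ((source IP XOR dest IP) AND 0xffff)) modulo slave count. The sketch below is an illustration under assumptions, not the driver's code. It applies the layer2 formula to the last octet of each MAC only as a simplification, takes IPv4 addresses as plain integers, and uses hypothetical function names and sample values. A switch hashing the return traffic may use a different, vendor-specific formula.

```shell
#!/bin/sh
# Illustration only: transmit-hash formulas as described in the Linux
# bonding documentation, not the driver's actual code. All names here
# (layer2_slave, layer34_slave) are hypothetical.

# layer2_slave SRC_MAC_OCTET DST_MAC_OCTET SLAVE_COUNT
# Simplification: only the last octet of each MAC address is hashed.
layer2_slave() {
    echo $(( ($1 ^ $2) % $3 ))
}

# layer34_slave SRC_IP DST_IP SRC_PORT DST_PORT SLAVE_COUNT
# IPv4 addresses are passed as plain integers (10.0.0.1 = 167772161).
layer34_slave() {
    echo $(( (($3 ^ $4) ^ (($1 ^ $2) & 0xffff)) % $5 ))
}

# Hypothetical example: one client/server pair, eight TCP connections
# (source ports 40000-40007) to port 5001 over a two-slave bond.
for sport in 40000 40001 40002 40003 40004 40005 40006 40007; do
    echo "sport $sport: layer2 -> slave $(layer2_slave 0x1a 0x2b 2)," \
         "layer3+4 -> slave $(layer34_slave 167772161 167772162 "$sport" 5001 2)"
done
```

With one client and one server, layer2_slave 0x1a 0x2b 2 gives 1 for every connection, so all of that traffic leaves on the same slave. Under layer3+4, the same pair (10.0.0.1 to 10.0.0.2, port 5001) alternates between slaves 0 and 1 as the source port changes. This is why the xmit_hash_policy matters when a throughput test uses many connections but few discrete peers.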