Thread: [Aoetools-discuss] High performance AoE
From: Kenneth K. <ken...@gm...> - 2011-02-17 20:11:48
Dear list,

We currently have an AoE SAN running in production that needs several refinements, which I'll tackle in individual mails over the coming days.

Basic layout is as follows:

Infortrend storage array with SAS & SATA drives, connected via FC to two storage controllers which export logical volumes over AoE to Xen hosts. Each AoE target is used as a virtual block device, either containing the OS (stored on SATA) or additional storage for working data (SAS for databases & mail, SATA for websites, etc.). The AoE target is vblade; secondary storage is in "cold standby" mode (toggle FC port states on the switch and start vblades to take over). Switches are Extreme Networks' Summit 7i.

At present we have 30 AoE targets running; when our full migration is done we'll have well over 100, and we plan to scale up way past that. We run a huge private cloud (we're in the wholesale ISP business) as well as managed private clouds for clients. We're ramping up for a full public cloud offering.

The only optimizations I have done so far are the following:

* AoE in a private tag-based VLAN.
* Bumped the MTU to 9000 for all VLAN interfaces and switch ports.
* Gigabit Ethernet.
* Decent switches with full non-blocking architectures.

My next question is on leveraging multiple gigabit connections, which leads me to the following:

Since vblade uses a specified device, should I use channel bonding to aggregate multiple links together for more performance? If yes, is 802.3ad the best bonding method, since the switch is involved in deciding down which link the Ethernet frames are sent, or am I missing the plot on this one? I currently have 4 GbE ports per storage controller that I can leverage, and am considering jumping to dual 10 GbE interfaces to the switch.

Then, on the initiator side, my understanding is that "aggregation" comes for free. So in this case all I need to do is ensure I have a VLAN interface per physical interface on the server, and use `aoe-interfaces` to restrict the scope to those VLAN interfaces (see the sketch below). If not, would I need to bond here as well? My plan is to set up at least 2 GbE per Xen host.

I hope to consolidate this information, and the other questions I'll post over the coming days, into several blog posts. The biggest downside to learning AoE is cutting through the tons of AoE vs iSCSI noise and getting to the useful parts. I plan to publish the useful parts.

Thanks in advance for the assistance, and for the great project.

Best

--
Kenneth Kalmer
ken...@gm...
http://opensourcery.co.za
@kennethkalmer
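A minimal sketch of the initiator-side setup described above, assuming two physical NICs and VLAN ID 100 for the AoE network; the interface names and VLAN ID are illustrative, not from the original mail:

    # One VLAN interface per physical NIC on the Xen host (initiator).
    ip link add link eth0 name eth0.100 type vlan id 100
    ip link add link eth1 name eth1.100 type vlan id 100
    ip link set dev eth0.100 mtu 9000 up
    ip link set dev eth1.100 mtu 9000 up

    # Restrict the aoe driver to those interfaces (aoetools), then rescan.
    aoe-interfaces eth0.100 eth1.100
    aoe-discover

The aoe driver can then use both interfaces for any target it can reach over both, which is what the "free" aggregation on the initiator side refers to.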
From: Yacine K. <ya...@al...> - 2011-02-17 21:44:31
Hi,

Just a question based on what you have listed: why don't you use Coraid gear instead of vblade on top of Infortrend? If the goal is to improve AoE performance, then IMHO just test the SR and/or SRX line of products; you will be able to exceed your performance objectives and it will make your life easier... But that's just the point of view of someone who loves the KISS principle ;-)

Yacine Kheddache / www.alyseo.com

On 17 Feb 2011, at 21:11, Kenneth Kalmer <ken...@gm...> wrote:

> Dear list,
>
> We currently have an AoE SAN running in production that needs several
> refinements, which I'll tackle in individual mails over the coming
> days.
>
> [SNIP]
From: Kenneth K. <ken...@gm...> - 2011-02-18 07:05:18
On Thu, Feb 17, 2011 at 11:28 PM, Yacine Kheddache <ya...@al...> wrote:
> Just a question based on what you have listed: why don't you use Coraid
> gear instead of vblade on top of Infortrend? If the goal is to improve
> AoE performance, then IMHO just test the SR and/or SRX line of products;
> you will be able to exceed your performance objectives and it will make
> your life easier...

Well Yacine, the procurement process for the hardware was out of my control, and now I have to use it. The gear is pretty performant; we have a couple of volumes that bypass the AoE stack and access the LUNs directly, and they are super fast. The thing is, I'll need to bring them back into the AoE stack to maintain a level of elasticity in our cloud.

> But that's just the point of view of someone who loves the KISS principle ;-)

+1, that is why I'm running AoE in the first place, and why I'm doing this research: to get additional clarity on things that are currently just mentioned in passing on other sites. If I didn't believe in KISS, I would have bowed to all the iSCSI pressure I'm under already...

Best

--
Kenneth Kalmer
ken...@gm...
http://opensourcery.co.za
@kennethkalmer
From: Adi K. <ad...@cg...> - 2011-02-18 14:44:09
Hi!

> We currently have an AoE SAN running in production that needs several
> refinements, which I'll tackle in individual mails over the coming
> days.

[SNIP]

> Since vblade uses a specified device, should I use channel bonding to
> aggregate multiple links together for more performance? If yes, is
> 802.3ad the best bonding method, since the switch is involved in
> deciding down which link the Ethernet frames are sent, or am I missing
> the plot on this one? I currently have 4 GbE ports per storage
> controller that I can leverage, and am considering jumping to dual
> 10 GbE interfaces to the switch.

Hmm... I think the main issue is caused by using vblade; I'd consider vblade kind of a reference implementation. Starting multiple vblade processes will not help either, because it will just eat up the available IOPS by introducing unnecessary, uncoordinated reads and writes.

When talking about performance you have two options: buy Coraid hardware, as already suggested, or choose a different implementation like ggaoed [1]. There you may specify multiple interfaces, so you get automatic load balancing and so on.

Using link aggregation (802.3ad or the Linux bonding drivers) will not help in any way to improve performance between a storage server and a single frontend: only one lane will be used, based on source and destination MAC (AoE is raw Ethernet, so only layer-2 MAC hashing applies). It might work when using two different NICs on the frontend, but that will lead to endless issues when the switch decides to choose a different link.

-- Adi

[1] http://code.google.com/p/ggaoed/

PS: Just in case you find the time, it would be great if you could post a review of the different AoE implementations! :-) There is qaoed as well, for example...
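To make the single-lane point concrete, here is a minimal 802.3ad sketch for the target side, assuming the Linux bonding driver; the interface names are illustrative. With the layer-2 hash policy, a given source/destination MAC pair always maps to the same slave, so one initiator never sees more than one link's worth of bandwidth:

    # /etc/modprobe.d/bonding.conf -- bonding driver options; the matching
    # switch ports must be configured as an LACP aggregation group.
    options bonding mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer2

    # Bring up the bond and enslave two GbE ports.
    modprobe bonding
    ip link set dev bond0 up
    ifenslave bond0 eth0 eth1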
From: Tracy R. <tr...@ul...> - 2011-02-18 19:41:28
On Fri, Feb 18, 2011 at 03:27:35PM +0100, Adi Kriegisch spake thusly:
> vblade kind of a reference implementation. Starting multiple vblade
> processes will not help either, because it will just eat up the
> available IOPS by introducing unnecessary, uncoordinated reads and
> writes.

Wouldn't the block layer aggregate and coordinate these reads and writes? vblade is just reading/writing from a file like any other process, no? I find vblade performance to be pretty decent, especially since the "thundering herd" problem of all vblade processes being awakened by every packet was solved. I am never limited by CPU, and very rarely even by the network; only by disk performance itself.

--
Tracy Reed
Digital signature attached for your safety.
Copilotco
Professionally Managed PCI Compliant Secure Hosting
866-MY-COPILOT x101 http://copilotco.com
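One way to check whether the block layer is in fact merging vblade's reads and writes is to watch the merge counters under load; a rough sketch, with sdb standing in for the exported device:

    # rrqm/s and wrqm/s are read/write requests merged by the block layer
    # before reaching the device (iostat is in the sysstat package).
    iostat -x sdb 5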
From: Kenneth K. <ken...@gm...> - 2011-02-20 19:37:43
On Fri, Feb 18, 2011 at 9:14 PM, Tracy Reed <tr...@ul...> wrote:
> On Fri, Feb 18, 2011 at 03:27:35PM +0100, Adi Kriegisch spake thusly:
>> vblade kind of a reference implementation. Starting multiple vblade
>> processes will not help either, because it will just eat up the
>> available IOPS by introducing unnecessary, uncoordinated reads and
>> writes.
>
> Wouldn't the block layer aggregate and coordinate these reads and
> writes? vblade is just reading/writing from a file like any other
> process, no? I find vblade performance to be pretty decent, especially
> since the "thundering herd" problem of all vblade processes being
> awakened by every packet was solved. I am never limited by CPU, and
> very rarely even by the network; only by disk performance itself.

I agree with Tracy here, although some tweaking of the kernel might help for optimal access to the underlying block devices (see the sketch below). However, this is shooting from the hip, and I think tests are in order to put this one to rest.

--
Kenneth Kalmer
ken...@gm...
http://opensourcery.co.za
@kennethkalmer
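A few block-layer knobs that would be worth including in those tests, as a hedged sketch; /dev/sdb stands in for the FC-backed LUN that vblade exports and is purely illustrative:

    # Try a simpler elevator for a device that is already a large RAID
    # array behind an FC controller with its own queueing.
    echo deadline > /sys/block/sdb/queue/scheduler

    # Larger read-ahead can help streaming reads (value in 512-byte sectors).
    blockdev --setra 4096 /dev/sdb

    # Allow more requests to be queued and merged before hitting the device.
    echo 512 > /sys/block/sdb/queue/nr_requests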
From: Tracy R. <tr...@ul...> - 2011-02-18 19:41:29
On Thu, Feb 17, 2011 at 10:11:18PM +0200, Kenneth Kalmer spake thusly:
> Since vblade uses a specified device, should I use channel bonding to
> aggregate multiple links together for more performance? If yes, is
> 802.3ad the best bonding method, since the switch is involved in
> deciding down which link the Ethernet frames are sent, or am I missing
> the plot on this one?

I say use channel bonding, but understand that the connection between a particular pair of machines will only use one of the links, due to the MAC hashing 802.3ad uses to choose which link to transmit to a particular host over. So no one connection will get more than 1Gb, but the aggregate throughput from the AoE target to multiple initiators will be greater. And of course make sure you have enough disk performance to actually use the bandwidth.

> I currently have 4 GbE ports per storage controller that I can
> leverage, and am considering jumping to dual 10 GbE interfaces to the
> switch.

10GbE interfaces would certainly get you faster individual connections than 1Gb. But worry about disk throughput first. Measure the bandwidth on your 1Gb links and make sure you can actually hit it before investing in 10Gb links and assuming bandwidth is the issue. It takes a lot of disks to fill even a 1Gb pipe on anything but purely streaming workloads.

> Then, on the initiator side, my understanding is that "aggregation"
> comes for free. So in this case all I need to do is ensure I have a
> VLAN interface per physical interface on the server, and use
> `aoe-interfaces` to restrict the scope to those VLAN interfaces. If
> not, would I need to bond here as well? My plan is to set up at least
> 2 GbE per Xen host.

I suppose this would work, although you wouldn't have protection against physical layer failure.

> I hope to consolidate this information, and the other questions I'll
> post over the coming days, into several blog posts. The biggest
> downside to learning AoE is cutting through the tons of AoE vs iSCSI
> noise and getting to the useful parts. I plan to publish the useful
> parts.

Please post the link to your blog posts here when you get them up. I would love to read them. I have no idea what Coraid is doing these days, but I never hear about them, and I have long thought AoE is a poorly marketed and greatly under-appreciated storage technology.

--
Tracy Reed
http://tracyreed.org
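A quick way to act on the "measure first" advice is to benchmark the network path and the disk path separately; a rough sketch, where the address and device names are placeholders:

    # Network path: raw TCP throughput between target and initiator
    # (iperf must be installed on both ends; 10.0.0.1 = the target).
    iperf -s              # run on the AoE target
    iperf -c 10.0.0.1     # run on the initiator

    # Disk path, seen from the initiator: streaming read from the AoE
    # device, bypassing the page cache so the numbers are honest.
    dd if=/dev/etherd/e0.0 of=/dev/null bs=1M count=4096 iflag=direct

If dd already tops out well below wire speed, adding links won't help.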
From: Kenneth K. <ken...@gm...> - 2011-02-20 19:45:49
On Thu, Feb 17, 2011 at 11:49 PM, Tracy Reed <tr...@ul...> wrote:
> I say use channel bonding, but understand that the connection between a
> particular pair of machines will only use one of the links, due to the
> MAC hashing 802.3ad uses to choose which link to transmit to a
> particular host over. So no one connection will get more than 1Gb, but
> the aggregate throughput from the AoE target to multiple initiators
> will be greater. And of course make sure you have enough disk
> performance to actually use the bandwidth.

Good points. Since I'm using 4Gb FC to the storage array, my thinking with 4x GbE to the initiators was to eliminate everything but the physical disks as the bottleneck. I'll check out the other bonding mechanisms as well and see if one could possibly give us higher throughput than a single GbE link (see the sketch at the end of this mail).

> 10GbE interfaces would certainly get you faster individual connections
> than 1Gb. But worry about disk throughput first. Measure the bandwidth
> on your 1Gb links and make sure you can actually hit it before
> investing in 10Gb links and assuming bandwidth is the issue. It takes
> a lot of disks to fill even a 1Gb pipe on anything but purely
> streaming workloads.

Duly noted, thanks for the reality check.

> I suppose this would work, although you wouldn't have protection
> against physical layer failure.

I'm a bit lost here. I'll have two physical links on the initiator side; apart from switch failure, what else would I need to worry about in terms of the physical layer?

> Please post the link to your blog posts here when you get them up. I
> would love to read them. I have no idea what Coraid is doing these
> days, but I never hear about them, and I have long thought AoE is a
> poorly marketed and greatly under-appreciated storage technology.

I definitely will. With high-quality feedback like this and some testing in the lab this week, I'm sure it will be a steady stream of documentation. Keep an eye on opensourcery.co.za, that is where they'll be posted.

Thanks for the help so far, I'll fire off more questions in the week.

Best

--
Kenneth Kalmer
ken...@gm...
http://opensourcery.co.za
@kennethkalmer
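For that bonding experiment, balance-rr is the one Linux bonding mode that can push a single MAC pair past one link, because it stripes successive frames across the slaves instead of hashing; the price is possible out-of-order delivery, so it needs careful benchmarking. A minimal sketch with illustrative interface names:

    # /etc/modprobe.d/bonding.conf -- round-robin over all slaves; unlike
    # 802.3ad, a single flow can exceed 1Gb here, but frames may arrive
    # out of order, which can hurt as much as it helps.
    options bonding mode=balance-rr miimon=100

    modprobe bonding
    ip link set dev bond0 up
    ifenslave bond0 eth0 eth1 eth2 eth3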